
Interpreting the warnings when distances are tied in an exact nearest neighbor (NN) search.

The most obvious problem with ties is that they may affect the identity of the reported neighbors.
The various NN search functions will return a constant number of neighbors for each data point.
If the `k`th neighbor is tied with the `k+1`th neighbor, this requires an arbitrary decision about which data point to retain in the NN set.
A milder issue is that the order of the neighbors within the set is arbitrary, which may be important for certain algorithms.
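
As a minimal base R sketch of the situation described above (it does not call any function from this package, and the point layout is an assumption chosen purely for illustration), consider the centre of a unit square and its four equidistant corners:

```r
# Four corners of a unit square plus the centre (row 5).
X <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1), c(0.5, 0.5))

# Euclidean distances from the centre to each corner.
d <- as.matrix(dist(X))[5, -5]

# With k = 2, the 2nd and 3rd nearest neighbors are exactly tied,
# so which two corners end up in the NN set is an arbitrary choice.
k <- 2
sorted <- sort(d)
tied <- unname(sorted[k] == sorted[k + 1])
tied
```

Here every corner is a valid member of the 2-NN set, and any ordering of the chosen pair is equally defensible.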

As such, a warning will be raised if tied distances are detected among the `k+1` NNs for any of the exact NN search methods.
We only consider exact ties at double precision - previous versions of this package would account for numerical imprecision, but this is no longer the case.
No warning is given for the approximate methods as their use already implies that a certain degree of inaccuracy is acceptable.
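
To illustrate what "exact ties at double precision" can mean in practice, the following base R sketch compares two values that are equal mathematically but not in floating-point arithmetic. The values are arbitrary and are not taken from the package. Under an exact comparison, such a pair would not count as a tie.

```r
a <- 0.1 + 0.2
b <- 0.3

# Exact comparison at double precision: not a tie.
exact_tie <- a == b

# Tolerance-based comparison: treated as equal.
near_tie <- isTRUE(all.equal(a, b))

c(exact = exact_tie, tolerant = near_tie)
```

The consequence is that distances that "should" be equal may escape the warning if rounding makes them differ in the last bits.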

In general, the exact NN search algorithms in this package are fully deterministic despite the use of stochastic steps during index construction.
The only exception occurs when there are tied distances to neighbors, at which point the order and/or identity of the k-nearest neighboring points is not well-defined.
This is because, in the presence of ties, the output will depend on the ordering of points in the constructed index from `buildKmknn` or `buildVptree`.

Users should set the seed to guarantee consistent (albeit arbitrary) results across different runs of the function. However, note that the exact selection of tied points depends on the numerical precision of the system. Thus, even after setting a seed, there is no guarantee that the results will be reproducible across machines (especially Windows)!
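
The effect of the seed can be sketched with a toy stand-in written in base R. The function `toy_nearest` is hypothetical and is not the algorithm used by this package. It only mimics the idea that a randomized ordering of points decides which tied point is reported.

```r
# Hypothetical toy search: points are visited in a shuffled order and
# the first point with the minimum distance wins any tie.
toy_nearest <- function(d) {
  visit <- sample(length(d))      # stand-in for stochastic index construction
  visit[which.min(d[visit])]      # original index of the first minimum visited
}

d <- c(1, 1, 1, 2)  # three points tied for nearest

set.seed(42)
first <- toy_nearest(d)
set.seed(42)
second <- toy_nearest(d)

identical(first, second)  # same seed, same (arbitrary) choice
```

Without the calls to `set.seed`, repeated runs may pick different members of the tied group.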

It may occasionally be appropriate to disable the warnings by setting `warn.ties=FALSE`.
The most obvious scenario is when `get.index=FALSE`, i.e., we are only interested in the distances to the neighbors.
In such cases, the presence of ties does not matter as changes to the identity of tied neighbors do not affect the returned distances (which, for ties, are equal by definition).
Similarly, if the seed is set prior to the search, the warnings are unnecessary as the output is fully deterministic.
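
The claim that ties do not affect the returned distances can be checked with a small base R sketch. The distance vector and the two tie-breaking orders below are made up for illustration and do not come from the package.

```r
d <- c(0.5, 1, 1, 1, 3)  # points 2, 3 and 4 are tied
k <- 2

# Two equally valid neighbor orderings that break the tie differently.
order_a <- order(d)
order_b <- c(1, 4, 3, 2, 5)

idx_a <- order_a[1:k]
idx_b <- order_b[1:k]

same_index <- identical(as.numeric(idx_a), as.numeric(idx_b))
same_distance <- identical(d[idx_a], d[idx_b])
c(index = same_index, distance = same_distance)
```

The reported indices differ between the two orderings while the reported distances are identical, which is why the warning carries no information when only distances are requested.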

Author: Aaron Lun

See `findKmknn` and `findVptree` for examples where tie warnings are produced.
