merge_knn takes a list of nearest neighbor graphs and merges them into a
single graph with the same number of neighbors as the first graph. This is
useful for combining the results of several nearest neighbor searches: the
output will be at least as accurate as the most accurate of the input
graphs, and ideally more accurate than any of them.
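The merging idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the rnndescent implementation: merge_knn_sketch and the toy graphs g1 and g2 are hypothetical, every graph is treated as a query result (no symmetry is used), and each row is assumed to hold distinct neighbor indices.

```r
# Hypothetical helper, not part of rnndescent: merge graphs row by row.
merge_knn_sketch <- function(graphs) {
  k <- ncol(graphs[[1]]$idx) # output keeps the first graph's k
  n <- nrow(graphs[[1]]$idx)
  idx <- matrix(NA_integer_, n, k)
  dist <- matrix(NA_real_, n, k)
  for (i in seq_len(n)) {
    # Pool the candidates for row i from every graph
    cand_idx <- unlist(lapply(graphs, function(g) g$idx[i, ]))
    cand_dist <- unlist(lapply(graphs, function(g) g$dist[i, ]))
    # Sort by distance, then drop repeated indices (the closest copy survives)
    ord <- order(cand_dist)
    cand_idx <- cand_idx[ord]
    cand_dist <- cand_dist[ord]
    top <- head(which(!duplicated(cand_idx)), k)
    idx[i, ] <- cand_idx[top]
    dist[i, ] <- cand_dist[top]
  }
  list(idx = idx, dist = dist)
}

# Toy data: 3 points with d(1,2) = 5, d(2,3) = 2, d(1,3) = 9.
# Each graph has some rows right and some rows wrong.
g1 <- list(
  idx = matrix(c(1L, 2L, 2L, 3L, 3L, 1L), nrow = 3, byrow = TRUE),
  dist = matrix(c(0, 5, 0, 2, 0, 9), nrow = 3, byrow = TRUE)
)
g2 <- list(
  idx = matrix(c(1L, 3L, 2L, 1L, 3L, 2L), nrow = 3, byrow = TRUE),
  dist = matrix(c(0, 9, 0, 5, 0, 2), nrow = 3, byrow = TRUE)
)
merged <- merge_knn_sketch(list(g1, g2))
```

Because each row keeps the k closest candidates found in any input, the summed distances of the merged graph cannot exceed those of any single input graph.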
Arguments
- graphs
A list of nearest neighbor graphs to merge. Each item in the list should consist of a sub-list containing:
  - idx: an n by k matrix containing the k nearest neighbor indices.
  - dist: an n by k matrix containing k nearest neighbor distances.
The number of neighbors can differ between graphs, but the merged result will have the same number of neighbors as the first graph in the list.
- is_query
If TRUE, the graphs are treated as the result of a knn query rather than a knn building process; in other words, the graph is bipartite. Set this to TRUE if the items in graphs are the results of e.g. graph_knn_query() or random_knn_query(), and to FALSE if they are the results of nnd_knn() or random_knn(). The difference is that if is_query = FALSE and an index p is found in row i of a graph's idx matrix, i.e. p is a neighbor of i with distance d, then it is assumed that i is a neighbor of p with the same distance. If is_query = TRUE, then i and p are indexes into two different datasets and the symmetry does not hold. If you aren't sure which case applies to you, it's safe (but potentially inefficient) to set is_query = TRUE.
- n_threads
Number of threads to use.
- verbose
If TRUE, log information to the console.
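To make the symmetry assumption behind is_query = FALSE concrete, here is a small base R sketch. It is not how rnndescent handles this internally; implied_reverse and the toy idx and dist matrices are hypothetical and only list the reverse candidates that the assumption implies.

```r
# Toy graph: 3 points, k = 2; row i holds the neighbors of point i.
idx <- matrix(c(1L, 2L,
                2L, 1L,
                3L, 2L), nrow = 3, byrow = TRUE)
dist <- matrix(c(0, 1,
                 0, 1,
                 0, 4), nrow = 3, byrow = TRUE)

# Hypothetical helper: every entry (i, p, d) with i != p implies the
# reverse candidate (p, i, d) when the graph is not a query result.
implied_reverse <- function(idx, dist) {
  i <- as.vector(row(idx))
  p <- as.vector(idx)
  d <- as.vector(dist)
  keep <- i != p # skip self-neighbors
  data.frame(point = p[keep], neighbor = i[keep], dist = d[keep])
}
rev_cand <- implied_reverse(idx, dist)
```

Here row 3 lists point 2 at distance 4, so point 3 becomes a candidate neighbor of point 2 even though row 2 never mentions it. With is_query = TRUE no such candidates are valid, because the rows and the indices refer to different datasets.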
Value
A list containing:
- idx: an n by k matrix containing the merged nearest neighbor indices.
- dist: an n by k matrix containing the merged nearest neighbor distances.
The value of k in the output graph is the same as that of the first
item in graphs.
Examples
set.seed(1337)
# Nearest neighbor descent with 15 neighbors for iris three times,
# starting from a different random initialization each time
iris_rnn1 <- nnd_knn(iris, k = 15, n_iters = 1)
iris_rnn2 <- nnd_knn(iris, k = 15, n_iters = 1)
iris_rnn3 <- nnd_knn(iris, k = 15, n_iters = 1)
# Merged results should be an improvement over individual results
iris_mnn <- merge_knn(list(iris_rnn1, iris_rnn2, iris_rnn3))
sum(iris_mnn$dist) < sum(iris_rnn1$dist)
#> [1] TRUE
sum(iris_mnn$dist) < sum(iris_rnn2$dist)
#> [1] TRUE
sum(iris_mnn$dist) < sum(iris_rnn3$dist)
#> [1] TRUE
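As a further sketch, the following merges graphs built with different numbers of neighbors. It assumes the rnndescent package is installed; the object names iris_k10, iris_k20 and iris_mixed are illustrative only.

```r
library(rnndescent)

set.seed(42)
# Two graphs over the same data with different numbers of neighbors
iris_k10 <- nnd_knn(iris, k = 10, n_iters = 1)
iris_k20 <- nnd_knn(iris, k = 20, n_iters = 1)

# Per the Value section, the result should take its k from the first graph
iris_mixed <- merge_knn(list(iris_k10, iris_k20))
dim(iris_mixed$idx)
```

Placing the graph with the desired number of neighbors first in the list is therefore how the size of the output is chosen.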