Overlap between the indices of two nearest neighbor graphs
Source: R/rnndescent.R (neighbor_overlap.Rd)
Calculates the mean number of neighbors in common between the two graphs. The per-item overlap can also be returned. This function can be useful as a measure of the accuracy of approximation algorithms, if the exact nearest neighbors are known, or as a measure of the diversity of two different approximate graphs.
Arguments
- idx1
Indices of a nearest neighbor graph, i.e. a matrix of nearest neighbor indices. Can also be a list containing an idx element.
- idx2
Indices of a nearest neighbor graph, i.e. a matrix of nearest neighbor indices. Can also be a list containing an idx element. This is considered to be the ground truth.
- k
Number of neighbors to consider. If NULL, then the minimum of the number of neighbors in idx1 and idx2 is used.
- ret_vec
If TRUE, also return a vector containing the per-item overlap.
Value
The mean overlap between idx1 and idx2. If ret_vec = TRUE,
then a list containing the mean overlap and the overlap of each item
is returned, with names mean and overlaps, respectively.
Details
The graph format is the same as that returned by e.g. nnd_knn() and should
be of dimensions n by k, where n is the number of points and k is the number
of neighbors. If you pass a neighbor graph directly, the index matrix will be
extracted if present. If the two graphs have different numbers of neighbors,
then the smaller number of neighbors is used.
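The calculation described above can be sketched in base R. This is a minimal illustration, not the package's actual implementation: overlap_sketch and get_idx are hypothetical names, and dividing each per-item count by k is an assumption based on the example output below, where the overlaps fall between 0 and 1 (e.g. 0.1333 = 2/15). Truncating to the first k columns is likewise an assumption about how k is applied.

```r
# Hypothetical helper (not part of rnndescent): accept either an index matrix
# or a list with an `idx` element, and return the index matrix.
get_idx <- function(graph) {
  if (is.list(graph) && !is.null(graph$idx)) graph$idx else graph
}

# Sketch of the overlap calculation under the assumptions stated above.
overlap_sketch <- function(idx1, idx2, k = NULL, ret_vec = FALSE) {
  idx1 <- get_idx(idx1)
  idx2 <- get_idx(idx2)
  stopifnot(is.matrix(idx1), is.matrix(idx2), nrow(idx1) == nrow(idx2))

  # Default to the smaller number of neighbors of the two graphs
  if (is.null(k)) {
    k <- min(ncol(idx1), ncol(idx2))
  }
  stopifnot(k <= ncol(idx1), k <= ncol(idx2))

  # Keep only the first k neighbors of each item
  idx1 <- idx1[, seq_len(k), drop = FALSE]
  idx2 <- idx2[, seq_len(k), drop = FALSE]

  # Fraction of neighbor indices shared by the two graphs, per item (row)
  overlaps <- vapply(
    seq_len(nrow(idx1)),
    function(i) length(intersect(idx1[i, ], idx2[i, ])) / k,
    numeric(1)
  )

  if (ret_vec) {
    list(mean = mean(overlaps), overlaps = overlaps)
  } else {
    mean(overlaps)
  }
}

# Toy graphs: 4 items with 2 neighbors each. The second graph is passed as a
# list with an `idx` element to show that both input forms are handled.
g1 <- matrix(c(2, 3,
               1, 3,
               4, 2,
               3, 1), nrow = 4, byrow = TRUE)
g2 <- list(idx = matrix(c(2, 3,
                          3, 4,
                          1, 2,
                          1, 2), nrow = 4, byrow = TRUE))

res <- overlap_sketch(g1, g2, ret_vec = TRUE)
res$overlaps # 1.0 0.5 0.5 0.5
res$mean     # 0.625
```

Item 1 has both neighbors in common, while the other three items share one neighbor each, giving a mean of 0.625. Because the comparison uses set intersection, the order of neighbors within a row does not affect the result.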
Examples
set.seed(1337)
# Generate two random neighbor graphs for iris
iris_rnn1 <- random_knn(iris, k = 15)
iris_rnn2 <- random_knn(iris, k = 15)
# Overlap between the two graphs
mean_overlap <- neighbor_overlap(iris_rnn1, iris_rnn2)
# Also get a vector of per-item overlap
overlap_res <- neighbor_overlap(iris_rnn1, iris_rnn2, ret_vec = TRUE)
summary(overlap_res$overlaps)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  0.0000  0.1333  0.2000  0.1871  0.2667  0.4000