
This is where I keep the older, less-structured updates that I post at the top of the package’s README.md file. When they are no longer new, they get moved here. See the Changelog for fuller details.

14 May 2024: Version 0.1.6 has been released to CRAN. The previous release didn’t quite get compatibility with dqrng right, so here is another attempt. A couple of other bug fixes have also been included.

18 April 2024: Version 0.1.5 has been released to CRAN. This is an internal API change to support a forthcoming release of dqrng, so you should notice no changes on upgrading.

18 March 2024: Version 0.1.4 has been released to CRAN. This is a bug fix release. Most notably, it fixes an issue where rnnd_build would fail with metric = "cosine".

08 December 2023: Version 0.1.3 has been released. This deals with some UBSAN and ASAN problems when missing data was present in the k-nearest neighbors graph.

27 November 2023: Frabjous day, rnndescent is now on CRAN. The version number has been bumped to 0.1.1.

24 November 2023: A new function, rnnd_knn, has been added for when you just want the k-nearest neighbors graph for a dataset (i.e. no querying). I have also removed some other functions and made some other breaking changes as I prepare for CRAN submission. See the NEWS for details.

19 November 2023: The rnnd_build and rnnd_query functions have been added. They simplify creating a knn graph/building an index and querying it, respectively, and should be the main way of using the package. The other functions remain should you need more flexibility. Some functions have been removed: the local scaling and the standalone distance functions. The latter could return in a different package at some point.
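
As a rough sketch of that build-then-query workflow, assuming rnndescent is installed: the built-in iris data, the even/odd split and k = 4 are arbitrary choices made only for illustration.

```r
# Split the numeric columns of iris into a reference set and a query set.
iris_mat <- as.matrix(iris[, 1:4])
is_even <- seq_len(nrow(iris_mat)) %% 2 == 0
reference <- iris_mat[is_even, ]
query <- iris_mat[!is_even, ]

# Only run the package calls if rnndescent is available.
if (requireNamespace("rnndescent", quietly = TRUE)) {
  # Build an index (and the 4-nearest-neighbor graph) on the reference data.
  index <- rnndescent::rnnd_build(reference, k = 4)

  # Find the 4 nearest reference items for each query item.
  nbrs <- rnndescent::rnnd_query(index, query, k = 4)

  # nbrs$idx holds neighbor indices and nbrs$dist the matching distances,
  # one row per query item.
  print(dim(nbrs$idx))
}
```

If there is no separate query set and only the neighbor graph of a dataset is wanted, rnnd_knn (see the 24 November 2023 entry) is the shorter route.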

13 November 2023: I have added most of the metrics that don’t need extra parameters for both sparse and non-sparse data, e.g. braycurtis, dice, jaccard and hellinger. See the Missing Metrics section at the end of the README for those which are not implemented. There are a few breaking changes, mainly around the hamming metric (see NEWS.md for the exact details).

06 November 2023: Sparse data support has been added. You should be able to use e.g. a dgCMatrix with all the methods and currently supported metrics as easily as a dense matrix.

30 October 2023: At last, a workable random partition forest implementation has been added. This can be used standalone (e.g. rpf_knn, rpf_build, rpf_knn_query) or as initialization to nearest neighbor descent (nnd_knn(init = "tree", ...)). The forest itself can be serialized with saveRDS, but you pay a price for that convenience: it has to be passed back and forth between the R and C++ layers when querying. For now there is no access to the underlying C++ class via R, as there is in RcppHNSW and RcppAnnoy, so it may not be suitable for some use cases.

19 October 2023: Inevitably, 0.0.11 is here because of a bug in 0.0.10: nearest neighbor descent was not correctly flagging new/old neighbors, which reduced performance (but did not change the actual result).

18 October 2023: A long-postponed major internal refactoring means I might be able to make a bit of progress on this package. For now, the cosine and correlation metrics have migrated to not preprocessing their data (the preprocessing versions are still available as cosine-preprocess and correlation-preprocess, respectively). Also, I have exported the distance metrics as R functions (e.g. cosine_distance, euclidean_distance).

18 September 2021: The "hamming" metric now supports integer-valued (not just binary) inputs, thanks to a contribution from Vitalie Spinu. The older metric code path for binary data only is supported via metric = "bhamming".
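
To illustrate the distinction, here is a minimal base-R sketch of a Hamming-style comparison. It is not the package's implementation: hamming_count is a hypothetical helper, and because this entry does not say whether the package reports a raw count or a proportion of mismatches, only the raw count is shown.

```r
# Hypothetical helper: count the positions at which two vectors differ.
hamming_count <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x != y)
}

# Binary inputs: the case a binary-only code path can handle.
binary_result <- hamming_count(c(0L, 1L, 1L, 0L), c(1L, 1L, 0L, 0L))

# Integer-valued inputs: values are compared for equality, so 2 vs 3 is a
# mismatch in the same way that 0 vs 1 is.
integer_result <- hamming_count(c(2L, 5L, 0L, 7L), c(3L, 5L, 0L, 1L))

print(c(binary = binary_result, integer = integer_result))
```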

20 June 2021: A big step forward in usefulness with the addition of the prepare_search_graph function, which creates and prunes an undirected search graph from the neighbor graph for use with the (now renamed) graph_knn_query function. The latter is now also capable of backtracking search and performs fairly well.

4 October 2020: Added "correlation" as a metric and the k_occur function to help diagnose potential hubness in a dataset.
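
The idea behind a k-occurrence count can be sketched in base R as follows. This is an illustration rather than the package's code: k_occurrence is a hypothetical helper, and idx is assumed to be an n x k matrix of neighbor indices in which row i lists the neighbors of item i.

```r
# Hypothetical helper: count how often each item appears in the neighbor
# lists of the whole dataset.
k_occurrence <- function(idx) {
  tabulate(as.vector(idx), nbins = nrow(idx))
}

# Toy 2-nearest-neighbor graph for five items (one row per item).
idx <- matrix(
  c(2, 3,
    1, 3,
    1, 2,
    3, 1,
    3, 4),
  ncol = 2, byrow = TRUE
)

counts <- k_occurrence(idx)
print(counts)
```

Items whose count is far above k are candidate hubs, while items with a count of zero never appear in any other item's neighbor list.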

23 November 2019: Added merge_knn and merge_knnl for combining multiple nn results.
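
Conceptually, combining results means keeping the best k neighbors per item from the pooled candidates. Below is a minimal base-R sketch of that idea, assuming each result is a list with idx and dist matrices of identical dimensions and no repeated neighbors within a row. merge_two is a hypothetical helper and does not reflect the package's function signatures or implementation.

```r
# Hypothetical helper: merge two nearest neighbor results row by row.
merge_two <- function(a, b) {
  k <- ncol(a$idx)
  idx <- a$idx
  dist <- a$dist
  for (i in seq_len(nrow(a$idx))) {
    # Pool the candidates from both results and sort them by distance.
    cand_idx <- c(a$idx[i, ], b$idx[i, ])
    cand_dist <- c(a$dist[i, ], b$dist[i, ])
    ord <- order(cand_dist)
    cand_idx <- cand_idx[ord]
    cand_dist <- cand_dist[ord]
    # Drop repeated neighbor indices, then keep the k closest.
    keep <- !duplicated(cand_idx)
    idx[i, ] <- cand_idx[keep][seq_len(k)]
    dist[i, ] <- cand_dist[keep][seq_len(k)]
  }
  list(idx = idx, dist = dist)
}

# Two toy 2-nearest-neighbor results for the same two items.
a <- list(
  idx = matrix(c(2, 3, 1, 3), ncol = 2, byrow = TRUE),
  dist = matrix(c(0.1, 0.5, 0.1, 0.4), ncol = 2, byrow = TRUE)
)
b <- list(
  idx = matrix(c(4, 2, 3, 4), ncol = 2, byrow = TRUE),
  dist = matrix(c(0.3, 0.1, 0.4, 0.2), ncol = 2, byrow = TRUE)
)

merged <- merge_two(a, b)
print(merged$idx)
```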

15 November 2019: It is now possible to query a reference set of data to produce the approximate knn graph relative to the references (i.e. none of the queries will be selected as neighbors) via nnd_knn_query (and the related brute_force and random variants).

27 October 2019: rnndescent creeps towards usability. A multi-threaded implementation (using RcppParallel) has now been added.

20 October 2019: The nnd_knn function now has an init parameter which can be used to specify the initialization method. Currently "random" and "forest" are supported. The latter uses a random partition forest to initialize the search graph. This is much faster than the random initialization but still not as fast as I would like.