A place for me to put the older, less-structured updates that I post at the top of the README.md file for the package. When they aren’t new anymore, they get moved here. For fuller details, see the Changelog.
14 May 2024: Version 0.1.6 has been released to CRAN. The previous release didn’t quite get compatibility with dqrng right, so here is another attempt. A couple of other bug fixes are also included.
18 April 2024: Version 0.1.5 has been released to CRAN. This is an internal API change to support a forthcoming release of dqrng, so you should notice no changes on upgrading.
18 March 2024: Version 0.1.4 has been released to CRAN. This is a bug fix release. Most notably, it fixes an issue where rnnd_build would fail with metric = "cosine".
08 December 2023: Version 0.1.3 has been released. This fixes some UBSAN and ASAN problems that occurred when missing data was present in the k-nearest neighbors graph.
27 November 2023: Frabjous day! rnndescent is now on CRAN. The version number has been bumped to 0.1.1.
24 November 2023: A new function, rnnd_knn, has been added for when you just want the k-nearest neighbors graph for a dataset (i.e. no querying). I have also removed some other functions and made some other breaking changes as I prepare for CRAN submission. See the NEWS for details.
19 November 2023: The rnnd_build and rnnd_query functions have been added. They simplify creating a knn graph/building an index and querying it, respectively, and should be the main way of using the package. The other functions remain should you need more flexibility. Some functions have been removed: the local scaling and the standalone distance functions. The latter could return in a different package at some point.
13 November 2023: I have added most of the metrics that don’t need extra parameters, for both sparse and non-sparse data, e.g. braycurtis, dice, jaccard, hellinger etc. See the Missing Metrics section at the end of this README for those which are not implemented. There are a few breaking changes (mainly around the hamming metric; see NEWS.md for the exact details).
06 November 2023: Sparse data support has been added. You should be able to use e.g. a dgCMatrix with all the methods and currently supported metrics as easily as a dense matrix.
30 October 2023: At last, a workable random partition forest implementation has been added. This can be used standalone (e.g. rpf_knn, rpf_build, rpf_knn_query) or as initialization for nearest neighbor descent (nnd_knn(init = "tree", ...)). The forest itself can be serialized with saveRDS, but you will pay a price for that convenience by having to pass it back and forth between the R and C++ layers when querying. For now there is no access to the underlying C++ class via R, as there is in RcppHNSW and RcppAnnoy, so it may not be suitable for some use cases.
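For readers new to random partition forests, here is a minimal Python sketch of how a single tree can partition a dataset. It uses a split similar to the one in PyNNDescent's random projection trees (the hyperplane equidistant from two randomly chosen points); whether rnndescent splits in exactly this way is an assumption, and tree_leaves and random_hyperplane_split are hypothetical names, not package functions. Points that end up in the same leaf become candidate neighbors, which is what makes a forest of such trees useful as an initialization for nearest neighbor descent.

```python
import random

def random_hyperplane_split(points, indices, rng):
    """Split indices by the hyperplane equidistant from two random points."""
    a, b = rng.sample(indices, 2)
    pa, pb = points[a], points[b]
    # Normal vector points from b towards a; the plane passes through their midpoint.
    normal = [x - y for x, y in zip(pa, pb)]
    midpoint = [(x + y) / 2 for x, y in zip(pa, pb)]
    offset = -sum(n * m for n, m in zip(normal, midpoint))
    left, right = [], []
    for i in indices:
        side = sum(n * x for n, x in zip(normal, points[i])) + offset
        (left if side > 0 else right).append(i)
    return left, right

def tree_leaves(points, indices, leaf_size, rng):
    """Recursively split until each leaf holds at most leaf_size indices."""
    if len(indices) <= max(leaf_size, 1):
        return [indices]
    left, right = random_hyperplane_split(points, indices, rng)
    if not left or not right:  # degenerate split, e.g. the two chosen points coincide
        return [indices]
    return tree_leaves(points, left, leaf_size, rng) + tree_leaves(points, right, leaf_size, rng)

rng = random.Random(42)
points = [[rng.random(), rng.random()] for _ in range(20)]
leaves = tree_leaves(points, list(range(len(points))), leaf_size=5, rng=rng)
print(leaves)  # a partition of the 20 indices into leaves of at most 5
```

A forest repeats this with different random choices, so that a pair of close points separated by one tree is likely to share a leaf in another.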
19 October 2023: Inevitably, 0.0.11 is here because of a bug in 0.0.10 where nearest neighbor descent was not correctly flagging new/old neighbors, which reduced performance (but did not change the actual result).
18 October 2023: A long-postponed major internal refactoring means I might be able to make a bit of progress on this package. For now, the cosine and correlation metrics have migrated to not preprocessing their data (the preprocessing versions are still available as cosine-preprocess and correlation-preprocess, respectively). Also, I have exported the distance metrics as R functions (e.g. cosine_distance, euclidean_distance).
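The difference between the two cosine variants can be sketched in plain Python (this is not the package's C++ code, and the function names are hypothetical). The preprocessing approach normalizes each vector to unit length once, after which the cosine distance is just one minus a dot product; the non-preprocessing approach computes the norms inside every distance call. Both give the same distance up to floating-point rounding.

```python
import math

def cosine_distance(x, y):
    """No preprocessing: 1 - (x . y) / (|x| |y|), norms computed on every call."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def normalize(x):
    """Preprocessing step: scale a vector to unit length, done once per row."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / norm for a in x]

def cosine_distance_preprocessed(xn, yn):
    """With pre-normalized input the distance reduces to 1 - (x . y)."""
    return 1.0 - sum(a * b for a, b in zip(xn, yn))

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
d1 = cosine_distance(x, y)
d2 = cosine_distance_preprocessed(normalize(x), normalize(y))
print(abs(d1 - d2) < 1e-12)  # True
```

Correlation distance follows the same pattern: it is the cosine distance between mean-centered vectors, so a preprocessing version would center and normalize each row once up front.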
18 September 2021: The "hamming" metric now supports integer-valued (not just binary) inputs, thanks to a contribution from Vitalie Spinu. The older binary-only code path is still available via metric = "bhamming".
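As a rough sketch of the two code paths (hypothetical Python functions, not the package's implementation): for integer-valued input, the Hamming distance counts the positions where two vectors differ, while binary-only input can be packed into bits so that the distance reduces to an XOR followed by a count of set bits. Bit-packing is a common technique for binary data; whether "bhamming" uses exactly this scheme, and whether the package reports a count or a proportion of differing positions, are assumptions here.

```python
def hamming_count(x, y):
    """Integer-valued input: count the positions where x and y differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(x, y) if a != b)

def pack_bits(bits):
    """Pack a 0/1 vector into one Python int, with bit i holding bits[i]."""
    value = 0
    for i, b in enumerate(bits):
        if b not in (0, 1):
            raise ValueError("binary input required")
        value |= b << i
    return value

def hamming_binary_packed(x, y):
    """Binary-only input: XOR the packed bits, then count the set bits."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return bin(pack_bits(x) ^ pack_bits(y)).count("1")

print(hamming_count([0, 2, 5, 1], [0, 3, 5, 4]))          # 2
print(hamming_binary_packed([1, 0, 1, 1], [1, 1, 0, 1]))  # 2
```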
20 June 2021: A big step forward in usefulness with the addition of the prepare_search_graph function, which creates and prunes an undirected search graph from the neighbor graph for use with the (now renamed) graph_knn_query function. The latter is now also capable of backtracking search and performs fairly well.
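The "undirected" part of building a search graph from a neighbor graph can be illustrated with a small Python sketch that adds every neighbor edge in both directions. This covers only symmetrization; the pruning step that prepare_search_graph also performs is not shown, and to_undirected is a hypothetical name using 0-based indices.

```python
def to_undirected(knn_idx):
    """Turn directed knn lists into an undirected adjacency list.

    If j is in i's neighbor list, the edge i-j is added in both directions.
    Self-loops are dropped.
    """
    n = len(knn_idx)
    adj = [set() for _ in range(n)]
    for i, row in enumerate(knn_idx):
        for j in row:
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
    return [sorted(s) for s in adj]

print(to_undirected([[1], [2], [1]]))  # [[1], [0, 2], [1]]
```

Symmetrizing gives a search more routes into each node (item 0 above becomes reachable from item 1), at the cost of uneven degrees, which is one reason to prune afterwards.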
4 October 2020: Added "correlation" as a metric, and the k_occur function to help diagnose potential hubness in a dataset.
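The k-occurrence of an item is the number of times it appears in the k-nearest neighbor lists of the items in the dataset. A minimal Python sketch, assuming 0-based neighbor indices (R uses 1-based) and a hypothetical include_self flag that controls whether an item's appearance in its own list is counted; this is not the signature of k_occur.

```python
from collections import Counter

def k_occurrence(knn_idx, n_items, include_self=False):
    """Count how often each item appears in the neighbor lists.

    knn_idx: list of rows; row i holds the neighbor indices of item i.
    """
    counts = Counter()
    for i, row in enumerate(knn_idx):
        for j in row:
            if j == i and not include_self:
                continue
            counts[j] += 1
    return [counts[i] for i in range(n_items)]

print(k_occurrence([[1, 2], [0, 2], [1, 0], [2, 1]], 4))  # [2, 3, 3, 0]
```

When every row holds k neighbors other than itself, the mean k-occurrence is exactly k, so a few items with values far above k (hubs) alongside many items at or near zero (anti-hubs, like item 3 above) points to hubness.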
23 November 2019: Added merge_knn and merge_knnl for combining multiple nearest neighbor results.
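A minimal sketch of what merging two nearest neighbor results involves, assuming each result is a pair of per-row index and distance lists for the same set of items (hypothetical Python, not the signature of merge_knn): for each row, pool the candidates, keep the smallest distance seen for each index, and retain the k nearest. This is one way to combine, for example, several approximate runs started from different random seeds.

```python
def merge_knn_rows(idx_a, dist_a, idx_b, dist_b, k):
    """Merge two neighbor lists for one item, keeping the k nearest unique indices."""
    best = {}
    for j, d in zip(list(idx_a) + list(idx_b), list(dist_a) + list(dist_b)):
        if j not in best or d < best[j]:
            best[j] = d
    ranked = sorted(best.items(), key=lambda item: item[1])[:k]
    return [j for j, _ in ranked], [d for _, d in ranked]

def merge_knn_graphs(graph_a, graph_b, k):
    """Each graph is (idx_rows, dist_rows); merge them row by row."""
    idx_a, dist_a = graph_a
    idx_b, dist_b = graph_b
    merged_idx, merged_dist = [], []
    for ia, da, ib, db in zip(idx_a, dist_a, idx_b, dist_b):
        mi, md = merge_knn_rows(ia, da, ib, db, k)
        merged_idx.append(mi)
        merged_dist.append(md)
    return merged_idx, merged_dist

print(merge_knn_rows([3, 7, 1], [0.5, 0.9, 1.2], [7, 2, 4], [0.9, 0.6, 1.0], 3))
# ([3, 2, 7], [0.5, 0.6, 0.9])
```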
15 November 2019: It is now possible to query a reference set of data to produce the approximate knn graph relative to the references (i.e. none of the queries will be selected as neighbors) via nnd_knn_query (and the related brute_force and random variants).
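The exact computation that a brute-force query performs can be sketched in Python as below, with Euclidean distance assumed and a hypothetical function name. Every query is compared with every reference row, and only reference rows are candidates, so no query can appear among the neighbors; nnd_knn_query aims to approximate this result without computing all the distances.

```python
import heapq
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def brute_force_query(reference, queries, k):
    """For each query, return the k nearest reference rows (indices, distances)."""
    idx_out, dist_out = [], []
    for q in queries:
        # Candidates come only from the reference set, never from the queries.
        nearest = heapq.nsmallest(k, ((euclidean(q, r), j) for j, r in enumerate(reference)))
        idx_out.append([j for _, j in nearest])
        dist_out.append([d for d, _ in nearest])
    return idx_out, dist_out

idx, dist = brute_force_query([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]], [[0.9, 0.1]], k=2)
print(idx)  # [[1, 0]]
```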
27 October 2019: rnndescent creeps towards usability. A multi-threaded implementation (using RcppParallel) has now been added.
20 October 2019: The nnd_knn function now has an init parameter which can be used to specify the initialization method. Currently "random" and "forest" are supported. The latter uses a random partition forest to initialize the search graph. This is much faster than random initialization but still not as fast as I would like.