`uwot` relies on the underlying compiler and C++ standard
library on your machine and this can result in differences in output
even with the same input data, arguments, packages and R version. If you
require reproducibility between machines, it is strongly suggested that
you stick with the same OS and compiler version on all of them (e.g. a
fixed LTS of a Linux distro and gcc version). Otherwise, the following
can help:
- Use the `tumap` method instead of `umap`. This avoids the use of `std::pow` in gradient calculations and has the added advantage of being faster to optimize. However, it gives larger clusters in the output, and you cannot control that with the `a` and `b` (or `spread` and `min_dist`) parameters.
- For `umap`, it's better to provide `a` and `b` directly with a fixed precision rather than allowing them to be calculated via the `spread` and `min_dist` parameters. For default UMAP, use `a = 1.8956, b = 0.8006`.
- Use `approx_pow = TRUE`, which avoids the use of the `std::pow` function.
- Use `init = "spca"` rather than `init = "spectral"` (although the latter is the default and preferred method for UMAP initialization).
- If `n_sgd_threads` is set larger than `1`, then even if you use `set.seed`, the embedding results are not repeatable. This is because no locking is carried out on the underlying coordinate matrix, and work is partitioned by edge rather than by vertex, so a given vertex may be processed by different threads. The order in which reads and writes occur is at the whim of the thread scheduler. This is the same behavior as LargeVis.
In summary, your chances of reproducibility are increased by using: