uwot relies on the underlying compiler and C++ standard
library on your machine and this can result in differences in output
even with the same input data, arguments, packages and R version. If you
require reproducibility between machines, it is strongly suggested that
you stick with the same OS and compiler version on all of them (e.g. a
fixed LTS of a Linux distro and gcc version). Otherwise, the following
can help:
- Use the `tumap` method instead of `umap`. This avoids the use of `std::pow` in gradient calculations. It also has the advantage of being faster to optimize. However, it gives larger clusters in the output, and you don’t have the ability to control that with the `a` and `b` (or `spread` and `min_dist`) parameters.
- For `umap`, it’s better to provide `a` and `b` directly with a fixed precision rather than allowing them to be calculated via the `spread` and `min_dist` parameters. For default UMAP, use `a = 1.8956, b = 0.8006`.
- Use `approx_pow = TRUE`, which avoids the use of the `std::pow` function.
- Use `init = "spca"` rather than `init = "spectral"` (although the latter is the default and preferred method for UMAP initialization).
- If `n_sgd_threads` is set larger than `1`, then even if you use `set.seed`, the embeddings are not repeatable. This is because no locking is carried out on the underlying coordinate matrix, and work is partitioned by edge rather than by vertex, so a given vertex may be processed by different threads. The order in which reads and writes occur is at the whim of the thread scheduler. This is the same behavior as LargeVis.
- Use `rng_type = "deterministic"`, which will make vertex sampling during the optimization deterministic. Note that this will not affect the use of a random number generator in other parts of the algorithm, such as approximate nearest neighbor search and initialization. This may give slightly less accurate results due to the lack of random sampling, but the trade-off may be worth it (and it’s also a bit faster).
- For random number generation, you can provide a `seed` parameter. This doesn’t do anything other than call `set.seed` for you inside the routine, but you may find it convenient to fix the seed in the call to `umap`.
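
As a rough illustration, the settings above can be combined in a single call. This is a minimal sketch, not a guarantee of identical output across machines: the wrapper name `reproducible_umap`, the seed value `42` and the use of the built-in `iris` data are assumptions made for the example, and the `seed` argument assumes a version of uwot recent enough to provide it.

```r
library(uwot)

# Hypothetical helper (not part of uwot): bundles the settings listed above.
reproducible_umap <- function(X, seed = 42) {
  umap(
    X,
    a = 1.8956, b = 0.8006,      # fixed-precision curve parameters for default UMAP
    approx_pow = TRUE,           # avoid std::pow in the gradient
    init = "spca",               # instead of the default "spectral"
    n_sgd_threads = 0,           # no multi-threaded optimization
    rng_type = "deterministic",  # deterministic vertex sampling
    seed = seed                  # calls set.seed inside the routine
  )
}

# Example data: the four numeric columns of iris (150 rows).
emb1 <- reproducible_umap(iris[, 1:4])
emb2 <- reproducible_umap(iris[, 1:4])

# Compare two runs on the same machine.
same_result <- isTRUE(all.equal(emb1, emb2))
```

Comparing `emb1` and `emb2` only checks repeatability on one machine. Differences between machines can still arise from the compiler and standard library, as described at the start of this section.
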
In summary, your chances of reproducibility are increased by using: