Changelog
Source: NEWS.md
uwot (development version)
Bug fixes and minor improvements
- The installation status of optional dependencies was not being detected correctly. This meant that different packages could be used for initialization in unpredictable ways depending on whether they had been explicitly loaded or not. Thank you hsuknowledge for the report (https://github.com/jlmelville/uwot/issues/134).
- Users of the bbknnR package were getting an unhelpful error message when the custom neighbor data contained missing values. An explicit check has been added and, although this is still a fatal error, the message should be more informative (https://github.com/jlmelville/uwot/issues/135).
 - Fixed a partially-specified parameter name being passed to irlba. Thank you Hugo Gruson for the fix (https://github.com/jlmelville/uwot/pull/136).
 
uwot 0.2.3
CRAN release: 2025-02-24
New features:
- New parameter: rng_type. This will be used in favor of the boolean pcg_rand parameter, although pcg_rand will still work for backwards compatibility.
- New negative sampling option: set rng_type = "deterministic" to use a deterministic sampling of vertices during the optimization phase. This should give qualitatively similar results to using a real PRNG, but has the advantage of being faster and giving more reproducible output. This feature was inspired by a comment by Leland McInnes on Reddit.
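As a rough sketch of how the new option might be used (the iris data and the object names are illustrative choices, not part of the release notes):

```r
library(uwot)

# Illustrative input: the four numeric columns of the built-in iris data.
X <- as.matrix(iris[, 1:4])

# Ask for deterministic negative sampling instead of a PRNG during optimization.
set.seed(42)
emb_det <- umap(X, rng_type = "deterministic")

dim(emb_det)  # one row per observation, one column per output dimension
```

Existing code that passes pcg_rand should keep working, so switching to rng_type can be done gradually.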
Bug fixes and minor improvements
- Setting num_threads directly in umap2 did not result in the number of SGD threads being updated to that value when batch = TRUE, which it should have been.
- Despite assertions to the contrary in version 0.2.1, umap_transform continued to return the fuzzy graph in transposed form. Thank you PedroMilanezAlmeida for reopening the issue (https://github.com/jlmelville/uwot/issues/118).
- Relative paths could not be used to save a model. Thank you Wouter van der Bijl for the bug report (https://github.com/jlmelville/uwot/issues/131) and the suggested fix.
- repulsion_strength was silently ignored if used with tumap or umap2 with a = 1, b = 1. Ignoring the setting was on purpose, but it was not documented anywhere. repulsion_strength is now compatible with these settings.
- It’s no longer an error to provide a pca argument if the input data has a maximum rank smaller than the value of pca. No PCA is applied in this case. If verbose = TRUE, a message will be printed to inform the user.
uwot 0.2.2
CRAN release: 2024-04-21
Bug fixes and minor improvements
- RSpectra is now a required dependency (again). It was a required dependency up until version 0.1.12, when it became optional (irlba was used in its place). However, problems with interactions of the current version of irlba with an ABI change in the Matrix package mean that it’s hard for downstream packages and users to build uwot without re-installing Matrix and irlba from source, which may not be an option for some people. Also it was causing a CRAN check error. I have changed some tests, examples and vignettes to use RSpectra explicitly, and to only test irlba code-paths where necessary. See https://github.com/jlmelville/uwot/issues/115 and links therein for more details.
uwot 0.2.1
CRAN release: 2024-04-15
New features:
- The HNSW approximate nearest neighbor search algorithm is now supported via the RcppHNSW package. Set nn_method = "hnsw" to use it. The behavior of the method can be controlled by the new nn_args parameter, a list which may contain M, ef_construction and ef. See the hnswlib library’s ALGO_PARAMS documentation for details on these parameters. Although typically faster than Annoy (for a given accuracy), be aware that the only supported metric values are "euclidean", "cosine" and "correlation". Finally, RcppHNSW is only a suggested package, not a requirement, so you need to install it yourself (e.g. via install.packages("RcppHNSW")). Also see the article on HNSW in uwot in the documentation.
- The nearest neighbor descent approximate nearest neighbor search algorithm is now supported via the rnndescent package. Set nn_method = "nndescent" to use it. The behavior of the method can be controlled by the new nn_args parameter. There are many supported metrics and possible parameters that can be set in nn_args, so please see the article on nearest neighbor descent in uwot in the documentation, and also the rnndescent package’s documentation for details. rnndescent is only a suggested package, not a requirement, so you need to install it yourself (e.g. via install.packages("rnndescent")).
- New function: umap2, which acts like umap but with modified defaults, reflecting my experience with UMAP and correcting some small mistakes. See the umap2 article for more details.
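A minimal sketch of selecting HNSW, assuming RcppHNSW may or may not be installed. The nn_args values below are illustrative guesses rather than recommended settings, and the iris data is only a stand-in for real input:

```r
library(uwot)

X <- as.matrix(iris[, 1:4])

# RcppHNSW is only a suggested package, so check for it before requesting HNSW.
if (requireNamespace("RcppHNSW", quietly = TRUE)) {
  emb_hnsw <- umap(
    X,
    nn_method = "hnsw",
    metric = "euclidean",  # one of the metrics the notes list as supported
    # Illustrative values only; see the hnswlib ALGO_PARAMS documentation.
    nn_args = list(M = 16, ef_construction = 200, ef = 50)
  )
}
```

The same pattern should apply to nn_method = "nndescent" with the rnndescent package, with different entries in nn_args.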
Bug fixes and minor improvements
- init_sdev = "range" caused an error with a user-supplied init matrix.
- Transforming new data with the correlation metric was actually using the cosine metric if you saved and reloaded the model. Thank you Holly Hall for the report and helpful detective work (https://github.com/jlmelville/uwot/issues/117).
- umap_transform could fail if the new data to be transformed had the scaled:center and scaled:scale attributes set (e.g. from applying the scale function).
- If you asked umap_transform to return the fuzzy graph (ret_extra = c("fgraph")), it was transposed when batch = TRUE, n_epochs = 0. Thank you PedroMilanezAlmeida for reporting (https://github.com/jlmelville/uwot/issues/118).
- Setting n_sgd_threads = "auto" with umap_transform caused a crash.
- A warning was being emitted because the code was not specific enough about which dist class was meant. This may have particularly affected Seurat users. Thank you AndiMunteanu for reporting (and suggesting a solution) (https://github.com/jlmelville/uwot/issues/121).
uwot 0.1.16
CRAN release: 2023-06-29
Bug fixes and minor improvements
- A small change to a header file was required to fully support the next version of RcppAnnoy. Thank you Dirk Eddelbuettel for the PR (https://github.com/jlmelville/uwot/issues/112).
 
uwot 0.1.15
CRAN release: 2023-06-26
New features:
- New function: optimize_graph_layout. Use this to produce optimized output coordinates that reflect an input similarity graph (such as that produced by the similarity_graph function). similarity_graph followed by optimize_graph_layout is the same as running umap, so the purpose of these functions is to allow for more flexibility and decoupling between generating the nearest neighbor graph and optimizing the low-dimensional approximation to it. Based on a request by user Chengwei94 (https://github.com/jlmelville/uwot/issues/98).
- New functions: simplicial_set_union and simplicial_set_intersect. These allow for the combination of different fuzzy graph representations of a dataset into a single fuzzy graph using the UMAP simplicial set operations. Based on a request in the Python UMAP issues tracker by user Dhar xion.
- New parameter for umap_transform: ret_extra. This works like the equivalent parameter for umap, and should be a character vector specifying the extra information you would like returned in addition to the embedding, in which case a list will be returned with an embedding member containing the optimized coordinates. Supported values are "fgraph", "nn", "sigma" and "localr". Based on a request by user PedroMilanezAlmeida (https://github.com/jlmelville/uwot/issues/104).
- New parameter for umap, tumap and umap_transform: seed. This will do the equivalent of calling set.seed internally, and hence will help with reproducibility. The chosen seed is exported if ret_model = TRUE and umap_transform will use that seed if present, so you only need to specify it in umap_transform if you want to change the seed. The default behavior remains to not modify the random number state. Based on a request by SuhasSrinivasan (https://github.com/jlmelville/uwot/issues/110).
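The two-step workflow and the seed parameter described above might look roughly like this. The data set and variable names are illustrative, and passing X to optimize_graph_layout is just one way to make the input data available should the initialization need it:

```r
library(uwot)

X <- as.matrix(iris[, 1:4])

# Step 1: build only the fuzzy similarity graph (no embedding is produced).
graph <- similarity_graph(X, n_neighbors = 15)

# Step 2: optimize low-dimensional coordinates for that graph.
coords <- optimize_graph_layout(graph, X = X, n_components = 2)

# Alternatively, run everything in one call and fix the seed via the new
# parameter rather than calling set.seed yourself.
emb <- umap(X, seed = 42)
```

Splitting the steps makes it possible to modify or combine graphs (for example with simplicial_set_union) before the layout is optimized.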
Bug fixes and minor improvements
- A new setting for init_sdev: set init_sdev = "range" and initial coordinates will be range-scaled so each column takes values between 0 and 10. This pre-processing was added to the Python UMAP package at some point after uwot began development and so should probably always be used with the default init = "spectral" setting. However, it is not set by default to maintain backwards compatibility with older versions of uwot.
- ret_extra = c("sigma") is now supported by lvish. The Gaussian bandwidths are returned in a sigma vector. In addition, a vector of intrinsic dimensionalities estimated for each point using an analytical expression of the finite difference method given by Lee and co-workers is returned in the dint vector.
- The min_dist and spread parameters are now returned in the model when umap is run with ret_model = TRUE. This is just for documentation purposes: these values are not used directly by the model in umap_transform. If the parameters a and b are set directly when invoking umap, then both min_dist and spread will be set to NULL in the returned model. This feature was added in response to a question from kjiang18 (https://github.com/jlmelville/uwot/issues/95).
- Some new checks for NA values in input data have been added. Also a warning will be emitted if n_components seems to have been set too high.
- If n_components was greater than n_neighbors then umap_transform would crash the R session. Thank you to ChVav for reporting this (https://github.com/jlmelville/uwot/issues/102).
- Using umap_transform with a model where dens_scale was set could cause a segmentation fault, destroying the session. Even if it didn’t, it could give an entirely artifactual “ring” structure. Thank you FemkeSmit for reporting this and providing assistance in diagnosing the underlying cause (https://github.com/jlmelville/uwot/issues/103).
- If you set binary_edge_weights = TRUE, this setting was not exported when ret_model = TRUE, and was therefore not respected by umap_transform. This has now been fixed, but you will need to regenerate any models that used binary edge weights.
- The rdoc for the init param said that if there were multiple disconnected components, a spectral initialization would attempt to merge multiple sub-graphs. Not true: actually, spectral initialization is abandoned in favor of PCA. The documentation has been updated to reflect the true state of affairs. No idea what I was thinking of there.
- load_model and save_model didn’t work on Windows 7 due to how the version of tar there handles drive letters. Thank you mytarmail for the report (https://github.com/jlmelville/uwot/issues/109).
- Warn if the initial coordinates have a very large scale (a standard deviation > 10.0), because this can lead to small gradients and poor optimization. Thank you SuhasSrinivasan for the report (https://github.com/jlmelville/uwot/issues/110).
- A change to accommodate a forthcoming version of RcppAnnoy. Thank you Dirk Eddelbuettel for the PR (https://github.com/jlmelville/uwot/issues/111).
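To make the range scaling mentioned for init_sdev = "range" concrete, here is a small base R sketch. The helper range_scale is hypothetical and is not uwot's internal code; it only illustrates rescaling each column to run from 0 to 10:

```r
# Hypothetical helper (not part of uwot): linearly rescale each column of a
# coordinate matrix so that its values span [lo, hi].
range_scale <- function(coords, lo = 0, hi = 10) {
  apply(coords, 2, function(col) {
    rng <- range(col)
    lo + (col - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
  })
}

set.seed(1)
# Deliberately large-scale starting coordinates, 100 points in 2 dimensions.
init_coords <- matrix(rnorm(200, sd = 50), ncol = 2)
scaled <- range_scale(init_coords)

apply(scaled, 2, range)  # per-column minimum and maximum after scaling
```

In uwot itself the scaling is requested with init_sdev = "range" rather than done by hand.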
 
uwot 0.1.14
CRAN release: 2022-08-22
New features
- New function: similarity_graph. If you are more interested in the high-dimensional graph/fuzzy simplicial set representation of your input data, and don’t care about the low dimensional approximation, the similarity_graph function offers a similar API to umap, but neither the initialization nor optimization of low-dimensional coordinates will be performed. The return value is the same as that which would be returned in the results list as the fgraph member if you had provided ret_extra = c("fgraph"). Compared to getting the same result via running umap, this function is a bit more convenient to use, makes your intention clearer if you would be discarding the embedding, and saves a small amount of time. A t-SNE/LargeVis similarity graph can be returned by setting method = "largevis".
Bug fixes and minor improvements
- If a model was generated without using pre-generated nearest neighbors, you couldn’t use umap_transform with pre-generated nearest neighbors (also the error message was completely useless). Thank you to AustinHartman for reporting this (https://github.com/jlmelville/uwot/issues/97).
uwot 0.1.13
CRAN release: 2022-08-16
- This is a resubmission of 0.1.12 but with an internal function (fuzzy_simplicial_set) refactored to behave more like that of previous versions. This change was breaking the behavior of the CRAN package bbknnR.
uwot 0.1.12
New features
- New parameter: dens_weight. If set to a value between 0 and 1, an attempt is made to include the relative local densities of the input data in the output coordinates. This is an approximation to the densMAP method. A large value of dens_weight will use a larger range of output densities to reflect the input data. If the data is too spread out, reduce the value of dens_weight. For more information see the documentation at the uwot repo.
- New parameter: binary_edge_weights. If set to TRUE, instead of smoothed knn distances, non-zero edge weights all have a value of 1. This is how PaCMAP works and there are practical and theoretical reasons to believe this won’t have a big effect on UMAP but you can try it yourself.
- New options for ret_extra:
  - "sigma": the return value will contain a sigma entry, a vector of the smooth knn distance scaling normalization factors, one for each observation in the input data. A small value indicates a high density of points in the local neighborhood of that observation. For lvish the equivalent bandwidths calculated for the input perplexity are returned.
  - also, a vector rho will be exported, which is the distance to the nearest neighbor after the number of neighbors specified by the local_connectivity. Only applies for umap and tumap.
  - "localr": exports a vector of the local radii, the sum of sigma and rho and used to scale the output coordinates when dens_weight is set. Even if not using dens_weight, visualizing the output coordinates using a color scale based on the value of localr can reveal regions of the input data with different densities.
- For functions umap and tumap only: new data type for precomputed nearest neighbor data passed as the nn_method parameter: you may use a sparse distance matrix of format dgCMatrix with dimensions N x N where N is the number of observations in the input data. Distances should be arranged by column, i.e. a non-zero entry in row j of the ith column indicates that the jth observation in the input data is a nearest neighbor of the ith observation with the distance given by the value of that element. Note that this is a different format to the sparse distance matrix that can be passed as input to X: notably, the matrix is not assumed to be symmetric. Unlike other input formats, you may have a different number of neighbors for each observation (but there must be at least one neighbor defined per observation).
- umap_transform can also take a sparse distance matrix as its nn_method parameter if precomputed nearest neighbor data is used to generate an initial model. The format is the same as for the nn_method with umap. Because distances are arranged by columns, the expected dimensions of the sparse matrix are N_model x N_new where N_model is the number of observations in the original data and N_new is the number of observations in the data to be transformed.
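The column-wise layout of the sparse neighbor matrix can be easy to get backwards, so here is a sketch of building one by hand with the Matrix package. The random data, the choice of k = 4 and the brute-force neighbor search are all illustrative assumptions, and self-neighbors are simply left out here:

```r
library(Matrix)

set.seed(1)
X <- matrix(rnorm(20 * 3), nrow = 20)  # 20 observations, 3 features
k <- 4                                 # neighbors kept per observation

D <- as.matrix(dist(X))                # dense N x N Euclidean distances
N <- nrow(D)

# For observation i, take the k closest other observations. Position 1 of the
# ordering is i itself (distance 0), so positions 2..(k + 1) are used.
nbr_idx <- sapply(seq_len(N), function(i) order(D[, i])[2:(k + 1)])  # k x N

rows <- as.vector(nbr_idx)             # row j: the neighbor
cols <- rep(seq_len(N), each = k)      # column i: the observation

# Column i holds the distances from observation i to its neighbors.
nn_sparse <- sparseMatrix(
  i = rows,
  j = cols,
  x = D[cbind(rows, cols)],
  dims = c(N, N)
)

class(nn_sparse)  # sparseMatrix builds a column-compressed numeric matrix
```

A matrix built this way is generally not symmetric, which matches the description above; it is the kind of object that could then be supplied as nn_method.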
Bug fixes and minor improvements
- Models couldn’t be re-saved after loading. Thank you to ilyakorsunsky for reporting this (https://github.com/jlmelville/uwot/issues/88).
- RSpectra is now a ‘Suggests’, rather than an ‘Imports’. If you have RSpectra installed, it will be used automatically where previous versions required it (for spectral initialization). Otherwise, irlba will be used. For two-dimensional output, you are unlikely to notice much difference in speed or accuracy with real-world data. For highly-structured simulation datasets (e.g. spectral initialization of a 1D line), RSpectra will give much better, faster initializations, but these are not the typical use cases envisaged for this package. For embedding into higher dimensions (e.g. n_components = 100 or higher), RSpectra is recommended and will likely out-perform irlba even if you have installed a good linear algebra library.
- init = "laplacian" returned the wrong coordinates because of a slightly subtle issue around how to order the eigenvectors when using the random walk transition matrix rather than normalized graph laplacians.
- The init_sdev parameter was ignored when the init parameter was a user-supplied matrix. Now the input will be scaled.
- Matrix input was being converted to and from a data frame during pre-processing, causing R to allocate memory that it was disinclined to ever give up even after the function exited. This unnecessary manipulation is now avoided.
- The behavior of the bandwidth parameter has been changed to give results more like the current version (0.5.2) of the Python UMAP implementation. This is likely to be a breaking change for non-default settings of bandwidth, but this is not a parameter which is actually exposed by the Python UMAP public API any more, so is on the road to deprecation in uwot too and I don’t recommend you change this.
- Transforming data with multiple blocks would give an error if the number of rows of the new data did not equal the number of rows in the original data.
 
uwot 0.1.11
CRAN release: 2021-12-02
New features
- New parameter: batch. If TRUE, then results are reproducible when n_sgd_threads > 1 (as long as you use set.seed). The price to be paid is that the optimization is slightly less efficient (because coordinates are not updated as quickly and hence gradients are staler for longer), so it is highly recommended to set n_epochs = 500 or higher. Thank you to Aaron Lun who not only came up with a way to implement this feature, but also wrote an entire C++ implementation of UMAP which does it (https://github.com/jlmelville/uwot/issues/83).
- New parameter: opt_args. The default optimization method when batch = TRUE is Adam. You can control its parameters by passing them in the opt_args list. As Adam is a momentum-based method it requires extra storage of previous gradient data. To avoid the extra memory overhead you can also use opt_args = list(method = "sgd") to use a stochastic gradient descent method like that used when batch = FALSE.
- New parameter: epoch_callback. You may now pass a function which will be invoked at the end of each epoch. Mainly useful for producing an image of the state of the embedding at different points during the optimization. This is another feature taken from umappp.
- New parameter: pca_method, used when the pca parameter is supplied to reduce the initial dimensionality of the data. This controls which method is used to carry out the PCA and can be set to one of:
  - "irlba", which uses irlba::irlba to calculate a truncated SVD. If this routine deems that you are trying to extract 50% or more of the singular vectors, you will see a warning to that effect logged to the console.
  - "rsvd", which uses irlba::svdr for truncated SVD. This method uses a small number of iterations which should give an accuracy/speed up trade-off similar to that of the scikit-learn TruncatedSVD method. This can be much faster than using "irlba" but potentially at a cost in accuracy. However, for the purposes of dimensionality reduction as input to nearest neighbor search, this doesn’t seem to matter much.
  - "bigstatsr", which uses the bigstatsr package. Note that this is not a dependency of uwot. If you want to use bigstatsr, you must install it yourself. On platforms without easy access to fast linear algebra libraries (e.g. Windows), using bigstatsr may give a speed up to PCA calculations.
  - "svd", which uses base::svd. Warning: this is likely to be very slow for most datasets and exists as a fallback for small datasets where the "irlba" method would print a warning.
  - "auto" (the default), which uses "irlba" to calculate a truncated SVD, unless you are attempting to extract 50% or more of the singular vectors, in which case "svd" is used.
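A short sketch of the batch and opt_args settings described above, using only values named in these notes. The iris data and the thread count of 2 are illustrative choices:

```r
library(uwot)

X <- as.matrix(iris[, 1:4])

# Batch mode with more than one SGD thread; Adam is the default optimizer here.
# n_epochs is raised to 500, as recommended for batch mode.
set.seed(42)
emb_adam <- umap(X, batch = TRUE, n_sgd_threads = 2, n_epochs = 500)

# Same settings, but with plain SGD to avoid Adam's extra gradient storage.
set.seed(42)
emb_sgd <- umap(
  X,
  batch = TRUE, n_sgd_threads = 2, n_epochs = 500,
  opt_args = list(method = "sgd")
)
```

Repeating either call after the same set.seed value is what the batch option is intended to make reproducible when several threads are in use.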
 
Bug fixes and minor improvements
- If row names are provided in the input data (or nearest neighbor data, or initialization data if it’s a matrix), these will be used to name the rows of the output embedding (https://github.com/jlmelville/uwot/issues/81), and also the nearest neighbor data if you set ret_nn = TRUE. If the names exist in more than one of the input data parameters listed above, but are inconsistent, no guarantees are made about which names will be used. Thank you jwijffels for reporting this.
- In umap_transform, the learning rate is now down-scaled by a factor of 4, consistent with the Python implementation of UMAP. If you need the old behavior back, use the (newly added) learning_rate parameter in umap_transform to set it explicitly. If you used the default value in umap when creating the model, the correct setting in umap_transform is learning_rate = 1.0.
- Setting nn_method = "annoy" and verbose = TRUE would lead to an error with datasets with fewer than 50 items in them.
- Using multiple pre-computed nearest neighbors blocks is now supported with umap_transform (this was incorrectly documented to work).
- Documentation around pre-calculated nearest neighbor data for umap_transform was wrong in other ways: it has now been corrected to indicate that there should be neighbor data for each item in the test data, but the neighbors and distances should refer to items in training data (i.e. the data used to build the model).
- The n_neighbors parameter is now correctly ignored in model generation if pre-calculated nearest neighbor data is provided.
- Documentation incorrectly said grain_size didn’t do anything.
uwot 0.1.10
CRAN release: 2020-12-15
This release is mainly to allow for some internal changes to keep compatibility with RcppAnnoy, used for the nearest neighbor calculations.
Bug fixes and minor improvements
- Passing in data with missing values will now raise an error early. Missing data in factor columns intended for supervised UMAP is still ok. Thank you David McGaughey for tweeting about this issue.
- The documentation for the return value of umap and tumap now notes that the contents of the model list are subject to change and not intended to be part of the uwot public API. I recommend not relying on the structure of the model, especially if your package is intended to appear on CRAN or Bioconductor, as any breakages will delay future releases of uwot to CRAN.
uwot 0.1.9
CRAN release: 2020-11-15
New features
- New metric: metric = "correlation", a distance based on the Pearson correlation (https://github.com/jlmelville/uwot/issues/22). Supporting this required a change to the internals of how nearest neighbor data is stored. Backwards compatibility with models generated by previous versions using ret_model = TRUE should have been preserved.
Bug fixes and minor improvements
- New parameter, nn_method, for umap_transform: pass a list containing pre-computed nearest neighbor data (identical to that used in the umap function). You should not pass anything to the X parameter in this case. This extends the functionality for transforming new points to the case where nearest neighbor data between the original data and new data can be calculated external to uwot. Thanks to Yuhan Hao for contributing the PR (https://github.com/jlmelville/uwot/issues/63 and https://github.com/jlmelville/uwot/issues/64).
- New parameter, init, for umap_transform: provides a variety of options for initializing the output coordinates, analogously to the same parameter in the umap function (but without as many options currently). This is intended to replace init_weighted, which should be considered deprecated, but won’t be removed until uwot 1.0 (whenever that is). Instead of init_weighted = TRUE, use init = "weighted"; replace init_weighted = FALSE with init = "average". Additionally, you can pass a matrix to init to act as the initial coordinates.
- Also in umap_transform: previously, setting n_epochs = 0 was ignored: at least one iteration of optimization was applied. Now, n_epochs = 0 is respected, and will return the initialized coordinates without any further optimization.
- Minor performance improvement for single-threaded nearest neighbor search when verbose = TRUE: the progress bar calculations were taking up a detectable amount of time and this has now been fixed. With very small data sets (< 50 items) the progress bar will no longer appear when building the index.
- Passing a sparse distance matrix as input now supports upper/lower triangular matrix storage rather than wasting storage using an explicitly symmetric sparse matrix.
- Minor license change: uwot used to be licensed under GPL-3 only; now it is GPL-3 or later.
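The umap_transform changes above could be exercised along these lines. The train/test split of iris is an arbitrary illustration:

```r
library(uwot)

X <- as.matrix(iris[, 1:4])
set.seed(42)
test_idx <- sample(nrow(X), 50)
X_train <- X[-test_idx, ]
X_test <- X[test_idx, ]

# A model is needed before new data can be transformed.
model <- umap(X_train, ret_model = TRUE)

# init = "weighted" takes the place of the deprecated init_weighted = TRUE.
emb_weighted <- umap_transform(X_test, model, init = "weighted")

# init = "average" takes the place of init_weighted = FALSE. With n_epochs = 0
# the initialized coordinates are returned without any further optimization.
emb_init_only <- umap_transform(X_test, model, init = "average", n_epochs = 0)
```

Comparing emb_init_only with an optimized result is one way to see how much the optimization moves the new points away from their starting positions.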
 
uwot 0.1.8
CRAN release: 2020-03-16
Bug fixes and minor improvements
- default for n_threads is now NULL to provide a bit more protection from changing dependencies.
- parallel code now uses the standard C++11 implementation of threading rather than tinythread++.
- The grain_size parameter has been undeprecated. As the version that deprecated this never made it to CRAN, this is unlikely to have affected many people.
uwot 0.1.7
Bug fixes and minor improvements
- uwot should no longer trigger undefined behavior in sanitizers, due to the temporary replacement of the RcppParallel package with code “borrowed” from that package and using tinythread++ rather than tbb (https://github.com/jlmelville/uwot/issues/52).
 - Further sanitizer improvements in the nearest neighbor search code due to the upstream efforts of erikbern and eddelbuettel (https://github.com/jlmelville/uwot/issues/50).
- The grain_size parameter is now ignored and remains to avoid breaking backwards compatibility only.
uwot 0.1.6
New features
- New parameter, ret_extra, a vector which can contain any combination of: "model" (same as ret_model = TRUE), "nn" (same as ret_nn = TRUE) and "fgraph" (see below).
- New return value data: If the ret_extra vector contains "fgraph", the returned list will contain an fgraph item representing the fuzzy simplicial input graph as a sparse N x N matrix. For lvish, use "P" instead of "fgraph" (https://github.com/jlmelville/uwot/issues/47). Note that there is a further sparsifying step where edges with a very low membership are removed if there is no prospect of the edge being sampled during optimization. This is controlled by n_epochs: the smaller the value, the more sparsifying will occur. If you are only interested in the fuzzy graph and not the embedded coordinates, set n_epochs = 0.
- New function: unload_uwot, to unload the Annoy nearest neighbor indices in a model. This prevents the model from being used in umap_transform, but allows for the temporary working directory created by both save_uwot and load_uwot to be deleted. Previously, both load_uwot and save_uwot were attempting to delete the temporary working directories they used, but would always silently fail because Annoy is making use of files in those directories.
- An attempt has been made to reduce the variability of results due to different compiler and C++ library versions on different machines. Visually results are unchanged in most cases, but this is a breaking change in terms of numerical output. The best chance of obtaining floating point determinism across machines is to use init = "spca", fixed values of a and b (rather than allowing them to be calculated through setting min_dist and spread) and approx_pow = TRUE. Using the tumap method with init = "spca" is probably the most robust approach.
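A brief sketch of asking for the fuzzy graph through ret_extra, with n_epochs = 0 so that no optimization is carried out. The data set is an illustrative choice:

```r
library(uwot)

X <- as.matrix(iris[, 1:4])

# Request the fuzzy simplicial graph alongside the coordinates. With
# n_epochs = 0 the coordinates are just the initialization.
res <- umap(X, ret_extra = c("fgraph"), n_epochs = 0)

dim(res$fgraph)     # sparse N x N matrix of edge memberships
dim(res$embedding)  # N rows of (unoptimized) output coordinates
```

Because a list is returned whenever ret_extra is non-empty, the coordinates have to be taken from the embedding item rather than used directly.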
Bug fixes and minor improvements
- New behavior when n_epochs = 0. This used to behave like (n_epochs = NULL) and gave a default number of epochs (dependent on the number of vertices in the dataset). Now it more usefully carries out all calculations except optimization, so the returned coordinates are those specified by the init parameter, so this is an easy way to access e.g. the spectral or PCA initialization coordinates. If you want the input fuzzy graph (ret_extra vector contains "fgraph"), this will also prevent the graph having edges with very low membership being removed. You still get the old default epochs behavior by setting n_epochs = NULL or to a negative value.
- save_uwot and load_uwot have been updated with a verbose parameter so it’s easier to see what temporary files are being created.
- save_uwot has a new parameter, unload, which if set to TRUE will delete the working directory for you, at the cost of unloading the model, i.e. it can’t be used with umap_transform until you reload it with load_uwot.
- save_uwot now returns the saved model with an extra field, mod_dir, which points to the location of the temporary working directory, so you should now assign the result of calling save_uwot to the model you saved, e.g. model <- save_uwot(model, "my_model_file"). This field is intended for use with unload_uwot.
- load_uwot also returns the model with a mod_dir item for use with unload_uwot.
- save_uwot and load_uwot were not correctly handling relative paths.
- A previous bug fix to load_uwot in uwot 0.1.4 to work with newer versions of RcppAnnoy (https://github.com/jlmelville/uwot/issues/31) failed in the typical case of a single metric for the nearest neighbor search using all available columns, giving an error message along the lines of: Error: index size <size> is not a multiple of vector size <size>. This has now been fixed, but required changes to both save_uwot and load_uwot, so existing saved models must be regenerated. Thank you to reporter OuNao.
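Putting the save, load and unload functions together, a round trip might look like the following sketch. The file location and the split of iris into old and new rows are arbitrary choices for illustration:

```r
library(uwot)

X <- as.matrix(iris[, 1:4])
model <- umap(X[1:100, ], ret_model = TRUE)

model_file <- tempfile("uwot_model_")

# Re-assign the result so the model carries the mod_dir field.
model <- save_uwot(model, file = model_file)

# Later (or in another session): reload and transform new data.
loaded <- load_uwot(model_file)
emb_new <- umap_transform(X[101:150, ], loaded)

# Release the Annoy indices so the temporary working directories can go.
unload_uwot(model)
unload_uwot(loaded)
```

After unload_uwot, a model has to be reloaded with load_uwot before it can be passed to umap_transform again.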
uwot 0.1.5
CRAN release: 2019-12-04
Bug fixes and minor improvements
- The R API was being accessed from inside multi-threaded code to seed the (non-R) random number generators. Probably this was causing users in downstream projects (seurat and monocle) to experience strange RcppParallel-related crashes. Thanks to aldojongejan for reporting this (https://github.com/jlmelville/uwot/issues/39).
- Passing a floating point value smaller than one to n_threads caused a crash. This was particularly insidious if running with a system with only one default thread available as the default n_threads becomes 0.5. Now n_threads (and n_sgd_threads) are rounded to the nearest integer.
- Initialization of supervised UMAP should now be faster (https://github.com/jlmelville/uwot/issues/34). Contributed by Aaron Lun.
 
uwot 0.1.4
CRAN release: 2019-09-23
Bug fixes and minor improvements
- Fixed incorrect loading of Annoy indexes to be compatible with newer versions of RcppAnnoy (https://github.com/jlmelville/uwot/issues/31). My thanks to Dirk Eddelbuettel and Erik Bernhardsson for aid in identifying the problem.
- Fix for ERROR: there is already an InterruptableProgressMonitor instance defined.
- If verbose = TRUE, the a, b curve parameters are now logged.
uwot 0.1.3
CRAN release: 2019-04-07
Bug fixes and minor improvements
- Fixed an issue where the session would crash if the Annoy nearest neighbor search was unable to find k neighbors for an item.
 
Known issue
Even with a fix for the bug mentioned above, if the nearest neighbor index file is larger than 2GB in size, Annoy may not be able to read the data back in. This should only occur with very large or high-dimensional datasets. The nearest neighbor search will fail under these conditions. A work-around is to set n_threads = 0, because the index will not be written to disk and re-loaded under these circumstances, at the cost of a longer search time. Alternatively, set the pca parameter to reduce the dimensionality or lower n_trees, both of which will reduce the size of the index on disk. However, either may lower the accuracy of the nearest neighbor results.
uwot 0.1.2
CRAN release: 2019-04-06
Initial CRAN release.
New features
- New parameter, tmpdir, which allows the user to specify the temporary directory where nearest neighbor indexes will be written during Annoy nearest neighbor search. The default is base::tempdir(). Only used if n_threads > 1 and nn_method = "annoy".
Bug fixes and minor improvements
- Fixed an issue with `lvish` where there was an off-by-one error when calculating input probabilities.
- Added a safeguard to `lvish` to prevent the Gaussian precision, beta, becoming overly large when the binary search fails during perplexity calibration.
- The `lvish` perplexity calibration uses the log-sum-exp trick to avoid numeric underflow if beta becomes large.
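The log-sum-exp trick can be illustrated in base R. This is a minimal sketch, not the `lvish` implementation: `log_sum_exp` is a hypothetical helper, and the squared distances and `beta` are made-up values chosen so that the naive calculation underflows.

```r
# Hypothetical helper: computes log(sum(exp(x))) without underflow by
# factoring out the largest element before exponentiating.
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Made-up squared distances and a large Gaussian precision (beta).
d2 <- c(1.0, 1.5, 2.0)
beta <- 1000

# Naive approach: every exp(-beta * d2) underflows to 0, so the normalized
# probabilities are 0 / 0 = NaN.
w_naive <- exp(-beta * d2)
p_naive <- w_naive / sum(w_naive)

# Stable approach: normalize in log space, then exponentiate.
log_w <- -beta * d2
p_stable <- exp(log_w - log_sum_exp(log_w))
```

Here `p_naive` is all `NaN`, whereas `p_stable` sums to one, with nearly all of the probability on the closest neighbor.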
uwot 0.0.0.9010 (31 March 2019)
New features
- New parameter: `pcg_rand`. If `TRUE` (the default), then a random number generator from the PCG family is used during the stochastic optimization phase. The old PRNG, a direct translation of an implementation of the Tausworthe “taus88” PRNG used in the Python version of UMAP, can be obtained by setting `pcg_rand = FALSE`. The new PRNG is slower, but is likely superior in its statistical randomness. This change in behavior will break backwards compatibility: you will now get slightly different results even with the same seed.
- New parameter: `fast_sgd`. If `TRUE`, then the following combination of parameters is set: `n_sgd_threads = "auto"`, `pcg_rand = FALSE` and `approx_pow = TRUE`. These will result in a substantially faster optimization phase, at the cost of being slightly less accurate and results not being exactly repeatable. `fast_sgd = FALSE` by default, but if you are only interested in visualization, then `fast_sgd` gives perfectly good results. For more generic dimensionality reduction and reproducibility, keep `fast_sgd = FALSE`.
- New parameter: `init_sdev`, which specifies how large the standard deviation of each column of the initial coordinates should be. This will scale any input coordinates (including user-provided matrix coordinates). `init = "spca"` can now be thought of as an alias of `init = "pca", init_sdev = 1e-4`. This may be too aggressive a scaling for some datasets. The typical UMAP spectral initializations tend to result in standard deviations of around `2` to `5`, so this might be more appropriate in some cases. If spectral initialization detects multiple components in the affinity graph and falls back to scaled PCA, it uses `init_sdev = 1`.
- As a result of adding `init_sdev`, the `init` options `sspectral`, `slaplacian` and `snormlaplacian` have been removed (they weren’t around for very long anyway). You can get the same behavior by e.g. `init = "spectral", init_sdev = 1e-4`. `init = "spca"` is sticking around because I use it a lot.
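The kind of rescaling that `init_sdev` describes can be sketched in base R. This is an illustration only, not uwot's internal code: `scale_to_sdev` is a hypothetical helper, the input coordinates are made up, and centering the columns is an assumption made for simplicity.

```r
# Hypothetical helper: rescale each column of a coordinate matrix so that
# its standard deviation equals `sdev`. Columns are also centered here,
# which is an assumption made for simplicity.
scale_to_sdev <- function(coords, sdev) {
  centered <- sweep(coords, 2, colMeans(coords), "-")
  col_sd <- apply(centered, 2, stats::sd)
  sweep(centered, 2, col_sd / sdev, "/")
}

set.seed(42)
# Made-up initial coordinates with very different spreads per column.
init_coords <- cbind(rnorm(100, sd = 10), rnorm(100, sd = 0.5))

scaled <- scale_to_sdev(init_coords, sdev = 1e-4)
apply(scaled, 2, sd)
```

After scaling, both columns have the requested standard deviation regardless of their original spread, which is the effect described for `init = "pca", init_sdev = 1e-4`.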
Bug fixes and minor improvements
- Spectral initialization (the default) was sometimes generating coordinates that had too large a range, due to an erroneous scale factor that failed to account for negative coordinate values. This could give rise to embeddings with very noticeable outliers distant from the main clusters.
- Also during spectral initialization, the amount of noise being added had a standard deviation an order of magnitude too large compared to the Python implementation (this probably didn’t make any difference though).
- If requesting a spectral initialization, but multiple disconnected components are present, fall back to `init = "spca"`.
- Removed dependency on C++ `<random>` header. This breaks backwards compatibility even if you set `pcg_rand = FALSE`.
- `metric = "cosine"` results were incorrectly using the unmodified Annoy angular distance.
- Numeric matrix columns can be specified as the target for the `categorical` metric (fixes https://github.com/jlmelville/uwot/issues/20).
uwot 0.0.0.9009 (1 January 2019)
- Data is now stored column-wise during optimization, which should result in an increase in performance for larger values of `n_components` (e.g. approximately 50% faster optimization time with MNIST and `n_components = 50`).
- New parameter: `pca_center`, which controls whether to center the data before applying PCA. It would be typical to set this to `FALSE` if you are applying PCA to binary data (although note that you can’t use this setting with `metric = "hamming"`).
- PCA will now be used when the `metric` is `"manhattan"` or `"cosine"`. It’s still not applied when using `"hamming"` (data still needs to be in binary format, not real-valued).
- If using mixed datatypes, you may override the `pca` and `pca_center` parameter values for a given data block by using a list for the value of the metric, with the column ids/names as an unnamed item and the overriding values as named items, e.g. instead of `manhattan = 1:100`, use `manhattan = list(1:100, pca_center = FALSE)` to turn off PCA centering for just that block. This functionality exists mainly for the case where you have mixed binary and real-valued data and want to apply PCA to both data types. It’s normal to apply centering to real-valued data but not to binary data.
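A per-block override of this kind might look like the sketch below. The data, block sizes and the `pca = 2` value are made up for illustration, and the embedding step only runs if uwot is installed.

```r
# Made-up data: 4 real-valued columns followed by 6 binary columns.
set.seed(1)
X <- cbind(
  matrix(rnorm(50 * 4), nrow = 50),
  matrix(rbinom(50 * 6, size = 1, prob = 0.3), nrow = 50)
)

# One block per metric. The second block overrides pca_center for its
# columns only: the column ids are the unnamed item and the override is a
# named item.
metric_spec <- list(
  euclidean = 1:4,
  manhattan = list(5:10, pca_center = FALSE)
)

# Only run the embedding if uwot is installed.
if (requireNamespace("uwot", quietly = TRUE)) {
  embedding <- uwot::umap(X, n_neighbors = 10, metric = metric_spec, pca = 2)
}
```

The real-valued block keeps the default centering, while the binary block is reduced by PCA without centering.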
Bug fixes and minor improvements
- Fixed a bug that affected `umap_transform`, where negative sampling was over the size of the test data (it should be the training data).
- Some other performance improvements (around 10% faster for the optimization stage with MNIST).
- When `verbose = TRUE`, log the Annoy recall accuracy, which may help tune values of `n_trees` and `search_k`.
uwot 0.0.0.9008 (December 23 2018)
New features
- New parameter: `n_sgd_threads`, which controls the number of threads used in the stochastic gradient descent. By default this is now single-threaded and should result in reproducible results when using `set.seed`. To get back the old, less consistent, but faster settings, set `n_sgd_threads = "auto"`.
- API change for consistency with Python UMAP:
  - `alpha` is now `learning_rate`.
  - `gamma` is now `repulsion_strength`.
- Default spectral initialization now looks for disconnected components and initializes them separately (also applies to `laplacian` and `normlaplacian`).
- New `init` options: `sspectral`, `snormlaplacian` and `slaplacian`. These are like `spectral`, `normlaplacian` and `laplacian` respectively, but scaled so that each dimension has a standard deviation of 1e-4. This is like the difference between the `pca` and `spca` options.
Bug fixes and minor improvements
- Fixed Hamming distance support (it was actually using Euclidean distance).
 - Smooth knn/perplexity calibration results had a small dependency on the number of threads used.
 - Anomalously long spectral initialization times should now be reduced.
 - Internal changes and fixes thanks to a code review by Aaron Lun.
 
uwot 0.0.0.9007 (December 9 2018)
New features
- New parameter `pca`: set this to a positive integer to reduce matrix or data frame input to that number of columns using PCA. Only works if `metric = "euclidean"`. If you have more than 100 columns, this can substantially improve the speed of the nearest neighbor search. t-SNE implementations often set this value to 50.
uwot 0.0.0.9006 (December 5 2018)
New features
- Highly experimental mixed data type support for `metric`: instead of specifying a single metric name (e.g. `metric = "euclidean"`), you can pass a list, where the name of each item is the metric to use and the value is a vector of the names of the columns to use with that metric, e.g. `metric = list("euclidean" = c("A1", "A2"), "cosine" = c("B1", "B2", "B3"))` treats columns `A1` and `A2` as one block, using the Euclidean distance to find nearest neighbors, whereas `B1`, `B2` and `B3` are treated as a second block, using the cosine distance.
- Factor columns can also be used in the metric, using the metric name `categorical`.
- `y` may now be a data frame or matrix if multiple target data is available.
- New parameter `target_metric`, to specify the distance metric to use with numerical `y`. This has the same capabilities as `metric`.
- Multiple external nearest neighbor data sources are now supported. Instead of passing a list of two matrices, pass a list of lists, one for each external metric.
- More details on mixed data types can be found at https://github.com/jlmelville/uwot#mixed-data-types.
- Compatibility with older versions of RcppParallel (contributed by sirusb).
- New scaling option, `scale = "Z"`, to Z-scale each column of input (synonym for `scale = TRUE` or `scale = "scale"`).
- New scaling option, `scale = "colrange"`, to scale columns in the range (0, 1).
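The two column-scaling options can be illustrated in base R. This is a sketch of the transformations as described, not uwot's own code: `z_scale` and `colrange_scale` are hypothetical helpers, and constant columns (which would cause a division by zero) are not handled.

```r
# Hypothetical helpers illustrating the two column-scaling options.

# Z-scaling: each column is centered to mean 0 and scaled to sd 1.
z_scale <- function(X) scale(X, center = TRUE, scale = TRUE)

# Column range scaling: each column is mapped linearly so that its minimum
# becomes 0 and its maximum becomes 1.
colrange_scale <- function(X) {
  mins <- apply(X, 2, min)
  ranges <- apply(X, 2, max) - mins
  sweep(sweep(X, 2, mins, "-"), 2, ranges, "/")
}

X <- as.matrix(iris[, 1:4])
Xz <- z_scale(X)
Xr <- colrange_scale(X)
```

Either transformation stops columns with large numeric ranges from dominating the distance calculation, which is the usual reason to scale before a nearest neighbor search.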
uwot 0.0.0.9004 (October 21 2018)
New features
- For supervised UMAP with numeric `y`, you may pass nearest neighbor data directly, in the same format as that supported by `X`-related nearest neighbor data. This may be useful if you don’t want to use Euclidean distances for the `y` data, or if you have missing data (and have a way to assign nearest neighbors for those cases, obviously). See the Nearest Neighbor Data Format section for details.
uwot 0.0.0.9003 (September 22 2018)
New features
- New parameter `ret_nn`: when `TRUE`, returns nearest neighbor matrices as a `nn` list: indices in item `idx` and distances in item `dist`. Embedded coordinates are in `embedding`. Both `ret_nn` and `ret_model` can be `TRUE`, and should not cause any compatibility issues with supervised embeddings.
- `nn_method` can now take precomputed nearest neighbor data. Must be a list of two matrices: `idx`, containing integer indexes, and `dist`, containing distances. By no coincidence, this is the format returned by `ret_nn`.
Bug fixes and minor improvements
- Embedding to `n_components = 1` was broken (https://github.com/jlmelville/uwot/issues/6).
- User-supplied matrices to the `init` parameter were being modified, in defiance of basic R pass-by-copy semantics.
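The precomputed nearest neighbor format described under New features above (a list with an `idx` matrix and a `dist` matrix) can be built by hand. The sketch below uses a brute-force search in base R. `brute_force_nn` is a hypothetical helper, and placing each item first in its own neighbor list with distance zero is an assumption about the expected layout.

```r
# Hypothetical helper: exact k-nearest neighbors by brute force, returned
# as a list of two matrices, `idx` (integer indexes) and `dist` (distances).
# Each item is placed first in its own neighbor list, which is an
# assumption about the expected layout.
brute_force_nn <- function(X, k) {
  D <- as.matrix(stats::dist(X))
  n <- nrow(D)
  idx <- matrix(0L, nrow = n, ncol = k)
  dst <- matrix(0, nrow = n, ncol = k)
  for (i in seq_len(n)) {
    # Sort by distance, then force item i to the front so that duplicated
    # rows cannot displace it.
    ord <- c(i, setdiff(order(D[i, ]), i))[seq_len(k)]
    idx[i, ] <- ord
    dst[i, ] <- D[i, ord]
  }
  list(idx = idx, dist = dst)
}

X <- as.matrix(iris[, 1:4])
nn <- brute_force_nn(X, k = 15)
```

A list of this shape is what the `nn_method` entry describes. Brute force is only practical for small datasets, but it can be handy for checking the accuracy of approximate results.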
uwot 0.0.0.9002 (August 14 2018)
Bug fixes and minor improvements
- `metric = "cosine"` is working again for `n_threads` greater than `0` (https://github.com/jlmelville/uwot/issues/5).
uwot 0.0.0.9001
New features
- August 5 2018. You can now use an existing embedding to add new points via `umap_transform`. See the example section below.
- August 1 2018. Numerical vectors are now supported for supervised dimension reduction.
- July 31 2018. (Very) initial support for supervised dimension reduction: categorical data only at the moment. Pass in a factor vector (use `NA` for unknown labels) as the `y` parameter and edges with bad (or unknown) labels are down-weighted, hopefully leading to better separation of classes. This works remarkably well for the Fashion MNIST dataset.
- July 22 2018. You can now use the cosine and Manhattan distances with the Annoy nearest neighbor search, via `metric = "cosine"` and `metric = "manhattan"`, respectively. Hamming distance is not supported because RcppAnnoy doesn’t yet support it.
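Supervised embedding followed by `umap_transform` might be combined as in the sketch below. It assumes a reasonably recent uwot, where `ret_model = TRUE` is needed to keep the model for later use. The train/test split of `iris` is arbitrary, and the code only runs if uwot is installed.

```r
if (requireNamespace("uwot", quietly = TRUE)) {
  set.seed(42)
  # Arbitrary split: hold out 30 rows to add to the embedding later.
  test_rows <- sample(nrow(iris), 30)
  train <- iris[-test_rows, ]
  test <- iris[test_rows, ]

  # Supervised embedding: the factor labels are passed as y, and
  # ret_model = TRUE keeps what is needed to embed new points later.
  model <- uwot::umap(train[, 1:4], y = train$Species, ret_model = TRUE)

  # Add the held-out points to the existing embedding.
  test_coords <- uwot::umap_transform(test[, 1:4], model)
  dim(test_coords)
}
```

The labels are only used when building the model. The held-out points are placed using their numeric columns alone.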