Welcome to the documentation for sneer: Stochastic Neighbor Embedding Experiments in R. It’s a pure R package that implements not only the popular t-distributed Stochastic Neighbor Embedding (t-SNE), but also other related dimensionality reduction methods.

Installing

Make sure you have the devtools package installed, then use that to install sneer:

install.packages("devtools")
devtools::install_github("jlmelville/sneer")

Sneer Options

The different options available in sneer are covered in the sections below.

  • Embedding Concepts - in which I try to define some generic terms and the basic workflow used by all the methods covered in sneer.
  • Data Sets - format of the input data and places to look for data sets.
  • Data Preprocessing - basic filtering and scaling that sneer will apply to your input.
  • Input Initialization - transforming the input distances to probabilities, or whatever other parameterization is needed for the optimization.
  • Output Initialization - how to create the initial output configuration.
  • Optimization - controlling the type of optimization of the output coordinates.
  • Embedding Methods - there’s a whole family of embedding methods related to t-SNE out there. Let me introduce you.
  • Advanced Embedding Methods - details on the embedder function.
  • Reporting - understanding the logging info during optimization.
  • Exported Data - what data you can expect to be returned from calling sneer (and the extra data you can ask for).
  • Analysis - if you don’t trust your lyin’ eyes, some functions to help with quantitative evaluation of embeddings.
  • Visualization - options for conveniently viewing embeddings.
  • References - links to the research that introduced the various methods in sneer.

The examples page brings most of this together. The same examples can be found in the R documentation for the sneer function, i.e. by typing ?sneer at the R console.

Just t-SNE, Please

Don’t want to deal with all those options? Ok, let’s use the iris data set as an example.

library("sneer")
res <- sneer(iris)

You should see a running commentary on what sneer is doing logged to the console, including the iteration number, some error values (which should get smaller) and convergence information. After 1000 iterations, it stops.

You should also see a plot of the iris data: three colored clusters slowly changing shape as the embedding converges. Hopefully, the final result looks a bit like:

If you know anything about the iris data set, the result shouldn’t be very surprising.

Anyway, the embedded coordinates are in the matrix res$coords. You probably have a much better idea about how to display and analyze the results than I do. Off you go. Have fun.
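For instance, continuing the iris example above, a minimal sketch of viewing the result with base R graphics might look like the following. It assumes, as described, that res$coords is a numeric matrix with one row per observation (in the same order as the input data frame) and two columns of embedded coordinates:

```r
# res comes from the earlier example: res <- sneer(iris)
# res$coords: one row per iris observation, two columns of coordinates
plot(res$coords,
     col = iris$Species,   # color each point by its species
     pch = 19,
     xlab = "Dim 1", ylab = "Dim 2",
     main = "sneer embedding of iris")
legend("topright",
       legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)),
       pch = 19)
```

Since the coordinates are just a plain matrix, they can equally well be handed off to ggplot2 or any other plotting or clustering tool you prefer.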

Theory

If you are at all curious about how the gradients are derived, perhaps because you would like to tweak a cost function or similarity function, take a look at the gradients page.

Some other increasingly arcane gradient derivations are also available, which are mainly just attempts to demonstrate how specific literature variations can be treated within a generic framework.