
The investigation of the three clusters dataset found that using ASNE with the input bandwidths re-used in the output kernel solved all the problems with strange distortions occurring at higher perplexities. Unfortunately, there was reason to believe that this would not generalize to other datasets. Let’s see if that is the case.

Also, the original description of NeRV transferred bandwidths directly from the input kernel to the output kernel. So there is a literature precedent for this. However, later publications about NeRV don’t mention this step, so it may have turned out to be a bad idea. We shall see.
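To make the idea concrete, here is a minimal sketch of an ASNE-style output kernel that re-uses the input bandwidths: each point keeps the precision calibrated for it during the input-space perplexity search and applies it to the output-space distances. The function and variable names (bandwidthed_output_q, Y, beta) are illustrative, not the actual smallvis internals.

# Sketch only: Y is the n x 2 embedding, beta the per-point precisions from
# the input-space perplexity calibration.
bandwidthed_output_q <- function(Y, beta) {
  D2 <- as.matrix(dist(Y)) ^ 2   # squared output-space distances
  W <- exp(-beta * D2)           # element [i, j] is exp(-beta[i] * d_ij ^ 2)
  diag(W) <- 0                   # no self-neighbors
  W / rowSums(W)                 # point-wise normalization, as in ASNE
}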

Datasets

See the Datasets page.

Evaluation

Apart from visualizing the results, the mean neighbor preservation of the 40 closest neighbors is used as a rough quantification of embedding quality, labelled as mnp@40 in the plots.
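For reference, the metric can be computed along the following lines: for each point, count how many of its 40 nearest neighbors in the input space are also among its 40 nearest neighbors in the embedding, and average the resulting fractions over all points. This brute-force sketch (function and argument names are illustrative) is not the code used to produce the numbers in the plots.

# X: input data (numeric matrix), Y: embedding, k: neighborhood size.
mean_neighbor_preservation <- function(X, Y, k = 40) {
  knn_idx <- function(M, k) {
    D <- as.matrix(dist(M))
    diag(D) <- Inf                             # exclude self-neighbors
    t(apply(D, 1, function(d) order(d)[1:k]))  # k nearest neighbor indices per row
  }
  nn_in <- knn_idx(X, k)
  nn_out <- knn_idx(Y, k)
  # fraction of input-space neighbors retained in the embedding, averaged
  mean(sapply(seq_len(nrow(nn_in)),
              function(i) length(intersect(nn_in[i, ], nn_out[i, ])) / k))
}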

Settings

Adding bandwidths massively changes the magnitude of the gradients compared to non-bandwidth methods. This required extensive learning rate twiddling, both on a per-method and per-dataset basis.

# eta = 1000 for basne
# eta = 20000 for btasne
s1k_bnerv <- smallvis(s1k, scale = FALSE, perplexity = 40, Y_init = "spca", method = "bnerv", eta = 100, max_iter = 2000, epoch = 100)
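
The commented eta values correspond to calls along these lines for the other bandwidthed methods, assuming the remaining arguments are unchanged from the bnerv call above (the result names are illustrative):

s1k_basne <- smallvis(s1k, scale = FALSE, perplexity = 40, Y_init = "spca", method = "basne", eta = 1000, max_iter = 2000, epoch = 100)
s1k_btasne <- smallvis(s1k, scale = FALSE, perplexity = 40, Y_init = "spca", method = "btasne", eta = 20000, max_iter = 2000, epoch = 100)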

Results

Given that nothing interesting happened with SSNE, t-SNE and their bandwidthed equivalents on the Swiss Roll data, to save some effort we’ll only look at ASNE and t-ASNE, and their bandwidthed versions, below.

iris

[Plots: iris ASNE, BASNE, t-ASNE, Bt-ASNE, NeRV (0.5), BNeRV]

s1k

[Plots: s1k ASNE, BASNE, t-ASNE, Bt-ASNE, NeRV (0.5), BNeRV]

oli

[Plots: oli ASNE, BASNE, t-ASNE, Bt-ASNE, NeRV (0.5), BNeRV]

frey

[Plots: frey ASNE, BASNE, t-ASNE, Bt-ASNE, NeRV (0.5), BNeRV]

coil20

[Plots: coil20 ASNE, BASNE, t-ASNE, Bt-ASNE, NeRV (0.5), BNeRV]

mnist

[Plots: mnist ASNE, BASNE, t-ASNE, Bt-ASNE, NeRV (0.5), BNeRV]

fashion

[Plots: fashion ASNE, BASNE, t-ASNE, Bt-ASNE, NeRV (0.5), BNeRV]

Conclusions

In terms of neighborhood preservation, there are a few datasets (iris, oli, frey and coil20) where adding bandwidths to ASNE improves the results by a tiny amount, but it’s not a major improvement: t-ASNE still outperforms BASNE everywhere except on the consistently anomalous iris dataset. So it’s not hugely surprising to discover that adding bandwidths to the output kernel improves t-ASNE and NeRV only for iris.

Adding bandwidths directly from the input kernel to the output kernel has nothing to recommend it, so it’s not surprising that later descriptions of NeRV dropped this step.
