Part 2 of many example images of PaCMAP. A continuation of Notes on PaCMAP and PaCMAP Examples.

Here I have tried to make as direct a comparison with UMAP as I can. I again fiddled with the PaCMAP code to turn off some things: non-PCA data is centered but not range-scaled and I changed the PCA initialization in PaCMAP to scale the standard deviation to 1 for all columns. Back in PaCMAP Examples I speculated that an initialization like this would avoid the poor results I saw when I turned off mid-pair weights.
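To make the initialization change concrete, here is a minimal NumPy sketch of a PCA initialization where the data is centered (but not range-scaled) and every output column is rescaled to a standard deviation of 1. The function name `scaled_pca_init` and its parameters are my own for illustration; this is not the actual patch I applied to the PaCMAP code.

```python
import numpy as np

def scaled_pca_init(X, n_components=2, target_sd=1.0):
    """PCA coordinates with every output column rescaled to the same standard deviation."""
    Xc = X - X.mean(axis=0)  # center only; no range scaling
    # SVD of the centered data: the PCA scores are U * S
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    Y = U[:, :n_components] * S[:n_components]
    # rescale each column so its standard deviation equals target_sd
    return Y * (target_sd / Y.std(axis=0))

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10)) * np.arange(1, 11)  # toy data with unequal column scales
Y0 = scaled_pca_init(X)
```

The same `Y0` can then be handed to any method that accepts user-supplied initial coordinates, which is how the UMAP and PaCMAP runs can be started from an identical layout.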

For UMAP, I used uwot, which I can easily modify for my needs. I used exact nearest neighbors for the UMAP case because I have pregenerated them for these datasets, which removes a large source of computational time. For initialization, I used the coordinates from PaCMAP at iteration 0.

In the table below, the first row is PaCMAP. The first column uses mid-near weights as usual; the second doesn’t use them and sets the near-pair weights to 1 for all iterations. This is to test my theory that scaling the PCA initialization should be good enough for PaCMAP. It also removes the influence of the mid-near pairs when comparing with UMAP. Additionally, if it works well enough without mid-near interactions, that would suggest you could implement something PaCMAP-like fairly easily in existing UMAP-style code bases (like uwot).
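As a reminder of what is being switched off, here is a small sketch of the weight schedule, assuming the three-phase schedule described in the PaCMAP paper for a 450-iteration run (as I understand it). The function name `pacmap_weights` and the `use_mid_near` flag are hypothetical and only there to show the two variants side by side.

```python
def pacmap_weights(itr, w_mn_init=1000.0, use_mid_near=True):
    """Return (w_mid_near, w_near, w_far) for iteration itr of a 450-iteration run."""
    if not use_mid_near:
        # second-column variant: no mid-near pairs, near-pair weight fixed at 1
        return 0.0, 1.0, 1.0
    if itr < 100:
        # phase 1: mid-near weight decays linearly from w_mn_init towards 3
        frac = itr / 100
        return (1 - frac) * w_mn_init + frac * 3.0, 2.0, 1.0
    if itr < 200:
        # phase 2: small mid-near weight, stronger near-pair weight
        return 3.0, 3.0, 1.0
    # phase 3: mid-near pairs off, near and far pairs only
    return 0.0, 1.0, 1.0
```

With `use_mid_near=False` every iteration looks like the final phase of the standard schedule, which is what makes this variant easy to imitate in UMAP-style code.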

The second row is UMAP results. The left column is UMAP with default settings. The right column is t-UMAP, which should be “gentler” in terms of its forces (although not as gentle as PaCMAP).
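To give a feel for what “gentler” means, the sketch below compares the magnitude of the gradient coefficients (the scalar multiplying the displacement between two points) as a function of squared output distance. It assumes the usual UMAP output weight 1 / (1 + a d^2b), with t-UMAP being the a = b = 1 special case, and the PaCMAP near and far pair losses d̃ / (10 + d̃) and 1 / (1 + d̃) with d̃ = d² + 1. The default `a` and `b` values are illustrative (roughly what the UMAP curve fit gives for min_dist = 0.1), and the function names are my own.

```python
def umap_attr(d2, a=1.577, b=0.895):
    """Attractive gradient coefficient magnitude for UMAP at squared distance d2 > 0."""
    return 2.0 * a * b * d2 ** (b - 1.0) / (1.0 + a * d2 ** b)

def umap_rep(d2, a=1.577, b=0.895, eps=1e-3):
    """Repulsive gradient coefficient magnitude for UMAP (eps avoids division by zero)."""
    return 2.0 * b / ((eps + d2) * (1.0 + a * d2 ** b))

def tumap_attr(d2):
    # UMAP attraction with a = b = 1
    return 2.0 / (1.0 + d2)

def tumap_rep(d2, eps=1e-3):
    # UMAP repulsion with a = b = 1
    return 2.0 / ((eps + d2) * (1.0 + d2))

def pacmap_near(d2, w=1.0):
    # gradient of w * dt / (10 + dt) with dt = d2 + 1
    return w * 20.0 / (11.0 + d2) ** 2

def pacmap_far(d2, w=1.0):
    # gradient magnitude of w / (1 + dt) with dt = d2 + 1
    return w * 2.0 / (2.0 + d2) ** 2

for d2 in (0.01, 1.0, 100.0):
    print(d2, umap_attr(d2), tumap_attr(d2), pacmap_near(d2))
```

This only compares the raw force expressions. Learning rates, optimizers and how often each pair is sampled also differ between the methods, so the numbers are a rough guide rather than a full explanation of the visual differences.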

The third row is UMAP results but I use the 100D PCA input like PaCMAP would (if the input dimensionality > 100), and also the near pairs are the scaled nearest neighbors. Previous results suggest that this won’t have a noticeable effect except on macosko2015 and ng20. Again UMAP is on the left and t-UMAP is on the right.

As we don’t have enough images as it is, I have added two other datasets: coil100, which is like coil20 but has more images; and isofaces, a more manifold-like dataset, beloved of many a spectral-based embedding paper.

Finally, sorry for not having useful titles on the images.

UMAP, t-UMAP and PaCMAP

iris
iris-pacmap-15-spca1-it450 iris-pacmap-15-spca1nomid1-it450
iris-umap-spca1 iris-tumap-spca1
iris-umap-spca1s iris-tumap-spca1s
s1k
s1k-pacmap-15-spca1-it450 s1k-pacmap-15-spca1nomid1-it450
s1k-umap-spca1 s1k-tumap-spca1
s1k-umap-spca1s s1k-tumap-spca1s
oli
oli-pacmap-15-spca1-it450 oli-pacmap-15-spca1nomid1-it450
oli-umap-spca1 oli-tumap-spca1
oli-umap-spca1s oli-tumap-spca1s
frey
frey-pacmap-15-spca1-it450 frey-pacmap-15-spca1nomid1-it450
frey-umap-spca1 frey-tumap-spca1
frey-umap-spca1s frey-tumap-spca1s
coil20
coil20-pacmap-15-spca1-it450 coil20-pacmap-15-spca1nomid1-it450
coil20-umap-spca1 coil20-tumap-spca1
coil20-umap-spca1s coil20-tumap-spca1s

This is the one dataset where the dcorr is noticeably worse for the PaCMAP result without the mid-pair forces than with them.

coil100
coil100-pacmap-15-spca1-it450 coil100-pacmap-15-spca1nomid1-it450
coil100-umap-spca1 coil100-tumap-spca1
coil100-umap-spca1s coil100-tumap-spca1s
mnist
mnist-pacmap-15-spca1-it450 mnist-pacmap-15-spca1nomid1-it450
mnist-umap-spca1 mnist-tumap-spca1
mnist-umap-spca1s mnist-tumap-spca1s
fashion
fashion-pacmap-15-spca1-it450 fashion-pacmap-15-spca1nomid1-it450
fashion-umap-spca1 fashion-tumap-spca1
fashion-umap-spca1s fashion-tumap-spca1s
kuzushiji
kuzushiji-pacmap-15-spca1-it450 kuzushiji-pacmap-15-spca1nomid1-it450
kuzushiji-umap-spca1 kuzushiji-tumap-spca1
kuzushiji-umap-spca1s kuzushiji-tumap-spca1s
cifar10
cifar10-pacmap-15-spca1-it450 cifar10-pacmap-15-spca1nomid1-it450
cifar10-umap-spca1 cifar10-tumap-spca1
cifar10-umap-spca1s cifar10-tumap-spca1s
norb
norb-pacmap-15-spca1-it450 norb-pacmap-15-spca1nomid1-it450
norb-umap-spca1 norb-tumap-spca1
norb-umap-spca1s norb-tumap-spca1s
ng20
ng20-pacmap-15-spca1-it450 ng20-pacmap-15-spca1nomid1-it450
ng20-umap-spca1 ng20-tumap-spca1
ng20-umap-spca1s ng20-tumap-spca1s

This dataset shows the biggest difference between PaCMAP and UMAP according to both global and local metrics. This might really be saying that it’s a bad idea to use the Euclidean metric on this dataset no matter what method you use. It is hard to argue that any of the visualizations are incredibly insightful at a global level.

mammoth
mammoth-pacmap-15-spca1-it450 mammoth-pacmap-15-spca1nomid1-it450
mammoth-umap-spca1 mammoth-tumap-spca1
mammoth-umap-spca1s mammoth-tumap-spca1s
swissroll
swissroll-pacmap-15-spca1-it450 swissroll-pacmap-15-spca1nomid1-it450
swissroll-umap-spca1 swissroll-tumap-spca1
swissroll-umap-spca1s swissroll-tumap-spca1s
s-curve with a hole
scurvehole-pacmap-15-spca1-it450 scurvehole-pacmap-15-spca1nomid1-it450
scurvehole-umap-spca1 scurvehole-tumap-spca1
scurvehole-umap-spca1s scurvehole-tumap-spca1s
isofaces
isofaces-pacmap-15-spca1-it450 isofaces-pacmap-15-spca1nomid1-it450
isofaces-umap-spca1 isofaces-tumap-spca1
isofaces-umap-spca1s isofaces-tumap-spca1s

PaCMAP reliably splits the isofaces dataset into two from a PCA initialization. The dcorr and rtp metrics indicate that the UMAP results are to be preferred but they are not a feast for the eyes that’s for sure. The emd metric marginally prefers the PaCMAP results though.

macosko2015
macosko2015-pacmap-15-spca1-it450 macosko2015-pacmap-15-spca1nomid1-it450
macosko2015-umap-spca1 macosko2015-tumap-spca1
macosko2015-umap-spca1s macosko2015-tumap-spca1s

This is the only dataset where the rtp metric suggests that UMAP and t-UMAP results on PCA-processed data do a noticeably better job of preserving the global structure than not applying PCA.

tasic2018
tasic2018-pacmap-15-spca1-it450 tasic2018-pacmap-15-spca1nomid1-it450
tasic2018-umap-spca1 tasic2018-tumap-spca1
tasic2018-umap-spca1s tasic2018-tumap-spca1s
lamanno2020
lamanno2020-pacmap-15-spca1-it450 lamanno2020-pacmap-15-spca1nomid1-it450
lamanno2020-umap-spca1 lamanno2020-tumap-spca1
lamanno2020-umap-spca1s lamanno2020-tumap-spca1s

Overall, the global structure according to the rtp metric is pretty much the same for all the methods. According to dcorr, UMAP might do a better job. For PaCMAP, where there is a difference when not including the mid-pair forces, once again the evidence suggests that even with a PCA initialization, having the mid-pair forces helps.
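For readers who want to try a global measure of this kind themselves, here is a minimal sketch of a random triplet check. It assumes rtp means random triplet preservation: for randomly chosen triplets (i, j, k), test whether the ordering of the i-j and i-k distances is the same in the input and the embedding. The function name, its parameters and the sampling scheme are my own choices for illustration and may differ from the implementation behind the numbers quoted here.

```python
import numpy as np

def random_triplet_preservation(X, Y, n_triplets_per_point=5, seed=0):
    """Fraction of random triplets whose distance ordering agrees between X and Y."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    i = np.repeat(np.arange(n), n_triplets_per_point)
    j = rng.integers(0, n, size=i.size)
    k = rng.integers(0, n, size=i.size)
    # drop degenerate triplets where any two indices coincide
    keep = (i != j) & (i != k) & (j != k)
    i, j, k = i[keep], j[keep], k[keep]

    def j_closer_than_k(Z):
        dij = np.sum((Z[i] - Z[j]) ** 2, axis=1)
        dik = np.sum((Z[i] - Z[k]) ** 2, axis=1)
        return dij < dik

    return float(np.mean(j_closer_than_k(X) == j_closer_than_k(Y)))
```

A value near 0.5 is what an unrelated layout would give, so it is the gap above 0.5 that matters when comparing methods.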

For local preservation, UMAP seems better at retaining the 15 nearest neighbors, but the difference disappears at 65 nearest neighbors. This would be in line with the idea that UMAP’s attractive forces are stronger than those of PaCMAP, and the consistently more torn and distorted manifolds for the S-curve-with-a-hole, swiss roll and mammoth point the same way. t-UMAP definitely seems to tear manifolds less strongly than UMAP (but definitely more than PaCMAP).
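The local measure is simple enough to sketch: for each point, count how many of its k nearest neighbors in the input space are also among its k nearest neighbors in the embedding, then average. The brute-force version below is only meant for small datasets, and the function names are my own.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (brute force, self excluded)."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def neighbor_preservation(X, Y, k=15):
    """Mean fraction of each point's k input-space neighbors retained in the embedding."""
    nn_x = knn_indices(X, k)
    nn_y = knn_indices(Y, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(nn_x, nn_y)]
    return float(np.mean(overlap)) / k
```

Calling this with `k=15` and `k=65` on the same pair of input and output coordinates gives the two neighborhood sizes compared above.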

The effect of using the local scaling + PCA for finding nearest neighbors seems slight for all methods, whether we look at global or local measures of preservation. I couldn’t see any particular visual trend either, apart from the expected difference for macosko2015 and ng20, and there it is applying PCA that makes the big difference.

Effect of Edge Weights

Böhm and co-workers note that replacing UMAP’s edge weights with uniform values didn’t have a big effect on results (see their figure 2 with the MNIST digits for example). But when I added the PaCMAP near pair and far pair forces (no mid near forces) to uwot, I got results that on a lot of occasions looked more like t-UMAP than the Python PaCMAP results, even though I had gone through the various efforts outlined above to remove several differing aspects between PaCMAP and UMAP.

Assuming it’s not a bug (in which case this entire section will be stricken from the record), I noticed a shift to a more PaCMAP-like behavior when I set all the edge weights to 1. I also tested putting the t-UMAP forces into PaCMAP and still got results that looked PaCMAP-like to me. So below is a more in-depth investigation on each of the datasets with various ways of shoe-horning t-UMAP into PaCMAP or vice versa.

For each set of four images, the methods are as follows.

Top left: UMAP with the attractive and repulsive interactions replaced with the PaCMAP versions for near and far pairs respectively. The sampling is based on the UMAP graph edges.

Top right: UMAP with the attractive and repulsive interactions replaced with the PaCMAP versions for near and far pairs respectively. The sampling is based on the PaCMAP graph edges (i.e. the same weight for all edges).

Center left: t-UMAP with sampling based on the PaCMAP graph edges (i.e. the same weight for all edges).

Center right: PaCMAP Python code with the mid-pair interactions turned off and the near and far pair interactions replaced with the t-UMAP attractive and repulsive interactions respectively. Sampling uses PaCMAP graph edges (i.e. the same weight for all edges).

Bottom left: PaCMAP with mid-pair forces turned off (repeated from the previous section). This is what PaCMAP “should” look like.

Bottom right: t-UMAP with 100D PCA and local scaling (also repeated from above). This is what t-UMAP “should” look like.
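To illustrate what “sampling based on the graph edges” means in the panels above, here is a sketch of how UMAP-style code typically turns edge weights into sampling frequencies: the heaviest edge is sampled every epoch and lighter edges proportionally less often. It is modeled on my understanding of the scheme used by UMAP implementations, and the details inside uwot may differ. Binarizing the edges simply means passing in a vector of ones, so that every edge is sampled every epoch.

```python
import numpy as np

def epochs_per_sample(weights, n_epochs):
    """Epochs to wait between samples of each edge; -1 marks an edge that is never sampled."""
    weights = np.asarray(weights, dtype=float)
    n_samples = n_epochs * (weights / weights.max())
    result = np.full(weights.shape, -1.0)
    mask = n_samples > 0
    result[mask] = n_epochs / n_samples[mask]
    return result

w = np.array([1.0, 0.5, 0.1])  # toy edge weights
umap_style = epochs_per_sample(w, n_epochs=500)               # approximately [1, 2, 10]
binarized = epochs_per_sample(np.ones_like(w), n_epochs=500)  # [1, 1, 1]
```

In the weighted case the weakest edge here contributes an attractive update only once every ten epochs, whereas with binarized edges it pulls just as often as the strongest one.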

Lots of plots below with some commentary where I think there is an example that illustrates the effect.

iris
iris-upacmap-spca1 iris-upacmapv1-spca1
iris-tumapv1-spca1 iris-pacmap-15-tpacmap-it450
iris-pacmap-15-spca1nomid-it450 iris-tumap-spca1s
s1k
s1k-upacmap-spca1 s1k-upacmapv1-spca1
s1k-tumapv1-spca1 s1k-pacmap-15-tpacmap-it450
s1k-pacmap-15-spca1nomid-it450 s1k-tumap-spca1s

Alright, so in this case, the PaCMAP forces in uwot have the desired effect without having to binarize the edge weights. Likewise, adding the t-UMAP forces to PaCMAP creates a very t-UMAP-like result.

oli
oli-upacmap-spca1 oli-upacmapv1-spca1
oli-tumapv1-spca1 oli-pacmap-15-tpacmap-it450
oli-pacmap-15-spca1nomid-it450 oli-tumap-spca1s

Certainly for oli just putting PaCMAP near and far forces into uwot (top left) does not produce a PaCMAP effect, but binarizing the edges (top right) does then give something close to the original PaCMAP results (bottom left).

frey
frey-upacmap-spca1 frey-upacmapv1-spca1
frey-tumapv1-spca1 frey-pacmap-15-tpacmap-it450
frey-pacmap-15-spca1nomid-it450 frey-tumap-spca1s

As usual, frey does its own thing. I wouldn’t say this can tell us much either way.

coil20
coil20-upacmap-spca1 coil20-upacmapv1-spca1
coil20-tumapv1-spca1 coil20-pacmap-15-tpacmap-it450
coil20-pacmap-15-spca1nomid-it450 coil20-tumap-spca1s

The PaCMAP-in-uwot result is rather odd: you can see one of the loops becomes a straight line. This isn’t a quirk of the random number generator, it happens every time I run it. But looking at the bottom row you can see that t-UMAP causes the loops to be less compact and more evenly spaced (bottom right) versus PaCMAP (bottom left). The t-UMAP-in-PaCMAP result (center right) looks closer to the PaCMAP result than the t-UMAP result.

coil100
coil100-upacmap-spca1 coil100-upacmapv1-spca1
coil100-tumapv1-spca1 coil100-pacmap-15-tpacmap-it450
coil100-pacmap-15-spca1nomid-it450 coil100-tumap-spca1s

Ok, here it’s pretty obvious, right? PaCMAP tends to create three clusters of really bunched up loops with the others highly compressed and studded across the plot. t-UMAP is much more even in its distribution. The PaCMAP-in-uwot version (top left) looks nothing like the Python PaCMAP version (bottom left) until binarized edges are used (top right). Meanwhile, the t-UMAP-in-PaCMAP result (center right) also resembles the PaCMAP version.

mnist
mnist-upacmap-spca1 mnist-upacmapv1-spca1
mnist-tumapv1-spca1 mnist-pacmap-15-tpacmap-it450
mnist-pacmap-15-spca1nomid-it450 mnist-tumap-spca1s
fashion
fashion-upacmap-spca1 fashion-upacmapv1-spca1
fashion-tumapv1-spca1 fashion-pacmap-15-tpacmap-it450
fashion-pacmap-15-spca1nomid-it450 fashion-tumap-spca1s
kuzushiji
kuzushiji-upacmap-spca1 kuzushiji-upacmapv1-spca1
kuzushiji-tumapv1-spca1 kuzushiji-pacmap-15-tpacmap-it450
kuzushiji-pacmap-15-spca1nomid-it450 kuzushiji-tumap-spca1s
cifar10
cifar10-upacmap-spca1 cifar10-upacmapv1-spca1
cifar10-tumapv1-spca1 cifar10-pacmap-15-tpacmap-it450
cifar10-pacmap-15-spca1nomid-it450 cifar10-tumap-spca1s
norb
norb-upacmap-spca1 norb-upacmapv1-spca1
norb-tumapv1-spca1 norb-pacmap-15-tpacmap-it450
norb-pacmap-15-spca1nomid-it450 norb-tumap-spca1s

Another example where the PaCMAP result (bottom left) and t-UMAP results (bottom right) are quite distinguishable, with the blue structures and especially the green bits being a lot more compact for PaCMAP. The switch for PaCMAP-in-uwot (top left) going to binarized edges (top right) is pretty clear.

ng20
ng20-upacmap-spca1 ng20-upacmapv1-spca1
ng20-tumapv1-spca1 ng20-pacmap-15-tpacmap-it450
ng20-pacmap-15-spca1nomid-it450 ng20-tumap-spca1s

In the case of ng20, it seems the change in force types does most of the work, not the effect of edge weights.

mammoth
mammoth-upacmap-spca1 mammoth-upacmapv1-spca1
mammoth-tumapv1-spca1 mammoth-pacmap-15-tpacmap-it450
mammoth-pacmap-15-spca1nomid-it450 mammoth-tumap-spca1s

mammoth proving its worth again as a way to interrogate these algorithms. The effect is pretty stark here. t-UMAP tends to curve the legs and tusks while PaCMAP keeps them straight. And yet it does seem a substantial effect comes from binarizing edges: PaCMAP-in-uwot (top left) shows lots of curving until edges are binarized (top right). The tusks still curve but the effect is reduced. Meanwhile, t-UMAP-in-PaCMAP (center right) has straight lines. Using binarized edges with t-UMAP (center left) also looks much “cleaner”.

swissroll
swissroll-upacmap-spca1 swissroll-upacmapv1-spca1
swissroll-tumapv1-spca1 swissroll-pacmap-15-tpacmap-it450
swissroll-pacmap-15-spca1nomid-it450 swissroll-tumap-spca1s

None of the other results here are as good as plain PaCMAP (bottom left), but adding binarized edges seems to always improve t-UMAP. I am running out of differences between UMAP and PaCMAP here, so the remaining difference in results is either due to the approximate nearest neighbors (unlikely, as in 3 dimensions Annoy should do a very good job), or because of the effect of UMAP always choosing different random neighbors.

s-curve with a hole
scurvehole-upacmap-spca1 scurvehole-upacmapv1-spca1
scurvehole-tumapv1-spca1 scurvehole-pacmap-15-tpacmap-it450
scurvehole-pacmap-15-spca1nomid-it450 scurvehole-tumap-spca1s

The top left and bottom right results do stand out from the others in terms of smoothness of the manifold.

isofaces
isofaces-upacmap-spca1 isofaces-upacmapv1-spca1
isofaces-tumapv1-spca1 isofaces-pacmap-15-tpacmap-it450
isofaces-pacmap-15-spca1nomid-it450 isofaces-tumap-spca1s
macosko2015
macosko2015-upacmap-spca1 macosko2015-upacmapv1-spca1
macosko2015-tumapv1-spca1 macosko2015-pacmap-15-tpacmap-it450
macosko2015-pacmap-15-spca1nomid-it450 macosko2015-tumap-spca1s
tasic2018
tasic2018-upacmap-spca1 tasic2018-upacmapv1-spca1
tasic2018-tumapv1-spca1 tasic2018-pacmap-15-tpacmap-it450
tasic2018-pacmap-15-spca1nomid-it450 tasic2018-tumap-spca1s
lamanno2020
lamanno2020-upacmap-spca1 lamanno2020-upacmapv1-spca1
lamanno2020-tumapv1-spca1 lamanno2020-pacmap-15-tpacmap-it450
lamanno2020-pacmap-15-spca1nomid-it450 lamanno2020-tumap-spca1s

The effect of binarizing edges is not always obvious or consistent across datasets, but there are enough examples here to make me think that binarizing the edges has quite a bit to do with PaCMAP’s behavior on low-dimensional manifolds like the mammoth and even more structured data like coil100. Binarizing edge weights has been treated as an approximation that has no effect on UMAP, and I would caution against assuming that. It may, however, be something that only affects a small number of datasets, in the same way that preprocessing the data with PCA may not have a big effect.

Without considering clusters or other more supervised measures of global structure preservation, the global preservation metrics leave me unconvinced that PaCMAP is a huge win over UMAP if you have access to a sensibly-scaled PCA initialization for UMAP. However, first, the PaCMAP paper doesn’t claim that: it is focused on being able to get good results from more random initializations. Second, the Python UMAP implementation doesn’t currently have such a PCA initialization option. But uwot does, and that’s what I use, so this is less of a concern for me.

PaCMAP definitely seems to do a better job of ripping manifolds less (although it certainly can’t work miracles with swiss roll and will detach a leg on the mammoth), so it would have a big advantage there. Unfortunately, apart from the more synthetic simulation sets used here, it’s not clear to me that PaCMAP is any better than UMAP at revealing hidden manifold structure within the cluster-like blobs of the other datasets, if there is any.

Finally, I still have some qualms about whether the global preservation metrics are helpful. In particular, I am suspicious of the emd metric, which regularly gives the opposite conclusion to rtp and dcorr. This may be down to the stochastic sampling nature of my implementation, but I think there are deeper issues about the normalization that is required. More generally, global metrics could give entirely the opposite conclusion if there is the kind of manifold structure that PaCMAP is better at handling: successfully unrolling the swiss roll will lead to very poor correlations with the original input distances. The spectral initialization that UMAP does by default is usually much more successful with these kinds of structures visually (mammoth is not a huge success, but isofaces and swiss roll do well) but you wouldn’t know it if you just looked at the global preservation metrics (as usual the local preservation is unchanged).

Also, I think these metrics may reward large separations of roughly clustered data. Arguably the very best norb results are from when the mid-pair weights are turned off and the initial PCA result caused large distances. The different parts of the dataset merrily rearranged locally without shifting from their locations due to PCA, so both global and local results were pretty good. But I don’t know if that really represents the best possible visualization that PaCMAP or UMAP could come up with for that dataset. However, the idea of optimizing each neighborhood in isolation in a series of mini-UMAPs, then trying to find a good alignment for them in the style of LTSA, might be an interesting way to leverage this observation.