Recently, I came across an interesting non-linear visualization / dimensionality reduction method: t-SNE (t-Distributed Stochastic Neighbor Embedding). Here is the original paper by van der Maaten and Hinton, which describes the method in much more detail.
Like PCA, t-SNE reduces the dimensionality of data down to 2 or 3 dimensions so you can visualize it. But unlike PCA, it does this in a non-linear way: it converts the data into probability distributions and matches them. Here are the steps at a high level, for the bigger picture:
- Assume we have points of shape=(n, m). We'll call this the original or higher dimension.
- We need to convert these points to shape=(n, 2). We'll call this the lower or embedded dimension.
- Calculate a pairwise similarity matrix for all data points (e.g. the Euclidean distance between every pair of points).
- Each point is then converted into a probability distribution over its neighbours using the above similarity matrix (t-SNE uses a Gaussian kernel here, with a per-point bandwidth chosen via the perplexity parameter).
- Now, initialize n points in the lower dimension by sampling from a normal distribution.
- Again, convert these new points into a probability distribution, this time using a Student t-distribution (the "t" in t-SNE).
- Finally, run gradient descent on the embedded points, with the KL divergence between the two distributions (the high-dimensional one and the low-dimensional one) as the cost function.
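Written out, the two distributions and the cost are the standard definitions from the paper, where $x_i$ are the original points, $y_i$ the embedded ones, and $\sigma_i$ is the per-point bandwidth set by the perplexity:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \qquad p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l}\left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \qquad C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$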
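To make the loop concrete, here is a rough, self-contained NumPy sketch of these steps. Note the simplifications relative to the real algorithm: a single fixed bandwidth `sigma` instead of the per-point perplexity search, and plain gradient descent without momentum or early exaggeration. Treat it as an illustration of the steps, not a faithful implementation.

```python
import numpy as np

def pairwise_sq_dists(Z):
    # Squared Euclidean distance between every pair of rows
    s = np.sum(Z ** 2, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2 * Z @ Z.T, 0.0)

def tsne_sketch(X, n_iter=500, lr=100.0, sigma=1.0, seed=0):
    n = X.shape[0]

    # Similarities in the original space -> joint distribution P
    D = pairwise_sq_dists(X)
    P = np.exp(-D / (2 * sigma ** 2))      # Gaussian kernel (fixed sigma!)
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)      # conditional p_{j|i}
    P = (P + P.T) / (2 * n)                # symmetrize to joint p_{ij}
    P = np.maximum(P, 1e-12)

    # Random initialization in the embedded space
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(n, 2))

    for _ in range(n_iter):
        # Student-t similarities in the embedded space -> Q
        W = 1.0 / (1.0 + pairwise_sq_dists(Y))
        np.fill_diagonal(W, 0.0)
        Q = np.maximum(W / W.sum(), 1e-12)

        # Gradient of KL(P || Q) w.r.t. Y, as derived in the paper:
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) * (1 + ||y_i - y_j||^2)^-1
        M = (P - Q) * W
        grad = 4 * (Y * M.sum(axis=1, keepdims=True) - M @ Y)
        Y -= lr * grad
        Y -= Y.mean(axis=0)                # keep the embedding centered

    return Y

# Example: embed 100 random 10-dimensional points into 2-D
emb = tsne_sketch(np.random.default_rng(0).normal(size=(100, 10)))
print(emb.shape)                           # (100, 2)
```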
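For comparison, the production version in sklearn (with the perplexity search, momentum, early exaggeration and all) is a one-liner. The digits dataset and the parameter values here are just illustrative choices:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)   # shape (1797, 64): the original dimension
X_2d = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X)
print(X_2d.shape)                     # (1797, 2): the embedded dimension
```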
Here is the link to the Jupyter Notebook. I had to dig into the sklearn source code a bit to understand the algorithm completely, so I have added extensive comments for anyone who wants to follow along. Hope it helps :)