Featured: interpolation, t-SNE projection (with gifs & examples!)

In the "Deep Learning bits" series, we will not see how to use deep learning to solve complex problems end-to-end as we do in A.I. Odyssey. We will rather look at different techniques, along with some examples and applications. Don't forget to check out Deep Learning bits #1!

If you like Artificial Intelligence, make sure to subscribe to the newsletter to receive updates on articles and much more!

Introduction

Last time, we saw what autoencoders are and how they work. Today, we will see how they can help us visualize the data in some very cool ways. For that, we will work on images, using the Convolutional Autoencoder architecture (CAE).

What's the latent space again?

Here's a quick reminder: an autoencoder is made of two components. The encoder brings the data from a high-dimensional input down to a bottleneck layer, where the number of neurons is the smallest. Then, the decoder takes this encoded input and converts it back to the original input shape, in our case an image. The latent space is the space in which the data lies in the bottleneck layer.

Convolutional Encoder-Decoder architecture

The latent space contains a compressed representation of the image, which is the only information the decoder is allowed to use to reconstruct the input as faithfully as possible. To perform well, the network has to learn to extract the most relevant features in the bottleneck.

Let's see what we can do!

The dataset

We'll change from the datasets of last time. Instead of looking at my eyes or blue squares, we will work on probably the most famous dataset for computer vision: MNIST, the dataset of handwritten digits. I usually prefer to work with less conventional datasets just for diversity, but MNIST is really convenient for what we will do today.

Note: Although MNIST visualizations are pretty common on the internet, the images in this post are 100% generated from the code, so you can use these techniques with your own models.

MNIST is a labelled dataset of 28x28 images of handwritten digits

Baseline — Performance of the autoencoder

To understand what kind of features the encoder is capable of extracting from the inputs, we can first look at the reconstruction of images. If this sounds familiar, it's normal: we already did that last time. However, this step is necessary because it sets the baseline for our expectations of the model.

Note: For this post, the bottleneck layer has only 32 units, which is some really, really brutal dimensionality reduction. If it was an image, it wouldn't even be 6x6 pixels.

Each digit is displayed next to its blurry reconstruction

We can see that the autoencoder successfully reconstructs the digits. The reconstruction is blurry because the input is compressed at the bottleneck layer. The reason we need to look at validation samples is to be sure we are not overfitting the training set.

Bonus: here's the training process animation

Reconstruction of training (left) and validation (right) samples at each step

t-SNE visualization

What's t-SNE?

The first thing we want to do when working with a dataset is to visualize the data in a meaningful way. In our case, the image (or pixel) space has 784 dimensions (28×28×1), and we clearly cannot plot that. The challenge is to squeeze all this dimensionality into something we can grasp, in 2D or 3D.

Here comes t-SNE, an algorithm that maps a high-dimensional space to a 2D or 3D space while trying to keep the distances between the points the same. We will use this technique to plot embeddings of our dataset, first directly from the image space, and then from the smaller latent space.

Note: t-SNE is better for visualization than its cousins PCA and ICA.
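Before looking at the results, here is a minimal sketch of how both projections could be computed with scikit-learn's TSNE, assuming a trained Keras encoder model is at hand. The names encoder, x_val and y_val are placeholders for this illustration, not taken from the actual repository code.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Assumptions (placeholders, not from the repo):
#   x_val: validation images, shape (n, 28, 28, 1)
#   y_val: digit labels, used only to color the points
#   encoder: trained Keras model mapping an image to its 32-dim latent vector

def plot_tsne(points, labels, title):
    """Project `points` to 2D with t-SNE and scatter-plot them, colored by label."""
    embedded = TSNE(n_components=2).fit_transform(points)
    plt.figure(figsize=(6, 6))
    plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab10", s=5)
    plt.title(title)
    plt.colorbar()
    plt.show()

# Pixel space: flatten each 28x28 image into a 784-dimensional vector
# (t-SNE is slow, so subsampling a few thousand points helps)
plot_tsne(x_val.reshape(len(x_val), -1), y_val, "t-SNE of image space")

# Latent space: use the 32-dimensional bottleneck representations instead
plot_tsne(encoder.predict(x_val), y_val, "t-SNE of latent space")
```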
Projecting the pixel space

Let's start by plotting the t-SNE embedding of our dataset (from image space) and see what it looks like.

t-SNE projection of image space representations from the validation set

We can already see that some numbers are roughly clustered together. That's because the dataset is really simple*, and we can use simple heuristics on pixels to classify the samples. Look how there's no cluster for the digits 8, 5, 7 and 3: that's because they are all made of the same pixels, and only minor changes differentiate them.

*On more complex data, such as RGB images, the only clusters would be of images of the same general color.

Projecting the latent space

We know that the latent space contains a simpler representation of our images than the pixel space, so we can hope that t-SNE will give us an interesting 2D projection of the latent space.

t-SNE projection of latent space representations from the validation set

Although not perfect, the projection shows denser clusters. This shows that in the latent space, the same digits are close to one another. We can see that the digits 8, 7, 5 and 3 are now easier to distinguish, and appear in small clusters.

Interpolation

Now that we know what level of detail the model is capable of extracting, we can probe the structure of the latent space. To do that, we will compare how interpolation looks in the image space versus the latent space.

Linear interpolation in image space

We start off by taking two images from the dataset and linearly interpolating between them. Effectively, this blends the images in a kind of ghostly way.

Interpolation in pixel space

The reason for this messy transition is the structure of the pixel space itself. It's simply not possible to go smoothly from one image to another in the image space. This is why blending the image of an empty glass and the image of a full glass will not give the image of a half-full glass.

Linear interpolation in latent space

Now, let's do the same in the latent space. We take the same start and end images and feed them to the encoder to obtain their latent space representations. We then interpolate between the two latent vectors and feed these to the decoder.

Interpolation in latent space

The result is much more convincing. Instead of having a fading overlay of the two digits, we clearly see the shape slowly transform from one to the other. This shows how well the latent space understands the structure of the images.

Bonus: here are a few animations of the interpolation in both spaces

Linear interpolation in image space (left) and latent space (right)
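As a rough sketch of how both interpolations could be implemented, again assuming trained Keras encoder and decoder models (img_a, img_b and the other names are illustrative, not the actual code from the repository):

```python
import numpy as np

# Assumptions (placeholders, not from the repo):
#   encoder / decoder: the two halves of the trained autoencoder (Keras models)
#   img_a, img_b: two images from the validation set, shape (28, 28, 1)

n_steps = 10
alphas = np.linspace(0.0, 1.0, n_steps)

# Interpolation in image space: a simple pixel-wise blend of the two images
pixel_interp = [(1 - a) * img_a + a * img_b for a in alphas]

# Interpolation in latent space: blend the two latent vectors, then decode each blend
z_a = encoder.predict(img_a[np.newaxis])[0]
z_b = encoder.predict(img_b[np.newaxis])[0]
latent_interp = [
    decoder.predict(((1 - a) * z_a + a * z_b)[np.newaxis])[0]
    for a in alphas
]
```

Displaying the two sequences side by side reproduces the ghostly pixel blend versus the smooth shape transformation shown above.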
More techniques & examples

Interpolation examples

On richer datasets, and with a better model, we can get incredible visuals.

3-way interpolation for latent space faces

Interpolation of 3D shapes

Latent space arithmetics

We can also do arithmetics in the latent space. This means that instead of interpolating, we can add or subtract latent space representations. This technique gives mind-blowing results. For example with faces: man with glasses - man without glasses + woman without glasses = woman with glasses.

Arithmetics on 3D shapes

Note: I've put a function for that in the code, but it looks terrible on MNIST.

Conclusions

In this post, we have seen several techniques to visualize the learned features embedded in the latent space of an autoencoder neural network. These visualizations help understand what the network is learning. From there, we can exploit the latent space for clustering, compression, and many other applications.

If you like Artificial Intelligence, make sure to subscribe to the newsletter to receive updates on articles and much more!

You can play with the code over there:

GitHub - despoisj/LatentSpaceVisualization: Visualization techniques for the latent space of a convolutional autoencoder in Keras (github.com)

Thanks for reading this post, stay tuned for more!