This blog post begins with a playful analogy, inspired by the paper of Kingma and Welling [1], likening a Variational Autoencoder (VAE) to a friend who studies your fashion sense to create their own distinctive style. Most of the material here is drawn from that paper. A VAE is a generative model that learns a probability distribution over its training data and can then sample new data from it.
Your Quirky Friend: The Encoder
This friend, called "Q," encapsulates the essence of your style and records it in a notebook, symbolizing the latent space in a VAE. Q uses the encoder for this task. They observe your outfit to capture its essence. Instead of merely replicating your outfit, Q identifies the key features that make your style unique.
Q has several methods to capture the key features of your style using the encoder, which is a neural network and can take various forms, such as fully connected networks, Convolutional Neural Networks (CNNs) for image data, or Long Short-Term Memory (LSTM) networks for sequential data.
Once Q has captured the key features of your style, they record these insights in their notebook. This notebook, or latent space, is where Q stores the essence of your fashion sense. Let's take a closer look at what this latent space represents.
More concretely, the encoder is responsible for mapping the input data into a latent space representation using one of the neural network forms just mentioned. It compresses the input into a smaller latent variable space, typically characterized by a mean and a variance that define a probability distribution.
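A minimal sketch of such an encoder in PyTorch, assuming a fully connected network with hypothetical dimensions (64-dimensional inputs, an 8-dimensional latent space):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an input vector to the mean and log-variance of a Gaussian latent distribution."""
    def __init__(self, input_dim=64, hidden_dim=32, latent_dim=8):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return self.mu(h), self.logvar(h)

encoder = Encoder()
x = torch.randn(4, 64)         # a batch of 4 inputs
mu, logvar = encoder(x)
print(mu.shape, logvar.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

Predicting the log-variance rather than the variance itself is a common choice: it keeps the variance positive after exponentiation without needing a constraint.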
Q's Notebook: The Latent Space
Q's notebook represents the latent space in the VAE. It's where Q stores their notes, sketches, and ideas inspired by your fashion sense. This notebook contains the essence of your style, distilled into a set of key features. In the VAE, the latent space is a probabilistic representation of the input data. It's a distribution over the possible values of the latent variables, which capture the underlying patterns and structures in the data.
ELBO – Evidence Lower Bound
To make the VAE learn a meaningful latent space representation, the model optimizes the Evidence Lower Bound (ELBO). The ELBO has two main components: a reconstruction term, which measures how well the decoder can recreate the input data from the latent space, and a regularization term, which ensures the latent space distribution matches a prior distribution (usually a standard Gaussian distribution).
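In symbols, following Kingma and Welling [1], the ELBO for an input x can be written as:

```latex
\mathrm{ELBO}(\theta, \phi; x) =
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]}_{\text{reconstruction term}}
\; - \;
\underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\big\|\, p(z)\right)}_{\text{regularization term}}
```

Here q_φ(z|x) is the encoder's distribution, p_θ(x|z) is the decoder's, and p(z) is the prior. Maximizing the ELBO is equivalent to minimizing the VAE loss described later in this post.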
Reparameterization: Q's Creative Twist
Now, imagine Q wants to create a new outfit inspired by your style, but with a twist. Instead of directly sampling from their notebook, Q uses a clever trick: they sample features from a standard normal distribution (like a random fashion magazine) and then transform the sample using their notebook's parameters (like applying their fashion sense). This is the reparameterization trick used in VAEs. By sampling from a standard normal distribution and transforming the sample using the latent space's parameters, we can backpropagate through the sampling process and optimize the VAE's parameters efficiently.
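Q's trick is only a few lines of code. A sketch, assuming the encoder outputs a mean and a log-variance as above:

```python
import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps ~ N(0, I).
    # The randomness lives in eps, outside the network, so gradients
    # can flow through mu and logvar during backpropagation.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

mu = torch.zeros(4, 8)
logvar = torch.zeros(4, 8)  # log-variance 0, i.e. unit variance
z = reparameterize(mu, logvar)
print(z.shape)  # torch.Size([4, 8])
```

Sampling z directly from N(mu, sigma²) would make the sampling step non-differentiable; rewriting it as a deterministic function of (mu, logvar) plus external noise is what makes training possible.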
Q's Fashion Creations: The Decoder
Q will use their parameterized samples to create new outfits inspired by your fashion sense. Q's creations may not be exact replicas, but they capture the essence of your style.
This is the decoder in a VAE. The decoder takes the reparameterized samples from Q's notebook and uses them to generate new data samples similar to the original input data.
In the VAE, the decoder is trained to reconstruct the input data from the latent space. The decoder is another neural network that takes samples from the latent space and reconstructs the input data, mapping the latent variables back to the data space.
Key Functions
Encoder: Compresses input data into a latent representation.
Decoder: Reconstructs data from the latent representation.
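The decoder mirrors the encoder in reverse. A minimal sketch, with the same hypothetical dimensions used earlier (8-dimensional latent space, 64-dimensional outputs):

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Maps latent vectors back to the data space."""
    def __init__(self, latent_dim=8, hidden_dim=32, output_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, z):
        return self.net(z)

decoder = Decoder()
z = torch.randn(4, 8)   # a batch of latent samples
x_hat = decoder(z)
print(x_hat.shape)      # torch.Size([4, 64])
```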
Q's Fashion Evolution: Training the VAE
As Q continues to study your fashion sense, create new outfits, and refine their reparameterization trick, they develop a unique understanding of your style. This process is like training the VAE. During training, the VAE learns to optimize the encoder, latent space, and decoder. The goal is to find a balance between reconstructing the input data accurately and capturing the underlying patterns and structures in the data.
Training Objective
The VAE is trained to minimize the difference between the original input and the reconstructed output while also ensuring that the latent space follows a desired distribution (usually a Gaussian distribution).
This structure allows the VAE to learn meaningful representations of the data while also enabling effective generation of new data samples.
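Putting the pieces together, a compact training loop might look like the sketch below. The architecture and hyperparameters (dimensions, learning rate, random stand-in data) are illustrative assumptions, not the post's exact model:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=64, hidden_dim=32, latent_dim=8):
        super().__init__()
        self.enc = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: deterministic transform of external noise.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

vae = VAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
x = torch.randn(16, 64)  # stand-in batch of training data

for step in range(5):
    x_hat, mu, logvar = vae(x)
    recon = nn.functional.mse_loss(x_hat, x, reduction="sum")
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + kl
    opt.zero_grad()
    loss.backward()
    opt.step()
```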
The Result: A Quirky yet Stylish VAE
Through this process, Q develops a unique understanding of your fashion sense, which is reflected in their quirky yet stylish outfits. Similarly, the trained VAE can generate new data samples that capture the essence of the input data, while also introducing new and interesting variations.

Implementation Details Using PyTorch
The encoder/decoder-based Variational Autoencoder (VAE) shown in Figure 2.0 processes inputs consisting of sine waves. These sine waves have an amplitude of 1 unit and a single frequency, but the phase varies randomly within the range of [-π, +π]. The encoder maps these inputs to a latent representation, characterized by a mean and variance, which the decoder uses to reconstruct the sine waves.
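Such a dataset is easy to generate. A sketch with NumPy, where the wave length and frequency are assumptions (only the amplitude of 1 and the uniform phase in [-π, +π] come from the description above):

```python
import numpy as np

def make_sine_batch(n_waves=100, n_points=64, freq=1.0):
    # Each row is one sine wave: amplitude 1, single frequency,
    # phase drawn uniformly from [-pi, +pi].
    t = np.linspace(0, 1, n_points)
    phases = np.random.uniform(-np.pi, np.pi, size=(n_waves, 1))
    return np.sin(2 * np.pi * freq * t + phases)

batch = make_sine_batch()
print(batch.shape)  # (100, 64)
```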

VAE Loss Function
The VAE loss function [1] comprises two main components:
Reconstruction Term: Measures the discrepancy between the input data and its reconstructed version.
Regularization Term (Kullback-Leibler Divergence): Ensures that the latent space distribution aligns closely with a prior distribution.
By minimizing the VAE loss function, the model learns to encode input data into a meaningful latent space representation, decode this representation into a reconstructed version of the input data, and generate new data samples akin to the input data.
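The two terms combine into a single scalar loss. A sketch, assuming mean-squared error for the reconstruction term and the closed-form KL divergence against a standard Gaussian prior:

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, logvar):
    # Reconstruction term: discrepancy between input and reconstruction.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    # Regularization term: KL divergence between the encoder's diagonal
    # Gaussian N(mu, sigma^2) and the standard normal prior N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

x = torch.randn(4, 64)
x_hat = torch.randn(4, 64)
mu = torch.zeros(4, 8)
logvar = torch.zeros(4, 8)     # encoder output exactly matches the prior
loss = vae_loss(x_hat, x, mu, logvar)
print(loss.item() >= 0)  # True (the KL term is zero here, MSE is non-negative)
```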

Generating Synthetic Data Using VAEs
The steps for generating synthetic sine wave data using a trained VAE are as follows:
Set the VAE to evaluation mode after sufficient convergence.
Disable gradient tracking to save memory and computation.
Sample latent vectors from a standard normal distribution, following Q's creative twist described above.
Pass the sampled latent vectors through the decoder to generate synthetic sine waves.
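The steps above can be sketched in a few lines. The decoder here is a hypothetical stand-in; in practice you would use the decoder of your trained VAE:

```python
import torch
import torch.nn as nn

# Stand-in for a trained decoder (8-dim latent -> 64-point sine wave).
decoder = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 64))

decoder.eval()                # 1. evaluation mode after convergence
with torch.no_grad():         # 2. no gradient tracking during generation
    z = torch.randn(10, 8)    # 3. sample latent vectors from N(0, I)
    synthetic = decoder(z)    # 4. decode into synthetic sine waves
print(synthetic.shape)  # torch.Size([10, 64])
```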
Q's Synthetic Data Factory Powered by Circular Buffers
A circular buffer, often used for efficient data management, operates like a fixed-size, looping queue. In the context of a VAE system, it functions as follows:
The circular buffer comprises two components—the read buffer and the write buffer:
Read Buffer: This is where Q continuously inputs random fashion patterns drawn from various magazines. These patterns are stored sequentially in the buffer for processing by the VAE encoder. The read buffer ensures a steady flow of data to the encoder, enabling continuous operation.
Write Buffer: Once the read buffer reaches its capacity (i.e., it becomes full), the write buffer takes over and begins storing new incoming patterns. This mechanism allows the system to handle data without overflow or interruptions. The decoder subsequently reads patterns from the write buffer, transforming them into synthetic sine waves as part of the output.
The circular buffer operates such that when the end of the buffer is reached, it loops back to the beginning. This ensures optimal use of space, as no data is lost or wasted—old data is simply overwritten when the buffer cycles around.
In this implementation, the circular buffer enables efficient coordination between the encoder and decoder, ensuring a seamless flow of data and pattern decoding into synthetic sine waves. It is akin to a never-ending loop of pattern generation and transformation.
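The wrap-around behavior described above can be sketched with Python's `collections.deque`, whose `maxlen` option gives exactly this fixed-size, overwrite-the-oldest semantics (the class name and usage here are illustrative, not the post's actual code):

```python
from collections import deque

class CircularBuffer:
    """Fixed-size looping buffer: when full, new items overwrite the oldest."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def write(self, item):
        self.buf.append(item)  # silently drops the oldest item when at capacity

    def read_all(self):
        return list(self.buf)

buf = CircularBuffer(capacity=3)
for pattern in ["p1", "p2", "p3", "p4"]:
    buf.write(pattern)
print(buf.read_all())  # ['p2', 'p3', 'p4'] -- 'p1' was overwritten
```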
References
[1] Diederik P. Kingma and Max Welling (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning. arXiv:1906.02691 [cs.LG]. https://doi.org/10.48550/arXiv.1906.02691