A VAE for Image Reconstruction and Generation
This project implements a Variational Autoencoder (VAE) designed for image reconstruction and generative modeling. A VAE learns a latent representation of images that can be sampled to generate new outputs with similar characteristics to the training data. Built using a typical Encoder-Decoder architecture, the implementation includes standard VAE components such as the reparameterization trick, KL-divergence regularization, and a reconstruction loss.
Architecture Overview
Encoder (Inference Model)
- Purpose: Takes an input image and encodes it into a lower-dimensional latent distribution parameterized by μ (mean) and σ (standard deviation).
- Layers and Structure: Often consists of several convolutional layers that progressively reduce spatial dimensions. The final layer outputs two vectors: μ (mean) and log σ² (log-variance, predicted instead of σ directly for numerical stability; σ is recovered as exp(½ log σ²)).
- Activation & Normalization: Commonly uses ReLU or LeakyReLU activations, and may include Batch Normalization for training stability and faster convergence.
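The encoder described above can be sketched in PyTorch as follows. The input size (28×28, single-channel), channel counts, and latent dimension are illustrative assumptions, not values taken from this project:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an image to the parameters (mu, log_var) of a latent Gaussian."""

    def __init__(self, in_channels=1, latent_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            # Each stride-2 convolution halves the spatial dimensions
            nn.Conv2d(in_channels, 32, kernel_size=4, stride=2, padding=1),  # 28 -> 14
            nn.BatchNorm2d(32),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),           # 14 -> 7
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
        )
        # Two heads: one for the mean, one for the log-variance
        self.fc_mu = nn.Linear(64 * 7 * 7, latent_dim)
        self.fc_log_var = nn.Linear(64 * 7 * 7, latent_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return self.fc_mu(h), self.fc_log_var(h)
```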
Latent Space & Reparameterization
Instead of sampling z directly from N(μ, σ²I) (a stochastic step through which gradients cannot flow), we use reparameterization: sample ε from N(0, I) and construct z = μ + σ × ε. This makes z a deterministic, differentiable function of μ and σ, so gradients can flow through both during backpropagation.
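The trick is a one-liner in practice. A minimal NumPy sketch, taking the log-variance the encoder would output:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    sigma = np.exp(0.5 * log_var)  # recover sigma from log(sigma^2)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps
```

Because the randomness lives entirely in ε, differentiating z with respect to μ gives 1 and with respect to σ gives ε, both well-defined.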
Decoder (Generative Model)
- Purpose: Takes a sampled latent vector z and reconstructs it back into the input image space.
- Layers and Structure: Mirrors the Encoder via transposed convolutions (or fully connected layers), gradually increasing spatial dimensions until reaching the original image size. Often ends with a Sigmoid (if output range is [0,1]) or Tanh (if [-1,1]).
- Design Choices: BatchNorm or Dropout can help training stability and generalization.
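A decoder mirroring the encoder sketch above might look like the following; again, the 28×28 target size, channel counts, and 16-dimensional latent space are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Maps a latent vector z back to image space."""

    def __init__(self, out_channels=1, latent_dim=16):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 7 * 7)
        self.deconv = nn.Sequential(
            # Each stride-2 transposed convolution doubles the spatial dimensions
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),           # 7 -> 14
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.ConvTranspose2d(32, out_channels, kernel_size=4, stride=2, padding=1), # 14 -> 28
            nn.Sigmoid(),  # outputs in [0, 1]; use Tanh instead for [-1, 1] data
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 64, 7, 7)
        return self.deconv(h)
```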
Implemented Solutions
Loss Function
- Reconstruction Loss: Typically Binary Cross-Entropy (BCE) or MSE to measure fidelity between input and output.
- KL-Divergence: Encourages the inferred distribution q(z|x) to stay close to the prior p(z), e.g. N(0,I).
- Total VAE Loss: Reconstruction term + β × KL term (β = 1 recovers the standard VAE objective; β > 1 or an annealing schedule gives a β-VAE-style objective).
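For a diagonal Gaussian posterior and a standard normal prior, the KL term has a closed form, -½ Σ (1 + log σ² - μ² - σ²). A NumPy sketch of the full loss, using sums over all elements (many implementations average over the batch instead):

```python
import numpy as np

def kl_divergence(mu, log_var):
    """KL( N(mu, sigma^2 I) || N(0, I) ) in closed form."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

def bce(x, x_hat, eps=1e-7):
    """Binary cross-entropy between input x and reconstruction x_hat in [0, 1]."""
    x_hat = np.clip(x_hat, eps, 1 - eps)  # avoid log(0)
    return -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    return bce(x, x_hat) + beta * kl_divergence(mu, log_var)
```

Note that the KL term vanishes exactly when μ = 0 and σ = 1, i.e. when the posterior matches the prior.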
Optimizer and Training Enhancements
- Adam optimizer (typical LR ~ 0.0002 or 0.001, β1=0.9, β2=0.999)
- Data Preprocessing: Resize, crop, and normalize images (commonly [0,1] or [-1,1])
- Checkpointing: Regular saving of model and optimizer states
- Device Agnosticism: Automatic selection of CUDA if available, else CPU
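The optimizer setup, device selection, and checkpointing described above can be sketched as follows. The `nn.Linear` stand-in and the checkpoint layout are illustrative assumptions; an in-memory buffer replaces an actual checkpoint file:

```python
import io
import torch
import torch.nn as nn

# Device agnosticism: CUDA if available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(16, 16).to(device)  # stand-in for the full VAE
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))

# Checkpointing: save model and optimizer states together so training can resume
checkpoint = {"model": model.state_dict(),
              "optimizer": optimizer.state_dict(),
              "epoch": 0}
buffer = io.BytesIO()  # in practice, a path like "vae_checkpoint.pt"
torch.save(checkpoint, buffer)

# Resuming: restore both states before continuing training
buffer.seek(0)
state = torch.load(buffer, map_location=device)
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])
```

Saving the optimizer state alongside the model matters for Adam in particular, since its per-parameter moment estimates would otherwise reset on resume.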
Visualization and Monitoring
- Loss Tracking: Plots or logs of reconstruction and KL terms over epochs
- Sampling: Generate new images by sampling z ~ N(0,I) and passing it through the Decoder
- Latent Space Exploration: Interpolations between different z vectors to see gradual transformations in output
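Sampling and interpolation both reduce to simple operations in latent space. A NumPy sketch (the latent dimension and step count are arbitrary; a trained Decoder would map each z to an image):

```python
import numpy as np

def interpolate(z1, z2, steps=8):
    """Linear interpolation between two latent vectors, endpoints included."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - a) * z1 + a * z2 for a in alphas])

rng = np.random.default_rng(42)
latent_dim = 16

# Sampling: draw z ~ N(0, I), then decode each row to get a new image
z_samples = rng.standard_normal((4, latent_dim))

# Interpolation: decode each intermediate z to see gradual transformations
path = interpolate(z_samples[0], z_samples[1], steps=8)
```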