A VAE for Image Reconstruction and Generation
This project implements a Variational Autoencoder (VAE) designed for image reconstruction and generative modeling. A VAE learns a latent representation of images that can be sampled to generate new outputs with similar characteristics to the training data. Built using a typical Encoder-Decoder architecture, the implementation includes standard VAE components such as the reparameterization trick, KL-divergence regularization, and a reconstruction loss.
Architecture Overview
Encoder (Inference Model)
- Purpose: Takes an input image and encodes it into a lower-dimensional latent distribution parameterized by μ (mean) and σ (standard deviation).
- Layers and Structure: Often consists of several convolutional layers that progressively reduce spatial dimensions. The final layer outputs two vectors: μ (mean) and log σ² (log-variance, predicted instead of σ directly for numerical stability; σ is recovered as exp(½ log σ²)).
- Activation & Normalization: Commonly uses ReLU or LeakyReLU activations, and may include Batch Normalization for training stability and faster convergence.
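The encoder described above can be sketched in PyTorch as follows. The input size (28×28, single-channel), channel counts, and latent dimension are illustrative assumptions, not values taken from this project:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an image to the parameters (mu, log_var) of a latent Gaussian."""

    def __init__(self, in_channels=1, latent_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            # Each stride-2 convolution halves the spatial dimensions
            nn.Conv2d(in_channels, 32, kernel_size=4, stride=2, padding=1),  # 28 -> 14
            nn.BatchNorm2d(32),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),           # 14 -> 7
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
        )
        # Two heads: one for the mean, one for the log-variance
        self.fc_mu = nn.Linear(64 * 7 * 7, latent_dim)
        self.fc_log_var = nn.Linear(64 * 7 * 7, latent_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return self.fc_mu(h), self.fc_log_var(h)
```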
Latent Space & Reparameterization
Instead of sampling z directly from N(μ, σ²I) (a stochastic step through which gradients cannot flow), we use reparameterization: sample ε from N(0, I) and construct z = μ + σ × ε. This makes z a deterministic, differentiable function of μ and σ, so gradients can flow through both during backpropagation.
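The trick is a one-liner in practice. A minimal NumPy sketch, taking the log-variance the encoder would output:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    sigma = np.exp(0.5 * log_var)  # recover sigma from log(sigma^2)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps
```

Because the randomness lives entirely in ε, differentiating z with respect to μ gives 1 and with respect to σ gives ε, both well-defined.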
Decoder (Generative Model)
- Purpose: Takes a sampled latent vector z and reconstructs it back into the input image space.
- Layers and Structure: Mirrors the Encoder via transposed convolutions (or fully connected layers), gradually increasing spatial dimensions until reaching the original image size. Often ends with a Sigmoid (if output range is [0,1]) or Tanh (if [-1,1]).
- Design Choices: BatchNorm or Dropout can help training stability and generalization.
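A decoder mirroring the encoder sketch above might look like the following; again, the 28×28 target size, channel counts, and 16-dimensional latent space are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Maps a latent vector z back to image space."""

    def __init__(self, out_channels=1, latent_dim=16):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 7 * 7)
        self.deconv = nn.Sequential(
            # Each stride-2 transposed convolution doubles the spatial dimensions
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),           # 7 -> 14
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.ConvTranspose2d(32, out_channels, kernel_size=4, stride=2, padding=1), # 14 -> 28
            nn.Sigmoid(),  # outputs in [0, 1]; use Tanh instead for [-1, 1] data
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 64, 7, 7)
        return self.deconv(h)
```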
Implemented Solutions
Loss Function
- Reconstruction Loss: Typically Binary Cross-Entropy (BCE) or MSE to measure fidelity between input and output.
- KL-Divergence: Encourages the inferred distribution q(z|x) to stay close to the prior p(z), e.g. N(0,I).
- Total VAE Loss: Reconstruction term + β × KL term (β = 1 recovers the standard VAE objective; β > 1 or an annealing schedule gives a β-VAE-style objective).
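For a diagonal Gaussian posterior and a standard normal prior, the KL term has a closed form, -½ Σ (1 + log σ² - μ² - σ²). A NumPy sketch of the full loss, using sums over all elements (many implementations average over the batch instead):

```python
import numpy as np

def kl_divergence(mu, log_var):
    """KL( N(mu, sigma^2 I) || N(0, I) ) in closed form."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

def bce(x, x_hat, eps=1e-7):
    """Binary cross-entropy between input x and reconstruction x_hat in [0, 1]."""
    x_hat = np.clip(x_hat, eps, 1 - eps)  # avoid log(0)
    return -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    return bce(x, x_hat) + beta * kl_divergence(mu, log_var)
```

Note that the KL term vanishes exactly when μ = 0 and σ = 1, i.e. when the posterior matches the prior.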
Optimizer and Training Enhancements
- Adam optimizer (typical LR ~ 0.0002 or 0.001, β1=0.9, β2=0.999)
- Data Preprocessing: Resize, crop, and normalize images (commonly [0,1] or [-1,1])
- Checkpointing: Regular saving of model and optimizer states
- Device Agnosticism: Automatic selection of CUDA if available, else CPU
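The optimizer setup, device selection, and checkpointing described above can be sketched as follows. The `nn.Linear` stand-in and the checkpoint layout are illustrative assumptions; an in-memory buffer replaces an actual checkpoint file:

```python
import io
import torch
import torch.nn as nn

# Device agnosticism: CUDA if available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(16, 16).to(device)  # stand-in for the full VAE
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))

# Checkpointing: save model and optimizer states together so training can resume
checkpoint = {"model": model.state_dict(),
              "optimizer": optimizer.state_dict(),
              "epoch": 0}
buffer = io.BytesIO()  # in practice, a path like "vae_checkpoint.pt"
torch.save(checkpoint, buffer)

# Resuming: restore both states before continuing training
buffer.seek(0)
state = torch.load(buffer, map_location=device)
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])
```

Saving the optimizer state alongside the model matters for Adam in particular, since its per-parameter moment estimates would otherwise reset on resume.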
Visualization and Monitoring
- Loss Tracking: Plots or logs of reconstruction and KL terms over epochs
- Sampling: Generate new images by sampling z ~ N(0,I) and passing it through the Decoder
- Latent Space Exploration: Interpolations between different z vectors to see gradual transformations in output
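Sampling and interpolation both reduce to simple operations in latent space. A NumPy sketch (the latent dimension and step count are arbitrary; a trained Decoder would map each z to an image):

```python
import numpy as np

def interpolate(z1, z2, steps=8):
    """Linear interpolation between two latent vectors, endpoints included."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - a) * z1 + a * z2 for a in alphas])

rng = np.random.default_rng(42)
latent_dim = 16

# Sampling: draw z ~ N(0, I), then decode each row to get a new image
z_samples = rng.standard_normal((4, latent_dim))

# Interpolation: decode each intermediate z to see gradual transformations
path = interpolate(z_samples[0], z_samples[1], steps=8)
```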