Introduction: Two Networks in Competition
Generative Adversarial Networks (GANs), introduced by Ian Goodfellow and colleagues in 2014, are one of the most innovative paradigms in deep learning. The idea is elegantly simple: two neural networks compete in a zero-sum game. The Generator creates synthetic data to fool the Discriminator, which in turn learns to distinguish real data from generated data. This competition pushes both networks to improve continuously.
GANs have revolutionized image generation: photorealistic faces that do not exist, artistic style transfer, super-resolution, data augmentation, and much more. In this article we will explore the theory, training challenges, and the most important variants, with practical PyTorch implementation.
What You Will Learn
- GAN architecture: Generator and Discriminator
- The adversarial loss function: the min-max game
- Training loop: alternating between the two networks
- Mode collapse and training instability: the main challenges
- Variants: Conditional GAN, DCGAN, Wasserstein GAN
- Applications: face generation, style transfer, data augmentation
- Complete DCGAN implementation in PyTorch
Architecture: Generator and Discriminator
The Generator
The Generator G takes as input a random noise vector z (typically sampled from a Gaussian or uniform distribution) and transforms it into synthetic data (e.g., an image). The generator's goal is to produce outputs realistic enough to fool the discriminator.
The Discriminator
The Discriminator D is a binary classifier: it receives data (real or generated) and produces a probability that it is real. The discriminator's goal is to correctly distinguish real data from generated data.
The adversarial loss formalizes this game: the discriminator maximizes the probability of classifying correctly, while the generator minimizes the probability that the discriminator detects its outputs as fake.
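Formally, the original GAN paper expresses this as a min-max optimization over a shared value function:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

In practice the generator is usually trained to maximize log D(G(z)) rather than minimize log(1 − D(G(z))), which gives stronger gradients early in training; this "non-saturating" form is what the training loop in this article uses.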
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Generator: random noise -> 64x64 image"""
    def __init__(self, latent_dim=100, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # latent_dim -> 512 x 4 x 4
            nn.ConvTranspose2d(latent_dim, 512, 4, 1, 0, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(True),
            # 512 x 4 x 4 -> 256 x 8 x 8
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            # 256 x 8 x 8 -> 128 x 16 x 16
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            # 128 x 16 x 16 -> 64 x 32 x 32
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            # 64 x 32 x 32 -> channels x 64 x 64
            nn.ConvTranspose2d(64, channels, 4, 2, 1, bias=False),
            nn.Tanh()  # Output in [-1, 1]
        )

    def forward(self, z):
        # Reshape (batch, latent_dim) -> (batch, latent_dim, 1, 1) for the conv stack
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    """Discriminator: 64x64 image -> real/fake probability"""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, 4, 2, 1, bias=False),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, img):
        return self.net(img).view(-1, 1)
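The DCGAN paper (Radford et al., 2016) additionally recommends initializing convolutional weights from a zero-centered normal distribution with standard deviation 0.02, and batch-norm scales from N(1, 0.02). A small helper (the function name here is our own) can be applied to both networks with `G.apply(weights_init)` and `D.apply(weights_init)`:

```python
import torch.nn as nn

def weights_init(m):
    """DCGAN-style initialization: N(0, 0.02) for conv layers,
    N(1, 0.02) for batch-norm scale, zero for batch-norm bias."""
    classname = m.__class__.__name__
    if 'Conv' in classname:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif 'BatchNorm' in classname:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0.0)
```

`model.apply(fn)` calls `fn` recursively on every submodule, so one call covers the whole `nn.Sequential` stack.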
Training Loop: The Adversarial Game
GAN training alternates between updating the discriminator and the generator. In each iteration: (1) the discriminator is trained on a batch of real data and a batch of generated data; (2) the generator is trained to fool the freshly updated discriminator.
import torch.optim as optim

# Setup
latent_dim = 100
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
G = Generator(latent_dim).to(device)
D = Discriminator().to(device)
criterion = nn.BCELoss()
optimizer_G = optim.Adam(G.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_D = optim.Adam(D.parameters(), lr=0.0002, betas=(0.5, 0.999))

def train_step(real_images):
    batch_size = real_images.size(0)
    real_labels = torch.ones(batch_size, 1, device=device)
    fake_labels = torch.zeros(batch_size, 1, device=device)

    # --- Train Discriminator ---
    optimizer_D.zero_grad()
    # On real data
    real_output = D(real_images)
    d_loss_real = criterion(real_output, real_labels)
    # On generated data
    z = torch.randn(batch_size, latent_dim, device=device)
    fake_images = G(z).detach()  # detach: no gradients flow back into G
    fake_output = D(fake_images)
    d_loss_fake = criterion(fake_output, fake_labels)
    d_loss = d_loss_real + d_loss_fake
    d_loss.backward()
    optimizer_D.step()

    # --- Train Generator ---
    optimizer_G.zero_grad()
    z = torch.randn(batch_size, latent_dim, device=device)
    fake_images = G(z)
    fake_output = D(fake_images)
    # Non-saturating loss: G maximizes log D(G(z)) by using "real" labels
    g_loss = criterion(fake_output, real_labels)
    g_loss.backward()
    optimizer_G.step()
    return d_loss.item(), g_loss.item()
Training Challenges: Mode Collapse and Instability
Mode Collapse
Mode collapse is the most common GAN problem: the generator finds a few outputs that fool the discriminator and keeps producing only those, ignoring the diversity of real data. Instead of generating diverse faces, it might produce the same face repeatedly with minor variations.
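A crude way to monitor this during training (a heuristic sketch, not a standard metric) is to track the mean pairwise distance between samples in a generated batch; if it trends toward zero, the generator is likely producing near-identical outputs:

```python
import torch

def batch_diversity(samples):
    """Mean pairwise L2 distance between flattened samples.
    Values near zero suggest the batch has collapsed to one mode."""
    flat = samples.view(samples.size(0), -1)
    dists = torch.cdist(flat, flat)      # (N, N) pairwise distances
    n = flat.size(0)
    return dists.sum() / (n * (n - 1))   # average over off-diagonal pairs

# A varied batch scores higher than one made of a single repeated sample
varied = torch.randn(16, 3, 8, 8)
collapsed = torch.randn(1, 3, 8, 8).repeat(16, 1, 1, 1)
assert batch_diversity(varied) > batch_diversity(collapsed)
```

More principled diversity measures exist (e.g. Inception-feature statistics), but a running plot of this quantity is enough to catch total collapse early.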
Training Instability
GANs are notoriously difficult to train. If the discriminator becomes too strong, the gradient for the generator becomes nearly zero (vanishing gradient). If the generator dominates, the discriminator provides no useful feedback. Finding the right balance is an ongoing challenge.
Wasserstein GAN: Stabilizing Training
The Wasserstein GAN (WGAN) addresses many of these instability problems by replacing binary cross-entropy with the Wasserstein distance (Earth Mover's Distance). This distance provides a meaningful gradient even when the real and generated distributions barely overlap, avoiding the vanishing-gradient problem. To enforce the required 1-Lipschitz constraint on the critic, WGAN uses weight clipping, while the improved WGAN-GP replaces clipping with a gradient penalty.
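The gradient-penalty term pushes the norm of the critic's gradient toward 1 at points interpolated between real and generated samples. A minimal sketch (the critic here is a stand-in linear model on flat vectors; in practice it would be the convolutional discriminator with its final Sigmoid removed, and `eps` would be shaped `(batch, 1, 1, 1)` for images):

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: mean of (||grad critic(x_interp)||_2 - 1)^2."""
    batch_size = real.size(0)
    # One random interpolation coefficient per sample
    eps = torch.rand(batch_size, 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    # Gradient of critic scores w.r.t. the interpolated inputs
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

# Stand-in critic: unbounded scalar score, no Sigmoid
critic = nn.Linear(10, 1)
real, fake = torch.randn(4, 10), torch.randn(4, 10)
gp = gradient_penalty(critic, real, fake)
```

The full critic loss is then `fake_scores.mean() - real_scores.mean() + lambda_gp * gp`, with `lambda_gp = 10` being the value used in the WGAN-GP paper.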
Important Variants
Conditional GAN (cGAN)
Conditional GANs add conditional information (e.g., class, text) to both the generator and discriminator. This allows controlling generation: "generate an image of a cat" or "generate the digit 7".
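The usual conditioning mechanism is to embed the label and concatenate it with the noise vector (and, symmetrically, with the image or its features on the discriminator side). A minimal generator sketch (an MLP for brevity; the layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Noise z + class label -> sample, conditioned via a label embedding."""
    def __init__(self, latent_dim=100, num_classes=10, embed_dim=32, out_dim=784):
        super().__init__()
        self.label_embed = nn.Embedding(num_classes, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, 256),
            nn.ReLU(True),
            nn.Linear(256, out_dim),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # Concatenate noise with the embedded label, then generate
        cond = torch.cat([z, self.label_embed(labels)], dim=1)
        return self.net(cond)

G = ConditionalGenerator()
z = torch.randn(8, 100)
labels = torch.randint(0, 10, (8,))
samples = G(z, labels)  # shape (8, 784), e.g. flattened 28x28 digits
```

At inference time, fixing `labels` to a single class ("generate the digit 7") steers every sample toward that class while `z` still controls the variation within it.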
CycleGAN
CycleGAN enables translation between domains without paired data. It can transform photos into Monet-style paintings, convert horses to zebras, or transform summer landscapes to winter, without ever having seen the same scenes in both domains.
StyleGAN
StyleGAN (NVIDIA) achieved photorealistic high-resolution face generation. It introduces the concept of "style" at different levels of the network: coarse styles control the face structure, fine styles control details like hair texture and color.
Real-World Applications
- Face generation: realistic faces for gaming, cinema, avatars (ThisPersonDoesNotExist.com)
- Data augmentation: generating synthetic training data for underrepresented classes in medicine, manufacturing
- Super-resolution: increasing image resolution (SRGAN) for photography, satellites, medicine
- Image-to-image translation: sketch-to-photo, day-to-night, segmentation-to-image (Pix2Pix)
- Drug discovery: generating molecular structures with desired properties
Next Steps in the Series
- In the next article we will explore Diffusion Models, which have surpassed GANs in generation quality
- We will see the diffusion process: adding and removing noise to generate images
- We will analyze DALL-E, Stable Diffusion, and ControlNet