Today I had the pleasure of attending a very interesting workshop on generative adversarial networks. The goal of the workshop was to teach attendees about deep learning and Generative Adversarial Networks (GANs). In the lab we used PyTorch, an open source deep learning framework, to demonstrate and explore this type of neural network architecture. The lab consisted of two major parts: an introduction to both PyTorch and GANs, followed by text-to-image generation.
The first part of the lab began by importing torch modules, creating a simple linear transformation model, and creating a loss function to measure the difference between our model's predictions and the ground truth.
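A minimal sketch of that kind of setup, assuming a single-input, single-output linear model and made-up data purely for illustration (not the workshop's exact code):

```python
import torch
import torch.nn as nn

# A simple linear transformation model: y = W*x + b
model = nn.Linear(in_features=1, out_features=1)

# Hypothetical inputs and ground-truth targets, just for illustration
x = torch.randn(16, 1)
y_true = 3.0 * x + 0.5

# Mean squared error quantifies how far the model's output is from the ground truth
loss_fn = nn.MSELoss()
loss = loss_fn(model(x), y_true)
print(loss.item())
```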
Next we ran our model on a GPU! Earlier in the session we learned that GPUs work well for deep learning because they are inherently parallel.
With GPUs, training a neural network can be done in minutes.
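Moving a model onto the GPU in PyTorch is only a small change. A sketch along those lines, continuing the hypothetical linear model above:

```python
import torch
import torch.nn as nn

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(1, 1).to(device)      # move the model's parameters to the device
x = torch.randn(16, 1, device=device)   # create the input tensor on the same device
y_true = 3.0 * x + 0.5

# The forward pass and loss computation now run on the GPU
loss = nn.MSELoss()(model(x), y_true)
```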
We then began to focus more on GANs. The facilitators of the workshop shared that GANs are getting widespread attention in the deep learning community for their image generation and style transfer capabilities.
This deep learning technique uses two neural networks in an adversarial way to complete its objective.
One network is called the generator and the other the discriminator. The discriminator network is trained with a dataset comprised of real data and output from the generator network, and its objective is to discriminate between the two. The generator network’s objective is to fool the discriminator into classifying its output as real data. During training, the generator is updated to generate data that mimics the real data and fools the discriminator.
In this part of the lab, attendees were tasked to (a rough sketch of these steps appears after the list):
- Feed data into PyTorch using NumPy
- Create a multi-layer network
- Configure the generator and discriminator network
- Learn how to update the generator network
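Pulling those steps together, here is a minimal sketch of what such a setup could look like. The network sizes, the NumPy stand-in data, and the learning rates are assumptions for illustration, not the workshop's exact code:

```python
import numpy as np
import torch
import torch.nn as nn

noise_dim, data_dim = 16, 2  # hypothetical sizes, chosen for illustration

# Feed data into PyTorch from NumPy (stand-in for a real dataset)
real_np = np.random.randn(64, data_dim).astype("float32")
real_data = torch.from_numpy(real_np)

# Generator: a small multi-layer network that maps a noise vector to a fake sample
generator = nn.Sequential(
    nn.Linear(noise_dim, 32), nn.ReLU(),
    nn.Linear(32, data_dim),
)

# Discriminator: outputs the probability that a sample came from the real data
discriminator = nn.Sequential(
    nn.Linear(data_dim, 32), nn.LeakyReLU(0.2),
    nn.Linear(32, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

noise = torch.randn(64, noise_dim)

# Discriminator update: real samples are labelled 1, generated samples 0
d_opt.zero_grad()
d_loss = bce(discriminator(real_data), torch.ones(64, 1)) + \
         bce(discriminator(generator(noise).detach()), torch.zeros(64, 1))
d_loss.backward()
d_opt.step()

# Generator update: try to fool the discriminator into labelling fakes as real
g_opt.zero_grad()
g_loss = bce(discriminator(generator(noise)), torch.ones(64, 1))
g_loss.backward()
g_opt.step()
```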
In the second part of the lab we built upon the popular Deep Convolutional Generative Adversarial Network (DCGAN) to enable text-to-image generation. This part of the lab was based on the paper Generative Adversarial Text to Image Synthesis by Reed et al. Captions of images were encoded and concatenated with the input noise vector before being propagated to the generator. The encoded caption was then concatenated again with a feature map in the discriminator network after the fourth leaky Rectified Linear Unit (ReLU) layer. The goal of the second half of the lab was to create a text-to-image model using the GAN-CLS technique.
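The conditioning step can be illustrated with a small, self-contained sketch; the tensor sizes here (a 100-dimensional noise vector, a 128-dimensional caption embedding, a 4x4x512 feature map) are assumptions for illustration rather than the exact dimensions used in the lab:

```python
import torch

batch, noise_dim, embed_dim = 8, 100, 128  # assumed sizes, for illustration only

noise = torch.randn(batch, noise_dim)
caption_embedding = torch.randn(batch, embed_dim)  # stand-in for an encoded caption

# Generator side: the encoded caption is concatenated with the noise vector
g_input = torch.cat([noise, caption_embedding], dim=1)  # (batch, noise_dim + embed_dim)

# Discriminator side: the embedding is spatially replicated and concatenated with
# the feature map produced after the fourth leaky ReLU layer
feature_map = torch.randn(batch, 512, 4, 4)  # stand-in for that feature map
replicated = caption_embedding.view(batch, embed_dim, 1, 1).expand(-1, -1, 4, 4)
d_input = torch.cat([feature_map, replicated], dim=1)  # (batch, 512 + embed_dim, 4, 4)
```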
We demonstrated the capability of our model to generate plausible images of pizzas and broccoli from detailed text descriptions/captions. While this was just an exercise for learning purposes, it's clear that there are many powerful applications for this deep learning technique.