About this Article
- Authors: Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
- Venue: arXiv (later published in Advances in Neural Information Processing Systems)
- Year: 2014
- Citation: Goodfellow, Ian J., et al. “Generative adversarial nets.” Advances in neural information processing systems 27 (2014).
Accomplishments
- Proposed Generative Adversarial Nets (GANs), a framework for training generative models via an adversarial process.
Key Points
1. Idea
The basic idea of a GAN is to pit two distinct models (networks) against each other: a Generator (G) and a Discriminator (D). The purposes of the two networks are as follows.
- G: Generate data that is difficult to distinguish from real data.
- D: Decide whether a given sample is real or fake.
G and D improve by competing with each other, like the police (D) and a counterfeiter (G). The final goal is a G so good that D outputs 1/2 for any input.
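As a concrete illustration, the two players can be two small multilayer perceptrons. The PyTorch sketch below is not the authors' code; the layer sizes and the 784-dimensional (MNIST-like) data are assumptions made for illustration.

```python
# A minimal, hypothetical PyTorch sketch of the two players.
# Layer sizes and the 784-dimensional (MNIST-like) data are illustrative assumptions.
import torch.nn as nn

noise_dim, data_dim = 100, 784

# Generator G: noise vector z -> fake data sample G(z)
G = nn.Sequential(
    nn.Linear(noise_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)

# Discriminator D: data sample x -> probability in (0, 1) that x is real
D = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)
```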
2. Value Function
1. Basic Notations
- $p_g$: generator’s distribution.
- $G(z)$: data produced by generator G from noise z.
- $D(x)$: the probability that data x came from the data distribution $p_{\text{data}}$ rather than $p_g$. (In other words, that x is real.)
- $D(G(z))$: the probability that D classifies the fake sample $G(z)$ as real.
- $(1-D(G(z)))$: the probability that D classifies the fake sample as fake.
2. Basic sketch We train D and G:
- D to maximize the probability of assigning the correct label to both training examples and samples from G.
- G to minimize $\log(1-D(G(z)))$.
3. Value function The value (objective) function $V(D,G)$ is \(\min_{G} \max_{D} V(D,G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1-D(G(z)))]\)
4. In the viewpoint of D D should classify real data as real and fake data as fake. Therefore, D tries to maximize $V(D,G)$, pushing $D(x)$ toward 1 for real samples.
5. In the viewpoint of G The goal of G is to fool D. Therefore, G tries to minimize $\log(1-D(G(z)))$, i.e. to push $D(G(z))$ toward 1.
6. In early training Early in training, G is poor and D rejects its samples with high confidence, so $\log(1-D(G(z)))$ saturates and its gradient vanishes. To avoid this, the authors instead train G to maximize $\log(D(G(z)))$. (A loss-function sketch follows this list.)
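The losses that D and G minimize in practice follow directly from $V(D,G)$. The sketch below continues the hypothetical PyTorch setup above (D outputs probabilities) and shows D's loss plus both the original saturating G loss and the non-saturating alternative; the small epsilon is an assumption added for numerical stability, not something from the paper.

```python
# Hypothetical PyTorch losses derived from V(D, G); assumes D outputs probabilities in (0, 1).
import torch

eps = 1e-8  # numerical-stability constant (an assumption, not from the paper)

def d_loss(D, x_real, x_fake):
    # D maximizes E[log D(x)] + E[log(1 - D(G(z)))]; minimizing the negative is equivalent.
    p_real = D(x_real)
    p_fake = D(x_fake.detach())  # detach so this update does not flow back into G
    return -(torch.log(p_real + eps).mean() + torch.log(1.0 - p_fake + eps).mean())

def g_loss_saturating(D, x_fake):
    # Original objective for G: minimize E[log(1 - D(G(z)))]; gradients vanish early in training.
    return torch.log(1.0 - D(x_fake) + eps).mean()

def g_loss_non_saturating(D, x_fake):
    # The paper's heuristic: maximize E[log D(G(z))], i.e. minimize its negative.
    return -torch.log(D(x_fake) + eps).mean()
```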
3. Algorithm

The authors alternately train D (k steps) and G (1 step).
- Training D: Sample m noise vectors and m real examples. Update D with the gradient of the value function, ascending the gradient because D's goal is maximization.
- Training G: Sample m noise vectors. Update G with the gradient of the value function, descending the gradient because G's goal is minimization. (A training-loop sketch follows this list.)
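Putting the two updates together, the alternating procedure might look like the sketch below. It reuses the hypothetical G, D, and loss functions from the earlier sketches; the optimizer, learning rate, batch size m, and `data_loader` are assumptions, not the paper's exact hyperparameters.

```python
# Hypothetical training loop for the algorithm: k discriminator steps per generator step.
# Reuses G, D, d_loss, g_loss_non_saturating from the sketches above; data_loader is
# assumed to yield minibatches of real samples of shape (m, data_dim).
import torch

k, m, noise_dim = 1, 64, 100                       # assumed hyperparameters
opt_D = torch.optim.SGD(D.parameters(), lr=1e-3, momentum=0.9)
opt_G = torch.optim.SGD(G.parameters(), lr=1e-3, momentum=0.9)

for x_real in data_loader:
    # --- k steps on D: ascend V (implemented as descending -V) ---
    for _ in range(k):
        z = torch.randn(m, noise_dim)
        loss_D = d_loss(D, x_real, G(z))
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- 1 step on G: descend the (non-saturating) generator loss ---
    z = torch.randn(m, noise_dim)
    loss_G = g_loss_non_saturating(D, G(z))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```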
4. Theoretical Guarantees
There are two mathematical results that guarantee the validity of GAN.
- Global Optimality of $p_g = p_{\text{data}}$: The value function has its unique global optimum at $p_g = p_{\text{data}}$, so the GAN objective does what we intend. The authors also show that this global optimum of the value function is $-\log 4$. (A short derivation follows this list.)
- Convergence of Algorithm: Given enough capacity and a D that reaches its optimum at each step, the training algorithm makes $p_g$ converge to $p_{\text{data}}$.
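As a brief sketch of the first result (following the paper's argument): for a fixed generator, the optimal discriminator is

\(D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\)

and substituting it back into the value function gives

\(C(G) = \max_{D} V(D,G) = -\log 4 + 2 \cdot \mathrm{JSD}(p_{\text{data}} \,\|\, p_g)\)

Since the Jensen-Shannon divergence is non-negative and zero only when the two distributions are equal, $C(G)$ attains its global minimum $-\log 4$ exactly at $p_g = p_{\text{data}}$, where $D^{*}_{G}(x) = 1/2$ for every $x$.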
5. Advantages and Disadvantages
1. Advantages
- Markov chains are never needed; only backpropagation is used to obtain gradients.
- No inference is needed during learning.
- A wide variety of functions can be incorporated into the model.
- Components of inputs are not copied directly into the generator’s parameters.
2. Disadvantages
- No explicit representation of $p_g(x)$.
- The training of G and D must be kept well synchronized.
- The Helvetica scenario: mode collapse can happen. If G is trained too much without updating D, G may collapse too many values of the noise z to the same output, producing only a few samples that fool D instead of covering the data distribution.
Further Research
- Conditional Generative Model: By conditioning input, we can control a model to generate certain types of data. (This idea leads to cGAN.)
- Learned Approximate Inference: The GAN in this article learns the ‘input noise z -> output x’ direction. We can introduce an auxiliary network to infer the reverse direction, from an output back to the input noise. (This idea leads to BiGAN and ALI.)
- Modeling Conditionals: By training a family of conditional models that share parameters, we can build a model that predicts the rest of a sample when only part of it is given. (Think of image-inpainting tools, such as the ‘AI eraser’ of Galaxy or ‘Clean Up’ of iOS.)
- Semi-supervised Learning: When labeled data is limited, features from the discriminator D can be used to improve the performance of classifiers.
- Efficiency Improvements: Training can be accelerated by better methods for coordinating G and D, or by better distributions over the noise z.