Conditional Image Generation with PixelCNN Decoders

Size: px

Start display at page:

Download "Conditional Image Generation with PixelCNN Decoders"

Barbra Hill
6 years ago
Views:

1 Conditional Image Generation with PixelCNN Decoders Aaron van den Oord 1 Nal Kalchbrenner 1 Oriol Vinyals 1 Lasse Espeholt 1 Alex Graves 1 Koray Kavukcuoglu 1 1 Google DeepMind NIPS, 2016 Presenter: Beilun Wang NIPS, 2016 Presenter: Beilun Wang 1 / Aaron van den Oord, Nal Kalchbrenner, OriolConditional Vinyals, Lasse Image Espeholt, Generation Alex with Graves, PixelCNN Koray Kavukcuoglu Decoders (Google DeepMind) 21

2 Outline 1 Introduction Motivation Previous Solutions Contributions 2 Proposed Methods PixelCNN Conditional PixelCNN 3 Summary NIPS, 2016 Presenter: Beilun Wang 2 / Aaron van den Oord, Nal Kalchbrenner, OriolConditional Vinyals, Lasse Image Espeholt, Generation Alex with Graves, PixelCNN Koray Kavukcuoglu Decoders (Google DeepMind) 21

3 Outline 1 Introduction Motivation Previous Solutions Contributions 2 Proposed Methods PixelCNN Conditional PixelCNN 3 Summary NIPS, 2016 Presenter: Beilun Wang 3 / Aaron van den Oord, Nal Kalchbrenner, OriolConditional Vinyals, Lasse Image Espeholt, Generation Alex with Graves, PixelCNN Koray Kavukcuoglu Decoders (Google DeepMind) 21

4 Motivation Motivation: Conditional image generator with image density model. This generator can also be conditioned on class labels, descriptions and a single human face. Fast and parallel training. NIPS, 2016 Presenter: Beilun Wang 4 / Aaron van den Oord, Nal Kalchbrenner, OriolConditional Vinyals, Lasse Image Espeholt, Generation Alex with Graves, PixelCNN Koray Kavukcuoglu Decoders (Google DeepMind) 21

5 Problem Setting: Problem Setting: Input: Image with missing pixels, OR image class labels, OR a vector in the embedded space, OR a single human face Target: joint distribution consisting of conditional distribution with CNN. Output: Image PixelCNN (Pixel RNN): Conditional Version: p(x) = p(x i x 1,..., x i 1 ) (1) n 2 i=1 p(x h) = p(x i x 1,..., x i 1, h) (2) n 2 i=1 B conditioned on (R,G); G conditioned on R. NIPS, 2016 Presenter: Beilun Wang 5 / Aaron van den Oord, Nal Kalchbrenner, OriolConditional Vinyals, Lasse Image Espeholt, Generation Alex with Graves, PixelCNN Koray Kavukcuoglu Decoders (Google DeepMind) 21

6 Outline 1 Introduction Motivation Previous Solutions Contributions 2 Proposed Methods PixelCNN Conditional PixelCNN 3 Summary NIPS, 2016 Presenter: Beilun Wang 6 / Aaron van den Oord, Nal Kalchbrenner, OriolConditional Vinyals, Lasse Image Espeholt, Generation Alex with Graves, PixelCNN Koray Kavukcuoglu Decoders (Google DeepMind) 21

7 Previous Solutions PixelRNN Figure: PixelRNN NIPS, 2016 Presenter: Beilun Wang 7 / Aaron van den Oord, Nal Kalchbrenner, OriolConditional Vinyals, Lasse Image Espeholt, Generation Alex with Graves, PixelCNN Koray Kavukcuoglu Decoders (Google DeepMind) 21

8 Outline 1 Introduction Motivation Previous Solutions Contributions 2 Proposed Methods PixelCNN Conditional PixelCNN 3 Summary NIPS, 2016 Presenter: Beilun Wang 8 / Aaron van den Oord, Nal Kalchbrenner, OriolConditional Vinyals, Lasse Image Espeholt, Generation Alex with Graves, PixelCNN Koray Kavukcuoglu Decoders (Google DeepMind) 21

9 Contributions A fast and parallel trainable deep neural nets model for conditional image generator (?) Gated convolutional layers Conditional Gated convolutional layers NIPS, 2016 Presenter: Beilun Wang 9 / Aaron van den Oord, Nal Kalchbrenner, OriolConditional Vinyals, Lasse Image Espeholt, Generation Alex with Graves, PixelCNN Koray Kavukcuoglu Decoders (Google DeepMind) 21

10 Outline 1 Introduction Motivation Previous Solutions Contributions 2 Proposed Methods PixelCNN Conditional PixelCNN 3 Summary NIPS, 2016 Presenter: Beilun Wang 10

Pixel CNN Input: N N 3 Output: N N 3 256 Figure: PixelCNN Only above

11 Pixel CNN Input: N N 3 Output: N N Figure: PixelCNN Only above and left Pixels are considered. NIPS, 2016 Presenter: Beilun Wang 11

12 Masked Filter NIPS, 2016 Presenter: Beilun Wang 12

13 Blind spot NIPS, 2016 Presenter: Beilun Wang 13

14 Gated Convolutional Layers The gates in LSTM may help it to model more complex interactions. This is also studied by paper like Highway networks, grid LSTM, and Neural GPUs. y = tanh(w k,f x) σ(w k,g x) (3) NIPS, 2016 Presenter: Beilun Wang 14

15 Gated Convolutional Layers figure Figure: Gated Convolutional Layers NIPS, 2016 Presenter: Beilun Wang 15

16 Outline 1 Introduction Motivation Previous Solutions Contributions 2 Proposed Methods PixelCNN Conditional PixelCNN 3 Summary NIPS, 2016 Presenter: Beilun Wang 16

17 Conditional PixelCNN replace p(x h) from p(x). y = tanh(w k,f x + Vk,f T h) σ(w k,g x + Vk,g T h) (4) NIPS, 2016 Presenter: Beilun Wang 17

18 Conditional PixelCNN Embedded Use a deconvolutional neural nets m() map h back to the image space as s y = tanh(w k,f x + Vk,f T s) σ(w k,g x + Vk,g T s) (5) NIPS, 2016 Presenter: Beilun Wang 18

19 Experiment Results image generation based on labels NIPS, 2016 Presenter: Beilun Wang 19 Aaron van den Oord, Nal Kalchbrenner, Oriol Conditional Vinyals, Lasse Image Espeholt, Generation Alex with Graves, PixelCNN Koray Kavukcuoglu Decoders (Google DeepMind) / 21

20 Experiment Results image generation based on a single human face NIPS, 2016 Presenter: Beilun Wang 20

21 Summary This paper improves the PixelCNN by the gated activation unit This paper extends the PixelCNN to a conditional version NIPS, 2016 Presenter: Beilun Wang 21

arxiv: v1 [cs.lg] 28 Dec 2017

arxiv: v1 [cs.lg] 28 Dec 2017 PixelSNAIL: An Improved Autoregressive Generative Model arxiv:1712.09763v1 [cs.lg] 28 Dec 2017 Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, Pieter Abbeel Embodied Intelligence UC Berkeley, Department of