Pixel Recurrent Neural Networks
1 Pixel Recurrent Neural Networks. Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu, Google DeepMind, August 2016. Presenter: Neha M
2 Example problem (completing an image). Given the first half of the image, create the second half that is masked out. (Figure: original image; masked image.)
3 Example problem (completing an image). Given the first half of the image, create the second half that is masked out. (Figure: original image; masked image; example generated images.)
4 Generative modelling. Modeling the distribution of natural images is a landmark problem in unsupervised learning. Many applications: image compression, deblurring, generating synthetic images, text-to-image, and so on. Main model families: VAEs, GANs, autoregressive models.
5 (Figure-only slide; no text captured.)
6 Why RNN? A pixel depends on the previously generated pixels in the image. PixelRNN is a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. RNNs capture long-range dependencies, whether among pixels within an image or across different frames in a video.
7 Generating an image pixel by pixel. Previous approaches use continuous distributions over pixel values; here pixels take discrete values from a multinomial distribution implemented with a softmax layer. Each pixel is determined by three values: R, G, B.
8 Generating an image pixel by pixel. The probability of pixel $x_i$ given all the previous pixels $x_1, \ldots, x_{i-1}$ is $p(x_i \mid \mathbf{x}_{<i})$, and the joint distribution over an $n \times n$ image factorizes as $p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})$.
9 Every conditional distribution $p(x_i \mid \mathbf{x}_{<i})$ in Equation 2 of the paper is a multinomial that is modeled with a softmax layer.
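Written out, the paper's Equation 1 is the raster-scan factorization, and Equation 2 splits each pixel's conditional over its three colour channels, each conditioned on the channels already generated:

$$p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})$$

$$p(x_i \mid \mathbf{x}_{<i}) = p(x_{i,R} \mid \mathbf{x}_{<i}) \, p(x_{i,G} \mid \mathbf{x}_{<i}, x_{i,R}) \, p(x_{i,B} \mid \mathbf{x}_{<i}, x_{i,R}, x_{i,G})$$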
10 Models: PixelRNN with Row LSTM layer; PixelRNN with Diagonal BiLSTM layer; PixelCNN; Multi-Scale PixelRNN.
11 LSTM. Trained pixelwise, one pixel at a time: really slow, given the number of pixels the dataset will have. $h_{i,j} = \mathrm{LSTM}(h_{i-1,j},\, h_{i,j-1},\, p_{i,j})$. No parallelisation, as all pixels are interdependent.
12 Row LSTM. Input an entire row to predict the next row. This decreases training time (training one row at a time instead of one pixel); moreover, computation within a row can be parallelized, as pixels within a row are not interdependent. $h_{i,j} = \mathrm{LSTM}(h_{i-1,j-1},\, h_{i-1,j+1},\, h_{i-1,j},\, p_{i,j})$.
13 Row LSTM. The context of two pixels per row is lost recursively, which gives rise to a triangular receptive field rather than the full available context.
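To make the recurrence concrete, here is a minimal PyTorch sketch of the Row LSTM idea (hypothetical class and shapes; the real layer masks the input-to-state convolution so position j cannot see inputs to its right, and precomputes it for the whole image in one pass, leaving only the row-to-row recurrence sequential):

```python
import torch
import torch.nn as nn

class RowLSTMSketch(nn.Module):
    """Minimal sketch of the Row LSTM recurrence (hypothetical class)."""
    def __init__(self, channels, hidden):
        super().__init__()
        self.input_to_state = nn.Conv1d(channels, 4 * hidden, kernel_size=3, padding=1)
        self.state_to_state = nn.Conv1d(hidden, 4 * hidden, kernel_size=3, padding=1)
        self.hidden = hidden

    def forward(self, x):                            # x: (batch, channels, H, W)
        b, _, height, width = x.shape
        h_prev = x.new_zeros(b, self.hidden, width)  # hidden state of the row above
        c_prev = x.new_zeros(b, self.hidden, width)
        rows = []
        for i in range(height):                      # sequential over rows only
            # The kernel-3 conv over the previous row's state supplies
            # h(i-1, j-1), h(i-1, j), h(i-1, j+1), as in the slide's formula.
            gates = self.input_to_state(x[:, :, i]) + self.state_to_state(h_prev)
            o, f, g_in, g = gates.chunk(4, dim=1)
            c_prev = torch.sigmoid(f) * c_prev + torch.sigmoid(g_in) * torch.tanh(g)
            h_prev = torch.sigmoid(o) * torch.tanh(c_prev)
            rows.append(h_prev)
        return torch.stack(rows, dim=2)              # (batch, hidden, H, W)
```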
14 Diagonal BiLSTM. Designed to capture the entire available context for any image size. Each of the two directions of the layer scans the image in a diagonal fashion, starting from a corner at the top and reaching the opposite corner at the bottom.
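The diagonal scan is implemented by first skewing the feature map so each row is shifted one step relative to the row above, turning image diagonals into columns that can be processed like columns of a regular LSTM; a small sketch of that skewing step (hypothetical helper names):

```python
import torch

def skew(x):
    """Shift row i of a feature map i steps to the right, so that image
    diagonals become columns (hypothetical helper for the diagonal scan)."""
    b, c, h, w = x.shape
    out = x.new_zeros(b, c, h, w + h - 1)
    for i in range(h):
        out[:, :, i, i:i + w] = x[:, :, i]
    return out

def unskew(x, w):
    """Inverse of skew: recover the original width-w rows."""
    h = x.shape[2]
    return torch.stack([x[:, :, i, i:i + w] for i in range(h)], dim=2)
```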
15 Residual connections. PixelRNNs are trained up to 12 layers deep. To increase convergence speed and propagate signals more directly through the network, residual connections are deployed.
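A minimal sketch of the idea (a hypothetical wrapper; the paper's residual blocks also use 1x1 convolutions around the main layer to manage feature sizes):

```python
import torch.nn as nn

class Residual(nn.Module):
    """Add the input of a layer back to its output so signals and gradients
    have a direct path through a deep stack."""
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, x):
        return x + self.layer(x)  # requires layer to preserve the shape of x
```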
16 PixelCNN. Row and Diagonal LSTM layers capture long-range dependencies in images, but this comes at a computational cost, since each state must be computed sequentially. Standard convolutional layers can capture a bounded receptive field and compute features for all pixel positions at once. PixelCNN uses multiple convolutional layers that preserve spatial resolution; pooling layers are not used. Masks are adopted in the convolutions to avoid seeing future context.
17 Masked Convolutions. Zero out all the pixel values that are not supposed to be available to the model. Pixel $x_i$ depends on pixels $x_1, \ldots, x_{i-1}$, so we must ensure it cannot access any of the subsequent pixels $x_{i+1}, \ldots, x_{n^2}$ (sketch after the next slide).
18 Masked Convolution (figure).
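One common way to implement this masking, as a PyTorch sketch for single-channel images (the paper's RGB variant masks channel-to-channel connections more finely; "mask A" is used in the first layer, "mask B" in later ones):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is zeroed to the right of the centre in the
    middle row and in all rows below. Mask 'A' (first layer) also zeroes the
    centre weight, so a pixel never sees itself; mask 'B' keeps the centre."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        self.register_buffer('mask', torch.ones_like(self.weight))
        _, _, kh, kw = self.weight.shape
        self.mask[:, :, kh // 2, kw // 2 + (mask_type == 'B'):] = 0  # right of centre
        self.mask[:, :, kh // 2 + 1:] = 0                            # rows below

    def forward(self, x):
        self.weight.data *= self.mask  # zero out "future" weights before convolving
        return super().forward(x)

# Usage: mask 'A' for the first layer, mask 'B' afterwards.
first = MaskedConv2d('A', 1, 64, kernel_size=7, padding=3)
```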
19 Models
20 Comparison
21-26 Assume we know p(.): how do we generate a single pixel? (Grey-scale to simplify the figures; color works too.) The model p(x) outputs predicted intensity probabilities for the pixel, and we predict the grayscale value (e.g. the intensity 90 in the figure) from that distribution. Modeling p(x) with a CNN gives probabilities for all pixels at once.
27 How to generate an image? Forward pass through the trained CNN p(x); predict the grayscale value, one of 256 levels. (Grey-scale to simplify the figure; color works too.)
28 Start with all 0's (or sample some starting value) and consider the first pixel. The forward pass gives the probability of each grayscale value for that pixel (note the high values at the tails).
29 Assign the first pixel: choose the prediction/argmax (or sample) from that distribution.
30 Consider the second pixel, using the information about the first. Another forward pass gives its grayscale distribution (the distribution changes).
31 Assign the second pixel: choose the argmax (or sample).
32 Consider the i-th pixel, and so on: one forward pass through the trained CNN per pixel.
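Written as code, this generation loop might look like the following (a minimal sketch; `model` is a hypothetical trained network mapping an image of shape (batch, 1, H, W) to logits of shape (batch, 256, H, W)):

```python
import torch

@torch.no_grad()
def sample_image(model, height=28, width=28):
    """One forward pass per pixel, in raster order: slow but exact."""
    img = torch.zeros(1, 1, height, width)          # start with all zeros
    for i in range(height):
        for j in range(width):
            logits = model(img)                     # forward pass through trained net
            probs = torch.softmax(logits[0, :, i, j], dim=0)
            value = torch.multinomial(probs, 1)     # sample (or take the argmax)
            img[0, 0, i, j] = value.item() / 255.0  # write the pixel and continue
    return img
```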
33 How to learn p(x)? Consider the i-th pixel: a forward pass through the CNN predicts its grayscale value. (Grey-scale to simplify the figure; color works too.)
34 PixelRNN (recurrent neural network): models long-range dependencies. Works well, but very slow to train.
35 PixelCNN (convolutional neural network): captures local properties. Fast to train (local convolutions), but works less well.
36 PixelRNN is hard to parallelize, since training considers all the pixels before the current one; PixelCNN can parallelize during training, if we are very careful about the convolution windows.
37 Blind Spots in PixelCNN
38 This is an image with 5x5 pixels; each contains a number (represented using the letters a through y). This is a filter of size 3x3; each element contains a number (a weight).
39 Images are often zero-padded so the entire input can be convolved and the output will be the same size.
40 Red indicates the response (or a feature map) after convolution, i.e. the value we want to predict (our unknown).
41 This filter is bad, since a cannot know about b, f, g (they come after a)!
42 Solution: mask the filter (i.e., fix to 0) the elements that a cannot know about.
43 Note that the first element conditions on all zeros (as desired).
44 Blue indicates elements that are observed, so we can condition on them.
45-49 (Animation: the masked filter slides across the image; each output conditions only on pixels already generated.)
50 q is a function of k, l, m, p.
51 q is a function of k, l, m, p, which are a function of f, g, h, i, which are a function of a, b, c, d, e. Deeper layers will model this better, but q has "blind spots": it never considers j, n, o.
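We can check the blind-spot claim numerically by propagating reachability through stacked masked 3x3 filters (a small sketch; grid coordinates follow the letter figure above, with q at row 3, column 1, 0-indexed):

```python
import numpy as np

# Offsets a masked 3x3 filter can read from: the full row above, plus the
# left neighbour (the centre's right side and everything below are masked).
MASKED_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]

def receptive_field(h, w, row, col, layers):
    """Which input pixels can influence position (row, col) after `layers`
    stacked masked 3x3 convolutions? Sketch reproducing the q example."""
    reach = np.zeros((h, w), dtype=bool)
    reach[row, col] = True
    for _ in range(layers):
        new = reach.copy()                      # centre stays connected (mask B)
        for r, c in zip(*np.nonzero(reach)):
            for dr, dc in MASKED_OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    new[rr, cc] = True
        reach = new
    return reach

# q sits at row 3, column 1 of the 5x5 letter grid (0-indexed).
print(receptive_field(5, 5, 3, 1, layers=4).astype(int))
# Positions j=(1,4), n=(2,3), o=(2,4) remain 0: the blind spot.
```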
52 Evaluation. Datasets: MNIST, CIFAR-10, and ImageNet. Models are trained and evaluated on the log-likelihood loss function; maximizing the log-likelihood equals minimizing the "negative log-likelihood" (NLL).
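For comparison across models, the NLL is often reported per pixel and converted from nats to bits ("bits per dimension"); a minimal sketch of that computation (hypothetical shapes):

```python
import math
import torch.nn.functional as F

def nll_bits_per_dim(logits, targets):
    """Average negative log-likelihood per pixel, converted from nats to bits.
    Hypothetical shapes: logits (batch, 256, H, W); targets (batch, H, W)
    holding integer intensities in 0..255."""
    nll_nats = F.cross_entropy(logits, targets, reduction='mean')
    return nll_nats / math.log(2)  # nats -> bits ("bits per dimension")
```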
53 Experiments
54 Image completions
55 References: Gated PixelCNN, dit#slide=id.g1aa6f048ae_0_4226
56 Questions?