Deep Learning CMSC 422 MARINE CARPUAT. Based on slides by Vlad Morariu

1 Deep Learning CMSC 422 MARINE CARPUAT Based on slides by Vlad Morariu

2 Standard Application of Machine Learning to Computer Vision. Pipeline: feature extraction → features → classification → predicted labels ("cat" or "background"), with supervised training of the classifier. Features: e.g., Scale Invariant Feature Transform (SIFT). Classifiers: SVM, Random Forests, KNN, ... Features are hand-crafted, not trained; eventually limited by feature quality. Cat image credit:

3 Deep learning: multiple-layer neural networks learn features and classifiers directly ("end-to-end" training) under supervised training; breakthrough in Computer Vision, now in other AI areas. Image credit: LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE.

4 Speech Recognition. Slide credit: Bohyung Han

5 Image Classification Performance: Image Classification Top-5 Errors (%). Figure from: K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. arXiv (slides). Slide credit: Bohyung Han

6 Today's lecture: key concepts. Convolutional Neural Networks; Revisiting Backpropagation and Gradient Descent for Deep Networks

7 Multi-Layer Perceptron (MLP) Image source:

8 Neural Networks Applied to Vision. LeCun, Y; Boser, B; Denker, J; Henderson, D; Howard, R; Hubbard, W; Jackel, L, "Backpropagation Applied to Handwritten Zip Code Recognition," in Neural Computation, 1989. USPS digit recognition, later check reading. Convolution, pooling ("weight sharing"), fully connected layers. Image credit: LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE.

9 Architecture overview. Components: Convolution layers; Pooling/Subsampling layers; Fully connected layers. Image credit: LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE.

10 Convolutional Layer: 32x32x3 image (width 32, height 32, depth 3). Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson

11 Convolutional Layer: 32x32x3 image, 5x5x3 filter. Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products". Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson, 11 Jan 2016

12 Convolutional Layer: 32x32x3 image, 5x5x3 filter. Filters always extend the full depth of the input volume. Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products". Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson

13 Convolutional Layer: 32x32x3 image, 5x5x3 filter. One number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias). Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson

14 Convolutional Layer: 32x32x3 image, 5x5x3 filter. Convolve (slide) over all spatial locations to obtain a 28x28 activation map. Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson

15 Convolutional Layer: consider a second, green filter. 32x32x3 image, 5x5x3 filter. Convolve (slide) over all spatial locations to obtain a second 28x28 activation map. Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson

16 Convolutional Layer: for example, if we had 6 5x5 filters, we'll get 6 separate activation maps. We stack these up to get a "new image" of size 28x28x6! Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson
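
The shape arithmetic on the convolution slides can be checked with a small sketch. It assumes stride 1 and no padding (which is what turns 32 into 32 - 5 + 1 = 28); the function name `conv_layer` and the random image and filter values are illustrative and not part of the slides.

```python
import random

def conv_layer(image, filters, biases):
    """Stride-1, no-padding convolution of an H x W x C image with k x k x C filters."""
    H, W, C = len(image), len(image[0]), len(image[0][0])
    k = len(filters[0])                      # spatial size of each filter
    out_h, out_w = H - k + 1, W - k + 1
    out = [[[0.0] * len(filters) for _ in range(out_w)] for _ in range(out_h)]
    for f, (filt, bias) in enumerate(zip(filters, biases)):
        for i in range(out_h):
            for j in range(out_w):
                s = bias
                # dot product between the filter and one k x k x C chunk of the image
                for di in range(k):
                    for dj in range(k):
                        for c in range(C):
                            s += filt[di][dj][c] * image[i + di][j + dj][c]
                out[i][j][f] = s
    return out

rng = random.Random(0)
image = [[[rng.random() for _ in range(3)] for _ in range(32)] for _ in range(32)]
filters = [[[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
            for _ in range(5)] for _ in range(6)]
maps = conv_layer(image, filters, [0.0] * 6)
shape = (len(maps), len(maps[0]), len(maps[0][0]))   # (28, 28, 6)
```

Each of the six filters yields one 28x28 activation map, and stacking them along the last axis gives the 28x28x6 volume.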

17 Convolutional Layer: a ConvNet is a sequence of Convolutional Layers, interspersed with activation functions. Input → CONV, ReLU (e.g. 6 5x5x3 filters). Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson

18 Convolutional Layer: a ConvNet is a sequence of Convolutional Layers, interspersed with activation functions. Input → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6 → CONV, ReLU (e.g. 10 5x5x6 filters) → CONV, ReLU → ... Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson

19 Rectified Linear Units (ReLU). Use the rectified linear function instead of the sigmoid: ReLU(x) = max(0, x). Advantages: fast; no vanishing gradients for positive inputs (the gradient there is 1)
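
A minimal sketch comparing the two activations and their derivatives. The function names are illustrative, and the derivative of ReLU at exactly 0 is set to 0 here by convention.

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (value at x == 0 chosen by convention)
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# The sigmoid derivative peaks at 0.25 and shrinks quickly for large |x|,
# while the ReLU derivative stays at 1 for any positive input.
grads_at_10 = (relu_grad(10.0), sigmoid_grad(10.0))
```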

20 Pooling Layer: makes the representations smaller and more manageable; operates over each activation map independently. Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson

21 Pooling Layer, MAX POOLING: single depth slice (axes x, y); max pool with 2x2 filters and stride 2. Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson
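
Max pooling over a single depth slice can be sketched as follows. The 4x4 input values are illustrative, a stride of 2 is assumed, and `max_pool` is a hypothetical helper name.

```python
def max_pool(depth_slice, size=2, stride=2):
    """Max over each size x size window of a 2-D slice, moving by `stride`."""
    h, w = len(depth_slice), len(depth_slice[0])
    return [[max(depth_slice[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, w - size + 1, stride)]
            for i in range(0, h - size + 1, stride)]

x = [[1, 1, 2, 4],
     [5, 6, 7, 8],
     [3, 2, 1, 0],
     [1, 2, 3, 4]]
pooled = max_pool(x)   # [[6, 8], [3, 4]]
```

A 4x4 slice becomes 2x2; in a full layer the same operation is applied to every activation map independently.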

22 Convolutional filter visualization [From recent Yann LeCun slides]. Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson

23 Convolutional filter visualization: one filter => one activation map; example 5x5 filters (32 total). We call the layer convolutional because it is related to convolution of two signals: elementwise multiplication and sum of a filter and the signal (image). Slide credit: Fei-Fei Li, Andrej Karpathy, and Justin Johnson

24 Today's lecture: key concepts. Convolutional Neural Networks; Revisiting Backpropagation and Gradient Descent for Deep Networks

25 Multi-Layer Perceptron (MLP) Image source:

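26 Single neuron gradient. Inputs $x_1, \dots, x_d$ with weights $w_1, \dots, w_d$ and bias $b$ feed a sum ($\Sigma$) followed by a sigmoid, producing the prediction $\hat{y}$, which the loss $L$ compares with the target $y$.
$z = b + \sum_i w_i x_i$, $\hat{y} = \frac{1}{1 + e^{-z}}$, $L = \frac{1}{2}(\hat{y} - y)^2$
$\frac{\partial z}{\partial w_i} = x_i$, $\frac{d\hat{y}}{dz} = \hat{y}(1 - \hat{y})$, $\frac{\partial L}{\partial \hat{y}} = \hat{y} - y$
Chain rule: $\frac{\partial L}{\partial w_i} = \frac{\partial \hat{y}}{\partial w_i}\frac{\partial L}{\partial \hat{y}} = \frac{\partial z}{\partial w_i}\frac{d\hat{y}}{dz}\frac{\partial L}{\partial \hat{y}} = x_i\,\hat{y}(1 - \hat{y})(\hat{y} - y)$
Slide credit: Adapted from Bohyung Han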
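
One way to sanity-check the chain-rule result is to compare it against a finite-difference estimate. The sketch below assumes the squared loss L = 1/2 (ŷ - y)² with ŷ the sigmoid output; the weights, input, and target are arbitrary illustrative values.

```python
import math

def forward(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    return 0.5 * (forward(w, b, x) - y) ** 2

def grad_w(w, b, x, y):
    """Chain rule: dL/dw_i = x_i * y_hat * (1 - y_hat) * (y_hat - y)."""
    y_hat = forward(w, b, x)
    return [xi * y_hat * (1.0 - y_hat) * (y_hat - y) for xi in x]

w, b, x, y = [0.5, -0.3], 0.1, [1.0, 2.0], 1.0
analytic = grad_w(w, b, x, y)

# Central finite differences as an independent check of each partial derivative.
eps = 1e-6
numeric = []
for i in range(len(w)):
    w_plus, w_minus = list(w), list(w)
    w_plus[i] += eps
    w_minus[i] -= eps
    numeric.append((loss(w_plus, b, x, y) - loss(w_minus, b, x, y)) / (2 * eps))
```

The two lists should agree to many decimal places if the derivation is right.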

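27 Single neuron training.
for $t = 1, \dots, T$ (each pass over the samples $n = 1, \dots, N$ is an epoch)
  $\hat{y} = f(x, w^{t})$
  $\frac{\partial L}{\partial w_i} = x_i\,\hat{y}(1 - \hat{y})(\hat{y} - y)$, $i = 1, \dots, d$
  $w^{t+1} = w^{t} + \Delta w$, where $\Delta w$ is the gradient-descent step, $\Delta w_i = -\varepsilon\,\frac{\partial L}{\partial w_i}$ for a learning rate $\varepsilon$
endfor
Slide credit: Adapted from Bohyung Han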
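
The loop above might look like this in code. This is a sketch under several assumptions that the slide does not state: zero initialization, a bias updated alongside the weights, a learning rate of 1.0, 2000 epochs, and the logical OR function as toy data.

```python
import math

def train_neuron(samples, epochs=2000, lr=1.0):
    """Per-sample gradient descent for one sigmoid neuron with squared loss."""
    d = len(samples[0][0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):                      # t = 1, ..., T
        for x, y in samples:                     # n = 1, ..., N (one epoch)
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            y_hat = 1.0 / (1.0 + math.exp(-z))
            delta = y_hat * (1.0 - y_hat) * (y_hat - y)   # dL/dz
            w = [wi - lr * xi * delta for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b

def predict(w, b, x):
    return 1.0 / (1.0 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))

or_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w, b = train_neuron(or_data)
predictions = [round(predict(w, b, x)) for x, _ in or_data]
```

OR is linearly separable, so a single neuron is enough here; a function like XOR would need a hidden layer.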

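28 Multi-Layer: Backpropagation. Input $x_k$ feeds neuron $i$ through weight $w_{ki}$ (sum $z_i$, sigmoid output $\hat{y}_i$); neuron $i$ feeds neuron $j$ through weight $w_{ij}$ (sum $z_j$, sigmoid output $\hat{y}_j$), which enters the loss $L$.
$\frac{\partial L}{\partial z_j} = \frac{d\hat{y}_j}{dz_j}\frac{\partial L}{\partial \hat{y}_j}$
$\frac{\partial L}{\partial \hat{y}_i} = \sum_j \frac{dz_j}{d\hat{y}_i}\frac{\partial L}{\partial z_j} = \sum_j w_{ij}\frac{\partial L}{\partial z_j} = \sum_j w_{ij}\frac{d\hat{y}_j}{dz_j}\frac{\partial L}{\partial \hat{y}_j}$
$\frac{\partial L}{\partial w_{ki}} = \frac{\partial z_i}{\partial w_{ki}}\frac{d\hat{y}_i}{dz_i}\frac{\partial L}{\partial \hat{y}_i} = \frac{\partial z_i}{\partial w_{ki}}\frac{d\hat{y}_i}{dz_i}\sum_j w_{ij}\frac{d\hat{y}_j}{dz_j}\frac{\partial L}{\partial \hat{y}_j}$
Slide credit: Bohyung Han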
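
The recursion can be written out for a small two-layer sigmoid network. The sketch below assumes no bias terms, a squared loss summed over the output neurons, and randomly drawn weights; the layer sizes (2 inputs, 3 hidden, 2 outputs) are arbitrary.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    """W1[k][i] connects input k to neuron i; W2[i][j] connects neuron i to neuron j."""
    y_i = [sigmoid(sum(x[k] * W1[k][i] for k in range(len(x))))
           for i in range(len(W1[0]))]
    y_j = [sigmoid(sum(y_i[i] * W2[i][j] for i in range(len(y_i))))
           for j in range(len(W2[0]))]
    return y_i, y_j

def loss(x, target, W1, W2):
    _, y_j = forward(x, W1, W2)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(y_j, target))

def backprop(x, target, W1, W2):
    y_i, y_j = forward(x, W1, W2)                               # forward pass
    # dL/dz_j = dy_j/dz_j * dL/dy_j
    dL_dzj = [y_j[j] * (1 - y_j[j]) * (y_j[j] - target[j]) for j in range(len(y_j))]
    # dL/dy_i = sum_j w_ij * dL/dz_j
    dL_dyi = [sum(W2[i][j] * dL_dzj[j] for j in range(len(y_j)))
              for i in range(len(y_i))]
    dL_dzi = [y_i[i] * (1 - y_i[i]) * dL_dyi[i] for i in range(len(y_i))]
    # dL/dw_ki = x_k * dL/dz_i, and dL/dw_ij = y_i * dL/dz_j
    grad_W1 = [[x[k] * dL_dzi[i] for i in range(len(y_i))] for k in range(len(x))]
    grad_W2 = [[y_i[i] * dL_dzj[j] for j in range(len(y_j))] for i in range(len(y_i))]
    return grad_W1, grad_W2

rng = random.Random(0)
x, target = [0.5, -1.0], [1.0, 0.0]
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W2 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
grad_W1, grad_W2 = backprop(x, target, W1, W2)
```

The forward pass caches every neuron output so the backward pass can reuse them, which is the two-pass structure used in practice.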

29 Backpropagation in practice. Two passes per iteration. Forward pass: compute the value of the loss function (and intermediate neurons) given inputs. Backward pass: propagate the gradient of the loss (error) backwards through the network using the chain rule

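30 Stochastic Gradient Descent (SGD). Update weights for each sample: $E = \frac{1}{2}(\hat{y} - y)^2$, $w_i(t+1) = w_i(t) - \varepsilon\,\frac{\partial E}{\partial w_i}$. + Fast, online. - Sensitive to noise.
Minibatch SGD. Update weights for a small set of samples $B$: $E_B = \frac{1}{2}\sum_{n \in B}(\hat{y}_n - y_n)^2$, $w_i(t+1) = w_i(t) - \varepsilon\,\frac{\partial E_B}{\partial w_i}$. + Fast, online. + Robust to noise.
Slide credit: Bohyung Han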
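
Both variants fit in one function if the batch size is a parameter. The sketch below swaps the neural network for a one-weight linear model ŷ = w·x to keep it short; the noiseless data y = 3x, the learning rate, and the epoch count are illustrative assumptions.

```python
import random

def minibatch_sgd(data, w=0.0, lr=0.1, batch_size=4, epochs=50, seed=0):
    """Gradient descent on E_B = 1/2 * sum_{n in B} (w * x_n - y_n)^2."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)                         # visit samples in a new order
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = sum((w * x - y) * x for x, y in batch)   # dE_B/dw
            w -= lr * grad
    return w

data = [(k / 10, 3 * k / 10) for k in range(1, 11)]   # y = 3x, x in 0.1 .. 1.0
w_sgd = minibatch_sgd(data, batch_size=1)   # plain SGD: one sample per update
w_mb = minibatch_sgd(data, batch_size=4)    # minibatch SGD
```

With `batch_size=1` every sample triggers an update; larger batches average out per-sample noise at the cost of fewer updates per epoch.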

31 SGD improvements: Momentum. Remember the previous direction. + Converge faster. + Avoid oscillation. $v_i(t) = \alpha\,v_i(t-1) - \varepsilon\,\frac{\partial E}{\partial w_i}(t)$, $w(t+1) = w(t) + v(t)$. Slide credit: Bohyung Han
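
A sketch of the momentum update on a one-dimensional toy objective. The objective E(w) = (w - 2)², the values α = 0.9 and ε = 0.1, and the step count are assumptions chosen for illustration.

```python
def momentum_descent(grad, w, lr=0.1, alpha=0.9, steps=200):
    """v(t) = alpha * v(t-1) - lr * dE/dw(t);  w(t+1) = w(t) + v(t)."""
    v = 0.0
    for _ in range(steps):
        v = alpha * v - lr * grad(w)   # keep a fraction of the previous direction
        w = w + v
    return w

# Toy objective E(w) = (w - 2)^2, so dE/dw = 2 * (w - 2); the minimum is at w = 2.
w_final = momentum_descent(lambda w: 2.0 * (w - 2.0), w=0.0)
```

Setting `alpha=0.0` recovers plain gradient descent, which makes it easy to compare the two on a given objective.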

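32 SGD improvements: Weight Decay. Penalize the size of the weights: $C = E + \frac{\lambda}{2}\sum_i w_i^2$, $w_i(t+1) = w_i(t) - \varepsilon\,\frac{\partial C}{\partial w_i} = w_i(t) - \varepsilon\,\frac{\partial E}{\partial w_i} - \varepsilon\lambda\,w_i(t)$. + Improves generalization a lot! Slide credit: Bohyung Han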
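
A single weight-decay step, assuming the penalty C = E + (λ/2) Σ w_i², whose gradient adds λ·w_i to each component of the gradient of E. The helper name and the numeric values are illustrative.

```python
def weight_decay_step(w, grad_E, lr, lam):
    """One update w_i <- w_i - lr * (dE/dw_i + lam * w_i)."""
    return [wi - lr * (gi + lam * wi) for wi, gi in zip(w, grad_E)]

# With a zero data gradient, each weight simply shrinks by the factor (1 - lr * lam).
shrunk = weight_decay_step([1.0, -2.0], [0.0, 0.0], lr=0.1, lam=0.5)
```

Here every weight is scaled by 0.95 per step, which is why the penalty is called a decay.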

33 Key concepts. Convolutional Neural Networks; Revisiting Backpropagation and Gradient Descent for Deep Networks

34 History: NN Revival in the 1980's. Backpropagation discovered in 1970's but popularized in 1986: David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams. Learning representations by back-propagating errors. In Nature. MLP is a universal approximator: can approximate any non-linear function in theory, given enough neurons, data. Kurt Hornik, Maxwell Stinchcombe, Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 1989. Generated lots of excitement and applications

35 Neural Networks Applied to Vision: LeNet vision application. LeCun, Y; Boser, B; Denker, J; Henderson, D; Howard, R; Hubbard, W; Jackel, L, "Backpropagation Applied to Handwritten Zip Code Recognition," in Neural Computation, 1989. USPS digit recognition, later check reading. Convolution, pooling ("weight sharing"), fully connected layers. Image credit: LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE.

36 Issues in Deep Neural Networks. Prohibitive training time: especially with lots of training data; many epochs typically required for optimization; expensive gradient computations. Overfitting: the learned function fits the training data well, but performs poorly on new data (high capacity model, not enough training data). Slide credit: adapted from Bohyung Han

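37 Issues in Deep Neural Networks. Vanishing gradient problem (sigmoid units):
$\frac{\partial E}{\partial w_{ki}} = \frac{\partial z_i}{\partial w_{ki}}\frac{d\hat{y}_i}{dz_i}\frac{\partial E}{\partial \hat{y}_i} = \frac{\partial z_i}{\partial w_{ki}}\frac{d\hat{y}_i}{dz_i}\sum_j w_{ij}\frac{d\hat{y}_j}{dz_j}\frac{\partial E}{\partial \hat{y}_j}$
Gradients in the lower layers are typically extremely small. Optimizing multi-layer neural networks takes a huge amount of time. Slide credit: adapted from Bohyung Han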
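
Why the lower-layer gradients shrink can be seen in a deliberately simplified chain: one sigmoid unit per layer, all weights equal to 1. Both simplifications, and the input value 0.5, are assumptions made only to keep the sketch short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(depth, x=0.5, w=1.0):
    """d(output)/d(input) for a chain of `depth` one-unit sigmoid layers."""
    outputs, y = [], x
    for _ in range(depth):                 # forward pass through the chain
        y = sigmoid(w * y)
        outputs.append(y)
    grad = 1.0
    for y_l in reversed(outputs):          # backward pass: one factor per layer
        grad *= w * y_l * (1.0 - y_l)      # each sigmoid factor is at most 0.25
    return grad

shallow = input_gradient(1)
deep = input_gradient(10)
```

Because every factor is at most 0.25 when w = 1, ten layers already scale the gradient by less than 0.25 to the power 10, roughly one millionth.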

38 New "winter" and revival in early 2000's. New "winter" in the early 2000's due to problems with training NNs; Support Vector Machines (SVMs) and Random Forests (RF) were easy to train and had nice theory. Revival again by: name change ("neural networks" -> "deep learning") + algorithmic developments (unsupervised layer-wise pre-training, ReLU, dropout, layer normalization) + big data + GPU computing = large outperformance on many datasets (Vision: ILSVRC'12)

39 Big Data: ImageNet Large Scale Visual Recognition Challenge. 1000 categories w/ 1000 images per category; 1.2 million training images, 50,000 validation, 150,000 testing. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.

40 AlexNet Architecture: 60 million parameters! Various tricks: ReLU nonlinearity; overlapping pooling; local response normalization; dropout (set hidden neuron output to 0 with probability .5); data augmentation; training on GPUs. Figure credit: Krizhevsky et al, NIPS. Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.
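
Dropout as described on the slide can be sketched in a few lines. The test-time rule used here (keep every neuron but multiply its output by 1 - p) is the variant described in the AlexNet paper; many current libraries instead rescale during training. The function signature is illustrative.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Zero each activation with probability p while training."""
    if training:
        return [0.0 if rng.random() < p else a for a in activations]
    # At test time keep all neurons and scale outputs by the keep probability.
    return [a * (1.0 - p) for a in activations]

rng = random.Random(0)
train_out = dropout([1.0] * 1000, p=0.5, training=True, rng=rng)
test_out = dropout([2.0, 4.0], p=0.5, training=False)
```

Roughly half of the training-time outputs are zeroed, and which half changes on every call, so no hidden neuron can rely on a specific other neuron being present.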

41 GPU Computing. Big data and big models require lots of computational power. GPUs: thousands of cores for parallel operations; multiple GPUs. Still took about 5-6 days to train AlexNet on two NVIDIA GTX 580 3GB GPUs (much faster today)

42 Recurrent Neural Networks. Networks with loops: the output of a layer is used as input for the same (or lower) layer. Can model dynamics (e.g. in space or time). Image credit: Christopher Olah's blog. Sepp Hochreiter (1991), Untersuchungen zu dynamischen neuronalen Netzen, Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber. Y. Bengio, P. Simard, P. Frasconi. Learning Long-Term Dependencies with Gradient Descent is Difficult. In TNN 1994.

43 Recurrent Neural Networks. Let's unroll the loops: now a standard feed-forward network with many layers. Suffers from the vanishing gradient problem. In theory, can learn long term memory; in practice not (Bengio et al, 1994). Image credit: Christopher Olah's blog. Sepp Hochreiter (1991), Untersuchungen zu dynamischen neuronalen Netzen, Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber. Y. Bengio, P. Simard, P. Frasconi. Learning Long-Term Dependencies with Gradient Descent is Difficult. In TNN 1994.
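
Unrolling can be illustrated with a scalar recurrent unit. The tanh activation, the update h_t = tanh(w_h·h_{t-1} + w_x·x_t), and the weight values are assumptions for this sketch; the slide does not specify the cell.

```python
import math

def rnn_unrolled(xs, w_x=0.5, w_h=0.5, h0=0.0):
    """Run a scalar RNN over xs and track d h_T / d h_0 along the way."""
    h, dh_dh0, states = h0, 1.0, []
    for x in xs:                          # one "layer" per time step
        h = math.tanh(w_h * h + w_x * x)
        dh_dh0 *= w_h * (1.0 - h * h)     # chain rule through this time step
        states.append(h)
    return states, dh_dh0

states, sensitivity = rnn_unrolled([1.0] * 20)
```

The loop is the unrolled network: twenty time steps behave like twenty layers sharing one weight, and the sensitivity of the last state to the first is a product of twenty factors, each below 0.5 here.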

44 Long Short Term Memory (LSTM). A type of RNN explicitly designed not to have the vanishing or exploding gradient problem. Models long-term dependencies. Memory is propagated and accessed by gates. Used for speech recognition, language modeling. Hochreiter, Sepp; and Schmidhuber, Jürgen. Long Short-Term Memory. Neural Computation. Image credit: Christopher Olah's blog

45 Unsupervised Neural Networks: Autoencoders. Encode then decode the same input; no supervision needed. Structure: input x → hidden layer → output x. H. Bourlard and Y. Kamp. Auto-association by multilayer perceptrons and singular value decomposition. Biol. Cybern. 59, 4-5 (September 1988)
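
A minimal linear autoencoder shows the idea of training on the input itself. Everything specific here is an assumption for illustration: two-dimensional inputs lying on the line (t, 2t), a single linear hidden unit, full-batch gradient descent, and the chosen learning rate and initialization.

```python
def train_autoencoder(data, lr=0.02, epochs=500):
    """Encode 2-D inputs to one hidden value, decode back, minimize squared error."""
    w_enc, w_dec = [0.1, 0.1], [0.1, 0.1]
    losses = []
    for _ in range(epochs):
        g_enc, g_dec, total = [0.0, 0.0], [0.0, 0.0], 0.0
        for x in data:
            h = w_enc[0] * x[0] + w_enc[1] * x[1]          # encode
            x_hat = [w_dec[0] * h, w_dec[1] * h]           # decode
            err = [x_hat[c] - x[c] for c in range(2)]      # the target is the input
            total += 0.5 * (err[0] ** 2 + err[1] ** 2)
            dh = err[0] * w_dec[0] + err[1] * w_dec[1]     # dLoss/dh
            for c in range(2):
                g_dec[c] += err[c] * h
                g_enc[c] += dh * x[c]
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
        losses.append(total)
    return w_enc, w_dec, losses

data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.5, 1.0)]
w_enc, w_dec, losses = train_autoencoder(data)
```

No labels appear anywhere: the reconstruction error against the input is the only training signal, and the one hidden value suffices because the data varies along a single direction.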
