Deep Neural Networks CMSC 422 MARINE CARPUAT. Deep learning slides credit: Vlad Morariu



1 Deep Neural Networks. CMSC 422, MARINE CARPUAT. Deep learning slides credit: Vlad Morariu

2 Training (Deep) Neural Networks. Computational graphs. Improvements to gradient descent: stochastic gradient descent, momentum, weight decay. Vanishing gradient problem. Examples of deep architectures.
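The three gradient descent improvements named in this outline can be combined in a single update rule. The following is a minimal sketch, not the course's implementation: the function names, the one-parameter model `w * x`, the toy data, and all hyperparameter values are assumptions chosen so the result is easy to check by hand.

```python
import random


def sgd_momentum(grad_fn, w, data, lr=0.1, momentum=0.9,
                 weight_decay=0.0, epochs=200, seed=0):
    """Per-example (stochastic) gradient descent with momentum and L2 weight decay."""
    rng = random.Random(seed)
    data = list(data)
    velocity = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order each epoch
        for x, y in data:
            # Weight decay adds weight_decay * w to the gradient of the loss.
            g = grad_fn(w, x, y) + weight_decay * w
            # Momentum keeps a decaying running sum of past update steps.
            velocity = momentum * velocity - lr * g
            w = w + velocity
    return w


def squared_error_grad(w, x, y):
    """Gradient of (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x


# Hypothetical toy data generated by y = 3 * x.
toy_data = [(1.0, 3.0), (-1.0, -3.0)]

w_plain = sgd_momentum(squared_error_grad, 0.0, toy_data)
w_decay = sgd_momentum(squared_error_grad, 0.0, toy_data, weight_decay=0.1)
```

Without weight decay the estimate approaches the generating slope 3. With `weight_decay=0.1` it settles at 6 / 2.1, slightly closer to zero, which is the shrinkage effect that weight decay is meant to have.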

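3 Vanishing Gradient Problem. In deep networks, gradients in the lower layers are typically extremely small. Optimizing multi-layer neural networks takes a huge amount of time. [Figure: sigmoid activation, output y plotted against input z.]

$$\frac{\partial E}{\partial w_{ki}} = \frac{\partial z_i}{\partial w_{ki}} \frac{d y_i}{d z_i} \frac{\partial E}{\partial y_i} = \frac{\partial z_i}{\partial w_{ki}} \frac{d y_i}{d z_i} \sum_j w_{ij} \frac{d y_j}{d z_j} \frac{\partial E}{\partial y_j}$$

Slide credit: adapted from Bohyung Han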
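To see why the chain of dy/dz factors makes lower-layer gradients small, the sketch below multiplies the sigmoid derivative along a chain of layers. It is a simplified illustration, assuming one unit per layer, a weight of 1 on every connection, and a pre-activation of 0 everywhere (where the sigmoid derivative takes its maximum value, 0.25); the function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """dy/dz for y = sigmoid(z); never larger than 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)


def backprop_factor(num_layers, z=0.0, w=1.0):
    """Product of w * dy/dz along a chain of `num_layers` sigmoid units."""
    factor = 1.0
    for _ in range(num_layers):
        factor *= w * sigmoid_grad(z)
    return factor


# Scaling applied to the error signal after passing back through n layers.
factors = {n: backprop_factor(n) for n in (1, 5, 10, 20)}
```

Even in this best case for the sigmoid, the factor after 20 layers is 0.25 ** 20, below 1e-12, so the lowest layers receive almost no learning signal.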

4 Vanishing Gradient Problem. The vanishing gradient problem can be mitigated by using other non-linearities, e.g., the rectifier f(x) = max(0, x), or by using custom neural network architectures, e.g., LSTM.
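The rectifier helps because its derivative is exactly 1 wherever the unit is active, so repeated multiplication does not shrink the signal. A minimal sketch follows; the value used for the derivative at x = 0 is a convention assumed here, and the depth and input value are arbitrary.

```python
def relu(x):
    """Rectifier: f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of the rectifier; taken to be 0 at x == 0 (an assumed convention)."""
    return 1.0 if x > 0 else 0.0


# Backpropagated factor through 20 layers whose units are all active (input > 0).
depth = 20
active_path = 1.0
for _ in range(depth):
    active_path *= relu_grad(2.0)
```

The factor stays at 1.0 along an active path. The trade-off is that a unit with negative input passes back a gradient of exactly 0.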

5 Training (Deep) Neural Networks. Computational graphs. Improvements to gradient descent: stochastic gradient descent, momentum, weight decay. Vanishing gradient problem. Examples of deep architectures.

6 An example of a deep neural network for computer vision: learn features and classifiers jointly ("end-to-end" training). [Figure labels: training, supervision, features, classifier.] Image credit: LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE.

7 New winter and revival in early 2000s. New winter in the early 2000s due to problems with training NNs: Support Vector Machines (SVMs) and Random Forests (RF) were easy to train and had nice theory. Revival again by: name change ("neural networks" -> "deep learning") + algorithmic developments (unsupervised pre-training, ReLU, dropout, layer normalization) + big data + GPU computing = large outperformance on many datasets (Vision: ILSVRC'12).

8 Big Data. ImageNet Large Scale Visual Recognition Challenge: 1000 categories w/ 1000 images per category; 1.2 million training images, 50,000 validation, 150,000 testing. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.

9 AlexNet Architecture. Figure credit: Krizhevsky et al, NIPS. [...] million parameters! Various tricks: ReLU nonlinearity; dropout (set hidden neuron output to 0 with probability 0.5); training on GPUs. Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.

10 GPU Computing. Big data and big models require lots of computational power. GPUs: thousands of cores for parallel operations; multiple GPUs. Still took about 5-6 days to train AlexNet on two NVIDIA GTX 580 3GB GPUs (much faster today).

11 Image Classification Performance. Image Classification Top-5 Errors (%). Figure from: K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. arXiv (slides). Slide credit: Bohyung Han

12 Speech Recognition. Slide credit: Bohyung Han

13 Recurrent Neural Networks for Language Modeling. Speech recognition is difficult due to ambiguity: "how to recognize speech" or "how to wreck a nice beach"? A language model gives the probability of the next word given the history: P("speech" | "how to recognize")?
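To make the quantity P(next word | history) concrete, here is a minimal count-based estimate. Note that this swaps in simple n-gram counting for the recurrent network the slide is about, purely to show what a language model outputs. The four-sentence corpus is made up for illustration, and the function names are assumptions.

```python
from collections import Counter, defaultdict


def train_counts(sentences, order=3):
    """Count which word follows each history of `order` consecutive words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(order, len(words)):
            history = tuple(words[i - order:i])
            counts[history][words[i]] += 1
    return counts


def next_word_prob(counts, history, word):
    """Relative-frequency estimate of P(word | history); 0.0 for unseen histories."""
    seen = counts.get(tuple(history))
    if not seen:
        return 0.0
    return seen[word] / sum(seen.values())


# Hypothetical toy corpus.
toy_corpus = [
    "how to recognize speech",
    "how to recognize speech",
    "how to recognize faces",
    "how to wreck a nice beach",
]
counts = train_counts(toy_corpus)
p_speech = next_word_prob(counts, ["how", "to", "recognize"], "speech")
```

In this toy corpus "speech" follows "how to recognize" in two of three occurrences, so `p_speech` is 2/3. A recurrent network replaces the lookup table with a hidden state that summarizes the history, which lets it generalize to histories it has never seen.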

14 Recurrent Neural Networks. Networks with loops: the output of a layer is used as input for the same (or lower) layer. Can model dynamics (e.g. in space or time). Loops are unrolled: now a standard feed-forward network with many layers. Suffers from the vanishing gradient problem: in theory it can learn long-term memory, in practice not (Bengio et al, 1994). Image credit: Christopher Olah's blog. Sepp Hochreiter (1991), Untersuchungen zu dynamischen neuronalen Netzen, Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber. Y. Bengio, P. Simard, P. Frasconi. Learning Long-Term Dependencies with Gradient Descent is Difficult. In TNN 1994.
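Unrolling a loop and the resulting vanishing gradient can both be seen in a scalar recurrent unit. The sketch below assumes the update h_t = tanh(w_x * x_t + w_h * h_{t-1}) with a single hidden value; the weights, the input sequence, and the function name are illustrative choices, not taken from the slides.

```python
import math


def rnn_unrolled(xs, w_x=0.5, w_h=0.9, h0=0.0):
    """Unroll a scalar RNN over the sequence `xs`.

    Returns the list of hidden states and the derivative of the last
    hidden state with respect to the initial state h0.
    """
    h = h0
    states = []
    grad = 1.0
    for x in xs:
        # The same weights are reused at every unrolled step.
        h = math.tanh(w_x * x + w_h * h)
        # Chain rule: d h_t / d h_{t-1} = w_h * (1 - h_t ** 2).
        grad *= w_h * (1.0 - h * h)
        states.append(h)
    return states, grad


states_short, grad_short = rnn_unrolled([1.0] * 5)
states_long, grad_long = rnn_unrolled([1.0] * 50)
```

Each step multiplies the gradient by a factor below 0.9 here, so after 50 steps the influence of the initial state on the final state is far smaller than after 5 steps. This is the difficulty with long-term dependencies described by Bengio et al.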

15 A Recurrent Neural Network Computational Graph

16 A Recurrent Neural Network Computational Graph

17 Long Short Term Memory (LSTM). Image credit: Christopher Olah's blog, LSTMs/. A type of RNN explicitly designed not to have the vanishing or exploding gradient problem. Models long-term dependencies. Memory is propagated and accessed by gates. Used for speech recognition, language modeling. Hochreiter, Sepp; and Schmidhuber, Jürgen. Long Short-Term Memory. Neural Computation, 1997.
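How gates propagate memory can be sketched with a scalar LSTM cell. The slides give no equations, so this assumes a commonly used formulation with forget, input, and output gates; the parameter layout, the parameter values, and the function names are hypothetical and chosen only to make the behavior visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    `params` maps a gate name to a tuple (w_x, w_h, b); this layout is
    an assumption made for the sketch.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much old memory to keep
    i = sigmoid(pre("input"))         # how much new content to write
    o = sigmoid(pre("output"))        # how much memory to expose
    g = math.tanh(pre("candidate"))   # proposed new content
    c = f * c_prev + i * g            # memory cell: additive update
    h = o * math.tanh(c)              # hidden state read through the output gate
    return h, c


# Hypothetical parameters: forget gate nearly open, input gate nearly closed.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for x in [0.3, -0.7, 0.2] * 10:
    h, c = lstm_step(x, h, c, params)
```

With the forget gate near 1 and the input gate near 0, the memory cell still holds a value close to its initial 1.0 after 30 steps. The additive update of `c` is what lets information, and gradients, survive over long spans.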

18 Long Short Term Memory (LSTM). Image credit: Christopher Olah's blog, LSTMs/

19 What you should know about deep neural networks. Why they are difficult to train: initialization; overfitting; vanishing gradient; require a large number of training examples. What can be done about it: improvements to gradient descent (stochastic gradient descent, momentum, weight decay); alternate non-linearities and new architectures. References (& great tutorials) if you want to explore further:

20 Keeping things in perspective. In 1958, the New York Times reported the perceptron to be "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."

21 Project 3. Due May 10. PCA, digit classification with neural networks. 2 important concepts: logistic regression, softmax classifier.
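The two concepts named for the project are closely related, which a short sketch can show. This is an illustration only, not the project's starter code; the function names and the example scores are assumptions.

```python
import math


def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logistic(z):
    """Logistic (sigmoid) function used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


probs = softmax([2.0, 1.0, 0.1])   # hypothetical scores for three classes
two_class = softmax([1.5, 0.0])    # two classes, second score fixed at 0
```

With two classes and the second score fixed at 0, the first softmax probability equals `logistic(1.5)`, so logistic regression can be read as the two-class special case of the softmax classifier.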
