Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks -

Size: px

Start display at page:

Download "Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks -"

Arline Edwards
5 years ago
Views:

1 Deep Learning: Theory, Techniques & Applicaions - Recurren Neural Neworks - Prof. Maeo Maeucci maeo.maeucci@polimi.i Deparmen of Elecronics, Informaion and Bioengineering Arificial Inelligence and Roboics Lab - Poliecnico di Milano

2 Sequence Modeling So far we have considered only «saic» daases X w 0 w x w ji g x w g K x w w JI 2

3 Sequence Modeling So far we have considered only «saic» daases X 0 X X 2 X 3 X x x x x x ime 3

4 Sequence Modeling Differen ways o deal wih «dynamic» daa: Memoryless models: Auoregressive models Feedforward neural neworks Models wih memory: Linear dynamical sysems Hidden Markov models Recurren Neural Neworks... X 0 x X x X 2 x X 0 X X 2 X 3 X X 3 x X x ime 4

5 Memoryless Models for Sequences Auoregressive models Predic he nex inpu from previous ones using «delay aps» W 2 X 0 X X 2 X 3 X W ime Feed forward neural neworks Generalize auoregressive models using non linear hidden layers W 2 Hidden W W X 0 X X 2 X 3 X ime 5

6 Hidden Hidden Hidden Dynamical Sysems Sochasic sysems... Generaive models wih a real-valued hidden sae which canno be observed direcly The hidden sae has some dynamics possibly affeced by noise and produces he oupu To compue he oupu has o infer hidden sae Inpu are reaed as driving inpus Y 0 Y Y In linear dynamical sysems his becomes: Sae coninuous wih Gaussian uncerainy Transformaions are assumed o be linear Sae can be esimaed using Kalman filering X 0 X X ime 6

7 Hidden Hidden Hidden Dynamical Sysems Sochasic sysems... Generaive models wih a real-valued hidden sae which canno be observed direcly The hidden sae has some dynamics possibly affeced by noise and produces he oupu To compue he oupu has o infer hidden sae Inpu are reaed as driving inpus Y 0 Y Y In hidden Markov models his becomes: Sae assumed o be discree, sae ransiions are sochasic (ransiion marix) Oupu is a sochasic funcion of hidden saes Sae can be esimaed via Vierbi algorihm. ime 7

Recurren Neural neworks Deerminisic sysems.

hidden sae allows o sore a informaion efficienly Non-linear dynamics

and ime, RNNs can compue anyhing ha can be compued by a compuer.

8 Recurren Neural neworks Deerminisic sysems... Inroduce memory by recurren connecions: w h j (x, W (),) Disribued hidden sae allows o sore a informaion efficienly Non-linear dynamics allows complex hidden sae updaes x w ji g x w w JI Wih enough neurons and ime, RNNs can compue anyhing ha can be compued by a compuer. c c x, W B, V B () (Compuaion Beyond he Turing Limi Hava T. Siegelmann, 995) c B x, W B, V B () c B 8

9 Recurren Neural neworks Inroduce memory by recurren connecions: w h j (x, W, V () ) Disribued hidden sae allows o sore a informaion efficienly Non-linear dynamics allows complex hidden sae updaes g x n w = g J j=0 B (2) w j hj + b=0 (2) v b cb x w ji w JI g x w h j = h j J j=0 w ji () xi,n + B b=0 () v jb cb c c x, W B, V B () c b = c b J j=0 B () v bi xi,n + b =0 () v bb cb c B c B x, W B, V B () 9

10 Backpropagaion Through Time w h j (x, W, V ) x w ji g x w w JI c c x, W B, V B () c B x, W B, V B () c B 0

11 Backpropagaion Through Time w h j (x, W, V ) x x x All hese weighs should be he same. x w ji g x w w JI c x, W B, V B () c B x, W B, V B ()

12 Backpropagaion Through Time Perform nework unroll for U seps Iniialize V, V B replicas o be he same Compue gradiens and updae replicas wih he average of heir gradiens x w w ji h j (x, W, V ) U V = V η U 0 U V u V B = V B η U 0 V B u w JI g x w c x, W B, V B () V B 3 V B 2 V B V B c B x, W B, V B () 2

13 How much should we go back in ime? Someime he oupu migh be relaed o some inpu happened quie long before x w w ji h j (x, W, V ) Jane walked ino he room. John walked in oo. I was lae in he day. Jane said hi o <???> g x w However backpropagaion hrough ime was no able o rain recurren neural neworks significanly back in ime... c w JI c x, W B, V B () Was due o no being able o backprop hrough many layers... c B c B x, W B, V B () 3

14 How much can we go back in ime? To beer undersand why i was no working le consider a simplified case: x h = g(v () h + w () x) y = w (2) g(h ) Backpropagaion over an enire sequence is compued as E S E w = w = E w = = E y h h k y h h k w h h k = i=k+ h i h i = i=k+ v g h i If we consider he norm of hese erms h i = v g h i h h i h k γ v γ g k If (γ v γ g ) < his converges o 0... Wih Sigmoids and Tanh we have vanishing gradiens 4

15 Dealing wih Vanishing Gradien Force all gradiens o be eiher 0 or g a = ReLu a = max 0, a g a = a>0 Build Recurren Neural Neworks using small modules ha are designed o remember values for a long ime. x h = v () h + w () x y = w (2) g(h ) v () = I only accumulaes he inpu... 5

Long-Shor Term Memories Hochreier & Schmidhuber (997) solved he problem of vanishing gradien designing a memory cell using logisic and linear unis wih muliplicaive ineracions: Informaion ges ino he

16 Long-Shor Term Memories Hochreier & Schmidhuber (997) solved he problem of vanishing gradien designing a memory cell using logisic and linear unis wih muliplicaive ineracions: Informaion ges ino he cell whenever is wrie gae is on. The informaion says in he cell so long as is keep gae is on. Informaion is read from he cell by urning on is read gae. Can backpropagae hrough his since he loop has fixed weigh. 6

17 RNN vs. LSTM RNN 7

18 Long Shor Term Memory LSTM 9

19 Long Shor Term Memory Inpu gae LSTM 20

20 Long Shor Term Memory Forge gae LSTM 2

21 Long Shor Term Memory Memory gae LSTM 22

22 Long Shor Term Memory Oupu gae LSTM 23

23 Gaed Recurren Uni (GRU) I combines he forge and inpu gaes ino a single updae gae. I also merges he cell sae and hidden sae, and makes some oher changes. 24

24 Hidden Hidden Hidden LSTM Neworks You can build a compuaion graph wih coninuous ransformaions. Y 0 Y Y X 0 X X 25

25 Hidden Hidden Hidden LSTM Neworks You can build a compuaion graph wih coninuous ransformaions. Y 0 Y Y X 0 X X 26

26 LSTM LSTM LSTM LSTM LSTM LSTM ReLu ReLu ReLu LSTM Neworks You can build a compuaion graph wih coninuous ransformaions. Y 0 Y Y X 0 X X 27

Sequenial Daa Problems Fixed-sized inpu o fixed-sized oupu (e.g. image classificaion) Sequence oupu (e.g. image capioning akes an image and oupus a senence of words). Sequence inpu (e.g. senimen analysis where a given senence is classified as expressing posiive or negaive senimen).

27 Sequenial Daa Problems Fixed-sized inpu o fixed-sized oupu (e.g. image classificaion) Sequence oupu (e.g. image capioning akes an image and oupus a senence of words). Sequence inpu (e.g. senimen analysis where a given senence is classified as expressing posiive or negaive senimen). Sequence inpu and sequence oupu (e.g. Machine Translaion: an RNN reads a senence in English and hen oupus a senence in French) Synced sequence inpu and oupu (e.g. video classificaion where we wish o label each frame of he video) Credis: Andrej Karpahy 28

28 Sequence o Sequence Modeling Given <S, T> pairs, read S, and oupu T ha bes maches T 29

29 Tips & Tricks When condiioning on a full inpu sequence Bidirecional RNNs can exploi i: Have one RNNs raverse he sequence lef-o-righ Have anoher RNN raverse he sequence righ-o-lef Use concaenaion of hidden layers as feaure represenaion When iniializing RNN we need o specify he iniial sae Could iniialize hem o a fixed value (such as 0) Beer o rea he iniial sae as learned parameers Sar off wih random guesses of he iniial sae values Backpropagae he predicion error hrough ime all he way o he iniial sae values and compue he gradien of he error wih respec o hese Updae hese parameers by gradien descen 30

30 Acknowledgemens This slides are highly based on maerial aken from: Jeoffrey Hinon Hugo Larochelle Andrej Karpahy Nando De Freias Chris Olah You can find more deails on he original slides The amazing images on LSTM cell are aken from Chris Hola s blog: hp://colah.gihub.io/poss/ undersanding-lstms/ 3 3

31 Deep Learning: Theory, Techniques & Applicaions - Recurren Neural Neworks - Prof. Maeo Maeucci maeo.maeucci@polimi.i Deparmen of Elecronics, Informaion and Bioengineering Arificial Inelligence and Roboics Lab - Poliecnico di Milano

Learning to Process Natural Language in Big Data Environment

Learning to Process Natural Language in Big Data Environment CCF ADL 2015 Nanchang Oc 11, 2015 Learning o Process Naural Language in Big Daa Environmen Hang Li Noah s Ark Lab Huawei Technologies Par 2: Useful Deep Learning Tools Powerful Deep Learning Tools (Unsupervised