RECURRENT NETWORKS I. Philipp Krähenbühl


1 RECURRENT NETWORKS I Philipp Krähenbühl

2 RECAP: CLASSIFICATION [Figure: conv 1 → conv 2 → conv 3 → conv → "tux"]

3 RECAP: SEGMENTATION [Figure: conv 1 → conv 2 → conv 3 → conv 4]

4 RECAP: DETECTION [Figure: conv 1 → conv 2 → conv 3 → conv 4]

5 RECAP: GENERATION [Figure: noise → conv 1 → conv 2 → conv 3 → conv 4]

6 FEED FORWARD NETWORKS
Order of computation: conv 1 → conv 2 → conv 3 → conv → "tux"

7 FEED FORWARD NETWORKS
Fixed order of computation: lower to upper layers (conv 1 → conv 2 → conv 3 → conv → "tux")
Once we have the result, discard all activations

8 WOULD YOU USE THIS TO DRIVE A CAR?

9 WOULD YOU USE THIS TO DRIVE A CAR? [Figure: conv 1 → conv 2 → conv 3 network]

10 WOULD YOU USE THIS TO DRIVE A CAR?
Probably not
Independent decision for each frame
No state or memory
Real world: no
For SuperTuxKart it might still be OK
[Figure: conv 1 → conv 2 → conv 3 network]

11 HOW DO WE KEEP A STATE AROUND? [Figure: several copies of the conv 1 → conv 2 → conv 3 network]

12 HOW DO WE KEEP A STATE AROUND? [Figure: several copies of the conv 1 → conv 2 → conv 3 network]

13 RECURRENT NEURAL NETWORK (RNN)
State update: $h_t = f_h(x_t, h_{t-1}, \theta_h)$
Output: $y_t = f_y(x_t, h_t, \theta_y)$
[Figure: x → h → y, with a recurrent connection on h]

14 ELMAN NETWORKS
State update: $h_t = f_h(x_t, h_{t-1}, \theta_h) = \sigma(U_h h_{t-1} + W_h x_t + b_h)$
Output: $y_t = f_y(x_t, h_t, \theta_y) = \sigma(W_y h_t + b_y)$
[Figure: x → h (sigmoid) → y (sigmoid)]
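To make the Elman update concrete, here is a minimal NumPy sketch of a single time step. The function names (`elman_step`, `init_elman`), the layer sizes, and the small random initialization are illustrative assumptions, not part of the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elman_step(x_t, h_prev, params):
    """One Elman step: the new state depends on the previous state h_{t-1}."""
    U_h, W_h, b_h, W_y, b_y = params
    h_t = sigmoid(U_h @ h_prev + W_h @ x_t + b_h)  # state update
    y_t = sigmoid(W_y @ h_t + b_y)                 # output
    return h_t, y_t

def init_elman(n_in, n_hidden, n_out, seed=0):
    # Hypothetical initialization: small Gaussian weights, zero biases.
    rng = np.random.default_rng(seed)
    return (
        rng.normal(scale=0.1, size=(n_hidden, n_hidden)),  # U_h
        rng.normal(scale=0.1, size=(n_hidden, n_in)),      # W_h
        np.zeros(n_hidden),                                # b_h
        rng.normal(scale=0.1, size=(n_out, n_hidden)),     # W_y
        np.zeros(n_out),                                   # b_y
    )

params = init_elman(n_in=3, n_hidden=4, n_out=2)
h, y = elman_step(np.ones(3), np.zeros(4), params)
```

Calling `elman_step` repeatedly and feeding each returned `h` back in as `h_prev` gives the recurrent behaviour.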

15 JORDAN NETWORKS
State update: $h_t = f_h(x_t, h_{t-1}, \theta_h) = \sigma(U_h y_{t-1} + W_h x_t + b_h)$
Output: $y_t = f_y(x_t, h_t, \theta_y) = \sigma(W_y h_t + b_y)$
[Figure: x → h (sigmoid) → y (sigmoid)]
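The Jordan variant feeds the previous output rather than the previous hidden state back into the state update. A rough NumPy sketch, assuming illustrative sizes and a hypothetical `jordan_step` helper; note that the recurrent matrix now has shape (hidden, output):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def jordan_step(x_t, y_prev, U_h, W_h, b_h, W_y, b_y):
    """One Jordan step: the recurrence goes through the previous output y_{t-1}."""
    h_t = sigmoid(U_h @ y_prev + W_h @ x_t + b_h)
    y_t = sigmoid(W_y @ h_t + b_y)
    return h_t, y_t

# Illustrative sizes and random weights (assumptions, not from the slides).
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 4, 2
U_h = rng.normal(scale=0.1, size=(n_hidden, n_out))
W_h = rng.normal(scale=0.1, size=(n_hidden, n_in))
b_h = np.zeros(n_hidden)
W_y = rng.normal(scale=0.1, size=(n_out, n_hidden))
b_y = np.zeros(n_out)

y = np.zeros(n_out)  # initial "previous output"
for x_t in rng.normal(size=(5, n_in)):
    h, y = jordan_step(x_t, y, U_h, W_h, b_h, W_y, b_y)
```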

16 HOW DO WE TRAIN RNNS?
State update: $h_t = f_h(x_t, h_{t-1}, \theta_h)$
Output: $y_t = f_y(x_t, h_t, \theta_y)$
[Figure: x → h → y, with a recurrent connection on h]

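17 UNROLLING THROUGH TIME [Figure: RNN unrolled over time steps, with outputs y_0, y_1, y_2, ..., y_t]

18 UNROLLING THROUGH TIME
Unrolled RNN: a feed-forward network
Shared parameters
Trained with back-prop
[Figure: RNN unrolled over time steps, with outputs y_0, y_1, y_2, ..., y_t]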
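Unrolling can be pictured as a plain loop in which every iteration is one layer of a feed-forward network that reuses the same parameters. The sketch below covers only the forward pass (back-prop is not implemented) and assumes a sigmoid state update with illustrative sizes; `unroll` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unroll(xs, h0, U_h, W_h, b_h):
    """Run the state update over a whole sequence; returns states of shape (T, n_hidden)."""
    h = h0
    states = []
    for x_t in xs:  # each iteration is one layer of the unrolled network
        h = sigmoid(U_h @ h + W_h @ x_t + b_h)  # the same U_h, W_h, b_h at every step
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
U_h = rng.normal(scale=0.1, size=(4, 4))
W_h = rng.normal(scale=0.1, size=(4, 3))
b_h = np.zeros(4)
xs = rng.normal(size=(6, 3))  # a sequence of 6 inputs
states = unroll(xs, np.zeros(4), U_h, W_h, b_h)
```

Because the parameters are shared, the gradient with respect to `U_h`, `W_h`, and `b_h` is the sum of the contributions from every time step.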

19 UNROLLING THROUGH TIME - ISSUES
Long unrolling: vanishing or exploding gradients
Very long unrolling: computationally expensive
[Figure: RNN unrolled over time steps, with outputs y_0, y_1, y_2, ..., y_t]

20 VERY LONG UNROLLING
Solution (hack)
During training: cut the RNN (set h = 0) after n timesteps
Often still trains well in practice
[Figure: RNN unrolled over time steps, with outputs y_0, y_1, y_2, ..., y_t]
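One way to read the cut is as a reset of the state to zero every n steps, so that no information (and hence no gradient) crosses a chunk boundary. The sketch below shows only that reset logic in the forward pass; `run_with_cuts` and `toy_step` are hypothetical names, and the toy update stands in for a real RNN cell.

```python
import numpy as np

def run_with_cuts(xs, n, step, n_hidden):
    """Run `step` over xs, resetting the state to zero every n time steps."""
    h = np.zeros(n_hidden)
    states = []
    for t, x_t in enumerate(xs):
        if t > 0 and t % n == 0:
            h = np.zeros(n_hidden)  # cut: nothing from earlier steps carries over
        h = step(x_t, h)
        states.append(h)
    return np.stack(states)

def toy_step(x_t, h_prev):
    # Hypothetical stand-in for a real state update.
    return np.tanh(0.5 * h_prev + x_t)

xs = np.ones((6, 2))  # six identical inputs
states = run_with_cuts(xs, n=3, step=toy_step, n_hidden=2)
```

With identical inputs, the state right after a cut (step 3) matches the state at step 0, which confirms that the history was dropped.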

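21 EXPLODING AND VANISHING GRADIENTS
$\frac{\partial h_t}{\partial h_{t-1}} \approx \alpha$, so $\frac{\partial h_n}{\partial h_0} \approx \alpha^n$
Vanishing gradients: $\alpha < 1$: $\alpha^n \to 0$
Exploding gradients: $\alpha > 1$: $\alpha^n \to \infty$
[Figure: x → h → y, with a recurrent connection on h]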
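A quick numeric check shows how fast the per-step factor compounds. The values 0.9, 1.1, and 100 steps are arbitrary choices for illustration.

```python
def gradient_scale(alpha, n):
    """Scale of the gradient after n steps if each step multiplies it by alpha."""
    return alpha ** n

vanishing = gradient_scale(0.9, 100)  # roughly 2.7e-5
exploding = gradient_scale(1.1, 100)  # roughly 1.4e4
print(vanishing, exploding)
```

Even a per-step factor only 10% away from 1 changes the gradient by four to five orders of magnitude over 100 steps.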

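22 EXPLODING AND VANISHING GRADIENTS
Exploding gradients: gradient clipping (hack)
$\frac{\partial l}{\partial h_{t-1}} = \mathrm{clip}\left(\frac{\partial l}{\partial h_t} \frac{\partial h_t}{\partial h_{t-1}}, -\epsilon, \epsilon\right)$
Vanishing gradients: different RNN structure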
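The clipping rule can be sketched as one backward step that clips the gradient elementwise to [-ε, ε] before passing it further back in time. The Jacobian `3 * I` below is a made-up example of an exploding recurrence, and `clipped_backward_step` is a hypothetical helper name.

```python
import numpy as np

def clipped_backward_step(grad_h_t, jac_t, eps):
    """Back-propagate dl/dh_t through jac_t = dh_t/dh_{t-1}, then clip elementwise."""
    return np.clip(jac_t.T @ grad_h_t, -eps, eps)

jac = 3.0 * np.eye(2)      # every step triples the gradient
grad_clipped = np.ones(2)
grad_raw = np.ones(2)
for _ in range(5):
    grad_clipped = clipped_backward_step(grad_clipped, jac, eps=1.0)
    grad_raw = jac.T @ grad_raw  # same recurrence without clipping
```

After five steps the unclipped gradient has grown to 3^5 = 243 per entry, while the clipped one stays at the bound of 1.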

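23 LSTM Long short-term memory
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tau(W_c x_t + U_c h_{t-1} + b_c)$
$h_t = o_t \odot \tau(c_t)$
where $\sigma$ is the sigmoid and $\tau$ is tanh
[Figure: LSTM cell with inputs x_t, h_{t-1}, c_{t-1}, gates f_t, i_t, o_t, and outputs c_t, h_t]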
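The LSTM equations translate almost line by line into NumPy. This is a sketch under assumptions: the parameter dictionary layout, the `init_lstm` helper, the sizes, and the initialization are all illustrative, and the products between gates and states are taken elementwise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; returns the new hidden state h_t and cell state c_t."""
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])        # forget gate
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])        # input gate
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])        # output gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # candidate update
    c_t = f * c_prev + i * c_tilde
    h_t = o * np.tanh(c_t)
    return h_t, c_t

def init_lstm(n_in, n_hidden, seed=0):
    # Hypothetical initialization: small Gaussian weights, zero biases.
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "fioc":
        p["W_" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["U_" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b_" + gate] = np.zeros(n_hidden)
    return p

p = init_lstm(n_in=3, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
for x_t in np.ones((5, 3)):
    h, c = lstm_step(x_t, h, c, p)
```

Pushing the forget gate towards 1 and the input gate towards 0 (for example through large positive `b_f` and large negative `b_i`) makes `c_t` a near copy of `c_prev`.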

24 LSTM Long short-term memory
Cell state c
Allows for information to just flow through nearly unchanged
[Figure: LSTM cell diagram]

25 LSTM Long short-term memory
Forget gate f
Clears the cell state
[Figure: LSTM cell diagram]

26 LSTM Long short-term memory
Input gate i
Allows a state update (or not)
Input: $x_t$ and $h_{t-1}$ (the previous hidden state)
[Figure: LSTM cell diagram]

27 LSTM Long short-term memory
Output gate o
Should we produce an output?
Output: tanh of cell state
[Figure: LSTM cell diagram]

28 LSTM Long short-term memory
Can learn to keep state for up to 100 time steps
Fewer vanishing gradients
Trained by unrolling through time
[Figure: LSTM cell diagram]

29 GRU Gated Recurrent Unit
$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$
$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$
$\tilde{h}_t = \tau(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
[Figure: GRU cell with inputs x_t, h_{t-1}, gates z_t, r_t, and output h_t]
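A GRU step can be sketched the same way. The dictionary layout, the `init_gru` helper, the sizes, and the initialization are assumptions for illustration; gate products are elementwise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU step; returns the new state h_t."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])              # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])              # reset gate
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde

def init_gru(n_in, n_hidden, seed=0):
    # Hypothetical initialization: small Gaussian weights, zero biases.
    rng = np.random.default_rng(seed)
    p = {}
    for name in "zrh":
        p["W_" + name] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["U_" + name] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b_" + name] = np.zeros(n_hidden)
    return p

p = init_gru(n_in=3, n_hidden=4)
h = np.zeros(4)
for x_t in np.ones((5, 3)):
    h = gru_step(x_t, h, p)
```

The single update gate `z` interpolates between keeping the old state and taking the candidate, so a GRU needs no separate cell state.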

30 GRU Gated Recurrent Unit
Similar performance to LSTM
Almost the same state update
Fewer gates
[Figure: GRU cell diagram]

31 SUMMARY
Training RNNs: unroll in time + backprop
Exploding gradients: clip
Vanishing gradients (no long-term interactions): use LSTM or GRU
