Gianluca Pollastri, Head of Lab, School of Computer Science and Informatics and Complex and Adaptive Systems Labs, University College Dublin


1 Introduction to Neural Networks Gianluca Pollastri, Head of Lab, School of Computer Science and Informatics and Complex and Adaptive Systems Labs, University College Dublin


2 Credits Geoffrey Hinton, University of Toronto. I borrowed some of his slides for Neural Networks and Computation in Neural Networks courses. Paolo Frasconi, University of Florence. This guy taught me Neural Networks in the first place (*and* I borrowed some of his slides too!).

3 Recurrent Neural Networks (RNN) One of the earliest versions: Jeffrey Elman, 1990, Cognitive Science. Problem: it isn't easy to represent time with Feedforward Neural Nets: usually time is represented with space. Attempt to design networks with memory.

4 RNNs The idea is having discrete time steps, and considering the hidden layer at time t-1 as an input at time t. This effectively removes cycles: we can model the network using an FFNN, and model memory explicitly.

5 [Figure: recurrent network with input It, memory Xt, and output Ot; d = delay element]

6 BPTT BackPropagation Through Time. If Ot is the output at time t, It the input at time t, and Xt the memory (hidden) at time t, we can model the dependencies as follows:
Xt = f( Xt-1, It )
Ot = g( Xt, It )

7 BPTT We can model both f() and g() with (possibly multilayered) networks. We can transform the recurrent network by unrolling it in time. Backpropagation works on any DAG. An RNN becomes one once it's unrolled.
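The unrolling described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: f() is taken to be a single tanh layer and g() a single linear layer, and the weight names (W_x, W_i, V_x, V_i) and the helper unrolled_forward are hypothetical, not taken from the slides. The loop applies the same f() and g() at every step, so the computation is a plain feed-forward chain of length T.

```python
import numpy as np

def f(x_prev, i_t, W_x, W_i):
    """State transition X_t = f(X_{t-1}, I_t); assumed here to be one tanh layer."""
    return np.tanh(W_x @ x_prev + W_i @ i_t)

def g(x_t, i_t, V_x, V_i):
    """Output function O_t = g(X_t, I_t); assumed here to be one linear layer."""
    return V_x @ x_t + V_i @ i_t

def unrolled_forward(inputs, params, hidden_size):
    """Unroll the recurrence over the whole sequence, starting from X_0 = 0."""
    W_x, W_i, V_x, V_i = params
    x = np.zeros(hidden_size)
    states, outputs = [], []
    for i_t in inputs:                 # one copy of f() and g() per time step
        x = f(x, i_t, W_x, W_i)        # memory at t depends on memory at t-1
        states.append(x)
        outputs.append(g(x, i_t, V_x, V_i))
    return np.array(states), np.array(outputs)

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 4, 2, 5
params = (0.5 * rng.normal(size=(n_hid, n_hid)),
          0.5 * rng.normal(size=(n_hid, n_in)),
          0.5 * rng.normal(size=(n_out, n_hid)),
          0.5 * rng.normal(size=(n_out, n_in)))
inputs = rng.normal(size=(T, n_in))
states, outputs = unrolled_forward(inputs, params, n_hid)
print(states.shape, outputs.shape)
```

Because the weights are shared across steps, the unrolled graph has T copies of the same two networks and no cycles, which is what lets ordinary backpropagation be applied to it.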


9 [Figure: the network unrolled in time, with outputs Ot-2 .. Ot+2, memories Xt-2 .. Xt+2, and inputs It-2 .. It+2]

10 Gradient in BPTT

GRADIENT(I, O, T) {   # I = inputs, O = outputs, T = targets
    T := size(O);
    X0 := 0;
    for t := 1..T
        Xt := f( Xt-1, It );
    for t := 1..T {
        Ot := g( Xt, It );
        g.gradient( Ot - Tt );
        δt := g.deltas( Ot - Tt );
    }
    for t := T..1 {
        f.gradient( δt );
        δt-1 += f.deltas( δt );
    }
}
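As a concrete but simplified instance of the procedure above, here is a NumPy sketch that follows the same three passes (forward states, output deltas, backward accumulation through time). The assumptions are mine, not the slides': f() is one tanh layer, g() is a linear layer that reads only Xt, the loss is the summed squared error, and the names bptt_gradient and numerical_grad are hypothetical. A finite-difference check is included so the analytic gradients can be compared against numerical ones.

```python
import numpy as np

def bptt_gradient(inputs, targets, W_x, W_i, V):
    """Return the loss and its gradients w.r.t. W_x, W_i and V."""
    T, n_hid = len(inputs), W_x.shape[0]
    X = np.zeros((T + 1, n_hid))                 # X[0] is the initial memory, X_0 = 0
    for t in range(1, T + 1):                    # forward pass: X_t = f(X_{t-1}, I_t)
        X[t] = np.tanh(W_x @ X[t - 1] + W_i @ inputs[t - 1])
    err = X[1:] @ V.T - targets                  # O_t - T_t for every t
    loss = 0.5 * np.sum(err ** 2)
    dV = err.T @ X[1:]                           # gradient of the output network g
    delta = np.zeros((T + 1, n_hid))
    delta[1:] = err @ V                          # deltas that g sends down to each X_t
    dW_x, dW_i = np.zeros_like(W_x), np.zeros_like(W_i)
    for t in range(T, 0, -1):                    # backward in time
        pre = delta[t] * (1.0 - X[t] ** 2)       # back through the tanh of f
        dW_x += np.outer(pre, X[t - 1])
        dW_i += np.outer(pre, inputs[t - 1])
        delta[t - 1] += W_x.T @ pre              # pass the delta on to X_{t-1}
    return loss, dW_x, dW_i, dV

def numerical_grad(param, loss_fn, eps=1e-6):
    """Central finite differences; param is perturbed in place and restored."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        loss_plus = loss_fn()
        param[idx] = old - eps
        loss_minus = loss_fn()
        param[idx] = old
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
W_x = 0.5 * rng.normal(size=(4, 4))
W_i = 0.5 * rng.normal(size=(4, 3))
V = 0.5 * rng.normal(size=(2, 4))
inputs, targets = rng.normal(size=(6, 3)), rng.normal(size=(6, 2))

loss, dW_x, dW_i, dV = bptt_gradient(inputs, targets, W_x, W_i, V)
loss_fn = lambda: bptt_gradient(inputs, targets, W_x, W_i, V)[0]
analytic = {"W_x": dW_x, "W_i": dW_i, "V": dV}
weights = {"W_x": W_x, "W_i": W_i, "V": V}
errors = {k: np.max(np.abs(analytic[k] - numerical_grad(w, loss_fn)))
          for k, w in weights.items()}
print(errors)
```

The `delta[t - 1] += ...` line plays the role of the `δt-1 += f.deltas( δt )` step: each memory receives one delta from its own output and one from the memory that follows it.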


24 What I will talk about
- Neurons
- Multi-Layered Neural Networks: basic learning algorithm, expressive power, classification
- How can we *actually* train Neural Networks: speeding up training, learning just right (not too little, not too much), figuring out you got it right
- Feed-back networks? Anecdotes on real feed-back networks (Hopfield Nets, Boltzmann Machines)
- Recurrent Neural Networks
- Bidirectional RNN
- 2D-RNN
- Concluding remarks

25 Bidirectional Recurrent Neural Networks (BRNN)

26 BRNN
Ft = φ( Ft-1, Ut )
Bt = β( Bt+1, Ut )
Yt = η( Ft, Bt, Ut )
φ(), β() and η() are realised with NNs. φ(), β() and η() are independent of t: stationary.


30 Inference in BRNNs

FORWARD(U) {
    T := size(U);
    F0 := BT+1 := 0;
    for t := 1..T
        Ft := φ( Ft-1, Ut );
    for t := T..1
        Bt := β( Bt+1, Ut );
    for t := 1..T
        Yt := η( Ft, Bt, Ut );
    return Y;
}
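The inference procedure above might look as follows in NumPy. This is a minimal sketch under my own assumptions: the forward and backward chains are single tanh layers, the output network is linear, and the parameter names (Wf, Uf, Wb, Ub, Vf, Vb, Vu) and the function brnn_forward are hypothetical. Arrays are padded by one slot on each side so that F[0] and B[T+1] hold the zero boundary states.

```python
import numpy as np

def brnn_forward(U, P):
    """Compute outputs Y_1..Y_T from inputs U_1..U_T (U[t-1] holds U_t)."""
    T = len(U)
    F = np.zeros((T + 2, P["Wf"].shape[0]))      # F[0] = 0 is the left boundary
    B = np.zeros((T + 2, P["Wb"].shape[0]))      # B[T+1] = 0 is the right boundary
    for t in range(1, T + 1):                    # forward chain, left to right
        F[t] = np.tanh(P["Wf"] @ F[t - 1] + P["Uf"] @ U[t - 1])
    for t in range(T, 0, -1):                    # backward chain, right to left
        B[t] = np.tanh(P["Wb"] @ B[t + 1] + P["Ub"] @ U[t - 1])
    Y = np.array([P["Vf"] @ F[t] + P["Vb"] @ B[t] + P["Vu"] @ U[t - 1]
                  for t in range(1, T + 1)])     # output combines both contexts
    return Y, F, B

rng = np.random.default_rng(0)
n_in, n_f, n_b, n_out, T = 3, 4, 4, 2, 4
shapes = {"Wf": (n_f, n_f), "Uf": (n_f, n_in),
          "Wb": (n_b, n_b), "Ub": (n_b, n_in),
          "Vf": (n_out, n_f), "Vb": (n_out, n_b), "Vu": (n_out, n_in)}
P = {name: 0.5 * rng.normal(size=shape) for name, shape in shapes.items()}
U = rng.normal(size=(T, n_in))
Y, F, B = brnn_forward(U, P)
print(Y.shape)
```

Unlike a one-directional RNN, every output here can be influenced by the whole sequence: changing the last input alters the first output through the backward chain, and changing the first input alters the last output through the forward chain.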

31 Learning in BRNNs

GRADIENT(U, Ȳ) {   # U = inputs, Ȳ = targets
    T := size(U);
    F0 := BT+1 := 0;
    for t := 1..T
        Ft := φ( Ft-1, Ut );
    for t := T..1
        Bt := β( Bt+1, Ut );
    for t := 1..T {
        Yt := η( Ft, Bt, Ut );
        [δFt, δBt] := η.backprop&gradient( Yt - Ȳt );
    }
    for t := T..1
        δFt-1 += φ.backprop&gradient( δFt );
    for t := 1..T
        δBt+1 += β.backprop&gradient( δBt );
}
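To make the order of the loops above concrete, the following NumPy sketch computes BRNN gradients and compares them with finite differences. It rests on assumptions that are not in the slides: single tanh layers for the two chains, a linear output network, a summed squared-error loss, and hypothetical names (brnn_loss_and_grads, max_gradient_error, and the parameter keys). Note the directions: deltas for the forward chain are propagated from T down to 1, deltas for the backward chain from 1 up to T.

```python
import numpy as np

def brnn_loss_and_grads(U, Ybar, P):
    """Loss and gradients for a BRNN; U[t-1] and Ybar[t-1] hold U_t and the target at t."""
    T = len(U)
    F = np.zeros((T + 2, P["Wf"].shape[0]))      # F[0] = 0
    B = np.zeros((T + 2, P["Wb"].shape[0]))      # B[T+1] = 0
    for t in range(1, T + 1):
        F[t] = np.tanh(P["Wf"] @ F[t - 1] + P["Uf"] @ U[t - 1])
    for t in range(T, 0, -1):
        B[t] = np.tanh(P["Wb"] @ B[t + 1] + P["Ub"] @ U[t - 1])
    G = {name: np.zeros_like(W) for name, W in P.items()}
    dF, dB = np.zeros_like(F), np.zeros_like(B)
    loss = 0.0
    for t in range(1, T + 1):                    # output network: gradient and deltas
        err = P["Vf"] @ F[t] + P["Vb"] @ B[t] + P["Vu"] @ U[t - 1] - Ybar[t - 1]
        loss += 0.5 * err @ err
        G["Vf"] += np.outer(err, F[t])
        G["Vb"] += np.outer(err, B[t])
        G["Vu"] += np.outer(err, U[t - 1])
        dF[t], dB[t] = P["Vf"].T @ err, P["Vb"].T @ err
    for t in range(T, 0, -1):                    # forward chain: deltas flow to t-1
        pre = dF[t] * (1.0 - F[t] ** 2)
        G["Wf"] += np.outer(pre, F[t - 1])
        G["Uf"] += np.outer(pre, U[t - 1])
        dF[t - 1] += P["Wf"].T @ pre
    for t in range(1, T + 1):                    # backward chain: deltas flow to t+1
        pre = dB[t] * (1.0 - B[t] ** 2)
        G["Wb"] += np.outer(pre, B[t + 1])
        G["Ub"] += np.outer(pre, U[t - 1])
        dB[t + 1] += P["Wb"].T @ pre
    return loss, G

def max_gradient_error(U, Ybar, P, eps=1e-6):
    """Largest gap between analytic and central-difference gradients."""
    _, G = brnn_loss_and_grads(U, Ybar, P)
    worst = 0.0
    for name, W in P.items():
        for idx in np.ndindex(W.shape):
            old = W[idx]
            W[idx] = old + eps
            loss_plus = brnn_loss_and_grads(U, Ybar, P)[0]
            W[idx] = old - eps
            loss_minus = brnn_loss_and_grads(U, Ybar, P)[0]
            W[idx] = old
            numeric = (loss_plus - loss_minus) / (2 * eps)
            worst = max(worst, abs(numeric - G[name][idx]))
    return worst

rng = np.random.default_rng(0)
n_in, n_f, n_b, n_out, T = 3, 4, 4, 2, 5
shapes = {"Wf": (n_f, n_f), "Uf": (n_f, n_in),
          "Wb": (n_b, n_b), "Ub": (n_b, n_in),
          "Vf": (n_out, n_f), "Vb": (n_out, n_b), "Vu": (n_out, n_in)}
P = {name: 0.5 * rng.normal(size=shape) for name, shape in shapes.items()}
U, Ybar = rng.normal(size=(T, n_in)), rng.normal(size=(T, n_out))
worst = max_gradient_error(U, Ybar, P)
print(worst)
```

The finite-difference comparison is a cheap way to catch an index or direction mistake in either chain before training on real data.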


33 2D RNNs Pollastri & Baldi 2002, Bioinformatics; Baldi & Pollastri 2003, JMLR

