Neural Network Back-propagation HYUNG IL KOO

Size: px

Start display at page:

Download "Neural Network Back-propagation HYUNG IL KOO"

Patricia Glenn
5 years ago
Views:

1 Neural Network Back-propagation HYUNG IL KOO

2 Hidden Layer Representations Backpropagation has an ability to discover useful intermediate representations at the hidden unit layers inside the networks which capture properties of the input spaces that are most relevant to learning the target function. When more layers of units are used in the network, more complex features can be invented. But the representations of the hidden layers are very hard to understand for humans.

3 Basic Math

4 Optimization Find x that minimizes f(x) If f(x) is differentiable, But, in many cases, solving the above equation is a still difficult problem.

5 Gradient descent f(x 1 ) y = f(x) f(x 2 ) x 1 x 2 x 3

6 Chain Rule Chain rule with a single variable Chain Rule (multiple variables) w = f x, y, z => dw = f dx + f dy + f dz dt x dt y dt z dt Δw f x Δx + f y Δy + f z Δz

7 Feed-forward neural network

8 Feed forward network example: 1 st layer w 11 w 12w21 +1 w 22 w 31 w 32 The 1st hidden layer Non-linear function

9 Feed forward network example: 2 nd layer u 11 u 21 u 22 u 12 u u 23 The 1st hidden layer +1 The 2 nd hidden layer

10 Layer 1 Layer 2 Soft Max Forward propagation The 1st hidden layer The 2 nd hidden layer Output

11 Layer 1 Layer 2 Soft Max Forward propagation matrix repr. The 1st hidden layer The 2 nd hidden layer Output

12 Back-propagation algorithm

13 Weight update method Parameter update : Gradient Descent ` Loss function (L) Learning rate

14 Layer 1 Layer 2 Soft Max Dataflow diagram VS The 1st hidden layer The 2 nd hidden layer Output Ground Truth

15 Soft Max Back-propagation step; Loss function Output Ground Truth Computing Loss function(l): ex) Cross entropy

16 Layer 1 Layer 2 Soft Max Layer 1 Layer 2 Soft Max Overview 1 0 The 1st hidden layer The 2 nd hidden layer Output Ground Truth Loss function

17 Layer 2 Soft Max Back-propagation; 2 nd layer The 1st hidden layer The 2 nd hidden layer Output Ground Truth Weight update the Layer 2 has to do Error propagation

18 Layer 2 Soft Max Back-propagation; 2 nd layer The 1st hidden layer The 2 nd hidden layer Output Ground Truth Weight update the Layer 2 has to do Error propagation

19 Error propagation (feed forward network) The 1st hidden layer The 2 nd hidden layer z 1 and z 2 are from its upper layer.

20 Weight updates (feed forward network) The 1st hidden layer The 2 nd hidden layer z 1 and z 2 are from its upper layer.

21 Layer 1 Layer 2 Soft Max Back propagation; 1 st layer The 1st hidden layer The 2 nd hidden layer Output Ground Truth the Layer 1 has to do Weight update w new ij = w old ij μ w ij Error propagation, Input update???

22 Weight updates w ij new = w ij old μ w ij (feed forward network) +1 The 1st hidden layer, and are y 1 y 2 y 3 from its upper layer

23 Block-based perspective

24 Basic Math Y= X AX Y+ΔY X + ΔX A X + ΔX = X AX + ΔX AX + X ΔX + ΔX AΔX X AX + X A + A ΔX = Y X Y A Y+ΔY X A + ΔA X = X AX + X ΔAX ΔY = X ΔAX = tr X ΔAX = tr(xx ΔA) Y A = XX

25 Block-based representation X Y Z Y = WX + b Z = φ(y) X Y Z X = W Y Y = diag φ Z Z W b = Y = Y X

26 Soft Max Forward propagation (block-based representation) Layer 1 Layer 2 Output Layer 1 Layer 2

27 Soft Max Backward propagation; 2 nd layer VS Layer 1 Layer 2 Output Ground Truth Error propagation

28 Soft Max Backward propagation; 2 nd layer VS Layer 1 Layer 2 Error propagation Output Ground Truth VS

29 Soft Max Backward propagation; 2 nd layer VS Layer 1 Layer 2 Output Ground Truth Error propagation Weight update

30 Soft Max Backward propagation; 1 st layer VS Layer 1 Layer 2 Error propagation Output Ground Truth

31 Soft Max Backward propagation; 1 st layer VS Layer 1 Layer 2 Error propagation Weight update Output Ground Truth

32 Input Optimization while fixing all weights

33 Input update (feed forward network) +1 The 1st hidden layer y 1, y 2 and 3 are from its upper layer.

34 Output Loss Building

35 I new = I old μ I Output Loss Building

36 Output Loss Building

Inceptionism: Going Deeper into Neural Networks https://research.

37 Inceptionism: Going Deeper into Neural Networks

38 Inceptionism: Going Deeper into Neural Networks

39 DEEP INSIDE CONVOLUTION NETWORKS Inputs maximizing class score

40 Inputs maximizing class score S c I λ I 2 Objective Function (to be maximized) K. Simonyan, A. Vedaldi, A. Zisserman, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, ICLR Workshop 2014

41 Inputs maximizing class score S c I λ I 2 Objective Function (to be maximized) Goose class

42 Maximizing class score S c I λ I 2 Objective Function (to be maximized) Goose class

43 Inputs maximizing class score dumbbell cup dalmatian bell pepper lemon husky

44 Inputs maximizing class score computer keyboard kit fox limousine Washing machine goose ostrich

45 DEEP INSIDE CONVOLUTION NETWORKS Saliency visualization

46 Saliency visualization Linear score model for class c: w : importance of corresponding pixels of I for class c

47 Saliency visualization Dog Class Objective Function

48 Saliency visualization Dog Class Differentiation Objective Function S c (I 0 ) I 0 =

49 Saliency visualization I 0 = saliency map

50 yacht dog monkey washing machine cow building

51 A NEURAL ALGORITHM OF ARTISTIC STYLE

52 Loss Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A neural algorithm of artistic style." arxiv preprint arxiv: (2015).

53 Loss

55 Artistic style

56 Artistic style

57 Backups

58 Vector-by-scalar

59 Scalar-by-vector

60 Vector-by-vector

61 Scalar-by-matrix

62 Why W = X Y? We want to find W satisfying: ΔL = tr from Y = WX ΔL = Y ΔY. W ΔW. [Intuitive derivation] ΔY = ΔWX ΔL = W Y ΔY = ΔWX = tr Y = X Y Y ΔWX = tr X Y ΔW

63 X Y Z V Y = WX Z = Y + b V = φ(z) X Y Z V X = W Y Y = Z Z = diag φ Z V W = Y X b = Z

64 Proof Element-wise operation y 1 y 2 y n y i = z i φ z i Y y 1 y 2 y n = diag φ Z Z z 1 = φ(y 1 ) z 2 = φ(y 2 ) z n = φ(y n ) z 1 z 2 z n

65 Proof Y = W X L Y Y Y W X L Y W X X = W Y = X y = WX L tr y WX L Y Y W W L WX = tr Y W = Y Y WX = tr X Y W X

66 New Layer Design

67 Layer 1 Layer m Layer N Soft Max New layer addition α 1 β 1 α 2 α 3 β 2 β 3 The 1st layer A newly added layer The Nth layer Output

68 Layer m Layer m New layer design Forward pass Compute output from input Backward pass Compute the derivatives w.r.t. data Update weights?? dl dl α 1 β 1 dα 1 dl dβ 1 dl α 2 β 2 dα 2 dl dβ 2 dl α 3 β 3 dα 3 dβ 3 Forward pass Backward pass

69 Max pooling layer Example: Max pooling

70 max max Derivatives of max For forward pass For backward pass x dl dx = dl dz dz dx y z dl dy = dl dz dz dy dl dz Forward pass Backward pass

Tasks ADAS. Self Driving. Non-machine Learning. Traditional MLP. Machine-Learning based method. Supervised CNN. Methods. Deep-Learning based

Tasks ADAS. Self Driving. Non-machine Learning. Traditional MLP. Machine-Learning based method. Supervised CNN. Methods. Deep-Learning based UNDERSTANDING CNN ADAS Tasks Self Driving Localizati on Perception Planning/ Control Driver state Vehicle Diagnosis Smart factory Methods Traditional Deep-Learning based Non-machine Learning Machine-Learning