Convolutional Neural Networks
Goal: given an image, identify which class the image belongs to.
Pipeline: input image → Convolutional Neural Network (CNN) → output label (e.g., "a monitor").
Convolutional Neural Nets (CNNs) in a nutshell: a typical CNN takes a raw RGB image as input and applies a series of non-linear operations stacked on top of each other, including convolution, sigmoid, matrix multiplication, and pooling (subsampling) operations. The output of a CNN is a highly non-linear function of the raw RGB pixel values.
How the key operations are encoded in standard CNNs: convolutional layers perform 2D convolution; fully connected layers perform matrix multiplication; sigmoid layers apply the sigmoid function; pooling layers perform subsampling.
2D convolution: let f be the 2D grid of values we want to convolve and g the convolutional weights of size M×N. Then

$$h_{ij} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(i - m,\, j - n)\, g(m, n),$$

a sliding-window operation across the entire grid.
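As a concrete sketch of this formula in NumPy, computing only the "valid" region with no padding or stride (a minimal illustration, not an efficient implementation):

```python
import numpy as np

def conv2d(f, g):
    """Naive 2D convolution of a grid f with an MxN kernel g ('valid' region only).
    Flipping the kernel realizes h_ij = sum_{m,n} f(i - m, j - n) g(m, n),
    up to the index shift that comes from keeping only the valid region."""
    M, N = g.shape
    H, W = f.shape
    out = np.zeros((H - M + 1, W - N + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + M, j:j + N] * g[::-1, ::-1])
    return out
```

The result matches `scipy.signal.convolve2d(f, g, mode='valid')`.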
Examples of f ∗ g with different kernels g: one kernel leaves the image unchanged, another produces a blurred image, and another extracts vertical edges.
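The slide's exact kernel values are shown only in the figures, but familiar kernels with the same three effects (an assumption, not the slide's weights) are:

```python
import numpy as np

identity = np.array([[0., 0., 0.],
                     [0., 1., 0.],
                     [0., 0., 0.]])        # unchanged image

box_blur = np.ones((3, 3)) / 9.0           # averages each neighborhood: blurred image

sobel_vertical = np.array([[-1., 0., 1.],
                           [-2., 0., 2.],
                           [-1., 0., 1.]])  # responds strongly to vertical edges
```

For example, `conv2d(image, sobel_vertical)` with the sketch above highlights vertical edges.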
CNNs aim to learn the convolutional weights directly from the data.
Early layers of a CNN learn to detect low-level structures such as oriented edges, colors, and corners. Deeper layers learn to detect high-level object structures and their parts.
A closer look inside the convolutional layer: the input image is convolved with a bank of learned filters, e.g., a chair filter, a person filter, a table filter, and a cupboard filter, each responding to its own pattern.
Fully connected layers: implemented as a matrix multiplication between the layer's weights and the incoming activations.
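A minimal sketch of a fully connected layer (the bias term is an assumption; the slides do not show one):

```python
import numpy as np

def fully_connected(a_prev, W, b):
    """Fully connected layer as a matrix multiplication.
    a_prev: (d_in,) activations from the previous layer
    W:      (d_out, d_in) weight matrix; b: (d_out,) bias (assumed here)
    The non-linearity (e.g. a sigmoid layer) is applied to the result afterwards."""
    return W @ a_prev + b
```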
Max pooling layer: a sliding window is applied over a grid of values, and the maximum of the values in the current window is computed.
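A minimal sketch of this operation, assuming non-overlapping k×k windows (i.e., stride equal to the window size):

```python
import numpy as np

def max_pool(x, k=2):
    """Max pooling: the maximum over each non-overlapping kxk window of x."""
    H, W = x.shape
    trimmed = x[:H - H % k, :W - W % k]  # drop rows/cols that don't fill a window
    return trimmed.reshape(H // k, k, W // k, k).max(axis=(1, 3))
```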
Sigmoid layer: applies the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$ elementwise to its input.
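A sketch of the sigmoid and its derivative (the derivative is needed later for backpropagation):

```python
import numpy as np

def sigmoid(x):
    """Elementwise sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)
```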
Convolutional networks: let us now consider a CNN with a specific architecture: 2 convolutional layers, 2 pooling layers, 2 fully connected layers, and 3 sigmoid layers.
Notation: we use separate symbols for the convolutional layer outputs, the fully connected layer outputs, the pooling operation, the sigmoid function, and the softmax function.

Forward pass: the input is passed through the first convolutional layer, a sigmoid, and a pooling layer; then through the second convolutional layer, a sigmoid, and a pooling layer; then through the two fully connected layers, with a softmax over the final output producing the final predictions. The learnable parameters are the convolutional weights in convolutional layers 1 and 2 and the weight matrices in fully connected layers 1 and 2.

Key question: how do we learn the parameters from the data?
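Putting the pieces together, a hedged sketch of this forward pass, reusing the conv2d, sigmoid, and max_pool sketches above; the single-channel input, the parameter names in `params`, and the bias terms are assumptions:

```python
import numpy as np

def softmax(z):
    """Softmax over the final scores; subtracting the max is for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, params):
    """conv -> sigmoid -> pool, twice; then two fully connected layers; softmax output."""
    a1 = max_pool(sigmoid(conv2d(x, params["W_conv1"])))         # conv layer 1
    a2 = max_pool(sigmoid(conv2d(a1, params["W_conv2"])))        # conv layer 2
    h = sigmoid(params["W_fc1"] @ a2.ravel() + params["b_fc1"])  # fully connected layer 1
    z = params["W_fc2"] @ h + params["b_fc2"]                    # fully connected layer 2
    return softmax(z)                                            # final predictions
```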
Backpropagation for Convolutional Neural Networks
How do we learn the parameters of a CNN? Assume that we are given a labeled training dataset. We want to adjust the parameters of the CNN so that its predictions are as close to the true labels as possible. This is difficult because the learning objective is highly non-linear.
Gradient descent: iteratively minimizes the objective function; the objective needs to be differentiable.
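In symbols, one gradient descent iteration on parameters θ with learning rate η (standard notation; the slides' own symbols appear only in the figures):

$$\theta \leftarrow \theta - \eta \,\nabla_{\theta} J(\theta)$$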
The backward pass interleaves two operations, starting from the loss against the true label:
1. Compute the gradients of the overall loss w.r.t. the predictions and propagate them back.
2. Compute the gradients of the overall loss and propagate them back.
3. Compute the gradients needed to adjust the weights of the current layer.
4. Backpropagate the gradients to the previous layers.
5.–9. Repeat steps 3 and 4 layer by layer until the first layer is reached.
An Example of Backpropagation in Convolutional Neural Networks
Assume that we have K = 5 object classes: Class 1: Penguin, Class 2: Building, Class 3: Chair, Class 4: Person, Class 5: Bird. The network's scores over these classes are compared with the true label to obtain the loss. Two observations follow: increasing the score corresponding to the true class decreases the loss, and decreasing the scores of the other classes also decreases the loss.
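A worked sketch assuming the standard softmax + cross-entropy loss (the slides' exact loss is shown only in the figures); the score values are hypothetical:

```python
import numpy as np

scores = np.array([2.0, -1.0, 0.5, 0.1, 1.0])  # hypothetical scores for the K = 5 classes
y = np.array([1., 0., 0., 0., 0.])             # true label: class 1, Penguin (one-hot)

p = np.exp(scores - scores.max())
p /= p.sum()                                   # softmax probabilities
loss = -np.log(p[0])                           # cross-entropy loss for the true class
dscores = p - y                                # gradient of the loss w.r.t. the scores
```

`dscores` is negative only at the true class (raising that score lowers the loss) and positive elsewhere (lowering those scores lowers the loss), matching the two observations above.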
Adjusting the weights of the last fully connected layer: we need to compute the gradient of the loss w.r.t. these weights. By the chain rule, this gradient combines the gradient of the loss w.r.t. the layer's output, which was already computed in the previous step, with the gradient of the layer's output w.r.t. its weights. The update rule then moves the weights a small step against this gradient.
Backpropagating the gradients: we need to compute the gradient of the loss w.r.t. the layer's inputs. Again, the gradient w.r.t. the layer's output was already computed in the previous step; combining it with the gradient of the output w.r.t. the input yields the error signal sent to the layer below.
The same two operations are then repeated for each remaining layer on the way back to the input: compute the gradient of the loss w.r.t. the layer's weights, reusing the gradient w.r.t. the layer's output that was already computed in the previous step, apply the update rule, and backpropagate the gradients to the previous layer.
Visual illustration: backpropagation in the fully connected layers
Fully connected layers, forward pass: each output is computed from the weighted sum of the incoming activation units; for an activation unit of interest, a particular weight is used in conjunction with that unit when producing the output.

Backward pass: each activation unit is assigned an error term, a measure of how much that activation unit contributed to the loss, computed from the error terms of the layer above and the weights connecting to them.
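In standard notation (an assumption, since the slides' symbols appear only in the figures), with $a^{(l)} = g(z^{(l)})$, $z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}$, and error terms $\delta^{(l)} = \partial J / \partial z^{(l)}$, the backward pass for a fully connected layer reads:

$$\delta^{(l)} = \left( (W^{(l)})^{\top} \delta^{(l+1)} \right) \odot g'\!\left(z^{(l)}\right), \qquad \frac{\partial J}{\partial W^{(l)}} = \delta^{(l+1)} \left(a^{(l)}\right)^{\top}, \qquad \frac{\partial J}{\partial b^{(l)}} = \delta^{(l+1)}.$$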
Summary: backpropagation for fully connected layers
Summary:
1. Set the error term of the output layer, where n denotes the number of layers in the network.
2. For each fully connected layer, and for each node in the layer: set the node's error term, compute the partial derivatives of the loss w.r.t. the layer's parameters, and update the parameters.
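A minimal sketch of one iteration of this summary for a single fully connected layer, reusing sigmoid_grad from above (the in-place update and the learning rate are assumptions):

```python
import numpy as np

def fc_backward_step(delta_next, W, a_prev, z_prev, lr=0.01):
    """Given the error delta_next flowing into this layer: compute the partial
    derivatives, update the parameters, and return the error for the layer below."""
    grad_W = np.outer(delta_next, a_prev)                   # dJ/dW for this layer
    delta_prev = (W.T @ delta_next) * sigmoid_grad(z_prev)  # error term for layer below
    W -= lr * grad_W                                        # update the parameters
    return delta_prev
```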
Visual illustration: backpropagation in the convolutional layers
Convolutional layers, forward pass: $a^{(l)} = g\big(z^{(l)}\big)$, where $z^{(l)}$ is obtained by convolving the previous layer's output $a^{(l-1)}$ with the layer's filters.
Summary:
1. Start from the error backpropagated out of the first fully connected layer, where c denotes the index of the first fully connected layer.
2. For each convolutional layer, and for each node in the layer: set the node's error term, compute the partial derivatives, and update the parameters.
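For the filter weights, the partial derivative is itself a sliding-window sum over the layer input. A hedged reconstruction in the cross-correlation convention, with $z^{(l)}_{ij} = \sum_{m,n} W^{(l)}_{mn}\, a^{(l-1)}_{i+m,\,j+n}$ and $\delta^{(l)} = \partial J / \partial z^{(l)}$:

$$\frac{\partial J}{\partial W^{(l)}_{mn}} = \sum_{i,j} \delta^{(l)}_{ij}\, a^{(l-1)}_{i+m,\, j+n}$$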
Gradients in pooling layers: there is no learning done in the pooling layers. The error that is backpropagated to a pooling layer is sent back to the node it came from in the forward pass, i.e., the position that produced the maximum; all other positions in the window receive a gradient of 0.
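A minimal sketch of this routing for the max pooling layer above (non-overlapping k×k windows assumed):

```python
import numpy as np

def max_pool_backward(delta_out, x, k=2):
    """Send each backpropagated error to the input position that produced the
    window's maximum in the forward pass; every other position gets 0."""
    delta_in = np.zeros_like(x)
    for i in range(delta_out.shape[0]):
        for j in range(delta_out.shape[1]):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            m, n = np.unravel_index(window.argmax(), window.shape)
            delta_in[i * k + m, j * k + n] = delta_out[i, j]
    return delta_in
```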
Recognition as a big table lookup: write down every possible n×m image in a table, one row per image, and record its object label (yes/no for each class); recognition then reduces to looking the image up.
How many images are there? Consider a tiny 3×18 image (can you see what it is?). Each pixel takes $2^8 = 256$ values, and a 3×18 image has 54 pixels, so the number of possible 54-pixel images is $256^{54} \approx 1.1 \times 10^{130}$. Prof. Kanade's theorem: we have not seen anything yet!
Compare that $1.1 \times 10^{130}$ with the number of images seen by all humans ever: 10 billion (total population) × 1000 (generations) × 100 (years) × 365 (days) × 24 (hours) × 60 × 60 (minutes, seconds) × 30 (frame rate) ≈ $10^{24}$. We have to be clever in writing down this table!
A closer look inside the convolutional layer during backpropagation: to adjust a filter's weights, the gradient of the loss w.r.t. the filter is computed for each example in the training batch (performed in a sliding-window fashion, using elementwise multiplication between the backpropagated errors and the layer's inputs), the gradients are averaged across the batch, and the parameter update sets the new weights to the old weights minus the learning rate times the averaged gradient.
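A hedged sketch of this batch update; `grad_fn`, which runs backpropagation for one labeled example, is a hypothetical helper:

```python
import numpy as np

def train_batch(W, batch, grad_fn, lr=0.01):
    """Average the gradient across the batch, then apply the parameter update
    W_new = W_old - lr * averaged gradient."""
    grads = [grad_fn(W, x, y) for (x, y) in batch]  # per-example gradients
    avg_grad = np.mean(grads, axis=0)               # average across the batch
    return W - lr * avg_grad
```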