CURRENNT TOOLKIT. Figure illustration

1 Figure illustration

2 In the case of image processing: the input has height H, width W, and a number of channels (input feature maps), e.g. channel 1, channel 2, channel 3. When the input is an image, there may be 3 channels (Red, Green, Blue). When the input is the output of another layer, there may be N channels, where N is determined by the previous layer.

3 In the case of image processing: filter 1 contains 3 weight maps (not 1), and each weight map is applied to one channel of the input image. In other words, a weight scalar in the filter can be indexed as $w^{k,l}_{i,j}$, where k is the index of this filter (= 1 in this example), l is the index of the input channel (= 1, 2, or 3 in this example), and i, j index the weight scalar inside the weight map. For reference: figure 1.

4-7 In the case of image processing, the operation is: sum the results over the 3 input channels; as the filter slides over the input, each per-position sum gives one value of the output of filter 1.

8 In the case of image processing: filter 1 extracts one feature map from the input image (RGB). The size of the output feature map is determined by 1. the input feature map size, 2. the filter size, 3. the padding size, and 4. the stride. Reference:
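
As a quick check of the four factors above, here is a minimal sketch (plain Python, using the standard output-size formula; this is an assumption, not code taken from CURRENNT):

    def conv_output_size(in_size, filter_size, padding, stride):
        # floor((input + 2*padding - filter) / stride) + 1
        return (in_size + 2 * padding - filter_size) // stride + 1

    # e.g. a 5x5 input map, 3x3 filter, no padding, stride 1 -> 3x3 output map
    assert conv_output_size(5, 3, 0, 1) == 3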

9 In the case of image processing: the output of layer m-1 has channels 1, 2, 3; layer m applies its filters (filter 1, filter 2, ...) to it. Filter 1 extracts one feature map, filter 2 extracts another map, filter 3 the next, and so on. The number of filters determines how many feature maps are extracted, i.e. how many output channels layer m has.

10 In the case of 1x1 convolution for image processing: each weight map of a filter (filter 1, filter 2, ...) is 1x1, i.e. a single scalar.

11-12 In the case of 1x1 convolution for image processing: each channel of the input image (RGB) is multiplied by its 1x1 weight map, and the results are summed together (with bias and ReLU) to give the output of filter 1.

13 In the case of 1x1 convolution for image processing: we can get multiple output channels by using multiple filters (filter 1, filter 2), and we can reduce the 'dimension' of the features by using a small number of filters (here layer m-1 has 3 channels while layer m has only 2).
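
To make the channel-mixing view of 1x1 convolution concrete, a minimal sketch (plain NumPy; the toy shapes and names are made up, this is not CURRENNT code):

    import numpy as np

    x = np.random.randn(3, 4, 4)             # input feature maps: 3 channels, 4x4 spatial
    w = np.random.randn(2, 3)                # 2 filters, each with one 1x1 weight per input channel
    b = np.zeros(2)                          # one bias per filter

    # 1x1 convolution: at every pixel, mix the channels with a matrix multiplication
    y = np.einsum('oc,chw->ohw', w, x) + b[:, None, None]
    y = np.maximum(y, 0.0)                   # ReLU, as mentioned on the slide
    print(y.shape)                           # (2, 4, 4): 2 output channels, spatial size unchanged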

14 For text and audio processing

15 In the case of normal convolution for textual or audio processing (one type of implementation): layer m-1 has channels 1, 2, 3 and layer m has channels 1, 2 (filter 1, filter 2). Just set the height of the image and of the filters to 1.

16 In the case of normal convolution for textual or audio processing (one type of implementation): layer m-1 holds D-dimensional features over T frames and layer m holds 2-dimensional features over T frames (filter 1, filter 2). In implementation, we can treat the channel index as the dimension index, e.g. channel 1 is the 1st dimension of MGC, channel 2 is the 2nd dimension of MGC, and so on.

17 In the case of 1x1 convolution for textual or audio processing (one type of implementation): layer m-1 holds D-dimensional features over T frames and layer m holds 2-dimensional features over T frames (filter 1, filter 2). Simply reduce the filter width to obtain 1x1 convolution.

18 Toy example of normal convolution (one type of implementation): the input features have D dimensions and T frames, and there are two filters (filter 1 and filter 2) of D dimensions and a given filter width.

19-24 Toy example of normal convolution (continued): the filters slide along the time axis over the input features (D dim, T frames). The number of frames will not be changed, so the output features also cover T frames.

25 CURRENNT IMPLEMENTATION

26 Toy example: compute $o_{0:3} = w_{0:2} * x_{0:3}$, where filter 1 has weight vectors $w_0, w_1, w_2$ and the input features are the vectors $x_0, x_1, x_2, x_3$.

27 Toy example to get $o_{0:3} = w_{0:2} * x_{0:3}$: $o_0 = w_1^\top x_0 + w_2^\top x_1$.

28 Toy example to get $o_{0:3} = w_{0:2} * x_{0:3}$: $o_1 = w_0^\top x_0 + w_1^\top x_1 + w_2^\top x_2$.

29 Toy example to get $o_{0:3} = w_{0:2} * x_{0:3}$: $o_2 = w_0^\top x_1 + w_1^\top x_2 + w_2^\top x_3$.

30 Why $+m$ and not $-m$ in $x[n+m]$? Because we did not reverse $h[n]$. Toy example to get $o_{0:3} = w_{0:2} * x_{0:3}$: $o_0 = w_1^\top x_0 + w_2^\top x_1$, $o_1 = w_0^\top x_0 + w_1^\top x_1 + w_2^\top x_2$, $o_2 = w_0^\top x_1 + w_1^\top x_2 + w_2^\top x_3$, $o_3 = w_0^\top x_2 + w_1^\top x_3$. Note: we define 'convolution' as $x[n] * h[n] = \sum_m x[n+m]\,h[m]$.
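
A minimal sketch (plain NumPy; shapes and names are made up, not CURRENNT code) that evaluates the toy example exactly as written above, i.e. a correlation with the filter centred on each frame and no reversal of the filter:

    import numpy as np

    D, T = 2, 4
    x = np.random.randn(T, D)               # frames x_0 ... x_3, each a D-dim vector
    w = np.random.randn(3, D)               # taps w_0, w_1, w_2, one weight vector per tap

    o = np.zeros(T)
    for n in range(T):
        for m in range(3):
            t = n + m - 1                    # centred window: taps at n-1, n, n+1
            if 0 <= t < T:                   # taps falling outside the sequence are skipped
                o[n] += w[m] @ x[t]
    print(o)                                 # o_0 ... o_3, matching the formulas above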

31 Toy example to get $o_{0:3} = w_{0:2} * x_{0:3}$. Remember that $x[n] * h[n] = \sum_m x[n+m]\,h[m]$: multiply the input $[x_0, x_1, \ldots, x_N]$ with one 'coefficient' of the filter $[h_1, h_2, h_3]$ at a time, and shift:

    x[n]h[0]   (shifted twice):  0,          0,          w_0^T x_0,  w_0^T x_1,  w_0^T x_2,  w_0^T x_3
    x[n+1]h[1] (shifted once):   0,          w_1^T x_0,  w_1^T x_1,  w_1^T x_2,  w_1^T x_3,  0
    x[n+2]h[2] (not shifted):    w_2^T x_0,  w_2^T x_1,  w_2^T x_2,  w_2^T x_3,  0,          0

Then chop the output to get $o_{0:3}$.

32 Toy example to get $o_{0:3} = w_{0:2} * x_{0:3}$: summing these shifted rows column by column and chopping the boundary columns yields $o_0, o_1, o_2, o_3$. This is the implementation in CURRENNT.

33 Implementation (computeforward), step 1: matrix multiplication (assignproduct). The filter weights $w_0^\top, w_1^\top, w_2^\top$ of filter 1 (stacked with those of filter 2 and the other filters) are multiplied with the input feature vectors $x_0, \ldots, x_T$ (the output from the previous layer):

    m_databuffer = [ w_0^T x_0   w_0^T x_1   ...   w_0^T x_T ]   <- rows from filter 1
                   [ w_1^T x_0   w_1^T x_1   ...   w_1^T x_T ]
                   [ w_2^T x_0   w_2^T x_1   ...   w_2^T x_T ]
                   [ ...                                     ]   <- rows from the other filters

34 Implementation (computeforward), step 2: sum and merge the results (by calling ConvolutionCore). The rows of m_databuffer,

    [ w_0^T x_0   w_0^T x_1   w_0^T x_2   w_0^T x_3 ]   <- from filter 1 (other filters below)
    [ w_1^T x_0   w_1^T x_1   w_1^T x_2   w_1^T x_3 ]
    [ w_2^T x_0   w_2^T x_1   w_2^T x_2   w_2^T x_3 ]

are shifted and summed into m_outputs = $o_{0,0}, o_{0,1}, \ldots$ Note: each filter only extracts one dimension of the output feature.
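
Putting slides 31-34 together, a minimal sketch (plain NumPy; data_buffer only mirrors the slide's m_databuffer, this is not the CURRENNT source) of the two forward steps, one matrix multiplication followed by a shift-and-sum merge:

    import numpy as np

    D, T, K = 2, 4, 3                        # feature dim, frames, filter width
    x = np.random.randn(T, D)                # input feature vectors x_0 ... x_{T-1}
    w = np.random.randn(K, D)                # taps w_0, w_1, w_2 of one filter

    # step 1: one matrix multiplication fills the buffer; row m holds w_m^T x_t for all t
    data_buffer = w @ x.T                    # shape (K, T)

    # step 2: shift each row by its tap index and sum (ConvolutionCore-style merge)
    padded = np.zeros((K, T + K - 1))
    for m in range(K):
        padded[m, K - 1 - m : K - 1 - m + T] = data_buffer[m]
    merged = padded.sum(axis=0)

    # chop the boundary columns to keep exactly T outputs o_0 ... o_{T-1}
    o = merged[K // 2 : K // 2 + T]
    print(o)                                 # same values as the direct computation above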

35 Implementation (computebackward), step 1: propagate the gradients to m_databuffer (ConvolutionCoreGra). For each cell of m_databuffer (e.g. the cell $w_1^\top x_1$ of filter 1), just copy the gradient of the output it contributed to (e.g. $\partial o_{0,1}$) into the corresponding memory cell.

36 Implementation (computebackward), step 2: propagate the gradients to the previous layer. precedinglayer().outputerrors() is computed from the filter weights ($w_0, w_1, w_2$ of filter 1, and those of filter 2, ...) and the gradients held in m_databuffer; it is just a matrix multiplication (assignproduct).

37 Implementation (computebackward), step 3: propagate the gradients to the filter weights. The weight gradients are computed from the input feature vectors $x_0, \ldots, x_T$ and the gradients held in m_databuffer; again it is just a matrix multiplication (assignproduct).
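
A minimal sketch (plain NumPy, continuing the toy shapes above; it mirrors the three steps described on slides 35-37 rather than the actual CURRENNT source) of the backward pass for one filter:

    import numpy as np

    D, T, K = 2, 4, 3
    x = np.random.randn(T, D)                # input feature vectors
    w = np.random.randn(K, D)                # filter taps
    grad_o = np.random.randn(T)              # gradients arriving from the next layer

    # step 1 (ConvolutionCoreGra-style): copy each output gradient into the buffer cell
    # that produced it; cell (m, t) contributed w_m^T x_t to output o_{t - m + 1}
    grad_buffer = np.zeros((K, T))
    for m in range(K):
        for t in range(T):
            n = t - m + 1                    # centred window, as in the forward pass
            if 0 <= n < T:
                grad_buffer[m, t] = grad_o[n]

    # step 2: gradients w.r.t. the previous layer's outputs (a matrix multiplication)
    grad_x = grad_buffer.T @ w               # shape (T, D)

    # step 3: gradients w.r.t. the filter weights (another matrix multiplication)
    grad_w = grad_buffer @ x                 # shape (K, D)
    print(grad_x.shape, grad_w.shape)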

38-39 How to increase the receptive field? Toy example: increase the spacing between the filter taps by 1, so that the filters (filter 1, filter 2) cover a wider span of the input features (D dim, T frames) while the filter width itself stays the same.

40 How to increase the receptive field? To get $o_0, o_1, o_2, o_3$ from $x_0, x_1, x_2, x_3$ with the taps spaced two frames apart: $o_0 = w_1^\top x_0 + w_2^\top x_2$, $o_1 = w_1^\top x_1 + w_2^\top x_3$, $o_2 = w_0^\top x_0 + w_1^\top x_2$, $o_3 = w_0^\top x_1 + w_1^\top x_3$.

41 To increase the receptive field, suppose the filter is $[w_0, 0, w_1, 0, w_2, 0]$: the product rows $w_0^\top x_t$, $w_1^\top x_t$, $w_2^\top x_t$ (for $t = 0, \ldots, 3$) are then placed with correspondingly larger shifts before being summed.

42 To increase the receptive field: reshape the data buffer so that the shifted rows are spaced two columns apart (matching the zeros inserted in the filter) before they are summed.

43 To increase the receptive field: in the implementation there will be no all-zero rows; the same data buffer is merged with a larger shift along the time axis. The default interval is 1; for interval T, the interval between taps along the time axis will be T.
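
A minimal sketch (plain NumPy; names are made up) of the dilated merge: the data buffer is exactly the same as before, only the shifts along the time axis are spaced by the tap interval:

    import numpy as np

    D, T, K, interval = 2, 4, 3, 2            # tap interval 2: one-frame gap between taps
    x = np.random.randn(T, D)
    w = np.random.randn(K, D)

    data_buffer = w @ x.T                     # same buffer as in the ordinary case

    o = np.zeros(T)
    for n in range(T):
        for m in range(K):
            t = n + (m - K // 2) * interval   # taps at n-2, n, n+2 for interval 2
            if 0 <= t < T:
                o[n] += data_buffer[m, t]
    print(o)                                   # matches o_0 ... o_3 on slide 40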

44 Convolution along the dimension axis (2-D convolution)? A filter will now produce more than 1 dimension of output; we only consider 2-D convolution without zero padding along the dimension axis. Toy example: the 2-D filter has height 2 and width 3, with weights $w_{0,0}, w_{0,1}, w_{0,2}$ in the first row and $w_{1,0}, w_{1,1}, w_{1,2}$ in the second (the columns are the taps $w_0, w_1, w_2$); the input frames $x_0, \ldots, x_3$ have entries $x_{d,t}$ with 3 dimensions per frame.

45 Convolution along the dimension axis: one filter produces the output block [$o_{0,0}$ $o_{0,1}$ $o_{0,2}$ $o_{0,3}$ $o_{0,4}$; $o_{1,0}$ $o_{1,1}$ $o_{1,2}$ $o_{1,3}$ $o_{1,4}$], e.g. $o_{0,0} = w_{0,1} x_{0,0} + w_{1,1} x_{1,0} + w_{0,2} x_{0,1} + w_{1,2} x_{1,1}$.

46 Convolution along the dimension axis: moving the filter one step down the dimension axis gives the second output row, e.g. $o_{1,0} = w_{0,1} x_{1,0} + w_{1,1} x_{2,0} + w_{0,2} x_{1,1} + w_{1,2} x_{2,1}$.

47-48 Convolution along the dimension axis: the same operation is repeated at the remaining positions along the time axis to fill in the rest of the output block.

49 Convolution along the dimension axis: the block above is the output from one filter. The number of output dimensions produced by one filter is Dim_data - Height_filter + 1 (here the data dimension is 3 and the filter height is 2, giving 2 output dimensions).
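
A minimal sketch (plain NumPy; names are made up) of this convolution along the dimension axis, using the toy sizes from the slides (data dimension 3, filter height 2, filter width 3, no zero padding along the dimension axis):

    import numpy as np

    dim_data, T = 3, 4
    height, width = 2, 3                       # filter height (dimensions) and width (time taps)
    x = np.random.randn(dim_data, T)           # x[d, t]
    w = np.random.randn(height, width)         # w[r, k]

    out_dim = dim_data - height + 1            # = 2, as on the slide
    o = np.zeros((out_dim, T))
    for d in range(out_dim):                   # position along the dimension axis
        for n in range(T):                     # position along the time axis (centred taps)
            for r in range(height):
                for k in range(width):
                    t = n + k - 1
                    if 0 <= t < T:
                        o[d, n] += w[r, k] * x[d + r, t]
    print(o.shape)                              # (2, T); o[0, 0] matches the formula on slide 45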

50 Convolution along the dimension axis (computeforward), step 0: prepare the filter matrix (FilterDuplicate). In the previous (1-D) case the filter matrix simply held the rows $w_1^\top, w_2^\top, w_3^\top$ of each filter (filter 1, filter 2). Now each filter is duplicated into several shifted copies, each row being shifted with zero padding: shifted filter [shift(0, $w_1$, L); shift(0, $w_2$, L); shift(0, $w_3$, L)], shifted filter [shift(1, $w_1$, L); shift(1, $w_2$, L); shift(1, $w_3$, L)], shifted filter [shift(2, $w_1$, L); shift(2, $w_2$, L); shift(2, $w_3$, L)], and so on. Here shift(s, w, L) zero-pads w to length L and moves it down by s positions; e.g. shift(0, [$w_{0,0}$, $w_{1,0}$], 4) = [$w_{0,0}$, $w_{1,0}$, 0, 0], while a negative shift moves the weights up, e.g. shift(-1, [$w_{0,0}$, $w_{1,0}$], 4) = [$w_{1,0}$, 0, 0, 0].
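
A minimal sketch (Python; the semantics of shift are inferred from the two examples on the slide, so treat this as an assumption rather than the exact CURRENNT helper):

    def shift(s, w, L):
        # zero-pad w to length L and move it down by s positions (up for negative s)
        out = [0.0] * L
        for i, v in enumerate(w):
            j = i + s
            if 0 <= j < L:
                out[j] = v
        return out

    w_col = [0.3, 0.7]                         # stands for [w_{0,0}, w_{1,0}]
    print(shift(0, w_col, 4))                  # [0.3, 0.7, 0.0, 0.0]
    print(shift(-1, w_col, 4))                 # [0.7, 0.0, 0.0, 0.0]
    print(shift(1, w_col, 4))                  # [0.0, 0.3, 0.7, 0.0]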

51 Convolution along the dimension axis (computeforward), step 0: prepare the filter matrix; then the same steps as in the 1-D case follow.

52 Convolution along the dimension axis (computebackward): the same procedure as in the previous case. The gradients for shifted filter 1, shifted filter 2, ... are obtained from the input feature vectors $x_0, \ldots, x_T$ and the gradients in m_databuffer by matrix multiplication (assignproduct), exactly as before; then the gradients of the shifted copies of each filter are summed up into the gradient of the original filter (FilterGradientMerge).

53 Convolution along the dimension axis (computeforward), step 0: prepare the filter matrix (FilterDuplicate). Here L = precedinglayer.size(). Each original filter (filter 1 with its own height and width, filter 2 with its own height and width, ...) is expanded into shifted copies shift(0, ., L), shift(1, ., L), ...; m_winwidth_h[k] stores the width of the k-th shifted filter, and m_wintotall is the total width over all shifted filters.

54 Convolution along the dimension axis (computeforward), step 0: prepare the filter matrix (FilterDuplicate). L = precedinglayer.size(); N = this.size() = the number of shifted filters = the dimension of this layer's output. The widths of the shifted filters are stored in m_winwidth_h[0], m_winwidth_h[1], ..., m_winwidth_h[N-1].

55 Convolution along the dimension axis (computeforward), step 0: prepare the filter matrix (FilterDuplicate). Several index arrays are kept per shifted filter. For the second shifted filter (index 1): m_winwidth_cum[1] = 3, the row index of its first row; m_winindex_h[1] = 0, the index of the original filter it comes from; m_winheight_h[1] = the filter height; m_winshiftindex_h[1] = 1, the shift; m_winshiftrevid_h[1] = 0 (but m_winshiftrevid_h[0] stores how many shifted filters there are).

56 Convolution along the dimension axis (computeforward), step 0: prepare the filter matrix (FilterDuplicate). For the first shifted filter (index 0): m_winwidth_cum[0] = 0, m_winindex_h[0] = 0, m_winheight_h[0] = filter height, m_winshiftindex_h[0] = 0, and m_winshiftrevid_h[0] = 2 (if the filter is shifted two times). For the second (index 1): m_winwidth_cum[1] = 3, m_winindex_h[1] = 0, m_winheight_h[1] = filter height, m_winshiftindex_h[1] = 1, m_winshiftrevid_h[1] = 0. Finally, m_winwidth_cum[N] = m_wintotall.

57 Convolution along the dimension axis (computeforward), step 0: prepare the filter matrix (FilterDuplicate). Note: the weights are in fact stored column by column (column major), with the per-column arrays: m_wincolindex_h = shift from the original position (column), m_winrowindex_h = shift from the original position (row), m_wincolheight_h = filter height, m_winwidthcol_h = width of the filter, m_winshiftnum_h = how many shifted filters.

58 Convolution along the dimension axis (computeforward), step 0: prepare the filter matrix (FilterDuplicate). Note: in this column-major storage, for column 1: m_wincolindex_h[1] = 0, m_winrowindex_h[1] = 0, m_wincolheight_h[1] = filter height, m_winwidthcol_h[1] = 3, m_winshiftnum_h[1] = 2 (only used by columns with m_wincolindex_h[k] = 0).

59 Convolution along the dimension axis (computeforward), step 0: prepare the filter matrix (FilterDuplicate). Note: for column 4 of the column-major storage: m_wincolindex_h[4] = 3, m_winrowindex_h[4] = 1, m_wincolheight_h[4] = filter height, m_winwidthcol_h[4] = 3, m_winshiftnum_h[4] = 0 (only used by columns with m_wincolindex_h[k] = 0).

60 CURRENNT CONFIGURATION

61 1-D convolution (figure: input features of D dim and T frames are processed by filter 1 and filter 2 to give the output features over T frames).

62 1-D convolution: filter 1 and filter 2 produce output features of 2 dim over T frames. Configuration:

    {
      "size": ...,
      "name": "cn1",
      "type": "cnn",
      "window_width": ...,
      "window_tap_interval": ...
    },

The output dimension equals the number of filters, so "size" must be set to the filter count.

63 1-D convolution (same configuration fields as above), with filters of different widths: 1. window_width is the half width of the filter window, thus filter_width = 2 * window_width + 1; 2. window_width = 1 for filter 1; 3. window_width = 2 for filter 2.
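
A one-line check (plain Python; just the arithmetic implied by the half-width convention above):

    def filter_width(window_width):
        # taps at -window_width, ..., 0, ..., +window_width
        return 2 * window_width + 1

    print(filter_width(1))   # 3 taps (w_0, w_1, w_2), as in the toy example for filter 1
    print(filter_width(2))   # 5 taps, for filter 2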

64 1-D convolution (same configuration fields as above): by default, the tap interval is 1.

65 1-D convolution (same configuration fields as above): in this case the tap interval is 2; see page 37 for reference.

66 2-D convolution (figure: input features over T frames are processed by filter 1 and filter 2 to give the output features over T frames).

67 2-D convolution, filter 1: "window_width": 1, "window_tap_interval": 1, "window_height": 2. Its output occupies 2 dim of the output features.

68 2-D convolution, filter 1: "window_width": 1, "window_tap_interval": 1, "window_height": 2, "window_stride": 1. The input features have 3 dim; this filter's output occupies 2 dim.

69 2-D convolution, filter 1 ("window_width": 1, "window_tap_interval": 1, "window_height": 2, "window_stride": 1): the stride moves the filter 1 step along the dimension axis, so this filter generates floor((3 - 2) / 1) + 1 = 2 dim of output.

70 2-D convolution, filter 2 ("window_width": 1, "window_tap_interval": 1, "window_height": 3, "window_stride": 1): with a 3-dim input this filter cannot move along the dimension axis, so it generates floor((3 - 3) / 1) + 1 = 1 dim of output.

71 2-D convolution: "size": 3. These two filters together generate 3 dimensions of output, so the layer size should be 3.
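
A minimal sketch (plain Python; only the arithmetic from slides 69-71, the function name is made up) of how the per-filter output dimensions add up to the layer "size":

    def filter_output_dims(data_dim, window_height, window_stride):
        # dimensions of output produced by one filter along the dimension axis
        return (data_dim - window_height) // window_stride + 1

    data_dim = 3
    filters = [{"window_height": 2, "window_stride": 1},   # filter 1 -> 2 dims
               {"window_height": 3, "window_stride": 1}]   # filter 2 -> 1 dim
    layer_size = sum(filter_output_dims(data_dim, f["window_height"], f["window_stride"])
                     for f in filters)
    print(layer_size)   # 3, which is what "size" must be set to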

72 2-D convolution over T frames with filter 1 and filter 2. To write the configuration for this layer:

    {
      "size": 3,
      "window_width": "1_2",
      "window_tap_interval": "1_1",
      "window_height": "2_3",
      "window_stride": "1_1"
    }

or

    {
      "size": 3,
      "window_width": "1_2",
      "window_tap_interval": "2*1",
      "window_height": "2_3",
      "window_stride": "2*1"
    }

where the shorthand N*M expands to M_M_M_..._M (M repeated N times).
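
A minimal sketch (plain Python; made-up helper names, not CURRENNT's actual parser) of how such per-filter option strings could be handled, including the N*M shorthand described above:

    def expand_option(text):
        # "2*1" -> "1_1"; strings without "*" are returned unchanged
        if "*" in text:
            n, m = text.split("*")
            return "_".join([m] * int(n))
        return text

    def parse_option(text):
        # "1_2" -> [1, 2]
        return [int(v) for v in expand_option(text).split("_")]

    print(parse_option("1_2"))   # [1, 2]
    print(parse_option("2*1"))   # [1, 1]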
