CSCI 315: Artificial Intelligence through Deep Learning



1 CSCI 315: Artificial Intelligence through Deep Learning. W&L Fall Term 2017, Prof. Levy. Convolutional Networks

4 Convolution: Convolution is a mathematical way of combining two signals to form a third signal. It is the single most important technique in Digital Signal Processing.

5 Convolution: The Dot Product on Steroids
Recall the dot product between layers: $y_j = f\left(\sum_{i=1}^{n} x_i w_{ij}\right)$
[Diagram: inputs x_1, x_2, x_3 connected through weights w to outputs y_1, y_2.]

6 Convolution: The Dot Product on Steroids
Let's look at the dot product in terms of vectors (arrays): $\sum_{i=1}^{n} x_i w_i = x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n$

7 Convolution: The Dot Product on Steroids
To get convolution, we slide w across x:
$y_1 = x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n$
$y_2 = x_1 w_2 + x_2 w_3 + x_3 w_4 + \dots + x_n w_{n+1}$
$y_3 = x_1 w_3 + x_2 w_4 + x_3 w_5 + \dots + x_n w_{n+2}$
Do you see a problem?

8 Convolution: The Dot Product on Steroids
The usual definition of convolution assumes infinite vectors: $y_n = \sum_{k=-\infty}^{\infty} x_{n-k} w_k$
[Diagram: an infinitely long w slides along an infinitely long x, producing y_1, y_2, y_3.]
Of course, this is unrealistic, so...

9 Convolution: The Dot Product on Steroids
We treat w as a finite convolution kernel and switch to non-negative indices, e.g.: $y_n = \sum_{k=0}^{2} x_{n-k} w_k$
[Diagram: the three-element kernel w_0, w_1, w_2 slides along x, producing y_2, y_3, y_4.]
Do you notice another (small) problem?

10-18 [Animation: a worked example in which the kernel w steps across the input x, filling in one value of the output y per slide.]
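The finite-kernel sum can be checked directly in code. The following is a minimal sketch, not the slides' own example: the values of `x` and `w` and the helper name `convolve_finite` are made up for illustration. It evaluates $y_n = \sum_k x_{n-k} w_k$ only for those n where every x index exists, and compares the result with NumPy's built-in `np.convolve`.

```python
import numpy as np

def convolve_finite(x, w):
    """Compute y[n] = sum_k x[n-k] * w[k] for every n where all x[n-k] exist."""
    K = len(w)
    # The first fully defined output is n = K - 1 (n = 2 for a 3-element kernel).
    return np.array([sum(x[n - k] * w[k] for k in range(K))
                     for n in range(K - 1, len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical input signal
w = np.array([1.0, 0.0, -1.0])            # hypothetical kernel w_0, w_1, w_2

y = convolve_finite(x, w)
print(y)                                   # values for y_2, y_3, y_4
print(np.convolve(x, w, mode="valid"))     # NumPy gives the same numbers
```

With a three-element kernel the first output the sum can produce is y_2, which is why the result has two fewer elements than x.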

19 Convolution: What's It Goodfer? Many useful operations can be expressed as a convolution, e.g., smoothing, a.k.a. moving average, a.k.a. low-pass filtering.

20-28 [Animation: a smoothing kernel w steps across the input x, filling in the smoothed output y one value per slide.]
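Smoothing as a convolution can be sketched in a few lines of NumPy. The signal and the three-point averaging kernel below are assumptions chosen for illustration; the values used on the slides were not preserved.

```python
import numpy as np

# Hypothetical noisy signal: a step from low to high with alternating jitter.
x = np.array([0.0, 0.2, 0.0, 0.2, 1.0, 0.8, 1.0, 0.8])

# Three-point moving-average kernel (assumed): every weight is 1/3.
w = np.ones(3) / 3.0

# mode="valid" keeps only positions where the kernel fully overlaps x,
# so y has len(x) - len(w) + 1 samples.
y = np.convolve(x, w, mode="valid")
print(y.round(3))
```

Each output is the mean of three neighboring inputs, so the sample-to-sample jitter shrinks while the overall step survives.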

30 Convolution: What's It Goodfer? Many useful operations can be expressed as a convolution, e.g., edge detection, a.k.a. high-pass filtering.

31-39 [Animation: an edge-detection kernel w steps across the input x, filling in the output y one value per slide.]
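Edge detection works the same way with a different kernel. This sketch assumes a simple first-difference kernel `[1, -1]`, which is not necessarily the kernel shown on the slides, and a made-up signal with one rising and one falling edge.

```python
import numpy as np

# Hypothetical signal: flat, step up, flat, step down, flat.
x = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])

# First-difference kernel (assumed). With np.convolve in "valid" mode this
# computes y[n] = x[n+1] - x[n].
w = np.array([1.0, -1.0])

y = np.convolve(x, w, mode="valid")
print(y)
```

The output is zero wherever x is flat, positive at the rising edge, and negative at the falling edge.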

41-45 Two-Dimensional Convolution [Animation: a two-dimensional kernel w steps across a two-dimensional input x to produce the output y.]

46-47 Common Kernels for Image Manipulation [Figures not preserved.]

48 Homebrew Convolution with NumPy (Look Ma, No Loops!)
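The code from this slide was not preserved. As a stand-in, here is one possible loop-free approach, assuming NumPy 1.20 or later for `sliding_window_view`; the function name `conv2d_valid` and the test image are made up for illustration. Like most deep-learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, w):
    """Slide kernel w over image x with no padding and no Python loops.

    The kernel is not flipped; pass w[::-1, ::-1] for textbook convolution.
    """
    # windows[i, j] is the patch of x whose top-left corner is (i, j).
    windows = sliding_window_view(x, w.shape)   # shape (H-kh+1, W-kw+1, kh, kw)
    # Multiply every patch by w elementwise and sum over the patch axes.
    return np.einsum("ijkl,kl->ij", windows, w)

x = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical 4x4 "image"
w = np.ones((3, 3)) / 9.0                      # 3x3 box blur (assumed kernel)

y = conv2d_valid(x, w)
print(y)
```

`sliding_window_view` returns a view rather than a copy, so the patches cost no extra memory until `einsum` reduces them.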

49 Adding Convolution to Multi-Layer Neural Nets Up to now our networks have been fully connected (every node on a layer connects to every node on the next layer).

50 Adding Convolution to Multi-Layer Neural Nets Our current method is to present an image on the input layer, pass it through a hidden layer, and have our softmax layer classify it as belonging to a particular category (0 = airplane; 1 = automobile; 2 = bird, etc.).
Problem #1: A fully-connected input→hidden configuration would require an unmanageable number of weights:
CIFAR-10 grayscale: (32x32) x 500 = 512,000 weights
Webcam RGB: (640x480x3) x 500 = 460,800,000 weights
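The weight counts are simply inputs times hidden units. A quick check of the arithmetic, assuming a hidden layer of 500 units, a 32x32 grayscale image, a 640x480 RGB webcam frame, and ignoring bias terms:

```python
hidden_units = 500                       # assumed hidden-layer size

cifar_gray_inputs = 32 * 32              # one value per grayscale pixel
webcam_rgb_inputs = 640 * 480 * 3        # three color channels per pixel

cifar_weights = cifar_gray_inputs * hidden_units
webcam_weights = webcam_rgb_inputs * hidden_units

print(f"{cifar_weights:,}")    # 512,000
print(f"{webcam_weights:,}")   # 460,800,000
```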


51 Adding Convolution to Multi-Layer Neural Nets Problem #2 ("Minsky's Revenge"): With a fully-connected input→hidden configuration, each hidden unit has access to all pixels. This arrangement ignores the high degree of correlation (agreement) among nearby pixels, requiring the network to learn and unlearn relationships that we know are already there. [Figure: image regions labeled "similar" and "different".]

52 Adding Convolution to Multi-Layer Neural Nets Solution: Treat the input→hidden connection as a convolution:
Each hidden unit gets input from a small number of nearby inputs (its "receptive field").
All hidden units use the same weight values.
[Diagram: Input and Hidden layers connected by the shared weights w_1, w_2, w_3.]
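One way to see that this is still an ordinary layer is to write out the weight matrix it implies. In this sketch (input and weight values are made up), each row of `W` is one hidden unit: it is zero outside that unit's receptive field and carries the same three shared weights inside it. Multiplying by `W` gives the same result as sliding the three weights across the input.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # hypothetical inputs
w = np.array([0.5, -1.0, 0.5])             # shared weights w_1, w_2, w_3 (assumed values)

n_hidden = len(x) - len(w) + 1
W = np.zeros((n_hidden, len(x)))
for j in range(n_hidden):
    # Hidden unit j is connected only to inputs j, j+1, j+2.
    W[j, j:j + len(w)] = w

fc_output = W @ x                                  # as a sparse fully-connected layer
conv_output = np.correlate(x, w, mode="valid")     # as a sliding dot product

print(W)
print(fc_output, conv_output)
```

The matrix has 15 entries but only 3 distinct learnable values, which is where the savings in weights come from.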

54 Now we can visualize our input→hidden layers as a two-dimensional convolution.

55 $y_{j,k} = \sum_{l=0}^{4} \sum_{m=0}^{4} x_{j+l,\,k+m}\, w_{l,m}$
[Diagram: input x and output y.]

56 Use a bias and a nonlinear activation function to make it an actual neural network: $y_{j,k} = f\left(b + \sum_{l=0}^{4} \sum_{m=0}^{4} x_{j+l,\,k+m}\, w_{l,m}\right)$
[Diagram: input x and output y.]
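The formula translates almost term for term into NumPy. This is a minimal sketch under stated assumptions: a 28x28 input, a 5x5 kernel, a sigmoid for f, and random values for x and w. None of these specifics come from the slide itself, and `conv_layer` is a made-up name.

```python
import numpy as np

def conv_layer(x, w, b, f):
    """Compute y[j, k] = f(b + sum_l sum_m x[j+l, k+m] * w[l, m])."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    z = np.full((out_h, out_w), b, dtype=float)   # start every output at the bias
    for l in range(kh):
        for m in range(kw):
            # Add kernel entry w[l, m]'s contribution to all outputs at once.
            z += w[l, m] * x[l:l + out_h, m:m + out_w]
    return f(z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random((28, 28))          # hypothetical input image
w = rng.normal(size=(5, 5))       # hypothetical 5x5 shared weights
b = 0.1                           # hypothetical shared bias

y = conv_layer(x, w, b, sigmoid)
print(y.shape)                    # (24, 24)
```

The two loops run over the 25 kernel entries rather than over pixels, so the whole layer uses 25 weights plus one bias regardless of image size.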

57 Convolution Layer: What's It Goodfer? Previously we saw a hidden unit as having two major roles: 1) Support linearly non-separable functions; 2) Respond differently from other hidden units on its layer, based on unique weights. So if all hidden units have the same weights, what good are they?

58-59 Convolution Layer: What's It Goodfer? Translational invariance: detect the same item anywhere in the image! [Figure: input x, kernel w (the "T Detector"), and output y.]
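A toy version of the T detector shows the idea. Everything here is an illustrative assumption: the 3x3 T pattern, the 8x8 image, and the helper names. The same weights find the T wherever it is placed, and the location of the strongest response moves with it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate2d_valid(x, w):
    """Slide w over x (no padding, no kernel flip)."""
    return np.einsum("ijkl,kl->ij", sliding_window_view(x, w.shape), w)

# A 3x3 "T" pattern, used both as the object and as the detector weights.
t = np.array([[1, 1, 1],
              [0, 1, 0],
              [0, 1, 0]], dtype=float)

def image_with_t(row, col, size=8):
    """Blank image with the T's top-left corner at (row, col)."""
    img = np.zeros((size, size))
    img[row:row + 3, col:col + 3] = t
    return img

def strongest_response(img):
    """Position in the output map where the detector responds most."""
    y = correlate2d_valid(img, t)
    return tuple(int(i) for i in np.unravel_index(np.argmax(y), y.shape))

print(strongest_response(image_with_t(1, 2)))
print(strongest_response(image_with_t(4, 5)))
```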

60 Feature Maps A "T Detector" is unrealistic: the T would be only a few pixels in size! Instead, we use the weights to detect the same feature (pattern) in different parts of the image. Typical features could include edges (as we saw before). [Figures: vertical edge; horizontal edge.]

61-62 Multiple Feature Maps Typically, a given hidden layer will have multiple feature maps. This gives us great power: we can detect the same idiosyncratic pattern across different regions of the image. "Think like a neuron": repeating the same small operation over and over gives you big statistical generalizations...
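Several feature maps amount to several kernels applied to the same input, with the results stacked. In this sketch the two kernels are Sobel-style vertical and horizontal edge detectors; that choice, the test image, and the function name `feature_maps` are assumptions, not kernels taken from the slides. In a trained network the kernels would be learned.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def feature_maps(x, kernels):
    """Apply each kernel in a (n_maps, kh, kw) bank to the 2D image x."""
    windows = sliding_window_view(x, kernels.shape[1:])
    # One output map per kernel: result has shape (n_maps, out_h, out_w).
    return np.einsum("ijkl,nkl->nij", windows, kernels)

vertical_edge = np.array([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]], dtype=float)   # assumed Sobel-style kernel
horizontal_edge = vertical_edge.T
kernels = np.stack([vertical_edge, horizontal_edge])

# Hypothetical image: dark left half, bright right half (one vertical edge).
img = np.zeros((6, 6))
img[:, 3:] = 1.0

maps = feature_maps(img, kernels)
print(maps.shape)   # (2, 4, 4)
print(maps[0])      # responds along the vertical edge
print(maps[1])      # no horizontal edge, so all zeros
```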

63-64 Multiple Feature Maps [Figure: a photo of the University Colonnade with regions labeled "bright, smooth", "splotches", "edges", and "dark, smooth".]

66 Think like a neuron: features that end up being useful don't have to match our intuitions. E.g., MNIST digit features learned by Nielsen's 20-feature layer:

67 Pooling Layers Our goal is to reduce the high-dimensional input (e.g., 640x480 image = 307,200 dimensions) to a simple, one-dimensional classification (e.g., choice of one digit out of ten). Hence any intermediate dimensionality reduction we can perform will be beneficial. This is the value of pooling layers.

68 Pooling Layers E.g., Max Pooling: Each unit in the max-pooling layer is set to the maximum of four adjacent units in the hidden layer:
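A minimal NumPy sketch of this operation, assuming the four adjacent units form non-overlapping 2x2 blocks and that the feature map has even height and width (the function name and sample values are made up):

```python
import numpy as np

def max_pool_2x2(h):
    """Replace each non-overlapping 2x2 block of h with its maximum."""
    rows, cols = h.shape
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")
    # Axes after reshape: (block row, row within block, block col, col within block).
    blocks = h.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.max(axis=(1, 3))

h = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 0, 5, 6],
              [7, 8, 2, 3]], dtype=float)   # hypothetical hidden-layer feature map

print(max_pool_2x2(h))
```

A 4x4 map becomes 2x2, so each pooling step cuts the number of units by a factor of four.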

69 Max Pooling The max-pooling layer preserves the distinction among feature maps: [Figure: Hidden layer feature maps, each feeding its own Max-pooling map.]

70 The Full Monty Hypothetical network for 28x28 MNIST digits:

71 The Fuller Monty!

72 Dropout: Another Hyper-Parameter With some probability p, the activation (output) of a neuron is set to zero during training. As with weight decay (λ), this helps prevent overfitting. In principle, it could be used in any kind of network, but it seems especially popular in convolutional nets.
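Dropout itself is only a random mask. The sketch below uses the common "inverted dropout" variant, which also rescales the surviving activations by 1/(1-p) so that their expected value is unchanged; that rescaling is an assumption added here, not something stated on the slide.

```python
import numpy as np

def dropout(activations, p, rng, training=True):
    """Zero each activation with probability p during training.

    Survivors are scaled by 1/(1-p) (inverted dropout, an assumed variant).
    At test time the activations pass through unchanged.
    """
    if not training or p == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= p   # True with probability 1-p
    return activations * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones(10000)                  # hypothetical layer activations
dropped = dropout(a, p=0.5, rng=rng)

print(np.mean(dropped == 0.0))      # fraction of units zeroed, close to p
print(dropped.mean())               # close to the original mean of 1.0
```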

73 Convolutional Networks in TensorFlow

74 ???

79 fc = fully connected

80 Pooling has reduced our 28x28 image to 7x7
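The network code for these slides was not preserved, so the following is only a guess at the arithmetic behind the 7x7 figure. It assumes two 2x2 pooling layers with stride 2 and convolutions padded so that they do not change the image size. The 64 feature maps are a hypothetical value used to show how the input size of the fully connected layer would follow.

```python
def pooled_size(size, pool=2, n_pools=2):
    """Side length after n_pools rounds of non-overlapping pool x pool pooling."""
    for _ in range(n_pools):
        size //= pool
    return size

side = pooled_size(28)              # 28 -> 14 -> 7
n_maps = 64                         # hypothetical number of feature maps
fc_inputs = side * side * n_maps    # inputs to the first fully connected layer

print(side, fc_inputs)
```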

85 Convolutional Networks: Conclusions ReLU works well as a general-purpose activation function, so long as we use the appropriate loss function (tf.nn.softmax_cross_entropy_with_logits). Looking at the rest of the code (training/testing), you'll see that it's very similar to what we did with the logistic-regression and multilayer-perceptron networks in the previous exercises: TensorFlow is all about setting up the graph (network)!
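To make the loss function less of a black box, here is a NumPy sketch of the quantity that a softmax cross-entropy on logits computes. It is an illustration, not TensorFlow's implementation; the function name `softmax_cross_entropy_np` and the sample logits are made up.

```python
import numpy as np

def softmax_cross_entropy_np(labels, logits):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Subtract the row maximum first so np.exp cannot overflow.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_softmax).sum(axis=-1)

logits = np.array([[2.0, 1.0, 0.1],     # confident, correct prediction
                   [0.0, 0.0, 0.0]])    # no preference among three classes
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

loss = softmax_cross_entropy_np(labels, logits)
print(loss)
```

The second example has equal logits for three classes, so its loss is log(3); the first is smaller because the correct class already has the largest logit.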
