Outline. CSCI567 Machine Learning (Spring 2019) Outline. Math formulation. Prof. Victor Adamchik. Feb. 12, 2019
CSCI567 Machine Learning (Spring 2019)
Prof. Victor Adamchik, U of Southern California
Feb. 12, 2019

Outline

1. Review of last lecture
2. Convolutional neural networks
3. Kernel methods

Review of last lecture: math formulation

An L-layer neural net can be written as

$$f(x) = h_L\big(W_L\, h_{L-1}\big(W_{L-1} \cdots h_1(W_1 x)\big)\big)$$

To ease notation, for a given input x, define recursively

$$o_0 = x, \qquad a_l = W_l\, o_{l-1}, \qquad o_l = h_l(a_l) \qquad (l = 1, \dots, L)$$

where
- $D_0 = D, D_1, \dots, D_L$ are the numbers of neurons at each layer
- $W_l \in \mathbb{R}^{D_l \times D_{l-1}}$ is the weight matrix for layer l
- $a_l \in \mathbb{R}^{D_l}$ is the input to layer l
- $o_l \in \mathbb{R}^{D_l}$ is the output of layer l
- $h_l : \mathbb{R}^{D_l} \to \mathbb{R}^{D_l}$ is the activation function at layer l
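The recursive definition of a_l and o_l can be sketched in a few lines of NumPy. This is only an illustration: the layer sizes, the random weights, and the choice of ReLU followed by an identity output are assumptions, not part of the lecture.

```python
import numpy as np

def forward(weights, activations, x):
    """Return the pre-activations a_l and outputs o_l for one input x."""
    o = x                          # o_0 = x
    a_list, o_list = [], [o]
    for W, h in zip(weights, activations):
        a = W @ o                  # a_l = W_l o_{l-1}
        o = h(a)                   # o_l = h_l(a_l)
        a_list.append(a)
        o_list.append(o)
    return a_list, o_list

def relu(a):
    return np.maximum(a, 0.0)

def identity(a):
    return a

rng = np.random.default_rng(0)
dims = [4, 5, 3]                   # D_0, D_1, D_2 (made-up sizes)
weights = [rng.standard_normal((dims[l + 1], dims[l])) for l in range(2)]
x = rng.standard_normal(dims[0])
a_list, o_list = forward(weights, [relu, identity], x)
print(o_list[-1])                  # f(x), a vector of length D_2
```

Keeping every a_l and o_l, rather than only the final output, is what makes the backward pass possible later.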
Putting everything into SGD

The backpropagation algorithm (1986) computes the gradient of the cost function for a single training example.

Initialize $W_1, \dots, W_L$ (all 0 or randomly). Repeat:
1. randomly pick one data point $n \in [N]$
2. forward propagation: for each layer $l = 1, \dots, L$, compute $a_l = W_l\, o_{l-1}$ and $o_l = h_l(a_l)$ (with $o_0 = x_n$)
3. backward propagation: for each layer $l = L, \dots, 1$, compute $\partial E_n / \partial a_l$ and update the weights

$$W_l \leftarrow W_l - \lambda \frac{\partial E_n}{\partial W_l} = W_l - \lambda \frac{\partial E_n}{\partial a_l}\, o_{l-1}^T$$

Outline

1. Review of last lecture
2. Convolutional neural networks
   - Motivation
   - Architecture
3. Kernel methods

Acknowledgements

The materials I used to develop these notes include the following sources:
- Stanford course CS231n (taught by Prof. Fei-Fei Li and her students)
- Dr. Ian Goodfellow's lectures on deep learning

Both websites provide tons of useful resources: notes, demos, videos, etc.

Image Classification: a core task in Computer Vision

Assume a given set of discrete labels {dog, cat, truck, plane, ...}. The example image is labeled "cat".

(Image by Nikita, licensed under CC-BY 2.0.)
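A minimal sketch of one SGD step with backpropagation follows. The slides leave the cost and the activations open, so the squared error E_n = 0.5 * ||o_L - y_n||^2, the tanh hidden layers, the linear output layer, and the recursion used for dE_n/da_l are all assumptions made for illustration.

```python
import numpy as np

def sgd_step(weights, x, y, lam):
    """One SGD step for E = 0.5 * ||o_L - y||^2 (tanh hidden layers, linear output)."""
    L = len(weights)
    # forward propagation: o[l] is the input to weights[l]
    o, a = [x], []
    for l, W in enumerate(weights):
        a.append(W @ o[-1])
        o.append(a[-1] if l == L - 1 else np.tanh(a[-1]))
    # backward propagation
    delta = o[-1] - y                        # dE/da_L for a linear output layer
    grads = [None] * L
    for l in reversed(range(L)):
        grads[l] = np.outer(delta, o[l])     # dE/dW_l = (dE/da_l) o_{l-1}^T
        if l > 0:                            # pass the error to the previous layer
            delta = (weights[l].T @ delta) * (1.0 - np.tanh(a[l - 1]) ** 2)
    new_weights = [W - lam * g for W, g in zip(weights, grads)]
    return new_weights, grads

def loss(weights, x, y):
    o = x
    for l, W in enumerate(weights):
        o = W @ o if l == len(weights) - 1 else np.tanh(W @ o)
    return 0.5 * np.sum((o - y) ** 2)

rng = np.random.default_rng(0)
weights = [0.5 * rng.standard_normal((3, 2)), 0.5 * rng.standard_normal((1, 3))]
x, y = np.array([0.3, -0.7]), np.array([0.5])
new_weights, grads = sgd_step(weights, x, y, lam=0.1)
```

A finite-difference check on a single weight is a cheap way to confirm that the backward pass matches the loss being minimized.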
The Problem: Semantic Gap

What the computer sees: an image is just a big grid of numbers between [0, 255], e.g. 800 x 600 x 3 (3 channels RGB).

Challenges: Viewpoint variation

All pixels change when the camera moves!

Challenges: Illumination

Challenges: Deformation

(Images by Nikita, Umberto Salvagnin, sare bear, and Tom Thai, licensed under CC-BY 2.0; other images are CC0 public domain.)
Challenges: Occlusion

Challenges: Background Clutter

Challenges: Intraclass variation

(Image by jonsson, licensed under CC-BY 2.0; other images are CC0 public domain.)

Fundamental problems in vision

The key challenge: how to train a model that can tolerate all those variations?

Main ideas:
- need a lot of data that exhibits those variations
- need more specialized models to capture the invariance
A bit of history

- LeNet-5: Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998]
- AlexNet: ImageNet Classification with Deep Convolutional Neural Networks [Krizhevsky, Sutskever, Hinton, 2012]

(Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.)

Issues of standard NN for image inputs

Fully Connected Layer: a 32x32x3 image is stretched to a 3072 x 1 input. With a 10 x 3072 weight matrix W, the activation is 1 x 10. Each activation is 1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product).

Spatial structure is lost!

Solution: Convolutional Neural Net (ConvNet/CNN)

- a special case of fully connected neural nets
- usually consists of convolution layers, ReLU layers, pooling layers, and regular fully connected layers
- key idea: learning from low-level to high-level features
Convolution layer

Arrange neurons as a 3D matrix/tensor naturally.

Convolution Layer: a 32x32x3 image (32 height, 32 width, 3 depth) -> preserve spatial structure.

2D Convolution

Input (3 x 4):

a b c d
e f g h
i j k l

Kernel (filter/receptive field, 2 x 2):

w x
y z

Output (2 x 3):

aw + bx + ey + fz    bw + cx + fy + gz    cw + dx + gy + hz
ew + fx + iy + jz    fw + gx + jy + kz    gw + hx + ky + lz

Figure 9.1: An example of 2-D convolution without kernel-flipping. In this case we restrict the output to only positions where the kernel lies entirely within the image, called "valid" convolution in some contexts. We draw boxes with arrows to indicate how the upper-left element of the output tensor is formed by applying the kernel to the corresponding upper-left region of the input tensor. (Goodfellow 2016)

Convolution Layer

32x32x3 image, 5x5x3 filter. Filters always extend the full depth of the input volume.

Convolve the filter with the image, i.e. slide over the image spatially, computing dot products.
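The "valid" 2-D convolution without kernel flipping from the Goodfellow figure can be sketched as below. The numeric input and kernel are hypothetical stand-ins for the symbolic entries a..l and w, x, y, z, and `conv2d_valid` is an illustrative helper name.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D convolution without kernel flipping, only where the kernel fits entirely."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # elementwise product of the kernel with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(1.0, 13.0).reshape(3, 4)    # stands in for a..l
kernel = np.array([[1.0, 2.0], [3.0, 4.0]])   # stands in for w, x, y, z
out = conv2d_valid(image, kernel)
print(out.shape)   # (2, 3)
print(out[0, 0])   # 1*1 + 2*2 + 5*3 + 6*4 = 44.0
```

The explicit double loop is slow but mirrors the "slide and take dot products" description one output position at a time.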
Convolution Layer

32x32x3 image, 5x5x3 filter. Each output is 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product + bias).

Convolve (slide) over all spatial locations to obtain a 28x28x1 activation map.

Consider a second, green filter: it produces a second 28x28x1 activation map.

For example, if we had 6 5x5 filters, we'll get 6 separate activation maps. We stack these up to get a "new image" of size 28x28x6!
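A rough sketch of a convolution layer with a bank of full-depth filters, assuming the 32x32x3 input and six 5x5x3 filters from the CS231n example (stride 1, no padding). The random data and the `conv_layer` helper are illustrative only.

```python
import numpy as np

def conv_layer(image, filters, biases):
    """Apply K full-depth F x F filters with stride 1 and no padding."""
    H, W, D = image.shape
    K, F, _, _ = filters.shape
    out = np.empty((H - F + 1, W - F + 1, K))
    for k in range(K):                       # one activation map per filter
        for i in range(H - F + 1):
            for j in range(W - F + 1):
                patch = image[i:i + F, j:j + F, :]            # F x F x D chunk
                out[i, j, k] = np.sum(patch * filters[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
filters = rng.standard_normal((6, 5, 5, 3))
biases = np.zeros(6)
maps = conv_layer(image, filters, biases)
print(maps.shape)   # (28, 28, 6)
```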
Preview: ConvNet is a sequence of Convolutional Layers, interspersed with activation functions

32x32x3 input -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> CONV, ReLU -> ...

One filter => one activation map. Example: 5x5 filters (32 total).

We call the layer convolutional because it is related to convolution of two signals: elementwise multiplication and sum of a filter and the signal (image).

(Figure copyright Andrej Karpathy.)

Why convolution makes sense?

Main idea: if a filter is useful at one location, it should be useful at other locations.

A simple example of why filtering is useful

[Figure: Input image, Kernel, Output image]

Figure 9.6: Efficiency of edge detection. The image on the right was formed by taking each pixel in the original image and subtracting the value of its neighboring pixel on the left. This shows the strength of all of the vertically oriented edges in the input image, which can be a useful operation for object detection. Both images are 280 pixels tall. The input image is 320 pixels wide while the output image is 319 pixels wide. This transformation can be described by a convolution kernel containing two elements, and requires 319 × 280 × 3 = 267,960 floating point operations (two multiplications and one addition per output pixel) to compute using convolution. To describe the same transformation with a matrix multiplication would take 320 × 280 × 319 × 280, or over eight billion, entries in the matrix, making convolution four billion times more efficient for representing this transformation. The straightforward matrix multiplication algorithm performs over sixteen billion floating point operations, making convolution roughly 60,000 times more efficient computationally. Of course, most of the entries of the matrix would be ... (Goodfellow 2016)
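The operation counts in the edge-detection caption can be reproduced with a short sketch, assuming the 280 x 320 image size quoted in Goodfellow's figure. The random image is a placeholder and the two-element kernel [-1, 1] is written as a slice difference.

```python
import numpy as np

height, width = 280, 320
rng = np.random.default_rng(0)
image = rng.random((height, width))            # placeholder for the photo

# Each output pixel is a pixel minus its left neighbour: a width-2 kernel [-1, 1].
edges = image[:, 1:] - image[:, :-1]

out_pixels = edges.shape[0] * edges.shape[1]
conv_flops = out_pixels * 3                    # 2 multiplications + 1 addition per output pixel
dense_entries = (height * width) * out_pixels  # entries of the equivalent dense matrix

print(edges.shape)      # (280, 319)
print(conv_flops)       # 267960
print(dense_entries)    # 8003072000
```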
Connection to fully connected NNs

A convolution layer is a special case of a fully connected layer:
- filter = weights with sparse connection
- parameters sharing

Local Receptive Field Leads to Sparse Connectivity (affects less)

[Figure: outputs s1, ..., s5 over inputs x1, ..., x5. Top: sparse connections due to small convolution kernel. Bottom: dense connections.]

Figure 9.2: Sparse connectivity, viewed from below: We highlight one input unit, x3, and also highlight the output units in s that are affected by this unit. (Top) When s is formed by convolution with a kernel of width 3, only three outputs are affected by x. (Bottom) When s is formed by matrix multiplication, connectivity is no longer sparse, so all of the outputs are affected by x. (Goodfellow 2016)

Sparse connectivity: being affected by less

[Figure: outputs s1, ..., s5 over inputs x1, ..., x5. Top: sparse connections due to small convolution kernel. Bottom: dense connections.]

Figure 9.3: Sparse connectivity, viewed from above: We highlight one output unit, s3, and also highlight the input units in x that affect this unit. These units are known as the receptive field of s3. (Top) When s is formed by convolution with a kernel of ... (Goodfellow 2016)
Connection to fully connected NNs

A convolution layer is a special case of a fully connected layer:
- filter = weights with sparse connection
- parameters sharing

Far fewer parameters! Example (ignore bias terms):
- FC: (32 × 32 × 3) × (28 × 28) ≈ 2.4M
- CNN: 5 × 5 × 3 = 75

Parameter Sharing

Convolution shares the same parameters across all spatial locations. Traditional matrix multiplication does not share any parameters.

[Figure: outputs s1, ..., s5 over inputs x1, ..., x5, for a convolutional model (top) and a fully connected model (bottom).]

Figure 9.5: Parameter sharing: Black arrows indicate the connections that use a particular parameter in two different models. (Top) The black arrows indicate uses of the central element of a 3-element kernel in a convolutional model. Due to parameter sharing, this single parameter is used at all input locations. (Bottom) The single black arrow indicates the use of the central element of the weight matrix in a fully connected model. This model has no parameter sharing so the parameter is used only once. (Goodfellow 2016)

Excerpt from Goodfellow (2016), Chapter 9, where m is the number of inputs, n the number of outputs, and k the kernel size:

... for every location, we learn only one set. This does not affect the runtime of forward propagation (it is still O(k × n)) but it does further reduce the storage requirements of the model to k parameters. Recall that k is usually several orders of magnitude less than m. Since m and n are usually roughly the same size, k is practically insignificant compared to m × n. Convolution is thus dramatically more efficient than dense matrix multiplication in terms of the memory requirements and statistical efficiency. For a graphical depiction of how parameter sharing works, see figure 9.5.

As an example of both of these first two principles in action, figure 9.6 shows how sparse connectivity and parameter sharing can dramatically improve the efficiency of a linear function for detecting edges in an image.

In the case of convolution, the particular form of parameter sharing causes the layer to have a property called equivariance to translation. To say a function is equivariant means that if the input changes, the output changes in the same way. Specifically, a function f(x) is equivariant to a function g if f(g(x)) = g(f(x)). In the case of convolution, if we let g be any function that translates the input, i.e., shifts it, then the convolution function is equivariant to g. For example, let I ...

Spatial arrangement: stride and padding

A closer look at spatial dimensions: 7x7 input (spatially), assume 3x3 filter.

(Fei-Fei Li & Justin Johnson & Serena Yeung)
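Equivariance to translation, f(g(x)) = g(f(x)), can be checked numerically. The sketch below assumes a 1-D signal with circular (wrap-around) boundaries so that shifting is exact at the edges. The signal, the kernel, and the helper name `circular_conv1d` are made up for illustration.

```python
import numpy as np

def circular_conv1d(x, w):
    """1-D convolution without kernel flipping, with wrap-around boundaries."""
    n = len(x)
    return np.array([sum(w[k] * x[(i + k) % n] for k in range(len(w)))
                     for i in range(n)])

def shift(v):
    """The translation g: move every entry two positions to the right."""
    return np.roll(v, 2)

x = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 7.0])
w = np.array([1.0, 0.0, -1.0])

lhs = circular_conv1d(shift(x), w)   # f(g(x)): shift first, then convolve
rhs = shift(circular_conv1d(x, w))   # g(f(x)): convolve first, then shift
print(np.allclose(lhs, rhs))         # True
```

With zero padding instead of wrap-around, the same identity holds away from the borders only.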
A closer look at spatial dimensions

7x7 input (spatially), assume 3x3 filter: sliding the filter one position at a time gives a 5x5 output.

7x7 input (spatially), assume 3x3 filter applied with stride 2.
A closer look at spatial dimensions

7x7 input (spatially), assume 3x3 filter applied with stride 2 => 3x3 output!

7x7 input (spatially), assume 3x3 filter applied with stride 3? Doesn't fit! Cannot apply a 3x3 filter on a 7x7 input with stride 3.
Output size

For an N x N input and an F x F filter:

Output size: (N - F) / stride + 1

e.g. N = 7, F = 3:
- stride 1 => (7 - 3)/1 + 1 = 5
- stride 2 => (7 - 3)/2 + 1 = 3
- stride 3 => (7 - 3)/3 + 1 = 2.33 :\

In practice: Common to zero pad the border

e.g. input 7x7, 3x3 filter, applied with stride 1, pad with 1 pixel border => what is the output? (recall: (N - F) / stride + 1)

7x7 output!

In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2 (will preserve size spatially), e.g.
- F = 3 => zero pad with 1
- F = 5 => zero pad with 2
- F = 7 => zero pad with 3
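The output-size rule, including the "doesn't fit" case and zero padding, can be written as a small helper. The function name `conv_output_size` and its error handling are illustrative choices, not something defined in the slides.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size for an n x n input and f x f filter; error if it does not fit."""
    span = n + 2 * pad - f
    if span < 0 or span % stride != 0:
        raise ValueError(f"a {f}x{f} filter with stride {stride} does not fit "
                         f"a {n}x{n} input padded by {pad}")
    return span // stride + 1

print(conv_output_size(7, 3, stride=1))          # 5
print(conv_output_size(7, 3, stride=2))          # 3
print(conv_output_size(7, 3, stride=1, pad=1))   # 7, padding with (F - 1)/2 preserves size
```

Raising an error for stride 3 matches the slide's point that a non-integer output size means the configuration is invalid rather than something to round.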
Remember back to the preview

32x32x3 input -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> CONV, ReLU -> ...

E.g. a 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially! (32 -> 28 -> 24 ...). Shrinking too fast is not good; it doesn't work well.

Examples time

Input volume: 32x32x3
10 5x5 filters with stride 1, pad 2

Output volume size: ?

(32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10

Number of parameters in this layer?
Examples time

Input volume: 32x32x3
10 5x5 filters with stride 1, pad 2

Number of parameters in this layer? Each filter has 5*5*3 + 1 = 76 params (+1 for bias) => 76*10 = 760

Summary for convolution layer

Input: a tensor of size $W_1 \times H_1 \times D_1$

Hyperparameters:
- K filters of size F × F
- stride S
- amount of zero padding P (for one side)

Output: a tensor of size $W_2 \times H_2 \times D_2$ where

$$W_2 = (W_1 + 2P - F)/S + 1, \qquad H_2 = (H_1 + 2P - F)/S + 1, \qquad D_2 = K$$

#parameters: $(F \times F \times D_1 + 1) \times K$ weights

Common setting: F = 3, S = P = 1 and K is a power of two.

The brain/neuron view of CONV Layer

32x32x3 image, 5x5x3 filter. Each output is 1 number: the result of taking a dot product between the filter and this part of the image (i.e. a 5*5*3 = 75-dimensional dot product).

It's just a neuron with local connectivity...
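The summary formulas can be bundled into one helper that returns the output shape and the parameter count. The name `conv_layer_summary` is hypothetical; the example call uses the 32x32x3 input with ten 5x5 filters, stride 1 and pad 2 from the worked example.

```python
def conv_layer_summary(w1, h1, d1, k, f, s, p):
    """Output tensor size and parameter count (weights plus one bias per filter)."""
    w2 = (w1 + 2 * p - f) // s + 1
    h2 = (h1 + 2 * p - f) // s + 1
    d2 = k
    n_params = (f * f * d1 + 1) * k
    return (w2, h2, d2), n_params

shape, n_params = conv_layer_summary(32, 32, 3, k=10, f=5, s=1, p=2)
print(shape)      # (32, 32, 10)
print(n_params)   # 760
```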
The brain/neuron view of CONV Layer

An activation map is a 28x28 sheet of neuron outputs:
1. Each is connected to a small region in the input
2. All of them share parameters

"5x5 filter" -> "5x5 receptive field for each neuron"

E.g. with 5 filters, the CONV layer consists of neurons arranged in a 3D grid (28x28x5). There will be 5 different neurons all looking at the same region in the input volume.

Another element: pooling

Pooling layer
- makes the representations smaller and more manageable
- operates over each activation map independently

Max pooling

[Figure: a single depth slice, max pooled with 2x2 filters and stride 2.]

Max pooling with a 2 × 2 filter and stride 2 is very common.

Types of pooling: average, L2-norm, max
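Max pooling with a 2 × 2 filter and stride 2 can be sketched as follows. The 4x4 depth slice is a made-up example, since the numbers in the original figure did not survive, and `max_pool` is an illustrative helper.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling over one activation map (a single depth slice)."""
    H, W = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

depth_slice = np.array([[1, 1, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])
pooled = max_pool(depth_slice)
print(pooled)   # [[6 8]
                #  [3 4]]
```

Replacing `window.max()` with `window.mean()` or the L2 norm of the window would give the other pooling types listed above.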
Putting everything together

Typical architecture for CNNs:

Input → [[Conv → ReLU]*N → Pool]*M → [FC → ReLU]*Q → Softmax

Common choices: N ≤ 5, Q ≤ 2, M is large

Well-known CNNs: LeNet (1998), AlexNet (2012), ZF Net (2013), GoogLeNet (2014), VGGNet (2014), ResNet (2015), FractalNet (2017)

Revolution of Depth

How to train a CNN?

How do we learn the filters/weights?
Run the image through layers of convolutions and train the filters with SGD/backpropagation.

What computer architecture to use?
Buy one or more GPUs (NVIDIA graphics cards).

What software package to use?
State of the art: TensorFlow plus Keras or Caffe on top.

See demo at: cifar.html

Outline

1. Review of last lecture
2. Convolutional neural networks
3. Kernel methods
   - Motivation
   - Dual formulation of linear regression
   - Kernel Trick
Motivation

Recall the question: how to choose a nonlinear basis $\phi : \mathbb{R}^D \to \mathbb{R}^M$ for the model $w^T \phi(x)$?
- a neural network is one approach: learn φ from data
- a kernel method is another one: sidestep the issue of choosing φ by using kernel functions

Case study: regularized linear regression

Kernel methods work for many problems and we take regularized linear regression as an example.

Recall the regularized least square solution:

$$w^* = \operatorname*{argmin}_w F(w) = \operatorname*{argmin}_w \left( \|\Phi w - y\|_2^2 + \lambda \|w\|_2^2 \right) = \left( \Phi^T \Phi + \lambda I \right)^{-1} \Phi^T y$$

$$\Phi = \begin{pmatrix} \phi(x_1)^T \\ \phi(x_2)^T \\ \vdots \\ \phi(x_N)^T \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}$$

Here Φ is an N × M matrix, and $\Phi^T \Phi$ is an M × M matrix.

Issue: we operate in the space $\mathbb{R}^M$ and M could be huge or even infinite!

A closer look at the least square solution

By setting the gradient of $F(w) = \|\Phi w - y\|_2^2 + \lambda \|w\|_2^2$ to 0:

$$\Phi^T (\Phi w^* - y) + \lambda w^* = 0$$

Next we move $\lambda w^*$ to the other side and divide by λ:

$$w^* = \frac{1}{\lambda} \Phi^T (y - \Phi w^*)$$

Let us define a new vector $\alpha = \frac{1}{\lambda}(y - \Phi w^*)$. Then

$$w^* = \Phi^T \alpha = \sum_{n=1}^N \alpha_n \phi(x_n)$$

Thus the least square solution is a linear combination of features! Note this is true for perceptron and many other problems.

Of course, the above calculation does not show what α is.

Why is this helpful?

Assuming we know α, the prediction of $w^*$ on a new example x is

$$w^{*T} \phi(x) = \sum_{n=1}^N \alpha_n\, \phi(x_n)^T \phi(x)$$

Therefore we do not really need to know $w^*$. Only inner products in the new feature space matter!

Kernel methods are exactly about computing inner products without knowing φ.

But we need to figure out what α is!
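How to find α?

Plugging $w = \Phi^T \alpha$ into F(w) gives

$$G(\alpha) = F(\Phi^T \alpha) = \|\Phi \Phi^T \alpha - y\|_2^2 + \lambda \|\Phi^T \alpha\|_2^2 = \|\Phi \Phi^T \alpha - y\|_2^2 + \lambda\, \alpha^T \Phi \Phi^T \alpha = \|K \alpha - y\|_2^2 + \lambda\, \alpha^T K \alpha \qquad (K = \Phi \Phi^T)$$

We find α by minimizing G(α).

Gram matrix

Here $K = \Phi \Phi^T$ is called the Gram matrix or kernel matrix, where the (i, j) entry is $\phi(x_i)^T \phi(x_j)$:

$$K = \Phi \Phi^T = \begin{pmatrix} \phi(x_1)^T \phi(x_1) & \phi(x_1)^T \phi(x_2) & \cdots & \phi(x_1)^T \phi(x_N) \\ \phi(x_2)^T \phi(x_1) & \phi(x_2)^T \phi(x_2) & \cdots & \phi(x_2)^T \phi(x_N) \\ \vdots & \vdots & \ddots & \vdots \\ \phi(x_N)^T \phi(x_1) & \phi(x_N)^T \phi(x_2) & \cdots & \phi(x_N)^T \phi(x_N) \end{pmatrix} \in \mathbb{R}^{N \times N}$$

Examples of kernel matrix

3 data points in $\mathbb{R}$: $x_1 = -1$, $x_2 = 0$, $x_3 = 1$

φ is the polynomial basis with degree 4:

$$\phi(x) = \begin{pmatrix} 1 \\ x \\ x^2 \\ x^3 \end{pmatrix}, \qquad \phi(x_1) = \begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}, \quad \phi(x_2) = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \phi(x_3) = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$

Calculation of the Gram matrix

$$K = \begin{pmatrix} \phi(x_1)^T \phi(x_1) & \phi(x_1)^T \phi(x_2) & \phi(x_1)^T \phi(x_3) \\ \phi(x_2)^T \phi(x_1) & \phi(x_2)^T \phi(x_2) & \phi(x_2)^T \phi(x_3) \\ \phi(x_3)^T \phi(x_1) & \phi(x_3)^T \phi(x_2) & \phi(x_3)^T \phi(x_3) \end{pmatrix} = \begin{pmatrix} 4 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 4 \end{pmatrix}$$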
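The Gram matrix of a small example can be computed directly from its definition. The sketch assumes three one-dimensional points -1, 0, 1 and the feature map φ(x) = (1, x, x², x³). These values are assumptions chosen to be consistent with the worked example, whose digits are partly lost in the transcription.

```python
import numpy as np

def phi(x):
    """Polynomial feature map (1, x, x^2, x^3); assumed form for this example."""
    return np.array([1.0, x, x ** 2, x ** 3])

xs = [-1.0, 0.0, 1.0]                     # assumed data points
Phi = np.stack([phi(x) for x in xs])      # N x M design matrix, one row per point
K = Phi @ Phi.T                           # K[i, j] = phi(x_i)^T phi(x_j)

print(K)
# [[4. 1. 0.]
#  [1. 1. 1.]
#  [0. 1. 4.]]
print(np.linalg.eigvalsh(K).min() >= -1e-12)   # True: K is positive semidefinite
```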
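Gram matrix vs covariance matrix

- $\Phi \Phi^T$: dimensions N × N; entry (i, j) is $\phi(x_i)^T \phi(x_j)$
- $\Phi^T \Phi$: dimensions M × M; entry (i, j) is $\sum_{n=1}^N \phi(x_n)_i\, \phi(x_n)_j$
- property: both are symmetric and positive semidefinite

How to find α?

Differentiate G(α),

$$G(\alpha) = F(\Phi^T \alpha) = \|K \alpha - y\|_2^2 + \lambda\, \alpha^T K \alpha,$$

and set the gradient to 0. We have (note K is symmetric)

$$0 = 2K(K\alpha - y) + 2\lambda K \alpha = 2K\big((K + \lambda I)\alpha - y\big),$$

which holds whenever $(K + \lambda I)\alpha - y = 0$. It follows that $\alpha = (K + \lambda I)^{-1} y$ is a minimizer, and we obtain

$$w^* = \Phi^T \alpha = \Phi^T (K + \lambda I)^{-1} y$$

Comparing two solutions

Minimizing F(w) gives $w^* = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T y$

Minimizing G(α) gives $w^* = \Phi^T (\Phi \Phi^T + \lambda I)^{-1} y$

Note that I has different dimensions in the two formulas: write $I_1$ for the M × M identity and $I_2$ for the N × N identity.

Natural question: are they the same or different? They are the same, and here is the proof:

$$\begin{aligned}
(\Phi^T \Phi + \lambda I_1)^{-1} \Phi^T y
&= (\Phi^T \Phi + \lambda I_1)^{-1} \Phi^T (\Phi \Phi^T + \lambda I_2)(\Phi \Phi^T + \lambda I_2)^{-1} y \\
&= (\Phi^T \Phi + \lambda I_1)^{-1} (\Phi^T \Phi \Phi^T + \lambda \Phi^T)(\Phi \Phi^T + \lambda I_2)^{-1} y \\
&= (\Phi^T \Phi + \lambda I_1)^{-1} (\Phi^T \Phi + \lambda I_1) \Phi^T (\Phi \Phi^T + \lambda I_2)^{-1} y \\
&= \Phi^T (\Phi \Phi^T + \lambda I_2)^{-1} y
\end{aligned}$$

Then what is the difference?

First, computing $(\Phi \Phi^T + \lambda I)^{-1} = (K + \lambda I)^{-1}$ (with $\Phi \Phi^T = K$) can be more efficient than computing $(\Phi^T \Phi + \lambda I)^{-1}$ when N ≤ M.

More importantly, computing $\alpha = (K + \lambda I)^{-1} y$ also only requires computing inner products in the new feature space!

Now we can conclude that the exact form of φ(·) is not essential; all we need is computing inner products $\phi(x)^T \phi(x')$.

For some φ it is indeed possible to compute $\phi(x)^T \phi(x')$ without computing/knowing φ. This is the kernel trick.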
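The equality of the two solutions can also be checked numerically. The sketch below uses a random design matrix with N < M; the sizes and the value of λ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, lam = 5, 8, 0.1                     # arbitrary sizes with N < M
Phi = rng.standard_normal((N, M))
y = rng.standard_normal(N)

# Primal: w = (Phi^T Phi + lam I_M)^{-1} Phi^T y, an M x M linear system
w_primal = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y)

# Dual: alpha = (K + lam I_N)^{-1} y, an N x N linear system, then w = Phi^T alpha
K = Phi @ Phi.T
alpha = np.linalg.solve(K + lam * np.eye(N), y)
w_dual = Phi.T @ alpha

print(np.allclose(w_primal, w_dual))                    # True
print(np.allclose(alpha, (y - Phi @ w_primal) / lam))   # True
```

The second check confirms that the α found by minimizing G agrees with the definition α = (y - Φw*)/λ used in the derivation.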
Example of the kernel trick

Consider the following polynomial basis $\phi : \mathbb{R}^2 \to \mathbb{R}^3$:

$$\phi(x) = \begin{pmatrix} x_1^2 \\ \sqrt{2}\, x_1 x_2 \\ x_2^2 \end{pmatrix}$$

What is the inner product between φ(x) and φ(x')?

$$\phi(x)^T \phi(x') = x_1^2 x_1'^2 + 2 x_1 x_2 x_1' x_2' + x_2^2 x_2'^2 = (x_1 x_1' + x_2 x_2')^2 = (x^T x')^2$$

Therefore, the inner product in the new space is simply a function of the inner product in the original space. Count the number of multiplications.

Another example

$\phi_\theta : \mathbb{R}^D \to \mathbb{R}^{2D}$ is parameterized by θ:

$$\phi_\theta(x) = \begin{pmatrix} \cos(\theta x_1) \\ \sin(\theta x_1) \\ \vdots \\ \cos(\theta x_D) \\ \sin(\theta x_D) \end{pmatrix}$$

What is the inner product between $\phi_\theta(x)$ and $\phi_\theta(x')$?

$$\phi_\theta(x)^T \phi_\theta(x') = \sum_{d=1}^D \big( \cos(\theta x_d) \cos(\theta x_d') + \sin(\theta x_d) \sin(\theta x_d') \big) = \sum_{d=1}^D \cos(\theta (x_d - x_d'))$$

Once again, the inner product in the new space is a simple function of the features in the original space.

More complicated example

Based on the previous mapping $\phi_\theta$, we define a new one $\phi_L : \mathbb{R}^D \to \mathbb{R}^{2D(L+1)}$ as follows:

$$\phi_L(x) = \begin{pmatrix} \phi_0(x) \\ \phi_{\frac{2\pi}{L}}(x) \\ \phi_{2 \frac{2\pi}{L}}(x) \\ \vdots \\ \phi_{L \frac{2\pi}{L}}(x) \end{pmatrix}$$

What is the inner product between $\phi_L(x)$ and $\phi_L(x')$?

$$\phi_L(x)^T \phi_L(x') = \sum_{l=0}^L \phi_{\frac{2\pi l}{L}}(x)^T \phi_{\frac{2\pi l}{L}}(x') = \sum_{l=0}^L \sum_{d=1}^D \cos\left( \frac{2\pi l}{L} (x_d - x_d') \right)$$

Infinite dimensional mapping

Let L → ∞. This means that the vector $\phi_L(x)$ has infinite dimension. Clearly we cannot compute $\phi_L(x)$, but we can still compute the inner product:

$$\phi_\infty(x)^T \phi_\infty(x') = \int_0^{2\pi} \sum_{d=1}^D \cos(\theta (x_d - x_d'))\, d\theta = \sum_{d=1}^D \frac{\sin(2\pi (x_d - x_d'))}{x_d - x_d'}$$

Again, a simple function of the original features.

Note that using this mapping in linear regression, we are learning a weight $w^*$ with infinite dimension!
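Both finite-dimensional identities can be verified numerically. The test points and the value θ = 0.7 below are arbitrary, and the function names are illustrative.

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map R^2 -> R^3."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k_poly(x, xp):
    """Same inner product computed in the original space: (x^T x')^2."""
    return float(x @ xp) ** 2

def phi_theta(x, theta):
    """Feature map R^D -> R^{2D}: (cos(theta x_1), sin(theta x_1), ...)."""
    return np.column_stack([np.cos(theta * x), np.sin(theta * x)]).ravel()

def k_theta(x, xp, theta):
    """Closed form: sum over d of cos(theta (x_d - x'_d))."""
    return float(np.sum(np.cos(theta * (x - xp))))

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(xp), k_poly(x, xp))   # both equal 1 up to floating-point rounding
print(np.isclose(phi_theta(x, 0.7) @ phi_theta(xp, 0.7), k_theta(x, xp, 0.7)))   # True
```

For the quadratic map, the kernel form needs one inner product and one squaring, while the explicit form first builds two feature vectors, which is the multiplication count the slide asks about.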
Kernel functions

Definition: a function $k : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}$ is called a (positive semidefinite) kernel function if there exists a function $\phi : \mathbb{R}^D \to \mathbb{R}^M$ so that for any $x, x' \in \mathbb{R}^D$,

$$k(x, x') = \phi(x)^T \phi(x')$$

Examples we have seen:

$$k(x, x') = (x^T x')^2, \qquad k(x, x') = \sum_{d=1}^D \frac{\sin(2\pi (x_d - x_d'))}{x_d - x_d'}$$

Using kernel functions

Choosing a nonlinear basis φ becomes choosing a kernel function.

As long as computing the kernel function is more efficient, we should apply the kernel trick.

The Gram/kernel matrix becomes:

$$K = \Phi \Phi^T = \begin{pmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_N) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_N) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_N, x_1) & k(x_N, x_2) & \cdots & k(x_N, x_N) \end{pmatrix}$$

In fact, k is a kernel if and only if K is positive semidefinite for any $x_1, x_2, \dots, x_N$ (Mercer theorem).

Using kernel functions

The prediction on a new example x is

$$w^{*T} \phi(x) = \sum_{n=1}^N \alpha_n\, \phi(x_n)^T \phi(x) = \sum_{n=1}^N \alpha_n\, k(x_n, x)$$

More examples of kernel functions

Two most commonly used kernel functions in practice:

Polynomial kernel

$$k(x, x') = (x^T x' + c)^d$$

for c ≥ 0 and d a positive integer.

Gaussian kernel or Radial basis function (RBF) kernel

$$k(x, x') = e^{-\frac{\|x - x'\|_2^2}{2\sigma^2}}$$

for some σ > 0.

Think about what the corresponding φ is for each kernel.
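Putting the dual solution and the prediction rule together gives kernelized ridge regression. The sketch below uses the Gaussian (RBF) kernel on a toy one-dimensional data set. The data, σ = 1, and λ = 0.1 are arbitrary choices, and `rbf_kernel` and `predict` are illustrative helper names.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))

X = np.linspace(-3, 3, 20)[:, None]       # toy training inputs, one per row
y = np.sin(X[:, 0])                       # toy targets
lam = 0.1

K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # alpha = (K + lam I)^{-1} y

def predict(x_new):
    """Prediction sum_n alpha_n k(x_n, x); phi is never formed explicitly."""
    return rbf_kernel(np.atleast_2d(x_new), X) @ alpha

print(predict(np.array([[0.5]])))         # prediction at a new point
```

Only kernel evaluations appear in both training and prediction, which is what allows a feature space of infinite dimension here.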
Composing kernels

Creating more kernel functions using the following rules: if $k_1(\cdot, \cdot)$ and $k_2(\cdot, \cdot)$ are kernels, the following are kernels too:
- linear combination: $\alpha k_1(\cdot, \cdot) + \beta k_2(\cdot, \cdot)$ if α, β ≥ 0
- product: $k_1(\cdot, \cdot)\, k_2(\cdot, \cdot)$
- exponential: $e^{k(\cdot, \cdot)}$

Verify using the definition of kernel!

Examples that are not kernels

The function

$$k(x, x') = \|x - x'\|_2^2$$

is not a kernel. Why?

Example: if it is a kernel, the kernel matrix for two data points $x_1$ and $x_2$,

$$K = \begin{pmatrix} k(x_1, x_1) & k(x_1, x_2) \\ k(x_2, x_1) & k(x_2, x_2) \end{pmatrix} = \begin{pmatrix} 0 & \|x_1 - x_2\|_2^2 \\ \|x_1 - x_2\|_2^2 & 0 \end{pmatrix},$$

must be positive semidefinite, but is it?

Kernelizing other ML algorithms

The kernel trick is applicable to many ML algorithms:
- nearest neighbor classifier
- perceptron
- logistic regression
- SVM

Example: Kernelizing NNC

For NNC with L2 distance, the key is to compute, for any two points x, x',

$$d(x, x') = \|x - x'\|_2^2 = x^T x + x'^T x' - 2 x^T x'$$

With a kernel function k, we simply compute

$$d^{\text{kernel}}(x, x') = k(x, x) + k(x', x') - 2 k(x, x')$$

which by definition is the L2 distance in a new feature space:

$$d^{\text{kernel}}(x, x') = \|\phi(x) - \phi(x')\|_2^2$$
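Both closing points can be checked with a few lines: the squared-distance function fails the positive semidefinite test, and the kernelized distance matches the explicit distance in feature space. The points and the quadratic kernel (x^T x')² are arbitrary illustrative choices.

```python
import numpy as np

# 1) k(x, x') = ||x - x'||^2 is not a kernel: its 2 x 2 matrix has a negative eigenvalue.
x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
d = np.sum((x1 - x2) ** 2)
K_bad = np.array([[0.0, d], [d, 0.0]])
eigs = np.linalg.eigvalsh(K_bad)
print(eigs)     # eigenvalues -5 and 5, so K_bad is not positive semidefinite

# 2) Kernelized squared distance versus the explicit distance in feature space.
def k_poly(a, b):
    return float(a @ b) ** 2

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def kernel_distance(k, x, xp):
    return k(x, x) + k(xp, xp) - 2 * k(x, xp)

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(kernel_distance(k_poly, a, b))        # 123.0
print(np.sum((phi(a) - phi(b)) ** 2))       # approximately 123
```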
Lecture 7 Convolutional Neural Networks
Lecture 7 Convolutional Neural Networks CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 17, 2017 We saw before: ŷ x 1 x 2 x 3 x 4 A series of matrix multiplications:
More informationDeep Learning (CNNs)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Deep Learning (CNNs) Deep Learning Readings: Murphy 28 Bishop - - HTF - - Mitchell
More informationConvolutional Neural Networks II. Slides from Dr. Vlad Morariu
Convolutional Neural Networks II Slides from Dr. Vlad Morariu 1 Optimization Example of optimization progress while training a neural network. (Loss over mini-batches goes down over time.) 2 Learning rate
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationCS 1674: Intro to Computer Vision. Final Review. Prof. Adriana Kovashka University of Pittsburgh December 7, 2016
CS 1674: Intro to Computer Vision Final Review Prof. Adriana Kovashka University of Pittsburgh December 7, 2016 Final info Format: multiple-choice, true/false, fill in the blank, short answers, apply an
More informationDeep Learning. Convolutional Neural Network (CNNs) Ali Ghodsi. October 30, Slides are partially based on Book in preparation, Deep Learning
Convolutional Neural Network (CNNs) University of Waterloo October 30, 2015 Slides are partially based on Book in preparation, by Bengio, Goodfellow, and Aaron Courville, 2015 Convolutional Networks Convolutional
More informationBayesian Networks (Part I)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks (Part I) Graphical Model Readings: Murphy 10 10.2.1 Bishop 8.1,
More informationLecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets)
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets) Sanjeev Arora Elad Hazan Recap: Structure of a deep
More informationClassification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses
More informationCS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS
CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS LAST TIME Intro to cudnn Deep neural nets using cublas and cudnn TODAY Building a better model for image classification Overfitting
More informationConvolutional Neural Network Architecture
Convolutional Neural Network Architecture Zhisheng Zhong Feburary 2nd, 2018 Zhisheng Zhong Convolutional Neural Network Architecture Feburary 2nd, 2018 1 / 55 Outline 1 Introduction of Convolution Motivation
More informationConvolutional neural networks
11-1: Convolutional neural networks Prof. J.C. Kao, UCLA Convolutional neural networks Motivation Biological inspiration Convolution operation Convolutional layer Padding and stride CNN architecture 11-2:
More informationLoss Functions and Optimization. Lecture 3-1
Lecture 3: Loss Functions and Optimization Lecture 3-1 Administrative Assignment 1 is released: http://cs231n.github.io/assignments2017/assignment1/ Due Thursday April 20, 11:59pm on Canvas (Extending
More informationMachine Learning for Signal Processing Neural Networks Continue. Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016
Machine Learning for Signal Processing Neural Networks Continue Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016 1 So what are neural networks?? Voice signal N.Net Transcription Image N.Net Text
More informationLoss Functions and Optimization. Lecture 3-1
Lecture 3: Loss Functions and Optimization Lecture 3-1 Administrative: Live Questions We ll use Zoom to take questions from remote students live-streaming the lecture Check Piazza for instructions and
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationCSCI 315: Artificial Intelligence through Deep Learning
CSCI 35: Artificial Intelligence through Deep Learning W&L Fall Term 27 Prof. Levy Convolutional Networks http://wernerstudio.typepad.com/.a/6ad83549adb53ef53629ccf97c-5wi Convolution: Convolution is
More informationCIS 520: Machine Learning Oct 09, Kernel Methods
CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed
More informationCSCI567 Machine Learning (Fall 2018)
CSCI567 Machine Learning (Fall 2018) Prof. Haipeng Luo U of Southern California Sep 12, 2018 September 12, 2018 1 / 49 Administration GitHub repos are setup (ask TA Chi Zhang for any issues) HW 1 is due
More informationNeural networks COMS 4771
Neural networks COMS 4771 1. Logistic regression Logistic regression Suppose X = R d and Y = {0, 1}. A logistic regression model is a statistical model where the conditional probability function has a
More informationNeural Networks 2. 2 Receptive fields and dealing with image inputs
CS 446 Machine Learning Fall 2016 Oct 04, 2016 Neural Networks 2 Professor: Dan Roth Scribe: C. Cheng, C. Cervantes Overview Convolutional Neural Networks Recurrent Neural Networks 1 Introduction There
SGD and Deep Learning
SGD and Deep Learning Subgradients Let's make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. $f(w_1) + \nabla f(w_1) \cdot (w - w_1)$ Non-differentiable case? $w_1$ Subgradients
<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Stanford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
Kernel methods, kernel SVM and ridge regression
Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;
Convolutional Neural Networks
Convolutional Neural Networks Books» http://www.deeplearningbook.org/ Books http://neuralnetworksanddeeplearning.com/.org/ reviews» http://www.deeplearningbook.org/contents/linear_algebra.html» http://www.deeplearningbook.org/contents/prob.html»
Neural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
From perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
Lecture 35: Optimization and Neural Nets
Lecture 35: Optimization and Neural Nets CS 4670/5670 Sean Bell DeepDream [Google, Inceptionism: Going Deeper into Neural Networks, blog 2015] Aside: CNN vs ConvNet Note: There are many papers that use
NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35
Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How
1 Machine Learning Concepts (16 points)
CSCI 567 Fall 2018 Midterm Exam DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO PLEASE TURN OFF ALL CELL PHONES Problem 1 2 3 4 5 6 Total Max 16 10 16 42 24 12 120 Points Please read the following instructions
Introduction to CNN and PyTorch
Introduction to CNN and PyTorch Kripasindhu Sarkar kripasindhu.sarkar@dfki.de Kaiserslautern University, DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de Some of the contents
Machine Learning Basics
Security and Fairness of Deep Learning Machine Learning Basics Anupam Datta CMU Spring 2019 Image Classification Image Classification Image classification pipeline Input: A training set of N images, each
Statistical Machine Learning
Statistical Machine Learning Lecture 9 Numerical optimization and deep learning Niklas Wahlström Division of Systems and Control Department of Information Technology Uppsala University niklas.wahlstrom@it.uu.se
Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/17 Neural Networks Emily Fox University of Washington March 10, 2017 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/17 Perceptron as a neural
Introduction to Convolutional Neural Networks 2018 / 02 / 23
Introduction to Convolutional Neural Networks 2018 / 02 / 23 Buzzword: CNN Convolutional neural networks (CNN, ConvNet) is a class of deep, feed-forward (not recurrent) artificial neural networks that
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16 Slides adapted from Jordan Boyd-Graber, Justin Johnson, Andrej Karpathy, Chris Ketelsen, Fei-Fei Li, Mike Mozer, Michael Nielson
Jakub Hajic Artificial Intelligence Seminar I
Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network
Introduction to Machine Learning (67577)
Introduction to Machine Learning (67577) Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Deep Learning Shai Shalev-Shwartz (Hebrew U) IML Deep Learning Neural Networks
CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 8. Sinan Kalkan
CENG 783 Special topics in Deep Learning AlchemyAPI Week 8 Sinan Kalkan Loss functions Many correct labels case: Binary prediction for each label, independently: $L_i = \sum_j \max(0, 1 - y_{ij} f_j)$, $y_{ij} = +1$ if
Convolutional Neural Networks. Srikumar Ramalingam
Convolutional Neural Networks Srikumar Ramalingam Reference Many of the slides are prepared using the following resources: neuralnetworksanddeeplearning.com (mainly Chapter 6) http://cs231n.github.io/convolutional-networks/
Deep Feedforward Networks. Sargur N. Srihari
Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
Grundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Neural networks Daniel Hennes 21.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Logistic regression Neural networks Perceptron
Backpropagation Introduction to Machine Learning. Matt Gormley Lecture 12 Feb 23, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Backpropagation Matt Gormley Lecture 12 Feb 23, 2018 1 Neural Networks Outline
Neural Architectures for Image, Language, and Speech Processing
Neural Architectures for Image, Language, and Speech Processing Karl Stratos June 26, 2018 1 / 31 Overview Feedforward Networks Need for Specialized Architectures Convolutional Neural Networks (CNNs) Recurrent
Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction
CSE446: Neural Networks Spring Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer
CSE446: Neural Networks Spring 2017 Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer Human Neurons Switching time ~ 0.001 second Number of neurons $10^{10}$ Connections per neuron $10^{4}$ to $10^{5}$ Scene
Convolution and Pooling as an Infinitely Strong Prior
Convolution and Pooling as an Infinitely Strong Prior Sargur Srihari srihari@buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Convolutional
Neuro-Fuzzy Computing
Neuro-Fuzzy Computing ΗΥ418 Instructor: Dimitrios Katsaros, Dept. of Electrical and Computer Engineering, University of Thessaly Lecture 21 BackProp for CNNs: Do I need to understand it? Why do we have to write the
(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
Deep Learning: a gentle introduction
Deep Learning: a gentle introduction Jamal Atif jamal.atif@dauphine.fr PSL, Université Paris-Dauphine, LAMSADE February 8, 206 Jamal Atif (Université Paris-Dauphine) Deep Learning February 8, 206 / Why
Neural Networks: Backpropagation
Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
Neural networks and support vector machines
Neural networks and support vector machines Perceptron Input: $x_1, x_2, x_3, \ldots, x_D$ Weights: $w_1, w_2, w_3, \ldots, w_D$ Output: $\mathrm{sgn}(w \cdot x + b)$ Can incorporate bias as component of the weight vector by always including a feature with
Recurrent Neural Networks (RNN) and Long-Short-Term-Memory (LSTM) Yuan YAO HKUST
1 Recurrent Neural Networks (RNN) and Long-Short-Term-Memory (LSTM) Yuan YAO HKUST Summary We have shown: Now First order optimization methods: GD (BP), SGD, Nesterov, Adagrad, ADAM, RMSPROP, etc. Second
10-701/ Recitation : Kernels
10-701/15-781 Recitation : Kernels Manojit Nandi February 27, 2014 Outline Mathematical Theory Banach Space and Hilbert Spaces Kernels Commonly Used Kernels Kernel Theory One Weird Kernel Trick Representer
CSC321 Lecture 16: ResNets and Attention
CSC321 Lecture 16: ResNets and Attention Roger Grosse Roger Grosse CSC321 Lecture 16: ResNets and Attention 1 / 24 Overview Two topics for today: Topic 1: Deep Residual Networks (ResNets) This is the state-of-the
Training Neural Networks Practical Issues
Training Neural Networks Practical Issues M. Soleymani Sharif University of Technology Fall 2017 Most slides have been adapted from Fei Fei Li and colleagues lectures, cs231n, Stanford 2017, and some from
Nearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2
Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal
Sajid Anwar, Kyuyeon Hwang and Wonyong Sung
Sajid Anwar, Kyuyeon Hwang and Wonyong Sung Department of Electrical and Computer Engineering Seoul National University Seoul, 08826 Korea Email: sajid@dsp.snu.ac.kr, khwang@dsp.snu.ac.kr, wysung@snu.ac.kr
Spatial Transformer Networks
BIL722 - Deep Learning for Computer Vision Spatial Transformer Networks Max Jaderberg Andrew Zisserman Karen Simonyan Koray Kavukcuoglu Contents Introduction to Spatial Transformers Related Works Spatial
Introduction to Deep Learning
Introduction to Deep Learning A. G. Schwing & S. Fidler University of Toronto, 2015 A. G. Schwing & S. Fidler (UofT) CSC420: Intro to Image Understanding 2015 1 / 39 Outline 1 Universality of Neural Networks
Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
Deep Feedforward Networks
Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep feedforward networks are also called feedforward neural networks or multilayer perceptrons. Their goal: approximate some function
SVMs: nonlinearity through kernels
Non-separable data e-8. Support Vector Machines 8.. The Optimal Hyperplane Consider the following two datasets: SVMs: nonlinearity through kernels ER Chapter 3.4, e-8 (a) Few noisy data. (b) Nonlinearly
Introduction to Deep Neural Networks
Introduction to Deep Neural Networks Presenter: Chunyuan Li Pattern Classification and Recognition (ECE 681.01) Duke University April, 2016 Outline 1 Background and Preliminaries Why DNNs? Model: Logistic
Handwritten Indic Character Recognition using Capsule Networks
Handwritten Indic Character Recognition using Capsule Networks Bodhisatwa Mandal,Suvam Dubey, Swarnendu Ghosh, RiteshSarkhel, Nibaran Das Dept. of CSE, Jadavpur University, Kolkata, 700032, WB, India.
Binary Convolutional Neural Network on RRAM
Binary Convolutional Neural Network on RRAM Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E, Tsinghua National Laboratory for Information Science and Technology (TNList) Tsinghua
Deep Learning Lecture 2
Fall 2016 Machine Learning CMPSCI 689 Deep Learning Lecture 2 Sridhar Mahadevan Autonomous Learning Lab UMass Amherst COLLEGE Outline of lecture New type of units Convolutional units, Rectified linear
Introduction to Deep Learning
Introduction to Deep Learning Some slides and images are taken from: David Wolfe Corne Wikipedia Geoffrey A. Hinton https://www.macs.hw.ac.uk/~dwcorne/teaching/introdl.ppt Feedforward networks for function
Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 9 Sep. 26, 2018 1 Reminders Homework 3:
Neural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
Feature Design. Feature Design. Feature Design. & Deep Learning
Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately
CSC321 Lecture 20: Reversible and Autoregressive Models
CSC321 Lecture 20: Reversible and Autoregressive Models Roger Grosse Roger Grosse CSC321 Lecture 20: Reversible and Autoregressive Models 1 / 23 Overview Four modern approaches to generative modeling:
ECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
Each new feature uses a pair of the original features. Problem: Mapping usually leads to the number of features blow up!
Feature Mapping Consider the following mapping $\phi$ for an example $x = \{x_1, \ldots, x_D\}$: $\phi : x \to \{x_1^2, x_2^2, \ldots, x_D^2, x_1 x_2, x_1 x_3, \ldots, x_1 x_D, \ldots, x_{D-1} x_D\}$ It's an example of a quadratic mapping Each new
Understanding How ConvNets See
Understanding How ConvNets See Slides from Andrej Karpathy Springerberg et al, Striving for Simplicity: The All Convolutional Net (ICLR 2015 workshops) CSC321: Intro to Machine Learning and Neural Networks,
Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture
Neural Networks for Machine Learning Lecture 2a An overview of the main types of neural network architecture Geoffrey Hinton with Nitish Srivastava Kevin Swersky Feed-forward neural networks These are
Announcements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
Kernel Methods. Foundations of Data Analysis. Torsten Möller. Möller/Mori 1
Kernel Methods Foundations of Data Analysis Torsten Möller Möller/Mori 1 Reading Chapter 6 of Pattern Recognition and Machine Learning by Bishop Chapter 12 of The Elements of Statistical Learning by Hastie,
Neural networks and optimization
Neural networks and optimization Nicolas Le Roux INRIA 8 Nov 2011 Nicolas Le Roux (INRIA) Neural networks and optimization 8 Nov 2011 1 / 80 1 Introduction 2 Linear classifier 3 Convolutional neural networks
Machine Learning. Kernels. Fall (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang. (Chap. 12 of CIML)
Machine Learning Fall 2017 Kernels (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang (Chap. 12 of CIML) Nonlinear Features x4: -1 x1: +1 x3: +1 x2: -1 Concatenated (combined) features XOR:
9 Classification. 9.1 Linear Classifiers
9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive
Stochastic Gradient Descent: The Workhorse of Machine Learning. CS6787 Lecture 1 Fall 2017
Stochastic Gradient Descent: The Workhorse of Machine Learning CS6787 Lecture 1 Fall 2017 Fundamentals of Machine Learning? Machine Learning in Practice this course What s missing in the basic stuff? Efficiency!
Deep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017
Deep Learning for Natural Language Processing Sidharth Mudgal April 4, 2017 Table of contents 1. Intro 2. Word Vectors 3. Word2Vec 4. Char Level Word Embeddings 5. Application: Entity Matching 6. Conclusion
Lecture 7: Kernels for Classification and Regression
Lecture 7: Kernels for Classification and Regression CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 15, 2011 Outline Outline A linear regression problem Linear auto-regressive
Lecture 15: Exploding and Vanishing Gradients
Lecture 15: Exploding and Vanishing Gradients Roger Grosse 1 Introduction Last lecture, we introduced RNNs and saw how to derive the gradients using backprop through time. In principle, this lets us train
Theories of Deep Learning
Theories of Deep Learning Lecture 02 Donoho, Monajemi, Papyan Department of Statistics Stanford Oct. 4, 2017 1 / 50 Stats 385 Fall 2017 2 / 50 Stats 285 Fall 2017 3 / 50 Course info Wed 3:00-4:20 PM in
Bits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
SVMs, Duality and the Kernel Trick
SVMs, Duality and the Kernel Trick Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 26 th, 2007 2005-2007 Carlos Guestrin 1 SVMs reminder 2005-2007 Carlos Guestrin 2 Today
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
ML (cont.): SUPPORT VECTOR MACHINES
ML (cont.): SUPPORT VECTOR MACHINES CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 40 Support Vector Machines (SVMs) The No-Math Version
Comments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms
Neural networks Comments Assignment 3 code released implement classification algorithms use kernels for census dataset Thought questions 3 due this week Mini-project: hopefully you have started 2 Example:
Neural Network Language Modeling
Neural Network Language Modeling Instructor: Wei Xu Ohio State University CSE 5525 Many slides from Marek Rei, Philipp Koehn and Noah Smith Course Project Sign up your course project In-class presentation
Lecture 4 Backpropagation
Lecture 4 Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 5, 2017 Things we will look at today More Backpropagation Still more backpropagation Quiz
Cheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
Machine Learning Lecture 10
Machine Learning Lecture 10 Neural Networks 26.11.2018 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Today s Topic Deep Learning 2 Course Outline Fundamentals Bayes
Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)