Perceptron & Neural Networks
|
|
- Annabel Brooks
- 5 years ago
- Views:
Transcription
1 School of Computer Science Introduction to Machine Learning Perceptron & Neural Networks Readings: Bishop Ch , Ch. 5 Murphy Ch. 16.5, Ch. 28 Mitchell Ch. 4 Matt Gormley Lecture 12 October 17,
2 Reminders Homework 3: due 10/24/16 2
3 Outline Discriminative vs. Generative Perceptron Neural Networks Backpropagation 3
4 DISCRIMINATIVE AND GENERATIVE CLASSIFIERS 4
5 Generative vs. Discriminative Generative Classifiers: Example: Naïve Bayes Define a joint model of the observations x and the labels y: p(x,y) Learning maximizes (joint) likelihood Use Bayes Rule to classify based on the posterior: p(y x) =p(x y)p(y)/p(x) Discriminative Classifiers: Example: Logistic Regression Directly model the conditional: p(y x) Learning maximizes conditional likelihood 5
6 Generative vs. Discriminative Finite Sample Analysis (Ng & Jordan, 2002) [Assume that we are learning from a finite training dataset] If model assumptions are correct: Naive Bayes is a more efficient learner (requires fewer samples) than Logistic Regression If model assumptions are incorrect: Logistic Regression has lower asymtotic error, and does better than Naïve Bayes 6
7 solid: NB dashed: LR Slide courtesy of William Cohen 7
8 solid: NB dashed: LR Naïve Bayes makes stronger assumptions about the data but needs fewer examples to estimate the parameters On Discriminative vs Generative Classifiers:. Andrew Ng and Michael Jordan, NIPS Slide courtesy of William Cohen 8
9 Generative vs. Discriminative Learning (Parameter Estimation) Naïve Bayes: Parameters are decoupled à Closed form solution for MLE Logistic Regression: Parameters are coupled à No closed form solution must use iterative optimization techniques instead 9
10 Naïve Bayes vs. Logistic Reg. Learning (MAP Estimation of Parameters) Bernoulli Naïve Bayes: Parameters are probabilities à Beta prior (usually) pushes probabilities away from zero / one extremes Logistic Regression: Parameters are not probabilities à Gaussian prior encourages parameters to be close to zero (effectively pushes the probabilities away from zero / one extremes) 10
11 Features Naïve Bayes vs. Logistic Reg. Naïve Bayes: Features x are assumed to be conditionally independent given y. (i.e. Naïve Bayes Assumption) Logistic Regression: No assumptions are made about the form of the features x. They can be dependent and correlated in any fashion. 11
12 THE PERCEPTRON ALGORITHM 12
13 Recall Background: Hyperplanes Why don t we drop the generative model and try to learn this hyperplane directly?
14 Background: Hyperplanes Recall w Hyperplane (Definition 1): H = {x : w T x = b} Hyperplane (Definition 2): H = {x : w T x =0 and x 1 =1} Half- spaces: H + = {x : w T x > 0 and x 1 =1} H = {x : w T x < 0 and x 1 =1}
15 Background: Hyperplanes Directly modeling the hyperplane would use a decision function: for: h( ) =sign( T ) y { 1, +1} Recall Why don t we drop the generative model and try to learn this hyperplane directly?
16 Online Learning Model Setup: We receive an example (x, y) Make a prediction h(x) Check for correctness h(x) = y? Goal: Minimize the number of mistakes 16
17 Margins Recall Definition: The margin of example x w.r.t. a linear sep. w is the distance from x to the plane w x = 0 (or the negative if on wrong side) Definition: The margin γ w of a set of examples S wrt a linear separator w is the smallest margin over points x S. Definition: The margin γ of a set of examples S is the maximum γ w over all linear separators w. Slide from Nina Balcan γ γ w
18 Perceptron Algorithm Data: Inputs are continuous vectors of length K. Outputs are discrete. D = { (i),y (i) } N i=1 where R K and y {+1, 1} Prediction: Output determined by hyperplane. ŷ = h (x) = sign( T x) sign(a) = 1, if a 0 1, otherwise Learning: Iterative procedure: while not converged receive next example (x, y) predict y = h(x) if positive mistake: add x to parameters if negative mistake: subtract x from parameters 18
19 Perceptron Algorithm Learning: Algorithm 1 Perceptron Learning Algorithm (Batch) 1: procedure P (D = {( (1),y (1) ),...,( (N),y (N) )}) 2: 0 Initialize parameters 3: while not converged do 4: for i {1, 2,...,N} do For each example 5: ŷ sign( T (i) ) Predict 6: if ŷ = y (i) then If mistake 7: + y (i) (i) Update parameters 8: return 19
20 Analysis: Perceptron Perceptron Mistake Bound Guarantee: If data has margin and all points inside a ball of radius R, then Perceptron makes (R/ ) 2 mistakes. (Normalized margin: multiplying all points by 100, or dividing all points by 100, doesn t change the number of mistakes; algo is invariant to scaling.) Slide from Nina Balcan - - γ γ R 20
21 Analysis: Perceptron Perceptron Mistake Bound Theorem 0.1 (Block (1962), Novikoff (1962)). Given dataset: D = {( (i),y (i) )} N i=1. Suppose: 1. Finite size inputs: x (i) R 2. Linearly separable data: s.t. =1and y (i) ( (i) ), i Then: The number of mistakes made by the Perceptron + algorithm on this dataset is + + k (R/ ) γ γ R + + Figure from Nina Balcan 21
22 Analysis: Perceptron Proof of Perceptron Mistake Bound: We will show that there exist constants A and B s.t. Ak (k+1) B k Ak (k+1) B k Ak 22
23 Analysis: Perceptron Theorem 0.1 (Block (1962), Novikoff (1962)). Given dataset: D = {( (i),y (i) )} N i=1. Suppose: 1. Finite size inputs: x (i) R 2. Linearly separable data: s.t. =1and y (i) ( (i) ), i Then: The number of mistakes made by the Perceptron algorithm on this dataset is γ γ R + k (R/ ) 2 Algorithm 1 Perceptron Learning Algorithm (Online) 1: procedure P (D = {( (1),y (1) ), ( (2),y (2) ),...}) 2: 0, k =1 Initialize parameters 3: for i {1, 2,...} do For each example 4: if y (i) ( (k) (i) ) 0 then If mistake 5: (k+1) (k) + y (i) (i) Update parameters 6: k k +1 7: return 23
24 Analysis: Perceptron Whiteboard: Proof of Perceptron Mistake Bound 24
25 Analysis: Perceptron Proof of Perceptron Mistake Bound: Part 1: for some A, Ak (k+1) (k+1) =( (k) + y (i) (i) ) by Perceptron algorithm update = (k) + y (i) ( (i) ) (k) + by assumption (k+1) k by induction on k since (1) = 0 (k+1) k since and =1 Cauchy- Schwartz inequality 25
26 Analysis: Perceptron Proof of Perceptron Mistake Bound: Part 2: for some B, (k+1) B k (k+1) 2 = (k) + y (i) (i) 2 by Perceptron algorithm update = (k) 2 +(y (i) ) 2 (i) 2 +2y (i) ( (k) (i) ) (k) 2 +(y (i) ) 2 (i) 2 since kth mistake y (i) ( (k) (i) ) 0 = (k) 2 + R 2 since (y (i) ) 2 (i) 2 = (i) 2 = R 2 by assumption and (y (i) ) 2 =1 (k+1) 2 kr 2 by induction on k since ( (1) ) 2 =0 (k+1) kr 26
27 Analysis: Perceptron Proof of Perceptron Mistake Bound: Part 3: Combining the bounds finishes the proof. k k (R/ ) 2 (k+1) kr The total number of mistakes must be less than this 27
28 Extensions of Perceptron Kernel Perceptron Choose a kernel K(x, x) Apply the kernel trick to Perceptron Resulting algorithm is still very simple Structured Perceptron Basic idea can also be applied when y ranges over an exponentially large set Mistake bound does not depend on the size of that set 28
29 Summary: Perceptron Perceptron is a simple linear classifier Simple learning algorithm: when a mistake is made, add / subtract the features For linearly separable and inseparable data, we can bound the number of mistakes (geometric argument) Extensions support nonlinear separators and structured prediction 29
30 RECALL: LOGISTIC REGRESSION 30
31 Using gradient ascent for linear classifiers Key idea behind today s lecture: 1. Define a linear classifier (logistic regression) 2. Define an objective function (likelihood) 3. Optimize it with gradient descent to learn parameters Recall 4. Predict the class with highest probability under the model 31
32 Using gradient ascent for linear classifiers Recall This decision function isn t differentiable: h( ) =sign( T ) Use a differentiable function instead: p (y =1 ) = 1 1+ ( T ) sign(x) logistic(u) 1 1+ e u 32
33 Using gradient ascent for linear classifiers Recall This decision function isn t differentiable: h( ) =sign( T ) Use a differentiable function instead: p (y =1 ) = 1 1+ ( T ) sign(x) logistic(u) 1 1+ e u 33
34 Logistic Regression Recall Data: Inputs are continuous vectors of length K. Outputs are discrete. D = { (i),y (i) } N i=1 where R K and y {0, 1} Model: Logistic function applied to dot product of parameters with input vector. p (y =1 ) = Learning: finds the parameters that minimize some objective function. = argmin J( ) 1 1+ ( T ) Prediction: Output is the most probable class. ŷ = y {0,1} p (y ) 34
35 NEURAL NETWORKS 35
36 Learning highly non-linear functions l l l f: X à Y f might be non-linear function X (vector of) continuous and/or discrete vars Y (vector of) continuous and/or discrete vars The XOR gate Speech recognition Eric CMU,
37 Perceptron and Neural Nets l From biological neuron to artificial neuron (perceptron) Synapse Inputs l Dendrites Axon Soma Synapse Soma Synapse Activation function Dendrites Axon x 1 x 2 w 1 w 2 Linear Combiner Hard Limiter θ Threshold Output Y n X = x i w i 1 + 1, if X ω Y = 1, if X < ω i= 0 0 l Artificial neuron networks l l supervised learning gradient descent I n p u t S i g n a l s O u t p u t S i g n a l s Middle Layer Input Layer Output Layer Eric CMU,
38 Connectionist Models l l Consider humans: l l l l l Neuron switching time ~ second Number of neurons ~ Connections per neuron ~ Scene recognition time ~ 0.1 second 100 inference steps doesn't seem like enough à much parallel computation Properties of artificial neural nets (ANN) l l l Many neuron-like threshold switching units Many weighted interconnections among units Highly parallel, distributed processes Eric CMU,
39 Motivation Why is everyone talking about Deep Learning? Because a lot of money is invested in it DeepMind: Acquired by Google for $400 million DNNResearch: Three person startup (including Geoff Hinton) acquired by Google for unknown price tag Enlitic, Ersatz, MetaMind, Nervana, Skylab: Deep Learning startups commanding millions of VC dollars Because it made the front page of the New York Times 39
40 Motivation Why is everyone talking about Deep Learning? 1960s 1980s 1990s Deep learning: Has won numerous pattern recognition competitions Does so with minimal feature engineering This wasn t always the case! Since 1980s: Form of models hasn t changed much, but lots of new tricks More hidden units Better (online) optimization New nonlinear functions (ReLUs) Faster computers (CPUs and GPUs) 40
41 Background A Recipe for Machine Learning 1. Given training data: Face Face Not a face 2. Choose each of these: Decision function Examples: Linear regression, Logistic regression, Neural Network Loss function Examples: Mean- squared error, Cross Entropy 41
42 Background A Recipe for Machine Learning 1. Given training data: 3. Define goal: 2. Choose each of these: Decision function Loss function 4. Train with SGD: (take small steps opposite the gradient) 42
43 Background A Recipe for Gradients Machine Learning 1. Given training data: Backpropagation 3. Define can goal: compute this gradient! And it s a special case of a more general algorithm called reverse- 2. Choose each of these: mode automatic differentiation that Decision function can compute 4. Train the with gradient SGD: of any differentiable (take function small steps efficiently! opposite the gradient) Loss function 43
44 A Recipe for Goals for Today s Machine Lecture Learning Background Given Explore training a new data: class of 3. decision Define functions goal: (Neural Networks) 2. Consider variants of this recipe for training 2. Choose each of these: Decision function Loss function 4. Train with SGD: (take small steps opposite the gradient) 44
45 Decision Functions Output Linear Regression y = h (x) = ( T x) where (a) =a θ 1 θ 2 θ 3 θ M Input 45
46 Decision Functions Logistic Regression Output y = h (x) = ( T x) 1 where (a) = 1+ ( a) θ 1 θ 2 θ 3 θ M Input 46
47 Decision Functions Logistic Regression y = h (x) = ( x) 1 where (a) = 1 + 2tT( a) Output Face θ1 Input T θ2 θ3 Face Not a face θm 47
48 Decision Functions Output Logistic Regression y = h (x) = ( T x) 1 where (a) = 1+ ( a) y θ 1 θ 2 θ 3 θ M x 2 x 1 Input 48
49 Decision Functions Logistic Regression Output y = h (x) = ( T x) 1 where (a) = 1+ ( a) θ 1 θ 2 θ 3 θ M Input 49
50 Neural Network Model Inputs Age 34 Gender 2 Stage Σ Σ Σ Output 0.6 Probability of beingalive Independent variables Weights HiddenL ayer Weights Dependent variable Prediction Eric CMU,
51 Combined logistic models Inputs Age 34.6 Output Gender 2 Stage Σ 0.6 Probability of beingalive Independent variables Weights HiddenL ayer Weights Dependent variable Prediction Eric CMU,
52 Inputs Age 34 Gender 2 Stage Σ Output 0.6 Probability of beingalive Independent variables Weights HiddenL ayer Weights Dependent variable Prediction Eric CMU,
53 Inputs Age 34 Gender 1 Stage Σ Output 0.6 Probability of beingalive Independent variables Weights HiddenL ayer Weights Dependent variable Prediction Eric CMU,
54 Not really, no target for hidden units... Age 34 Gender 2 Stage Σ Σ Σ 0.6 Probability of beingalive Independent variables Weights HiddenL ayer Weights Dependent variable Prediction Eric CMU,
55 Jargon Pseudo-Correspondence l l l l Independent variable = input variable Dependent variable = output variable Coefficients = weights Estimates = targets Logistic Regression Model (the sigmoid unit) Inputs Age 34 Gende 1 r Stage Output Σ 0.6 Probability of beingalive Independent variables x1, x2, x3 Coefficients Dependent variable a, b, c p Prediction Eric CMU,
56 Decision Functions Neural Network Output Hidden Layer Input 56
57 Decision Functions Neural Network Output (E) Output (sigmoid) y = 1 1+ ( b) Hidden Layer (D) Output (linear) b = D j=0 jz j Input (C) Hidden (sigmoid) z j = 1 1+ ( a j ), j (B) Hidden (linear) a j = M i=0 jix i, j (A) Input Given x i, i 57
58 Building a Neural Net Output Features 58
59 Building a Neural Net Output Hidden Layer D = M Input 59
60 Building a Neural Net Output Hidden Layer D = M Input 60
61 Building a Neural Net Output Hidden Layer D = M Input 61
62 Building a Neural Net Output Hidden Layer D < M Input 62
63 Decision Boundary 0 hidden layers: linear classifier Hyperplanes y x1 x2 Example from to Eric Postma via Jason Eisner 63
64 Decision Boundary 1 hidden layer y Boundary of convex region (open or closed) x1 x2 Example from to Eric Postma via Jason Eisner 64
65 Decision Boundary y 2 hidden layers Combinations of convex regions x1 x2 Example from to Eric Postma via Jason Eisner 65
66 Decision Functions Multi- Class Output Output Hidden Layer Input 66
67 Decision Functions Deeper Networks Next lecture: Output Hidden Layer 1 Input 67
68 Decision Functions Deeper Networks Next lecture: Output Hidden Layer 2 Hidden Layer 1 Input 68
69 Decision Functions Next lecture: Making the neural networks deeper Output Hidden Layer 3 Hidden Layer 2 Deeper Networks Hidden Layer 1 Input 69
70 Decision Functions Different Levels of Abstraction We don t know the right levels of abstraction So let the model figure it out! Example from Honglak Lee (NIPS 2010) 70
71 Decision Functions Different Levels of Abstraction Face Recognition: Deep Network can build up increasingly higher levels of abstraction Lines, parts, regions Example from Honglak Lee (NIPS 2010) 71
72 Decision Functions Different Levels of Abstraction Output Hidden Layer 3 Hidden Layer 2 Hidden Layer 1 Input Example from Honglak Lee (NIPS 2010) 72
73 ARCHITECTURES 73
74 Neural Network Architectures Even for a basic Neural Network, there are many design decisions to make: 1. # of hidden layers (depth) 2. # of units per hidden layer (width) 3. Type of activation function (nonlinearity) 4. Form of objective function 74
75 Activation Functions Neural Network with sigmoid activation functions (F) Loss J = 1 2 (y y )2 Output (E) Output (sigmoid) y = 1 1+ ( b) Hidden Layer (D) Output (linear) b = D j=0 jz j Input (C) Hidden (sigmoid) z j = 1 1+ ( a j ), j (B) Hidden (linear) a j = M i=0 jix i, j (A) Input Given x i, i 75
76 Activation Functions Neural Network with arbitrary nonlinear activation functions (F) Loss J = 1 2 (y y )2 Output (E) Output (nonlinear) y = (b) Hidden Layer (D) Output (linear) b = D j=0 jz j Input (C) Hidden (nonlinear) z j = (a j ), j (B) Hidden (linear) a j = M i=0 jix i, j (A) Input Given x i, i 76
77 Activation Functions Sigmoid / Logistic Function 1 logistic(u) 1+ e u So far, we ve assumed that the activation function (nonlinearity) is always the sigmoid function 77
78 Activation Functions A new change: modifying the nonlinearity The logistic is not widely used in modern ANNs Alternate 1: tanh Like logistic function but shifted to range [- 1, +1] Slide from William Cohen
79 AI Stats 2010 depth 4? sigmoid vs. tanh Figure from Glorot & Bentio (2010)
80 Activation Functions A new change: modifying the nonlinearity relu often used in vision tasks Alternate 2: rectified linear unit Linear with a cutoff at zero (Implementation: clip the gradient when you pass zero) Slide from William Cohen
81 Activation Functions A new change: modifying the nonlinearity relu often used in vision tasks Alternate 2: rectified linear unit Soft version: log(exp(x)+1) Doesn t saturate (at one end) Sparsifies outputs Helps with vanishing gradient Slide from William Cohen
82 Objective Functions for NNs Regression: Use the same objective as Linear Regression Quadratic loss (i.e. mean squared error) Classification: Use the same objective as Logistic Regression Cross- entropy (i.e. negative log likelihood) This requires probabilities, so we add an additional softmax layer at the end of our network Forward Backward Quadratic J = 1 2 (y y )2 dj dy = y Cross Entropy J = y (y)+(1 y ) (1 y) y dj dy = y 1 y +(1 y ) 1 y 1 82
83 Multi- Class Output Output Hidden Layer Input 83
84 Multi- Class Output Softmax: y k = (b k ) K l=1 (b l) (F) Loss J = K k=1 y k (y k) (E) Output (softmax) y k = (b k) K l=1 (b l) (D) Output (linear) b k = D j=0 kjz j k Output Hidden Layer (C) Hidden (nonlinear) z j = (a j ), j (B) Hidden (linear) a j = M i=0 jix i, j Input (A) Input Given x i, i 84
85 Cross- entropy vs. Quadratic loss Figure from Glorot & Bentio (2010)
86 Background A Recipe for Machine Learning 1. Given training data: 3. Define goal: 2. Choose each of these: Decision function Loss function 4. Train with SGD: (take small steps opposite the gradient) 86
87 Objective Functions Matching Quiz: Suppose you are given a neural net with a single output, y, and one hidden layer. 1) Minimizing sum of squared errors 2) Minimizing sum of squared errors plus squared Euclidean norm of weights 3) Minimizing cross- entropy 4) Minimizing hinge loss gives 5) MLE estimates of weights assuming target follows a Bernoulli with parameter given by the output value 6) MAP estimates of weights assuming weight priors are zero mean Gaussian 7) estimates with a large margin on the training data 8) MLE estimates of weights assuming zero mean Gaussian noise on the output value A. 1=5, 2=7, 3=6, 4=8 B. 1=5, 2=7, 3=8, 4=6 C. 1=7, 2=5, 3=5, 4=7 D. 1=7, 2=5, 3=6, 4=8 E. 1=8, 2=6, 3=5, 4=7 F. 1=8, 2=6, 3=8, 4=6 87
88 BACKPROPAGATION 88
89 Background A Recipe for Machine Learning 1. Given training data: 3. Define goal: 2. Choose each of these: Decision function Loss function 4. Train with SGD: (take small steps opposite the gradient) 89
90 Training Backpropagation Question 1: When can we compute the gradients of the parameters of an arbitrary neural network? Question 2: When can we make the gradient computation efficient? 90
91 Training { That is, the computation Given: : y = g(u) and u = h(x). es. Chain Rule: dy i dx k = JX j=1 dy i du j du j dx k, Chain Rule 8i, k 91
92 Training { That is, the computation Given: : y = g(u) and u = h(x). es. Chain Rule: dy i dx k = JX j=1 dy i du j du j dx k, Chain Rule 8i, k Backpropagation is just repeated application of the chain rule from Calculus
93 Given: Training Chain Rule: { That is, the computation : y = g(u) and u = h(x). es. dy i dx k = JX j=1 dy i du j du j dx k, 8i, k Chain Rule Backpropagation: 1. Instantiate the computation as a directed acyclic graph, where each intermediate quantity is a node 2. At each node, store (a) the quantity computed in the forward pass and (b) the partial derivative of the goal with respect to that node s intermediate quantity. 3. Initialize all partial derivatives to Visit each node in reverse topological order. At each node, add its contribution to the partial derivatives of its parents This algorithm is also called automatic differentiation in the reverse- mode 93
94 Training Backpropagation Simple Example: The goal is to compute J = ( (x 2 )+3x 2 ) on the forward pass and the derivative dj dx on the backward pass. Forward J = cos(u) u = u 1 + u 2 u 1 = sin(t) u 2 =3t t = x 2 94
95 Training Backpropagation Simple Example: The goal is to compute J = ( (x 2 )+3x 2 ) on the forward pass and the derivative dj dx on the backward pass. Forward J = cos(u) u = u 1 + u 2 u 1 = sin(t) u 2 =3t t = x 2 Backward dj du = sin(u) dj = dj du, du 1 du du 1 dj dj du 1 = dt du 1 dt, du 1 dt dj dj du 2 = dt du 2 dt, du 2 dt dj dj dt = dx dt dx, du du 1 =1 = (t) =3 dt dx =2x dj = dj du 2 du du du 2, du du 2 =1 95
96 Training Backpropagation Case 1: Logistic Regression Output Input θ 1 θ 2 θ 3 θ M Forward J = y y +(1 y ) (1 y) y = a = 1 1+ ( a) D j=0 jx j Backward dj dy = y y + (1 y ) y 1 dj da = dj dy dy da, dy da = ( a) ( ( a) + 1) 2 dj = dj d j da dj = dj dx j da da d j, da dx j, da d j = x j da dx j = j 96
97 Training Backpropagation Output (E) Output (sigmoid) y = 1 1+ ( b) Hidden Layer (D) Output (linear) b = D j=0 jz j Input (C) Hidden (sigmoid) z j = 1 1+ ( a j ), j (B) Hidden (linear) a j = M i=0 jix i, j (A) Input Given x i, i 97
98 Training Backpropagation (F) Loss J = 1 2 (y y )2 Output (E) Output (sigmoid) y = 1 1+ ( b) Hidden Layer (D) Output (linear) b = D j=0 jz j Input (C) Hidden (sigmoid) z j = 1 1+ ( a j ), j (B) Hidden (linear) a j = M i=0 jix i, j (A) Input Given x i, i 98
99 Training Backpropagation Case 2: Neural Network Forward J = y y +(1 y ) (1 y) y = b = z j = a j = 1 1+ ( b) D j=0 jz j 1 1+ ( a j ) M i=0 jix i Backward dj dy = y y + (1 y ) y 1 dj db = dj dy dy db, dy db = ( b) ( ( b) + 1) 2 dj = dj d j db dj = dj dz j db db d j, db dz j, db d j = z j db dz j = dj = dj dz j, dz j = da j dz j da j da j dj = dj d ji da j da j d ji, dj = dj da j, da j = dx i da j dx i dx i j ( a j ) ( ( a j ) + 1) 2 da j d ji = x i D j=0 ji 99
100 Given: Training Chain Rule: { That is, the computation : y = g(u) and u = h(x). es. dy i dx k = JX j=1 dy i du j du j dx k, 8i, k Chain Rule Backpropagation: 1. Instantiate the computation as a directed acyclic graph, where each intermediate quantity is a node 2. At each node, store (a) the quantity computed in the forward pass and (b) the partial derivative of the goal with respect to that node s intermediate quantity. 3. Initialize all partial derivatives to Visit each node in reverse topological order. At each node, add its contribution to the partial derivatives of its parents This algorithm is also called automatic differentiation in the reverse- mode 100
101 Given: Training Chain Rule: { That is, the computation : y = g(u) and u = h(x). es. dy i dx k = JX j=1 dy i du j du j dx k, 8i, k Chain Rule Backpropagation: 1. Instantiate the computation as a directed acyclic graph, where each node represents a Tensor. 2. At each node, store (a) the quantity computed in the forward pass and (b) the partial derivatives of the goal with respect to that node s Tensor. 3. Initialize all partial derivatives to Visit each node in reverse topological order. At each node, add its contribution to the partial derivatives of its parents This algorithm is also called automatic differentiation in the reverse- mode 101
102 Training Backpropagation Case 2: Neural Module 5 Network Module 4 Module 3 Module 2 Module 1 Forward J = y y +(1 y ) (1 y) y = b = z j = a j = 1 1+ ( b) D j=0 jz j 1 1+ ( a j ) M i=0 jix i Backward dj dy = y y + (1 y ) y 1 dj db = dj dy dy db, dy db = ( b) ( ( b) + 1) 2 dj = dj d j db dj = dj dz j db db d j, db dz j, db d j = z j db dz j = dj = dj dz j, dz j = da j dz j da j da j dj = dj d ji da j da j d ji, dj = dj da j, da j = dx i da j dx i dx i j ( a j ) ( ( a j ) + 1) 2 da j d ji = x i D j=0 ji 102
103 Background A Recipe for Gradients Machine Learning 1. Given training data: Backpropagation 3. Define can goal: compute this gradient! And it s a special case of a more general algorithm called reverse- 2. Choose each of these: mode automatic differentiation that Decision function can compute 4. Train the with gradient SGD: of any differentiable (take function small steps efficiently! opposite the gradient) Loss function 103
104 Summary 1. Neural Networks provide a way of learning features are highly nonlinear prediction functions (can be) a highly parallel network of logistic regression classifiers discover useful hidden representations of the input 2. Backpropagation provides an efficient way to compute gradients is a special case of reverse- mode automatic differentiation 104
Backpropagation Introduction to Machine Learning. Matt Gormley Lecture 12 Feb 23, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Backpropagation Matt Gormley Lecture 12 Feb 23, 2018 1 Neural Networks Outline
More informationDeep Learning (CNNs)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Deep Learning (CNNs) Deep Learning Readings: Murphy 28 Bishop - - HTF - - Mitchell
More informationBackpropagation Introduction to Machine Learning. Matt Gormley Lecture 13 Mar 1, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Backpropagation Matt Gormley Lecture 13 Mar 1, 2018 1 Reminders Homework 5: Neural
More informationPerceptron (Theory) + Linear Regression
10601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Perceptron (Theory) Linear Regression Matt Gormley Lecture 6 Feb. 5, 2018 1 Q&A
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction
More informationHidden Markov Models
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 22 April 2, 2018 1 Reminders Homework
More informationMachine Learning. Neural Networks. Le Song. CSE6740/CS7641/ISYE6740, Fall Lecture 7, September 11, 2012 Based on slides from Eric Xing, CMU
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Neural Networks Le Song Lecture 7, September 11, 2012 Based on slides from Eric Xing, CMU Reading: Chap. 5 CB Learning highly non-linear functions f:
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 9 Sep. 26, 2018 1 Reminders Homework 3:
More informationNaïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 18 Oct. 31, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Naïve Bayes Matt Gormley Lecture 18 Oct. 31, 2018 1 Reminders Homework 6: PAC Learning
More informationHidden Markov Models
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 19 Nov. 5, 2018 1 Reminders Homework
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationLogistic Regression. William Cohen
Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationNeural Networks: Introduction
Neural Networks: Introduction Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others 1
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationNeural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35
Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How
More informationCSC 411 Lecture 10: Neural Networks
CSC 411 Lecture 10: Neural Networks Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 10-Neural Networks 1 / 35 Inspiration: The Brain Our brain has 10 11
More informationArtificial Neuron (Perceptron)
9/6/208 Gradient Descent (GD) Hantao Zhang Deep Learning with Python Reading: https://en.wikipedia.org/wiki/gradient_descent Artificial Neuron (Perceptron) = w T = w 0 0 + + w 2 2 + + w d d where
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationMachine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler
+ Machine Learning and Data Mining Multi-layer Perceptrons & Neural Networks: Basics Prof. Alexander Ihler Linear Classifiers (Perceptrons) Linear Classifiers a linear classifier is a mapping which partitions
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationECE521 Lecture 7/8. Logistic Regression
ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationAdvanced Machine Learning
Advanced Machine Learning Lecture 4: Deep Learning Essentials Pierre Geurts, Gilles Louppe, Louis Wehenkel 1 / 52 Outline Goal: explain and motivate the basic constructs of neural networks. From linear
More informationBayesian Networks (Part I)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks (Part I) Graphical Model Readings: Murphy 10 10.2.1 Bishop 8.1,
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I
Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 2012 Engineering Part IIB: Module 4F10 Introduction In
More informationCSE446: Neural Networks Spring Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer
CSE446: Neural Networks Spring 2017 Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer Human Neurons Switching time ~ 0.001 second Number of neurons 10 10 Connections per neuron 10 4-5 Scene
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationCSC321 Lecture 5: Multilayer Perceptrons
CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3
More informationMachine Learning Lecture 10
Machine Learning Lecture 10 Neural Networks 26.11.2018 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Today s Topic Deep Learning 2 Course Outline Fundamentals Bayes
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationArtificial Neural Networks
Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationArtificial Neural Networks. MGS Lecture 2
Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation
More informationCOMP 551 Applied Machine Learning Lecture 14: Neural Networks
COMP 551 Applied Machine Learning Lecture 14: Neural Networks Instructor: Ryan Lowe (ryan.lowe@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted,
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationMLE/MAP + Naïve Bayes
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes MLE / MAP Readings: Estimating Probabilities (Mitchell, 2016)
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Neural Networks: A brief touch Yuejie Chi Department of Electrical and Computer Engineering Spring 2018 1/41 Outline
More informationNeural Networks in Structured Prediction. November 17, 2015
Neural Networks in Structured Prediction November 17, 2015 HWs and Paper Last homework is going to be posted soon Neural net NER tagging model This is a new structured model Paper - Thursday after Thanksgiving
More informationNeural Networks: Backpropagation
Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 4 October 2017 / 9 October 2017 MLP Lecture 3 Deep Neural Networs (1) 1 Recap: Softmax single
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationCSC321 Lecture 6: Backpropagation
CSC321 Lecture 6: Backpropagation Roger Grosse Roger Grosse CSC321 Lecture 6: Backpropagation 1 / 21 Overview We ve seen that multilayer neural networks are powerful. But how can we actually learn them?
More informationNeural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
More informationLearning Deep Architectures for AI. Part I - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part I - Vijay Chakilam Chapter 0: Preliminaries Neural Network Models The basic idea behind the neural network approach is to model the response as a
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationAN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009
AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is
More informationMachine Learning Basics III
Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More information2018 EE448, Big Data Mining, Lecture 5. (Part II) Weinan Zhang Shanghai Jiao Tong University
2018 EE448, Big Data Mining, Lecture 5 Supervised Learning (Part II) Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html Content of Supervised Learning
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationMachine Learning Lecture 12
Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationComments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms
Neural networks Comments Assignment 3 code released implement classification algorithms use kernels for census dataset Thought questions 3 due this week Mini-project: hopefully you have started 2 Example:
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationThe Perceptron Algorithm, Margins
The Perceptron Algorithm, Margins MariaFlorina Balcan 08/29/2018 The Perceptron Algorithm Simple learning algorithm for supervised classification analyzed via geometric margins in the 50 s [Rosenblatt
More informationARTIFICIAL INTELLIGENCE. Artificial Neural Networks
INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationNeural Networks (Part 1) Goals for the lecture
Neural Networks (Part ) Mark Craven and David Page Computer Sciences 760 Spring 208 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 4, 2015 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationRegularization Introduction to Machine Learning. Matt Gormley Lecture 10 Feb. 19, 2018
1-61 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Regularization Matt Gormley Lecture 1 Feb. 19, 218 1 Reminders Homework 4: Logistic
More informationMachine Learning Basics
Security and Fairness of Deep Learning Machine Learning Basics Anupam Datta CMU Spring 2019 Image Classification Image Classification Image classification pipeline Input: A training set of N images, each
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationMachine Learning (CSE 446): Neural Networks
Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,
More information1 Machine Learning Concepts (16 points)
CSCI 567 Fall 2018 Midterm Exam DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO PLEASE TURN OFF ALL CELL PHONES Problem 1 2 3 4 5 6 Total Max 16 10 16 42 24 12 120 Points Please read the following instructions
More informationLecture 6. Regression
Lecture 6. Regression Prof. Alan Yuille Summer 2014 Outline 1. Introduction to Regression 2. Binary Regression 3. Linear Regression; Polynomial Regression 4. Non-linear Regression; Multilayer Perceptron
More informationLearning Deep Architectures for AI. Part II - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model
More informationBayesian Networks Introduction to Machine Learning. Matt Gormley Lecture 24 April 9, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks Matt Gormley Lecture 24 April 9, 2018 1 Homework 7: HMMs Reminders
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 2 October 2018 http://www.inf.ed.ac.u/teaching/courses/mlp/ MLP Lecture 3 / 2 October 2018 Deep
More information