Rapid Introduction to Machine Learning/ Deep Learning


1 Rapid Introduction to Machine Learning / Deep Learning. Hyeong In Choi, Seoul National University.

2 Lecture 1b: Logistic regression & neural network. October 2, 2015.

3 Table of contents
1. Bird's-eye view of Lecture 1b
   1.1 Objectives
   1.2 Quick Summary
2. GLM: Generalized linear model
   2.1 Exponential family of distributions
   2.2 Generalized linear model (GLM)
   2.3 Parameter estimation
3. XOR problem and neural network with hidden layer
4. Universal approximation
   4.1 Further construction
   4.2 Universal approximation theorem
   4.3 Deep vs Shallow learning

4 1. Bird's-eye view of Lecture 1b. 1.1 Objectives
Objective 1: Understand logistic regression (binary classification) and its multiclass generalization (softmax regression).
Objective 2: Recast logistic and softmax regression in a neural network (perceptron) formalism.

5 1.1 Objectives (continued)
Objective 3: Learn the limitations of the perceptron by looking at the XOR problem, and learn how to fix them by adding a hidden layer.
Objective 4: Introduce the Universal Approximation Theorem, and learn about the clash of the deep vs. shallow paradigms in machine learning.

6 1.2 Quick Summary: Logistic regression
Data: $D = \{(x^{(t)}, y^{(t)})\}_{t=1}^N$ with input $x^{(t)} \in \mathbb{R}^d$ and label $y^{(t)} \in \{0, 1\}$.
Given $x = (x_1, \dots, x_d) \in \mathbb{R}^d$, logistic regression outputs the probability of the label $y$ being equal to 1 by
$$P[y = 1 \mid x] = \mathrm{sigm}(b + w_1 x_1 + \cdots + w_d x_d), \qquad \text{where } \mathrm{sigm}(t) = \frac{e^t}{1 + e^t}.$$

7 1.2 Quick Summary
Thus
$$P[y = 1 \mid x] = \frac{e^{\,b + \sum_j w_j x_j}}{1 + e^{\,b + \sum_j w_j x_j}}, \qquad P[y = 0 \mid x] = \frac{1}{1 + e^{\,b + \sum_j w_j x_j}}.$$
Decision: given $x$, decide the output label is $\hat{y}$, where
$$\hat{y} = 1 \ \text{if}\ b + \sum_j w_j x_j \ge 0, \qquad \hat{y} = 0 \ \text{if}\ b + \sum_j w_j x_j < 0.$$
[Thus the decision boundary is the hyperplane $b + \sum_j w_j x_j = 0$ in $\mathbb{R}^d$.]
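As a concrete illustration of the probability and decision rule above, here is a minimal Python sketch; the weights, bias, and input are made-up values for the example.

```python
import numpy as np

def sigm(t):
    # logistic sigmoid: e^t / (1 + e^t)
    return 1.0 / (1.0 + np.exp(-t))

def logistic_predict(x, w, b):
    """Return P[y = 1 | x] and the hard decision y_hat for a single input x."""
    z = b + np.dot(w, x)           # b + sum_j w_j x_j
    p1 = sigm(z)                   # P[y = 1 | x]
    y_hat = 1 if z >= 0 else 0     # decision boundary: z = 0
    return p1, y_hat

# illustrative (made-up) parameters and input
w = np.array([1.5, -2.0])
b = 0.5
x = np.array([1.0, 0.3])
print(logistic_predict(x, w, b))   # roughly (0.80, 1) for these numbers
```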

8 1.2 Quick Summary: Neural network formulation. Figure: neural network.

9 1.2 Quick Summary
$z$: input to the output neuron, $z = b + w_1 x_1 + \cdots + w_d x_d$.
$h$: output of the output neuron, $h = \mathrm{sigm}(z) = \mathrm{sigm}(b + w_1 x_1 + \cdots + w_d x_d)$.

10 1.2 Quick Summary: Symmetric (redundant) form of logistic regression
The probabilities $P[y = 1 \mid x]$ and $P[y = 0 \mid x]$ have different forms in logistic regression. We can put them in symmetric form by rewriting them in the following (redundant) form:
$$P[y = 1 \mid x] = \frac{\exp\big(b_1 + \sum_j w_{1j} x_j\big)}{\exp\big(b_1 + \sum_j w_{1j} x_j\big) + \exp\big(b_2 + \sum_j w_{2j} x_j\big)}, \qquad P[y = 0 \mid x] = \frac{\exp\big(b_2 + \sum_j w_{2j} x_j\big)}{\exp\big(b_1 + \sum_j w_{1j} x_j\big) + \exp\big(b_2 + \sum_j w_{2j} x_j\big)}.$$

11 1.2 Quick Summary
Decision: given $x$, decide the output label is $\hat{y}$, where
$$\hat{y} = 1 \ \text{if}\ b_1 + \sum_j w_{1j} x_j \ge b_2 + \sum_j w_{2j} x_j, \qquad \hat{y} = 0 \ \text{if}\ b_1 + \sum_j w_{1j} x_j < b_2 + \sum_j w_{2j} x_j.$$
The decision boundary is the hyperplane $b_1 + \sum_j w_{1j} x_j = b_2 + \sum_j w_{2j} x_j$ in $\mathbb{R}^d$.

12 1.2 Quick Summary: Neural network formulation. Figure: neural network.

13 1.2 Quick Summary
$z_i$: input to the $i$th neuron in the output layer, $z_i = b_i + \sum_j w_{ij} x_j$, $i = 1, 2$.
$h_i$: output of the $i$th neuron in the output layer, $h_i = \dfrac{e^{z_i}}{e^{z_1} + e^{z_2}}$, $i = 1, 2$.

14 1.2 Quick Summary: Softmax regression (multiclass classification)
There are $K$ output labels, i.e., $y \in \{1, \dots, K\}$.
Probability:
$$P[y = i \mid x] = \frac{\exp\big(b_i + \sum_j w_{ij} x_j\big)}{\exp\big(b_1 + \sum_j w_{1j} x_j\big) + \cdots + \exp\big(b_K + \sum_j w_{Kj} x_j\big)}, \qquad i = 1, \dots, K.$$
Decision: given $x$, decide the output label is $\hat{y} = \operatorname{argmax}_i P[y = i \mid x]$.

15 1.2 Quick Summary: Decision boundary. Figure: example of decision boundary. The input space $\mathbb{R}^d$ is partitioned into decision regions by (pieces of) hyperplanes.

16 1.2 Quick Summary: Neural network formalism. Figure: neural network.

17 1.2 Quick Summary
$z_i$: input to the $i$th neuron in the output layer, $z_i = b_i + \sum_j w_{ij} x_j$, $i = 1, \dots, K$.
$h_i$: output of the $i$th neuron in the output layer, $h_i = \dfrac{e^{z_i}}{\sum_k e^{z_k}} = P[y = i \mid x]$, $i = 1, \dots, K$.
In vector notation, we write $(h_1, \dots, h_K) = \mathrm{softmax}(z_1, \dots, z_K)$, or $h = \mathrm{softmax}(z)$.
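The softmax output layer above is easy to sketch in code. The following Python fragment is only an illustration; the parameters $W$, $b$, and the input $x$ are made up, and classes are indexed from 0 rather than 1.

```python
import numpy as np

def softmax(z):
    # (h_1, ..., h_K) = softmax(z_1, ..., z_K); shifting by max(z) avoids overflow
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def softmax_predict(x, W, b):
    """Return the class probabilities P[y = i | x] and the argmax decision."""
    z = b + W @ x                  # z_i = b_i + sum_j w_ij x_j, i = 1, ..., K
    h = softmax(z)                 # h_i = e^{z_i} / sum_k e^{z_k}
    return h, int(np.argmax(h))    # decision: y_hat = argmax_i P[y = i | x]

# illustrative parameters: K = 3 classes, d = 2 features (all made up)
W = np.array([[ 1.0, -0.5],
              [ 0.2,  0.3],
              [-1.0,  0.8]])
b = np.array([0.1, 0.0, -0.1])
x = np.array([0.5, 1.5])
h, y_hat = softmax_predict(x, W, b)
print(h, h.sum(), y_hat)           # the probabilities sum to 1
```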

18 1.2 Quick Summary: XOR problem
Given a data set $D$ consisting of 4 points in $\mathbb{R}^2$ in 2 classes, as shown in the figure (Figure: XOR). Note that there is no line that separates these two classes.

19 1.2 Quick Summary
But if we add one more (hidden) layer to the neural network, then this network can separate the two classes. Figure: hidden layer.

20 1.2 Quick Summary: Cybenko-Hornik-Funahashi Theorem
Let $\Sigma = [0, 1]^d$ be the $d$-dimensional hypercube. Then sums of the form
$$f(x) = \sum_i c_i\, \mathrm{sigm}\Big(b_i + \sum_{j=1}^d w_{ij} x_j\Big)$$
can approximate any continuous function on $\Sigma$ to any degree of accuracy.

21 1.2 Quick Summary: Universal Approximation
This theorem implies that a neural network with one hidden layer is good enough to do any classification job with arbitrarily small error, at least in principle. In fact, Lecture 2 should be viewed in this spirit. Then why deep learning?

22 2. GLM: Generalized linear model. 2.1 Exponential family of distributions
An exponential family of distributions in canonical form is a family of probability distributions of the form
$$P_\theta(y) = \frac{1}{Z(\theta)}\, h(y)\, \exp\Big(\sum_i \theta_i T_i(y)\Big),$$
where $y = (y_1, \dots, y_K) \in \mathbb{R}^K$, $\theta = (\theta_1, \dots, \theta_m) \in \mathbb{R}^m$, and $T : \mathbb{R}^K \to \mathbb{R}^m$.

23 2.1 Exponential family of distributions
Rewrite it in the form
$$P_\theta(y) = \exp\Big[\sum_i \theta_i T_i(y) - A(\theta) + C(y)\Big],$$
where $A(\theta) = \log Z(\theta)$ is the log partition (cumulant) function and $C(y) = \log h(y)$.
[Remark: here we assume the dispersion parameter is 1.]

24 2.1 Exponential family of distributions: Bernoulli distribution
Random variable $Y$ with value $y \in \{0, 1\}$. Let $p = P[y = 1]$. Then
$$P(y) = p^y (1 - p)^{1 - y} = \exp\Big[\,y \log\frac{p}{1 - p} + \log(1 - p)\Big].$$
In exponential family form:
$$T(y) = y, \qquad \theta = \log\frac{p}{1 - p} = \mathrm{logit}(p), \qquad p = \mathrm{sigm}(\theta) = \frac{e^\theta}{1 + e^\theta}.$$
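A quick numerical sanity check of the logit/sigmoid pair above (the value of $p$ is arbitrary):

```python
import numpy as np

def logit(p):
    # canonical parameter of the Bernoulli: theta = log(p / (1 - p))
    return np.log(p / (1.0 - p))

def sigm(theta):
    # inverse map: p = e^theta / (1 + e^theta)
    return 1.0 / (1.0 + np.exp(-theta))

p = 0.3                     # arbitrary Bernoulli parameter
theta = logit(p)            # natural (canonical) parameter
print(theta, sigm(theta))   # sigm(logit(p)) recovers p = 0.3
```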

25 2.1 Exponential family of distributions: Multivariate Bernoulli (multinoulli) distribution
Random variable $Y$ with value $y \in \{1, \dots, K\}$. Let $p_i = P[y = i]$ and define $y_i = I(y = i) \in \{0, 1\}$. Thus $y_1 + \cdots + y_K = 1$, and we have
$$P(y) = p_1^{y_1} \cdots p_K^{y_K} = p_1^{y_1} \cdots p_{K-1}^{y_{K-1}}\, p_K^{\,1 - \sum_{i=1}^{K-1} y_i} = \exp\Big[\sum_{i=1}^{K-1} y_i \log\frac{p_i}{p_K} + \log p_K\Big].$$
[Note: when $K = 2$, this is exactly the Bernoulli distribution.]

26 2.1 Exponential family of distributions
In the exponential family form:
$$T_i(y) = y_i, \qquad \theta_i = \log\frac{p_i}{p_K}, \qquad i = 1, \dots, K - 1.$$

27 2.1 Exponential family of distributions
Solving for $p_i$, we get the generalized sigmoid (softmax) function:
$$p_i = \frac{e^{\theta_i}}{1 + \sum_{k=1}^{K-1} e^{\theta_k}} = P[y = i], \quad i = 1, \dots, K - 1, \qquad p_K = \frac{1}{1 + \sum_{k=1}^{K-1} e^{\theta_k}} = P[y = K].$$
The generalized logit function:
$$\theta_i = \log\frac{p_i}{1 - \sum_{k=1}^{K-1} p_k}, \quad i = 1, \dots, K - 1.$$
The above expressions show how $p_1, \dots, p_{K-1}$ and $\theta_1, \dots, \theta_{K-1}$ are related; $p_K$ is obtained by setting $p_K = 1 - (p_1 + \cdots + p_{K-1})$.
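A small sketch checking the generalized logit/sigmoid correspondence above; the probabilities are made up.

```python
import numpy as np

def theta_from_p(p):
    # generalized logit: theta_i = log(p_i / p_K), i = 1, ..., K-1
    return np.log(p[:-1] / p[-1])

def p_from_theta(theta):
    # generalized sigmoid: p_i = e^{theta_i} / (1 + sum_k e^{theta_k}), p_K = 1 / (1 + sum_k e^{theta_k})
    denom = 1.0 + np.exp(theta).sum()
    return np.append(np.exp(theta) / denom, 1.0 / denom)

p = np.array([0.2, 0.5, 0.3])    # made-up multinoulli probabilities (K = 3)
theta = theta_from_p(p)          # the K - 1 = 2 canonical parameters
print(p_from_theta(theta))       # recovers [0.2, 0.5, 0.3]
```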

28 2.2 Generalized linear model (GLM)
The GLM mechanism is a way to relate the input vector $x = (x_1, \dots, x_d)$ to the parameters $\theta_i$ of the exponential family by setting
$$\theta_i = b_i + \sum_{j=1}^d w_{ij} x_j,$$
where $b_i$ and $w_{ij}$ are the GLM parameters to be determined from the data. Thus we get
$$p_i = \frac{\exp\big(b_i + \sum_{j=1}^d w_{ij} x_j\big)}{1 + \sum_{k=1}^{K-1} \exp\big(b_k + \sum_{j=1}^d w_{kj} x_j\big)},$$

29 2.2 Generalized linear model (GLM)
i.e., $p_i = P[y = i \mid x]$ for $i = 1, \dots, K - 1$, and
$$p_K = P[y = K \mid x] = \frac{1}{1 + \sum_{k=1}^{K-1} \exp\big(b_k + \sum_{j=1}^d w_{kj} x_j\big)}.$$

30 2.2 Generalized linear model (GLM)
Note: when $K = 2$, this is exactly logistic regression:
$$P[y = 1 \mid x] = p_1 = \frac{\exp\big(b + \sum_j w_j x_j\big)}{1 + \exp\big(b + \sum_j w_j x_j\big)}, \qquad P[y = 0 \mid x] = p_2 = \frac{1}{1 + \exp\big(b + \sum_j w_j x_j\big)}.$$
Here we set $b = b_1$, $w_j = w_{1j}$.

31 2.2 Generalized linear model (GLM): Symmetric (redundant) form
The expression for $p_K$ is different from those for $p_i$. To put $p_1, \dots, p_K$ in symmetric form, multiply both the numerator and the denominator of $p_i$ and $p_K$ by $\exp\big(a + \sum_{j=1}^d \alpha_j x_j\big)$. Then
$$p_i = \frac{\exp\big(a + b_i + \sum_{j=1}^d (w_{ij} + \alpha_j) x_j\big)}{\exp\big(a + \sum_{j=1}^d \alpha_j x_j\big) + \sum_{k=1}^{K-1} \exp\big(a + b_k + \sum_{j=1}^d (w_{kj} + \alpha_j) x_j\big)}, \qquad i = 1, \dots, K - 1,$$

32 2.2 Generalized linear model (GLM)
and
$$p_K = \frac{\exp\big(a + \sum_{j=1}^d \alpha_j x_j\big)}{\exp\big(a + \sum_{j=1}^d \alpha_j x_j\big) + \sum_{k=1}^{K-1} \exp\big(a + b_k + \sum_{j=1}^d (w_{kj} + \alpha_j) x_j\big)}.$$
Set $b_i \leftarrow b_i + a$ and $w_{ij} \leftarrow w_{ij} + \alpha_j$, $j = 1, \dots, d$, for $i = 1, \dots, K - 1$, and set $b_K = a$, $w_{Kj} = \alpha_j$, $j = 1, \dots, d$.

33 2.2 Generalized linear model (GLM)
Then we have
$$p_i = \frac{\exp\big(b_i + \sum_{j=1}^d w_{ij} x_j\big)}{\sum_{k=1}^K \exp\big(b_k + \sum_{j=1}^d w_{kj} x_j\big)} = P[y = i \mid x], \qquad i = 1, \dots, K.$$
In vector notation, $p = (p_1, \dots, p_K) = \mathrm{softmax}(z_1, \dots, z_K) = \mathrm{softmax}(z)$, where
$$z_i = b_i + \sum_{j=1}^d w_{ij} x_j, \qquad i = 1, \dots, K.$$
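The redundancy of this symmetric form can be checked numerically: adding the same quantity $a + \sum_j \alpha_j x_j$ to every $z_i$ leaves the softmax probabilities unchanged. A tiny sketch with made-up numbers:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.2, -0.7, 0.4])   # made-up pre-activations z_i
shift = 5.0                      # same constant added to every z_i
print(softmax(z))
print(softmax(z + shift))        # identical probabilities: the parameterization is redundant
```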

34 2.2 Generalized linear model (GLM): Neural network formalism. Figure: neural network.

35 2.2 Generalized linear model (GLM)
$z_i$: input to the $i$th neuron in the output layer, $z_i = b_i + \sum_j w_{ij} x_j$, $i = 1, \dots, K$.
$h_i$: output of the $i$th neuron in the output layer, $h_i = \dfrac{e^{z_i}}{\sum_k e^{z_k}} = P[y = i \mid x]$, $i = 1, \dots, K$.
In vector notation, we write $(h_1, \dots, h_K) = \mathrm{softmax}(z_1, \dots, z_K)$, or $h = \mathrm{softmax}(z)$.

36 2.3 Parameter estimation: Determining $W$ and $b$ by MLE
So far the $K \times 1$ vector $b = [b_1, \dots, b_K]^T$ and the $K \times d$ matrix $W = [w_{ij}]$ have been regarded as given. But we need to determine $b$ and $W$ from the given data, using MLE (maximum likelihood estimation).
Data: $D = \{(x^{(t)}, y^{(t)})\}_{t=1}^N$.
Probability: $P(y \mid x) = p_1^{y_1} \cdots p_K^{y_K}$,

37 2.3 Parameter estimation
where
$$p_i = \frac{\exp\big(b_i + \sum_{j=1}^d w_{ij} x_j\big)}{\sum_{k=1}^K \exp\big(b_k + \sum_{j=1}^d w_{kj} x_j\big)}.$$
Likelihood function: $L(W, b) = \prod_{t=1}^N P[y^{(t)} \mid x^{(t)}]$.
Log likelihood function: $l(W, b) = \log L(W, b) = \sum_{t=1}^N \log P[y^{(t)} \mid x^{(t)}]$.

38 2.3 Parameter estimation
Recall $P(y \mid x) = p_1^{y_1} \cdots p_K^{y_K}$. Thus
$$\log P[y \mid x] = y_1 \log p_1 + \cdots + y_K \log p_K = \sum_{k=1}^K I(y = k) \log p_k = \sum_{k=1}^K I(y = k) \log P[y = k \mid x] = \sum_{k=1}^K I(y = k) \log \frac{e^{z_k}}{\sum_{i=1}^K e^{z_i}},$$

39 2.3 Parameter estimation
where $z_i = b_i + \sum_{j=1}^d w_{ij} x_j$.
Rewrite the log likelihood function:
$$l(W, b) = \sum_{t=1}^N \log P[y^{(t)} \mid x^{(t)}] = \sum_{t=1}^N \sum_{k=1}^K I(y^{(t)} = k) \log P[y^{(t)} = k \mid x^{(t)}] = \sum_{t=1}^N \sum_{k=1}^K I(y^{(t)} = k) \log \frac{e^{z_k^{(t)}}}{\sum_{i=1}^K e^{z_i^{(t)}}},$$

40 2.3 Parameter estimation
where $z_i^{(t)} = b_i + \sum_{j=1}^d w_{ij} x_j^{(t)}$.
MLE is to find $W$ and $b$ that maximize $l(W, b)$.
[Note: for softmax regression it turns out that $l(W, b)$ is a concave (for generic data sets, strictly concave) function of $W$ and $b$.]
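To make the objective concrete, here is a hedged Python sketch that evaluates $l(W, b)$ on a toy data set; the data and parameters are made up, and labels are coded as $0, \dots, K-1$ instead of $1, \dots, K$.

```python
import numpy as np

def log_softmax(z):
    # log(e^{z_k} / sum_i e^{z_i}), computed stably
    z = z - np.max(z)
    return z - np.log(np.sum(np.exp(z)))

def log_likelihood(W, b, X, y):
    """l(W, b) = sum_t log P[y^(t) | x^(t)] for softmax regression."""
    total = 0.0
    for x_t, y_t in zip(X, y):
        z = b + W @ x_t               # z_i^(t) = b_i + sum_j w_ij x_j^(t)
        total += log_softmax(z)[y_t]  # only the term with I(y^(t) = k) = 1 survives
    return total

# toy data: N = 4 points in R^2, K = 3 classes (all made up)
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.2]])
y = np.array([0, 1, 2, 1])
W, b = np.zeros((3, 2)), np.zeros(3)
print(log_likelihood(W, b, X, y))     # equals 4 * log(1/3) for the zero parameters
```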

41 2.3 Parameter estimation: Neural network formalism. Recall: Figure: neural network.

42 2.3 Parameter estimation
For each input $x^{(t)}$:
$z_i^{(t)}$: input to the $i$th neuron in the output layer, $z_i^{(t)} = b_i + \sum_j w_{ij} x_j^{(t)}$, $i = 1, \dots, K$.
$h_i^{(t)}$: output of the $i$th neuron in the output layer, $h_i^{(t)} = \dfrac{e^{z_i^{(t)}}}{\sum_k e^{z_k^{(t)}}}$, $i = 1, \dots, K$.

43 2.3 Parameter estimation
For neural networks, the error function is set to be $-l(W, b)$, and the training is to minimize this error. [Note: this neural network training is exactly the same as the MLE estimation in softmax regression.]
Training (learning) of the neural network in the case of a single-layer (no hidden layer) network: training is a convex optimization problem, so it is a relatively easy problem.
Three kinds of training (learning) strategies (a sketch of the mini-batch strategy is given below):
Full-batch learning: train using all data in $D$ at once.
Mini-batch learning: train using a small portion of $D$ successively, and cycle through them.
On-line learning: train using one data point at a time and cycle through them.
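Here is a hedged sketch of mini-batch learning for this single-layer softmax network, using the standard gradient $h - \mathrm{onehot}(y)$ of the per-example error $-\log P[y \mid x]$ with respect to $z$; the data, learning rate, batch size, and epoch count are all illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def minibatch_train(X, y, K, lr=0.5, batch_size=2, epochs=200, seed=0):
    """Mini-batch learning: cycle through small portions of D, minimizing -l(W, b)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W, b = np.zeros((K, d)), np.zeros(K)
    for _ in range(epochs):
        order = rng.permutation(N)
        for start in range(0, N, batch_size):
            idx = order[start:start + batch_size]
            gW, gb = np.zeros_like(W), np.zeros_like(b)
            for t in idx:
                h = softmax(b + W @ X[t])
                delta = h.copy()
                delta[y[t]] -= 1.0                 # gradient of -log P[y|x] w.r.t. z is h - onehot(y)
                gW += np.outer(delta, X[t])
                gb += delta
            W -= lr * gW / len(idx)                # gradient descent on the averaged batch error
            b -= lr * gb / len(idx)
    return W, b

# toy example (made up): K = 3 classes in R^2
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [2., 2.], [2., 1.]])
y = np.array([0, 0, 1, 1, 2, 2])
W, b = minibatch_train(X, y, K=3)
print([int(np.argmax(softmax(b + W @ x))) for x in X])   # ideally reproduces y
```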

44 3. XOR problem and neural network with hidden layer
XOR problem: separate the X's from the O's. [Figure]

45 $\mathrm{XOR}(x_1, x_2) = x_1 \bar{x}_2 + \bar{x}_1 x_2$. First consider $x_1 \bar{x}_2$:

46 $z_1 = a(x_1 - x_2 - \tfrac{1}{2})$ with $a$ large; $h_1 = \mathrm{sigm}(z_1)$.

47 [Figure]

48 Now consider $\bar{x}_1 x_2$:

49 $z_2 = a(-x_1 + x_2 - \tfrac{1}{2})$ with $a$ large; $h_2 = \mathrm{sigm}(z_2)$.

50 [Figure]

51 $z_3 = b(h_1 + h_2 - \tfrac{1}{2})$ with $b$ large; $h_3 = \mathrm{sigm}(z_3)$.

52 This neural network achieves the separation.
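The hand-built construction above can be checked directly. A minimal Python sketch, in which the gains $a$, $b$ and the $\tfrac{1}{2}$ thresholds are illustrative choices of "large" constants:

```python
import numpy as np

def sigm(t):
    return 1.0 / (1.0 + np.exp(-t))

def xor_net(x1, x2, a=20.0, b=20.0):
    """2-2-1 network for XOR following the construction above (constants are illustrative)."""
    h1 = sigm(a * ( x1 - x2 - 0.5))   # approximates x1 AND (NOT x2)
    h2 = sigm(a * (-x1 + x2 - 0.5))   # approximates (NOT x1) AND x2
    h3 = sigm(b * (h1 + h2 - 0.5))    # approximates h1 OR h2
    return h3

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), round(xor_net(x1, x2), 3))   # close to 0, 1, 1, 0
```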

53 4. Universal approximation. 4.1 Further construction
The NN constructed above has the following values: [Figure]

54 4.1 Further construction
One can also construct another NN: [Figure]

55 4.1 Further construction [Figure]

56 4.1 Further construction
The region where $h_1 = h_2 = h_3 = h_4 = 0$ is: [Figure]. The neural network: [Figure]

57 4.1 Further construction
One can easily find a hyperplane in $\mathbb{R}^4$ that separates $(0, 0, 0, 0)$ from the rest; this hyperplane defines $h_5$, which is a function with value 0 in the center and 1 in the rest.

58 4.1 Further construction
Continuing this way, one can construct any approximate bump function as the output of a neural network with one hidden layer. Combining these bump functions, one can approximate any continuous function. Namely, a neural network with one hidden layer can do any task, at least in principle.

59 4.2 Universal approximation theorem
This heuristic argument can be made rigorous using a Stone-Weierstrass-type argument to get the Cybenko-Hornik-Funahashi Theorem.
Cybenko-Hornik-Funahashi Theorem. Let $\Sigma = [0, 1]^d$ be the $d$-dimensional hypercube. Then sums of the form
$$f(x) = \sum_i c_i\, \mathrm{sigm}\Big(b_i + \sum_{j=1}^d w_{ij} x_j\Big)$$
can approximate any continuous function on $\Sigma$ to any degree of accuracy.
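To illustrate the form of the approximating sums (this is an illustration of the statement, not of the proof), the following sketch fits a one-dimensional continuous function with $f(x) = \sum_i c_i\,\mathrm{sigm}(b_i + w_i x)$, where the inner weights are fixed at arbitrary values and only the coefficients $c_i$ are obtained by least squares; the target function and all constants are made up.

```python
import numpy as np

def sigm(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
n_terms = 50
w = rng.normal(scale=20.0, size=n_terms)        # made-up inner weights w_i
b = rng.uniform(-10.0, 10.0, size=n_terms)      # made-up inner biases b_i

x = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * x) + 0.5 * x        # some continuous function on [0, 1]

Phi = sigm(b[None, :] + w[None, :] * x[:, None])   # columns are sigm(b_i + w_i x)
c, *_ = np.linalg.lstsq(Phi, target, rcond=None)   # least-squares fit of the c_i

approx = Phi @ c                                   # f(x) = sum_i c_i sigm(b_i + w_i x)
print("max abs error:", np.max(np.abs(approx - target)))   # small when enough terms are used
```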

60 4.2 Universal approximation theorem
There are many similar results to this effect.

61 4.3 Deep vs Shallow learning
This theorem says that, at least in principle, one can do any classification with a neural network with one hidden layer. Deep learning utilizes neural networks with many hidden layers, typically up to 40 or more layers.
Question: if the Universal Approximation Theorem says one can do the job with only one hidden layer, why does one use so many hidden layers? What is the advantage in doing so? This is one big question we would like to address in the rest of this lecture series.

62 4.3 Deep vs Shallow learning
To achieve high accuracy, the number of terms has to be huge, and the training (learning) becomes a big problem: this is the typical problem of shallow networks (shallow learning). In contrast, a deep NN arranges its neurons in depth for more efficiency and better training, but its training is a very subtle issue [which will be dealt with later in this lecture series].
