Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.

1 Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
- Linear models for classification
- Logistic regression
- Gradient descent and second-order methods
- Kernelizing linear methods
COMP-652 and ECSE-608, Lecture 4 - September 18,

2 Regression vs classification
- Regression: map $x \in \mathbb{R}^n$ to $y \in \mathbb{R}$.
- Classification: map $x \in \mathbb{R}^n$ to a label $y \in \mathcal{Y} = \{1, \dots, k\}$ ($k$ classes). The input space is divided into decision regions.
- Linear classification: decision surfaces are linear (or affine) in $x$.
- How to represent the output? If $k = 2$, $\mathcal{Y} = \{-1, 1\}$ can be convenient. One-hot encoding is another option.
- We will mainly talk about binary classification, i.e. $k = 2$.

3 Linear models for classification
- Hyperplane in $\mathbb{R}^n$: defined by the equation $w^\top x = 0$ for some $w \in \mathbb{R}^n$.
- Least-squares for classification: $\mathcal{Y} = \{1, \dots, k\}$ and $h_W(x) = \arg\max_{i = 1, \dots, k} (W^\top x)_i$, where $W = [w_1 \cdots w_k] \in \mathbb{R}^{n \times k}$.
- Learning: encode each output sample $y_i$ as a vector in $\mathbb{R}^k$ using one-hot encoding and minimize the squared error (closed-form solution).

4 Linear models for classification
- Least-squares for classification is very sensitive to outliers.

5 Linear models for classification
- Least-squares for classification is very sensitive to outliers. Recall that minimizing the squared error corresponds to maximizing the likelihood under the assumption of a Gaussian conditional distribution.

6 Linear models for classification
- Hyperplane in $\mathbb{R}^n$: defined by the equation $w^\top x = 0$ for some $w \in \mathbb{R}^n$.
- Binary linear discriminant function: $\mathcal{Y} = \{-1, 1\}$ and $h_w(x) = \mathrm{sign}(w^\top x)$.
- Learning: Perceptron algorithm (Rosenblatt, 1957). It is an iterative algorithm (closely related to stochastic gradient descent, more on that later) that converges when the data is linearly separable.
- It converges, but to which hyperplane? What is the best one? Support Vector Machines (next lecture).
- More generally, generalized linear models: $h_w(x) = \psi(w^\top x)$ for some activation function $\psi$.
- If $\psi$ takes its values in $(0, 1)$, we could interpret $h_w(x)$ as the conditional probability $P(y = 1 \mid x)$!

7 Logistic sigmoid (recall)
$$\sigma(z) = \frac{1}{1 + \exp(-z)}$$
- Symmetry: $\sigma(-z) = 1 - \sigma(z)$
- Inverse: $z = \ln\left(\frac{\sigma(z)}{1 - \sigma(z)}\right)$
- Derivative: $\sigma'(z) = \sigma(z)(1 - \sigma(z))$
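The three properties of the sigmoid listed above can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`sigmoid`, `logit`, `sigmoid_derivative_numeric`) and the test points are illustrative choices, not part of the lecture.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the sigmoid: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def sigmoid_derivative_numeric(z, eps=1e-6):
    """Central finite-difference estimate of sigma'(z)."""
    return (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)

# Arbitrary test points (an assumption made for illustration).
for z in (-2.0, 0.0, 0.5, 3.0):
    s = sigmoid(z)
    assert abs(sigmoid(-z) - (1.0 - s)) < 1e-12                        # symmetry
    assert abs(logit(s) - z) < 1e-9                                    # inverse
    assert abs(sigmoid_derivative_numeric(z) - s * (1.0 - s)) < 1e-8   # derivative
```

The derivative identity is the one that reappears later in the gradient and Hessian of the cross-entropy error.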

8 Logistic regression
- Suppose we represent the hypothesis itself as a logistic function of a linear combination of inputs:
$$h(x) = \sigma(w^\top x) = \frac{1}{1 + \exp(-w^\top x)}$$
- This is also known as a sigmoid neuron.
- Suppose we interpret $h(x)$ as $P(y = 1 \mid x)$.
- Then the log-odds ratio,
$$\ln\left(\frac{P(y = 1 \mid x)}{P(y = 0 \mid x)}\right) = w^\top x,$$
is linear in $x$ (nice!).
- The optimum weights will maximize the conditional likelihood of the outputs, given the inputs.

9 The cross-entropy error function
- Suppose we interpret the output of the hypothesis, $h(x_i)$, as the probability that $y_i = 1$.
- Setting $\mathcal{Y} = \{0, 1\}$ for convenience, the log-likelihood of a hypothesis $h$ is:
$$\log L(h) = \sum_{i=1}^{m} \log P(y_i \mid x_i, h) = \sum_{i=1}^{m} \begin{cases} \log h(x_i) & \text{if } y_i = 1 \\ \log(1 - h(x_i)) & \text{if } y_i = 0 \end{cases} = \sum_{i=1}^{m} y_i \log h(x_i) + (1 - y_i) \log(1 - h(x_i))$$
- The cross-entropy error function is the opposite quantity:
$$J_D(w) = -\left(\sum_{i=1}^{m} y_i \log h(x_i) + (1 - y_i) \log(1 - h(x_i))\right)$$
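To make the cross-entropy error concrete, here is a small sketch that evaluates $J_D(w)$ for a few weight vectors. The toy data set and the function name `cross_entropy` are assumptions made for illustration; the first input component is taken to be a constant 1 acting as a bias feature.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(w, xs, ys):
    """J_D(w) = -sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total

# Hypothetical toy data: first component is a constant 1 (bias feature).
xs = [(1.0, -2.0), (1.0, -1.0), (1.0, 1.0), (1.0, 2.0)]
ys = [0, 0, 1, 1]

j_zero = cross_entropy((0.0, 0.0), xs, ys)   # h = 0.5 for every example
j_good = cross_entropy((0.0, 1.0), xs, ys)   # weights that agree with the labels
j_bad = cross_entropy((0.0, -1.0), xs, ys)   # weights that contradict the labels
```

With all-zero weights every prediction is 0.5, so the error is $m \ln 2$; weights that agree with the labels give a smaller error and weights that contradict them a larger one.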

10 Cross-entropy error surface for logistic function
$$J_D(w) = -\left(\sum_{i=1}^{m} y_i \log \sigma(w^\top x_i) + (1 - y_i) \log(1 - \sigma(w^\top x_i))\right)$$
Nice error surface, unique minimum, but cannot be solved in closed form.

11 Gradient descent
- The gradient of $J$ at a point $w$ can be thought of as a vector indicating which way is uphill.
[Figure: error surface, with axes labeled SSQ and w]
- If this is an error function, we want to move downhill on it, i.e., in the direction opposite to the gradient.

12 Example gradient descent traces
- If the error function is convex, gradient descent converges to the global minimum (under some conditions on the learning rates).
- For more general hypothesis classes, there may be local optima. In this case, the final solution may depend on the initial parameters.

13 Gradient descent algorithm
- The basic algorithm assumes that $\nabla J$ is easily computed.
- We want to produce a sequence of vectors $w^1, w^2, w^3, \dots$ with the goal that:
  - $J(w^1) > J(w^2) > J(w^3) > \dots$
  - $\lim_{i \to \infty} w^i = w$ and $w$ is locally optimal.
- The algorithm: given $w^0$, do for $i = 0, 1, 2, \dots$
$$w^{i+1} = w^i - \alpha_i \nabla J(w^i),$$
where $\alpha_i > 0$ is the step size or learning rate for iteration $i$.
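The update rule above can be sketched in a few lines of plain Python. The objective `J`, its gradient `grad_J`, the constant step size 0.1, and the starting point are all hypothetical choices for illustration; the lecture does not fix any of them.

```python
def gradient_descent(grad, w0, step_size, num_iters):
    """Iterate w^{i+1} = w^i - alpha_i * grad(w^i) and return all iterates."""
    w = list(w0)
    trace = [tuple(w)]
    for i in range(num_iters):
        g = grad(w)
        alpha = step_size(i)              # alpha_i > 0, may depend on the iteration
        w = [wj - alpha * gj for wj, gj in zip(w, g)]
        trace.append(tuple(w))
    return trace

# Hypothetical convex objective J(w) = (w_1 - 3)^2 + 2 (w_2 + 1)^2, minimized at (3, -1).
def J(w):
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2

def grad_J(w):
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]

trace = gradient_descent(grad_J, w0=(0.0, 0.0), step_size=lambda i: 0.1, num_iters=100)
w_final = trace[-1]
```

Passing the step size as a function of the iteration index keeps the sketch close to the $\alpha_i$ notation; a decaying schedule could be substituted without changing the loop.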

14 Maximization procedure: Gradient ascent
First we compute the gradient of $\log L(w)$ with respect to $w$:
$$\nabla \log L(w) = \sum_i \left[ y_i \frac{1}{h_w(x_i)} h_w(x_i)(1 - h_w(x_i)) x_i - (1 - y_i) \frac{1}{1 - h_w(x_i)} h_w(x_i)(1 - h_w(x_i)) x_i \right]$$
$$= \sum_i x_i \left( y_i - y_i h_w(x_i) - h_w(x_i) + y_i h_w(x_i) \right) = \sum_i (y_i - h_w(x_i)) x_i = X^\top (y - \hat{y})$$
where $\hat{y} \in \mathbb{R}^m$ is defined by $(\hat{y})_i = h_w(x_i)$.

15 The update rule (because we maximize) is:
$$w \leftarrow w + \alpha \nabla \log L(w) = w + \alpha X^\top (y - \hat{y})$$
where $\alpha \in (0, 1)$ is a step-size or learning rate parameter.
- This is called logistic regression.
- If one uses features of the input, we simply have:
$$w \leftarrow w + \alpha \Phi^\top (y - \hat{y})$$
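The gradient-ascent update $w \leftarrow w + \alpha X^\top (y - \hat{y})$ can be sketched as follows. The data set, the step size `alpha=0.1`, the iteration count, and all function names are assumptions chosen for illustration; the data is deliberately not linearly separable so that a finite maximizer exists.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """h_w(x) = sigma(w^T x)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def log_likelihood(w, X, y):
    return sum(
        yi * math.log(predict(w, xi)) + (1 - yi) * math.log(1.0 - predict(w, xi))
        for xi, yi in zip(X, y)
    )

def gradient(w, X, y):
    """X^T (y - y_hat), computed one weight component at a time."""
    residuals = [yi - predict(w, xi) for xi, yi in zip(X, y)]
    return [sum(r * xi[j] for r, xi in zip(residuals, X)) for j in range(len(w))]

def fit(X, y, alpha=0.1, num_iters=500):
    w = [0.0] * len(X[0])
    for _ in range(num_iters):
        g = gradient(w, X, y)
        w = [wj + alpha * gj for wj, gj in zip(w, g)]  # ascent, hence the plus sign
    return w

# Hypothetical, non-separable toy data; the first column is a constant bias feature.
X = [(1.0, -2.0), (1.0, -1.0), (1.0, -0.5), (1.0, 0.5), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 0, 1, 1]

w_hat = fit(X, y)
```

At convergence the gradient $X^\top (y - \hat{y})$ is close to zero, which is a convenient stopping check in place of a fixed iteration count.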

16 Roadmap: theoretical guarantees for gradient descent and second-order methods
- If the cost function is convex, then gradient descent will converge to the optimal solution for an appropriate choice of the learning rates.
- We will show that the cross-entropy error function is convex.
- We will see how we can use a second-order method to choose optimal learning rates.

17 Convexity
- A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if for all $a, b \in \mathbb{R}^d$, $\lambda \in [0, 1]$:
$$f(\lambda a + (1 - \lambda) b) \le \lambda f(a) + (1 - \lambda) f(b).$$
- $f$ is concave if $-f$ is convex.
- If $f$ and $g$ are convex functions, $\alpha f + \beta g$ is also convex for any non-negative real numbers $\alpha$ and $\beta$.

18 Characterizations of convexity
- First-order characterization: $f$ is convex $\Leftrightarrow$ for all $a, b$: $f(a) \ge f(b) + \nabla f(b)^\top (a - b)$ (the function is globally above the tangent at $b$).
- Second-order characterization: $f$ is convex $\Leftrightarrow$ the Hessian of $f$ is positive semi-definite.
- The Hessian contains the second-order derivatives of $f$: $H_{i,j} = \frac{\partial^2 f}{\partial x_i \partial x_j}$.
- $H$ is positive semi-definite $\Leftrightarrow$ $a^\top H a \ge 0$ for all $a \in \mathbb{R}^d$ $\Leftrightarrow$ $H$ has only non-negative eigenvalues.

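19 Logistic regression: convexity of the cost function
$$J(w) = -\left(\sum_{i=1}^{m} y_i \log \sigma(w^\top x_i) + (1 - y_i) \log(1 - \sigma(w^\top x_i))\right)$$
where $\sigma$ is the sigmoid function.
We show that $-\log \sigma(w^\top x)$ and $-\log(1 - \sigma(w^\top x))$ are convex in $w$:
$$\nabla_w \left( -\log \sigma(w^\top x) \right) = -\frac{\nabla_w \sigma(w^\top x)}{\sigma(w^\top x)} = -\frac{\sigma'(w^\top x) \nabla_w (w^\top x)}{\sigma(w^\top x)} = (\sigma(w^\top x) - 1) x$$
$$\nabla_w^2 \left( -\log \sigma(w^\top x) \right) = \nabla_w \left( \sigma(w^\top x) x \right) = \sigma(w^\top x)(1 - \sigma(w^\top x)) x x^\top$$
It is easy to check that this matrix is positive semi-definite for any $x$.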

20 Convexity of the cost function
$$J(w) = -\left(\sum_{i=1}^{m} y_i \log \sigma(w^\top x_i) + (1 - y_i) \log(1 - \sigma(w^\top x_i))\right)$$
where $\sigma(z) = 1/(1 + e^{-z})$ (check that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$).
- Similarly you can show that
$$\nabla_w \left( -\log(1 - \sigma(w^\top x)) \right) = \sigma(w^\top x) x$$
$$\nabla_w^2 \left( -\log(1 - \sigma(w^\top x)) \right) = \sigma(w^\top x)(1 - \sigma(w^\top x)) x x^\top$$
- $\Rightarrow$ $J(w)$ is convex in $w$.
- The gradient of $J$ is $X^\top (\hat{y} - y)$ where $\hat{y}_i = \sigma(w^\top x_i) = h(x_i)$.
- The Hessian of $J$ is $X^\top R X$ where $R$ is diagonal with entries $R_{i,i} = h(x_i)(1 - h(x_i))$.

21 Another algorithm for optimization
- Recall Newton's method for finding the zero of a function $g : \mathbb{R} \to \mathbb{R}$.
- At point $w^t$, approximate the function by a straight line (its tangent).
- Solve the linear equation for where the tangent equals 0, and move the parameter to this point:
$$w^{t+1} = w^t - \frac{g(w^t)}{g'(w^t)}$$

22 Application to machine learning
- Suppose for simplicity that the error function $J$ has only one parameter.
- We want to optimize $J$, so we can apply Newton's method to find the zeros of $J' = \frac{d}{dw} J$.
- We obtain the iteration:
$$w^{t+1} = w^t - \frac{J'(w^t)}{J''(w^t)}$$
- Note that there is no step size parameter!
- This is a second-order method, because it requires computing the second derivative.
- But, if our error function is quadratic, this will find the global optimum in one step!
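The one-parameter Newton iteration can be sketched directly. Both objectives below are hypothetical examples chosen for illustration: a quadratic, where a single step lands on the optimum, and a non-quadratic convex function, where a handful of steps are needed.

```python
import math

def newton_minimize(d1, d2, w0, num_iters):
    """Iterate w^{t+1} = w^t - J'(w^t) / J''(w^t); d1 and d2 are J' and J''."""
    w = w0
    for _ in range(num_iters):
        w = w - d1(w) / d2(w)
    return w

# Hypothetical quadratic J(w) = 2 (w - 3)^2 + 1, so J'(w) = 4 (w - 3) and J''(w) = 4.
w_quadratic = newton_minimize(
    lambda w: 4.0 * (w - 3.0), lambda w: 4.0, w0=-10.0, num_iters=1
)

# Hypothetical non-quadratic convex J(w) = exp(w) - 2 w, minimized at w = ln 2.
w_convex = newton_minimize(
    lambda w: math.exp(w) - 2.0, lambda w: math.exp(w), w0=0.0, num_iters=6
)
```

Note that the sketch has no learning rate anywhere: the second derivative sets the scale of each step.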

23 Second-order methods: Multivariate setting
- If we have an error function $J$ that depends on many variables, we can compute the Hessian matrix, which contains the second-order derivatives of $J$:
$$H_{ij} = \frac{\partial^2 J}{\partial w_i \partial w_j}$$
- The inverse of the Hessian gives the optimal learning rates.
- The weights are updated as:
$$w \leftarrow w - H^{-1} \nabla_w J$$
- This is also called the Newton-Raphson method for logistic regression, or Fisher scoring.

24 Which method is better?
- Newton's method usually requires significantly fewer iterations than gradient descent.
- Computing the Hessian requires a batch of data, so there is no natural on-line algorithm.
- Inverting the Hessian explicitly is expensive, but almost never necessary.
- Computing the product of a Hessian with a vector can be done in linear time (Pearlmutter, 1993), which also helps to compute the product of the inverse Hessian with a vector without explicitly computing $H$.

25 Newton-Raphson for logistic regression
- Leads to a nice algorithm called iteratively reweighted least squares (or iterative recursive least squares).
- The Hessian has the form:
$$H = \Phi^\top R \Phi$$
where $R$ is the diagonal matrix of $h(x_i)(1 - h(x_i))$ (you can check that this is the form of the second derivative).
- The weight update becomes:
$$w \leftarrow w - (\Phi^\top R \Phi)^{-1} \Phi^\top (\hat{y} - y)$$
which can be rewritten as the solution of a weighted least squares problem:
$$w \leftarrow (\Phi^\top R \Phi)^{-1} \Phi^\top R \left( \Phi w - R^{-1} (\hat{y} - y) \right)$$
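The two forms of the update can be compared numerically. This is a minimal sketch restricted to two weights (a bias and one feature) so that the linear system can be solved with a hand-written 2x2 solver; the data, the helper names, and the choice of 8 iterations are all assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve_2x2(a, b):
    """Solve the 2x2 system a u = b by Cramer's rule (no explicit inverse)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def weighted_gram(Phi, r):
    """Phi^T R Phi for a diagonal R whose entries are given by the list r."""
    return [[sum(ri * p[j] * p[k] for ri, p in zip(r, Phi)) for k in range(2)]
            for j in range(2)]

def predictions(w, Phi):
    return [sigmoid(w[0] * p[0] + w[1] * p[1]) for p in Phi]

def gradient(w, Phi, y):
    """Phi^T (y_hat - y), the gradient of the cross-entropy error."""
    y_hat = predictions(w, Phi)
    return [sum((h - yi) * p[j] for h, yi, p in zip(y_hat, y, Phi)) for j in range(2)]

def newton_update(w, Phi, y):
    """w - (Phi^T R Phi)^{-1} Phi^T (y_hat - y)."""
    r = [h * (1.0 - h) for h in predictions(w, Phi)]
    delta = solve_2x2(weighted_gram(Phi, r), gradient(w, Phi, y))
    return [w[0] - delta[0], w[1] - delta[1]]

def weighted_least_squares_update(w, Phi, y):
    """(Phi^T R Phi)^{-1} Phi^T R z with z = Phi w - R^{-1} (y_hat - y)."""
    y_hat = predictions(w, Phi)
    r = [h * (1.0 - h) for h in y_hat]
    z = [w[0] * p[0] + w[1] * p[1] - (h - yi) / ri
         for p, h, yi, ri in zip(Phi, y_hat, y, r)]
    rhs = [sum(ri * zi * p[j] for ri, zi, p in zip(r, z, Phi)) for j in range(2)]
    return solve_2x2(weighted_gram(Phi, r), rhs)

# Hypothetical, non-separable toy data; the first column is a constant bias feature.
Phi = [(1.0, -2.0), (1.0, -1.0), (1.0, -0.5), (1.0, 0.5), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 0, 1, 1]

w = [0.0, 0.0]
for _ in range(8):
    w_wls = weighted_least_squares_update(w, Phi, y)
    w = newton_update(w, Phi, y)
    assert all(abs(a - b) < 1e-9 for a, b in zip(w, w_wls))  # the two forms agree

final_grad = gradient(w, Phi, y)
```

On this small problem a few Newton steps drive the gradient to roughly machine precision, whereas plain gradient steps on the same data would need many more iterations.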

26 Regularization for logistic regression
- One can do regularization for logistic regression just like in the case of linear regression.
- Recall regularization makes a statement about the weights, so does not affect the error function.
- E.g., $L_2$ regularization will have the optimization criterion:
$$J(w) = J_D(w) + \frac{\lambda}{2} w^\top w$$

27 Probabilistic view of logistic regression
- Consider the additive noise model we discussed before:
$$y_i = h_w(x_i) + \epsilon$$
where $\epsilon$ are drawn iid from some distribution.
- At first glance, logistic regression does not fit very well.
- We will instead think of a latent variable $\hat{y}_i$ such that:
$$\hat{y}_i = h_w(x_i) + \epsilon$$
- Then the output is generated as:
$$y_i = 1 \text{ iff } \hat{y}_i > 0$$

28 Generalized Linear Models
- Logistic regression is a special case of a generalized linear model:
$$E[Y \mid x] = g^{-1}(w^\top x).$$
$g$ is called the link function; it relates the mean of the response to the linear predictor.
- Linear regression: $E[Y \mid x] = E[w^\top x + \varepsilon \mid x] = w^\top x$ ($g$ is the identity).
- Logistic regression: $E[Y \mid x] = P(Y = 1 \mid x) = \sigma(w^\top x)$.
- Poisson regression: $E[Y \mid x] = \exp(w^\top x)$ (for count data).
- ...

29 Linear regression with feature vectors revisited
- Recall: we want to minimize the (regularized) error function:
$$J(w) = \frac{1}{2} (\Phi w - y)^\top (\Phi w - y) + \frac{\lambda}{2} w^\top w$$
- Using the identity $(M^\top M + \alpha I)^{-1} M^\top = M^\top (M M^\top + \alpha I)^{-1}$, the solution can be rewritten as
$$w = (\Phi^\top \Phi + \lambda I_n)^{-1} \Phi^\top y = \Phi^\top (\Phi \Phi^\top + \lambda I_m)^{-1} y = \Phi^\top a$$
with $a \in \mathbb{R}^m$.
- Inversion of an $m \times m$ matrix instead of $n \times n$!
- The solution $w$ is a linear combination of input points!

30 Linear regression with feature vectors revisited (cont'd)
- Let $K = \Phi \Phi^\top \in \mathbb{R}^{m \times m}$ be the so-called Gram matrix.
- $K_{i,j} = \phi(x_i)^\top \phi(x_j)$ measures the similarity between inputs $i$ and $j$. No need to compute the feature map, just need to know how to compute $\phi(u)^\top \phi(v)$ for any $u, v$.
- Solution of regularized least squares: $w = \Phi^\top a$ with $a = (K + \lambda I)^{-1} y$.
- The predictions for the input data are given by $\hat{y} = \Phi w = \Phi \Phi^\top a = K (K + \lambda I)^{-1} y$.
- The prediction for a new input point $x$ is given by $y = \phi(x)^\top \Phi^\top a = k_x^\top a$, where $k_x \in \mathbb{R}^m$ is defined by $(k_x)_i = \phi(x)^\top \phi(x_i)$.
- Never need to compute the feature map $\phi$ explicitly!
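The equivalence between the primal solution $w = (\Phi^\top \Phi + \lambda I_n)^{-1} \Phi^\top y$ and the dual solution $w = \Phi^\top (K + \lambda I_m)^{-1} y$ can be checked on a tiny example. The sketch below uses plain Python with a hand-written Gaussian elimination routine; the data, $\lambda = 0.5$, and every helper name are assumptions for illustration.

```python
def solve(mat, b):
    """Solve mat u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(mat, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    u = [0.0] * n
    for row in range(n - 1, -1, -1):                    # back substitution
        s = sum(m[row][c] * u[c] for c in range(row + 1, n))
        u[row] = (m[row][n] - s) / m[row][row]
    return u

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical data: m = 3 points with n = 2 explicit features each.
Phi = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
y = [1.0, 2.0, 2.0]
lam = 0.5
m_pts, n_feats = len(Phi), len(Phi[0])

# Primal: w = (Phi^T Phi + lam I_n)^{-1} Phi^T y, an n x n system.
A = [[sum(p[j] * p[k] for p in Phi) + (lam if j == k else 0.0)
      for k in range(n_feats)] for j in range(n_feats)]
w_primal = solve(A, [sum(p[j] * yi for p, yi in zip(Phi, y)) for j in range(n_feats)])

# Dual: a = (K + lam I_m)^{-1} y with K = Phi Phi^T, an m x m system; then w = Phi^T a.
K = [[dot(pi, pj) for pj in Phi] for pi in Phi]
a = solve([[K[i][j] + (lam if i == j else 0.0) for j in range(m_pts)]
           for i in range(m_pts)], y)
w_dual = [sum(ai * p[j] for ai, p in zip(a, Phi)) for j in range(n_feats)]

# Prediction at a new point using only inner products: y = k_x^T a.
phi_new = (1.0, 3.0)
k_x = [dot(phi_new, p) for p in Phi]
y_new_dual = dot(k_x, a)
y_new_primal = dot(phi_new, w_primal)
```

Here the features are explicit so that both routes can be compared; in the kernelized setting only the entries of `K` and `k_x` would be computed, through a kernel function.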

31 Another derivation: Linear regression with feature vectors revisited
- Find the weight vector $w$ which minimizes the (regularized) error function:
$$J(w) = \frac{1}{2} (\Phi w - y)^\top (\Phi w - y) + \frac{\lambda}{2} w^\top w$$
- Instead of the closed-form solution, take the gradient and rearrange the terms.
- Setting the learning rate $\alpha = \frac{1}{\lambda}$, the solution takes the form:
$$w = -\frac{1}{\lambda} \sum_{i=1}^{m} (w^\top \phi(x_i) - y_i) \phi(x_i) = \sum_{i=1}^{m} a_i \phi(x_i) = \Phi^\top a$$
where $a$ is a vector of size $m$ (with $a_i = -\frac{1}{\lambda} (w^\top \phi(x_i) - y_i)$).
- Main idea: use $a$ instead of $w$ as parameter vector.

32 Another derivation: Re-writing the error function
- Instead of $J(w)$ we have $J(a)$:
$$J(a) = \frac{1}{2} a^\top \Phi \Phi^\top \Phi \Phi^\top a - a^\top \Phi \Phi^\top y + \frac{1}{2} y^\top y + \frac{\lambda}{2} a^\top \Phi \Phi^\top a$$
- Denote $\Phi \Phi^\top = K$.
- Hence, we can re-write this as:
$$J(a) = \frac{1}{2} a^\top K K a - a^\top K y + \frac{1}{2} y^\top y + \frac{\lambda}{2} a^\top K a$$
- This is quadratic in $a$, and we can set the gradient to 0 and solve.

33 Another derivation: Dual-view regression
- By setting the gradient to 0 we get:
$$a = (K + \lambda I_m)^{-1} y$$
- Note that this is similar to re-formulating a weight vector in terms of a linear combination of instances.
- Again, the feature mapping is not needed either to learn or to make predictions!
- This approach is useful if the feature space is very large.

34 Kernel functions
- Whenever a learning algorithm can be written in terms of dot-products, it can be generalized to kernels.
- A kernel is any function $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ which corresponds to a dot product for some feature mapping $\phi$:
$$K(x_1, x_2) = \phi(x_1)^\top \phi(x_2) \text{ for some } \phi.$$
- Conversely, by choosing a feature map $\phi : \mathbb{R}^n \to \mathbb{R}^l$, we implicitly choose a kernel function ($l$ may even be infinite!).
- Recall that $\phi(x_1)^\top \phi(x_2) = \|\phi(x_1)\| \, \|\phi(x_2)\| \cos \angle(\phi(x_1), \phi(x_2))$, where $\angle$ denotes the angle between the vectors, so a kernel function can be thought of as a notion of similarity.

35 Example: Quadratic kernel
- Let $K(x, z) = (x^\top z)^2$. Is this a kernel?
$$K(x, z) = \left( \sum_{i=1}^{n} x_i z_i \right) \left( \sum_{j=1}^{n} x_j z_j \right) = \sum_{i,j \in \{1 \dots n\}} x_i z_i x_j z_j = \sum_{i,j \in \{1 \dots n\}} (x_i x_j)(z_i z_j)$$
- Hence, it is a kernel, with feature mapping:
$$\phi(x) = \langle x_1^2, x_1 x_2, \dots, x_1 x_n, x_2 x_1, x_2^2, \dots, x_n^2 \rangle$$
- The feature vector includes all squares of elements and all cross terms.
- Note that computing $\phi$ takes $O(n^2)$ but computing $K$ takes only $O(n)$!
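The identity $K(x, z) = \phi(x)^\top \phi(z)$ for the quadratic kernel can be verified directly. In this short sketch the two vectors and the function names are illustrative choices only.

```python
def quadratic_kernel(x, z):
    """K(x, z) = (x^T z)^2, computed in O(n)."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2

def quadratic_features(x):
    """phi(x): all n^2 products x_i x_j, computed in O(n^2)."""
    return [xi * xj for xi in x for xj in x]

# Arbitrary example vectors (an assumption made for illustration).
x = (1.0, 2.0, 3.0)
z = (0.5, -1.0, 2.0)

k_direct = quadratic_kernel(x, z)
k_features = sum(a * b for a, b in zip(quadratic_features(x), quadratic_features(z)))
```

Both routes give the same number, but the feature route builds $n^2 = 9$ products per vector while the kernel route needs a single length-$n$ dot product.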

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,

More information

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship

More information

Iterative Reweighted Least Squares

Iterative Reweighted Least Squares Iterative Reweighted Least Squares Sargur. University at Buffalo, State University of ew York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative

More information

Multiclass Logistic Regression

Multiclass Logistic Regression Multiclass Logistic Regression Sargur. Srihari University at Buffalo, State University of ew York USA Machine Learning Srihari Topics in Linear Classification using Probabilistic Discriminative Models

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Ch 4. Linear Models for Classification

Ch 4. Linear Models for Classification Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging

More information

Machine Learning. Lecture 04: Logistic and Softmax Regression. Nevin L. Zhang

Machine Learning. Lecture 04: Logistic and Softmax Regression. Nevin L. Zhang Machine Learning Lecture 04: Logistic and Softmax Regression Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology This set

More information

Regression with Numerical Optimization. Logistic

Regression with Numerical Optimization. Logistic CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

Machine Learning. 7. Logistic and Linear Regression

Machine Learning. 7. Logistic and Linear Regression Sapienza University of Rome, Italy - Machine Learning (27/28) University of Rome La Sapienza Master in Artificial Intelligence and Robotics Machine Learning 7. Logistic and Linear Regression Luca Iocchi,

More information

Machine Learning. Lecture 3: Logistic Regression. Feng Li.

Machine Learning. Lecture 3: Logistic Regression. Feng Li. Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification

More information

Max Margin-Classifier

Max Margin-Classifier Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization

More information

Machine Learning. Linear Models. Fabio Vandin October 10, 2017

Machine Learning. Linear Models. Fabio Vandin October 10, 2017 Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w

More information

Logistic Regression Logistic

Logistic Regression Logistic Case Study 1: Estimating Click Probabilities L2 Regularization for Logistic Regression Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 10 th,

More information

Midterm exam CS 189/289, Fall 2015

Midterm exam CS 189/289, Fall 2015 Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points

More information

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012) Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation

More information

Machine Learning Lecture 7

Machine Learning Lecture 7 Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant

More information

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

Machine Learning Support Vector Machines. Prof. Matteo Matteucci Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way

More information

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation

More information

Linear and logistic regression

Linear and logistic regression Linear and logistic regression Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Linear and logistic regression 1/22 Outline 1 Linear regression 2 Logistic regression 3 Fisher discriminant analysis

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table

More information

April 9, Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá. Linear Classification Models. Fabio A. González Ph.D.

April 9, Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá. Linear Classification Models. Fabio A. González Ph.D. Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá April 9, 2018 Content 1 2 3 4 Outline 1 2 3 4 problems { C 1, y(x) threshold predict(x) = C 2, y(x) < threshold, with threshold

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

Machine Learning - Waseda University Logistic Regression

Machine Learning - Waseda University Logistic Regression Machine Learning - Waseda University Logistic Regression AD June AD ) June / 9 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given

More information

Machine Learning. Linear Models. Fabio Vandin October 10, 2017

Machine Learning. Linear Models. Fabio Vandin October 10, 2017 Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w

More information

Reading Group on Deep Learning Session 1

Reading Group on Deep Learning Session 1 Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods 2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Lecture 5: GPs and Streaming regression

Lecture 5: GPs and Streaming regression Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition LINEAR CLASSIFIERS Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification, the input

More information

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation

More information

MLPR: Logistic Regression and Neural Networks

MLPR: Logistic Regression and Neural Networks MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition Amos Storkey Amos Storkey MLPR: Logistic Regression and Neural Networks 1/28 Outline 1 Logistic Regression 2 Multi-layer

More information

Lecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016

Lecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016 Lecture 7 Logistic Regression Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 11, 2016 Luigi Freda ( La Sapienza University) Lecture 7 December 11, 2016 1 / 39 Outline 1 Intro Logistic

More information

Classification Logistic Regression

Classification Logistic Regression Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham

More information

Outline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap.

Outline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap. Outline MLPR: and Neural Networks Machine Learning and Pattern Recognition 2 Amos Storkey Amos Storkey MLPR: and Neural Networks /28 Recap Amos Storkey MLPR: and Neural Networks 2/28 Which is the correct

More information

CSC 411: Lecture 04: Logistic Regression
