Classification Logistic Regression
|
|
- Briana Warner
- 5 years ago
- Views:
Transcription
1 Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham Kakade Sham Kakade 2 Variable Selection by Regularization Simple Variable Selection LASSO: Sparse Regression Ridge regression: Penalizes large weights What if we want to perform feature selection? E.g., Which regions of the brain are important for word prediction? Can t simply choose features with largest coefficients in ridge solution Machine Learning CSE546 Sham Kakade University of Washington October, Sham Kakade 3 Try new (convex) penalty: Penalize non-zero weights Regularization penalty: Leads to sparse solutions Just like ridge regression, solution is indexed by a continuous param λ Major impact in: statistics, machine learning & electrical engineering 26 Sham Kakade 4
2 LASSO Regression (Related) Constrained Optimization LASSO: least absolute shrinkage and selection operator New objective: LASSO solution: ŵ LASSO = arg min w j= 2 t(x j) (w + w ih i(x j))! + i= w i i= 26 Sham Kakade 5 26 Sham Kakade 6 Optimizing the LASSO Objective Coordinate Descent LASSO solution: ŵ LASSO = arg min w j= 2 t(x j) (w + w ih i(x j))! + i= w i i= Given a function F Want to find minimum Often, hard to find minimum for all coordinates, but easy for one coordinate Coordinate descent: How do we pick next coordinate? Super useful approach for *many* problems 26 Sham Kakade 7 Converges to optimum in some cases, such as LASSO 26 Sham Kakade 8 2
3 Optimizing LASSO Objective One Coordinate at a Time Subgradients of Convex Functions 2 t(x j ) (w + w i h i (x j ))! + w i j= i= i= Taking the derivative: Residual sum of squares RSS(w) = 2 h`(x j ) t(x j ) (w + w i h i (x j j= i= Gradients lower bound convex functions: Gradients are unique at w iff function differentiable at w Penalty term: Subgradients: Generalize gradients to non-differentiable points: Any plane that lower bounds function: 26 Sham Kakade 9 26 Sham Kakade Taking the Subgradient Gradient of RSS RSS(w) =a`w` If no penalty: Subgradient of full objective:!2 t(xj) (w + wihi(xj)) + wi j= i= i= a` =2 (h`(xj)) 2 j= c` =2 (w + X wihi(xj)) A i6=` j= Setting Subgradient to 8 w`f (w) = : a`w` c` w` < [ c`, c` + ] w` = a`w` c` + w` > 26 Sham Kakade 26 Sham Kakade 2 3
4 Soft Thresholding ŵ` = 8 < : (c` + )/a` c` < c` 2 [, ] (c` )/a` c` > Coordinate Descent for LASSO (aka Shooting Algorithm) Repeat until convergence Pick a coordinate l at (random or sequentially) 8 Set: < (c` + )/a` c` < ŵ` = c` 2 [, ] : (c` )/a` c` > c` From Kevin Murphy textbook Where: a` =2 (h`(xj)) 2 j= c` =2 (w + X wihi(xj)) A i6=` j= For convergence rates, see Shalev-Shwartz and Tewari 29 Other common technique = LARS 26 Sham Kakade 3 Least angle regression and shrinkage, Efron et al Sham Kakade 4 Recall: Ridge Coefficient Path Now: LASSO Coefficient Path From Kevin Murphy textbook From Kevin Murphy textbook Typical approach: select λ using cross validation 26 Sham Kakade 5 26 Sham Kakade 6 4
5 What you need to know Sample size issues? Variable Selection: find a sparse solution to learning problem L regularization is one way to do variable selection Applies beyond regression Hundreds of other approaches out there LASSO objective non-differentiable, but convex è Use subgradient No closed-form solution for minimization è Use coordinate descent Shooting algorithm is simple approach for solving LASSO 26 Sham Kakade 7 Sham Kakade 26 8 Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington October 3, 26 Sham Kakade 26 9 THUS FAR, REGRESSION: PREDICT A CONTINUOUS VALUE GIVEN SOME INPUTS Sham Kakade
6 Weather prediction revisted Reading Your Brain, Simple Example Pairwise classification accuracy: 85% Person Animal [Mitchell et al.] Temperature Sham Kakade 26 2 Sham Kakade Classification Learn: h:x! Y X features Y target classes Link Functions Estimating P(Y X): Why not use standard linear regression? Conditional probability: P(Y X) Suppose you know P(Y X) exactly, how should you classify? Bayes optimal classifier: How do we estimate P(Y X)? Sham Kakade Combing regression and probability? Need a mapping from real values to [,] A link function! Sham Kakade
7 Logistic Regression Logistic function (or Sigmoid): Understanding the sigmoid Learn P(Y X) directly Assume a particular functional form for link function Sigmoid applied to a linear function of the input features: Z w =-2, w =- w =, w =- w =, w = Features can be discrete or continuous! Sham Kakade Sham Kakade Logistic Regression a Linear classifier Very convenient! implies implies implies linear classification rule! Sham Kakade Sham Kakade
8 Optimizing concave function Gradient ascent Loss function: Conditional Likelihood Conditional likelihood for Logistic Regression is concave. Find optimum with gradient ascent Have a bunch of iid data of the form: Gradient: Update rule: Step size, h > Discriminative (logistic regression) loss function: Conditional Data Likelihood Gradient ascent is simplest of optimization approaches e.g., Conjugate gradient ascent can be much better Sham Kakade Sham Kakade 26 3 Expressing Conditional Log Likelihood Maximizing Conditional Log Likelihood `(w) = X j y j ln P (Y = x j, w)+( y j )lnp (Y = x j, w) Good news: l(w) is concave function of w, no local optima problems Bad news: no closed-form solution to maximize l(w) Good news: concave functions easy to optimize Sham Kakade 26 3 Sham Kakade
9 Maximize Conditional Log Likelihood: Gradient ascent Gradient Ascent for LR Gradient ascent algorithm: iterate until change < e (t) For i=,,k, (t) repeat Sham Kakade Sham Kakade Regularization in linear regression Linear Separability Overfitting usually leads to very large parameter choices, e.g.: X.3 X ,7,9.7 X 8,585,638.4 X 2 + Regularized least-squares (a.k.a. ridge regression), for l >: Sham Kakade Sham Kakade
10 Large parameters Overfitting Regularized Conditional Log Likelihood Add regularization penalty, e.g., L 2 : NY `(w) =ln P (y j x j, w) 2 w 2 2 j= If data is linearly separable, weights go to infinity Practical note about w : Gradient of regularized likelihood: In general, leads to overfitting: Penalizing high weights can prevent overfitting Sham Kakade Sham Kakade Standard v. Regularized Updates Please Stop!! Stopping criterion Maximum conditional likelihood estimate NY w = arg max ln P (y j x j, w) w j= (t) `(w) =ln Y j P (y j x j, w)) w 2 2 When do we stop doing gradient descent? Regularized maximum conditional likelihood estimate NY w = arg max ln P (y j x j, w) wi 2 w 2 j= i= (t) Because l(w) is strongly concave: i.e., because of some technical condition `(w ) `(w) apple 2 r`(w) 2 2 Thus, stop when: Sham Kakade Sham Kakade 26 4
11 Digression: Logistic regression for more than 2 classes Logistic regression in more general case (C classes), where Y in {,,C-} Digression: Logistic regression more generally Logistic regression in more general case, where Y in {,,C-} for c> P (Y = c x, w) = exp(w c + P k i= w cix i ) + P C c = exp(w c + P k i= w c ix i ) for c= (normalization, so no weights for this class) P (Y = x, w) = + P C c = exp(w c + P k i= w c ix i ) Sham Kakade 26 4 Learning procedure is basically the same as what we derived! Sham Kakade The Cost, The Cost!!! Think about the cost Stochastic Gradient Descent What s the cost of a gradient update step for LR??? (t) Machine Learning CSE546 Sham Kakade University of Washington October 3, 26 Sham Kakade Sham Kakade 26 44
12 Learning Problems as Expectations Gradient ascent in Terms of Expectations Minimizing loss in training data: Given dataset: Sampled iid from some distribution p(x) on features: Loss function, e.g., hinge loss, logistic loss, We often minimize loss in training data: `D(w) = `(w, x j ) N j= However, we should really minimize expected loss on all data: Z `(w) =E x [`(w, x)] = p(x)`(w, x)dx So, we are approximating the integral by the average on the training data Z `(w) =E x [`(w, x)] = p(x)`(w, x)dx True objective function: Taking the gradient: True gradient ascent rule: How do we estimate expected gradient? Sham Kakade Sham Kakade SGD: Stochastic Gradient Ascent (or Descent) True gradient: Sample based approximation: What if we estimate gradient with just one sample??? Unbiased estimate of gradient Very noisy! Called stochastic gradient ascent (or descent) Among many other names VERY useful in practice!!! r`(w) =E x [r`(w, x)] Stochastic Gradient Ascent for Logistic Regression Logistic loss as a stochastic function: E x [`(w, x)] = E x ln P (y x, w) Batch gradient ascent updates: w < w (t+) i w (t) i + : w(t) i + = x (j) i [y (j) P (Y = x (j), w (t) )] N ; j= Stochastic gradient ascent updates: Online setting: n o i + t w (t) i + x (t) i [y (t) P (Y = x (t), w (t) )] w (t+) i w (t) Sham Kakade Sham Kakade
13 Stochastic Gradient Ascent: general case Given a stochastic function of parameters: Want to find maximum Start from w () Repeat until convergence: Get a sample data point x t Update parameters: Works on the online learning setting! Complexity of each gradient step is constant in number of examples! In general, step size changes with iterations What you should know Classification: predict discrete classes rather than real values Logistic regression model: Linear model Logistic function maps real values to [,] Optimize conditional likelihood Gradient computation Overfitting Regularization Regularized optimization Cost of gradient step is high, use stochastic gradient descent Sham Kakade Sham Kakade 26 5 Stopping criterion Convergence rates for gradient descent/ascent `(w) =ln Y j P (y j x j, w)) w 2 2 Regularized logistic regression is strongly concave Negative second derivative bounded away from zero: Number of Iterations to get to accuracy `(w ) `(w) apple If func Lipschitz: O(/ϵ 2 ) If gradient of func Lipschitz: O(/ϵ) Strong concavity (convexity) is super helpful!! If func is strongly convex: O(ln(/ϵ)) For example, for strongly concave l(w): `(w ) `(w) apple 2 r`(w) 2 2 Sham Kakade 26 5 Sham Kakade
Case Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!
Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 4, 017 1 Announcements:
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. October 7, Efficiency: If size(w) = 100B, each prediction is expensive:
Simple Variable Selection LASSO: Sparse Regression Machine Learning CSE546 Carlos Guestrin University of Washington October 7, 2013 1 Sparsity Vector w is sparse, if many entries are zero: Very useful
More informationIs the test error unbiased for these programs? 2017 Kevin Jamieson
Is the test error unbiased for these programs? 2017 Kevin Jamieson 1 Is the test error unbiased for this program? 2017 Kevin Jamieson 2 Simple Variable Selection LASSO: Sparse Regression Machine Learning
More informationIs the test error unbiased for these programs?
Is the test error unbiased for these programs? Xtrain avg N o Preprocessing by de meaning using whole TEST set 2017 Kevin Jamieson 1 Is the test error unbiased for this program? e Stott see non for f x
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationAd Placement Strategies
Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad
More informationLogistic Regression Logistic
Case Study 1: Estimating Click Probabilities L2 Regularization for Logistic Regression Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 10 th,
More informationLASSO Review, Fused LASSO, Parallel LASSO Solvers
Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable
More informationClassification Logistic Regression
Classification Logistic Regression Machine Learning CSE546 Kevin Jamieson University of Washington October 16, 2016 1 THUS FAR, REGRESSION: PREDICT A CONTINUOUS VALUE GIVEN SOME INPUTS 2 Weather prediction
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationLinear classifiers: Overfitting and regularization
Linear classifiers: Overfitting and regularization Emily Fox University of Washington January 25, 2017 Logistic regression recap 1 . Thus far, we focused on decision boundaries Score(x i ) = w 0 h 0 (x
More informationLast Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression
CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer
More informationWarm up. Regrade requests submitted directly in Gradescope, do not instructors.
Warm up Regrade requests submitted directly in Gradescope, do not email instructors. 1 float in NumPy = 8 bytes 10 6 2 20 bytes = 1 MB 10 9 2 30 bytes = 1 GB For each block compute the memory required
More informationClassification Logistic Regression
O due Thursday µtwl Classification Logistic Regression Machine Learning CSE546 Kevin Jamieson University of Washington October 16, 2016 1 THUS FAR, REGRESSION: PREDICT A CONTINUOUS VALUE GIVEN SOME INPUTS
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationMachine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, Emily Fox 2014
Case Study 3: fmri Prediction Fused LASSO LARS Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, 2014 Emily Fox 2014 1 LASSO Regression
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More informationAnnouncements Kevin Jamieson
Announcements Project proposal due next week: Tuesday 10/24 Still looking for people to work on deep learning Phytolith project, join #phytolith slack channel 2017 Kevin Jamieson 1 Gradient Descent Machine
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationLinear classifiers: Logistic regression
Linear classifiers: Logistic regression STAT/CSE 416: Machine Learning Emily Fox University of Washington April 19, 2018 How confident is your prediction? The sushi & everything else were awesome! The
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm
More informationMachine Learning Gaussian Naïve Bayes Big Picture
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 27, 2011 Today: Naïve Bayes Big Picture Logistic regression Gradient ascent Generative discriminative
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationOptimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade
Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for
More informationLasso Regression: Regularization for feature selection
Lasso Regression: Regularization for feature selection Emily Fox University of Washington January 18, 2017 1 Feature selection task 2 1 Why might you want to perform feature selection? Efficiency: - If
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationLasso Regression: Regularization for feature selection
Lasso Regression: Regularization for feature selection Emily Fox University of Washington January 18, 2017 Feature selection task 1 Why might you want to perform feature selection? Efficiency: - If size(w)
More informationKernelized Perceptron Support Vector Machines
Kernelized Perceptron Support Vector Machines Emily Fox University of Washington February 13, 2017 What is the perceptron optimizing? 1 The perceptron algorithm [Rosenblatt 58, 62] Classification setting:
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More informationConvergence rate of SGD
Convergence rate of SGD heorem: (see Nemirovski et al 09 from readings) Let f be a strongly convex stochastic function Assume gradient of f is Lipschitz continuous and bounded hen, for step sizes: he expected
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationAdaptive Gradient Methods AdaGrad / Adam. Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade
Adaptive Gradient Methods AdaGrad / Adam Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade 1 Announcements: HW3 posted Dual coordinate ascent (some review of SGD and random
More informationcxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c
Warm up D cai.yo.ie p IExrL9CxsYD Sglx.Ddl f E Luo fhlexi.si dbll Fix any a, b, c > 0. 1. What is the x 2 R that minimizes ax 2 + bx + c x a b Ta OH 2 ax 16 0 x 1 Za fhkxiiso3ii draulx.h dp.d 2. What is
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationRegression.
Regression www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts linear regression RMSE, MAE, and R-square logistic regression convex functions and sets
More informationMachine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?
Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from Jordan Boyd-Graber, Tom Mitchell, Ziv Bar-Joseph Machine Learning: Chenhao Tan Boulder 1 of 27 Quiz question For
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationAd Placement Strategies
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January
More informationCSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015
CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015 Luke ZeElemoyer Slides adapted from Carlos Guestrin Predic5on of con5nuous variables Billionaire says: Wait, that s not what
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationLecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron
CS446: Machine Learning, Fall 2017 Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron Lecturer: Sanmi Koyejo Scribe: Ke Wang, Oct. 24th, 2017 Agenda Recap: SVM and Hinge loss, Representer
More informationLogistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logistic Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression (Continued) Generative v. Discriminative Decision rees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 31 st, 2007 2005-2007 Carlos Guestrin 1 Generative
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationl 1 and l 2 Regularization
David Rosenberg New York University February 5, 2015 David Rosenberg (New York University) DS-GA 1003 February 5, 2015 1 / 32 Tikhonov and Ivanov Regularization Hypothesis Spaces We ve spoken vaguely about
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationSCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University.
SCMA292 Mathematical Modeling : Machine Learning Krikamol Muandet Department of Mathematics Faculty of Science, Mahidol University February 9, 2016 Outline Quick Recap of Least Square Ridge Regression
More informationNearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2
Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal
More informationLogistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824
Logistic Regression Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative Please start HW 1 early! Questions are welcome! Two principles for estimating parameters Maximum Likelihood
More informationSupport Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017
Support Vector Machines: Training with Stochastic Gradient Descent Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationMachine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression
Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,
More informationMachine Learning - Waseda University Logistic Regression
Machine Learning - Waseda University Logistic Regression AD June AD ) June / 9 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given
More informationMidterm exam CS 189/289, Fall 2015
Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationLinear Models for Regression
Linear Models for Regression CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 The Regression Problem Training data: A set of input-output
More informationLogistic Regression. William Cohen
Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationSGD and Deep Learning
SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients
More informationCSC 411: Lecture 04: Logistic Regression
CSC 411: Lecture 04: Logistic Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 23, 2015 Urtasun & Zemel (UofT) CSC 411: 04-Prob Classif Sep 23, 2015 1 / 16 Today Key Concepts: Logistic
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationLinear and logistic regression
Linear and logistic regression Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Linear and logistic regression 1/22 Outline 1 Linear regression 2 Logistic regression 3 Fisher discriminant analysis
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationKernels. Machine Learning CSE446 Carlos Guestrin University of Washington. October 28, Carlos Guestrin
Kernels Machine Learning CSE446 Carlos Guestrin University of Washington October 28, 2013 Carlos Guestrin 2005-2013 1 Linear Separability: More formally, Using Margin Data linearly separable, if there
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationOptimization and Gradient Descent
Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013
Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationTufts COMP 135: Introduction to Machine Learning
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Logistic Regression Many slides attributable to: Prof. Mike Hughes Erik Sudderth (UCI) Finale Doshi-Velez (Harvard)
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationMachine Learning (CSE 446): Multi-Class Classification; Kernel Methods
Machine Learning (CSE 446): Multi-Class Classification; Kernel Methods Sham M Kakade c 2018 University of Washington cse446-staff@cs.washington.edu 1 / 12 Announcements HW3 due date as posted. make sure
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 4, 2015 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationLogistic Regression. Stochastic Gradient Descent
Tutorial 8 CPSC 340 Logistic Regression Stochastic Gradient Descent Logistic Regression Model A discriminative probabilistic model for classification e.g. spam filtering Let x R d be input and y { 1, 1}
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More information