CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015
|
|
- Gervais Lawson
- 5 years ago
- Views:
Transcription
1 CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015 Luke ZeElemoyer Slides adapted from Carlos Guestrin
2 Predic5on of con5nuous variables Billionaire says: Wait, that s not what I meant! You say: Chill out, dude. He says: I want to predict a con5nuous variable for con5nuous inputs: I want to predict salaries from GPA. You say: I can regress that
3 Linear Regression Prediction Prediction
4 Ordinary Least Squares (OLS) Observation Error or residual Prediction
5 The regression problem Instances: <x j, t j > Learn: Mapping from x to t(x) Hypothesis space: Given, basis func5ons {h 1,,h k } h i (x) 2 R Find coeffs w={w 1,,w k } Why is this usually called linear regression? model is linear in the parameters Can we es5mate func5ons that are not lines???
6 Linear Basis: 1D input Need a bias term: {h 1 (x) = x, h 2 (x)=1} y x
7 Parabola: {h 1 (x) = x 2, h 2 (x)=x, h 3 (x)=1} y 2D: {h 1 (x) = x 12, h 2 (x)= x 22, h 3 (x)=x 1 x 2, } Can define any basis functions h i (x) for n- dimensional input x=<x 1,,x n > x
8 The regression problem Instances: <x j, t j > Learn: Mapping from x to t(x) Hypothesis space: Given, basis func5ons {h 1,,h k } h i (x) 2 R Find coeffs w={w 1,,w k } Why is this usually called linear regression? model is linear in the parameters Can we es5mate func5ons that are not lines??? Precisely, minimize the residual squared error:
9 Regression: matrix nota5on K basis functions N data points weights K basis func N observed outputs measurements
10 Regression: closed form solu5on w = arg min w (Hw t)t (Hw t) F(w) =(Hw t) T (Hw t) O w F(w) =0 2H T (Hw t) =0 H T Hw H T t = 0 w =(H T H) 1 H T t
11 Regression solu5on: simple matrix math where k k matrix for k basis functions k 1 vector
12 But, why? Billionaire (again) says: Why sum squared error??? You say: Gaussians, Dr. Gateson, Gaussians Model: predic5on is linear func5on plus Gaussian noise t(x) = i w i h i (x) + ε Learn w using MLE:
13 X Maximizing X log- likelihood Maximize wrt w: X X X arg max w 1 ln 2 X NX =argmax w = arg min w j=1 X X N NX + P j=1 P P [t X j Pi w Pih i (x j )] NX X [t j w i h i (x j )] 2 j=1 i [t j Pi w ih i (x j )] Least-squares Linear Regression is MLE for Gaussians!!!
14 Regulariza5on in Linear Regression One sign of overfikng: large parameter values! Regularized or penalized regressions modified learning object to penalize large parameters
15 Ridge Regression Introduce a new objec5ve func5on: NX kx kx ŵ ridge = arg min w j=1 t(x j ) (w 0 + i=1 w i h i (x j ))! 2 + i=1 w 2 i Prefer low error but also add a squared penalize for large weights λ is hyperparameter that balances tradeoff Explicitly writing out bias feature (essentially h 0 =1), which is not penalized
16 Ridge Regression: matrix nota5on ŵ ridge = arg min w NX j=1 t(x j ) (w 0 + kx i=1 w i h i (x j ))! 2 + kx i=1 w 2 i + w T I 0+k w bias column and N data points weights k basis functs plus bias N observed outputs measurements I 0+k = k k+1 x k+1 identity matrix, but with 0 in upper left k+1 k basis functions
17 Ridge Regression: closed form solu5on ŵ ridge = arg min w NX j=1 t(x j ) (w 0 + O w F(w) =0 2 kx w i h i (x j ))! + i=1 kx i=1 w 2 i + w T I 0+k w F(w) =(Hw t) T (Hw t)+ w T I 0+k w 2H T (Hw t)+2 I 0+k w = 0 w ridge =(H T H + I 0+k ) 1 H T t
18 Regression solu5on: simple matrix math ŵ ridge = arg min w NX j=1 t(x j ) (w 0 + kx i=1 w i h i (x j ))! 2 + kx i=1 w 2 i + w T I 0+k w w ridge =(H T H + I 0+k ) 1 H T t Compare to un-regularized regression: w =(H T H) 1 H T t
19 Ridge Regression How does varying lambda change w? NX kx kx ŵ ridge = arg min w j=1 t(x j ) (w 0 + i=1 w i h i (x j ))! 2 + i=1 w 2 i Larger λ? Smaller λ? As λ à 0? Becomes same a MLE, unregularized As λ à? All weights will be 0!
20 Ridge Coefficent Path Feature Weight From Kevin Murphy textbook Larger λ ß ŵ 2 à Smaller λ
21 How to pick lambda? Experimentation cycle Select a hypothesis f to best match training set Tune hyperparameters on held-out set Try many different values of lambda, pick best one Or, can do k-fold cross validation No held-out set Divide training set into k subsets Repeatedly train on k-1 and test on remaining one Average the results Training Data Held-Out (Development) Data Test Data Training Part 1 Training Part 2 Training Part K Test Data
22 Why squared regulariza5on? Ridge: ŵ ridge = arg min w LASSO: ŵ LASSO = arg min w NX j=1 NX j=1 t(x j ) (w 0 + t(x j ) (w kx w i h i (x j ))! + i=1 2 kx w i h i (x j ))! + i=1 Linear penalty pushes more weights to zero Allows for a type of feature selec3on But, not differen5able and no closed form solu5on. kx i=1 w 2 i kx w i i=1
23 Geometric Intui5on w 2 β 2 ^. β w MLE w. 2 β 2 ^ β w MLE Ridge Regression β1 w 1 Lasso β 1 w 1 From Rob Tibshirani slides
24 LASSO Coefficent Path Feature Weight From Kevin Murp textbook Larger λ ß ŵ 1 à Smaller λ
25 Bias- Variance tradeoff Intui5on Model too simple: does not fit the data well A biased solu5on 1 t 0 1 M =0 Model too complex: small changes to the data, solu5on changes a lot A high- variance solu5on t x 1 M =9 0 x 1
26 (Squared) Bias of learner Given: dataset D with m samples Learn: for different data D, you will learn different h D (x) Expected predic5on (averaged over hypotheses): h(x) =E D [h D (x)] Bias: expected difference between expected predic5on and truth (here we square it) E x [(t(x) h(x)) 2 ] Measures how well you expect to represent true solu5on Decreases with more complex model Zero bias typically means good predic5ons as mà
27 Variance of learner Given: dataset D with m samples h(x) =E D [h D (x)] Learn: for different datasets D, you will learn different h D (x) Variance: difference between what you expect to learn and what you learn from a from a par5cular dataset E D [E x [(h D (x) h(x)) 2 ]] Measures how sensi5ve learner is to specific dataset Decreases with simpler model
28 = Bias Variance decomposi5on of error MSE = E D [E x [(t(x) h D (x)) 2 ]] = E D [E x [(t(x) h(x)+ h(x) hd (x)) 2 ]] = E D [E x [(t(x) h(x)) 2 ]] + E D [E x [( h(x) h D (x)) 2 ]] + + 2E D [E x [(t(x) h(x))( h(x) hd (x))]] =0, because h(x) =E D [h D (x)] E x [(t(x) h(x)) 2 ] Bias + E D [E x [(h D (x) h(x)) 2 ]] Variance
29 Bias- Variance Tradeoff Choice of hypothesis class introduces learning bias More complex class less bias More complex class more variance
30 Training set error Given a dataset (Training data) Choose a loss func5on e.g., squared error (L 2 ) for regression Training error: For a par5cular set of parameters, loss func5on on training data:
31 Training error as a func5on of model complexity
32 Predic5on error Training set error can be poor measure of quality of solu5on Predic5on error (true error): We really care about error over all possibili5es:
33 Predic5on error as a func5on of model complexity
34 Compu5ng predic5on error To correctly predict error Hard integral! May not know t(x) for every x, may not know p(x) Monte Carlo integration (sampling approximation) Sample a set of i.i.d. points {x 1,,x M } from p(x) Approximate integral with sample average
35 Why training set error doesn t approximate predic5on error? Sampling approxima5on of predic5on error: Training error : Very similar equa5ons!!! Why is training set a bad measure of predic5on error???
36 Why training set error doesn t approximate predic5on error? Sampling approxima5on Because of predic5on you cheated!!! error: Training error good estimate for a single w, But you optimized w with respect to the training error, and found w that is good for this set of samples Training error : Training error is a (optimistically) biased estimate of prediction error Very similar equa5ons!!! Why is training set a bad measure of predic5on error???
37 Test set error Given a dataset, randomly split it into two parts: Training data {x 1,, x Ntrain } Test data {x 1,, x Ntest } Use training data to op5mize parameters w Test set error: For the final solu)on w*, evaluate the error using:
38 Test set error as a func5on of model complexity
39 Overfikng: this slide is so important we Assume: are looking at it again! Data generated from distribu5on D(X,Y) A hypothesis space H Define: errors for hypothesis h H Training error: error train (h) Data (true) error: error true (h) We say h overfits the training data if there exists an h H such that: and error train (h) < error train (h ) error true (h) > error true (h )
40 Summary: error es5mators Gold Standard: Training: op5mis5cally biased Test: our final measure
41 Error as a func5on of number of training examples for a fixed model complexity bias little data infinite data
42 Error as func5on of regulariza5on parameter, fixed model complexity λ= λ=0
43 Summary: error es5mators Gold Standard: Be careful!!! Test set only unbiased if you never never ever ever do any any any any learning on the test data Training: op5mis5cally biased For example, if you use the test set to select the degree of the polynomial no longer unbiased!!! (We will address this problem later in the semester) Test: our final measure
44 What you need to know Regression Basis func5on = features Op5mizing sum squared error Rela5onship between regression and Gaussians Regulariza5on Ridge regression math LASSO Formula5on How to set lambda Bias- Variance trade- off
Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013
Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?
Linear Regression Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2014 1 What about continuous variables? n Billionaire says: If I am measuring a continuous variable, what
More informationMachine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?
Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do
More informationGaussians Linear Regression Bias-Variance Tradeoff
Readings listed in class website Gaussians Linear Regression Bias-Variance Tradeoff Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 22 nd, 2007 Maximum Likelihood Estimation
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationLinear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1
Linear Regression Machine Learning CSE546 Kevin Jamieson University of Washington Oct 5, 2017 1 The regression problem Given past sales data on zillow.com, predict: y = House sale price from x = {# sq.
More informationLearning Theory. Machine Learning CSE546 Carlos Guestrin University of Washington. November 25, Carlos Guestrin
Learning Theory Machine Learning CSE546 Carlos Guestrin University of Washington November 25, 2013 Carlos Guestrin 2005-2013 1 What now n We have explored many ways of learning from data n But How good
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationRegression.
Regression www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts linear regression RMSE, MAE, and R-square logistic regression convex functions and sets
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. October 7, Efficiency: If size(w) = 100B, each prediction is expensive:
Simple Variable Selection LASSO: Sparse Regression Machine Learning CSE546 Carlos Guestrin University of Washington October 7, 2013 1 Sparsity Vector w is sparse, if many entries are zero: Very useful
More informationMachine Learning and Data Mining. Linear regression. Prof. Alexander Ihler
+ Machine Learning and Data Mining Linear regression Prof. Alexander Ihler Supervised learning Notation Features x Targets y Predictions ŷ Parameters θ Learning algorithm Program ( Learner ) Change µ Improve
More informationLearning Theory. Aar$ Singh and Barnabas Poczos. Machine Learning / Apr 17, Slides courtesy: Carlos Guestrin
Learning Theory Aar$ Singh and Barnabas Poczos Machine Learning 10-701/15-781 Apr 17, 2014 Slides courtesy: Carlos Guestrin Learning Theory We have explored many ways of learning from data But How good
More informationClassification Logistic Regression
Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationcxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c
Warm up D cai.yo.ie p IExrL9CxsYD Sglx.Ddl f E Luo fhlexi.si dbll Fix any a, b, c > 0. 1. What is the x 2 R that minimizes ax 2 + bx + c x a b Ta OH 2 ax 16 0 x 1 Za fhkxiiso3ii draulx.h dp.d 2. What is
More informationCOMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)
COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless
More informationCS 6140: Machine Learning Spring What We Learned Last Week. Survey 2/26/16. VS. Model
Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Assignment
More informationUVA CS 4501: Machine Learning. Lecture 6: Linear Regression Model with Dr. Yanjun Qi. University of Virginia
UVA CS 4501: Machine Learning Lecture 6: Linear Regression Model with Regulariza@ons Dr. Yanjun Qi University of Virginia Department of Computer Science Where are we? è Five major sec@ons of this course
More informationCS 6140: Machine Learning Spring 2016
CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa?on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis?cs Assignment
More informationCSE546: SVMs, Dual Formula5on, and Kernels Winter 2012
CSE546: SVMs, Dual Formula5on, and Kernels Winter 2012 Luke ZeClemoyer Slides adapted from Carlos Guestrin Linear classifiers Which line is becer? w. = j w (j) x (j) Data Example i Pick the one with the
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationLearning Theory Continued
Learning Theory Continued Machine Learning CSE446 Carlos Guestrin University of Washington May 13, 2013 1 A simple setting n Classification N data points Finite number of possible hypothesis (e.g., dec.
More informationRirdge Regression. Szymon Bobek. Institute of Applied Computer science AGH University of Science and Technology
Rirdge Regression Szymon Bobek Institute of Applied Computer science AGH University of Science and Technology Based on Carlos Guestrin adn Emily Fox slides from Coursera Specialization on Machine Learnign
More informationLinear classifiers Lecture 3
Linear classifiers Lecture 3 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin ML Methodology Data: labeled instances, e.g. emails marked spam/ham
More informationCOMP 562: Introduction to Machine Learning
COMP 562: Introduction to Machine Learning Lecture 20 : Support Vector Machines, Kernels Mahmoud Mostapha 1 Department of Computer Science University of North Carolina at Chapel Hill mahmoudm@cs.unc.edu
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationPriors in Dependency network learning
Priors in Dependency network learning Sushmita Roy sroy@biostat.wisc.edu Computa:onal Network Biology Biosta2s2cs & Medical Informa2cs 826 Computer Sciences 838 hbps://compnetbiocourse.discovery.wisc.edu
More informationRegression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.
Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationAccouncements. You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF
Accouncements You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF Please do not zip these files and submit (unless there are >5 files) 1 Bayesian Methods Machine Learning
More informationLast Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression
CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer
More informationIntroduction to Machine Learning Fall 2017 Note 5. 1 Overview. 2 Metric
CS 189 Introduction to Machine Learning Fall 2017 Note 5 1 Overview Recall from our previous note that for a fixed input x, our measurement Y is a noisy measurement of the true underlying response f x):
More informationLinear Regression with mul2ple variables. Mul2ple features. Machine Learning
Linear Regression with mul2ple variables Mul2ple features Machine Learning Mul4ple features (variables). Size (feet 2 ) Price ($1000) 2104 460 1416 232 1534 315 852 178 Mul4ple features (variables). Size
More informationLogistic Regression Logistic
Case Study 1: Estimating Click Probabilities L2 Regularization for Logistic Regression Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 10 th,
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More information[Read Ch. 5] [Recommended exercises: 5.2, 5.3, 5.4]
Evaluating Hypotheses [Read Ch. 5] [Recommended exercises: 5.2, 5.3, 5.4] Sample error, true error Condence intervals for observed hypothesis error Estimators Binomial distribution, Normal distribution,
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10
COS53: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 0 MELISSA CARROLL, LINJIE LUO. BIAS-VARIANCE TRADE-OFF (CONTINUED FROM LAST LECTURE) If V = (X n, Y n )} are observed data, the linear regression problem
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationBias/variance tradeoff, Model assessment and selec+on
Applied induc+ve learning Bias/variance tradeoff, Model assessment and selec+on Pierre Geurts Department of Electrical Engineering and Computer Science University of Liège October 29, 2012 1 Supervised
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationy Xw 2 2 y Xw λ w 2 2
CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:
More informationCS 6140: Machine Learning Spring What We Learned Last Week 2/26/16
Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Sign
More informationLinear Regression 9/23/17. Simple linear regression. Advertising sales: Variance changes based on # of TVs. Advertising sales: Normal error?
Simple linear regression Linear Regression Nicole Beckage y " = β % + β ' x " + ε so y* " = β+ % + β+ ' x " Method to assess and evaluate the correlation between two (continuous) variables. The slope of
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationToday. Calculus. Linear Regression. Lagrange Multipliers
Today Calculus Lagrange Multipliers Linear Regression 1 Optimization with constraints What if I want to constrain the parameters of the model. The mean is less than 10 Find the best likelihood, subject
More informationMachine Learning and Data Mining. Linear regression. Kalev Kask
Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationCS 340 Lec. 15: Linear Regression
CS 340 Lec. 15: Linear Regression AD February 2011 AD () February 2011 1 / 31 Regression Assume you are given some training data { x i, y i } N where x i R d and y i R c. Given an input test data x, you
More informationApplied Machine Learning Annalisa Marsico
Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature
More informationStatistics 203: Introduction to Regression and Analysis of Variance Penalized models
Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance
More informationWeek 3: Linear Regression
Week 3: Linear Regression Instructor: Sergey Levine Recap In the previous lecture we saw how linear regression can solve the following problem: given a dataset D = {(x, y ),..., (x N, y N )}, learn to
More informationData Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.
TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin
More informationMachine learning - HT Basis Expansion, Regularization, Validation
Machine learning - HT 016 4. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford Feburary 03, 016 Outline Introduce basis function to go beyond linear regression Understanding
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationLINEAR REGRESSION, RIDGE, LASSO, SVR
LINEAR REGRESSION, RIDGE, LASSO, SVR Supervised Learning Katerina Tzompanaki Linear regression one feature* Price (y) What is the estimated price of a new house of area 30 m 2? 30 Area (x) *Also called
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 4, 2015 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationLinear classifiers: Logistic regression
Linear classifiers: Logistic regression STAT/CSE 416: Machine Learning Emily Fox University of Washington April 19, 2018 How confident is your prediction? The sushi & everything else were awesome! The
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement
More informationProbability and Statistical Decision Theory
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Probability and Statistical Decision Theory Many slides attributable to: Erik Sudderth (UCI) Prof. Mike Hughes
More informationCase Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!
Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 4, 017 1 Announcements:
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining MLE and MAP Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due tonight. Assignment 5: Will be released
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationBias-Variance in Machine Learning
Bias-Variance in Machine Learning Bias-Variance: Outline Underfitting/overfitting: Why are complex hypotheses bad? Simple example of bias/variance Error as bias+variance for regression brief comments on
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate
More informationMachine Learning. Ensemble Methods. Manfred Huber
Machine Learning Ensemble Methods Manfred Huber 2015 1 Bias, Variance, Noise Classification errors have different sources Choice of hypothesis space and algorithm Training set Noise in the data The expected
More informationMidterm Exam, Spring 2005
10-701 Midterm Exam, Spring 2005 1. Write your name and your email address below. Name: Email address: 2. There should be 15 numbered pages in this exam (including this cover sheet). 3. Write your name
More informationLogis&c Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logis&c Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More information4 Bias-Variance for Ridge Regression (24 points)
Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationy(x) = x w + ε(x), (1)
Linear regression We are ready to consider our first machine-learning problem: linear regression. Suppose that e are interested in the values of a function y(x): R d R, here x is a d-dimensional vector-valued
More informationBoos$ng Can we make dumb learners smart?
Boos$ng Can we make dumb learners smart? Aarti Singh Machine Learning 10-601 Nov 29, 2011 Slides Courtesy: Carlos Guestrin, Freund & Schapire 1 Why boost weak learners? Goal: Automa'cally categorize type
More informationESS2222. Lecture 3 Bias-Variance Trade-off
ESS2222 Lecture 3 Bias-Variance Trade-off Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Bias-Variance Trade-off Overfitting & Regularization Ridge & Lasso Regression Nonlinear
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More informationLecture 3: Statistical Decision Theory (Part II)
Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationCSE446: non-parametric methods Spring 2017
CSE446: non-parametric methods Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Linear Regression: What can go wrong? What do we do if the bias is too strong? Might want
More informationMachine Learning (CSE 446): Probabilistic Machine Learning
Machine Learning (CSE 446): Probabilistic Machine Learning oah Smith c 2017 University of Washington nasmith@cs.washington.edu ovember 1, 2017 1 / 24 Understanding MLE y 1 MLE π^ You can think of MLE as
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationMS-C1620 Statistical inference
MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationMachine Learning Gaussian Naïve Bayes Big Picture
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 27, 2011 Today: Naïve Bayes Big Picture Logistic regression Gradient ascent Generative discriminative
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationDecision Trees Lecture 12
Decision Trees Lecture 12 David Sontag New York University Slides adapted from Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore Machine Learning in the ER Physician documentation Triage Information
More informationIs the test error unbiased for these programs?
Is the test error unbiased for these programs? Xtrain avg N o Preprocessing by de meaning using whole TEST set 2017 Kevin Jamieson 1 Is the test error unbiased for this program? e Stott see non for f x
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More information