Linear Regression. Machine Learning CSE546, Kevin Jamieson, University of Washington. Oct 5.
2 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. [Figure: scatter plot of Sale Price vs. # square feet]
3 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. [Figure: best linear fit of Sale Price vs. # square feet] Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$.
4 The regression problem in matrix notation. Stack the data as $y = (y_1, \ldots, y_n)^T$ and $X = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}$, so that $\hat{w}_{LS} = \arg\min_w \sum_{i=1}^n (y_i - x_i^T w)^2 = \arg\min_w (y - Xw)^T (y - Xw)$.
5 The regression problem in matrix notation. $\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = \arg\min_w (y - Xw)^T (y - Xw)$.
6 The regression problem in matrix notation. Setting the gradient to zero gives the closed form $\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = (X^T X)^{-1} X^T y$. What about an offset? $(\hat{w}_{LS}, \hat{b}_{LS}) = \arg\min_{w,b} \sum_{i=1}^n (y_i - (x_i^T w + b))^2 = \arg\min_{w,b} \|y - (Xw + \mathbf{1}b)\|_2^2$.
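As an illustration, the closed form can be computed directly with NumPy (a minimal sketch on synthetic data; the variable names and the made-up dataset are our own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Solve the normal equations X^T X w = X^T y directly.
# (Prefer solve/lstsq over forming the explicit inverse.)
w_ls = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq minimizes ||y - Xw||_2^2 via a stable factorization.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Both routes return the same minimizer; `lstsq` is numerically safer when $X^T X$ is ill-conditioned.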
7 Dealing with an offset. $(\hat{w}_{LS}, \hat{b}_{LS}) = \arg\min_{w,b} \|y - (Xw + \mathbf{1}b)\|_2^2$.
8 Dealing with an offset. $(\hat{w}_{LS}, \hat{b}_{LS}) = \arg\min_{w,b} \|y - (Xw + \mathbf{1}b)\|_2^2$. Setting the gradients with respect to $w$ and $b$ to zero gives the normal equations: $X^T X \hat{w}_{LS} + \hat{b}_{LS} X^T \mathbf{1} = X^T y$ and $\mathbf{1}^T X \hat{w}_{LS} + \hat{b}_{LS} \mathbf{1}^T \mathbf{1} = \mathbf{1}^T y$. If $X^T \mathbf{1} = 0$ (i.e., if each feature is mean-zero) then $\hat{w}_{LS} = (X^T X)^{-1} X^T y$ and $\hat{b}_{LS} = \frac{1}{n} \sum_{i=1}^n y_i$.
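A quick numerical check of this centering trick (a sketch on made-up data, not from the slides): fitting $(w, b)$ jointly with a column of ones matches fitting $w$ on mean-centered features and reading $\hat{b}$ off separately.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 2
X = rng.normal(loc=5.0, size=(n, d))        # deliberately NOT mean-zero
y = X @ np.array([2.0, -1.0]) + 3.0 + 0.1 * rng.normal(size=n)

# Approach 1: append a column of ones and solve for (w, b) jointly.
X1 = np.hstack([X, np.ones((n, 1))])
wb, *_ = np.linalg.lstsq(X1, y, rcond=None)
w_joint, b_joint = wb[:-1], wb[-1]

# Approach 2: center the features so that X^T 1 = 0; the equations decouple.
x_mean = X.mean(axis=0)
Xc = X - x_mean
w_c = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)
b_c = y.mean()                  # offset in the centered parameterization
b_orig = b_c - x_mean @ w_c     # offset mapped back to the original features
```

The two parameterizations agree: predictions $(x - \bar{x})^T w + \bar{y}$ and $x^T w + b$ are the same function.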
9 The regression problem in matrix notation. $\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = (X^T X)^{-1} X^T y$. But why least squares? Consider the model $y_i = x_i^T w + \epsilon_i$ where $\epsilon_i \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$. Then $P(y \mid x, w, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(y - x^T w)^2}{2\sigma^2}\right)$.
10 Maximizing log-likelihood. Maximize: $\log P(D \mid w, \sigma) = \log\left[\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \prod_{i=1}^n e^{-(y_i - x_i^T w)^2 / (2\sigma^2)}\right] = n \log \frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^T w)^2$.
11 MLE is LS under linear model. $\hat{w}_{LS} = \arg\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$ and $\hat{w}_{MLE} = \arg\max_w P(D \mid w, \sigma)$. If $y_i = x_i^T w + \epsilon_i$ with $\epsilon_i \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$, then $\hat{w}_{LS} = \hat{w}_{MLE} = (X^T X)^{-1} X^T y$.
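The equivalence can be seen numerically as well: minimizing the negative log-likelihood by gradient descent lands on the closed-form least-squares solution (an illustrative sketch; the data, step size, and iteration count are our own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 500, 3, 0.5
w_star = np.array([1.0, 0.0, -1.0])
X = rng.normal(size=(n, d))
y = X @ w_star + sigma * rng.normal(size=n)

# Closed-form least-squares solution.
w_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Negative log-likelihood of the Gaussian model:
# -log P(D | w, sigma) = n log(sigma sqrt(2 pi)) + sum_i (y_i - x_i^T w)^2 / (2 sigma^2)
def nll(w):
    r = y - X @ w
    return n * np.log(sigma * np.sqrt(2 * np.pi)) + r @ r / (2 * sigma**2)

# Gradient descent on the NLL; its minimizer is exactly w_ls because the
# NLL differs from the squared loss only by a positive scale and a constant.
w = np.zeros(d)
step = sigma**2 / n          # small enough to converge for this design
for _ in range(2000):
    grad = -(X.T @ (y - X @ w)) / sigma**2   # gradient of the NLL in w
    w = w - step * grad
```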
12 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. [Figure: best linear fit of Sale Price vs. # square feet] Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$.
13 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. [Figure: best linear fit of Sale Price vs. date of sale] Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$.
14 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. Transformed data:
15 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. Transformed data: $h : \mathbb{R}^d \to \mathbb{R}^p$ maps the original features to a rich, possibly high-dimensional space, $h(x) = (h_1(x), h_2(x), \ldots, h_p(x))^T$. In $d = 1$: $h(x) = (x, x^2, \ldots, x^p)^T$. For $d > 1$, generate $\{u_j\}_{j=1}^p \subset \mathbb{R}^d$ and use, e.g., $h_j(x) = \frac{1}{1 + \exp(u_j^T x)}$, $h_j(x) = (u_j^T x)^2$, or $h_j(x) = \cos(u_j^T x)$.
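These feature maps are easy to sketch in code (our own toy data; the function names are illustrative, not from the slides). Note the model stays linear in $w$ after the transform, so ordinary least squares still applies:

```python
import numpy as np

rng = np.random.default_rng(3)

def h_poly(x, p):
    """d = 1 polynomial features: h(x) = (x, x^2, ..., x^p)."""
    return np.stack([x**j for j in range(1, p + 1)], axis=-1)

def h_cos(X, U):
    """d > 1: given fixed directions {u_j}, features h_j(x) = cos(u_j^T x)."""
    return np.cos(X @ U.T)

# Least squares on the transformed data: design matrix H is n x p.
n = 50
x = np.linspace(-1, 1, n)
y = np.sin(3 * x) + 0.05 * rng.normal(size=n)
H = h_poly(x, 5)
w, *_ = np.linalg.lstsq(H, y, rcond=None)

# d > 1 example: p random directions for the cosine features.
X2 = rng.normal(size=(10, 2))
U = rng.normal(size=(7, 2))        # {u_j}_{j=1}^p, drawn once and held fixed
H2 = h_cos(X2, U)
```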
16 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Transformed data: $h(x) = (h_1(x), h_2(x), \ldots, h_p(x))^T$. Hypothesis: linear, $y_i \approx h(x_i)^T w$ with $w \in \mathbb{R}^p$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$.
17 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Transformed data: $h(x) = (h_1(x), \ldots, h_p(x))^T$. Hypothesis: linear, $y_i \approx h(x_i)^T w$ with $w \in \mathbb{R}^p$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$. [Figure: best linear fit of Sale Price vs. date of sale]
18 The regression problem. Same setup. [Figure: small-$p$ fit of Sale Price vs. date of sale]
19 The regression problem. Same setup. [Figure: large-$p$ fit of Sale Price vs. date of sale] What's going on here?
20 Bias-Variance Tradeoff. Machine Learning CSE546, Kevin Jamieson, University of Washington. Oct 5.
21 Statistical Learning. The data are governed by a joint distribution $P_{XY}(X = x, Y = y)$. [Figure: joint distribution over $(x, y)$]
22 Statistical Learning. $P_{XY}(X = x, Y = y)$. [Figure: conditional distributions $P_{XY}(Y = y \mid X = x_0)$ and $P_{XY}(Y = y \mid X = x_1)$ at two slices $x_0$ and $x_1$]
23 Statistical Learning. Ideally, we want to find: $\eta(x) = \mathbb{E}_{XY}[Y \mid X = x]$. [Figure: conditional distributions at $x_0$ and $x_1$]
24 Statistical Learning. Ideally, we want to find: $\eta(x) = \mathbb{E}_{XY}[Y \mid X = x]$. [Figure: the regression function overlaid on the joint distribution]
25 Statistical Learning. Ideally, we want to find $\eta(x) = \mathbb{E}_{XY}[Y \mid X = x]$, but we only have samples: $(x_i, y_i) \stackrel{\text{i.i.d.}}{\sim} P_{XY}$ for $i = 1, \ldots, n$.
26 Statistical Learning. We only have samples and are restricted to a function class $\mathcal{F}$ (e.g., linear), so we compute: $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$.
27 Statistical Learning. We care about future predictions: $\mathbb{E}_{XY}[(Y - \hat{f}(X))^2]$.
28 Statistical Learning. Each draw $D = \{(x_i, y_i)\}_{i=1}^n$ results in a different $\hat{f}$.
29 Statistical Learning. Each draw $D$ results in a different $\hat{f}$; averaging over draws gives $\mathbb{E}_D[\hat{f}(x)]$. [Figure: many fitted functions and their average $\mathbb{E}_D[\hat{f}(x)]$]
30 Bias-Variance Tradeoff. With $\eta(x) = \mathbb{E}_{XY}[Y \mid X = x]$ and $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$, add and subtract $\eta(x)$: $\mathbb{E}_{XY}[\mathbb{E}_D[(Y - \hat{f}_D(x))^2] \mid X = x] = \mathbb{E}_{XY}[\mathbb{E}_D[(Y - \eta(x) + \eta(x) - \hat{f}_D(x))^2] \mid X = x]$.
31 Bias-Variance Tradeoff. Expanding the square:
$\mathbb{E}_{XY}[\mathbb{E}_D[(Y - \hat{f}_D(x))^2] \mid X = x]$
$= \mathbb{E}_{XY}[\mathbb{E}_D[(Y - \eta(x) + \eta(x) - \hat{f}_D(x))^2] \mid X = x]$
$= \mathbb{E}_{XY}[\mathbb{E}_D[(Y - \eta(x))^2 + 2(Y - \eta(x))(\eta(x) - \hat{f}_D(x)) + (\eta(x) - \hat{f}_D(x))^2] \mid X = x]$
$= \mathbb{E}_{XY}[(Y - \eta(x))^2 \mid X = x] + \mathbb{E}_D[(\eta(x) - \hat{f}_D(x))^2]$,
where the cross term vanishes because $\mathbb{E}[Y - \eta(x) \mid X = x] = 0$ and $D$ is independent of $(X, Y)$. The first term is the irreducible error, caused by stochastic label noise. The second is the learning error, caused by either using too simple a model or not having enough data to learn the model accurately.
32 Bias-Variance Tradeoff. Now add and subtract $\mathbb{E}_D[\hat{f}_D(x)]$: $\mathbb{E}_D[(\eta(x) - \hat{f}_D(x))^2] = \mathbb{E}_D[(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)] + \mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2]$.
33 Bias-Variance Tradeoff. Expanding the square again:
$\mathbb{E}_D[(\eta(x) - \hat{f}_D(x))^2]$
$= \mathbb{E}_D[(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])^2 + 2(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x)) + (\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2]$
$= (\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])^2 + \mathbb{E}_D[(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2]$,
where the cross term has zero mean. The first term is the bias squared; the second is the variance.
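This decomposition can be checked by simulation (our own toy setup, not from the slides): repeatedly draw $D$, fit a polynomial, and split the error at a fixed $x$ into bias squared plus variance.

```python
import numpy as np

rng = np.random.default_rng(4)

def eta(x):
    """The true regression function eta(x) = E[Y | X = x] (our own choice)."""
    return np.sin(2 * np.pi * x)

def fit_poly(deg, n=30):
    """Draw a fresh dataset D and return fitted polynomial coefficients."""
    x = rng.uniform(size=n)
    y = eta(x) + 0.3 * rng.normal(size=n)
    return np.polyfit(x, y, deg)

x0 = 0.05                       # a fixed point x at which we evaluate the error
results = {}
for deg in (1, 9):              # a too-simple model and a flexible one
    preds = np.array([np.polyval(fit_poly(deg), x0) for _ in range(500)])
    bias2 = (eta(x0) - preds.mean())**2      # (eta(x) - E_D[f_D(x)])^2
    var = preds.var()                        # E_D[(E_D[f_D(x)] - f_D(x))^2]
    mse = np.mean((eta(x0) - preds)**2)      # E_D[(eta(x) - f_D(x))^2]
    results[deg] = (bias2, var, mse)         # mse = bias2 + var, exactly
```

In this sketch the degree-1 model has the larger bias and the degree-9 model the larger variance, matching the slide's tradeoff.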
34 Bias-Variance Tradeoff. Putting it together:
$\mathbb{E}_{XY}[\mathbb{E}_D[(Y - \hat{f}_D(x))^2] \mid X = x] = \underbrace{\mathbb{E}_{XY}[(Y - \eta(x))^2 \mid X = x]}_{\text{irreducible error}} + \underbrace{(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])^2}_{\text{bias squared}} + \underbrace{\mathbb{E}_D[(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2]}_{\text{variance}}$
Model too simple: high bias, cannot fit the data well. Model too complex: high variance, small changes in the data change the learned function a lot.
35 Bias-Variance Tradeoff. $\mathbb{E}_{XY}[\mathbb{E}_D[(Y - \hat{f}_D(x))^2] \mid X = x] = \mathbb{E}_{XY}[(Y - \eta(x))^2 \mid X = x]$ (irreducible error) $+\ (\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])^2$ (bias squared) $+\ \mathbb{E}_D[(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2]$ (variance).
36 Overfitting. Machine Learning CSE546, Kevin Jamieson, University of Washington. Oct 5.
37 Bias-Variance Tradeoff. The choice of hypothesis class introduces learning bias. A more complex class means less bias; a more complex class also means more variance. But in practice?
38 Bias-Variance Tradeoff. The choice of hypothesis class introduces learning bias: a more complex class means less bias but more variance. But in practice? Before, we saw how increasing the feature space can increase the complexity of the learned estimator. Consider nested classes $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \mathcal{F}_3 \subseteq \ldots$ and $\hat{f}_D^{(k)} = \arg\min_{f \in \mathcal{F}_k} \frac{1}{|D|} \sum_{(x_i, y_i) \in D} (y_i - f(x_i))^2$. Complexity grows as $k$ grows.
39 Training set error as a function of model complexity. $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \ldots$, $\hat{f}_D^{(k)} = \arg\min_{f \in \mathcal{F}_k} \frac{1}{|D|} \sum_{(x_i, y_i) \in D} (y_i - f(x_i))^2$, with $D \stackrel{\text{i.i.d.}}{\sim} P_{XY}$. TRAIN error: $\frac{1}{|D|} \sum_{(x_i, y_i) \in D} (y_i - \hat{f}_D^{(k)}(x_i))^2$. TRUE error: $\mathbb{E}_{XY}[(Y - \hat{f}_D^{(k)}(X))^2]$.
40 Training set error as a function of model complexity. TRAIN error: $\frac{1}{|D|} \sum_{(x_i, y_i) \in D} (y_i - \hat{f}_D^{(k)}(x_i))^2$. TRUE error: $\mathbb{E}_{XY}[(Y - \hat{f}_D^{(k)}(X))^2]$. TEST error: draw $T \stackrel{\text{i.i.d.}}{\sim} P_{XY}$ and compute $\frac{1}{|T|} \sum_{(x_i, y_i) \in T} (y_i - \hat{f}_D^{(k)}(x_i))^2$. Important: $D \cap T = \emptyset$. [Figure: TRAIN and TRUE error vs. complexity $k$]
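The qualitative picture is easy to reproduce (a sketch on our own toy $P_{XY}$, not the housing data): with nested classes, TRAIN error can only go down as $k$ grows, while TEST error eventually turns back up.

```python
import numpy as np

rng = np.random.default_rng(5)

def make_data(n):
    """Samples from our own toy P_XY."""
    x = rng.uniform(-1, 1, size=n)
    y = np.cos(2 * x) + 0.3 * rng.normal(size=n)
    return x, y

def design(x, k):
    """Nested classes F_1 ⊆ F_2 ⊆ ...: polynomials of degree k."""
    return np.stack([x**j for j in range(k + 1)], axis=1)

x_tr, y_tr = make_data(30)       # D, the training set
x_te, y_te = make_data(1000)     # T, a fresh draw, so D ∩ T = ∅

train_err, test_err = [], []
for k in range(1, 16):
    w, *_ = np.linalg.lstsq(design(x_tr, k), y_tr, rcond=None)
    train_err.append(np.mean((y_tr - design(x_tr, k) @ w)**2))
    test_err.append(np.mean((y_te - design(x_te, k) @ w)**2))
```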
41 Training set error as a function of model complexity. Each line is an i.i.d. draw of $D$ or $T$. TRAIN error: $\frac{1}{|D|} \sum_{(x_i, y_i) \in D} (y_i - \hat{f}_D^{(k)}(x_i))^2$. TRUE error: $\mathbb{E}_{XY}[(Y - \hat{f}_D^{(k)}(X))^2]$. TEST error: $T \stackrel{\text{i.i.d.}}{\sim} P_{XY}$, $\frac{1}{|T|} \sum_{(x_i, y_i) \in T} (y_i - \hat{f}_D^{(k)}(x_i))^2$. Important: $D \cap T = \emptyset$. [Figure: TRAIN and TEST error vs. complexity $k$; plot from Hastie et al.]
42 Training set error as a function of model complexity. TRAIN error is optimistically biased because it is evaluated on the data the model was trained on. TEST error is unbiased only if $T$ is never used to train the model or even to pick the complexity $k$. TRUE error: $\mathbb{E}_{XY}[(Y - \hat{f}_D^{(k)}(X))^2]$. TEST error: $T \stackrel{\text{i.i.d.}}{\sim} P_{XY}$, $\frac{1}{|T|} \sum_{(x_i, y_i) \in T} (y_i - \hat{f}_D^{(k)}(x_i))^2$. Important: $D \cap T = \emptyset$.
43 Test set error. Given a dataset, randomly split it into two parts: training data $D$ and test data $T$, with $D \cap T = \emptyset$. Use the training data to learn a predictor, e.g., minimize $\frac{1}{|D|} \sum_{(x_i, y_i) \in D} (y_i - \hat{f}_D^{(k)}(x_i))^2$, and use the training data to pick the complexity $k$ (next lecture). Use the test data to report predicted performance: $\frac{1}{|T|} \sum_{(x_i, y_i) \in T} (y_i - \hat{f}_D^{(k)}(x_i))^2$.
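The split-then-report procedure can be sketched as follows (our own helper on synthetic data; not the course's code):

```python
import numpy as np

rng = np.random.default_rng(6)

def train_test_split(X, y, test_frac=0.1):
    """Randomly split the data into disjoint train (D) and test (T) parts."""
    n = len(y)
    perm = rng.permutation(n)
    n_test = int(round(test_frac * n))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=100)
X_tr, y_tr, X_te, y_te = train_test_split(X, y)   # 90/10 split

# Learn on D only; report performance on T only.
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
test_mse = np.mean((y_te - X_te @ w)**2)
```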
44 Overfitting. Overfitting: a learning algorithm overfits the training data if it outputs a solution $w$ when there exists another solution $w'$ such that $w$ has lower TRAIN error than $w'$ but higher TRUE error: $\text{error}_{\text{train}}(w) < \text{error}_{\text{train}}(w')$ and $\text{error}_{\text{true}}(w') < \text{error}_{\text{true}}(w)$.
45 How many points do I use for training/testing? Very hard question to answer! Too few training points and the learned model is bad; too few test points and you never know if you reached a good solution. Bounds such as Hoeffding's inequality can help (more on this later this quarter), but the question is still hard to answer. Typically: if you have a reasonable amount of data, 90/10 splits are common; if you have little data, then you need to get fancy (e.g., bootstrapping).
46 Recap. Learning is: Collect some data, e.g., housing info and sale prices. Randomly split the dataset into TRAIN and TEST, e.g., 80% and 20%, respectively. Choose a hypothesis class or model, e.g., linear. Choose a loss function, e.g., least squares. Choose an optimization procedure, e.g., set the derivative to zero to obtain the estimator. Justify the accuracy of the estimate, e.g., report TEST error.
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationIs the test error unbiased for these programs? 2017 Kevin Jamieson
Is the test error unbiased for these programs? 2017 Kevin Jamieson 1 Is the test error unbiased for this program? 2017 Kevin Jamieson 2 Simple Variable Selection LASSO: Sparse Regression Machine Learning
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationReminders. Thought questions should be submitted on eclass. Please list the section related to the thought question
Linear regression Reminders Thought questions should be submitted on eclass Please list the section related to the thought question If it is a more general, open-ended question not exactly related to a
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationESS2222. Lecture 3 Bias-Variance Trade-off
ESS2222 Lecture 3 Bias-Variance Trade-off Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Bias-Variance Trade-off Overfitting & Regularization Ridge & Lasso Regression Nonlinear
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More informationGI01/M055: Supervised Learning
GI01/M055: Supervised Learning 1. Introduction to Supervised Learning October 5, 2009 John Shawe-Taylor 1 Course information 1. When: Mondays, 14:00 17:00 Where: Room 1.20, Engineering Building, Malet
More informationEvaluating Classifiers. Lecture 2 Instructor: Max Welling
Evaluating Classifiers Lecture 2 Instructor: Max Welling Evaluation of Results How do you report classification error? How certain are you about the error you claim? How do you compare two algorithms?
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationLecture 3: Introduction to Complexity Regularization
ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationData Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.
TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More information