Linear Regression. Machine Learning CSE546, Kevin Jamieson, University of Washington. Oct 2, 2017.


1 Linear Regression. Machine Learning CSE546, Kevin Jamieson, University of Washington. Oct 2, 2017.

2 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. [Figure: scatter plot of sale price vs. # square feet]

3 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. [Figure: sale price vs. # square feet with best linear fit]

4 The regression problem in matrix notation. With $y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$ and $X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix}$:
$\hat{w}_{LS} = \arg\min_w \sum_{i=1}^n (y_i - x_i^T w)^2 = \arg\min_w (y - Xw)^T (y - Xw)$

5 The regression problem in matrix notation. $\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = \arg\min_w (y - Xw)^T(y - Xw)$. Setting the gradient to zero: $\nabla_w \|y - Xw\|_2^2 = -2X^T(y - Xw) = 0$, so $X^T y = X^T X w$. If $(X^T X)^{-1}$ exists, then $\hat{w}_{LS} = (X^T X)^{-1} X^T y$.
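A minimal NumPy sketch of this closed form on synthetic data (the data and names here are illustrative, not from the slides); in practice one solves the normal equations rather than forming the explicit inverse:

```python
import numpy as np

# Sketch: least squares via the normal equations X^T X w = X^T y.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])          # assumed ground-truth weights
y = X @ w_true + 0.1 * rng.normal(size=n)

# Solve X^T X w = X^T y directly (better numerics than inverting X^T X).
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, and more robust when X^T X is ill-conditioned:
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat, w_lstsq)                        # both close to w_true
```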

6 The regression problem in matrix notation. $\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = (X^T X)^{-1} X^T y$. What about an offset?
$\hat{w}_{LS}, \hat{b}_{LS} = \arg\min_{w,b} \sum_{i=1}^n \big(y_i - (x_i^T w + b)\big)^2 = \arg\min_{w,b} \|y - (Xw + 1b)\|_2^2$

7 Dealing with an offset. $\hat{w}_{LS}, \hat{b}_{LS} = \arg\min_{w,b} \|y - (Xw + 1b)\|_2^2$. Setting both gradients to zero: $\nabla_w = -2X^T(y - Xw - 1b) = 0$ and $\nabla_b = -2 \cdot 1^T(y - Xw - 1b) = 0$. The second equation gives $\hat{b} = \frac{1}{n}\sum_{i=1}^n y_i - \frac{1}{n} 1^T X \hat{w}$.

8 Dealing with an offset. $\hat{w}_{LS}, \hat{b}_{LS} = \arg\min_{w,b} \|y - (Xw + 1b)\|_2^2$. The normal equations are
$X^T X \hat{w}_{LS} + \hat{b}_{LS} X^T 1 = X^T y$
$1^T X \hat{w}_{LS} + \hat{b}_{LS} 1^T 1 = 1^T y$
If $X^T 1 = 0$ (i.e., if each feature is mean-zero) then $\hat{w}_{LS} = (X^T X)^{-1} X^T y$ and $\hat{b}_{LS} = \frac{1}{n}\sum_{i=1}^n y_i$. Given a new $x$, predict $\hat{w}_{LS}^T (x - \mu) + \hat{b}_{LS}$, where $\mu$ is the training feature mean used to center $X$.
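A hedged sketch of the two standard ways to handle the offset in code (synthetic data; both should recover the same fit):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 2
X = rng.normal(loc=3.0, size=(n, d))          # deliberately not mean-zero
w_true, b_true = np.array([2.0, -1.0]), 5.0
y = X @ w_true + b_true + 0.1 * rng.normal(size=n)

# Option 1: append a constant-1 column and solve one joint LS problem.
X1 = np.hstack([X, np.ones((n, 1))])
wb, *_ = np.linalg.lstsq(X1, y, rcond=None)
w_hat, b_hat = wb[:-1], wb[-1]

# Option 2: de-mean so X^T 1 = 0, fit w on centered data, recover b from means.
mu = X.mean(axis=0)
Xc, yc = X - mu, y - y.mean()
w_hat2 = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)
b_hat2 = y.mean() - mu @ w_hat2               # so predictions are w^T x + b
print(w_hat, b_hat, w_hat2, b_hat2)
```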

9 The regression problem in matrix notation. $\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = (X^T X)^{-1} X^T y$. But why least squares? Consider $y_i = x_i^T w + \epsilon_i$ where $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$. Then
$P(y \mid x, w, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(y - x^T w)^2}{2\sigma^2}\right)$

10 Maximizing log-likelihood. Maximize:
$\log P(D \mid w, \sigma) = \log\left[\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{\!n} e^{-\sum_{i=1}^n (y_i - x_i^T w)^2 / 2\sigma^2}\right] = n \log\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - x_i^T w)^2$
Maximizing over $w$ is therefore the same as minimizing the sum of squared errors.

11 MLE is LS under the linear model. $\hat{w}_{LS} = \arg\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$ and $\hat{w}_{MLE} = \arg\max_w P(D \mid w, \sigma)$. If $y_i = x_i^T w + \epsilon_i$ with $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$, then
$\hat{w}_{LS} = \hat{w}_{MLE} = (X^T X)^{-1} X^T y$

12 Analysis of error. $Y = Xw + \epsilon$ if $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$; $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y$. Substituting,
$\hat{w}_{MLE} = (X^T X)^{-1} X^T (Xw + \epsilon) = w + (X^T X)^{-1} X^T \epsilon$,
so $E[\hat{w}_{MLE}] = w$ (unbiased). For the covariance,
$E[(\hat{w} - w)(\hat{w} - w)^T] = (X^T X)^{-1} X^T E[\epsilon\epsilon^T] X (X^T X)^{-1}$, with $E[\epsilon\epsilon^T] = \sigma^2 I$.

13 Analysis of error. $Y = Xw + \epsilon$ if $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$;
$\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = (X^T X)^{-1} X^T (Xw + \epsilon) = w + (X^T X)^{-1} X^T \epsilon$
$\mathrm{Cov}(\hat{w}_{MLE}) = E[(\hat{w} - E[\hat{w}])(\hat{w} - E[\hat{w}])^T] = \sigma^2 (X^T X)^{-1}$
$\hat{w}_{MLE} \sim \mathcal{N}\big(w,\; \sigma^2 (X^T X)^{-1}\big)$
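A Monte Carlo sketch of this sampling distribution (assumed synthetic setup): holding $X$ fixed and redrawing the noise many times, the empirical mean and covariance of $\hat{w}$ should match $w$ and $\sigma^2 (X^T X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 50, 2, 0.5
X = rng.normal(size=(n, d))                   # fixed design
w = np.array([1.0, -1.0])

# Re-fit over many independent noise draws.
draws = np.stack([
    np.linalg.solve(X.T @ X, X.T @ (X @ w + sigma * rng.normal(size=n)))
    for _ in range(20000)
])

print(draws.mean(axis=0))                     # approx w (unbiased)
print(np.cov(draws.T))                        # approx sigma^2 (X^T X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
```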

14 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. [Figure: sale price vs. # square feet with best linear fit]

15 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. [Figure: sale price vs. date of sale with best linear fit]

16 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. Transformed data: (filled in on the next slide)

17 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. Transformed data: $h : \mathbb{R}^d \to \mathbb{R}^p$ maps original features to a rich, possibly high-dimensional space,
$h(x) = \begin{bmatrix} h_1(x) \\ h_2(x) \\ \vdots \\ h_p(x) \end{bmatrix}$.
In $d = 1$: $h(x) = \begin{bmatrix} x \\ x^2 \\ \vdots \\ x^p \end{bmatrix}$. For $d > 1$, generate $\{u_j\}_{j=1}^p \subset \mathbb{R}^d$ and use, e.g., $h_j(x) = \frac{1}{1 + \exp(u_j^T x)}$, $h_j(x) = (u_j^T x)^2$, or $h_j(x) = \cos(u_j^T x)$.
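A sketch of these feature maps in NumPy (the function names are illustrative, not from the slides); once $H$ is built, the least-squares fit is unchanged:

```python
import numpy as np

def poly_features(x, p):
    """d=1 case: h(x) = [x, x^2, ..., x^p] for a 1-d array x."""
    return np.stack([x**j for j in range(1, p + 1)], axis=-1)

def random_features(X, p, kind="cos", seed=0):
    """d>1 case: draw u_1..u_p and apply a fixed nonlinearity to u_j^T x."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(X.shape[1], p))      # columns are the u_j
    Z = X @ U                                 # n x p matrix of u_j^T x_i
    if kind == "cos":
        return np.cos(Z)
    if kind == "sigmoid":
        return 1.0 / (1.0 + np.exp(Z))        # matches slide: 1/(1 + exp(u_j^T x))
    return Z**2                               # quadratic features (u_j^T x)^2

# Fitting is unchanged: run least squares on H = h(X) instead of X.
```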

18 [Handwritten worked example: stack the transformed features into a matrix $H$ with rows $h(x_i)^T$, solve $\hat{w} = (H^T H)^{-1} H^T y$, and given a new $x$ predict $\hat{w}^T h(x)$.]

19 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Transformed data: $h(x) = [h_1(x), h_2(x), \ldots, h_p(x)]^T$. Hypothesis: linear, $y_i \approx h(x_i)^T w$ with $w \in \mathbb{R}^p$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$.

20 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Transformed data: $h(x) = [h_1(x), h_2(x), \ldots, h_p(x)]^T$. Hypothesis: linear, $y_i \approx h(x_i)^T w$ with $w \in \mathbb{R}^p$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$. [Figure: sale price vs. date of sale with best linear fit]

21 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Transformed data: $h(x) = [h_1(x), h_2(x), \ldots, h_p(x)]^T$. Hypothesis: linear, $y_i \approx h(x_i)^T w$ with $w \in \mathbb{R}^p$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$. [Figure: sale price vs. date of sale, small p fit]

22 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Transformed data: $h(x) = [h_1(x), h_2(x), \ldots, h_p(x)]^T$. Hypothesis: linear, $y_i \approx h(x_i)^T w$ with $w \in \mathbb{R}^p$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$. [Figure: sale price vs. date of sale, large p fit] What's going on here?
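A small-vs-large $p$ sketch (assumed synthetic 1-d data and an assumed "true" trend) showing the effect the slides point at: a large-$p$ fit drives training error toward zero while error on fresh samples blows up:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = np.sort(rng.uniform(-1, 1, size=n))
f = lambda t: np.sin(3 * t)                   # assumed true trend
y = f(x) + 0.3 * rng.normal(size=n)

for p in (2, 25):
    # Polynomial features including a constant term for the offset.
    H = np.stack([x**j for j in range(p + 1)], axis=1)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)

    # Fresh samples from the same distribution.
    x_new = rng.uniform(-1, 1, size=1000)
    H_new = np.stack([x_new**j for j in range(p + 1)], axis=1)
    y_new = f(x_new) + 0.3 * rng.normal(size=1000)

    print(p, np.mean((H @ w - y)**2), np.mean((H_new @ w - y_new)**2))
```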

23 Bias-Variance Tradeoff. Machine Learning CSE546, Kevin Jamieson, University of Washington. Oct 5, 2017.

24 Statistical Learning. $P_{XY}(X = x, Y = y)$. Goal: predict $Y$ given $X$. Find the function $\eta$ that minimizes $E_{XY}[(Y - \eta(X))^2]$. Working pointwise: $E_{XY}[(Y - \eta(X))^2] = E_X\big[E_{Y|X}[(Y - \eta(x))^2 \mid X = x]\big]$, and minimizing $E_{Y|X}[(Y - c)^2 \mid X = x]$ over $c$ by setting the derivative $2c - 2E_{Y|X}[Y \mid X = x] = 0$ gives $c = E_{Y|X}[Y \mid X = x]$.

25 Statistical Learning. $P_{XY}(X = x, Y = y)$. Goal: predict $Y$ given $X$. Find the function that minimizes
$E_{XY}[(Y - \eta(X))^2] = E_X\big[E_{Y|X}[(Y - \eta(x))^2 \mid X = x]\big]$
Pointwise, $\eta(x) = \arg\min_c E_{Y|X}[(Y - c)^2 \mid X = x] = E_{Y|X}[Y \mid X = x]$. Under the LS loss, the optimal predictor is $\eta(x) = E_{Y|X}[Y \mid X = x]$.
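A tiny numeric check of this pointwise claim (illustrative setup): among constants $c$, the average squared loss over samples of $Y$ is minimized at the sample mean, mirroring $\eta(x) = E[Y \mid X = x]$:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(loc=2.0, scale=1.0, size=100000)   # samples of Y at a fixed x

cs = np.linspace(0.0, 4.0, 401)
losses = [np.mean((y - c)**2) for c in cs]        # empirical E[(Y - c)^2]
print(cs[int(np.argmin(losses))], y.mean())       # both approx 2.0
```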

26 Statistical Learning. $E_{XY}[(Y - \eta(X))^2]$, $P_{XY}(X = x, Y = y)$. [Figure: joint distribution $P_{XY}$ over the $(x, y)$ plane]

27 Statistical Learning. $E_{XY}[(Y - \eta(X))^2]$, $P_{XY}(X = x, Y = y)$. [Figure: joint distribution with the conditional slices $P_{XY}(Y = y \mid X = x_0)$ and $P_{XY}(Y = y \mid X = x_1)$ highlighted]

28 Statistical Learning. $E_{XY}[(Y - \eta(X))^2]$, $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = E_{Y|X}[Y \mid X = x]$. [Figure: conditional distributions $P_{XY}(Y = y \mid X = x_0)$ and $P_{XY}(Y = y \mid X = x_1)$]

29 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = E_{Y|X}[Y \mid X = x]$. [Figure: the curve $\eta(x)$ drawn over the joint distribution]

30 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = E_{Y|X}[Y \mid X = x]$. But we only have samples: $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$ for $i = 1, \ldots, n$. [Figure: sampled points over the joint distribution]

31 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = E_{Y|X}[Y \mid X = x]$. But we only have samples $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$ for $i = 1, \ldots, n$, and we are restricted to a function class $\mathcal{F}$ (e.g., linear), so we compute:
$\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$

32 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = E_{Y|X}[Y \mid X = x]$. But we only have samples $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$ for $i = 1, \ldots, n$, and we are restricted to a function class $\mathcal{F}$ (e.g., linear), so we compute $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$. We care about future predictions: $E_{XY}[(Y - \hat{f}(X))^2]$.

33 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = E_{Y|X}[Y \mid X = x]$. But we only have samples $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$ for $i = 1, \ldots, n$, and we are restricted to a function class $\mathcal{F}$, so we compute $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$. Each draw $D = \{(x_i, y_i)\}_{i=1}^n$ results in a different $\hat{f}$.

34 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = E_{Y|X}[Y \mid X = x]$. But we only have samples $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$, and we are restricted to a function class $\mathcal{F}$, so we compute $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$. Each draw $D = \{(x_i, y_i)\}_{i=1}^n$ results in a different $\hat{f}$. [Figure: several fits $\hat{f}$ from different draws of $D$, together with their average $E_D[\hat{f}(x)]$]

35 Bias-Variance Tradeoff. $\eta(x) = E_{Y|X}[Y \mid X = x]$, $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2$.
$E_{Y|X}\big[E_D[(Y - \hat{f}_D(x))^2] \mid X = x\big] = E_{Y|X}\big[E_D[(Y - \eta(x) + \eta(x) - \hat{f}_D(x))^2] \mid X = x\big]$
(add and subtract $\eta(x)$; the expansion is carried out on the next slide)

36 Bias-Variance Tradeoff. $\eta(x) = E_{Y|X}[Y \mid X = x]$, $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2$.
$E_{Y|X}\big[E_D[(Y - \hat{f}_D(x))^2] \mid X = x\big]$
$= E_{Y|X}\big[E_D[(Y - \eta(x) + \eta(x) - \hat{f}_D(x))^2] \mid X = x\big]$
$= E_{Y|X}\big[E_D[(Y - \eta(x))^2 + 2(Y - \eta(x))(\eta(x) - \hat{f}_D(x)) + (\eta(x) - \hat{f}_D(x))^2] \mid X = x\big]$
$= E_{Y|X}[(Y - \eta(x))^2 \mid X = x] + E_D[(\eta(x) - \hat{f}_D(x))^2]$
The cross term vanishes because $E_{Y|X}[Y - \eta(x) \mid X = x] = 0$ and the test draw of $Y$ is independent of $D$. The first term is the irreducible error, caused by stochastic label noise. The second is the learning error, caused by either using too simple a model or not having enough data to learn the model accurately.

37 Bias-Variance Tradeoff. $\eta(x) = E_{Y|X}[Y \mid X = x]$, $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2$.
$E_D[(\eta(x) - \hat{f}_D(x))^2] = E_D\big[(\eta(x) - E_D[\hat{f}_D(x)] + E_D[\hat{f}_D(x)] - \hat{f}_D(x))^2\big]$
(add and subtract $E_D[\hat{f}_D(x)]$; expanded on the next slide)

38 Bias-Variance Tradeoff. $\eta(x) = E_{Y|X}[Y \mid X = x]$, $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2$.
$E_D[(\eta(x) - \hat{f}_D(x))^2] = E_D\big[(\eta(x) - E_D[\hat{f}_D(x)] + E_D[\hat{f}_D(x)] - \hat{f}_D(x))^2\big]$
$= E_D\big[(\eta(x) - E_D[\hat{f}_D(x)])^2 + 2(\eta(x) - E_D[\hat{f}_D(x)])(E_D[\hat{f}_D(x)] - \hat{f}_D(x)) + (E_D[\hat{f}_D(x)] - \hat{f}_D(x))^2\big]$
$= \underbrace{(\eta(x) - E_D[\hat{f}_D(x)])^2}_{\text{bias squared}} + \underbrace{E_D[(E_D[\hat{f}_D(x)] - \hat{f}_D(x))^2]}_{\text{variance}}$
The cross term vanishes since $E_D[E_D[\hat{f}_D(x)] - \hat{f}_D(x)] = 0$.

39 Bias-Variance Tradeoff.
$E_{Y|X}\big[E_D[(Y - \hat{f}_D(x))^2] \mid X = x\big] = \underbrace{E_{Y|X}[(Y - \eta(x))^2 \mid X = x]}_{\text{irreducible error}} + \underbrace{(\eta(x) - E_D[\hat{f}_D(x)])^2}_{\text{bias squared}} + \underbrace{E_D[(E_D[\hat{f}_D(x)] - \hat{f}_D(x))^2]}_{\text{variance}}$
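A Monte Carlo sketch of this decomposition at a fixed test point (all names and the polynomial fit are illustrative assumptions, not from the slides): repeatedly redraw the training set $D$, refit, and estimate $E_D[\hat{f}_D(x_0)]$, the bias squared, and the variance:

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma, p, x0 = 30, 0.3, 3, 0.5
eta = lambda t: np.sin(3 * t)                 # assumed eta(x) = E[Y | X = x]

preds = []
for _ in range(5000):                         # independent draws of D
    x = rng.uniform(-1, 1, size=n)
    y = eta(x) + sigma * rng.normal(size=n)
    H = np.stack([x**j for j in range(p + 1)], axis=1)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    preds.append(np.array([x0**j for j in range(p + 1)]) @ w)
preds = np.array(preds)

bias_sq = (eta(x0) - preds.mean())**2         # (eta(x0) - E_D[f_D(x0)])^2
variance = preds.var()                        # E_D[(E_D[f_D(x0)] - f_D(x0))^2]
print(bias_sq, variance, sigma**2)            # plus sigma^2 irreducible error
```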

40 Example: Linear LS. $Y = Xw + \epsilon$ if $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$; $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = w + (X^T X)^{-1} X^T \epsilon$.
$\eta(x) = E_{Y|X}[Y \mid X = x] = E[x^T w + \epsilon \mid X = x] = x^T w$
$\hat{f}_D(x) = \hat{w}^T x$, and $E_D[\hat{f}_D(x)] = E_D\big[(w + (X^T X)^{-1} X^T \epsilon)^T x\big] = w^T x$.

41 Example: Linear LS. (model as above) $\hat{f}_D(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x$.
Irreducible error: $E_{XY}[(Y - \eta(x))^2 \mid X = x] = \sigma^2$.
Bias squared: $(\eta(x) - E_D[\hat{f}_D(x)])^2 = 0$.

42 Example: Linear LS. (model as above) $\hat{f}_D(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x$.
Variance: $E_D[(E_D[\hat{f}_D(x)] - \hat{f}_D(x))^2] = E_D\big[x^T (X^T X)^{-1} X^T \epsilon \epsilon^T X (X^T X)^{-1} x\big] = \sigma^2 x^T (X^T X)^{-1} x$.

43 Example: Linear LS. (model as above)
$E_D[(E_D[\hat{f}_D(x)] - \hat{f}_D(x))^2] = E_D\big[x^T (X^T X)^{-1} X^T \epsilon \epsilon^T X (X^T X)^{-1} x\big] = \sigma^2 x^T (X^T X)^{-1} x = \sigma^2 \,\mathrm{Trace}\big((X^T X)^{-1} x x^T\big)$
For $n$ large: $\frac{1}{n} X^T X = \frac{1}{n}\sum_{i=1}^n x_i x_i^T \to \Sigma = E[X X^T]$, so averaging over the test point $X \sim P_X$,
$E_{X,D}\big[(E_D[\hat{f}_D(X)] - \hat{f}_D(X))^2\big] \approx \frac{\sigma^2}{n} E_X[\mathrm{Trace}(\Sigma^{-1} X X^T)] = \frac{\sigma^2}{n}\mathrm{Trace}(\Sigma^{-1} E[XX^T]) = \frac{d\,\sigma^2}{n}$

44 Example: Linear LS. (model as above) $\eta(x) = E_{Y|X}[Y \mid X = x] = x^T w$ and $\hat{f}_D(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x$. Putting it together:
irreducible error $E_{XY}[(Y - \eta(x))^2 \mid X = x] = \sigma^2$; bias squared $(\eta(x) - E_D[\hat{f}_D(x)])^2 = 0$; variance $E_{X,D}\big[(E_D[\hat{f}_D(X)] - \hat{f}_D(X))^2\big] \approx \frac{d\,\sigma^2}{n}$.
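A simulation sketch checking the $d\sigma^2/n$ rate (assumed synthetic Gaussian design): under the well-specified linear model the bias is zero, so the average excess risk $E_{X,D}[(\eta(X) - \hat{f}_D(X))^2]$ is pure variance and should scale like $d\sigma^2/n$:

```python
import numpy as np

rng = np.random.default_rng(6)
d, sigma = 10, 1.0
w = np.ones(d)

for n in (50, 200, 800):
    errs = []
    for _ in range(500):                      # draws of the training set D
        X = rng.normal(size=(n, d))
        y = X @ w + sigma * rng.normal(size=n)
        w_hat = np.linalg.solve(X.T @ X, X.T @ y)
        x_test = rng.normal(size=(1000, d))   # fresh test points ~ P_X
        errs.append(np.mean((x_test @ (w_hat - w))**2))
    print(n, np.mean(errs), d * sigma**2 / n) # empirical vs. d*sigma^2/n
```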
