Linear Regression. Machine Learning CSE546, Kevin Jamieson, University of Washington. Oct 2, 2017
1 Linear Regression. Machine Learning CSE546, Kevin Jamieson, University of Washington. Oct 2, 2017
2 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. [Figure: scatter plot of Sale Price vs. # square feet]
3 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. [Figure: Sale Price vs. # square feet with best linear fit]
4 The regression problem in matrix notation. Stack the data as $y = [y_1, \ldots, y_n]^T$ and $X = [x_1^T; \ldots; x_n^T]$ (one row per $x_i^T$). Then $\hat{w}_{LS} = \arg\min_w \sum_{i=1}^n (y_i - x_i^T w)^2 = \arg\min_w (y - Xw)^T (y - Xw)$.
5 The regression problem in matrix notation. $\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = \arg\min_w (y - Xw)^T (y - Xw)$. (Handwritten derivation:) setting the gradient to zero, $\nabla_w \|y - Xw\|_2^2 = -2X^T(y - Xw) = 0$, so $X^T y = X^T X w$. If $(X^T X)^{-1}$ exists, then $\hat{w}_{LS} = (X^T X)^{-1} X^T y$.
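The closed form above translates directly into a few lines of NumPy. A minimal sketch, my own rather than from the slides; the data dimensions, true weights, noise level, and seed are made up for illustration:

```python
import numpy as np

# Toy data: n points in d dimensions, labels from a made-up linear model plus noise.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Closed form from the slide: w_LS = (X^T X)^{-1} X^T y.
# np.linalg.solve solves (X^T X) w = X^T y without explicitly forming the inverse.
w_ls = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq is the numerically safer route to the same minimizer.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_ls, w_lstsq)
```

Solving the normal equations rather than inverting $X^T X$ is the standard numerical practice; `lstsq` additionally handles the case where $X^T X$ is not invertible.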
6 The regression problem in matrix notation. $\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = (X^T X)^{-1} X^T y$. What about an offset? $\hat{w}_{LS}, \hat{b}_{LS} = \arg\min_{w,b} \sum_{i=1}^n (y_i - (x_i^T w + b))^2 = \arg\min_{w,b} \|y - (Xw + \mathbf{1}b)\|_2^2$.
7 Dealing with an offset. $\hat{w}_{LS}, \hat{b}_{LS} = \arg\min_{w,b} \|y - (Xw + \mathbf{1}b)\|_2^2$. (Handwritten derivation:) $\nabla_w = -2X^T(y - Xw - \mathbf{1}b) = 0$ gives $X^T y = X^T X w + b\, X^T \mathbf{1}$; $\nabla_b = -2\,\mathbf{1}^T(y - Xw - \mathbf{1}b) = 0$ gives $\frac{1}{n}\sum_{i=1}^n y_i = \frac{1}{n}\mathbf{1}^T X w + b$, i.e., $b = \bar{y} - \bar{x}^T w$ where $\bar{x} = \frac{1}{n}\sum_i x_i$.
8 Dealing with an offset. The stationarity conditions are $X^T X \hat{w}_{LS} + \hat{b}_{LS}\, X^T \mathbf{1} = X^T y$ and $\mathbf{1}^T X \hat{w}_{LS} + \hat{b}_{LS}\, \mathbf{1}^T \mathbf{1} = \mathbf{1}^T y$. If $X^T \mathbf{1} = 0$ (i.e., if each feature is mean-zero) the equations decouple: $\hat{w}_{LS} = (X^T X)^{-1} X^T y$ and $\hat{b}_{LS} = \frac{1}{n}\sum_{i=1}^n y_i$. (Handwritten:) in general, center the features first by replacing each $x_i$ with $x_i - \mu$, $\mu = \frac{1}{n}\sum_i x_i$; given a new $x$, predict $\hat{w}_{LS}^T (x - \mu) + \hat{b}_{LS}$.
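The centering trick is easy to exercise numerically. A short sketch under made-up data (the non-zero feature mean, true weights, and offset are my own choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 2
X = rng.normal(loc=3.0, size=(n, d))              # features are NOT mean-zero
y = X @ np.array([2.0, -1.0]) + 5.0 + 0.1 * rng.normal(size=n)

# Center the features so that Xc^T 1 = 0, matching the slide's condition.
mu = X.mean(axis=0)
Xc = X - mu

w_ls = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)       # w_LS = (Xc^T Xc)^{-1} Xc^T y
b_ls = y.mean()                                   # b_LS = (1/n) sum_i y_i

# A new point is centered the same way before predicting.
x_new = np.array([3.5, 2.5])
y_hat = w_ls @ (x_new - mu) + b_ls
print(y_hat)                                      # close to 2*3.5 - 1*2.5 + 5 = 9.5
```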
9 The regression problem in matrix notation. $\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = (X^T X)^{-1} X^T y$. But why least squares? Consider $y_i = x_i^T w + \epsilon_i$ where $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$. Then $P(y \mid x, w, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(y - x^T w)^2}{2\sigma^2}\right)$.
10 Maximizing log-likelihood. Maximize $\log P(D \mid w, \sigma) = \log\!\left[\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{\!n} e^{-\sum_{i=1}^n (y_i - x_i^T w)^2 / (2\sigma^2)}\right] = -n \log(\sigma\sqrt{2\pi}) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - x_i^T w)^2$. Maximizing over $w$ is therefore the same as minimizing $\sum_{i=1}^n (y_i - x_i^T w)^2$.
11 MLE is LS under linear model. $\hat{w}_{LS} = \arg\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$ and $\hat{w}_{MLE} = \arg\max_w P(D \mid w, \sigma)$. If $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$, then $\hat{w}_{LS} = \hat{w}_{MLE} = (X^T X)^{-1} X^T y$.
12 Analysis of error. $Y = Xw + \epsilon$ if $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$; $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y$. (Handwritten derivation:) substituting, $\hat{w}_{MLE} = (X^T X)^{-1} X^T (Xw + \epsilon) = w + (X^T X)^{-1} X^T \epsilon$, so $\mathbb{E}[\hat{w}_{MLE}] = w$. For the covariance, $\mathbb{E}[(\hat{w} - w)(\hat{w} - w)^T] = (X^T X)^{-1} X^T \mathbb{E}[\epsilon \epsilon^T] X (X^T X)^{-1}$, and since $\mathbb{E}[\epsilon \epsilon^T] = \sigma^2 I$ this equals $\sigma^2 (X^T X)^{-1}$.
13 Analysis of error. $Y = Xw + \epsilon$ if $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$. Then $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = (X^T X)^{-1} X^T (Xw + \epsilon) = w + (X^T X)^{-1} X^T \epsilon$, and $\text{Cov}(\hat{w}_{MLE}) = \mathbb{E}[(\hat{w} - \mathbb{E}[\hat{w}])(\hat{w} - \mathbb{E}[\hat{w}])^T] = \sigma^2 (X^T X)^{-1}$, so $\hat{w}_{MLE} \sim \mathcal{N}(w, \sigma^2 (X^T X)^{-1})$.
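This sampling distribution can be checked by Monte Carlo: fix the design $X$, redraw the noise many times, refit, and compare the empirical covariance of $\hat{w}$ to $\sigma^2 (X^T X)^{-1}$. A sketch with made-up dimensions and seed:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 50, 2, 0.5
X = rng.normal(size=(n, d))                       # fixed design matrix
w = np.array([1.0, -1.0])

# Many independent noise draws -> many refits of w_hat.
trials = 20000
eps = sigma * rng.normal(size=(n, trials))
Y = (X @ w)[:, None] + eps                        # each column is one dataset
W_hat = np.linalg.solve(X.T @ X, X.T @ Y).T       # (trials, d) estimates

emp_cov = np.cov(W_hat, rowvar=False)
theory_cov = sigma**2 * np.linalg.inv(X.T @ X)
print(np.abs(emp_cov - theory_cov).max())         # small: empirical matches theory
```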
14 The regression problem. Given past sales data on zillow.com, predict: y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. [Figure: Sale Price vs. # square feet with best linear fit]
15 The regression problem. Same prediction task, now plotted against the date of sale. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. [Figure: Sale Price vs. date of sale with best linear fit]
16 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. Transformed data: (introduced on the next slide).
17 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Hypothesis: linear, $y_i \approx x_i^T w$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$. Transformed data: $h: \mathbb{R}^d \to \mathbb{R}^p$ maps original features to a rich, possibly high-dimensional space, $h(x) = [h_1(x), h_2(x), \ldots, h_p(x)]^T$. In $d = 1$: $h(x) = [x, x^2, \ldots, x^p]^T$. For $d > 1$, generate $\{u_j\}_{j=1}^p \subset \mathbb{R}^d$ and use, e.g., $h_j(x) = \frac{1}{1 + \exp(u_j^T x)}$, $h_j(x) = (u_j^T x)^2$, or $h_j(x) = \cos(u_j^T x)$.
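Both families of feature maps from the slide are a few lines of NumPy. A sketch; the target function, sizes, and seed are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def h_poly(x, p):
    """d = 1: polynomial features h(x) = (x, x^2, ..., x^p)."""
    return np.stack([x**j for j in range(1, p + 1)], axis=-1)

def h_cos(X, U):
    """d > 1: h_j(x) = cos(u_j^T x) for random directions u_j (rows of U)."""
    return np.cos(X @ U.T)

# Draw random directions {u_j}; least squares then proceeds exactly as before,
# with the n x p matrix H in place of X.
n, d, p = 100, 5, 20
U = rng.normal(size=(p, d))
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)    # some nonlinear target
H = h_cos(X, U)
w, *_ = np.linalg.lstsq(H, y, rcond=None)         # w in R^p
```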
18 (Worked example, handwritten.) Stack the transformed features into $H = [h(x_1)^T; \ldots; h(x_n)^T] \in \mathbb{R}^{n \times p}$. Least squares in the transformed space gives $\hat{w} = (H^T H)^{-1} H^T y$, and given a new $x$ we predict $\hat{w}^T h(x)$.
19 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Transformed data: $h(x) = [h_1(x), h_2(x), \ldots, h_p(x)]^T$. Hypothesis: linear, $y_i \approx h(x_i)^T w$, $w \in \mathbb{R}^p$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$.
20 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Transformed data: $h(x) = [h_1(x), h_2(x), \ldots, h_p(x)]^T$. Hypothesis: linear, $y_i \approx h(x_i)^T w$, $w \in \mathbb{R}^p$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$. [Figure: Sale Price vs. date of sale with best linear fit]
21 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Transformed data: $h(x) = [h_1(x), h_2(x), \ldots, h_p(x)]^T$. Hypothesis: linear, $y_i \approx h(x_i)^T w$, $w \in \mathbb{R}^p$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$. [Figure: Sale Price vs. date of sale, small p fit]
22 The regression problem. Training data: $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Transformed data: $h(x) = [h_1(x), h_2(x), \ldots, h_p(x)]^T$. Hypothesis: linear, $y_i \approx h(x_i)^T w$, $w \in \mathbb{R}^p$. Loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$. [Figure: Sale Price vs. date of sale, large p fit] What's going on here?
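The small-p vs. large-p pictures can be reproduced with a one-dimensional polynomial fit. A sketch with a made-up nonlinear ground truth (the $\sin$ target, noise level, and degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
x = np.sort(rng.uniform(-1, 1, size=n))
y = np.sin(3 * x) + 0.2 * rng.normal(size=n)      # hypothetical nonlinear truth

def fit_poly(x, y, p):
    H = np.stack([x**j for j in range(p + 1)], axis=1)   # includes an offset term
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w, H

for p in (1, 5, 25):
    w, H = fit_poly(x, y, p)
    print(f"p={p:2d}  train MSE={np.mean((y - H @ w) ** 2):.4f}")
# Training error only decreases as p grows, yet the large-p curve oscillates
# wildly between the samples: low training error does not mean low future error.
```

This is exactly the puzzle the slide poses, and it motivates the bias-variance analysis that follows.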
23 Bias-Variance Tradeoff. Machine Learning CSE546, Kevin Jamieson, University of Washington. Oct 5, 2017
24 Statistical Learning. $P_{XY}(X = x, Y = y)$. Goal: predict $Y$ given $X$. Find the function $\eta$ that minimizes $\mathbb{E}_{XY}[(Y - \eta(X))^2]$. (Handwritten derivation:) $\mathbb{E}_{XY}[(Y - \eta(X))^2] = \mathbb{E}_X[\mathbb{E}_{Y|X}[(Y - \eta(x))^2 \mid X = x]]$, so it suffices to minimize pointwise over $c = \eta(x)$: $\frac{\partial}{\partial c}\,\mathbb{E}_{Y|X}[(Y - c)^2 \mid X = x] = -2\,\mathbb{E}_{Y|X}[Y \mid X = x] + 2c = 0$, giving $c = \mathbb{E}_{Y|X}[Y \mid X = x]$.
25 Statistical Learning. $P_{XY}(X = x, Y = y)$. Goal: predict $Y$ given $X$. Find the function $\eta$ that minimizes $\mathbb{E}_{XY}[(Y - \eta(X))^2] = \mathbb{E}_X\!\left[\mathbb{E}_{Y|X}[(Y - \eta(x))^2 \mid X = x]\right]$. Pointwise, $\eta(x) = \arg\min_c \mathbb{E}_{Y|X}[(Y - c)^2 \mid X = x] = \mathbb{E}_{Y|X}[Y \mid X = x]$. Under LS loss, the optimal predictor is $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$.
26 Statistical Learning. $\mathbb{E}_{XY}[(Y - \eta(X))^2]$, $P_{XY}(X = x, Y = y)$. [Figure: scatter of $(x, y)$ samples drawn from $P_{XY}$]
27 Statistical Learning. $\mathbb{E}_{XY}[(Y - \eta(X))^2]$, $P_{XY}(X = x, Y = y)$. [Figure: same scatter with the conditional distributions $P_{XY}(Y = y \mid X = x_0)$ and $P_{XY}(Y = y \mid X = x_1)$ drawn at two points $x_0$, $x_1$]
28 Statistical Learning. $\mathbb{E}_{XY}[(Y - \eta(X))^2]$, $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$. [Figure: same scatter with conditional distributions at $x_0$ and $x_1$]
29 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$. [Figure: the curve $\eta(x)$ drawn over the data]
30 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$. But we only have samples: $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$ for $i = 1, \ldots, n$. [Figure: finite sample scatter]
31 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$. But we only have samples $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$ for $i = 1, \ldots, n$ and are restricted to a function class $\mathcal{F}$ (e.g., linear), so we compute: $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$.
32 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$. But we only have samples $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$ for $i = 1, \ldots, n$ and are restricted to a function class $\mathcal{F}$ (e.g., linear), so we compute: $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$. We care about future predictions: $\mathbb{E}_{XY}[(Y - \hat{f}(X))^2]$.
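The future-prediction error can be estimated by evaluating $\hat{f}$ on a large fresh sample from $P_{XY}$, which the training error systematically understates. A sketch; the distribution, polynomial degree, and sample sizes below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

def sample(n):
    """Fresh i.i.d. draws from the (made-up) P_XY used here."""
    x = rng.uniform(-1, 1, size=n)
    return x, np.sin(3 * x) + 0.2 * rng.normal(size=n)

def features(x, p=9):
    return np.stack([x**j for j in range(p + 1)], axis=1)

x_tr, y_tr = sample(30)
w, *_ = np.linalg.lstsq(features(x_tr), y_tr, rcond=None)

# A large held-out sample estimates E_XY[(Y - f_hat(X))^2].
x_te, y_te = sample(100_000)
print("train MSE:", np.mean((y_tr - features(x_tr) @ w) ** 2))
print("test  MSE:", np.mean((y_te - features(x_te) @ w) ** 2))
```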
33 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$. But we only have samples $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$ for $i = 1, \ldots, n$ and are restricted to a function class $\mathcal{F}$ (e.g., linear), so we compute: $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$. Each draw $D = \{(x_i, y_i)\}_{i=1}^n$ results in a different $\hat{f}$.
34 Statistical Learning. $P_{XY}(X = x, Y = y)$. Ideally, we want to find: $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$. But we only have samples $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$ for $i = 1, \ldots, n$ and are restricted to a function class $\mathcal{F}$ (e.g., linear), so we compute: $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$. Each draw $D = \{(x_i, y_i)\}_{i=1}^n$ results in a different $\hat{f}$. [Figure: several fitted curves from different draws, together with their average $\mathbb{E}_D[\hat{f}(x)]$]
35 Bias-Variance Tradeoff. $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$, $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2$. $\mathbb{E}_{Y|X}[\mathbb{E}_D[(Y - \hat{f}_D(x))^2] \mid X = x] = \mathbb{E}_{Y|X}[\mathbb{E}_D[(Y - \eta(x) + \eta(x) - \hat{f}_D(x))^2] \mid X = x]$. (Handwritten:) expand the square; the cross term $2\,\mathbb{E}_{Y|X}[\mathbb{E}_D[(Y - \eta(x))(\eta(x) - \hat{f}_D(x))] \mid X = x]$ vanishes because $Y$ (given $X = x$) is independent of the training draw $D$ and $\mathbb{E}_{Y|X}[Y - \eta(x) \mid X = x] = 0$.
36 Bias-Variance Tradeoff. $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$, $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2$. $\mathbb{E}_{Y|X}[\mathbb{E}_D[(Y - \hat{f}_D(x))^2] \mid X = x] = \mathbb{E}_{Y|X}[\mathbb{E}_D[(Y - \eta(x) + \eta(x) - \hat{f}_D(x))^2] \mid X = x] = \mathbb{E}_{Y|X}\!\left[\mathbb{E}_D[(Y - \eta(x))^2 + 2(Y - \eta(x))(\eta(x) - \hat{f}_D(x)) + (\eta(x) - \hat{f}_D(x))^2] \mid X = x\right] = \mathbb{E}_{Y|X}[(Y - \eta(x))^2 \mid X = x] + \mathbb{E}_D[(\eta(x) - \hat{f}_D(x))^2]$. The first term is the irreducible error, caused by stochastic label noise. The second is the learning error, caused by either using too simple a model or not having enough data to learn the model accurately.
37 Bias-Variance Tradeoff. $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$, $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2$. $\mathbb{E}_D[(\eta(x) - \hat{f}_D(x))^2] = \mathbb{E}_D[(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)] + \mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2]$. (Handwritten:) expand the square; the cross term $2(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])\,\mathbb{E}_D[\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x)]$ is zero since $\mathbb{E}_D[\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x)] = 0$.
38 Bias-Variance Tradeoff. $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$, $\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2$. $\mathbb{E}_D[(\eta(x) - \hat{f}_D(x))^2] = \mathbb{E}_D[(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)] + \mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2] = \mathbb{E}_D\!\left[(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])^2 + 2(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x)) + (\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2\right] = (\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])^2 + \mathbb{E}_D[(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2]$: bias squared plus variance.
39 Bias-Variance Tradeoff. $\mathbb{E}_{Y|X}[\mathbb{E}_D[(Y - \hat{f}_D(x))^2] \mid X = x] = \underbrace{\mathbb{E}_{Y|X}[(Y - \eta(x))^2 \mid X = x]}_{\text{irreducible error}} + \underbrace{(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])^2}_{\text{bias squared}} + \underbrace{\mathbb{E}_D[(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2]}_{\text{variance}}$.
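The decomposition can be estimated empirically: fix a query point $x_0$, refit on many independent draws of $D$, and compute the bias-squared and variance of the predictions at $x_0$. A sketch; the $\sin$ ground truth, noise level, and degrees are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(6)
x0 = 0.3                                          # fixed query point x
eta = lambda x: np.sin(3 * x)                     # true conditional mean E[Y|X=x]

def fit_and_predict(p, n=30):
    """One draw of D, fit a degree-p polynomial, predict at x0."""
    x = rng.uniform(-1, 1, size=n)
    y = eta(x) + 0.2 * rng.normal(size=n)
    H = np.stack([x**j for j in range(p + 1)], axis=1)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return np.array([x0**j for j in range(p + 1)]) @ w

for p in (1, 5, 15):
    preds = np.array([fit_and_predict(p) for _ in range(2000)])
    print(f"p={p:2d}  bias^2={(eta(x0) - preds.mean())**2:.5f}  variance={preds.var():.5f}")
# Small p: large bias, small variance. Large p: small bias, large variance.
```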
40 Example: Linear LS. $Y = Xw + \epsilon$ if $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$; $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = w + (X^T X)^{-1} X^T \epsilon$. (Handwritten:) $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x] = \mathbb{E}[x^T w + \epsilon \mid X = x] = x^T w$, and $\hat{f}_D(x) = \hat{w}^T x$ with $\mathbb{E}_D[\hat{f}_D(x)] = \mathbb{E}_D[(w + (X^T X)^{-1} X^T \epsilon)^T x] = w^T x$.
41 Example: Linear LS. $Y = Xw + \epsilon$ if $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$; $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = w + (X^T X)^{-1} X^T \epsilon$. With $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$, $\hat{f}_D(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x$. Irreducible error: $\mathbb{E}_{XY}[(Y - \eta(x))^2 \mid X = x] = \sigma^2$. Bias squared: $(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])^2 = 0$.
42 Example: Linear LS. $Y = Xw + \epsilon$ if $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$; $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = w + (X^T X)^{-1} X^T \epsilon$; $\hat{f}_D(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x$. Variance: $\mathbb{E}_D[(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2] = \mathbb{E}_D[(\epsilon^T X (X^T X)^{-1} x)^2] = \mathbb{E}[x^T (X^T X)^{-1} X^T \epsilon \epsilon^T X (X^T X)^{-1} x]$.
43 Example: Linear LS. $Y = Xw + \epsilon$ if $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$; $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = w + (X^T X)^{-1} X^T \epsilon$; $\hat{f}_D(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x$. Variance: $\mathbb{E}_D[(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2] = \mathbb{E}_D[x^T (X^T X)^{-1} X^T \epsilon \epsilon^T X (X^T X)^{-1} x] = \sigma^2\, x^T (X^T X)^{-1} x = \sigma^2\, \text{Trace}((X^T X)^{-1} x x^T)$. For $n$ large, $\frac{1}{n} X^T X = \frac{1}{n}\sum_{i=1}^n x_i x_i^T \approx \Sigma = \mathbb{E}[XX^T]$, and for $x \sim P_X$: $\mathbb{E}_X\!\left[\mathbb{E}_D[(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2 \mid X = x]\right] \approx \frac{\sigma^2}{n}\, \mathbb{E}_X[\text{Trace}(\Sigma^{-1} X X^T)] = \frac{d\sigma^2}{n}$.
44 Example: Linear LS. $Y = Xw + \epsilon$ if $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$; $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = w + (X^T X)^{-1} X^T \epsilon$; $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$; $\hat{f}_D(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x$. In summary: irreducible error $\mathbb{E}_{XY}[(Y - \eta(x))^2 \mid X = x] = \sigma^2$; bias squared $(\eta(x) - \mathbb{E}_D[\hat{f}_D(x)])^2 = 0$; variance $\mathbb{E}_X\!\left[\mathbb{E}_D[(\mathbb{E}_D[\hat{f}_D(x)] - \hat{f}_D(x))^2 \mid X = x]\right] \approx \frac{d\sigma^2}{n}$.
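The $d\sigma^2/n$ scaling is easy to check by simulation: since the bias is zero here, the average excess error $\mathbb{E}_X[(\hat{f}_D(x) - \eta(x))^2]$ over draws of $D$ should track $d\sigma^2/n$ as $n$ grows. A sketch under a made-up Gaussian design (where the agreement is only approximate at small $n$):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, d = 0.5, 4
w = rng.normal(size=d)

def avg_excess_error(n, trials=300, m=2000):
    """Average E_X[(f_hat(x) - eta(x))^2] over draws of D (bias is zero here)."""
    errs = []
    for _ in range(trials):
        X = rng.normal(size=(n, d))
        y = X @ w + sigma * rng.normal(size=n)
        w_hat = np.linalg.solve(X.T @ X, X.T @ y)
        x_new = rng.normal(size=(m, d))           # x ~ P_X
        errs.append(np.mean((x_new @ (w_hat - w)) ** 2))
    return np.mean(errs)

for n in (50, 100, 200):
    print(f"n={n:3d}  simulated={avg_excess_error(n):.5f}  d*sigma^2/n={d * sigma**2 / n:.5f}")
```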