Machine Learning. Introduction to Regression. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012


Machine Learning, CSE6740/CS7641/ISYE6740, Fall 2012. Introduction to Regression. Le Song. Lecture 4, August 30, 2012. Based on slides from Eric Xing, CMU. Reading: Chap. 3, CB

Machine learning for apartment hunting. Suppose you are to move to LA!! And you want to find the most reasonably priced apartment satisfying your needs: square-ft., # of bedrooms, distance to campus.

Living area (ft²) | # bedroom | Rent ($)
230 | 1 | 600
506 | 2 | 1000
433 | 2 | 1100
109 | 1 | 500
150 | 1 | ?
270 | 1.5 | ?

The learning problem. Features: living area, distance to campus, # bedroom. Denote them as x = [x_1, x_2, ..., x_k]. Target: rent. Denoted as y. Training set: X with rows x_i = (x_i1, ..., x_ik) (living area, location, ...), Y = (y_1, ..., y_n) (the rents), or jointly D = {(x_i, y_i), i = 1, ..., n}.

Linear Regression. Assume the target y is a linear function of the features x: ŷ = θ_0 + θ_1 x_1 + ... + θ_k x_k = θ^T x (defining x_0 = 1 for the intercept term). Our goal is to pick the optimal θ: we seek the θ that minimizes the cost function J(θ) = (1/2) Σ_i (ŷ(x_i) − y_i)².

The Least-Mean-Square (LMS) method. Cost function: J(θ) = (1/2) Σ_{i=1}^n (x_i^T θ − y_i)². Consider a gradient descent algorithm: θ_j^{t+1} = θ_j^t − α ∂J(θ)/∂θ_j.

The Least-Mean-Square (LMS) method. For a single training example, taking the derivative gives the online update rule: θ_j^{t+1} = θ_j^t + α (y_i − x_i^T θ^t) x_{ij}, i.e., θ^{t+1} = θ^t + α (y_i − x_i^T θ^t) x_i.
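A minimal NumPy sketch of this online update (the function name lms_step and the step size alpha are illustrative, not from the slides):

import numpy as np

def lms_step(theta, x_i, y_i, alpha=0.01):
    # One online LMS update: theta <- theta + alpha * (y_i - x_i . theta) * x_i
    return theta + alpha * (y_i - x_i @ theta) * x_i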

The Least-Mean-Square (LMS) method: steepest descent. Note that ∇_θ J = [∂J/∂θ_1, ..., ∂J/∂θ_k]^T = − Σ_{i=1}^n (y_i − x_i^T θ) x_i, so the update is θ^{t+1} = θ^t + α Σ_{i=1}^n (y_i − x_i^T θ^t) x_i. This is a batch gradient descent algorithm.
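For comparison, a sketch of the batch (steepest descent) version, assuming a fixed step size alpha and a made-up iteration count:

import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, iters=1000):
    # Minimize J(theta) = 0.5 * sum_i (x_i . theta - y_i)^2 by steepest descent
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        residual = y - X @ theta                  # y_i - x_i . theta, for all i at once
        theta = theta + alpha * (X.T @ residual)  # theta <- theta + alpha * sum_i (y_i - x_i.theta) x_i
    return theta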

Some matrix derivatives. For f: R^{m×n} → R, define the gradient ∇_A f(A) as the m×n matrix with entries [∇_A f(A)]_{ij} = ∂f/∂A_{ij}. Trace: tr A = Σ_i A_{ii} for square A, tr a = a for a scalar a, and tr ABC = tr CAB = tr BCA. Some facts about matrix derivatives (without proof): ∇_A tr AB = B^T; ∇_A tr ABA^T C = CAB + C^T A B^T; ∇_A |A| = |A| (A^{-1})^T.

The normal equations. Write the cost function in matrix form: J(θ) = (1/2) Σ_i (x_i^T θ − y_i)² = (1/2) (Xθ − y)^T (Xθ − y) = (1/2) (θ^T X^T X θ − θ^T X^T y − y^T X θ + y^T y), where X is the n×k design matrix (rows x_i^T) and y is the vector of targets. To minimize J(θ), take the derivative and set it to zero: ∇_θ J = (1/2) ∇_θ tr(θ^T X^T X θ − θ^T X^T y − y^T X θ + y^T y) = X^T X θ − X^T y = 0, which gives the normal equations X^T X θ = X^T y, and hence θ* = (X^T X)^{-1} X^T y.
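A small sketch of solving the normal equations with NumPy; in practice a linear solve (or np.linalg.lstsq) is preferred over forming an explicit inverse:

import numpy as np

def normal_equations(X, y):
    # Solve X^T X theta = X^T y directly
    return np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, and more robust when X^T X is (near-)singular:
# theta, *_ = np.linalg.lstsq(X, y, rcond=None)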

A recap. LMS update rule: θ_j^{t+1} = θ_j^t + α (y_i − x_i^T θ^t) x_{ij}. Pros: on-line, low per-step cost. Cons: coordinate-wise, maybe slow-converging. Steepest descent: θ^{t+1} = θ^t + α Σ_i (y_i − x_i^T θ^t) x_i. Pros: fast-converging, easy to implement. Cons: a batch method. Normal equations: θ* = (X^T X)^{-1} X^T y. Pros: a single-shot algorithm! Easiest to implement. Cons: need to compute the pseudo-inverse (X^T X)^{-1}, which is expensive and has numerical issues (e.g., when the matrix is singular).

Geometric Interpretation of LMS. The predictions on the training data are: ŷ = X θ* = X (X^T X)^{-1} X^T y. Note that ŷ − y = (X (X^T X)^{-1} X^T − I) y, and X^T (ŷ − y) = X^T (X (X^T X)^{-1} X^T − I) y = 0, so ŷ is the orthogonal projection of y onto the space spanned by the columns of X!!
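A quick numerical check of this projection property on made-up data (the design matrix X and targets y below are random, purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                  # made-up design matrix
y = rng.normal(size=50)                       # made-up targets
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ theta
print(X.T @ (y_hat - y))                      # ~ zero vector: the residual is orthogonal to the columns of X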

Probabilistic Interpretation of LMS. Let us assume that the target variable and the inputs are related by the equation: y_i = θ^T x_i + ε_i, where ε is an error term of unmodeled effects or random noise. Now assume that ε follows a Gaussian N(0, σ²); then we have: p(y_i | x_i; θ) = (1/(√(2π) σ)) exp(−(y_i − θ^T x_i)² / (2σ²)). By the independence assumption: L(θ) = Π_{i=1}^n p(y_i | x_i; θ) = (1/(√(2π) σ))^n exp(−Σ_{i=1}^n (y_i − θ^T x_i)² / (2σ²)).

Probabilistic Interpretation of LMS, cont. Hence the log-likelihood is: l(θ) = n log(1/(√(2π) σ)) − (1/(2σ²)) Σ_{i=1}^n (y_i − θ^T x_i)². Do you recognize the last term? Yes, it is: J(θ) = (1/2) Σ_{i=1}^n (x_i^T θ − y_i)². Thus, under the independence assumption, LMS is equivalent to MLE of θ!

Beyond basic LR: LR with non-linear basis functions; locally weighted linear regression; regression trees and multilinear interpolation.

LR with non-linear basis functions. LR does not mean we can only deal with linear relationships. We are free to design (non-linear) features under LR: y = θ_0 + Σ_{j=1}^m θ_j f_j(x) = θ^T f(x), where the f_j(x) are fixed basis functions (and we define f_0(x) = 1). Example: polynomial regression: f(x) := [1, x, x², x³]. We will be concerned with estimating (distributions over) the weights θ and choosing the model order M.
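A sketch of polynomial regression as ordinary least squares on designed features (poly_features is a hypothetical helper; the data below are made up):

import numpy as np

def poly_features(x, degree=3):
    # Map each scalar x to [1, x, x^2, ..., x^degree]
    return np.vander(x, degree + 1, increasing=True)

x = np.linspace(0, 1, 20)                               # made-up 1-D inputs
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(20)   # made-up noisy targets
F = poly_features(x, degree=3)
theta, *_ = np.linalg.lstsq(F, y, rcond=None)           # same least-squares machinery as before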

Basis functions. There are many basis functions, e.g.: polynomial: f_j(x) = x^{j−1}; radial basis functions: f_j(x) = exp(−(x − µ_j)² / (2s²)); sigmoidal: f_j(x) = σ((x − µ_j)/s); splines, Fourier, wavelets, etc.

1D and 2D RBFs. (Figures: a 1D RBF basis and the resulting curve after the fit.)

Good and Bad RBFs. (Figures: a good 2D RBF and two bad 2D RBFs.)

Locally weighted linear regression. Overfitting and underfitting: compare fits of y = θ_0 + θ_1 x, y = θ_0 + θ_1 x + θ_2 x², and y = Σ_{j=0}^5 θ_j x^j.

Bias and variance. We define the bias of a model to be the expected generalization error even if we were to fit it to a very (say, infinitely) large training set. By fitting "spurious" patterns in the training set, we might again obtain a model with large generalization error. In this case, we say the model has large variance.

Locally weighted linear regression. The algorithm: instead of minimizing J(θ) = (1/2) Σ_i (x_i^T θ − y_i)², now we fit θ to minimize J(θ) = (1/2) Σ_i w_i (x_i^T θ − y_i)². Where do the w_i's come from? w_i = exp(−(x_i − x)² / (2τ²)), where x is the query point for which we'd like to know its corresponding y. Essentially we put higher weights on (errors on) training examples that are close to the query point than on those that are further away from the query. Do we also have a probabilistic interpretation here (as we did for LR)?
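A sketch of locally weighted linear regression at a single query point, assuming the Gaussian kernel above with bandwidth tau (function and variable names are illustrative):

import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    # Gaussian weights: w_i = exp(-||x_i - x_query||^2 / (2 tau^2))
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted normal equations: (X^T W X) theta = X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta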

Parametric vs. non-parametric. Locally weighted linear regression is the first example we are running into of a non-parametric algorithm. The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm, because it has a fixed, finite number of parameters (the θ), which are fit to the data. Once we've fit the θ and stored them away, we no longer need to keep the training data around to make future predictions. In contrast, to make predictions using locally weighted linear regression, we need to keep the entire training set around. The term "non-parametric" (roughly) refers to the fact that the amount of stuff we need to keep in order to represent the hypothesis grows linearly with the size of the training set.

Robust Regression. The best fit from a quadratic regression. But this is probably better. How can we do this?

LOESS-based Robust Regression. Remember what we do in "locally weighted linear regression"? We "score" each point for its importance. Now we score each point according to its "fitness". (Courtesy of Andrew Moore)

Robust regression. For k = 1 to R: let (x_k, y_k) be the kth datapoint; let y_k^est be the predicted value of y_k; let w_k be a weight for data point k that is large if the data point fits well and small if it fits badly: w_k = f(y_k − y_k^est). Then redo the regression using the weighted data points. Repeat the whole thing until converged!
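A sketch of this iterative reweighting loop, with one possible choice of the weight function f (a Gaussian of the residual); the scale parameter is an assumption, not from the slides:

import numpy as np

def robust_fit(X, y, n_iter=10, scale=1.0):
    # Iteratively reweighted least squares: downweight points with large residuals
    w = np.ones(len(y))
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        W = np.diag(w)
        theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # weighted least-squares refit
        residual = y - X @ theta
        w = np.exp(-residual ** 2 / (2 * scale ** 2))      # "fitness" weight: small for poorly fit points
    return theta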

Robust regression: probabilistic interpretation. What regular regression does: assume y_k was originally generated using the following recipe: y_k = θ^T x_k + N(0, σ²). The computational task is to find the Maximum Likelihood estimate of θ.

Robust regression: probabilistic interpretation. What LOESS-based robust regression does: assume y_k was originally generated using the following recipe: with probability p, y_k = θ^T x_k + N(0, σ²), but otherwise y_k ~ N(µ, σ²_huge). The computational task is to find the Maximum Likelihood estimates of θ, p, µ and σ_huge. The algorithm you saw with iterative reweighting/refitting does this computation for us. Later you will find that it is an instance of the famous E.M. algorithm.

Regression Tree. Decision tree for regression.

Gender | Rich? | Num. Children | # travel per yr. | Age
F | No | 2 | 5 | 38
M | No | 0 | 2 | 25
M | Yes | 1 | 0 | 72
: | : | : | : | :

Tree: Gender? Female → predicted age = 39; Male → predicted age = 36.

A conceptual picture. Assuming regular regression trees, can you sketch a graph of the fitted function y*(x) over this diagram? (Diagram: a tree of threshold questions on x.)

How about this one? Partition the space, and in each partition fit a constant. Each cell can be reached by asking a set of questions.
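A minimal sketch of one such constant-fit partition on 1-D data: a depth-1 regression tree (a stump) that picks the split minimizing squared error and predicts the mean of y in each cell (names are illustrative):

import numpy as np

def fit_stump(x, y):
    # Assumes x contains at least two distinct values
    best = None
    for t in np.unique(x):                      # candidate thresholds: the observed x values
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, mu_left, mu_right = best
    # Predict the per-cell mean: ask one question (x <= t?) and return a constant
    return lambda q: np.where(q <= t, mu_left, mu_right)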

Take home message: gradient descent (on-line and batch); normal equations; equivalence of LMS and MLE; LR does not mean fitting linear relations, but a linear combination of basis functions (which can be non-linear); weighting points by importance versus by fitness.