15-381: Artificial Intelligence. Regression and cross validation

Shona Wheeler

Where we are
- Inputs → Density Estimator → Probability
- Inputs → Classifier → Predict category
- Inputs → Regressor → Predict real no. (today)

Linear regression
Given an input x we would like to compute an output y. For example:
- Predict height from age
- Predict Google's price from Yahoo's price
- Predict distance from a wall from sensor readings

Linear regression
In linear regression we assume that y and x are related by the following equation:
$$y = wx + \varepsilon$$
Here y is what we are trying to predict, x is the observed value, w is a parameter, and ε represents measurement or other noise.

Linear regression
Our goal is to estimate w from training data of $\langle x_i, y_i \rangle$ pairs. This can be done using a least squares approach:
$$\hat{w} = \arg\min_w \sum_i (y_i - w x_i)^2$$
Why least squares?
- It minimizes the squared distance between the measurements and the predicted line
- It has a nice probabilistic interpretation
- It is easy to compute
If the noise is Gaussian with mean 0, then least squares is also the maximum likelihood estimate of w.
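The maximum likelihood claim can be checked numerically. The Python sketch below uses toy data invented for illustration. It scans a grid of slopes and shows that minimizing the squared error and maximizing a Gaussian likelihood pick the same w. The negative log-likelihood equals a constant plus the squared error divided by $2\sigma^2$, so for any fixed σ the two criteria share a minimizer.

```python
import math

# Toy (x_i, y_i) pairs, made up purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sse(w):
    """Sum of squared residuals for the through-origin model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))


def neg_log_likelihood(w, sigma=1.0):
    """Negative log-likelihood when each y_i ~ Normal(w * x_i, sigma^2)."""
    n = len(xs)
    return n * math.log(sigma * math.sqrt(2 * math.pi)) + sse(w) / (2 * sigma ** 2)


# Scan candidate slopes w in [0, 4] in steps of 0.001 and minimize each criterion.
grid = [i / 1000 for i in range(4001)]
w_least_squares = min(grid, key=sse)
w_max_likelihood = min(grid, key=neg_log_likelihood)
print(w_least_squares, w_max_likelihood)  # 2.004 2.004
```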

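Solving linear regression
You should be familiar with this by now. We just take the derivative with respect to w and set it to 0:
$$\frac{\partial}{\partial w} \sum_i (y_i - w x_i)^2 = -2 \sum_i x_i (y_i - w x_i) = 0$$
$$\Rightarrow \sum_i x_i y_i = w \sum_i x_i^2 \quad \Rightarrow \quad w = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$$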
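The closed form above translates directly into code. Here is a minimal Python sketch. The function name and the toy data are ours, for illustration only.

```python
def fit_through_origin(xs, ys):
    """Closed-form least-squares slope for y = w * x: w = sum(x_i * y_i) / sum(x_i ** 2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("need at least one nonzero x value")
    return sxy / sxx


# Toy data for a hand check: sum(x*y) = 110.2 and sum(x^2) = 55.
w = fit_through_origin([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(w, 4))  # 2.0036
```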

Regression example
Generated: w = 2. Recovered: w = 2.03. Noise: std = 1.

Regression example
Generated: w = 2. Recovered: w = 2.05. Noise: std = 2.

Regression example
Generated: w = 2. Recovered: w = 2.08. Noise: std = 4.
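The sketch below shows one way such an experiment could be run with Python's standard library. It assumes a generating slope of 2, 100 hypothetical inputs drawn uniformly from [0, 10], and Gaussian noise. It will not reproduce the exact figures from the slides. The point is that the least-squares estimate stays near the true slope, and its spread grows with the noise standard deviation.

```python
import random


def fit_through_origin(xs, ys):
    """Least-squares slope for y = w * x (closed form)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


random.seed(0)        # fixed seed so the run is reproducible
true_w = 2.0          # assumed generating slope for this sketch
n_points = 100        # hypothetical sample size
xs = [random.uniform(0.0, 10.0) for _ in range(n_points)]  # hypothetical input range

estimates = {}
for noise_std in (1.0, 2.0, 4.0):
    # Generate y = w * x + Gaussian noise, then recover w by least squares.
    ys = [true_w * x + random.gauss(0.0, noise_std) for x in xs]
    estimates[noise_std] = fit_through_origin(xs, ys)
    print(f"noise std = {noise_std}: recovered w = {estimates[noise_std]:.3f}")
```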

Affine regression
So far we assumed that the line passes through the origin. What if the line does not? No problem: simply change the model to
$$y = w_0 + w_1 x + \varepsilon$$
We can use least squares to determine $w_0$ and $w_1$:
$$w_0 = \frac{1}{n} \sum_i (y_i - w_1 x_i), \qquad w_1 = \frac{\sum_i x_i (y_i - w_0)}{\sum_i x_i^2}$$
Note that each estimate depends on the other. Just a second: we'll soon give a simpler solution.
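The two equations for $w_0$ and $w_1$ are coupled. One way to resolve this is to substitute the first equation into the second, which yields the familiar centered formulas. The Python sketch below implements that standard closed form. It is not necessarily the "simpler solution" the slides go on to present, and the helper name is ours.

```python
def fit_affine(xs, ys):
    """Least-squares intercept w0 and slope w1 for y = w0 + w1 * x.

    Solving the two coupled equations simultaneously gives the centered form:
    w1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2), w0 = y_bar - w1 * x_bar.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    w0 = y_bar - w1 * x_bar  # identical to (1/n) * sum(y_i - w1 * x_i)
    return w0, w1


w0, w1 = fit_affine([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w0, w1)  # 1.0 2.0
```

The result satisfies both of the coupled equations at once, which is what the least-squares solution requires.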

Multivariate regression
What if we have several inputs?
- For example, stock prices for Yahoo, Microsoft and Ebay for the Google prediction task
This becomes a multivariate regression problem. Again, it's easy to model:
$$y = w_0 + w_1 x_1 + \cdots + w_k x_k + \varepsilon$$
Notation:
- Lower case: variable or parameter ($w_0$)
- Lower case bold: vector ($\mathbf{w}$)
- Upper case bold: matrix ($\mathbf{X}$)

Multivariate regression: Least squares
We are now interested in a vector $\mathbf{w}^T = [w_0, w_1, \ldots, w_k]$. It is useful to represent this in matrix notation:
$$\mathbf{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}, \quad \mathbf{w} = \begin{bmatrix} w_0 \\ \vdots \\ w_k \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$$
We can thus rewrite our model as $\mathbf{y} = \mathbf{X}\mathbf{w} + \boldsymbol{\varepsilon}$. The solution (assuming $\mathbf{X}^T\mathbf{X}$ is invertible) turns out to be:
$$\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
This is an instance of a larger set of computational solutions which are usually referred to as generalized least squares.
- $\mathbf{X}^T\mathbf{X}$ is a $(k+1) \times (k+1)$ matrix, with one row and one column per parameter $w_0, \ldots, w_k$
- $\mathbf{X}^T\mathbf{y}$ is a vector with $k+1$ entries
Why is $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ the right solution? Hint: multiply both sides of the original equation $\mathbf{y} = \mathbf{X}\mathbf{w}$ by $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$.
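Here is a dependency-free Python sketch of the normal-equation solution. Rather than forming the inverse explicitly, it solves $(\mathbf{X}^T\mathbf{X})\mathbf{w} = \mathbf{X}^T\mathbf{y}$ by Gauss-Jordan elimination. This gives the same w whenever $\mathbf{X}^T\mathbf{X}$ is invertible and is the usual numerical practice. The helper names `solve` and `least_squares` are ours, not from any library. In real code one would typically call a linear-algebra library instead.

```python
def solve(a, b):
    """Solve the square linear system a @ w = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; w is not uniquely determined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(rows, ys):
    """Return [w0, w1, ..., wk] solving the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading column of ones for the intercept w0
    p = len(X[0])                         # k + 1 parameters
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)


# Noise-free toy data from y = 1 + 2*x1 - 3*x2, so the fit should recover (1, 2, -3).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
w = least_squares(rows, ys)
print([round(v, 6) for v in w])  # [1.0, 2.0, -3.0]
```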

Beyond linear regression
We can also generalize this class of functions to non-linear functions of the input x that are still linear in the parameters w:
$$f(x; \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_m x^m$$
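Because the model stays linear in w, polynomial regression is just the multivariate least-squares problem with features 1, x, x², …, x^m. The Python sketch below takes that approach. The helper names are hypothetical, and the toy data are noise-free so the answer can be checked by eye.

```python
def solve(a, b):
    """Solve the square linear system a @ w = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_polynomial(xs, ys, m):
    """Least-squares coefficients [w0, ..., wm] for f(x; w) = w0 + w1*x + ... + wm*x^m.

    The features 1, x, ..., x^m are non-linear in x, but f is linear in w,
    so the ordinary normal equations still apply.
    """
    X = [[x ** j for j in range(m + 1)] for x in xs]
    p = m + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(w, x):
    """Evaluate the fitted polynomial at x."""
    return sum(wj * x ** j for j, wj in enumerate(w))


xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 - x + 0.5 * x ** 2 for x in xs]  # noise-free quadratic for a sanity check
w = fit_polynomial(xs, ys, m=2)
print([round(v, 6) for v in w])  # [1.0, -1.0, 0.5]
```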

Polynomial regression examples

Overfitting
With too few training examples, our polynomial regression model may achieve zero training error but nevertheless have a large generalization error:
$$\frac{1}{n} \sum_{i=1}^{n} \big(y_i - f(x_i; \mathbf{w})\big)^2 \approx 0 \quad \text{but} \quad \mathbb{E}_{(x,y) \sim P}\big[(y - f(x; \mathbf{w}))^2\big] \gg 0$$
When the training error no longer bears any relation to the generalization error, we say that the function overfits the (training) data.
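Here is a small numeric illustration. The setup is entirely hypothetical: the true function is y = x, the six training points carry ±0.5 noise, and the test points sit between them and are scored against the noise-free truth. The degree-5 polynomial through all six points is computed with Lagrange interpolation. For distinct x values this coincides with the degree-5 least-squares fit. That fit has zero training error but a larger test error than a straight line.

```python
def lagrange_fit(xs, ys):
    """Return the unique polynomial of degree <= n-1 through n distinct points."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return f


def fit_line(xs, ys):
    """Least-squares line y = w0 + w1 * x, returned as a callable."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    w0 = y_bar - w1 * x_bar
    return lambda x: w0 + w1 * x


def mse(f, xs, ys):
    """Mean squared error of model f on the given points."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical setup: true function y = x, six training points with +/-0.5 noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]
# Held-out inputs between the training points, scored against the noise-free truth.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = list(test_x)

results = {}
for name, model in (("degree 1", fit_line(train_x, train_y)),
                    ("degree 5", lagrange_fit(train_x, train_y))):
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, round(results[name][0], 2), round(results[name][1], 2))
# degree 1 0.23 0.01
# degree 5 0.0 0.68
```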

Cross validation
Cross-validation allows us to estimate the generalization error based on training examples alone. We learn a model using a subset of the training data and estimate the generalization error using the rest of the data. We choose the model (for example, the polynomial order) that minimizes the error on the held-out data.
Common strategies:
- Leave-one-out cross validation
- Leave out a bigger subset
- Train and test sets
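Below is a sketch of leave-one-out cross validation used to pick the polynomial order, following the first strategy listed above. The data, the candidate degrees 0 to 4, and the helper names are all hypothetical. Each fit is ordinary least squares on polynomial features.

```python
def solve(a, b):
    """Solve the square linear system a @ w = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_polynomial(xs, ys, m):
    """Least-squares coefficients [w0, ..., wm] of a degree-m polynomial."""
    X = [[x ** j for j in range(m + 1)] for x in xs]
    p = m + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(w, x):
    return sum(wj * x ** j for j, wj in enumerate(w))


def loocv_error(xs, ys, m):
    """Leave-one-out estimate of the squared generalization error for degree m."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # hold out point i
        train_y = ys[:i] + ys[i + 1:]
        w = fit_polynomial(train_x, train_y, m)
        errors.append((ys[i] - predict(w, xs[i])) ** 2)
    return sum(errors) / len(errors)


# Hypothetical data: y = 1 + 2x - x^2 plus a fixed, made-up perturbation per point.
xs = [-3.0 + 0.5 * i for i in range(13)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1, 0.2, -0.2, 0.0]
ys = [1 + 2 * x - x ** 2 + e for x, e in zip(xs, noise)]

cv = {m: loocv_error(xs, ys, m) for m in range(5)}
best = min(cv, key=cv.get)  # degree with the smallest held-out error
for m, err in cv.items():
    print(f"degree {m}: LOOCV error {err:.3f}")
print("selected degree:", best)
```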

Cross validation: Example