Linear Regression & Least Squares!


Linear Regression & Least Squares
Ali Borji, UWM CS 790
Slide credit: Aykut Erdem

This week
Linear regression problem
- continuous outputs
- simple model
Introduce key concepts:
- loss functions
- generalization
- optimization
- model complexity
- regularization
Slides adapted from Richard Zemel, Eran Halperin, Ziv Bar-Joseph, Aarti Singh, Barnabas Poczos, J.P. Lewis, Erik Sudderth

Classification
Input: X
- Real valued, vectors over real.
- Discrete values (0, 1, 2, ...)
- Other structures (e.g., strings, graphs, etc.)
Output: Y
- Discrete (0, 1, 2, ...)
Examples: X = Document, Y = Topic (Sports, Science, News); X = Cell Image, Y = Diagnosis (Anemic cell, Healthy cell)

Regression
Input: X
- Real valued, vectors over real.
- Discrete values (0, 1, 2, ...)
- Other structures (e.g., strings, graphs, etc.)
Output: Y
- Real valued, vectors over real.
Example: Stock Market Prediction, X = Feb01, Y = ?

Choosing a restaurant
In everyday life we need to make decisions by taking into account lots of factors.
The question is what weight we put on each of these factors (how important they are with respect to the others).
Assume we would like to build a recommender system based on an individual's preferences.
If we have many observations we may be able to recover the weights.
[Table of past observations. Column headers: Reviews (out of 5 stars), Distance, Cuisine (out of 10). The last entry is "?", the value to predict.]

Some other examples
- Weight + height -> cholesterol level
- Age + gender -> time spent in front of the TV
- Past choices of a user -> "Netflix score"
- Profile of a job (user, machine, time) -> memory usage of a submitted process

Example: Polynomial Curve Fitting
The green curve is the true function (which is not a polynomial); it is not known.
The data points are uniform in x but have noise in t:
\[ t(x) = f(x) + \epsilon \]
Aim: fit a curve to these points.
Key questions:
- How do we parametrize the model (the curve)?
- What loss (objective) function should we use to judge fit?
- How do we optimize fit to unseen test data (generalization)?
(Figure from Bishop.)

1-D regression

One-dimensional regression
Find a line that represents the "best" linear relationship:
\[ b = a x \]
Problem: the data does not go through a line, so each point leaves a residual
\[ e_i = b_i - a_i x \]
Find the line that minimizes the sum
\[ \sum_i (b_i - a_i x)^2 \]
We are looking for $\hat{x}$ that minimizes
\[ e(x) = \sum_i (b_i - a_i x)^2 \]

Matrix notation
Using the following notations
\[ a = (a_1, \dots, a_n)^T \quad \text{and} \quad b = (b_1, \dots, b_n)^T \]
we can rewrite the error function using linear algebra as:
\[ e(x) = \sum_i (b_i - a_i x)^2 = (b - x a)^T (b - x a) = \| b - x a \|^2 \]

Example: Boston House Prices
Estimate median house price in a neighborhood based on neighborhood statistics.
Look at first (of 13) attributes: per capita crime rate.
Use this to predict house prices in other neighborhoods.
https://archive.ics.uci.edu/ml/datasets/housing

Represent the data
Data described as pairs $D = ((x^{(1)}, t^{(1)}), (x^{(2)}, t^{(2)}), \dots, (x^{(N)}, t^{(N)}))$
- x is the input feature (per capita crime rate)
- t is the target output (median house price)
Here t is continuous, so this is a regression problem.
Could take the first 300 examples as training set, the remaining 206 as test set.
- Use the training examples to construct a hypothesis, or function approximator, that maps x to predicted y
- Evaluate the hypothesis on the test set

Noise
A simple model typically does not exactly fit the data; lack of fit can be considered noise.
Sources of noise:
- Imprecision in data attributes (input noise)
- Errors in data targets (mislabeling)
- Additional attributes not taken into account by data attributes, which affect target values (latent variables)
- Model may be too simple to account for data targets

Least-Squares Regression
Standard loss/cost/objective function measures the squared error in the prediction of t(x) from x:
\[ J(w) = \sum_{n=1}^{N} \big[ t^{(n)} - (w_0 + w_1 x^{(n)}) \big]^2 \]
The loss for the red hypothesis is the sum of the squared vertical errors.
(Figure from Bishop.)

Optimizing the Objective
One straightforward method: initialize w randomly, then repeatedly update based on gradient descent in J:
\[ w \leftarrow w - \lambda \frac{\partial J}{\partial w} \]
Here λ is the learning rate.
For a single training case, this gives the LMS update rule:
\[ w \leftarrow w + 2\lambda \big( t^{(n)} - y(x^{(n)}) \big) x^{(n)} \]
Note: as the error approaches zero, so does the update.
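The single-case LMS rule above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `lms_fit`, the learning rate, the epoch count, and the noise-free demo data are all hypothetical choices, not part of the slides.

```python
import numpy as np

def lms_fit(x, t, lr=0.05, epochs=200, seed=0):
    """Fit y = w0 + w1*x with one LMS update per training case."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=2)                  # random initialization of (w0, w1)
    for _ in range(epochs):
        for n in rng.permutation(len(x)):   # visit the cases in random order
            phi = np.array([1.0, x[n]])     # constant 1 absorbs the bias w0
            err = t[n] - w @ phi            # t^(n) - y(x^(n))
            w += 2 * lr * err * phi         # LMS update; shrinks as err -> 0
    return w

# Hypothetical noise-free data generated from t = 1 + 2x.
x_demo = np.linspace(0.0, 1.0, 20)
t_demo = 1.0 + 2.0 * x_demo
w_lms = lms_fit(x_demo, t_demo)
```

Because the demo data lie exactly on a line, the updates vanish as the weights approach (1, 2).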

Effect of step-size λ
Large λ => Fast convergence but larger residual error. Also possible oscillations.
Small λ => Slow convergence but small residual error.

Optimizing Across Training Set
Two ways to generalize this for all examples in the training set:
1. Stochastic/online updates: update the parameters for each training case in turn, according to its own gradients
2. Batch updates: sum or average updates across every example n, then change the parameter values
\[ w \leftarrow w + 2\lambda \sum_{n=1}^{N} \big( t^{(n)} - y(x^{(n)}) \big) x^{(n)} \]
Underlying assumption: the sample is independent and identically distributed (i.i.d.)
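A batch version of the same update might look like the sketch below. It averages the per-example updates (one of the two options the slide mentions) so the step size does not depend on N; the name `batch_gd`, the zero initialization, and the hyperparameters are assumptions made for the example.

```python
import numpy as np

def batch_gd(x, t, lr=0.1, steps=2000):
    """Fit y = w0 + w1*x by averaging the LMS update over all cases."""
    X = np.column_stack([np.ones_like(x), x])   # rows are (1, x^(n))
    w = np.zeros(2)                             # zero init, for reproducibility
    for _ in range(steps):
        err = t - X @ w                         # residuals for every case
        w += 2 * lr * (X.T @ err) / len(x)      # averaged batch update
    return w

# Hypothetical noise-free data generated from t = 1 + 2x.
x_batch = np.linspace(0.0, 1.0, 20)
t_batch = 1.0 + 2.0 * x_batch
w_batch = batch_gd(x_batch, t_batch)
```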

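Non-iterative Least-squares Regression
An alternative optimization approach is non-iterative: take derivatives, set to zero, and solve for parameters.
\[ \frac{dJ(w)}{dw_0} = -2 \sum_{n=1}^{N} \big[ t^{(n)} - (w_0 + w_1 x^{(n)}) \big] = 0 \]
\[ w_0 = \frac{1}{N} \sum_{n=1}^{N} \big( t^{(n)} - w_1 x^{(n)} \big) = \bar{t} - w_1 \bar{x} \]
\[ w_1 = \frac{\sum_n (t^{(n)} - \bar{t})(x^{(n)} - \bar{x})}{\sum_n (x^{(n)} - \bar{x})^2} \]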
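The closed-form expressions for w_0 and w_1 translate directly into code. The sketch below assumes NumPy arrays as input; the function name `fit_line` and the small data set are made up for illustration.

```python
import numpy as np

def fit_line(x, t):
    """Closed-form least-squares fit of t ~ w0 + w1*x."""
    x_bar, t_bar = x.mean(), t.mean()
    # Slope: covariance of (x, t) over the variance of x.
    w1 = np.sum((t - t_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the fitted line passes through the point of means.
    w0 = t_bar - w1 * x_bar
    return w0, w1

# Hypothetical data lying exactly on t = 3 - 0.5x.
x_line = np.array([0.0, 1.0, 2.0, 3.0])
t_line = 3.0 - 0.5 * x_line
w0_line, w1_line = fit_line(x_line, t_line)
```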

Multi-dimensional linear regression
Using a model with m parameters
\[ b = a_1 x_1 + \dots + a_m x_m = \sum_{j=1}^{m} a_j x_j \]
and n measurements
\[ e(x) = \sum_{i=1}^{n} \Big( b_i - \sum_{j=1}^{m} a_{i,j} x_j \Big)^2 = \| b - A x \|^2 \]
where row i of A holds measurement i and column j multiplies parameter j:
\[ A x = \begin{bmatrix} a_{1,1} & \dots & a_{1,m} \\ \vdots & & \vdots \\ a_{n,1} & \dots & a_{n,m} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} = \begin{bmatrix} a_{1,1} x_1 + \dots + a_{1,m} x_m \\ \vdots \\ a_{n,1} x_1 + \dots + a_{n,m} x_m \end{bmatrix} \]

Example: Boston House Prices revisited
One method of extending the model is to consider other input dimensions:
\[ y(x) = w_0 + w_1 x_1 + w_2 x_2 \]
In the Boston housing example, we can look at the number of rooms input feature.
We can use gradient descent to solve for each coefficient, or use linear algebra and solve a system of equations.

Linear Regression
Imagine we now want to predict the median house price from these multi-dimensional observations.
Each house is a data point n, with observations indexed by j:
\[ x^{(n)} = (x_1^{(n)}, \dots, x_d^{(n)}) \]
A simple predictor is the analogue of a linear classifier, producing real-valued y for input x with parameters w (effectively fixing $x_0 = 1$):
\[ y = w_0 + \sum_{j=1}^{d} w_j x_j = w^T x \]

Multi-dimensional linear regression
\[ e(x) = \| b - A x \|^2 = (b - A x)^T (b - A x) = x^T A^T A x - x^T A^T b - b^T A x + b^T b \]
A minimum occurs when:
1. The first derivative is zero,
2. The second derivative is positive.
Multidimensional case:
- The 1st derivative of a function f(x) is the gradient, $\nabla f(x)$ (a row vector)
- The 2nd derivative, the Hessian, is a matrix that we will denote as $H_f(x)$
\[ \nabla e(x) = 2 A^T A x - 2 A^T b \]
\[ H_e(x) = 2 A^T A \]

Minimizing e(x)
$x_{\min}$ minimizes $e(x)$ if:
- e(x) is flat at $x_{\min}$: $\nabla e(x_{\min}) = 0$
- e(x) does not go down around $x_{\min}$: $H_e(x_{\min})$ is positive semi-definite

Recap: Positive semi-definite
A is positive semi-definite $\iff x^T A x \ge 0$ for all x
[Figures: illustrations in 1-D and in 2-D]

Minimizing $e(x) = \| b - A x \|^2$
$\hat{x}$ minimizes $e(x)$ if:
- $A^T A \hat{x} = A^T b$ (the normal equation)
- $2 A^T A$ is positive semi-definite (always true)

Geometric interpretation
- b is a vector in $R^n$
- The columns of A define a vector space range(A)
- Ax is an arbitrary vector in range(A); with two columns $a_1, a_2$: $A x = x_1 a_1 + x_2 a_2$
- $A\hat{x}$ is the orthogonal projection of b onto range(A):
\[ A^T (b - A \hat{x}) = 0 \iff A^T A \hat{x} = A^T b \]

The normal equation: $A^T A \hat{x} = A^T b$
- Existence: $A^T A \hat{x} = A^T b$ always has a solution
- Uniqueness: the solution is unique if the columns of A are linearly independent
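As a small numerical check of the normal equation, the sketch below solves it with NumPy and compares the result with `np.linalg.lstsq`. The random matrix, the "true" parameter vector, and the noise level are assumptions invented for the demonstration; the residual is also computed so its orthogonality to the columns of A can be inspected.

```python
import numpy as np

# Hypothetical problem: 50 measurements, 3 parameters, small additive noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
b = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Solve A^T A x = A^T b directly (the columns of A are independent here).
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's least-squares routine.
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

# The residual b - A x_hat should be orthogonal to every column of A.
residual = b - A @ x_hat
```

In practice a QR- or SVD-based routine such as `lstsq` is usually preferred over forming $A^T A$ explicitly, since that product squares the condition number.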

Linear models
- It is mathematically easy to fit linear models to data. We can learn a lot about model-fitting in this relatively simple case.
- There are many ways to make linear models more powerful while retaining their nice mathematical properties:
  - By using non-linear, non-adaptive basis functions, we can get generalized linear models that learn non-linear mappings from input to output but are linear in their parameters; only the linear part of the model learns.
  - By using kernel methods we can handle expansions of the raw data that use a huge number of non-linear, non-adaptive basis functions.
  - By using large margin kernel methods we can avoid overfitting even when we use huge numbers of basis functions.
- But linear methods will not solve most AI problems. They have fundamental limitations.

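Some types of basis functions in 1-D
- Sigmoids: $\phi_j(x) = \sigma\big( (x - \mu_j)/s \big)$, with $\sigma(a) = \dfrac{1}{1 + \exp(-a)}$
- Gaussians: $\phi_j(x) = \exp\Big\{ -\dfrac{(x - \mu_j)^2}{2 s^2} \Big\}$
- Polynomials: $\phi_j(x) = x^j$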
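The three basis-function families can be written as short NumPy functions. The function names and the example centre and width values are illustrative assumptions; only the formulas come from the slide.

```python
import numpy as np

def sigmoid_basis(x, mu, s):
    """Logistic sigmoid centred at mu with slope scale s."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

def gaussian_basis(x, mu, s):
    """Gaussian bump centred at mu with width s (not normalized)."""
    return np.exp(-((x - mu) ** 2) / (2.0 * s ** 2))

def polynomial_basis(x, j):
    """j-th power of the input."""
    return x ** j

# Each basis evaluated at a hand-picked point.
sig_at_centre = sigmoid_basis(0.5, mu=0.5, s=0.1)      # midpoint of the sigmoid
gauss_at_centre = gaussian_basis(0.5, mu=0.5, s=0.1)   # peak of the bump
cube_of_two = polynomial_basis(2.0, 3)
```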

Two types of linear model that are equivalent with respect to learning
\[ y(x, w) = w_0 + w_1 x_1 + w_2 x_2 + \dots = w^T x \]
\[ y(x, w) = w_0 + w_1 \phi_1(x) + w_2 \phi_2(x) + \dots = w^T \Phi(x) \]
In both models $w_0$ is the bias.
- The first model has the same number of adaptive coefficients as the dimensionality of the data + 1.
- The second model has the same number of adaptive coefficients as the number of basis functions + 1.
- Once we have replaced the data by the outputs of the basis functions, fitting the second model is exactly the same problem as fitting the first model (unless we use the kernel trick).
- So we'll just focus on the first model.

General linear regression problem
Using our new notations for the basis functions, linear regression can be written as
\[ y = \sum_{j=0}^{k} w_j \phi_j(x) \]
where $\phi_j(x)$ can be either $x_j$ for multivariate regression or one of the nonlinear basis functions we defined.
Once again we can use "least squares" to find the optimal solution.

LMS for the general linear regression problem
Our goal is to minimize the following loss function:
\[ J(w) = \sum_i \Big( y^i - \sum_j w_j \phi_j(x^i) \Big)^2 \]
Moving to vector notations we get:
\[ J(w) = \sum_i \big( y^i - w^T \phi(x^i) \big)^2 \]
where w is a vector of dimension k+1, $\phi(x^i)$ is a vector of dimension k+1, and $y^i$ is a scalar.
We take the derivative w.r.t. w:
\[ \frac{\partial}{\partial w} \sum_i \big( y^i - w^T \phi(x^i) \big)^2 = -2 \sum_i \big( y^i - w^T \phi(x^i) \big) \phi(x^i)^T \]
Equating to 0 we get
\[ \sum_i y^i \phi(x^i)^T = w^T \sum_i \phi(x^i) \phi(x^i)^T \]
Define:
\[ \Phi = \begin{bmatrix} \phi_0(x^1) & \phi_1(x^1) & \dots & \phi_k(x^1) \\ \phi_0(x^2) & \phi_1(x^2) & \dots & \phi_k(x^2) \\ \vdots & \vdots & & \vdots \\ \phi_0(x^n) & \phi_1(x^n) & \dots & \phi_k(x^n) \end{bmatrix} \]
Then deriving w we get:
\[ w = (\Phi^T \Phi)^{-1} \Phi^T y \]
Here w is a vector with k+1 entries, y is a vector with n entries, and Φ is an n by k+1 matrix.
This solution is also known as the pseudo-inverse.
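The pseudo-inverse solution can be sketched with a design matrix built from polynomial basis functions. The helper names `design_matrix` and `fit_pinv`, the choice of a polynomial basis, and the noise-free quadratic test data are assumptions for illustration.

```python
import numpy as np

def design_matrix(x, k):
    """n by (k+1) matrix whose row i is (1, x_i, x_i^2, ..., x_i^k)."""
    return np.vander(x, k + 1, increasing=True)

def fit_pinv(x, y, k):
    """Least-squares weights w = pinv(Phi) y for a degree-k polynomial basis."""
    Phi = design_matrix(x, k)
    # pinv(Phi) equals (Phi^T Phi)^{-1} Phi^T when Phi has full column rank.
    return np.linalg.pinv(Phi) @ y

# Hypothetical noise-free data from y = 1 - x + 2x^2.
x_poly = np.linspace(-1.0, 1.0, 10)
y_poly = 1.0 - x_poly + 2.0 * x_poly ** 2
w_poly = fit_pinv(x_poly, y_poly, k=2)
```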

Fitting a polynomial
Now we use one of these basis functions: an M-th order polynomial function.
We can use the same approaches to optimize the values of the weights on each coefficient: analytic, and iterative.
(Figure from Bishop.)

[Figures: polynomial fits of 0th, 1st, 3rd, and 9th order]

Root Mean Square (RMS) Error
\[ E(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2 \]
\[ E_{RMS} = \sqrt{2 E(w^*) / N} \]
The division by N allows us to compare different sizes of data sets on an equal footing, and the square root ensures that $E_{RMS}$ is measured on the same scale (and in the same units) as the target variable t.
[Figure: polynomial fits for M = 0, 1, 3, 9; axes x and t]
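A direct transcription of the two formulas for a polynomial model is sketched below. The function name `rms_error` and the three-point data set are hypothetical; the weights are ordered from the constant term upward.

```python
import numpy as np

def rms_error(w, x, t):
    """E_RMS = sqrt(2 E(w) / N) for the polynomial with coefficients w."""
    # polyval expects coefficients in increasing order: w[0] + w[1]*x + ...
    y = np.polynomial.polynomial.polyval(x, w)
    E = 0.5 * np.sum((y - t) ** 2)      # sum-of-squares error E(w)
    return np.sqrt(2.0 * E / len(x))

# Hypothetical check: the line y = x misses every target by exactly 1.
x_rms = np.array([0.0, 1.0, 2.0])
t_rms = np.array([1.0, 2.0, 3.0])
e_rms = rms_error(np.array([0.0, 1.0]), x_rms, t_rms)
```

Every prediction here is off by one unit of t, so the RMS error is also one unit, which is the "same scale as the target" property the slide describes.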

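Root Mean Square (RMS) Error
[Figure: training and test $E_{RMS}$ plotted against polynomial order M]
Root-Mean-Square (RMS) Error:
\[ E(w) = \frac{1}{2} \sum_{n=1}^{N} \big( t_n - \phi(x_n)^T w \big)^2 = \frac{1}{2} \| t - \Phi w \|_2^2 \]
The overfitting problem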

Root Mean Square (RMS) Error
Table of the coefficients w for polynomials of various order. Observe how the typical magnitude of the coefficients increases dramatically as the order of the polynomial increases.

         M = 0    M = 1    M = 3          M = 9
  w_0     0.19     0.82     0.31           0.35
  w_1             -1.27     7.99         232.37
  w_2                     -25.43       -5321.83
  w_3                      17.37       48568.31
  w_4                                -231639.30
  w_5                                 640042.26
  w_6                               -1061800.52
  w_7                                1042400.18
  w_8                                -557682.99
  w_9                                 125201.43

The overfitting problem

Increasing the size of training data
[Figure: M = 9 polynomial fits with N = 15 and N = 100 data points; axes x and t]
For a given model complexity, the over-fitting problem becomes less severe as the size of the data set increases.
Another way to say this is that the larger the data set, the more complex (in other words more flexible) the model that we can afford to fit to the data.

1-D regression illustrates key concepts
- Data fits: is a linear model best (model selection)?
  - Simplest models do not capture all the important variations in the data (underfitting)
  - More complex models may overfit the training data (fit not only the signal but also the noise in the data), especially if there is not enough data to constrain the model
- One method of assessing fit: test generalization, that is, the model's ability to predict the held-out data
- Optimization is essential: stochastic and batch iterative approaches; analytic when available

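Regularized Least Squares
A technique to control the overfitting phenomenon: add a penalty term to the error function in order to discourage the coefficients from reaching large values.
Ridge regression:
\[ \tilde{E}(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2 + \frac{\lambda}{2} \| w \|^2 \]
where $\| w \|^2 = w^T w = w_0^2 + w_1^2 + \dots + w_M^2$, and λ governs the relative importance of the regularization term compared with the sum-of-squares error term.
This is minimized by
\[ w = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T t \]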
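The ridge objective has the standard closed-form minimizer $w = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T t$, which the sketch below implements. The function name `ridge_fit`, the cubic polynomial basis, and the noisy sine data are assumptions chosen for the demonstration; as in the penalty above, the bias weight is regularized along with the rest.

```python
import numpy as np

def ridge_fit(Phi, t, lam):
    """Minimize 0.5*||Phi w - t||^2 + 0.5*lam*||w||^2 in closed form."""
    n_params = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(n_params) + Phi.T @ Phi, Phi.T @ t)

# Hypothetical data: noisy samples of sin(2*pi*x) with a cubic polynomial basis.
rng = np.random.default_rng(0)
x_r = np.linspace(0.0, 1.0, 10)
t_r = np.sin(2 * np.pi * x_r) + 0.1 * rng.normal(size=x_r.size)
Phi_r = np.vander(x_r, 4, increasing=True)

w_unreg = ridge_fit(Phi_r, t_r, 0.0)   # lam = 0 reduces to plain least squares
w_reg = ridge_fit(Phi_r, t_r, 1.0)     # lam > 0 shrinks the coefficients
```

Comparing `np.linalg.norm(w_reg)` with `np.linalg.norm(w_unreg)` shows the shrinkage effect that the penalty term is meant to produce.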

The effect of regularization
[Figure: M = 9 polynomial fits with ln λ = -18 and ln λ = 0; axes x and t]

The effect of regularization
[Figure: training and test $E_{RMS}$ plotted against ln λ]

         ln λ = -∞     ln λ = -18    ln λ = 0
  w_0         0.35          0.35        0.13
  w_1       232.37          4.74       -0.05
  w_2     -5321.83         -0.77       -0.06
  w_3     48568.31        -31.97       -0.05
  w_4   -231639.30         -3.89       -0.03
  w_5    640042.26         55.28       -0.02
  w_6  -1061800.52         41.32       -0.01
  w_7   1042400.18        -45.95       -0.00
  w_8   -557682.99        -91.53        0.00
  w_9    125201.43         72.68        0.01

The corresponding coefficients from the fitted polynomials, showing that regularization has the desired effect of reducing the magnitude of the coefficients.

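A more general regularizer
\[ \frac{1}{2} \sum_{n=1}^{N} \{ t_n - w^T \phi(x_n) \}^2 + \frac{\lambda}{2} \sum_{j=1}^{M} | w_j |^q \]
[Figure: contours of the regularization term for q = 0.5, q = 1, q = 2, q = 4]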