CS 2750 Machine Learning. Lecture 8: Linear regression.

CS 2750 Machine Learning
Lecture 8: Linear regression
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Linear regression

Function f : X -> Y is a linear combination of the input components:

    f(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d = w_0 + \sum_{j=1}^{d} w_j x_j

w_0, w_1, \dots, w_d are the parameters (weights), w_0 is the bias term, and x = (x_1, \dots, x_d) is the input vector.
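A minimal numeric sketch of this linear combination, assuming numpy; the weight and input values are made up for illustration:

```python
import numpy as np

# f(x) = w_0 + sum_j w_j * x_j: a linear combination of input components
def predict(w0, w, x):
    return w0 + np.dot(w, x)

x = np.array([1.0, 2.0, 3.0])    # input vector, d = 3
w = np.array([0.5, -1.0, 2.0])   # weights (illustrative values)
print(predict(0.1, w, x))        # 0.1 + 0.5 - 2.0 + 6.0 = 4.6
```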

Linear regression. Error.

Data: D = { <x_i, y_i> }, i = 1, \dots, n. Function: f(x_i). We would like to have y_i \approx f(x_i) for all i = 1, \dots, n. An error function measures how much our predictions deviate from the desired answers. Mean-squared error:

    J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2

Learning: we want to find the weights minimizing the error!

Linear regression. Example.
[Figure: a linear fit for a 1-dimensional input.]
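A small sketch of the mean-squared error, assuming numpy arrays X (one example per row) and y:

```python
import numpy as np

def mse(w0, w, X, y):
    """J_n = (1/n) * sum_i (y_i - f(x_i))^2, with f(x) = w0 + w . x."""
    return np.mean((y - (w0 + X @ w)) ** 2)
```

Learning then amounts to searching for the (w0, w) minimizing this quantity, which the next slides do first exactly and then iteratively.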

Linear regression. Example.
[Figure: a plane fit for a 2-dimensional input.]

Solving linear regression

The optimal set of weights satisfies:

    \nabla_w J_n(w) = 0

This leads to a system of linear equations (SLE) with d + 1 unknowns, A w = b, whose j-th equation has the form:

    w_0 \sum_{i=1}^{n} x_{i,j} + w_1 \sum_{i=1}^{n} x_{i,1} x_{i,j} + \dots + w_d \sum_{i=1}^{n} x_{i,d} x_{i,j} = \sum_{i=1}^{n} y_i x_{i,j}

Solution to the SLE: matrix inversion, w = A^{-1} b.
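A sketch of the exact solution in numpy: build the system from a design matrix with a leading column of ones for the bias and solve it directly (np.linalg.solve is preferable to forming A^{-1} explicitly). The synthetic data are made up for illustration:

```python
import numpy as np

def fit_linear(X, y):
    """Solve the normal equations A w = b for the d+1 weights."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend 1s for the bias w_0
    A = Xb.T @ Xb
    b = Xb.T @ y
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
print(fit_linear(X, y))   # approximately [1.0, 2.0, -3.0]
```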

Gradient descent solution

Goal: the weight optimization in the linear regression model,

    Error(w) = J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i, w))^2

Iterative solution: gradient descent, a first-order method. Idea: adjust the weights in the direction that improves the error; the gradient tells us what the right direction is:

    w \leftarrow w - \alpha \nabla_w Error(w)

where \alpha > 0 is a learning rate that scales the gradient changes.

Gradient descent method

Descend using the gradient information: the direction of the descent is the negative gradient, and the value of w is changed according to

    w \leftarrow w - \alpha \nabla_w Error(w)

[Figure: a single descent step on the error curve, moving w toward the minimizer w*.]
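A sketch of batch gradient descent on J_n, assuming X already carries a leading column of ones for the bias; the learning rate and step count are illustrative choices, not prescriptions:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, steps=1000):
    """Repeatedly step against the gradient of J_n(w)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = -(2.0 / len(y)) * X.T @ (y - X @ w)  # gradient of the MSE
        w -= alpha * grad                            # w <- w - alpha * grad
    return w
```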

Gradient descent method

Iteratively approaches the optimum of the error function.
[Figure: successive gradient steps w^(1), w^(2), w^(3), ... converging on the error curve.]

Online gradient method

Linear model: f(x) = w^T x. On-line error on the i-th example:

    J_{online} = \frac{1}{2} (y_i - f(x_i))^2

On-line algorithm: generates a sequence of online updates; the i-th update step uses the single example D_i = <x_i, y_i>. The update of the j-th weight is:

    w_j \leftarrow w_j + \alpha (y_i - f(x_i)) x_{i,j}

Fixed learning rate: use a small constant \alpha = C.
Annealed learning rate: \alpha \approx 1/i, which gradually rescales the changes.
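A sketch of the online (LMS-style) update with both learning-rate schedules; the constant C is a made-up value:

```python
import numpy as np

def online_update(X, y, C=0.5, annealed=True):
    """One pass of online updates w <- w + alpha * (y_i - w.x_i) * x_i."""
    w = np.zeros(X.shape[1])
    for i, (x_i, y_i) in enumerate(zip(X, y), start=1):
        alpha = C / i if annealed else C   # annealed ~ 1/i vs. fixed rate
        w += alpha * (y_i - w @ x_i) * x_i
    return w
```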

On-line learning. Example.
[Figure: four snapshots of the on-line fit after successive updates.]

Extensions of the simple linear model

Replace the inputs to the linear units with feature (basis) functions to model nonlinearities:

    f(x) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x)

where \phi_j(x) is an arbitrary function of x. The same techniques as before can be used to learn the weights.

Additive linear models

Models linear in the parameters w we want to fit:

    f(x) = w_0 + \sum_{k=1}^{m} w_k \phi_k(x)

w_0, w_1, \dots, w_m are the parameters; \phi_1, \phi_2, \dots, \phi_m are the feature (basis) functions.

Examples of basis functions:
- a higher-order polynomial of a one-dimensional input: \phi_1(x) = x, \phi_2(x) = x^2, \phi_3(x) = x^3
- a multidimensional quadratic: \phi_1(x) = x_1, \phi_2(x) = x_1^2, \phi_3(x) = x_2, \phi_4(x) = x_2^2, \phi_5(x) = x_1 x_2
- other types of basis functions: \phi_1(x) = \sin x, \phi_2(x) = \cos x

Fitting additive linear models

Error function:

    J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2

Assume \phi_0(x) = 1. Setting \partial J_n / \partial w_j = 0 leads to a system of m + 1 linear equations, the j-th of which has the form:

    w_0 \sum_{i=1}^{n} \phi_0(x_i) \phi_j(x_i) + \dots + w_m \sum_{i=1}^{n} \phi_m(x_i) \phi_j(x_i) = \sum_{i=1}^{n} y_i \phi_j(x_i)

It can be solved exactly like the linear case.
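A sketch of fitting an additive model, using the multidimensional quadratic basis above as a hypothetical choice; the fit reduces to ordinary least squares on the expanded design matrix:

```python
import numpy as np

def quadratic_features(X):
    """phi(x) = (1, x1, x1^2, x2, x2^2, x1*x2) for 2-dimensional inputs."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x1**2, x2, x2**2, x1 * x2])

def fit_additive(X, y):
    Phi = quadratic_features(X)                    # rows: phi_k(x_i)
    return np.linalg.lstsq(Phi, y, rcond=None)[0]  # solves the m+1 equations
```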

Example: regression with polynomials

Regression with polynomials of degree m. Data points: pairs <x, y>. Feature functions: m feature functions \phi_i(x) = x^i, i = 1, \dots, m. Function to learn:

    f(x, w) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + w_1 x + w_2 x^2 + \dots + w_m x^m

Learning with feature functions

Function to learn: f(x, w) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x). On-line gradient update for the <x, y> pair:

    w_0 \leftarrow w_0 + \alpha (y - f(x, w))
    w_k \leftarrow w_k + \alpha (y - f(x, w)) \phi_k(x)

The gradient updates are of the same form as in the linear and logistic regression models.
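A sketch of the exact polynomial fit, using numpy's Vandermonde matrix to build the feature functions x^i:

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least-squares fit of f(x) = w_0 + w_1 x + ... + w_m x^m."""
    Phi = np.vander(x, m + 1, increasing=True)     # columns: 1, x, ..., x^m
    return np.linalg.lstsq(Phi, y, rcond=None)[0]
```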

Example: regression with polynomials

For regression with polynomials of degree m, f(x, w) = w_0 + \sum_{j=1}^{m} w_j x^j, the on-line update for an <x, y> pair is:

    w_0 \leftarrow w_0 + \alpha (y - f(x, w))
    w_j \leftarrow w_j + \alpha (y - f(x, w)) x^j

Multidimensional additive model example
[Figure: an additive model fit to data with a 2-dimensional input.]

Multidimensional additive model example
[Figure: the fitted surface of the multidimensional additive model.]

Statistical model of regression

A generative model:

    y = f(x) + \varepsilon

where f(x) is a deterministic function and \varepsilon is random noise; it represents the things we cannot capture with f(x), e.g. \varepsilon \sim N(0, \sigma^2).
[Figure: data scattered around a deterministic linear function of a 1-dimensional input.]

Statistical model of regression

Assume the generative model y = f(x) + \varepsilon, where f(x) = w^T x is a linear model and \varepsilon \sim N(0, \sigma^2). Then E[y | x] = f(x) models the mean of the outputs for x, and the noise \varepsilon models deviations from that mean. The model defines the conditional density of y given x:

    p(y | x, w) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - f(x))^2}{2\sigma^2} \right)

ML estimation of the parameters

Likelihood of predictions: the probability of observing the outputs y in D given w and x:

    L(D, w) = \prod_{i=1}^{n} p(y_i | x_i, w)

Maximum likelihood estimation: choose the parameters maximizing the likelihood of the predictions,

    w^* = \arg\max_w L(D, w)

Log-likelihood trick for the ML optimization: maximizing the log-likelihood is equivalent to maximizing the likelihood,

    l(D, w) = \log L(D, w) = \sum_{i=1}^{n} \log p(y_i | x_i, w)
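A sketch of the generative view: sample data as y = w^T x + noise and evaluate the Gaussian log-likelihood; w_true and sigma are made-up values:

```python
import numpy as np

rng = np.random.default_rng(1)
w_true, sigma = np.array([1.0, 2.0]), 0.5

# y = f(x) + eps with f(x) = w.x and eps ~ N(0, sigma^2)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ w_true + rng.normal(scale=sigma, size=50)

def log_likelihood(w, sigma, X, y):
    """l(D, w) = sum_i log p(y_i | x_i, w) under Gaussian noise."""
    resid = y - X @ w
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - resid**2 / (2 * sigma**2))

print(log_likelihood(w_true, sigma, X, y))
```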

ML estimation of the parameters

Using the conditional density

    p(y | x, w) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(y - f(x))^2}{2\sigma^2} \right]

we can rewrite the log-likelihood as

    l(D, w) = \sum_{i=1}^{n} \log p(y_i | x_i, w) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - f(x_i))^2 + C

Maximizing with regard to w is therefore equivalent to minimizing the squared-error function.

ML estimation of parameters

The criteria based on the mean squared error function and on the log-likelihood of the output are related:

    l(D, w) = -\frac{n}{2\sigma^2} J_n + c

We already know how to optimize the weights: the same approach as used for the least-squares fit. But what is the ML estimate of the variance of the noise? Maximizing the log-likelihood with respect to the variance gives

    \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i, w^*))^2

the mean squared prediction error for the best predictor.
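A short sketch of that variance estimate: fit the weights by least squares, then take the mean squared residual:

```python
import numpy as np

def ml_noise_variance(X, y):
    """ML estimate of sigma^2: mean squared residual of the LS fit."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.mean((y - X @ w) ** 2)
```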

Regularized linear regression

If the number of parameters is large relative to the number of data points used to train the model, we face the threat of overfitting: the generalization error of the model goes up. The prediction accuracy can often be improved by setting some coefficients to zero, which increases the bias but reduces the variance of the estimates.

Solutions:
- subset selection
- ridge regression
- principal component regression

Next: ridge regression.

Ridge regression

Error function for the standard least-squares estimates:

    J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2,    and we seek w^* = \arg\min_w J_n(w)

Ridge regression:

    J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2 + \lambda \|w\|^2

where \|w\|^2 = \sum_{j=0}^{d} w_j^2 and \lambda \ge 0. What does the new error function do?

Ridge regression

Standard regression:  J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2
Ridge regression:     J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2 + \lambda \|w\|^2

The added term penalizes non-zero weights with a cost proportional to \lambda, a shrinkage coefficient. If an input attribute has a small effect on improving the error function, it is "shut down" by the penalty term. The inclusion of a shrinkage penalty is often referred to as regularization.

Regularized linear regression

How do we solve the least-squares problem when the error function is enriched by the regularization term \lambda w^T w? Answer: the optimal set of weights is again obtained by solving a set of linear equations.

Standard linear regression solution:

    w^* = (X^T X)^{-1} X^T y

Regularized linear regression:

    w^* = (\lambda I + X^T X)^{-1} X^T y

where X is an n x d matrix with rows corresponding to examples and columns to inputs.
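A sketch of the regularized closed form. Whether the bias weight is penalized depends on whether X carries a leading column of ones; the slide's formula penalizes everything in w:

```python
import numpy as np

def fit_ridge(X, y, lam):
    """w* = (lambda*I + X^T X)^{-1} X^T y, solved without explicit inversion."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)
```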

Regularized linear regression

Problem: how do we determine the parameter \lambda that controls the amount of over-fit? Overfitting is tied to the ML estimate; a Bayesian approach alleviates the problem.

Bias and Variance

Expected error = Bias^2 + Variance.
The expected error is the expected discrepancy between the estimated and the true function:

    E[ (\hat{f}(X) - E[f(X)])^2 ]

The bias is the squared discrepancy between the averaged estimated function and the true function:

    ( E[\hat{f}(X)] - E[f(X)] )^2

The variance is the expected divergence of the estimated function from its average value:

    E[ (\hat{f}(X) - E[\hat{f}(X)])^2 ]

Bias and Variance

Expected error = Bias^2 + Variance:

    E[ (\hat{f}(X) - E[f(X)])^2 ]
      = E[ (\hat{f}(X) - E[\hat{f}(X)] + E[\hat{f}(X)] - E[f(X)])^2 ]
      = E[ (\hat{f}(X) - E[\hat{f}(X)])^2 ] + 2 E[ \hat{f}(X) - E[\hat{f}(X)] ] ( E[\hat{f}(X)] - E[f(X)] ) + ( E[\hat{f}(X)] - E[f(X)] )^2

The cross term vanishes because E[ \hat{f}(X) - E[\hat{f}(X)] ] = 0, leaving

    = ( E[\hat{f}(X)] - E[f(X)] )^2 + E[ (\hat{f}(X) - E[\hat{f}(X)])^2 ] = bias^2 + variance

Under-fitting and over-fitting

Under-fitting:
- high bias: the models are not accurate
- small variance: the examples in the training set have a smaller influence

Over-fitting:
- small bias: the models are flexible enough to fit the training data well
- large variance: the models depend very much on the training set
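A simulation sketch of this trade-off: repeatedly refit polynomials of different degrees to fresh noisy training sets and measure the bias^2 and variance of the prediction at one test point. The true function, noise level, sample size, and degrees are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(2)
f_true = np.sin            # hypothetical true function
x_test = 1.0               # point at which bias/variance are measured

def fit_and_predict(m):
    """Fit a degree-m polynomial to a fresh training set; predict at x_test."""
    x = rng.uniform(-3, 3, size=20)
    y = f_true(x) + rng.normal(scale=0.3, size=20)
    w = np.linalg.lstsq(np.vander(x, m + 1, increasing=True), y, rcond=None)[0]
    return np.polyval(w[::-1], x_test)   # polyval wants highest degree first

for m in (1, 3, 9):        # under-fit, reasonable fit, over-fit
    preds = np.array([fit_and_predict(m) for _ in range(500)])
    bias2 = (preds.mean() - f_true(x_test)) ** 2
    print(f"degree {m}: bias^2 = {bias2:.4f}, variance = {preds.var():.4f}")
```

Low degrees should show high bias and low variance; high degrees the reverse, matching the under-fitting and over-fitting picture above.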