CS 2750 Machine Learning. Lecture 7. Linear regression.


CS 2750 Machine Learning, Lecture 7: Linear regression. Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square.

Linear regression. A function f : X -> Y is a linear combination of the input components:

f(x) = w_0 + w_1 x_1 + w_2 x_2 + ... + w_d x_d = w_0 + Σ_{j=1}^d w_j x_j

where w_0, w_1, ..., w_d are the parameters (weights), w_0 is the bias term, and x = (x_1, ..., x_d) is the input vector.

Linear regression. Error. Data: D = {<x_1, y_1>, ..., <x_n, y_n>}. Function: f(x). We would like to have y_i ≈ f(x_i) for all i = 1, ..., n. An error function measures how much our predictions deviate from the desired answers. Mean-squared error:

J_n = (1/n) Σ_{i=1}^n (y_i − f(x_i))²

Learning: we want to find the weights minimizing the error.

Linear regression. Example: one-dimensional input. (Figure: data points and the fitted regression line.)
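The mean-squared error above is straightforward to compute directly; a minimal NumPy sketch, with made-up toy data and candidate weights chosen purely for illustration:

```python
import numpy as np

# Toy 1-D data lying exactly on y = 1 + 2*x (noise omitted for clarity).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

def mse(w0, w1, x, y):
    """Mean-squared error J_n = (1/n) * sum_i (y_i - f(x_i))^2 for f(x) = w0 + w1*x."""
    pred = w0 + w1 * x
    return np.mean((y - pred) ** 2)

print(mse(1.0, 2.0, x, y))  # exact fit here, so the error is 0.0
print(mse(0.0, 2.0, x, y))  # every prediction is off by 1, so the error is 1.0
```

Learning then amounts to searching over (w0, w1) for the pair minimizing this quantity.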

Linear regression. Example: two-dimensional input. (Figure: fitted plane over two input dimensions.)

Solving linear regression. The optimal set of weights satisfies ∇_w J_n(w) = 0. This leads to a system of linear equations (SLE) with d+1 unknowns, of the form A w = b; with x_{i,0} = 1, the j-th equation is

Σ_{i=1}^n x_{i,j} (w_0 + w_1 x_{i,1} + ... + w_d x_{i,d}) = Σ_{i=1}^n x_{i,j} y_i,   for j = 0, ..., d.

Solution to the SLE: matrix inversion, w = A⁻¹ b.
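The normal-equation system A w = b can be assembled and solved directly; a small sketch on synthetic noise-free data (the augmenting column of ones carries the bias w_0, and `np.linalg.solve` is used rather than forming A⁻¹ explicitly):

```python
import numpy as np

# Synthetic data generated from y = 1 + 2*x1 - 3*x2 (no noise, so the fit is exact).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Augment with x_{i,0} = 1 so that w[0] plays the role of the bias w_0.
Xa = np.hstack([np.ones((len(X), 1)), X])

# Normal equations: A w = b with A = X^T X and b = X^T y.
A = Xa.T @ Xa
b = Xa.T @ y
w = np.linalg.solve(A, b)

print(w)  # close to [1, 2, -3]
```

Solving the linear system is numerically preferable to explicit matrix inversion, though both express the same w = A⁻¹ b from the slide.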

Gradient descent solution. Goal: weight optimization in the linear regression model with Error(w) = (1/n) Σ_{i=1}^n (y_i − f(x_i))². Iterative solution: gradient descent, a first-order method. Idea: adjust the weights in the direction that improves the error; the gradient tells us what the right direction is:

w ← w − α ∇_w Error(w)

where α > 0 is a learning rate that scales the gradient changes.

Gradient descent method. Descend using the gradient information: from the current point w*, move in the direction of descent, changing the value of w according to w ← w − α ∇_w Error(w).
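The batch update rule above can be sketched in a few lines; a minimal illustration on noise-free toy data, with the step size α and iteration count chosen arbitrarily for the example:

```python
import numpy as np

# Fit f(x) = w0 + w1*x to data from y = 1 + 2*x by batch gradient descent.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

w = np.zeros(2)      # [w0, w1], starting from zero
alpha = 0.05         # learning rate
for _ in range(5000):
    err = y - (w[0] + w[1] * x)
    # Gradient of (1/n) sum_i (y_i - f(x_i))^2 with respect to w0 and w1.
    grad = np.array([-2 * err.mean(), -2 * (err * x).mean()])
    w = w - alpha * grad     # w <- w - alpha * grad Error(w)

print(w)  # approaches [1, 2]
```

Each pass uses the full dataset to compute the gradient, in contrast with the online method that follows.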

Gradient descent method. Iteratively approaches the optimum of the error function. For the j-th weight: w_j ← w_j − α ∂Error(w)/∂w_j.

Online gradient method. Linear model f(x) = wᵀx. Online error: Error_i(w) = (1/2)(y_i − f(x_i))². The online algorithm generates a sequence of online updates; the i-th update step, with D_i = <x_i, y_i>:

w_j ← w_j + α (y_i − f(x_i)) x_{i,j}

Fixed learning rate: use a small constant α = C. Annealed learning rate: α ∝ 1/i, which gradually rescales the changes.
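The online update can be sketched as follows; a minimal example with a fixed learning rate and a made-up noise-free data stream (one sample drawn per step):

```python
import numpy as np

# Online gradient updates for f(x) = w^T x, with x_{i,0} = 1 carrying the bias.
rng = np.random.default_rng(1)
true_w = np.array([1.0, 2.0])            # [bias, slope] of the generating model

w = np.zeros(2)
alpha = 0.1                              # fixed learning rate (alpha = C)
for i in range(2000):
    xi = np.array([1.0, rng.uniform(-1, 1)])
    yi = true_w @ xi                     # noise-free sample from the model
    # i-th update step: w_j <- w_j + alpha * (y_i - f(x_i)) * x_{i,j}
    w = w + alpha * (yi - w @ xi) * xi

print(w)  # approaches [1, 2]
```

With noisy data, the annealed schedule α ∝ 1/i from the slide is what lets the sequence of updates settle rather than jitter around the optimum.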

Online learning. Example. (Figure: sequence of fits after successive online updates.)

Extensions of the simple linear model. Replace the inputs to the linear units with feature (basis) functions to model nonlinearities:

f(x) = w_0 + Σ_{j=1}^m w_j φ_j(x)

where φ_j(x) is an arbitrary function of x. The same techniques as before are used to learn the weights.

Additive linear models. Models linear in the parameters we want to fit:

f(x) = w_0 + Σ_{k=1}^m w_k φ_k(x)

w_0, w_1, ..., w_m are the parameters; φ_1, φ_2, ..., φ_m are feature (or basis) functions. Basis function examples: a higher-order polynomial of a one-dimensional input, φ_1(x) = x, φ_2(x) = x², φ_3(x) = x³; a multidimensional quadratic, φ_1(x) = x_1, φ_2(x) = x_1², φ_3(x) = x_2, φ_4(x) = x_2², φ_5(x) = x_1 x_2; other types of basis functions, φ_1(x) = sin x, φ_2(x) = cos x.

Fitting additive linear models. Error function: J_n = (1/n) Σ_{i=1}^n (y_i − f(x_i))². Assume φ_0(x) = 1. Setting the gradient to zero leads to a system of m+1 linear equations,

Σ_{i=1}^n φ_j(x_i) (w_0 φ_0(x_i) + w_1 φ_1(x_i) + ... + w_m φ_m(x_i)) = Σ_{i=1}^n φ_j(x_i) y_i,   for j = 0, ..., m,

which can be solved exactly like the linear case.
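Because the model stays linear in the parameters, the fit reduces to ordinary least squares on a design matrix of basis-function values. A minimal sketch with the polynomial basis φ_k(x) = x^k (the target function here is a made-up cubic so the recovered weights are checkable):

```python
import numpy as np

# Additive model with basis functions phi_k(x) = x^k, k = 0..m (phi_0(x) = 1).
m = 3
x = np.linspace(-1, 1, 30)
y = 0.5 - x + 2 * x**3                       # target is itself a cubic

# Design matrix: columns 1, x, x^2, x^3 — one column per basis function.
Phi = np.vander(x, m + 1, increasing=True)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # solve the least-squares system

print(w)  # approaches [0.5, -1, 0, 2]
```

Any other basis (quadratic cross-terms, sines and cosines) only changes how the columns of the design matrix are computed; the solve step is identical.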

Example: regression with polynomials. Regression with polynomials of degree m. Data points: pairs <x, y>. Feature functions: m feature functions φ_k(x) = x^k, k = 1, ..., m. Function to learn:

f(x, w) = w_0 + Σ_{k=1}^m w_k φ_k(x) = w_0 + w_1 x + w_2 x² + ... + w_m x^m

Learning with feature functions. For the function f(x, w) = w_0 + Σ_k w_k φ_k(x), the online gradient update for the pair <x, y> is

w_0 ← w_0 + α (y − f(x, w)),   w_k ← w_k + α (y − f(x, w)) φ_k(x).

Gradient updates are of the same form as in the linear and logistic regression models.

Example: regression with polynomials of degree m, f(x, w) = w_0 + Σ_{k=1}^m w_k x^k. Online update for the pair <x, y>:

w_0 ← w_0 + α (y − f(x, w)),   w_k ← w_k + α (y − f(x, w)) x^k.

Multidimensional additive model example. (Figure: fitted nonlinear surface over two inputs.)

Multidimensional additive model example. (Figure continued.)

Statistical model of regression. A generative model: y = f(x) + ε, where f(x) is a deterministic function and ε is random noise; it represents things we cannot capture with f(x), e.g. ε ~ N(0, σ²). (Figure: one-dimensional data scattered around the regression line.)

Statistical model of regression. Assume the generative model y = f(x) + ε, where f(x) = wᵀx is a linear model and ε ~ N(0, σ²). Then E[y | x] = f(x) models the mean of the outputs for x, and the noise ε models deviations from the mean. The model defines the conditional density of y given x:

p(y | x, w) = (1/√(2πσ²)) exp( −(y − f(x))² / (2σ²) )

ML estimation of the parameters. Likelihood of predictions: the probability of observing the outputs y in D given X, w, and σ²:

L(D, w) = ∏_{i=1}^n p(y_i | x_i, w)

Maximum likelihood estimation of the parameters: choose the parameters maximizing the likelihood of the predictions, w* = arg max_w L(D, w). Log-likelihood trick for the ML optimization: maximizing the log-likelihood is equivalent to maximizing the likelihood,

l(D, w) = log L(D, w) = Σ_{i=1}^n log p(y_i | x_i, w)

ML estimation of the parameters. Using the conditional density p(y | x, w) = (1/√(2πσ²)) exp(−(y − f(x))² / (2σ²)), we can rewrite the log-likelihood as

l(D, w) = −(1/(2σ²)) Σ_{i=1}^n (y_i − f(x_i))² + c

where c does not depend on w. Maximizing with regard to w is therefore equivalent to minimizing the squared error function.

ML estimation of parameters. The criteria based on the mean-squared error function and on the log-likelihood of the outputs are related: we already know how to optimize the parameters, using the same approach as for the least-squares fit. But what is the ML estimate of the variance of the noise? Maximizing the log-likelihood with respect to the variance gives

σ̂² = (1/n) Σ_{i=1}^n (y_i − f(x_i, w*))²

the mean squared prediction error for the best predictor.
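The variance estimate σ̂² can be checked numerically; a sketch on synthetic data generated with a known noise level (the true weights and σ here are made up for the illustration):

```python
import numpy as np

# Data from y = 1 + 2*x + eps, eps ~ N(0, sigma^2) with sigma = 0.5.
rng = np.random.default_rng(2)
n = 50000
x = rng.uniform(-1, 1, n)
sigma = 0.5
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, n)

# Best (least-squares / ML) predictor w*.
Xa = np.column_stack([np.ones(n), x])
w, *_ = np.linalg.lstsq(Xa, y, rcond=None)

# ML estimate of the noise variance: mean squared prediction error of w*.
sigma2_hat = np.mean((y - Xa @ w) ** 2)
print(sigma2_hat)  # close to sigma**2 = 0.25
```

With a large sample, the mean squared residual of the fitted model recovers the generating σ², matching the slide's formula for σ̂².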

Regularized linear regression. If the number of parameters is large relative to the number of data points used to train the model, we face the threat of overfitting (the generalization error of the model goes up). The prediction accuracy can often be improved by setting some coefficients to zero; this increases the bias but reduces the variance of the estimates. Solutions: subset selection; ridge regression; principal component regression. Next: ridge regression.

Ridge regression. Error function for the standard least-squares estimates:

J_n(w) = (1/n) Σ_{i=1}^n (y_i − wᵀx_i)²,   and we seek w* = arg min_w J_n(w).

Ridge regression:

J_n(w) = (1/n) Σ_{i=1}^n (y_i − wᵀx_i)² + λ‖w‖²

where ‖w‖² = w_0² + w_1² + ... + w_d² and λ ≥ 0. What does the new error function do?

Ridge regression. Standard regression minimizes (1/n) Σ_{i=1}^n (y_i − wᵀx_i)²; ridge regression minimizes (1/n) Σ_{i=1}^n (y_i − wᵀx_i)² + λ‖w‖², which penalizes non-zero weights with a cost proportional to λ, a shrinkage coefficient. If an input attribute x_j has a small effect on improving the error function, it is shut down by the penalty term. The inclusion of a shrinkage penalty is often referred to as regularization.

Regularized linear regression. How do we solve the least-squares problem when the error function is enriched by the regularization term λ‖w‖²? Answer: the solution for the optimal set of weights is obtained again by solving a set of linear equations. Standard linear regression:

w* = (XᵀX)⁻¹ Xᵀ y

where X is an n × d matrix with rows corresponding to examples and columns to inputs. Regularized linear regression:

w* = (λI + XᵀX)⁻¹ Xᵀ y
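Both closed-form solutions differ only by the λI term; a minimal sketch comparing them on synthetic data (no bias column, and made-up weights and noise level, purely for illustration):

```python
import numpy as np

# Synthetic data: y = X @ [2, -1, 0.5] + small Gaussian noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.1, 30)

def ridge(X, y, lam):
    """Closed-form ridge solution w* = (lam*I + X^T X)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

w_ls = ridge(X, y, 0.0)      # lam = 0 recovers standard least squares
w_ridge = ridge(X, y, 10.0)  # shrunk toward zero by the penalty

print(np.linalg.norm(w_ridge), "<", np.linalg.norm(w_ls))
```

The shrinkage effect is visible directly: the norm of the ridge weights is smaller than that of the unregularized least-squares weights, and grows back toward it as λ → 0.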