Supervised learning: Linear regression, Logistic regression


CS 1571 Introduction to AI. Lecture 4.
Supervised learning: Linear regression, Logistic regression
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Supervised learning

Data: $D = \{D_1, D_2, \dots, D_n\}$, a set of $n$ examples, where $D_i = \langle x_i, y_i \rangle$, $x_i$ is an input vector of size $d$, and $y_i$ is the desired output (given by a teacher).

Objective: learn the mapping $f: X \to Y$ s.t. $y_i \approx f(x_i)$ for all $i = 1, \dots, n$.

Regression: $Y$ is continuous. Example: earnings, product orders $\to$ company stock price.
Classification: $Y$ is discrete. Example: handwritten digit (in bitmap form) $\to$ digit label.

Supervised learning

Next: two basic models of $f: X \to Y$ used in supervised learning:
- Linear regression: regression, where $Y$ is $\mathbb{R}$
- Logistic regression: classification with 2 classes

Linear regression

The function $f: X \to Y$ is a linear combination of input components:
$f(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d$
where $w_0, w_1, \dots, w_d$ are parameters (weights), $w_0$ is the bias term, and $x = (x_1, x_2, \dots, x_d)$ is the input vector.

Linear regression

A shorter (vector) definition of the model: include the bias constant in the input vector, $x = (1, x_1, x_2, \dots, x_d)$, so that
$f(x) = w_0 x_0 + w_1 x_1 + \dots + w_d x_d = w^T x$
where $w = (w_0, w_1, \dots, w_d)$ are the parameters (weights).
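
In code, the vector form is a single dot product. A minimal sketch in NumPy (the numbers below are made up for illustration):

    import numpy as np

    x = np.array([1.0, 2.5, -0.3])   # input with the bias constant 1 prepended
    w = np.array([0.5, 1.2, -0.7])   # weights (w_0, w_1, w_2); w_0 is the bias

    f_x = w @ x                      # f(x) = w^T x = 0.5 + 1.2*2.5 + (-0.7)*(-0.3)
    print(f_x)                       # 3.71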

Linear regression. Error.

Data: $D_i = \langle x_i, y_i \rangle$; function: $x_i \to f(x_i)$. We would like to have $y_i \approx f(x_i)$ for all $i = 1, \dots, n$.

The error function measures how much our predictions deviate from the desired answers. Mean-squared error:
$J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2$

Learning: we want to find the weights minimizing the error!

Linear regression. Example: 1-dimensional input $x = (x_1)$. [Plot: data points and the fitted regression line.]

Linear regression. Example: 2-dimensional input $x = (x_1, x_2)$. [Plot: data points and the fitted regression plane.]

Linear regression. Optimization.

We want the weights minimizing the error
$J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2$

For the optimal set of parameters, the derivatives of the error with respect to each parameter must be 0. Vector of derivatives:
$\operatorname{grad}_w(J_n(w)) = \nabla_w J_n(w) = \left( \frac{\partial J_n}{\partial w_0}, \frac{\partial J_n}{\partial w_1}, \dots, \frac{\partial J_n}{\partial w_d} \right)^T = \mathbf{0}$

Linear regression. Optimization.

For the optimal set of parameters, the derivative of the error with respect to each parameter must be 0:
$\frac{\partial}{\partial w_j} J_n(w) = -\frac{2}{n} \sum_{i=1}^{n} (y_i - w^T x_i)\, x_{i,j} = 0$
which defines a set of $d+1$ equations, one for each $j = 0, 1, \dots, d$.

Solving linear regression

By rearranging the terms we get a system of linear equations with $d+1$ unknowns, of the form $A w = b$:
$\sum_{i=1}^{n} (w^T x_i)\, x_{i,j} = \sum_{i=1}^{n} y_i\, x_{i,j}, \quad j = 0, 1, \dots, d$

Solving linear regression

The optimal set of weights satisfies
$\nabla_w J_n(w) = -\frac{2}{n} \sum_{i=1}^{n} (y_i - w^T x_i)\, x_i = \mathbf{0}$
which leads to a system of linear equations (SLE) with $d+1$ unknowns of the form $A w = b$. Solutions to the SLE: e.g. matrix inversion, $w = A^{-1} b$, if the matrix $A$ is nonsingular.
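
A small sketch of the exact solution in NumPy, on a made-up dataset (the true weights and noise level are invented for illustration); np.linalg.lstsq solves the same least-squares problem in a numerically safer way than explicit inversion:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=100)

    # Fold the bias into the input: each row becomes (1, x_1, ..., x_d).
    Xb = np.hstack([np.ones((len(X), 1)), X])

    # Normal equations A w = b with A = X^T X and b = X^T y; assumes A is nonsingular.
    w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

    # Numerically safer alternative that solves the same system.
    w_ls, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    print(w, w_ls)   # both approximately (3.0, 1.0, -2.0, 0.5)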

Gradient descent solution

There are other ways to solve the weight optimization problem in the linear regression model
$\text{Error}(w) = J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2$

A simple technique: gradient descent.
Idea: adjust the weights in the direction that improves the Error; the gradient tells us what the right direction is:
$w \leftarrow w - \alpha\, \nabla_w \text{Error}(w)$
where $\alpha > 0$ is a learning rate that scales the gradient changes.

Gradient descent method

Descend using the gradient information: $\nabla_w \text{Error}(w)\big|_{w^*}$ gives the direction of the descent. Change the value of $w$ according to the gradient:
$w \leftarrow w^* - \alpha\, \nabla_w \text{Error}(w)\big|_{w^*}$

Gradient descent method

New value of the parameter:
$w_j \leftarrow w_j^* - \alpha\, \frac{\partial}{\partial w_j} \text{Error}(w^*)$, for all $j$
where $\alpha > 0$ is a learning rate that scales the gradient changes.

Gradient descent method

Iteratively converge to the optimum of the Error function. [Plot: successive iterates $w^{(0)}, w^{(1)}, w^{(2)}, w^{(3)}$ descending the Error curve.]
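
A minimal batch gradient descent sketch in NumPy, assuming (as above) that X already carries the constant-1 bias column; alpha and the iteration count are arbitrary illustrative choices:

    import numpy as np

    def gradient_descent(X, y, alpha=0.1, iterations=1000):
        # Batch gradient descent on J_n = (1/n) sum_i (y_i - w^T x_i)^2.
        w = np.zeros(X.shape[1])
        for _ in range(iterations):
            grad = -(2.0 / len(y)) * X.T @ (y - X @ w)  # gradient of the mean-squared error
            w = w - alpha * grad                        # step against the gradient
        return w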

Online regression algorithm

The error function defined for the whole dataset $D$:
$J_n = \text{Error}(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2$

Instead of the error for all data points, we use the error for each individual example $D_i$:
$J_{\text{online}} = \text{Error}_i(w) = \frac{1}{2} (y_i - f(x_i))^2$

Change the regression weights after every example according to the gradient (vector form):
$w \leftarrow w - \alpha(i)\, \nabla_w \text{Error}_i(w)$
where $\alpha(i) > 0$ is a learning rate that depends on the number of updates.

Gradient for on-line learning

Linear model: $f(x) = w^T x$. On-line error: $J_{\text{online}} = \text{Error}_i(w) = \frac{1}{2} (y_i - f(x_i))^2$.

On-line algorithm: a sequence of online updates. The $i$-th update for the linear model:
vector form: $w \leftarrow w + \alpha(i)\, (y_i - f(x_i))\, x_i$
$j$-th weight: $w_j \leftarrow w_j + \alpha(i)\, (y_i - f(x_i))\, x_{i,j}$

Annealed learning rate: $\alpha(i) \sim \frac{1}{i}$ gradually rescales the changes in the weights.

Adaptive models

Linear model: $f(x) = w^T x$. On-line error: $J_{\text{online}} = \text{Error}_i(w) = \frac{1}{2} (y_i - f(x_i))^2$.

On-line algorithm: a sequence of online updates, one example at a time. Useful for continuous data streams.

Adaptive models: the underlying model is not stationary and can change over time. Example: seasonal changes. The on-line algorithm can be made adaptive by keeping the learning rate at some constant value $c$.

Online regression algorithm

Online-linear-regression (D, number of iterations)
  initialize the weights $w = (w_0, w_1, \dots, w_d)$
  for i = 1 : number of iterations do
    select a data point $D_i = \langle x_i, y_i \rangle$ from $D$
    set $\alpha(i) = 1/i$
    update the weight vector: $w \leftarrow w + \alpha(i)\, (y_i - f(x_i))\, x_i$
  end for
  return the weights $w$

Advantages: very easy to implement; works for continuous data streams.
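
The pseudocode translates almost line for line into NumPy. A sketch, again assuming X carries the bias column (the random data-point selection and seed are illustrative choices):

    import numpy as np

    def online_linear_regression(X, y, iterations, seed=0):
        rng = np.random.default_rng(seed)
        w = np.zeros(X.shape[1])
        for i in range(1, iterations + 1):
            j = rng.integers(len(y))                  # select a data point from D
            alpha = 1.0 / i                           # annealed learning rate
            w = w + alpha * (y[j] - w @ X[j]) * X[j]  # online gradient update
        return w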

On-line learning. Example. [Plots: the fitted regression line after successive on-line updates.]

Extensions of the simple linear model

Replace the inputs to the linear units with feature (basis) functions to model nonlinearities:
$f(x) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x)$
where $\phi_j(x)$ is an arbitrary function of $x$. The same techniques as before can be used to learn the weights.

Extensions of the linear model

Models linear in the parameters $w$ that we want to fit:
$f(x) = w_0 + w_1 \phi_1(x) + \dots + w_m \phi_m(x)$
where $w_0, \dots, w_m$ are parameters and $\phi_1(x), \dots, \phi_m(x)$ are feature (or basis) functions.

Examples of basis functions:
- a higher-order polynomial, one-dimensional input $x$: $\phi_1(x) = x$, $\phi_2(x) = x^2$, $\phi_3(x) = x^3$
- a multidimensional quadratic, input $x = (x_1, x_2)$: $\phi_1(x) = x_1$, $\phi_2(x) = x_1^2$, $\phi_3(x) = x_2$, $\phi_4(x) = x_2^2$, $\phi_5(x) = x_1 x_2$
- other types of basis functions: $\sin x$, $\cos x$

Extensions of the linear case

Error function:
$J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2$, with $f(x_i) = w^T \phi(x_i)$

Assume $\phi(x) = (\phi_0(x), \phi_1(x), \dots, \phi_m(x))$ with $\phi_0(x) = 1$. Setting the gradient to zero leads to a system of $m+1$ linear equations, which can be solved exactly like the linear case.

Example: regression with polynomials

Regression with polynomials of degree $m$. Data points: pairs of $\langle x, y \rangle$. Feature functions: $m$ feature functions, $\phi_j(x) = x^j$, $j = 1, \dots, m$. Function to learn:
$f(x) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x) = w_0 + w_1 x + w_2 x^2 + \dots + w_m x^m$
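
A sketch of polynomial regression in NumPy on made-up cubic data; the design matrix stacks the basis functions $\phi_j(x) = x^j$ column by column and is then solved exactly like the linear case:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-2, 2, size=50)
    y = 1 - 2 * x + 0.5 * x**3 + 0.2 * rng.normal(size=50)

    # Design matrix with columns (1, x, x^2, x^3), i.e. phi_j(x) = x^j.
    Phi = np.vander(x, 4, increasing=True)

    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # solved like the linear case
    print(w)   # approximately (1, -2, 0, 0.5)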

Example: regression with polynomials

For regression with polynomials of degree $m$, the on-line update for the $\langle x, y \rangle$ pair is
$w_j \leftarrow w_j + \alpha\, (y - f(x))\, x^j$, for $j = 0, 1, \dots, m$

Learning with feature functions

Function to learn: $f(x) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x)$

On-line gradient update for the $\langle x, y \rangle$ pair:
$w_0 \leftarrow w_0 + \alpha\, (y - f(x))$
$w_k \leftarrow w_k + \alpha\, (y - f(x))\, \phi_k(x)$

The gradient updates are of the same form as in the linear regression model.

Multidimensional additive model example. [Surface plot of the fitted model.]

Multidimensional additive model example. [Plot, continued.]

Binary classification

Two classes: $Y = \{0, 1\}$. Our goal is to learn to classify correctly two types of examples: Class 0, labeled as 0, and Class 1, labeled as 1. We would like to learn $f: X \to \{0, 1\}$.

Zero-one error (loss) function:
$\text{Error}_i(x_i, y_i) = 1$ if $y_i \ne f(x_i)$, and $0$ otherwise

We would like to minimize the expected error: $E\left(\sum_i \text{Error}_i\right)$.

First step: we need to devise a model of the function $f$.

Discriminant functions

One way to represent a classifier is by using discriminant functions. This works for both binary and multi-way classification.

Idea: for every class $i = 0, 1, \dots, k$, define a function $g_i(x)$ mapping $X \to \mathbb{R}$. When the decision on input $x$ should be made, choose the class with the highest value of $g_i(x)$:
$y = \arg\max_i\, g_i(x)$

So what happens with the input space? Assume a binary case.
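
A toy sketch of the idea; the two linear discriminants below are invented for illustration:

    import numpy as np

    def classify(x, discriminants):
        # Choose the class whose discriminant g_i(x) is highest.
        return int(np.argmax([g(x) for g in discriminants]))

    # Hypothetical linear discriminants for a binary, 2-D problem.
    g0 = lambda x: -1.0 + x[0] - 0.5 * x[1]
    g1 = lambda x: 0.5 - x[0] + 2.0 * x[1]
    print(classify(np.array([1.0, 2.0]), [g0, g1]))   # prints 1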

Discriminant functions

Example: two classes. [Scatter plot of the two classes with the discriminant functions $g_0(x)$ and $g_1(x)$.]

Discriminant functions

The discriminant functions $g_0(x)$ and $g_1(x)$ define the decision boundary: the set of points where $g_0(x) = g_1(x)$, with $g_1(x) > g_0(x)$ on one side and $g_0(x) > g_1(x)$ on the other. [Plot: a linear decision boundary between the two classes.]

Quadratic decision boundary

[Plot: a quadratic decision boundary, the set of points where $g_1(x) = g_0(x)$, separating the two classes.]

Logistic regression model

Defines a linear decision boundary. Discriminant functions:
$g_1(x) = g(w^T x)$, $g_0(x) = 1 - g(w^T x)$
where $g(z) = 1/(1 + e^{-z})$ is a logistic function, so the model is
$f(x, w) = g(w^T x)$
with input vector $x = (1, x_1, \dots, x_d)$. [Diagram: the inputs $1, x_1, \dots, x_d$ are combined into $z = w^T x$ and passed through the logistic function.]

Logistic function

$g(z) = \frac{1}{1 + e^{-z}}$

Also referred to as a sigmoid function. It replaces the hard threshold function with smooth switching: it takes a real number and outputs a number in the interval $[0, 1]$. [Plot of the logistic function.]

Logistic regression model

Discriminant functions: $g_1(x) = g(w^T x)$, $g_0(x) = 1 - g(w^T x)$. The values of the discriminant functions vary in $[0, 1]$, which gives a probabilistic interpretation:
$f(x, w) = p(y = 1 \mid x, w) = g(z) = g(w^T x)$
with input vector $x = (1, x_1, \dots, x_d)$.

Logistic regression

We learn a probabilistic function $f: X \to [0, 1]$, where $f$ describes the probability of class 1 given $x$:
$f(x, w) = p(y = 1 \mid x, w) = g(w^T x)$
Note that $p(y = 0 \mid x, w) = 1 - p(y = 1 \mid x, w)$.

Transformation to binary class values: if $p(y = 1 \mid x) \ge 1/2$ then choose 1, else choose 0.
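
A small sketch of the prediction step, with invented weights and input:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict(x, w):
        # p(y=1|x,w) = g(w^T x); threshold at 1/2 for the binary class value.
        p = sigmoid(w @ x)
        return (1 if p >= 0.5 else 0), p

    w = np.array([-1.0, 2.0])    # hypothetical weights (w_0, w_1)
    x = np.array([1.0, 0.8])     # input with the bias constant prepended
    print(predict(x, w))         # (1, 0.645...)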

Logistic regression model. Decision boundary.

Logistic regression defines a linear decision boundary. Example: two classes of points, blue and red. [Plot: the linear decision boundary separating the blue and red points.]

Logistic regression: parameter learning

Likelihood of outputs. Let $\mu_i = p(y_i = 1 \mid x_i, w) = g(z_i) = g(w^T x_i)$. Then
$L(D, w) = P(D \mid w) = \prod_{i=1}^{n} \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}$

Find the weights $w$ that maximize the likelihood of outputs. Apply the log-likelihood trick:
$l(D, w) = \log L(D, w) = \sum_{i=1}^{n} \left[ y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i) \right]$

The optimal weights are the same for both the likelihood and the log-likelihood.

Logistic regression: parameter learning

Log-likelihood:
$l(D, w) = \sum_{i=1}^{n} \left[ y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i) \right]$

Derivatives of the log-likelihood, with $f(x_i, w) = g(w^T x_i)$:
$\nabla_w\, l(D, w) = \sum_{i=1}^{n} (y_i - f(x_i, w))\, x_i$
Nonlinear in the weights!! There is no closed-form solution, so we use gradient ascent. Online update for the $k$-th example:
$w \leftarrow w + \alpha(k)\, (y_k - f(x_k, w))\, x_k$

Logistic regression. Online gradient descent.

On-line component of the log-likelihood:
$J_{\text{online}}(D_k, w) = y_k \log \mu_k + (1 - y_k) \log(1 - \mu_k)$

On-line learning update for weight $w_j$:
$w_j \leftarrow w_j + \alpha(k)\, \frac{\partial}{\partial w_j} J_{\text{online}}(D_k, w)$

Online update of the logistic regression for the $k$-th example $D_k$:
$w_j \leftarrow w_j + \alpha(k)\, (y_k - f(x_k, w))\, x_{k,j}$

Online logistic regression algorithm

Online-logistic-regression (D, number of iterations)
  initialize the weights $w = (w_0, w_1, \dots, w_d)$
  for i = 1 : number of iterations do
    select a data point $D_i = \langle x_i, y_i \rangle$ from $D$
    set $\alpha(i) = 1/i$
    update the weights (in parallel): $w \leftarrow w + \alpha(i)\, [y_i - f(x_i, w)]\, x_i$
  end for
  return the weights $w$
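
A NumPy sketch of the same loop; note the update has the same form as the online linear-regression update, with the sigmoid applied to $w^T x$ (the random selection and seed are illustrative choices):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def online_logistic_regression(X, y, iterations, seed=0):
        # X is assumed to carry the constant-1 bias column; alpha(i) = 1/i.
        rng = np.random.default_rng(seed)
        w = np.zeros(X.shape[1])
        for i in range(1, iterations + 1):
            j = rng.integers(len(y))                           # select a data point from D
            alpha = 1.0 / i
            w = w + alpha * (y[j] - sigmoid(w @ X[j])) * X[j]  # online update
        return w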

Online algorithm. Example. [Plots over three successive slides: the decision boundary after increasing numbers of online updates.]

Appendix: derivation of the gradient

Log-likelihood:
$l(D, w) = \sum_{i=1}^{n} \left[ y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i) \right]$, with $\mu_i = g(z_i)$ and $z_i = w^T x_i$

Derivatives of the log-likelihood, by the chain rule:
$\frac{\partial}{\partial w_j}\, l(D, w) = \sum_{i=1}^{n} \frac{\partial l}{\partial z_i} \frac{\partial z_i}{\partial w_j}$

$\frac{\partial l}{\partial z_i} = y_i \frac{g'(z_i)}{g(z_i)} - (1 - y_i) \frac{g'(z_i)}{1 - g(z_i)} = y_i (1 - g(z_i)) - (1 - y_i)\, g(z_i) = y_i - g(z_i)$

using the derivative of the logistic function, $\frac{\partial g(z)}{\partial z} = g(z)(1 - g(z))$, and $\frac{\partial z_i}{\partial w_j} = x_{i,j}$. Hence
$\frac{\partial}{\partial w_j}\, l(D, w) = \sum_{i=1}^{n} (y_i - g(w^T x_i))\, x_{i,j} = \sum_{i=1}^{n} (y_i - f(x_i, w))\, x_{i,j}$