Classification with linear models

Lecture 8: Classification with linear models
Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square

Generative approach to classification
Idea:
1. Represent and learn the joint distribution p(x, y).
2. Use it to define probabilistic discriminant functions, e.g.
   g_0(x) = p(y = 0 | x),  g_1(x) = p(y = 1 | x)

Typical model: p(x, y) = p(x | y) p(y)
- Class-conditional distributions (densities) p(x | y). Binary classification: two class-conditional distributions, p(x | y = 0) and p(x | y = 1).
- Priors on classes p(y): the probability of each class. Binary classification: Bernoulli distribution, with p(y = 0) + p(y = 1) = 1.

Generative approach to classification: example
Class-conditional distributions: multivariate normal distributions
  x ~ N(mu_0, Sigma_0) for y = 0
  x ~ N(mu_1, Sigma_1) for y = 1
Multivariate normal density:
  p(x | mu, Sigma) = (2 pi)^(-d/2) |Sigma|^(-1/2) exp[ -(1/2) (x - mu)^T Sigma^(-1) (x - mu) ]
Priors on classes (class 0, 1): Bernoulli distribution
  p(y | theta) = theta^y (1 - theta)^(1-y),  y in {0, 1}
[Figure: Gaussian class-conditional densities]
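As an illustration (not part of the original slides), a minimal NumPy sketch of the multivariate normal density formula above; the function name is my own:

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Evaluate the multivariate normal density p(x | mu, Sigma)
    exactly as written on the slide."""
    d = len(mu)
    diff = x - mu
    # normalizer: (2 pi)^(d/2) |Sigma|^(1/2)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    # quadratic form (x - mu)^T Sigma^-1 (x - mu), via a linear solve
    quad = diff @ np.linalg.solve(Sigma, diff)
    return float(np.exp(-0.5 * quad) / norm)
```

In one dimension with mu = 0 and Sigma = 1 this reduces to the standard normal density, which gives a quick sanity check.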

Learning the parameters of the model
Density estimation problem: we see examples, but we do not know the parameters of the Gaussians (the class-conditional densities).
ML estimate of the parameters of a multivariate normal N(mu, Sigma) for a set of n examples x_1, ..., x_n:
Optimize the log-likelihood
  l(D, mu, Sigma) = sum_{i=1}^n log p(x_i | mu, Sigma)
which gives
  mu_hat = (1/n) sum_{i=1}^n x_i
  Sigma_hat = (1/n) sum_{i=1}^n (x_i - mu_hat)(x_i - mu_hat)^T
How to learn the class priors p(y = 0), p(y = 1)? The ML estimate of the Bernoulli prior is simply the fraction of examples in each class.

Making class decisions
Basically we need to design discriminant functions. Two possible choices:
1. Likelihood of data: choose the class (Gaussian) that explains the input x better.
   If p(x | mu_1, Sigma_1) > p(x | mu_0, Sigma_0) then choose class 1, else class 0.
2. Posterior of a class: choose the class with the higher posterior probability.
   If p(y = 1 | x) > p(y = 0 | x) then choose class 1, else class 0, where
   p(y = 1 | x) = p(x | mu_1, Sigma_1) p(y = 1) / [ p(x | mu_0, Sigma_0) p(y = 0) + p(x | mu_1, Sigma_1) p(y = 1) ]
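The ML estimates and the posterior-based decision rule above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the lecture's own code; it assumes a two-class data set `X` (one example per row) with 0/1 labels `y`:

```python
import numpy as np

def fit_gaussian_generative(X, y):
    """ML estimates per class: prior = class fraction, mean = sample mean,
    covariance = (1/n) sum (x_i - mu_hat)(x_i - mu_hat)^T, as on the slide."""
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        diff = Xc - mu
        Sigma = diff.T @ diff / len(Xc)
        params[c] = (len(Xc) / len(X), mu, Sigma)
    return params

def log_score(x, prior, mu, Sigma):
    """log p(y) + log p(x | mu, Sigma), dropping the (2 pi)^(d/2) constant
    shared by both classes."""
    diff = x - mu
    return (np.log(prior)
            - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.solve(Sigma, diff))

def predict(x, params):
    """Choose the class with the higher posterior (equivalently, log score)."""
    return max((0, 1), key=lambda c: log_score(x, *params[c]))
```

Comparing log scores is equivalent to comparing posteriors, since the shared normalizer p(x) and the constant (2 pi)^(d/2) cancel.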

Gaussians: linear decision boundary
When the covariances are the same: x ~ N(mu_0, Sigma) for y = 0, x ~ N(mu_1, Sigma) for y = 1.
[Figure: contours of the class-conditional densities]

Gaussians: linear decision boundary
[Figure: the decision boundary for the equal-covariance case]

Gaussians: quadratic decision boundary
When the covariances are different: x ~ N(mu_0, Sigma_0) for y = 0, x ~ N(mu_1, Sigma_1) for y = 1.

Gaussians: quadratic decision boundary
[Figure: contours of the class-conditional densities]
[Figure: the quadratic decision boundary]

Back to logistic regression
Two models with linear decision boundaries:
- Logistic regression
- Generative model with Gaussians that share the same covariance matrix:
  x ~ N(mu_0, Sigma) for y = 0,  x ~ N(mu_1, Sigma) for y = 1
The two models are related! When we have Gaussians with the same covariance matrix, the discriminant function takes the form of a logistic regression model:
  p(y = 1 | x, mu_0, mu_1, Sigma) = g(w^T x)
where g(z) = 1 / (1 + e^(-z)) is the logistic (sigmoid) function.

When is the logistic regression model correct?
Members of the exponential family can often be more naturally described as
  f(x | theta, phi) = h(x, phi) exp[ (theta^T x - A(theta)) / a(phi) ]
where theta is a location parameter and phi is a scale parameter.
Claim: logistic regression is a correct model when the class-conditional densities come from the same exponential-family distribution and have the same scale factor phi.
A very powerful result: we can represent the posteriors of many distributions with the same small network.
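The equivalence claimed above can be checked numerically. The sketch below (my own illustration, not from the slides) computes the posterior directly from two shared-covariance Gaussians and also builds the equivalent logistic-regression weights w = Sigma^-1 (mu_1 - mu_0) with the matching bias; the two agree:

```python
import numpy as np

def posterior_from_gaussians(x, mu0, mu1, Sigma, p1):
    """p(y = 1 | x) computed directly from the generative model.
    The shared normalizing constant of the two Gaussians cancels."""
    def unnorm(mu):
        diff = x - mu
        return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))
    n1 = p1 * unnorm(mu1)
    n0 = (1 - p1) * unnorm(mu0)
    return n1 / (n0 + n1)

def logistic_weights(mu0, mu1, Sigma, p1):
    """Equivalent logistic-regression parameters (w, b) for the
    shared-covariance case."""
    Si = np.linalg.inv(Sigma)
    w = Si @ (mu1 - mu0)
    b = -0.5 * mu1 @ Si @ mu1 + 0.5 * mu0 @ Si @ mu0 + np.log(p1 / (1 - p1))
    return w, b
```

Evaluating g(w^T x + b) reproduces the generative posterior exactly, which is the content of the "two models are related" claim.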

Linear regression (linear unit):
  f(x) = w^T x
Logistic regression:
  f(x) = p(y = 1 | x, w) = g(w^T x)
Gradient update (linear regression):
  w <- w + alpha * sum_{i=1}^n (y_i - f(x_i)) x_i
  Online: w <- w + alpha (y - f(x)) x
Gradient update (logistic regression):
  w <- w + alpha * sum_{i=1}^n (y_i - f(x_i)) x_i
  Online: w <- w + alpha (y - f(x)) x
The same!

Gradient-based learning
The same simple gradient update rule is derived for both the linear and the logistic regression model. Where does the magic come from? Under the log-likelihood measure, the function models and the models for the output selection fit together:
- Linear model + Gaussian noise: y = w^T x + epsilon, epsilon ~ N(0, sigma^2)
- Logistic + Bernoulli: y ~ Bernoulli(theta), theta = p(y = 1 | x) = g(w^T x)
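A minimal sketch of the shared online update rule (my own illustration; the function names and the bias-as-first-feature convention are assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(w, x, y, alpha, model="logistic"):
    """One online gradient step: w <- w + alpha * (y - f(x)) * x.
    Only f changes between the two models; the update rule is identical."""
    f = w @ x if model == "linear" else sigmoid(w @ x)
    return w + alpha * (y - f) * x

def train_logistic(X, y, alpha=0.5, epochs=200):
    """Repeated online passes over the data; X includes a bias column of 1s."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w = online_update(w, xi, yi, alpha)
    return w
```

On a simple linearly separable 1-D problem (with a bias feature), a few hundred passes are enough to classify the training points correctly.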

Generalized linear models (GLIM)
Assumptions:
- The conditional mean (expectation) is mu = f(w^T x), where f(.) is a response function.
- The output y is characterized by an exponential-family distribution with conditional mean mu.
Examples:
- Linear model + Gaussian noise: y = w^T x + epsilon, epsilon ~ N(0, sigma^2)
- Logistic + Bernoulli: y ~ Bernoulli(theta), theta = g(w^T x) = 1 / (1 + e^(-w^T x))

Generalized linear models: canonical response functions
A canonical response function f(.) is encoded in the distribution:
  p(y | theta, phi) = h(y, phi) exp[ (theta y - A(theta)) / a(phi) ]
and leads to a simple gradient form.
Example: Bernoulli distribution
  p(y | mu) = mu^y (1 - mu)^(1-y) = exp[ y log(mu / (1 - mu)) + log(1 - mu) ]
so the natural parameter is theta = log(mu / (1 - mu)), and inverting it gives mu = 1 / (1 + e^(-theta)).
The logistic function matches the Bernoulli.
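The Bernoulli example above says the canonical response function is the inverse of the natural-parameter mapping. A two-line sketch (illustrative; function names are my own) makes the inverse pair explicit:

```python
import numpy as np

def logit(mu):
    """Natural parameter of the Bernoulli: theta = log(mu / (1 - mu))."""
    return np.log(mu / (1 - mu))

def canonical_response(theta):
    """Inverts the logit: mu = 1 / (1 + e^(-theta)), i.e. the logistic function."""
    return 1.0 / (1.0 + np.exp(-theta))
```

Because the two maps are inverses, canonical_response(logit(mu)) returns mu for any mu in (0, 1); this is exactly the sense in which "the logistic function matches the Bernoulli."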

When does logistic regression fail?
- When a quadratic decision boundary is needed.
  [Figure: a quadratic decision boundary]
- Another example of a non-linear decision boundary:
  [Figure: a non-linear decision boundary]

Non-linear extension of logistic regression
Use feature (basis) functions to model the non-linearities, the same trick used for linear regression.
Linear regression:
  f(x) = w_0 + sum_{j=1}^m w_j phi_j(x)
where phi_j(x) is an arbitrary function of x.
Logistic regression:
  f(x) = g( w_0 + sum_{j=1}^m w_j phi_j(x) )
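A small sketch of the basis-function trick, assuming a quadratic feature map for 2-D inputs (the particular phi is an illustrative choice, not from the slides); with these features the same online update rule learns a circular boundary that plain logistic regression cannot represent:

```python
import numpy as np

def quadratic_features(x):
    """phi(x) for 2-D input: [1, x1, x2, x1^2, x2^2, x1*x2]."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_basis_logistic(X, y, alpha=0.5, epochs=1000):
    """Online updates w <- w + alpha (y - g(w^T phi(x))) phi(x)."""
    Phi = np.array([quadratic_features(x) for x in X])
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for phi, yi in zip(Phi, y):
            w += alpha * (yi - sigmoid(w @ phi)) * phi
    return w
```

With labels "1 inside the unit circle, 0 outside," the model can pick up negative weights on x1^2 and x2^2 and a positive bias, realizing the quadratic boundary from the failure example above.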