Pattern Classification

Pattern Classification. All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1). Outline: Introduction; Maximum-Likelihood Estimation; Example of a Specific Case; The Gaussian Case: unknown µ and σ²; Bias; Appendix: ML Problem Statement.

Introduction

Data availability in a Bayesian framework: we could design an optimal classifier if we knew the priors P(ω_i) and the class-conditional densities p(x | ω_i). Unfortunately, we rarely have this complete information! We must instead design the classifier from a training sample. Prior estimation poses no problem, but samples are often too small for class-conditional estimation (large dimension of the feature space!).

A priori information about the problem

Normality of p(x | ω_i): assume p(x | ω_i) ~ N(µ_i, Σ_i), characterized by the 2 parameters µ_i and Σ_i. Estimation techniques: Maximum-Likelihood (ML) and Bayesian estimation. Their results are nearly identical, but the approaches are different.

In ML estimation, the parameters are fixed but unknown! The best parameters are obtained by maximizing the probability of obtaining the samples observed. Bayesian methods, by contrast, view the parameters as random variables having some known prior distribution; observation of the samples yields a posterior density. In either approach, we use P(ω_i | x) for our classification rule!

Maximum-Likelihood Estimation

ML estimation has good convergence properties as the sample size increases, and it is simpler than alternative techniques.

General principle: assume we have c classes and a collection of samples grouped into sets D_1 to D_c, drawn according to the densities p(x | ω_j). The samples are i.i.d., and each p(x | ω_j) has a known parametric form that is uniquely specified by a parameter vector θ_j. Problem: use the information provided by the samples to obtain good estimates of the parameters θ_1, …, θ_c. Assume that samples in D_i give no information about θ_j for j ≠ i.

For example, assume normal pdfs: p(x | ω_j) ~ N(µ_j, Σ_j), so that p(x | ω_j) ≡ p(x | ω_j, θ_j), where

$$\theta_j = (\mu_j, \Sigma_j) = \left(\mu_j^{1}, \mu_j^{2}, \ldots, \sigma_j^{11}, \sigma_j^{22}, \operatorname{cov}(x_j^{m}, x_j^{n}), \ldots\right)$$

Use the information provided by the training samples to estimate θ = (θ_1, θ_2, …, θ_c), where each θ_i (i = 1, 2, …, c) is associated with one category. Suppose that D contains n samples, x_1, x_2, …, x_n. Then

$$p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta) = F(\theta)$$

p(D | θ) is called the likelihood of θ w.r.t. the set of samples. The ML estimate of θ is, by definition, the value θ̂ that maximizes p(D | θ): it is the value of θ that best agrees with the actually observed training samples.

Optimal estimation

Let θ = (θ_1, θ_2, …, θ_p)^T and let ∇_θ be the gradient operator

$$\nabla_\theta = \left[\frac{\partial}{\partial\theta_1}, \ldots, \frac{\partial}{\partial\theta_p}\right]^{T}$$

We define l(θ) as the log-likelihood function: l(θ) = ln p(D | θ). The new problem statement is then: determine the θ that maximizes the log-likelihood,

$$\hat\theta = \arg\max_{\theta}\, l(\theta)$$

The set of necessary conditions for an optimum is:

$$\nabla_\theta l = \sum_{k=1}^{n} \nabla_\theta \ln p(x_k \mid \theta) = 0$$
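To make the likelihood machinery concrete, here is a minimal NumPy sketch (with hypothetical data; not part of the original slides) for a univariate Gaussian with known σ. It evaluates the log-likelihood l(µ) on a grid and confirms that the maximizer coincides with the solution of the necessary condition above, the sample mean:

```python
import numpy as np

# Hypothetical data: an i.i.d. sample from N(mu=2, sigma=1); sigma assumed known.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)

def log_likelihood(mu, x, sigma=1.0):
    # l(mu) = sum_k ln p(x_k | mu) for a univariate Gaussian
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

# Evaluate l(mu) on a grid and take the maximizer.
grid = np.linspace(0.0, 4.0, 4001)
ll = np.array([log_likelihood(m, x) for m in grid])
print(grid[np.argmax(ll)])   # maximizer of l(mu) on the grid

# The necessary condition sum_k (x_k - mu)/sigma^2 = 0 is solved by the
# sample mean, which matches the grid maximizer up to grid resolution.
print(x.mean())
```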

Maximum A Posteriori (MAP) Estimators

Find the θ that maximizes l(θ) + ln p(θ). The Maximum-Likelihood estimator is the special case in which the prior p(θ) is flat.
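As a hedged illustration of this ML/MAP relationship (an addition, not from the slides): for a Gaussian mean with a Gaussian prior p(µ) = N(µ₀, τ²), maximizing l(µ) + ln p(µ) has a closed form, and letting τ² → ∞ (a flat prior) recovers the ML estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=20)
n, sigma2 = x.size, 1.0            # sigma^2 treated as known

# MAP with a Gaussian prior p(mu) = N(mu0, tau2): maximize l(mu) + ln p(mu).
# Setting d/dmu [l(mu) + ln p(mu)] = 0 yields the closed form below.
mu0, tau2 = 0.0, 0.5
mu_map = (tau2 * x.sum() + sigma2 * mu0) / (n * tau2 + sigma2)

mu_ml = x.mean()                   # ML = MAP with a flat prior (tau2 -> infinity)
print(mu_map, mu_ml)               # MAP is shrunk toward the prior mean mu0
```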

Example of a specific case: unknown µ

Suppose p(x_k | µ) ~ N(µ, Σ), i.e. the samples are drawn from a multivariate normal population. Then

$$\ln p(x_k \mid \mu) = -\frac{1}{2}\ln\!\left[(2\pi)^{d}\,|\Sigma|\right] - \frac{1}{2}(x_k - \mu)^{T}\Sigma^{-1}(x_k - \mu)$$

and

$$\nabla_\mu \ln p(x_k \mid \mu) = \Sigma^{-1}(x_k - \mu)$$

Therefore, the ML estimate for µ must satisfy:

$$\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat\mu) = 0$$

Multiplying by Σ and rearranging, we obtain:

$$\hat\mu = \frac{1}{n}\sum_{k=1}^{n} x_k$$

This is just the arithmetic average of the training samples! Conclusion: if p(x | ω_j) (j = 1, 2, …, c) is assumed to be Gaussian in a d-dimensional feature space, then we can estimate the vector θ = (θ_1, θ_2, …, θ_c)^t and perform an optimal classification!
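A short sketch of this result (hypothetical data, assumed 2-D Gaussian): the ML estimate of µ is simply the per-coordinate average, and it satisfies the necessary condition from the previous slide:

```python
import numpy as np

# Hypothetical sample from a 2-D Gaussian with known covariance Sigma.
rng = np.random.default_rng(2)
mu_true = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
X = rng.multivariate_normal(mu_true, Sigma, size=500)   # shape (n, d)

# ML estimate of mu: the arithmetic average of the training samples.
mu_hat = X.mean(axis=0)
print(mu_hat)                                           # close to mu_true

# Check the necessary condition sum_k Sigma^{-1}(x_k - mu_hat) = 0.
print(np.linalg.inv(Sigma) @ (X - mu_hat).sum(axis=0))  # numerically ~ [0, 0]
```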

ML Estimation: Gaussian case, unknown µ and σ²

Let θ = (θ_1, θ_2) = (µ, σ²). For a single sample point x_k, the log-likelihood is

$$l = \ln p(x_k \mid \theta) = -\frac{1}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}(x_k - \theta_1)^2$$

and setting $\nabla_\theta l = \left[\partial l/\partial\theta_1,\; \partial l/\partial\theta_2\right]^{T} = 0$ gives

$$\frac{1}{\theta_2}(x_k - \theta_1) = 0\,; \qquad -\frac{1}{2\theta_2} + \frac{(x_k - \theta_1)^2}{2\theta_2^{\,2}} = 0$$

Summation over all n samples turns these into the conditions

$$\sum_{k=1}^{n}\frac{1}{\hat\theta_2}\,(x_k - \hat\theta_1) = 0 \quad (1)\,; \qquad -\sum_{k=1}^{n}\frac{1}{\hat\theta_2} + \sum_{k=1}^{n}\frac{(x_k - \hat\theta_1)^2}{\hat\theta_2^{\,2}} = 0 \quad (2)$$

Combining (1) and (2), one obtains:

$$\hat\mu = \frac{1}{n}\sum_{k=1}^{n} x_k\,; \qquad \hat\sigma^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat\mu)^2$$
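The following sketch (an added illustration with simulated data) computes the closed-form estimates µ̂ and σ̂² and cross-checks them against a direct numerical maximization of the log-likelihood via scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=200)

# Closed-form ML estimates from (1) and (2):
mu_hat = x.mean()
var_hat = np.mean((x - mu_hat) ** 2)      # note the 1/n, not 1/(n-1)

# Cross-check by numerically minimizing the negative log-likelihood;
# we optimize log(var) so the variance stays positive.
def nll(theta):
    mu, log_var = theta
    var = np.exp(log_var)
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

res = minimize(nll, x0=np.array([0.0, 0.0]))
print(mu_hat, var_hat)
print(res.x[0], np.exp(res.x[1]))         # agree with the closed form
```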

Bias

The ML estimate for σ² is biased:

$$E\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$$

An elementary unbiased estimator for Σ is the sample covariance matrix:

$$C = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat\mu)(x_k - \hat\mu)^{T}$$
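A quick simulation (added here for illustration; the data are hypothetical) makes the bias visible: averaged over many small samples, the 1/n estimator lands near ((n-1)/n)σ², while rescaling by n/(n-1) removes the bias. In NumPy, np.var(x, ddof=1) and np.cov apply the 1/(n-1) correction directly:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2_true, n, trials = 4.0, 5, 100_000

# Draw many small samples and average the ML variance estimate over them.
samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(trials, n))
centered = samples - samples.mean(axis=1, keepdims=True)
ml_var = (centered ** 2).mean(axis=1)         # the 1/n ML estimator

print(ml_var.mean())                          # ~ (n-1)/n * 4.0 = 3.2: biased low
print((ml_var * n / (n - 1)).mean())          # ~ 4.0: unbiased 1/(n-1) version
```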

Appendix: ML Problem Statement

Let D = {x_1, x_2, …, x_n} with |D| = n. Then

$$P(x_1, \ldots, x_n \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta)$$

Our goal is to determine θ̂, the value of θ that makes this sample the most representative!

[Figure: the samples x_1, …, x_n in each training set D_1, …, D_c are drawn i.i.d. from the corresponding class-conditional density p(x | ω_j) ~ N(µ_j, Σ_j).]

With θ = (θ_1, θ_2, …, θ_c), the problem is to find θ̂ such that:

$$\max_{\theta} P(D \mid \theta) = \max_{\theta} P(x_1, \ldots, x_n \mid \theta) = \max_{\theta} \prod_{k=1}^{n} P(x_k \mid \theta)$$

Problem: let

$$p(x \mid \theta) = \begin{cases} \theta\, e^{-\theta x}, & x \ge 0 \\ 0, & \text{else} \end{cases}$$

so that the likelihood of the sample is

$$p(D \mid \theta) = \prod_{k=1}^{n} \theta\, e^{-\theta x_k}$$
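Assuming the density above is the intended one (the original slide is badly garbled, so this reading is uncertain), the ML estimate follows in two lines; this worked derivation is an addition, not part of the original deck:

$$l(\theta) = \ln p(D \mid \theta) = \sum_{k=1}^{n} \ln\!\left(\theta e^{-\theta x_k}\right) = n\ln\theta - \theta\sum_{k=1}^{n} x_k$$

$$\frac{dl}{d\theta} = \frac{n}{\theta} - \sum_{k=1}^{n} x_k = 0 \quad\Longrightarrow\quad \hat\theta = \frac{n}{\sum_{k=1}^{n} x_k} = \frac{1}{\bar x}$$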