Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Today's Topics: Maximum Likelihood Estimation, Bayesian Density Estimation

Introduction Bayesian Decision Theory shows us how to design an optimal classifier if we know the prior probabilities P(ω_i) and the class-conditional densities p(x|ω_i). Unfortunately, we rarely have complete knowledge of the probabilistic structure. However, we can often find design samples or training data that include particular representatives of the patterns we want to classify.

Introduction To simplify the problem, we can assume some parametric form for the conditional densities and estimate these parameters using training data. Then, we can use the resulting estimates as if they were the true values and perform classification using the Bayesian decision rule. We will consider only the supervised learning case where the true class label for each sample is known.

Introduction We will study two estimation procedures. Maximum likelihood estimation views the parameters as quantities whose values are fixed but unknown, and estimates these values by maximizing the probability of obtaining the samples observed. Bayesian estimation views the parameters as random variables having some known prior distribution; observing new samples converts the prior into a posterior density.

Maximum Likelihood Estimation Suppose we have a set D = {x_1, ..., x_n} of independent and identically distributed (i.i.d.) samples drawn from the density p(x|θ). We would like to use the training samples in D to estimate the unknown parameter vector θ. Define the likelihood function of θ with respect to D as L(θ|D) = p(D|θ) = ∏_{k=1}^{n} p(x_k|θ), where the product form follows from the i.i.d. assumption.

Maximum Likelihood Estimation The maximum likelihood estimate (MLE) of θ is, by definition, the value that maximizes L(θ|D): θ̂ = argmax_θ L(θ|D). It is often easier to work with the logarithm of the likelihood function (the log-likelihood), l(θ) = ln L(θ|D) = ∑_{k=1}^{n} ln p(x_k|θ), which has the same maximizer because the logarithm is monotonically increasing.
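As a concrete illustration (not part of the original slides), the Python sketch below evaluates the log-likelihood of i.i.d. samples under a univariate Gaussian with known variance and locates its maximum on a grid of candidate means; the data and the choice of parametric family are assumptions made purely for the example.

```python
import numpy as np

# Hypothetical i.i.d. training samples, assumed drawn from p(x|theta) = N(theta, 1).
rng = np.random.default_rng(0)
D = rng.normal(loc=2.0, scale=1.0, size=50)

def log_likelihood(theta, samples, sigma=1.0):
    """Log-likelihood l(theta) = sum_k ln p(x_k | theta) for a Gaussian with known sigma."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (samples - theta) ** 2 / (2 * sigma**2))

# Evaluate l(theta) on a grid of candidate values and pick the maximizer.
grid = np.linspace(-5, 5, 2001)
ll = np.array([log_likelihood(t, D) for t in grid])
theta_mle = grid[np.argmax(ll)]

print(f"grid-search MLE: {theta_mle:.3f}, sample mean: {D.mean():.3f}")
```

For this family the grid maximizer should agree, up to grid resolution, with the closed-form MLE (the sample mean) derived later for the Gaussian case.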

Maximum Likelihood Estimation If the number of parameters is p, i.e., θ = (θ_1, ..., θ_p)^T, define the gradient operator ∇_θ = (∂/∂θ_1, ..., ∂/∂θ_p)^T. Then the MLE of θ must satisfy the necessary conditions ∇_θ l(θ) = 0.
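As a small worked example (not from the slides), applying the necessary condition to an exponential density p(x|λ) = λ e^{-λx} yields a closed-form MLE:

```latex
l(\lambda) = \sum_{k=1}^{n} \ln p(x_k \mid \lambda)
           = n \ln \lambda - \lambda \sum_{k=1}^{n} x_k,
\qquad
\frac{\partial l}{\partial \lambda} = \frac{n}{\lambda} - \sum_{k=1}^{n} x_k = 0
\;\Rightarrow\;
\hat{\lambda} = \frac{n}{\sum_{k=1}^{n} x_k} = \frac{1}{\bar{x}}.
```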

Maximum Likelihood Estimation Properties of MLEs: The MLE is the parameter point for which the observed sample is the most likely. The procedure based on partial derivatives may yield several local extrema, so each solution must be checked individually to identify the global maximum. Extrema on the boundary of the parameter space must also be checked separately.

The Gaussian Case Suppose that p(x|θ) = N(μ, Σ). When Σ is known but μ is unknown, the MLE of the mean is the sample mean, μ̂ = (1/n) ∑_{k=1}^{n} x_k. When both μ and Σ are unknown, the MLEs are μ̂ = (1/n) ∑_{k=1}^{n} x_k and Σ̂ = (1/n) ∑_{k=1}^{n} (x_k − μ̂)(x_k − μ̂)^T.
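A minimal numpy sketch of these closed-form estimates (the samples and the generating distribution here are hypothetical); note the 1/n normalization of the covariance estimate, which is the MLE rather than the unbiased estimator discussed next:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2-D samples, assumed drawn from N(mu, Sigma) with unknown mu and Sigma.
X = rng.multivariate_normal(mean=[1.0, -2.0], cov=[[2.0, 0.5], [0.5, 1.0]], size=200)
n = X.shape[0]

mu_hat = X.mean(axis=0)                  # MLE of the mean: (1/n) sum_k x_k
centered = X - mu_hat
Sigma_hat = (centered.T @ centered) / n  # MLE of the covariance: 1/n normalization

# np.cov uses 1/(n-1) by default; bias=True reproduces the 1/n MLE.
assert np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True))
print(mu_hat, Sigma_hat, sep="\n")
```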

Bias of Estimators The bias of an estimator θ̂ is the difference between its expected value E[θ̂] and the true value θ. The MLE of the covariance, with its 1/n normalization, is biased; the unbiased sample covariance is C = (1/(n−1)) ∑_{k=1}^{n} (x_k − μ̂)(x_k − μ̂)^T.
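A quick simulation (illustrative only, with made-up parameters) makes the bias visible: averaged over many small samples, the 1/n variance estimate falls short of the true variance, while the 1/(n−1) estimate does not.

```python
import numpy as np

rng = np.random.default_rng(2)
true_var, n, trials = 4.0, 5, 20000

mle_estimates, unbiased_estimates = [], []
for _ in range(trials):
    x = rng.normal(0.0, np.sqrt(true_var), size=n)
    mle_estimates.append(np.var(x))               # ddof=0 -> 1/n: the MLE, biased
    unbiased_estimates.append(np.var(x, ddof=1))  # ddof=1 -> 1/(n-1): unbiased

print("true variance:         ", true_var)
print("mean of MLE estimates: ", np.mean(mle_estimates))       # ~ (n-1)/n * 4 = 3.2
print("mean of unbiased ones: ", np.mean(unbiased_estimates))  # ~ 4.0
```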

Today's Topics: Maximum Likelihood Estimation; Bayesian Density Estimation, also referred to as Maximum a Posteriori Probability (MAP) estimation

Bayesian Estimation Suppose the set D = {x_1, ..., x_n} contains samples drawn independently from the density p(x|θ), whose form is assumed to be known, but θ is not known exactly. Assume that θ is a quantity whose variation can be described by the prior probability distribution p(θ).

Bayesian Estimation Given D, the prior distribution is updated to the posterior distribution using Bayes' rule: p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ, where p(D|θ) = ∏_{k=1}^{n} p(x_k|θ) by the independence assumption.

Bayesian Estimation The posterior distribution p(θ|D) can be used to find estimates for θ (e.g., the expected value of p(θ|D) can be used as an estimate for θ). Then the conditional density p(x|D) can be computed as p(x|D) = ∫ p(x|θ) p(θ|D) dθ and used in the Bayesian classifier.
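The following sketch (hypothetical data, with a Gaussian likelihood of known variance and a Gaussian prior chosen purely for illustration) approximates the posterior p(θ|D) and the predictive density p(x|D) by numerical integration over a grid of θ values:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
D = rng.normal(loc=1.5, scale=1.0, size=20)  # hypothetical samples from p(x|theta) = N(theta, 1)

theta = np.linspace(-5, 5, 2001)              # grid over the unknown mean
prior = norm.pdf(theta, loc=0.0, scale=2.0)   # assumed prior p(theta)

# Posterior p(theta|D) proportional to p(D|theta) p(theta); work in log-space for stability.
log_lik = np.sum(norm.logpdf(D[:, None], loc=theta[None, :], scale=1.0), axis=0)
post = np.exp(log_lik - log_lik.max()) * prior
post /= np.trapz(post, theta)                 # normalize so the posterior integrates to 1

# Predictive density p(x|D) = integral of p(x|theta) p(theta|D) dtheta at a few query points.
x_query = np.array([0.0, 1.5, 3.0])
pred = [np.trapz(norm.pdf(x, loc=theta, scale=1.0) * post, theta) for x in x_query]
print(dict(zip(x_query.tolist(), pred)))
```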

MLEs vs. Bayes Estimates Maximum likelihood estimation finds a single estimate of θ based on the samples in D, but a different sample set would give rise to a different estimate. The Bayes estimate takes this sampling variability into account: we assume we do not know the true value of θ, and instead of committing to a single estimate, we take a weighted average of the densities p(x|θ), weighted by the posterior p(θ|D).

The Gaussian Case For a univariate Gaussian p(x|μ) = N(μ, σ²) with known variance σ² and unknown mean μ, take a Gaussian prior p(μ) = N(μ_0, σ_0²). The posterior p(μ|D) is then also Gaussian, N(μ_n, σ_n²), with μ_n = (n σ_0² / (n σ_0² + σ²)) μ̂_n + (σ² / (n σ_0² + σ²)) μ_0 and σ_n² = σ_0² σ² / (n σ_0² + σ²), where μ̂_n = (1/n) ∑_{k=1}^{n} x_k is the sample mean. The posterior mean is a weighted combination of the sample mean and the prior mean; as n grows, it is dominated by the sample mean and the posterior variance σ_n² shrinks toward zero. The predictive density is p(x|D) = N(μ_n, σ² + σ_n²).
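A minimal sketch of this closed-form update, assuming the same hypothetical setup (univariate Gaussian likelihood with known σ², Gaussian prior on the mean):

```python
import numpy as np

def gaussian_posterior(samples, sigma2, mu0, sigma0_2):
    """Posterior N(mu_n, sigma_n^2) over the mean, given a N(mu0, sigma0_2) prior."""
    n = len(samples)
    xbar = np.mean(samples)
    mu_n = (n * sigma0_2 * xbar + sigma2 * mu0) / (n * sigma0_2 + sigma2)
    sigma_n2 = (sigma0_2 * sigma2) / (n * sigma0_2 + sigma2)
    return mu_n, sigma_n2

rng = np.random.default_rng(4)
D = rng.normal(loc=1.5, scale=1.0, size=20)   # hypothetical samples
print(gaussian_posterior(D, sigma2=1.0, mu0=0.0, sigma0_2=4.0))
```

With more samples, the posterior mean is pulled ever closer to the sample mean and the posterior variance approaches zero, i.e., the data overwhelm the prior.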

Conjugate Priors A conjugate prior is one which, when multiplied by the likelihood of the observations, gives a posterior with the same functional form as the prior. This property allows the posterior to be used as the prior in further computations. For a Gaussian likelihood (as above), the conjugate prior on the mean is itself Gaussian.

Recursive Bayes Learning Because the posterior after n − 1 samples serves as the prior for the nth sample, the posterior can be updated one observation at a time: p(θ|D^n) ∝ p(x_n|θ) p(θ|D^{n−1}). This recursive form is quite useful if the distributions can be represented using only a few parameters (sufficient statistics).
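Because the Gaussian posterior is characterized by just two numbers (μ_n, σ_n²), recursive Bayes learning here reduces to updating those two sufficient statistics sample by sample. The sketch below (same hypothetical Gaussian setup and data as above) shows that the sequential updates reproduce the batch posterior.

```python
import numpy as np

def recursive_update(mu_prev, s2_prev, x, sigma2=1.0):
    """Treat the current posterior N(mu_prev, s2_prev) as the prior for one new sample x."""
    s2_new = (s2_prev * sigma2) / (s2_prev + sigma2)
    mu_new = (s2_prev * x + sigma2 * mu_prev) / (s2_prev + sigma2)
    return mu_new, s2_new

rng = np.random.default_rng(4)
D = rng.normal(loc=1.5, scale=1.0, size=20)

mu, s2 = 0.0, 4.0                # start from the prior N(0, 4)
for x in D:
    mu, s2 = recursive_update(mu, s2, x)
print(mu, s2)                    # identical to the batch posterior computed above
```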

MLEs vs. Bayes Estimates Comparison of MLEs and Bayes estimates

MLEs vs. Bayes Estimates ML and MAP estimates of θ will be approximately the same when the prior p(θ) is broad and nearly flat over the region where the likelihood is large, and different when the prior is strongly peaked away from the likelihood peak.

Classification Error Different sources of error: Bayes error, due to overlapping class-conditional densities (related to the features used); model error, due to choosing an incorrect model; estimation error, due to estimating parameters from a finite sample (it can be reduced by increasing the amount of training data).