Density Estimation: ML, MAP, Bayesian estimation


Density Estimation: ML, MAP, Bayesian estimation CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani

Outline: Introduction, Maximum-Likelihood Estimation, Maximum A Posteriori (MAP) Estimation, Bayesian Estimation

Density Estimation Estimating the probability density function p(x), given a set of data points drawn from it. Main approaches of density estimation: Parametric: assume a parameterized model for the density function; a number of parameters are optimized by fitting the model to the data set. Nonparametric (instance-based): no specific parametric model is assumed for the density function; the form of the density function is determined entirely by the data.

Class-Conditional Densities We usually do not know the class-conditional densities p(x|ω_i). However, we might have prior knowledge about: the functional forms of these densities, and ranges for the values of their unknown parameters. We can separate the training data of different classes and use the set D_i containing the training samples of class ω_i to estimate p(x|ω_i). Estimating p(x|ω_i) from D_i can be considered as an unsupervised density estimation problem.

Parametric Density Estimation Assume that the density p(x|θ) has a specific functional form with a number of adjustable parameters θ. Example: a multivariate Gaussian distribution. Methods for parameter estimation: Maximum Likelihood (ML) estimation, Maximum A Posteriori (MAP) estimation, Bayesian estimation.

Maximum Likelihood Estimation (MLE) The likelihood is the conditional probability of the observations x^(1), x^(2), ..., x^(N) given the value of the parameters θ. Assuming i.i.d. observations (statistically independent, identically distributed samples): p(D|θ) = ∏_{i=1}^{N} p(x^(i)|θ) is the likelihood of θ w.r.t. the samples. Maximum Likelihood estimation: θ̂_ML = argmax_θ p(D|θ).

Maximum Likelihood Estimation (MLE) θ̂_ML is the value of θ that best agrees with the observed samples.

Maximum Likelihood Estimation (MLE) It is usually easier to work with the log-likelihood ln p(D|θ) = Σ_{i=1}^{N} ln p(x^(i)|θ). Thus, we solve ∇_θ ln p(D|θ) = 0 to find the global optimum.

MLE Gaussian: Unknown μ For p(x|μ) ~ N(μ, σ²) with σ² known, setting the derivative of the log-likelihood to zero gives μ̂_ML = (1/N) Σ_{i=1}^{N} x^(i), the sample mean.

MLE Gaussian Case: Unknown μ and Σ μ̂_ML = (1/N) Σ_{i=1}^{N} x^(i), Σ̂_ML = (1/N) Σ_{i=1}^{N} (x^(i) − μ̂_ML)(x^(i) − μ̂_ML)^T.
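As an illustrative sketch (not from the slides), the Gaussian ML estimates above can be computed directly in NumPy; the true parameters (mean 5.0, std 2.0) and the sample size are arbitrary choices for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # samples from N(5, 2^2)

# ML estimates for a univariate Gaussian:
# sample mean, and the biased sample variance (divides by N, not N-1)
mu_ml = data.mean()
sigma2_ml = ((data - mu_ml) ** 2).mean()

print(mu_ml, sigma2_ml)  # close to the true values 5.0 and 4.0
```

Note that the ML variance estimate divides by N rather than N−1, so it is biased (it slightly underestimates the true variance for small samples).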

Maximum A Posteriori (MAP) Estimation MAP estimation: θ̂_MAP = argmax_θ p(θ|D). Since p(θ|D) ∝ p(D|θ) p(θ), we have θ̂_MAP = argmax_θ p(D|θ) p(θ). Example of a prior distribution: a Gaussian prior over the parameters, as used in the following slides.

Maximum A Posteriori (MAP) Estimation Given a set of observations D and a prior distribution p(θ) on the parameters, the parameter vector that maximizes the posterior p(θ|D) is found.

MAP Estimation Gaussian: Unknown μ p(x|μ) ~ N(μ, σ²), p(μ) ~ N(μ₀, σ₀²). μ is the only unknown parameter; σ, μ₀ and σ₀ are known. Setting the derivative of ln p(μ|D) to zero gives μ̂_MAP = (N σ₀² x̄ + σ² μ₀) / (N σ₀² + σ²), where x̄ = (1/N) Σ_{i=1}^{N} x^(i). When σ₀ ≫ σ or N → ∞, μ̂_MAP → x̄ = μ̂_ML.
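A minimal sketch of the closed-form MAP estimate of the Gaussian mean in the conjugate-normal setting above; the prior, noise level, and sample size are made-up values for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0                  # known likelihood std
mu0, sigma0 = 0.0, 1.0       # prior N(mu0, sigma0^2) on the unknown mean
data = rng.normal(5.0, sigma, size=50)

n, xbar = len(data), data.mean()
# Closed-form MAP estimate: a precision-weighted compromise between
# the sample mean xbar and the prior mean mu0
mu_map = (n * sigma0**2 * xbar + sigma**2 * mu0) / (n * sigma0**2 + sigma**2)

print(xbar, mu_map)  # mu_map is pulled from xbar toward mu0
```

The estimate always lies between the prior mean and the sample mean; with more data the likelihood term dominates and the MAP estimate approaches the ML estimate.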

Bayesian Estimation Given: samples D, the form of the density p(x|θ), and a priori information p(θ) about θ. Goal: compute the conditional pdf p(x|D) as an estimate: p(x|D) = ∫ p(x|θ) p(θ|D) dθ. If we knew the value of the parameters, we would know the distribution of x exactly. Analytical solutions exist only for very special forms of the involved functions.

Bayesian Estimation The parameters θ are considered as a vector of random variables with an a priori distribution p(θ). Bayesian estimation utilizes the available prior information about the unknown parameters. As opposed to ML and MAP estimation, it does not seek a specific point estimate of the unknown parameter vector. The observed samples convert the prior density p(θ) into a posterior density p(θ|D). To find the conditional pdf p(x|D), we must first specify p(θ|D); then p(x|D) is found as p(x|D) = ∫ p(x|θ) p(θ|D) dθ.

Bayesian Estimation Gaussian: Unknown μ (known σ) p(x|μ) ~ N(μ, σ²), p(μ) ~ N(μ₀, σ₀²). p(μ|D) ∝ p(D|μ) p(μ) = [∏_{i=1}^{N} (1/(√(2π)σ)) exp(−(x^(i) − μ)²/(2σ²))] · (1/(√(2π)σ₀)) exp(−(μ − μ₀)²/(2σ₀²)). Collecting the terms in μ inside the exponent, p(μ|D) ∝ exp(−(1/2)[(N/σ² + 1/σ₀²)μ² − 2(Σ_{i=1}^{N} x^(i)/σ² + μ₀/σ₀²)μ]), which is again a Gaussian in μ.

Bayesian Estimation Gaussian: Unknown μ (known σ) p(μ) is a conjugate prior, so p(μ|D) ~ N(μ_N, σ_N²) with μ_N = (N σ₀² x̄ + σ² μ₀) / (N σ₀² + σ²) and σ_N² = (σ₀² σ²) / (N σ₀² + σ²). The predictive density is then p(x|D) = ∫ p(x|μ) p(μ|D) dμ ~ N(μ_N, σ² + σ_N²).
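The posterior and predictive updates above can be sketched numerically; the prior and data-generating values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 2.0                  # known likelihood std
mu0, sigma0 = 0.0, 3.0       # prior N(mu0, sigma0^2) on the unknown mean
data = rng.normal(5.0, sigma, size=20)

n, xbar = len(data), data.mean()
# Posterior over the mean: N(mu_n, sigma_n2)
mu_n = (n * sigma0**2 * xbar + sigma**2 * mu0) / (n * sigma0**2 + sigma**2)
sigma_n2 = (sigma0**2 * sigma**2) / (n * sigma0**2 + sigma**2)
# Predictive density: N(mu_n, sigma^2 + sigma_n2)
pred_var = sigma**2 + sigma_n2

print(mu_n, sigma_n2, pred_var)
```

The posterior variance σ_N² is always smaller than the prior variance σ₀² and shrinks toward zero as N grows, while the predictive variance stays above σ² because it adds the remaining uncertainty about μ.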

Bayesian Estimation Gaussian: Unknown μ (known σ) The more samples we observe, the sharper the posterior p(μ|D) becomes.

Some Related Definitions Conjugate priors: a form of prior distribution with a simple interpretation as well as some useful analytical properties: a prior chosen such that the posterior distribution, which is proportional to p(D|θ)p(θ), has the same functional form as the prior. Bayesian learning: the posterior density p(θ|D) converges to a Dirac delta function centered about the true parameter value as the number of samples grows.

ML, MAP, and Bayesian Estimation If p(θ|D) has a sharp peak at θ̂ (i.e., p(θ|D) ≈ δ(θ − θ̂)), then p(x|D) ≈ p(x|θ̂); in this case, the Bayesian estimate is approximately equal to the MAP estimate. If p(D|θ) is concentrated around a sharp peak and p(θ) is broad enough around this peak, the ML, MAP, and Bayesian estimates yield approximately the same result. All three methods asymptotically (N → ∞) result in the same estimate.
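This asymptotic agreement can be checked numerically for the conjugate Gaussian case, where the Bayesian posterior mean coincides with the MAP estimate; the sample sizes and distribution parameters below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, mu0, sigma0 = 2.0, 0.0, 1.0  # known noise std and N(mu0, sigma0^2) prior

gaps = []
for n in (5, 100, 10000):
    data = rng.normal(5.0, sigma, size=n)
    xbar = data.mean()
    mu_ml = xbar  # ML estimate: the sample mean
    # MAP estimate (= posterior mean here): shrink xbar toward the prior mean
    mu_map = (n * sigma0**2 * xbar + sigma**2 * mu0) / (n * sigma0**2 + sigma**2)
    gaps.append(abs(mu_ml - mu_map))

# the gap between the ML and MAP estimates shrinks as n grows
print(gaps)
```

With a handful of samples the prior pulls the MAP estimate noticeably away from the sample mean; by n = 10000 the two estimates are essentially indistinguishable.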

Bayesian Estimation: Example

Summary ML and MAP result in a single (point) estimate of the unknown parameter vector: simpler and more interpretable than Bayesian estimation. The Bayesian approach estimates a full distribution using all the available information: it is expected to give better results, but it has higher computational complexity. Bayesian methods have gained a lot of popularity over the last decade due to advances in computer technology. All three methods asymptotically (N → ∞) result in the same estimate.