Lecture 23 Maximum Likelihood Estimation and Bayesian Inference


Lecture 23: Maximum Likelihood Estimation and Bayesian Inference
Thais Paiva
STA 111 - Summer 2013 Term II
August 7, 2013

Lecture Plan
1. Maximum likelihood estimation
2. Bayesian estimation

Recap
$f(x_1, \ldots, x_n; \theta_1, \ldots, \theta_m)$ is the function that links the probability of the random variables to the parameters.
If we treat $x_1, \ldots, x_n$ as variables and the parameters $\theta_1, \ldots, \theta_m$ as constants, this is the joint density function $f(x \mid \theta)$. However, if we treat $x_1, \ldots, x_n$ as constants (the values observed in the sample) and $\theta_1, \ldots, \theta_m$ as variables, this is the likelihood function $L(\theta \mid x)$.

Recap
If $X_1, \ldots, X_n$ are an iid (independent and identically distributed) sample from a population with probability density function $f(x \mid \theta)$, then the likelihood function is defined by
$$L(\theta \mid x) = L(\theta_1, \ldots, \theta_m \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta_1, \ldots, \theta_m)$$
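To make the definition concrete, here is a minimal Python sketch (our own illustration, not from the lecture; the helper name `likelihood` and the toy data are assumptions) that evaluates $L(\theta \mid x)$ as a product of iid density values:

```python
import numpy as np
from scipy import stats

def likelihood(theta, xs, density):
    """L(theta | x): product of f(x_i | theta) over the iid sample."""
    return np.prod([density(x, theta) for x in xs])

# Toy example: three observations treated as iid N(theta, 1),
# likelihood evaluated at theta = 0
xs = [0.5, -0.2, 1.1]
print(likelihood(0.0, xs, lambda x, t: stats.norm.pdf(x, loc=t, scale=1)))
```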

Maximum Likelihood Estimators
Definition (MLE): The maximum likelihood estimators of the parameters $\theta_1, \ldots, \theta_m$ are the values $\hat{\theta}_1, \ldots, \hat{\theta}_m$ that maximize the likelihood function $L(\theta \mid x)$.

Maximum Likelihood Estimators
The MLE is the parameter point for which the observed sample is most likely, as measured by the likelihood. Finding the MLE is an optimization problem: find the global maximum (differential calculus).

Unfair coin example
Suppose I asked one student to flip an unfair coin 10 times:
0 0 1 0 1 1 0 0 0 0, giving $\hat{p} = 0.3$.
[Figure: likelihood as a function of $p$ on $[0, 1]$, peaking near 0.3.]
But how do we get this curve?

Unfair coin example
The curve is the likelihood, a function of $\theta = p$. Remember, for iid Bernoulli random variables $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$:
$$L(p \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i} = p^{\sum x_i} (1 - p)^{n - \sum x_i}$$

Unfair coin example
If $x = 0\,0\,1\,0\,1\,1\,0\,0\,0\,0$, how likely is the data if $p = 0.5$?
$$0.5^3 (1 - 0.5)^{10 - 3} \approx 0.001$$

Unfair coin example
If $x = 0\,0\,1\,0\,1\,1\,0\,0\,0\,0$, what about $p = 0.25$ or $p = 0.75$?
$$0.25^3 (1 - 0.25)^{10 - 3} \approx 0.0021 \qquad\qquad 0.75^3 (1 - 0.75)^{10 - 3} \approx 0.0000$$
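As a quick check, a short Python sketch (our own; the helper name `bern_lik` is an assumption) reproduces these likelihood values:

```python
def bern_lik(p, s=3, n=10):
    """L(p | x) = p^s * (1 - p)^(n - s) for s successes in n Bernoulli trials."""
    return p**s * (1 - p)**(n - s)

for p in (0.5, 0.25, 0.75):
    print(f"L({p} | x) = {bern_lik(p):.4f}")
# prints 0.0010, 0.0021, 0.0000, matching the values on the slides
```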

Unfair coin example
If $x = 0\,0\,1\,0\,1\,1\,0\,0\,0\,0$, what about all $p \in [0, 1]$?
$$p^3 (1 - p)^{10 - 3} = L(p \mid x)$$

Unfair coin example
If $x = 0\,0\,1\,0\,1\,1\,0\,0\,0\,0$, what about all $p \in [0, 1]$? And the maximum?
Set $\frac{\partial}{\partial p} L(p \mid x) = 0$. It is easier to work with the log-likelihood $\log L(p \mid x)$.

Unfair coin example
If $x = 0\,0\,1\,0\,1\,1\,0\,0\,0\,0$, how likely are all $p \in [0, 1]$? And the maximum?
[Figure: log-likelihood as a function of $p$ on $[0, 1]$.]
Set $\frac{\partial}{\partial p} \log L(p \mid x) = 0$.

Bernoulli MLE
1. $L(p \mid x) = p^{\sum x_i} (1 - p)^{n - \sum x_i}$
2. $\log L(p \mid x) = \left(\sum x_i\right) \log p + \left(n - \sum x_i\right) \log(1 - p)$
3. $\frac{\partial}{\partial p} \log L(p \mid x) = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1 - p}$
4. Set $\frac{\partial}{\partial p} \log L(p \mid x) = 0$ and solve for the MLE: $\hat{p} = \frac{\sum x_i}{n}$

MLE - univariate case
1. Likelihood $L(\theta \mid x)$
2. Log-likelihood $\log L(\theta \mid x)$
3. Derivative $\frac{\partial}{\partial \theta} \log L(\theta \mid x)$
4. Set $\frac{\partial}{\partial \theta} \log L(\theta \mid x) = 0$ and solve for $\hat{\theta}_{MLE}$
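The same recipe can also be carried out numerically when no closed form is convenient. A minimal sketch (our own illustration, not from the lecture) minimizes the negative log-likelihood for the coin data above with scipy:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0, 0, 1, 0, 1, 1, 0, 0, 0, 0])   # the ten coin flips
s, n = x.sum(), len(x)

def neg_log_lik(p):
    # steps 1-2 of the recipe: log L(p | x), negated for a minimizer
    return -(s * np.log(p) + (n - s) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)   # about 0.3 = sum(x_i)/n, matching the closed-form MLE
```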

MLE example: Poisson
$X_1, \ldots, X_n$ iid Poisson($\lambda$), so $P(X_i = x_i) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}$ for $x_i = 0, 1, \ldots$
1. $L(\lambda \mid x) = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda} \lambda^{x_1 + \cdots + x_n}}{x_1! \cdots x_n!}$
2. $\log L(\lambda \mid x) = -n\lambda + \left(\sum x_i\right) \log \lambda - \log(x_1! \cdots x_n!)$
3. $\frac{\partial}{\partial \lambda} \log L(\lambda \mid x) = -n + \frac{\sum x_i}{\lambda}$
4. Set $-n + \frac{\sum x_i}{\hat{\lambda}} = 0$ and solve: $\hat{\lambda}_{MLE} = \frac{\sum x_i}{n}$
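The derivative-and-solve steps can also be checked symbolically. A sketch of ours using sympy (the symbol S stands for $\sum x_i$):

```python
import sympy as sp

lam, n, S = sp.symbols("lambda n S", positive=True)
log_lik = -n * lam + S * sp.log(lam)   # dropping the constant -log(x_1! ... x_n!)
lam_hat = sp.solve(sp.diff(log_lik, lam), lam)
print(lam_hat)   # [S/n], i.e. the sample mean
```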

MLE - Normal distribution (known $\sigma^2$)
$X_1, \ldots, X_n$ iid $N(\mu, 1)$
1. $L(\mu \mid x) = \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2\right\}$
2. $\log L(\mu \mid x) = n \log\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2$
3. $\frac{\partial}{\partial \mu} \log L(\mu \mid x) = \sum_{i=1}^{n} (x_i - \mu)$
4. Setting this to zero and solving: $\hat{\mu}_{MLE} = \frac{\sum x_i}{n}$
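A quick numerical confirmation (our own sketch, with simulated data) that the maximizer is the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=500)   # simulated N(2, 1) sample

# negative log-likelihood for N(mu, 1), constants dropped
nll = lambda mu: 0.5 * np.sum((x - mu) ** 2)

print(minimize_scalar(nll).x, x.mean())   # both approximately 2: mu-hat = x-bar
```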

Bayesian Inference
Recall Bayes' Rule:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
For the purpose of estimation, we can express the above as
$$P(\theta \mid \text{Data}) = \frac{P(\text{Data} \mid \theta)\, P(\theta)}{P(\text{Data})}$$
Note that $P(\text{Data})$ does not depend on $\theta$; it serves as a normalizing constant so that the right-hand side remains a valid density. We often write
$$P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta)\, P(\theta)$$

Bayesian Inference
$$P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta)\, P(\theta)$$
1. Data likelihood: $P(\text{Data} \mid \theta)$ describes how the data are generated given the parameter $\theta$.
2. Prior: $P(\theta)$ describes the information about $\theta$ before any data are collected.
3. Posterior distribution: $P(\theta \mid \text{Data})$ describes how $\theta$ depends on the data. In Bayesian analysis, we use this distribution to make inferences.

Bayesian Inference: baseball statistics
In baseball, batters either reach base safely or make an out. The percentage of times the batter reaches base over the entire year is called the on-base percentage.
As of April 23, 2005, Johnny Damon had reached base safely in 22 out of 68 times at bat. These 68 times can be thought of as a random sample of the times he will bat for the entire year (usually close to 600 times).

Bayesian Inference: baseball statistics
Suppose your prior beliefs about Damon's on-base percentage $p$ follow the distribution:

p      Pr(p)
0.25   1/20
0.30   1/10
0.35   3/10
0.40   4/10
0.45   1/10
0.50   1/20

Based on this prior distribution, what is the posterior probability that Johnny Damon's on-base percentage at the end of the year will be 0.40?

Bayesian Inference: baseball statistics
Johnny Damon's performance can be modeled with a binomial distribution:
$$P(x = 22 \mid p) = \frac{68!}{22!\,46!}\, p^{22} (1 - p)^{68 - 22}$$
Bayes' theorem tells us that
$$P(p \mid x) = \frac{P(x \mid p)\, P(p)}{P(x)}, \qquad \text{where } P(x) = \sum_j P(x, p_j) = \sum_j P(x \mid p_j)\, P(p_j)$$

Bayesian Inference: baseball statistics

p      Pr(p)   Pr(X=22 | p)   Pr(X=22, p)   Pr(p | X=22)
0.25   .05     .0408          .00204        .0352
0.30   .10     .0943          .00943        .1625
0.35   .30     .0926          .02778        .4791
0.40   .40     .0440          .01760        .3035
0.45   .10     .0107          .00107        .0185
0.50   .05     .0014          .000068       .00117

P(x) = .00204 + .00943 + .02778 + .01760 + .00107 + .000068 = .057988
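A short Python sketch (our own, using scipy.stats.binom; not part of the original slides) reproduces this table:

```python
from scipy.stats import binom

ps     = [0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
priors = [1/20, 1/10, 3/10, 4/10, 1/10, 1/20]

lik   = [binom.pmf(22, 68, p) for p in ps]        # Pr(X = 22 | p)
joint = [l * pr for l, pr in zip(lik, priors)]    # Pr(X = 22, p)
px    = sum(joint)                                # Pr(X = 22), about .058
post  = [j / px for j in joint]                   # Pr(p | X = 22)

for p, po in zip(ps, post):
    print(f"Pr(p = {p} | X = 22) = {po:.4f}")     # peak at p = 0.35, about .479
```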

Bayesian Inference: baseball statistics
[Figure: the discrete prior distribution, with probability mass at p = 0.25, 0.30, 0.35, 0.40, 0.45, 0.50.]


Bayesian Inference: baseball statistics
[Figure: the discrete prior, likelihood, and posterior over p = 0.25, ..., 0.50.]

Bayesian Inference: baseball statistics
Note that this prior distribution is very strong, because it forces $p$ to equal one of only 6 values. A more realistic prior distribution would allow $p$ to range from 0 to 1.
Also, note that the sample on-base percentage is 0.3235 ($\frac{22}{68}$), but the model favors $p = 0.35$ over $p = 0.30$. This is because we have a much higher prior belief that $p = 0.35$ than that $p = 0.30$. If we had different prior beliefs, our posterior probabilities would change.

Bayesian Inference: baseball statistics
Suppose that we want to give prior beliefs to all $p \in [0, 1]$. We could use a Uniform distribution, or something else (e.g., a Beta distribution).
[Figure: two panels labeled "Uniform prior", showing a flat prior density on [0, 1].]
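With a Uniform prior the posterior has a convenient closed form: Uniform on [0, 1] is the Beta(1, 1) distribution, and by Beta-Binomial conjugacy (a standard fact, not derived in the lecture) the posterior after 22 successes in 68 at-bats is Beta(1 + 22, 1 + 46). A short sketch of ours:

```python
from scipy.stats import beta

# Uniform prior = Beta(1, 1); 22 successes and 46 failures give Beta(23, 47)
posterior = beta(23, 47)
print(posterior.mean())           # about 0.329, close to 22/68 = 0.3235
print(posterior.interval(0.95))   # a 95% posterior credible interval for p
```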

Bayesian Inference: baseball statistics
Then, the posteriors would combine the information of the prior with the likelihood.
[Figure: two panels labeled "Uniform prior", as on the previous slide.]

Bayesian Inference: baseball statistics
Then, the posteriors would combine the information of the prior with the likelihood.
[Figure: two panels labeled "Uniform prior", each showing the prior, likelihood, and posterior on [0, 1].]

Summary
1. Maximum likelihood is a general-purpose method that produces good estimators.
2. Being Bayesian is nice, but it gives you extra choices to make.