Lecture 23 Maximum Likelihood Estimation and Bayesian Inference

1 Lecture 23: Maximum Likelihood Estimation and Bayesian Inference
Thais Paiva, STA, Summer 2013 Term II, Lecture 23, August 7, 2013

2 Lecture Plan
1. Maximum likelihood estimation
2. Bayesian estimation

3 Recap
$f(x_1, \ldots, x_n; \theta_1, \ldots, \theta_m)$ is the function that links the probability of random variables to parameters.
If we treat $x_1, \ldots, x_n$ as variables and the parameters $\theta_1, \ldots, \theta_m$ as constants, this is the joint density function $f(x \mid \theta)$.
However, if we treat $x_1, \ldots, x_n$ as constants (values observed in the sample) and $\theta_1, \ldots, \theta_m$ as variables, this is the likelihood function $L(\theta \mid x)$.

4 Recap
If $X_1, \ldots, X_n$ are an iid (independent and identically distributed) sample from a population with probability density function $f(x \mid \theta)$, then the likelihood function is defined by:
$L(\theta \mid x) = L(\theta_1, \ldots, \theta_m \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta_1, \ldots, \theta_m)$

5 Maximum Likelihood Estimators
Definition (MLE): The Maximum Likelihood Estimators of the parameters $\theta_1, \ldots, \theta_m$ are the values $\hat{\theta}_1, \ldots, \hat{\theta}_m$ that maximize the likelihood function $L(\theta \mid x)$.

6 Maximum Likelihood Estimators
The MLE is the parameter point for which the observed sample is most likely, as measured by the likelihood.
Finding the MLE is an optimization problem.
Find the global maximum (differential calculus).

7 Unfair coin example
Suppose I asked one student to flip an unfair coin 10 times:
0 0 1 0 1 1 0 0 0 0
$\hat{p} = 0.3$
[Figure: likelihood as a function of $p$; vertical axis: likelihood]
But how do we get this curve?

8 Unfair coin example
The curve is the likelihood, a function of $\theta = p$.
Remember: Bernoulli R.V.s, $X_1, \ldots, X_n \overset{iid}{\sim} \text{Bernoulli}(p)$
$L(p \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i} = p^{\sum x_i} (1 - p)^{n - \sum x_i}$

9 Unfair coin example
If $x = (0, 0, 1, 0, 1, 1, 0, 0, 0, 0)$, how likely is the data if $p = 0.5$?
$0.5^{3} (1 - 0.5)^{10 - 3} = 0.5^{10} \approx 0.00098$

10 Unfair coin example
If $x = (0, 0, 1, 0, 1, 1, 0, 0, 0, 0)$, what about $p = 0.25$ or $p = 0.75$?
$0.25^{3} (1 - 0.25)^{10 - 3} \approx 0.00209$
$0.75^{3} (1 - 0.75)^{10 - 3} \approx 0.0000257$

11 Unfair coin example
If $x = (0, 0, 1, 0, 1, 1, 0, 0, 0, 0)$, what about all $p \in [0, 1]$?
$p^{3} (1 - p)^{10 - 3} = L(p \mid x)$
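The point evaluations and the full curve from the coin example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `bernoulli_likelihood` and the 101-point grid are illustrative choices, not part of the lecture.

```python
# Data from the coin example: 10 flips, 3 ones.
x = [0, 0, 1, 0, 1, 1, 0, 0, 0, 0]


def bernoulli_likelihood(p, data):
    """L(p | x) = p^(sum x_i) * (1 - p)^(n - sum x_i)."""
    s = sum(data)
    n = len(data)
    return p ** s * (1 - p) ** (n - s)


# Evaluate the likelihood at the three values discussed on the slides.
for p in (0.25, 0.5, 0.75):
    print(f"L({p} | x) = {bernoulli_likelihood(p, x):.7f}")

# Trace the whole curve on a grid over [0, 1] and locate its peak.
grid = [i / 100 for i in range(101)]
curve = [bernoulli_likelihood(p, x) for p in grid]
p_best = grid[curve.index(max(curve))]
print("grid maximizer:", p_best)
```

A grid search only approximates the maximizer to the grid resolution; the calculus-based derivation gives it exactly.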

12 Unfair coin example
If $x = (0, 0, 1, 0, 1, 1, 0, 0, 0, 0)$, what about all $p \in [0, 1]$? And the maximum?
$\frac{\partial}{\partial p} L(p \mid x) = 0$
Easier to work with the log likelihood $\log L(p \mid x)$.

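13 Unfair coin example
If $x = (0, 0, 1, 0, 1, 1, 0, 0, 0, 0)$, how likely are all $p \in [0, 1]$? And the maximum?
Because the logarithm is increasing, $\log L(p \mid x)$ is maximized at the same $p$ as $L(p \mid x)$:
$\frac{\partial}{\partial p} \log L(p \mid x) = 0$
[Figure: log likelihood as a function of $p$]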

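14 Bernoulli MLE
1. $L(p \mid x) = p^{\sum x_i} (1 - p)^{n - \sum x_i}$
2. $\log L(p \mid x) = \left(\sum x_i\right) \log p + \left(n - \sum x_i\right) \log(1 - p)$
3. $\frac{\partial}{\partial p} \log L(p \mid x) = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1 - p}$
4. Set $\frac{\partial}{\partial p} \log L(p \mid x) = 0$ and solve for $\hat{p}$. MLE: $\hat{p} = \frac{\sum x_i}{n}$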
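As a quick numerical check of the four steps above, the derivative of the log likelihood (the score) should vanish at $\hat{p} = \sum x_i / n$ and change sign from positive to negative there. The sketch below uses the coin-flip data; the function name `score` is an illustrative choice.

```python
# Coin-flip data from the example: 3 ones in 10 flips.
x = [0, 0, 1, 0, 1, 1, 0, 0, 0, 0]


def score(p, data):
    """Derivative of log L(p | x): sum(x)/p - (n - sum(x))/(1 - p)."""
    s, n = sum(data), len(data)
    return s / p - (n - s) / (1 - p)


# Closed-form MLE from step 4.
p_hat = sum(x) / len(x)

print("p_hat =", p_hat)
print("score at p_hat =", score(p_hat, x))
print("score just below and above:", score(0.29, x), score(0.31, x))
```

A positive score to the left of $\hat{p}$ and a negative score to the right is what identifies the stationary point as a maximum rather than a minimum.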

15 MLE - univariate case
1. Likelihood $L(\theta \mid x)$
2. Log likelihood $\log L(\theta \mid x)$
3. Derivative $\frac{\partial}{\partial \theta} \log L(\theta \mid x)$
4. Set $\frac{\partial}{\partial \theta} \log L(\theta \mid x) = 0$ and solve for $\hat{\theta}_{MLE}$

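16 MLE example: Poisson
$X_1, \ldots, X_n \overset{iid}{\sim} \text{Poisson}(\lambda)$, so $P(X_i = x_i) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}$ for $x_i = 0, 1, \ldots$
1. $L(\lambda \mid x) = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda} \lambda^{x_1 + \cdots + x_n}}{x_1! \cdots x_n!}$
2. $\log L(\lambda \mid x) = -n\lambda + \left(\sum x_i\right) \log \lambda - \log(x_1! \cdots x_n!)$
3. $\frac{\partial}{\partial \lambda} \log L(\lambda \mid x) = -n + \frac{\sum x_i}{\lambda}$
4. Set to zero, $-n + \frac{\sum x_i}{\hat{\lambda}} = 0$, and solve for $\hat{\lambda}_{MLE} = \frac{\sum x_i}{n}$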
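The Poisson result can be checked the same way: the log likelihood from step 2 should be larger at the sample mean than at nearby values of $\lambda$. The counts below are hypothetical, made up for illustration only, and `poisson_log_likelihood` is an illustrative name.

```python
import math

# HYPOTHETICAL count data, not from the lecture.
counts = [2, 0, 3, 1, 4, 2]


def poisson_log_likelihood(lam, data):
    """log L = -n*lam + (sum x_i) * log(lam) - sum(log(x_i!))."""
    n = len(data)
    log_factorials = sum(math.lgamma(k + 1) for k in data)  # log(k!) = lgamma(k + 1)
    return -n * lam + sum(data) * math.log(lam) - log_factorials


# Closed-form MLE from step 4: the sample mean.
lam_hat = sum(counts) / len(counts)

print("lambda_hat =", lam_hat)
for lam in (lam_hat - 0.1, lam_hat, lam_hat + 0.1):
    print(f"log L({lam:.1f}) = {poisson_log_likelihood(lam, counts):.4f}")
```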

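17 MLE - Normal distribution (known $\sigma^2$)
$X_1, \ldots, X_n \overset{iid}{\sim} N(\mu, 1)$
1. $L(\mu \mid x) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\left\{-\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2\right\}$
2. $\log L(\mu \mid x) = n \log\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2$
3. $\frac{\partial}{\partial \mu} \log L(\mu \mid x) = \sum_{i=1}^{n} (x_i - \mu)$
4. Setting this to zero and solving gives $\hat{\mu}_{MLE} = \frac{\sum x_i}{n}$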
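For the Normal case with variance 1, the score in step 3 is just the sum of deviations from $\mu$, so it is zero exactly at the sample mean. A small sketch, using hypothetical observations (made up for illustration) and illustrative function names:

```python
import math
from statistics import fmean

# HYPOTHETICAL observations, not from the lecture.
obs = [4.1, 5.3, 4.8, 5.9, 5.0]


def log_likelihood(mu, data):
    """log L(mu | x) for N(mu, 1): n*log(1/sqrt(2*pi)) - 0.5 * sum (x_i - mu)^2."""
    n = len(data)
    return n * math.log(1 / math.sqrt(2 * math.pi)) - 0.5 * sum((xi - mu) ** 2 for xi in data)


def score(mu, data):
    """Derivative of the log likelihood with respect to mu."""
    return sum(xi - mu for xi in data)


# Closed-form MLE from step 4: the sample mean.
mu_hat = fmean(obs)

print("mu_hat =", mu_hat)
print("score at mu_hat =", score(mu_hat, obs))
```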

18 Bayesian Inference
Recall Bayes' Rule: $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$
For the purpose of estimation, we can express the above as $P(\theta \mid \text{Data}) = \frac{P(\text{Data} \mid \theta) P(\theta)}{P(\text{Data})}$
Note that $P(\text{Data})$ does not depend on $\theta$; it serves as a normalizing constant so that the right-hand side remains a valid density.
We often write $P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta) P(\theta)$

19 Bayesian Inference
$P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta) P(\theta)$
1. Data likelihood: $P(\text{Data} \mid \theta)$ describes how the data is generated based on the parameter $\theta$
2. Prior: $P(\theta)$ describes the information about $\theta$ before any data is collected
3. Posterior distribution: $P(\theta \mid \text{Data})$ describes how $\theta$ depends on the data. In Bayesian analysis, we use this distribution to make inference

20 Bayesian Inference: baseball statistics
In baseball, batters either reach base safely or make an out. The percentage of times the batter reaches base over the entire year is called the on-base percentage.
As of April 23, 2005, Johnny Damon had reached base safely 22 out of 68 times.
These 68 times can be thought of as a random sample of the times he will bat for the entire year (which is usually close to 600 times).

21 Bayesian Inference: baseball statistics
Suppose your prior beliefs about Damon's on-base percentage follow the following distribution:
[Table: six candidate values of $p$ with their prior probabilities $\Pr(p)$; entries not recoverable]
Based on this prior distribution, what is the posterior probability that Johnny Damon's on-base percentage at the end of the year will be 0.40?

22 Bayesian Inference: baseball statistics
Johnny Damon's performance can be modeled as a binomial distribution:
$P(x = 22 \mid p) = \frac{68!}{22!\,46!} p^{22} (1 - p)^{46}$
Bayes' theorem tells us that
$P(p \mid x) = \frac{P(x \mid p) P(p)}{P(x)}$, where $P(x) = \sum_j P(x, p_j)$
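The discrete update can be sketched in Python as follows. The prior table from the slides did not survive extraction, so the six grid values and weights below are hypothetical placeholders chosen only to make the code run; the resulting posterior numbers are therefore not the lecture's. The data (22 successes in 68 trials) are from the example.

```python
from math import comb

# Data from the example: reached base 22 times out of 68.
n, k = 68, 22

# HYPOTHETICAL prior over six candidate values of p (NOT the lecture's table).
prior = {0.30: 0.10, 0.325: 0.15, 0.35: 0.40, 0.375: 0.15, 0.40: 0.10, 0.425: 0.10}


def binomial_pmf(k, n, p):
    """P(x = k | p) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Joint P(x, p_j) = P(x | p_j) * P(p_j) for each candidate p_j.
joint = {p: binomial_pmf(k, n, p) * w for p, w in prior.items()}

# Marginal P(x) = sum_j P(x, p_j), the normalizing constant.
marginal = sum(joint.values())

# Posterior P(p_j | x) = P(x, p_j) / P(x).
posterior = {p: j / marginal for p, j in joint.items()}

for p in prior:
    print(f"p = {p:<5}  prior = {prior[p]:.2f}  posterior = {posterior[p]:.4f}")
```

Swapping in a different `prior` dictionary shows directly how the posterior probabilities depend on the prior beliefs.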

23 Bayesian Inference: baseball statistics
[Table with columns $p$, $\Pr(p)$, $\Pr(X = 22 \mid p)$, $\Pr(X = 22, p)$, $\Pr(p \mid X = 22)$, and the marginal $P(x)$; entries not recoverable]

24-26 Bayesian Inference: baseball statistics
[Figure: discrete prior, likelihood, and posterior over $p$; vertical axis: density]

27 Bayesian Inference: baseball statistics
Note that this prior distribution is very strong, because it forces $p$ to equal only one of 6 values. A more realistic prior distribution would allow $p$ to range from 0 to 1.
Also, note that the sample on-base percentage is $22/68 \approx 0.324$. But the model favors $p = 0.35$ as opposed to $p = $ [value not recoverable]. This is because we have a much higher prior belief that $p = 0.35$ than $p = $ [value not recoverable].
If we had different prior beliefs, our posterior probabilities would change.

28 Bayesian Inference: baseball statistics
Suppose that we want to give prior beliefs to all $p \in [0, 1]$.
We could use a Uniform distribution, or something else (Beta distribution).
[Figure: two prior density panels; recoverable panel title: Uniform prior; vertical axis: density]

29-30 Bayesian Inference: baseball statistics
Then, the posteriors would combine the information of the prior with the likelihood.
[Figure: two panels showing prior, likelihood, and posterior densities; recoverable panel title: Uniform prior; vertical axis: density]
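For a continuous prior on $[0, 1]$, one convenient case can be worked out in closed form. The sketch below relies on a standard result that the slides do not state explicitly: a Uniform prior is a Beta(1, 1) distribution, and a Beta(a, b) prior combined with a binomial likelihood gives a Beta(a + k, b + n - k) posterior. The function name `beta_binomial_update` is an illustrative choice.

```python
# Data from the example: reached base 22 times out of 68.
n, k = 68, 22


def beta_binomial_update(a, b, k, n):
    """Posterior Beta parameters after k successes in n trials under a Beta(a, b) prior."""
    return a + k, b + (n - k)


# Uniform prior on [0, 1] is Beta(1, 1).
a_post, b_post = beta_binomial_update(1, 1, k, n)

# Mean and mode of a Beta(a, b) distribution (mode formula valid for a, b > 1).
post_mean = a_post / (a_post + b_post)
post_mode = (a_post - 1) / (a_post + b_post - 2)

print(f"posterior: Beta({a_post}, {b_post})")
print(f"posterior mean = {post_mean:.4f}")
print(f"posterior mode = {post_mode:.4f}  (sample proportion k/n = {k / n:.4f})")
```

Under the uniform prior the posterior is proportional to the likelihood, so the posterior mode coincides with the maximum likelihood estimate $k/n$.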

31 Summary
1. Maximum likelihood is a general-purpose method that produces good estimators
2. Being Bayesian is nice, but it gives you extra choices to make
