Lecture 23 Maximum Likelihood Estimation and Bayesian Inference


Lecture 23: Maximum Likelihood Estimation and Bayesian Inference
Thais Paiva
STA 111 - Summer 2013 Term II
August 7, 2013

Lecture Plan
1. Maximum likelihood estimation
2. Bayesian estimation

Recap
$f(x_1, \ldots, x_n; \theta_1, \ldots, \theta_m)$ is the function that links the probability of the random variables to the parameters.
If we treat $x_1, \ldots, x_n$ as variables and the parameters $\theta_1, \ldots, \theta_m$ as constants, this is the joint density function $f(x \mid \theta)$. However, if we treat $x_1, \ldots, x_n$ as constants (the values observed in the sample) and $\theta_1, \ldots, \theta_m$ as variables, this is the likelihood function $L(\theta \mid x)$.

Recap
If $X_1, \ldots, X_n$ are an iid (independent and identically distributed) sample from a population with probability density function $f(x \mid \theta)$, then the likelihood function is defined by
$$L(\theta \mid x) = L(\theta_1, \ldots, \theta_m \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta_1, \ldots, \theta_m)$$
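To make the definition concrete, here is a minimal Python sketch (our own illustration, not from the lecture; the helper name `likelihood` and the toy data are assumptions) that evaluates $L(\theta \mid x)$ as a product of iid density values:

```python
import numpy as np
from scipy import stats

def likelihood(theta, xs, density):
    """L(theta | x): product of f(x_i | theta) over the iid sample."""
    return np.prod([density(x, theta) for x in xs])

# Toy example: three observations treated as iid N(theta, 1),
# likelihood evaluated at theta = 0
xs = [0.5, -0.2, 1.1]
print(likelihood(0.0, xs, lambda x, t: stats.norm.pdf(x, loc=t, scale=1)))
```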

Maximum Likelihood Estimators
Definition (MLE): The maximum likelihood estimators of the parameters $\theta_1, \ldots, \theta_m$ are the values $\hat{\theta}_1, \ldots, \hat{\theta}_m$ that maximize the likelihood function $L(\theta \mid x)$.

Maximum Likelihood Estimators
The MLE is the parameter point for which the observed sample is most likely, as measured by the likelihood. Finding the MLE is an optimization problem: find the global maximum (differential calculus).

Unfair coin example
Suppose I asked one student to flip an unfair coin 10 times:
0 0 1 0 1 1 0 0 0 0, giving $\hat{p} = 0.3$.
[Figure: likelihood as a function of $p$ on $[0, 1]$, peaking near 0.3.]
But how do we get this curve?

Unfair coin example
The curve is the likelihood, a function of $\theta = p$. Remember, for iid Bernoulli random variables $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$:
$$L(p \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i} = p^{\sum x_i} (1 - p)^{n - \sum x_i}$$

Unfair coin example
If $x = 0\,0\,1\,0\,1\,1\,0\,0\,0\,0$, how likely is the data if $p = 0.5$?
$$0.5^3 (1 - 0.5)^{10 - 3} \approx 0.001$$

Unfair coin example
If $x = 0\,0\,1\,0\,1\,1\,0\,0\,0\,0$, what about $p = 0.25$ or $p = 0.75$?
$$0.25^3 (1 - 0.25)^{10 - 3} \approx 0.0021 \qquad\qquad 0.75^3 (1 - 0.75)^{10 - 3} \approx 0.0000$$
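As a quick check, a short Python sketch (our own; the helper name `bern_lik` is an assumption) reproduces these likelihood values:

```python
def bern_lik(p, s=3, n=10):
    """L(p | x) = p^s * (1 - p)^(n - s) for s successes in n Bernoulli trials."""
    return p**s * (1 - p)**(n - s)

for p in (0.5, 0.25, 0.75):
    print(f"L({p} | x) = {bern_lik(p):.4f}")
# prints 0.0010, 0.0021, 0.0000, matching the values on the slides
```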

Unfair coin example
If $x = 0\,0\,1\,0\,1\,1\,0\,0\,0\,0$, what about all $p \in [0, 1]$?
$$p^3 (1 - p)^{10 - 3} = L(p \mid x)$$

Unfair coin example
If $x = 0\,0\,1\,0\,1\,1\,0\,0\,0\,0$, what about all $p \in [0, 1]$? And the maximum?
Set $\frac{\partial}{\partial p} L(p \mid x) = 0$. It is easier to work with the log-likelihood $\log L(p \mid x)$.

Unfair coin example
If $x = 0\,0\,1\,0\,1\,1\,0\,0\,0\,0$, how likely are all $p \in [0, 1]$? And the maximum?
[Figure: log-likelihood as a function of $p$ on $[0, 1]$.]
Set $\frac{\partial}{\partial p} \log L(p \mid x) = 0$.

Bernoulli MLE
1. $L(p \mid x) = p^{\sum x_i} (1 - p)^{n - \sum x_i}$
2. $\log L(p \mid x) = \left(\sum x_i\right) \log p + \left(n - \sum x_i\right) \log(1 - p)$
3. $\frac{\partial}{\partial p} \log L(p \mid x) = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1 - p}$
4. Set $\frac{\partial}{\partial p} \log L(p \mid x) = 0$ and solve for the MLE: $\hat{p} = \frac{\sum x_i}{n}$

MLE - univariate case
1. Likelihood $L(\theta \mid x)$
2. Log-likelihood $\log L(\theta \mid x)$
3. Derivative $\frac{\partial}{\partial \theta} \log L(\theta \mid x)$
4. Set $\frac{\partial}{\partial \theta} \log L(\theta \mid x) = 0$ and solve for $\hat{\theta}_{MLE}$
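The same recipe can also be carried out numerically when no closed form is convenient. A minimal sketch (our own illustration, not from the lecture) minimizes the negative log-likelihood for the coin data above with scipy:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0, 0, 1, 0, 1, 1, 0, 0, 0, 0])   # the ten coin flips
s, n = x.sum(), len(x)

def neg_log_lik(p):
    # steps 1-2 of the recipe: log L(p | x), negated for a minimizer
    return -(s * np.log(p) + (n - s) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)   # about 0.3 = sum(x_i)/n, matching the closed-form MLE
```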

MLE example: Poisson
$X_1, \ldots, X_n$ iid Poisson($\lambda$), so $P(X_i = x_i) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}$ for $x_i = 0, 1, \ldots$
1. $L(\lambda \mid x) = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda} \lambda^{x_1 + \cdots + x_n}}{x_1! \cdots x_n!}$
2. $\log L(\lambda \mid x) = -n\lambda + \left(\sum x_i\right) \log \lambda - \log(x_1! \cdots x_n!)$
3. $\frac{\partial}{\partial \lambda} \log L(\lambda \mid x) = -n + \frac{\sum x_i}{\lambda}$
4. Set $-n + \frac{\sum x_i}{\hat{\lambda}} = 0$ and solve: $\hat{\lambda}_{MLE} = \frac{\sum x_i}{n}$
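The derivative-and-solve steps can also be checked symbolically. A sketch of ours using sympy (the symbol S stands for $\sum x_i$):

```python
import sympy as sp

lam, n, S = sp.symbols("lambda n S", positive=True)
log_lik = -n * lam + S * sp.log(lam)   # dropping the constant -log(x_1! ... x_n!)
lam_hat = sp.solve(sp.diff(log_lik, lam), lam)
print(lam_hat)   # [S/n], i.e. the sample mean
```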

MLE - Normal distribution (known $\sigma^2$)
$X_1, \ldots, X_n$ iid $N(\mu, 1)$
1. $L(\mu \mid x) = \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2\right\}$
2. $\log L(\mu \mid x) = n \log\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2$
3. $\frac{\partial}{\partial \mu} \log L(\mu \mid x) = \sum_{i=1}^{n} (x_i - \mu)$
4. Setting this to zero and solving: $\hat{\mu}_{MLE} = \frac{\sum x_i}{n}$
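A quick numerical confirmation (our own sketch, with simulated data) that the maximizer is the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=500)   # simulated N(2, 1) sample

# negative log-likelihood for N(mu, 1), constants dropped
nll = lambda mu: 0.5 * np.sum((x - mu) ** 2)

print(minimize_scalar(nll).x, x.mean())   # both approximately 2: mu-hat = x-bar
```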

Bayesian Inference
Recall Bayes' Rule:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
For the purpose of estimation, we can express the above as
$$P(\theta \mid \text{Data}) = \frac{P(\text{Data} \mid \theta)\, P(\theta)}{P(\text{Data})}$$
Note that $P(\text{Data})$ does not depend on $\theta$; it serves as a normalizing constant so that the right-hand side remains a valid density. We often write
$$P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta)\, P(\theta)$$

Bayesian Inference
$$P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta)\, P(\theta)$$
1. Data likelihood: $P(\text{Data} \mid \theta)$ describes how the data are generated given the parameter $\theta$.
2. Prior: $P(\theta)$ describes the information about $\theta$ before any data are collected.
3. Posterior distribution: $P(\theta \mid \text{Data})$ describes how $\theta$ depends on the data. In Bayesian analysis, we use this distribution to make inferences.

Bayesian Inference: baseball statistics
In baseball, batters either reach base safely or make an out. The percentage of times the batter reaches base over the entire year is called the on-base percentage.
As of April 23, 2005, Johnny Damon had reached base safely in 22 out of 68 times at bat. These 68 times can be thought of as a random sample of the times he will bat for the entire year (usually close to 600 times).

Bayesian Inference: baseball statistics
Suppose your prior beliefs about Damon's on-base percentage $p$ follow the distribution:

p      Pr(p)
0.25   1/20
0.30   1/10
0.35   3/10
0.40   4/10
0.45   1/10
0.50   1/20

Based on this prior distribution, what is the posterior probability that Johnny Damon's on-base percentage at the end of the year will be 0.40?

Bayesian Inference: baseball statistics
Johnny Damon's performance can be modeled with a binomial distribution:
$$P(x = 22 \mid p) = \frac{68!}{22!\,46!}\, p^{22} (1 - p)^{68 - 22}$$
Bayes' theorem tells us that
$$P(p \mid x) = \frac{P(x \mid p)\, P(p)}{P(x)}, \qquad \text{where } P(x) = \sum_j P(x, p_j) = \sum_j P(x \mid p_j)\, P(p_j)$$

Bayesian Inference: baseball statistics

p      Pr(p)   Pr(X=22 | p)   Pr(X=22, p)   Pr(p | X=22)
0.25   .05     .0408          .00204        .0352
0.30   .10     .0943          .00943        .1625
0.35   .30     .0926          .02778        .4791
0.40   .40     .0440          .01760        .3035
0.45   .10     .0107          .00107        .0185
0.50   .05     .0014          .000068       .00117

P(x) = .00204 + .00943 + .02778 + .01760 + .00107 + .000068 = .057988
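A short Python sketch (our own, using scipy.stats.binom; not part of the original slides) reproduces this table:

```python
from scipy.stats import binom

ps     = [0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
priors = [1/20, 1/10, 3/10, 4/10, 1/10, 1/20]

lik   = [binom.pmf(22, 68, p) for p in ps]        # Pr(X = 22 | p)
joint = [l * pr for l, pr in zip(lik, priors)]    # Pr(X = 22, p)
px    = sum(joint)                                # Pr(X = 22), about .058
post  = [j / px for j in joint]                   # Pr(p | X = 22)

for p, po in zip(ps, post):
    print(f"Pr(p = {p} | X = 22) = {po:.4f}")     # peak at p = 0.35, about .479
```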

Bayesian Inference: baseball statistics
[Figure: the discrete prior distribution, with probability mass at p = 0.25, 0.30, 0.35, 0.40, 0.45, 0.50.]


Bayesian Inference: baseball statistics
[Figure: the discrete prior, likelihood, and posterior over p = 0.25, ..., 0.50.]

Bayesian Inference: baseball statistics
Note that this prior distribution is very strong, because it forces $p$ to equal one of only 6 values. A more realistic prior distribution would allow $p$ to range from 0 to 1.
Also, note that the sample on-base percentage is 0.3235 ($\frac{22}{68}$), but the model favors $p = 0.35$ over $p = 0.30$. This is because we have a much higher prior belief that $p = 0.35$ than that $p = 0.30$. If we had different prior beliefs, our posterior probabilities would change.

Bayesian Inference: baseball statistics
Suppose that we want to give prior beliefs to all $p \in [0, 1]$. We could use a Uniform distribution, or something else (e.g., a Beta distribution).
[Figure: two panels labeled "Uniform prior", showing a flat prior density on [0, 1].]
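With a Uniform prior the posterior has a convenient closed form: Uniform on [0, 1] is the Beta(1, 1) distribution, and by Beta-Binomial conjugacy (a standard fact, not derived in the lecture) the posterior after 22 successes in 68 at-bats is Beta(1 + 22, 1 + 46). A short sketch of ours:

```python
from scipy.stats import beta

# Uniform prior = Beta(1, 1); 22 successes and 46 failures give Beta(23, 47)
posterior = beta(23, 47)
print(posterior.mean())           # about 0.329, close to 22/68 = 0.3235
print(posterior.interval(0.95))   # a 95% posterior credible interval for p
```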

Bayesian Inference: baseball statistics
Then, the posteriors would combine the information of the prior with the likelihood.
[Figure: two panels labeled "Uniform prior", as on the previous slide.]

Bayesian Inference: baseball statistics
Then, the posteriors would combine the information of the prior with the likelihood.
[Figure: two panels labeled "Uniform prior", each showing the prior, likelihood, and posterior on [0, 1].]

Summary
1. Maximum likelihood is a general-purpose method that produces good estimators.
2. Being Bayesian is nice, but it gives you extra choices to make.