36-463/663: Multilevel & Hierarchical Models

Size: px

Start display at page:

Download "36-463/663: Multilevel & Hierarchical Models"

Julie James
5 years ago
Views:

1 36-463/663: Multilevel & Hierarchical Models From Maximum Likelihood to Bayes Brian Junker 132E Baker Hall 1 Outline 2016 Pre-election oll in Ohio Binomial and Bernoulli MLE Bayes Rule Bayes for densities Bayesian inference 2016 Pre-election oll in Ohio 2

2 Ohio, 2016 Pre-Election Poll Donald Trum (R) running for election to the residency against Hillary Clinton (D) In a Suffolk University Poll (Set 12-14, 2016): 401 of 500 voters exressed a reference for Trum or Clinton. Of those 401: 208 refer Donald Trum. In most olling, weights are attached to each resonse, to adjust the reresentativeness of the resonse for things like who is likely to be home when survey worker calls who refuses to answer etc We will ignore weights etcand treat the 401 as a simle random samle. 3 Possible models for the data 401 individual Bernoulli coin flis, x i = 1 for Trum, x i = 0 for Clinton 401 trials, 208 successes (Trum voters) What matters for MLE and SE is shae, not size! 4

3 Binomial and Bernoulli Likelihoods Binomial Likelihood Binomial Log-likelihood L bin () [arameter] Bernouli Likelihood L bin () e e-242 log(l bin ()) [arameter] Bernouli Log-likelihood log(l ber ()) [arameter] [arameter] 5 Proortionality and log-roortionality f(θ) g(θ) [ f(θ) is roortional tog(θ) ] if f(θ) = cg(θ) Clearly L bin () L ber (), with c = For log-likelihoods we also write : LL bin () LL ber () because LL bin () = LL ber () + log (weird, huh?) 6

4 Finding the MLE If we use the Bernoulli likelihood, If we use the Binomial likelihood Either way we want to maximize with k = 208, n=401 7 MLE: Point Estimate Differentiating and setting to zero so, clearly, 8

5 MLE: Standard Error & CI First we calculate the exected Information and then A CI for is then (0.47,0.57), uncertain who wins! 9 Bayes Rule (a.k.a. Bayes Theorem) A very simle idea with very owerful consequences We often start with information like P[A B] and what we really want is P[B A]. Bayes Theorem lets us turn the conditioning around : See htt://yudkowsky.net/rational/bayesfor a ton of examles and geeky roselytizing. 10

6 Finding Terrorists According to htt://wiki.answers.com/q/how_many_eole_fly_in_a_year, US airlines carry million assengers er year According to htt:// RAND_OP292.df, 42 eole were indicted in the US for jihadists activities in About 2000 eole are under surveillance in the UK (htt:// so let s generously assume that about 10,000 are under surveillance in the US. Let s assume (again generously) that all 10,000 will try to fly once in the US in a year, carrying a detectable weaon. Now suose TSA methods are 99.99% accurate: P[red light terrorist] = = P[green light not terrorist] What is P[terrorist red light]? P[not terrorist green]? How many travellers will be red-lighted? 11 Terrorists and Bayes B = terrorist; B c = not terrorist P(B) = 10,000/(561.9*10 6 ) = 1.78*10-15 A = red light; A c = green light P[A B]= P[A c B c ] = P[A] = P[A&B] + P[A&B c ] = P[A B]P[B] + P[A B c ]P[B c ] = (0.9999)(1.78*10-15 ) + ( )(1-1.78*10-15 ) = P[B A] = P[A B]P[B]/P[A] = (0.9999)(1.78*10-15 ) / = 1.5*10-11 P[B c A c ] = P[A c B c ]P[B c ]/P[A c ] = (0.9999)(1-1.78*10-15) ) / ( ) 1 E[#A] = P[A] * (561*10 6 ) = ( ) (561*10 6 ) = 66,188 There better be other ways! 12

7 Conditional robability & conditional density P[A & B] = P[B A]P[A] P[B] = P[B A]P[A] + P[B A c ]P[A c ] P[A B] = P[A&B]/P[B] f(x,y) = f(y x) f(x) f(x y) = f(x,y)/f(y) Bayes Theorem: Bayes Theorem: 13 Bayes Theorem for Data Bayes Theorem Let x = data, y = θ(arameter!); then 14

8 Bayes Theorem for Data We call f(θ) the rior distribution f(data θ) = L(θ) the likelihood f(θ data) the osterior distribution So Bayes Theorem says Slogan: (osterior) (likelihood) (rior) 15 Back to 2016 Ohio re-election oll The likelihoodis the same as before: L() k (1-) n-k We need a rior distribution. One good choice is a beta distribution, with Some grahs of beta densities aear on the next slide 16

9 Some Beta Densities dbeta(,0.5,0.5) dbeta(,1,0.5) dbeta(,0.5,1) dbeta(,1,1) dbeta(,0.5,2) dbeta(,1,2) dbeta(,0.5,4) dbeta(,1,4) dbeta(,2,0.5) 0 2 dbeta(,2,1) dbeta(,2,2) dbeta(,4,0.5) dbeta(,4,1) dbeta(,4,2) dbeta(,4,4) dbeta(,2,4) 17 Some Beta Densities uniform distribution! dbeta(,0.5,0.5) dbeta(,0.5,1) dbeta(,0.5,2) dbeta(,1,0.5) dbeta(,1,1) dbeta(,1,2) dbeta(,0.5,4) dbeta(,2,0.5) 0 2 dbeta(,2,1) dbeta(,2,2) dbeta(,1,4) dbeta(,4,0.5) dbeta(,4,1) dbeta(,4,2) dbeta(,4,4) dbeta(,2,4) 18

10 Choosing rior arameters The likelihoodis the same as before: L() k (1-) n-k = 208 (1-) 193 The rior distribution is a beta distribution α= 1, β= 1 gives a uniform distribution no reference for one over another! Suose that in a revious oll, 942 refer Trum and 1008 refer Clinton. Could set α=942, β= If α=1 and β=1 (osterior) (likelihood) (rior): Since f( data)=l(), osterior mode = MLE = 208/401 = 0.52 Since f( data) is a beta with α=209, β=194, E[ data] = 209/403 20

11 If α=942, β=1008 (osterior) (likelihood) (rior): shrinkage : osterior between rior & likelihood Since f( data) = beta(,1150,1202), E[ data] = 1150/2352 = vs MLE= Standard Errors (α=942, β=1008) Since then (comare to SE=0.018 from MLE ) Arox 95% interval from : (0.47, 0.51) still can t decide 22

12 Alternative interval for Since we know the osterior distribution of, we can calculate the 2.5%-ileand 97.5%-ileand get another 95% interval: > nsim < > <- rbeta(nsim,1150,1202) > quantile(,c(0.025,.5,.975)) 2.5% 50% 97.5% Gives almost the same 95% interval: (0.47, 0.51)... still can t decide Summary For MLE Need a function roortional to L(θ) Calculate MLE by setting 0 = LL (θ) Calculate where I(θ) = E[-LL (θ)] For Bayes Need a function roortional to L(θ) Need a rior distribution Calculate (osterior) (likelihood) (rior) Calculate osterior mean, SE Use formula if you have one Use simulation if you don t! 24

13 What we did 2016 Pre-election oll in Ohio Binomial and Bernoulli MLE Bayes Rule Bayes for densities Bayesian inference 2016 Pre-election oll in Ohio Summary! 25

Lecture 23 Maximum Likelihood Estimation and Bayesian Inference

Lecture 23 Maximum Likelihood Estimation and Bayesian Inference Thais Paiva STA 111 - Summer 2013 Term II August 7, 2013 1 / 31 Thais Paiva STA 111 - Summer 2013 Term II Lecture 23, 08/07/2013 Lecture