
Jonathan Wilcox

Weakness of Beta priors (or conjugate priors in general)

They can only represent a limited range of prior beliefs. For example:
- There are no bimodal beta distributions (except when the modes are at 0 and 1).
- There are no beta distributions which are roughly constant on (.3, .7) and drop off rapidly to zero outside this range.
- etc.

Use of arbitrary priors

Requires numerical integration (or Monte Carlo). For example,

$$E(\theta \mid x) = \int \theta\, \pi(\theta \mid x)\, d\theta = \frac{\int \theta\, f(x \mid \theta)\, \pi(\theta)\, d\theta}{\int f(x \mid \theta)\, \pi(\theta)\, d\theta}.$$

For $E(h(\theta) \mid x)$, replace $\theta$ by $h(\theta)$ above. To compute $\mathrm{Var}(\theta \mid x) = E(\theta^2 \mid x) - [E(\theta \mid x)]^2$, we need $E(\theta^2 \mid x)$.

When $\theta$ is high-dimensional, Monte Carlo (as in WinBUGS) is often easier than numerical integration.

Mixtures of Beta priors (or conjugate priors in general)

Allows representation of more general prior beliefs while retaining some of the convenience of conjugate priors.
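As a sketch of the numerical-integration route for a prior that no Beta distribution can represent, the Python below approximates $E(\theta \mid x)$ and $\mathrm{Var}(\theta \mid x)$ for a coin with $t$ heads in $n$ tosses using the midpoint rule on a grid over (0, 1). The prior `flat_middle_prior` (about 1 on (.3, .7), decaying quickly outside), its decay width 0.02, and the grid size are illustrative assumptions, not taken from the notes.

```python
import math


def posterior_moments(prior, t, n, grid_size=10_000):
    """Approximate E(theta | x) and Var(theta | x) after t heads in n tosses.

    Uses the midpoint rule on (0, 1) for the integrals of f(x | theta) pi(theta),
    theta f(x | theta) pi(theta), and theta^2 f(x | theta) pi(theta).
    The binomial coefficient cancels in the ratios, so it is left out, and
    `prior` need not be normalized.
    """
    h = 1.0 / grid_size
    z = m1 = m2 = 0.0
    for k in range(grid_size):
        theta = (k + 0.5) * h                            # midpoint of the k-th cell
        w = theta**t * (1.0 - theta) ** (n - t) * prior(theta) * h
        z += w
        m1 += theta * w
        m2 += theta * theta * w
    mean = m1 / z
    var = m2 / z - mean**2                               # E(theta^2|x) - [E(theta|x)]^2
    return mean, var


def flat_middle_prior(theta):
    """Hypothetical unnormalized prior: equal to 1 on (.3, .7), decaying fast outside."""
    if 0.3 <= theta <= 0.7:
        return 1.0
    d = min(abs(theta - 0.3), abs(theta - 0.7))         # distance to the nearer edge
    return math.exp(-((d / 0.02) ** 2))


mean, var = posterior_moments(flat_middle_prior, t=14, n=20)
print(f"E(theta|x) = {mean:.4f}, Var(theta|x) = {var:.5f}")
```

Passing `lambda theta: 1.0` (a uniform, that is Beta(1, 1), prior) should reproduce the conjugate answer $(t+1)/(n+2)$, which makes a convenient sanity check of the quadrature.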

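For example, for Bayesian estimation of $\theta = P(\text{heads})$ after observing $t$ heads in $n$ tosses: if

$$\pi(\theta) = p_1 f_1(\theta) + p_2 f_2(\theta),$$

where $f_1 \sim \mathrm{Beta}(\alpha_1, \beta_1)$, $f_2 \sim \mathrm{Beta}(\alpha_2, \beta_2)$, $p_1, p_2 > 0$ and $p_1 + p_2 = 1$, then

$$\pi(\theta \mid x) = p_1^* f_1^*(\theta) + p_2^* f_2^*(\theta),$$

where $f_1^* \sim \mathrm{Beta}(\alpha_1 + t, \beta_1 + n - t)$, $f_2^* \sim \mathrm{Beta}(\alpha_2 + t, \beta_2 + n - t)$, $p_1^*, p_2^* > 0$ and $p_1^* + p_2^* = 1$, with $p_1^*, p_2^*$ being functions of $t$ (see below). The posterior weights are

$$p_i^* = \frac{\psi_i}{\psi_1 + \psi_2}, \qquad \text{where } \psi_i = \frac{p_i\, B(\alpha_i + t, \beta_i + n - t)}{B(\alpha_i, \beta_i)},$$

with $B(\alpha, \beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$.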
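The weight formula above can be sketched in Python as follows. Two choices here are assumptions beyond the notes: the computation is done on the log scale with `math.lgamma` so that $\Gamma$ does not overflow for large parameters, and the function accepts any number of components (the notes use two). The names `mixture_posterior` and `log_beta` are illustrative.

```python
import math


def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def mixture_posterior(weights, params, t, n):
    """Update a mixture-of-Beta prior after t heads in n tosses.

    weights: prior mixture weights p_i (positive, summing to 1)
    params:  list of (alpha_i, beta_i), one per Beta component
    Returns (posterior weights p_i*, posterior (alpha, beta) per component).
    """
    # log psi_i = log p_i + log B(alpha_i + t, beta_i + n - t) - log B(alpha_i, beta_i)
    log_psi = [
        math.log(p) + log_beta(a + t, b + n - t) - log_beta(a, b)
        for p, (a, b) in zip(weights, params)
    ]
    top = max(log_psi)                       # subtract the max before exp to avoid underflow
    psi = [math.exp(lp - top) for lp in log_psi]
    total = sum(psi)
    post_weights = [s / total for s in psi]  # p_i* = psi_i / sum_j psi_j
    post_params = [(a + t, b + n - t) for a, b in params]
    return post_weights, post_params


# A bimodal prior (hypothetical components) after 14 heads in 20 tosses
w, comps = mixture_posterior([0.5, 0.5], [(20, 5), (5, 20)], t=14, n=20)
print(w, comps)
```

Each component is updated exactly as in the single-Beta case; only the weights need the extra $\psi_i$ computation.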

Example: Suppose $\lambda \sim \pi$ and, conditional on $\lambda$, $X_1, \ldots, X_n$ are iid Poisson($\lambda$). What is the family of conjugate priors in this situation?

Examine the likelihood function and see what priors fit well with it: multiplying likelihood times prior should produce something in the same family (ignoring constants).

$$L(\lambda \mid x) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum x_i} e^{-n\lambda}}{\prod x_i!} \propto \lambda^{t} e^{-n\lambda}, \qquad \text{where } t = \sum_{i=1}^{n} x_i,$$

which has the general form $\lambda^{a} e^{-b\lambda}$, which is the kernel of a gamma density (as a function of $\lambda$).

Textbook parameterization of the Gamma density:

$$f(x \mid \alpha, \beta) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\Gamma(\alpha)} \propto x^{\alpha-1} e^{-x/\beta}.$$

Another common parameterization (call it Gamma($a$, $b$)) substitutes $b = 1/\beta$:

$$f(x \mid a, b) = \frac{b^{a} x^{a-1} e^{-bx}}{\Gamma(a)} \propto x^{a-1} e^{-bx}.$$

(Substitute $x \to \lambda$ in both pdfs.)

Suppose the prior $\pi$ is Gamma($a$, $b$). (Using Gamma($a$, $b$) leads to a somewhat simpler updating rule.) Then

$$\pi(\lambda \mid x) \propto L(\lambda \mid x)\,\pi(\lambda) \propto \lambda^{t} e^{-n\lambda}\, \lambda^{a-1} e^{-b\lambda} = \lambda^{(a+t)-1} e^{-(b+n)\lambda} \propto \text{Gamma}(a+t,\, b+n) \text{ pdf},$$

so that the posterior distribution is Gamma($a+t$, $b+n$). The Gamma family is closed under sampling and forms a conjugate family.
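A small Python sketch of the Gamma-Poisson update, using the Gamma($a$, $b$) rate parameterization from above. It also checks the derivation numerically: if the posterior really is Gamma($a+t$, $b+n$), then $\log L + \log \pi - \log \pi(\lambda \mid x)$ must be the same constant for every $\lambda$. The counts `[3, 1, 4]` and the prior Gamma(2, 1) are hypothetical values chosen for illustration.

```python
import math


def gamma_poisson_update(a, b, xs):
    """Gamma(a, b) prior (rate b) + iid Poisson data xs -> Gamma(a + t, b + n) posterior."""
    t = sum(xs)
    n = len(xs)
    return a + t, b + n


def gamma_logpdf(lam, a, b):
    """log of b^a lam^(a-1) e^(-b lam) / Gamma(a), the Gamma(a, b) density."""
    return a * math.log(b) + (a - 1) * math.log(lam) - b * lam - math.lgamma(a)


def poisson_loglik(lam, xs):
    """log L(lam | x) = sum_i [x_i log lam - lam - log x_i!]."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)


xs = [3, 1, 4]                                   # hypothetical counts
a_post, b_post = gamma_poisson_update(2.0, 1.0, xs)
# log-likelihood + log-prior - log-posterior should not depend on lam
for lam in (0.5, 2.0, 7.0):
    gap = (poisson_loglik(lam, xs) + gamma_logpdf(lam, 2.0, 1.0)
           - gamma_logpdf(lam, a_post, b_post))
    print(f"lam = {lam}: gap = {gap:.10f}")
```

With the rate parameterization the update is just "add the total count to $a$ and the number of observations to $b$", which is the simpler updating rule the notes mention.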

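Bayesian Estimation and Sufficient Statistics

Suppose $T(X)$ is a sufficient statistic for $\theta$.

Fact: Posterior distributions depend on the data $X$ only through the sufficient statistic $T(X)$.

Corollary: $E(\theta \mid X)$ and $\mathrm{Var}(\theta \mid X)$ (and any other posterior quantity) depend on the data $X$ only through $T(X)$.

Proof: By the factorization criterion (FC), there exist $g, h$ such that $f(x \mid \theta) = g(T(x), \theta)\, h(x)$ for all $x$ and $\theta$. Thus

$$\pi(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta) = g(T(x), \theta)\, h(x)\, \pi(\theta) \propto g(T(x), \theta)\, \pi(\theta),$$

so that

$$\pi(\theta \mid x) = \frac{g(T(x), \theta)\, \pi(\theta)}{\int_{\Theta} g(T(x), \theta')\, \pi(\theta')\, d\theta'},$$

which depends on $x$ only through $T(x)$.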
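A quick numerical illustration of the Fact, assuming iid Bernoulli($\theta$) tosses (so $T(X) = \sum X_i$ and $g(t, \theta) = \theta^t (1-\theta)^{n-t}$, $h(x) = 1$). The posterior is computed on a grid from the full 0/1 sequence rather than from $t$, so the check is not circular. The prior `bump_prior` and the three sequences are hypothetical.

```python
import math


def grid_posterior(xs, prior, grid_size=200):
    """Normalized posterior on a theta-grid for iid Bernoulli(theta) data xs.

    The likelihood is built from the whole 0/1 sequence, not from t = sum(xs).
    """
    thetas = [(k + 0.5) / grid_size for k in range(grid_size)]
    unnorm = [
        math.prod(th if x == 1 else 1.0 - th for x in xs) * prior(th)
        for th in thetas
    ]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def bump_prior(th):
    """Hypothetical non-conjugate prior centered at 0.5."""
    return math.exp(-(((th - 0.5) / 0.1) ** 2))


post_a = grid_posterior([1, 1, 0, 0, 1], bump_prior)   # T = 3
post_b = grid_posterior([0, 1, 1, 1, 0], bump_prior)   # T = 3, different order
post_c = grid_posterior([1, 1, 1, 1, 0], bump_prior)   # T = 4
print(max(abs(u - v) for u, v in zip(post_a, post_b)))  # essentially 0: same T
print(max(abs(u - v) for u, v in zip(post_a, post_c)))  # clearly nonzero: different T
```

Any posterior summary (mean, variance, quantiles) computed from `post_a` and `post_b` therefore agrees, as the Corollary says.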

A Simple Bayesian Hierarchical Model

Situation: 10 coins are sampled from a population of coins. Each coin is tossed 20 times. We desire to estimate $p_1, p_2, \ldots, p_{10}$, the probability of heads for each coin. Let $X_i$ be the number of heads in 20 tosses for coin $i$. A Bayesian can incorporate prior knowledge about the similarity of the coins into the prior distribution.

Bayesian Model ($i = 1, \ldots, 10$ throughout):

$$X_i \sim \mathrm{Binomial}(p_i, 20), \qquad \text{where } p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}} \text{ and } \eta_i = \log\left(\frac{p_i}{1 - p_i}\right),$$
$$\eta_i \sim \mathrm{Normal}(\mu, 1/\tau),$$
$$\mu \sim \mathrm{Normal}(0, 1), \qquad \tau \sim \mathrm{Gamma}(10, b).$$

The random variables at each level are conditionally independent given the random variables at lower levels. Let $X = (X_1, X_2, \ldots, X_{10})$ and $\theta = (\eta_1, \eta_2, \ldots, \eta_{10}, \mu, \tau)$. The posterior is

$$\pi(\theta \mid X) \propto \prod_{i=1}^{10} \binom{20}{X_i} p_i^{X_i} (1 - p_i)^{20 - X_i}\; \prod_{i=1}^{10} g(\eta_i \mid \mu, \tau)\; h(\mu)\, k(\tau),$$

where $g(\cdot \mid \mu, \tau)$ is the $N(\mu, 1/\tau)$ density, $h(\cdot)$ is the $N(0, 1)$ density, and $k(\cdot)$ is the Gamma($10$, $b$) density.

Note: Gamma($a$, $b$) has mean $a/b$ and variance $a/b^2$.
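To make the level-by-level structure concrete, the Python below draws one synthetic dataset from the model top down: first $\mu$ and $\tau$, then each $\eta_i$ given them, then each $X_i$ given $\eta_i$. The function name `simulate_coins` and the seed are illustrative. Note that Python's `random.gammavariate` takes a shape and a *scale*, so the rate $b$ is passed as scale $1/b$.

```python
import math
import random


def simulate_coins(b, n_coins=10, n_tosses=20, seed=None):
    """Draw (mu, tau, p, x) from the hierarchical model, top level first.

    mu ~ N(0, 1); tau ~ Gamma(10, b) with rate b; eta_i ~ N(mu, 1/tau);
    p_i = e^eta_i / (1 + e^eta_i); X_i ~ Binomial(p_i, n_tosses).
    """
    rng = random.Random(seed)
    mu = rng.gauss(0.0, 1.0)
    tau = rng.gammavariate(10.0, 1.0 / b)               # shape 10, scale 1/b
    sd = 1.0 / math.sqrt(tau)                           # variance 1/tau -> sd 1/sqrt(tau)
    etas = [rng.gauss(mu, sd) for _ in range(n_coins)]  # independent given mu, tau
    ps = [1.0 / (1.0 + math.exp(-eta)) for eta in etas] # same as e^eta / (1 + e^eta)
    # each X_i is a sum of n_tosses Bernoulli(p_i) draws, independent given p_i
    xs = [sum(rng.random() < p for _ in range(n_tosses)) for p in ps]
    return mu, tau, ps, xs


mu, tau, ps, xs = simulate_coins(b=2.0, seed=1)
print(f"mu = {mu:.3f}, tau = {tau:.3f}, x = {xs}")
```

Since the prior mean of $\tau$ is $10/b$, running this with $b = 10^{-3}$ gives nearly identical $\eta_i$ (coins that look alike), while $b = 10^4$ gives nearly unrelated $\eta_i$.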

BUGS code describing the model:

```
model {
  for (i in 1:N) {
    x[i] ~ dbin(p[i], 20)
    logit(p[i]) <- eta[i]
    eta[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0.0, 1.0)
  tau ~ dgamma(10.0, xxx)   # Changing the scale only.
}
```

Data: list(N=10, x=c(14,8,11,14,11,11,8,10,12,8))
Inits: list(mu=0, tau=1, eta=c(0,0,0,0,0,0,0,0,0,0))

Story: There are 10 coins. Each is tossed 20 times. The data x[1] = 14, x[2] = 8, ..., x[9] = 12, x[10] = 8 are the numbers of heads observed on each coin.

Goal: Estimate p[1], ..., p[10], the probability of heads for each coin.

Frequentist Answers

Assuming all p[i] are equal leads to:
phat = sum(x)/200 = 107/200 = 0.535
std. dev. of phat = sqrt(.535*(1-.535)/200) ≈ 0.0353

Estimating each p[i] separately leads to:
p[1]hat = 14/20 = 0.7
p[2]hat = 8/20 = 0.4
std. dev. of p[1]hat = sqrt(.7*(1-.7)/20) ≈ 0.1025
std. dev. of p[2]hat = sqrt(.4*(1-.4)/20) ≈ 0.1095

Compare with the answers below.
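The two frequentist answers (complete pooling versus estimating each coin separately) can be reproduced with a few lines of Python; the variable names are illustrative.

```python
import math

x = [14, 8, 11, 14, 11, 11, 8, 10, 12, 8]   # heads for each coin (from the Data list)
n_tosses = 20

# Complete pooling: assume all p[i] are equal
n_total = n_tosses * len(x)                  # 200 tosses in all
phat = sum(x) / n_total
sd_phat = math.sqrt(phat * (1 - phat) / n_total)

# No pooling: estimate each p[i] from its own 20 tosses
phat_i = [xi / n_tosses for xi in x]
sd_phat_i = [math.sqrt(p * (1 - p) / n_tosses) for p in phat_i]

print(f"pooled: phat = {phat:.3f}, sd = {sd_phat:.4f}")
for i in (0, 1):
    print(f"p[{i + 1}]hat = {phat_i[i]:.2f}, sd = {sd_phat_i[i]:.4f}")
```

The pooled estimate has a much smaller standard deviation but ignores any real differences between coins; the hierarchical model lets the data decide how much to pool.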

The likelihood function:

$$L(\theta) = f(x \mid \theta) = \prod_{i=1}^{10} \binom{20}{X_i} p_i^{X_i} (1 - p_i)^{20 - X_i}, \qquad \text{where } p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}.$$

The prior:

$$\pi(\theta) = \prod_{i=1}^{10} g(\eta_i \mid \mu, \tau)\; h(\mu)\, k(\tau),$$

where $g(\cdot \mid \mu, \tau)$ is the $N(\mu, 1/\tau)$ density, $h(\cdot)$ is the $N(0, 1)$ density, and $k(\cdot)$ is the Gamma($10$, $b$) density.

The posterior:

$$\pi(\theta \mid X) \propto f(x \mid \theta)\, \pi(\theta).$$

The Bayesian's belief after collecting the data is given by their posterior distribution. BUGS uses MCMC (Markov chain Monte Carlo) to generate a sample of many thousands (as many as you like) of $\theta$ vectors from the posterior distribution. Using this sample from the posterior, BUGS can print summary information for each component of $\theta = (\eta_1, \eta_2, \ldots, \eta_{10}, \mu, \tau)$, giving Monte Carlo estimates of the posterior mean, posterior variance, posterior median, etc. BUGS can also print summary information about the posterior distribution of other quantities such as $p_1, p_2, \ldots, p_{10}$.

The output below summarizes the posterior distributions of three different Bayesians, each with a different prior. The three priors differ only in the value of $b$, which takes on the three values $b = 10^{-3}, 10^{4}, 2$.
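To show what "generate a sample from the posterior" involves, here is a minimal MCMC sketch in Python for the same model with $b = 2$. It uses plain random-walk Metropolis on the whole vector $\theta$, which is a swapped-in technique: BUGS chooses its own updating schemes and does not work this way internally. The step size 0.1, run length, burn-in, and seed are untuned assumptions chosen for $b = 2$; for $b = 10^{-3}$ or $10^4$, where $\tau$ lives on a very different scale, the step size would likely need changing. No convergence diagnostics are included.

```python
import math
import random

X = [14, 8, 11, 14, 11, 11, 8, 10, 12, 8]   # heads per coin, from the Data list
N_TOSSES = 20


def log_post(theta, b):
    """Unnormalized log posterior; theta = [eta_1, ..., eta_10, mu, tau]."""
    *eta, mu, tau = theta
    if tau <= 0.0:
        return -math.inf                       # outside the support of Gamma(10, b)
    lp = 0.0
    for x, e in zip(X, eta):
        # x log p + (20 - x) log(1 - p) with p = e^eta / (1 + e^eta)
        # simplifies to x*eta - 20*log(1 + e^eta); log C(20, x) is a constant.
        lp += x * e - N_TOSSES * math.log1p(math.exp(e))
        lp += 0.5 * math.log(tau) - 0.5 * tau * (e - mu) ** 2   # g(eta_i | mu, tau)
    lp += -0.5 * mu**2                         # h(mu): N(0, 1)
    lp += 9.0 * math.log(tau) - b * tau        # k(tau): Gamma(10, b), (10 - 1) log tau
    return lp


def metropolis(b, n_iter=40_000, burn_in=5_000, step=0.1, seed=0):
    """Random-walk Metropolis: perturb every coordinate by N(0, step^2) noise."""
    rng = random.Random(seed)
    theta = [0.0] * len(X) + [0.0, 1.0]        # eta = 0, mu = 0, tau = 1 (the BUGS Inits)
    cur = log_post(theta, b)
    draws = []
    for it in range(n_iter):
        prop = [v + rng.gauss(0.0, step) for v in theta]
        new = log_post(prop, b)
        # Symmetric proposal, so accept with probability min(1, exp(new - cur)).
        if rng.random() < math.exp(min(0.0, new - cur)):
            theta, cur = prop, new
        if it >= burn_in:
            draws.append(theta)                # theta is never mutated in place
    return draws


def summarize_p(draws, i):
    """Monte Carlo posterior mean and sd of p_i = e^eta_i / (1 + e^eta_i)."""
    ps = [1.0 / (1.0 + math.exp(-d[i])) for d in draws]
    mean = sum(ps) / len(ps)
    sd = math.sqrt(sum((p - mean) ** 2 for p in ps) / (len(ps) - 1))
    return mean, sd


draws = metropolis(b=2.0)
for i in (0, 1):
    m, s = summarize_p(draws, i)
    print(f"p[{i + 1}]: posterior mean {m:.3f}, sd {s:.3f}")
```

Summaries of $p_i$ come for free from the $\eta_i$ draws, which mirrors how BUGS reports quantities such as p[1] and p[2] that are functions of $\theta$. The draws are autocorrelated, so the effective sample size is smaller than the number of stored draws.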

[Table: BUGS output giving posterior summaries (mean, sd, MC error, 2.5%, median, 97.5%) for the nodes mu, tau, p[1], and p[2], under each of the three priors tau ~ dgamma(10.0, .001), tau ~ dgamma(10.0, 10000), and tau ~ dgamma(10.0, 2.0).]
