MAS3301 Bayesian Statistics Problems 5 and Solutions


Semester 2, 2008-9

Problems 5

1. (Some of this question is also in Problems 4). I recorded the attendance of students at tutorials for a module. Suppose that we can, in some sense, regard the students as a sample from some population of students so that, for example, we can learn about the likely behaviour of next year's students by observing this year's. At the time I recorded the data we had had tutorials in Week 2 and Week 4. Let the probability that a student attends in both weeks be θ_11, the probability that a student attends in Week 2 but not Week 4 be θ_10 and so on. The data are as follows.

   Attendance              Probability   Observed frequency
   Week 2 and Week 4       θ_11          n_11 = 25
   Week 2 but not Week 4   θ_10          n_10 = 7
   Week 4 but not Week 2   θ_01          n_01 = 6
   Neither week            θ_00          n_00 = 13

Suppose that the prior distribution for (θ_11, θ_10, θ_01, θ_00) is a Dirichlet distribution with density proportional to θ_11³ θ_10 θ_01 θ_00².

(a) Find the prior means and prior variances of θ_11, θ_10, θ_01, θ_00.

(b) Find the posterior distribution.

(c) Find the posterior means and posterior variances of θ_11, θ_10, θ_01, θ_00.

(d) Using the R function hpdbeta which may be obtained from the Web page (or otherwise), find a 95% posterior hpd interval, based on the exact posterior distribution, for θ_00.

(e) Find an approximate 95% hpd interval for θ_00 using a normal approximation based on the posterior mode and the partial second derivatives of the log posterior density. Compare this with the exact hpd interval. Hint: To find the posterior mode you will need to introduce a Lagrange multiplier.

(f) The population mean number of attendances out of two is µ = 2θ_11 + θ_10 + θ_01. Find the posterior mean of µ and an approximation to the posterior standard deviation of µ.

2. Samples are taken from twenty wagonloads of an industrial mineral and analysed. The amounts in ppm (parts per million) of an impurity are found to be as follows. [Data omitted.] We regard these as independent samples from a normal distribution with mean µ and variance σ² = τ⁻¹. Find a 95% posterior hpd interval for µ under each of the following two conditions.

(a) The value of τ is known to be 0.1 and our prior distribution for µ is normal with mean 60.0 and standard deviation 20.0.

(b) The value of τ is unknown. Our prior distribution for τ is a gamma distribution with mean 0.1 and standard deviation 0.05. Our conditional prior distribution for µ given τ is normal with mean 60.0 and precision 0.05τ (that is, standard deviation (0.05τ)^{-1/2}).

3. We observe a sample of 30 observations from a normal distribution with mean µ and precision τ. The data, y_1, ..., y_30, are such that Σ y_i = 72 and Σ y_i² = 267.

(a) Suppose that the value of τ is known to be 0.04 and that our prior distribution for µ is normal with mean 0 and variance 100. Find the posterior distribution of µ and evaluate a posterior 95% hpd interval for µ.

(b) Suppose that we have a gamma(1, 10) prior distribution for τ and our conditional prior distribution for µ given τ is normal with mean 0 and variance (0.1τ)⁻¹. Find the marginal posterior distribution for τ, the marginal posterior distribution for µ and the marginal posterior 95% hpd interval for µ.

4. The following data come from the experiment reported by MacGregor et al. (1979). They give the supine systolic blood pressures (mm Hg) for fifteen patients with moderate essential hypertension. The measurements were taken immediately before and two hours after taking a drug.

   Patient   Before   After        Patient   Before   After
   1         210      201          9         173      147
   2         169      165          10        146      136
   3         187      166          11        174      151
   4         160      157          12        201      168
   5         167      147          13        198      179
   6         176      145          14        148      129
   7         185      168          15        154      131
   8         206      180

We are interested in the effect of the drug on blood pressure. We assume that, given parameters µ, τ, the changes in blood pressure, from before to after, in the n patients are independent and normally distributed with unknown mean µ and unknown precision τ. The fifteen differences are as follows.

   -9  -4  -21  -3  -20  -31  -17  -26  -26  -10  -23  -33  -19  -19  -23

Our prior distribution for τ is a gamma(0.35, 1.01) distribution. Our conditional prior distribution for µ given τ is a normal N(0, [0.003τ]⁻¹) distribution.

(a) Find the marginal posterior distribution of τ.

(b) Find the marginal posterior distribution of µ.

(c) Find the marginal posterior 95% hpd interval for µ.

(d) Comment on what you can conclude about the effect of the drug.

5. The lifetimes of certain components are supposed to follow a Weibull distribution with known shape parameter α = 2. The probability density function of the lifetime distribution is

   f(t) = αρ²t exp[-(ρt)²]

for 0 < t < ∞. We will observe a sample of n such lifetimes where n is large.

(a) Assuming that the prior density is nonzero and reasonably flat so that it may be disregarded, find an approximation to the posterior distribution of ρ. Find an approximate 95% hpd interval for ρ when n = 300, Σ log(t_i) = … and Σ t_i² = ….

(b) Assuming that the prior distribution is a gamma(a, b) distribution, find an approximate 95% hpd interval for ρ, taking into account this prior, when a = 2, b = 100, n = 300, Σ log(t_i) = … and Σ t_i² = ….

6. Given the value of λ, the number X_i of transactions made by customer i at an online store in a year has a Poisson(λ) distribution, with X_i independent of X_j for i ≠ j. The value of λ is unknown. Our prior distribution for λ is a gamma(5,1) distribution. We observe the numbers of transactions in a year for 45 customers and Σ x_i = 182.

(a) Using a χ² table (i.e. without a computer) find the lower 2.5% point and the upper 2.5% point of the prior distribution of λ. (These bound a 95% symmetric prior credible interval).

(b) Find the posterior distribution of λ.

(c) Using a normal approximation to the posterior distribution, based on the posterior mean and variance, find a 95% symmetric posterior credible interval for λ.

(d) Find an expression for the posterior predictive probability that a customer makes m transactions in a year.

(e) As well as these ordinary customers, we believe that there is a second group of individuals. The number of transactions in a year for a member of this second group has, given θ, a Poisson(θ) distribution and our beliefs about the value of θ are represented by a gamma(1,0.05) distribution. A new individual is observed who makes 10 transactions in a year. Given that our prior probability that this is an ordinary customer is 0.9, find our posterior probability that this is an ordinary customer.

Hint: You may find it best to calculate the logarithms of the predictive probabilities before exponentiating these. For this you might find the R function lgamma useful. It calculates the log of the gamma function. Alternatively it is possible to do the calculation using the R function dnbinom. (N.B. In reality a slightly more complicated model is used in this type of application).

7. The following data give the heights in cm of 25 ten-year-old children. [Data omitted.] We assume that, given the values of µ and τ, these are independent observations from a normal distribution with mean µ and variance τ⁻¹.

(a) Assuming that the value of τ⁻¹ is known to be 64 and our prior distribution for µ is normal with mean 55 and standard deviation 5, find a 95% hpd interval for the height in cm of another ten-year-old child drawn from the same population.

(b) Assume now that the value of τ is unknown but we have a prior distribution for it which is a gamma(2,18) distribution and our conditional prior distribution for µ given τ is normal with mean 55 and variance (2.56τ)⁻¹. Find a 95% hpd interval for the height in cm of another ten-year-old child drawn from the same population.
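The calculation needed for Question 7(a) can be organised in R using the standard conjugate-update formulae. This is a sketch only, not part of the original sheet: the function name is ours and y stands for the vector of 25 heights.

> pred.interval <- function(y) {
+   P0 <- 1/5^2                          # prior precision for mu
+   tau <- 1/64                          # known data precision
+   P1 <- P0 + length(y)*tau             # posterior precision for mu
+   M1 <- (P0*55 + tau*sum(y))/P1        # posterior mean for mu
+   M1 + c(-1,1)*1.96*sqrt(1/P1 + 64)    # 95% interval for a new child
+ }

The predictive variance 1/P1 + 64 adds the posterior uncertainty about µ to the sampling variance of a new observation.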

8. A random sample of n = 1000 people was chosen from a large population. Each person was asked whether they approved of a proposed new law. The number answering "Yes" was x = 37. (For the purpose of this exercise all other responses and non-responses are treated as simply "Not Yes"). Assume that x is an observation from the binomial(n, p) distribution where p is the unknown proportion of people in the population who would answer "Yes". Our prior distribution for p is a uniform distribution on (0, 1). Let p = Φ(θ) so θ = Φ⁻¹(p) where Φ(y) is the standard normal distribution function and Φ⁻¹(z) is its inverse.

(a) Find the maximum likelihood estimate of p and hence find the maximum likelihood estimate of θ.

(b) Disregarding the prior distribution, find a large-sample approximation to the posterior distribution of θ.

(c) Using your approximate posterior distribution for θ, find an approximate 95% hpd interval for θ.

(d) Use the exact posterior distribution for p to find the actual posterior probability that θ is inside your approximate hpd interval.

Notes: The standard normal distribution function is

   Φ(x) = ∫_{-∞}^{x} φ(u) du   where   φ(u) = (2π)^{-1/2} exp{-u²/2}.

Let l be the log-likelihood. Then

   dl/dθ = (dl/dp)(dp/dθ)

and

   d²l/dθ² = (d/dθ){(dl/dp)(dp/dθ)}
           = ((d/dθ){dl/dp})(dp/dθ) + (dl/dp)(d²p/dθ²)
           = (d²l/dp²)(dp/dθ)² + (dl/dp)(d²p/dθ²).

Derivatives of p:

   dp/dθ = φ(θ)   and   d²p/dθ² = -θφ(θ).

You can evaluate Φ⁻¹(u) using R with qnorm(u,0,1) and φ(u) is given by dnorm(u,0,1).

9. The amounts of rice, by weight, in 20 nominally 500g packets are determined. The weights, in g, are as follows. [Data omitted.] Assume that, given the values of parameters µ, τ, the weights are independent and each has a normal N(µ, τ⁻¹) distribution. The values of µ and τ are unknown. Our prior distribution is as follows. We have a gamma(2, 9) prior distribution for τ and a N(500, (0.005τ)⁻¹) conditional prior distribution for µ given τ.

(a) Find the posterior probability that µ < 495.

(b) Find the posterior predictive probability that a new packet of rice will contain less than 500g of rice.

10. A machine which is used in a manufacturing process jams from time to time. It is thought that the frequency of jams might change over time as the machine becomes older. Once every three months the number of jams in a day is counted. The results are as follows.

   Observation i                 1    2    3    4    5    6    7    8
   Age of machine t_i (months)   3    6    9    12   15   18   21   24
   Number of jams y_i            10   13   24   17   20   22   20   23

Our model is as follows. Given the values of two parameters α, β, the number of jams y_i on a day when the machine has age t_i months has a Poisson distribution

   y_i ~ Poisson(λ_i)   where   log_e(λ_i) = α + βt_i.

Assume that the effect of our prior distribution on the posterior distribution is negligible and that large-sample approximations may be used.

(a) Let the values of α and β which maximise the likelihood be α̂ and β̂. Assuming that the likelihood is differentiable at its maximum, show that these satisfy the following two equations

   Σ_{i=1}^8 (λ̂_i - y_i) = 0   and   Σ_{i=1}^8 t_i(λ̂_i - y_i) = 0

where log_e(λ̂_i) = α̂ + β̂t_i, and show that these equations are satisfied (to a good approximation) by α̂ = 2.55 and β̂ = 0.0264. (You may use R to help with the calculations, but show your commands). You may assume from now on that these values maximise the likelihood.

(b) Find an approximate symmetric 95% posterior interval for α + 24β.

(c) Find an approximate symmetric 95% posterior interval for exp(α + 24β), the mean jam-rate per day at age 24 months. (You may use R to help with the calculations, but show your commands).

Homework 5

Solutions to Questions 9 and 10 of Problems 5 are to be submitted in the Homework Letterbox no later than 4.00pm on Tuesday May 5th.

Reference

MacGregor, G.A., Markandu, N.D., Roulston, J.E. and Jones, J.C., 1979. Essential hypertension: effect of an oral inhibitor of angiotensin-converting enzyme. British Medical Journal, 2, 1106-1109.

Solutions

1. (a) The prior distribution is Dirichlet(4,2,2,3). So A_0 = 4+2+2+3 = 11. The prior means are a_{0,i}/A_0. The prior variances are

   a_{0,i}(A_0 - a_{0,i}) / (A_0²(A_0 + 1)).

Prior means:

   θ_11: 4/11 = 0.3636
   θ_10: 2/11 = 0.1818
   θ_01: 2/11 = 0.1818
   θ_00: 3/11 = 0.2727

Prior variances:

   θ_11: (4 × 7)/(11² × 12) = 0.0193
   θ_10: (2 × 9)/(11² × 12) = 0.0124
   θ_01: (2 × 9)/(11² × 12) = 0.0124
   θ_00: (3 × 8)/(11² × 12) = 0.0165

NOTE: Suppose that we are given prior means m_1, ..., m_4 and one prior standard deviation s_1. Then a_{0,i} = m_i A_0 and

   s_1² = m_1 A_0 (A_0 - m_1 A_0) / (A_0²(A_0 + 1)) = m_1(1 - m_1)/(A_0 + 1).

Hence

   A_0 + 1 = m_1(1 - m_1)/s_1² = (0.3636 × 0.6364)/0.1389² = 12.

Hence A_0 = 11 and a_{0,i} = 11m_i. For example a_{0,1} = 11 × 0.3636 = 4.

(b) The posterior distribution is Dirichlet(4+25, 2+7, 2+6, 3+13). That is Dirichlet(29, 9, 8, 16).

(c) Now A_1 = 29+9+8+16 = 62. The posterior means are a_{1,i}/A_1. The posterior variances are

   a_{1,i}(A_1 - a_{1,i}) / (A_1²(A_1 + 1)).

Posterior means:

   θ_11: 29/62 = 0.4677
   θ_10: 9/62 = 0.1452
   θ_01: 8/62 = 0.1290
   θ_00: 16/62 = 0.2581

Posterior variances:

   θ_11: (29 × 33)/(62² × 63) = 0.00395
   θ_10: (9 × 53)/(62² × 63) = 0.00197
   θ_01: (8 × 54)/(62² × 63) = 0.00178
   θ_00: (16 × 46)/(62² × 63) = 0.00304

(d) The posterior distribution for θ_00 is beta(16, 62 - 16). That is beta(16,46). Using the R command hpdbeta(0.95,16,46) gives … < θ_00 < ….

(e) The log posterior density is (apart from a constant)

   Σ_{j=1}^4 (a_{1,j} - 1) log θ_j.

Add λ(Σ_{j=1}^4 θ_j - 1) to this and differentiate wrt θ_j, then set the derivative equal to zero. This gives

   (a_{1,j} - 1)/θ̂_j + λ = 0

which leads to

   θ̂_j = -(a_{1,j} - 1)/λ.

However Σ_{j=1}^4 θ_j = 1 so

   λ = -Σ_{j=1}^4 (a_{1,j} - 1)

and

   θ̂_j = (a_{1,j} - 1) / (Σ_k a_{1,k} - 4).

Hence the posterior mode for θ_00 is

   θ̂_00 = (16 - 1)/(62 - 4) = 15/58 = 0.2586.

The second derivatives of the log posterior density are

   ∂²l/∂θ_j² = -(a_{1,j} - 1)/θ_j²   and   ∂²l/(∂θ_j ∂θ_k) = 0.

Since the mixed partial second derivatives are zero, the information matrix is diagonal and the posterior variance of θ_j is approximately

   θ̂_j²/(a_{1,j} - 1) = (a_{1,j} - 1)²/[(a_{1,j} - 1)(Σ_k a_{1,k} - 4)²] = (a_{1,j} - 1)/(Σ_k a_{1,k} - 4)².

The posterior variance of θ_00 is approximately 15/58² = 0.00446. The approximate 95% hpd interval is 0.2586 ± 1.96√0.00446, that is

   0.128 < θ_00 < 0.389.

This is a little wider than the exact interval.

(f) Approximation based on posterior mode and curvature:

Posterior modes:

   θ_11: 28/58 = 0.4828
   θ_10: 8/58 = 0.1379
   θ_01: 7/58 = 0.1207

So, the approx. posterior mean of µ is 2 × 0.4828 + 0.1379 + 0.1207 = 1.224.

Approx. posterior variances:

   θ_11: 28/58² = 0.00832
   θ_10: 8/58² = 0.00238
   θ_01: 7/58² = 0.00208

Since the (approx.) covariances are all zero, the approx. posterior variance of µ is

   4 × 0.00832 + 0.00238 + 0.00208 = 0.0377

so the approx. standard deviation is √0.0377 = 0.194.

N.B. There is an alternative exact calculation, as follows, which is also acceptable.

Posterior mean:

   E(µ | data) = 2 × 29/62 + 9/62 + 8/62 = 75/62 = 1.210.

Posterior covariances:

   cov(θ_j, θ_k) = -a_{1,j} a_{1,k} / (A_1²(A_1 + 1))   (j ≠ k)

   var(µ) = 4var(θ_11) + var(θ_10) + var(θ_01) + 4cov(θ_11, θ_10) + 4cov(θ_11, θ_01) + 2cov(θ_10, θ_01)
          = 4(0.00395) + 0.00197 + 0.00178 + 4(-0.00108) + 4(-0.00096) + 2(-0.00030)
          = 0.0108.

So the standard deviation is 0.104. The difference is quite big!
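The function hpdbeta used in part (d) comes from the module web page, so it is not generally available. An equivalent interval can be found with standard R functions by searching for the lower tail probability at which the interval of content 0.95 has equal density at its two end points. A sketch, assuming a unimodal beta density (the function name is ours):

> hpd.beta <- function(p, a, b) {
+   f <- function(lo) {
+     l <- qbeta(lo, a, b)
+     u <- qbeta(lo + p, a, b)
+     dbeta(l, a, b) - dbeta(u, a, b)   # zero when end densities are equal
+   }
+   lo <- uniroot(f, c(1e-8, 1 - p - 1e-8))$root
+   c(qbeta(lo, a, b), qbeta(lo + p, a, b))
+ }
> hpd.beta(0.95, 16, 46)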

2. From the data

   ȳ = (1/n) Σ y_i = …   and   s_n² = (1/n) Σ (y_i - ȳ)² = (1/n) Σ y_i² - ȳ² = ….

(a) Prior mean: M_0 = 60.0
    Prior precision: P_0 = 1/20.0² = 0.0025
    Data precision: P_d = nτ = 20 × 0.1 = 2
    Posterior precision: P_1 = P_0 + P_d = 2.0025
    Posterior mean: M_1 = (P_0 M_0 + P_d ȳ)/P_1 = (0.0025 × 60.0 + 2ȳ)/2.0025 = …
    Posterior std. dev.: 2.0025^{-1/2} = 0.7067

95% hpd interval: M_1 ± 1.96 × 0.7067 = M_1 ± 1.385. That is … < µ < ….

(b) Prior: τ ~ gamma(d/2, dv/2) where

   (d/2)/(dv/2) = 1/v = 0.1   so   v = 10

and

   √(d/2)/(dv/2) = (1/v)√(2/d) = 0.05   so   √(2/d) = 0.5   so   2/d = 0.25   so   d = 8.

Hence

   d_0 = 8,  v_0 = 10,  c_0 = 0.05,  m_0 = 60.0,

   m_1 = (c_0 m_0 + nȳ)/(c_0 + n) = (0.05 × 60.0 + 20ȳ)/20.05 = …
   c_1 = c_0 + n = 20.05
   d_1 = d_0 + n = 28
   r = (1/n) Σ (y_i - m_0)² = (ȳ - m_0)² + s_n² = …
   v_d = (c_0 r + n s_n²)/(c_0 + n) = …
   v_1 = (d_0 v_0 + n v_d)/(d_0 + n) = …

95% hpd interval:

   m_1 ± t_28 √(v_1/c_1)

That is m_1 ± 2.048 √(v_1/c_1), giving … < µ < ….

3. (a) We have

   P_0 = 0.01
   P_d = nτ = 30 × 0.04 = 1.2
   P_1 = 0.01 + 1.2 = 1.21
   M_0 = 0,  ȳ = 2.4
   M_1 = (P_0 M_0 + P_d ȳ)/P_1 = (0 + 1.2 × 2.4)/1.21 = 2.380

Posterior: µ ~ N(2.380, 1.21⁻¹). That is µ ~ N(2.380, 0.8264). 95% hpd interval: 2.380 ± 1.96√0.8264. That is

   0.60 < µ < 4.16

(b) We have

   d_0 = 2,  d_1 = d_0 + n = 32
   v_0 = 10
   c_0 = 0.1,  c_1 = c_0 + n = 30.1
   m_0 = 0,  ȳ = 2.4
   m_1 = (c_0 m_0 + nȳ)/(c_0 + n) = (0 + 72)/30.1 = 2.392

   s_n² = (1/n) Σ y_i² - ((1/n) Σ y_i)² = 267/30 - 2.4² = 8.9 - 5.76 = 3.14

   (ȳ - m_0)² = (2.4 - 0)² = 5.76
   r = (ȳ - m_0)² + s_n² = 8.9
   v_d = (c_0 r + n s_n²)/(c_0 + n) = (0.1 × 8.9 + 30 × 3.14)/30.1 = 3.159
   v_1 = (d_0 v_0 + n v_d)/(d_0 + n) = (2 × 10 + 30 × 3.159)/32 = 3.587

Marginal posterior distribution for τ: d_1 v_1 τ = 114.8τ ~ χ²_32.

Marginal posterior distribution for µ:

   (µ - m_1)/√(v_1/c_1) = (µ - 2.392)/0.3452 ~ t_32

95% hpd interval:

   m_1 ± 2.037 √(v_1/c_1).

That is 2.392 ± 2.037 × 0.3452 = 2.392 ± 0.703. That is

   1.69 < µ < 3.10

4. Data: n = 15, Σy = -284, Σy² = 6518, ȳ = -284/15 = -18.93,

   s_n² = (1/15){6518 - 15 × (-18.93)²} = 1140.9/15 = 76.06

Calculate posterior:

   d_0 = 0.7
   v_0 = 2.02/0.7 = 2.8857
   c_0 = 0.003
   m_0 = 0
   d_1 = d_0 + n = 15.7
   (ȳ - m_0)² = ȳ² = 358.47
   r = (ȳ - m_0)² + s_n² = 434.53
   v_d = (c_0 r + n s_n²)/(c_0 + n) = (0.003 × 434.53 + 15 × 76.06)/15.003 = 76.13
   v_1 = (d_0 v_0 + n v_d)/(d_0 + n) = (0.7 × 2.8857 + 15 × 76.13)/15.7 = 72.87
   c_1 = c_0 + n = 15.003
   m_1 = (c_0 m_0 + nȳ)/(c_0 + n) = -284/15.003 = -18.93

(a) The marginal posterior distribution for τ is gamma(d_1/2, d_1 v_1/2). That is gamma(7.85, 572.0). (Alternatively d_1 v_1 τ ~ χ²_{d_1}. That is 1144.1τ ~ χ²_{15.7}).

(b) Marginal posterior for µ:

   (µ - m_1)/√(v_1/c_1) = (µ + 18.93)/2.204 ~ t_{15.7}

(c) We can use R for the critical points of t_{15.7}: qt(0.975,15.7) gives ±2.123.

95% interval: -18.93 ± 2.123 × 2.204. That is

   -23.61 < µ < -14.25

(d) Comment, e.g., since zero is well outside the 95% interval it seems clear that the drug reduces the blood pressure.
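The computations in Solutions 3(b) and 4 follow the same template, so they are easy to reproduce from the sufficient statistics. A minimal R sketch for Question 3(b), not part of the original solution:

> n <- 30; sy <- 72; syy <- 267
> ybar <- sy/n; snsq <- syy/n - ybar^2
> d0 <- 2; v0 <- 10; c0 <- 0.1; m0 <- 0
> c1 <- c0 + n; d1 <- d0 + n
> m1 <- (c0*m0 + sy)/c1
> r <- (ybar - m0)^2 + snsq
> vd <- (c0*r + n*snsq)/c1
> v1 <- (d0*v0 + n*vd)/d1
> m1 + c(-1,1)*qt(0.975, d1)*sqrt(v1/c1)   # 95% hpd interval for mu

R also handles the fractional degrees of freedom in Solution 4 directly, e.g. 1 - pt((0 + 18.93)/2.204, 15.7) gives the (tiny) posterior probability that µ ≥ 0, which is the basis of the comment in (d).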

5. (a) The likelihood is

   L = Π_{i=1}^n 2ρ²t_i exp[-(ρt_i)²] = 2ⁿ ρ^{2n} (Π t_i) exp[-ρ² Σ t_i²]

The log likelihood is

   l = n log 2 + 2n log ρ + Σ log(t_i) - ρ² Σ t_i².

So

   ∂l/∂ρ = 2n/ρ - 2ρ Σ t_i²

and, setting this equal to zero at the mode ρ̂, we find

   2n/ρ̂ = 2ρ̂ Σ t_i²   so   ρ̂² = n/Σ t_i²   so   ρ̂ = √(n/Σ t_i²) = ….

The second derivative is

   ∂²l/∂ρ² = -2n/ρ² - 2Σ t_i²

so the posterior variance is approximately

   1/(2n/ρ̂² + 2Σ t_i²) = 1/(2Σ t_i² + 2Σ t_i²) = 1/(4Σ t_i²) = ….

Our 95% hpd interval is therefore ρ̂ ± 1.96/√(4Σ t_i²). That is … < ρ < ….

(b) The prior density is proportional to ρ^{2-1}e^{-100ρ} so the log prior density is log ρ - 100ρ plus a constant. The log posterior is therefore g(ρ) plus a constant where

   g(ρ) = (2n + 1) log ρ - 100ρ - ρ² Σ t_i².

So

   ∂g/∂ρ = (2n + 1)/ρ - 100 - 2ρ Σ t_i².

Setting this equal to zero at the mode ρ̂ we find

   2Σ t_i² ρ̂² + 100ρ̂ - (2n + 1) = 0.

This quadratic equation has two solutions but one is negative and ρ must be positive so

   ρ̂ = [-100 + √(100² + 8Σ t_i²(2n + 1))] / (4Σ t_i²) = ….

The second derivative is

   ∂²g/∂ρ² = -(2n + 1)/ρ² - 2Σ t_i²

so the posterior variance is approximately

   1/[(2n + 1)/ρ̂² + 2Σ t_i²] = ….

Our 95% hpd interval is therefore ρ̂ ± 1.96 × …. That is … < ρ < …. So the prior makes no noticeable difference in this case.

6. (a) λ ~ gamma(5, 1) so 2λ ~ gamma(5, 1/2), i.e. gamma(10/2, 1/2), i.e. χ²_10. From tables, the 95% interval is 3.247 < 2λ < 20.48. That is

   1.62 < λ < 10.24

(b) The prior density is prop. to λ^{5-1}e^{-λ}. The likelihood is

   L = Π_{i=1}^{45} e^{-λ}λ^{x_i}/x_i! = e^{-45λ} λ^{Σx_i} / Π x_i! ∝ e^{-45λ} λ^{182}.

The posterior density is prop. to λ^{187-1}e^{-46λ}. This is a gamma(187, 46) distribution.

(c) Posterior mean: 187/46 = 4.065
    Posterior variance: 187/46² = 0.0884
    Posterior sd: √0.0884 = 0.297

95% interval: 4.065 ± 1.96 × 0.297. That is

   3.48 < λ < 4.65

(d) Joint prob. of λ, X = m:

   [46^{187}/Γ(187)] λ^{187-1} e^{-46λ} × e^{-λ}λ^m/m!
      = [46^{187} Γ(187 + m) / (Γ(187) m! 47^{187+m})] × [47^{187+m}/Γ(187 + m)] λ^{187+m-1} e^{-47λ}

Integrate out λ:

   Pr(X = m) = [Γ(187 + m)/(Γ(187) m!)] × 46^{187}/47^{187+m}
             = [(186 + m)!/(186! m!)] (46/47)^{187} (1/47)^m

(e) Joint probability (density) of θ, X = m:

   0.05e^{-0.05θ} × e^{-θ}θ^m/m! = [0.05 Γ(1 + m)/(m! 1.05^{m+1})] × [1.05^{m+1}/Γ(1 + m)] θ^{1+m-1} e^{-1.05θ}

Integrate out θ:

   Pr(X = m) = 0.05 Γ(1 + m)/(1.05^{m+1} m!) = (0.05/1.05)(1/1.05)^m

Log predictive probs:

Ordinary:

   log(P_1) = log[Γ(187 + 10)] - log[Γ(187)] - log[Γ(11)] + 187 log(46/47) + 10 log(1/47)
            = log[Γ(197)] - log[Γ(187)] - log[Γ(11)] + 187 log(46) - 197 log(47)
            = -5.0798

> lgamma(197) - lgamma(187) - lgamma(11) + 187*log(46) - 197*log(47)
[1] -5.0798

Type 2:

   log(P_2) = log(0.05/1.05) + 10 log(1/1.05) = log(0.05) - 11 log(1.05) = -3.5324

Hence the predictive probabilities are as follows.

   Ordinary: P_1 = exp(-5.0798) = 0.00622
   Type 2:   P_2 = exp(-3.5324) = 0.02923

Hence the posterior probability that this is an ordinary customer is

   (0.9 × 0.00622)/(0.9 × 0.00622 + 0.1 × 0.02923) = 0.657.
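As the hint to Question 6 suggests, parts (a) and (e) can also be done with built-in R functions: quantile functions replace the χ² table, and dnbinom gives the predictive probabilities directly, because a gamma(a, b) mixture of Poissons is negative binomial with size a and prob b/(b+1). A sketch:

> qchisq(c(0.025, 0.975), 10)/2      # part (a): 2*lambda ~ chi-squared on 10 df
> qgamma(c(0.025, 0.975), 5, 1)      # equivalently, directly from the gamma(5,1) prior
> p1 <- dnbinom(10, size=187, prob=46/47)     # ordinary customer
> p2 <- dnbinom(10, size=1, prob=0.05/1.05)   # second group
> 0.9*p1/(0.9*p1 + 0.1*p2)                    # posterior Pr(ordinary), approx 0.657

The first two commands both give the interval 1.62 < λ < 10.24 found above.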

9. Prior:

   τ ~ gamma(4/2, 18/2)

so d_0 = 4, d_0 v_0 = 18, v_0 = 4.5.

   µ | τ ~ N(500, (0.005τ)⁻¹)

so m_0 = 500, c_0 = 0.005.

Data: Σy = 9857, n = 20, ȳ = 9857/20 = 492.85, Σy² = 4858467.0,

   s_n² = (1/n){Σy² - nȳ²} = 444.55/20 = 22.2275

Posterior:

   c_1 = c_0 + n = 20.005
   m_1 = (c_0 m_0 + nȳ)/(c_0 + n) = (0.005 × 500 + 9857)/20.005 = 492.852
   d_1 = d_0 + n = 24
   r = (ȳ - m_0)² + s_n² = 51.1225 + 22.2275 = 73.35
   v_d = (c_0 r + n s_n²)/(c_0 + n) = (0.005 × 73.35 + 444.55)/20.005 = 22.2403
   v_1 = (d_0 v_0 + n v_d)/(d_0 + n) = (18 + 444.806)/24 = 19.2836

(a) The marginal posterior distribution of µ is given by

   (µ - 492.852)/√(19.2836/20.005) ~ t_24   (2 marks)

   Pr(µ < 495) = Pr(t_24 < (495 - 492.852)/0.9818) = Pr(t_24 < 2.1880) = 0.980

(E.g. use R: pt(2.1880,24).) (3 marks)

(b)

   c_p = c_1/(c_1 + 1) = 20.005/21.005 = 0.9524

   (Y - 492.852)/√(19.2836/0.9524) ~ t_24

   Pr(Y < 500) = Pr(t_24 < (500 - 492.852)/4.4997) = Pr(t_24 < 1.5886) = 0.937

(E.g. use R: pt(1.5886,24).) (3 marks)
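Both probabilities can be reproduced in R from the summary statistics. A sketch, not part of the original solution; the value of Σy² here is reconstructed from s_n² above:

> sy <- 9857; syy <- 4858467; n <- 20
> ybar <- sy/n; snsq <- (syy - n*ybar^2)/n
> c0 <- 0.005; m0 <- 500; d0 <- 4; v0 <- 4.5
> c1 <- c0 + n; d1 <- d0 + n
> m1 <- (c0*m0 + sy)/c1
> vd <- (c0*((ybar - m0)^2 + snsq) + n*snsq)/c1
> v1 <- (d0*v0 + n*vd)/d1
> pt((495 - m1)/sqrt(v1/c1), d1)              # part (a)
> pt((500 - m1)/sqrt(v1*(c1 + 1)/c1), d1)     # part (b): predictive scale uses v1/cp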

10. (a) Likelihood:

   L = Π_{i=1}^8 e^{-λ_i} λ_i^{y_i} / y_i!

Log likelihood:

   l = Σ [-λ_i + y_i log λ_i - log(y_i!)] = Σ [-λ_i + y_i(α + βt_i) - log(y_i!)]

Derivatives:

   ∂λ_i/∂α = ∂e^{α+βt_i}/∂α = λ_i
   ∂λ_i/∂β = ∂e^{α+βt_i}/∂β = λ_i t_i

   ∂l/∂α = Σ (-λ_i + y_i) = -Σ (λ_i - y_i)
   ∂l/∂β = Σ (-λ_i t_i + y_i t_i) = -Σ t_i(λ_i - y_i)

At the maximum

   ∂l/∂α = ∂l/∂β = 0.

Hence α̂ and β̂ satisfy the given equations.

Calculations in R:

> y<-c(10,13,24,17,20,22,20,23)
> t<-seq(3,24,3)
> lambda<-exp(2.55+0.0264*t)
> sum(lambda-y)
[1] -0.2532
> sum(t*(lambda-y))
[1] -3.6236

These values seem close to zero but let us try a small change to the parameter values:

> lambda<-exp(2.56+0.0264*t)
> sum(lambda-y)
[1] 1.2417
> sum(t*(lambda-y))
[1] 18.411

The results are now much further from zero, suggesting that the given values are very close to the solutions. (5 marks)
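The estimates can also be checked against R's built-in Poisson regression, which maximises the same likelihood. The question does not require glm; with y and t as defined above, this is only a convenient cross-check:

> fit <- glm(y ~ t, family=poisson)   # log link is the default
> coef(fit)                           # should be close to 2.55 and 0.0264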

(b) Second derivatives:

   ∂²l/∂α² = -Σ ∂λ_i/∂α = -Σ λ_i
   ∂²l/∂β² = -Σ t_i ∂λ_i/∂β = -Σ t_i² λ_i
   ∂²l/∂α∂β = -Σ ∂λ_i/∂β = -Σ t_i λ_i

Variance matrix:

   V = -( ∂²l/∂α²     ∂²l/∂α∂β
          ∂²l/∂α∂β    ∂²l/∂β²  )⁻¹

Numerically using R:

> lambda<-exp(2.55+0.0264*t)
> d<-matrix(nrow=2,ncol=2)
> d[1,1]<- -sum(lambda)
> d[1,2]<- -sum(t*lambda)
> d[2,1]<- -sum(t*lambda)
> d[2,2]<- -sum((t^2)*lambda)
> V<- -solve(d)
> V
           [,1]       [,2]
[1,]  0.0382650 -0.0021400
[2,] -0.0021400  0.0001452

The mean of α + 24β is α̂ + 24β̂ = 3.1853. The variance is

   (1  24) V (1  24)ᵀ = 0.038265 - 48 × 0.0021400 + 576 × 0.0001452 = 0.01918.

An alternative matrix-based calculation in R:

> m<-c(1,24)
> dim(m)<-c(1,2)
> v<-m%*%V%*%t(m)
> v
        [,1]
[1,] 0.01918

The approximate 95% interval is 3.1853 ± 1.96√0.01918 = 3.1853 ± 0.2714. That is

   2.9139 < α + 24β < 3.4567

(5 marks)

(c) The interval for λ_24 is e^{2.9139} < e^{α+24β} < e^{3.4567}. That is

   18.43 < λ_24 < 31.71

(2 marks)
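If the glm cross-check from part (a) is used, its estimated variance matrix plays the role of V, and the same intervals follow. A sketch:

> m <- c(1, 24)
> est <- sum(m*coef(fit))
> se <- sqrt(drop(t(m) %*% vcov(fit) %*% m))
> est + c(-1,1)*1.96*se               # interval for alpha + 24*beta
> exp(est + c(-1,1)*1.96*se)          # interval for lambda at age 24 months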
