MAS3301 Bayesian Statistics Problems 5 and Solutions


MAS3301 Bayesian Statistics: Problems 5 and Solutions
Semester 2, 2008-9

Problems 5

1. (Some of this question is also in Problems 4.) I recorded the attendance of students at tutorials for a module. Suppose that we can, in some sense, regard the students as a sample from some population of students so that, for example, we can learn about the likely behaviour of next year's students by observing this year's. At the time I recorded the data we had had tutorials in Week 2 and Week 4. Let the probability that a student attends in both weeks be θ_11, the probability that a student attends in Week 2 but not Week 4 be θ_10 and so on. The data are as follows.

   Attendance              Probability   Observed frequency
   Week 2 and Week 4       θ_11          n_11 = 25
   Week 2 but not Week 4   θ_10          n_10 = 7
   Week 4 but not Week 2   θ_01          n_01 = 6
   Neither week            θ_00          n_00 = 13

Suppose that the prior distribution for (θ_11, θ_10, θ_01, θ_00) is a Dirichlet distribution with density proportional to θ_11^3 θ_10 θ_01 θ_00^2.

(a) Find the prior means and prior variances of θ_11, θ_10, θ_01, θ_00.
(b) Find the posterior distribution.
(c) Find the posterior means and posterior variances of θ_11, θ_10, θ_01, θ_00.
(d) Using the R function hpdbeta, which may be obtained from the Web page (or otherwise), find a 95% posterior hpd interval, based on the exact posterior distribution, for θ_00.
(e) Find an approximate 95% hpd interval for θ_00 using a normal approximation based on the posterior mode and the partial second derivatives of the log posterior density. Compare this with the exact hpd interval. Hint: To find the posterior mode you will need to introduce a Lagrange multiplier.
(f) The population mean number of attendances out of two is µ = 2θ_11 + θ_10 + θ_01. Find the posterior mean of µ and an approximation to the posterior standard deviation of µ.

2. Samples are taken from twenty wagonloads of an industrial mineral and analysed. The amounts in ppm (parts per million) of an impurity are found to be as follows.

   44.3  50.2  51.7  49.4  50.6  55.0  53.5  48.6  48.8  53.3
   59.4  51.4  52.0  51.9  51.6  48.3  49.3  54.1  52.4  53.1

We regard these as independent samples from a normal distribution with mean µ and variance σ^2 = τ^(-1). Find a 95% posterior hpd interval for µ under each of the following two conditions.

(a) The value of τ is known to be 0.1 and our prior distribution for µ is normal with mean 60.0 and standard deviation 20.0.
(b) The value of τ is unknown. Our prior distribution for τ is a gamma distribution with mean 0.1 and standard deviation 0.05. Our conditional prior distribution for µ given τ is normal with mean 60.0 and precision 0.05τ (that is, standard deviation (0.05τ)^(-1/2)).

3. We observe a sample of 30 observations from a normal distribution with mean µ and precision τ. The data, y_1, ..., y_30, are such that Σ y_i = 672 and Σ y_i^2 = 16193.
(a) Suppose that the value of τ is known to be 0.04 and that our prior distribution for µ is normal with mean 20 and variance 100. Find the posterior distribution of µ and evaluate a posterior 95% hpd interval for µ.
(b) Suppose that we have a gamma(1, 10) prior distribution for τ and our conditional prior distribution for µ given τ is normal with mean 20 and variance (0.1τ)^(-1). Find the marginal posterior distribution for τ, the marginal posterior distribution for µ and the marginal posterior 95% hpd interval for µ.

4. The following data come from the experiment reported by MacGregor et al. (1979). They give the supine systolic blood pressures (mm Hg) for fifteen patients with moderate essential hypertension. The measurements were taken immediately before and two hours after taking a drug.

   Patient    1    2    3    4    5    6    7    8
   Before   210  169  187  160  167  176  185  206
   After    201  165  166  157  147  145  168  180

   Patient    9   10   11   12   13   14   15
   Before   173  146  174  201  198  148  154
   After    147  136  151  168  179  129  131

We are interested in the effect of the drug on blood pressure. We assume that, given parameters µ, τ, the changes in blood pressure, from before to after, in the n patients are independent and normally distributed with unknown mean µ and unknown precision τ. The fifteen differences are as follows.

   -9  -4  -21  -3  -20  -31  -17  -26  -26  -10  -23  -33  -19  -19  -23

Our prior distribution for τ is a gamma(0.35, 1.01) distribution. Our conditional prior distribution for µ given τ is a normal N(0, [0.003τ]^(-1)) distribution.
(a) Find the marginal posterior distribution of τ.
(b) Find the marginal posterior distribution of µ.
(c) Find the marginal posterior 95% hpd interval for µ.
(d) Comment on what you can conclude about the effect of the drug.

5. The lifetimes of certain components are supposed to follow a Weibull distribution with known shape parameter α = 2. The probability density function of the lifetime distribution is

   f(t) = αρ^2 t exp[-(ρt)^2]   for 0 < t < ∞.

We will observe a sample of n such lifetimes where n is large.

(a) Assuming that the prior density is nonzero and reasonably flat so that it may be disregarded, find an approximation to the posterior distribution of ρ. Find an approximate 95% hpd interval for ρ when n = 300, Σ log(t_i) = 1305.165 and Σ t_i^2 = 3161776.
(b) Assuming that the prior distribution is a gamma(a, b) distribution, find an approximate 95% hpd interval for ρ, taking into account this prior, when a = 2, b = 100, n = 300, Σ log(t_i) = 1305.165 and Σ t_i^2 = 3161776.

6. Given the value of λ, the number X_i of transactions made by customer i at an online store in a year has a Poisson(λ) distribution, with X_i independent of X_j for i ≠ j. The value of λ is unknown. Our prior distribution for λ is a gamma(5, 1) distribution. We observe the numbers of transactions in a year for 45 customers and Σ x_i = 182.
(a) Using a χ^2 table (i.e. without a computer) find the lower 2.5% point and the upper 2.5% point of the prior distribution of λ. (These bound a 95% symmetric prior credible interval.)
(b) Find the posterior distribution of λ.
(c) Using a normal approximation to the posterior distribution, based on the posterior mean and variance, find a 95% symmetric posterior credible interval for λ.
(d) Find an expression for the posterior predictive probability that a customer makes m transactions in a year.
(e) As well as these ordinary customers, we believe that there is a second group of individuals. The number of transactions in a year for a member of this second group has, given θ, a Poisson(θ) distribution and our beliefs about the value of θ are represented by a gamma(1, 0.05) distribution. A new individual is observed who makes 10 transactions in a year. Given that our prior probability that this is an ordinary customer is 0.9, find our posterior probability that this is an ordinary customer. Hint: You may find it best to calculate the logarithms of the predictive probabilities before exponentiating these. For this you might find the R function lgamma useful. It calculates the log of the gamma function. Alternatively it is possible to do the calculation using the R function dnbinom. (N.B. In reality a slightly more complicated model is used in this type of application.)

7. The following data give the heights in cm of 25 ten-year-old children. We assume that, given the values of µ and τ, these are independent observations from a normal distribution with mean µ and variance τ^(-1).

   66  66  69  61  58  53  78  71  49  57  54  61  49
   64  63  60  53  51  65  70  55  55  74  70  42

(a) Assuming that the value of τ^(-1) is known to be 64 and our prior distribution for µ is normal with mean 55 and standard deviation 5, find a 95% hpd interval for the height in cm of another ten-year-old child drawn from the same population.
(b) Assume now that the value of τ is unknown but we have a prior distribution for it which is a gamma(2, 18) distribution and our conditional prior distribution for µ given τ is normal with mean 55 and variance (2.56τ)^(-1). Find a 95% hpd interval for the height in cm of another ten-year-old child drawn from the same population.

8. A random sample of n = 1000 people was chosen from a large population. Each person was asked whether they approved of a proposed new law. The number answering "Yes" was x = 37. (For the purpose of this exercise all other responses and non-responses are treated as simply "Not Yes".) Assume that x is an observation from the binomial(n, p) distribution where p is the unknown proportion of people in the population who would answer "Yes". Our prior distribution for p is a uniform distribution on (0, 1). Let p = Φ(θ) so θ = Φ^(-1)(p), where Φ(y) is the standard normal distribution function and Φ^(-1)(z) is its inverse.
(a) Find the maximum likelihood estimate of p and hence find the maximum likelihood estimate of θ.
(b) Disregarding the prior distribution, find a large-sample approximation to the posterior distribution of θ.
(c) Using your approximate posterior distribution for θ, find an approximate 95% hpd interval for θ.
(d) Use the exact posterior distribution for p to find the actual posterior probability that θ is inside your approximate hpd interval.

Notes: The standard normal distribution function is Φ(x) = ∫_{-∞}^{x} φ(u) du where φ(u) = (2π)^(-1/2) exp{-u^2/2}. You can evaluate Φ^(-1)(u) using R with qnorm(u,0,1) and φ(u) is given by dnorm(u,0,1).

Let l be the log-likelihood. Then

   dl/dθ = (dl/dp)(dp/dθ)

and

   d^2 l/dθ^2 = (d/dθ){(dl/dp)(dp/dθ)}
              = (d/dθ){dl/dp}(dp/dθ) + (dl/dp)(d^2 p/dθ^2)
              = (d^2 l/dp^2)(dp/dθ)^2 + (dl/dp)(d^2 p/dθ^2).

Derivatives of p:

   dp/dθ = φ(θ),    d^2 p/dθ^2 = -θφ(θ).

9. The amounts of rice, by weight, in 20 nominally 500 g packets are determined. The weights, in g, are as follows.

   496  506  495  491  488  492  482  495  493  496
   487  490  493  495  492  498  491  493  495  489

Assume that, given the values of parameters µ, τ, the weights are independent and each has a normal N(µ, τ^(-1)) distribution. The values of µ and τ are unknown. Our prior distribution is as follows. We have a gamma(2, 9) prior distribution for τ and a N(500, (0.0025τ)^(-1)) conditional prior distribution for µ given τ.

(a) Find the posterior probability that µ < 495.
(b) Find the posterior predictive probability that a new packet of rice will contain less than 500 g of rice.

10. A machine which is used in a manufacturing process jams from time to time. It is thought that the frequency of jams might change over time as the machine becomes older. Once every three months the number of jams in a day is counted. The results are as follows.

   Observation i                  1   2   3   4   5   6   7   8
   Age of machine t_i (months)    3   6   9  12  15  18  21  24
   Number of jams y_i            10  13  24  17  20  22  20  23

Our model is as follows. Given the values of two parameters α, β, the number of jams y_i on a day when the machine has age t_i months has a Poisson distribution

   y_i ~ Poisson(λ_i)   where   log_e(λ_i) = α + βt_i.

Assume that the effect of our prior distribution on the posterior distribution is negligible and that large-sample approximations may be used.
(a) Let the values of α and β which maximise the likelihood be α̂ and β̂. Assuming that the likelihood is differentiable at its maximum, show that these satisfy the following two equations,

   Σ_{i=1}^{8} (λ̂_i - y_i) = 0   and   Σ_{i=1}^{8} t_i(λ̂_i - y_i) = 0,   where log_e(λ̂_i) = α̂ + β̂t_i,

and show that these equations are satisfied (to a good approximation) by α̂ = 2.552 and β̂ = 0.02638. (You may use R to help with the calculations, but show your commands.) You may assume from now on that these values maximise the likelihood.
(b) Find an approximate symmetric 95% posterior interval for α + 24β.
(c) Find an approximate symmetric 95% posterior interval for exp(α + 24β), the mean jam-rate per day at age 24 months. (You may use R to help with the calculations, but show your commands.)

Homework 5
Solutions to Questions 9 and 10 of Problems 5 are to be submitted in the Homework Letterbox no later than 4.00 pm on Tuesday May 5th.

Reference
MacGregor, G.A., Markandu, N.D., Roulston, J.E. and Jones, J.C., 1979. Essential hypertension: effect of an oral inhibitor of angiotensin-converting enzyme. British Medical Journal, 2, 1106-1109.

Solutions

1. (a) The prior distribution is Dirichlet(4, 2, 2, 3), so A_0 = 4 + 2 + 2 + 3 = 11. The prior means are a_{0,i}/A_0 and the prior variances are a_{0,i}(A_0 - a_{0,i}) / [A_0^2 (A_0 + 1)].

Prior means:
   θ_11: 4/11 = 0.3636
   θ_10: 2/11 = 0.1818
   θ_01: 2/11 = 0.1818
   θ_00: 3/11 = 0.2727

Prior variances:
   θ_11: (4 × 7)/(11^2 × 12) = 0.019284
   θ_10: (2 × 9)/(11^2 × 12) = 0.012397
   θ_01: (2 × 9)/(11^2 × 12) = 0.012397
   θ_00: (3 × 8)/(11^2 × 12) = 0.016529

NOTE: Suppose that we are given prior means m_1, ..., m_4 and one prior standard deviation s_1. Then a_{0,i} = m_i A_0 and

   s_1^2 = m_1 A_0 (A_0 - m_1 A_0) / [A_0^2 (A_0 + 1)] = m_1 (1 - m_1) / (A_0 + 1).

Hence

   A_0 + 1 = m_1 (1 - m_1) / s_1^2 = 0.3636(1 - 0.3636)/0.019284 = 12.

Hence A_0 = 11 and a_{0,i} = 11 m_i. For example a_{0,1} = 11 × 0.3636 = 4.

(b) The posterior distribution is Dirichlet(4+25, 2+7, 2+6, 3+13), that is Dirichlet(29, 9, 8, 16).

(c) Now A_1 = 29 + 9 + 8 + 16 = 62. The posterior means are a_{1,i}/A_1 and the posterior variances are a_{1,i}(A_1 - a_{1,i}) / [A_1^2 (A_1 + 1)].

Posterior means:
   θ_11: 29/62 = 0.4677
   θ_10: 9/62 = 0.1452
   θ_01: 8/62 = 0.1290
   θ_00: 16/62 = 0.2581

Posterior variances:
   θ_11: (29 × 33)/(62^2 × 63) = 0.003952
   θ_10: (9 × 53)/(62^2 × 63) = 0.001970
   θ_01: (8 × 54)/(62^2 × 63) = 0.001784
   θ_00: (16 × 46)/(62^2 × 63) = 0.003039

(d) The posterior distribution for θ_00 is beta(16, 62 - 16), that is beta(16, 46). Using the R command hpdbeta(0.95,16,46) gives 0.1535 < θ_00 < 0.3674.

(e) The log posterior density is (apart from a constant)

   Σ_{j=1}^{4} (a_{1,j} - 1) log θ_j.

Add λ(Σ_{j=1}^{4} θ_j - 1) to this, differentiate with respect to θ_j, and set the derivative equal to zero. This gives

   (a_{1,j} - 1)/θ̂_j + λ = 0,   which leads to   θ̂_j = -(a_{1,j} - 1)/λ.

However Σ_{j=1}^{4} θ_j = 1, so

   λ = -Σ_{j=1}^{4} (a_{1,j} - 1)

and

   θ̂_j = (a_{1,j} - 1) / (Σ_k a_{1,k} - 4).

Hence the posterior mode for θ_00 is θ̂_00 = 15/58 = 0.2586.

The second derivatives of the log posterior density are

   ∂^2 l/∂θ_j^2 = -(a_{1,j} - 1)/θ_j^2   and   ∂^2 l/∂θ_j ∂θ_k = 0 (j ≠ k).

Since the mixed partial second derivatives are zero, the information matrix is diagonal and the posterior variance of θ_j is approximately

   θ̂_j^2 / (a_{1,j} - 1) = (a_{1,j} - 1) / (Σ_k a_{1,k} - 4)^2.

The posterior variance of θ_00 is approximately 15/58^2 = 0.00445898. The approximate 95% hpd interval is 0.2586 ± 1.96 √0.00445898, that is

   0.12772 < θ_00 < 0.38948.

This is a little wider than the exact interval.

(f) Approximation based on posterior mode and curvature. Posterior modes:

   θ_11: 28/58,   θ_10: 8/58,   θ_01: 7/58.

So the approximate posterior mean of µ is

   2 × 28/58 + 8/58 + 7/58 = 71/58 = 1.2241.

Approximate posterior variances:

   θ_11: 28/58^2,   θ_10: 8/58^2,   θ_01: 7/58^2.

Since the (approximate) covariances are all zero, the approximate posterior variance of µ is

   4 × 28/58^2 + 8/58^2 + 7/58^2 = 127/58^2 = 0.037753,

so the approximate standard deviation is √0.037753 = 0.1943.

N.B. There is an alternative exact calculation, as follows, which is also acceptable. Posterior mean:

   2 × 29/62 + 9/62 + 8/62 = 75/62 = 1.2097.

Posterior covariances: covar(θ_j, θ_k) = -a_{1,j} a_{1,k} / [A_1^2 (A_1 + 1)]. Then

   var(µ) = 4 var(θ_11) + var(θ_10) + var(θ_01)
            + 4 covar(θ_11, θ_10) + 4 covar(θ_11, θ_01) + 2 covar(θ_10, θ_01)
          = [4(29 × 33) + (9 × 53) + (8 × 54) - 4(29 × 9) - 4(29 × 8) - 2(9 × 8)] / (62^2 × 63)
          = 2621 / (62^2 × 63) = 0.010823.

So the standard deviation is 0.1040. The difference is quite big!
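The conjugate Dirichlet update and the moment formulas in Solution 1 are easy to check by machine. The module itself uses R; the sketch below is a Python re-check under that assumption, and the helper name dirichlet_summary is ours, not from the notes.

```python
def dirichlet_summary(a):
    """Means and variances of a Dirichlet(a_1, ..., a_k) distribution:
    E(theta_i) = a_i/A and var(theta_i) = a_i(A - a_i)/(A^2(A + 1)), A = sum(a)."""
    A = sum(a)
    means = [ai / A for ai in a]
    variances = [ai * (A - ai) / (A ** 2 * (A + 1)) for ai in a]
    return means, variances

prior = [4, 2, 2, 3]             # Dirichlet(4, 2, 2, 3) prior
counts = [25, 7, 6, 13]          # n_11, n_10, n_01, n_00
posterior = [a + n for a, n in zip(prior, counts)]   # conjugate update

post_means, post_vars = dirichlet_summary(posterior)
print(posterior)                 # [29, 9, 8, 16]
print(round(post_means[0], 4))   # 0.4677
```

Running this reproduces the posterior Dirichlet(29, 9, 8, 16) and the tabulated means and variances.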

2. From the data,

   ȳ = 51.445,   Σ y_i = 1028.9,
   s_n^2 = (1/n) Σ (y_i - ȳ)^2 = (1/20){53113.73 - 1028.9^2/20} = 9.09848.

(a) Prior mean: M_0 = 60.0. Prior precision: P_0 = 1/20^2 = 0.0025. Data precision: P_d = nτ = 20 × 0.1 = 2. Posterior precision: P_1 = P_0 + P_d = 2.0025. Posterior mean:

   M_1 = (0.0025 × 60.0 + 2 × 51.445)/2.0025 = 51.4557.

Posterior standard deviation: √(1/2.0025) = 0.706665. 95% hpd interval: 51.4557 ± 1.96 × 0.706665, that is

   50.0706 < µ < 52.8408.

(b) Prior: τ ~ gamma(d/2, dv/2) where the mean is (d/2)/(dv/2) = 1/v = 0.1, so v = 10, and the standard deviation is

   √(d/2)/(dv/2) = (1/v)√(2/d) = 0.05,

so √(2/d) = 0.5, 2/d = 0.25 and d = 8. Hence

   d_0 = 8,   v_0 = 10,   c_0 = 0.05,   m_0 = 60.0,
   m_1 = (c_0 m_0 + nȳ)/(c_0 + n) = (0.05 × 60.0 + 1028.9)/20.05 = 51.4557,
   c_1 = c_0 + n = 20.05,
   d_1 = d_0 + n = 28,
   r = (ȳ - m_0)^2 + s_n^2 = (51.445 - 60)^2 + 9.09848 = 82.2865,
   v_d = (c_0 r + n s_n^2)/(c_0 + n) = 9.189846,
   v_1 = (d_0 v_0 + n v_d)/(d_0 + n) = 9.4213.

95% hpd interval: M_1 ± t_28 √(v_1/c_1), that is 51.4557 ± 2.048 × 0.68591, that is

   50.051 < µ < 52.860.

3. (a) We have

   P_0 = 0.01,   P_d = nτ = 30 × 0.04 = 1.2,   P_1 = 0.01 + 1.2 = 1.21,
   M_0 = 20,   ȳ = 22.4,
   M_1 = (P_0 M_0 + P_d ȳ)/P_1 = (0.01 × 20 + 1.2 × 22.4)/1.21 = 22.380.

Posterior: µ ~ N(22.380, 1.21^(-1)), that is µ ~ N(22.380, 0.8264). 95% hpd interval: 22.380 ± 1.96 √0.8264, that is

   20.60 < µ < 24.16.

(b) We have

   d_0 = 2,   d_1 = d_0 + n = 32,   v_0 = 10,   c_0 = 0.1,   c_1 = c_0 + n = 30.1,   m_0 = 20,   ȳ = 22.4,
   m_1 = (c_0 m_0 + nȳ)/(c_0 + n) = (0.1 × 20 + 30 × 22.4)/30.1 = 22.392,
   s_n^2 = (1/n)(Σ y_i^2 - (Σ y_i)^2/n) = 38.0067,
   (ȳ - m_0)^2 = (22.4 - 20)^2 = 5.76,
   r = (ȳ - m_0)^2 + s_n^2 = 43.7667,
   v_d = (c_0 r + n s_n^2)/(c_0 + n) = 38.0258,
   v_1 = (d_0 v_0 + n v_d)/(d_0 + n) = 36.2742.

Marginal posterior distribution for τ: d_1 v_1 τ = 1160.77τ ~ χ^2_32. Marginal posterior distribution for µ:

   (µ - m_1)/√(v_1/c_1) = (µ - 22.392)/√(36.2742/30.1) ~ t_32.

95% hpd interval: m_1 ± 2.037 √(v_1/c_1), that is 22.392 ± 2.037 √(36.2742/30.1), that is

   20.16 < µ < 24.63.

4. Data:

   n = 15,   Σ y = -284,   Σ y^2 = 6518,   ȳ = -18.9333,
   s_n^2 = (1/15){6518 - (-284)^2/15} = 1140.9333/15 = 76.0622.

Calculate the posterior:

   d_0 = 0.7,   v_0 = 2.02/0.7 = 2.8857,   c_0 = 0.003,   m_0 = 0,
   d_1 = d_0 + 15 = 15.7,
   (ȳ - m_0)^2 = ȳ^2 = 358.4711,
   r = (ȳ - m_0)^2 + s_n^2 = 434.5333,
   v_d = (c_0 r + n s_n^2)/(c_0 + n) = 76.1339,
   v_1 = (d_0 v_0 + n v_d)/(d_0 + n) = 72.8681,
   c_1 = c_0 + 15 = 15.003,
   m_1 = (c_0 m_0 + nȳ)/(c_0 + n) = -284/15.003 = -18.9295.

(a) The marginal posterior distribution for τ is gamma(d_1/2, d_1 v_1/2), that is gamma(7.85, 572.014). (Alternatively d_1 v_1 τ ~ χ^2_{d_1}, that is 1144.03τ ~ χ^2_{15.7}.)

(b) Marginal posterior for µ:

   (µ - m_1)/√(v_1/c_1) = (µ + 18.9295)/√4.8569 ~ t_{15.7}.

(c) We can use R for the critical points of t_{15.7}: ±2.123 from qt(0.975,15.7). 95% interval: -18.9295 ± 2.123 √4.8569, that is

   -23.61 < µ < -14.25.

(d) Comment: e.g., since zero is well outside the 95% interval it seems clear that the drug reduces blood pressure.
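The same normal-gamma update recurs in Solutions 2, 3 and 4, so it is worth a single reusable sketch. This Python version (the function name is ours; the module itself works in R) reproduces the Solution 4 hyperparameters from the summary statistics:

```python
def normal_gamma_update(d0, v0, c0, m0, n, sum_y, sum_y2):
    """Posterior hyperparameters (d1, v1, c1, m1) for the normal-gamma
    conjugate model in the notes, computed from n, sum(y), sum(y^2)."""
    ybar = sum_y / n
    s2 = (sum_y2 - n * ybar ** 2) / n          # s_n^2 (divisor n)
    r = (ybar - m0) ** 2 + s2
    vd = (c0 * r + n * s2) / (c0 + n)
    d1 = d0 + n
    v1 = (d0 * v0 + n * vd) / (d0 + n)
    c1 = c0 + n
    m1 = (c0 * m0 + n * ybar) / (c0 + n)
    return d1, v1, c1, m1

# Solution 4: blood-pressure differences, n = 15, sum(y) = -284, sum(y^2) = 6518
d1, v1, c1, m1 = normal_gamma_update(0.7, 2.02 / 0.7, 0.003, 0.0, 15, -284, 6518)
print(round(v1, 4), round(m1, 4))   # 72.8681 -18.9295
```

The printed values agree with the v_1 and m_1 found above.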

5. (a) The likelihood is

   L = Π_{i=1}^{n} 2ρ^2 t_i exp[-(ρ t_i)^2] = 2^n ρ^{2n} (Π t_i) exp[-ρ^2 Σ t_i^2].

The log likelihood is

   l = n log 2 + 2n log ρ + Σ log(t_i) - ρ^2 Σ t_i^2.

So

   ∂l/∂ρ = 2n/ρ - 2ρ Σ t_i^2

and, setting this equal to zero at the mode ρ̂, we find

   ρ̂^2 = n / Σ t_i^2,   so   ρ̂ = √(n / Σ t_i^2) = √(300/3161776) = 0.0097408.

The second derivative is

   ∂^2 l/∂ρ^2 = -2n/ρ^2 - 2 Σ t_i^2,

so the posterior variance is approximately

   1/(2n/ρ̂^2 + 2 Σ t_i^2) = 1/(4 Σ t_i^2) = 7.90695 × 10^(-8).

Our 95% hpd interval is therefore 0.0097408 ± 1.96 √(7.90695 × 10^(-8)), that is

   0.009190 < ρ < 0.010292.

(b) The prior density is proportional to ρ e^{-100ρ}, so the log prior density is log ρ - 100ρ plus a constant. The log posterior is therefore g(ρ) plus a constant, where

   g(ρ) = (2n + 1) log ρ - 100ρ - ρ^2 Σ t_i^2.

So

   ∂g/∂ρ = (2n + 1)/ρ - 100 - 2ρ Σ t_i^2.

Setting this equal to zero at the mode ρ̂ we find

   2(Σ t_i^2) ρ̂^2 + 100 ρ̂ - (2n + 1) = 0.

This quadratic equation has two solutions but one is negative and ρ must be positive, so

   ρ̂ = [-100 + √(100^2 + 8(Σ t_i^2)(2n + 1))] / (4 Σ t_i^2) = 0.0097410.

The second derivative is

   ∂^2 g/∂ρ^2 = -(2n + 1)/ρ^2 - 2 Σ t_i^2,

so the posterior variance is approximately

   1/[(2n + 1)/ρ̂^2 + 2 Σ t_i^2] = 7.900519 × 10^(-8).

Our 95% hpd interval is therefore 0.0097410 ± 1.96 √(7.900519 × 10^(-8)), that is

   0.009190 < ρ < 0.010292.

So the prior makes no noticeable difference in this case.

6. (a) λ ~ gamma(5, 1) so 2λ ~ gamma(5, 1/2), i.e. gamma(10/2, 1/2), i.e. χ^2_10. From tables, the 95% interval is 3.247 < 2λ < 20.48. That is

   1.6235 < λ < 10.24.

(b) The prior density is proportional to λ^{5-1} e^{-λ}. The likelihood is

   L = Π_{i=1}^{45} e^{-λ} λ^{x_i} / x_i! = e^{-45λ} λ^{Σ x_i} / Π x_i! ∝ e^{-45λ} λ^{182}.

The posterior density is proportional to λ^{187-1} e^{-46λ}. This is a gamma(187, 46) distribution.

(c) Posterior mean: 187/46 = 4.0652. Posterior variance: 187/46^2 = 0.08837. Posterior standard deviation: √187/46 = 0.29728. 95% interval: 4.0652 ± 1.96 × 0.29728, that is

   3.4826 < λ < 4.6479.

(d) The joint probability (density) of λ and X = m is

   [46^187 / Γ(187)] λ^{187-1} e^{-46λ} × λ^m e^{-λ} / m!
   = [46^187 Γ(187 + m) / (m! Γ(187) 47^{187+m})] × [47^{187+m} / Γ(187 + m)] λ^{187+m-1} e^{-47λ}.

Integrate out λ:

   Pr(X = m) = 46^187 Γ(187 + m) / (47^{187+m} Γ(187) m!)
             = [(186 + m)! / (186! m!)] (46/47)^187 (1/47)^m
             = C(186 + m, m) (46/47)^187 (1/47)^m.

(e) The joint probability (density) of θ and X = m is

   0.05 e^{-0.05θ} × θ^m e^{-θ} / m!
   = [0.05 Γ(1 + m) / (m! 1.05^{m+1})] × [1.05^{m+1} / Γ(1 + m)] θ^{m+1-1} e^{-1.05θ}.

Integrate out θ:

   Pr(X = m) = 0.05 Γ(1 + m) / (1.05^{m+1} m!) = (0.05/1.05)(1/1.05)^m.

Log predictive probabilities. Ordinary:

   log(P_1) = log[Γ(187 + 10)] - log[Γ(187)] - log[Γ(11)] + 187 log(46/47) + 10 log(1/47)
            = log[Γ(197)] - log[Γ(187)] - log[Γ(11)] + 187 log(46) - 197 log(47)
            = -5.079796

   > lgamma(197) - lgamma(187) - lgamma(11) + 187*log(46) - 197*log(47)
   [1] -5.079796

Type 2:

   log(P_2) = log(0.05/1.05) + 10 log(1/1.05) = log(0.05) - 11 log(1.05) = -3.5324.

Hence the predictive probabilities are as follows.

   Ordinary: P_1 = exp(-5.079796) = 0.006221
   Type 2:   P_2 = exp(-3.5324) = 0.029234

Hence the posterior probability that this is an ordinary customer is

   0.9 × P_1 / (0.9 × P_1 + 0.1 × P_2) = 0.657.

7.

8.
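Solution 6(e) can be cross-checked without R: both predictive probabilities are negative binomial, and Python's math.lgamma plays the role of R's lgamma here. The function name log_nb_pred is ours, not from the notes.

```python
from math import lgamma, log, exp

def log_nb_pred(a, b, m):
    """log Pr(X = m) when X | lam ~ Poisson(lam) and lam ~ gamma(a, b);
    integrating lam out gives the negative-binomial form used in (d) and (e)."""
    return (lgamma(a + m) - lgamma(a) - lgamma(m + 1)
            + a * log(b / (b + 1)) + m * log(1.0 / (b + 1)))

m = 10                           # transactions made by the new individual
lp1 = log_nb_pred(187, 46, m)    # ordinary: gamma(187, 46) posterior from (b)
lp2 = log_nb_pred(1, 0.05, m)    # second group: gamma(1, 0.05) prior
p1, p2 = exp(lp1), exp(lp2)
prob_ordinary = 0.9 * p1 / (0.9 * p1 + 0.1 * p2)
```

Here lp1 should come out near the -5.0798 obtained with R's lgamma, and prob_ordinary near 0.657.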

9. Prior:

   τ ~ gamma(4/2, 18/2), so d_0 = 4, d_0 v_0 = 18, v_0 = 4.5;
   µ | τ ~ N(500, (0.0025τ)^(-1)), so m_0 = 500, c_0 = 0.0025.

Data:

   Σ y = 9857,   n = 20,   ȳ = 9857/20 = 492.85,   Σ y^2 = 4858467,
   s_n^2 = (1/n){Σ y^2 - nȳ^2} = 444.55/20 = 22.2275.

Posterior:

   c_1 = c_0 + n = 20.0025,
   m_1 = (c_0 m_0 + nȳ)/(c_0 + n) = 492.8518,
   d_1 = d_0 + n = 24,
   r = (ȳ - m_0)^2 + s_n^2 = 73.35,
   v_d = (c_0 r + n s_n^2)/(c_0 + n) = 22.2403,
   v_1 = (d_0 v_0 + n v_d)/(d_0 + n) = 19.2836.

(a) (2 marks)

   (µ - 492.8518)/√(19.2836/20.0025) ~ t_24

   Pr(µ < 495) = Pr[(µ - 492.8518)/√(19.2836/20.0025) < (495 - 492.8518)/√(19.2836/20.0025)]
               = Pr(t_24 < 2.1880) = 0.9807.

(E.g. use R: pt(2.1880,24).)

(b) (3 marks)

   c_p = c_1/(c_1 + 1) = 20.0025/21.0025 = 0.9524

   (Y - 492.8518)/√(19.2836/0.9524) ~ t_24

   Pr(Y < 500) = Pr[(Y - 492.8518)/√(19.2836/0.9524) < (500 - 492.8518)/√(19.2836/0.9524)]
               = Pr(t_24 < 1.5886) = 0.9374.

(E.g. use R: pt(1.5886,24).)
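The Solution 9 arithmetic can be replicated step by step. R's pt is still needed for the final t probabilities; this Python fragment (variable names are ours) only checks the hyperparameters and the two t statistics, which agree with the handwritten 2.1880 and 1.5886 up to rounding.

```python
from math import sqrt

# Data and prior for the rice packets (Solution 9)
n, sum_y, sum_y2 = 20, 9857, 4858467
d0, v0, c0, m0 = 4, 4.5, 0.0025, 500.0

ybar = sum_y / n
s2 = (sum_y2 - n * ybar ** 2) / n          # s_n^2 = 22.2275
r = (ybar - m0) ** 2 + s2                  # 73.35
vd = (c0 * r + n * s2) / (c0 + n)
d1, c1 = d0 + n, c0 + n                    # 24, 20.0025
v1 = (d0 * v0 + n * vd) / (d0 + n)
m1 = (c0 * m0 + n * ybar) / (c0 + n)

t_a = (495 - m1) / sqrt(v1 / c1)           # statistic for Pr(mu < 495)
cp = c1 / (c1 + 1)                         # predictive precision factor
t_b = (500 - m1) / sqrt(v1 / cp)           # statistic for a new packet
```

Small differences in the later decimal places against the written solution come from rounding intermediate values by hand.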

10. (a) Likelihood:

   L = Π_{i=1}^{8} e^{-λ_i} λ_i^{y_i} / y_i!

Log likelihood:

   l = Σ [-λ_i + y_i log λ_i - log(y_i!)] = Σ [-λ_i + y_i(α + βt_i) - log(y_i!)]

Derivatives:

   ∂λ_i/∂α = e^{α+βt_i} = λ_i,    ∂λ_i/∂β = t_i e^{α+βt_i} = λ_i t_i,

   ∂l/∂α = Σ (-λ_i + y_i) = -Σ (λ_i - y_i),
   ∂l/∂β = Σ (-λ_i t_i + y_i t_i) = -Σ t_i (λ_i - y_i).

At the maximum, ∂l/∂α = ∂l/∂β = 0. Hence α̂ and β̂ satisfy the given equations.

Calculations in R:

   > y<-c(10,13,24,17,20,22,20,23)
   > t<-seq(3,24,3)
   > lambda<-exp(2.552+0.02638*t)
   > sum(lambda-y)
   [1] 0.002157513
   > sum(t*(lambda-y))
   [1] -0.00354096

These values seem close to zero, but let us try a small change to the parameter values:

   > lambda<-exp(2.55+0.0264*t)
   > sum(lambda-y)
   [1] -0.2598
   > sum(t*(lambda-y))
   [1] -3.606993

The results are now much further from zero, suggesting that the given values are very close to the solutions. (5 marks)

(b) Second derivatives:

   ∂^2 l/∂α^2 = -Σ λ_i,    ∂^2 l/∂β^2 = -Σ t_i^2 λ_i,    ∂^2 l/∂α∂β = -Σ t_i λ_i.

Variance matrix:

   V = - [ ∂^2 l/∂α^2    ∂^2 l/∂α∂β ]^(-1)
         [ ∂^2 l/∂α∂β    ∂^2 l/∂β^2 ]

Numerically, using R:

   > lambda<-exp(2.552+0.02638*t)
   > d<-matrix(nrow=2,ncol=2)
   > d[1,1]<- -sum(lambda)
   > d[1,2]<- -sum(t*lambda)
   > d[2,1]<- -sum(t*lambda)
   > d[2,2]<- -sum((t^2)*lambda)
   > V<- -solve(d)
   > V
                 [,1]          [,2]
   [1,]  0.038194535  -0.002136181
   [2,] -0.002136181   0.000144943

The mean of α + 24β is 2.552 + 24 × 0.02638 = 3.18512. The variance is

   0.038194535 + 24^2 × 0.000144943 + 2 × 24 × (-0.002136181) = 0.01914501.

Alternative matrix-based calculation in R:

   > m<-c(1,24)
   > dim(m)<-c(1,2)
   > v<-m%*%V%*%t(m)
   > v
              [,1]
   [1,] 0.01914501

The approximate 95% interval is 3.18512 ± 1.96 √0.01914501, that is

   2.9139 < α + 24β < 3.4563.   (5 marks)

(c) The interval for λ_24 is e^{2.9139} < e^{α+24β} < e^{3.4563}, that is

   18.429 < λ_24 < 31.700.   (2 marks)
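Finally, the Solution 10 checks run just as well in Python as in R. This sketch (our own variable names) evaluates the two score equations at the stated MLE and then forms the delta-method variance of α + 24β by inverting the 2 × 2 information matrix by hand:

```python
from math import exp, sqrt

t = [3 * i for i in range(1, 9)]          # ages 3, 6, ..., 24 months
y = [10, 13, 24, 17, 20, 22, 20, 23]      # observed jam counts
a_hat, b_hat = 2.552, 0.02638             # stated approximate MLEs

lam = [exp(a_hat + b_hat * ti) for ti in t]
score_a = sum(li - yi for li, yi in zip(lam, y))                 # sum(lam - y), should be near 0
score_b = sum(ti * (li - yi) for ti, li, yi in zip(t, lam, y))   # sum(t*(lam - y)), should be near 0

# Observed information and its inverse (2x2, done by hand)
I11 = sum(lam)
I12 = sum(ti * li for ti, li in zip(t, lam))
I22 = sum(ti * ti * li for ti, li in zip(t, lam))
det = I11 * I22 - I12 * I12
V11, V12, V22 = I22 / det, -I12 / det, I11 / det

mean_24 = a_hat + 24 * b_hat              # posterior mean of alpha + 24*beta
var_24 = V11 + 2 * 24 * V12 + 24 ** 2 * V22
half_width = 1.96 * sqrt(var_24)          # for the approximate 95% interval
```

Both scores come out close to zero and var_24 matches the 0.01914501 obtained with solve() in R.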