The Relationship Between the Power Prior and Hierarchical Models


Bayesian Analysis (2006) 1, Number 3, pp. 551-574

Ming-Hui Chen (Department of Statistics, University of Connecticut, Storrs, CT) and Joseph G. Ibrahim (Department of Biostatistics, University of North Carolina, Chapel Hill, NC)

© 2006 International Society for Bayesian Analysis

Abstract. The power prior has emerged as a useful informative prior for the incorporation of historical data in a Bayesian analysis. Viewing hierarchical modeling as the gold standard for combining information across studies, we provide a formal justification of the power prior by examining analytical relationships between the power prior and hierarchical modeling in linear models. Asymptotic relationships between the power prior and hierarchical modeling are obtained for non-normal models, including generalized linear models. These analytical relationships unify the theory of the power prior, demonstrate its generality, shed new light on benchmark analyses, and provide insights into the elicitation of the power parameter in the power prior. Several theorems are presented establishing these formal connections, as well as a formal methodology for eliciting a guide value for the power parameter a₀ via hierarchical models.

Keywords: Generalized linear model, hierarchical model, historical data, power prior, prior elicitation, random effects model

1 Introduction

The power prior discussed in Ibrahim and Chen (2000) has emerged as a useful class of informative priors for a variety of situations in which historical data are available. Several applications to clinical trials and epidemiological studies using the power prior have appeared in the literature. Examples of the use of the power prior and its modifications in clinical trials and carcinogenicity studies include Berry (1991), Eddy et al. (1992), Berry and Hardwick (1993), Lin (1993), Spiegelhalter et al. (1994), Berry and Stangl (1996), Chen et al. (2000), Ibrahim and Chen (1998), Ibrahim et al. (1998), Ibrahim et al. (1999), Chen et al. (1999a), Chen et al. (1999b), Ibrahim and Chen (2000), and Ibrahim et al. (2003). Examples using the power prior in epidemiological studies include Greenhouse and Wasserman (1995), Brophy and Joseph (1995), Fryback et al. (2001), and Tan et al. (2002). A recent book illustrating the use of the power prior in epidemiological studies is Spiegelhalter et al. (2004).

The power prior for a general regression model can be constructed as follows. Suppose we have historical data from a similar previous study, denoted by D₀ = (n₀, y₀, X₀), where n₀ is the sample size of the historical data, y₀ is the n₀ × 1 response vector, and

X₀ is the n₀ × p matrix of covariates based on the historical data. Further, let π₀(θ) denote the prior distribution for θ before the historical data D₀ are observed. We shall call π₀(θ) the initial prior distribution for θ. Let the data from the current study be denoted by D = (n, y, X), where n denotes the sample size, y the n × 1 response vector, and X the n × p matrix of covariates. Further, denote the likelihood for the current study by L(θ | D), where θ is a vector of indexing parameters. Thus, L(θ | D) is a general likelihood function for an arbitrary regression model, such as a linear model, generalized linear model, random effects model, nonlinear model, or a survival model with censored data. Given the discounting parameter a₀, we define the power prior distribution of θ for the current study as

    π(θ | D₀, a₀) ∝ L(θ | D₀)^a₀ π₀(θ),    (1)

where a₀ weights the historical data relative to the likelihood of the current study; thus a₀ controls the influence of the historical data on L(θ | D). The parameter a₀ can be interpreted as a precision parameter for the historical data. Since D₀ is historical data, it is unnatural in most applications, including clinical trials and carcinogenicity studies, to weight the historical data more than the current data, so it is scientifically more sound to restrict the range of a₀ to lie between 0 and 1; thus we take 0 ≤ a₀ ≤ 1. One of the main roles of a₀ is that it controls the heaviness of the tails of the prior for θ: as a₀ becomes smaller, the tails of (1) become heavier. Setting a₀ = 1 corresponds to the update of π₀(θ) using Bayes' theorem; that is, with a₀ = 1, (1) is the posterior distribution of θ based on the historical data. When a₀ = 0, the prior does not depend on the historical data D₀; in this case, π(θ | D₀, a₀ = 0) ∝ π₀(θ), so a₀ = 0 is equivalent to a prior specification with no incorporation of historical data. Therefore, (1) can be viewed as a generalization of the usual Bayesian update of π₀(θ). The parameter a₀ allows the investigator to control the influence of the historical data on the current study. Such control is important when there is heterogeneity between the previous and current studies, or when the sample sizes of the two studies are quite different. One of the most useful applications of the power prior is in model selection problems, since it inherently automates the informative prior specification for all possible models in the model space (see Chen et al. 1999b; Ibrahim et al. 1999; Ibrahim and Chen 2000).

Hierarchical modeling is perhaps the most common and best known method for combining several datasets or incorporating prior information from several sources. In this paper, we establish a formal analytic connection between the power prior and hierarchical models, both for the normal linear model and for non-normal models, including the class of generalized linear models. This is accomplished by establishing an analytic relationship between the power parameter a₀ and the variance components of the hierarchical model. Establishing this relationship is critical since it formally justifies the power prior via hierarchical modeling and guides the user in the choice of a₀, on which sensitivity analyses can be based.
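Before turning to the benchmark connection, the following minimal Python sketch (ours, not part of the paper) illustrates the behavior of (1) for an i.i.d. normal mean with a flat initial prior π₀(θ) ∝ 1. The data, grid, and function name are hypothetical; the point is to see numerically that smaller a₀ yields a flatter, heavier-tailed prior, in line with the precision-parameter interpretation of a₀.

```python
import numpy as np
from scipy import stats

def log_power_prior(theta_grid, y0, sigma0, a0):
    """Unnormalized log power prior (1) for an i.i.d. N(theta, sigma0^2) historical
    sample with flat initial prior: a0 * log L(theta | D0) + const."""
    loglik = stats.norm.logpdf(y0[:, None], loc=theta_grid[None, :], scale=sigma0).sum(axis=0)
    return a0 * loglik

rng = np.random.default_rng(0)
y0 = rng.normal(loc=1.0, scale=2.0, size=40)        # hypothetical historical data D0
grid = np.linspace(-3.0, 5.0, 801)
for a0 in (0.1, 0.5, 1.0):                          # a0 = 1 is the full Bayes update of pi0
    lp = log_power_prior(grid, y0, sigma0=2.0, a0=a0)
    dens = np.exp(lp - lp.max())
    dens /= np.trapz(dens, grid)                    # normalize on the grid
    mean = np.trapz(grid * dens, grid)
    sd = np.sqrt(np.trapz((grid - mean) ** 2 * dens, grid))
    print(f"a0 = {a0:.1f}: power-prior sd ≈ {sd:.3f}")   # sd shrinks as a0 grows
```

Consistent with (1), the implied prior standard deviation scales like σ₀/√(a₀n₀), so a₀ acts as a discount on the effective historical sample size.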
The value of a₀ implied by this relationship also provides a benchmark: the benchmark power prior is the prior whose a₀ value corresponds to equivalence between the hierarchical model and the power prior, so that the benchmark power prior analysis is precisely the one that corresponds to hierarchical modeling. Thus, the analytical connection we make is important and useful since it characterizes the relationship between the power parameter a₀ and the variance components in the hierarchical model, thereby motivating and justifying the use of the power prior as an informative prior for incorporating historical data, as well as providing a semi-automatic elicitation scheme for a₀ based on hierarchical modeling. Indeed, one of the most difficult and elusive issues in the use of the power prior is the choice of a₀. Since there will typically be little information in the historical data about a₀, it is not at all clear how to use the historical data in eliciting a₀. The analytic relationship between a₀ and the hyperparameters of the hierarchical model provides the data analyst with elicitation strategies for a₀ based on the historical data.

The rest of this paper is organized as follows. We consider the normal hierarchical linear model in Section 2 and develop formal analytical relationships between the power prior and the hierarchical regression model. In Section 3, we develop formal analytical relationships between the power prior and the hierarchical generalized linear model. In Section 4, we extend our results to multiple historical datasets. In Section 5, we use the analytic relationship between a₀ and the hyperparameters of the hierarchical model to devise a formal elicitation method for a₀ based on the historical data. In Section 6, we present a real dataset to demonstrate the main results.

2 Hierarchical Normal Models

2.1 I.I.D. Case

We first consider the i.i.d. case with a single historical dataset. The model for this case can be written as

    yᵢ = θ + εᵢ, i = 1, 2, ..., n, and y₀ᵢ = θ₀ + ε₀ᵢ, i = 1, 2, ..., n₀,    (2)

where y = (y₁, y₂, ..., yₙ)′ denotes the current data with sample size n and y₀ = (y₀₁, y₀₂, ..., y₀ₙ₀)′ denotes the historical data with sample size n₀. In (2), we further assume that the εᵢ are i.i.d. N(0, σ²) and the ε₀ᵢ are i.i.d. N(0, σ₀²) and independent of the εᵢ's, where σ² and σ₀² are fixed parameters. The hierarchical model is completed by independently taking

    θ₀ | μ, τ² ~ N(μ, τ²), θ | μ, τ² ~ N(μ, τ²),    (3)

and then taking

    μ ~ N(α, ν²),    (4)

where α, ν², and τ² are all fixed hyperparameters. Within this hierarchical model, our goal is to make inferences about the current study through the marginal posterior distribution [θ | y, y₀] ([a | b] denotes the marginal distribution of a given b throughout). Here, θ is the parameter of interest for the current study, such as a treatment effect, and θ₀ denotes the corresponding parameter for the historical study. The following theorem gives the form of the marginal posterior distribution [θ | y, y₀] obtained from (2), (3), and (4).

Theorem 2.1 Letting ȳ₀ = n₀⁻¹ Σᵢ₌₁ⁿ⁰ y₀ᵢ and ȳ = n⁻¹ Σᵢ₌₁ⁿ yᵢ, we have

    θ | y, y₀ ~ N(μ_h, σ_h²),    (5)

where, writing a* = (τ² + ν²)/(τ²(τ² + 2ν²)) and b* = ν²/(τ²(τ² + 2ν²)),

    μ_h = σ_h² { nȳ/σ² + α/(τ² + 2ν²) + b*(n₀ȳ₀/σ₀² + α/(τ² + 2ν²))/(n₀/σ₀² + a*) }    (6)

and

    σ_h⁻² = n/σ² + a* − (b*)²/(n₀/σ₀² + a*).    (7)

The proof of Theorem 2.1 is given in Appendix B. Now we consider a power prior formulation of the model in (2). To do this, we set θ₀ = θ, and the resulting model becomes

    yᵢ = θ + εᵢ, and y₀ᵢ = θ + ε₀ᵢ.    (8)

Thus, in the power prior formulation, the εᵢ are i.i.d. N(0, σ²), and the ε₀ᵢ are i.i.d. N(0, σ₀²) and independent of the εᵢ's, where σ² and σ₀² are fixed. Under this model, the power prior based on the historical data y₀, using the initial prior π₀(θ) ∝ 1, is given by

    π(θ | y₀, a₀) ∝ exp{ −(a₀/(2σ₀²)) Σᵢ₌₁ⁿ⁰ (y₀ᵢ − θ)² }.    (9)

Straightforward calculations show that

    θ | y, y₀, a₀ ~ N(μ_p, σ_p²),    (10)

where

    μ_p = (nȳ/σ² + a₀n₀ȳ₀/σ₀²)/(n/σ² + a₀n₀/σ₀²)    (11)

and

    σ_p² = (n/σ² + a₀n₀/σ₀²)⁻¹.    (12)
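The closed forms (11)-(12) take only a few lines of code. The following sketch is ours, with made-up summary statistics; at a₀ = 0.5, each historical subject counts as half a current subject.

```python
def power_posterior(n, ybar, sigma2, n0, y0bar, sigma02, a0):
    """Posterior mean and variance (11)-(12) under the power prior (9) with pi0(theta) ∝ 1."""
    prec = n / sigma2 + a0 * n0 / sigma02        # posterior precision, eq. (12)
    mu_p = (n * ybar / sigma2 + a0 * n0 * y0bar / sigma02) / prec
    return mu_p, 1.0 / prec

# hypothetical summary statistics
print(power_posterior(n=30, ybar=0.8, sigma2=1.0, n0=60, y0bar=1.2, sigma02=1.0, a0=0.5))
```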

We now examine the relationship between (5) and (10). To do this, we need to find an explicit relationship between μ_h and μ_p as well as a relationship between σ_h² and σ_p². We are led to the following theorem, which characterizes this relationship.

Theorem 2.2 The posteriors in (5) and (10) match, i.e., μ_h = μ_p and σ_h² = σ_p², if and only if α = 0 and ν² → ∞ in (4), and

    a₀ = (1 + 2τ²n₀/σ₀²)⁻¹.    (13)

The proof of Theorem 2.2 is given in Appendix B.

Corollary 2.1 The choice of a₀ given in (13) satisfies 0 < a₀ < 1.

Corollary 2.2 The result in Theorem 2.2 can be alternatively obtained by taking a uniform improper prior for μ at the outset, i.e., π(μ) ∝ 1.

The proofs of Corollaries 2.1 and 2.2 are straightforward. Theorem 2.2 gives us a useful characterization of the explicit relationship between the power prior and the hierarchical model, and we see from this theorem that the two models are equivalent if a₀ is chosen as in (13). We see from (13) that a₀ is a monotonic function of τ², and if τ² → 0, then a₀ → 1. This implies that if θ = θ₀ with probability 1, the historical and current data should be weighted equally. Also, the larger the sample size of the historical data, the less the weight given to the historical data. This is a desirable property since, in general, we would never want the historical data to dominate the posterior distribution of θ simply because n₀ is large.

2.2 Regression Model

We now extend the normal hierarchical model in (2) to the normal hierarchical regression model by setting θ = xᵢ′β and θ₀ = x₀ᵢ′β₀. This leads to the model

    yᵢ = xᵢ′β + εᵢ, i = 1, 2, ..., n, and y₀ᵢ = x₀ᵢ′β₀ + ε₀ᵢ, i = 1, 2, ..., n₀,    (14)

where β is a p × 1 vector of regression coefficients for the current data, β₀ is the p × 1 vector of regression coefficients for the historical data, xᵢ is a p × 1 vector of covariates for the iᵗʰ subject in the current dataset, and x₀ᵢ is a p × 1 vector of covariates for the iᵗʰ subject in the historical dataset. Similar to (2), we further assume that the εᵢ are i.i.d. N(0, σ²) and the ε₀ᵢ are i.i.d. N(0, σ₀²) and independent of the εᵢ's, where σ² and σ₀² are fixed. In addition, we take

    β | μ, Ω ~ N_p(μ, Ω), β₀ | μ, Ω ~ N_p(μ, Ω), and π(μ) ∝ 1,    (15)

where Ω is fixed. Here, β is the parameter vector of interest for the current study, and β₀ denotes the corresponding parameter for the historical study.
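Theorem 2.2 is easy to verify numerically. The sketch below (ours; all numbers hypothetical) computes the hierarchical marginal posterior of θ by Gaussian precision algebra on (θ, θ₀, μ), with α = 0 and a very large ν² standing in for the improper limit, and compares it with the power prior posterior at the matching value (13).

```python
import numpy as np

def hier_posterior_theta(n, ybar, s2, n0, y0bar, s02, tau2, alpha=0.0, nu2=1e12):
    """Marginal posterior of theta under (2)-(4), via the joint Gaussian precision
    of (theta, theta0, mu); huge nu2 approximates the improper limit of Theorem 2.2."""
    P = np.array([[1/tau2,      0.0, -1/tau2],
                  [0.0,      1/tau2, -1/tau2],
                  [-1/tau2, -1/tau2, 2/tau2 + 1/nu2]])   # prior precision
    b = np.array([0.0, 0.0, alpha/nu2])
    P += np.diag([n/s2, n0/s02, 0.0])                    # add data precision
    b += np.array([n*ybar/s2, n0*y0bar/s02, 0.0])
    V = np.linalg.inv(P)
    return (V @ b)[0], V[0, 0]                           # posterior mean, variance of theta

n, ybar, s2 = 30, 0.8, 1.0
n0, y0bar, s02 = 60, 1.2, 1.5
tau2 = 0.2
a0 = 1.0 / (1.0 + 2.0 * tau2 * n0 / s02)                 # eq. (13)
prec = n/s2 + a0*n0/s02
print(hier_posterior_theta(n, ybar, s2, n0, y0bar, s02, tau2))
print((n*ybar/s2 + a0*n0*y0bar/s02) / prec, 1.0/prec)    # agrees up to the finite-nu2 error
```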

The following theorem gives the form of the marginal posterior distribution [β | y, y₀] obtained from (14) and (15).

Theorem 2.3 The marginal posterior distribution of β is given by

    β | y, X, y₀, X₀ ~ N_p(A_h⁻¹B_h, A_h⁻¹),    (16)

where

    A_h = σ⁻²X′X + Ω⁻¹ − Ω⁻¹[2Ω⁻¹ − Ω⁻¹(Ω⁻¹ + σ₀⁻²X₀′X₀)⁻¹Ω⁻¹]⁻¹Ω⁻¹,    (17)

    B_h = σ⁻²X′y + Ω⁻¹[2Ω⁻¹ − Ω⁻¹(Ω⁻¹ + σ₀⁻²X₀′X₀)⁻¹Ω⁻¹]⁻¹Ω⁻¹(Ω⁻¹ + σ₀⁻²X₀′X₀)⁻¹σ₀⁻²X₀′y₀,    (18)

X = (x₁, x₂, ..., xₙ)′, and X₀ = (x₀₁, x₀₂, ..., x₀ₙ₀)′.

The proof of Theorem 2.3 is given in Appendix B. Now we consider a power prior formulation of the model in (14). To do this, we set β₀ = β, and the resulting model becomes

    yᵢ = xᵢ′β + εᵢ, and y₀ᵢ = x₀ᵢ′β + ε₀ᵢ.    (19)

Thus, in the power prior formulation, the εᵢ are i.i.d. N(0, σ²), and the ε₀ᵢ are i.i.d. N(0, σ₀²) and independent of the εᵢ's, where σ² and σ₀² are fixed. Under this model, the power prior based on the historical data (n₀, y₀, X₀), using the initial prior π₀(β) ∝ 1, is given by

    π(β | y₀, X₀, a₀) ∝ exp{ −(a₀/(2σ₀²))(y₀ − X₀β)′(y₀ − X₀β) },    (20)

and the posterior distribution of β is given by

    β | y, X, y₀, X₀, a₀ ~ N_p(A_p⁻¹B_p, A_p⁻¹),    (21)

where

    A_p = σ⁻²X′X + a₀σ₀⁻²X₀′X₀    (22)

and

    B_p = σ⁻²X′y + a₀σ₀⁻²X₀′y₀.    (23)

To match (16) and (21), we need to find an explicit relationship between A_h and A_p as well as a relationship between B_h and B_p. Clearly, for the hierarchical model and the power prior to yield identical distributions, we need A_h = A_p and B_h = B_p. We are led to the following theorem, which characterizes this relationship.

Theorem 2.4 Assume that X₀ is of full rank. Then the posteriors in (16) and (21) match, i.e., A_h = A_p and B_h = B_p, if and only if

    a₀(I + 2σ₀⁻²ΩX₀′X₀) = I.    (24)

The proof of Theorem 2.4 is given in Appendix B. We note that when p = 1, (24) reduces to (13): setting Ω = τ² and X₀ = (1, 1, ..., 1)′, (24) reduces to a₀(1 + 2τ²n₀/σ₀²) = 1, and thus the condition for a₀ given by (13) is a special case of (24). We also see that Theorem 2.4 implies that for the two models to match, ΩX₀′X₀ must be proportional to the identity matrix; that is, Ω must be of the form Ω = cσ₀²(X₀′X₀)⁻¹, where c > 0 is a scalar. This form of Ω is quite attractive, especially in model selection contexts, and has been discussed as a prior covariance matrix by several authors, including Zellner (1986), Ibrahim and Laud (1994), and Laud and Ibrahim (1995). Precisely, the form Ω = cσ₀²(X₀′X₀)⁻¹ corresponds to Zellner's g-prior and has several nice properties, as discussed in Zellner (1986), Ibrahim and Laud (1994), and Laud and Ibrahim (1995). Taking Ω = cσ₀²(X₀′X₀)⁻¹ is quite attractive in practice since it has the interpretation that, if there are historical data D₀ = (n₀, y₀, X₀) for which the linear model y₀ = X₀β + ε, ε ~ N(0, σ₀²I), is fit, then the information about β contained in the data D₀ is the Fisher information, given by σ₀⁻²X₀′X₀. Thus, based on Fisher information arguments, it makes scientific sense to assign cσ₀²(X₀′X₀)⁻¹ as the prior covariance matrix for β. Our prior does indeed imply that the variation in β₀ and β is linked via X₀. This is reasonable since β and β₀ have the same prior covariance matrix Ω, and β and β₀ represent the same unobservable phenomenon. Therefore, allowing a multiple of the Fisher information for β₀ to serve as the prior precision matrix makes scientific and intuitive sense. We are now led to the following theorem.

Theorem 2.5 If the g-prior form is taken for Ω, that is, Ω = cσ₀²(X₀′X₀)⁻¹, then the power prior and the hierarchical model are identical when a₀ is taken to satisfy a₀(I + 2cI) = I, or

    a₀ = 1/(1 + 2c).

This theorem yields a very appealing result in that it shows the power prior and the hierarchical model are equivalent in the regression setting when a g-prior is chosen for Ω and a₀ is chosen as in Theorem 2.4. This result thus lends further appeal to the g-prior for use in Bayesian inference.
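The algebra behind Theorems 2.3-2.5 can be verified directly. The following sketch (ours; random matrices stand in for real data) builds A_h and B_h from (17)-(18) under the g-prior form of Ω and confirms that they coincide with A_p and B_p at a₀ = 1/(1 + 2c).

```python
import numpy as np

rng = np.random.default_rng(3)
n, n0, p = 40, 25, 3
X, X0 = rng.normal(size=(n, p)), rng.normal(size=(n0, p))
y, y0 = rng.normal(size=n), rng.normal(size=n0)
s2, s02, c = 1.0, 2.0, 0.7

Omega = c * s02 * np.linalg.inv(X0.T @ X0)             # g-prior form of Omega
Oi = np.linalg.inv(Omega)
P0 = Oi + (X0.T @ X0) / s02                            # precision of beta0 given mu
M = 2 * Oi - Oi @ np.linalg.inv(P0) @ Oi               # precision of mu after integrating beta0
Mi = np.linalg.inv(M)
A_h = (X.T @ X) / s2 + Oi - Oi @ Mi @ Oi               # eq. (17)
B_h = (X.T @ y) / s2 + Oi @ Mi @ Oi @ np.linalg.inv(P0) @ (X0.T @ y0) / s02   # eq. (18)

a0 = 1.0 / (1.0 + 2.0 * c)                             # Theorem 2.5
A_p = (X.T @ X) / s2 + a0 * (X0.T @ X0) / s02          # eq. (22)
B_p = (X.T @ y) / s2 + a0 * (X0.T @ y0) / s02          # eq. (23)
print(np.abs(A_h - A_p).max(), np.abs(B_h - B_p).max())   # both ≈ 0 (machine precision)
```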

3 Hierarchical Generalized Linear Models

The analytic relationships given in the previous section can be extended to any non-normal model that has an asymptotically normal likelihood. To be specific, we demonstrate the nature of such approximations for the class of generalized linear models. Consider the hierarchical generalized linear model given by

    p(yᵢ | θᵢ, τ) = exp{ aᵢ(τ)[yᵢθᵢ − b(θᵢ)] + c(yᵢ, τ) }, i = 1, ..., n,

and

    p(y₀ᵢ | θ₀ᵢ, τ) = exp{ aᵢ(τ)[y₀ᵢθ₀ᵢ − b(θ₀ᵢ)] + c(y₀ᵢ, τ) }, i = 1, ..., n₀,

indexed by the canonical parameter θᵢ (θ₀ᵢ) and the scale parameter τ. The functions b and c determine a particular family in the class, such as the binomial, normal, Poisson, etc. The functions aᵢ(τ) are commonly of the form aᵢ(τ) = τwᵢ, where the wᵢ's are known weights. For ease of exposition, we assume that wᵢ = 1 and that τ is known throughout. We further assume that the θᵢ's and θ₀ᵢ's satisfy the equations

    θᵢ = θ(ηᵢ), i = 1, 2, ..., n, and θ₀ᵢ = θ(η₀ᵢ), i = 1, 2, ..., n₀,

with η = Xβ and η₀ = X₀β₀, where η = (η₁, ..., ηₙ)′ and η₀ = (η₀₁, ..., η₀ₙ₀)′. The function θ(ηᵢ) is called the θ-link, and for a canonical link, θᵢ = ηᵢ. Given (β, β₀), y and y₀ are independent, and thus we can write the joint likelihood of (β, β₀) in vector notation for the generalized linear model (GLM) as

    p(y, y₀ | β, β₀) = exp{ τ[y′θ(Xβ) − 1′b(θ(Xβ))] + 1′c(y, τ) } exp{ τ[y₀′θ(X₀β₀) − 1′b(θ(X₀β₀))] + 1′c(y₀, τ) },    (25)

where θ(Xβ) is a componentwise function of Xβ that depends on the link, and b and c are likewise applied componentwise. When a canonical link is used, θ(Xβ) = Xβ. To complete the hierarchical GLM specification, we specify priors for (β, β₀) as in the normal linear regression model. That is, we independently take

    β ~ N_p(μ, Ω) and β₀ ~ N_p(μ, Ω),    (26)

where Ω is fixed, and we further take π(μ) ∝ 1. The marginal posterior distribution of β is required to make inferences about β in this framework. Due to the complexity of the hierarchical GLM specification, it is not possible to obtain a closed-form expression for the marginal posterior distribution of β. To overcome this difficulty, we consider an asymptotic approximation to the posterior similar to that of Chen (1985). In the context considered here, asymptotic means n₀ → ∞ and n → ∞. Toward this goal, let

    p(β | y, X) = exp{ τ[y′θ(Xβ) − 1′b(θ(Xβ))] + 1′c(y, τ) }

and

    p(β₀ | y₀, X₀) = exp{ τ[y₀′θ(X₀β₀) − 1′b(θ(X₀β₀))] + 1′c(y₀, τ) }.

Following Chen (1985) and ignoring constants that are free of the parameters, we have, approximately,

    p(β | y, X) ∝ exp{ −(1/2)(β − β̂)′Σ̂⁻¹(β − β̂) }    (27)

and

    p(β₀ | y₀, X₀) ∝ exp{ −(1/2)(β₀ − β̂₀)′Σ̂₀⁻¹(β₀ − β̂₀) },    (28)

where β̂ and β̂₀ are the respective maximum likelihood estimators (MLEs) of β and β₀ based on the data (n, y, X) and (n₀, y₀, X₀) under the GLM,

    Σ̂⁻¹ = −∂²ln p(β | y, X)/∂β∂β′ evaluated at β = β̂, and Σ̂₀⁻¹ = −∂²ln p(β₀ | y₀, X₀)/∂β₀∂β₀′ evaluated at β₀ = β̂₀.

It is straightforward to show that under the model in (25),

    Σ̂⁻¹ = τX′Δ̂V̂Δ̂X and Σ̂₀⁻¹ = τX₀′Δ̂₀V̂₀Δ̂₀X₀,    (29)

where Δ̂ and V̂ are n × n diagonal matrices with iᵗʰ diagonal elements δᵢ = δᵢ(xᵢ′β) = dθᵢ/dηᵢ and vᵢ = vᵢ(xᵢ′β) = d²b(θᵢ)/dθᵢ², evaluated at β̂, and Δ̂₀ and V̂₀ are n₀ × n₀ diagonal matrices with iᵗʰ diagonal elements δ₀ᵢ = δ₀ᵢ(x₀ᵢ′β₀) = dθ₀ᵢ/dη₀ᵢ and v₀ᵢ = v₀ᵢ(x₀ᵢ′β₀) = d²b(θ₀ᵢ)/dθ₀ᵢ², evaluated at β̂₀. The approximations in (27) and (28) are valid for large n and large n₀, respectively.
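For a concrete instance of (29) (our sketch, not the paper's code), the logistic model with canonical link has Δ̂ = I, τ = 1, and v̂ᵢ = p̂ᵢ(1 − p̂ᵢ), so Σ̂₀⁻¹ is the observed information at the MLE, obtainable by a short Newton-Raphson fit. The data below are simulated and all names are ours; the matching value of a₀ anticipates Theorem 3.1.

```python
import numpy as np

def logistic_mle_info(X, y, n_iter=25):
    """MLE and observed information Sigma^{-1} = X' V X for logistic regression
    (canonical link, so Delta = I and v_i = p_i(1 - p_i); cf. eq. (29) with tau = 1)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):                                  # Newton-Raphson
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (prob * (1.0 - prob))[:, None])
        beta += np.linalg.solve(info, X.T @ (y - prob))
    return beta, info

rng = np.random.default_rng(5)
X0 = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
y0 = rng.binomial(1, 1.0 / (1.0 + np.exp(-X0 @ np.array([-0.5, 1.0, 0.3]))))
beta0_hat, Sigma0_inv = logistic_mle_info(X0, y0)
c = 0.5
Omega = c * np.linalg.inv(Sigma0_inv)     # generalized g-prior: Omega = c * Sigma0_hat
a0 = 1.0 / (1.0 + 2.0 * c)                # matching value implied by Theorem 3.1 below
print(beta0_hat.round(3), a0)
```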

Using (27) and (28) and the proof of Theorem 2.3, the marginal posterior distribution of β is then approximated by

    β | y, y₀, X, X₀ ~ (approx.) N_p(Â_h⁻¹B̂_h, Â_h⁻¹),    (30)

where

    Â_h = Σ̂⁻¹ + Ω⁻¹ − Ω⁻¹[2Ω⁻¹ − Ω⁻¹(Ω⁻¹ + Σ̂₀⁻¹)⁻¹Ω⁻¹]⁻¹Ω⁻¹    (31)

and

    B̂_h = Σ̂⁻¹β̂ + Ω⁻¹[2Ω⁻¹ − Ω⁻¹(Ω⁻¹ + Σ̂₀⁻¹)⁻¹Ω⁻¹]⁻¹Ω⁻¹(Ω⁻¹ + Σ̂₀⁻¹)⁻¹Σ̂₀⁻¹β̂₀.    (32)

As in the normal linear regression model, we consider a power prior formulation of the hierarchical GLM given by (25) by setting β₀ = β. The power prior based on the historical data (n₀, y₀, X₀), using the initial prior π₀(β) ∝ 1, is then given by

    π(β | y₀, X₀, a₀) ∝ exp{ a₀τ[y₀′θ(X₀β) − 1′b(θ(X₀β))] },    (33)

and the posterior distribution of β is given by

    π(β | y, X, y₀, X₀, a₀) ∝ exp{ τ[y′θ(Xβ) − 1′b(θ(Xβ))] } exp{ a₀τ[y₀′θ(X₀β) − 1′b(θ(X₀β))] }.    (34)

Using (27) and (28), and after some algebra, the posterior distribution of β given by (34) is approximated by

    β | y, y₀, X, X₀, a₀ ~ (approx.) N_p(Â_p⁻¹B̂_p, Â_p⁻¹),    (35)

where

    Â_p = Σ̂⁻¹ + a₀Σ̂₀⁻¹ and B̂_p = Σ̂⁻¹β̂ + a₀Σ̂₀⁻¹β̂₀,    (36)

β̂ and β̂₀ are the MLEs, and Σ̂⁻¹ and Σ̂₀⁻¹ are defined by (29). Similar to the normal linear regression model, we now match the approximate posterior distributions of β under the GLM. We are led to the following theorem.

Theorem 3.1 The approximate posteriors in (30) and (35) match, i.e., Â_h = Â_p and B̂_h = B̂_p, if and only if

    a₀(I + 2ΩΣ̂₀⁻¹) = I.    (37)

The proof of Theorem 3.1 is similar to that of Theorem 2.4, and therefore the details are omitted for brevity. If a generalized g-prior form is taken for Ω, i.e., Ω = cΣ̂₀, then the approximate power prior and the hierarchical model are identical when a₀ = 1/(1 + 2c). Thus, again, we see that for the class of GLMs, the connection between the hierarchical model and the power prior is facilitated by taking a generalized g-prior for Ω.

4 Extensions to Multiple Historical Datasets

In this section, we generalize the results given in Theorems 2.2, 2.4, and 3.1 to multiple historical datasets. First, we generalize the hierarchical model in (2)-(4) and the power prior in (9) to multiple historical datasets as follows. Suppose we have L₀ historical datasets. For the current study, the hierarchical model can be written as

    yᵢ = θ + εᵢ,    (38)

where i = 1, 2, ..., n and the εᵢ's are i.i.d. N(0, σ²) random variables, with σ² fixed. For the kᵗʰ historical dataset, the hierarchical model can be written as

    y_{0ki} = θ_{0k} + ε_{0ki},    (39)

where i = 1, 2, ..., n_{0k}, the ε_{0ki}'s are i.i.d. N(0, σ_{0k}²) and independent of the εᵢ's, and the σ_{0k}²'s are fixed for k = 1, 2, ..., L₀. Further, we assume that θ and the θ_{0k} are independently distributed as

    θ | μ, τ² ~ N(μ, τ²)    (40)

and

    θ_{0k} | μ, τ² ~ N(μ, τ²),    (41)

where k = 1, 2, ..., L₀ and τ² is a fixed hyperparameter. Based on the results established in Theorem 2.2, we take a uniform improper prior for μ, i.e., π(μ) ∝ 1, and let θ₀ = (θ_{01}, ..., θ_{0L₀})′, y = (y₁, y₂, ..., yₙ)′, and y₀ = (y_{01}′, y_{02}′, ..., y_{0L₀}′)′, where y_{0k} = (y_{0k1}, y_{0k2}, ..., y_{0k n_{0k}})′ for k = 1, 2, ..., L₀. The following theorem gives the form of the marginal posterior distribution [θ | y, y₀] under the model given by (38), (39), (40), and (41).

Theorem 4.1 Under (38), (39), (40), and (41), the marginal posterior distribution of θ is given by

    θ | y, y₀ ~ N(μ_mh, σ_mh²),    (42)

where, writing w_k = n_{0k}/(σ_{0k}² + n_{0k}τ²) for k = 1, 2, ..., L₀,

    μ_mh = σ_mh² { nȳ/σ² + (Σ_{k=1}^{L₀} w_k ȳ_{0k})/(1 + τ² Σ_{k=1}^{L₀} w_k) },    (43)

    σ_mh⁻² = n/σ² + (Σ_{k=1}^{L₀} w_k)/(1 + τ² Σ_{k=1}^{L₀} w_k),    (44)

ȳ = n⁻¹ Σ_{i=1}^{n} yᵢ, and ȳ_{0k} = n_{0k}⁻¹ Σ_{i=1}^{n_{0k}} y_{0ki} for k = 1, 2, ..., L₀.

The proof of Theorem 4.1 is given in Appendix B. The power prior formulation of the model with multiple historical datasets can be obtained by setting θ_{0k} = θ in (40) and (41). The model for the current data can be written as

    yᵢ = θ + εᵢ,    (45)

and the model for the kᵗʰ historical dataset can be written as

    y_{0ki} = θ + ε_{0ki},    (46)

where the ε_{0ki} are i.i.d. N(0, σ_{0k}²) for k = 1, 2, ..., L₀. Now the power prior for θ under multiple historical datasets is given by

    π(θ | y_{0k}, k = 1, 2, ..., L₀, a₀) ∝ exp{ −Σ_{k=1}^{L₀} (a_{0k}/(2σ_{0k}²)) Σ_{i=1}^{n_{0k}} (y_{0ki} − θ)² },    (47)

where a₀ = (a_{01}, a_{02}, ..., a_{0L₀})′ with 0 < a_{0k} < 1 for k = 1, 2, ..., L₀. Using (45) and (46), along with a uniform improper initial prior for θ, i.e., π₀(θ) ∝ 1, we are led to

    θ | y, y₀, a₀ ~ N(μ_mp, σ_mp²),    (48)

where

    μ_mp = σ_mp² ( nȳ/σ² + Σ_{k=1}^{L₀} a_{0k} n_{0k} ȳ_{0k}/σ_{0k}² )    (49)

and

    σ_mp² = ( n/σ² + Σ_{k=1}^{L₀} a_{0k} n_{0k}/σ_{0k}² )⁻¹.    (50)

We are now led to the following theorem characterizing the relationship between the power prior and the hierarchical model with multiple historical datasets.

Theorem 4.2 Suppose we take π₀(θ) ∝ 1 for the power prior and π(μ) ∝ 1 for the hierarchical model. Then μ_mh = μ_mp in (43) and (49), and σ_mh² = σ_mp² in (44) and (50), if and only if

    a_{0k} = [σ_{0k}²/(σ_{0k}² + n_{0k}τ²)] / [1 + τ² Σ_{j=1}^{L₀} n_{0j}/(σ_{0j}² + n_{0j}τ²)]    (51)

for k = 1, 2, ..., L₀. The proof of Theorem 4.2 is given in Appendix B.

Corollary 4.1 The choice of a_{0k} given in (51) satisfies 0 < a_{0k} < 1. Furthermore, if τ² ≥ σ_{0k}²/n_{0k} for k = 1, 2, ..., L₀, then the choice of a_{0k} given in (51) also satisfies Σ_{k=1}^{L₀} a_{0k} < 1.

The proof of Corollary 4.1 is straightforward. We also note that when L₀ = 1, then, with some obvious adjustments in notation, μ_mh in (43), σ_mh² in (44), and a_{0k} in (51) all reduce to μ_h in (6), σ_h² in (7), and a₀ in (13), respectively.

Next, we extend the normal hierarchical regression model (14) to multiple historical datasets as follows. Suppose we have L₀ historical datasets. In vector notation, we assume

    y = Xβ + ε and y_{0k} = X_{0k}β_{0k} + ε_{0k}, k = 1, 2, ..., L₀,    (52)

where X is an n × p matrix, X_{0k} is an n_{0k} × p matrix for k = 1, 2, ..., L₀, β and the β_{0k}, k = 1, 2, ..., L₀, are p × 1 vectors of regression coefficients, ε ~ N_n(0, σ²I) with σ² fixed, ε_{0k} ~ N_{n_{0k}}(0, σ_{0k}²I) with σ_{0k}² fixed for k = 1, 2, ..., L₀, and ε is independent of the ε_{0k}. The following theorem gives the marginal posterior distribution of β under this setting.

Theorem 4.3 Assume that β and the β_{0k}, k = 1, 2, ..., L₀, are i.i.d. N_p(μ, Ω) random vectors, where Ω is fixed, and π(μ) ∝ 1. Under (52), the marginal posterior distribution of β is given by

    β ~ N_p(A_mh⁻¹B_mh, A_mh⁻¹),    (53)

where

    A_mh = σ⁻²X′X + Ω⁻¹ − Ω⁻¹[(L₀ + 1)Ω⁻¹ − Σ_{k=1}^{L₀} Ω⁻¹(Ω⁻¹ + σ_{0k}⁻²X_{0k}′X_{0k})⁻¹Ω⁻¹]⁻¹Ω⁻¹    (54)

and

    B_mh = σ⁻²X′y + Ω⁻¹[(L₀ + 1)Ω⁻¹ − Σ_{k=1}^{L₀} Ω⁻¹(Ω⁻¹ + σ_{0k}⁻²X_{0k}′X_{0k})⁻¹Ω⁻¹]⁻¹ Σ_{k=1}^{L₀} Ω⁻¹(Ω⁻¹ + σ_{0k}⁻²X_{0k}′X_{0k})⁻¹σ_{0k}⁻²X_{0k}′y_{0k}.    (55)
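As an aside before the regression extension is developed further, the i.i.d. matching weights (51) of Theorem 4.2 are easy to compute and check against Corollary 4.1. The sketch below is ours, with hypothetical inputs.

```python
import numpy as np

def a0k_match(n0, s02, tau2):
    """Matching weights a_{0k} of eq. (51) for L0 historical datasets (i.i.d. normal case)."""
    n0, s02 = np.asarray(n0, float), np.asarray(s02, float)
    w = n0 / (s02 + n0 * tau2)                 # per-dataset precision weights
    return (s02 / (s02 + n0 * tau2)) / (1.0 + tau2 * w.sum())

a = a0k_match(n0=[50, 80, 20], s02=[1.0, 2.0, 1.5], tau2=0.1)
print(a, a.sum())  # each a_{0k} in (0,1); sum < 1 since tau2 >= s02/n0 here (Corollary 4.1)
```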

The proof of Theorem 4.3 is similar to that of Theorem 2.3 for the normal hierarchical model with a single historical dataset, and is thus omitted for brevity. By letting β_{01} = β_{02} = ... = β_{0L₀} = β in (52), the power prior for the normal linear model with multiple historical datasets can be obtained as

    π(β | y_{0k}, X_{0k}, k = 1, 2, ..., L₀, a₀) ∝ exp{ −Σ_{k=1}^{L₀} (a_{0k}/(2σ_{0k}²))(y_{0k} − X_{0k}β)′(y_{0k} − X_{0k}β) },    (56)

where 0 < a_{0k} < 1 for k = 1, 2, ..., L₀. Assuming π₀(β) ∝ 1, the posterior distribution of β is given by

    β | y, X, y_{0k}, X_{0k}, k = 1, 2, ..., L₀ ~ N_p(A_mp⁻¹B_mp, A_mp⁻¹),    (57)

where

    A_mp = σ⁻²X′X + Σ_{k=1}^{L₀} a_{0k}σ_{0k}⁻²X_{0k}′X_{0k} and B_mp = σ⁻²X′y + Σ_{k=1}^{L₀} a_{0k}σ_{0k}⁻²X_{0k}′y_{0k}.

The following theorem characterizes the relationship between the power prior and the normal hierarchical regression model with multiple historical datasets.

Theorem 4.4 The posterior distribution of β in (57) under the power prior (56) matches (53) under the normal hierarchical regression model if and only if

    a_{0k}[(L₀ + 1)I − Σ_{j=1}^{L₀}(I + σ_{0j}⁻²ΩX_{0j}′X_{0j})⁻¹] = (I + σ_{0k}⁻²ΩX_{0k}′X_{0k})⁻¹    (58)

for k = 1, 2, ..., L₀. The proof of Theorem 4.4 follows directly from that of Theorem 2.4. We note that when L₀ = 1, (58) reduces to (24), and if p = 1, Ω = τ², and X_{0k} = (1, 1, ..., 1)′ for each k, then (58) reduces to (51). Finally, the result given in Theorem 3.1 for GLMs can be extended to multiple historical datasets; details of this extension can be found in Appendix A.

5 Elicitation of a₀

In this section, we discuss two methods for specifying Ω in a Bayesian analysis: (i) using the analytic relationship between the power prior and hierarchical models and taking a fixed value for Ω, namely Ω = cσ₀²(X₀′X₀)⁻¹, where c is a fixed hyperparameter, as well as a fixed value for σ₀²; and (ii) taking Ω and σ₀² to be random, in which case a prior is specified for (Ω, σ₀²).

We first consider situation (i). The results of the previous sections, yielding equivalence between the power prior and hierarchical models, shed light on how to elicit a

guide value for a₀ using the historical data. The basic idea is to use the relationship between the power prior and the hierarchical model to specify a guide value for a₀ from the hyperparameters of the hierarchical model. Specifically, we can use the relationship given by (24) to elicit a₀. One natural way to specify a₀ is to take the trace of both sides of (24) and take a₀ to be the solution of the resulting equation. Upon taking traces of both sides of (24), we are led to

    a₀ tr(I + 2σ₀⁻²ΩX₀′X₀) = tr(I).    (59)

Since I is a p × p matrix, tr(I) = p, and thus (59) implies that the guide value of a₀ is taken to be

    â₀ = p / (p + 2σ₀⁻² tr(ΩX₀′X₀)).    (60)

If Ω = cσ₀²(X₀′X₀)⁻¹, (60) reduces to

    â₀ = 1/(1 + 2c).    (61)

We see that (61) corresponds to the relationship between the power prior and the hierarchical model in the case that θ is a univariate parameter, i.e., the i.i.d. case. We can clearly see that (60) satisfies 0 < â₀ < 1. The guide value for a₀ given in (60) provides a very useful benchmark value for a₀, from which further sensitivity analyses can be conducted. This guide value has a solid theoretical justification and motivation: it is the value of a₀ that results in equivalence between the power prior and the hierarchical model. In the example of Section 6, we show that the guide value given by (60) works quite well in practice.

We now consider situation (ii). If Ω and σ₀² are random, then (60) can be easily modified. In this case, our guide value for a₀ is the posterior expectation of (60) with respect to the joint posterior distribution of (σ₀², Ω), leading to

    â₀ = E[ p / (p + 2σ₀⁻² tr(ΩX₀′X₀)) ],    (62)

where the expectation is taken with respect to the joint posterior distribution of (σ₀², Ω). Similar formulas for the guide value for a₀ are available for GLMs using (37). For the class of GLMs under situation (i), the guide value for a₀ is given by

    â₀ = p / (p + 2 tr(ΩΣ̂₀⁻¹)).    (63)

When Ω is random under situation (ii) for GLMs, the guide value is obtained by taking the expectation of (63) with respect to the posterior distribution of Ω, leading to

    â₀ = E[ p / (p + 2 tr(ΩΣ̂₀⁻¹)) ].    (64)
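Computationally, (60) and (62) are one-liners. In the sketch below (ours), the posterior draws of (σ₀², Ω) are replaced by arbitrary gamma stand-in draws purely to show the Monte Carlo form of (62); in practice these would come from an MCMC fit of the hierarchical model.

```python
import numpy as np

def a0_guide(Omega, X0, s02):
    """Guide value (60): p / (p + 2 tr(Omega X0'X0) / s02)."""
    p = Omega.shape[0]
    return p / (p + 2.0 * np.trace(Omega @ (X0.T @ X0)) / s02)

rng = np.random.default_rng(7)
X0 = rng.normal(size=(100, 3))                       # hypothetical historical design
draws = []
for _ in range(2000):
    Omega = np.diag(rng.gamma(2.0, 0.005, size=3))   # stand-in draw of a diagonal Omega
    s02 = rng.gamma(2.0, 0.5)                        # stand-in draw of sigma0^2
    draws.append(a0_guide(Omega, X0, s02))
print(np.mean(draws))                                # Monte Carlo estimate of (62)
```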

Finally, we briefly discuss how to obtain a guide value for a_{0k} in the presence of multiple historical datasets under situation (ii). For ease of exposition, we consider the normal linear model with multiple historical datasets. Taking traces of both sides of (58), we take the guide value for a_{0k} as

    â_{0k} = E[ tr{(I + σ_{0k}⁻²ΩX_{0k}′X_{0k})⁻¹} / ((L₀ + 1)p − Σ_{j=1}^{L₀} tr{(I + σ_{0j}⁻²ΩX_{0j}′X_{0j})⁻¹}) ],

where the expectation is taken with respect to the joint posterior distribution of (Ω, σ_{01}², ..., σ_{0L₀}²). After some matrix algebra, it can be shown that â_{0k} < 1 for k = 1, 2, ..., L₀. When multiple historical datasets are available, we have more information with which to estimate Ω, and thus Ω is estimated with greater precision in this case. In this sense, inference with multiple historical datasets can be quite advantageous in eliciting the power parameter in the power prior.

6 AIDS Data

We consider the two AIDS studies ACTG019 and ACTG036, where ACTG036 provides the current data and ACTG019 the historical data. The purpose of this example is to demonstrate the proposed methodology for obtaining the guide value for a₀ discussed in Section 5 in the context of logistic regression. The ACTG019 study was a double-blind, placebo-controlled clinical trial comparing zidovudine (AZT) to placebo in persons with CD4 counts less than 500. The sample size for this study, excluding cases with missing data, was n₀ = 823. The response variable y₀ for these data is binary, with 1 indicating death, development of AIDS, or AIDS-related complex (ARC), and 0 indicating otherwise. Several covariates were also measured. The ones we use here are CD4 count (x₀₁, cell count per mm³ of serum), age (x₀₂), and treatment (x₀₃). The covariates CD4 count and age are continuous, whereas the treatment covariate is binary. The ACTG036 study was also a placebo-controlled clinical trial comparing AZT to placebo, in patients with hereditary coagulation disorders. The sample size in this study, excluding cases with missing data, was n = 183. The response variable y for these data is binary, with 1 indicating death, development of AIDS, or AIDS-related complex (ARC), and 0 indicating otherwise. Several covariates were measured for these data; the ones we use here are CD4 count (x₁), age (x₂), and treatment (x₃).

We consider the hierarchical logistic regression model to fit the ACTG019 and ACTG036 data. In (26), we take Ω to be a 4 × 4 diagonal matrix, and we further assume that the diagonal elements Ω_jj, j = 0, 1, 2, 3, are i.i.d. inverse gamma with scale parameter 0.005. Using (64), we specify a₀ by

    â₀ = E[ 4 / (4 + 2 tr(ΩΣ̂₀⁻¹)) ],    (65)

where Σ̂₀⁻¹ is computed via (29) and the expectation is taken with respect to the posterior distribution of Ω under the hierarchical logistic regression model.
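Before turning to the Gibbs results, we note that the multiple-dataset guide value â_{0k} from Section 5 is equally easy to compute. In the sketch below (ours), random matrices stand in for the historical design matrices, and a single fixed (Ω, σ_{0k}²) stands in for a posterior draw; averaging over actual draws gives the posterior-expectation version.

```python
import numpy as np

def a0k_guide(Omega, X0_list, s02_list):
    """Guide values for L0 historical datasets from the traces of both sides of (58)."""
    p = Omega.shape[0]
    tr = [np.trace(np.linalg.inv(np.eye(p) + Omega @ (X0.T @ X0) / s02))
          for X0, s02 in zip(X0_list, s02_list)]
    denom = (len(tr) + 1) * p - sum(tr)
    return [t / denom for t in tr]

rng = np.random.default_rng(11)
X0s = [rng.normal(size=(m, 3)) for m in (60, 90, 40)]      # hypothetical designs
a = a0k_guide(Omega=0.05 * np.eye(3), X0_list=X0s, s02_list=[1.0, 2.0, 1.5])
print(a, sum(a))          # each guide value is strictly less than 1
```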

The Gibbs sampling algorithm is used to sample β = (β₀, β₁, β₂, β₃)′, β₀ = (β₀₀, β₀₁, β₀₂, β₀₃)′, μ = (μ₀, μ₁, μ₂, μ₃)′, and Ω from their respective posterior distributions under the hierarchical model. Table 1 gives the posterior estimates of β, β₀, μ, and Ω based on 10,000 Gibbs iterations after a burn-in of 1,000 iterations. Using (65), we obtain the guide value â₀. Table 2 shows the posterior estimates of β using the power prior in (33) for several values of a₀, including a₀ = â₀. From Tables 1 and 2, we can see that the posterior estimates of β using the power prior with a₀ = â₀ are fairly close to those obtained under the hierarchical model. From Table 2, we also see that the posterior estimates differ slightly across the different values of a₀. In particular, when more weight is given to the historical data, age and treatment become more significant; that is, their 95% Highest Posterior Density (HPD) intervals do not include 0. Thus, the power prior gives the investigator great flexibility in incorporating the historical data to analyze the ACTG036 trial.

[Table 1: Posterior means and standard deviations of β, β₀, μ, and the diagonal elements of Ω under the hierarchical logistic regression model.]

[Table 2: Posterior means, standard deviations, and 95% HPD intervals of β using the power prior for the logistic regression model, for several values of a₀, including a₀ = 0 and a₀ = â₀.]
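Given Gibbs output, (65) is a simple average. The sketch below (ours) uses fabricated stand-in draws of Ω and a fixed stand-in for Σ̂₀⁻¹ only to show the computation, not the paper's actual posterior or data.

```python
import numpy as np

def a0_hat_glm(Omega_draws, Sigma0_inv, p=4):
    """Monte Carlo evaluation of (65): E[ p / (p + 2 tr(Omega Sigma0^{-1})) ]."""
    return float(np.mean([p / (p + 2.0 * np.trace(Om @ Sigma0_inv))
                          for Om in Omega_draws]))

rng = np.random.default_rng(13)
Sigma0_inv = np.diag([50.0, 4.0, 1.5, 6.0])           # stand-in observed information (29)
Omega_draws = [np.diag(1.0 / rng.gamma(2.0, 1.0 / 0.005, size=4))  # stand-in IG draws
               for _ in range(5000)]
print(a0_hat_glm(Omega_draws, Sigma0_inv))
```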

Appendix A: Extension to Multiple Historical Datasets for GLMs

In this appendix, we generalize the result given in Theorem 3.1 to multiple historical datasets. First, consider the hierarchical GLM with multiple historical datasets. In vector notation, the likelihood function of (β, β_{01}, ..., β_{0L₀}) given (y, y_{01}, ..., y_{0L₀}) is

    p(y, y_{01}, ..., y_{0L₀} | β, β_{01}, ..., β_{0L₀}) = exp{ τ[y′θ(Xβ) − 1′b(θ(Xβ))] + 1′c(y, τ) } Π_{k=1}^{L₀} exp{ τ[y_{0k}′θ(X_{0k}β_{0k}) − 1′b(θ(X_{0k}β_{0k}))] + 1′c(y_{0k}, τ) }.

We independently take β ~ N_p(μ, Ω), β_{0k} ~ N_p(μ, Ω) for k = 1, 2, ..., L₀, and π(μ) ∝ 1. Using (27), (28), and Theorems 2.3 and 4.3, the marginal posterior distribution of β is approximated (i.e., for n → ∞ and n_{0k} → ∞) by

    β | y, X, y_{0k}, X_{0k}, k = 1, 2, ..., L₀ ~ (approx.) N_p(Â_mh⁻¹B̂_mh, Â_mh⁻¹),    (A.1)

where

    Â_mh = Σ̂⁻¹ + Ω⁻¹ − Ω⁻¹[(L₀ + 1)Ω⁻¹ − Σ_{k=1}^{L₀} Ω⁻¹(Ω⁻¹ + Σ̂_{0k}⁻¹)⁻¹Ω⁻¹]⁻¹Ω⁻¹,

    B̂_mh = Σ̂⁻¹β̂ + Ω⁻¹[(L₀ + 1)Ω⁻¹ − Σ_{k=1}^{L₀} Ω⁻¹(Ω⁻¹ + Σ̂_{0k}⁻¹)⁻¹Ω⁻¹]⁻¹ Σ_{k=1}^{L₀} Ω⁻¹(Ω⁻¹ + Σ̂_{0k}⁻¹)⁻¹Σ̂_{0k}⁻¹β̂_{0k},

β̂ is the MLE of β based on the data D = (n, y, X), β̂_{0k} is the MLE of β_{0k} based on the data D_{0k} = (n_{0k}, y_{0k}, X_{0k}), k = 1, ..., L₀, under the GLM, and Σ̂⁻¹ and Σ̂_{0k}⁻¹ are defined as in (29) based on the current data D and the historical data D_{0k}, for k = 1, ..., L₀.

The power prior based on the L₀ historical datasets {(y_{0k}, X_{0k}), k = 1, 2, ..., L₀}, using the initial prior π₀(β) ∝ 1, is given by

    π(β | y_{0k}, X_{0k}, k = 1, 2, ..., L₀, a₀) ∝ Π_{k=1}^{L₀} exp{ a_{0k}τ[y_{0k}′θ(X_{0k}β) − 1′b(θ(X_{0k}β))] },    (A.2)

and the posterior distribution of β is given by

    π(β | y, X, y_{0k}, X_{0k}, k = 1, 2, ..., L₀, a₀) ∝ exp{ τ[y′θ(Xβ) − 1′b(θ(Xβ))] } Π_{k=1}^{L₀} exp{ a_{0k}τ[y_{0k}′θ(X_{0k}β) − 1′b(θ(X_{0k}β))] }.    (A.3)

Using (27) and (28), and after some algebra, the posterior distribution of β given by (A.3) is approximated (i.e., for large n and large n_{0k}, k = 1, ..., L₀) by

    β | y, X, y_{0k}, X_{0k}, k = 1, 2, ..., L₀, a₀ ~ (approx.) N_p(Â_p⁻¹B̂_p, Â_p⁻¹),    (A.4)

where

    Â_p = Σ̂⁻¹ + Σ_{k=1}^{L₀} a_{0k}Σ̂_{0k}⁻¹ and B̂_p = Σ̂⁻¹β̂ + Σ_{k=1}^{L₀} a_{0k}Σ̂_{0k}⁻¹β̂_{0k}.

We are now led to the following theorem.

Theorem A.1 The posterior distribution of β in (A.4), based on the power prior (A.2), matches (A.1) under the hierarchical GLM if and only if

    a_{0k}[(L₀ + 1)I − Σ_{j=1}^{L₀}(I + ΩΣ̂_{0j}⁻¹)⁻¹] = (I + ΩΣ̂_{0k}⁻¹)⁻¹    (A.5)

for k = 1, 2, ..., L₀.

Appendix B: Proofs of Theorems

Proof of Theorem 2.1: Under the hierarchical model in (2), (3), and (4), we can write the joint posterior of (θ, θ₀, μ) as

    π(θ, θ₀, μ | y, y₀) ∝ exp{ −n(ȳ − θ)²/(2σ²) − n₀(ȳ₀ − θ₀)²/(2σ₀²) − [(θ − μ)² + (θ₀ − μ)²]/(2τ²) − (μ − α)²/(2ν²) }.

Integrating out μ, which is normal with precision 2/τ² + 1/ν² and linear term (θ + θ₀)/τ² + α/ν², gives

    π(θ, θ₀ | y, y₀) ∝ exp{ nȳθ/σ² − nθ²/(2σ²) − n₀θ₀²/(2σ₀²) + n₀ȳ₀θ₀/σ₀² − (θ² + θ₀²)/(2τ²) + [(θ + θ₀)/τ² + α/ν²]²/(2(2/τ² + 1/ν²)) }.

After integrating out θ₀ and completing the square in θ, straightforward calculations lead to

    π(θ | y, y₀) ∝ exp{ −(σ_h⁻²/2)(θ − μ_h)² },    (B.1)

with μ_h and σ_h² as given in (6) and (7). Thus, (5) directly follows from (B.1).

Proof of Theorem 2.2: It is easy to show that μ_h in (6) can equal μ_p in (11) only if α = 0. When α = 0, μ_h reduces to

    μ_h = σ_h² [ nȳ/σ² + b*(n₀ȳ₀/σ₀²)/(n₀/σ₀² + a*) ],    (B.2)

where a* = (τ² + ν²)/(τ²(τ² + 2ν²)) and b* = ν²/(τ²(τ² + 2ν²)), as in Theorem 2.1. To match (B.2) to μ_p in (11), we have to set

    a₀ = b*/(n₀/σ₀² + a*)    (B.3)

and

    n/σ² + a₀n₀/σ₀² = σ_h⁻².    (B.4)

Note that (B.4) directly yields that σ_h² in (7) is equal to σ_p² in (12). Now, using (7) and (B.3), (B.4) reduces to

    b*(n₀/σ₀²)/(n₀/σ₀² + a*) = a* − (b*)²/(n₀/σ₀² + a*).    (B.5)

After some algebra, (B.5) can be written as

    (a* − b*)(n₀/σ₀² + a* + b*) = 0.

Since a*, b*, and n₀/σ₀² are all positive, (B.5) holds if and only if a* = b*, i.e., if and only if ν² → ∞. When ν² → ∞, we have a* = b* = 1/(2τ²), and (B.3) reduces to a₀ = (1/(2τ²))/(n₀/σ₀² + 1/(2τ²)) = (1 + 2τ²n₀/σ₀²)⁻¹, which is (13). This completes the proof of Theorem 2.2.

Proof of Theorem 2.3: Using (14) and (15), we have

    π(β, β₀, μ | y, X, y₀, X₀) ∝ exp{ −(1/(2σ²))(y − Xβ)′(y − Xβ) } exp{ −(1/(2σ₀²))(y₀ − X₀β₀)′(y₀ − X₀β₀) } exp{ −(1/2)(β − μ)′Ω⁻¹(β − μ) } exp{ −(1/2)(β₀ − μ)′Ω⁻¹(β₀ − μ) }.

Integrating out β₀, which is normal with precision Ω⁻¹ + σ₀⁻²X₀′X₀ and linear term σ₀⁻²X₀′y₀ + Ω⁻¹μ, leads to

    π(β, μ | y, X, y₀, X₀) ∝ exp{ −(1/(2σ²))(y − Xβ)′(y − Xβ) − (1/2)(β − μ)′Ω⁻¹(β − μ) } exp{ −(1/2)[ μ′(Ω⁻¹ − Ω⁻¹(Ω⁻¹ + σ₀⁻²X₀′X₀)⁻¹Ω⁻¹)μ − 2μ′Ω⁻¹(Ω⁻¹ + σ₀⁻²X₀′X₀)⁻¹σ₀⁻²X₀′y₀ ] }.

To obtain the marginal posterior distribution of β, we then integrate out μ, which is normal with precision 2Ω⁻¹ − Ω⁻¹(Ω⁻¹ + σ₀⁻²X₀′X₀)⁻¹Ω⁻¹. After some further algebra, we obtain

    π(β | y, X, y₀, X₀) ∝ exp{ −(1/2)β′A_hβ + β′B_h },

with A_h and B_h given by (17) and (18). Thus, (16) directly follows.

Proof of Theorem 2.4: A_h in (17) equals A_p in (22) if and only if

    a₀σ₀⁻²X₀′X₀ = Ω⁻¹ − Ω⁻¹[2Ω⁻¹ − Ω⁻¹(Ω⁻¹ + σ₀⁻²X₀′X₀)⁻¹Ω⁻¹]⁻¹Ω⁻¹.    (B.6)

Writing W₀ = σ₀⁻²X₀′X₀ and using (Ω⁻¹ + W₀)⁻¹ = (I + ΩW₀)⁻¹Ω, the right-hand side of (B.6) simplifies, after some algebra, to (I + 2W₀Ω)⁻¹W₀, so that (B.6) is equivalent to

    a₀W₀ = (I + 2W₀Ω)⁻¹W₀.    (B.7)

Multiplying both sides of (B.7) on the left by I + 2W₀Ω gives

    a₀(I + 2W₀Ω)W₀ = W₀.    (B.8)

Since X₀′X₀ is invertible, (B.8) reduces to (24). We further note that if (24) holds, then B_h in (18) automatically matches B_p in (23), which completes the proof.

Proof of Theorem 4.1: Let θ₀ = (θ_{01}, θ_{02}, ..., θ_{0L₀})′. Under the hierarchical model in (38), (39), (40), and (41), we can write the joint posterior of (θ, θ₀, μ) as

    π(θ, θ₀, μ | y, y₀) ∝ exp{ −n(ȳ − θ)²/(2σ²) − Σ_{k=1}^{L₀} n_{0k}(ȳ_{0k} − θ_{0k})²/(2σ_{0k}²) − (θ − μ)²/(2τ²) − Σ_{k=1}^{L₀} (θ_{0k} − μ)²/(2τ²) }.

Integrating out the θ_{0k}, each of which is normal given μ, yields

    π(θ, μ | y, y₀) ∝ exp{ −n(ȳ − θ)²/(2σ²) − (θ − μ)²/(2τ²) − Σ_{k=1}^{L₀} w_k(ȳ_{0k} − μ)²/2 },

where w_k = n_{0k}/(σ_{0k}² + n_{0k}τ²) as in Theorem 4.1. Integrating out μ, which is normal with precision 1/τ² + Σ_k w_k and linear term θ/τ² + Σ_k w_kȳ_{0k}, then yields

    π(θ | y, y₀) ∝ exp{ −(1/2)[ (n/σ² + 1/τ² − (1/τ⁴)/(1/τ² + Σ_k w_k))θ² − 2(nȳ/σ² + (Σ_k w_kȳ_{0k})/(1 + τ²Σ_k w_k))θ ] }.    (B.9)

Since 1/τ² − (1/τ⁴)/(1/τ² + Σ_k w_k) = (Σ_k w_k)/(1 + τ²Σ_k w_k), (42), (43), and (44) directly follow from (B.9).

Proof of Theorem 4.2:

To show that μ_mh = μ_mp in (43) and (49) and σ_mh² = σ_mp² in (44) and (50), it suffices to match the coefficient of each ȳ_{0k} and the posterior precision. Matching the coefficients of ȳ_{0k} in (43) and (49) requires

    a_{0k}n_{0k}/σ_{0k}² = w_k/(1 + τ² Σ_{j=1}^{L₀} w_j),    (B.10)

where w_j = n_{0j}/(σ_{0j}² + n_{0j}τ²), and (B.10) is exactly (51) since w_kσ_{0k}²/n_{0k} = σ_{0k}²/(σ_{0k}² + n_{0k}τ²). With (B.10) in force, the posterior precision under the power prior is

    n/σ² + Σ_{k=1}^{L₀} a_{0k}n_{0k}/σ_{0k}² = n/σ² + (Σ_{k=1}^{L₀} w_k)/(1 + τ² Σ_{j=1}^{L₀} w_j) = σ_mh⁻²,

so that σ_mh² = σ_mp², and consequently μ_mh = μ_mp. This completes the proof.

References

Berry, D. A. (1991). Bayesian Methodology in Phase III Trials. Drug Information Association Journal, 25.

Berry, D. A. and Hardwick, J. (1993). Using Historical Controls in Clinical Trials: Application to ECMO. In Berry, D. A. and Stangl, D. K. (eds.), Statistical Decision Theory and Related Topics V, New York: Springer-Verlag.

Berry, D. A. and Stangl, D. K. (1996). Bayesian Methods in Health-Related Research. In Berry, D. A. and Stangl, D. K. (eds.), Bayesian Biostatistics, New York: Marcel Dekker.

Brophy, J. M. and Joseph, L. (1995). Placing Trials in Context Using Bayesian Analysis: GUSTO Revisited by Reverend Bayes. Journal of the American Medical Association, 273.

Chen, C.-F. (1985). On Asymptotic Normality of Limiting Density Functions with Bayesian Implications. Journal of the Royal Statistical Society, Series B, 47.

Chen, M.-H., Ibrahim, J. G., and Shao, Q.-M. (2000). Power Prior Distributions for Generalized Linear Models. Journal of Statistical Planning and Inference, 84.

Chen, M.-H., Ibrahim, J. G., and Sinha, D. (1999a). A New Bayesian Model for Survival Data with a Surviving Fraction. Journal of the American Statistical Association, 94.

Chen, M.-H., Ibrahim, J. G., and Yiannoutsos, C. (1999b). Prior Elicitation, Variable Selection and Bayesian Computation for Logistic Regression Models. Journal of the Royal Statistical Society, Series B, 61.

Eddy, D. M., Hasselblad, V., and Shachter, R. (1992). Meta-analysis by the Confidence Profile Method: The Statistical Synthesis of Evidence. New York: Academic Press.

Fryback, D. G., Chinnis, J. O., and Ulvila, J. W. (2001). Bayesian Cost-Effectiveness Analysis: An Example Using the GUSTO Trial. International Journal of Technology Assessment in Health Care, 17.

Greenhouse, J. B. and Wasserman, L. (1995). Robust Bayesian Methods for Monitoring Clinical Trials. Statistics in Medicine, 14.

Ibrahim, J. G. and Chen, M.-H. (1998). Prior Distributions and Bayesian Computation for Proportional Hazards Models. Sankhya, Series B, 60.

Ibrahim, J. G. and Chen, M.-H. (2000). Power Prior Distributions for Regression Models. Statistical Science, 15.

Ibrahim, J. G., Chen, M.-H., and MacEachern, S. N. (1999). Bayesian Variable Selection for Proportional Hazards Models. Canadian Journal of Statistics, 27.

Ibrahim, J. G., Chen, M.-H., and Sinha, D. (2003). On Optimality Properties of the Power Prior. Journal of the American Statistical Association, 98.

Ibrahim, J. G. and Laud, P. W. (1994). A Predictive Approach to the Analysis of Designed Experiments. Journal of the American Statistical Association, 89.

Ibrahim, J. G., Ryan, L. M., and Chen, M.-H. (1998). Use of Historical Controls to Adjust for Covariates in Trend Tests for Binary Data. Journal of the American Statistical Association, 93.

Laud, P. W. and Ibrahim, J. G. (1995). Predictive Model Selection. Journal of the Royal Statistical Society, Series B, 57.

Lin, Z. (1993). Statistical Methods for Combining Historical Controls with Clinical Trial Data. Unpublished Ph.D. Dissertation, Duke University, Durham, NC.

Spiegelhalter, D. J., Abrams, K. R., and Myles, J. P. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Chichester: Wiley.

Spiegelhalter, D. J., Freedman, L. S., and Parmar, M. K. B. (1994). Bayesian Approaches to Randomized Trials (with discussion). Journal of the Royal Statistical Society, Series A, 157.

Tan, S., Machin, D., Tai, B. C., and Foo, K. F. (2002). A Bayesian Re-assessment of Two Phase II Trials of Gemcitabine in Metastatic Nasopharyngeal Cancer. British Journal of Cancer, 86.

Zellner, A. (1986). On Assessing Prior Distributions and Bayesian Regression Analysis with g-Prior Distributions. In Goel, P. and Zellner, A. (eds.), Bayesian Inference and Decision Techniques, Amsterdam: Elsevier Science Publishers B.V.

Acknowledgments

The authors wish to thank the Editor and two referees for several suggestions which have greatly improved the paper. This research was partially supported by NIH grants #CA 70101 and #CA 74015.


More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee September 03 05, 2017 Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles Linear Regression Linear regression is,

More information

Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart

Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart 1 Motivation and Problem In Lecture 1 we briefly saw how histograms

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 20 Lecture 6 : Bayesian Inference

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

November 2002 STA Random Effects Selection in Linear Mixed Models

November 2002 STA Random Effects Selection in Linear Mixed Models November 2002 STA216 1 Random Effects Selection in Linear Mixed Models November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear

More information

Model Checking and Improvement

Model Checking and Improvement Model Checking and Improvement Statistics 220 Spring 2005 Copyright c 2005 by Mark E. Irwin Model Checking All models are wrong but some models are useful George E. P. Box So far we have looked at a number

More information

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Marie Davidian North Carolina State University davidian@stat.ncsu.edu www.stat.ncsu.edu/ davidian Joint work with A. Tsiatis,

More information

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Michael J. Daniels and Chenguang Wang Jan. 18, 2009 First, we would like to thank Joe and Geert for a carefully

More information

A Bayesian perspective on GMM and IV

A Bayesian perspective on GMM and IV A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all

More information

Gibbs Sampling in Endogenous Variables Models

Gibbs Sampling in Endogenous Variables Models Gibbs Sampling in Endogenous Variables Models Econ 690 Purdue University Outline 1 Motivation 2 Identification Issues 3 Posterior Simulation #1 4 Posterior Simulation #2 Motivation In this lecture we take

More information

Bayesian Multivariate Logistic Regression

Bayesian Multivariate Logistic Regression Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of

More information

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently

More information

Bayesian Econometrics

Bayesian Econometrics Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence

More information

A Very Brief Summary of Bayesian Inference, and Examples

A Very Brief Summary of Bayesian Inference, and Examples A Very Brief Summary of Bayesian Inference, and Examples Trinity Term 009 Prof Gesine Reinert Our starting point are data x = x 1, x,, x n, which we view as realisations of random variables X 1, X,, X

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Stable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence

Stable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence Stable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham NC 778-5 - Revised April,

More information

Linear Models A linear model is defined by the expression

Linear Models A linear model is defined by the expression Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates Communications in Statistics - Theory and Methods ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20 Analysis of Gamma and Weibull Lifetime Data under a

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modeling for small area health data Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modelling for Small Area

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Statistical Inference with Monotone Incomplete Multivariate Normal Data

Statistical Inference with Monotone Incomplete Multivariate Normal Data Statistical Inference with Monotone Incomplete Multivariate Normal Data p. 1/4 Statistical Inference with Monotone Incomplete Multivariate Normal Data This talk is based on joint work with my wonderful

More information

Using historical data for Bayesian sample size determination

Using historical data for Bayesian sample size determination J. R. Statist. Soc. A (2007) 170, Part 1, pp. 95 113 Using historical data for Bayesian sample size determination Fulvio De Santis Università di Roma La Sapienza, Italy [Received July 2004. Final revision

More information

Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence

Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence Valen E. Johnson Texas A&M University February 27, 2014 Valen E. Johnson Texas A&M University Uniformly most powerful Bayes

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Introduction. Start with a probability distribution f(y θ) for the data. where η is a vector of hyperparameters

Introduction. Start with a probability distribution f(y θ) for the data. where η is a vector of hyperparameters Introduction Start with a probability distribution f(y θ) for the data y = (y 1,...,y n ) given a vector of unknown parameters θ = (θ 1,...,θ K ), and add a prior distribution p(θ η), where η is a vector

More information

INTRODUCTION TO BAYESIAN STATISTICS

INTRODUCTION TO BAYESIAN STATISTICS INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL*

USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL* USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL* 3 Conditionals and marginals For Bayesian analysis it is very useful to understand how to write joint, marginal, and conditional distributions for the multivariate

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling Min-ge Xie Department of Statistics, Rutgers University Workshop on Higher-Order Asymptotics

More information

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation

More information

Bayesian Meta-analysis with Hierarchical Modeling Brian P. Hobbs 1

Bayesian Meta-analysis with Hierarchical Modeling Brian P. Hobbs 1 Bayesian Meta-analysis with Hierarchical Modeling Brian P. Hobbs 1 Division of Biostatistics, School of Public Health, University of Minnesota, Mayo Mail Code 303, Minneapolis, Minnesota 55455 0392, U.S.A.

More information

Introduction to Bayesian Methods

Introduction to Bayesian Methods Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative

More information

Using Historical Experimental Information in the Bayesian Analysis of Reproduction Toxicological Experimental Results

Using Historical Experimental Information in the Bayesian Analysis of Reproduction Toxicological Experimental Results Using Historical Experimental Information in the Bayesian Analysis of Reproduction Toxicological Experimental Results Jing Zhang Miami University August 12, 2014 Jing Zhang (Miami University) Using Historical

More information

Last week. posterior marginal density. exact conditional density. LTCC Likelihood Theory Week 3 November 19, /36

Last week. posterior marginal density. exact conditional density. LTCC Likelihood Theory Week 3 November 19, /36 Last week Nuisance parameters f (y; ψ, λ), l(ψ, λ) posterior marginal density π m (ψ) =. c (2π) q el P(ψ) l P ( ˆψ) j P ( ˆψ) 1/2 π(ψ, ˆλ ψ ) j λλ ( ˆψ, ˆλ) 1/2 π( ˆψ, ˆλ) j λλ (ψ, ˆλ ψ ) 1/2 l p (ψ) =

More information

Bayesian methods for sample size determination and their use in clinical trials

Bayesian methods for sample size determination and their use in clinical trials Bayesian methods for sample size determination and their use in clinical trials Stefania Gubbiotti Abstract This paper deals with determination of a sample size that guarantees the success of a trial.

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information