arxiv: v2 [stat.me] 13 Jun 2012

Size: px
Start display at page:

Download "arxiv: v2 [stat.me] 13 Jun 2012"

Transcription

1 A Bayes factor with reasonable model selection consistency for ANOVA model arxiv: v2 [stat.me] 13 Jun 2012 Yuzo Maruyama The University of Tokyo Abstract: For the ANOVA model, we propose a new g-prior based Bayes factor without integral representation, with reasonable model selection consistency for any asymptotic situations (either number of levels of the factor and/or number of replication in each level goes to infinity). Exact analytic calculation of the marginal density under a special choice of the priors enables such a Bayes factor. AMS 2000 subject classifications: Primary 62F15, 62F07; secondary 62A10. Keywords and phrases: Bayesian model selection, model selection consistency, fully Bayes method, Bayes factor. 1. Introduction In this paper, we consider Bayesian model selection for ANOVA model. We start with one-way ANOVA with two possible models. In one model, all random variables have the same mean. In the other model, there are some levels and random variables in each level have different means. Formally, the independent observations y ij (i = 1,...,p, j = 1,...,r i, n = p i=1 r i) are assumed to arise from the linear model: y ij = µ+α i +ǫ ij, ǫ ij N(0,σ 2 ) (1.1) where µ, α i (i = 1,...,p) and σ 2 are unknown. Clearly two models described above are written as follows: M 1 : α i = 0 for all 1 i p, vs M A+1 : α i 0 for some i. (1.2) Model selection, which refers to using the data in order to decide on the plausibility of two or more competing models, is a common problem in modern statistical science. A natural Bayesian approach is by Bayes factor (ratio of marginal densities of two models), which is a function of the posterior model probabilities (Kass and Raftery (1995)). From a theoretical viewpoint, one of the most important topic on Bayesian model selection is consistency. Consistency means that the true model will be chosen if enough data are observed, assuming that one of the competing models is true. It is well-known that BIC by Schwarz (1978) is consistent under fixed number of parameters. In our situation, when p is fixed and r = n p p = i=1 r i p 1, (1.3)

2 Y. Maruyama/A Bayes factor 2 the BIC is consistent. As a high-dimensional problem of statistical inference, the consistency in the case where p in one-way ANOVA setup, has been addressed by Stone (1979) and Berger, Ghosh and Mukhopadhyay (2003). In the following, let CASE I and CASE II denote the cases where I. n, p and r is bounded, II. n, p and r, respectively. When σ 2 is known and CASE I is assumed, Stone (1979) showed that BIC chooses the null model M 1 with probability 1 (that is, BIC is not consistent under M A+1 ). This is reasonable because BIC is originally derived by the Laplace approximation under fixed number of parameters. In the same situation as Stone (1979), Berger, Ghosh and Mukhopadhyay(2003) proposed a Bayesian criterion called GBIC, which is derived by the Laplace approximation under CASE I. Then they showed that GBIC has model selection consistency under CASE I. Recently Moreno, Girón and Casella(2010) showed that under CASE II with p = O(n b ) for 0 < b < 1 or equivalently p = O({ r} b/(1 b) ), BIC is consistent. By consideringbveryclose to 1, BIC seemsto haveconsistencyunder CASE II even if r approaches infinity much slower than p. As in Theorem 3.1, however, we show that this is not so but the consistency of BIC under CASE II depends on how quick r goesto infinity comparedto p. As far as we know, this inconsistency of BIC has not yet been reported in the literature. As an alternative to BIC, following Zellner(1986); Liang et al.(2008); Maruyama and George (2011), we will propose a new g-prior based Bayes factor with consistency under CASE II, which is given by where BF F (A+1):1 = Γ(p/2)Γ({n p}/2) ( 1+ W ) (n p 1)/2 H, (1.4) Γ(1/2)Γ({n 1}/2) W E W H = r i (ȳ i ȳ ) 2, W E = i ij j ȳ i = y ij ij, ȳ = y ij r i i r. i (y ij ȳ i ) 2, As shown in Theorem 3.1, BF F (A+1):1 is consistent under CASE II even if r approaches infinity much slower than p. Under CASE I where BIC is not consistent, the consistency of BF F (A+1):1 is shown to depend on a kind of distance between M A+1 and M 1, which is c A = ij (E[y ij] E[ȳ ]) 2 nσ 2. (1.5) Naturally c A = 0 under M 1 and c A > 0 under M A+1. In Theorem 3.1, under CASE I with a positive small c A, we show that BF F (A+1):1 chooses M 1 with

3 Y. Maruyama/A Bayes factor 3 probability 1. Actually such an inconsistency result has been already reported by Moreno, Girón and Casella (2010). In Remark 3.1, we demonstrate that the existence of inconsistency region for small c A and large p is reasonable from the prediction point of view. The rest of the paper is organized as follows. In Section 2, we re-parameterize the ANOVA model givenby (1.1) as a linear regressionmodel and give priorsfor the regression model. Then we derive marginal densities under M 1 and M A+1 and eventually the Bayes factor given by (1.4), as the ratio of the marginal densities. In Section 3, we show that the Bayes factor has a reasonable model selection consistency for any asymptotic situation. In Section 4, the results derivedinsections2and3areextendedtotwo-wayanovamodel.theappendix presents some of the more technical proofs. 2. A Bayes factor for one-way ANOVA 2.1. re-parameterized ANOVA Let α = (α 1,...,α p ), y = (y 11,...,y 1r1,y 21,...,y 2r2,...,y p1,...,y prp ), ǫ = (ǫ 11,...,ǫ 1r1,ǫ 21,...,ǫ 2r2,...,ǫ p1,...,ǫ prp ). Further let an n p matrix X A = (x ij ) satisfy { 1 if j 1 k=1 x ij = r k +1 i j k=1 r k, 0 otherwise. (2.1) Then the linear model given by (1.1) is written as y = µ1 n +X A α+ǫ = θ 1 1 n + X A α+ǫ where θ 1 = µ+r α/n with r = (r 1,...,r p ) and X A = X A 1 nr which is the centered matrix of X A. The matrix X A is written as r 1 r 2 n {}}{{}}{{}}{ ( ν 1,...,ν 1, ν 2,...,ν 2,... ν p,...,ν p ) (2.2) where ν i = e i n 1 r with e i = (0,...,0,1,0,...,0). The n p matrix X A is not full rank but rank X A = p 1 since {ν 1,...,ν p 1 } is linearly independent and p i=1 r iν i = 0. For identifiability of α, the linear restriction r p ν 0α = 0 (2.3)

4 Y. Maruyama/A Bayes factor 4 is assumed such that {ν 0,ν 1,...,ν p 1 } is linearly independent. Typically 1 p or r are chosen as ν 0. We will see that any linear restriction does not affect our result. Let the singular value decomposition of XA with rank p 1 be X A = U A D A W A, whereu A andw A aren (p 1)andp (p 1)orthogonalmatrices,respectively. Then the one-way ANOVA is re-parameterized as the linear regression model where y = θ 1 1 n +U A θ A +ǫ (2.4) θ A = D A W Aα R p 1. (2.5) Under the restriction ν 0 α = 0, θ A = 0 p 1 is equivalent to α = 0 p because the p p matrix (W A D A,ν 0 ) is non-singular and ( ) ( ) DA W A θa α =. (2.6) 0 ν 0 Hence, under (2.4), two models described in (1.2) are written as M 1 : θ A = 0 p 1 and M A+1 : θ A 0 p Priors and the Bayes factor In Bayesian model selection, the specification of priors are needed on the models and parameters in each model. For the former, let Pr(M 1 ) = Pr(M A+1 ) = 1/2 as usual. For the latter, at moment, we write joint prior densities as p(θ 1,σ 2 ) for M 1 and p(θ 1,θ A,σ 2 ) for M A+1. From the Bayes theorem, M A+1 is chosen when Pr(M A+1 y) > 1/2 where Pr(M A+1 y) = BF (A+1):1 1+BF (A+1):1 and BF (A+1):1 is the Bayes factor given by BF (A+1):1 = m A+1 (y)/m 1 (y). (2.7) In other words, M A+1 is chosen if and only if BF (A+1):1 > 1. In (2.7), m γ (y) is the marginal density under M γ for γ = 1,A+1 as follows: m 1 (y) = p(y θ 1,σ 2 )p(θ 1,σ 2 )dθ 1 dσ 2 m A+1 (y) = p(y θ 1,θ A,σ 2 )p(θ 1,θ A,σ 2 )dθ 1 dθ A dσ 2, where p(y θ 1,σ 2 ) and p(y θ 1,θ A,σ 2 ) are sampling densities of y under M 1 and M A+1, respectively.

5 Y. Maruyama/A Bayes factor 5 In this paper, we use the following joint prior density for M 1 and p(θ 1,σ 2 ) = p(θ 1 )p(σ 2 ) = 1 σ 2 (2.8) p(θ 1,θ A,σ 2 ) = p(θ 1 )p(σ 2 )p(θ A σ 2 ) = 1 σ 2 0 p(θ A g,σ 2 )p(g)dg (2.9) for M A+1. Note that p(θ 1 )p(σ 2 ) = σ 2 in both (2.8) and (2.9) is a popular non-informative prior. It is clear that θ 1 and σ 2 are location-scale parameters, and the improper prior for them is the right-haar prior from the locationscale invariance group. These are reasons why the use of these improper priors for θ 1 and σ 2 is formally justified. See Berger, Pericchi and Varshavsky (1998); Berger, Bernardo and Sun (2009) for details. As p(θ A g,σ 2 ), we use so-called Zellner s (1986) g-prior p(θ A σ 2,g) = N p 1 (0,gσ 2 (U AU A ) 1 ) = N p 1 (0,gσ 2 I p 1 ). (2.10) As the prior of g, following Maruyama and George (2011), we use PearsonType VI distribution with the density p(g) = gb (1+g) a b 2 B(a+1,b+1) = g(n p)/2 a 2 (1+g) (n p)/2 B(a+1,(n p)/2 a 1) (2.11) where 1 < a < (n p)/2 1 and b = (n p)/2 a 2. (2.12) Remark 2.1. For the choice of a, my recommendation is a = 1/2. We will describe it briefly. The asymptotic behavior of p(g) given by (2.11), for sufficiently large g, is proportional to g a 2. From the Tauberian Theorem, which is well-known for describing the asymptotic behavior of the Laplace transform, we have p(θ A σ 2 ) = 0 p(θ A σ 2,g)p(g)dg (σ 2 ) a+1 θ A (p+2a+1), (2.13) for sufficiently large θ A R p 1, a > 1 and b > 1. Hence the asymptotic tail behavior of p(θ A σ 2 ) for a = 1/2 is multivariate Cauchy, θ A (p 1) 1, which has been recommended by Zellner and Siow(1980) and others in objective Bayes context. Before we proceed to give the Bayes factor with respect to priors described above, we review statistical inference in ANOVA from frequentist point of view. The key decomposition is given by W T = y ȳ 1 n 2 = ij (y ij ȳ ) 2 = U A U A{y ȳ 1 n } 2 + (I U A U A){y ȳ 1 n } 2 = i r i(ȳ i ȳ ) 2 + ij (y ij ȳ i ) 2 = W H +W E, (2.14)

6 where W H and W E are independent and Y. Maruyama/A Bayes factor 6 W H σ 2 [ χ2 p 1 θa 2 /σ 2], W E σ 2 χ2 n p. In (2.14), W T, W H and W E are called total sum of squares, between group sum of squares and within group sum of squares, respectively. The hypothesis H 0 : α = 0 p (or θ A = 0 p 1 ) is rejected if W H /W E is relatively larger. In the theorem below, the Bayes factor under our priors is an increasing function of W H /W E, which is reasonable from frequentist point of view, too. Theorem 2.1. Under priors given by (2.8) under M 1 and by (2.9) with (2.10) and (2.11) under M A+1, the Bayes factor is given by m A+1 (y) m 1 (y) Proof. See Appendix. = Γ(p/2+a+1/2)Γ((n p)/2) Γ(a+1)Γ({n 1}/2) ( 1+ W ) (n p 2)/2 a H. W E By Remark 2.1 and Theorem 2.1, the Bayes factor which we recommend is written as BF F (A+1):1 = m A+1(y) = Γ(p/2)Γ({n p}/2) ( 1+ W ) (n p 1)/2 H, (2.15) m 1 (y) Γ(1/2)Γ({n 1}/2) W E where the subscript F means Fully Bayes. 3. Model selection consistency In this section, we consider the model selection consistency as n = i r i. Generally, the posterior consistency for model choice is defined as plimpr(m γ y) = 1 or plimbf γ :γ = 0 (3.1) n n when M γ is the true model and M γ is not. Here plim denotes convergence in probability and the probability distribution in (3.1) is the sampling distribution under the true model M γ. We will show that BF F (A+1):1 given by (2.15) has a reasonable model selection consistency. As a natural competitor of BF F (A+1):1, the BIC based Bayes factor, ( BF BIC (A+1):1 = 1+ W ) n/2 H n (p 1)/2, W E is considered. As remarked in Section 1, the consistency depends on c A given by (1.5) or equivalently θ A 2 /(nσ 2 ). Let Then we have a following result. c A = lim c θ A 2 A = lim n n nσ 2. (3.2)

7 Theorem 3.1. Y. Maruyama/A Bayes factor 7 I. Assume r and p is fixed. (a) BF F (A+1):1 and BF BIC (A+1):1 are consistent whichever the true model is. II. Assume p. (a) BF F (A+1):1 and BFBIC (A+1):1 are consistent under M 1. (b) Assume r is fixed under M A+1. i. BF F (A+1):1 is consistent (inconsistent) if c A > (<)h( r) where ii. BF BIC (A+1):1 is inconsistent. (c) Assume r under M A+1. i. BF F (A+1):1 is consistent. h(r) = r 1/(r 1) 1. (3.3) ii. BF BIC (A+1):1 is consistent (inconsistent) if p slower (faster) than (e{1+ c A } r )/ r. Note: h(r) is a convex decreasing function in r which satisfies h(2) = 1, h(5). = 0.5, h(10). = 0.29, and h( ) = 0. Proof. See Appendix. Note: After Maruyama (2009), the first version of this paper, where the balanced ANOVA was treated only (that is, r i r is assumed), Wang and Sun (2012) extended consistency results of Maruyama (2009) to unbalanced case while the proof itself essentially follows from Maruyama (2009). Remark 3.1. We give some remarks on inconsistency shown in Theorem 3.1. I. As shown in II(b)ii of Theorem 3.1, BIC always chooses M 1 when p and r is fixed, even if c A is very large. This is interpreted as unknown variance version of Stone s (1979). II. As seen in II(b)i of Theorem 3.1, BF F (A+1):1 has an inconsistency region. Actually existing such an inconsistency region has been also reported by Moreno, Girón and Casella (2010). They proposed intrinsic Bayes factor for normal regression model and their upper-bound of inconsistency region for one-way balanced ANOVA is given by r 1 (r+1) (r 1)/r 1 1 which is slightly smaller than h(r) given by (3.3). III. The existence of inconsistency region for small c A and large p is quite reasonable from the following reason. Assume new independent observations z ij (i = 1,...,p, j = 1,...,r i ) from the same model as y ij. Then the difference of scaled mean squared prediction errors of ȳ i and ȳ is given

8 by Y. Maruyama/A Bayes factor 8 [ ] [ ] E y,z i,j (z ij ȳ ) 2 E y,z i,j (z ij ȳ i ) 2 [ȳ ;ȳ i ] = nσ 2 nσ 2 = c A p 1 n for any p and r. First assume that M 1 is true. We see that [ȳ ;ȳ i ] = (p 1)/n < 0, which is reasonable. Then assume that M A+1 is true. When r and p is fixed, lim r [ȳ ;ȳ i ] = c A > 0, which is reasonable. On the other hand, if p and r is fixed, lim [ȳ ;ȳ p i ] = c A 1/ r. Hence when c A < 1/ r, [ȳ ;ȳ i ] is negative even if M A+1 is true. Therefore, from the prediction point of view, the existence of inconsistency region for small c A and large p is reasonable. Table1showsfrequencyofchoiceofthetruemodelinsomecasesinnumerical experiment where he balanced case r 1 = = r p is considered. We see that it clearly guarantees the validity of Theorem two-way ANOVA 4.1. re-parameterized two-way ANOVA In this section, we extend the results in Sections 2 and 3 to two-way ANOVA model. We have n independent normal random variables y ijk (i = 1,...,p, j = 1,...,q, k = 1,...,r ij, n = i,j r ij) where y ijk = µ+α i +β j +γ ij +ǫ ijk, ǫ ijk N(0,σ 2 ). (4.1) In the following, for simplicity as in Christensen (2011), we assume that r ij is given by r ij = n pq ν iξ j (4.2) where ν i for i = 1,...,p and ξ j for j = 1,...,q satisfy p ν i = p, i=1 q ξ j = q j=1

9 Y. Maruyama/A Bayes factor 9 Table 1 Frequency of choice of the true model p\r under M 1 c = 0.1 under M A+1 FB BIC c = 0.5 under M A+1 c = 1 under M A+1 FB BIC c = 2 under M A+1 c = 5 under M A+1 FB BIC

10 Y. Maruyama/A Bayes factor 10 and that {n/(pq)}ν i ξ j for any (i,j) is a positive integer. Therefore we allow some kinds of unbalanced design as well as balanced design, r ij n/(pq) for any (i,j). Let y = ({y 111,...,y 11r11 },{y 121,...,y 12r12 },...,{y pq1,...,y pqrpq }), ǫ = ({ǫ 111,...,ǫ 11r11 },{ǫ 121,...,ǫ 12r12 },...,{ǫ pq1,...,ǫ pqrpq }), α = (α 1,...,α p ), β = (β 1,...,β q ) and γ = ({γ 11,...,γ 1q },{γ 21,...,γ 2q },...,{γ p1,...,γ pq }). The n (p+q +pq) design matrix is given by (X A,X B,X AB ), where the n p matrix X A = (x A ti ), the n q matrix X B = (X B1,...,X Bp ) with an ( q j=1 r ij) q matrix X Bi = (x Bi tj ), the n pq matrix X AB = (x AB tm ) satisfy { x A 1 if i 1 q m=1 l=1 ti = r ml +1 t i q m=1 l=1 r ml, 0 otherwise, and x AB x Bi tj = { 1 if j 1 m=1 r im +1 t j m=1 r im, 0 otherwise, { tm = 1 if s<m r i(s)j(s) +1 t s m r i(s)j(s), 0 otherwise, where i(s) and j(s) are the unique integer solution of s = (i 1)q + j where i = 1,...,p and j = 1,...,q, respectively. Then the linear model of (4.1) is written as y = µ1 n +X A α+x B β +X AB γ +ǫ, ǫ N n (0 n,σ 2 I n ). (4.3) Note that {µ,α,β,γ} do not have identifiability since 1 n = X A 1 p = X B 1 q = X AB 1 pq, X A e i = X AB (0 q,...,0 q,1 q,0 q,...,0 q ), for i = 1,...,p, X B e j = X AB (e j,...,e j ), for j = 1,...,q. (4.4) So we will re-parameterize the model (4.3) as a linear regression model with non-constrained parameters. The expectation E[y] is re-written as µ1 n +X A α+x B β +X AB γ = θ 1 (µ,α,β,γ)1 n + X A α+ X B β + X AB γ

11 Y. Maruyama/A Bayes factor 11 where θ 1 (µ,α,β,γ) = µ+ ν α p + ξ β q + (r 11,...,r pq ) γ n and X A, XB, XAB are the centered matrices. As explained in Christensen (2011), under the condition (4.2) on r ij including the balanced design, XA and X B are orthogonal, that is, X A XB is the zero matrix. Let the singular value decomposition of XA, XB and (I n U A U A )(I n U B U B ) X AB be X A = U A D A W A, XB = U B D B W B, (I n U A U A)(I n U B U B) X AB = U AB\(A+B) D AB\(A+B) W AB\(A+B), (4.5) where the rank of XA, XB and (I n U A U A )(I n U B U B ) X AB are p 1, q 1 and (p 1)(q 1), respectively. Then where E[y] = θ 1 (µ,α,β,γ)1 n + X A α+u A U AX AB γ + X B β +U B U BX AB γ +(I n U A U A U B U B) X AB γ = θ 1 (µ,α,β,γ)1 n +U A {D A W A α+u A X ABγ} +U B {D B W Bβ +U BX AB γ}+(i n U A U A U B U B) X AB γ = θ 1 (µ,α,β,γ)1 n +U A θ A (α,γ) +U B θ B (β,γ)+u AB\(A+B) θ AB\(A+B) (γ) θ A (α,γ) = D A W A α+u A X ABγ R p 1, θ B (β,γ) = D B W Bβ +U BX AB γ R q 1, θ AB\(A+B) (γ) = D AB\(A+B) W AB\(A+B) γ R(p 1)(q 1). (4.6) Therefore the linear model with non-constrained parameters is given by {θ 1,θ A,θ B,θ AB\(A+B) } y = θ 1 1 n +U A θ A +U B θ B +U AB\(A+B) θ AB\(A+B) +ǫ. (4.7) In the two-way ANOVA model given by (4.3), the following five submodels are important. M 1 : E[y] = µ1 n, (α = 0 p,β = 0 q,γ = 0 pq ) M A+1 : E[y] = µ1 n +X A α, (β = 0 q,γ = 0 pq ) M B+1 : E[y] = µ1 n +X B β, (α = 0 p,γ = 0 pq ) M A+B+1 : E[y] = µ1 n +X A α+x B β, (γ = 0 pq )

12 Y. Maruyama/A Bayes factor 12 M (A+1)(B+1) : E[y] = µ1 n +X A α+x B β +X AB γ. Under suitable constraints on α, β and γ, for example, ν α = 0, ξ β = 0, r 11,...,r pq U A X AB γ = 0 p+q 1, (4.8) U B X AB {µ,α,β,γ} are identifiable, and further between (4.3) and (4.7), there are the following equivalences: M 1 : (α = 0 p,β = 0 q,γ = 0 pq ) (θ A = 0 p 1,θ B = 0 q 1,θ AB\(A+B) = 0 (p 1)(q 1) ), M A+1 : (β = 0 q,γ = 0 pq ) (θ B = 0 q 1,θ AB\(A+B) = 0 (p 1)(q 1) ), M B+1 : (α = 0 p,γ = 0 pq ) (θ A = 0 p 1,θ AB\(A+B) = 0 (p 1)(q 1) ), M A+B+1 : γ = 0 pq θ AB\(A+B) = 0 (p 1)(q 1). Note that the constraints for {α,β,γ} as (4.8) do not affect our results priors and the Bayes factor Following Section 2.2, we assume that π(θ 1,σ 2 ) = 1/σ 2 for every model and θ A {g,σ 2 } N p 1 (0 p 1,gσ 2 I p 1 ) for M A+1,M A+B+1,M (A+1)(B+1), θ B {g,σ 2 } N q 1 (0 q 1,gσ 2 I q 1 ) for M B+1,M A+B+1,M (A+1)(B+1), and θ AB\(A+B) {g,σ 2 } N (p 1)(q 1) (0 (p 1)(q 1),gσ 2 I (p 1)(q 1) ) for M (A+1)(B+1). Further we assume p(g) = gb (1+g) a b 2 B(a+1,b+1) (4.9) where a = 1/2 and b(s) = (n s 1)/2 a 2 (4.10) for p 1 under M A+1 q 1 under M B+1 s = p+q 2 under M A+B+1 pq 1 under M (A+1)(B+1).

13 Y. Maruyama/A Bayes factor 13 Before we proceed to give the Bayes factor with respect to priors described above, we review the decomposition of squares for two-way ANOVA. As similar in one-way ANOVA, the total sum of squares W T = ijk (y ijk ȳ ) 2 = y ȳ 1 n 2 can be decomposed as where each sums of squares are given by W A = ijk W B = ijk W AB\(A+B) = ijk W T = W A +W B +W AB\(A+B) +W E (4.11) (ȳ i ȳ ) 2 = U A U A{y ȳ 1 n } 2 = U A{y ȳ 1 n } 2, (ȳ j ȳ ) 2 = U B U B {y ȳ 1 n } 2 = U B {y ȳ 1 n } 2, (ȳ ij ȳ i ȳ j +ȳ ) 2 = U AB\(A+B) U AB\(A+B) {y ȳ 1 n } 2 = U AB\(A+B) {y ȳ 1 n } 2, W E = ijk and (y ijk ȳ ij ) 2 = (I n U A U A)(I n U B U B)(I n U AB\(A+B) U AB\(A+B) ){y ȳ 1 n } 2 ȳ = 1 y ijk, ȳ i = 1 n ijk j r y ijk, ij jk ȳ j = 1 i r y ijk, ȳ ij = 1 y ijk. ij r ij ik In (4.11), W A,W B,W AB\(A+B) and W E are independently distributed as W A σ 2 χ2 p 1 ( θ A 2 /σ 2 W B ), σ 2 χ2 q 1 ( θ B 2 /σ 2 ), W AB\(A+B) σ 2 χ 2 (p 1)(q 1) ( θ AB\(A+B) 2 /σ 2 ). k W E σ 2 χ2 n pq, (4.12) Note that θ A 2, θ B 2 and θ AB\(A+B) 2 do not depend on parameterization since these are given by θ A 2 = U A{E[y] E[ȳ ]1 n } 2, θ B 2 = U B {E[y] E[ȳ ]1 n } 2, θ AB\(A+B) 2 = U AB\(A+B) {E[y] E[ȳ ]1 n } 2. Based on the decomposition, Bayes factors for two-way ANOVA are given as follows.

14 Y. Maruyama/A Bayes factor 14 Theorem 4.1. When M γ for γ = A+1,B+1,A+B+1, and (A+1)(B+1) and M 1 are pairwisely compared, the corresponding Bayes factors can be derived as follows: BF F (A+1):1 = Γ(p/2)Γ((n p)/2) ( 1+ Γ(1/2)Γ({n 1}/2) BF F (B+1):1 = Γ(q/2)Γ((n q)/2) Γ(1/2)Γ({n 1}/2) ( 1+ BF F (A+B+1):1 = Γ(p+q 1 2 )Γ( n p q+1 2 ) Γ(1/2)Γ({n 1}/2) BF F (A+1)(B+1):1 = Γ(pq/2)Γ((n pq)/2) Γ(1/2)Γ({n 1}/2) W A W T W A W B W T W B ( 1+ ) (n p)/2 1/2 ) (n q)/2 1/2 ) (n p q)/2 W A +W B W T W A W B ( 1+ W ) (n pq)/2 1/2 T W E. W E 4.3. Consistency In this subsection, we consider model selection consistency of Bayes factors proposed in Theorem 4.1. We call {BF F }, the set of Bayes factors given in Theorem 4.1, consistent under M γ if and only if BF F γ :1 plim n BF F = 0 γ:1 for any γ γ (Let BF F 1:1 = 1). As the competitor, the corresponding Bayes factors based on BIC are given as follows. Let BF BIC (A+1):1 = n (p 1)/2 (1+ BF BIC (B+1):1 = n (q 1)/2 (1+ BF BIC (A+B+1):1 = n (p+q 2)/2 (1+ W A W T W A W B W T W B ) n/2 ) n/2 W A +W B W T W A W B ) n/2. BF BIC (A+1)(B+1):1 = n (pq 1)/2 ( 1+ W T W E W E ) n/2 θ A 2 c A = lim n nσ 2, c θ B 2 B = lim n nσ 2, c θ AB\(A+B) 2 AB\(A+B) = lim n nσ 2. Then we have a following result on consistency. Theorem 4.2. I. Assume r and p and q are fixed.

15 Y. Maruyama/A Bayes factor 15 (a) Both {BF F } and {BF BIC } are consistent whichever the true model is. II. Assume p and q. (a) Both {BF F } and {BF BIC } are consistent except under M (A+1)(B+1). (b) Assume r is fixed. i. {BF F } is consistent (inconsistent) under M (A+1)(B+1) when c AB\(A+B) > (<)H( r, c A + c B ), (4.13) where H(r,c) with positive c is the (unique) positive solution of (x+1) r /r (x+1) c = 0. (4.14) ii. {BF BIC } is inconsistent under M (A+1)(B+1). (c) Assume r. Proof. See Appendix. i. {BF F } is consistent under M (A+1)(B+1). ii. {BF BIC } is consistent (inconsistent) under M (A+1)(B+1) when pq slower (faster) than (e{1+ c AB\(A+B) } r )/ r. The function H(r, c) satisfies H(r,c) with fixed r is increasing in c and H(r,0) = h(r) where h(r) is given by (3.3). H(r,c) with fixed c is decreasing in r and lim H(r,c) = 0. r Remark 4.1. We have results under fixed q (and fixed r or r ), too. We omit the detail since they are somewhat complicated. Appendix A: Proof of Theorem 2.1 First we derive the marginal density under M 1. Using the Pythagorean relation y θ 1 1 n 2 = n(ȳ θ 1 ) 2 +W T, where ȳ = n 1 i,j y ij and W T = y ȳ 1 n 2 = ij (y ij ȳ ) 2, we have 1 m 1 (y) = exp ( y θ 11 n 2 ) 0 (2π) n/2 σn+2 2σ 2 dθ 1 dσ 2 n 1/2 ( 1 = exp W ) T (2π) n/2 1/2 σn+1 2σ 2 dσ 2 0 = n1/2 Γ({n 1}/2) π (n 1)/2 {W T } (n 1)/2.

16 Y. Maruyama/A Bayes factor 16 Then we derive the marginal density under M A+1. Using the relationship y θ 1 1 n U A θ A 2 +g 1 θ A 2 = n(ȳ θ 1 ) 2 + y ȳ 1 n U A θ A 2 +g 1 θ A 2 = n(ȳ θ 1 ) 2 + g +1 g θ A gu A (y ȳ 1 n ) 2 g +1 g g +1 U A(y ȳ 1 n ) 2 + y ȳ 1 n 2 = n(ȳ θ 1 ) 2 + g +1 g θ A gu A (y ȳ 1 2 n ) g +1 + W T +gw E, g +1 where W E is given in (2.14), we have the conditional marginaldensity of y given g under M A+1, m A+1 (y g) = p(y θ 1,θ A,σ 2 )p(θ A σ 2,g)p(σ 2 )dθ 1 dθ A dσ 2 R p = R p 1 0 (2πσ 2 ) n/2 (2πgσ 2 ) (p 1)/2 exp ( y θ 11 n U A θ A 2 2σ 2 θ A 2 ) 2gσ 2 p(σ 2 )dθ 1 dθ A dσ 2 = 0 n 1/2 (1+g) (p 1)/2 (2πσ 2 ) (n 1)/2 (1+g) (n p)/2 exp = m 1 (y) (g{w E /W T }+1) (n 1)/2. The fully marginal density m A+1 (y) = 0 ( W T +gw E 2σ 2 (g +1) m A+1 (y g)p(g)dg ) 1 σ 2dσ2 under the prior of g given by (2.11), has a closed simple form as follows; m A+1 (y) = m 1 (y) 0 1 = m 1 (y) B(a+1,b+1) which completes the proof. (1+g) (n p)/2 g b (1+g) a b 2 (g{w E /W T }+1) (n 1)/2 B(a+1,b+1) dg = m 1 (y) Γ(p/2+a+1/2)Γ((n p)/2) Γ(a+1)Γ({n 1}/2) Appendix B: Proof of Theorem 3.1 preparation: random part 0 g b (g{w E /W T }+1) (n 1)/2dg ( ) (n p 2)/2 a WT, W E (A.1)

17 Y. Maruyama/A Bayes factor 17 Recall that, for any n and p, W H and W E are independent and W H σ 2 [ χ2 p 1 θa 2 /σ 2], W E σ 2 χ2 n p. P In the following, n denotes convergence in probability as n grows. When p is fixed and r(= n/p) goes to, we have W E P W H r p rσ 2 1, σ 2 χ2 p under M 1, W H P r p rσ 2 c A under M A+1. Hence, for any ǫ > 0 and any positive number r, there exists an l(ǫ) such that Pr ( 1/l(ǫ) < {1+W H /W E } p r < l(ǫ) ) > 1 ǫ under M 1. Under M A+1, we have When p goes to infinity, we have Hence we have 1 W E P p r 1 pσ 2 1, 1+ W H W E P r 1+ c A. W H pσ 2 P p 1 under M 1, W H pσ 2 P p 1+ r c A under M A W H W E P p r r 1 under M 1, 1+ W H P r{1+ c A } p under M A+1. W E r 1 preparation: non-random part For the asymptotic behavior of the gamma function, Stiring s formula, Γ(ax+b) 2πe ax (ax) ax+b 1/2 (B.1) (B.2) (B.3) (B.4) for sufficiently large x is useful. Here f g means limf/g = 1. Using (B.4), we get Γ(p/2)Γ({p r p}/2) Γ(1/2)Γ({p r 1}/2) Γ(p/2) 1 Γ(1/2)(p/2) (p 1)/2 r (p 1)/2, (B.5) when r and p is fixed, and when p. Γ(p/2)Γ({p r p}/2) Γ({p r 1}/2)Γ(1/2) ( r 1)p( r 1)/2 1/2 2 = 2 r ( r 1) 1/2 r p r/2 1 { r 1 r r/( r 1) } p( r 1)/2 (B.6)

18 Y. Maruyama/A Bayes factor 18 main part First, we consider the consistency in the case where r and p is fixed. Using (B.1), (B.2) and (B.5), we have BF F (A+1):1 P r 0 under M 1, 1/BF F P (A+1):1 r 0 under M A+1, which means BF F (A+1):1 is consistent under both M 1 and M A+1. Similarly we see that BF BIC (A+1):1 is consistent under both M 1 and M A+1 when r and p is fixed. Then we consider the consistency in the case where p. Using (B.3) under M 1 and (B.6), we have BF F (A+1):1 P p 0, BF BIC P (A+1):1 p 0 which means BF F (A+1):1 and BFBIC (A+1):1 are consistent under M 1 when p. Using (B.3) under M A+1 and (B.6), we have 1 BF F (A+1):1 { } p( r 1)/2 2 r 1+ ca P p 1. ( r 1) 1/2 r 1/( r 1) Hence when c A > h( r) = r 1/( r 1) 1, we have 1/BF F (A+1):1 P p 0, which means consistency of BF F (A+1):1 under M A+1. Since h( r) 0 as r, BF F (A+1):1 has consistency when both p and r go to infinity. On the other hand, when c A < h( r), we have BF F (A+1):1 P p 0, which means inconsistency of BF F (A+1):1 under M A+1. Using (B.3) under M A+1 and (B.6), we have BF BIC (A+1):1 ( We see that for any fixed r, p r { ) } r p/2 r 1 (p r) 1/2 P p 1. r(1+ c A ) BF BIC (A+1):1 P p 0 which means inconsistency of BIC under M A+1. When r as well as p goes to, ( ) p/2 BF BIC p r (A+1):1 e(1+ c ) r (p r) 1/2 P p 1. A

19 Y. Maruyama/A Bayes factor 19 Hence if p approaches infinity faster than (e{1+ c A } r )/ r, we have BF BIC (A+1):1 P p, r 0, which means inconsistency of BIC even if r. On the other hand, if p approaches infinity slower than (e{1+ c A } r )/ r, we have which means consistency of BIC. 1/BF BIC (A+1):1 P p, r 0. Appendix C: Proof of Theorem 4.2 By following the proof of Theorem 3.1 in Appendix B, all of the proof are straightforward. Here we give the sketch of the proof for part II(b)i only. When p,q and r is fixed, W A P W B P W E P p,q nσ 2 c A, p,q nσ 2 c B, p,q nσ 2 1 1/ r, W AB\(A+B) P p,q nσ 2 c AB\(A+B) +1/ r. (C.1) Using (C.1), Stiring s formula given in (B.4) and lim p p 1/p = 1, we have Therefore if the ratio { } ( 2/(pq) BF F P p,q 1+ ca + c B + c AB\(A+B) (A+1):1 1+ c B + c AB\(A+B) { } ( 2/(pq) BF F P p,q 1+ ca + c B + c AB\(A+B) (B+1):1 1+ c A + c AB\(A+B) ) r, ) r, { } ( 2/(pq) BF F P p,q 1+ ca + c B + c AB\(A+B) (A+B+1):1 1+ c AB\(A+B) ) r, { } 2/(pq) BF F P p,q (1+ c A + c B + c ) r 1 AB\(A+B) (A+1)(B+1):1. r (1+ c AB\(A+B) ) r r { > 1+ c A + c B + c AB\(A+B), (C.2) BF F γ:1 BF F (A+1)(B+1):1 } 2/(pq), (C.3) for γ = A+1,B+1,A+B+1, approaches a positive constant strictly less than 1, which guarantees the consistency of BF F (A+1)(B+1):1 under (C.2).

20 Y. Maruyama/A Bayes factor 20 References Berger, J. O., Bernardo, J. M. and Sun, D. (2009). The formal definition of reference priors. Ann. Statist MR Berger, J. O., Ghosh, J. K. and Mukhopadhyay, N. (2003). Approximations and consistency of Bayes factors as model dimension grows. J. Statist. Plann. Inference MR Berger, J. O., Pericchi, L. R. and Varshavsky, J. A.(1998). Bayes factors and marginal distributions in invariant situations. Sankhyā Ser. A MR Christensen, R. (2011). Plane answers to complex questions, fourth ed. Springer Texts in Statistics. Springer, New York. The theory of linear models. MR Kass, R. E. and Raftery, A. E. (1995). Bayes factors. J. Amer. Statist. Assoc Liang, F.,Paulo, R.,Molina, G.,Clyde, M. A.andBerger, J. O.(2008). Mixtures of g priors for Bayesian variable selection. J. Amer. Statist. Assoc MR Maruyama, Y. (2009). A Bayes factor with reasonable model selection consistency for ANOVA model. Arxiv v1 [stat.me]. Maruyama, Y. and George, E. I. (2011). Fully Bayes factors with a generalized g-prior. Ann. Statist Moreno, E., Girón, F. J. and Casella, G. (2010). Consistency of objective Bayes factors as the model dimension grows. Ann. Statist Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist MR Stone, M. (1979). Comments on Model Selection Criteria of Akaike and Schwarz. J. R. Stat. Soc. Ser. B Stat. Methodol Wang, M. and Sun, X. (2012). Bayes Factor Consistency for Unbalanced ANOVA Models. Arxiv v1 [stat.me]. Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian inference and decision techniques. Stud. Bayesian Econometrics Statist North-Holland, Amsterdam. MR Zellner, A. and Siow, A. (1980). Posterior Odds Ratios for Selected Regression Hypotheses. In Bayesian Statistics: Proceedings of the First International Meeting held in Valencia (Spain) (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) University of Valencia.

An Extended BIC for Model Selection

An Extended BIC for Model Selection An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,

More information

Divergence Based priors for the problem of hypothesis testing

Divergence Based priors for the problem of hypothesis testing Divergence Based priors for the problem of hypothesis testing gonzalo garcía-donato and susie Bayarri May 22, 2009 gonzalo garcía-donato and susie Bayarri () DB priors May 22, 2009 1 / 46 Jeffreys and

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Bayesian Assessment of Hypotheses and Models

Bayesian Assessment of Hypotheses and Models 8 Bayesian Assessment of Hypotheses and Models This is page 399 Printer: Opaque this 8. Introduction The three preceding chapters gave an overview of how Bayesian probability models are constructed. Once

More information

1 Hypothesis Testing and Model Selection

1 Hypothesis Testing and Model Selection A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Lecture 5: Conventional Model Selection Priors

Lecture 5: Conventional Model Selection Priors Lecture 5: Conventional Model Selection Priors Susie Bayarri University of Valencia CBMS Conference on Model Uncertainty and Multiplicity July 23-28, 2012 Outline The general linear model and Orthogonalization

More information

Some Curiosities Arising in Objective Bayesian Analysis

Some Curiosities Arising in Objective Bayesian Analysis . Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work

More information

ST 740: Model Selection

ST 740: Model Selection ST 740: Model Selection Alyson Wilson Department of Statistics North Carolina State University November 25, 2013 A. Wilson (NCSU Statistics) Model Selection November 25, 2013 1 / 29 Formal Bayesian Model

More information

Extending Conventional priors for Testing General Hypotheses in Linear Models

Extending Conventional priors for Testing General Hypotheses in Linear Models Extending Conventional priors for Testing General Hypotheses in Linear Models María Jesús Bayarri University of Valencia 46100 Valencia, Spain Gonzalo García-Donato University of Castilla-La Mancha 0071

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

g-priors for Linear Regression

g-priors for Linear Regression Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,

More information

A simple two-sample Bayesian t-test for hypothesis testing

A simple two-sample Bayesian t-test for hypothesis testing A simple two-sample Bayesian t-test for hypothesis testing arxiv:159.2568v1 [stat.me] 8 Sep 215 Min Wang Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA and Guangying

More information

Bayesian Variable Selection Under Collinearity

Bayesian Variable Selection Under Collinearity Bayesian Variable Selection Under Collinearity Joyee Ghosh Andrew E. Ghattas. June 3, 2014 Abstract In this article we provide some guidelines to practitioners who use Bayesian variable selection for linear

More information

Bayesian Model Averaging

Bayesian Model Averaging Bayesian Model Averaging Hoff Chapter 9, Hoeting et al 1999, Clyde & George 2004, Liang et al 2008 October 24, 2017 Bayesian Model Choice Models for the variable selection problem are based on a subset

More information

A REVERSE TO THE JEFFREYS LINDLEY PARADOX

A REVERSE TO THE JEFFREYS LINDLEY PARADOX PROBABILITY AND MATHEMATICAL STATISTICS Vol. 38, Fasc. 1 (2018), pp. 243 247 doi:10.19195/0208-4147.38.1.13 A REVERSE TO THE JEFFREYS LINDLEY PARADOX BY WIEBE R. P E S T M A N (LEUVEN), FRANCIS T U E R

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Integrated Objective Bayesian Estimation and Hypothesis Testing

Integrated Objective Bayesian Estimation and Hypothesis Testing Integrated Objective Bayesian Estimation and Hypothesis Testing José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es 9th Valencia International Meeting on Bayesian Statistics Benidorm

More information

Divergence Based Priors for Bayesian Hypothesis testing

Divergence Based Priors for Bayesian Hypothesis testing Divergence Based Priors for Bayesian Hypothesis testing M.J. Bayarri University of Valencia G. García-Donato University of Castilla-La Mancha November, 2006 Abstract Maybe the main difficulty for objective

More information

Lecture 7 October 13

Lecture 7 October 13 STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Overall Objective Priors

Overall Objective Priors Overall Objective Priors Jim Berger, Jose Bernardo and Dongchu Sun Duke University, University of Valencia and University of Missouri Recent advances in statistical inference: theory and case studies University

More information

Mixtures of Prior Distributions

Mixtures of Prior Distributions Mixtures of Prior Distributions Hoff Chapter 9, Liang et al 2007, Hoeting et al (1999), Clyde & George (2004) November 10, 2016 Bartlett s Paradox The Bayes factor for comparing M γ to the null model:

More information

Mixtures of Prior Distributions

Mixtures of Prior Distributions Mixtures of Prior Distributions Hoff Chapter 9, Liang et al 2007, Hoeting et al (1999), Clyde & George (2004) November 9, 2017 Bartlett s Paradox The Bayes factor for comparing M γ to the null model: BF

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Lecture 6: Linear models and Gauss-Markov theorem

Lecture 6: Linear models and Gauss-Markov theorem Lecture 6: Linear models and Gauss-Markov theorem Linear model setting Results in simple linear regression can be extended to the following general linear model with independently observed response variables

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

An Overview of Objective Bayesian Analysis

An Overview of Objective Bayesian Analysis An Overview of Objective Bayesian Analysis James O. Berger Duke University visiting the University of Chicago Department of Statistics Spring Quarter, 2011 1 Lectures Lecture 1. Objective Bayesian Analysis:

More information

Linear Models A linear model is defined by the expression

Linear Models A linear model is defined by the expression Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose

More information

Illustrating the Implicit BIC Prior. Richard Startz * revised June Abstract

Illustrating the Implicit BIC Prior. Richard Startz * revised June Abstract Illustrating the Implicit BIC Prior Richard Startz * revised June 2013 Abstract I show how to find the uniform prior implicit in using the Bayesian Information Criterion to consider a hypothesis about

More information

ST 740: Linear Models and Multivariate Normal Inference

ST 740: Linear Models and Multivariate Normal Inference ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /

More information

A Bayesian Treatment of Linear Gaussian Regression

A Bayesian Treatment of Linear Gaussian Regression A Bayesian Treatment of Linear Gaussian Regression Frank Wood December 3, 2009 Bayesian Approach to Classical Linear Regression In classical linear regression we have the following model y β, σ 2, X N(Xβ,

More information

The joint posterior distribution of the unknown parameters and hidden variables, given the

The joint posterior distribution of the unknown parameters and hidden variables, given the DERIVATIONS OF THE FULLY CONDITIONAL POSTERIOR DENSITIES The joint posterior distribution of the unknown parameters and hidden variables, given the data, is proportional to the product of the joint prior

More information

Supplementary materials for Scalable Bayesian model averaging through local information propagation

Supplementary materials for Scalable Bayesian model averaging through local information propagation Supplementary materials for Scalable Bayesian model averaging through local information propagation August 25, 2014 S1. Proofs Proof of Theorem 1. The result follows immediately from the distributions

More information

Model Choice. Hoff Chapter 9, Clyde & George Model Uncertainty StatSci, Hoeting et al BMA StatSci. October 27, 2015

Model Choice. Hoff Chapter 9, Clyde & George Model Uncertainty StatSci, Hoeting et al BMA StatSci. October 27, 2015 Model Choice Hoff Chapter 9, Clyde & George Model Uncertainty StatSci, Hoeting et al BMA StatSci October 27, 2015 Topics Variable Selection / Model Choice Stepwise Methods Model Selection Criteria Model

More information

Harrison B. Prosper. CMS Statistics Committee

Harrison B. Prosper. CMS Statistics Committee Harrison B. Prosper Florida State University CMS Statistics Committee 08-08-08 Bayesian Methods: Theory & Practice. Harrison B. Prosper 1 h Lecture 3 Applications h Hypothesis Testing Recap h A Single

More information

A NONINFORMATIVE BAYESIAN APPROACH FOR TWO-STAGE CLUSTER SAMPLING

A NONINFORMATIVE BAYESIAN APPROACH FOR TWO-STAGE CLUSTER SAMPLING Sankhyā : The Indian Journal of Statistics Special Issue on Sample Surveys 1999, Volume 61, Series B, Pt. 1, pp. 133-144 A OIFORMATIVE BAYESIA APPROACH FOR TWO-STAGE CLUSTER SAMPLIG By GLE MEEDE University

More information

10. Exchangeability and hierarchical models Objective. Recommended reading

10. Exchangeability and hierarchical models Objective. Recommended reading 10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.

More information

Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis

Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Yasunori Fujikoshi and Tetsuro Sakurai Department of Mathematics, Graduate School of Science,

More information

Bayesian Variable Selection Under Collinearity

Bayesian Variable Selection Under Collinearity Bayesian Variable Selection Under Collinearity Joyee Ghosh Andrew E. Ghattas. Abstract In this article we highlight some interesting facts about Bayesian variable selection methods for linear regression

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

On the use of non-local prior densities in Bayesian hypothesis tests

On the use of non-local prior densities in Bayesian hypothesis tests J. R. Statist. Soc. B (2010) 72, Part 2, pp. 143 170 On the use of non-local prior densities in Bayesian hypothesis tests Valen E. Johnson M. D. Anderson Cancer Center, Houston, USA and David Rossell Institute

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Mixtures of g-priors for Bayesian Variable Selection

Mixtures of g-priors for Bayesian Variable Selection Mixtures of g-priors for Bayesian Variable Selection January 8, 007 Abstract Zellner s g-prior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency

More information

Default priors and model parametrization

Default priors and model parametrization 1 / 16 Default priors and model parametrization Nancy Reid O-Bayes09, June 6, 2009 Don Fraser, Elisabeta Marras, Grace Yun-Yi 2 / 16 Well-calibrated priors model f (y; θ), F(y; θ); log-likelihood l(θ)

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW

A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW Miguel A Gómez-Villegas and Beatriz González-Pérez Departamento de Estadística

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics David Giles Bayesian Econometrics 1. General Background 2. Constructing Prior Distributions 3. Properties of Bayes Estimators and Tests 4. Bayesian Analysis of the Multiple Regression Model 5. Bayesian

More information

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models D. Fouskakis, I. Ntzoufras and D. Draper December 1, 01 Summary: In the context of the expected-posterior prior (EPP) approach

More information

Bayesian methods in economics and finance

Bayesian methods in economics and finance 1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent

More information

Objective Bayesian Hypothesis Testing

Objective Bayesian Hypothesis Testing Objective Bayesian Hypothesis Testing José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Statistical Science and Philosophy of Science London School of Economics (UK), June 21st, 2010

More information

BAYESIAN MODEL CRITICISM

BAYESIAN MODEL CRITICISM Monte via Chib s BAYESIAN MODEL CRITICM Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes

More information

Mixtures of g-priors for Bayesian Variable Selection

Mixtures of g-priors for Bayesian Variable Selection Mixtures of g-priors for Bayesian Variable Selection Feng Liang, Rui Paulo, German Molina, Merlise A. Clyde and Jim O. Berger August 8, 007 Abstract Zellner s g-prior remains a popular conventional prior

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

Bayesian inference. Justin Chumbley ETH and UZH. (Thanks to Jean Denizeau for slides)

Bayesian inference. Justin Chumbley ETH and UZH. (Thanks to Jean Denizeau for slides) Bayesian inference Justin Chumbley ETH and UZH (Thanks to Jean Denizeau for slides) Overview of the talk Introduction: Bayesian inference Bayesian model comparison Group-level Bayesian model selection

More information

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order

More information

Stat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010

Stat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010 Stat60: Bayesian Modelin and Inference Lecture Date: March 10, 010 Bayes Factors, -priors, and Model Selection for Reression Lecturer: Michael I. Jordan Scribe: Tamara Broderick The readin for this lecture

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics 9. Model Selection - Theory David Giles Bayesian Econometrics One nice feature of the Bayesian analysis is that we can apply it to drawing inferences about entire models, not just parameters. Can't do

More information

Posterior Model Probabilities via Path-based Pairwise Priors

Posterior Model Probabilities via Path-based Pairwise Priors Posterior Model Probabilities via Path-based Pairwise Priors James O. Berger 1 Duke University and Statistical and Applied Mathematical Sciences Institute, P.O. Box 14006, RTP, Durham, NC 27709, U.S.A.

More information

1 Priors, Contd. ISyE8843A, Brani Vidakovic Handout Nuisance Parameters

1 Priors, Contd. ISyE8843A, Brani Vidakovic Handout Nuisance Parameters ISyE8843A, Brani Vidakovic Handout 6 Priors, Contd.. Nuisance Parameters Assume that unknown parameter is θ, θ ) but we are interested only in θ. The parameter θ is called nuisance parameter. How do we

More information

November 2002 STA Random Effects Selection in Linear Mixed Models

November 2002 STA Random Effects Selection in Linear Mixed Models November 2002 STA216 1 Random Effects Selection in Linear Mixed Models November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Introduction to Markov chain Monte Carlo The Gibbs Sampler Examples Overview of the Lecture

More information

STA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources

STA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources STA 732: Inference Notes 10. Parameter Estimation from a Decision Theoretic Angle Other resources 1 Statistical rules, loss and risk We saw that a major focus of classical statistics is comparing various

More information

Bayesian Loss-based Approach to Change Point Analysis

Bayesian Loss-based Approach to Change Point Analysis Bayesian Loss-based Approach to Change Point Analysis Laurentiu Hinoveanu 1, Fabrizio Leisen 1, and Cristiano Villa 1 arxiv:1702.05462v2 stat.me] 7 Jan 2018 1 School of Mathematics, Statistics and Actuarial

More information

ASSESSING A VECTOR PARAMETER

ASSESSING A VECTOR PARAMETER SUMMARY ASSESSING A VECTOR PARAMETER By D.A.S. Fraser and N. Reid Department of Statistics, University of Toronto St. George Street, Toronto, Canada M5S 3G3 dfraser@utstat.toronto.edu Some key words. Ancillary;

More information

Chapter 5. Bayesian Statistics

Chapter 5. Bayesian Statistics Chapter 5. Bayesian Statistics Principles of Bayesian Statistics Anything unknown is given a probability distribution, representing degrees of belief [subjective probability]. Degrees of belief [subjective

More information

Bayesian Hypothesis Testing: Redux

Bayesian Hypothesis Testing: Redux Bayesian Hypothesis Testing: Redux Hedibert F. Lopes and Nicholas G. Polson Insper and Chicago Booth arxiv:1808.08491v1 [math.st] 26 Aug 2018 First draft: March 2018 This draft: August 2018 Abstract Bayesian

More information

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis https://doi.org/10.1007/s42081-019-00032-4 ORIGINAL PAPER Consistency of test based method for selection of variables in high dimensional two group discriminant analysis Yasunori Fujikoshi 1 Tetsuro Sakurai

More information

The binomial model. Assume a uniform prior distribution on p(θ). Write the pdf for this distribution.

The binomial model. Assume a uniform prior distribution on p(θ). Write the pdf for this distribution. The binomial model Example. After suspicious performance in the weekly soccer match, 37 mathematical sciences students, staff, and faculty were tested for the use of performance enhancing analytics. Let

More information

Fiducial Inference and Generalizations

Fiducial Inference and Generalizations Fiducial Inference and Generalizations Jan Hannig Department of Statistics and Operations Research The University of North Carolina at Chapel Hill Hari Iyer Department of Statistics, Colorado State University

More information

MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY

MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY ECO 513 Fall 2008 MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY SIMS@PRINCETON.EDU 1. MODEL COMPARISON AS ESTIMATING A DISCRETE PARAMETER Data Y, models 1 and 2, parameter vectors θ 1, θ 2.

More information

arxiv: v1 [stat.ap] 27 Mar 2015

arxiv: v1 [stat.ap] 27 Mar 2015 Submitted to the Annals of Applied Statistics A NOTE ON THE SPECIFIC SOURCE IDENTIFICATION PROBLEM IN FORENSIC SCIENCE IN THE PRESENCE OF UNCERTAINTY ABOUT THE BACKGROUND POPULATION By Danica M. Ommen,

More information

A noninformative Bayesian approach to domain estimation

A noninformative Bayesian approach to domain estimation A noninformative Bayesian approach to domain estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu August 2002 Revised July 2003 To appear in Journal

More information

An Encompassing Prior Generalization of the Savage-Dickey Density Ratio Test

An Encompassing Prior Generalization of the Savage-Dickey Density Ratio Test An Encompassing Prior Generalization of the Savage-Dickey Density Ratio Test Ruud Wetzels a, Raoul P.P.P. Grasman a, Eric-Jan Wagenmakers a a University of Amsterdam, Psychological Methods Group, Roetersstraat

More information

Penalized Loss functions for Bayesian Model Choice

Penalized Loss functions for Bayesian Model Choice Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

Introduction to Bayesian Methods

Introduction to Bayesian Methods Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence

Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence Valen E. Johnson Texas A&M University February 27, 2014 Valen E. Johnson Texas A&M University Uniformly most powerful Bayes

More information

Noninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions

Noninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions Communications for Statistical Applications and Methods 03, Vol. 0, No. 5, 387 394 DOI: http://dx.doi.org/0.535/csam.03.0.5.387 Noninformative Priors for the Ratio of the Scale Parameters in the Inverted

More information

A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors

A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors Journal of Data Science 6(008), 75-87 A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors Scott Berry 1 and Kert Viele 1 Berry Consultants and University of Kentucky

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

A Few Special Distributions and Their Properties

A Few Special Distributions and Their Properties A Few Special Distributions and Their Properties Econ 690 Purdue University Justin L. Tobias (Purdue) Distributional Catalog 1 / 20 Special Distributions and Their Associated Properties 1 Uniform Distribution

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33 Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett

More information

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize

More information

Regression. Oscar García

Regression. Oscar García Regression Oscar García Regression methods are fundamental in Forest Mensuration For a more concise and general presentation, we shall first review some matrix concepts 1 Matrices An order n m matrix is

More information

arxiv: v1 [math.st] 2 Oct 2015

arxiv: v1 [math.st] 2 Oct 2015 Model Selection and Multiple Testing - A Bayes and Empirical Bayes Overview and some New Results Ritabrata Dutta a, Malgorzata Bogdan ab and Jayanta K Ghosh ac a Department of Statistics, Purdue University

More information

MODEL AVERAGING by Merlise Clyde 1

MODEL AVERAGING by Merlise Clyde 1 Chapter 13 MODEL AVERAGING by Merlise Clyde 1 13.1 INTRODUCTION In Chapter 12, we considered inference in a normal linear regression model with q predictors. In many instances, the set of predictor variables

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Addressing ourliers 1 Addressing ourliers 2 Outliers in Multivariate samples (1) For

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Bayesian Asymptotics

Bayesian Asymptotics BS2 Statistical Inference, Lecture 8, Hilary Term 2008 May 7, 2008 The univariate case The multivariate case For large λ we have the approximation I = b a e λg(y) h(y) dy = e λg(y ) h(y ) 2π λg (y ) {

More information

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over Point estimation Suppose we are interested in the value of a parameter θ, for example the unknown bias of a coin. We have already seen how one may use the Bayesian method to reason about θ; namely, we

More information

Cross-sectional space-time modeling using ARNN(p, n) processes

Cross-sectional space-time modeling using ARNN(p, n) processes Cross-sectional space-time modeling using ARNN(p, n) processes W. Polasek K. Kakamu September, 006 Abstract We suggest a new class of cross-sectional space-time models based on local AR models and nearest

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the

More information