arxiv: v2 [stat.me] 13 Jun 2012
|
|
- Stephanie Stevens
- 5 years ago
- Views:
Transcription
1 A Bayes factor with reasonable model selection consistency for ANOVA model arxiv: v2 [stat.me] 13 Jun 2012 Yuzo Maruyama The University of Tokyo Abstract: For the ANOVA model, we propose a new g-prior based Bayes factor without integral representation, with reasonable model selection consistency for any asymptotic situations (either number of levels of the factor and/or number of replication in each level goes to infinity). Exact analytic calculation of the marginal density under a special choice of the priors enables such a Bayes factor. AMS 2000 subject classifications: Primary 62F15, 62F07; secondary 62A10. Keywords and phrases: Bayesian model selection, model selection consistency, fully Bayes method, Bayes factor. 1. Introduction In this paper, we consider Bayesian model selection for ANOVA model. We start with one-way ANOVA with two possible models. In one model, all random variables have the same mean. In the other model, there are some levels and random variables in each level have different means. Formally, the independent observations y ij (i = 1,...,p, j = 1,...,r i, n = p i=1 r i) are assumed to arise from the linear model: y ij = µ+α i +ǫ ij, ǫ ij N(0,σ 2 ) (1.1) where µ, α i (i = 1,...,p) and σ 2 are unknown. Clearly two models described above are written as follows: M 1 : α i = 0 for all 1 i p, vs M A+1 : α i 0 for some i. (1.2) Model selection, which refers to using the data in order to decide on the plausibility of two or more competing models, is a common problem in modern statistical science. A natural Bayesian approach is by Bayes factor (ratio of marginal densities of two models), which is a function of the posterior model probabilities (Kass and Raftery (1995)). From a theoretical viewpoint, one of the most important topic on Bayesian model selection is consistency. Consistency means that the true model will be chosen if enough data are observed, assuming that one of the competing models is true. It is well-known that BIC by Schwarz (1978) is consistent under fixed number of parameters. In our situation, when p is fixed and r = n p p = i=1 r i p 1, (1.3)
2 Y. Maruyama/A Bayes factor 2 the BIC is consistent. As a high-dimensional problem of statistical inference, the consistency in the case where p in one-way ANOVA setup, has been addressed by Stone (1979) and Berger, Ghosh and Mukhopadhyay (2003). In the following, let CASE I and CASE II denote the cases where I. n, p and r is bounded, II. n, p and r, respectively. When σ 2 is known and CASE I is assumed, Stone (1979) showed that BIC chooses the null model M 1 with probability 1 (that is, BIC is not consistent under M A+1 ). This is reasonable because BIC is originally derived by the Laplace approximation under fixed number of parameters. In the same situation as Stone (1979), Berger, Ghosh and Mukhopadhyay(2003) proposed a Bayesian criterion called GBIC, which is derived by the Laplace approximation under CASE I. Then they showed that GBIC has model selection consistency under CASE I. Recently Moreno, Girón and Casella(2010) showed that under CASE II with p = O(n b ) for 0 < b < 1 or equivalently p = O({ r} b/(1 b) ), BIC is consistent. By consideringbveryclose to 1, BIC seemsto haveconsistencyunder CASE II even if r approaches infinity much slower than p. As in Theorem 3.1, however, we show that this is not so but the consistency of BIC under CASE II depends on how quick r goesto infinity comparedto p. As far as we know, this inconsistency of BIC has not yet been reported in the literature. As an alternative to BIC, following Zellner(1986); Liang et al.(2008); Maruyama and George (2011), we will propose a new g-prior based Bayes factor with consistency under CASE II, which is given by where BF F (A+1):1 = Γ(p/2)Γ({n p}/2) ( 1+ W ) (n p 1)/2 H, (1.4) Γ(1/2)Γ({n 1}/2) W E W H = r i (ȳ i ȳ ) 2, W E = i ij j ȳ i = y ij ij, ȳ = y ij r i i r. i (y ij ȳ i ) 2, As shown in Theorem 3.1, BF F (A+1):1 is consistent under CASE II even if r approaches infinity much slower than p. Under CASE I where BIC is not consistent, the consistency of BF F (A+1):1 is shown to depend on a kind of distance between M A+1 and M 1, which is c A = ij (E[y ij] E[ȳ ]) 2 nσ 2. (1.5) Naturally c A = 0 under M 1 and c A > 0 under M A+1. In Theorem 3.1, under CASE I with a positive small c A, we show that BF F (A+1):1 chooses M 1 with
3 Y. Maruyama/A Bayes factor 3 probability 1. Actually such an inconsistency result has been already reported by Moreno, Girón and Casella (2010). In Remark 3.1, we demonstrate that the existence of inconsistency region for small c A and large p is reasonable from the prediction point of view. The rest of the paper is organized as follows. In Section 2, we re-parameterize the ANOVA model givenby (1.1) as a linear regressionmodel and give priorsfor the regression model. Then we derive marginal densities under M 1 and M A+1 and eventually the Bayes factor given by (1.4), as the ratio of the marginal densities. In Section 3, we show that the Bayes factor has a reasonable model selection consistency for any asymptotic situation. In Section 4, the results derivedinsections2and3areextendedtotwo-wayanovamodel.theappendix presents some of the more technical proofs. 2. A Bayes factor for one-way ANOVA 2.1. re-parameterized ANOVA Let α = (α 1,...,α p ), y = (y 11,...,y 1r1,y 21,...,y 2r2,...,y p1,...,y prp ), ǫ = (ǫ 11,...,ǫ 1r1,ǫ 21,...,ǫ 2r2,...,ǫ p1,...,ǫ prp ). Further let an n p matrix X A = (x ij ) satisfy { 1 if j 1 k=1 x ij = r k +1 i j k=1 r k, 0 otherwise. (2.1) Then the linear model given by (1.1) is written as y = µ1 n +X A α+ǫ = θ 1 1 n + X A α+ǫ where θ 1 = µ+r α/n with r = (r 1,...,r p ) and X A = X A 1 nr which is the centered matrix of X A. The matrix X A is written as r 1 r 2 n {}}{{}}{{}}{ ( ν 1,...,ν 1, ν 2,...,ν 2,... ν p,...,ν p ) (2.2) where ν i = e i n 1 r with e i = (0,...,0,1,0,...,0). The n p matrix X A is not full rank but rank X A = p 1 since {ν 1,...,ν p 1 } is linearly independent and p i=1 r iν i = 0. For identifiability of α, the linear restriction r p ν 0α = 0 (2.3)
4 Y. Maruyama/A Bayes factor 4 is assumed such that {ν 0,ν 1,...,ν p 1 } is linearly independent. Typically 1 p or r are chosen as ν 0. We will see that any linear restriction does not affect our result. Let the singular value decomposition of XA with rank p 1 be X A = U A D A W A, whereu A andw A aren (p 1)andp (p 1)orthogonalmatrices,respectively. Then the one-way ANOVA is re-parameterized as the linear regression model where y = θ 1 1 n +U A θ A +ǫ (2.4) θ A = D A W Aα R p 1. (2.5) Under the restriction ν 0 α = 0, θ A = 0 p 1 is equivalent to α = 0 p because the p p matrix (W A D A,ν 0 ) is non-singular and ( ) ( ) DA W A θa α =. (2.6) 0 ν 0 Hence, under (2.4), two models described in (1.2) are written as M 1 : θ A = 0 p 1 and M A+1 : θ A 0 p Priors and the Bayes factor In Bayesian model selection, the specification of priors are needed on the models and parameters in each model. For the former, let Pr(M 1 ) = Pr(M A+1 ) = 1/2 as usual. For the latter, at moment, we write joint prior densities as p(θ 1,σ 2 ) for M 1 and p(θ 1,θ A,σ 2 ) for M A+1. From the Bayes theorem, M A+1 is chosen when Pr(M A+1 y) > 1/2 where Pr(M A+1 y) = BF (A+1):1 1+BF (A+1):1 and BF (A+1):1 is the Bayes factor given by BF (A+1):1 = m A+1 (y)/m 1 (y). (2.7) In other words, M A+1 is chosen if and only if BF (A+1):1 > 1. In (2.7), m γ (y) is the marginal density under M γ for γ = 1,A+1 as follows: m 1 (y) = p(y θ 1,σ 2 )p(θ 1,σ 2 )dθ 1 dσ 2 m A+1 (y) = p(y θ 1,θ A,σ 2 )p(θ 1,θ A,σ 2 )dθ 1 dθ A dσ 2, where p(y θ 1,σ 2 ) and p(y θ 1,θ A,σ 2 ) are sampling densities of y under M 1 and M A+1, respectively.
5 Y. Maruyama/A Bayes factor 5 In this paper, we use the following joint prior density for M 1 and p(θ 1,σ 2 ) = p(θ 1 )p(σ 2 ) = 1 σ 2 (2.8) p(θ 1,θ A,σ 2 ) = p(θ 1 )p(σ 2 )p(θ A σ 2 ) = 1 σ 2 0 p(θ A g,σ 2 )p(g)dg (2.9) for M A+1. Note that p(θ 1 )p(σ 2 ) = σ 2 in both (2.8) and (2.9) is a popular non-informative prior. It is clear that θ 1 and σ 2 are location-scale parameters, and the improper prior for them is the right-haar prior from the locationscale invariance group. These are reasons why the use of these improper priors for θ 1 and σ 2 is formally justified. See Berger, Pericchi and Varshavsky (1998); Berger, Bernardo and Sun (2009) for details. As p(θ A g,σ 2 ), we use so-called Zellner s (1986) g-prior p(θ A σ 2,g) = N p 1 (0,gσ 2 (U AU A ) 1 ) = N p 1 (0,gσ 2 I p 1 ). (2.10) As the prior of g, following Maruyama and George (2011), we use PearsonType VI distribution with the density p(g) = gb (1+g) a b 2 B(a+1,b+1) = g(n p)/2 a 2 (1+g) (n p)/2 B(a+1,(n p)/2 a 1) (2.11) where 1 < a < (n p)/2 1 and b = (n p)/2 a 2. (2.12) Remark 2.1. For the choice of a, my recommendation is a = 1/2. We will describe it briefly. The asymptotic behavior of p(g) given by (2.11), for sufficiently large g, is proportional to g a 2. From the Tauberian Theorem, which is well-known for describing the asymptotic behavior of the Laplace transform, we have p(θ A σ 2 ) = 0 p(θ A σ 2,g)p(g)dg (σ 2 ) a+1 θ A (p+2a+1), (2.13) for sufficiently large θ A R p 1, a > 1 and b > 1. Hence the asymptotic tail behavior of p(θ A σ 2 ) for a = 1/2 is multivariate Cauchy, θ A (p 1) 1, which has been recommended by Zellner and Siow(1980) and others in objective Bayes context. Before we proceed to give the Bayes factor with respect to priors described above, we review statistical inference in ANOVA from frequentist point of view. The key decomposition is given by W T = y ȳ 1 n 2 = ij (y ij ȳ ) 2 = U A U A{y ȳ 1 n } 2 + (I U A U A){y ȳ 1 n } 2 = i r i(ȳ i ȳ ) 2 + ij (y ij ȳ i ) 2 = W H +W E, (2.14)
6 where W H and W E are independent and Y. Maruyama/A Bayes factor 6 W H σ 2 [ χ2 p 1 θa 2 /σ 2], W E σ 2 χ2 n p. In (2.14), W T, W H and W E are called total sum of squares, between group sum of squares and within group sum of squares, respectively. The hypothesis H 0 : α = 0 p (or θ A = 0 p 1 ) is rejected if W H /W E is relatively larger. In the theorem below, the Bayes factor under our priors is an increasing function of W H /W E, which is reasonable from frequentist point of view, too. Theorem 2.1. Under priors given by (2.8) under M 1 and by (2.9) with (2.10) and (2.11) under M A+1, the Bayes factor is given by m A+1 (y) m 1 (y) Proof. See Appendix. = Γ(p/2+a+1/2)Γ((n p)/2) Γ(a+1)Γ({n 1}/2) ( 1+ W ) (n p 2)/2 a H. W E By Remark 2.1 and Theorem 2.1, the Bayes factor which we recommend is written as BF F (A+1):1 = m A+1(y) = Γ(p/2)Γ({n p}/2) ( 1+ W ) (n p 1)/2 H, (2.15) m 1 (y) Γ(1/2)Γ({n 1}/2) W E where the subscript F means Fully Bayes. 3. Model selection consistency In this section, we consider the model selection consistency as n = i r i. Generally, the posterior consistency for model choice is defined as plimpr(m γ y) = 1 or plimbf γ :γ = 0 (3.1) n n when M γ is the true model and M γ is not. Here plim denotes convergence in probability and the probability distribution in (3.1) is the sampling distribution under the true model M γ. We will show that BF F (A+1):1 given by (2.15) has a reasonable model selection consistency. As a natural competitor of BF F (A+1):1, the BIC based Bayes factor, ( BF BIC (A+1):1 = 1+ W ) n/2 H n (p 1)/2, W E is considered. As remarked in Section 1, the consistency depends on c A given by (1.5) or equivalently θ A 2 /(nσ 2 ). Let Then we have a following result. c A = lim c θ A 2 A = lim n n nσ 2. (3.2)
7 Theorem 3.1. Y. Maruyama/A Bayes factor 7 I. Assume r and p is fixed. (a) BF F (A+1):1 and BF BIC (A+1):1 are consistent whichever the true model is. II. Assume p. (a) BF F (A+1):1 and BFBIC (A+1):1 are consistent under M 1. (b) Assume r is fixed under M A+1. i. BF F (A+1):1 is consistent (inconsistent) if c A > (<)h( r) where ii. BF BIC (A+1):1 is inconsistent. (c) Assume r under M A+1. i. BF F (A+1):1 is consistent. h(r) = r 1/(r 1) 1. (3.3) ii. BF BIC (A+1):1 is consistent (inconsistent) if p slower (faster) than (e{1+ c A } r )/ r. Note: h(r) is a convex decreasing function in r which satisfies h(2) = 1, h(5). = 0.5, h(10). = 0.29, and h( ) = 0. Proof. See Appendix. Note: After Maruyama (2009), the first version of this paper, where the balanced ANOVA was treated only (that is, r i r is assumed), Wang and Sun (2012) extended consistency results of Maruyama (2009) to unbalanced case while the proof itself essentially follows from Maruyama (2009). Remark 3.1. We give some remarks on inconsistency shown in Theorem 3.1. I. As shown in II(b)ii of Theorem 3.1, BIC always chooses M 1 when p and r is fixed, even if c A is very large. This is interpreted as unknown variance version of Stone s (1979). II. As seen in II(b)i of Theorem 3.1, BF F (A+1):1 has an inconsistency region. Actually existing such an inconsistency region has been also reported by Moreno, Girón and Casella (2010). They proposed intrinsic Bayes factor for normal regression model and their upper-bound of inconsistency region for one-way balanced ANOVA is given by r 1 (r+1) (r 1)/r 1 1 which is slightly smaller than h(r) given by (3.3). III. The existence of inconsistency region for small c A and large p is quite reasonable from the following reason. Assume new independent observations z ij (i = 1,...,p, j = 1,...,r i ) from the same model as y ij. Then the difference of scaled mean squared prediction errors of ȳ i and ȳ is given
8 by Y. Maruyama/A Bayes factor 8 [ ] [ ] E y,z i,j (z ij ȳ ) 2 E y,z i,j (z ij ȳ i ) 2 [ȳ ;ȳ i ] = nσ 2 nσ 2 = c A p 1 n for any p and r. First assume that M 1 is true. We see that [ȳ ;ȳ i ] = (p 1)/n < 0, which is reasonable. Then assume that M A+1 is true. When r and p is fixed, lim r [ȳ ;ȳ i ] = c A > 0, which is reasonable. On the other hand, if p and r is fixed, lim [ȳ ;ȳ p i ] = c A 1/ r. Hence when c A < 1/ r, [ȳ ;ȳ i ] is negative even if M A+1 is true. Therefore, from the prediction point of view, the existence of inconsistency region for small c A and large p is reasonable. Table1showsfrequencyofchoiceofthetruemodelinsomecasesinnumerical experiment where he balanced case r 1 = = r p is considered. We see that it clearly guarantees the validity of Theorem two-way ANOVA 4.1. re-parameterized two-way ANOVA In this section, we extend the results in Sections 2 and 3 to two-way ANOVA model. We have n independent normal random variables y ijk (i = 1,...,p, j = 1,...,q, k = 1,...,r ij, n = i,j r ij) where y ijk = µ+α i +β j +γ ij +ǫ ijk, ǫ ijk N(0,σ 2 ). (4.1) In the following, for simplicity as in Christensen (2011), we assume that r ij is given by r ij = n pq ν iξ j (4.2) where ν i for i = 1,...,p and ξ j for j = 1,...,q satisfy p ν i = p, i=1 q ξ j = q j=1
9 Y. Maruyama/A Bayes factor 9 Table 1 Frequency of choice of the true model p\r under M 1 c = 0.1 under M A+1 FB BIC c = 0.5 under M A+1 c = 1 under M A+1 FB BIC c = 2 under M A+1 c = 5 under M A+1 FB BIC
10 Y. Maruyama/A Bayes factor 10 and that {n/(pq)}ν i ξ j for any (i,j) is a positive integer. Therefore we allow some kinds of unbalanced design as well as balanced design, r ij n/(pq) for any (i,j). Let y = ({y 111,...,y 11r11 },{y 121,...,y 12r12 },...,{y pq1,...,y pqrpq }), ǫ = ({ǫ 111,...,ǫ 11r11 },{ǫ 121,...,ǫ 12r12 },...,{ǫ pq1,...,ǫ pqrpq }), α = (α 1,...,α p ), β = (β 1,...,β q ) and γ = ({γ 11,...,γ 1q },{γ 21,...,γ 2q },...,{γ p1,...,γ pq }). The n (p+q +pq) design matrix is given by (X A,X B,X AB ), where the n p matrix X A = (x A ti ), the n q matrix X B = (X B1,...,X Bp ) with an ( q j=1 r ij) q matrix X Bi = (x Bi tj ), the n pq matrix X AB = (x AB tm ) satisfy { x A 1 if i 1 q m=1 l=1 ti = r ml +1 t i q m=1 l=1 r ml, 0 otherwise, and x AB x Bi tj = { 1 if j 1 m=1 r im +1 t j m=1 r im, 0 otherwise, { tm = 1 if s<m r i(s)j(s) +1 t s m r i(s)j(s), 0 otherwise, where i(s) and j(s) are the unique integer solution of s = (i 1)q + j where i = 1,...,p and j = 1,...,q, respectively. Then the linear model of (4.1) is written as y = µ1 n +X A α+x B β +X AB γ +ǫ, ǫ N n (0 n,σ 2 I n ). (4.3) Note that {µ,α,β,γ} do not have identifiability since 1 n = X A 1 p = X B 1 q = X AB 1 pq, X A e i = X AB (0 q,...,0 q,1 q,0 q,...,0 q ), for i = 1,...,p, X B e j = X AB (e j,...,e j ), for j = 1,...,q. (4.4) So we will re-parameterize the model (4.3) as a linear regression model with non-constrained parameters. The expectation E[y] is re-written as µ1 n +X A α+x B β +X AB γ = θ 1 (µ,α,β,γ)1 n + X A α+ X B β + X AB γ
11 Y. Maruyama/A Bayes factor 11 where θ 1 (µ,α,β,γ) = µ+ ν α p + ξ β q + (r 11,...,r pq ) γ n and X A, XB, XAB are the centered matrices. As explained in Christensen (2011), under the condition (4.2) on r ij including the balanced design, XA and X B are orthogonal, that is, X A XB is the zero matrix. Let the singular value decomposition of XA, XB and (I n U A U A )(I n U B U B ) X AB be X A = U A D A W A, XB = U B D B W B, (I n U A U A)(I n U B U B) X AB = U AB\(A+B) D AB\(A+B) W AB\(A+B), (4.5) where the rank of XA, XB and (I n U A U A )(I n U B U B ) X AB are p 1, q 1 and (p 1)(q 1), respectively. Then where E[y] = θ 1 (µ,α,β,γ)1 n + X A α+u A U AX AB γ + X B β +U B U BX AB γ +(I n U A U A U B U B) X AB γ = θ 1 (µ,α,β,γ)1 n +U A {D A W A α+u A X ABγ} +U B {D B W Bβ +U BX AB γ}+(i n U A U A U B U B) X AB γ = θ 1 (µ,α,β,γ)1 n +U A θ A (α,γ) +U B θ B (β,γ)+u AB\(A+B) θ AB\(A+B) (γ) θ A (α,γ) = D A W A α+u A X ABγ R p 1, θ B (β,γ) = D B W Bβ +U BX AB γ R q 1, θ AB\(A+B) (γ) = D AB\(A+B) W AB\(A+B) γ R(p 1)(q 1). (4.6) Therefore the linear model with non-constrained parameters is given by {θ 1,θ A,θ B,θ AB\(A+B) } y = θ 1 1 n +U A θ A +U B θ B +U AB\(A+B) θ AB\(A+B) +ǫ. (4.7) In the two-way ANOVA model given by (4.3), the following five submodels are important. M 1 : E[y] = µ1 n, (α = 0 p,β = 0 q,γ = 0 pq ) M A+1 : E[y] = µ1 n +X A α, (β = 0 q,γ = 0 pq ) M B+1 : E[y] = µ1 n +X B β, (α = 0 p,γ = 0 pq ) M A+B+1 : E[y] = µ1 n +X A α+x B β, (γ = 0 pq )
12 Y. Maruyama/A Bayes factor 12 M (A+1)(B+1) : E[y] = µ1 n +X A α+x B β +X AB γ. Under suitable constraints on α, β and γ, for example, ν α = 0, ξ β = 0, r 11,...,r pq U A X AB γ = 0 p+q 1, (4.8) U B X AB {µ,α,β,γ} are identifiable, and further between (4.3) and (4.7), there are the following equivalences: M 1 : (α = 0 p,β = 0 q,γ = 0 pq ) (θ A = 0 p 1,θ B = 0 q 1,θ AB\(A+B) = 0 (p 1)(q 1) ), M A+1 : (β = 0 q,γ = 0 pq ) (θ B = 0 q 1,θ AB\(A+B) = 0 (p 1)(q 1) ), M B+1 : (α = 0 p,γ = 0 pq ) (θ A = 0 p 1,θ AB\(A+B) = 0 (p 1)(q 1) ), M A+B+1 : γ = 0 pq θ AB\(A+B) = 0 (p 1)(q 1). Note that the constraints for {α,β,γ} as (4.8) do not affect our results priors and the Bayes factor Following Section 2.2, we assume that π(θ 1,σ 2 ) = 1/σ 2 for every model and θ A {g,σ 2 } N p 1 (0 p 1,gσ 2 I p 1 ) for M A+1,M A+B+1,M (A+1)(B+1), θ B {g,σ 2 } N q 1 (0 q 1,gσ 2 I q 1 ) for M B+1,M A+B+1,M (A+1)(B+1), and θ AB\(A+B) {g,σ 2 } N (p 1)(q 1) (0 (p 1)(q 1),gσ 2 I (p 1)(q 1) ) for M (A+1)(B+1). Further we assume p(g) = gb (1+g) a b 2 B(a+1,b+1) (4.9) where a = 1/2 and b(s) = (n s 1)/2 a 2 (4.10) for p 1 under M A+1 q 1 under M B+1 s = p+q 2 under M A+B+1 pq 1 under M (A+1)(B+1).
13 Y. Maruyama/A Bayes factor 13 Before we proceed to give the Bayes factor with respect to priors described above, we review the decomposition of squares for two-way ANOVA. As similar in one-way ANOVA, the total sum of squares W T = ijk (y ijk ȳ ) 2 = y ȳ 1 n 2 can be decomposed as where each sums of squares are given by W A = ijk W B = ijk W AB\(A+B) = ijk W T = W A +W B +W AB\(A+B) +W E (4.11) (ȳ i ȳ ) 2 = U A U A{y ȳ 1 n } 2 = U A{y ȳ 1 n } 2, (ȳ j ȳ ) 2 = U B U B {y ȳ 1 n } 2 = U B {y ȳ 1 n } 2, (ȳ ij ȳ i ȳ j +ȳ ) 2 = U AB\(A+B) U AB\(A+B) {y ȳ 1 n } 2 = U AB\(A+B) {y ȳ 1 n } 2, W E = ijk and (y ijk ȳ ij ) 2 = (I n U A U A)(I n U B U B)(I n U AB\(A+B) U AB\(A+B) ){y ȳ 1 n } 2 ȳ = 1 y ijk, ȳ i = 1 n ijk j r y ijk, ij jk ȳ j = 1 i r y ijk, ȳ ij = 1 y ijk. ij r ij ik In (4.11), W A,W B,W AB\(A+B) and W E are independently distributed as W A σ 2 χ2 p 1 ( θ A 2 /σ 2 W B ), σ 2 χ2 q 1 ( θ B 2 /σ 2 ), W AB\(A+B) σ 2 χ 2 (p 1)(q 1) ( θ AB\(A+B) 2 /σ 2 ). k W E σ 2 χ2 n pq, (4.12) Note that θ A 2, θ B 2 and θ AB\(A+B) 2 do not depend on parameterization since these are given by θ A 2 = U A{E[y] E[ȳ ]1 n } 2, θ B 2 = U B {E[y] E[ȳ ]1 n } 2, θ AB\(A+B) 2 = U AB\(A+B) {E[y] E[ȳ ]1 n } 2. Based on the decomposition, Bayes factors for two-way ANOVA are given as follows.
14 Y. Maruyama/A Bayes factor 14 Theorem 4.1. When M γ for γ = A+1,B+1,A+B+1, and (A+1)(B+1) and M 1 are pairwisely compared, the corresponding Bayes factors can be derived as follows: BF F (A+1):1 = Γ(p/2)Γ((n p)/2) ( 1+ Γ(1/2)Γ({n 1}/2) BF F (B+1):1 = Γ(q/2)Γ((n q)/2) Γ(1/2)Γ({n 1}/2) ( 1+ BF F (A+B+1):1 = Γ(p+q 1 2 )Γ( n p q+1 2 ) Γ(1/2)Γ({n 1}/2) BF F (A+1)(B+1):1 = Γ(pq/2)Γ((n pq)/2) Γ(1/2)Γ({n 1}/2) W A W T W A W B W T W B ( 1+ ) (n p)/2 1/2 ) (n q)/2 1/2 ) (n p q)/2 W A +W B W T W A W B ( 1+ W ) (n pq)/2 1/2 T W E. W E 4.3. Consistency In this subsection, we consider model selection consistency of Bayes factors proposed in Theorem 4.1. We call {BF F }, the set of Bayes factors given in Theorem 4.1, consistent under M γ if and only if BF F γ :1 plim n BF F = 0 γ:1 for any γ γ (Let BF F 1:1 = 1). As the competitor, the corresponding Bayes factors based on BIC are given as follows. Let BF BIC (A+1):1 = n (p 1)/2 (1+ BF BIC (B+1):1 = n (q 1)/2 (1+ BF BIC (A+B+1):1 = n (p+q 2)/2 (1+ W A W T W A W B W T W B ) n/2 ) n/2 W A +W B W T W A W B ) n/2. BF BIC (A+1)(B+1):1 = n (pq 1)/2 ( 1+ W T W E W E ) n/2 θ A 2 c A = lim n nσ 2, c θ B 2 B = lim n nσ 2, c θ AB\(A+B) 2 AB\(A+B) = lim n nσ 2. Then we have a following result on consistency. Theorem 4.2. I. Assume r and p and q are fixed.
15 Y. Maruyama/A Bayes factor 15 (a) Both {BF F } and {BF BIC } are consistent whichever the true model is. II. Assume p and q. (a) Both {BF F } and {BF BIC } are consistent except under M (A+1)(B+1). (b) Assume r is fixed. i. {BF F } is consistent (inconsistent) under M (A+1)(B+1) when c AB\(A+B) > (<)H( r, c A + c B ), (4.13) where H(r,c) with positive c is the (unique) positive solution of (x+1) r /r (x+1) c = 0. (4.14) ii. {BF BIC } is inconsistent under M (A+1)(B+1). (c) Assume r. Proof. See Appendix. i. {BF F } is consistent under M (A+1)(B+1). ii. {BF BIC } is consistent (inconsistent) under M (A+1)(B+1) when pq slower (faster) than (e{1+ c AB\(A+B) } r )/ r. The function H(r, c) satisfies H(r,c) with fixed r is increasing in c and H(r,0) = h(r) where h(r) is given by (3.3). H(r,c) with fixed c is decreasing in r and lim H(r,c) = 0. r Remark 4.1. We have results under fixed q (and fixed r or r ), too. We omit the detail since they are somewhat complicated. Appendix A: Proof of Theorem 2.1 First we derive the marginal density under M 1. Using the Pythagorean relation y θ 1 1 n 2 = n(ȳ θ 1 ) 2 +W T, where ȳ = n 1 i,j y ij and W T = y ȳ 1 n 2 = ij (y ij ȳ ) 2, we have 1 m 1 (y) = exp ( y θ 11 n 2 ) 0 (2π) n/2 σn+2 2σ 2 dθ 1 dσ 2 n 1/2 ( 1 = exp W ) T (2π) n/2 1/2 σn+1 2σ 2 dσ 2 0 = n1/2 Γ({n 1}/2) π (n 1)/2 {W T } (n 1)/2.
16 Y. Maruyama/A Bayes factor 16 Then we derive the marginal density under M A+1. Using the relationship y θ 1 1 n U A θ A 2 +g 1 θ A 2 = n(ȳ θ 1 ) 2 + y ȳ 1 n U A θ A 2 +g 1 θ A 2 = n(ȳ θ 1 ) 2 + g +1 g θ A gu A (y ȳ 1 n ) 2 g +1 g g +1 U A(y ȳ 1 n ) 2 + y ȳ 1 n 2 = n(ȳ θ 1 ) 2 + g +1 g θ A gu A (y ȳ 1 2 n ) g +1 + W T +gw E, g +1 where W E is given in (2.14), we have the conditional marginaldensity of y given g under M A+1, m A+1 (y g) = p(y θ 1,θ A,σ 2 )p(θ A σ 2,g)p(σ 2 )dθ 1 dθ A dσ 2 R p = R p 1 0 (2πσ 2 ) n/2 (2πgσ 2 ) (p 1)/2 exp ( y θ 11 n U A θ A 2 2σ 2 θ A 2 ) 2gσ 2 p(σ 2 )dθ 1 dθ A dσ 2 = 0 n 1/2 (1+g) (p 1)/2 (2πσ 2 ) (n 1)/2 (1+g) (n p)/2 exp = m 1 (y) (g{w E /W T }+1) (n 1)/2. The fully marginal density m A+1 (y) = 0 ( W T +gw E 2σ 2 (g +1) m A+1 (y g)p(g)dg ) 1 σ 2dσ2 under the prior of g given by (2.11), has a closed simple form as follows; m A+1 (y) = m 1 (y) 0 1 = m 1 (y) B(a+1,b+1) which completes the proof. (1+g) (n p)/2 g b (1+g) a b 2 (g{w E /W T }+1) (n 1)/2 B(a+1,b+1) dg = m 1 (y) Γ(p/2+a+1/2)Γ((n p)/2) Γ(a+1)Γ({n 1}/2) Appendix B: Proof of Theorem 3.1 preparation: random part 0 g b (g{w E /W T }+1) (n 1)/2dg ( ) (n p 2)/2 a WT, W E (A.1)
17 Y. Maruyama/A Bayes factor 17 Recall that, for any n and p, W H and W E are independent and W H σ 2 [ χ2 p 1 θa 2 /σ 2], W E σ 2 χ2 n p. P In the following, n denotes convergence in probability as n grows. When p is fixed and r(= n/p) goes to, we have W E P W H r p rσ 2 1, σ 2 χ2 p under M 1, W H P r p rσ 2 c A under M A+1. Hence, for any ǫ > 0 and any positive number r, there exists an l(ǫ) such that Pr ( 1/l(ǫ) < {1+W H /W E } p r < l(ǫ) ) > 1 ǫ under M 1. Under M A+1, we have When p goes to infinity, we have Hence we have 1 W E P p r 1 pσ 2 1, 1+ W H W E P r 1+ c A. W H pσ 2 P p 1 under M 1, W H pσ 2 P p 1+ r c A under M A W H W E P p r r 1 under M 1, 1+ W H P r{1+ c A } p under M A+1. W E r 1 preparation: non-random part For the asymptotic behavior of the gamma function, Stiring s formula, Γ(ax+b) 2πe ax (ax) ax+b 1/2 (B.1) (B.2) (B.3) (B.4) for sufficiently large x is useful. Here f g means limf/g = 1. Using (B.4), we get Γ(p/2)Γ({p r p}/2) Γ(1/2)Γ({p r 1}/2) Γ(p/2) 1 Γ(1/2)(p/2) (p 1)/2 r (p 1)/2, (B.5) when r and p is fixed, and when p. Γ(p/2)Γ({p r p}/2) Γ({p r 1}/2)Γ(1/2) ( r 1)p( r 1)/2 1/2 2 = 2 r ( r 1) 1/2 r p r/2 1 { r 1 r r/( r 1) } p( r 1)/2 (B.6)
18 Y. Maruyama/A Bayes factor 18 main part First, we consider the consistency in the case where r and p is fixed. Using (B.1), (B.2) and (B.5), we have BF F (A+1):1 P r 0 under M 1, 1/BF F P (A+1):1 r 0 under M A+1, which means BF F (A+1):1 is consistent under both M 1 and M A+1. Similarly we see that BF BIC (A+1):1 is consistent under both M 1 and M A+1 when r and p is fixed. Then we consider the consistency in the case where p. Using (B.3) under M 1 and (B.6), we have BF F (A+1):1 P p 0, BF BIC P (A+1):1 p 0 which means BF F (A+1):1 and BFBIC (A+1):1 are consistent under M 1 when p. Using (B.3) under M A+1 and (B.6), we have 1 BF F (A+1):1 { } p( r 1)/2 2 r 1+ ca P p 1. ( r 1) 1/2 r 1/( r 1) Hence when c A > h( r) = r 1/( r 1) 1, we have 1/BF F (A+1):1 P p 0, which means consistency of BF F (A+1):1 under M A+1. Since h( r) 0 as r, BF F (A+1):1 has consistency when both p and r go to infinity. On the other hand, when c A < h( r), we have BF F (A+1):1 P p 0, which means inconsistency of BF F (A+1):1 under M A+1. Using (B.3) under M A+1 and (B.6), we have BF BIC (A+1):1 ( We see that for any fixed r, p r { ) } r p/2 r 1 (p r) 1/2 P p 1. r(1+ c A ) BF BIC (A+1):1 P p 0 which means inconsistency of BIC under M A+1. When r as well as p goes to, ( ) p/2 BF BIC p r (A+1):1 e(1+ c ) r (p r) 1/2 P p 1. A
19 Y. Maruyama/A Bayes factor 19 Hence if p approaches infinity faster than (e{1+ c A } r )/ r, we have BF BIC (A+1):1 P p, r 0, which means inconsistency of BIC even if r. On the other hand, if p approaches infinity slower than (e{1+ c A } r )/ r, we have which means consistency of BIC. 1/BF BIC (A+1):1 P p, r 0. Appendix C: Proof of Theorem 4.2 By following the proof of Theorem 3.1 in Appendix B, all of the proof are straightforward. Here we give the sketch of the proof for part II(b)i only. When p,q and r is fixed, W A P W B P W E P p,q nσ 2 c A, p,q nσ 2 c B, p,q nσ 2 1 1/ r, W AB\(A+B) P p,q nσ 2 c AB\(A+B) +1/ r. (C.1) Using (C.1), Stiring s formula given in (B.4) and lim p p 1/p = 1, we have Therefore if the ratio { } ( 2/(pq) BF F P p,q 1+ ca + c B + c AB\(A+B) (A+1):1 1+ c B + c AB\(A+B) { } ( 2/(pq) BF F P p,q 1+ ca + c B + c AB\(A+B) (B+1):1 1+ c A + c AB\(A+B) ) r, ) r, { } ( 2/(pq) BF F P p,q 1+ ca + c B + c AB\(A+B) (A+B+1):1 1+ c AB\(A+B) ) r, { } 2/(pq) BF F P p,q (1+ c A + c B + c ) r 1 AB\(A+B) (A+1)(B+1):1. r (1+ c AB\(A+B) ) r r { > 1+ c A + c B + c AB\(A+B), (C.2) BF F γ:1 BF F (A+1)(B+1):1 } 2/(pq), (C.3) for γ = A+1,B+1,A+B+1, approaches a positive constant strictly less than 1, which guarantees the consistency of BF F (A+1)(B+1):1 under (C.2).
20 Y. Maruyama/A Bayes factor 20 References Berger, J. O., Bernardo, J. M. and Sun, D. (2009). The formal definition of reference priors. Ann. Statist MR Berger, J. O., Ghosh, J. K. and Mukhopadhyay, N. (2003). Approximations and consistency of Bayes factors as model dimension grows. J. Statist. Plann. Inference MR Berger, J. O., Pericchi, L. R. and Varshavsky, J. A.(1998). Bayes factors and marginal distributions in invariant situations. Sankhyā Ser. A MR Christensen, R. (2011). Plane answers to complex questions, fourth ed. Springer Texts in Statistics. Springer, New York. The theory of linear models. MR Kass, R. E. and Raftery, A. E. (1995). Bayes factors. J. Amer. Statist. Assoc Liang, F.,Paulo, R.,Molina, G.,Clyde, M. A.andBerger, J. O.(2008). Mixtures of g priors for Bayesian variable selection. J. Amer. Statist. Assoc MR Maruyama, Y. (2009). A Bayes factor with reasonable model selection consistency for ANOVA model. Arxiv v1 [stat.me]. Maruyama, Y. and George, E. I. (2011). Fully Bayes factors with a generalized g-prior. Ann. Statist Moreno, E., Girón, F. J. and Casella, G. (2010). Consistency of objective Bayes factors as the model dimension grows. Ann. Statist Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist MR Stone, M. (1979). Comments on Model Selection Criteria of Akaike and Schwarz. J. R. Stat. Soc. Ser. B Stat. Methodol Wang, M. and Sun, X. (2012). Bayes Factor Consistency for Unbalanced ANOVA Models. Arxiv v1 [stat.me]. Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian inference and decision techniques. Stud. Bayesian Econometrics Statist North-Holland, Amsterdam. MR Zellner, A. and Siow, A. (1980). Posterior Odds Ratios for Selected Regression Hypotheses. In Bayesian Statistics: Proceedings of the First International Meeting held in Valencia (Spain) (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) University of Valencia.
An Extended BIC for Model Selection
An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,
More informationDivergence Based priors for the problem of hypothesis testing
Divergence Based priors for the problem of hypothesis testing gonzalo garcía-donato and susie Bayarri May 22, 2009 gonzalo garcía-donato and susie Bayarri () DB priors May 22, 2009 1 / 46 Jeffreys and
More information7. Estimation and hypothesis testing. Objective. Recommended reading
7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing
More informationBayesian Assessment of Hypotheses and Models
8 Bayesian Assessment of Hypotheses and Models This is page 399 Printer: Opaque this 8. Introduction The three preceding chapters gave an overview of how Bayesian probability models are constructed. Once
More information1 Hypothesis Testing and Model Selection
A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection
More information7. Estimation and hypothesis testing. Objective. Recommended reading
7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing
More informationLecture 5: Conventional Model Selection Priors
Lecture 5: Conventional Model Selection Priors Susie Bayarri University of Valencia CBMS Conference on Model Uncertainty and Multiplicity July 23-28, 2012 Outline The general linear model and Orthogonalization
More informationSome Curiosities Arising in Objective Bayesian Analysis
. Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work
More informationST 740: Model Selection
ST 740: Model Selection Alyson Wilson Department of Statistics North Carolina State University November 25, 2013 A. Wilson (NCSU Statistics) Model Selection November 25, 2013 1 / 29 Formal Bayesian Model
More informationExtending Conventional priors for Testing General Hypotheses in Linear Models
Extending Conventional priors for Testing General Hypotheses in Linear Models María Jesús Bayarri University of Valencia 46100 Valencia, Spain Gonzalo García-Donato University of Castilla-La Mancha 0071
More informationInvariant HPD credible sets and MAP estimators
Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in
More informationg-priors for Linear Regression
Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,
More informationA simple two-sample Bayesian t-test for hypothesis testing
A simple two-sample Bayesian t-test for hypothesis testing arxiv:159.2568v1 [stat.me] 8 Sep 215 Min Wang Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA and Guangying
More informationBayesian Variable Selection Under Collinearity
Bayesian Variable Selection Under Collinearity Joyee Ghosh Andrew E. Ghattas. June 3, 2014 Abstract In this article we provide some guidelines to practitioners who use Bayesian variable selection for linear
More informationBayesian Model Averaging
Bayesian Model Averaging Hoff Chapter 9, Hoeting et al 1999, Clyde & George 2004, Liang et al 2008 October 24, 2017 Bayesian Model Choice Models for the variable selection problem are based on a subset
More informationA REVERSE TO THE JEFFREYS LINDLEY PARADOX
PROBABILITY AND MATHEMATICAL STATISTICS Vol. 38, Fasc. 1 (2018), pp. 243 247 doi:10.19195/0208-4147.38.1.13 A REVERSE TO THE JEFFREYS LINDLEY PARADOX BY WIEBE R. P E S T M A N (LEUVEN), FRANCIS T U E R
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationIntegrated Objective Bayesian Estimation and Hypothesis Testing
Integrated Objective Bayesian Estimation and Hypothesis Testing José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es 9th Valencia International Meeting on Bayesian Statistics Benidorm
More informationDivergence Based Priors for Bayesian Hypothesis testing
Divergence Based Priors for Bayesian Hypothesis testing M.J. Bayarri University of Valencia G. García-Donato University of Castilla-La Mancha November, 2006 Abstract Maybe the main difficulty for objective
More informationLecture 7 October 13
STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We
More informationSeminar über Statistik FS2008: Model Selection
Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationOverall Objective Priors
Overall Objective Priors Jim Berger, Jose Bernardo and Dongchu Sun Duke University, University of Valencia and University of Missouri Recent advances in statistical inference: theory and case studies University
More informationMixtures of Prior Distributions
Mixtures of Prior Distributions Hoff Chapter 9, Liang et al 2007, Hoeting et al (1999), Clyde & George (2004) November 10, 2016 Bartlett s Paradox The Bayes factor for comparing M γ to the null model:
More informationMixtures of Prior Distributions
Mixtures of Prior Distributions Hoff Chapter 9, Liang et al 2007, Hoeting et al (1999), Clyde & George (2004) November 9, 2017 Bartlett s Paradox The Bayes factor for comparing M γ to the null model: BF
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationLecture 6: Linear models and Gauss-Markov theorem
Lecture 6: Linear models and Gauss-Markov theorem Linear model setting Results in simple linear regression can be extended to the following general linear model with independently observed response variables
More informationBayesian Inference. Chapter 9. Linear models and regression
Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationAn Overview of Objective Bayesian Analysis
An Overview of Objective Bayesian Analysis James O. Berger Duke University visiting the University of Chicago Department of Statistics Spring Quarter, 2011 1 Lectures Lecture 1. Objective Bayesian Analysis:
More informationLinear Models A linear model is defined by the expression
Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose
More informationIllustrating the Implicit BIC Prior. Richard Startz * revised June Abstract
Illustrating the Implicit BIC Prior Richard Startz * revised June 2013 Abstract I show how to find the uniform prior implicit in using the Bayesian Information Criterion to consider a hypothesis about
More informationST 740: Linear Models and Multivariate Normal Inference
ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /
More informationA Bayesian Treatment of Linear Gaussian Regression
A Bayesian Treatment of Linear Gaussian Regression Frank Wood December 3, 2009 Bayesian Approach to Classical Linear Regression In classical linear regression we have the following model y β, σ 2, X N(Xβ,
More informationThe joint posterior distribution of the unknown parameters and hidden variables, given the
DERIVATIONS OF THE FULLY CONDITIONAL POSTERIOR DENSITIES The joint posterior distribution of the unknown parameters and hidden variables, given the data, is proportional to the product of the joint prior
More informationSupplementary materials for Scalable Bayesian model averaging through local information propagation
Supplementary materials for Scalable Bayesian model averaging through local information propagation August 25, 2014 S1. Proofs Proof of Theorem 1. The result follows immediately from the distributions
More informationModel Choice. Hoff Chapter 9, Clyde & George Model Uncertainty StatSci, Hoeting et al BMA StatSci. October 27, 2015
Model Choice Hoff Chapter 9, Clyde & George Model Uncertainty StatSci, Hoeting et al BMA StatSci October 27, 2015 Topics Variable Selection / Model Choice Stepwise Methods Model Selection Criteria Model
More informationHarrison B. Prosper. CMS Statistics Committee
Harrison B. Prosper Florida State University CMS Statistics Committee 08-08-08 Bayesian Methods: Theory & Practice. Harrison B. Prosper 1 h Lecture 3 Applications h Hypothesis Testing Recap h A Single
More informationA NONINFORMATIVE BAYESIAN APPROACH FOR TWO-STAGE CLUSTER SAMPLING
Sankhyā : The Indian Journal of Statistics Special Issue on Sample Surveys 1999, Volume 61, Series B, Pt. 1, pp. 133-144 A OIFORMATIVE BAYESIA APPROACH FOR TWO-STAGE CLUSTER SAMPLIG By GLE MEEDE University
More information10. Exchangeability and hierarchical models Objective. Recommended reading
10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.
More informationConsistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis
Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Yasunori Fujikoshi and Tetsuro Sakurai Department of Mathematics, Graduate School of Science,
More informationBayesian Variable Selection Under Collinearity
Bayesian Variable Selection Under Collinearity Joyee Ghosh Andrew E. Ghattas. Abstract In this article we highlight some interesting facts about Bayesian variable selection methods for linear regression
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationOn the use of non-local prior densities in Bayesian hypothesis tests
J. R. Statist. Soc. B (2010) 72, Part 2, pp. 143 170 On the use of non-local prior densities in Bayesian hypothesis tests Valen E. Johnson M. D. Anderson Cancer Center, Houston, USA and David Rossell Institute
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationMixtures of g-priors for Bayesian Variable Selection
Mixtures of g-priors for Bayesian Variable Selection January 8, 007 Abstract Zellner s g-prior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency
More informationDefault priors and model parametrization
1 / 16 Default priors and model parametrization Nancy Reid O-Bayes09, June 6, 2009 Don Fraser, Elisabeta Marras, Grace Yun-Yi 2 / 16 Well-calibrated priors model f (y; θ), F(y; θ); log-likelihood l(θ)
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationA CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW
A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW Miguel A Gómez-Villegas and Beatriz González-Pérez Departamento de Estadística
More informationDavid Giles Bayesian Econometrics
David Giles Bayesian Econometrics 1. General Background 2. Constructing Prior Distributions 3. Properties of Bayes Estimators and Tests 4. Bayesian Analysis of the Multiple Regression Model 5. Bayesian
More informationPower-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models
Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models D. Fouskakis, I. Ntzoufras and D. Draper December 1, 01 Summary: In the context of the expected-posterior prior (EPP) approach
More informationBayesian methods in economics and finance
1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent
More informationObjective Bayesian Hypothesis Testing
Objective Bayesian Hypothesis Testing José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Statistical Science and Philosophy of Science London School of Economics (UK), June 21st, 2010
More informationBAYESIAN MODEL CRITICISM
Monte via Chib s BAYESIAN MODEL CRITICM Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes
More informationMixtures of g-priors for Bayesian Variable Selection
Mixtures of g-priors for Bayesian Variable Selection Feng Liang, Rui Paulo, German Molina, Merlise A. Clyde and Jim O. Berger August 8, 007 Abstract Zellner s g-prior remains a popular conventional prior
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationBayesian inference. Justin Chumbley ETH and UZH. (Thanks to Jean Denizeau for slides)
Bayesian inference Justin Chumbley ETH and UZH (Thanks to Jean Denizeau for slides) Overview of the talk Introduction: Bayesian inference Bayesian model comparison Group-level Bayesian model selection
More informationMore Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction
Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order
More informationStat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010
Stat60: Bayesian Modelin and Inference Lecture Date: March 10, 010 Bayes Factors, -priors, and Model Selection for Reression Lecturer: Michael I. Jordan Scribe: Tamara Broderick The readin for this lecture
More informationDavid Giles Bayesian Econometrics
9. Model Selection - Theory David Giles Bayesian Econometrics One nice feature of the Bayesian analysis is that we can apply it to drawing inferences about entire models, not just parameters. Can't do
More informationPosterior Model Probabilities via Path-based Pairwise Priors
Posterior Model Probabilities via Path-based Pairwise Priors James O. Berger 1 Duke University and Statistical and Applied Mathematical Sciences Institute, P.O. Box 14006, RTP, Durham, NC 27709, U.S.A.
More information1 Priors, Contd. ISyE8843A, Brani Vidakovic Handout Nuisance Parameters
ISyE8843A, Brani Vidakovic Handout 6 Priors, Contd.. Nuisance Parameters Assume that unknown parameter is θ, θ ) but we are interested only in θ. The parameter θ is called nuisance parameter. How do we
More informationNovember 2002 STA Random Effects Selection in Linear Mixed Models
November 2002 STA216 1 Random Effects Selection in Linear Mixed Models November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.
Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Introduction to Markov chain Monte Carlo The Gibbs Sampler Examples Overview of the Lecture
More informationSTA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources
STA 732: Inference Notes 10. Parameter Estimation from a Decision Theoretic Angle Other resources 1 Statistical rules, loss and risk We saw that a major focus of classical statistics is comparing various
More informationBayesian Loss-based Approach to Change Point Analysis
Bayesian Loss-based Approach to Change Point Analysis Laurentiu Hinoveanu 1, Fabrizio Leisen 1, and Cristiano Villa 1 arxiv:1702.05462v2 stat.me] 7 Jan 2018 1 School of Mathematics, Statistics and Actuarial
More informationASSESSING A VECTOR PARAMETER
SUMMARY ASSESSING A VECTOR PARAMETER By D.A.S. Fraser and N. Reid Department of Statistics, University of Toronto St. George Street, Toronto, Canada M5S 3G3 dfraser@utstat.toronto.edu Some key words. Ancillary;
More informationChapter 5. Bayesian Statistics
Chapter 5. Bayesian Statistics Principles of Bayesian Statistics Anything unknown is given a probability distribution, representing degrees of belief [subjective probability]. Degrees of belief [subjective
More informationBayesian Hypothesis Testing: Redux
Bayesian Hypothesis Testing: Redux Hedibert F. Lopes and Nicholas G. Polson Insper and Chicago Booth arxiv:1808.08491v1 [math.st] 26 Aug 2018 First draft: March 2018 This draft: August 2018 Abstract Bayesian
More informationConsistency of test based method for selection of variables in high dimensional two group discriminant analysis
https://doi.org/10.1007/s42081-019-00032-4 ORIGINAL PAPER Consistency of test based method for selection of variables in high dimensional two group discriminant analysis Yasunori Fujikoshi 1 Tetsuro Sakurai
More informationThe binomial model. Assume a uniform prior distribution on p(θ). Write the pdf for this distribution.
The binomial model Example. After suspicious performance in the weekly soccer match, 37 mathematical sciences students, staff, and faculty were tested for the use of performance enhancing analytics. Let
More informationFiducial Inference and Generalizations
Fiducial Inference and Generalizations Jan Hannig Department of Statistics and Operations Research The University of North Carolina at Chapel Hill Hari Iyer Department of Statistics, Colorado State University
More informationMODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY
ECO 513 Fall 2008 MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY SIMS@PRINCETON.EDU 1. MODEL COMPARISON AS ESTIMATING A DISCRETE PARAMETER Data Y, models 1 and 2, parameter vectors θ 1, θ 2.
More informationarxiv: v1 [stat.ap] 27 Mar 2015
Submitted to the Annals of Applied Statistics A NOTE ON THE SPECIFIC SOURCE IDENTIFICATION PROBLEM IN FORENSIC SCIENCE IN THE PRESENCE OF UNCERTAINTY ABOUT THE BACKGROUND POPULATION By Danica M. Ommen,
More informationA noninformative Bayesian approach to domain estimation
A noninformative Bayesian approach to domain estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu August 2002 Revised July 2003 To appear in Journal
More informationAn Encompassing Prior Generalization of the Savage-Dickey Density Ratio Test
An Encompassing Prior Generalization of the Savage-Dickey Density Ratio Test Ruud Wetzels a, Raoul P.P.P. Grasman a, Eric-Jan Wagenmakers a a University of Amsterdam, Psychological Methods Group, Roetersstraat
More informationPenalized Loss functions for Bayesian Model Choice
Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented
More informationMISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30
MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)
More informationIntroduction to Bayesian Methods
Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative
More informationPeter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall
More informationUniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence
Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence Valen E. Johnson Texas A&M University February 27, 2014 Valen E. Johnson Texas A&M University Uniformly most powerful Bayes
More informationNoninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions
Communications for Statistical Applications and Methods 03, Vol. 0, No. 5, 387 394 DOI: http://dx.doi.org/0.535/csam.03.0.5.387 Noninformative Priors for the Ratio of the Scale Parameters in the Inverted
More informationA Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors
Journal of Data Science 6(008), 75-87 A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors Scott Berry 1 and Kert Viele 1 Berry Consultants and University of Kentucky
More informationGibbs Sampling in Linear Models #2
Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling
More informationA Few Special Distributions and Their Properties
A Few Special Distributions and Their Properties Econ 690 Purdue University Justin L. Tobias (Purdue) Distributional Catalog 1 / 20 Special Distributions and Their Associated Properties 1 Uniform Distribution
More informationRegression and Statistical Inference
Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF
More informationBTRY 4830/6830: Quantitative Genomics and Genetics
BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions
More informationHypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33
Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett
More informationTesting Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA
Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize
More informationRegression. Oscar García
Regression Oscar García Regression methods are fundamental in Forest Mensuration For a more concise and general presentation, we shall first review some matrix concepts 1 Matrices An order n m matrix is
More informationarxiv: v1 [math.st] 2 Oct 2015
Model Selection and Multiple Testing - A Bayes and Empirical Bayes Overview and some New Results Ritabrata Dutta a, Malgorzata Bogdan ab and Jayanta K Ghosh ac a Department of Statistics, Purdue University
More informationMODEL AVERAGING by Merlise Clyde 1
Chapter 13 MODEL AVERAGING by Merlise Clyde 1 13.1 INTRODUCTION In Chapter 12, we considered inference in a normal linear regression model with q predictors. In many instances, the set of predictor variables
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Addressing ourliers 1 Addressing ourliers 2 Outliers in Multivariate samples (1) For
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationBayesian Asymptotics
BS2 Statistical Inference, Lecture 8, Hilary Term 2008 May 7, 2008 The univariate case The multivariate case For large λ we have the approximation I = b a e λg(y) h(y) dy = e λg(y ) h(y ) 2π λg (y ) {
More informationDecision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over
Point estimation Suppose we are interested in the value of a parameter θ, for example the unknown bias of a coin. We have already seen how one may use the Bayesian method to reason about θ; namely, we
More informationCross-sectional space-time modeling using ARNN(p, n) processes
Cross-sectional space-time modeling using ARNN(p, n) processes W. Polasek K. Kakamu September, 006 Abstract We suggest a new class of cross-sectional space-time models based on local AR models and nearest
More informationBayesian Linear Models
Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public
More informationStatistical Inference
Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the
More information