Bayesian Variable Selection and Computation for Generalized Linear Models with Conjugate Priors


Bayesian Analysis (2008) 3, Number 3

Ming-Hui Chen, Lan Huang, Joseph G. Ibrahim and Sungduk Kim

Affiliations: Department of Statistics, University of Connecticut, Storrs, CT; SRAB, National Cancer Institute, Rockville, MD; Department of Biostatistics, University of North Carolina, Chapel Hill, NC; Division of Epidemiology, Statistics and Prevention Research, National Institute of Child Health and Human Development, Rockville, MD.

© 2008 International Society for Bayesian Analysis. DOI: /08-BA323

Abstract. In this paper, we consider theoretical and computational connections between six popular methods for variable subset selection in generalized linear models (GLMs). Under the conjugate priors developed by Chen and Ibrahim (2003) for the generalized linear model, we obtain closed form analytic relationships between the Bayes factor (posterior model probability), the Conditional Predictive Ordinate (CPO), the L measure, the Deviance Information Criterion (DIC), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC) in the case of the linear model. Moreover, we examine computational relationships in the model space for these Bayesian methods for an arbitrary GLM under conjugate priors, and we examine the performance of the conjugate priors of Chen and Ibrahim (2003) in Bayesian variable selection. Specifically, we show that once Markov chain Monte Carlo (MCMC) samples are obtained from the full model, the four Bayesian criteria can be simultaneously computed for all possible subset models in the model space. We illustrate our new methodology with a simulation study and a real dataset.

Keywords: Bayes factor, Conditional Predictive Ordinate, Conjugate prior, L measure, Poisson regression, Logistic regression

1 Introduction

Bayesian variable selection is still one of the most theoretically and computationally challenging problems encountered in practice due to issues regarding prior elicitation, analytic evaluation of the model selection criterion, and numerical computation of the criterion for all possible models in the model space. These issues have been discussed by many authors for various linear and generalized linear models, including George and McCulloch (1993), Laud and Ibrahim (1995), George et al. (1996), Raftery (1996), Smith and Kohn (1996), George and McCulloch (1997), Raftery et al. (1997), Brown et al. (1998), Brown et al. (2002), Clyde (1999), Chen et al. (1999), Dellaportas and Forster (1999), Ibrahim et al. (1999), Chipman et al. (1998), Chipman et al. (2001), Chipman et al. (2003), George (2000), George and Foster (2000),

Ibrahim et al. (2000), Ntzoufras et al. (2003), and Chen et al. (2003). Clyde and George (2004) present an excellent review article on Bayesian model selection and uncertainty, and give an excellent exposition of the theoretical and computational issues involved in Bayesian variable selection and Bayesian model uncertainty in general. An entire monograph devoted to Bayesian model selection is given by Lahiri (2001).

One of the important unresolved issues in Bayesian model selection, and in Bayesian variable selection in particular, is what the analytic or empirical connections are between the various methods. For example, it is not clear what the relationship is between BIC and DIC, or DIC and the L measure, whether one is a monotonic function of the other, and whether one can compute BIC from DIC or vice versa. A related question is: if one has MCMC samples from the full model, how can those samples be used to obtain all four Bayesian criteria mentioned above? To answer these questions, we investigate the following in this paper: (i) for the normal linear model with conjugate priors, we obtain analytic relationships between the Bayes factor, CPO, the L measure, DIC, AIC, and BIC, and (ii) for the class of GLMs, we show via the development of several theorems and identities how one can compute all of these Bayesian criteria simultaneously using only an MCMC sample from the full model. The relationships obtained in (i) for the linear model shed light on the behavior of, and connections between, these criteria for GLMs. The development of (ii) is important and useful since it establishes the computational relationships in the model space for each of the four Bayesian criteria and shows that, for variable subset selection in GLMs using the conjugate priors of Chen and Ibrahim (2003), we can compute the four Bayesian criteria for all possible $2^p$ subset models using only an MCMC sample from the full model with $p$ covariates. Another important issue we examine in this paper is the performance of the conjugate priors proposed by Chen and Ibrahim (2003) in Bayesian variable subset selection. We demonstrate that these priors perform quite well in this context, and that they are easy to specify and computationally feasible.

The rest of this paper is organized as follows. Section 2 gives formulas for each of the criteria under the conjugate priors of Chen and Ibrahim (2003) for GLMs, and Section 3 develops the theoretical connections between the six criteria for the normal linear model. Section 4 establishes the computational connections in the model space for the four Bayesian criteria, together with several key identities and theorems that are needed. Section 5 presents a detailed simulation study examining various properties of the six criteria, and Section 6 presents a real data example. We conclude the article with brief remarks in Section 7. All proofs are given in the Appendix.

2 The Method

2.1 Model and Notation

Suppose that $(x_i, y_i)$, $i = 1, 2, \dots, n$, are independent observations, where $y_i$ is the response variable and $x_i = (1, x_{i1}, \dots, x_{ik})'$ is a $(k+1) \times 1$ random vector of covariates. Let $\mathcal{M}$ denote the model space. We enumerate the models in $\mathcal{M}$ by $m =$

$1, 2, \dots, K$, where $K$ is the dimension of $\mathcal{M}$ and model $K$ denotes the full model. Also, let $\beta^{(K)} = (\beta_0, \beta_1, \dots, \beta_k)'$ denote the regression coefficients for the full model including an intercept, and let $x_i^{(m)}$ and $\beta^{(m)}$ denote the $k_m \times 1$ vectors of covariates and regression coefficients for model $m$, which contains an intercept and a specific choice of $k_m - 1$ covariates. We write $x_i = (x_i^{(m)\prime}, x_i^{(-m)\prime})'$ and $\beta^{(K)} = (\beta^{(m)\prime}, \beta^{(-m)\prime})'$, where $x_i^{(-m)}$ is $x_i$ with $x_i^{(m)}$ deleted and $\beta^{(-m)}$ is $\beta^{(K)}$ with $\beta^{(m)}$ deleted. Under model $m$, the generalized linear model (GLM) is assumed for $[y_i \mid x_i^{(m)}]$, which has the conditional density

$$f(y_i \mid x_i^{(m)}, \beta^{(m)}, \tau) = \exp\left\{ a_i^{-1}(\tau)\left[ y_i \theta_i^{(m)} - b(\theta_i^{(m)}) \right] + c(y_i, \tau) \right\}, \quad i = 1, 2, \dots, n, \tag{1}$$

where $\theta_i^{(m)} = \theta(\eta_i^{(m)})$ is the canonical parameter, $\eta_i^{(m)} = x_i^{(m)\prime} \beta^{(m)}$, and $\tau$ is a dispersion parameter. The functions $a_i$, $b$ and $c$ determine a particular family in the class. The functions $a_i(\tau)$ are commonly of the form $a_i(\tau) = \tau^{-1} w_i^{-1}$, where the $w_i$ are known weights. For ease of exposition, we assume throughout that $\tau = 1$ and $w_i = 1$, as, for example, in logistic and Poisson regression. The methods proposed here can be easily extended to the case when $\tau$ is unknown. Under this assumption, (1) can be rewritten as

$$f(y_i \mid x_i^{(m)}, \beta^{(m)}) = \exp\left\{ y_i \theta_i^{(m)} - b(\theta_i^{(m)}) + c(y_i) \right\}, \quad i = 1, 2, \dots, n. \tag{2}$$

2.2 Prior and Posterior

In the context of Bayesian variable selection, a prior distribution for $\beta^{(m)}$ needs to be specified for each model in the model space $\mathcal{M}$. To this end, we consider the conjugate prior for the GLM proposed by Chen and Ibrahim (2003). Under model $m$, the conjugate prior is of the form

$$\pi(\beta^{(m)} \mid y_0, a_0, m) \propto \prod_{i=1}^{n} \exp\left[ a_0 \left\{ y_{0i} \theta_i^{(m)} - b(\theta_i^{(m)}) \right\} \right] = \exp\left[ a_0 \left\{ y_0' \theta^{(m)} - J' b(\theta^{(m)}) \right\} \right], \tag{3}$$

where $a_0 > 0$ is a scalar prior parameter, $y_0 = (y_{01}, \dots, y_{0n})'$ is an $n \times 1$ vector of prior parameters, $J$ is an $n \times 1$ vector of ones, $\theta^{(m)} = (\theta_1^{(m)}, \dots, \theta_n^{(m)})'$, and $b(\theta^{(m)}) = (b(\theta_1^{(m)}), \dots, b(\theta_n^{(m)}))'$ is the $n \times 1$ vector of the $b(\theta_i^{(m)})$. As discussed in Chen and Ibrahim (2003), $y_0$ can be viewed as a prior prediction for the marginal means of $y$ at $x$. Thus, in eliciting $y_0$, the user must focus on a prediction (or guess) for $E(y)$, which narrows the possibilities for choosing $y_0$. Moreover, the specification of all $y_{0i}$ equal has an appealing interpretation.

A prior specification with $y_{01} = \dots = y_{0n}$ implies a prior in which the prior modes of the slopes in the regression model are the same, but the prior modes of the intercepts vary. For example, a prior with $y_{0i} = 0.5$ will have the same modes for the slopes but a different mode for the intercept than a prior with $y_{0i} = 0.1$. This is intuitively appealing since in this case the prior prediction $y_{0i}$ does not depend on the $i$th subject-specific information. Mathematically, this result was established in Chen and Ibrahim (2003).
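The claim about constant $y_{0i}$ can be checked numerically in a special case. The following is a minimal sketch, assuming Poisson regression with the canonical log link (so $\theta_i = x_i'\beta$ and $b(\theta) = e^{\theta}$); the design matrix and the function name `prior_score` are hypothetical and chosen only for illustration. It evaluates the gradient of the log of the prior kernel (3) at the candidate mode with intercept $\log c$ and zero slopes, where $c$ is the common value of the $y_{0i}$.

```python
import math

def prior_score(beta, X, y0, a0=1.0):
    """Gradient of the log conjugate-prior kernel (3) for Poisson regression.

    With the canonical log link, theta_i = x_i' beta and b(theta) = exp(theta),
    so the gradient is a0 * sum_i x_i * (y0_i - exp(x_i' beta)).
    """
    p = len(beta)
    grad = [0.0] * p
    for x_i, y0_i in zip(X, y0):
        eta = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        resid = y0_i - math.exp(eta)
        for j in range(p):
            grad[j] += a0 * x_i[j] * resid
    return grad

# Hypothetical design: intercept plus two covariates, five observations.
X = [[1.0, -1.2, 0.5], [1.0, 0.3, -0.7], [1.0, 1.1, 1.4],
     [1.0, -0.4, 0.0], [1.0, 0.9, -2.0]]

for c in (0.5, 0.1):
    y0 = [c] * len(X)                # constant prior guess y_{0i} = c
    mode = [math.log(c), 0.0, 0.0]   # candidate mode: intercept log(c), zero slopes
    print(c, prior_score(mode, X, y0))
```

In this sketch the gradient vanishes (up to rounding) for both values of $c$, so the two priors share zero slopes at the mode and differ only in the intercept, $\log 0.5$ versus $\log 0.1$.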

The details are as follows. Suppose we drop the model index $m$. Let $\mu_0$ be any prespecified $p \times 1$ vector, where $p = k + 1$. Suppose we take $y_0 = \dot{b}(\theta) = \dot{b}(\theta(X\mu_0))$, where $\dot{b}(\theta)$ is the gradient vector of $b(\theta)$. Then the conjugate prior yields a prior mode of $\beta$ equal to $\mu_0$. Now we can see that $\mu_0 = (\beta_0, 0, \dots, 0)'$ yields $y_{01} = y_{02} = \dots = y_{0n} = \dot{b}(\theta(\beta_0))$. On the other hand, under some mild conditions the prior mode is unique, and hence the specification $y_0 = y_{00}\mathbf{1}$, with a common scalar value $y_{00}$ for every component, leads to the prior mode $\mu_0 = (\beta_0, 0, \dots, 0)'$, where $\beta_0$ satisfies $\dot{b}(\theta(\beta_0)) = y_{00}$. For instance, under normal linear regression, we can show that the prior mode $\mu_0$ of $\beta$ is given by

$$\mu_0 = (X'X)^{-1} X' y_0.$$

If we specify $y_0 = y_{00}\mathbf{1}$, we have

$$\mu_0 = (y_{00}, 0, 0, \dots, 0)',$$

which implies that all the slopes are 0 while the intercept is equal to $y_{00}$. This attractive feature allows us to do sensitivity analyses by varying the intercept in the prior.

The parameter $a_0$ in (3) can be generally viewed as a precision parameter that quantifies the strength of our prior belief in $y_0$. In the context of Bayesian variable selection, (3) specifies the priors for all models in $\mathcal{M}$ in an automatic and systematic fashion. Although various theoretical properties of (3) were examined in Chen and Ibrahim (2003) in great detail, it is not clear how well this type of prior performs in the context of Bayesian variable selection.

Now, under model $m$, the posterior distribution of $\beta^{(m)}$ with the conjugate prior (3) is given by

$$\pi(\beta^{(m)} \mid D, m) \propto \exp\left\{ y'\theta^{(m)} - J' b(\theta^{(m)}) \right\} \pi(\beta^{(m)} \mid y_0, a_0, m) \propto \exp\left\{ (y + a_0 y_0)' \theta^{(m)} - (1 + a_0) J' b(\theta^{(m)}) \right\}, \tag{4}$$

where $D = \{(y_i, x_i),\ i = 1, 2, \dots, n\}$ denotes the observed data. From (4), we can see that under the conjugate prior, the resulting posterior has a very attractive form. Furthermore, when $a_0 \to 0$, the posterior $\pi(\beta^{(m)} \mid D, m)$ in (4) reduces to

$$\pi(\beta^{(m)} \mid D, m) \propto \exp\left\{ y'\theta^{(m)} - J' b(\theta^{(m)}) \right\},$$

which is the posterior distribution based on an improper uniform prior for $\beta^{(m)}$.

2.3 Variable Selection Criteria

In this section, we consider four Bayesian model assessment criteria, namely, the Conditional Predictive Ordinate (CPO) statistic (Geisser (1993); Gelfand et al. (1992);

and Gelfand and Dey (1994)), the L measure (Ibrahim and Laud (1994); Laud and Ibrahim (1995); Gelfand and Ghosh (1998); Ibrahim et al. (2001a); and Chen et al. (2004)), the Deviance Information Criterion (DIC) (Spiegelhalter et al. (2002)), and the marginal likelihood (Bayes factor). The CPO, L measure, and DIC are criterion based methods which can be attractive in the sense that they are well defined under improper priors as long as the posterior distribution is proper, and thus have an advantage over the marginal likelihood or Bayes factor approach in this sense. For this reason, these three criterion based methods can be directly compared to AIC (Akaike (1973)) and BIC (Schwarz (1978)). On the other hand, the marginal likelihood or the Bayes factor is well calibrated and relatively easy to interpret, but is generally sensitive to vague proper priors. In the context of variable selection, it is not clear how these methods perform under the conjugate prior given in (3) for the GLM.

Under model $m$, for the $i$th observation, we define the CPO statistic as follows:

$$\mathrm{CPO}_i = f(y_i \mid x_i, D^{(-i)}) = \int f(y_i \mid x_i^{(m)}, \beta^{(m)})\, \pi(\beta^{(m)} \mid D^{(-i)}, m)\, d\beta^{(m)},$$

where $D^{(-i)}$ is $D$ with the $i$th observation deleted, and $\pi(\beta^{(m)} \mid D^{(-i)}, m)$ is the posterior distribution based on the data $D^{(-i)}$. Due to the construction of the conjugate prior (3), it is more natural to define

$$\pi(\beta^{(m)} \mid D^{(-i)}, m) \propto \exp\left\{ \sum_{j \ne i} \left[ (y_j + a_0 y_{0j})\, \theta_j^{(m)} - (1 + a_0)\, b(\theta_j^{(m)}) \right] \right\}.$$

After some messy algebra, we can show that $\mathrm{CPO}_i$ takes the following form:

$$\mathrm{CPO}_i = f(y_i \mid x_i, D^{(-i)}) = \frac{\displaystyle\int \frac{1}{\exp\left[ a_0 \{ y_{0i}\theta_i^{(m)} - b(\theta_i^{(m)}) \} \right]}\, \pi(\beta^{(m)} \mid D, m)\, d\beta^{(m)}}{\displaystyle\int \frac{1}{f(y_i \mid x_i^{(m)}, \beta^{(m)}) \exp\left[ a_0 \{ y_{0i}\theta_i^{(m)} - b(\theta_i^{(m)}) \} \right]}\, \pi(\beta^{(m)} \mid D, m)\, d\beta^{(m)}}, \tag{5}$$

where $f(y_i \mid x_i^{(m)}, \beta^{(m)})$ is the density function given in (2). Also, we notice that the CPO defined in (5) is slightly different from the usual CPO (Geisser (1993) and Gelfand et al. (1992)), which is of the form

$$\frac{1}{\displaystyle\int \frac{1}{f(y_i \mid x_i^{(m)}, \beta^{(m)})}\, \pi(\beta^{(m)} \mid D, m)\, d\beta^{(m)}}.$$

However, these two forms will be identical as $a_0 \to 0$. As suggested in Ibrahim et al. (2001b), a natural summary statistic of the $\mathrm{CPO}_i$ is the logarithm of the pseudomarginal likelihood (LPML), defined as

$$\mathrm{LPML}_m = \sum_{i=1}^{n} \log(\mathrm{CPO}_i).$$

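We will use $\mathrm{LPML}_m$ as a criterion-based measure for variable selection.

The L measure criterion is another useful tool for model comparison and variable selection. The L measure is constructed from the posterior predictive distribution of the data. For the entire class of GLMs in (2), under model $m$, the L measure is defined as

$$L_m(\nu) = \sum_{i=1}^{n} \left\{ E\left[ b''(\theta_i^{(m)}) \mid D, m \right] + \mathrm{Var}\left[ b'(\theta_i^{(m)}) \mid D, m \right] \right\} + \nu \sum_{i=1}^{n} \left\{ E\left[ b'(\theta_i^{(m)}) \mid D, m \right] - y_i \right\}^2, \tag{6}$$

where $b'(\cdot)$ and $b''(\cdot)$ are the mean and variance functions of the GLM in (2), and all expectations and variances are taken with respect to the posterior distribution $\pi(\beta^{(m)} \mid D, m)$ in (4). We note that for the GLM in (1), we need to modify $L_m(\nu)$ in (6) accordingly, and in this case, the L measure takes the form

$$L_m(\nu) = \sum_{i=1}^{n} \left\{ E\left[ a_i(\tau)\, b''(\theta_i^{(m)}) \mid D, m \right] + \mathrm{Var}\left[ b'(\theta_i^{(m)}) \mid D, m \right] \right\} + \nu \sum_{i=1}^{n} \left\{ E\left[ b'(\theta_i^{(m)}) \mid D, m \right] - y_i \right\}^2. \tag{7}$$

The DIC criterion, proposed by Spiegelhalter et al. (2002), is given by

$$\mathrm{DIC}_m = D(\bar{\beta}^{(m)}) + 2 p_D^{(m)}, \tag{8}$$

where $p_D^{(m)} = \overline{D}(\beta^{(m)}) - D(\bar{\beta}^{(m)})$, $\bar{\beta}^{(m)} = E[\beta^{(m)} \mid D, m]$, and $\overline{D}(\beta^{(m)}) = E[D(\beta^{(m)}) \mid D, m]$. For the GLM in (2), under model $m$,

$$D(\beta^{(m)}) = -2 \sum_{i=1}^{n} \left\{ y_i \theta_i^{(m)} - b(\theta_i^{(m)}) \right\}. \tag{9}$$

Similar to (6), under the GLM in (1), $D(\beta^{(m)})$ needs to be modified accordingly.

In the spirit of the marginal likelihood, after ignoring the constants shared by all variable subset models in the model space $\mathcal{M}$ for the GLM in (2), for the purpose of variable subset selection it suffices to compute the posterior normalizing constant

$$C_m(D) = \int \exp\left\{ (y + a_0 y_0)'\theta^{(m)} - (1 + a_0) J' b(\theta^{(m)}) \right\} d\beta^{(m)} \tag{10}$$

and the prior normalizing constant

$$C_{0m}(y_0) = \int \exp\left[ a_0 \left\{ y_0'\theta^{(m)} - J' b(\theta^{(m)}) \right\} \right] d\beta^{(m)}. \tag{11}$$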
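Definitions (8) and (9) translate directly into a computation on posterior draws. The following is a minimal sketch, assuming Poisson regression with the canonical link ($\theta_i = x_i'\beta$, $b(\theta) = e^{\theta}$); the data, the four stand-in "draws", and the function names `deviance` and `dic` are hypothetical and serve only to illustrate the arithmetic, not to reproduce the authors' implementation.

```python
import math

def deviance(beta, X, y):
    """D(beta) = -2 * sum_i {y_i * theta_i - b(theta_i)} as in (9), Poisson case:
    theta_i = x_i' beta and b(theta) = exp(theta); the c(y_i) term is dropped."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        theta = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        total += y_i * theta - math.exp(theta)
    return -2.0 * total

def dic(draws, X, y):
    """Return (DIC, p_D) following (8): DIC = D(beta_bar) + 2 * p_D."""
    S, p = len(draws), len(draws[0])
    beta_bar = [sum(d[j] for d in draws) / S for j in range(p)]
    d_bar = sum(deviance(d, X, y) for d in draws) / S   # posterior mean deviance
    d_at_mean = deviance(beta_bar, X, y)                # deviance at the posterior mean
    p_d = d_bar - d_at_mean
    return d_at_mean + 2.0 * p_d, p_d

# Hypothetical data and a tiny stand-in for posterior draws of beta.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1, 2, 2, 5]
draws = [[0.1, 0.4], [0.2, 0.5], [0.0, 0.45], [0.15, 0.35]]
dic_value, p_d = dic(draws, X, y)
print(dic_value, p_d)
```

Because the Poisson deviance is convex in $\beta$, the estimated $p_D$ is nonnegative for any set of draws, which is a quick sanity check on an implementation.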

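Similar to the modification of (6) yielding (7), under the GLM in (1), $D(\beta^{(m)})$ in (9), $C_m(D)$ in (10), and $C_{0m}(y_0)$ in (11) need to be modified accordingly.

In the context of variable selection, we select the variable subset model which yields the largest $\mathrm{LPML}_m$ under the CPO, the smallest $L_m(\nu)$ under the L measure, the smallest $\mathrm{DIC}_m$ under the DIC, and the largest $C_m(D)/C_{0m}(y_0)$ or $\log[C_m(D)/C_{0m}(y_0)]$ under the marginal likelihood.

3 Analytic Connections Between Variable Selection Criteria For the Normal Linear Regression Model

In this section, we consider the normal linear regression model given by

$$f(y_i \mid x_i^{(m)}, \beta^{(m)}, \tau) = \left( \frac{\tau}{2\pi} \right)^{1/2} \exp\left\{ -\frac{\tau}{2} \left( y_i - x_i^{(m)\prime} \beta^{(m)} \right)^2 \right\}. \tag{12}$$

Let $X_m = (x_1^{(m)}, x_2^{(m)}, \dots, x_n^{(m)})'$, which is the design matrix for the normal linear regression under model $m$. Assume $X_m$ is of full rank $k_m$ throughout. We focus only on the case of known $\tau$, as analytical connections are more difficult to establish when $\tau$ is unknown. For the model in (12) with a known $\tau$, the conjugate prior for $\beta^{(m)}$ in (3) reduces to

$$[\beta^{(m)} \mid y_0, a_0, m] \sim N_{k_m}\left( (X_m' X_m)^{-1} X_m' y_0,\ \frac{1}{\tau a_0} (X_m' X_m)^{-1} \right), \tag{13}$$

and the posterior distribution for $\beta^{(m)}$ is given by

$$[\beta^{(m)} \mid D, m] \sim N_{k_m}\left( (X_m' X_m)^{-1} X_m' \frac{y + a_0 y_0}{1 + a_0},\ \frac{1}{\tau (1 + a_0)} (X_m' X_m)^{-1} \right).$$

For (12), AIC and BIC under model $m$ are given by

$$\mathrm{AIC}_m = -2 \log L(\hat{\beta}^{(m)} \mid D) + 2 k_m = -n \log\left( \frac{\tau}{2\pi} \right) + \tau\, \mathrm{SSE}_m + 2 k_m, \tag{14}$$

where $\hat{\beta}^{(m)}$ is the maximum likelihood estimate of $\beta^{(m)}$ and

$$\mathrm{SSE}_m = y' \left[ I - X_m (X_m' X_m)^{-1} X_m' \right] y$$

is the usual sum of squared errors, and

$$\mathrm{BIC}_m = -2 \log L(\hat{\beta}^{(m)} \mid D) + \log(n)\, k_m = -n \log\left( \frac{\tau}{2\pi} \right) + \tau\, \mathrm{SSE}_m + \log(n)\, k_m. \tag{15}$$

After some algebra, we can show that, after putting back all normalizing constants, the logarithm of the marginal likelihood under model $m$ is given by

$$\log\left[ \frac{C_m(D)}{C_{0m}(y_0)} \right] = \frac{n}{2} \log\left( \frac{\tau}{2\pi} \right) - \frac{\tau}{2} y'y - \frac{\tau a_0}{2}\, y_0' X_m (X_m' X_m)^{-1} X_m' y_0 + \frac{\tau (1 + a_0)}{2} \left( \frac{y + a_0 y_0}{1 + a_0} \right)' X_m (X_m' X_m)^{-1} X_m' \left( \frac{y + a_0 y_0}{1 + a_0} \right) + \frac{1}{2} \log\left( \frac{a_0}{1 + a_0} \right) k_m. \tag{16}$$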
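Formulas (14) to (16) can be evaluated directly once $\mathrm{SSE}_m$ and the projection quadratic forms are available. The sketch below is a minimal illustration, assuming a model with an intercept and one covariate so that the projection $X_m(X_m'X_m)^{-1}X_m'$ has a simple closed form; the data, the value of $\tau$, and the helper names `proj_form` and `criteria` are hypothetical.

```python
import math

def proj_form(u, v, x):
    """u' H v, where H projects onto span{1, x} (intercept plus one covariate)."""
    n = len(x)
    xb, ub, vb = sum(x) / n, sum(u) / n, sum(v) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    sxu = sum((xi - xb) * (ui - ub) for xi, ui in zip(x, u))
    sxv = sum((xi - xb) * (vi - vb) for xi, vi in zip(x, v))
    return n * ub * vb + sxu * sxv / sxx

def criteria(y, x, tau, a0, y0):
    """AIC (14), BIC (15) and the log marginal likelihood (16) for known tau."""
    n, k = len(y), 2                       # k_m = 2: intercept and slope
    yy = sum(yi * yi for yi in y)
    sse = yy - proj_form(y, y, x)          # SSE_m = y'(I - H)y
    base = -n * math.log(tau / (2 * math.pi)) + tau * sse
    aic = base + 2 * k
    bic = base + math.log(n) * k
    v = [(yi + a0 * y0i) / (1 + a0) for yi, y0i in zip(y, y0)]
    log_ml = (0.5 * n * math.log(tau / (2 * math.pi)) - 0.5 * tau * yy
              - 0.5 * tau * a0 * proj_form(y0, y0, x)
              + 0.5 * tau * (1 + a0) * proj_form(v, v, x)
              + 0.5 * k * math.log(a0 / (1 + a0)))
    return aic, bic, log_ml, sse

# Hypothetical data; tau is treated as known.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 1.2, 1.9, 3.2, 3.8, 5.1]
aic, bic, log_ml, sse = criteria(y, x, tau=4.0, a0=0.5, y0=[0.0] * len(y))
print(aic, bic, log_ml)
```

For models with more covariates the same quantities would be computed with a general least-squares routine; the closed-form projection is used here only to keep the example dependency-free.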

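When $y_0 = 0$, the conjugate prior in (13) reduces to Zellner's g-prior (Zellner (1986)). For this special case, (16) becomes

$$\log\left[ \frac{C_m(D)}{C_{0m}(0)} \right] = \frac{n}{2} \log\left( \frac{\tau}{2\pi} \right) - \frac{\tau a_0}{2 (1 + a_0)} y'y - \frac{\tau}{2 (1 + a_0)} \mathrm{SSE}_m + \frac{1}{2} \log\left( \frac{a_0}{1 + a_0} \right) k_m. \tag{17}$$

Thus, we have

$$M_m(a_0) \equiv -2 (1 + a_0) \left[ \log \frac{C_m(D)}{C_{0m}(0)} + \frac{\tau a_0}{2 (1 + a_0)} y'y \right] + a_0\, n \log\left( \frac{\tau}{2\pi} \right) = -n \log\left( \frac{\tau}{2\pi} \right) + \tau\, \mathrm{SSE}_m + (1 + a_0) \log\left( \frac{1 + a_0}{a_0} \right) k_m. \tag{18}$$

For purposes of variable selection, it suffices to compare $M_m(a_0)$, and we then choose the model with the smallest $M_m(a_0)$. From (18), we can see that

$$M_m(a_0) = \begin{cases} \mathrm{AIC}_m & \text{if } (1 + a_0) \log\left( \frac{1 + a_0}{a_0} \right) = 2, \\ \mathrm{BIC}_m & \text{if } (1 + a_0) \log\left( \frac{1 + a_0}{a_0} \right) = \log n. \end{cases} \tag{19}$$

For (12), we use (7) to compute $L_m(\nu)$. In particular, we have $a_i(\tau) = 1/\tau$,

$$E\left[ a_i(\tau)\, b''(\theta_i^{(m)}) \mid D, m \right] = \frac{1}{\tau},$$

$$\mathrm{Var}\left[ b'(\theta_i^{(m)}) \mid D, m \right] = \mathrm{Var}\left[ x_i^{(m)\prime} \beta^{(m)} \mid D, m \right] = x_i^{(m)\prime}\, \mathrm{Var}(\beta^{(m)} \mid D, m)\, x_i^{(m)} = \frac{1}{\tau (1 + a_0)}\, x_i^{(m)\prime} (X_m' X_m)^{-1} x_i^{(m)},$$

and

$$E\left[ b'(\theta_i^{(m)}) \mid D, m \right] = E\left[ x_i^{(m)\prime} \beta^{(m)} \mid D, m \right] = x_i^{(m)\prime} (X_m' X_m)^{-1} X_m' \frac{y + a_0 y_0}{1 + a_0}.$$

Thus, we obtain

$$L_m(\nu) = \frac{n}{\tau} + \frac{1}{\tau (1 + a_0)} \sum_{i=1}^{n} x_i^{(m)\prime} (X_m' X_m)^{-1} x_i^{(m)} + \nu \sum_{i=1}^{n} \left[ y_i - x_i^{(m)\prime} (X_m' X_m)^{-1} X_m' \frac{y + a_0 y_0}{1 + a_0} \right]^2$$

$$= \frac{n}{\tau} + \frac{1}{\tau (1 + a_0)} k_m + \nu \left[ y - X_m (X_m' X_m)^{-1} X_m' \frac{y + a_0 y_0}{1 + a_0} \right]' \left[ y - X_m (X_m' X_m)^{-1} X_m' \frac{y + a_0 y_0}{1 + a_0} \right]. \tag{20}$$

When $y_0 = 0$, (20) reduces to

$$L_m(\nu) = \frac{n}{\tau} + \frac{1}{\tau (1 + a_0)} k_m + \frac{\nu a_0^2}{(1 + a_0)^2} y'y + \frac{\nu (1 + 2 a_0)}{(1 + a_0)^2} \mathrm{SSE}_m. \tag{21}$$

Write

$$L_m^*(\nu, a_0) = \frac{\tau (1 + a_0)^2}{\nu (1 + 2 a_0)} \left[ L_m(\nu) - \frac{n}{\tau} - \frac{\nu a_0^2}{(1 + a_0)^2} y'y \right] - n \log\left( \frac{\tau}{2\pi} \right). \tag{22}$$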
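Condition (19) determines the value of $a_0$ at which the transformed marginal likelihood criterion $M_m(a_0)$ in (18) coincides with AIC or BIC. A minimal sketch of solving it numerically is given below; the bisection routine, its bracket, and the function names `penalty` and `solve_a0` are illustrative assumptions and are not taken from the paper. The sample size 500 is used only as an example.

```python
import math

def penalty(a0):
    """Per-parameter penalty (1 + a0) * log((1 + a0) / a0) appearing in (18)."""
    return (1.0 + a0) * math.log((1.0 + a0) / a0)

def solve_a0(target, lo=1e-12, hi=1e6, iters=200):
    """Bisection on a log scale. penalty() decreases from +infinity to 1 as a0
    grows, so a root exists only for target > 1; the bracket [lo, hi] is assumed
    to contain it, which holds for the targets used below."""
    if target <= 1.0:
        raise ValueError("no a0 > 0 gives a penalty <= 1")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if penalty(mid) > target:
            lo = mid       # penalty too large, so a0 must increase
        else:
            hi = mid
    return math.sqrt(lo * hi)

a0_aic = solve_a0(2.0)              # M_m(a0) matches AIC_m
a0_bic = solve_a0(math.log(500))    # M_m(a0) matches BIC_m when n = 500
print(a0_aic, a0_bic)
```

Since the penalty is strictly decreasing in $a_0$, matching BIC requires a smaller $a_0$ (a vaguer prior) than matching AIC whenever $\log n > 2$.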

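Using (21) and (22), we obtain

$$L_m^*(\nu, a_0) = -n \log\left( \frac{\tau}{2\pi} \right) + \tau\, \mathrm{SSE}_m + \frac{1 + a_0}{\nu (1 + 2 a_0)} k_m,$$

and hence

$$L_m^*(\nu, a_0) = \begin{cases} \mathrm{AIC}_m & \text{if } \frac{1 + a_0}{\nu (1 + 2 a_0)} = 2, \\ \mathrm{BIC}_m & \text{if } \frac{1 + a_0}{\nu (1 + 2 a_0)} = \log n. \end{cases}$$

Note that in the context of variable selection, the model with the smallest $L_m(\nu)$ is the same model that has the smallest $L_m^*(\nu, a_0)$. Thus, in this sense, the L measure can be made equivalent to AIC or BIC by appropriately tuning $(\nu, a_0)$. It is interesting to mention that in order to achieve $L_m^*(\nu, a_0) = \mathrm{AIC}_m$ or $L_m^*(\nu, a_0) = \mathrm{BIC}_m$, $\nu$ must be small, and hence when $\nu = 1$, the L measure always has a smaller dimensional penalty than both AIC and BIC. Unlike in the marginal likelihood, $a_0$ plays a minimal role in controlling the dimensional penalty in the L measure.

When $y_0 = 0$, the posterior mean of $\beta^{(m)}$ is given by $\bar{\beta}^{(m)} = \frac{1}{1 + a_0} (X_m' X_m)^{-1} X_m' y$. Thus, we have

$$D(\beta^{(m)}) = -n \log\left( \frac{\tau}{2\pi} \right) + \tau\, (y - X_m \beta^{(m)})' (y - X_m \beta^{(m)}),$$

$$\overline{D}(\beta^{(m)}) = E[D(\beta^{(m)}) \mid D, m] = -n \log\left( \frac{\tau}{2\pi} \right) + \tau\, E\left[ \left\{ y - X_m \bar{\beta}^{(m)} - X_m (\beta^{(m)} - \bar{\beta}^{(m)}) \right\}' \left\{ y - X_m \bar{\beta}^{(m)} - X_m (\beta^{(m)} - \bar{\beta}^{(m)}) \right\} \,\Big|\, D, m \right]$$

$$= -n \log\left( \frac{\tau}{2\pi} \right) + \frac{1}{1 + a_0} k_m + \frac{\tau a_0^2}{(1 + a_0)^2} y'y + \frac{\tau (1 + 2 a_0)}{(1 + a_0)^2} \mathrm{SSE}_m, \tag{23}$$

and

$$D(\bar{\beta}^{(m)}) = -n \log\left( \frac{\tau}{2\pi} \right) + \tau\, (y - X_m \bar{\beta}^{(m)})' (y - X_m \bar{\beta}^{(m)}) = -n \log\left( \frac{\tau}{2\pi} \right) + \frac{\tau a_0^2}{(1 + a_0)^2} y'y + \frac{\tau (1 + 2 a_0)}{(1 + a_0)^2} \mathrm{SSE}_m. \tag{24}$$

Combining (23) and (24) gives

$$p_D^{(m)} = \overline{D}(\beta^{(m)}) - D(\bar{\beta}^{(m)}) = \frac{1}{1 + a_0} k_m. \tag{25}$$

Thus, the $\mathrm{DIC}_m$ for (12) is given by

$$\mathrm{DIC}_m = -n \log\left( \frac{\tau}{2\pi} \right) + \frac{\tau a_0^2}{(1 + a_0)^2} y'y + \frac{\tau (1 + 2 a_0)}{(1 + a_0)^2} \mathrm{SSE}_m + \frac{2}{1 + a_0} k_m. \tag{26}$$

Write

$$\mathrm{DIC}_m^*(a_0) = \frac{(1 + a_0)^2}{1 + 2 a_0} \left[ \mathrm{DIC}_m - \frac{\tau a_0^2}{(1 + a_0)^2} y'y \right] + \frac{n a_0^2}{1 + 2 a_0} \log\left( \frac{\tau}{2\pi} \right).$$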
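The effective number of parameters in (25) can be checked by simulation in the simplest case. The following minimal sketch assumes an intercept-only normal model ($k_m = 1$, $X_m$ a column of ones) with $y_0 = 0$ and known $\tau$, so that the posterior is $N(\bar{y}/(1 + a_0),\ 1/\{\tau (1 + a_0) n\})$; the data, the seed, and the function names `deviance` and `mc_pd` are hypothetical.

```python
import math
import random

def deviance(beta, y, tau):
    """D(beta) = -n*log(tau/(2*pi)) + tau * sum_i (y_i - beta)^2 (intercept-only model)."""
    n = len(y)
    return -n * math.log(tau / (2 * math.pi)) + tau * sum((yi - beta) ** 2 for yi in y)

def mc_pd(y, tau, a0, S=20000, seed=1):
    """Estimate p_D = mean deviance - deviance at the posterior mean from S draws."""
    rng = random.Random(seed)
    n = len(y)
    post_mean = sum(y) / n / (1 + a0)               # (X'X)^{-1} X'y / (1 + a0)
    post_sd = 1.0 / math.sqrt(tau * (1 + a0) * n)   # from the posterior covariance
    draws = [rng.gauss(post_mean, post_sd) for _ in range(S)]
    beta_bar = sum(draws) / S
    d_bar = sum(deviance(b, y, tau) for b in draws) / S
    return d_bar - deviance(beta_bar, y, tau)

y = [0.8, 1.1, 0.4, 1.6, 0.9, 1.3, 0.7, 1.2]        # hypothetical responses
for a0 in (0.0, 0.5, 2.0):
    print(a0, mc_pd(y, tau=1.0, a0=a0), 1.0 / (1.0 + a0))
```

The Monte Carlo estimates should be close to $k_m/(1 + a_0)$, illustrating how a more informative prior (larger $a_0$) shrinks the effective dimension.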

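We have

$$\mathrm{DIC}_m^*(a_0) = -n \log\left( \frac{\tau}{2\pi} \right) + \tau\, \mathrm{SSE}_m + \frac{2 (1 + a_0)}{1 + 2 a_0} k_m. \tag{27}$$

Therefore, when $a_0 = 0$, $\mathrm{DIC}_m^*(0) = \mathrm{DIC}_m = \mathrm{AIC}_m$, and when $a_0 > 0$, $\frac{2 (1 + a_0)}{1 + 2 a_0} < 2$, which implies that $\mathrm{DIC}_m^*(a_0)$ has a smaller dimensional penalty than both AIC and BIC.

Similarly to DIC, we consider only $y_0 = 0$. From (5), we have

$$\mathrm{LPML}_m = \sum_{i=1}^{n} \log(\mathrm{CPO}_i) = \sum_{i=1}^{n} \log(\mathrm{CPO}_{1i}) - \sum_{i=1}^{n} \log(\mathrm{CPO}_{2i}), \tag{28}$$

where

$$\mathrm{CPO}_{1i} = \int \exp\left\{ \frac{a_0 \tau}{2}\, \beta^{(m)\prime} x_i^{(m)} x_i^{(m)\prime} \beta^{(m)} \right\} \pi(\beta^{(m)} \mid D, m)\, d\beta^{(m)}$$

and

$$\mathrm{CPO}_{2i} = \int \left( \frac{\tau}{2\pi} \right)^{-1/2} \exp\left\{ \frac{\tau}{2} y_i^2 \right\} \exp\left\{ \frac{\tau (1 + a_0)}{2} \left[ \beta^{(m)\prime} x_i^{(m)} x_i^{(m)\prime} \beta^{(m)} - \frac{2\, \beta^{(m)\prime} x_i^{(m)} y_i}{1 + a_0} \right] \right\} \pi(\beta^{(m)} \mid D, m)\, d\beta^{(m)}$$

for $i = 1, 2, \dots, n$. Let $\hat{\beta}^{(m)} = (X_m' X_m)^{-1} X_m' y$, $\hat{y}_i^{(m)} = x_i^{(m)\prime} \hat{\beta}^{(m)}$, and $h_i^{(m)} = x_i^{(m)\prime} (X_m' X_m)^{-1} x_i^{(m)}$. After some messy algebra, we obtain

$$\mathrm{CPO}_{1i} = \left[ 1 - \frac{a_0}{1 + a_0} h_i^{(m)} \right]^{-1/2} \exp\left\{ \frac{\tau a_0}{2 (1 + a_0)^2} \cdot \frac{(\hat{y}_i^{(m)})^2}{1 - \frac{a_0}{1 + a_0} h_i^{(m)}} \right\}$$

and

$$\mathrm{CPO}_{2i} = \left( \frac{\tau}{2\pi} \right)^{-1/2} \left[ 1 - h_i^{(m)} \right]^{-1/2} \exp\left\{ \frac{\tau}{2} y_i^2 \right\} \exp\left\{ \frac{\tau}{2 (1 + a_0)} \cdot \frac{h_i^{(m)} y_i^2 - 2 y_i \hat{y}_i^{(m)} + (\hat{y}_i^{(m)})^2}{1 - h_i^{(m)}} \right\}.$$

Plugging $\mathrm{CPO}_{1i}$ and $\mathrm{CPO}_{2i}$ into (28) yields

$$\mathrm{LPML}_m = \frac{n}{2} \log\left( \frac{\tau}{2\pi} \right) - \frac{\tau}{2} \sum_{i=1}^{n} y_i^2 + \frac{1}{2} \sum_{i=1}^{n} \left\{ \log\left( 1 - h_i^{(m)} \right) - \log\left( 1 - \frac{a_0}{1 + a_0} h_i^{(m)} \right) \right\} - \frac{\tau}{2 (1 + a_0)} \sum_{i=1}^{n} \frac{(\hat{y}_i^{(m)})^2 - 2 y_i \hat{y}_i^{(m)} + h_i^{(m)} y_i^2}{1 - h_i^{(m)}} + \frac{\tau a_0}{2 (1 + a_0)^2} \sum_{i=1}^{n} \frac{(\hat{y}_i^{(m)})^2}{1 - \frac{a_0}{1 + a_0} h_i^{(m)}}. \tag{29}$$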

Using a Taylor expansion and after some algebra, $\mathrm{LPML}_m$ in (29) can be rewritten as

$$\mathrm{LPML}_m = \frac{n}{2} \log\left( \frac{\tau}{2\pi} \right) - \frac{\tau a_0^2}{2 (1 + a_0)^2} y'y - \frac{\tau (1 + 2 a_0)}{2 (1 + a_0)^2} \mathrm{SSE}_m - \frac{k_m}{2 (1 + a_0)} + R_m, \tag{30}$$

where

$$R_m = -\frac{\tau}{2 (1 + a_0)} \sum_{i=1}^{n} \frac{(y_i - \hat{y}_i^{(m)})^2\, h_i^{(m)}}{1 - h_i^{(m)}} + \frac{\tau a_0}{2 (1 + a_0)^2} \sum_{i=1}^{n} \frac{\frac{a_0}{1 + a_0} h_i^{(m)} (\hat{y}_i^{(m)})^2}{1 - \frac{a_0}{1 + a_0} h_i^{(m)}} + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=2}^{\infty} \left\{ \left( \frac{a_0}{1 + a_0} \right)^j - 1 \right\} \frac{(h_i^{(m)})^j}{j}.$$

Write

$$\mathrm{LPML}_m^* = -\frac{2 (1 + a_0)^2}{1 + 2 a_0} \left[ \mathrm{LPML}_m + \frac{\tau a_0^2}{2 (1 + a_0)^2} y'y \right] + \frac{n a_0^2}{1 + 2 a_0} \log\left( \frac{\tau}{2\pi} \right). \tag{31}$$

Using (30) and (31), we obtain

$$\mathrm{LPML}_m^* = -n \log\left( \frac{\tau}{2\pi} \right) + \tau\, \mathrm{SSE}_m + \frac{1 + a_0}{1 + 2 a_0} k_m + R_m^*,$$

where $R_m^* = -\frac{2 (1 + a_0)^2}{1 + 2 a_0} R_m$. We choose the model with the smallest $\mathrm{LPML}_m^*$. Note that the remainder term $R_m^*$ is small when all $h_i^{(m)}$ are small. From (14), (15), and (27), we see that when $R_m^*$ is small and does not vary much in the model space $\mathcal{M}$, LPML has a smaller dimensional penalty than DIC, AIC and BIC. In addition, when $a_0 = 0$, $\mathrm{LPML}_m$ in (30) is consistent with the one derived by Gelfand and Dey (1994) based on the asymptotic approximation.

Finally, we note that the quantities defined in (18), (22), (27) and (31) are linear transformations of those defined by (17), (21), (26) and (30), respectively. In these linear transformations, the relevant coefficients are independent of $m$. Thus, for the purpose of variable subset selection, these linearly transformed quantities act exactly like the original forms. With (18), (22), (27) and (31), we can see the analytical connections to AIC and BIC much more clearly. We also note that George and Foster (2000) provided some similar connections between model selection probabilities and various model selection criteria for this setup.

4 Computational Development: Theory and Implementation

For the purpose of variable selection, we need to compute $\mathrm{LPML}_m$, $L_m(\nu)$, $\mathrm{DIC}_m$, $C_m(D)$ and $C_{0m}(y_0)$ for the Bayesian variable selection criteria described in the previous section for $m = 1, 2, \dots, K$. Due to the complexity and generality of the GLM in (2), the analytical evaluation of these measures does not appear possible. Thus, a Monte

Carlo (MC) based method is required for each of the measures under consideration. However, the MC methods currently available in the Bayesian computational literature require a Markov chain Monte Carlo (MCMC) sample from the posterior distribution $\pi(\beta^{(m)} \mid D, m)$ in (4) under each variable subset model $m$. When the number of models in $\mathcal{M}$ is large, sampling from the posterior distribution under each variable subset model can be expensive. Thus, the computation of these four measures for all submodels becomes a difficult and challenging task. Therefore, the development of an efficient Monte Carlo method for variable selection for the GLM is essential.

After examining (5), (6), and (8), we observe that there is a common feature in computing $\mathrm{LPML}_m$, $L_m(\nu)$, and $\mathrm{DIC}_m$, i.e., all three of these measures require the computation of

$$g_m = E[g(\beta^{(m)}) \mid D, m]$$

for various functions $g$, where the expectation is taken with respect to the joint posterior distribution in (4) under model $m$. Specifically, the functions required in these calculations include

(i) $g(\beta^{(m)}) = \exp\left[ -a_0 \{ y_{0i}\theta_i^{(m)} - b(\theta_i^{(m)}) \} \right]$ and $g(\beta^{(m)}) = \left\{ f(y_i \mid x_i^{(m)}, \beta^{(m)}) \exp\left[ a_0 \{ y_{0i}\theta_i^{(m)} - b(\theta_i^{(m)}) \} \right] \right\}^{-1}$ for $\mathrm{LPML}_m$;

(ii) $g(\beta^{(m)}) = b'(\theta_i^{(m)})$, $g(\beta^{(m)}) = \{ b'(\theta_i^{(m)}) \}^2$, and $g(\beta^{(m)}) = b''(\theta_i^{(m)})$ for $L_m(\nu)$;

(iii) $g(\beta^{(m)}) = \beta^{(m)}$ and $g(\beta^{(m)}) = D(\beta^{(m)})$ for $\mathrm{DIC}_m$.

Write $L(\beta^{(m)} \mid D, m) = \exp\left\{ (y + a_0 y_0)'\theta^{(m)} - (1 + a_0) J' b(\theta^{(m)}) \right\}$ under model $m$, and let $L(\beta \mid D) = L(\beta^{(K)} \mid D, K)$, $C(D) = C_K(D)$, and $C_0(y_0) = C_{0K}(y_0)$ under the full model. Here, we abuse the notation a little, as $L(\beta^{(m)} \mid D, m)$ is not a likelihood function in the usual sense. Then, for a given function $g$, mathematically, we have

$$g_m = E[g(\beta^{(m)}) \mid D, m] = \int g(\beta^{(m)}) \frac{L(\beta^{(m)} \mid D, m)}{C_m(D)}\, d\beta^{(m)},$$

where $C_m(D)$ is defined in (10). Now, we present a useful identity for $g_m$, which is formally stated in the following theorem.

Theorem 5. For any given function $g$ such that $E[|g(\beta^{(m)})| \mid D, m] < \infty$, we have

$$g_m = \frac{C(D)}{C_m(D)}\, E\left[ g(\beta^{(m)}) \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right], \tag{32}$$

where the expectation is taken with respect to the joint posterior distribution in (4) under the full model. Here, $w(\beta^{(-m)} \mid \beta^{(m)})$ is a completely known conditional density,

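whose support is contained in, or equal to, the support of the conditional density of $\beta^{(-m)}$ given $\beta^{(m)}$ with respect to the joint posterior distribution in (4) under the full model.

Observing that when $g \equiv 1$, we have

$$1 = \frac{C(D)}{C_m(D)}\, E\left[ \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right],$$

which leads to

$$\frac{C_m(D)}{C(D)} = E\left[ \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right] \tag{33}$$

and

$$g_m = \frac{E\left[ g(\beta^{(m)}) \dfrac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right]}{E\left[ \dfrac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right]}. \tag{34}$$

It is interesting to mention that the identity (33) is a by-product of this derivation, and this identity can be used to compute the posterior normalizing constant under model $m$.

The identities (33) and (34) play an important role in developing a novel Monte Carlo method for computing $\mathrm{LPML}_m$, $L_m(\nu)$, $\mathrm{DIC}_m$, and $C_m(D)$ simultaneously using a single MCMC sample from the joint posterior distribution under the full model. Toward this goal, we let $\beta_s = (\beta_s^{(m)\prime}, \beta_s^{(-m)\prime})'$, $s = 1, 2, \dots, S$, denote an MCMC sample from the joint posterior distribution (4) under the full model, where $S$ is the MCMC sample size. Then, an estimate of $g_m$ is given by

$$\hat{g}_m = \frac{\displaystyle\sum_{s=1}^{S} g(\beta_s^{(m)}) \frac{L(\beta_s^{(m)} \mid D, m)\, w(\beta_s^{(-m)} \mid \beta_s^{(m)})}{L(\beta_s \mid D)}}{\displaystyle\sum_{s=1}^{S} \frac{L(\beta_s^{(m)} \mid D, m)\, w(\beta_s^{(-m)} \mid \beta_s^{(m)})}{L(\beta_s \mid D)}}. \tag{35}$$

Under certain regularity conditions, such as ergodicity, we have

$$\lim_{S \to \infty} \hat{g}_m = g_m,$$

which indicates that $\hat{g}_m$ is consistent. Letting

$$A_S = \frac{1}{S} \sum_{s=1}^{S} g(\beta_s^{(m)}) \frac{L(\beta_s^{(m)} \mid D, m)\, w(\beta_s^{(-m)} \mid \beta_s^{(m)})}{L(\beta_s \mid D)} \tag{36}$$

and

$$B_S = \frac{1}{S} \sum_{s=1}^{S} \frac{L(\beta_s^{(m)} \mid D, m)\, w(\beta_s^{(-m)} \mid \beta_s^{(m)})}{L(\beta_s \mid D)}, \tag{37}$$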

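we have

$$\lim_{S \to \infty} A_S = \frac{C_m(D)}{C(D)}\, g_m \equiv A \tag{38}$$

and

$$\lim_{S \to \infty} B_S = \frac{C_m(D)}{C(D)} \equiv B. \tag{39}$$

From (38) and (39), we obtain

$$g_m = \frac{C(D)}{C_m(D)} A = \frac{A}{B}. \tag{40}$$

Using (36)-(40), we have

$$\hat{g}_m - g_m = \frac{A_S}{B_S} - g_m = \frac{A_S}{B_S} - \frac{A}{B} = \frac{A}{B} \cdot \frac{\frac{A_S}{A} - \frac{B_S}{B}}{\frac{B_S}{B}} = g_m\, \frac{\frac{A_S}{A} - \frac{B_S}{B}}{\frac{B_S}{B}}. \tag{41}$$

In (41),

$$\lim_{S \to \infty} \frac{B_S}{B} = 1 \quad \text{and} \quad \lim_{S \to \infty} \left( \frac{A_S}{A} - \frac{B_S}{B} \right) = 0. \tag{42}$$

In addition, we have

$$\frac{A_S}{A} - \frac{B_S}{B} = \frac{1}{S} \sum_{s=1}^{S} \left[ \frac{1}{A}\, g(\beta_s^{(m)}) \frac{L(\beta_s^{(m)} \mid D, m)\, w(\beta_s^{(-m)} \mid \beta_s^{(m)})}{L(\beta_s \mid D)} - \frac{1}{B} \cdot \frac{L(\beta_s^{(m)} \mid D, m)\, w(\beta_s^{(-m)} \mid \beta_s^{(m)})}{L(\beta_s \mid D)} \right].$$

We are then led to the following theorem.

Theorem 6. Let $\beta_s$, $s = 1, 2, \dots, S$, be a random sample. Assume $A \neq 0$,

$$V_w(g_m) = E\left[ \left\{ \frac{g(\beta^{(m)})\, L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{A\, L(\beta \mid D)} - \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{B\, L(\beta \mid D)} \right\}^2 \,\Big|\, D \right] < \infty, \tag{43}$$

and

$$E[g(\beta^{(m)})^2 \mid D] < \infty, \tag{44}$$

where the expectations are taken with respect to the joint posterior distribution in (4) under the full model. Then we have

$$\lim_{S \to \infty} S\, E\left[ \left( \frac{\hat{g}_m - g_m}{g_m} \right)^2 \right] = V_w(g_m), \tag{45}$$

where $V_w(g_m)$ is defined by (43), and

$$\sqrt{S}\, (\hat{g}_m - g_m) \xrightarrow{\ D\ } N\left( 0,\ g_m^2\, V_w(g_m) \right).$$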
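The estimator (35) and the identity (33) can be illustrated on a toy problem where everything is available in closed form. The following is a minimal sketch, not the authors' implementation: it assumes a two-coefficient Gaussian kernel standing in for $L(\beta \mid D)$, a submodel that drops the second coefficient (so its kernel is the full kernel evaluated at $\beta_2 = 0$, as in the GLM setting), independent draws from the full posterior, and $w$ equal to the exact full-posterior conditional of $\beta_2$ given $\beta_1$. All numerical values and function names are hypothetical.

```python
import math
import random

# Hypothetical full-model kernel L(beta | D) = exp(-0.5 * beta'Q beta + b'beta).
Q11, Q12, Q22 = 2.0, 0.6, 1.5
B1, B2 = 1.0, 0.5

def log_L_full(b1, b2):
    return -0.5 * (Q11 * b1 * b1 + 2 * Q12 * b1 * b2 + Q22 * b2 * b2) + B1 * b1 + B2 * b2

def log_L_sub(b1):
    """Kernel of model m, which drops beta_2: the full kernel at beta_2 = 0."""
    return log_L_full(b1, 0.0)

def log_w(b2, b1):
    """Log conditional density of beta_2 given beta_1 under the full posterior;
    closed form here only because the toy kernel is Gaussian."""
    mean = (B2 - Q12 * b1) / Q22
    return 0.5 * math.log(Q22 / (2 * math.pi)) - 0.5 * Q22 * (b2 - mean) ** 2

def sample_full(S, seed=0):
    """Independent draws from the full posterior N(Q^{-1} b, Q^{-1})."""
    rng = random.Random(seed)
    det = Q11 * Q22 - Q12 * Q12
    s11, s12, s22 = Q22 / det, -Q12 / det, Q11 / det
    m1, m2 = s11 * B1 + s12 * B2, s12 * B1 + s22 * B2
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    draws = []
    for _ in range(S):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        draws.append((m1 + l11 * z1, m2 + l21 * z1 + l22 * z2))
    return draws

def estimate(g, draws):
    """Return (g_hat as in (35), B_S estimating C_m(D)/C(D) as in (33))."""
    ratios = [math.exp(log_L_sub(b1) + log_w(b2, b1) - log_L_full(b1, b2))
              for b1, b2 in draws]
    A_S = sum(g(b1) * r for (b1, _), r in zip(draws, ratios)) / len(draws)
    B_S = sum(ratios) / len(draws)
    return A_S / B_S, B_S

draws = sample_full(20000)
g_hat, ratio_hat = estimate(lambda b1: b1, draws)
exact_mean = B1 / Q11    # posterior mean of beta_1 under model m
print(g_hat, exact_mean, ratio_hat)
```

In this toy setting the submodel posterior mean and the ratio of normalizing constants are known exactly, so the output can be compared against them; for a real GLM the conditional density would have to be approximated.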

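The proof of Theorem 6 directly follows from the proof of Theorem 3.1 of Chen and Shao (1997). Thus, the details are omitted for brevity. From (45), we notice that $E\left[ \left( \frac{\hat{g}_m - g_m}{g_m} \right)^2 \right]$ is the relative mean-square error, and Theorem 6 implies that when $S$ is large,

$$E\left[ \left( \frac{\hat{g}_m - g_m}{g_m} \right)^2 \right] \approx \frac{1}{S} V_w(g_m).$$

Remark 4.1: As discussed in Chen et al. (2000), the simulation standard error of $\hat{g}_m$ can be approximated by

$$\mathrm{se}(\hat{g}_m) = |\hat{g}_m|\, \frac{1}{S} \left[ \sum_{s=1}^{S} \left\{ \frac{g(\beta_s^{(m)}) - \hat{g}_m}{\hat{A}} \cdot \frac{L(\beta_s^{(m)} \mid D, m)\, w(\beta_s^{(-m)} \mid \beta_s^{(m)})}{L(\beta_s \mid D)} \right\}^2 \right]^{1/2},$$

where $\hat{A} = A_S$.

Remark 4.2: From (34), it is quite natural to think that a more efficient way to obtain an MC estimate of $g_m$ is to generate two MC samples from the posterior distribution, so that one sample is used for computing $E\left[ g(\beta^{(m)}) \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \mid D \right]$ while the second sample is used for computing $E\left[ \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \mid D \right]$. In this remark, we show that the use of two MC samples in obtaining the MC estimate of $g_m$ may not necessarily be more efficient than the use of just one MC sample. In addition, generating two MC samples requires more computing time. Specifically, suppose that $\{\beta_{1,s},\ s = 1, 2, \dots, S_1\}$ and $\{\beta_{2,s},\ s = 1, 2, \dots, S_2\}$ are two independent random samples from the joint posterior distribution (4) under the full model. Then $g_m$ can be estimated by

$$\hat{g}_m^* = \frac{\dfrac{1}{S_1} \displaystyle\sum_{s=1}^{S_1} g(\beta_{1,s}^{(m)}) \frac{L(\beta_{1,s}^{(m)} \mid D, m)\, w(\beta_{1,s}^{(-m)} \mid \beta_{1,s}^{(m)})}{L(\beta_{1,s} \mid D)}}{\dfrac{1}{S_2} \displaystyle\sum_{s=1}^{S_2} \frac{L(\beta_{2,s}^{(m)} \mid D, m)\, w(\beta_{2,s}^{(-m)} \mid \beta_{2,s}^{(m)})}{L(\beta_{2,s} \mid D)}}. \tag{46}$$

By the $\delta$-method, we have

$$E\left[ \left( \frac{\hat{g}_m^* - g_m}{g_m} \right)^2 \right] = \frac{1}{A^2} \mathrm{Var}\left( \frac{1}{S_1} \sum_{s=1}^{S_1} g(\beta_{1,s}^{(m)}) \frac{L(\beta_{1,s}^{(m)} \mid D, m)\, w(\beta_{1,s}^{(-m)} \mid \beta_{1,s}^{(m)})}{L(\beta_{1,s} \mid D)} \right) + \frac{1}{B^2} \mathrm{Var}\left( \frac{1}{S_2} \sum_{s=1}^{S_2} \frac{L(\beta_{2,s}^{(m)} \mid D, m)\, w(\beta_{2,s}^{(-m)} \mid \beta_{2,s}^{(m)})}{L(\beta_{2,s} \mid D)} \right) + O\left( \frac{1}{(S_1 + S_2)^2} \right)$$

$$= \frac{1}{S_1 A^2} \mathrm{Var}\left( g(\beta^{(m)}) \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right) + \frac{1}{S_2 B^2} \mathrm{Var}\left( \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right) + O\left( \frac{1}{(S_1 + S_2)^2} \right),$$

where the expectation and variances are taken with respect to the joint posterior distribution (4) under the full model.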

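Assuming that $S_1 = S_2 = S$, we have

$$\lim_{S \to \infty} S\, E\left[ \left( \frac{\hat{g}_m^* - g_m}{g_m} \right)^2 \right] = \frac{1}{A^2} \mathrm{Var}\left( g(\beta^{(m)}) \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right) + \frac{1}{B^2} \mathrm{Var}\left( \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right). \tag{47}$$

Thus, if

$$E\left[ \frac{g(\beta^{(m)})\, L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{A\, L(\beta \mid D)} \cdot \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{B\, L(\beta \mid D)} \,\Big|\, D \right] \ge 0, \tag{48}$$

we have

$$\lim_{S \to \infty} S\, E\left[ \left( \frac{\hat{g}_m - g_m}{g_m} \right)^2 \right] \le \lim_{S \to \infty} S\, E\left[ \left( \frac{\hat{g}_m^* - g_m}{g_m} \right)^2 \right].$$

It is easy to see that when $g(\beta^{(m)}) \ge 0$ or $g(\beta^{(m)}) \le 0$, (48) automatically holds. Therefore, in many cases, it is unnecessary to use two MC samples instead of one MC sample in obtaining the MC estimate of $g_m$.

Note that the estimate $\hat{g}_m$ depends on $w(\beta^{(-m)} \mid \beta^{(m)})$. It is reasonable to argue that the best choice of $w$ should yield the smallest asymptotic variance of the estimate $\hat{g}_m$ among all possible $w$. The following theorem precisely addresses this optimality issue.

Theorem 7. Let

$$w_{\mathrm{opt}} = \pi(\beta^{(-m)} \mid \beta^{(m)}, D) \tag{49}$$

be the conditional posterior density of $\beta^{(-m)}$ given $\beta^{(m)}$ under the full model. Then we have

$$V_{w_{\mathrm{opt}}}(g_m) \le V_w(g_m) \tag{50}$$

for all $w$, where $V_w(g_m)$ is defined by (43).

Remark 4.3: Note that (50) holds for any function $g$ that satisfies the condition given in (44). Thus, for the various functions $g$ involved in $\mathrm{LPML}_m$, $L_m(\nu)$ and $\mathrm{DIC}_m$, the best choice of $w$ is the same $w_{\mathrm{opt}}$ given in (49).

Remark 4.4: When we use $\hat{g}_m^*$ in (46), we can also show that $w_{\mathrm{opt}} = \pi(\beta^{(-m)} \mid \beta^{(m)}, D)$ yields the smallest asymptotic relative mean-square error of $\hat{g}_m^*$, for example, the one given by (47).

Remark 4.5: For computing $\mathrm{CPO}_i$ in (5) under model $m$, we do not need to compute $C(D)/C_m(D)$ in (32). In fact, it is easy to see that

$$\mathrm{CPO}_i^{(m)} = \frac{E\left[ g_{1i}(\beta^{(m)}) \dfrac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right]}{E\left[ g_{2i}(\beta^{(m)}) \dfrac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right]},$$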

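where $g_{1i}(\beta^{(m)}) = \exp\left[ -a_0 \{ y_{0i}\theta_i^{(m)} - b(\theta_i^{(m)}) \} \right]$ and $g_{2i}(\beta^{(m)}) = \left\{ f(y_i \mid x_i^{(m)}, \beta^{(m)}) \exp\left[ a_0 \{ y_{0i}\theta_i^{(m)} - b(\theta_i^{(m)}) \} \right] \right\}^{-1}$. Thus, given an MCMC sample $\{\beta_s = (\beta_s^{(m)\prime}, \beta_s^{(-m)\prime})',\ s = 1, 2, \dots, S\}$ from the joint posterior distribution (4), an MC estimate of $\mathrm{CPO}_i$ is given as follows:

$$\widehat{\mathrm{CPO}}_i^{(m)} = \frac{\displaystyle\sum_{s=1}^{S} g_{1i}(\beta_s^{(m)}) \frac{L(\beta_s^{(m)} \mid D, m)\, w(\beta_s^{(-m)} \mid \beta_s^{(m)})}{L(\beta_s \mid D)}}{\displaystyle\sum_{s=1}^{S} g_{2i}(\beta_s^{(m)}) \frac{L(\beta_s^{(m)} \mid D, m)\, w(\beta_s^{(-m)} \mid \beta_s^{(m)})}{L(\beta_s \mid D)}}.$$

Following the proof of Theorem 7, we can easily show that the optimal choice of $w$ for $\widehat{\mathrm{CPO}}_i^{(m)}$ is still the same $w_{\mathrm{opt}}$ given in (49).

Remark 4.6: To compute $\mathrm{LPML}_K$, $L_K(\nu)$ and $\mathrm{DIC}_K$ under the full model, we can simply take $\beta^{(K)} = \beta$ and $w(\beta^{(-K)} \mid \beta^{(K)}) = 1$. Then, for the various functions $g$, (35) reduces to

$$\hat{g} = \frac{1}{S} \sum_{s=1}^{S} g(\beta_s),$$

where $\{\beta_s,\ s = 1, 2, \dots, S\}$ is an MCMC sample from the posterior distribution (4) under the full model.

Remark 4.7: As shown in Theorem 7, the optimal choice of $w$ is $w_{\mathrm{opt}} = \pi(\beta^{(-m)} \mid \beta^{(m)}, D)$. However, for the GLM in (2), $w_{\mathrm{opt}}$ is not available in closed form. Fortunately, for the GLM, a good $w(\beta^{(-m)} \mid \beta^{(m)})$, which is close to the optimal choice, can be constructed based on the asymptotic approximation to the joint posterior proposed by Chen (1985). Let $\hat{\beta}$ denote the posterior mode of $\beta$ under the full model, i.e.,

$$\hat{\beta} = \arg\max_{\beta} \log L(\beta \mid D) = \arg\max_{\beta} \left\{ (y + a_0 y_0)'\theta - (1 + a_0) J' b(\theta) \right\}.$$

Also let

$$\hat{\Sigma} = \left[ -\frac{\partial^2 \log L(\beta \mid D)}{\partial \beta\, \partial \beta'} \Big|_{\beta = \hat{\beta}} \right]^{-1}.$$

Then, the joint posterior $\pi(\beta \mid D)$ under the full model can be approximated by

$$\hat{\pi}(\beta \mid \hat{\beta}, D) = (2\pi)^{-\frac{k+1}{2}}\, |\hat{\Sigma}|^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2} (\beta - \hat{\beta})' \hat{\Sigma}^{-1} (\beta - \hat{\beta}) \right\}. \tag{51}$$

Using (51), we simply take $w(\beta^{(-m)} \mid \beta^{(m)}) = \hat{\pi}(\beta^{(-m)} \mid \beta^{(m)}, \hat{\beta}, D)$, which is the conditional distribution of $\beta^{(-m)}$ given $\beta^{(m)}$ with respect to the $(k + 1)$-dimensional multivariate normal distribution in (51).

Remark 4.8: As a by-product, $C_m(D)/C(D)$ is readily computed via the identity (33). It can also be shown that

$$\frac{C_{0m}(y_0)}{C_0(y_0)} = E\left[ \frac{L(\beta^{(m)} \mid y_0, a_0, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid y_0, a_0)} \,\Big|\, y_0, a_0 \right], \tag{52}$$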

where $L(\beta^{(m)} \mid y_0, a_0, m) = \exp\left[ a_0 \left\{ y_0'\theta^{(m)} - J' b(\theta^{(m)}) \right\} \right]$ and the expectation is taken with respect to the prior distribution in (3) under the full model. After examining the construction of the conjugate prior and the form of the GLM in (2), we can also show that

$$B_m = \frac{C_m(D)/C(D)}{C_{0m}(y_0)/C_0(y_0)} = \frac{\pi(\beta^{(-m)} = 0 \mid D)}{\pi(\beta^{(-m)} = 0 \mid y_0, a_0)}, \tag{53}$$

where $\pi(\beta^{(-m)} = 0 \mid D)$ and $\pi(\beta^{(-m)} = 0 \mid y_0, a_0)$ are the marginal posterior density and the marginal prior density of $\beta^{(-m)}$ evaluated at $\beta^{(-m)} = 0$ under the full model. Furthermore, $B_m$ in (53) is the Bayes factor for comparing model $m$ to the full model. Thus, to compute $B_m$, we need to generate two MCMC samples, one from the posterior distribution and another from the prior distribution of $\beta$ under the full model, and then use (33) and (52).

Finally, we note that we derive $w_{\mathrm{opt}}$ under the independence assumption. We expect that this optimal choice will work well even when a dependent MCMC sample is used. Some related empirical studies have been reported and discussed in Meng and Wong (1996), DiCiccio et al. (1997) and Meng and Schilling (2002). They suggested that the optimal or near-optimal procedures constructed under the independence assumption can work remarkably well in general, providing orders of magnitude improvement over other methods with similar computational effort. Alternatively, suppose we systematically take a 1-in-$b$ subsample of size $S$ from the Markov chain that is generated from the joint posterior distribution in (4). Then, following Guha et al. (2004), we can show that (45) holds under some mild regularity conditions such as geometric ergodicity and a sufficiently large $b$. Thus, if we take an MCMC sample in such a way, this MCMC sample can be treated as a random sample.

5 A Simulation Study

In Section 3, we established theoretical connections among AIC, BIC and the four Bayesian criteria in the normal linear regression setting. However, it does not appear possible that there are any analytic connections between AIC or BIC and the four Bayesian criteria for Poisson regression. For this reason, we present a simulation study for Poisson regression to empirically examine whether there exist any connections among these criteria and to examine the performance of conjugate priors in the context of variable selection.

Suppose $y_i \mid \theta_i$ are independent Poisson observations with mean $e^{x_i'\beta}$, where $x_i'$ is a $1 \times p$ vector, $i = 1, 2, \dots, n$. The conjugate prior takes the form

$$\pi(\beta \mid a_0, y_0) \propto \exp\left\{ a_0 \sum_{i=1}^{n} \left( y_{0i}\, x_i'\beta - \exp\{ x_i'\beta \} \right) \right\}, \tag{54}$$

where $y_{0i}$ is the $i$th component of $y_0$. In the simulation, we assume that $x_{i0} = 1$ and $x_{ij} \sim N(0, 1)$ independently for $j = 1, 2, 3$ and $i = 1, 2, \dots, n$. In (54), we take $y_{0i} = 1$ for $i = 1, 2, \dots, n$, which yields a prior mode of $\beta$ equal to 0, as shown in Chen and Ibrahim (2003). Further we use $\beta = (0.3, 0.3, 0, 0)'$, $\beta = (0.3, 0.3, 0.2, 0)'$, and

19 Chen, Huang, Ibrahm, and Km 603 β = ( 0.3, 0.3, 0.2, 0.15 whch correpond to the true model (x 1, (x 1, x 2, and (x 1, x 2, x 3 (full model, repectvely. We alo ue the ample ze of n = 500. Under the mulaton degn, we ndependently generated N = 500 dataet. For each mulated dataet, we ft 2 3 = 8 model. To compute the poteror model probablte baed on the conjugate pror, we mplemented the Monte Carlo algorthm propoed n Secton 4 wth a Monte Carlo ample ze of S = 20, 000. For all of thee 8 model, we computed BF, DIC, L meaure, LPML, AIC, and BIC. True Model AIC BIC (x (x 1, x (x 1, x 2, x Table 1: Frequence for Rankng the True Model a Bet Ung AIC and BIC Baed on n = 500 and N = 500 Dataet Table 1 and 2 how reult for the varou method. Our model performance evaluaton crteron a 0-1 lo functon, the lo beng 0 f the true model elected and 1 otherwe. In Table 1, we ee that BIC perform better than AIC n the number of tme the true model elected a bet when the true model a maller model. For example, when (x 1 the true model, AIC correctly dentfe th model a bet 361 tme out of 500 and BIC correctly dentfe th model a bet 490 tme. Table 2 compare the performance of the four other crtera under everal value of a 0 from the conjugate pror a well a everal value of ν for the L meaure. We ee from the table that, n general, for mall value of a 0, whch mply a nonnformatve pror, the Baye factor reult are qute content wth DIC, the L meaure, and LPML for mall model beng the true model, wherea when the full model the true model, the Baye factor tend to do wore for mall a 0 compared to large a 0. In general, a a 0 ncreae, the performance of DIC, LPML, and the Baye factor become wore, wherea for the L meaure, t farly robut over everal value of a 0. 
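The AIC and BIC part of this simulation design can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions, not the authors' implementation: it simulates a single dataset (the study uses N = 500 of them), takes the coefficients for the $(x_1, x_2)$ case as they appear in the text, fits all $2^3 = 8$ Poisson models with a hand-written Newton-Raphson routine, and reports the subset that minimizes each criterion. The names `fit_poisson` and `aic_bic_table`, the random seed, and the starting values are hypothetical choices made here.

```python
import itertools
from math import lgamma

import numpy as np


def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Maximum likelihood fit of a log-link Poisson regression (Newton-Raphson).

    Column 0 of X is assumed to be the intercept. Returns (beta_hat, loglik).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # hypothetical starting value: intercept-only MLE
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        # Newton step: (X' diag(mu) X)^{-1} X'(y - mu)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    loglik = np.sum(y * eta - np.exp(eta)) - sum(lgamma(v + 1.0) for v in y)
    return beta, loglik


def aic_bic_table(X, y):
    """AIC, BIC and log-likelihood for every covariate subset (intercept always in)."""
    n, p = X.shape
    table = {}
    for size in range(p):  # 0, 1, ..., p-1 covariates besides the intercept
        for subset in itertools.combinations(range(1, p), size):
            cols = [0, *subset]
            _, ll = fit_poisson(X[:, cols], y)
            d = len(cols)
            table[subset] = (-2.0 * ll + 2.0 * d, -2.0 * ll + d * np.log(n), ll)
    return table


rng = np.random.default_rng(0)  # seed is an arbitrary assumption
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
beta_true = np.array([0.3, 0.3, 0.2, 0.0])  # (x1, x2) case, values as printed in the text
y = rng.poisson(np.exp(X @ beta_true)).astype(float)

table = aic_bic_table(X, y)
best_aic = min(table, key=lambda s: table[s][0])
best_bic = min(table, key=lambda s: table[s][1])
print("best by AIC:", best_aic, "best by BIC:", best_bic)
```

Repeating the last block over many simulated datasets and counting how often the minimizing subset equals the true one gives frequencies of the kind reported in Table 1. The four Bayesian criteria are not covered by this sketch, since they require the Monte Carlo algorithm of Section 4.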
The L measure seems to perform best under moderate values of $\nu$, such as $\nu = $ ….

Table 2: Frequencies for Ranking the True Model as Best Using BF, DIC, CPO and L measure for Various $a_0$ Based on n = 500 and N = 500 Datasets. Columns: True Model, $a_0$, LPML, DIC, BF, and L Measure ($\nu$); rows: $(x_1)$, $(x_1, x_2)$, $(x_1, x_2, x_3)$.

6 A Real Data Example

Due to lack of analytic connections between AIC or BIC and the four Bayesian criteria for logistic regression, we consider the Chapman data from the Los Angeles Heart Study of men ($n = 200$) presented in Dixon and Massey (1983) to empirically examine whether there exist any connections among these criteria. In our analysis, we consider a coronary incident as a binary response variable ($y$), which takes the values 0 and 1, where a 1 denotes that an incident had occurred in the previous ten years and a 0 indicates otherwise. We consider five prognostic factors: age (Ag), systolic blood pressure in millimeters of mercury (S), diastolic blood pressure in millimeters of mercury (D), cholesterol in milligrams per DL (Ch), and BMI $= (703.07 \times \text{Weight})/\text{Height}^2$. Let $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$ denote Ag, S, D, Ch, and BMI. For the Chapman data, we fit a logistic regression model

$$\operatorname{logit} P(y = 1 \mid x) = \log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} = x'\beta. \qquad (55)$$

The conjugate prior in (3) corresponding to the model (55) takes the form

$$\pi(\beta \mid a_0, y_0) \propto \exp\Big( a_0 \sum_{i=1}^{n} \big[ y_{0i}\, x_i'\beta - \log\{1 + \exp(x_i'\beta)\} \big] \Big), \qquad (56)$$

where $y_{0i} = 0.5$, $i = 1, 2, \ldots, n$, to ensure that the prior mode of $\beta$ is 0. We wish to compare the following 32 models: Intercept only, $(x_1)$, ..., $(x_5)$, $(x_1, x_2)$, ..., $(x_1, x_2, x_3, x_4, x_5)$. We note that the notation $(x_1, x_2, x_3, x_4, x_5)$, for example, implies that $x'\beta = \beta_0 + \beta_1 \text{Ag} + \beta_2 \text{S} + \beta_3 \text{D} + \beta_4 \text{Ch} + \beta_5 \text{BMI}$ in (55). Thus, Intercept only is the model with zero predictors while $(x_1, x_2, x_3, x_4, x_5)$ is the full model with the largest model dimension. We also note that an intercept is included in every model. Further, we use the following labels:

M1 = (Int);
M2 = (Int, Ag), M3 = (Int, S), M4 = (Int, D), M5 = (Int, Ch), M6 = (Int, BMI);
M7 = (Int, Ag, S), M8 = (Int, Ag, D), M9 = (Int, Ag, Ch), M10 = (Int, Ag, BMI), M11 = (Int, S, D), M12 = (Int, S, Ch), M13 = (Int, S, BMI), M14 = (Int, D, Ch), M15 = (Int, D, BMI), M16 = (Int, Ch, BMI);
M17 = (Int, Ag, S, D), M18 = (Int, Ag, S, Ch), M19 = (Int, Ag, S, BMI), M20 = (Int, Ag, D, Ch), M21 = (Int, Ag, D, BMI), M22 = (Int, Ag, Ch, BMI), M23 = (Int, S, D, Ch), M24 = (Int, S, D, BMI), M25 = (Int, S, Ch, BMI), M26 = (Int, D, Ch, BMI);
M27 = (Int, Ag, S, D, Ch), M28 = (Int, Ag, S, D, BMI), M29 = (Int, Ag, S, Ch, BMI), M30 = (Int, Ag, D, Ch, BMI), M31 = (Int, S, D, Ch, BMI);
M32 = (Int, Ag, S, D, Ch, BMI).

Table 3: The Top Models Based on AIC and BIC for Chapman Data. Columns: $M_k$ and Value under each of AIC and BIC.

Table 4: The Best Model Based on Posterior Model Probability (PMP), DIC, LPML, and L Measure for Chapman Data. Columns: $M_k$ and Value for each of $a_0 = $ …, 0.01, 0.1, 0.5, and 1.0; rows: PMP, DIC, LPML, and L($\nu$) for $\nu$ = 0.1, 0.25, 0.5, 0.75, and 0.9.

To compute the posterior model probability (PMP), DIC, LPML, and L measure under various conjugate priors, we implemented the Monte Carlo algorithm proposed in Section 4 with a Monte Carlo sample size of $S = 20{,}000$. We see from Table 3 that M22 is selected as the best model by AIC and the fourth model by BIC, whereas M10 is selected as the second best model by both criteria. Table 4 shows the results of the L measure, posterior model probability (PMP), LPML, and DIC for several values of $a_0$, as well as several values of $\nu$ for the L measure. Table 3 reveals a similar story as the simulation study. Model M22 is selected as either the top model or second best model for most values of $a_0$ for DIC and PMP, as well as for the L measure under small values of $\nu$. Under larger values of $\nu$ the L measure as well as LPML appear to favor model M32. Finally, for small values of $a_0$, LPML and PMP appear to favor a smaller model, namely M2. Thus, from these analyses, models M2, M22, and M32 appear to be the most promising based on all of these model selection criteria. Table 5 shows the top five models selected for each of the four variable selection criteria (PMP, DIC, L measure, LPML).
Again we see a remarkable consistency between the four criteria, in which the ordering of the top models is similar for the four criteria for small, moderate, and large values of $a_0$, and for a wide range of $\nu$ values for the L measure.
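The labels M1 through M32 used in this section follow the order in which covariate subsets are enumerated by increasing size. The Python sketch below reproduces that labeling and evaluates the logarithm of the unnormalized conjugate prior (56) for the logistic model. It is a minimal illustration, not the authors' code: the function names `enumerate_models` and `log_conjugate_prior_kernel` and the simulated design matrix are hypothetical, and the normalizing constant of the prior is not computed.

```python
import itertools

import numpy as np

COVARIATES = ("Ag", "S", "D", "Ch", "BMI")


def enumerate_models(names=COVARIATES):
    """Label all subsets M1, M2, ... by increasing size; the intercept is always included."""
    models = {}
    k = 1
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            models[f"M{k}"] = ("Int",) + subset
            k += 1
    return models


def log_conjugate_prior_kernel(beta, X, a0, y0=0.5):
    """Log of the unnormalized prior (56): a0 * sum_i [y0 * x_i'beta - log(1 + exp(x_i'beta))]."""
    eta = X @ beta
    return a0 * np.sum(y0 * eta - np.logaddexp(0.0, eta))


models = enumerate_models()
print(len(models), models["M2"], models["M22"], models["M32"])

# Hypothetical design matrix standing in for the Chapman data (n = 200, intercept + 5 covariates).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.standard_normal((200, 5))])
beta = rng.standard_normal(6)
print(log_conjugate_prior_kernel(np.zeros(6), X, a0=0.01),
      log_conjugate_prior_kernel(beta, X, a0=0.01))
```

With $y_{0i} = 0.5$ each summand equals $-\log\{e^{-\eta_i/2} + e^{\eta_i/2}\} \le -\log 2$ with $\eta_i = x_i'\beta$, so the kernel is maximized at $\beta = 0$, in line with the statement that this choice puts the prior mode at 0.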

Table 5: The Top Five Models Based on PMP, DIC, LPML, and L Measure for Chapman Data. Columns: $M_k$ and Value for each of $a_0 = $ …, 0.01, 0.1, 0.5, and 1.0; rows: PMP, DIC, LPML, and the L measure with $\nu$ = 0.1, 0.25, 0.5, 0.75, and 0.9.

Table 6 shows the posterior means (Estimate), the posterior standard errors (SE), and 95% HPD intervals for the $\beta_j$ under model M22 (Ag, Ch, BMI) and model M32 (Ag, S, D, Ch, BMI) when $a_0 = 0.01$. Table 6 also shows the corresponding maximum likelihood estimates (MLE), the standard errors, and p-values. We see from Table 6 that the posterior estimates are very close to the MLEs, which is intuitively appealing, as a fairly noninformative prior ($a_0 = 0.01$) is used. We also see from this table that under these two best models, age and BMI are the only two prognostic factors for the coronary incident that are significant at the 5% significance level.

Table 6: Estimates of the $\beta$ under Model (Ag, Ch, BMI) and Model (Ag, S, D, Ch, BMI) for the Chapman Data when $a_0 = 0.01$. Columns: Model, Variable, Maximum Likelihood Estimate (Estimate, SE, p-value), Posterior Estimate (Estimate, SE), and 95% HPD Interval. Lower limits of the 95% HPD intervals: under M22, Intercept -2.805, Ag 0.087, Ch -0.064, BMI 0.069; under M32, Intercept -2.828, Ag 0.012, S -0.583, D -0.806, Ch -0.074, BMI 0.028.

To examine the performance of the proposed Monte Carlo method in Section 4, we first computed various model selection criteria under a sub-model using an MCMC sample from the full model. We then computed the same quantities using an MCMC sample directly from the posterior distribution under the same sub-model. For illustrative purposes, we considered a single variable sub-model M2 = (Int, Ag) using the conjugate prior (56) with $a_0 = $ …. Using an MCMC sample size of $S = 20{,}000$, the Monte Carlo estimates (simulation standard errors) of DIC, LPML, L($\nu = 0.1$), L($\nu = 0.5$), and L($\nu = 0.9$) under model M2 are … (0.08), … (0.04), … (0.05), … (0.06), and … (0.06), respectively, using the proposed Monte Carlo method via (35). With the same MC sample size, these quantities are … (0.02), … (0.01), … (0.02), … (0.02), and … (0.02), respectively, using the MC sample directly from the posterior distribution under model M2. All simulation standard errors were computed using the overlapping batch statistics (OBS) method of Schmeiser et al. (1990).
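For readers unfamiliar with batching methods, the sketch below computes a simulation standard error from overlapping batches in the simplest case, where the Monte Carlo estimate is a sample mean of a scalar chain. This is the overlapping batch means estimator, used here as a stand-in: the criteria in this section are more general functionals of the MCMC output, and the batch size (the square root of the chain length) and the function name `obm_standard_error` are assumptions made for illustration, not choices reported by the authors.

```python
import numpy as np


def obm_standard_error(x, batch_size=None):
    """Overlapping batch means estimate of the standard error of mean(x).

    x is a one-dimensional array of (possibly autocorrelated) draws.
    batch_size defaults to floor(sqrt(n)), a common rule of thumb.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    b = int(np.floor(np.sqrt(n))) if batch_size is None else int(batch_size)
    if not 1 <= b < n:
        raise ValueError("batch_size must satisfy 1 <= batch_size < len(x)")
    # Means of all n - b + 1 overlapping batches, via a cumulative sum.
    csum = np.concatenate(([0.0], np.cumsum(x)))
    batch_means = (csum[b:] - csum[:-b]) / b
    # Estimate of the asymptotic variance sigma^2 in Var(mean) ~ sigma^2 / n.
    sigma2 = n * b / ((n - b) * (n - b + 1)) * np.sum((batch_means - x.mean()) ** 2)
    return np.sqrt(sigma2 / n)


# Usage on a hypothetical autocorrelated chain (AR(1) with coefficient 0.5).
rng = np.random.default_rng(0)
chain = np.empty(20_000)
chain[0] = 0.0
for t in range(1, chain.size):
    chain[t] = 0.5 * chain[t - 1] + rng.standard_normal()
print(obm_standard_error(chain))
```

Because neighboring draws are positively correlated in this example, the batch-based standard error is larger than the naive value `chain.std() / sqrt(len(chain))`, which is the reason batching is preferred for MCMC output.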
As expected, the simulation standard errors using the MC sample from the full model are slightly larger than those computed using the MC sample directly from model M2. However, these two sets of MC estimates are very close. This empirically demonstrates that the proposed MC method works quite well. Finally, we compared the computational times between the proposed Monte Carlo method and the exhaustive alternative. With 2,000 burn-in iterations and $S = 20{,}000$, the computational times of the proposed Monte Carlo method for 32 DICs, LPMLs, and L($\nu$)s are 71.28, …, and 76.36 seconds, respectively, on a Dell WS Xeon dual 2.4GHz CPU Linux workstation. Using the same number of burn-in iterations, the same MC sample size, and the same computer, the computational times of the exhaustive alternative Monte Carlo method for 32 DICs, LPMLs, and L($\nu$)s are …, …, and … seconds, respectively. Thus, it becomes apparent that the proposed Monte Carlo method leads to a substantial computational saving over the exhaustive alternative.

7 Concluding Remarks

We have examined and established theoretical and computational relationships between six commonly used methods for variable subset selection. These connections were facilitated by the class of conjugate priors of Chen and Ibrahim (2003). We saw that under this class of priors the four Bayesian criteria were quite similar in terms of model choice, especially under small values of $a_0$, and the results were fairly robust under a wide choice of $a_0$ values. Further work remains to be done. In particular, it is of interest to obtain analytic connections between these criteria for specific GLMs, such as the logistic and Poisson regression models, as well as to theoretically examine the small sample and large sample behavior of these methods. In Section 4, the theory and algorithm are developed for computing the four Bayesian criteria which are defined for the GLM in (2). With some straightforward modifications, the theory and algorithm can be applied for computing the four Bayesian criteria that are defined for the general GLM in (1).

We close with some philosophical issues about model selection that are worth noting. In this paper, we have evaluated the performance of all criteria based on how well they can pick up the true sampling model. However, there are other ways of defining the Bayesian model. Many advocate that a Bayesian model is specified by the sampling density and the prior, not only by the sampling density. When one only evaluates the success of a criterion based on how well it picks up the sampling model, then a comparison between AIC (or BIC) and DIC is not meaningful when DIC is computed using an informative prior.
Since AIC is equivalent to DIC based on a noninformative prior, a comparison of AIC (or BIC) to DIC is simply not meaningful when using informative priors. In general, one should avoid such comparisons, and only comparable criteria should be compared. For example, it is meaningful to compare AIC, BIC, DIC, LPML, the L-measure, and the Bayes factor based on noninformative priors. It is meaningful to compare DIC, the L-measure, LPML, and the Bayes factor based on informative priors. Finally, we note that most criteria for model assessment, especially the information criteria, are based on a well-defined utility function. If a utility function is chosen, a comparison to a criterion based on a different utility function is not justified. For example, the Bayes factor and BIC are prior predictive criteria aiming at the explanation of the data given the prior, whereas DIC (AIC as a special case) and LPML are posterior predictive criteria aiming at the explanation of replicate (unseen) data given the posterior. Thus, one must use caution in comparing these criteria in terms of picking up the true sampling model.

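Appendix: Proofs of Theorems

Proof of Theorem 5: Since $\int w(\beta^{(-m)} \mid \beta^{(m)})\, d\beta^{(-m)} = 1$ and $\beta = (\beta^{(m)}, \beta^{(-m)})$, we have

$$
\begin{aligned}
g_m &= \int g(\beta^{(m)})\, \frac{L(\beta^{(m)} \mid D, m)}{C_m(D)}\, d\beta^{(m)} \\
&= \int\!\!\int g(\beta^{(m)})\, \frac{L(\beta^{(m)} \mid D, m)}{C_m(D)}\, w(\beta^{(-m)} \mid \beta^{(m)})\, d\beta^{(-m)}\, d\beta^{(m)} \\
&= \frac{C(D)}{C_m(D)} \int g(\beta^{(m)})\, \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)}\, \frac{L(\beta \mid D)}{C(D)}\, d\beta \\
&= \frac{C(D)}{C_m(D)}\, E\left[ g(\beta^{(m)})\, \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \,\Big|\, D \right],
\end{aligned}
$$

which completes the proof.

Proof of Theorem 7: From (43), we have

$$V_w(\hat{g}_m) = E\left[ \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \left\{ \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{L(\beta \mid D)} \right\}^2 \,\Big|\, D \right]. \qquad \text{(A.1)}$$

Plugging $w_{\mathrm{opt}}$ into (A.1) and using $L(\beta \mid D) = C(D)\, \pi(\beta \mid D)$, we have

$$
\begin{aligned}
V_{w_{\mathrm{opt}}}(\hat{g}_m)
&= \int \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \frac{L(\beta^{(m)} \mid D, m)^2\, \pi(\beta^{(-m)} \mid \beta^{(m)}, D)^2}{C(D)^2\, \pi(\beta \mid D)}\, d\beta \\
&= \int \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \frac{L(\beta^{(m)} \mid D, m)^2\, \pi(\beta^{(-m)} \mid \beta^{(m)}, D)^2}{C(D)^2\, \pi(\beta^{(-m)} \mid \beta^{(m)}, D)\, \pi(\beta^{(m)} \mid D)}\, d\beta \\
&= \int \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \frac{L(\beta^{(m)} \mid D, m)^2\, \pi(\beta^{(-m)} \mid \beta^{(m)}, D)}{C(D)^2\, \pi(\beta^{(m)} \mid D)}\, d\beta \\
&= \int \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \frac{L(\beta^{(m)} \mid D, m)^2}{C(D)^2\, \pi(\beta^{(m)} \mid D)} \left[ \int \pi(\beta^{(-m)} \mid \beta^{(m)}, D)\, d\beta^{(-m)} \right] d\beta^{(m)} \\
&= \int \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \left\{ \frac{L(\beta^{(m)} \mid D, m)}{\pi(\beta^{(m)} \mid D)\, C(D)} \right\}^2 \pi(\beta^{(m)} \mid D)\, d\beta^{(m)},
\end{aligned}
\qquad \text{(A.2)}
$$

where $\pi(\beta^{(m)} \mid D)$ denotes the marginal posterior distribution of $\beta^{(m)}$ under the full model. Thus, it suffices to show

$$
\int \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \left\{ \frac{L(\beta^{(m)} \mid D, m)}{\pi(\beta^{(m)} \mid D)} \right\}^2 \pi(\beta^{(m)} \mid D)\, d\beta^{(m)}
\le
\int \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \left\{ \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{\pi(\beta \mid D)} \right\}^2 \pi(\beta \mid D)\, d\beta. \qquad \text{(A.3)}
$$

By the Cauchy-Schwarz inequality, we have

$$
\begin{aligned}
1 &= \left\{ \int w(\beta^{(-m)} \mid \beta^{(m)})\, d\beta^{(-m)} \right\}^2
= \left\{ \int \frac{w(\beta^{(-m)} \mid \beta^{(m)})}{\pi(\beta^{(-m)} \mid \beta^{(m)}, D)}\, \pi(\beta^{(-m)} \mid \beta^{(m)}, D)\, d\beta^{(-m)} \right\}^2 \\
&\le \int \frac{w^2(\beta^{(-m)} \mid \beta^{(m)})}{\pi(\beta^{(-m)} \mid \beta^{(m)}, D)}\, d\beta^{(-m)} \int \pi(\beta^{(-m)} \mid \beta^{(m)}, D)\, d\beta^{(-m)}
= \int \frac{w^2(\beta^{(-m)} \mid \beta^{(m)})}{\pi(\beta^{(-m)} \mid \beta^{(m)}, D)}\, d\beta^{(-m)}.
\end{aligned}
\qquad \text{(A.4)}
$$

Using (A.4), the left-hand side of (A.3) satisfies

$$
\begin{aligned}
\int \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \left\{ \frac{L(\beta^{(m)} \mid D, m)}{\pi(\beta^{(m)} \mid D)} \right\}^2 \pi(\beta^{(m)} \mid D)\, d\beta^{(m)}
&\le \int \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \frac{\{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})\}^2\, \pi(\beta^{(m)} \mid D)}{\pi(\beta^{(m)} \mid D)^2\, \pi(\beta^{(-m)} \mid \beta^{(m)}, D)}\, d\beta \\
&= \int \left\{ \frac{g(\beta^{(m)})}{A} - \frac{1}{B} \right\}^2 \left\{ \frac{L(\beta^{(m)} \mid D, m)\, w(\beta^{(-m)} \mid \beta^{(m)})}{\pi(\beta \mid D)} \right\}^2 \pi(\beta \mid D)\, d\beta,
\end{aligned}
$$

which exactly matches the right-hand side of (A.3).

References

Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood Principle. In Petrov, B. and Csaki, F. (eds.), International Symposium on Information Theory, Budapest: Akademia Kiado.

Brown, P. J., Vannucci, M., and Fearn, T. (1998). Multivariate Bayesian Variable Selection and Prediction. Journal of the Royal Statistical Society, Series B, 60.

Brown, P. J., Vannucci, M., and Fearn, T. (2002). Bayes Model Averaging with Selection of Regressors. Journal of the Royal Statistical Society, Series B, 64.

Chen, C. F. (1985). On Asymptotic Normality of Limiting Density Functions with Bayesian Implications. Journal of the Royal Statistical Society, Series B, 47.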

Chen, M.-H., Dey, D. K., and Ibrahim, J. G. (2004). Bayesian Criterion Based Model Assessment for Categorical Data. Biometrika, 91.

Chen, M.-H. and Ibrahim, J. G. (2003). Conjugate Priors for Generalized Linear Models. Statistica Sinica, 13.

Chen, M.-H., Ibrahim, J. G., Shao, Q.-M., and Weiss, R. E. (2003). Prior Elicitation for Model Selection and Estimation in Generalized Linear Mixed Models. Journal of Statistical Planning and Inference, 111.

Chen, M.-H., Ibrahim, J. G., and Yiannoutsos, C. (1999). Prior Elicitation, Variable Selection, and Bayesian Computation for Logistic Regression Models. Journal of the Royal Statistical Society, Series B, 61.

Chen, M.-H. and Shao, Q.-M. (1997). On Monte Carlo Methods for Estimating Ratios of Normalizing Constants. The Annals of Statistics, 25.

Chen, M.-H., Shao, Q.-M., and Ibrahim, J. G. (2000). Monte Carlo Methods in Bayesian Computation. New York: Springer-Verlag.

Chipman, H. A., George, E. I., and McCulloch, R. E. (1998). Bayesian CART Model Search (with Discussion). Journal of the American Statistical Association, 93.

Chipman, H. A., George, E. I., and McCulloch, R. E. (2001). The Practical Implementation of Bayesian Model Selection (with Discussion). In Lahiri, P. (ed.), Model Selection, Beachwood, Ohio: Institute of Mathematical Statistics.

Chipman, H. A., George, E. I., and McCulloch, R. E. (2003). Bayesian Treed Generalized Linear Models (with Discussion). In Bernardo, J. M., Bayarri, M., Berger, J. O., Dawid, A. P., Heckerman, D., and Smith, A. F. M. (eds.), Bayesian Statistics, volume 7, Oxford: Oxford University Press.

Clyde, M. (1999). Bayesian Model Averaging and Model Search Strategies (with Discussion). In Bernardo, J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M. (eds.), Bayesian Statistics, volume 6, Oxford: Oxford University Press.

Clyde, M. and George, E. I. (2004). Model Uncertainty. Statistical Science, 19.

Dellaportas, P. and Forster, J. J. (1999). Markov Chain Monte Carlo Model Determination for Hierarchical and Graphical Log-linear Models. Biometrika, 86.

DiCiccio, T. J., Kass, R. E., Raftery, A., and Wasserman, L. (1997). Computing Bayes Factors by Combining Simulation and Asymptotic Approximations. Journal of the American Statistical Association, 92.

Dixon, W. J. and Massey, F. J. (1983). Introduction to Statistical Analysis. New York: McGraw-Hill, fourth edition.
