Variable Selection for Bayesian Linear Regression Model in a Finite Sample Size


Satoshi KABE¹ and Yuichiro KANAZAWA², Graduate School of Systems and Information Engineering, University of Tsukuba

1 Introduction

Data analysis often involves a comparison of several candidate models. Because the true model is seldom known a priori, there is a need for simple, effective, and objective methods for selecting the best approximating model. Akaike (1973) proposed an information criterion, later known as Akaike's information criterion (AIC), based on the concept of minimizing the Kullback-Leibler Information, a measure of discrepancy between the true density (or model) and an approximating model: the discrepancy between two models, or two probability densities, is expressed by the expected $\log$-likelihood with respect to the true density. AIC is designed to be an approximately unbiased estimator of the expected $\log$-likelihood under the assumption that the true model is one of the candidate models being considered. Hurvich and Tsai (1989) illustrated that AIC can be dramatically biased when the sample size is small by comparing AIC with the finite-sample bias-corrected version of AIC ($AIC_C$) proposed by Sugiura (1978). This criterion adjusts AIC to be an exactly unbiased estimator of the expected $\log$-likelihood. They then extended $AIC_C$ so that it can be employed for autoregressive (AR) and autoregressive moving average (ARMA) models. Later, Bedrick and Tsai (1994) further extended Sugiura (1978) to multivariate regression cases where the response variable is expressed by a matrix.

In fully Bayesian data analysis, the deviance information criterion (DIC) is widely used for model selection. Spiegelhalter et al. (2002) proposed a Bayesian measure of model complexity (i.e., the effective number of parameters $p_D$) obtained from the difference between the posterior mean of the deviance and the deviance at the posterior mean of the parameters. When the number of data is sufficiently large, DIC is given by adding $p_D$ to the posterior mean of the deviance. Spiegelhalter et al. (2002) gave an asymptotic justification of DIC in cases where the number of observations is large relative to the number of parameters. In a discussion of Spiegelhalter et al. (2002), Burnham (2002) questioned: "[A] lesson that we learned was that, if sample size $n$ is small or the number of estimated parameters $p$ is large relative to $n$, a modified AIC should be used, such as $AIC_C = AIC + 2p(p+1)/(n-p-1)$. I wonder whether DIC needs such a modification or if it really automatically adjusts for a small sample size or large $p$, relative to $n$." We write this article to answer this question, at least partially, for the very important case of linear regression. More concretely,

¹E-mail: k @sk.tsukuba.ac.jp. ²E-mail: kanazawa@sk.tsukuba.ac.jp.

we propose a finite-sample bias-corrected information criterion for Bayesian linear regression models with normally distributed errors, and then implement simulation studies in which the sample size is relatively small.

The rest of this article is organized as follows: the next section briefly describes the Bayesian linear regression model. In Section 3, we propose our information criterion for the Bayesian linear regression model. Section 4 shows the results of simulation studies that demonstrate the validity of our proposed information criterion when the sample size is small. Finally, Section 5 draws some conclusions concerning our proposed criterion.

2 Bayesian Linear Regression Model

We consider the linear regression model

$y = X\beta + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2 I_N)$ (2.1)

where $y$ is an $N \times 1$ vector, $X$ is an $N \times K$ non-stochastic matrix, the parameter vector $\beta$ is $K \times 1$, and the error term $\epsilon$ follows an $N$-dimensional multivariate normal distribution $\mathcal{N}(0, \sigma^2 I_N)$. We assume that the prior distribution of $\beta$ is a $K$-dimensional multivariate normal distribution and that of $\sigma^{-2}$ is a gamma distribution:

$\beta \mid \sigma^{-2} \sim \mathcal{N}(b_0, \sigma^2 B_0)$ (2.2)

$\sigma^{-2} \sim \mathcal{G}\left(\frac{\nu_0}{2}, \frac{\lambda_0}{2}\right)$. (2.3)

Then the posterior distributions of the parameters $\beta$ and $\sigma^{-2}$ are expressed as

$\beta \mid \sigma^{-2}, y, X \sim \mathcal{N}(b_1, \sigma^2 B_1)$ (2.4)

$\sigma^{-2} \mid y, X \sim \mathcal{G}\left(\frac{\nu_1}{2}, \frac{\lambda_1}{2}\right)$ (2.5)

where $b_1 = B_1(X'y + B_0^{-1}b_0)$, $B_1 = (X'X + B_0^{-1})^{-1}$, $\nu_1 = \nu_0 + N$, $\lambda_1 = \lambda_0 + (y - X\hat{\beta}_N)'(y - X\hat{\beta}_N) + (b_0 - \hat{\beta}_N)'[(X'X)^{-1} + B_0]^{-1}(b_0 - \hat{\beta}_N)$, and $\hat{\beta}_N = (X'X)^{-1}X'y$.
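The conjugate updates in (2.4)-(2.5) are straightforward to compute. The following is a minimal numerical sketch in numpy; the function name and interface are ours, not the paper's, and it assumes the $\nu_1 = \nu_0 + N$ update stated above.

```python
import numpy as np

def posterior_update(y, X, b0, B0, nu0, lam0):
    """Conjugate update for the prior (2.2)-(2.3); returns the
    posterior parameters (b1, B1, nu1, lam1) of (2.4)-(2.5)."""
    N = X.shape[0]
    B0_inv = np.linalg.inv(B0)
    B1 = np.linalg.inv(X.T @ X + B0_inv)          # B1 = (X'X + B0^{-1})^{-1}
    b1 = B1 @ (X.T @ y + B0_inv @ b0)             # b1 = B1 (X'y + B0^{-1} b0)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimator beta_hat_N
    resid = y - X @ beta_hat
    d = b0 - beta_hat
    nu1 = nu0 + N
    lam1 = (lam0 + resid @ resid
            + d @ np.linalg.solve(np.linalg.inv(X.T @ X) + B0, d))
    return b1, B1, nu1, lam1
```

Posterior draws then follow by sampling $\sigma^{-2} \sim \mathcal{G}(\nu_1/2, \lambda_1/2)$ and, given each draw, $\beta \sim \mathcal{N}(b_1, \sigma^2 B_1)$.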

3 Finite-Sample Bias Correction and Variable Selection Criterion

Let us denote the unknown true density by $f_Y(\cdot)$ and the approximating model by $g(\cdot \mid \theta)$ with parameter vector $\theta$. Then the Kullback-Leibler Information between $f_Y(\cdot)$ and $g(\cdot \mid \theta)$ can be expressed as

$I(f_Y, g(\cdot \mid \theta)) = \int f_Y(z) \log\left\{\frac{f_Y(z)}{g(z \mid \theta)}\right\} dz$ (3.1)

and (3.1) can be rewritten as

$I(f_Y, g(\cdot \mid \theta)) = E_z[\log\{f_Y(z)\}] - E_z[\log\{g(z \mid \theta)\}]$. (3.2)

Even though the true density $f_Y(\cdot)$ is unknown, the first term on the right-hand side of the Kullback-Leibler Information in (3.2) can be regarded as a constant since the variable $z$ is integrated out. We therefore select the best approximating model by maximizing the expected $\log$-likelihood. From the Bayesian perspective, the parameters follow the posterior distributions estimated from the observed data $y$. Hence we consider the posterior mean of (3.2):

$E_{\theta \mid y}[I(f_Y, g(\cdot \mid \theta))] = E_z[\log\{f_Y(z)\}] - E_{\theta \mid y}[E_z[\log\{g(z \mid \theta)\}]]$ (3.3)

and, as in Spiegelhalter et al. (2002) and Ando (2007), our proposed information criterion is constructed from the posterior mean of the expected $\log$-likelihood.

In the Bayesian linear regression model (2.1), we use $y$ as observed data of sample size $N$ obtained from the unknown true density $f_Y(y)$ to estimate the posterior distributions of the parameters $\beta$ and $\sigma^{-2}$, while we use $z$ as replicate data of sample size $N$ generated from the unknown true density $f_Y(z)$ to evaluate the goodness of fit of the approximating model $g(z \mid X, \beta, \sigma^{-2})$. The posterior mean of the expected $\log$-likelihood $\mathcal{T}$ is then defined as

$\mathcal{T} \equiv E_{\beta, \sigma^{-2} \mid y, X}[E_z[\log\{g(z \mid X, \beta, \sigma^{-2})\}]]$ (3.4)

where the expectation $E_{\beta, \sigma^{-2} \mid y, X}[\cdot]$ can be calculated as $E_{\sigma^{-2} \mid y, X}[E_{\beta \mid \sigma^{-2}, y, X}[\cdot]]$ from the posterior distributions (2.4) and (2.5). To estimate $\mathcal{T}$ in (3.4), we use the posterior mean of the observed $\log$-likelihood $\hat{\mathcal{T}}_N$:

$\hat{\mathcal{T}}_N \equiv E_{\beta, \sigma^{-2} \mid y, X}[\log\{g(y \mid X, \beta, \sigma^{-2})\}]$. (3.5)

The bias-corrected estimator is obtained as $\hat{\mathcal{T}}_N - \hat{b}_N$, where $\hat{b}_N$ is an estimate of the bias $b_\theta \equiv E_y[\hat{\mathcal{T}}_N - \mathcal{T}] \neq 0$. We then propose an information criterion (IC) of the form

$IC \equiv -2\hat{\mathcal{T}}_N + 2\hat{b}_N$. (3.6)

Ignoring the constant term, we can express the $\log$-likelihood function for the replicate data $z$ as

$\log\{g(z \mid X, \beta, \sigma^{-2})\} = \frac{N}{2}\log\sigma^{-2} - \frac{\sigma^{-2}}{2}(z - X\beta)'(z - X\beta)$ (3.7)

where the parameters $\beta$ and $\sigma^{-2}$ follow the posterior distributions estimated from the observed data $y$ and $X$. The posterior mean of the expected $\log$-likelihood $\mathcal{T}$ in (3.4) is then rewritten as

$\mathcal{T} = E_{\beta, \sigma^{-2} \mid y, X}\left[E_z\left[\frac{N}{2}\log\sigma^{-2} - \frac{\sigma^{-2}}{2}(z - X\beta)'(z - X\beta)\right]\right]$
$= E_{\beta, \sigma^{-2} \mid y, X}\left[\frac{N}{2}\log\sigma^{-2} - \frac{\sigma^{-2}}{2}(y - X\beta)'(y - X\beta)\right]$
$\quad - E_{\beta, \sigma^{-2} \mid y, X}\left[E_z\left[\frac{\sigma^{-2}}{2}(z - X\beta)'(z - X\beta)\right]\right]$
$\quad + E_{\beta, \sigma^{-2} \mid y, X}\left[\frac{\sigma^{-2}}{2}(y - X\beta)'(y - X\beta)\right]$

$= \hat{\mathcal{T}}_N - C_1 + C_2$ (3.8)

and the bias with respect to the true density $f_Y(y)$ is obtained as $b_\theta \equiv E_y[\hat{\mathcal{T}}_N - \mathcal{T}] = E_y[C_1 - C_2]$.

First we evaluate $C_1$ in (3.8). The true density $f_Y(z)$ is seldom known in practice, so the expectation with respect to the true density cannot be obtained analytically. In previous studies, Kitagawa (1997) replaced the unknown true density by the prior predictive density to construct the predictive information criterion (PIC) for the Bayesian linear Gaussian model, while Laud and Ibrahim (1995), Gelfand and Ghosh (1998), and Ibrahim et al. (2001) considered using the posterior predictive density to generate the replicate data $z$ for model assessment. In this article, we use the posterior predictive density to evaluate the expectation with respect to the true density $f_Y(z)$ in $C_1$, because the prior predictive density is far more sensitive to the selection of the prior distribution.

To evaluate $C_1$ in (3.8), we replace the true density $f_Y(\cdot)$ with the (conditional) posterior predictive density

$z \mid \sigma^{-2}, y, X \sim \mathcal{N}(Xb_1, \sigma^2 \Sigma_0)$, (3.9)

where $\sigma^{-2}$ follows the posterior distribution (2.5) and $\Sigma_0 = I_N + XB_1X'$. From (3.9), we estimate $C_1$ in (3.8) as

$\hat{C}_1 = E_{\beta, \sigma^{-2} \mid y, X}\left[E_{z \mid \sigma^{-2}, y, X}\left[\frac{\sigma^{-2}}{2}(z - X\beta)'(z - X\beta)\right]\right]$
$= E_{\beta, \sigma^{-2} \mid y, X}\left[E_{z \mid \sigma^{-2}, y, X}\left[\frac{\sigma^{-2}}{2}(z - Xb_1)'(z - Xb_1)\right]\right]$
$\quad + E_{\sigma^{-2} \mid y, X}\left[\mathrm{tr}\left\{\frac{\sigma^{-2}}{2}(X'X) E_{\beta \mid \sigma^{-2}, y, X}[(\beta - b_1)(\beta - b_1)']\right\}\right]$. (3.10)

The first term on the right-hand side of (3.10) can be rewritten as

$E_{\beta, \sigma^{-2} \mid y, X}\left[E_{z \mid \sigma^{-2}, y, X}\left[\frac{\sigma^{-2}}{2}(z - Xb_1)'(z - Xb_1)\right]\right]$
$= E_{\sigma^{-2} \mid y, X}\left[\frac{\sigma^{-2}}{2}\,\mathrm{tr}\{E_{z \mid \sigma^{-2}, y, X}[(z - Xb_1)(z - Xb_1)']\}\right]$
$= E_{\sigma^{-2} \mid y, X}\left[\frac{\sigma^{-2}}{2}\,\mathrm{tr}\{\sigma^2 \Sigma_0\}\right]$
$= \frac{1}{2}\,\mathrm{tr}\{I_N + XB_1X'\}$
$= \frac{N}{2} + \frac{1}{2}\,\mathrm{tr}\{(X'X)B_1\}$. (3.11)

The second term on the right-hand side of (3.10) is similarly obtained from the posterior distribution (2.4):

$E_{\sigma^{-2} \mid y, X}\left[\mathrm{tr}\left\{\frac{\sigma^{-2}}{2}(X'X) E_{\beta \mid \sigma^{-2}, y, X}[(\beta - b_1)(\beta - b_1)']\right\}\right]$
$= E_{\sigma^{-2} \mid y, X}\left[\mathrm{tr}\left\{\frac{\sigma^{-2}}{2}(X'X)\sigma^2 B_1\right\}\right]$
$= \frac{1}{2}\,\mathrm{tr}\{(X'X)B_1\}$. (3.12)
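Replicate data can be drawn directly from (3.9); a small sketch, assuming the hypothetical `posterior_update` helper from Section 2 (all names ours):

```python
import numpy as np

def draw_replicate(X, b1, B1, nu1, lam1, rng=None):
    """One draw of z from the posterior predictive (3.9):
    z | sig2_inv ~ N(X b1, sig2 * (I_N + X B1 X'))."""
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    sig2_inv = rng.gamma(nu1 / 2.0, 2.0 / lam1)   # sigma^{-2} from (2.5)
    Sigma0 = np.eye(N) + X @ B1 @ X.T
    L = np.linalg.cholesky(Sigma0 / sig2_inv)     # Cholesky of sig2 * Sigma0
    return X @ b1 + L @ rng.standard_normal(N)
```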

From (3.11) and (3.12), $\hat{C}_1$ in (3.10) is evaluated as

$\hat{C}_1 = \frac{N}{2} + \frac{1}{2}\,\mathrm{tr}\{(X'X)B_1\} + \frac{1}{2}\,\mathrm{tr}\{(X'X)B_1\} = \frac{N}{2} + \mathrm{tr}\{(X'X)B_1\}$. (3.13)

We then notice that $\hat{C}_1$ does not depend on the data $y$, i.e., $E_y(\hat{C}_1) = \hat{C}_1$.

Supposing that the interchange of the order of integrations is valid, we can rewrite $E_y(C_2)$ as

$E_y(C_2) = E_y\left[E_{\beta, \sigma^{-2} \mid y, X}\left[\frac{\sigma^{-2}}{2}(y - X\beta)'(y - X\beta)\right]\right] = E_{\beta, \sigma^{-2}}\left[E_{y \mid X, \beta, \sigma^{-2}}\left[\frac{\sigma^{-2}}{2}(y - X\beta)'(y - X\beta)\right]\right]$ (3.14)

where $E_{\beta, \sigma^{-2}}[\cdot]$ is the expectation with respect to the joint prior distribution and $E_{y \mid X, \beta, \sigma^{-2}}[\cdot]$ is the expectation with respect to the $N$-dimensional multivariate normal distribution with mean vector $X\beta$ and variance-covariance matrix $\sigma^2 I_N$. Since $E_{y \mid X, \beta, \sigma^{-2}}[(y - X\beta)(y - X\beta)'] = \sigma^2 I_N$, $E_y(C_2)$ in (3.14) can be evaluated as

$E_y(C_2) = E_{\beta, \sigma^{-2}}\left[E_{y \mid X, \beta, \sigma^{-2}}\left[\mathrm{tr}\left\{\frac{\sigma^{-2}}{2}(y - X\beta)(y - X\beta)'\right\}\right]\right] = \frac{1}{2}\,\mathrm{tr}\{I_N\} = \frac{N}{2}$. (3.15)

Therefore $\hat{b}_N$ is given by

$\hat{b}_N = E_y[\hat{C}_1 - C_2] = \hat{C}_1 - E_y(C_2) = \frac{N}{2} + \mathrm{tr}\{(X'X)B_1\} - \frac{N}{2} = \mathrm{tr}\{(X'X)B_1\}$. (3.16)

The bias $\hat{b}_N$ in (3.16) can be regarded as a ratio of the variance-covariance matrices $\sigma^2(X'X)^{-1}$ and $\sigma^2 B_1$. Multiplying the bias-corrected estimator $\hat{\mathcal{T}}_N - \hat{b}_N$ by $-2$, our proposed information criterion for variable selection in the Bayesian linear regression model ($IC_{BL}$) is obtained as

$IC_{BL} = -2\hat{\mathcal{T}}_N + 2\,\mathrm{tr}\{(X'X)B_1\}$ (3.17)

where $B_1 = (X'X + B_0^{-1})^{-1}$.
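Since $\hat{\mathcal{T}}_N$ in (3.5) has no closed form here, it is naturally estimated by averaging the $\log$-likelihood (3.7) over posterior draws, while the bias term in (3.16) is available in closed form. A minimal sketch of (3.17), reusing the hypothetical `posterior_update` helper above; all names are ours:

```python
import numpy as np

def log_lik(y, X, beta, sig2_inv):
    """Log-likelihood (3.7), additive constant dropped."""
    r = y - X @ beta
    return 0.5 * len(y) * np.log(sig2_inv) - 0.5 * sig2_inv * (r @ r)

def ic_bl(y, X, b1, B1, nu1, lam1, n_draws=50_000, rng=None):
    """IC_BL in (3.17): -2 * T_hat_N + 2 * tr{(X'X) B1}, with T_hat_N
    estimated by Monte Carlo over the posterior (2.4)-(2.5)."""
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(B1)
    T_hat = 0.0
    for _ in range(n_draws):
        sig2_inv = rng.gamma(nu1 / 2.0, 2.0 / lam1)  # rate lam1/2 -> scale 2/lam1
        beta = b1 + L @ rng.standard_normal(len(b1)) / np.sqrt(sig2_inv)
        T_hat += log_lik(y, X, beta, sig2_inv)
    T_hat /= n_draws
    bias = np.trace((X.T @ X) @ B1)                  # b_hat_N in (3.16)
    return -2.0 * T_hat + 2.0 * bias
```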

For simplicity, let us set the prior parameter in (2.2) to $B_0 = \kappa_0 I_K$ ($\kappa_0 > 0$). Then, via the matrix inversion lemma,³ the bias term $\hat{b}_N$ can be rewritten as

$\hat{b}_N = K - \mathrm{tr}\{(X'X B_0 + I_K)^{-1}\}$. (3.18)

If the sample size $N \to \infty$, the last term in (3.18) is expected to vanish because $\mathrm{tr}\{(\kappa_0 X'X/N + I_K/N)^{-1}/N\} \to 0$ (i.e., $\hat{b}_N \to K$) as long as no element of $X'X/N$ diverges. Furthermore, if $\kappa_0$ is sufficiently large (i.e., a non-informative prior), we also have $\mathrm{tr}\{(\kappa_0 X'X + I_K)^{-1}\} \to 0$ in (3.18).

4 Simulation Experiments

4.1 Deviance Information Criterion (DIC)

We conduct two simulation studies to compare the small-sample performance of our proposed information criterion ($IC_{BL}$) in (3.17) with that of the deviance information criterion (DIC):

$DIC = -2\hat{\mathcal{T}}_N + p_D$, (4.1)

where Spiegelhalter et al. (2002) termed $p_D$ the effective number of parameters, defined as $p_D \equiv 2\log\{g(y \mid X, \bar{\beta}, \bar{\sigma}^{-2})\} - 2\hat{\mathcal{T}}_N$, evaluated at the posterior means of the parameters $\bar{\beta}\,(= b_1)$ and $\bar{\sigma}^{-2}\,(= \nu_1/\lambda_1)$ from (2.4) and (2.5).

4.2 Simulation studies

In this article we present two small-sample simulation studies: the first is designed to examine cases where the set of candidate models includes many over-specified models; the second examines performance with respect to under-specified candidate models.

Case 1. As in Hurvich and Tsai (1989), we consider nested candidate models built from seven explanatory variables $x_1, x_2, \ldots, x_7$. In our simulation study, $x_1$ is an $N \times 1$ vector whose elements are ones (i.e., the intercept term) and the other $N \times 1$ vectors $x_i$ ($2 \leq i \leq 7$) are generated from the uniform distribution $\mathcal{U}(-2, 2)$. These variables $x_1, x_2, \ldots, x_7$ are included in the candidate models in a sequentially nested fashion. The candidate models are linear regression models

$y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2 I_N)$.

The candidate model with $K = 1$ has only the intercept term, and that with $K = 7$ is the full model. In this simulation study, we determine the number of variables $K$ by using our proposed information criterion ($IC_{BL}$) in (3.17) and DIC in (4.1) in the small-sample cases $N = 25, 50, 100$ with informative ($\kappa_0 = 0.1$) and non-informative ($\kappa_0 = 100$) priors.

³For any matrices $A$ ($m \times m$), $B$ ($m \times n$), $C$ ($n \times m$), and $D$ ($n \times n$), where $A$ and $D$ are nonsingular, $(A - BD^{-1}C)^{-1} = A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1}$.
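For comparison, DIC in (4.1) can be computed from the same posterior draws. A sketch under the same assumptions as above, plugging in $\bar{\beta} = b_1$ and $\bar{\sigma}^{-2} = \nu_1/\lambda_1$; `log_lik` is the helper defined in the $IC_{BL}$ sketch:

```python
import numpy as np

def dic(y, X, b1, B1, nu1, lam1, n_draws=50_000, rng=None):
    """DIC in (4.1): -2 * T_hat_N + p_D, where
    p_D = 2 log g(y | X, beta_bar, sig2inv_bar) - 2 * T_hat_N."""
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(B1)
    T_hat = 0.0
    for _ in range(n_draws):
        sig2_inv = rng.gamma(nu1 / 2.0, 2.0 / lam1)
        beta = b1 + L @ rng.standard_normal(len(b1)) / np.sqrt(sig2_inv)
        T_hat += log_lik(y, X, beta, sig2_inv)
    T_hat /= n_draws
    p_d = 2.0 * log_lik(y, X, b1, nu1 / lam1) - 2.0 * T_hat
    return -2.0 * T_hat + p_d
```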

To examine the small-sample performance in the Bayesian linear regression cases, we generate samples of $y$ from the true model ($K = 3$):

$y = 1.0x_1 + 2.0x_2 + 3.0x_3 + \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1.0\,I_N)$. (4.2)

In Table 1, we examine the performance of $IC_{BL}$ and DIC for the small-sample cases ($N = 25, 50, 100$). The parameters of the prior distributions in (2.2) and (2.3) are set to $b_0 = 0$, $B_0 = \kappa_0 I_K$ ($\kappa_0 = 0.1$ or $100$), and $\nu_0 = \lambda_0 = 0.1$. The simulation considers each combination of $N = 25, 50, 100$ and $\kappa_0 = 0.1, 100$. We draw 50,000 MCMC samples from the posterior distributions (2.4) and (2.5) to compute the posterior mean of the observed $\log$-likelihood $\hat{\mathcal{T}}_N$ in (3.5). For each combination of $(N, \kappa_0)$, we generate 100 observations of $IC_{BL}$ and DIC and record the number of times each candidate model is selected (i.e., attains the minimum value of each criterion).

Table 1 shows that our proposed information criterion ($IC_{BL}$) identifies the true model ($K = 3$) in the small-sample cases ($N = 25, 50, 100$) with the informative prior ($\kappa_0 = 0.1$) far better than DIC; DIC, on the other hand, tends to overfit the true model. For the non-informative prior ($\kappa_0 = 100$), both criteria tend to overfit the true model for sample sizes $N = 25$ and $50$, but our proposed criterion nevertheless far outperforms DIC at sample size $N = 100$.

In Tables 2 and 3, we show the averages of the criteria over the 100 observations for the small-sample cases ($N = 25, 50, 100$) with informative ($\kappa_0 = 0.1$) and non-informative ($\kappa_0 = 100$) priors. Both criteria selected the true model ($K = 3$) in all cases, but the difference between $2\hat{b}_N$ and $p_D$ becomes more apparent as the number of explanatory variables increases. Hence the effective number of parameters $p_D$ in DIC tends to underestimate the model complexity compared with the bias term $2\hat{b}_N$ in $IC_{BL}$.

Case 2. Next we consider cases where the explanatory variables in the candidate models are selected from the subsets of $\{x_1, x_2, x_3\}$ (i.e., $\{x_1\}$, $\{x_2\}$, $\{x_3\}$, $\{x_1, x_2\}$, $\{x_1, x_3\}$, $\{x_2, x_3\}$, and $\{x_1, x_2, x_3\}$). We then specify the true model as

$y = 1.0x_1 + 2.0x_2 + \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1.0\,I_N)$. (4.3)

We implement a simulation study to examine the performance of $IC_{BL}$ in (3.17) and DIC in (4.1) for under-specified candidate models in small-sample cases. The parameters of the prior distributions in (2.2) and (2.3) are set to $b_0 = 0$, $B_0 = \kappa_0 I_K$ ($\kappa_0 = 0.1$ or $100$), and $\nu_0 = \lambda_0 = 0.1$. We draw 50,000 MCMC samples from the posterior distributions (2.4) and (2.5) and compute the posterior mean of the observed $\log$-likelihood $\hat{\mathcal{T}}_N$ in (3.5). For each combination of $(N, \kappa_0)$, we generate 100 observations of $IC_{BL}$ and DIC and record the number of selected models as in Case 1; the replication loop is sketched below.

Table 4 shows that DIC tends to select the full model (i.e., $\{x_1, x_2, x_3\}$) compared with $IC_{BL}$ in small-sample cases. On the other hand, the under-specified candidate models (i.e., $\{x_1\}$, $\{x_2\}$, $\{x_3\}$, $\{x_1, x_3\}$, $\{x_2, x_3\}$) are not selected at all. Hence, DIC shows a tendency to overfit the true model in this simulation study.
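The replication design just described can be sketched as a short driver for Case 2. This is our own illustrative reconstruction, assuming the hypothetical helpers above, that $x_2$ and $x_3$ are drawn from $\mathcal{U}(-2, 2)$ as in Case 1, and a reduced number of posterior draws for speed (the paper uses 50,000):

```python
import numpy as np
from itertools import combinations

def run_case2(N=25, kappa0=0.1, n_reps=100, rng=None):
    """Tally how often each candidate subset of {x1, x2, x3}
    minimizes IC_BL when the data come from the true model (4.3)."""
    rng = np.random.default_rng(rng)
    subsets = [s for r in (1, 2, 3) for s in combinations(range(3), r)]
    counts = dict.fromkeys(subsets, 0)
    for _ in range(n_reps):
        # x1 = intercept; x2, x3 ~ U(-2, 2); y from (4.3)
        X_full = np.column_stack([np.ones(N), rng.uniform(-2, 2, (N, 2))])
        y = 1.0 * X_full[:, 0] + 2.0 * X_full[:, 1] + rng.standard_normal(N)
        scores = {}
        for s in subsets:
            X = X_full[:, list(s)]
            K = X.shape[1]
            b1, B1, nu1, lam1 = posterior_update(
                y, X, np.zeros(K), kappa0 * np.eye(K), 0.1, 0.1)
            scores[s] = ic_bl(y, X, b1, B1, nu1, lam1, n_draws=5_000)
        counts[min(scores, key=scores.get)] += 1
    return counts
```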
In Tables 5 and 6, the average values of the model complexity terms in $IC_{BL}$ and DIC for the candidate models $\{x_1, x_2\}$, $\{x_1, x_3\}$, and $\{x_2, x_3\}$ are close to each other. However,

the true model (i.e., $\{x_1, x_2\}$) has the largest posterior mean of the observed $\log$-likelihood $\hat{\mathcal{T}}_N$ among the three candidate models. Therefore, in terms of average values, both $IC_{BL}$ and DIC successfully select the true model (4.3) in all cases.

5 Conclusion and Discussion

In Bayesian data analysis, DIC (Spiegelhalter et al., 2002) is widely used for model selection, since this criterion is relatively easy to calculate and applicable to a wide range of statistical models. Spiegelhalter et al. (2002) gave an asymptotic justification of DIC in cases where the number of observations is large relative to the number of parameters. However, Burnham (2002) questioned whether DIC needs a modification for small sample sizes. In this article, we proposed a variable selection criterion $IC_{BL}$ for Bayesian linear regression models in small-sample cases, much as Sugiura (1978) proposed $AIC_C$ in the frequentist framework. We then examined the performance of our proposed information criterion ($IC_{BL}$) relative to DIC in small-sample cases.

In our simulation studies, we found that DIC often showed a tendency to overfit the true model (see Tables 1 and 4), whereas our proposed information criterion ($IC_{BL}$) performs well in small-sample cases ($N = 25, 50, 100$). We also found that the measure of model complexity $2\hat{b}_N$ was mostly larger than $p_D$ (see Tables 2 and 3), leading us to conclude that the bias correction in DIC underestimates the model complexity in small-sample cases.

An important direction for further research would be to extend our information criterion to several other types of Bayesian regression models (e.g., hierarchical Bayesian regression models, Bayesian regression models with serially correlated errors, and Bayesian Markov-switching models), because empirical analysis is generally performed under the limitation of data availability.

[Tables 1-6: selection frequencies and average values of $IC_{BL}$ (with $2\hat{b}_N$) and DIC (with $p_D$) for each combination of $N \in \{25, 50, 100\}$ and $\kappa_0 \in \{0.1, 100\}$; the table bodies are not recoverable from the scanned source.]

References

Akaike, H. (1973). Information theory as an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, Akademiai Kiado, Budapest.

Ando, T. (2007). Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika 94(2), 443-458.

Bedrick, E. J., and C.-L. Tsai (1994). Model selection for multivariate regression in small samples. Biometrics.

Burnham, K. P. (2002). Discussion of a paper by D. J. Spiegelhalter et al. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 64(4), 629.

Gelfand, A. E., and S. K. Ghosh (1998). Model choice: a minimum posterior predictive loss approach. Biometrika 85(1), 1-11.

Hurvich, C. M., and C.-L. Tsai (1989). Regression and time series model selection in small samples. Biometrika 76(2), 297-307.

Ibrahim, J. G., M.-H. Chen, and D. Sinha (2001). Criterion-based methods for Bayesian model assessment. Statistica Sinica 11(2).

Kitagawa, G. (1997). Information criteria for the predictive evaluation of Bayesian models. Communications in Statistics - Theory and Methods 26(9).

Laud, P. W., and J. G. Ibrahim (1995). Predictive model selection. Journal of the Royal Statistical Society, Series B (Methodological).

Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 64(4), 583-639.

Sugiura, N. (1978). Further analysis of the data by Akaike's information criterion and the finite corrections. Communications in Statistics - Theory and Methods 7(1), 13-26.
