Variable Selection for Bayesian Linear Regression Model in a Finite Sample Size
Satoshi KABE (1) and Yuichiro KANAZAWA (2)
Graduate School of Systems and Information Engineering, University of Tsukuba

1 Introduction

Data analysis often involves a comparison of several candidate models. Because the true model is seldom known a priori, there is a need for simple, effective, and objective methods for selecting the best approximating model. Akaike (1973) proposed an information criterion, later known as Akaike's information criterion (AIC), based on the concept of minimizing the Kullback-Leibler Information, a measure of discrepancy between the true density (or model) and an approximating model: the discrepancy between two models, or two probability densities, is expressed by the expected $\log$-likelihood with respect to the true density. AIC is designed to be an approximately unbiased estimator of the expected $\log$-likelihood under the assumption that the true model is one of the candidate models being considered. Hurvich and Tsai (1989) illustrated that AIC can be dramatically biased when the sample size is small by comparing AIC with the finite-sample bias-corrected version of AIC ($AIC_{C}$) proposed by Sugiura (1978). This criterion adjusts AIC to be an exact unbiased estimator of the expected $\log$-likelihood. They then extended $AIC_{C}$ so that it can be employed for autoregressive (AR) and autoregressive moving average (ARMA) models. Later, Bedrick and Tsai (1994) further extended Sugiura (1978) to multivariate regression cases where the response variable is expressed by a matrix. In fully Bayesian data analysis, the deviance information criterion (DIC) is widely used for model selection. Spiegelhalter et al.
(2002) proposed a Bayesian measure of model complexity (i.e., the effective number of parameters $p_{D}$) obtained from the difference between the posterior mean of the deviance and the deviance at the posterior mean of the parameters. When the number of observations is sufficiently large, DIC is given by adding $p_{D}$ to the posterior mean of the deviance. Spiegelhalter et al. (2002) gave an asymptotic justification of DIC in cases where the number of observations is large relative to the number of parameters. In a discussion of Spiegelhalter et al. (2002), Burnham (2002) questioned this: "[A] lesson that we learned was that, if sample size $n$ is small or the number of estimated parameters $p$ is large relative to $n$, a modified AIC should be used, such as $AIC_{C}=AIC+2p(p+1)/(n-p-1)$. I wonder whether DIC needs such a modification or if it really automatically adjusts for a small sample size or large $p$, relative to $n$." We write this article to answer this question, at least partially, for the very important case of linear regression. More concretely,

(1) E-mail: k @sk.tsukuba.ac.jp.
(2) E-mail: kanazawa@sk.tsukuba.ac.jp.
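The $AIC_{C}$ correction quoted in Burnham's discussion above is easy to compute; the following is a minimal sketch (the function name and the numeric inputs are our own, not from the paper) showing how quickly the correction term shrinks as $n$ grows:

```python
import numpy as np

def aic_c(log_lik_max, p, n):
    """Small-sample corrected AIC: AIC + 2p(p+1)/(n-p-1)
    (Sugiura, 1978; Hurvich and Tsai, 1989).

    log_lik_max : maximized log-likelihood of the fitted model
    p           : number of estimated parameters
    n           : sample size
    """
    aic = -2.0 * log_lik_max + 2.0 * p
    return aic + 2.0 * p * (p + 1) / (n - p - 1)

# With p = 4 parameters the extra penalty over AIC is
# 2*4*5/(25-4-1) = 2.0 at n = 25, but only 40/995 at n = 1000.
print(aic_c(0.0, 4, 25))    # -> 10.0
print(aic_c(0.0, 4, 1000))  # -> 8.040...
```

This is precisely the kind of correction whose Bayesian analogue the present article develops for DIC.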
we propose a finite-sample bias-corrected information criterion for the Bayesian linear regression model with normally distributed errors, and then we implement simulation studies when the sample size is relatively small. The rest of this article is organized as follows: the next section briefly describes the Bayesian linear regression model. In Section 3, we propose our information criterion for the Bayesian linear regression model. Section 4 shows the results of simulation studies demonstrating the validity of our proposed information criterion when the sample size is small. Finally, Section 5 draws some conclusions concerning our proposed criterion.

2 Bayesian Linear Regression Model

We consider the linear regression model

$y=X\beta+\epsilon, \quad \epsilon\sim \mathcal{N}(0, \sigma^{2}I_{N})$  (2.1)

where $y$ is an $N\times 1$ vector and $X$ is an $N\times K$ non-stochastic matrix. The parameter vector $\beta$ is a $K\times 1$ vector, and the error term $\epsilon$ follows an $N$-dimensional multivariate normal distribution $\mathcal{N}(0, \sigma^{2}I_{N})$. We assume that the prior distribution of $\beta$ is a $K$-dimensional multivariate normal distribution and that of $\sigma^{-2}$ is a gamma distribution:

$\beta \mid \sigma^{-2}\sim \mathcal{N}(b_{0}, \sigma^{2}B_{0})$  (2.2)

$\sigma^{-2}\sim \mathcal{G}\left(\frac{\nu_{0}}{2}, \frac{\lambda_{0}}{2}\right).$
(2.3)

Then the posterior distributions of the parameters $\beta$ and $\sigma^{-2}$ are expressed as

$\beta \mid \sigma^{-2}, y, X\sim \mathcal{N}(b_{1}, \sigma^{2}B_{1})$  (2.4)

$\sigma^{-2} \mid y, X\sim \mathcal{G}\left(\frac{\nu_{1}}{2}, \frac{\lambda_{1}}{2}\right)$  (2.5)

where $b_{1}=B_{1}(X'y+B_{0}^{-1}b_{0})$, $B_{1}=(X'X+B_{0}^{-1})^{-1}$, $\nu_{1}=\nu_{0}+N$, $\lambda_{1}=\lambda_{0}+(y-X\hat{\beta}_{N})'(y-X\hat{\beta}_{N})+(b_{0}-\hat{\beta}_{N})'[(X'X)^{-1}+B_{0}]^{-1}(b_{0}-\hat{\beta}_{N})$, and $\hat{\beta}_{N}=(X'X)^{-1}X'y$.

3 Finite-Sample Bias Correction and Variable Selection Criterion

Let us denote the unknown true density as $f_{Y}(\cdot)$ and the approximating model as $g(\cdot \mid \theta)$ with parameter vector $\theta$. Then the Kullback-Leibler Information between $f_{Y}(\cdot)$ and $g(\cdot \mid \theta)$ can be expressed as

$I(f_{Y}, g(\cdot \mid \theta))=\int f_{Y}(z)\log\left\{\frac{f_{Y}(z)}{g(z \mid \theta)}\right\}dz$  (3.1)

and (3.1) can be rewritten as

$I(f_{Y}, g(\cdot \mid \theta))=E_{z}[\log\{f_{Y}(z)\}]-E_{z}[\log\{g(z \mid \theta)\}].$  (3.2)
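The posterior updates for (2.4) and (2.5) are closed-form, so they can be sketched directly in code. The following is a minimal illustration under the standard conjugate update (function and variable names are our own; the simulated data are purely illustrative):

```python
import numpy as np

def posterior_params(y, X, b0, B0, nu0, lam0):
    """Conjugate posterior updates for the Bayesian linear model (2.1)-(2.3).

    Returns b1, B1 (posterior mean and scale of beta given sigma^{-2})
    and nu1, lam1 (posterior gamma parameters of sigma^{-2})."""
    XtX = X.T @ X
    B1 = np.linalg.inv(XtX + np.linalg.inv(B0))
    b1 = B1 @ (X.T @ y + np.linalg.solve(B0, b0))
    beta_hat = np.linalg.solve(XtX, X.T @ y)          # OLS estimate
    resid = y - X @ beta_hat
    d = b0 - beta_hat
    nu1 = nu0 + len(y)                                # degrees-of-freedom update
    lam1 = lam0 + resid @ resid + d @ np.linalg.solve(np.linalg.inv(XtX) + B0, d)
    return b1, B1, nu1, lam1

rng = np.random.default_rng(0)
N, K = 50, 3
X = np.column_stack([np.ones(N), rng.uniform(-2, 2, (N, K - 1))])
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=N)
b1, B1, nu1, lam1 = posterior_params(y, X, np.zeros(K), 0.1 * np.eye(K), 0.1, 0.1)
print(b1)   # posterior mean of beta (shrunk toward b0 = 0 by the prior)
```

The same `b1` and `B1` reappear below in the bias term $\mathrm{tr}\{(X'X)B_{1}\}$ of the proposed criterion.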
Even though the true density $f_{Y}(\cdot)$ is unknown, the first term on the right-hand side of the Kullback-Leibler Information in (3.2) can be regarded as a constant since the variable $z$ is integrated out. We therefore select the best approximating model by maximizing the expected $\log$-likelihood. From the Bayesian perspective, parameters follow the posterior distributions estimated from the observed data $y$. Hence we consider the posterior mean of (3.2):

$E_{\theta \mid y}[I(f_{Y}, g(\cdot \mid \theta))]=E_{z}[\log\{f_{Y}(z)\}]-E_{\theta \mid y}[E_{z}[\log\{g(z \mid \theta)\}]]$  (3.3)

and, as in Spiegelhalter et al. (2002) and Ando (2007), our proposed information criterion is constructed based on the posterior mean of the expected $\log$-likelihood. For the Bayesian linear regression model (2.1), we use $y$ as observed data of sample size $N$ obtained from the unknown true density $f_{Y}(y)$ to estimate the posterior distributions of the parameters $\beta$ and $\sigma^{-2}$, while we use $z$ as replicate data of sample size $N$ generated from the unknown true density $f_{Y}(z)$ to evaluate the goodness of fit of the approximating model $g(z \mid X,\beta, \sigma^{-2})$. Then the posterior mean of the expected $\log$-likelihood $\mathcal{T}$ is defined as

$\mathcal{T}\equiv E_{\beta,\sigma^{-2} \mid y,X}[E_{z}[\log\{g(z \mid X, \beta, \sigma^{-2})\}]]$  (3.4)

where the expectation $E_{\beta,\sigma^{-2} \mid y,X}[\cdot]=E_{\sigma^{-2} \mid y,X}[E_{\beta \mid \sigma^{-2},y,X}[\cdot]]$ can be calculated from the posterior distributions (2.4) and (2.5). To estimate the posterior mean of the expected $\log$-likelihood $\mathcal{T}$ in (3.4), we use the posterior mean of the observed $\log$-likelihood $\hat{\mathcal{T}}_{N}$:

$\hat{\mathcal{T}}_{N}\equiv E_{\beta,\sigma^{-2} \mid y,X}[\log\{g(y \mid X, \beta, \sigma^{-2})\}].$  (3.5)

The bias-corrected estimator is obtained as $\hat{\mathcal{T}}_{N}-\hat{b}_{N}$, where $\hat{b}_{N}$ is an estimate of the bias $b_{\theta}\equiv E_{y}[\hat{\mathcal{T}}_{N}-\mathcal{T}]\neq 0$.
Then we propose an information criterion (IC) of the form

$IC\equiv-2\hat{\mathcal{T}}_{N}+2\hat{b}_{N}.$  (3.6)

Ignoring the constant term, we can express the $\log$-likelihood function for the replicate data $z$ as

$\log\{g(z \mid X, \beta, \sigma^{-2})\}=\frac{N}{2}\log\sigma^{-2}-\frac{\sigma^{-2}}{2}(z-X\beta)'(z-X\beta)$  (3.7)

where the parameters $\beta$ and $\sigma^{-2}$ follow the posterior distributions estimated from the observed data $y$ and $X$. Then the posterior mean of the expected $\log$-likelihood $\mathcal{T}$ in (3.4) is rewritten as

$\mathcal{T}=E_{\beta,\sigma^{-2} \mid y,X}\left[E_{z}\left[\frac{N}{2}\log\sigma^{-2}-\frac{\sigma^{-2}}{2}(z-X\beta)'(z-X\beta)\right]\right]$

$= E_{\beta,\sigma^{-2} \mid y,X}\left[\frac{N}{2}\log\sigma^{-2}-\frac{\sigma^{-2}}{2}(y-X\beta)'(y-X\beta)\right]$

$- E_{\beta,\sigma^{-2} \mid y,X}\left[E_{z}\left[\frac{\sigma^{-2}}{2}(z-X\beta)'(z-X\beta)\right]\right]$

$+ E_{\beta,\sigma^{-2} \mid y,X}\left[\frac{\sigma^{-2}}{2}(y-X\beta)'(y-X\beta)\right]$
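The posterior mean of the observed $\log$-likelihood $\hat{\mathcal{T}}_{N}$ in (3.5) has no simple closed form, but it is easily estimated by Monte Carlo using draws from (2.4) and (2.5) plugged into the constant-free log-likelihood (3.7). A sketch, with names of our own choosing (the paper itself uses 50,000 MCMC draws):

```python
import numpy as np

def posterior_mean_loglik(y, X, b1, B1, nu1, lam1, draws=5000, rng=None):
    """Monte Carlo estimate of T_hat_N in (3.5), averaging the
    constant-free log-likelihood (3.7) over posterior draws of
    (beta, sigma^{-2}) from (2.4)-(2.5)."""
    rng = np.random.default_rng(0) if rng is None else rng
    N = len(y)
    L = np.linalg.cholesky(B1)
    total = 0.0
    for _ in range(draws):
        # sigma^{-2} ~ Gamma(nu1/2, rate lam1/2): NumPy uses shape/scale
        prec = rng.gamma(shape=nu1 / 2.0, scale=2.0 / lam1)
        sigma = prec ** -0.5
        beta = b1 + sigma * (L @ rng.normal(size=len(b1)))   # beta | sigma^{-2}
        r = y - X @ beta
        total += 0.5 * N * np.log(prec) - 0.5 * prec * (r @ r)
    return total / draws
```

Because (3.7) drops an additive constant, the resulting $\hat{\mathcal{T}}_{N}$ is only defined up to that constant, which cancels when models are compared on the same data.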
$=\hat{\mathcal{T}}_{N}-C_{1}+C_{2}$  (3.8)

and the bias $b_{\theta}$ with respect to the true density $f_{Y}(y)$ is obtained by $b_{\theta}\equiv E_{y}[\hat{\mathcal{T}}_{N}-\mathcal{T}]=E_{y}[C_{1}-C_{2}]$. First we evaluate $C_{1}$ in (3.8). However, the true density $f_{Y}(z)$ is seldom known in practice, so that the expectation with respect to the true density cannot be obtained analytically. In previous studies, Kitagawa (1997) replaced the unknown true density by the prior predictive density to construct the predictive information criterion (PIC) for the Bayesian linear Gaussian model, while Laud and Ibrahim (1995), Gelfand and Ghosh (1998), and Ibrahim et al. (2001) considered using the posterior predictive density to generate the replicate data $z$ for model assessment. In this article, we use the posterior predictive density to evaluate the expectation with respect to the true density $f_{Y}(z)$ in $C_{1}$, because the prior predictive density is far more sensitive to the selection of the prior distribution. To evaluate $C_{1}$ in (3.8), we replace the true density $f_{Y}(\cdot)$ with the (conditional) posterior predictive density

$z \mid \sigma^{-2}, y, X\sim \mathcal{N}(Xb_{1}, \sigma^{2}\Sigma_{0}),$  (3.9)

where $\sigma^{-2}$ follows the posterior distribution (2.5) and $\Sigma_{0}=I_{N}+XB_{1}X'$. From (3.9), we estimate $C_{1}$ in (3.8) as

$\hat{C}_{1}=E_{\beta,\sigma^{-2} \mid y,X}\left[E_{z \mid \sigma^{-2},y,X}\left[\frac{\sigma^{-2}}{2}(z-X\beta)'(z-X\beta)\right]\right]$

$= E_{\beta,\sigma^{-2} \mid y,X}\left[E_{z \mid \sigma^{-2},y,X}\left[\frac{\sigma^{-2}}{2}(z-Xb_{1})'(z-Xb_{1})\right]\right]$

$+ E_{\sigma^{-2} \mid y,X}\left[\mathrm{tr}\left\{\frac{\sigma^{-2}}{2}(X'X)E_{\beta \mid \sigma^{-2},y,X}[(\beta-b_{1})(\beta-b_{1})']\right\}\right].$
(3.10)

The first term on the right-hand side of (3.10) can be rewritten as

$E_{\beta,\sigma^{-2} \mid y,X}\left[E_{z \mid \sigma^{-2},y,X}\left[\frac{\sigma^{-2}}{2}(z-Xb_{1})'(z-Xb_{1})\right]\right]$

$= E_{\sigma^{-2} \mid y,X}\left[\frac{\sigma^{-2}}{2}\,\mathrm{tr}\left\{E_{z \mid \sigma^{-2},y,X}[(z-Xb_{1})(z-Xb_{1})']\right\}\right]$

$= E_{\sigma^{-2} \mid y,X}\left[\frac{\sigma^{-2}}{2}\,\mathrm{tr}\{\sigma^{2}\Sigma_{0}\}\right]$

$= \frac{1}{2}\,\mathrm{tr}\{I_{N}+XB_{1}X'\}$

$= \frac{N}{2}+\frac{1}{2}\,\mathrm{tr}\{(X'X)B_{1}\}.$  (3.11)

The second term on the right-hand side of (3.10) is similarly obtained from the posterior distribution (2.4):

$E_{\sigma^{-2} \mid y,X}\left[\mathrm{tr}\left\{\frac{\sigma^{-2}}{2}(X'X)E_{\beta \mid \sigma^{-2},y,X}[(\beta-b_{1})(\beta-b_{1})']\right\}\right]$

$= E_{\sigma^{-2} \mid y,X}\left[\mathrm{tr}\left\{\frac{\sigma^{-2}}{2}(X'X)\sigma^{2}B_{1}\right\}\right]$
$= \frac{1}{2}\,\mathrm{tr}\{(X'X)B_{1}\}.$  (3.12)

From (3.11) and (3.12), $\hat{C}_{1}$ in (3.10) is evaluated as

$\hat{C}_{1}=E_{\beta,\sigma^{-2} \mid y,X}\left[E_{z \mid \sigma^{-2},y,X}\left[\frac{\sigma^{-2}}{2}(z-Xb_{1})'(z-Xb_{1})\right]\right]$

$+ E_{\sigma^{-2} \mid y,X}\left[\mathrm{tr}\left\{\frac{\sigma^{-2}}{2}(X'X)E_{\beta \mid \sigma^{-2},y,X}[(\beta-b_{1})(\beta-b_{1})']\right\}\right]$

$= \frac{N}{2}+\frac{1}{2}\,\mathrm{tr}\{(X'X)B_{1}\}+\frac{1}{2}\,\mathrm{tr}\{(X'X)B_{1}\}$

$= \frac{N}{2}+\mathrm{tr}\{(X'X)B_{1}\}.$  (3.13)

We then notice that $\hat{C}_{1}$ does not depend on the data $y$, i.e., $E_{y}(\hat{C}_{1})=\hat{C}_{1}$. Supposing that the interchange of the order of integrations is valid, we can rewrite $E_{y}(C_{2})$ as

$E_{y}(C_{2})=E_{y}\left[E_{\beta,\sigma^{-2} \mid y,X}\left[\frac{\sigma^{-2}}{2}(y-X\beta)'(y-X\beta)\right]\right]$

$= E_{\beta,\sigma^{-2}}\left[E_{y \mid X,\beta,\sigma^{-2}}\left[\frac{\sigma^{-2}}{2}(y-X\beta)'(y-X\beta)\right]\right]$  (3.14)

where $E_{\beta,\sigma^{-2}}[\cdot]$ is an expectation with respect to the joint prior distribution and $E_{y \mid X,\beta,\sigma^{-2}}[\cdot]$ is an expectation with respect to the $N$-dimensional multivariate normal distribution with mean vector $X\beta$ and variance-covariance matrix $\sigma^{2}I_{N}$. Since $E_{y \mid X,\beta,\sigma^{-2}}[(y-X\beta)(y-X\beta)']=\sigma^{2}I_{N}$, $E_{y}(C_{2})$ in (3.14) can be evaluated as

$E_{y}(C_{2})=E_{\beta,\sigma^{-2}}\left[E_{y \mid X,\beta,\sigma^{-2}}\left[\mathrm{tr}\left\{\frac{\sigma^{-2}}{2}(y-X\beta)(y-X\beta)'\right\}\right]\right]$

$= \frac{1}{2}\,\mathrm{tr}\{I_{N}\}$

$= \frac{N}{2}.$  (3.15)

Therefore $\hat{b}_{N}$ is given by

$\hat{b}_{N}=E_{y}[\hat{C}_{1}-C_{2}]$

$=\hat{C}_{1}-E_{y}(C_{2})$

$= \frac{N}{2}+\mathrm{tr}\{(X'X)B_{1}\}-\frac{N}{2}$

$= \mathrm{tr}\{(X'X)B_{1}\}.$
(3.16)

Then $\hat{b}_{N}$ in (3.16) can be regarded as a ratio of the variance-covariance matrices $\sigma^{2}(X'X)^{-1}$ and $\sigma^{2}B_{1}$. Multiplying the bias-corrected estimator $\hat{\mathcal{T}}_{N}-\hat{b}_{N}$ by $-2$, our proposed information criterion for variable selection in the Bayesian linear regression model ($IC_{BL}$) is obtained as

$IC_{BL}=-2\hat{\mathcal{T}}_{N}+2\,\mathrm{tr}\{(X'X)B_{1}\}$  (3.17)
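Given an estimate of $\hat{\mathcal{T}}_{N}$ (e.g., from MCMC draws), the criterion (3.17) is a one-liner; a minimal sketch, with a function name of our own choosing:

```python
import numpy as np

def ic_bl(T_hat, X, B1):
    """IC_BL in (3.17): -2 * T_hat_N + 2 * tr{(X'X) B1}.

    T_hat : Monte Carlo estimate of the posterior mean of the
            observed log-likelihood, equation (3.5)
    B1    : posterior scale matrix (X'X + B0^{-1})^{-1}
    """
    return -2.0 * T_hat + 2.0 * np.trace(X.T @ X @ B1)
```

Note that the penalty $2\,\mathrm{tr}\{(X'X)B_{1}\}$ depends only on the design matrix and the prior, not on $y$, mirroring the derivation above where $E_{y}(\hat{C}_{1})=\hat{C}_{1}$.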
where $B_{1}=(X'X+B_{0}^{-1})^{-1}$. For simplicity, let us set the parameter $B_{0}$ in (2.2) as $B_{0}=\kappa_{0}I_{K}$ $(\kappa_{0}>0)$; then the bias term $\hat{b}_{N}$ can be rewritten as

$\hat{b}_{N}=K-\mathrm{tr}\{(\kappa_{0}X'X+I_{K})^{-1}\}$  (3.18)

from the matrix inversion lemma (3). Then, if the sample size $N\rightarrow\infty$, the last term in (3.18) is expected to vanish because $\mathrm{tr}\{(\kappa_{0}X'X/N+I_{K}/N)^{-1}/N\}\rightarrow 0$ (i.e., $\hat{b}_{N}\rightarrow K$) when each element of $X'X/N$ does not diverge. Furthermore, if $\kappa_{0}$ is sufficiently large (i.e., a non-informative prior), we also have $\mathrm{tr}\{(\kappa_{0}X'X+I_{K})^{-1}\}\rightarrow 0$ in (3.18).

4 Simulation Experiments

4.1 Deviance Information Criterion (DIC)

We conduct two simulation studies to compare the small-sample performance of our proposed information criterion ($IC_{BL}$) in (3.17) with that of the deviance information criterion (DIC):

$DIC=-2\hat{\mathcal{T}}_{N}+p_{D},$  (4.1)

where Spiegelhalter et al. (2002) termed $p_{D}$ the effective number of parameters, defined as $p_{D}\equiv 2\log\{g(y \mid X,\overline{\beta},\overline{\sigma}^{-2})\}-2\hat{\mathcal{T}}_{N}$, to be evaluated at the posterior means of the parameters $\overline{\beta}(=b_{1})$ and $\overline{\sigma}^{-2}(=\nu_{1}/\lambda_{1})$ in (2.4) and (2.5).

4.2 Simulation studies

In this article, we present two small-sample simulation studies: the first is designed to examine cases where the set of candidate models includes many over-specified models; the second examines performance with respect to under-specified candidate models.

Case 1

As in Hurvich and Tsai (1989), we consider nested candidate models using seven explanatory variables $x_{1}, x_{2}, \ldots, x_{7}$. In our simulation study, $x_{1}$ is an $N\times 1$ vector whose elements are ones (i.e., the intercept term) and the other $N\times 1$ vectors $x_{i}$ $(2\leq i\leq 7)$ are generated from the uniform distribution
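The identity behind (3.18) is easy to verify numerically. The sketch below (with illustrative dimensions of our own choosing) checks that, with $B_{0}=\kappa_{0}I_{K}$, the trace form $\mathrm{tr}\{(X'X)B_{1}\}$ agrees with $K-\mathrm{tr}\{(\kappa_{0}X'X+I_{K})^{-1}\}$:

```python
import numpy as np

# Numerical check of (3.18): with B0 = kappa0 * I_K,
#   tr{(X'X) B1} = K - tr{(kappa0 X'X + I_K)^{-1}}.
rng = np.random.default_rng(1)
N, K, kappa0 = 25, 4, 0.1
X = rng.uniform(-2, 2, (N, K))
XtX = X.T @ X
B1 = np.linalg.inv(XtX + np.eye(K) / kappa0)          # B1 = (X'X + B0^{-1})^{-1}
lhs = np.trace(XtX @ B1)
rhs = K - np.trace(np.linalg.inv(kappa0 * XtX + np.eye(K)))
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```

Increasing `kappa0` in this check also illustrates the non-informative limit, where the bias term approaches $K$, the number of regression coefficients.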
$\mathcal{U}(-2, 2)$. These variables $x_{1}, x_{2}, \ldots, x_{7}$ are included in the candidate models in a sequentially nested fashion. The candidate models are linear regression models given by

$y=\beta_{1}x_{1}+\beta_{2}x_{2}+\cdots+\beta_{K}x_{K}+\epsilon, \quad \epsilon\sim \mathcal{N}(0, \sigma^{2}I_{N}).$

The candidate model with $K=1$ has only the intercept term, and that with $K=7$ is the full model. In this simulation study, we determine the number of variables $K$ by using our proposed information criterion ($IC_{BL}$) in (3.17) and DIC in (4.1) in small-sample cases $N=25, 50, 100$ with informative ($\kappa_{0}=0.1$) and non-informative ($\kappa_{0}=100$) priors. To

(3) For any matrices $A$ $(m\times m)$, $B$ $(m\times n)$, $C$ $(n\times m)$, and $D$ $(n\times n)$, where $A$ and $D$ are nonsingular, we have $(A-BD^{-1}C)^{-1}=A^{-1}+A^{-1}B(D-CA^{-1}B)^{-1}CA^{-1}$.
examine the small-sample performance in the Bayesian linear regression cases, we generate a sample of $y$ from the true model $(K=3)$:

$y=1.0x_{1}+2.0x_{2}+3.0x_{3}+\epsilon, \quad \epsilon\sim \mathcal{N}(0, 1.0\,I_{N}).$  (4.2)

In Table 1, we examine the performance of $IC_{BL}$ and DIC for the small-sample cases $(N=25, 50, 100)$. The parameters of the prior distributions in (2.2) and (2.3) are set to $b_{0}=0$, $B_{0}=\kappa_{0}I_{K}$ ($\kappa_{0}=0.1$ or $100$), and $\nu_{0}=\lambda_{0}=0.1$. The simulation considered each combination of $N=25, 50, 100$ and $\kappa_{0}=0.1, 100$. We draw 50,000 MCMC samples from the posterior distributions in (2.4) and (2.5) to compute the posterior mean of the observed $\log$-likelihood $\hat{\mathcal{T}}_{N}$ in (3.5). For each combination of $(N, \kappa_{0})$, we generate 100 observations of $IC_{BL}$ and DIC, and record the number of selected models (i.e., the candidate model with the minimum value for each of the two criteria). Table 1 shows that our proposed information criterion ($IC_{BL}$) identifies the true model $(K=3)$ in the small-sample cases $(N=25, 50, 100)$ with the informative prior ($\kappa_{0}=0.1$) far better than DIC; DIC, on the other hand, tends to overfit the true model. For the non-informative prior ($\kappa_{0}=100$), both criteria tend to overfit the true model for the sample sizes $N=25$ and $50$, but our proposed information criterion ($IC_{BL}$) nevertheless far outperforms DIC at the sample size $N=100$. In Tables 2 and 3, we show the averages of the criteria over the 100 observations for the small-sample cases $(N=25, 50, 100)$ with informative ($\kappa_{0}=0.1$) and non-informative ($\kappa_{0}=100$) priors. Both criteria selected the true model $(K=3)$ in all cases, but the difference between $2\hat{b}_{N}$ and $p_{D}$ becomes more apparent as the number of explanatory variables increases.
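A compressed, self-contained sketch of this Case 1 experiment follows. It uses fewer MCMC draws than the paper's 50,000 and a single replication, so it will not reproduce the counts in Table 1; all names and tuning values are our own:

```python
import numpy as np

rng = np.random.default_rng(42)
N, kappa0, draws = 50, 0.1, 2000
X_full = np.column_stack([np.ones(N), rng.uniform(-2, 2, (N, 6))])
y = X_full[:, :3] @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=N)  # true K = 3

def criteria(y, X):
    """Return (IC_BL, DIC) for one candidate design matrix X,
    with prior b0 = 0, B0 = kappa0 * I_K, nu0 = lam0 = 0.1."""
    N, K = X.shape
    XtX = X.T @ X
    B1 = np.linalg.inv(XtX + np.eye(K) / kappa0)
    b1 = B1 @ (X.T @ y)
    beta_hat = np.linalg.solve(XtX, X.T @ y)
    r0 = y - X @ beta_hat
    d = -beta_hat                                     # b0 - beta_hat, b0 = 0
    nu1 = 0.1 + N
    lam1 = 0.1 + r0 @ r0 + d @ np.linalg.solve(np.linalg.inv(XtX)
                                               + kappa0 * np.eye(K), d)
    # Monte Carlo T_hat_N from (2.4)-(2.5) using the log-likelihood (3.7)
    L = np.linalg.cholesky(B1)
    T = 0.0
    for _ in range(draws):
        prec = rng.gamma(nu1 / 2.0, 2.0 / lam1)       # sigma^{-2}
        beta = b1 + prec ** -0.5 * (L @ rng.normal(size=K))
        r = y - X @ beta
        T += 0.5 * N * np.log(prec) - 0.5 * prec * (r @ r)
    T /= draws
    ic = -2.0 * T + 2.0 * np.trace(XtX @ B1)          # IC_BL, (3.17)
    prec_bar = nu1 / lam1                             # posterior mean of sigma^{-2}
    rb = y - X @ b1
    log_g_bar = 0.5 * N * np.log(prec_bar) - 0.5 * prec_bar * (rb @ rb)
    p_D = 2.0 * log_g_bar - 2.0 * T
    dic = -2.0 * T + p_D                              # DIC, (4.1)
    return ic, dic

for K in range(1, 8):                                 # nested candidates K = 1..7
    ic, dic = criteria(y, X_full[:, :K])
    print(K, round(ic, 1), round(dic, 1))
```

Repeating the loop over many simulated data sets and tallying which $K$ minimizes each criterion yields selection frequencies of the kind reported in Table 1.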
Hence the effective number of parameters $p_{D}$ in DIC tends to underestimate the model complexity compared with the bias term $2\hat{b}_{N}$ in $IC_{BL}$.

Case 2

Next we consider cases where the explanatory variables in the candidate models are selected from the subsets of $\{x_{1}, x_{2}, x_{3}\}$ (i.e., $\{x_{1}\}$, $\{x_{2}\}$, $\{x_{3}\}$, $\{x_{1}, x_{2}\}$, $\{x_{1}, x_{3}\}$, $\{x_{2}, x_{3}\}$, and $\{x_{1}, x_{2}, x_{3}\}$). Then we specify a true model as

$y=1.0x_{1}+2.0x_{2}+\epsilon, \quad \epsilon\sim \mathcal{N}(0, 1.0\,I_{N}).$  (4.3)

We implement a simulation study to examine the performance of $IC_{BL}$ in (3.17) and DIC in (4.1) for under-specified candidate models in small-sample cases. The parameters of the prior distributions in (2.2) and (2.3) are set to $b_{0}=0$, $B_{0}=\kappa_{0}I_{K}$ ($\kappa_{0}=0.1$ or $100$), and $\nu_{0}=\lambda_{0}=0.1$. We draw 50,000 MCMC samples from the posterior distributions in (2.4) and (2.5) and compute the posterior mean of the observed $\log$-likelihood (3.5). For each combination of $(N, \kappa_{0})$, we generate 100 observations of $IC_{BL}$ and DIC and record the number of selected models as in Case 1. Table 4 shows that DIC tends to select the full model (i.e., $\{x_{1}, x_{2}, x_{3}\}$) compared with $IC_{BL}$ in small-sample cases. On the other hand, the under-specified candidate models (i.e., $\{x_{1}\}$, $\{x_{2}\}$, $\{x_{3}\}$, $\{x_{1}, x_{3}\}$, $\{x_{2}, x_{3}\}$) are not selected at all. Hence, DIC shows a tendency to overfit the true model in this simulation study. In Tables 5 and 6, the average values of model complexity in $IC_{BL}$ and DIC with respect to the candidate models $\{x_{1}, x_{2}\}$, $\{x_{1}, x_{3}\}$, and $\{x_{2}, x_{3}\}$ are close to each other. However,
the true model (i.e., $\{x_{1}, x_{2}\}$) has the largest value of the posterior mean of the observed $\log$-likelihood $\hat{\mathcal{T}}_{N}$ among the three candidate models. Therefore, the average values of $IC_{BL}$ and DIC successfully select the true model (4.3) in all cases.

5 Conclusion and Discussion

In Bayesian data analysis, DIC (Spiegelhalter et al., 2002) is widely used for model selection, since this criterion is relatively easy to calculate and applicable to a wide range of statistical models. Spiegelhalter et al. (2002) gave an asymptotic justification of DIC in cases where the number of observations is large relative to the number of parameters. However, Burnham (2002) questioned whether DIC needs a modification for small sample sizes. In this article, we proposed a variable selection criterion $IC_{BL}$ for the Bayesian linear regression model in small-sample cases, as Sugiura (1978) proposed $AIC_{C}$ in the frequentist framework. We then examined the performance of our proposed information criterion ($IC_{BL}$) relative to DIC for small-sample cases. In our simulation studies, we found that DIC often showed a tendency to overfit the true model (see Tables 1 and 4), whereas our proposed information criterion ($IC_{BL}$) performs well for small-sample cases $(N=25, 50, 100)$. We also found that the measure of model complexity $2\hat{b}_{N}$ was mostly larger than $p_{D}$ (see Tables 2 and 3), leading us to conclude that the bias correction of DIC underestimates the model complexity in small-sample cases. An important direction for further research would be to extend our information criterion to several other types of Bayesian regression models (e.g., hierarchical Bayesian regression models, Bayesian regression models with serially correlated errors, and Bayesian Markov-switching models), because empirical analysis is generally performed under the limitation of data availability.
[Tables 1-6: model-selection frequencies and average values of $IC_{BL}$ and DIC for each combination of $N=25, 50, 100$ and $\kappa_{0}=0.1, 100$ (Tables 1-3 for Case 1, Tables 4-6 for Case 2); the table contents could not be recovered from the extraction.]
References

Akaike, H. (1973). Information theory as an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, Akademiai Kiado, Budapest.

Ando, T. (2007). Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika 94(2).

Bedrick, E. J., and C. L. Tsai (1994). Model selection for multivariate regression in small samples. Biometrics.

Burnham, K. P. (2002). Discussion of a paper by D. J. Spiegelhalter et al. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 64(4), 629.

Gelfand, A. E., and S. K. Ghosh (1998). Model choice: A minimum posterior predictive loss approach. Biometrika 85(1), 1-11.

Hurvich, C. M., and C. L. Tsai (1989). Regression and time series model selection in small samples. Biometrika 76(2).

Ibrahim, J. G., M. H. Chen, and D. Sinha (2001). Criterion-based methods for Bayesian model assessment. Statistica Sinica 11(2).

Kitagawa, G. (1997). Information criteria for the predictive evaluation of Bayesian models. Communications in Statistics - Theory and Methods 26(9).

Laud, P. W., and J. G. Ibrahim (1995). Predictive model selection. Journal of the Royal Statistical Society, Series B (Methodological).

Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 64(4).

Sugiura, N. (1978). Further analysts of the data by Akaike's information criterion and the finite corrections. Communications in Statistics - Theory and Methods 7(1), 13-26.
More informationKULLBACK-LEIBLER INFORMATION THEORY A BASIS FOR MODEL SELECTION AND INFERENCE
KULLBACK-LEIBLER INFORMATION THEORY A BASIS FOR MODEL SELECTION AND INFERENCE Kullback-Leibler Information or Distance f( x) gx ( ±) ) I( f, g) = ' f( x)log dx, If, ( g) is the "information" lost when
More informationOn the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of
More informationModel Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model
Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationarxiv: v1 [stat.me] 30 Mar 2015
Comparison of Bayesian predictive methods for model selection Juho Piironen and Aki Vehtari arxiv:53.865v [stat.me] 3 Mar 25 Abstract. The goal of this paper is to compare several widely used Bayesian
More informationA Study on the Bias-Correction Effect of the AIC for Selecting Variables in Normal Multivariate Linear Regression Models under Model Misspecification
TR-No. 13-08, Hiroshima Statistical Research Group, 1 23 A Study on the Bias-Correction Effect of the AIC for Selecting Variables in Normal Multivariate Linear Regression Models under Model Misspecification
More informationConsistency of test based method for selection of variables in high dimensional two group discriminant analysis
https://doi.org/10.1007/s42081-019-00032-4 ORIGINAL PAPER Consistency of test based method for selection of variables in high dimensional two group discriminant analysis Yasunori Fujikoshi 1 Tetsuro Sakurai
More informationVariable Selection in Multivariate Linear Regression Models with Fewer Observations than the Dimension
Variable Selection in Multivariate Linear Regression Models with Fewer Observations than the Dimension (Last Modified: November 3 2008) Mariko Yamamura 1, Hirokazu Yanagihara 2 and Muni S. Srivastava 3
More informationDepartamento de Economía Universidad de Chile
Departamento de Economía Universidad de Chile GRADUATE COURSE SPATIAL ECONOMETRICS November 14, 16, 17, 20 and 21, 2017 Prof. Henk Folmer University of Groningen Objectives The main objective of the course
More informationConsistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis
Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Yasunori Fujikoshi and Tetsuro Sakurai Department of Mathematics, Graduate School of Science,
More informationThe Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models
The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health
More informationA note on Reversible Jump Markov Chain Monte Carlo
A note on Reversible Jump Markov Chain Monte Carlo Hedibert Freitas Lopes Graduate School of Business The University of Chicago 5807 South Woodlawn Avenue Chicago, Illinois 60637 February, 1st 2006 1 Introduction
More informationMetropolis-Hastings Algorithm
Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to
More information7. Estimation and hypothesis testing. Objective. Recommended reading
7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing
More informationAutomatic Autocorrelation and Spectral Analysis
Piet M.T. Broersen Automatic Autocorrelation and Spectral Analysis With 104 Figures Sprin ger 1 Introduction 1 1.1 Time Series Problems 1 2 Basic Concepts 11 2.1 Random Variables 11 2.2 Normal Distribution
More informationVector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.
Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series
More informationEmpirical Likelihood Based Deviance Information Criterion
Empirical Likelihood Based Deviance Information Criterion Yin Teng Smart and Safe City Center of Excellence NCS Pte Ltd June 22, 2016 Outline Bayesian empirical likelihood Definition Problems Empirical
More informationCHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA
STATISTICS IN MEDICINE, VOL. 17, 59 68 (1998) CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA J. K. LINDSEY AND B. JONES* Department of Medical Statistics, School of Computing Sciences,
More informationRegression and Time Series Model Selection in Small Samples. Clifford M. Hurvich; Chih-Ling Tsai
Regression and Time Series Model Selection in Small Samples Clifford M. Hurvich; Chih-Ling Tsai Biometrika, Vol. 76, No. 2. (Jun., 1989), pp. 297-307. Stable URL: http://links.jstor.org/sici?sici=0006-3444%28198906%2976%3a2%3c297%3aratsms%3e2.0.co%3b2-4
More informationOn Autoregressive Order Selection Criteria
On Autoregressive Order Selection Criteria Venus Khim-Sen Liew Faculty of Economics and Management, Universiti Putra Malaysia, 43400 UPM, Serdang, Malaysia This version: 1 March 2004. Abstract This study
More informationModfiied Conditional AIC in Linear Mixed Models
CIRJE-F-895 Modfiied Conditional AIC in Linear Mixed Models Yuki Kawakubo Graduate School of Economics, University of Tokyo Tatsuya Kubokawa University of Tokyo July 2013 CIRJE Discussion Papers can be
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationarxiv: v4 [stat.me] 23 Mar 2016
Comparison of Bayesian predictive methods for model selection Juho Piironen Aki Vehtari Aalto University juho.piironen@aalto.fi aki.vehtari@aalto.fi arxiv:53.865v4 [stat.me] 23 Mar 26 Abstract The goal
More informationModel selection for good estimation or prediction over a user-specified covariate distribution
Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2010 Model selection for good estimation or prediction over a user-specified covariate distribution Adam Lee
More informationChapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)
HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter
More informationAn Unbiased C p Criterion for Multivariate Ridge Regression
An Unbiased C p Criterion for Multivariate Ridge Regression (Last Modified: March 7, 2008) Hirokazu Yanagihara 1 and Kenichi Satoh 2 1 Department of Mathematics, Graduate School of Science, Hiroshima University
More informationMeasures of Fit from AR(p)
Measures of Fit from AR(p) Residual Sum of Squared Errors Residual Mean Squared Error Root MSE (Standard Error of Regression) R-squared R-bar-squared = = T t e t SSR 1 2 ˆ = = T t e t p T s 1 2 2 ˆ 1 1
More informationUNBIASED ESTIMATE FOR b-value OF MAGNITUDE FREQUENCY
J. Phys. Earth, 34, 187-194, 1986 UNBIASED ESTIMATE FOR b-value OF MAGNITUDE FREQUENCY Yosihiko OGATA* and Ken'ichiro YAMASHINA** * Institute of Statistical Mathematics, Tokyo, Japan ** Earthquake Research
More informationA high-dimensional bias-corrected AIC for selecting response variables in multivariate calibration
A high-dimensional bias-corrected AIC for selecting response variables in multivariate calibration Ryoya Oda, Yoshie Mima, Hirokazu Yanagihara and Yasunori Fujikoshi Department of Mathematics, Graduate
More informationVariable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1
Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationCHAPTER 6: SPECIFICATION VARIABLES
Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero
More informationUsing Model Selection and Prior Specification to Improve Regime-switching Asset Simulations
Using Model Selection and Prior Specification to Improve Regime-switching Asset Simulations Brian M. Hartman, PhD ASA Assistant Professor of Actuarial Science University of Connecticut BYU Statistics Department
More informationModel Choice Lecture 2
Model Choice Lecture 2 Brian Ripley http://www.stats.ox.ac.uk/ ripley/modelchoice/ Bayesian approaches Note the plural I think Bayesians are rarely Bayesian in their model selection, and Geisser s quote
More informationDIC, AIC, BIC, PPL, MSPE Residuals Predictive residuals
DIC, AIC, BIC, PPL, MSPE Residuals Predictive residuals Overall Measures of GOF Deviance: this measures the overall likelihood of the model given a parameter vector D( θ) = 2 log L( θ) This can be averaged
More informationEstimating prediction error in mixed models
Estimating prediction error in mixed models benjamin saefken, thomas kneib georg-august university goettingen sonja greven ludwig-maximilians-university munich 1 / 12 GLMM - Generalized linear mixed models
More informationDYNAMIC ECONOMETRIC MODELS Vol. 9 Nicolaus Copernicus University Toruń Mariola Piłatowska Nicolaus Copernicus University in Toruń
DYNAMIC ECONOMETRIC MODELS Vol. 9 Nicolaus Copernicus University Toruń 2009 Mariola Piłatowska Nicolaus Copernicus University in Toruń Combined Forecasts Using the Akaike Weights A b s t r a c t. The focus
More informationGeneralized common spatial factor model
Biostatistics (2003), 4, 4,pp. 569 582 Printed in Great Britain Generalized common spatial factor model FUJUN WANG Eli Lilly and Company, Indianapolis, IN 46285, USA MELANIE M. WALL Division of Biostatistics,
More informationAutoregressive Moving Average (ARMA) Models and their Practical Applications
Autoregressive Moving Average (ARMA) Models and their Practical Applications Massimo Guidolin February 2018 1 Essential Concepts in Time Series Analysis 1.1 Time Series and Their Properties Time series:
More informationAppendix A: The time series behavior of employment growth
Unpublished appendices from The Relationship between Firm Size and Firm Growth in the U.S. Manufacturing Sector Bronwyn H. Hall Journal of Industrial Economics 35 (June 987): 583-606. Appendix A: The time
More informationSTAT Financial Time Series
STAT 6104 - Financial Time Series Chapter 4 - Estimation in the time Domain Chun Yip Yau (CUHK) STAT 6104:Financial Time Series 1 / 46 Agenda 1 Introduction 2 Moment Estimates 3 Autoregressive Models (AR
More informationChapter 14 Stein-Rule Estimation
Chapter 14 Stein-Rule Estimation The ordinary least squares estimation of regression coefficients in linear regression model provides the estimators having minimum variance in the class of linear and unbiased
More informationStatistics 203: Introduction to Regression and Analysis of Variance Penalized models
Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance
More informationThe connection of dropout and Bayesian statistics
The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University
More informationBAYESIAN MODEL CRITICISM
Monte via Chib s BAYESIAN MODEL CRITICM Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes
More informationLecture 7: Model Building Bus 41910, Time Series Analysis, Mr. R. Tsay
Lecture 7: Model Building Bus 41910, Time Series Analysis, Mr R Tsay An effective procedure for building empirical time series models is the Box-Jenkins approach, which consists of three stages: model
More informationThe joint posterior distribution of the unknown parameters and hidden variables, given the
DERIVATIONS OF THE FULLY CONDITIONAL POSTERIOR DENSITIES The joint posterior distribution of the unknown parameters and hidden variables, given the data, is proportional to the product of the joint prior
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationGeographically Weighted Regression as a Statistical Model
Geographically Weighted Regression as a Statistical Model Chris Brunsdon Stewart Fotheringham Martin Charlton October 6, 2000 Spatial Analysis Research Group Department of Geography University of Newcastle-upon-Tyne
More informationThe Bayesian Approach to Multi-equation Econometric Model Estimation
Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation
More informationAll models are wrong but some are useful. George Box (1979)
All models are wrong but some are useful. George Box (1979) The problem of model selection is overrun by a serious difficulty: even if a criterion could be settled on to determine optimality, it is hard
More informationBIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More informationImproved model selection criteria for SETAR time series models
Journal of Statistical Planning and Inference 37 2007 2802 284 www.elsevier.com/locate/jspi Improved model selection criteria for SEAR time series models Pedro Galeano a,, Daniel Peña b a Departamento
More informationKneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"
Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationSpatial Statistics Chapter 4 Basics of Bayesian Inference and Computation
Spatial Statistics Chapter 4 Basics of Bayesian Inference and Computation So far we have discussed types of spatial data, some basic modeling frameworks and exploratory techniques. We have not discussed
More informationObtaining simultaneous equation models from a set of variables through genetic algorithms
Obtaining simultaneous equation models from a set of variables through genetic algorithms José J. López Universidad Miguel Hernández (Elche, Spain) Domingo Giménez Universidad de Murcia (Murcia, Spain)
More informationStatistics 262: Intermediate Biostatistics Model selection
Statistics 262: Intermediate Biostatistics Model selection Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Today s class Model selection. Strategies for model selection.
More informationBayesian Networks in Educational Assessment
Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior
More informationST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks
(9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate
More informationComparison of New Approach Criteria for Estimating the Order of Autoregressive Process
IOSR Journal of Mathematics (IOSRJM) ISSN: 78-578 Volume 1, Issue 3 (July-Aug 1), PP 1- Comparison of New Approach Criteria for Estimating the Order of Autoregressive Process Salie Ayalew 1 M.Chitti Babu,
More informationAnalysis Methods for Supersaturated Design: Some Comparisons
Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationMASM22/FMSN30: Linear and Logistic Regression, 7.5 hp FMSN40:... with Data Gathering, 9 hp
Selection criteria Example Methods MASM22/FMSN30: Linear and Logistic Regression, 7.5 hp FMSN40:... with Data Gathering, 9 hp Lecture 5, spring 2018 Model selection tools Mathematical Statistics / Centre
More informationSequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process
Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University
More informationPARAMETER ESTIMATION AND ORDER SELECTION FOR LINEAR REGRESSION PROBLEMS. Yngve Selén and Erik G. Larsson
PARAMETER ESTIMATION AND ORDER SELECTION FOR LINEAR REGRESSION PROBLEMS Yngve Selén and Eri G Larsson Dept of Information Technology Uppsala University, PO Box 337 SE-71 Uppsala, Sweden email: yngveselen@ituuse
More informationPerformance of Autoregressive Order Selection Criteria: A Simulation Study
Pertanika J. Sci. & Technol. 6 (2): 7-76 (2008) ISSN: 028-7680 Universiti Putra Malaysia Press Performance of Autoregressive Order Selection Criteria: A Simulation Study Venus Khim-Sen Liew, Mahendran
More informationCross-sectional space-time modeling using ARNN(p, n) processes
Cross-sectional space-time modeling using ARNN(p, n) processes W. Polasek K. Kakamu September, 006 Abstract We suggest a new class of cross-sectional space-time models based on local AR models and nearest
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationKen-ichi Kamo Department of Liberal Arts and Sciences, Sapporo Medical University, Hokkaido, Japan
A STUDY ON THE BIAS-CORRECTION EFFECT OF THE AIC FOR SELECTING VARIABLES IN NOR- MAL MULTIVARIATE LINEAR REGRESSION MOD- ELS UNDER MODEL MISSPECIFICATION Authors: Hirokazu Yanagihara Department of Mathematics,
More informationRidge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation
Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking
More informationLecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis
Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More informationA Bayesian view of model complexity
A Bayesian view of model complexity Angelika van der Linde University of Bremen, Germany 1. Intuitions 2. Measures of dependence between observables and parameters 3. Occurrence in predictive model comparison
More informationSaskia Rinke and Philipp Sibbertsen* Information criteria for nonlinear time series models
Stud. Nonlinear Dyn. E. 16; (3): 35 341 Saskia Rinke and Philipp Sibbertsen* Information criteria for nonlinear time series models DOI 1.1515/snde-15-6 Abstract: In this paper the performance of different
More information1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches
Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model
More information