GENERAL FORMULA OF BIAS-CORRECTED AIC IN GENERALIZED LINEAR MODELS
Shinpei Imori, Hirokazu Yanagihara, Hirofumi Wakaki
Department of Mathematics, Graduate School of Science, Hiroshima University

ABSTRACT

The present paper considers a bias correction of Akaike's information criterion (AIC) for selecting variables in the generalized linear model (GLM). When the sample size is not large, the AIC has a non-negligible bias that adversely affects variable selection. In the present study, we obtain a simple expression for a bias-corrected AIC (corrected AIC, or CAIC) in GLMs. A numerical study reveals that the CAIC performs better than the AIC for variable selection.

Key words: Akaike's information criterion, Bias correction, Generalized linear model, Maximum likelihood estimation, Variable selection.

1. INTRODUCTION

In real data analysis, choosing the best subset of variables in a regression model is an important problem. For model selection, it is common to measure the goodness of fit of a model by the risk function based on the expected Kullback-Leibler (KL) information (Kullback and Leibler (1951)). In practice, we must estimate the risk function, which depends on unknown parameters. The most famous estimator of the risk function is Akaike's information criterion (AIC) proposed by Akaike (1973). Since the AIC is simply defined as (-2) x (the maximum log-likelihood) + 2 x (the number of parameters), it is widely applied in chemometrics, engineering, econometrics, psychometrics, and many other fields for selecting an appropriate model from a set of explanatory variables. However, the order of the bias of the AIC to the risk function is O(n^{-1}), which implies that the AIC can have a non-negligible bias to the risk function when the sample size n is not large. The AIC tends to underestimate the risk function, and its bias tends to increase with the number of parameters in the model.
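To make the definition concrete, the following sketch (our own illustration, not part of the paper) computes the AIC of an intercept-only Poisson model, whose maximum likelihood estimate of the mean is simply the sample mean:

```python
import math

def poisson_loglik(lam, y):
    # log-likelihood of i.i.d. Poisson(lam) observations
    return sum(yi * math.log(lam) - lam - math.lgamma(yi + 1) for yi in y)

def aic(max_loglik, n_params):
    # AIC = -2 * (maximum log-likelihood) + 2 * (number of parameters)
    return -2.0 * max_loglik + 2.0 * n_params

y = [2, 3, 1, 0, 4, 2, 3, 1]
lam_hat = sum(y) / len(y)   # MLE of the Poisson mean (one parameter)
print(aic(poisson_loglik(lam_hat, y), 1))
```

Evaluating the criterion at any other mean value gives a larger AIC for the same number of parameters, which illustrates why the criterion is evaluated at the MLE.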
As a result, the AIC tends to choose a model that has more parameters than the true model as the best model (see Shibata (1980)). Owing to these characteristics, the bias causes a disadvantage whereby the model having the most parameters is easily chosen by the AIC among the candidate models as the best model. One method of avoiding this undesirable result is to correct the bias of the AIC. A number of authors
(Corresponding author e-mail address: m06547@hiroshima-u.ac.jp)
have investigated bias-corrected AICs for various models. For example, Sugiura (1978) developed an unbiased estimator of the risk function in linear regression models, which was shown to be the UMVUE of the risk function by Davies et al. (2006). Hurvich and Tsai (1989) formally adjusted the bias of the AIC (the resulting criterion is called the AICc) in several models. In particular, the AICc is equivalent to Sugiura's bias-corrected AIC in the case of the linear regression model. Wong and Li (1998) extended Hurvich and Tsai's AICc to a wider class of models and verified through numerical studies that their AICc performs better than the original AIC. Unfortunately, except for the linear regression model, the AICc does not completely reduce the bias of the AIC to O(n^{-2}). As mentioned previously, the goodness of fit of the model is measured by the risk function based on the expected KL information. Thus, obtaining a higher-order asymptotically unbiased estimator of the risk function will allow us to measure the goodness of fit of the model more accurately, which in turn facilitates a more reasonable selection of variables. From this viewpoint, Yanagihara et al. (2003) and Kamo et al. (2011) proposed bias-corrected AICs in the logistic regression model and the Poisson regression model, respectively, which completely reduce the bias of the AIC to O(n^{-2}). We refer to the AIC whose bias is completely corrected to O(n^{-2}) as the corrected AIC (CAIC). Frequently, the CAIC improves the performance of the original AIC dramatically, which strongly suggests the usefulness of the CAIC for real data analysis. Nevertheless, the CAIC is rarely used in real data analysis because it has been derived only for a few models. Moreover, since the derivation of the CAIC is complicated, a great deal of work is needed to carry out the calculation if a researcher wants to use the CAIC in a model for which it has not yet been derived.
Hence, under the present circumstances the CAIC is not a user-friendly model selector, although it performs better than the original AIC. If the CAIC can be obtained with little effort, it will become a useful and user-friendly model selector. The goal of the present paper is therefore to derive a simple formula for the CAIC in as wide a class of models as possible. The model considered is the generalized linear model (GLM), proposed by Nelder and Wedderburn (1972). The GLM can express a number of statistical models by changing the distribution and the link function, such as the normal linear regression model, the logistic regression model, and the probit model, which are commonly used in many applied fields; cf. Barnett and Nurmagambetov (2011), Matas et al. (2010), Sánchez-Carneo et al. (2011), and Teste and Lieffers (2011). Practically speaking, the GLM can be easily fitted to real data using the glm function in R (R Development Core Team (2011)), which implements a number of distributions and link functions. Since the model considered herein is wide and can be easily fitted to real data, the CAIC in the GLM should prove useful in real data analysis. Generally, the additional bias-correction terms in the CAIC require estimators of several orders of moments of the response variables, and such moments would have to be calculated for each specified model. However, we emphasize that
the moments do not remain in our formula. Hence, in order to apply the CAIC to real data with our formula, we need only calculate derivatives of the log-likelihood function up to the fourth order. Our formula can be evaluated using formula manipulation software such as Mathematica.

The remainder of the present paper is organized as follows: In Section 2, we consider a stochastic expansion of the maximum likelihood estimator (MLE) in the GLM. In Section 3, we propose a new information criterion by completely reducing the bias of the AIC in the GLM to O(n^{-2}). In Section 4, we present several examples of the CAIC in the models implemented in the glm program; these examples should be helpful to applied researchers. In Section 5, we investigate the performance of the proposed CAIC through numerical simulations. Technical details are provided in the Appendix.

2. STOCHASTIC EXPANSION OF THE MLE IN THE GLM

The GLM considered herein allows us to fit regression models for response variables that follow a very general distribution belonging to the exponential family, whose probability density function is given as follows:

    f(y; θ, φ) = exp[ {θy − b(θ)}/a(φ) + c(y, φ) ],   (2.1)

where a(·), b(·), and c(·) are known functions, the unknown parameter θ is referred to as the natural location parameter, and φ is often referred to as the scale parameter. (For details of the GLM, see, e.g., Meyer et al. (2002).) In the present paper, we assume that φ is known. The exponential family includes the normal, binomial, Poisson, geometric, negative binomial, exponential, gamma, and inverse Gaussian distributions. Let each datum consist of a sequence {(y_i, x_i); i = 1, ..., n}, where y_1, ..., y_n are independent random variables referred to as response variables, and x_1, ..., x_n are p-dimensional nonstochastic vectors referred to as explanatory variables.
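To make the exponential-family form concrete, the sketch below (our own illustration) checks numerically that the Poisson probability mass function arises from the density above with θ = log λ, b(θ) = e^θ, a(φ) = 1, and c(y, φ) = −log(y!):

```python
import math

def expfam_density(y, theta, b, c, a_phi=1.0):
    # generic exponential-family density: exp{(theta*y - b(theta))/a(phi) + c(y, phi)}
    return math.exp((theta * y - b(theta)) / a_phi + c(y))

lam = 3.5
theta = math.log(lam)              # natural parameter
b = math.exp                       # b(theta) = e^theta
c = lambda y: -math.lgamma(y + 1)  # c(y, phi) = -log(y!)

for y in range(6):
    direct = lam**y * math.exp(-lam) / math.factorial(y)  # Poisson pmf
    assert abs(expfam_density(y, theta, b, c) - direct) < 1e-12
print("Poisson pmf matches the exponential-family form")
```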
The expectation of the response y_i is related to a linear predictor η_i = x_i'β through a link function h(·), i.e., h(E[y_i]) = h(µ(θ_i)) = η_i. For theoretical purposes, we define u = (h ∘ µ)^{-1}, i.e., θ_i = u(η_i). When h = µ^{-1}, i.e., u is the identity function, we say that h is the natural link function; for example, the logistic regression model uses the natural link function. Finally, the candidate model is expressed as y_i ~ ind. f(y_i; θ_i(β), φ), where f(·) is given by (2.1). The p-dimensional unknown vector β can be estimated by the maximum likelihood method. The joint probability density function of y = (y_1, ..., y_n)' is given by

    f(y; β) = ∏_{i=1}^n f(y_i; θ_i(β), φ) = exp[ Σ_{i=1}^n { (θ_i y_i − b(θ_i))/a(φ) + c(y_i, φ) } ].
Hence, the log-likelihood function of the GLM is expressed as

    l(β; y) = log f(y; β) = Σ_{i=1}^n { (θ_i y_i − b(θ_i))/a(φ) + c(y_i, φ) }.   (2.2)

Let β̂ be the MLE of β. Here, β̂ is given as the solution of the following likelihood equation:

    ∂l(β; y)/∂β = (1/a(φ)) Σ_{i=1}^n ( y_i − ∂b(θ_i)/∂θ_i ) (∂θ_i/∂η_i) x_i = (1/a(φ)) X'Λ(y − µ) = 0_p,

where X = (x_1, ..., x_n)', Λ = diag(∂θ_1/∂η_1, ..., ∂θ_n/∂η_n), and µ = (∂b(θ_1)/∂θ_1, ..., ∂b(θ_n)/∂θ_n)'. Note that b(θ) is a C^∞-class function and all orders of moments of y exist in the interior Θ_0 of the natural parameter space Θ. In order to use some properties of the MLE, we make the following regularity assumptions (see, e.g., Fahrmeir and Kaufmann (1985)):

    C.1: x_i'β ∈ h(M), (i = 1, ..., n), for all β ∈ B,
    C.2: h is three times continuously differentiable,
    C.3: for all x_i ∈ χ, ∂θ_i/∂η_i ≠ 0, (i = 1, ..., n),
    C.4: there exists n_0 s.t. X'X has full rank for n ≥ n_0,

where B is an admissible open set in R^p for the parameter β, χ is a compact set for the regressors x_i, and M denotes the image µ(Θ_0). Condition C.1 is necessary in order to define the GLM for all β. Condition C.2 is necessary in order to calculate the bias. Conditions C.3 and C.4 ensure that X'VX is positive definite for all β ∈ B and n ≥ n_0, where

    V = diag( ∂²b(θ_1)/∂θ_1², ..., ∂²b(θ_n)/∂θ_n² ).

Moreover, we impose the following additional conditions to ensure strong consistency and asymptotic normality of β̂, which can be derived by slightly modifying the results reported by Fahrmeir and Kaufmann (1985):

    C.5: the sequence {x_i} lies in χ with u(x'β) ∈ Θ_0 for all β ∈ B,
    C.6: lim inf_{n→∞} λ_min(X'VX/n) > 0,
    C.7: there exist c > 0 and n_1 s.t. λ_min(X'X) > c λ_max(X'X) for n ≥ n_1,

where λ_min(A) and λ_max(A) are the smallest and largest eigenvalues of a symmetric matrix A. According to Theorem 5 in Fahrmeir and Kaufmann (1985), β̂ is strongly consistent and asymptotically normal under these conditions. Furthermore, from C.6, X'VX = O(n) as n → ∞.
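The likelihood equation above is routinely solved by Newton-Raphson (Fisher scoring) iteration. The following sketch, our own illustration with simulated data rather than anything from the paper, fits a Poisson regression with the log link and checks that the score X'(y − µ) vanishes at the MLE (for this natural link, Λ is the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.binomial(1, 0.4, size=(n, p - 1))])
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))

# Newton-Raphson for the Poisson log-link GLM: solve X'(y - mu) = 0
beta = np.zeros(p)
for _ in range(25):
    mu = np.exp(X @ beta)        # mean, mu_i = exp(eta_i)
    W = mu                       # Fisher weights for the log link
    step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
    beta += step
    if np.max(np.abs(step)) < 1e-10:
        break

score = X.T @ (y - np.exp(X @ beta))  # likelihood-equation residual
print(beta, np.max(np.abs(score)))
```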
Based on the above conditions, β̂ can be formally expanded as follows:

    β̂ = β + (1/√n) b_1 + (1/n) b_2 + (1/(n√n)) b_3 + O_p(n^{-2}).   (2.3)

Note that ∂l(β̂; y)/∂β = 0_p. By applying a Taylor expansion around β̂ = β to this equation, the likelihood equation is expanded as follows:

    0_p = (1/√n)(g − G_2 b_1) − (1/n){ G_2 b_2 + (1/2) G_3 (b_1 ⊗ b_1) }
          − (1/(n√n)){ G_2 b_3 + (1/2) G_3 (b_1 ⊗ b_2 + b_2 ⊗ b_1) + (1/6)(I_p ⊗ b_1') G_4 (b_1 ⊗ b_1) } + O_p(n^{-2}),   (2.4)

where 0_p is a p-dimensional vector of zeros, ⊗ denotes the Kronecker product, and

    g = (1/√n) ∂l(β; y)/∂β = (1/√n) Σ_{i=1}^n (y_i − d_i1) c_i1 x_i,
    G_2 = −(1/n) ∂²l(β; y)/∂β∂β' = (1/n) Σ_{i=1}^n { d_i2 c_i1² − (y_i − d_i1) c_i2 } x_i x_i',
    G_3 = −(1/n) (∂/∂β') ⊗ ∂²l(β; y)/∂β∂β'
        = (1/n) Σ_{i=1}^n { d_i3 c_i1³ + 3 d_i2 c_i1 c_i2 − (y_i − d_i1) c_i3 } (x_i ⊗ x_i x_i'),
    G_4 = −(1/n) (∂²/∂β∂β') ⊗ ∂²l(β; y)/∂β∂β'
        = (1/n) Σ_{i=1}^n { d_i4 c_i1⁴ + 6 d_i3 c_i1² c_i2 + 3 d_i2 c_i2² + 4 d_i2 c_i1 c_i3 − (y_i − d_i1) c_i4 } (x_i x_i' ⊗ x_i x_i').

Here, the coefficients c_ik and d_ik (i = 1, ..., n; k = 1, ..., 4) are defined as

    c_ik = ∂^k θ_i / ∂η_i^k,   d_ik = ∂^k b(θ_i) / ∂θ_i^k.   (2.5)

Note that c_ik is determined by the link function, and d_ik is determined by the distribution of the model. Let us define Z_i = √n (G_i − M_i) (i = 2, 3, 4), where M_i = E[G_i], the explicit forms of which are

    M_2 = (1/n) Σ_{i=1}^n d_i2 c_i1² x_i x_i',
    M_3 = (1/n) Σ_{i=1}^n (d_i3 c_i1³ + 3 d_i2 c_i1 c_i2)(x_i ⊗ x_i x_i'),
    M_4 = (1/n) Σ_{i=1}^n (d_i4 c_i1⁴ + 6 d_i3 c_i1² c_i2 + 3 d_i2 c_i2² + 4 d_i2 c_i1 c_i3)(x_i x_i' ⊗ x_i x_i').
Thus, Z_i (i = 2, 3, 4) can be expressed as

    Z_2 = −(1/√n) Σ_{i=1}^n (y_i − d_i1) c_i2 x_i x_i',
    Z_3 = −(1/√n) Σ_{i=1}^n (y_i − d_i1) c_i3 (x_i ⊗ x_i x_i'),
    Z_4 = −(1/√n) Σ_{i=1}^n (y_i − d_i1) c_i4 (x_i x_i' ⊗ x_i x_i').

Based on the regularity assumptions, the nonsingularity of M_2 is guaranteed. Furthermore, the regularity assumptions together with Conditions C.6 and C.7 ensure the asymptotic normality of Z_i. Hence, we can rewrite (2.4) as

    0_p = (1/√n)(g − M_2 b_1) − (1/n){ M_2 b_2 + (1/2) M_3 (b_1 ⊗ b_1) + Z_2 b_1 }
          − (1/(n√n)){ M_2 b_3 + (1/2) M_3 (b_1 ⊗ b_2 + b_2 ⊗ b_1) + (1/6)(I_p ⊗ b_1') M_4 (b_1 ⊗ b_1) + Z_2 b_2 + (1/2) Z_3 (b_1 ⊗ b_1) } + O_p(n^{-2}).   (2.6)

Comparing the terms of the same order on both sides of (2.6), the explicit forms of b_1, b_2, and b_3 are obtained as follows:

    b_1 = M_2^{-1} g,
    b_2 = −M_2^{-1} { (1/2) M_3 (b_1 ⊗ b_1) + Z_2 b_1 },
    b_3 = −M_2^{-1} { (1/2) M_3 (b_1 ⊗ b_2 + b_2 ⊗ b_1) + (1/6)(I_p ⊗ b_1') M_4 (b_1 ⊗ b_1) + Z_2 b_2 + (1/2) Z_3 (b_1 ⊗ b_1) }.

3. BIAS CORRECTION OF THE AIC

The goodness of fit of the model is measured by the risk function based on the expected KL information, as follows:

    R = E_y E_{y*}[ −2 l(β̂; y*) ],

where y* = (y_1*, ..., y_n*)' is an n-dimensional random vector that is independent of y and has the same distribution as y. At the beginning of this section, we derive the bias of −2l(β̂; y) to R. Under ordinary circumstances, calculating the expectations of y under the specific distribution is needed in order to express the bias. However, based on the characteristics of the exponential family, we can obtain the bias without calculating the expectations of y under the specific distribution; the explicit form of the bias can be expressed through several derivatives of the log-likelihood function.
The bias incurred when we estimate R by −2l(β̂; y) is given as

    B = R − E_y[ −2 l(β̂; y) ] = E_y[ E_{y*}[ 2l(β̂; y) − 2l(β̂; y*) ] ] = 2 E_y[ Σ_{i=1}^n (y_i − d_i1) θ̂_i ].   (3.1)

By applying a Taylor expansion around β̂ = β to θ̂_i = (h ∘ µ)^{-1}(x_i'β̂), θ̂_i is expanded as

    θ̂_i = θ_i + (β̂ − β)' ∂θ_i/∂β + (1/2)(β̂ − β)' { ∂²θ_i/∂β∂β' } (β̂ − β)
          + (1/6)(β̂ − β)' { (∂/∂β') ⊗ ∂²θ_i/∂β∂β' } { (β̂ − β) ⊗ (β̂ − β) } + O_p(n^{-2}).   (3.2)

Substituting the stochastic expansion of β̂ in (2.3) into (3.2) yields the following:

    θ̂_i = θ_i + (1/√n) c_i1 x_i'b_1 + (1/n){ c_i1 x_i'b_2 + (1/2) c_i2 (x_i'b_1)² }
          + (1/(n√n)){ c_i1 x_i'b_3 + c_i2 (x_i'b_1)(x_i'b_2) + (1/6) c_i3 (x_i'b_1)³ } + O_p(n^{-2}).   (3.3)

By combining (3.1) and (3.3), we obtain

    B = 2 E[ Σ_i (y_i − d_i1) θ_i ] + (2/√n) E[ Σ_i (y_i − d_i1) c_i1 x_i'b_1 ]
        + (2/n) E[ Σ_i (y_i − d_i1) { c_i1 x_i'b_2 + (1/2) c_i2 (x_i'b_1)² } ]
        + (2/(n√n)) E[ Σ_i (y_i − d_i1) { c_i1 x_i'b_3 + c_i2 (x_i'b_1)(x_i'b_2) + (1/6) c_i3 (x_i'b_1)³ } ] + O(n^{-2}).   (3.4)

Recall that d_i1 = ∂b(θ_i)/∂θ_i = E[y_i]. This yields the first term of (3.4), as follows:

    2 E[ Σ_i (y_i − d_i1) θ_i ] = 0.   (3.5)

Since E[gg'] = M_2, the second term of (3.4) can be calculated as

    (2/√n) E[ Σ_i (y_i − d_i1) c_i1 x_i'b_1 ] = 2 E[ g'M_2^{-1}g ] = 2p.   (3.6)

The third term of (3.4) can be obtained as

    (2/n) E[ Σ_i (y_i − d_i1) { c_i1 x_i'b_2 + (1/2) c_i2 (x_i'b_1)² } ]
        = (3/n²) Σ_i d_i3 c_i1² c_i2 U_ii² − (1/n³) Σ_{i,j} d_i3 c_i1³ (d_j3 c_j1³ + 3 d_j2 c_j1 c_j2) U_ij³ + O(n^{-2}),   (3.7)

where Σ_{i,j} refers to Σ_{i=1}^n Σ_{j=1}^n, and U_ij is the (i, j)th element of the matrix U = XM_2^{-1}X', i.e.,

    U_ij = x_i' M_2^{-1} x_j.   (3.8)
Note that the coefficient U_ij is determined by both the link function and the distribution of the model. The derivation of (3.7) is shown in Appendix A.1. Furthermore, the fourth term of (3.4) can be expanded as

    (2/(n√n)) E[ Σ_i (y_i − d_i1) { c_i1 x_i'b_3 + c_i2 (x_i'b_1)(x_i'b_2) + (1/6) c_i3 (x_i'b_1)³ } ]
        = −(1/n²) Σ_i (d_i4 c_i1⁴ + 6 d_i3 c_i1² c_i2 − d_i2 c_i2²) U_ii²
          + (1/n³) Σ_{i,j} { 2 d_i3 c_i1³ (d_j3 c_j1³ + 3 d_j2 c_j1 c_j2) + 4 (d_i2 c_i1 c_i2)(d_j2 c_j1 c_j2) } U_ij³
          + (1/n³) Σ_{i,j} { d_i3 c_i1³ (d_j3 c_j1³ + 3 d_j2 c_j1 c_j2) + 4 (d_i2 c_i1 c_i2)(d_j2 c_j1 c_j2) } U_ij U_ii U_jj + O(n^{-2}).   (3.9)

The detailed derivation of (3.9) is given in Appendix A.2. Finally, by substituting (3.5), (3.6), (3.7), and (3.9) into (3.4), we obtain the asymptotic expansion of B up to order n^{-1} as

    B = 2p + (1/n)(w_1 + w_2) + O(n^{-2}),   (3.10)

where

    w_1 = −(1/n) Σ_i (d_i4 c_i1⁴ + 3 d_i3 c_i1² c_i2 − d_i2 c_i2²) U_ii²,
    w_2 = (1/n²) Σ_{i,j} { d_i3 c_i1³ (d_j3 c_j1³ + 3 d_j2 c_j1 c_j2) + 4 (d_i2 c_i1 c_i2)(d_j2 c_j1 c_j2) } (U_ij³ + U_ij U_ii U_jj).   (3.11)

By a simple calculation, we have c_i1 = 1 and c_i2 = 0 when the link function is natural. Thus, if the model has the natural link function, w_1 and w_2 become simple, as follows:

    w_1 = −(1/n) Σ_i d_i4 U_ii²,
    w_2 = (1/n²) Σ_{i,j} d_i3 d_j3 (U_ij³ + U_ij U_ii U_jj).   (3.12)

Equation (3.10) yields the following formula for the CAIC:

    CAIC = AIC + (1/n)(ŵ_1 + ŵ_2),   (3.13)

where ŵ_1 and ŵ_2 are defined by replacing β in w_1 and w_2 with β̂. If h is the natural link function, we can use w_1 and w_2 in (3.12); otherwise, we have to use w_1 and w_2 in (3.11). Note that ŵ_1 and ŵ_2 depend only on several derivatives of the log-likelihood. Therefore, the coefficients ŵ_1 and ŵ_2 can be obtained comfortably using formula manipulation software.
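As a worked illustration of the resulting criterion, the sketch below (our own code, not the authors') computes CAIC = AIC + (ŵ_1 + ŵ_2)/n for a Poisson regression with the log link, using the simplified natural-link coefficients stated above, in which d_i3 = d_i4 = exp(θ_i):

```python
import numpy as np

def caic_poisson_log(X, y):
    # CAIC = AIC + (w1_hat + w2_hat)/n for the Poisson GLM with log link,
    # a sketch assuming the natural-link forms of w1 and w2 stated above
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(50):                     # Newton-Raphson MLE
        lam = np.exp(X @ beta)
        beta += np.linalg.solve((X.T * lam) @ X, X.T @ (y - lam))
    lam = np.exp(X @ beta)
    loglik = np.sum(y * np.log(lam) - lam) - sum(
        np.sum(np.log(np.arange(1, yi + 1))) for yi in y)
    aic = -2.0 * loglik + 2.0 * p
    U = X @ np.linalg.solve((X.T * lam) @ X / n, X.T)  # U_ij = x_i' M2^{-1} x_j
    d3 = d4 = lam                                      # d_i3 = d_i4 = exp(theta_i)
    w1 = -np.sum(d4 * np.diag(U) ** 2) / n
    w2 = np.sum(np.outer(d3, d3) * (U**3 + U * np.outer(np.diag(U), np.diag(U)))) / n**2
    return aic + (w1 + w2) / n

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = rng.poisson(np.exp(0.4 + 0.2 * X[:, 1]))
print(caic_poisson_log(X, y))
```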
4. THE CAIC IN THE MODELS IMPLEMENTED IN glm

In this section, we present several examples of the CAIC for the GLMs that can be fitted with the glm function of the R software. In glm, the binomial distribution accepts the logit, probit, cauchit, and cloglog (complementary log-log) links; the gamma distribution accepts the inverse, identity, and log links; the Poisson distribution accepts the log, identity, and sqrt links; and the inverse Gaussian distribution accepts the 1/µ², inverse, and log links. The examples below are obtained through our formula (3.13). If h is the natural link function, w_1 and w_2 in (3.13) are expressed as in (3.12); otherwise, w_1 and w_2 in (3.13) are expressed as in (3.11). Next, we present the coefficients of the CAIC in all of the models mentioned above. The results indicate that the CAIC in a model with a non-natural link function is more complex than that in a model with the natural link function. The log-likelihood function of the GLM is given by Equation (2.2).

4.1. Case of the Binomial Distribution

When we assume that y_i is distributed according to the binomial distribution B(m_i, p_i) (i = 1, ..., n), the parameters and functions determined by the distribution are given by

    θ_i = log{ p_i/(1 − p_i) },  b(θ_i) = m_i log(1 + exp(θ_i)),  φ = 1,  a(φ) = 1,  c(y_i, φ) = log C(m_i, y_i),

where C(m_i, y_i) denotes the binomial coefficient. Then, the coefficients of the CAIC determined by the distribution are given by

    d_i1 = m_i exp(θ_i)/(1 + exp(θ_i)),
    d_i2 = m_i exp(θ_i)/(1 + exp(θ_i))²,
    d_i3 = m_i exp(θ_i)(1 − exp(θ_i))/(1 + exp(θ_i))³,
    d_i4 = m_i exp(θ_i)(1 − 4 exp(θ_i) + exp(2θ_i))/(1 + exp(θ_i))⁴.

The remaining coefficients of the CAIC, which are determined by the link function, are as follows:

Case of the logistic link function, i.e., E[y_i] = m_i p_i = m_i (1 + exp(−η_i))^{-1} (natural link function):

    U_ij = x_i' { (1/n) Σ_k m_k exp(−η_k)(1 + exp(−η_k))^{-2} x_k x_k' }^{-1} x_j.

The CAIC derived from the above coefficients coincides with the CAIC reported by Yanagihara et al. (2003).
Case of the probit link function, i.e., m_i p_i = m_i Φ(η_i), where Φ(·) is the cumulative distribution function (CDF) of the standard normal distribution:

    c_i1 = ϕ(η_i)/[ {1 − Φ(η_i)}Φ(η_i) ],
    c_i2 = −ϕ(η_i)[ η_i Φ(η_i){1 − Φ(η_i)} + ϕ(η_i){1 − 2Φ(η_i)} ] / [ {1 − Φ(η_i)}Φ(η_i) ]²,
    U_ij = x_i' [ (1/n) Σ_k m_k ϕ(η_k)² / [ {1 − Φ(η_k)}Φ(η_k) ] x_k x_k' ]^{-1} x_j,

where ϕ(·) is the probability density function (PDF) of the standard normal distribution.

Case of the cauchit link function, i.e., m_i p_i = m_i Ψ(η_i), where Ψ(·) is the CDF of the standard Cauchy distribution:

    c_i1 = ψ(η_i)/[ {1 − Ψ(η_i)}Ψ(η_i) ],
    c_i2 = −ψ(η_i)²[ 2π η_i Ψ(η_i){1 − Ψ(η_i)} + {1 − 2Ψ(η_i)} ] / [ {1 − Ψ(η_i)}Ψ(η_i) ]²,
    U_ij = x_i' [ (1/n) Σ_k m_k ψ(η_k)² / [ {1 − Ψ(η_k)}Ψ(η_k) ] x_k x_k' ]^{-1} x_j,

where ψ(·) is the PDF of the standard Cauchy distribution.

Case of the cloglog link function, i.e., m_i p_i = m_i − m_i exp{−exp(η_i)}:

    c_i1 = exp(η_i)/[ 1 − exp{−exp(η_i)} ],
    c_i2 = exp(η_i)[ 1 − {1 + exp(η_i)} exp{−exp(η_i)} ] / [ 1 − exp{−exp(η_i)} ]²,
    U_ij = x_i' [ (1/n) Σ_k m_k exp(2η_k) / ( exp{exp(η_k)} [ 1 − exp{−exp(η_k)} ] ) x_k x_k' ]^{-1} x_j.

4.2. Case of the Poisson Distribution

Second, when we assume that y_i is distributed according to a Poisson distribution Po(λ_i) (i = 1, ..., n), the parameters and functions determined by the model are given as follows:

    θ_i = log λ_i,  b(θ_i) = exp(θ_i),  φ = 1,  a(φ) = 1,  c(y_i, φ) = −log(y_i!).

The coefficients of the CAIC determined by the distribution are given by

    d_i1 = d_i2 = d_i3 = d_i4 = exp(θ_i).

The remaining coefficients of the CAIC, which are determined by the link function, are as follows:

Case of the log link function, i.e., E[y_i] = λ_i = exp(η_i) (natural link function):

    U_ij = x_i' { (1/n) Σ_k exp(η_k) x_k x_k' }^{-1} x_j.

The CAIC obtained from the above coefficients coincides with that reported by Kamo et al. (2011).
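To make the matrix U concrete in the Poisson/log-link case, the following sketch (our own code; the data and names are hypothetical) builds U_ij = x_i'{(1/n) Σ_k exp(η_k) x_k x_k'}^{-1} x_j from a design matrix and a coefficient vector:

```python
import numpy as np

def U_matrix_poisson_log(X, beta):
    # U_ij = x_i' M2^{-1} x_j with M2 = (1/n) sum_k exp(eta_k) x_k x_k'
    n = X.shape[0]
    eta = X @ beta
    M2 = (X.T * np.exp(eta)) @ X / n
    return X @ np.linalg.solve(M2, X.T)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
beta = np.array([0.2, 0.1, -0.3])
U = U_matrix_poisson_log(X, beta)
print(U.shape)
```

Since M_2 is positive definite here, U is symmetric with positive diagonal entries, which is what the bias-correction terms consume.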
Case of the identity link function, i.e., λ_i = η_i:

    c_i1 = 1/η_i,  c_i2 = −1/η_i²,  U_ij = x_i' { (1/n) Σ_k η_k^{-1} x_k x_k' }^{-1} x_j.

Case of the sqrt link function, i.e., λ_i = η_i²:

    c_i1 = 2/η_i,  c_i2 = −2/η_i²,  U_ij = x_i' { (4/n) Σ_k x_k x_k' }^{-1} x_j.

4.3. Case of the Gamma Distribution

When we assume that a positive observed response y_i is distributed according to the gamma distribution Γ(λ_i, ν) (i = 1, ..., n), the parameters and functions determined by the model are given as follows:

    θ_i = −λ_i/ν,  b(θ_i) = −log(−θ_i),  φ = ν,  a(φ) = 1/φ,
    c(y_i, φ) = ν log ν − log Γ(ν) + (ν − 1) log y_i,

where Γ(·) is the gamma function. Then, the coefficients of the CAIC determined by the distribution are given by

    d_i1 = −θ_i^{-1},  d_i2 = θ_i^{-2},  d_i3 = −2θ_i^{-3},  d_i4 = 6θ_i^{-4}.

The remaining coefficients of the CAIC, which are determined by the link function, are as follows:

Case of the inverse link function, i.e., E[y_i] = ν/λ_i = η_i^{-1} (natural link function):

    U_ij = x_i' { (ν/n) Σ_k η_k^{-2} x_k x_k' }^{-1} x_j.

Case of the log link function, i.e., ν/λ_i = exp(η_i):

    c_i1 = exp(−η_i),  c_i2 = −exp(−η_i),  U_ij = x_i' { (ν/n) Σ_k x_k x_k' }^{-1} x_j.

Case of the identity link function, i.e., ν/λ_i = η_i:

    c_i1 = η_i^{-2},  c_i2 = −2η_i^{-3},  U_ij = x_i' { (ν/n) Σ_k η_k^{-2} x_k x_k' }^{-1} x_j.
4.4. Case of the Inverse Gaussian Distribution

When we assume that a positive observed response y_i is distributed according to the inverse Gaussian distribution IG(µ_i, λ) (i = 1, ..., n), the parameters and functions determined by the model are given as follows:

    θ_i = −µ_i^{-2},  b(θ_i) = −2(−θ_i)^{1/2},  φ = λ,  a(φ) = 2/φ,
    c(y_i, φ) = (1/2) log λ − (1/2) log(2π y_i³) − λ/(2y_i).

Then, the coefficients of the CAIC determined by the distribution are given as

    d_i1 = (−θ_i)^{-1/2},  d_i2 = (1/2)(−θ_i)^{-3/2},  d_i3 = (3/4)(−θ_i)^{-5/2},  d_i4 = (15/8)(−θ_i)^{-7/2}.

The remaining coefficients of the CAIC, which are determined by the link function, are as follows:

Case of the 1/µ² link function, i.e., E[y_i] = µ_i = η_i^{-1/2} (natural link function):

    U_ij = x_i' { (λ/(4n)) Σ_k η_k^{-3/2} x_k x_k' }^{-1} x_j.

Case of the inverse link function, i.e., µ_i = η_i^{-1}:

    c_i1 = −2η_i,  c_i2 = −2,  U_ij = x_i' { (λ/n) Σ_k η_k^{-1} x_k x_k' }^{-1} x_j.

Case of the log link function, i.e., µ_i = exp(η_i):

    c_i1 = 2 exp(−2η_i),  c_i2 = −4 exp(−2η_i),  U_ij = x_i' { (λ/n) Σ_k exp(−η_k) x_k x_k' }^{-1} x_j.

5. NUMERICAL STUDIES

In this section, we conduct numerical studies to show that the CAIC performs better than the original AIC. We first examine the selection frequencies of the candidate models and the prediction errors of the best models selected by the criteria. We prepared eight candidate models with n = 50 and 100. First, we constructed an n × 8 explanatory variable matrix X = (x_1, ..., x_n)'. The first column of X is 1_n, an n-dimensional vector of ones, and the remaining seven columns of X were defined by realizations of independent dummy variables from the binomial distribution B(1, 0.4). In this simulation, we prepared two parameter vectors β, as follows:

    Case 1: β = (0.65, 0.65)',
    Case 2: β = (0.1, 0.1, 0.3, 0.5)'.
The explanatory variable matrix in the jth model consists of the first j columns of X (j = 1, ..., 8). Thus, in Case 1 the true model is the second model, and in Case 2 the true model is the fourth model. We simulated 1,000 realizations of y = (y_1, ..., y_n)' from the probit regression model, i.e., y_i ~ ind. B(1, p_i), where p_i = Φ(x_i'β) (i = 1, ..., n).

Table 1: Selection probability and prediction error for the case of β = (0.65, 0.65)'. [Table entries not recovered.]

Table 2: Selection probability and prediction error for the case of β = (0.1, 0.1, 0.3, 0.5)'. [Table entries not recovered.]

Tables 1 and 2 list the following properties:
(1) selection probability: the frequency with which each model is chosen by minimizing the information criterion;
(2) prediction error of the best model (PE_B): the risk function of the model selected by the information criterion as the best model, which is estimated as

    PE_B = (1/1000) Σ_{i=1}^{1000} E_{y*}[ −2 l(β̂_Bi; y*) ],

where y* is a future observation and β̂_Bi is the value of β̂ of the selected model at the ith iteration. In particular, PE_B is an important property because it is equivalent to the expected KL information between the true model and the best model selected by the criteria. In Case 1 with n = 50 and 100 and in Case 2 with n = 100, the model having the smallest risk function (referred to as the principal best model) coincides with the true model. On the other hand, in Case 2 with n = 50,
the principal best model was the first model rather than the fourth model, i.e., the true model did not coincide with the principal best model. This means that, for prediction in Case 2 with n = 50, using a model smaller than the true model is better. From Tables 1 and 2, we can see that the selection probabilities and prediction errors of the CAIC were improved over those of the AIC in all situations. We simulated several other models and obtained similar results.

Table 3: Selection probability and estimated prediction error for the prostate cancer data. The candidate models combine the logistic and probit links with subsets of the predictors X-ray, Stage, Grade, Age, and Acid. [Table entries, including the PE_B row, not recovered.]

Next, for the purpose of analyzing real data with the GLM, we consider the data reported in Brown (1980), who discussed an experiment in which 53 prostate cancer patients underwent surgery so that their lymph nodes could be examined for evidence of cancer. The response variable indicates nodal involvement, and there were five predictor variables: X-ray, Stage, Age, Acid, and Grade. First, we assume that the response variable y_i is distributed according to B(1, p_i) (i = 1, ..., n). For the link function, we prepare two functions: the logistic link function and the probit link function. In this analysis, we select the link function and the variables simultaneously. Table 3 shows the selection probability of each model selected by minimizing the information criteria and the estimated prediction error of the best model selected by each criterion. We divide the data into a calibration sample and a validation sample. The sample sizes of the calibration sample and the validation sample were 43 and 10, respectively.
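The leave-10-out estimate of the prediction error described here can be sketched as follows; this is our own hedged illustration with synthetic binary data (the helper `fit_logistic` and all names are ours, and predicted probabilities are clipped to keep the held-out log-likelihood finite):

```python
import numpy as np

def fit_logistic(X, y, iters=50):
    # Newton-Raphson MLE for a Bernoulli GLM with logit link
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = p * (1 - p) + 1e-9
        beta += np.linalg.solve((X.T * W) @ X, X.T @ (y - p))
    return beta

def loo10_prediction_error(X, y, n_splits=100, seed=0):
    # repeatedly leave out 10 observations, fit on the rest,
    # and accumulate -2 log f(y_i) on the held-out block
    rng = np.random.default_rng(seed)
    n = len(y)
    total = 0.0
    for _ in range(n_splits):
        held = rng.choice(n, size=10, replace=False)
        mask = np.ones(n, bool)
        mask[held] = False
        beta = fit_logistic(X[mask], y[mask])
        p = 1.0 / (1.0 + np.exp(-(X[held] @ beta)))
        p = np.clip(p, 1e-12, 1 - 1e-12)
        total += -2.0 * np.sum(y[held] * np.log(p) + (1 - y[held]) * np.log(1 - p))
    return total / n_splits

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(53), rng.binomial(1, 0.5, size=(53, 2))])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0, 0.5])))))
print(loo10_prediction_error(X, y))
```

Whether one averages over splits or reports the raw sum is a normalization choice; only relative comparisons between criteria matter here.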
The best subset of the variables and the link function were selected by the criteria derived from the calibration sample. The selection probabilities were evaluated from only the calibration sample. The prediction errors were estimated as follows. Let d_j = (d_1j, ..., d_nj)' be an n-dimensional vector expressing a pattern of leaving out 10 observations at the jth iteration (j = 1, ..., 100), i.e., d_ij = 1 or 0 and Σ_{i=1}^n d_ij = 10. Moreover, we let β̂_{B,[−d_j]} denote
β̂_{[−d_j]} of β of the best model evaluated from the calibration sample, where β̂_{[−d_j]} is given as

    β̂_{[−d_j]} = arg max_β Σ_{i=1}^n (1 − d_ij) log f(y_i; β).

Finally, the estimated PE_B is given as

    PE_B = (1/100) Σ_{j=1}^{100} Σ_{i=1}^n d_ij { −2 log f(y_i; β̂_{B,[−d_j]}) }.

Table 3 indicates that the models selected by the AIC were spread over a wider range than those selected by the CAIC, although the model selected most often by the AIC is the same as that selected most often by the CAIC. In particular, the selection probability of the model selected most often by the CAIC is much higher than the corresponding probability for the AIC. The estimated prediction error of the CAIC was smaller than that of the AIC. Thus, the CAIC can be considered to have improved the accuracy of the original AIC. Consequently, from Tables 1, 2, and 3, we recommend the use of the CAIC rather than the AIC for selecting variables in GLMs.

6. CONCLUSION

In the present paper, we derived a simple formula for the CAIC in the GLM. All of the coefficients in our formula are obtained from the first through fourth derivatives of the log-likelihood function. The GLM can express a number of statistical models by changing the distribution and the link function and can be easily fitted to real data using the function glm in the R software, which implements several distributions and link functions. Hence, the present result is useful for real data analysis. Moreover, the numerical studies revealed that the CAIC performs better than the original AIC. We presented explicit forms of the CAIC for all of the models that are implemented in glm; these explicit forms should be useful in real data analysis. Even if a researcher wants to use the CAIC in a model for which an explicit form has not yet been derived, the researcher can easily obtain the CAIC using formula manipulation software. In the present paper, we have dealt primarily with variable selection.
However, in the simulation and in the real data analysis, we also considered the selection of the link function. If we choose the link function by minimizing the original AIC, the optimal link function is determined only by maximizing the log-likelihood function, because the penalty term of the AIC depends only on the number of parameters, which is common to all candidate link functions. On the other hand, if we use the CAIC to select the link function, the optimal link function is not determined only by maximizing the log-likelihood function, because the bias-correction term itself depends on the link function. Thus, using the CAIC will allow us to select a more appropriate link function. As mentioned above, we confirm that the present results are useful and user friendly.
APPENDIX

A.1 Derivation of the Third Term of (3.4)

In order to calculate the moments of b_1 and Z_2, we rewrite the third term of (3.4) using b_1, Z_2, and M_3 as

    (2/n) E[ Σ_{i=1}^n (y_i − d_i1) { c_i1 x_i'b_2 + (1/2) c_i2 (x_i'b_1)² } ]
        = −(1/√n) E[ b_1'M_3(b_1 ⊗ b_1) ] − (3/√n) E[ b_1'Z_2 b_1 ],   (A.1)

where c_ik and d_ik are defined in (2.5), and U_ij is defined in (3.8). Let φ_{b_1}(t) be the characteristic function of the distribution of b_1, defined as

    φ_{b_1}(t) = E[ exp(i t'b_1) ] = ∏_{j=1}^n E[ exp{ i(y_j − d_j1)s_j } ],   s_j = (1/√n) c_j1 t'M_2^{-1} x_j,

where t = (t_1, ..., t_p)'. Note that E[exp{i(y − µ)s}] is the characteristic function of y − µ, which is expressed as

    E[ exp{ i(y − µ)s } ] = exp{ b(θ + is) − b(θ) − iµs }.

Therefore, we have

    φ_{b_1}(t) = ∏_{j=1}^n exp{ b(θ_j + is_j) − b(θ_j) − i d_j1 s_j }.

Based on the property of a random variable with mean zero, the third moment is equivalent to the third cumulant. Since s_j = O(n^{-1/2}), log φ_{b_1}(t) can be expanded as

    log φ_{b_1}(t) = Σ_{j=1}^n { b(θ_j + is_j) − b(θ_j) − i d_j1 s_j }
        = Σ_{j=1}^n { (1/2) d_j2 (is_j)² + (1/6) d_j3 (is_j)³ + (1/24) d_j4 (is_j)⁴ } + O(n^{-3/2}).

Thus, the third cumulant of b_1 = (b_11, ..., b_1p)' is computed through the derivatives of log φ_{b_1}(t), i.e.,

    E[ b_1α1 b_1α2 b_1α3 ] = i^{-3} ∂³ log φ_{b_1}(t) / ∂t_α1 ∂t_α2 ∂t_α3 |_{t = 0_p}
        = Σ_{j=1}^n d_j3 (∂s_j/∂t_α1)(∂s_j/∂t_α2)(∂s_j/∂t_α3) + O(n^{-3/2}).

Note that

    ∂s_j/∂t_αl = (1/√n) c_j1 e_αl' M_2^{-1} x_j,
where e_j is the p-dimensional vector whose jth element is 1 and whose other elements are 0. Thus, using Equations (2.5) and (3.8), we obtain

    (1/√n) E[ b_1'M_3(b_1 ⊗ b_1) ] = (1/n³) Σ_{i,j} d_j3 c_j1³ (d_i3 c_i1³ + 3 d_i2 c_i1 c_i2) U_ij³ + O(n^{-2}).   (A.2)

Let φ_{b_1,Z_2}(t, T) denote the characteristic function of the joint distribution of b_1 and Z_2,

    φ_{b_1,Z_2}(t, T) = ∏_{j=1}^n exp{ b(θ_j + iv_j) − b(θ_j) − i d_j1 v_j },

where T = (T_ij) (i, j = 1, ..., p) and

    v_j = (1/√n) { c_j1 t'M_2^{-1}x_j − c_j2 x_j'T x_j }.

In the same manner as in the calculation of log φ_{b_1}(t), we have

    log φ_{b_1,Z_2}(t, T) = Σ_{j=1}^n { b(θ_j + iv_j) − b(θ_j) − i d_j1 v_j }
        = Σ_{j=1}^n { (1/2) d_j2 (iv_j)² + (1/6) d_j3 (iv_j)³ + (1/24) d_j4 (iv_j)⁴ } + O(n^{-3/2}).

Note that

    ∂v_k/∂t_i = (1/√n) c_k1 e_i'M_2^{-1}x_k,   ∂v_k/∂T_ij = −(1/√n) c_k2 (e_i'x_k)(e_j'x_k).

Hence, we obtain

    (1/√n) E[ b_1'Z_2 b_1 ] = −(1/n²) Σ_i d_i3 c_i1² c_i2 U_ii² + O(n^{-2}).   (A.3)

Substituting (A.2) and (A.3) into (A.1), the third term of (3.4) is given by (3.7).

A.2 Derivation of the Fourth Term of (3.4)

In order to use the asymptotic properties, we express the fourth term of (3.4) in terms of b_1, Z_2, Z_3, M_2, M_3, and M_4 as

    (2/(n√n)) E[ Σ_{i=1}^n (y_i − d_i1) { c_i1 x_i'b_3 + c_i2 (x_i'b_1)(x_i'b_2) + (1/6) c_i3 (x_i'b_1)³ } ]
        = (1/n) E[ (b_1⊗b_1)'M_3'M_2^{-1}M_3(b_1⊗b_1) ] − (1/(3n)) E[ (b_1⊗b_1)'M_4(b_1⊗b_1) ]
          + (3/n) E[ (b_1⊗b_1)'M_3'M_2^{-1}Z_2 b_1 ] + (4/n) E[ b_1'Z_2 M_2^{-1} Z_2 b_1 ] − (4/(3n)) E[ b_1'Z_3(b_1⊗b_1) ].   (A.4)
18 Let φ b,z 2,Z 3 ( ) denote the characteristic function of the joint distribution for b, Z 2, and Z 3 defined by { ( ) b(θj + ir j ) b(θ j ) φ b,z 2,Z 3 (t, T 2, T 3 ) = exp id j r j, where T 2 = (T (2) ij ) (i, j =,..., p), T 3 = (T (3) ) (i, j, k =,..., p) and r j = j= ijk n ( c j t M 2 x j + c j2 x jt 2 x j + c j3 x jt 3 (x j x j )). In order to simplify the calculations, we define the following notation: τ kj = k! (id jkr j ) k, 2 τ 2α κ ij = = t α= i t j n m= 2 τ 2α κ i,kl = = α= t i T (2) n jk m= 2 τ 2α κ ik,jl = = α= T (2) (2) ik T n jl κ i,ijk = α= 2 τ 2α t i T (3) ijk = n m= d m2 c 2 m(e im 2 x m )(e jm 2 x m ), (A.5) d m2 c m c m2 (e im 2 x m )(e kx m )(e lx m ), (A.6) d m2 c 2 m2(e ix m )(e kx m )(e jx m )(e lx m ), m= Using the derivations of ϕ b,z 2,Z 3, the first term of (A.4) is given by E[(b b ) M 3M 2 M 3 (b b )] = = (A.7) d m2 c m c m3 (e im 2 x m )(e ix m )(e jx m )(e kx m ). (A.8) p [M 3M 2 M 3 ] i,j,k,l E[b i b j b k b l ] i,j,k,l p [M 3M 2 4 φ b,z M 3 ] 2,Z 3 (t, T 2, T 3 ) i,j,k,l t i t j t k t l i,j,k,l t=0p,t 2 =0,T 3 =0 By applying a Taylor expansion, we obtain { ( ) 4 b(θj + ir j ) b(θ j ) exp id j r j t i t j t k t l j= t=0p,t2=0,t3=0 { 4 = exp (τ 2j + τ 3j + τ 4j ) + O(n ) 3/2 t i t j t k t l j= t=0 p,t 2 =0,T 3 =0 { 4 τ 4α = κ ij κ kl + κ ik κ jl + κ jk κ il + + O(n 2/3 ) exp { + O(n 3/2 ). t i t j t k t l Note that r j = O(n /2 ) and 4 τ 4α t i t j t k t l = α= α= 3 d α4 r α t i r α t j r α t k r α t l = O(n ). 8.
Hence, the first term of (A.4) is expressed as

$$
E[(b_1 \otimes b_1)' M_3' M_2^{-1} M_3 (b_1 \otimes b_1)]
= \sum_{i,j,k,l}^{p} [M_3' M_2^{-1} M_3]_{i,j,k,l} (\kappa_{ij} \kappa_{kl} + \kappa_{ik} \kappa_{jl} + \kappa_{jk} \kappa_{il}) + O(n^{-1}). \quad (A.9)
$$

By substituting (A.5) into (A.9), we obtain

$$
E[(b_1 \otimes b_1)' M_3' M_2^{-1} M_3 (b_1 \otimes b_1)]
= \frac{1}{n^2} \sum_{i,j} (d_{i3} c_{i1}^3 + 3 d_{i2} c_{i1} c_{i2})(d_{j3} c_{j1}^3 + 3 d_{j2} c_{j1} c_{j2})(U_{ii} U_{ij} U_{jj} + 2 U_{ij}^3) + O(n^{-1}). \quad (A.10)
$$

The remaining terms of (A.4) are calculated in the same manner as the first term. The second term of (A.4) is obtained similarly to (A.10) as

$$
E[(b_1 \otimes b_1)' M_4 (b_1 \otimes b_1)]
= \frac{3}{n} \sum_{i} (d_{i4} c_{i1}^4 + 6 d_{i3} c_{i1}^2 c_{i2} + 3 d_{i2} c_{i2}^2 + 4 d_{i2} c_{i1} c_{i3}) U_{ii}^2 + O(n^{-1}). \quad (A.11)
$$

Next, we calculate the third term of (A.4), which is expressed as

$$
E[(b_1 \otimes b_1)' M_3' M_2^{-1} Z_2 b_1]
= \sum_{i,j,k,l}^{p} [M_3' M_2^{-1}]_{i,j,k} \, E[b_{1i} b_{1j} b_{1l} Z_{2,kl}]
= \sum_{i,j,k,l}^{p} [M_3' M_2^{-1}]_{i,j,k} (\kappa_{ij} \kappa_{l,kl} + \kappa_{ik} \kappa_{j,kl} + \kappa_{jk} \kappa_{i,kl}) + O(n^{-1}). \quad (A.12)
$$

Expression (A.6) then implies that

$$
E[(b_1 \otimes b_1)' M_3' M_2^{-1} Z_2 b_1]
= \frac{1}{n^2} \sum_{i,j} (d_{i3} c_{i1}^3 + 3 d_{i2} c_{i1} c_{i2}) d_{j2} c_{j1} c_{j2} (U_{ii} U_{ij} U_{jj} + 2 U_{ij}^3) + O(n^{-1}). \quad (A.13)
$$

The fourth term of (A.4) is expressed as

$$
E[b_1' Z_2' M_2^{-1} Z_2 b_1]
= \sum_{i,j,k,l}^{p} [M_2^{-1}]_{jk} (\kappa_{il} \kappa_{ik,jl} + \kappa_{i,ij} \kappa_{l,kl} + \kappa_{i,kl} \kappa_{l,ij}) + O(n^{-1}). \quad (A.14)
$$

It follows from (A.7) and (A.14) that

$$
E[b_1' Z_2' M_2^{-1} Z_2 b_1]
= \frac{1}{n} \sum_{i} d_{i2} c_{i2}^2 U_{ii} + \frac{1}{n^2} \sum_{i,j} (d_{i2} c_{i1} c_{i2})(d_{j2} c_{j1} c_{j2})(U_{ii} U_{ij} U_{jj} + U_{ij}^3) + O(n^{-1}). \quad (A.15)
$$

Finally, we calculate the fifth term of (A.4). Note that

$$
E[b_1' Z_3 (b_1 \otimes b_1)]
= \sum_{i,j,k}^{p} E[Z_{3,ijk} b_{1i} b_{1j} b_{1k}]
= \sum_{i,j,k}^{p} (\kappa_{ij} \kappa_{k,ijk} + \kappa_{ik} \kappa_{j,ijk} + \kappa_{jk} \kappa_{i,ijk}) + O(n^{-1}). \quad (A.16)
$$
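The three-pairing pattern $\kappa_{ij}\kappa_{kl} + \kappa_{ik}\kappa_{jl} + \kappa_{jk}\kappa_{il}$ that recurs in these expansions is the standard Wick/Isserlis decomposition of a fourth-order moment of an asymptotically normal vector. As an illustrative check (a generic 2-dimensional symbolic covariance, not the paper's specific $\kappa$'s), the following sympy snippet differentiates the Gaussian moment generating function four times and confirms the pairing formula:

```python
import sympy as sp

# Wick/Isserlis check: if b is mean-zero Gaussian with Cov(b_i, b_j) = k_ij,
# then E[b_i b_j b_k b_l] = k_ij*k_kl + k_ik*k_jl + k_il*k_jk.
# We verify this by differentiating the MGF exp(t' K t / 2) four times.
t = sp.symbols('t0 t1')
kap = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f'k{min(i, j)}{max(i, j)}'))
mgf = sp.exp(sp.Rational(1, 2) * sum(kap[a, b] * t[a] * t[b]
                                     for a in range(2) for b in range(2)))

def fourth_moment(i, j, k, l):
    # E[b_i b_j b_k b_l] = d^4 MGF / dt_i dt_j dt_k dt_l evaluated at t = 0
    expr = sp.diff(mgf, t[i], t[j], t[k], t[l])
    return expr.subs({t[0]: 0, t[1]: 0})

for (i, j, k, l) in [(0, 0, 1, 1), (0, 1, 0, 1), (0, 0, 0, 1)]:
    pairing = (kap[i, j] * kap[k, l] + kap[i, k] * kap[j, l]
               + kap[i, l] * kap[j, k])
    assert sp.simplify(fourth_moment(i, j, k, l) - pairing) == 0
print("Isserlis pairing verified")
```

The appendix's mixed moments such as $E[b_{1i} b_{1j} b_{1l} Z_{2,kl}]$ follow the same pairing logic, with the cross-covariances $\kappa_{i,kl}$ and $\kappa_{ik,jl}$ taking the place of $\kappa_{ij}$ for pairs involving $Z_2$.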
Substituting (A.8) into (A.16) yields

$$
E[b_1' Z_3 (b_1 \otimes b_1)] = \frac{3}{n} \sum_{i} d_{i2} c_{i1} c_{i3} U_{ii}^2 + O(n^{-1}). \quad (A.17)
$$

Consequently, from (A.10), (A.11), (A.13), (A.15), and (A.17), we obtain the fourth term of (3.4) as (3.9).

ACKNOWLEDGEMENTS

The authors would like to thank Dr. Mariko Yamamura of Hiroshima University and Dr. Eisuke Inoue of Kitasato University for their helpful comments and suggestions. The authors would also like to thank Dr. Kenichi Kamo of Sapporo Medical University for teaching us the R programming language.

REFERENCES

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (eds. Petrov, B. N. & Csáki, F.), 267-281, Akadémiai Kiadó, Budapest.

[2] Barnett, S. B. L. & Nurmagambetov, T. A. (2011). Costs of asthma in the United States: 2002-2007. J. Allergy Clin. Immunol., 127, 145-152.

[3] Brown, B. W. (1980). Prediction analysis for binary data. In Biostatistics Casebook (eds. Miller, R. G., Jr., Efron, B., Brown, B. W. & Moses, L. E.), 3-18, Wiley, New York.

[4] Davies, S. L., Neath, A. A. & Cavanaugh, J. E. (2006). Estimation optimality of corrected AIC and modified Cp in linear regression. Internat. Statist. Rev., 74, 161-168.

[5] Fahrmeir, L. & Kaufmann, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist., 13, 342-368.

[6] Hurvich, C. M. & Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76, 297-307.

[7] Kamo, K., Yanagihara, H. & Satoh, K. (2011). Bias-corrected AIC for selecting variables in Poisson regression models. Comm. Statist. Theory Methods (in press).

[8] Kullback, S. & Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Statist., 22, 79-86.

[9] Nelder, J. A. & Wedderburn, R. W. M. (1972). Generalized linear models. J. R. Statist. Soc. A, 135, 370-384.
[10] Matas, M., Picornell, A., Cifuentes, C., Payeras, A., Bassa, A., Homar, F., López-Labrador, F. X., Moya, A., Ramon, M. M. & Castro, J. A. (2010). Relating the liver damage with hepatitis C virus polymorphism in core region and human variables in HIV-1-coinfected patients. Infect. Genet. Evol., 10.

[11] Myers, R. H., Montgomery, D. C. & Vining, G. G. (2002). Generalized Linear Models with Applications in Engineering and the Sciences. Wiley-Interscience, New York.

[12] R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

[13] Sánchez-Carnero, N., Couñago, E., Rodríguez-Pérez, D. & Freire, J. (2011). Exploiting oceanographic satellite data to study the small scale coastal dynamics in a NE Atlantic open embayment. J. Marine Syst., 87.

[14] Shibata, R. (1980). Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Ann. Statist., 8, 147-164.

[15] Sugiura, N. (1978). Further analysis of the data by Akaike's information criterion and the finite corrections. Commun. Statist. Theory Meth., A7, 13-26.

[16] Teste, F. P. & Lieffers, V. J. (2011). Snow damage in lodgepole pine stands brought into thinning and fertilization regimes. Forest Ecol. Manag., 261.

[17] Wong, C. S. & Li, W. K. (1998). A note on the corrected Akaike information criterion for threshold autoregressive models. J. Time Ser. Anal., 19, 113-124.

[18] Yanagihara, H., Sekiguchi, R. & Fujikoshi, Y. (2003). Bias correction of AIC in logistic regression models. J. Statist. Plann. Inference, 115, 349-360.
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More information