GENERAL FORMULA OF BIAS-CORRECTED AIC IN GENERALIZED LINEAR MODELS

Size: px
Start display at page:

Download "GENERAL FORMULA OF BIAS-CORRECTED AIC IN GENERALIZED LINEAR MODELS"

Transcription

1 GENERAL FORMULA OF BIAS-CORRECTED AIC IN GENERALIZED LINEAR MODELS Shinpei Imori, Hirokazu Yanagihara, Hirofumi Wakaki Department of Mathematics, Graduate School of Science, Hiroshima University ABSTRACT The present paper considers a bias correction of Akaike s information criterion (AIC) for selecting variables in the generalized linear model (GLM). When the sample size is not so large, the AIC has a non-negligible bias that will negatively affect variable selection. In the present study, we obtain a simple expression for a bias-corrected AIC (corrected AIC, or CAIC) in GLMs. A numerical study reveals that the CAIC has better performance than the AIC for variable selection. Key words: Akaike s information criterion, Bias correction, Generalized linear model, Maximum likelihood estimation, Variable selection.. INTRODUCTION In real data analysis, deciding the best subset of variables in regression models is an important problem. It is common for a model selection to measure the goodness of fit of the model by the risk function based on the expected Kullback-Leibler (KL) information (Kullback and Leibler (95)). For actual use, we must estimate the risk function, which depends on unknown parameters. The most famous estimator of the risk function is Akaike s information criterion (AIC) proposed by Akaike (973). Since the AIC can be simply defined as 2 the maximum log-likelihood +2 the number of parameters, the AIC is widely applied in chemometrics, engineering, econometrics, psychometrics, and many other fields for selecting appropriate models using a set of explanatory variables. In addition, the order of the bias of the AIC to the risk function is O(n ), which indicates implicitly that the AIC sometimes has a non-negligible bias to the risk function when the sample size n is not so large. The AIC tends to underestimate the risk function and the bias of AIC tends to increase with the number of parameters in the model. Potentially, the AIC tends to choose the model that has more parameters than the true model as the best model (see Shibata (980)). Combined with these characteristics, the bias will cause a disadvantage whereby the model having the most parameters is easily chosen by the AIC among the candidate models as the best model. One method of avoiding this undesirable result is to correct the bias of the AIC. A number of authors Corresponding author address: m06547@hiroshima-u.ac.jp

2 have investigated bias-corrected AIC for various models. For example, Sugiura (978) developed an unbiased estimator of the risk function in linear regression models, which is the UMVUE of the risk function reported by Davies et al. (2006). Hurvich and Tsai (989) formally adjusted the bias of the AIC (called AICc) in several models. In particular, the AICc is equivalent to Sugiura s bias-corrected AIC in the case of the linear regression model. Wong and Li (998) extended Hurvich and Tsai s AICc to a wider model and verified that their AICc has a higher performance than the original AIC by conducting numerical studies. Unfortunately, except for the linear regression model, the AICc does not completely reduce the bias of the AIC to O(n 2 ). As mentioned previously, the goodness of fit of the model is measured by the risk function based on the expected KL information. Thus, obtaining a higher-order asymptotic unbiased estimator of the risk function will allow us to correctly measure the goodness of fit of the model. This will further facilitate the reasonable selection of variables. From this viewpoint, Yanagihara et al. (2003) and Kamo et al. (20) proposed the bias-corrected AIC s in the logistic model and the Poisson regression model, respectively, which complementarily reduce the bias of the AIC to O(n 2 ). We refer to the completely bias-corrected AIC to O(n 2 ) as the corrected AIC (CAIC). Frequently, the CAIC improves the performance of the original AIC dramatically. This strongly suggests the usefulness of the CAIC for real data analysis. Nevertheless, the CAIC is rarely used in real data analysis because the CAIC has been derived only in a few models. Moreover, since the derivation of the CAIC is complicated, a great deal of practice is needed in order to carry out the calculation of the CAIC if a researcher wants to use the CAIC in a model in which the CAIC has not been derived. Hence, the CAIC is not a user-friendly model selector under the present circumstances, although the CAIC has better performance than the original AIC. If we can obtain the CAIC in a small amount of time, the CAIC will become a useful and user-friendly model selector. The goal of the present paper is to derive a simple formula for the CAIC in the widest model possible. The model considered is the generalized linear model (GLM), which was proposed by Nelder and Wedderburn (972). Nevertheless, the GLM can express a number of statistical models by changing the distribution and the link function, such as the normal linear regression model, the logistic regression model, and the probit model, which are currently commonly used in a number of applied fields, cf., Barnett and Nurmagambetov (200), Matas et al. (200), Sánchez-Carneo et al. (20), and Teste and Lieffers (20). Practically speaking, the GLM can be easily fitted to real data using the glm function in R (R Development Core Team (20)), which implements a number of distributions and link functions. Since the model considered herein is wide and can be easily fitted to real data, the CAIC in the GLM is confirmed useful in real data analysis. Generally, the additional bias correction terms in the CAIC requires estimators of several orders of moments of the response variables. Such moments should be calculated for each specified model. However, we emphasize that 2

3 the moments do not remain in our formula. Hence, in order to apply the CAIC to the real data with our formula, we need only calculate the derivatives of the log-likelihood function of less than or equal to the fourth order. Our formula can be applied using formula manipulation software such as Mathematica. The remainder of the present paper is organized as follows: In Section 2, we consider a stochastic expansion of the maximum likelihood estimator (MLE) in the GLM. In Section 3, we propose a new information criterion by complementarily reducing the bias of the AIC in the GLMs to O(n 2 ). In Section 4, we present several examples of the CAIC in a model in which the glm program is implemented. These examples will be helpful to applied researchers. In Section 5, we investigate the performance of the proposed CAIC through numerical simulations. Technical details are provided in the Appendix. 2. STOCHASTIC EXPANSION OF THE MLE IN THE GLM The GLM considered herein is developed to allow us to fit regression models for the response variables that follow a very general distribution belonging to the exponential family, the probability density function of which is given as follows: { θy b(θ) f(y; θ, ϕ) = exp + c(y, ϕ), (2.) where a( ), b( ), and c( ) are known functions, the unknown parameter θ is referred to as the natural location parameter, and ϕ is often referred to as the scale parameter. (For the details of the GLM, see, e.g., Meyer et al. (2002).) In the present paper, we assume that ϕ is known. The exponential family includes the normal, binomial, Poisson, geometric, negative binomial, exponential, gamma, and inverse normal distributions. Let each datum consist of a sequence {(y i, x i ); (i =,..., n), where y,..., y n are independent random variables referred to as response variables, and x,..., x n are p-dimensional nonstochastic vectors referred to as explanatory variables. The expectation of the response y i is related to a linear predictor η i = x iβ by a link function h( ), i.e., h(e[y i ]) = h(µ(θ i )) = η i. For theoretical purposes, we define u = (h µ), i.e., θ i = u(η i ). When h = µ, i.e., u is an identity function, we say that h is the natural link function. For example, the logistic regression model uses the natural link function. Finally, the candidate model is expressed as y i i.d f(y i ; θ i (β), ϕ), where f( ) is given by (2.). The p-dimensional unknown vector β can be estimated by the maximum likelihood method. The joint probability density function of y = (y,..., y n ) is given by n n { θi y i b(θ i ) f(y; β) = f(y i ; θ i (β), ϕ) = exp + c(y i, ϕ). 3

4 Hence, the log-likelihood function of the GLM is expressed as l(β; y) = log f(y; β) = { θi y i b(θ i ) + c(y i, ϕ). (2.2) Let ˆβ be the MLE of β. Here, ˆβ is given as the solution of the following likelihood equation: l(β; y) = β ( y i b(θ ) i) θi x i = θ i η i X (y µ) = 0 p, where X = (x,... x n ), = diag( θ / η,..., θ n / η n ), and µ = ( b(θ )/ θ,..., b(θ n )/ θ n ). Note that b(θ) is a C -class function and all of the orders of the moments of y exist in the interior Θ 0 of the natural parameter space Θ. In using some of the properties of the MLE, we have the following regularity assumptions (see, e.g., Fahrmeir and Kaufmann (985)): C. : x iβ h(m) (i =,..., n), for all β B, C.2 : h is three times continuously differentiable, C.3 : For all x i χ, θ i / η i 0, (i =,..., n), C.4 : n 0 s.t. X X has full rank for n n 0, where B is an admissible open set in R p for the parameter β, χ is a compact set for the regressors x i, and M denotes the image µ(θ 0 ). Condition C. is necessary in order to obtain the GLM for all β. Condition C.2 is necessary in order to calculate the bias. Conditions C.3 and 4 ensure that X V X is positive definite for all β B, n n 0, where ( 2 b(θ ) V = diag θ 2,..., 2 b(θ n ) θ 2 n Moreover, we have the following additional conditions to assure strong consistency and asymptotic normality of ˆβ, which can be derived by slightly modifying the results reported by Fahrmeir and Kaufmann (985): C.5 : sequence {x i lies in χ with u(x β) Θ 0, β B, C.6 : lim inf n λ min (X V X/n) > 0, C.7 : c > 0, n, λ min (X X) > cλ max (X X), n n, where λ min (A) is the smallest eigenvalue of symmetric matrix A. According to Theorem 5 in Fahrmeir and Kaufmann (985), ˆβ has strong consistency and asymptotic normality under these conditions. Furthermore, from C.6, X V X = O(n), (n ). ). 4

5 Based on the above conditions, ˆβ can be formally expanded as follows: ˆβ = β + n b + n b 2 + n n b 3 + O p (n 2 ). (2.3) Note that l( ˆβ; y)/ β = 0 p. By applying a Taylor expansion around ˆβ = β to this equation, the likelihood equation is expanded as follows: 0 p = (g + G 2 b ) + { G b 2 + n n 2 G 3(b b ) + n n { G 2 b G 3(b b 2 + b 2 b ) + 6 (I p b )G 4 (b b ) where 0 p is a p-dimensional vector of zeros, and g = n l(β; y) β = n (y i d i )c i x i, + O p (n 2 ), (2.4) G 2 = 2 l(β; y) = { d n β β i2 c 2 i + (y i d i )c i2 x i x n i, G 3 = ( ) n β 2 l(β; y), β β = { d i3 c 3 i 3d i2 c i c i2 + (y i d i )c i3 (x i x i x n i), G 4 = ( ) 2 n β β 2 l(β; y) β β = { d i4 c 4 i 6d i3 c 2 n ic i2 3d i2 c i c 2 i2 4d i2 c i3 + (y i d i )c i4 (x i x i x i x i). Here, coefficients c ik and d ik, (i =,..., n; k =,..., 4) are defined as c ik = k θ i η k i, d ik = k b(θ i ). (2.5) θi k Note that c ik is determined by the link function, and d ik is determined by the distribution of the model. Let us define Z i = n(g i M i ) (i = 2, 3, 4), where M i = E[G i ], the explicit forms of which are M 2 = n M 3 = n M 4 = n ( d i2 c 2 i)x i x i, ( d i3 c 3 i 3d i2 c i c i2 )(x i x i x i), ( d i4 c 4 i 6d i3 c 2 ic i2 3d i2 c 2 i2 4d i2 c i c i3 )(x i x i x i x i). 5

6 Thus, Z i (i = 2, 3, 4) can be expressed as Z 2 = Z 3 = Z 4 = n n n {(y i d i )c i2 x i x i, {(y i d i )c i3 (x i x i x i), {(y i d i )c i4 (x i x i x i x i). Based on the regularity assumptions, nonsingularity of M 2 is guaranteed. Furthermore, the regularity assumptions and Conditions C.6 and C.7 ensure the asymptotic normality of Z i. Hence, we can rewrite (2.4) as 0 p = (g + M 2 b ) + {M 2 b n n M 3(b b ) + Z 2 b + { n M 2 b 3 + n 2 M 3(b b 2 + b 2 b ) + 6 (I p b )M 4 (b b ) + Z 2 b Z 3(b b ) + O p (n 2 ). (2.6) Comparing the terms of the same order in both sides of (2.6), the explicit forms of b, b 2, and b 3 are obtained as follows: b = M2 g, { b 2 = M2 2 M 3(b b ) + Z 2 b, b 3 = M 2 { 2 M 3(b b 2 + b 2 b ) + 6 (I p b ) M 4 (b b ) + Z 2 b Z 3(b b ). 3. BIAS CORRECTION OF THE AIC The goodness of fit of the model is measured by the risk function based on the expected KL information, as follows: R = E y E y [ 2l( ˆβ; y )], where y = (y,..., yn) is an n-dimensional random vector that is independent of y and has the same distribution as y. At the beginning of this section, we derive the bias of 2l( ˆβ; y) to R. Under ordinary circumstances, calculation of the expectations of y under the specific distribution are needed in order to express the bias. However, based on the characteristics of the exponential family, we can obtain the bias without calculating the expectations of y under the specific distribution. The explicit form of the bias can be expressed by several derivatives of the log-likelihood function. 6

7 The bias when we estimate R by 2l( ˆβ; y) is given as B = R E y [ 2l( ˆβ; y)] = E y [E y [ [ 2l( ˆβ; y) 2l( ˆβ; ]] y ) = 2E y ] (y i d i ) ˆθ i. (3.) By applying a Taylor expansion around ˆβ = β to ˆθ i = (h µ) (x i ˆβ), ˆθ i is expanded as ˆθ i = θ i + ( ˆβ β) θ i β + 2 ( ˆβ β) 2 θ i β β ( ˆβ β) + 6 ( ˆβ β) {( β 2 β β ) θ i {( ˆβ β) ( ˆβ β) + O p (n 2 ). (3.2) Substituting the stochastic expansion of ˆβ in (2.3) into (3.2) yields the following: ˆθ i = θ i + c i x n ib + {c i x ib n c i2(x ib ) 2 + {c n i x ib 3 + c i2 (x ib )(x ib 2 ) + 6 n c i3(x ib ) 3 + O p (n 2 ). (3.3) By combining (3.) and (3.3), we obtain [ ] [ ] (y i d i ) B = 2E θ i + 2 (y i d i ) E c i x n ib [ + 2 n E (y i d i ) {c i x ib c i2(x ib ] ) 2 [ + 2 { n n E (y i d i ) c i x ib 3 + c i2 (x ib )(x ib 2 ) + ] 6 c i3(x ib ) 3 + O(n 2 ). (3.4) Recall that d i = b(θ i )/ θ i = E[y i ]. This yields the first term of (3.4), as follows: [ ] (y i d i ) 2E θ i = 0. (3.5) Since E[gg ] = M 2, the second term of (3.4) can be calculated as [ ] 2 (y i d i ) E c i x n ib = 2E[g M2 g] = 2p. (3.6) The third term of (3.4) can be obtained as [ 2 y i d i E {c i x ib n c i2(x ib ] ) 2 3 = d n 2 i3 c 2 ic i2 Uii 2 + d n 3 2 i3 c 3 i(d j3 c 3 j + 3d j2 c j c j2 )Uij 3 + O(n 2 ), (3.7) i,j where n i,j refers to n n j=, and U ij is the (i, j)th element of the matrix U = XM2 X, i.e., U ij = x im 2 x j. (3.8) 7

8 Note that coefficient U ij is determined by both the link function and the distribution of the model. The derivation of (3.7) is shown in Appendix A.. Furthermore, the fourth term of (3.4) can be expanded as [ 2 n n E (y i d i ) {c i x ib 3 + c i2 (x ib )(x ib 2 ) + 6 c i3(x ib ] ) 3 = (d n 2 i4 c 4 i + 6d i3 c 2 ic i2 d i2 c 2 i2)uii 2 {2(d n 3 2 i3 c 3 i)(d j3 c 3 j + 3d j2 c j c j2 ) + 4(d i2 c i c i2 )(d j2 c j c j2 )Uij 3 n 3 2 i,j {(d i3 c 3 i)(d j3 c 3 j + 3d j2 c j c j2 ) + 4(d i2 c i c i2 )(d j2 c j c j2 )U ij U ii U jj + O(n 2 ). (3.9) i,j The detailed derivation of (3.9) is given in Appendix A.2. Finally, by substituting (3.5), (3.6), (3.7), and (3.9) into (3.4), we obtain the asymptotic expansion of B up to order n as B = 2p + n (w + w 2 ) + O(n 2 ), (3.0) where w = n w 2 = n 2 2 (d i4 c 4 i + 3d i3 c 2 ic i2 d i2 c 2 i2)uii, 2 {d i3 c 3 i(d j3 c 3 j + 3d j2 c j c j2 ) + 4(d i2 c i c i2 )(d j2 c j c j2 )(Uij 3 + U ij U ii U jj ). i,j (3.) By a simple calculation, we have c i = and c i2 = 0 when the link function is natural. Thus, if the model has the natural link function, w and w 2 became simple, as follows: w = n w 2 = n 2 2 d i4 Uii, 2 d i3 d j3 (Uij 3 + U ij U ii U jj ). i,j (3.2) Equation (3.0) yields the following formula for the CAIC: CAIC = AIC + n (ŵ + ŵ 2 ), (3.3) where ŵ and ŵ 2 are defined by replacing β in w and w 2 with ˆβ. On the other hand, if h is not the natural link function, we have to use w and w 2 in (3.). Note that ŵ and ŵ 2 depend only on several derivatives. Therefore, we can comfortably obtain coefficients ŵ and ŵ 2 using formula manipulation software. 8

9 4. THE CAIC IN THE MODELS IMPLEMENTED IN glm In this section, we will present several examples of the CAIC in the GLM, which can be used in the glm of the R software. In glm, the binomial distribution accepts the links logit, Probit, cauchit, and cloglog (complementary log-log). The Gamma distribution accepts the inverse, identity, and log links. The Poisson accepts the distribution of the log, identity, and sqrt links, and the inverse Gaussian distribution accepts the /µ 2, inverse, and log links. These examples are obtained through our formula in (3.3). If h is the natural link function, w and w 2 in (3.3) are expressed as shown in (3.2). Otherwise, w and w 2 in (3.3) are expressed as in (3.). Next, we present the coefficients of the CAIC in the all of models mentioned above. The results indicate that the CAIC in the model with a non-natural link function is more complex than that in the model with the natural link function. An expression of the log-likelihood function of the GLM is given by Equation (2.). 4.. Case of the Binomial Distribution When we assume that y i is distributed according to the Binomial distribution B(m i, p i ) (i =,..., n), the parameters and functions based on the distribution are given by ( ) pi θ i = log, b(θ i ) = m i log( + exp(θ i )), p i ( ) mi ϕ =, =, c(y i, ϕ) = log. Then, the coefficients of the CAIC specifying the distribution are given by d i = m i exp(θ i ) + exp(θ i ), d i2 = m i exp(θ i ) ( + exp(θ i )) 2, d i3 = m i exp(θ i )( exp(θ i )) ( + exp(θ i )) 3, d i4 = m i exp(θ i )( 4 exp(θ i ) + exp(2θ i )) ( + exp(θ i )) 4. The remaining coefficients of the CAIC, which are determined by the link function, are as follows: Case of the logistic link function, i.e., E[y i ] = m i p i = m i ( + exp( η i )) (natural link function): { U ij = x i m k exp( η k ) n ( + exp( η k )) x kx 2 k x j. k= The CAIC derived from the above coefficients coincides with the CAIC reported by Yanagihara et al. (2003). Case of the probit link function, i.e., m i p i = m i Φ(η i ), where Φ( ) is the cumulative distribution y i 9

10 function (CDF) of the standard normal distribution: ϕ(η i ) c i = ( Φ(η i ))Φ(η i ), c ϕ(η i ) i2 = {( Φ(η i ))Φ(η i ) {η iφ(η 2 i )( Φ(η i )) + ϕ(η i )( 2Φ(η i )), { U ij = x i m k ϕ(η k ) 2 n ( Φ(η k ))Φ(η k ) x kx k x j, k= where ϕ( ) is the probability density function (PDF) of the standard normal distribution. Case of the cauchit link function, i.e., m i p i = m i Ψ(η i ), where Ψ( ) is the CDF of the standard Cauchy distribution: ψ(η i ) c i = ( Ψ(η i ))Ψ(η i ), c ψ(η i ) 2 i2 = {( Ψ(η i ))Ψ(η i ) {2πη iψ(η 2 i )( Ψ(η i )) ( 2Ψ(η i )), { U ij = x i m k ψ(η k ) 2 n ( Ψ(η k ))Ψ(η k ) x kx k x j, k= where ψ( ) is the PDF of the standard Cauchy distribution. Case of the cloglog link function i.e., m i p i = m i m i exp{ exp(η i ): exp(η i ) c i = exp( exp(η i )), c i2 = exp(η i){ ( exp(η i )) exp( exp(η i )), { exp( exp(η i )) 2 { U ij = x i m k exp(2η k ) exp(exp(η k )) x k x n exp( exp(η k )) k x j. k= 4.2. Case of the Poisson Distribution Second, when we assume that y i is distributed according to a Poisson distribution P o(λ i ) (i =,..., n), the parameters and functions based on the model are given as follows: θ i = log λ i, b(θ i ) = exp(θ i ), ϕ =, =, c(y i, ϕ) = log(y i!). The coefficients of the CAIC specifying the distribution are given by d i = d i2 = d i3 = d i4 = exp(θ i ). The remaining coefficients of the CAIC, which are determined by the link function, are as follows: Case of the log link function, i.e., E[y i ] = λ i = e η i (natural link function): { U ij = x i exp(η k )x k x k x j. n k= The CAIC obtained from the above coefficients coincides with that reported by Kamo et al. (20). 0

11 Case of the identity link function, i.e., λ i = η i : c i =, η i c i2 =, ηi 2 U ij = x i ( n k= ) x k x k x j. η k Case of the sqrt link function, i.e., λ i = ηi 2 : c i = 2, η i c i2 = 2, ηi 2 U ij = x i ( 4 n ) x k x k x j. k= 4.3. Case of the Gamma Distribution When we assume that a positive observed response, y i, is distributed according to the Gamma distribution Γ(λ i, ν) (i =,..., n), the parameters and functions based on the model are given as follows: θ i = λ i ν, b(θ i) = log( θ i ), ϕ = ν, = ϕ, c(y i, ϕ) = ν log ν log(γ(ν)) + (ν ) log y i, where Γ( ) is the gamma function. Then, the coefficients of the CAIC specifying the distribution are given by d i = θ i, d i2 = θ 2 i, d i3 = 2θ 3 i, d i4 = 6θ 4 i. The remaining coefficients of the CAIC, which are determined by the link function, are as follows: Case of the inverse link function, i.e., E[y i ] = λ i ν = η i ( U ij = x i ν n k= (natural link function): η 2 k x kx k) x j. Case of the log link function, i.e., λ i ν = exp(η i ): ( c i = exp( η i ), c i2 = exp( η i ), U ij = x i ν n x k x k) x j. k= Case of the identity link function, i.e., λ i ν = η i : c i = η 2 i, c i2 = 2η 3 i U ij = x i ( ν n k= η 4 k x kx k) x j.

12 4.4. Case of the Inverse Gaussian Distribution When we assume that a positive observed response, y i, is distributed according to the inverse Gaussian distribution IG(µ i, λ) (i =,..., n), the parameters and functions based on the model are given as follows: θ i = µ 2 i, b(θ i ) = 2 θ i, ϕ = λ, = λ 2, c(y i, ϕ) = 2 log(λ) 2 log(2πx3 ) λ 2x, Then, the coefficients of the CAIC specifying the distribution are given as d i = θ /2 i, d i2 = 2 θ 3/2 i, d i3 = 3θ 5/2 i, d i4 = 5 2 θ 7/2 i. The remaining coefficients of the CAIC, which are determined by the link function, are as follows: Case of the /µ 2 link function, i.e., E[y i ] = µ i = η /2 i ( U ij = x i nλ Case of the inverse link function, i.e., µ i = η i : k= c i = 2η i, c i2 = 2, U ij = x i Case of the log link function, i.e., µ i = exp(η i ): (natural link function): η 3/2 k x k x k) x j. ( 4 nλ c i = 2 exp( 2η i ), c i2 = 4 exp( 2η i ), U ij = x i k= ( η k x kx k) x j. 4 nλ exp( η k )x k x k) x j. k= 5. NUMERICAL STUDIES In this section, we conduct numerical studies to show that the CAIC is better than the original AIC. At the beginning of this section, we examine the numerical studies for the frequencies of the model and the prediction error of the best models selected by the criteria. We prepared the eight candidate models with n = 50 and 00. First, we constructed an n 8 explanatory variable matrix X = (x,..., x n ). The first column of X is n, where n is an n-dimensional vector of ones, and the remaining seven columns of X were defined by realizations of independent dummy variables with binomial distribution B(, 0.4). In this simulation, we prepared two parameters β, as follows: Case : β = (0.65, 0.65), Case2 : β = (0., 0., 0.3, 0.5). 2

13 The explanatory variables matrix in the jth model consists of the first j columns of X (j =,..., 8). Thus, in case, the true model is the second model, and in case 2, the true model is the fourth model. We simulated,000 realizations of y = (y,..., y n ) in the probit regression model, i.e., y i.d i B(, p i ), where p i = Φ(x iβ) (i =,..., n). Table : Selection-probability and prediction error for the case of β = (0.65, 0.65) n Model PE B Risk AIC CAIC Risk AIC CAIC Table 2: Selection-probability and prediction error for the case in which β = (0., 0., 0.3, 0.5) n Model PE B Risk AIC CAIC Risk AIC CAIC Tables and 2 list the following properties. () selection-probability: the frequency of the model chosen by minimizing the information criterion. (2) prediction error of the best model (PE B ): the risk function of the model selected by the information criterion as the best model, which is estimated as PE B = 000 E y [ 2l( 000 ˆβ Bi ; y )], where y is a future observation, and ˆβ Bi iteration. is the value of ˆβ of the selected model at the ith In particular, PE B is an important property because it is equivalent to the expected KL information between the true model and the best model selected by the criteria. In case with n = 50 and 00 and in case 2 with n = 00, the model having the smallest risk function (referred to as the principle best model) coincides with the true model. On the other hand, in case 2 with n = 50, 3

14 the principle best model became the first model rather than the fourth model, i.e., the true model did not conform with the principle best model. This means that using a model that is smaller than the true model is better for prediction in case 2 with n = 50. From Tables and 2, we can see that the selection-probabilities and prediction errors of the CAIC were improved in all situations in comparison with the AIC. We simulated several other models and obtained similar results. Table 3: Selection-probability and estimated prediction error Selected model AIC CAIC Logistic Probit Total Logistic Probit Total X Ray X Ray, Acid X Ray, Age 0 0 X Ray, Grade X Ray, Stage X Ray, Grade, Acid X Ray, Stage, Acid X Ray, Stage, Age X Ray, Stage, Grade X Ray, Grade, Age, Acid X Ray, Stage, Age, Acid X Ray, Stage, Grade, Acid X Ray, Stage, Grade, Age X Ray, Stage, Grade, Age, Acid PE B Next, for the purpose of analyzing the GLM, we consider the data reported in Brown (980), who discussed an experiment in which 53 prostate cancer patients underwent surgery to examine their lymph nodes for evidence of cancer. The response variable is the number of patients with nodal involvement, and there were five predictor variables: X Ray, Stage, Age, Acid, and Grade. First, we assume that the response variable y i is distributed according to B(, p i ) (i =,..., n). For the link function, we prepare two functions: the logistic link function and the Probit link function. In this analysis, we select the link functions and variables simultaneously. Table 3 shows the selection-probability of the model selected by minimizing the information criterion and the estimated prediction error of the best model selected by the information criterion. We divide the data into calibration sample data and validation sample data. The sample sizes of the calibration sample and the validation sample were 43 and 0, respectively. The best subset of the variables and the link function were selected by criteria derived from the calibration sample. The selection-probabilities were evaluated from only the calibration sample. The prediction errors were estimated as follows. Let d j = (d j,..., d nj ) be an n-dimensional vector expressing a pattern to leave out 0 data at the jth iteration j =,..., 00), i.e., d ij = or 0 and n d ij = 0. Moreover, we let ˆβ B,[ dj ] denote 4

15 ˆβ [ dj ] of β of the best model evaluated from the calibration sample, where ˆβ [ dj ] is given as ˆβ [ dj ] = arg max β Finally, the estimated PE B is given as PE B = j= ( d ij ) log f(y i ; β). d ij { 2 log f(y i ; ˆβ B,[ dj ]). Table 3 indicates that the models selected by the AIC were spread over a wider area than those of the CAIC, although the model most selected by the AIC is the same as that selected by the CAIC. In particular, the selection probability of the model most selected by the CAIC is much higher than that selected by the AIC. The estimated prediction error of the CAIC was smaller than that of the AIC. Thus, the CAIC is thought to have improved the accuracy of the original AIC. Consequently, from Tables, 2, and 3, we recommend the use of the CAIC rather than the AIC for selecting variables in the GLMs. 6. CONCLUSION In the present paper, we derived a simple formula for the CAIC in the GLM. All of the coefficients in our formula are the first through fourth derivatives of the log-likelihood function. The GLM can express a number of statistical models by changing the distribution and the link function and can be easily fitted to the real data using the function glm in the R software, which implements several distributions and link functions. Hence, based on the real data analysis, the present result is useful for real data analysis. Moreover, the numerical studies revealed that the CAIC is better than the original AIC. For example, we presented explicit forms of the CAIC in all of the models that are implemented in glm. These explicit forms are thus confirmed to be useful in real data analysis. Even if a researcher wants to use the CAIC in a model that for which an example CAIC has not yet been derived, the researcher can easily obtain the CAIC using formula manipulation software. In the present paper, we deal primarily with variable selection. However, in the simulation of the real data analysis, we also considered the selection of the link function. If we choose the link function by minimizing the original AIC, the optimal link function is determined only by maximizing the log-likelihood function. On the other hand, if we use the CAIC to select the link function, the optimal link function is not determined only by maximizing the log-likelihood function. Thus, using the CAIC will allow us to select an appropriate link function. As mentioned above, we confirm that the results are useful and user friendly. 5

16 APPENDIX A. Derivation of the Third Term of (3.4) In order to calculate the moments of b and Z 2, we rewrite the third term of (3.4) using b, Z 2, and M 3 as [ 2 n E (y i d i ) { c i x ib c i2(x ib ) 2 ] = n E [b M 3 (b b )] + 3 n E [b Z 2 b ], (A.) where c ij and d ij are defined in (2.5), and U ij is defined in (3.8). Let φ b (t) be the characteristic function of the distribution of b, defined as φ b (t) = E[exp(it b )] = n E[exp{i(y j d j )s j ], j= s j = n c j t M 2 x j, where t = (t,..., t p ). Note that E[exp{i(y µ)s] is the characteristic function of y µ, which is expressed as Therefore, we have { b(θ + is) b(θ) E[exp{i(y µ)s] = exp iµs. { ( ) b(θj + is j ) b(θ j ) φ b (t) = exp id j s j. j= Based on the property of the random variable with mean zero, the third moment is equivalent to the third cumulant. Since s j = O(n /2 ), log φ b (t) can be expanded as log φ b (t) = { b(θj + is j ) b(θ j ) id j s j j= = j= { 2 d j2(is j ) d j3(is j ) d j4(is j ) 4 j= + O(n 3/2 ). Thus, the third cumulant of b = (b,..., b p ) is computed through the derivative of log φ b (t), i.e., E[b α b α2 b α3 ] = i 3 3 log φ b (t) t α t α2 t α3 t=0 = 2 s j s j s j d j3 + O(n 3/2 ). t α t α2 t α3 Note that s j = c j e t αl n α l M2 x j, 6

17 where e j is the p-dimensional vector, the jth element of which is and the other elements of which are 0. Thus, using Equations (2.5) and (3.8), we obtain n E[b M 3 (b b )] = n 3 d j3 c 3 j(d i3 c 3 i + 3d i2 c i c i2 )Uij 3 + O(n 2 ). i,j (A.2) Let φ b,z 2 (t, T ) denote the characteristic function of the joint distribution for b and Z 2 as { b(θ j + iv j ) b(θ j ) φ b,z 2 (t, T ) = exp id j v j, where T = (t () ij ) (i, j =,..., p) and v j = j= n ( c j t M 2 x j + c j2 x jt x j ). In the same manner as in the calculation of log ϕ b (t), we have log φ b,z 2 (t, T ) = Note that { b(θj + iv j ) b(θ j ) id j v j j= = j= { 2 d j2(iv j ) d j3(iv j ) d j4(iv j ) 4 + O(n 3/2 ). v k = c k e t i n im2 x k, v k T ij = n c k2 (e ix k )(e jx k ). Hence, we obtain n E[b Z 2 b ] = n 2 d i3 c 2 ic i2 Uii 2 + O(n 2 ). (A.3) Substituting (A.2) and (A.3) into (A.), the third term of (3.4) is given by (3.7). A.2 Derivation of the Fourth Term of (3.4) In order to use the asymptotic properties, we express the fourth term of (3.4) in terms of b, Z 2, Z 3 M 2 M 3, and M 4 as [ 2 n n E (y i d i ) {c i x ib 3 + c i2 (x ib )(x ib 2 ) + 6 c i3(x ib ) 3 ] = n E [ (b b ) M 3M 2 M 3 (b b ) ] + 3n E [(b b ) M 4 (b b )] 3 n E [ (b b ) M 3M 2 Z 2 b ] 4 n E [ b Z 2 M 2 Z 2 b ] + 4 3n E [b Z 3 (b b )]. (A.4) 7

18 Let φ b,z 2,Z 3 ( ) denote the characteristic function of the joint distribution for b, Z 2, and Z 3 defined by { ( ) b(θj + ir j ) b(θ j ) φ b,z 2,Z 3 (t, T 2, T 3 ) = exp id j r j, where T 2 = (T (2) ij ) (i, j =,..., p), T 3 = (T (3) ) (i, j, k =,..., p) and r j = j= ijk n ( c j t M 2 x j + c j2 x jt 2 x j + c j3 x jt 3 (x j x j )). In order to simplify the calculations, we define the following notation: τ kj = k! (id jkr j ) k, 2 τ 2α κ ij = = t α= i t j n m= 2 τ 2α κ i,kl = = α= t i T (2) n jk m= 2 τ 2α κ ik,jl = = α= T (2) (2) ik T n jl κ i,ijk = α= 2 τ 2α t i T (3) ijk = n m= d m2 c 2 m(e im 2 x m )(e jm 2 x m ), (A.5) d m2 c m c m2 (e im 2 x m )(e kx m )(e lx m ), (A.6) d m2 c 2 m2(e ix m )(e kx m )(e jx m )(e lx m ), m= Using the derivations of ϕ b,z 2,Z 3, the first term of (A.4) is given by E[(b b ) M 3M 2 M 3 (b b )] = = (A.7) d m2 c m c m3 (e im 2 x m )(e ix m )(e jx m )(e kx m ). (A.8) p [M 3M 2 M 3 ] i,j,k,l E[b i b j b k b l ] i,j,k,l p [M 3M 2 4 φ b,z M 3 ] 2,Z 3 (t, T 2, T 3 ) i,j,k,l t i t j t k t l i,j,k,l t=0p,t 2 =0,T 3 =0 By applying a Taylor expansion, we obtain { ( ) 4 b(θj + ir j ) b(θ j ) exp id j r j t i t j t k t l j= t=0p,t2=0,t3=0 { 4 = exp (τ 2j + τ 3j + τ 4j ) + O(n ) 3/2 t i t j t k t l j= t=0 p,t 2 =0,T 3 =0 { 4 τ 4α = κ ij κ kl + κ ik κ jl + κ jk κ il + + O(n 2/3 ) exp { + O(n 3/2 ). t i t j t k t l Note that r j = O(n /2 ) and 4 τ 4α t i t j t k t l = α= α= 3 d α4 r α t i r α t j r α t k r α t l = O(n ). 8.

19 Hence, the first term of (A.4) is expressed as p E[(b b ) M 3M 2 M 3 (b b )] = [M 3M 2 M 3 ] i,j,k,l (κ ij κ kl + κ ik κ jl + κ jk κ il ) + O(n ). (A.9) i,j,k,l By substituting (A.5) into (A.9), we obtain E[(b b ) M 3M 2 M 3 (b b )] = (d n 2 2 i2 c 3 i + 3d i2 c i c i2 )(d j2 c 3 j + 3d j2 c j c j2 )(U ii U ij U jj + 2Uij) 3 + O(n ). (A.0) i,j The remaining terms of (A.4), as well as the first term of (A.4), will be calculated. The second term of (A.4) is similarly obtained from (A.0) as follows: E[(b b ) M 4 (b b )] = 3 (d i4 c 4 i + 6d i3 c 2 n ic i2 + 3d i2 c 2 i2 + 4d i2 c i c i3 )Uii 2 + O(n ). Next, we calculate the third term of (A.4). The third term of (A.4) is expressed as follows: E [ ] p (b b ) M 3M 2 Z 2 b = [M 3M 2 ] i,j,k E[b i b j b l Z 2,kl ] Expression (A.6) implies that = i,j,k,l (A.) p [M 3M 2 ] i,j,k (κ ij κ l,kl + κ ik κ j,kl + κ jk κ i,kl ) + O(n ), (A.2) i,j,k,l E [ ] (b b ) M 3M 2 Z 2 b = (d n 2 2 i3 c 3 i + 3d i2 c i c i2 )d j2 c j c j2 (U ii U ij U jj + 2Uij) 3 + O(n ). (A.3) The fourth term of (A.4) is as follows: i,j E [ ] p b Z 2 M2 Z 2 b = i,j,k,l It follows from (A.7) and (A.4) that E [ ] b Z 2 M2 Z 2 b = d i2 c 2 n i2u ii + n 2 2 [M Finally, we calculate the fifth term of (A.4). Note that p E [b Z 3 (b b )] = E[Z 3,ijk b i b j b k ] = 2 ] jk (κ il κ ik,jl + κ i,ij κ l,kl + κ i,kl κ l,ij ) + O(n ). (A.4) (d i2 c i c i2 )(d j2 c j c j2 )(U ii U ij U jj + Uij) 3 + O(n ). (A.5) i,j i,j,k p (κ ij κ k,ijk + κ ik κ j,ijk + κ jk κ i,ijk ) + O(n ). (A.6) i,j,k 9

20 Substituting (A.8) into (A.6) yields E [b Z 3 (b b )] = 3 n d i2 c i c i3 Uii 2 + O(n ). (A.7) Consequently, from (A.0), (A.), (A.3), (A.5), and (A.7), we obtain the fourth term of (3.4) as (3.9). ACKNOWLEDGEMENTS The authors would like to thank Dr. Mariko Yamamura of Hiroshima University and Dr. Eisuke Inoue of Kitazato University for their helpful comments and suggestions. The authors would also like to thank Dr. Kenichi Kamo of Sapporo Medical University for teaching us the R programming langauge. REFERENCES [] Akaike, H. (973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (eds. Petrov, B. N. & Csáki, F.), , Akadémiai Kiadó, Budapest. [2] Barnett, S. B. L. & Nurmagambetov, T. A. (200). Cost of asthma in the United States: J. Allergy Clin. Immun., 27, [3] Brown, B. W. (980). Prediction analysis for binary data. In Biostatistics Casebook. (eds. Miller Jr., G. R., Efron., B., Brown, B. W., & Moses, L. E.), 3 8, Wiley, New York. [4] Davies, S. L., Neath, A. A. & Cavanaugh, J. E. (2006). Estimation optimality of corrected AIC and modified Cp in linear regression. Internat. Statist. Rev., 74, 2, [5] Fahrmeir, L. & Kaufmann, H. (985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist., 3, [6] Hurvich, C. M. & Tsai, C. L. (989). Regression and time series model selection in small samples. Biometrika, 76, [7] Kamo, K., Yanagihara, H. & Satoh, K. (20). Bias-corrected AIC for selecting variables in Poisson regression models. Comm. Statist. Theory Methods (in press). [8] Kullback, S. & Libler, R. (95). On information and sufficiency. Ann. Math. Statist., 22, [9] Nelder, J. A. & Wedderburn, W. M. (972). Generalized linear models. J. R. Statist. Soc. A., 35,

21 [0] Matas, M., Picornell, A., Cifuentes, C., Payeras, A., Bassa, A., Homar, F., López-Labrador, F. X., Moya, A., Ramon, M. M. & Castro, J. A. (200). Relating the liver damage with hepatitis C virus polymorphism in core region and human variables in HIV--coinfected patients. Infect. Genet. Evol., 0, [] Meyers, R. H., Montgomery, D. C. & Vining, G. G. (2002). Generalized Linear Models with Applications in Engineering and the Sciences. Wiley Interscience, Canada. [2] R Development Core Team (20). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL [3] Sánchez-Carneo, N., Couñago E., Rodrigues-Perez, D. & Freire, J. (20). Exploiting oceanographic satellite data to study the small scale coastal dynamics in a NE Atlantic open embayment. J. Marine Syst., 87, [4] Shibata, R. (980). Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Ann. Math. Statist., 8, [5] Sugiura, N. (978). Further analysis of the data by Akaike s information criterion and the finite corrections. Commun. Statist. -Theory Meth., 7,, [6] Teste, F. P. & Lieffers, V. J. (20). Snow damage in lodgepole pine stands brought into thinning and fertilization regimes. Forest Ecol. Manag., 26, [7] Wong, C. S. & Li, W. K. (998). A note on the corrected Akaike information criterion for threshold autoregressive models. J. Time Ser. Anal., 9, [8] Yanagihara, H., Sekiguchi, R. & Fujikoshi, Y. (2003). Bias correction of AIC in logistic regression models. J. Statist. Plann. Inference, 5,

Bias-corrected AIC for selecting variables in Poisson regression models

Bias-corrected AIC for selecting variables in Poisson regression models Bias-corrected AIC for selecting variables in Poisson regression models Ken-ichi Kamo (a), Hirokazu Yanagihara (b) and Kenichi Satoh (c) (a) Corresponding author: Department of Liberal Arts and Sciences,

More information

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori On Properties of QIC in Generalized Estimating Equations Shinpei Imori Graduate School of Engineering Science, Osaka University 1-3 Machikaneyama-cho, Toyonaka, Osaka 560-8531, Japan E-mail: imori.stat@gmail.com

More information

Bias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition

Bias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition Bias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition Hirokazu Yanagihara 1, Tetsuji Tonda 2 and Chieko Matsumoto 3 1 Department of Social Systems

More information

A Study on the Bias-Correction Effect of the AIC for Selecting Variables in Normal Multivariate Linear Regression Models under Model Misspecification

A Study on the Bias-Correction Effect of the AIC for Selecting Variables in Normal Multivariate Linear Regression Models under Model Misspecification TR-No. 13-08, Hiroshima Statistical Research Group, 1 23 A Study on the Bias-Correction Effect of the AIC for Selecting Variables in Normal Multivariate Linear Regression Models under Model Misspecification

More information

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general

More information

An Unbiased C p Criterion for Multivariate Ridge Regression

An Unbiased C p Criterion for Multivariate Ridge Regression An Unbiased C p Criterion for Multivariate Ridge Regression (Last Modified: March 7, 2008) Hirokazu Yanagihara 1 and Kenichi Satoh 2 1 Department of Mathematics, Graduate School of Science, Hiroshima University

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Junfeng Shang Bowling Green State University, USA Abstract In the mixed modeling framework, Monte Carlo simulation

More information

Variable Selection in Multivariate Linear Regression Models with Fewer Observations than the Dimension

Variable Selection in Multivariate Linear Regression Models with Fewer Observations than the Dimension Variable Selection in Multivariate Linear Regression Models with Fewer Observations than the Dimension (Last Modified: November 3 2008) Mariko Yamamura 1, Hirokazu Yanagihara 2 and Muni S. Srivastava 3

More information

Ken-ichi Kamo Department of Liberal Arts and Sciences, Sapporo Medical University, Hokkaido, Japan

Ken-ichi Kamo Department of Liberal Arts and Sciences, Sapporo Medical University, Hokkaido, Japan A STUDY ON THE BIAS-CORRECTION EFFECT OF THE AIC FOR SELECTING VARIABLES IN NOR- MAL MULTIVARIATE LINEAR REGRESSION MOD- ELS UNDER MODEL MISSPECIFICATION Authors: Hirokazu Yanagihara Department of Mathematics,

More information

MODEL SELECTION BASED ON QUASI-LIKELIHOOD WITH APPLICATION TO OVERDISPERSED DATA

MODEL SELECTION BASED ON QUASI-LIKELIHOOD WITH APPLICATION TO OVERDISPERSED DATA J. Jpn. Soc. Comp. Statist., 26(2013), 53 69 DOI:10.5183/jjscs.1212002 204 MODEL SELECTION BASED ON QUASI-LIKELIHOOD WITH APPLICATION TO OVERDISPERSED DATA Yiping Tang ABSTRACT Overdispersion is a common

More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Discrepancy-Based Model Selection Criteria Using Cross Validation

Discrepancy-Based Model Selection Criteria Using Cross Validation 33 Discrepancy-Based Model Selection Criteria Using Cross Validation Joseph E. Cavanaugh, Simon L. Davies, and Andrew A. Neath Department of Biostatistics, The University of Iowa Pfizer Global Research

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Weighted Least Squares I

Weighted Least Squares I Weighted Least Squares I for i = 1, 2,..., n we have, see [1, Bradley], data: Y i x i i.n.i.d f(y i θ i ), where θ i = E(Y i x i ) co-variates: x i = (x i1, x i2,..., x ip ) T let X n p be the matrix of

More information

Comparison with RSS-Based Model Selection Criteria for Selecting Growth Functions

Comparison with RSS-Based Model Selection Criteria for Selecting Growth Functions TR-No. 14-07, Hiroshima Statistical Research Group, 1 12 Comparison with RSS-Based Model Selection Criteria for Selecting Growth Functions Keisuke Fukui 2, Mariko Yamamura 1 and Hirokazu Yanagihara 2 1

More information

Jackknife Bias Correction of the AIC for Selecting Variables in Canonical Correlation Analysis under Model Misspecification

Jackknife Bias Correction of the AIC for Selecting Variables in Canonical Correlation Analysis under Model Misspecification Jackknife Bias Correction of the AIC for Selecting Variables in Canonical Correlation Analysis under Model Misspecification (Last Modified: May 22, 2012) Yusuke Hashiyama, Hirokazu Yanagihara 1 and Yasunori

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two

More information

Comparison with Residual-Sum-of-Squares-Based Model Selection Criteria for Selecting Growth Functions

Comparison with Residual-Sum-of-Squares-Based Model Selection Criteria for Selecting Growth Functions c 215 FORMATH Research Group FORMATH Vol. 14 (215): 27 39, DOI:1.15684/formath.14.4 Comparison with Residual-Sum-of-Squares-Based Model Selection Criteria for Selecting Growth Functions Keisuke Fukui 1,

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

1. Introduction Over the last three decades a number of model selection criteria have been proposed, including AIC (Akaike, 1973), AICC (Hurvich & Tsa

1. Introduction Over the last three decades a number of model selection criteria have been proposed, including AIC (Akaike, 1973), AICC (Hurvich & Tsa On the Use of Marginal Likelihood in Model Selection Peide Shi Department of Probability and Statistics Peking University, Beijing 100871 P. R. China Chih-Ling Tsai Graduate School of Management University

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis

Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Yasunori Fujikoshi and Tetsuro Sakurai Department of Mathematics, Graduate School of Science,

More information

Generalized linear models

Generalized linear models Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

A Fast Algorithm for Optimizing Ridge Parameters in a Generalized Ridge Regression by Minimizing an Extended GCV Criterion

A Fast Algorithm for Optimizing Ridge Parameters in a Generalized Ridge Regression by Minimizing an Extended GCV Criterion TR-No. 17-07, Hiroshima Statistical Research Group, 1 24 A Fast Algorithm for Optimizing Ridge Parameters in a Generalized Ridge Regression by Minimizing an Extended GCV Criterion Mineaki Ohishi, Hirokazu

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection Model Selection in GLMs Last class: estimability/identifiability, analysis of deviance, standard errors & confidence intervals (should be able to implement frequentist GLM analyses!) Today: standard frequentist

More information

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics Linear, Generalized Linear, and Mixed-Effects Models in R John Fox McMaster University ICPSR 2018 John Fox (McMaster University) Statistical Models in R ICPSR 2018 1 / 19 Linear and Generalized Linear

More information

Geographically Weighted Regression as a Statistical Model

Geographically Weighted Regression as a Statistical Model Geographically Weighted Regression as a Statistical Model Chris Brunsdon Stewart Fotheringham Martin Charlton October 6, 2000 Spatial Analysis Research Group Department of Geography University of Newcastle-upon-Tyne

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands
