MODEL SELECTION BASED ON QUASI-LIKELIHOOD WITH APPLICATION TO OVERDISPERSED DATA
J. Jpn. Soc. Comp. Statist., 26 (2013), DOI: /jjscs

Yiping Tang

ABSTRACT Overdispersion is a common phenomenon in discrete data analysis with generalized linear models, and one that cannot be ignored. In the literature on model selection with overdispersion, the mainstream approach is to use the maximum likelihood method, with the strong assumption that an entire distribution is specified for the data, to construct information criteria. For example, Fitzmaurice (1997) proposed model selection methods based on Efron's (1986) double-exponential families. However, the assumption of a parametric distribution for the data is sometimes too strong. In this paper, we propose a new model selection strategy based on quasi-likelihood functions. First, we define a statistical model through its first two moments, in the form of a mean and a variance. Second, we define the semiparametric information based on such semiparametric statistical models. Finally, we propose the semiparametric information criterion (SIC) based on quasi-likelihood estimators, with weak assumptions on the first two moments only. Although the proposed semiparametric information criterion is inspired by data with overdispersion, it applies more generally, no matter whether the data are discrete or continuous, and it may be useful for any model for which quasi-likelihood methods are appropriate. We also demonstrate the use of SIC through simulation studies and an application to the well-known data on germination of Orobanche.

1. Introduction

1.1. Overdispersed data

In many areas of application, such as biostatistics, life insurance and epidemiological studies, when data are analyzed with GLMs (including binomial and Poisson models), the phenomenon of overdispersion is widespread.
That is, under a standard exponential family, a GLM specifies a particular mean-variance relationship, but the variation of the data is greater than the variance predicted by that relationship.

Example 1 We use a familiar example introduced by McCullagh and Nelder (1989, p. 125) to illustrate the problem of overdispersion. Overdispersion commonly arises in cluster sampling. Clusters usually vary in size; for simplicity, we assume that the cluster size, k, is fixed and that the m individuals sampled actually come from m/k clusters. In the ith cluster, the number of positive respondents, Zᵢ, is assumed to have the binomial distribution with index k and parameter πᵢ, which is considered as a random variable and thus varies from cluster to cluster. The total number of positive respondents is Y = Z₁ + Z₂ + ⋯ + Z_{m/k}.

Graduate School of Science, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba, Japan. tou.kihei@gmail.com
Key words: Akaike information criterion; Generalized linear models; Kullback-Leibler information; Model selection; Overdispersion; Quasi-likelihood
If we let E(πᵢ) = π and Var(πᵢ) = τ²π(1 − π), it may be shown that the unconditional mean and variance of Y are given by

E(Y) = mπ,  Var(Y) = mπ(1 − π){1 + (k − 1)τ²} = mπ(1 − π)σ².

Note that the above results depend only on the first two moments of πᵢ. Note also that the dispersion parameter σ² = 1 + (k − 1)τ² depends both on the cluster size and on the variability of π from cluster to cluster, but not on the sample size, m. Overdispersion can occur only if m > 1.

1.2. Modeling overdispersed data

In order to model overdispersion, many approaches have been proposed that use the maximum likelihood method under the assumption of a fully specified distribution for the data. Williams (1982) modified the standard logistic-linear analysis by using beta-binomial distributions to accommodate overdispersed binomial data. Lawless (1987) proposed negative-binomial regression models to deal with overdispersed Poisson data. Efron (1986) proposed the double-exponential family method, which adds a dispersion parameter to control the variance independently of the mean. Anderson et al. (1994) proposed the modified information criteria QAIC and QAICc, which are based on the quasi-likelihood, to deal with model selection for overdispersed data, and found that these criteria performed well in product multinomial models of capture-recapture data in the presence of differing levels of overdispersion. Hurvich and Tsai (1995) provided information on the use of AICc with overdispersed data. Fitzmaurice (1997) developed model selection strategies based on Efron's (1986) double-exponential families, so that the maximum likelihood method can still be used and model selection can be carried out in the generalized linear model framework. However, these methods share a common drawback in practical data analysis: the true distribution of the data can never be known, so the assumption of specifying an exact distribution is very strong.
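The mean-variance calculation above is easy to check by simulation. The sketch below uses illustrative values (not from the paper) and draws πᵢ from a Beta distribution, one convenient choice with the stated first two moments, for which τ² = 1/(c + 1) when the Beta concentration is c:

```python
import numpy as np

rng = np.random.default_rng(0)

m, k = 120, 6             # individuals and (fixed) cluster size
n_clusters = m // k
pi, c = 0.3, 9.0          # E(pi_i) = pi; Beta concentration c gives tau^2 = 1/(c+1)
tau2 = 1.0 / (c + 1.0)

reps = 200_000
# pi_i ~ Beta(c*pi, c*(1-pi)) has mean pi and variance tau2 * pi * (1-pi)
p = rng.beta(c * pi, c * (1 - pi), size=(reps, n_clusters))
# Z_i | pi_i ~ Binomial(k, pi_i); Y is the total over the m/k clusters
Y = rng.binomial(k, p).sum(axis=1)

mean_theory = m * pi
var_theory = m * pi * (1 - pi) * (1 + (k - 1) * tau2)   # = m*pi*(1-pi)*sigma^2
mean_emp, var_emp = Y.mean(), Y.var()
print(mean_emp, mean_theory)
print(var_emp, var_theory)   # inflated above the binomial value m*pi*(1-pi)
```

The empirical variance matches mπ(1 − π)σ² and clearly exceeds the nominal binomial variance mπ(1 − π).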
Wedderburn (1974) proposed an important method called the quasi-likelihood method. He added a dispersion parameter to control the variance independently of the mean, but weakened the assumption of a fully specified distribution: he used assumptions on the first two moments only, rather than on the entire distribution of the data. An attractive property of the quasi-likelihood function is that it behaves much like an ordinary likelihood function. When we deal with overdispersed data and use only the first two moments, maximum likelihood methods are not available; consequently, when a model selection problem arises, the maximum-likelihood-based information criterion AIC is no longer effective.

1.3. Contributions of the paper

The purpose of this paper is to extend GLMs to situations where we do not wish to impose any distributional assumption on the data; instead, we make assumptions on its first two moments. Using this semiparametric information, we proceed by extending the idea of Akaike (1974), using the Kullback-Leibler information to measure the discrepancy between the true distribution and the assumed semiparametric models. We call the resulting information criterion the semiparametric information criterion (SIC). One
major application is the use of SIC for analyzing overdispersed data.

The rest of the paper is organized as follows. In Section 2, we briefly review the quasi-likelihood method of Wedderburn (1974); the quasi-likelihood estimator is used in the subsequent sections for constructing predictive semiparametric models. Section 3 contains the main theoretical results: new information criteria based on the semiparametric models are developed there. In Section 4, we apply the proposed model selection techniques to the classical seed germination data of Crowder (1978); our results are consistent with most results in the literature. Finally, in Section 5, we present a series of simulation studies that demonstrate the potential usefulness of the proposed model selection methods and also indicate problems for future research.

2. Quasi-likelihood

Generalized linear models are widely used as a standard tool in modern regression analysis. They were first introduced by Nelder and Wedderburn (1972). Their estimation is based on the iteratively reweighted least squares algorithm, which requires only a relationship between the mean and the variance instead of the full distribution. Wedderburn (1974) extended the GLM by replacing the likelihood function with the quasi-likelihood function. McCullagh (1983) further developed the quasi-likelihood approach and established the consistency and asymptotic normality of the parameter estimates under second-moment assumptions. We briefly review the quasi-likelihood method here.

Suppose that the n components of the response vector Y are independent with mean vector µ and covariance matrix ϕV(µ); ϕ may be a known or unknown constant and V(·) is known. Since the components of Y are independent, the matrix V(µ) can be written in the form V(µ) = diag{V₁(µ), ..., Vₙ(µ)}.
A further assumption on V(µ) is that Vᵢ(µ) depends only on the ith component of µ. We first consider a single component Y of the response vector Y. Under the above conditions, we define

U = ∂Q(Y; µ)/∂µ = (Y − µ)/{ϕV(µ)}.  (1)

The function U has the following properties:

E(U) = 0,  Var(U) = 1/{ϕV(µ)},  E(−∂U/∂µ) = 1/{ϕV(µ)}.

The quasi-likelihood function Q(Y; µ) may therefore be written as

Q(Y; µ) = ∫_Y^µ (Y − t)/{ϕV(t)} dt.

We refer to Q(Y; µ) as the quasi-likelihood or, more correctly, as the log quasi-likelihood for µ. Since the components of Y are independent by assumption, the log quasi-likelihood for the complete data, Q(Y; µ), is the sum of the individual contributions. The quasi-likelihood estimating equations for the regression parameters β, obtained by differentiating Q(Y; µ) with respect to β, may be written in the form U(β) = 0, where

U(β) = DᵀV⁻¹(Y − µ)/ϕ  (2)
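The integral defining Q(Y; µ) can be evaluated in closed form for standard variance functions. As a numerical sketch (illustrative choices: the binomial-type variance function V(t) = t(1 − t) with ϕ = 1; the paper leaves V(·) general), the integral recovers the Bernoulli log-likelihood up to a term constant in µ:

```python
import numpy as np

# Numerical check of Q(y; mu) = ∫_y^mu (y - t) / {phi * V(t)} dt for V(t) = t(1-t), phi = 1.
def quasi_loglik(y, mu, phi=1.0, n_grid=200_001):
    t = np.linspace(y, mu, n_grid)
    f = (y - t) / (phi * t * (1.0 - t))
    return np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0   # trapezoid rule

# For V(t) = t(1-t), integrating gives the Bernoulli log-likelihood up to a
# term constant in mu:
#   Q(y; mu) = y log(mu) + (1-y) log(1-mu) - {y log(y) + (1-y) log(1-y)}.
def closed_form(y, mu):
    return (y * np.log(mu) + (1 - y) * np.log(1 - mu)
            - y * np.log(y) - (1 - y) * np.log(1 - y))

y, mu = 0.3, 0.6
print(quasi_loglik(y, mu), closed_form(y, mu))   # both ≈ -0.1838
```

By construction Q(y; µ) ≤ 0 and is maximized at µ = y, exactly as a log-likelihood centered at its maximum would be.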
is called the quasi-score function. In this expression, the components of D, of order n × q, are D_ir = ∂µᵢ/∂β_r, the derivatives of µ(β) with respect to the parameters. The covariance matrix of U(β), which is also the negative expected value of ∂U(β)/∂β, is

i_β = DᵀV⁻¹D/ϕ.

For quasi-likelihood functions, this matrix plays a role similar to that of the Fisher information for ordinary likelihood functions.

Let yᵢ (i = 1, ..., n) be independent response variables, which may be proportions or counts, with mean µᵢ and variance ϕV(µᵢ). In addition, suppose that E[yᵢ] = µᵢ = t⁻¹(ηᵢ), where ηᵢ = xᵢ′β, t⁻¹(·) is the inverse link function, xᵢ is a q-vector of covariates, β is a q-vector of unknown parameters β = (β₁, ..., β_q)′, and ϕ does not depend on β. Let X be the n × q matrix of covariates with ith row X₍ᵢ₎, θ = (β, ϕ)′ ∈ Θ ⊂ R^{q+1} and q < n. Since the above model depends only on assumptions about the mean and variance, we may use the quasi-likelihood method to estimate the unknown parameters θ. For estimating β, we use the quasi-score functions

U₁ⱼ(β, ϕ) = Σᵢ₌₁ⁿ [ {yᵢ − t⁻¹(X₍ᵢ₎β)} / {ϕV(t⁻¹(X₍ᵢ₎β))} ] Δᵢ Xᵢⱼ,  for j = 1, ..., q.  (3)

For estimating ϕ, we use the second-moment function

U₂(β, ϕ) = Σᵢ₌₁ⁿ [ {yᵢ − t⁻¹(X₍ᵢ₎β)}² / {ϕV(t⁻¹(X₍ᵢ₎β))} ] − (n − q − 1),  (4)

where X₍ᵢ₎ is the ith row of X and Δᵢ = ∂t⁻¹(ηᵢ)/∂ηᵢ. The quasi-likelihood estimators (β̂, ϕ̂) are the simultaneous solutions of the q + 1 equations U₁ⱼ(β̂, ϕ̂) = 0, for j = 1, ..., q, and U₂(β̂, ϕ̂) = 0. According to the results of McCullagh (1983), the numerical value of β̂ is not affected by ϕ, whether ϕ is known or not, so U₁ⱼ(β, ϕ) can be viewed as a function of β alone.
We may rewrite the quasi-likelihood equations in matrix form as

U₁(β̂, ϕ) = DᵀV⁻¹{Y − t⁻¹(Xβ̂)} = 0,  (5)

where Y is an n × 1 vector; t⁻¹(Xβ̂) = (t⁻¹(X₍₁₎β̂), ..., t⁻¹(X₍ₙ₎β̂))′; D is an n × q matrix with components Dᵢⱼ = ΔᵢXᵢⱼ, for i = 1, ..., n and j = 1, ..., q; and V is an n-dimensional diagonal matrix with entries ϕV(t⁻¹(X₍ᵢ₎β)), for i = 1, ..., n. The asymptotic covariance matrix of β̂ is cov(β̂) = (DᵀV⁻¹D)⁻¹. Beginning with an arbitrary value β₀ sufficiently close to β, the sequence of parameter estimates defined by the Newton-Raphson method with Fisher scoring is

β₁ = β₀ + (D₀ᵀV₀⁻¹D₀)⁻¹ D₀ᵀV₀⁻¹{Y − t⁻¹(Xβ₀)}.  (6)

The quasi-likelihood estimator β̂ is obtained by iterating until convergence. Approximate unbiasedness and asymptotic normality of β̂ follow directly from (6) under the second-moment assumptions. Let θ̂ = (β̂, ϕ̂)′. Moore (1986) established consistency and asymptotic normality of θ̂.
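As a concrete illustration, the Fisher-scoring iteration (6) can be sketched for a logistic mean with quasi-binomial variance ϕµᵢ(1 − µᵢ)/nᵢ. All data and settings below are simulated and illustrative, not from the paper; note that ϕ cancels from the scoring step, and the dispersion is estimated afterwards from the Pearson-type equation (4):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 200, 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])
pi_true = 1 / (1 + np.exp(-X @ beta_true))
y = rng.binomial(trials, pi_true) / trials       # observed proportions

beta = np.zeros(2)
for _ in range(50):
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    Delta = mu * (1 - mu)                  # d mu / d eta for the logit link
    D = Delta[:, None] * X                 # D_ij = Delta_i * X_ij
    Vinv = trials / (mu * (1 - mu))        # V(mu_i) = mu(1-mu)/n_i; phi cancels
    W = D.T * Vinv                         # D^T V^{-1}
    step = np.linalg.solve(W @ D, W @ (y - mu))   # Fisher-scoring update (6)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

# Pearson-type estimate of the dispersion phi, cf. equation (4)
q = X.shape[1]
phi_hat = np.sum((y - mu) ** 2 * Vinv) / (n - q - 1)
print(beta, phi_hat)
```

With purely binomial data, as here, ϕ̂ should be close to 1; overdispersed data would drive it above 1 while leaving β̂ essentially unchanged.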
3. Semiparametric Model Selection Criteria

In this section, we propose the semiparametric model selection criteria SIC and SIC_T. They apply generally, no matter whether the data are discrete or continuous. For convenience, we suppose in the subsequent paragraphs that the data are continuous; if they are discrete, integration is replaced by summation.

3.1. Semiparametric models

We assume that the true probability density function of Y is h(y) and the true distribution function is H(y). We relax the assumption of a complete specification of the probability model, using only a general form of the variance function in terms of the mean, as in GLMs. Now suppose that the statistician assumes that the random variables yᵢ (i = 1, ..., n) have distribution G(·), density function g(·), mean µᵢ(β) and variance ϕV(µᵢ(β)). The first two moments can be combined as

m(yᵢ, θ) = m(yᵢ, β, ϕ) = ( yᵢ − µᵢ(β), (yᵢ − µᵢ(β))² − ϕV(µᵢ(β)) )′ = (m₁ᵢ, m₂ᵢ)′.

By assumption, we have E_G[m(yᵢ, θ)] = ∫ m(y, θ) dG(y) = 0. Let M denote the set of all possible distribution functions, and let

G(θ) = { G ∈ M : ∫ m(y, θ) dG(y) = 0 }.

Define G = ∪_{θ∈Θ} G(θ) as the set of all distribution functions satisfying the moment conditions. The class G is the statistical model we consider in this paper.

3.2. Kullback-Leibler information

Extending the idea of AIC, we generalize the data-based model selection approach, based again on the Kullback-Leibler information, which we use to measure the closeness between H(y) and the assumed semiparametric model G(·). Let g vary in the statistical model class G. We consider the constrained minimization problem

v(β, ϕ) = inf_{g∈G} ∫ log{h(y)/g(y)} h(y) dy  (7)

subject to ∫ m(y, β, ϕ)g(y) dy = 0 and ∫ g(y) dy = 1.
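To make the moment conditions concrete: under a correctly specified model, both components of m(yᵢ, θ) have mean zero. A minimal numerical sketch (the Gaussian data and the constant variance function V ≡ 1 are illustrative choices; the paper leaves the distribution unspecified):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=2.0, scale=1.5, size=100_000)

mu, phi = 2.0, 1.5 ** 2       # correctly specified first two moments
V = 1.0                       # illustrative constant variance function
m1 = y - mu                   # first moment condition
m2 = (y - mu) ** 2 - phi * V  # second moment condition
print(m1.mean(), m2.mean())   # both close to zero under correct specification
```

Misspecifying either µ or ϕ would shift the corresponding empirical moment away from zero, which is exactly what the information defined next detects.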
Using results from convex analysis (e.g., Kitamura, 2006), the solution of the infinite-dimensional optimization problem v(β, ϕ) can be written equivalently as the solution of the following finite-dimensional optimization problem v*(β, ϕ):

v*(β, ϕ) = sup_{λ,r} [ λ + ∫ log(−λ − r′m(y, β, ϕ)) h(y) dy ].  (8)
Since we are only interested in the maximization of v*(β, ϕ) with respect to β and ϕ, we can ignore the remaining terms and proceed to define the semiparametric information of the statistical model G as follows.

Definition 1 The semiparametric Kullback-Leibler information for the quasi-likelihood function is defined as

SKL(θ) = sup_{r∈R²} E_H[ρ(y, θ, r)] = sup_{r∈R²} ∫ ρ(y, θ, r) h(y) dy,  (9)

where ρ(y, θ, r) = log(1 + r′m(y, β, ϕ)).

Since the true distribution function H(·) is unknown, we use the empirical distribution of the data to estimate it.

3.3. Regularity conditions

In this paper, we use SKL(θ) to select an appropriate semiparametric model, considering both the correctly specified case and the misspecified case. A semiparametric model is called correctly specified if the assumed moments, evaluated at a particular value of the parameter, are equal to the moments of the underlying true distribution. First, we state the regularity conditions used in the sequel to prove the asymptotic unbiasedness of the proposed information criteria.

C1 Define θ* = (β*, ϕ*)′ as the pseudo-true value, that is, the value at which the semiparametric Kullback-Leibler information SKL(θ) is minimized.

C2 ρ(y, θ, r) is twice continuously differentiable in θ ∈ Θ and r ∈ R². Note that C2 implies that

r(θ) = arg sup_{r∈R²} ∫ ρ(y, θ, r) h(y) dy  (10)

is continuous with respect to θ. Furthermore, from C1 we know that there is a unique r* corresponding to θ*:

r* = arg sup_{r∈R²} ∫ ρ(y, θ*, r) h(y) dy.  (11)

C3 The problem inf_{θ∈Θ} SKL(θ) has a unique saddle point solution (θ*, r*), where r* is defined in (11). When the model G is correctly specified, there exists a unique (θ*, r*) satisfying SKL(θ*) = 0.
C4 The following law of large numbers holds:

sup_{θ∈Θ} sup_{r∈R²} | (1/n) Σᵢ₌₁ⁿ ρ(yᵢ, θ, r) − E_H[ρ(y, θ, r)] | = o_p(1).

3.4. The expected information

When we consider SKL(θ) as defined in (9), the minimizer of SKL(θ) can be viewed as the solution to the following saddle point problem:

θ* = arg min_{θ∈Θ} sup_{r∈L(θ)} E_H[ρ(y, θ, r)],  (12)
where Θ denotes the parameter space for θ, L(θ) = {r : r′m(y, θ) ∈ I}, and I is an open interval containing zero. Next we describe the process of solving this saddle point problem. First, the expectation of ρ(y, θ, r) is maximized over r for given θ, so that r(θ) satisfies

E_H[ρ₁(y, θ, r) m(y, θ)] = 0,

where ρ₁(y, θ, r) = 1/(1 + r′m(y, θ)) is the first derivative of ρ with respect to r′m(y, θ).

C5 Assume that ∫ ρ(y, θ, r)h(y) dy and ∫ ρ₁(y, θ, r)h(y) dy are differentiable under the integral sign.

Second, θ* is the minimizer of the quantity E_H[ρ(y, θ, r(θ))], implying that

E_H[ρ₁(y, θ*, r*) M(y, θ*)′ r(θ*)] = 0,

where M(y, θ) = ∂m(y, θ)/∂θ′. Now denote the vector of first derivatives of ρ(y, θ, r) with respect to θ and r by

Υ(θ, r) = ( ρ₁(y, θ, r) M(y, θ)′r, ρ₁(y, θ, r) m(y, θ) )′.

The values (θ*, r(θ*)) must satisfy the first-order condition

E_H[Υ(θ*, r(θ*))] = 0.  (13)

C6 Define

Q = E_H[ ∂Υ(θ*, r*)/∂(θ, r)′ ],  S = E_H[ Υ(θ*, r*)Υ(θ*, r*)′ ].

Assume that Q and S are finite and that S is invertible.

However, θ* is not computable because it involves the expectation with respect to the true density function h(y). One way forward is to use the value that minimizes the empirical version of the expected semiparametric Kullback-Leibler information:

θ̃ = arg min_{θ∈Θ} { (1/n) Σᵢ₌₁ⁿ ρ(yᵢ, θ, r̂(θ)) },  (14)

where r(θ) is defined in (10). To approximate r(θ), we use the empirical distribution function in place of the true distribution function H(·) to obtain

r̂(θ) = arg max_{r∈R²} { (1/n) Σᵢ₌₁ⁿ ρ(yᵢ, θ, r) }.  (15)

We can show that (θ̃, r̂(θ̃)) is consistent and asymptotically normally distributed along lines similar to Chen et al. (2007); we omit the proof here. The calculation of (θ̃, r̂(θ̃)) is, however, unstable and usually difficult. Since the quasi-likelihood estimator θ̂ defined in Section 2 is also consistent, it is reasonable to use (θ̂, r̂(θ̂)) instead.
It is easy to show that (θ̂, r̂(θ̂)) also possesses the properties of consistency and asymptotic normality.
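The inner maximization (15) is a smooth concave problem in r and can be solved by Newton's method with step halving. The sketch below is entirely illustrative (an intercept-only overdispersed-binomial sample, with θ̂ obtained by the method of moments): it computes r̂(θ̂) for two candidate moment models, one that wrongly fixes ϕ = 1 and one that estimates ϕ, and evaluates Σᵢ ρ(yᵢ, θ̂, r̂(θ̂)) + k, the criterion proposed as SIC in Section 3.5, for each. The misspecified variance model is heavily penalized through the goodness-of-fit term:

```python
import numpy as np

def rho_sum(r, m):
    """sum_i log(1 + r' m_i), or -inf outside the domain 1 + r'm_i > 0."""
    u = 1.0 + m @ r
    return np.sum(np.log(u)) if np.all(u > 1e-12) else -np.inf

def solve_r(m, n_iter=200):
    """Newton ascent with step halving for r_hat(theta) in equation (15)."""
    r = np.zeros(m.shape[1])
    for _ in range(n_iter):
        u = 1.0 + m @ r
        grad = (m / u[:, None]).sum(axis=0)
        hess = -(m[:, :, None] * m[:, None, :] / (u ** 2)[:, None, None]).sum(axis=0)
        step = np.linalg.solve(hess, -grad)      # Newton direction
        t = 1.0
        while rho_sum(r + t * step, m) < rho_sum(r, m) and t > 1e-12:
            t /= 2.0                             # halve the step until feasible/improving
        r = r + t * step
        if np.max(np.abs(grad)) < 1e-8:
            break
    return r

# Illustrative overdispersed data: an intercept-only beta-binomial sample.
rng = np.random.default_rng(4)
n, n_i, pi_true, c = 300, 20, 0.4, 10.0
P = rng.beta(c * pi_true, c * (1 - pi_true), size=n)
y = rng.binomial(n_i, P) / n_i

mu = y.mean()                                  # method-of-moments fit (intercept only)
Vmu = mu * (1 - mu) / n_i                      # binomial-type variance function
phi = np.sum((y - mu) ** 2 / Vmu) / (n - 2)    # Pearson-type dispersion, cf. (4)

def sic(m, k):
    r_hat = solve_r(m)
    return rho_sum(r_hat, m) + k

m_binom = np.column_stack([y - mu, (y - mu) ** 2 - Vmu])        # phi fixed at 1, k = 1
m_quasi = np.column_stack([y - mu, (y - mu) ** 2 - phi * Vmu])  # phi estimated,  k = 2
sic_binom, sic_quasi = sic(m_binom, 1), sic(m_quasi, 2)
print(sic_binom, sic_quasi)   # the model ignoring overdispersion scores much worse
```

Under the model with estimated ϕ, the empirical moments are nearly zero, r̂ is close to the origin and the criterion is dominated by the penalty k; under the ϕ = 1 model, the second moment condition is systematically violated and the goodness-of-fit term grows with n.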
Definition 2 In the statistical model G, the expected semiparametric information is defined as

E_H[SKL(θ̂)] = E_H[ ∫ ρ(y, θ̂, r̂(θ̂)) h(y) dy ].  (16)

For later use, we write out the components of Υ(θ, r). We have

M(y, θ) = ∂m(y, θ)/∂θ′ = [ M₁ M₃ ; M₂ M₄ ],

where

M₁ = ∂m₁/∂β = −∂µᵢ(β)/∂β,
M₂ = ∂m₂/∂β = 2(µᵢ(β) − y) ∂µᵢ/∂β − ϕ{∂V(µᵢ(β))/∂µᵢ} ∂µᵢ/∂β,
M₃ = ∂m₁/∂ϕ = 0,
M₄ = ∂m₂/∂ϕ = −V(µᵢ(β)).

3.5. The proposed information criteria

Definition 3 Let k be the dimension of θ. We define the information criterion

SIC = Σᵢ₌₁ⁿ ρ(yᵢ, θ̂, r̂(θ̂)) + k,  (17)

where θ̂ = (β̂, ϕ̂)′, with β̂ the quasi-likelihood estimator and ϕ̂ the estimator based on the usual Pearson residuals. We call this the semiparametric information criterion, or SIC for short.

The proposed criterion SIC has the following property.

Proposition 1 Suppose that the model is correctly specified and that (C1)-(C6) hold. Then SIC is an asymptotically unbiased estimator of the expected information.

Proof. Under C4, we know that as n goes to infinity, (1/n)Σᵢ₌₁ⁿ ρ(yᵢ, θ̂, r̂(θ̂)) converges in probability to E_H[ρ(y, θ̂, r̂(θ̂))]. The expected information can therefore be decomposed as

E_H[SKL(θ̂)]
 = E_H[ E_H[ρ(y, θ̂, r̂(θ̂)) | θ̂] ]
 = { E_H[ E_H[ρ(y, θ̂, r̂(θ̂)) | θ̂] ] − E_H[ E_H[ρ(y, θ*, r(θ*))] ] }
 + { E_H[ E_H[ρ(y, θ*, r(θ*))] ] − E_H[ (1/n)Σᵢ₌₁ⁿ ρ(yᵢ, θ*, r(θ*)) ] }
 + { E_H[ (1/n)Σᵢ₌₁ⁿ ρ(yᵢ, θ*, r(θ*)) ] − E_H[ (1/n)Σᵢ₌₁ⁿ ρ(yᵢ, θ̂, r̂(θ̂)) ] }
 + E_H[ (1/n)Σᵢ₌₁ⁿ ρ(yᵢ, θ̂, r̂(θ̂)) ]
 = E_H[ (1/n)Σᵢ₌₁ⁿ ρ(yᵢ, θ̂, r̂(θ̂)) ] + D₁ + D₂ + D₃,

where D₁, D₂ and D₃ denote the three bracketed differences, in order. We now compute these quantities. First, consider D₁. By Taylor's theorem, together with C5 and C6, expanding ρ(y, θ̂, r̂(θ̂)) around (θ*, r*) up to the second order, we have
ρ(y, θ̂, r̂(θ̂)) ≐ ρ(y, θ*, r*) + ( (θ̂ − θ*)′, (r̂(θ̂) − r*)′ ) ∂ρ(y, θ, r)/∂(θ, r) |_{(θ*,r*)}
 + (1/2) ( (θ̂ − θ*)′, (r̂(θ̂) − r*)′ ) ∂²ρ(y, θ, r)/∂(θ, r)² |_{(θ*,r*)} ( (θ̂ − θ*)′, (r̂(θ̂) − r*)′ )′.

Taking expectations of both sides with respect to the true distribution function H(y), we obtain

E_H[ ∂ρ(y, θ, r)/∂(θ, r) |_{(θ*,r*)} ] = E_H[Υ(θ*, r*)] = 0.

Thus we deduce that

D₁ = E_H[ E_H[ρ(y, θ̂, r̂(θ̂)) | θ̂] ] − E_H[ E_H[ρ(y, θ*, r(θ*))] ]
 = (1/2) E_H[ E_H[ ( (θ̂ − θ*)′, (r̂(θ̂) − r*)′ ) ∂²ρ(y, θ, r)/∂(θ, r)² |_{(θ*,r*)} ( (θ̂ − θ*)′, (r̂(θ̂) − r*)′ )′ ] ] + o_p(1).

Next, we consider a second-order Taylor expansion for D₃, obtaining

ρ(y, θ*, r(θ*)) ≐ ρ(y, θ̂, r̂(θ̂)) + ( (θ* − θ̂)′, (r(θ*) − r̂(θ̂))′ ) ∂ρ(y, θ, r)/∂(θ, r) |_{(θ̂, r̂(θ̂))}
 + (1/2) ( (θ* − θ̂)′, (r(θ*) − r̂(θ̂))′ ) ∂²ρ(y, θ, r)/∂(θ, r)² |_{(θ̂, r̂(θ̂))} ( (θ* − θ̂)′, (r(θ*) − r̂(θ̂))′ )′.

The consistency of (θ̂, r̂(θ̂)) implies that

E_H[ ∂ρ(y, θ, r)/∂(θ, r) |_{(θ̂, r̂(θ̂))} ] = 0.

These results jointly imply that

D₃ = E_H[ E_H[ρ(y, θ*, r(θ*)) | θ̂] ] − E_H[ E_H[ρ(y, θ̂, r̂(θ̂)) | θ̂] ]
 = (1/2) E_H[ E_H[ ( (θ* − θ̂)′, (r(θ*) − r̂(θ̂))′ ) ∂²ρ(y, θ, r)/∂(θ, r)² |_{(θ̂, r̂(θ̂))} ( (θ* − θ̂)′, (r(θ*) − r̂(θ̂))′ )′ ] ] + o_p(1).

Finally, we consider D₂. By the law of large numbers, D₂ = 0. In addition, the quadratic forms in both D₁ and D₃ converge to centrally distributed chi-square random variables with k degrees of freedom. This completes the proof.

For a possibly misspecified model, we propose the following criterion for model selection.
Definition 4 Consider a possibly misspecified semiparametric model G as defined in Section 3.1. Let

Ŝ = (1/n) Σᵢ₌₁ⁿ Υᵢ(θ̂, r̂(θ̂)) Υᵢ(θ̂, r̂(θ̂))′,  Q̂ = (1/n) Σᵢ₌₁ⁿ ∂Υᵢ(θ, r)/∂(θ, r)′ |_{(θ̂, r̂(θ̂))}.

We define the information criterion

SIC_T = Σᵢ₌₁ⁿ ρ(yᵢ, θ̂, r̂(θ̂)) + trace(ŜQ̂⁻¹).  (18)

The proposed criterion SIC_T has the following property.

Proposition 2 Under the regularity conditions (C1)-(C6) listed in Sections 3.3 and 3.4, SIC_T is an asymptotically unbiased estimator of the expected information. SIC_T can be understood as a semiparametric extension of the Takeuchi information criterion for parametric likelihood analysis (Takeuchi, 1976).

Proof. Under (C1)-(C6), we obtain

(1/n) Σᵢ₌₁ⁿ Υᵢ(θ*, r*)Υᵢ(θ*, r*)′ →_P E_H[Υ(θ*, r*)Υ(θ*, r*)′] = S

and

(1/n) Σᵢ₌₁ⁿ ∂Υᵢ(θ, r)/∂(θ, r)′ |_{(θ*,r*)} →_P E_H[ ∂Υ(θ*, r*)/∂(θ, r)′ ] = Q.

Furthermore, using the consistency and asymptotic normality of (θ̂, r̂(θ̂)),

√n ( (θ̂ − θ*)′, (r̂(θ̂) − r*)′ )′ → N(0, Q⁻¹SQ⁻¹),

we obtain

2n E_H[ E_H[ρ(y, θ̂, r̂(θ̂)) | θ̂] − ρ(y, θ*, r*) ]
 = n E_H[ E_H[ ( (θ̂ − θ*)′, (r̂(θ̂) − r*)′ ) ∂²ρ(y, θ, r)/∂(θ, r)² |_{(θ*,r*)} ( (θ̂ − θ*)′, (r̂(θ̂) − r*)′ )′ ] ]
 = trace( E_H[ ∂²ρ(y, θ, r)/∂(θ, r)² |_{(θ*,r*)} ] · n E_H[ ( (θ̂ − θ*)′, (r̂(θ̂) − r*)′ )′ ( (θ̂ − θ*)′, (r̂(θ̂) − r*)′ ) ] )
 →_P trace( Q(Q⁻¹SQ⁻¹) ) = trace(SQ⁻¹),
where S and Q can be written explicitly as

S = E_H[ Υ(θ*, r*)Υ(θ*, r*)′ ]
 = E_H[ ( ρ₁M(y, θ*)′r* ; ρ₁m(y, θ*) ) ( ρ₁r*′M(y, θ*), ρ₁m(y, θ*)′ ) ]
 = E_H[ [ ρ₁² M(y, θ*)′r* r*′M(y, θ*)   ρ₁² M(y, θ*)′r* m(y, θ*)′ ;
          ρ₁² m(y, θ*) r*′M(y, θ*)      ρ₁² m(y, θ*)m(y, θ*)′ ] ],

Q = E_H[ ∂Υ(θ*, r*)/∂(θ, r)′ ]
 = E_H[ [ ρ₁ ∂(M′r)/∂θ′ + ρ₂ M′r r′M   ρ₁M′ + ρ₂ M′r m(y, θ*)′ ;
          ρ₁M + ρ₂ m(y, θ*) r′M        ρ₂ m(y, θ*)m(y, θ*)′ ] ],

where M = M(y, θ*), ρ₂ = −1/(1 + r′m(y, θ))² is the second derivative of ρ with respect to r′m(y, θ), and the derivative of M′r with respect to θ can be written as

∂(M(y, θ)′r)/∂θ′ = [ (∂²m₁/∂β∂β′, ∂²m₂/∂β∂β′) r   (∂²m₁/∂ϕ∂β, ∂²m₂/∂ϕ∂β) r ;
                     (∂²m₁/∂β∂ϕ, ∂²m₂/∂β∂ϕ) r     (∂²m₁/∂ϕ², ∂²m₂/∂ϕ²) r ].  (19)

The entries in (19) are calculated as follows:

∂²m₁/∂β∂β′ = −∂²µᵢ/∂β∂β′,  ∂²m₁/∂ϕ∂β = ∂²m₁/∂β∂ϕ = 0,  ∂²m₁/∂ϕ² = 0,
∂²m₂/∂β∂β′ = 2(µᵢ(β) − yᵢ) ∂²µᵢ/∂β∂β′ + 2(∂µᵢ/∂β)(∂µᵢ/∂β)′ − ϕ{∂²V(µᵢ(β))/∂µᵢ²}(∂µᵢ/∂β)(∂µᵢ/∂β)′ − ϕ{∂V(µᵢ(β))/∂µᵢ} ∂²µᵢ/∂β∂β′,
∂²m₂/∂ϕ∂β = ∂²m₂/∂β∂ϕ = −{∂V(µᵢ(β))/∂µᵢ} ∂µᵢ/∂β,  ∂²m₂/∂ϕ² = 0.

4. Data Analysis

Crowder (1978, Table 3) considered data on the germination of seeds of Orobanche, a parasitic plant growing on the roots of flowering plants. In this experiment, two varieties of Orobanche seeds (O. aegyptiaca 75 and O. aegyptiaca 73) were tested for germination when treated with two different root extracts (bean and cucumber). The goal of the experiment was to determine the effect of the root extract on inhibiting the growth of the parasitic plant. These data were also analyzed by Breslow and Clayton (1993) to illustrate the use of generalized linear mixed models for overdispersion. Those studies did not consider aspects of model selection; here we reanalyze the data by applying the model selection techniques proposed above based on the quasi-likelihood. In the Orobanche data, yᵢ is the proportion of germinated seeds and nᵢ is the number of seeds in the ith data set (i = 1, ..., n = 21). We apply the quasi-likelihood method to deal with these overdispersed data.
We suppose that E[yᵢ] = µᵢ = πᵢ and Var[yᵢ] = ϕV(µᵢ) = ϕπᵢ(1 − πᵢ)/nᵢ, where ϕ is the overdispersion parameter. Furthermore, we suppose that each mean πᵢ is associated with a set of covariates xᵢ through a logistic link function, namely logit(πᵢ) = log{πᵢ/(1 − πᵢ)} = Xᵢᵀβ. Here Xᵢ = (1, x₁ᵢ, x₂ᵢ, x₁ᵢx₂ᵢ)′, where x₁ᵢ = 1 if the root extract is cucumber and 0 if it is bean, and x₂ᵢ = 1 if the seed variety is O. aegyptiaca 75 and 0 if it is O. aegyptiaca 73. We consider five candidate models:
(A) logit(πᵢ) = β₀
(B) logit(πᵢ) = β₀ + β₁x₁ᵢ
(C) logit(πᵢ) = β₀ + β₂x₂ᵢ
(D) logit(πᵢ) = β₀ + β₁x₁ᵢ + β₂x₂ᵢ
(E) logit(πᵢ) = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₃x₁ᵢx₂ᵢ

We first used the quasi-binomial likelihood method of the free software R (R Development Core Team, 2012) to estimate the regression coefficients in all the candidate models; the results are summarized in Table 1. From Table 1, we see that the standard errors of the parameters are adjusted by the overdispersion factor to a reasonable level. The overdispersion parameter also has an impact on hypothesis testing. An F-test comparing model (E) with model (D) is not significant, and the p value is 0.25 when model (B) is compared with model (D); thus neither pair of models differs significantly when the p values are compared with the 5% threshold. On the other hand, both the difference between models (A) and (B) and that between models (C) and (D) are significant by appropriate F-tests.

Table 1: Summaries of quasi-binomial analysis for the Orobanche data (estimated standard errors in parentheses)
Models            (A)      (B)      (C)      (D)      (E)
Intercept (sd)    (0.15)   (0.15)   (0.27)   (0.22)   (0.25)
Extracts (sd)              (0.21)            (0.21)   (0.34)
Seeds (sd)                          (0.33)   (0.23)   (0.30)
Interaction (sd)                                      (0.42)
Overdispersion

Next we computed the values of the various information criteria; the results are summarized in Table 2. In Table 2, AIC_BB denotes the values of AIC based on the beta-binomial model. As can be seen in Table 2, SIC has the smallest value at model (B) among all the candidate models. SIC ranks the models in the following way: (B) > (A) > (D) > (E) > (C). This result is consistent with the results obtained by Crawley (2005), who used the quasi-likelihood method to analyze the Orobanche data and applied F-tests to compare the full model (E) with simpler models.
He found no compelling evidence that the seed variety and the interaction should be kept in the model, and concluded that the minimal adequate model is model (B). On the other hand, SIC_T ranks the models in the following way: (C) > (A) > (B) > (D) >

Table 2: Semiparametric model selection criteria for the Orobanche data
Models   (A)   (B)   (C)   (D)   (E)
SIC
SIC_T
AIC_BB
(E), which is quite different from the ranking obtained using SIC. Even in full likelihood analysis, Takeuchi-type information criteria have been reported to be unstable in some cases (Shibata, 1989). Compared with SIC and SIC_T, AIC_BB selects the most complicated model (E) as the best one.

5. Simulation Studies

We now examine the performance of SIC and SIC_T numerically. The other criterion considered in the simulation studies is AIC (Akaike, 1973). We computed the frequency with which each information criterion selects each candidate model, following the idea of Greven and Kneib (2010).

We generated data {(yᵢ, nᵢ), i = 1, ..., n}, where yᵢ is the proportion of successes in nᵢ trials. The total sample size is fixed as N = Σᵢ₌₁ⁿ nᵢ. As in the germination data set, we consider the case of two covariates x₁ and x₂. First, we generated the numbers of trials nᵢ. The data are divided into J clusters. Let mⱼ be the total number of trials in the jth cluster, so that Σⱼ₌₁ᴶ mⱼ = N. We generated the mⱼ from the multinomial distribution Multinomial(N, p_N). We further suppose that there are v sets of data in every cluster. Given a fixed number of trials mⱼ, the numbers of trials n_ij in the sets are generated from the multinomial distribution Multinomial(mⱼ, (q₁, ..., q_v)), where q₁ = ... = q_v = 1/v and mⱼ = Σᵢ₌₁ᵛ n_ij. The next step is to generate the proportions of successes yᵢ. In order to ensure the presence of overdispersion, we suppose that the data follow a mixture model with nᵢyᵢ | Pᵢ ~ Binomial(nᵢ, Pᵢ), so that marginally nᵢyᵢ has mean nᵢπᵢ, where Pᵢ is a beta random variable with parameters αᵢ = cπᵢ and βᵢ = c(1 − πᵢ), with c = αᵢ + βᵢ a constant independent of i.
A simple calculation shows that yᵢ has mean E[yᵢ] = πᵢ and variance

Var[yᵢ] = πᵢ(1 − πᵢ){1 + (nᵢ − 1)/(c + 1)}/nᵢ,

which is larger than the nominal binomial variance πᵢ(1 − πᵢ)/nᵢ. The degree of overdispersion is controlled by the positive constant c, with smaller c corresponding to larger overdispersion. Finally, πᵢ is related to the regression parameters through the logistic link function, namely logit(πᵢ) = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₃x₁ᵢx₂ᵢ. The five candidate models in our simulation studies are the same as those considered in Section 4.

We considered three scenarios. In each case, we set J = 4, N = 2000, p_N = (1/2, 1/4, 1/8, 1/8), n = (40, 60, 80) and c = (20, 40, 60, 80). The true model is logit(πᵢ) = β₀ + β₁x₁ᵢ, where β₀ = 0.27 and β₁ = …. In each case we generated the pseudo-random numbers 2000 times. In the following tables, AIC denotes Akaike's information criterion based on the binomial model, while AIC_BB refers to the AIC based on the beta-binomial model.

5.1. Scenario I

In this scenario the covariates x₁ᵢ and x₂ᵢ are binary variables taking values 0 and 1. The results are summarized in Table 3. In Table 3, we find that when n = 40 and c = 20, AIC_BB chooses the true model (B) in over 76% of the samples. Both SIC and SIC_T select the true model in over 60% of the samples, and SIC is slightly better than SIC_T. AIC ignores the overdispersion and performs poorly, with only about a 45% selection frequency. When c increases from 20 to 80, the degree of overdispersion decreases; in the extreme case, as c goes to infinity, the beta-binomial distribution tends to the binomial distribution.
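The variance formula above is easy to verify by direct simulation. The sketch below uses illustrative values (nᵢ = 20, πᵢ = 0.5, c = 20), not the settings of the scenarios:

```python
import numpy as np

# Checking Var[y_i] = pi(1-pi){1 + (n_i - 1)/(c + 1)}/n_i by simulating
# the beta-binomial mixture described above.
rng = np.random.default_rng(3)
n_i, pi, c = 20, 0.5, 20.0
reps = 300_000

P = rng.beta(c * pi, c * (1 - pi), size=reps)   # P_i ~ Beta(c*pi, c*(1-pi))
y = rng.binomial(n_i, P) / n_i                  # n_i * y_i | P_i ~ Binomial(n_i, P_i)

var_theory = pi * (1 - pi) * (1 + (n_i - 1) / (c + 1)) / n_i
var_binom = pi * (1 - pi) / n_i                 # nominal binomial variance
print(y.mean(), y.var(), var_theory, var_binom)
```

With c = 20 and nᵢ = 20, the inflation factor 1 + (nᵢ − 1)/(c + 1) is about 1.9, so the mixture variance is nearly double the nominal binomial variance, which is the sort of overdispersion the simulation scenarios are designed to produce.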
Table 3: The frequency of models selected by various criteria in scenario I
n  c  Criteria     (A)       (B)        (C)      (D)       (E)
      AIC(%)       0(0)      907(45)    0(0)     452(23)   641(32)
      SIC(%)       657(33)   1329(66)   8(0)     6(0)      0(0)
      SIC_T(%)     690(34)   1206(60)   3(0)     47(2)     54(3)
      AIC_BB(%)    0(0)      1524(76)   0(0)     284(14)   192(10)
      AIC(%)       0(0)      1327(66)   0(0)     350(18)   323(16)
      SIC(%)       209(10)   1778(89)   2(0)     11(1)     0(0)
      SIC_T(%)     217(11)   1610(80)   0(0)     96(5)     77(4)
      AIC_BB(%)    0(0)      1519(76)   0(0)     289(14)   192(10)
      AIC(%)       0(0)      1172(59)   0(0)     396(20)   432(22)
      SIC(%)       137(7)    1844(92)   3(0)     15(1)     1(0)
      SIC_T(%)     149(7)    1541(77)   3(0)     88(4)     219(11)
      AIC_BB(%)    0(0)      1544(77)   0(0)     276(14)   180(9)
      AIC(%)       0(0)      1458(73)   0(0)     341(17)   201(10)
      SIC(%)       14(1)     1981(99)   1(0)     4(0)      0(0)
      SIC_T(%)     12(1)     1444(72)   1(0)     227(11)   316(16)
      AIC_BB(%)    0(0)      1567(78)   0(0)     282(14)   151(8)

In every scenario, when n = 40 and c increases from 40 to 60, the selection frequencies of AIC, SIC and SIC_T become higher, while the behavior of AIC_BB seems stable. A similar tendency is observed when n = 80 and c = (40, 60). The frequencies of models selected by the various criteria when n = 60 and c = (20, 40, 60, 80) are intermediate between the cases n = 40, c = (20, 80) and n = 80, c = (20, 80); we omit them in all scenarios. When the sample size is n = 80, AIC_BB selects the true model in 77% of the samples. SIC behaves better than AIC_BB in all cases: it chooses the true model 92% of the time and achieves the highest rate, 99%, when n = 80 and c = 80. SIC_T behaves better than AIC but a little worse than AIC_BB. When c increases, AIC, SIC and SIC_T all improve, while AIC_BB seems stable.

5.2. Scenario II

In this scenario, the covariate x₁ᵢ was generated from the standard normal distribution N(0, 1) and x₂ᵢ from the uniform distribution U(0, 1). The results are shown in Table 4.
In Table 4, when the sample size is n = 40, both SIC and SIC_T perform better than AIC but worse than AIC_BB. When c increases from 20 to 80, AIC_BB remains essentially constant, selecting the true model in about 74% of the samples; the selection rate of SIC increases correspondingly from 54% to 68%, while that of SIC_T increases from 49% to 62%. When the sample size is set to n = 80, AIC_BB performs stably with selection rates varying around 76%, and both SIC and SIC_T perform better than AIC_BB in all cases. SIC behaves more stably than SIC_T.

Table 4: The frequency of models selected by various criteria in scenario II

  n   c  Criteria    (A)       (B)        (C)     (D)       (E)
 40  20  AIC(%)      0(0)      677(34)    0(0)    507(25)   816(41)
         SIC(%)      927(46)   1070(54)   3(0)    0(0)      0(0)
         SIC_T(%)    993(50)   984(49)    3(0)    11(1)     9(0)
         AIC_BB(%)   0(0)      1504(75)   0(0)    290(14)   206(10)
 40  80  AIC(%)      0(0)      1175(59)   0(0)    443(22)   382(19)
         SIC(%)      643(32)   1356(68)   0(0)    1(0)      0(0)
         SIC_T(%)    708(35)   1243(62)   0(0)    29(1)     20(1)
         AIC_BB(%)   0(0)      1474(74)   0(0)    310(16)   216(11)
 80  20  AIC(%)      0(0)      930(46)    0(0)    493(25)   577(29)
         SIC(%)      143(7)    1744(87)   106(5)  4(0)      3(0)
         SIC_T(%)    156(8)    1678(84)   98(5)   44(2)     24(1)
         AIC_BB(%)   0(0)      1530(76)   0(0)    267(13)   203(10)
 80  80  AIC(%)      0(0)      1354(68)   0(0)    365(18)   281(14)
         SIC(%)      35(2)     1936(97)   27(1)   2(0)      0(0)
         SIC_T(%)    42(2)     1758(88)   26(1)   106(5)    68(3)
         AIC_BB(%)   0(0)      1546(77)   0(0)    290(14)   164(8)

Scenario III

In this case, both x_1i and x_2i were generated independently from the standard normal distribution N(0, 1). The obtained results are shown in Table 5. When the sample size is n = 40, both SIC and SIC_T perform better than AIC but worse than AIC_BB. When c increases from 20 to 80, the values of AIC_BB are relatively stable, with selection rates of the true model larger than 75%; SIC becomes a little better, the percentage of times the true model is selected increasing from 54% to 68%, while the selection rate of SIC_T increases from 49% to 63%. When the sample size is n = 80, AIC_BB again performs stably with selection rates around 76%, and both SIC and SIC_T perform better than AIC_BB in all cases.

Table 5: The frequency of models selected by various criteria in scenario III

  n   c  Criteria    (A)       (B)        (C)     (D)       (E)
 40  20  AIC(%)      0(0)      704(35)    0(0)    526(26)   770(38)
         SIC(%)      923(46)   1077(54)   0(0)    0(0)      0(0)
         SIC_T(%)    988(49)   987(49)    0(0)    9(0)      16(1)
         AIC_BB(%)   0(0)      1501(75)   0(0)    289(14)   210(10)
 40  80  AIC(%)      0(0)      1180(59)   0(0)    396(20)   424(21)
         SIC(%)      642(32)   1357(68)   0(0)    1(0)      0(0)
         SIC_T(%)    703(35)   1253(63)   0(0)    25(1)     19(1)
         AIC_BB(%)   0(0)      1503(75)   0(0)    285(14)   212(11)
 80  20  AIC(%)      0(0)      871(44)    0(0)    470(24)   659(33)
         SIC(%)      175(9)    1766(88)   46(2)   11(1)     2(0)
         SIC_T(%)    185(9)    1707(85)   39(2)   44(2)     25(1)
         AIC_BB(%)   0(0)      1506(75)   0(0)    278(14)   216(11)
 80  80  AIC(%)      0(0)      1314(66)   0(0)    398(20)   288(14)
         SIC(%)      41(2)     1946(97)   10(1)   3(0)      0(0)
         SIC_T(%)    51(3)     1769(88)   9(0)    102(5)    69(3)
         AIC_BB(%)   0(0)      1519(76)   0(0)    322(16)   159(8)
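As a rough illustration of the selection task in these scenarios, the sketch below fits nested logistic models to overdispersed proportions by IRLS and scores each with the naive binomial criterion −2 × (binomial log-likelihood) + 2k, i.e. the AIC column of the tables. The slope value 0.8 is a hypothetical stand-in for the lost β_1, and the function names are ours; SIC itself replaces the log-likelihood with the quasi-likelihood defined in Section 3, which this snippet does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_logistic_irls(X, y, n, iters=50):
    # IRLS fit of a logistic model for binomial proportions y with index n.
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-X @ beta))
        W = n * mu * (1 - mu)                       # working weights
        z = X @ beta + (y - mu) / (mu * (1 - mu))   # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

def binom_loglik(X, y, n, beta):
    # Binomial log-likelihood kernel of the fitted proportions.
    mu = 1 / (1 + np.exp(-X @ beta))
    return float(np.sum(n * (y * np.log(mu) + (1 - y) * np.log(1 - mu))))

# overdispersed data from the true model logit(pi) = b0 + b1*x1
m, n, c = 400, 40, 20
x1, x2 = rng.normal(size=m), rng.normal(size=m)
pi = 1 / (1 + np.exp(-(0.27 + 0.8 * x1)))           # 0.8: hypothetical slope
y = rng.binomial(n, rng.beta(c * pi, c * (1 - pi))) / n

candidates = {
    "(A)": np.column_stack([np.ones(m)]),
    "(B)": np.column_stack([np.ones(m), x1]),
    "(E)": np.column_stack([np.ones(m), x1, x2, x1 * x2]),
}
scores = {}
for name, X in candidates.items():
    b = fit_logistic_irls(X, y, n)
    scores[name] = -2 * binom_loglik(X, y, n, b) + 2 * X.shape[1]
print(sorted(scores, key=scores.get))               # models ordered best-first
```

Because the binomial weights understate the true variance, the 2k penalty is often too weak here, which is exactly the overfitting of AIC reported in the tables.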
6. Conclusions

In this paper we proposed two information criteria, SIC and SIC_T, for selecting semiparametric models in conjunction with quasi-likelihood methodologies. These model selection techniques complement the usual deviance analyses for generalized linear models. We demonstrated the potential usefulness of the proposed criteria through both data analysis and simulation studies.

In our simulation studies we generated the data from the beta-binomial model, so that the standard Akaike information criterion could also be used. In this sense the comparison is unfair to our proposed methods, since both SIC and SIC_T use only partial information, namely the first two moments. Nevertheless, our simulation results show that both SIC and SIC_T are comparable with AIC in terms of the frequency with which the true model is selected.

When trace(Ŝ Q̂^{-1}) = k, the Takeuchi-type information criterion SIC_T reduces to SIC; this occurs when the model is correctly specified. However, our simulation results show that SIC_T may be unstable, for a couple of reasons. One reason is the difficulty of estimating S and Q. The instability of the Takeuchi-type information criterion has been reported even in full-likelihood analysis (Shibata, 1989). The results in Section 5 suggest that both SIC and SIC_T may be used when the sample size is moderately large.

Our simulation results also indicate that two factors affect the performance of SIC and SIC_T: the sample size n and the degree of overdispersion c. When the sample size is fixed and c increases, the frequencies with which the true model is selected by SIC and SIC_T also increase. When the sample size n increases, both SIC and SIC_T attain higher true-model selection percentages. As often pointed out in the literature, AIC tends to select overparametrized models, and this phenomenon is also observed in the results shown in Tables 3-5.
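The trace condition above can be illustrated numerically. For a quasi-binomial logistic model the score contributions are u_i = n(y_i − μ_i)x_i, the sensitivity matrix is Q̂ = Σ n μ_i(1 − μ_i) x_i x_iᵀ, and Ŝ = Σ u_i u_iᵀ; the sketch below (our own construction, with the true β plugged in for simplicity and 0.8 as a hypothetical slope) shows trace(Ŝ Q̂^{-1}) falling near k for correctly specified binomial data and well above k under beta-binomial overdispersion.

```python
import numpy as np

rng = np.random.default_rng(3)

def trace_penalty(X, y, n, beta):
    # Quasi-binomial logistic model: score contributions u_i = n*(y_i - mu_i)*x_i,
    # sensitivity Q = sum n*mu*(1-mu)*x x', variability S = sum u u'.
    mu = 1 / (1 + np.exp(-X @ beta))
    U = (n * (y - mu))[:, None] * X
    S = U.T @ U
    Q = X.T @ ((n * mu * (1 - mu))[:, None] * X)
    return np.trace(S @ np.linalg.inv(Q))

m, n, c = 2000, 40, 20
x1 = rng.normal(size=m)
X = np.column_stack([np.ones(m), x1])
beta = np.array([0.27, 0.8])                    # 0.8 is a hypothetical slope
pi = 1 / (1 + np.exp(-X @ beta))

y_bin = rng.binomial(n, pi) / n                              # correctly specified
y_bb = rng.binomial(n, rng.beta(c * pi, c * (1 - pi))) / n   # overdispersed

print(round(trace_penalty(X, y_bin, n, beta), 2))  # should be near k = 2
print(round(trace_penalty(X, y_bb, n, beta), 2))   # inflated, roughly k*(1+(n-1)/(c+1))
```

Under correct specification E[Ŝ] = Q̂, so the trace equals k in expectation; under overdispersion Ŝ is inflated and the Takeuchi-type penalty exceeds 2k.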
AIC based on the inappropriate binomial model is particularly prone to selecting the most complicated model (E). AIC based on the beta-binomial distribution is clearly better but shows the same tendency. The quasi-likelihood-based model selection criteria SIC and SIC_T, on the other hand, tend to select simpler models. SIC appears more stable than SIC_T in our simulation studies. Given the relative instability and computational complexity of SIC_T, when both SIC and SIC_T are available, SIC is the relatively conservative choice.

Some problems remain unsolved. For example, the performance of SIC and SIC_T deteriorates when the true model becomes more complicated. We will address these problems in future work.

Acknowledgments

This paper is part of my Ph.D. project. I would like to thank my supervisor, Professor Jinfang Wang, for his kind supervision during these years. I am grateful to the editor and the anonymous referees for their valuable suggestions, which improved the quality of the paper.

REFERENCES

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, B. N. Petrov and F. Csaki (eds), Budapest: Akademiai Kiado.
Anderson, D. R., Burnham, K. P. and White, G. C. (1994). AIC model selection in overdispersed capture-recapture data. Ecology, 75.
Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. J. Am. Statist. Ass., 88.
Chen, X., Hong, H. and Shum, M. (2007). Nonparametric likelihood ratio model selection tests between parametric likelihood and moment condition models. Journal of Econometrics, 141.
Crawley, M. J. (2005). Statistics: An Introduction Using R. John Wiley & Sons, Ltd.
Crowder, M. J. (1978). Beta-binomial anova for proportions. Appl. Statist., 27.
Efron, B. (1986). Double exponential families and their use in generalized linear regression. J. Am. Statist. Ass., 81.
Fitzmaurice, D. M. (1997). Model selection with overdispersed data. Statistician, 46.
Greven, S. and Kneib, T. (2010). On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika, 97.
Hurvich, C. M. and Tsai, C.-L. (1995). Model selection for extended quasi-likelihood models in small samples. Biometrics, 51.
Kitamura, Y. (2006). Empirical likelihood methods in econometrics: theory and practice. Unpublished manuscript, Yale University.
Lawless, J. F. (1987). Negative binomial and mixed Poisson regression. Canadian J. Statist., 15.
McCullagh, P. (1983). Quasi-likelihood functions. Ann. Statist., 11.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman and Hall.
Moore, D. F. (1986). Asymptotic properties of moment estimators for overdispersed counts and proportions. Biometrika, 73.
Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models. J. R. Statist. Soc. A, 135.
R Development Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Shibata, R. (1989). Statistical aspect of model selection. In From Data to Model, Ed. J. C. Willems, Springer.
Takeuchi, K. (1976). Distribution of informational statistics and criteria for adequacy of models. Mathematical Sciences (in Japanese), 153.
Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61.
Williams, D. A. (1982). Extra-binomial variation in logistic-linear models. Appl. Statist., 31.

(Received: December 26, 2012; Accepted: September 23, 2013)
More informationLinear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52
Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
More informationi=1 h n (ˆθ n ) = 0. (2)
Stat 8112 Lecture Notes Unbiased Estimating Equations Charles J. Geyer April 29, 2012 1 Introduction In this handout we generalize the notion of maximum likelihood estimation to solution of unbiased estimating
More informationLast lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton
EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More informationReconstruction of individual patient data for meta analysis via Bayesian approach
Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi
More informationMultinomial Logistic Regression Models
Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word
More informationNormal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,
Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability
More informationHypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations
Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More information