Count models

A classical, theoretical argument for the Poisson distribution is the approximation $\mathrm{Binom}(n, p) \approx \mathrm{Pois}(\lambda)$ for large $n$ and small $p$, with $\lambda = np$. This can be extended considerably to

$$Y = \sum_{i=1}^n Z_i \overset{\text{approx}}{\sim} \mathrm{Pois}(\lambda)$$

if $n$ is large, the $Z_i$ are weakly dependent 0-1 variables, the $p_i = P(Z_i = 1)$ are small, and $\lambda = \sum_{i=1}^n p_i$. Then

$$V(Y) = \underbrace{\sum_{i=1}^n p_i(1 - p_i)}_{\approx\, \lambda} + \sum_{i \neq j} \mathrm{cov}(Z_i, Z_j).$$
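A minimal numerical check of the binomial-to-Poisson approximation (a sketch in Python; the values of $n$ and $p$ are illustrative):

```python
# Compare Binom(n, p) to Pois(n p) for large n and small p.
import numpy as np
from scipy.stats import binom, poisson

n, p = 1000, 0.005        # large n, small p
lam = n * p               # lambda = n p = 5

ks = np.arange(0, 21)     # covers essentially all of the probability mass
tv = 0.5 * np.sum(np.abs(binom.pmf(ks, n, p) - poisson.pmf(ks, lam)))
print(f"approximate total variation distance: {tv:.5f}")
```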
Overdispersion

If the variables are weakly positively correlated, this leads to overdispersion: the variance is larger than the mean. It is a practical fact that overdispersion is often seen for count data; e.g. the variance of the Pearson residuals does not appear to be 1. The estimated mean might be correct, but with overdispersion the confidence intervals become too narrow and the deviance tests too optimistic.

Over- or underdispersion also occurs in sampling with subsampling,

$$Y = \sum_{i=1}^N Z_i,$$

with $N$ a random variable (sampling) and the $Z_i$, conditionally on $N$, iid 0-1 variables (subsampling).
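The sampling-with-subsampling mechanism is easy to simulate. A sketch where $N$ is drawn with variance exceeding its mean (here a negative binomial), so that the law of total variance, $\mathrm{var}(Y) = E(N)p(1-p) + p^2\,\mathrm{var}(N)$, produces overdispersion:

```python
# Overdispersion from sampling (random N) with subsampling (Bernoulli Z_i).
import numpy as np

rng = np.random.default_rng(1)
p = 0.3
N = rng.negative_binomial(5, 0.5, size=100_000)  # E(N) = 5, var(N) = 10
Y = rng.binomial(N, p)                           # Y = sum_{i=1}^N Z_i given N

print("mean(Y) =", Y.mean())   # ~ p E(N) = 1.5
print("var(Y)  =", Y.var())    # ~ E(N) p (1-p) + p^2 var(N) = 1.95 > mean(Y)
```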
Three critical modeling assumptions

- The relation between the explanatory variables and the mean: $g(E(Y)) = g(\mu) = (1, x^T)\beta$.
- The relation between the mean and the variance: $\mathrm{var}_\mu(Y)$.
- The error distribution.

For the Poisson distribution the variance function, $\mathrm{var}_\mu(Y) = \mu$, is determined by the error distribution; overdispersion cannot be incorporated in the model. One possible solution is a different error distribution; the negative binomial has variance function $\mathrm{var}_\mu(Y) = \mu + \mu^2/k$, as the simulation below illustrates.
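A sketch checking the negative binomial mean-variance relation by simulation, assuming numpy's $(k, p)$ parameterisation in which $\mu = k(1-p)/p$ and $\mathrm{var}(Y) = \mu + \mu^2/k$:

```python
# Verify var(Y) = mu + mu^2 / k for the negative binomial distribution.
import numpy as np

rng = np.random.default_rng(0)
k, prob = 4.0, 0.4
mu = k * (1 - prob) / prob                  # = 6.0
y = rng.negative_binomial(k, prob, size=200_000)

print("sample mean     :", y.mean())        # ~ mu = 6
print("sample variance :", y.var())         # ~ mu + mu^2 / k = 15
print("mu + mu^2 / k   :", mu + mu**2 / k)
```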
Estimation equations

We abandon the specification of the error distribution and propose only to specify the variance, $\mathrm{var}_\mu(Y) = \sigma^2 V(\mu)$. We introduce the estimation equations

$$U_j(\beta) := \sum_{i=1}^n \frac{(y_i - \mu(\eta_i))\, \mu'(\eta_i)\, x_{ij}}{\sigma^2 V(\mu(\eta_i))} = 0, \qquad j = 1, \ldots, p.$$

Since $E(U_j(\beta)) = 0$ the equations are unbiased (that is, sensible) estimation equations.
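The estimation equations are straightforward to write down in code. A sketch for the log link, where $\mu(\eta) = \mu'(\eta) = e^\eta$ (the function name and default $V$ are illustrative):

```python
# Quasi-score U(beta) for the log link and a user-supplied variance function V.
import numpy as np

def quasi_score(beta, X, y, V=lambda mu: mu):
    eta = X @ beta
    mu = np.exp(eta)        # inverse log link
    dmu = mu                # mu'(eta) = exp(eta) for the log link
    # U_j(beta) = sum_i (y_i - mu_i) mu'(eta_i) x_ij / V(mu_i); the factor
    # sigma^2 is omitted since it cancels in the equations U(beta) = 0.
    return X.T @ ((y - mu) * dmu / V(mu))
```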
Estimation equations

The estimation equations extend the scope of generalized linear models:

- They allow us to handle models with mean-variance relations that do not fit within a glm.
- The asymptotic theory supporting estimates of standard errors and formal statistical tests is based on the mean-variance relation only [to be discussed]. Hence the distributional assumption is not critical in itself.
- It is possible to include the nuisance parameter $\sigma^2$ to capture over- and underdispersion.
- One drawback: we do not obtain a generative model, so we cannot simulate from the resulting model.

The solution of the estimation equations is done by IWLS exactly as for glm's; the nuisance parameter $\sigma^2$ cancels out, as in the sketch below.
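A minimal IWLS sketch for the quasi-Poisson case (log link, $V(\mu) = \mu$); note that $\sigma^2$ never enters the iteration:

```python
# IWLS for the estimation equations with log link and V(mu) = mu.
import numpy as np

def iwls(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        w = mu                      # IWLS weights: mu'(eta)^2 / V(mu) = mu
        z = eta + (y - mu) / mu     # working response: eta + (y - mu)/mu'(eta)
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)   # weighted least squares
    return beta
```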
Quasi-likelihood functions

A solution to the estimation equations is a (local) maximizer of the log-quasi-likelihood function

$$Q(\beta; y) = \sum_{i=1}^n Q(\eta_i(\beta); y_i) \quad \text{with} \quad Q(\eta; y) = \int_y^{\mu(\eta)} \frac{y - t}{\sigma^2 V(t)}\, dt.$$

The quasi-deviance is

$$D = -2\sigma^2 Q(\hat\beta; y),$$

and does not depend upon the nuisance parameter $\sigma^2$.
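For a given variance function, $Q$ can be evaluated by numerical integration. A sketch (with $\sigma^2 = 1$, which is harmless since the quasi-deviance does not depend on it):

```python
# Log-quasi-likelihood term Q(mu; y) and its deviance contribution -2 Q.
import numpy as np
from scipy.integrate import quad

def Q(mu, y, V=lambda t: t):
    val, _ = quad(lambda t: (y - t) / V(t), y, mu)
    return val

# For V(t) = t the deviance contribution matches the Poisson closed form
# 2 (y log(y / mu) - y + mu); both print ~3.1629 here.
print(-2 * Q(2.0, 5.0))
print(2 * (5.0 * np.log(5.0 / 2.0) - 5.0 + 2.0))
```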
Example

Take $V(\mu) = \mu$; then

$$Q(\mu; y) = \int_y^\mu \frac{y - t}{\sigma^2 t}\, dt = \frac{y \log(\mu) - \mu - y \log(y) + y}{\sigma^2},$$

which is the log-likelihood for the Poisson distribution up to the constant scaling $\sigma^2$ and an additive term not depending upon $\mu$. Estimation proceeds just as for the Poisson model, and the deviance is exactly the same.
Estimation of σ²

The preferred estimator is the moment estimator

$$\hat\sigma^2 = \frac{1}{n - p} \sum_{i=1}^n \frac{(y_i - \hat\mu_i)^2}{V(\hat\mu_i)}.$$

The intuition is that

$$E\left(\frac{(Y - \mu)^2}{V(\mu)}\right) = \sigma^2,$$

and if we estimate $\beta$ consistently for $n \to \infty$, this estimator $\hat\sigma^2$ is consistent too. Dividing by $n - p$ instead of $n$ does not in general remove the bias of the estimator entirely, as it does for the Gaussian family, but it does provide a first-order bias correction.
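A direct implementation of the moment estimator (a sketch; the default $V$ corresponds to the quasi-Poisson case):

```python
# Moment estimator of the dispersion parameter sigma^2.
import numpy as np

def dispersion(y, mu_hat, p, V=lambda mu: mu):
    # Pearson-type statistic divided by the degrees of freedom n - p.
    return np.sum((y - mu_hat) ** 2 / V(mu_hat)) / (len(y) - p)
```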
Asymptotics

The asymptotic distributional results hinge on

$$U(\beta) \overset{\text{asymp}}{\sim} N(0, \mathrm{var}(U(\beta)))$$

and the Taylor expansion, since $U(\hat\beta) = 0$,

$$U(\beta) \approx E(DU(\beta))(\beta - \hat\beta),$$

with $E(DU(\beta))$ replacing the derivative $DU(\hat\beta)$. Isolating $\hat\beta$ gives

$$\hat\beta \overset{\text{asymp}}{\sim} N\big(\beta,\; E(DU(\beta))^{-1}\, \mathrm{var}(U(\beta))\, E(DU(\beta))^{-1}\big).$$

This is general estimation equation theory.
Asymptotics

For the score function we have the identity

$$\mathrm{var}(U(\beta)) = -E(DU(\beta)) \qquad (1)$$

which simplifies the asymptotic variance expression to

$$E(DU(\beta))^{-1}\,\mathrm{var}(U(\beta))\,E(DU(\beta))^{-1} = \mathrm{var}(U(\beta))^{-1} = J(\beta)^{-1}.$$

For the quasi-score equation the identity (1) holds too, and the asymptotic variance simplifies likewise, as verified below.

Take home message: The asymptotic theory works for quasi-likelihood estimation exactly as for likelihood estimation, except that we have to estimate $\sigma^2$.
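For the quasi-score, (1) can be verified directly from the mean-variance assumption $\mathrm{var}(Y_i) = \sigma^2 V(\mu_i)$ alone; since $E(Y_i - \mu_i) = 0$, the only term of $E(DU(\beta))$ that survives in expectation is the one where the derivative hits $y_i - \mu(\eta_i)$:

$$\mathrm{var}(U(\beta)) = \sum_{i=1}^n \frac{\mu'(\eta_i)^2\, x_i x_i^T}{(\sigma^2 V(\mu_i))^2}\,\mathrm{var}(Y_i) = \sum_{i=1}^n \frac{\mu'(\eta_i)^2\, x_i x_i^T}{\sigma^2 V(\mu_i)} = -E(DU(\beta)).$$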
Correlated measures

Let $\mathbf{V}(\mu)$ denote the variance matrix; then independent observations imply a diagonal $\mathbf{V}(\mu)$ with entries $\sigma^2 V(\mu_i)$ in the diagonal. The matrix form of the estimation function is

$$U(\beta) = (y - \mu(\beta))^T \mathbf{V}(\mu)^{-1} \underbrace{\mathrm{diag}\{\mu'(\eta_i)\} X}_{D}.$$

We can replace $\mathbf{V}(\mu)$ by any variance matrix to incorporate dependence between the observations; estimation becomes an iteration of the two steps (see the sketch below):

- Solve $U(\beta) = 0$ by IWLS (or take one or more steps in the algorithm) for fixed values of all nuisance parameters in $\mathbf{V}(\mu)$.
- Compute residuals $y_i - \hat\mu_i$ and estimate the nuisance parameters in $\mathbf{V}(\mu)$ from the residuals.
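The two-step iteration is available off the shelf. A sketch assuming statsmodels' GEE interface, with simulated repeated-measures count data (the data frame and its column names are illustrative):

```python
# GEE-style estimation with an exchangeable working correlation structure.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_subj, n_rep = 50, 4
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_rep),
    "x": rng.normal(size=n_subj * n_rep),
})
# A shared subject effect induces within-subject correlation.
u = np.repeat(rng.normal(scale=0.3, size=n_subj), n_rep)
df["y"] = rng.poisson(np.exp(0.5 + 0.8 * df["x"] + u))

model = sm.GEE.from_formula("y ~ x", groups="subject", data=df,
                            family=sm.families.Poisson(),
                            cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```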
Repeated measures

With repeated measures $\mathbf{V}(\mu)$ is block diagonal, and we assume of the form

$$\mathbf{V}(\mu) = \begin{pmatrix} V_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & V_N \end{pmatrix}.$$

And we will assume

$$V_i = \phi \begin{pmatrix} \sqrt{V(\mu_{i1})} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sqrt{V(\mu_{in})} \end{pmatrix} R_i \begin{pmatrix} \sqrt{V(\mu_{i1})} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sqrt{V(\mu_{in})} \end{pmatrix}$$

with $R_i$ a correlation matrix not depending upon the means.
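A sketch of how one block $V_i$ is assembled, here with an exchangeable working correlation $R_i$ (all off-diagonal entries equal to $\alpha$; names are illustrative):

```python
# One block V_i = phi * A^{1/2} R_i A^{1/2} with A = diag(V(mu_i1), ..., V(mu_in)).
import numpy as np

def block_cov(mu_i, phi, alpha, V=lambda m: m):
    n = len(mu_i)
    R = np.full((n, n), alpha) + (1 - alpha) * np.eye(n)  # exchangeable R_i
    A_sqrt = np.diag(np.sqrt(V(np.asarray(mu_i))))
    return phi * A_sqrt @ R @ A_sqrt
```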
One remark on asymptotics

All arguments regarding asymptotics carry over:

$$\hat\beta \overset{\text{asymp}}{\sim} N(\beta, J(\beta)^{-1})$$

where

$$J(\beta) = \mathrm{var}(U(\beta)) = -E(DU(\beta)) = D^T \mathbf{V}(\mu)^{-1} D.$$

We can plug estimates into the formula above to estimate the asymptotic variance matrix; the recommendation in the literature is to use the sandwich estimator obtained from

$$J(\beta)^{-1}\, C\, J(\beta)^{-1}$$

with $C$ a direct estimator of $\mathrm{var}(U(\beta))$.
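A sketch of the sandwich computation, with $C$ built from per-cluster score contributions evaluated at $\hat\beta$ (the inputs are illustrative):

```python
# Sandwich estimator J^{-1} C J^{-1} of the asymptotic variance of beta-hat.
import numpy as np

def sandwich(J, cluster_scores):
    # cluster_scores: N x p array; row i holds the score contribution
    # U_i(beta_hat) of cluster i, so C = sum_i U_i U_i^T estimates var(U(beta)).
    C = cluster_scores.T @ cluster_scores
    Jinv = np.linalg.inv(J)
    return Jinv @ C @ Jinv
```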