Smoothing Age-Period-Cohort models with P-splines: a mixed model approach
Running headline: Smooth Age-Period-Cohort models

I D Currie, Department of Actuarial Mathematics and Statistics, and the Maxwell Institute for Mathematical Sciences, Heriot-Watt University, Edinburgh, EH14 4AS, UK.

J G Kirkby, Department of Actuarial Mathematics and Statistics, and the Maxwell Institute for Mathematical Sciences, Heriot-Watt University, Edinburgh, EH14 4AS, UK.

M Durban, Departamento de Estadistica y Econometria, Universidad Carlos III de Madrid, Edificio Torres Quevedo, Leganes, Madrid, Spain.

P H C Eilers, Department of Methodology and Statistics, Utrecht University, 3508 TC Utrecht, The Netherlands.

Responsible author: I D Currie, I.D.Currie@hw.ac.uk

Abstract: We use smoothing with B-splines and penalties, the P-spline method of Eilers and Marx (1996), to smooth the Age-Period-Cohort (APC) model of mortality. We describe how smoothing with penalties in one dimension allows a mixed model approach to be used. We apply this method to the APC model and show that penalization gives a way of dealing with identifiability problems in the discrete APC model and leads to a mixed model representation of the model. We show that individual random effects can be used to model overdispersion and that this can also be achieved within the mixed model framework. We illustrate our methods with some mortality data provided by the UK insurance industry.

Keywords: Age-period-cohort; identifiability; mixed models; mortality; overdispersion; P-splines; Schall's algorithm; smoothing.

File name: /talks/london.2006/apc.paper/paper.tex: 20 June
1 Introduction

We suppose that we have mortality data arranged in a two-way table classified by age at death and year of death. Age-Period-Cohort (APC) models are an important class of models in the study of mortality in such tables and, more generally, of disease incidence. A difficulty with the APC model is the choice of parameterization since the model is in general not identifiable. Clayton and Schifflers (1987) give a careful discussion of parameterization in APC models and sound warnings about the dangers of over-interpreting the fitted parameters; they are equally sceptical about the wisdom of forecasting by extrapolating parameter values. Holford (1983) and Carstensen (2007) also discuss the APC model with particular reference to the problems caused by non-identifiability. The discussion in these papers revolves around the properties of different parameterizations. Our approach is different: we force smoothness on the fitted model by penalizing differences between adjacent coefficients. Penalization does two things: first, it replaces the usual identifiability constraints, and second, it allows the APC model to be cast in a mixed model framework. The purpose of this paper is to explore and illustrate these ideas. Smoothing with P-splines was introduced by Eilers and Marx (1996) and here we apply the method to smooth the APC model. Smooth versions of the APC model have already appeared in the literature. Heuer (1997) used restricted cubic splines or natural splines to give a smooth version of the APC model; he also included interactions in his model by using the Kronecker product of the spline functions. An important difference between Heuer's approach and ours is that in Heuer's paper smoothness at the edges is produced by modifying the B-spline basis (natural splines) whereas in our case smoothness is produced by the use of penalties. Ogata et al. (2000) used splines in a Bayesian framework and also produced a smooth version of the APC model.
The plan of the paper is as follows. In section 2 we explain the P-spline approach in one dimension and describe a transformation of the B-spline basis which allows the model to be expressed as a mixed model; Schall's algorithm (Schall, 1991) for fitting a generalized linear mixed model is described. In section 3 we apply our transformation to the APC model and show that the transformation deals with the identifiability problems that arise in the APC model. The mixed model representation has an interpretation as an additive model where the fixed component is a plane and there are three random components which correspond to the age, period and cohort effects. There is evidence of overdispersion and in section 4 we follow Perperoglou and Eilers (unpublished) and use individual random effects to model overdispersion; the mixed
model is a natural setting for this model and Schall's algorithm copes well with the computational challenge of model fitting. We use some mortality data provided by the UK insurance industry to illustrate our methods throughout. The paper ends with a short discussion.

2 Smoothing in one dimension with P-splines

Smoothing with P-splines, introduced by Eilers and Marx (1996), is based on two ideas: (a) use B-splines as the basis for a regression, and (b) use a difference penalty on adjacent regression coefficients to ensure smoothness. Estimation is by penalized likelihood. The method has two attractive features which follow from these ideas: (a) the regression nature of P-splines means that it is straightforward to introduce smooth terms into a larger regression model, and (b) the difference penalty means that the familiar least squares (LS) solution in a normal model and, more generally, the iterative weighted least squares (IWLS) algorithm in a generalized linear model (GLM) apply in the P-spline setting. A number of papers contain descriptions of the method (Eilers and Marx, 1996; Marx and Eilers, 1998; Currie et al., 2004). We present a short introduction. In section 2.1 we describe P-spline smoothing with a B-spline basis and then in section 2.2 we describe a transformation of the B-spline basis which gives an alternative way of fitting the P-spline model. This new basis has two advantages: first, it allows us to use (generalized) linear mixed model methods and second, it enables us to deal with identifiability problems in more complex models. We describe the mixed model approach in section 2.3. This approach allows simple fitting with standard software and has the further advantage that it enables us to deal easily with overdispersion.

2.1 P-splines with a B-spline basis

Our introduction is set in the context of Poisson errors, appropriate for modelling mortality data.
We suppose we have data (d_i, e_i, x_i), i = 1, ..., n, on a set of lives all aged sixty-five, say, where d_i is the number of deaths in year x_i and e_i is the exposed to risk. Let d = (d_1, ..., d_n), e = (e_1, ..., e_n), etc. We suppose that the number of deaths d_i is a realization of a Poisson distribution with mean µ_i = e_i θ_i where θ_i is the force of mortality or hazard function in year x_i. We seek a smooth estimate of θ = (θ_i). A natural approach is to fit a GLM with Poisson errors, i.e., log µ = log e + log θ = log e + Xa, where X = X(x), the regression matrix, is a function of year x and log e is an offset in the linear predictor. It seems unlikely that a polynomial basis will be suitable for modelling the variability in θ and a more flexible basis is provided by a set of B-splines {B_1(x), ..., B_c(x)}; such a basis is shown in the upper left panel of Fig. 1 for c = 7.
We are still in the framework of classical regression with regression matrix B = B(x), say, where the rows of B are the values of the B-splines in the basis evaluated at each year in turn. We use some mortality data provided by the Continuous Mortality Investigation (CMI) on claim incidence in the UK insurance business to illustrate our methods. The data run from ages 20 to 90 and years 1947 to 2002; for more information on these data see Currie et al. (2004). The middle left panel of Fig. 1 shows a plot of the logarithm of the raw forces of mortality, θ̂_i = d_i/e_i, for the sixty-five year old policy-holders in the data set. We fit an unpenalized GLM with c = 23 cubic B-splines in the basis and the resulting fit is also shown. It seems that the data have been undersmoothed; conversely, if there are too few B-splines in the basis, then the data will be oversmoothed. Thus, one approach is to optimise the number, and possibly position, of splines in the basis (Friedman and Silverman, 1989). An alternative is to consider the behaviour, not of the fitted curve, but of the fitted coefficients, â_k. The c = 23 values of â_k are plotted at their corresponding knot positions, and we see that the erratic nature of the fitted curve is a consequence of similar behaviour of the â_k. The P-spline solution (Eilers and Marx, 1996) to this problem is to use a rich basis of B-splines and then ensure smoothness of the fitted curve by penalizing the resulting roughness in the â_k with a difference penalty. For example, the second order penalty (which we will use throughout this paper) is given by

(a_1 − 2a_2 + a_3)² + ... + (a_{c−2} − 2a_{c−1} + a_c)² = a′D′Da    (2.1)

where D is a difference matrix of order 2; first and third order penalties can also be used.
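The penalty (2.1) is straightforward to compute. The sketch below (Python/numpy here, although the paper's own skeleton code is in R; the coefficient vector is purely illustrative) builds the order-2 difference matrix D and checks that a′D′Da equals the explicit sum of squared second differences.

```python
import numpy as np

def diff_matrix(c, order=2):
    # Difference matrix D of the given order: D @ a gives order-th differences of a.
    D = np.eye(c)
    for _ in range(order):
        D = np.diff(D, axis=0)
    return D

c = 7
D = diff_matrix(c, order=2)          # (c-2) x c for a second-order penalty
a = np.arange(c, dtype=float) ** 2   # illustrative coefficients
penalty = a @ D.T @ D @ a            # a'D'Da as in (2.1)
# The same quantity written out as a sum of squared second differences:
direct = sum((a[k] - 2*a[k+1] + a[k+2])**2 for k in range(c - 2))
print(D.shape, penalty, direct)
```

For a quadratic sequence of coefficients all second differences are constant, so the two computations agree exactly; first or third order penalties follow by changing `order`.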
The log-likelihood function is modified by the penalty function and a is estimated by maximising the penalized log-likelihood

l_p = l(a; y) − (1/2) a′Pa    (2.2)

where l(a; y) is the usual log-likelihood for a GLM, P = P_B = λD′D is the penalty matrix and λ is the smoothing parameter. The suffix B, as in P_B, indicates the associated basis and emphasizes that the penalty depends on the choice of basis. Other bases and penalties will be introduced below but when the context allows we will suppress the suffix and write the penalty simply as P. Maximizing (2.2) gives the penalized likelihood equations

B′(y − µ) = Pa    (2.3)

which, conditional on the value of the smoothing parameter λ, can be solved with

(B′W̃B + P)â = B′W̃z̃,    (2.4)
the penalized version of the scoring algorithm; here B is the regression matrix, P is the penalty matrix, the tilde as in ã denotes a current estimate, and similarly for µ̃, z̃ = Bã + W̃⁻¹(y − µ̃), the working variable, and W̃ = diag(µ̃), the diagonal matrix of weights, while â denotes the updated estimate of a. The hat-matrix is

H = B(B′ŴB + P)⁻¹B′Ŵ    (2.5)

and the trace of the hat-matrix, Tr(H), a measure of the effective dimension, ED, or effective degrees of freedom of the model (Hastie and Tibshirani, 1990, p. 52), is

ED = Tr(H) = Tr[(B′ŴB + P)⁻¹B′ŴB];    (2.6)

a convenient alternative to (2.6) is

ED = Tr(H) = c − Tr[(B′ŴB + P)⁻¹P]    (2.7)

where c is the number of columns in B. Standard errors for â can be computed from

Var(â) ≈ (B′W̃B + P)⁻¹.    (2.8)

Wahba (1983) and Silverman (1985) used a Bayesian argument to derive (2.8); see also Wood (2006, section 4.8) for a good discussion. We also note that in the extreme cases, λ = 0 and λ = ∞, (2.8) reduces to familiar results: we get the usual asymptotic variance in an unpenalized GLM when λ = 0; when λ → ∞ the limiting fit is a straight line (on the log scale) and the variance in (2.8) reduces to the variance when the linear predictor is linear in age. (Here we assume that a second order penalty and B-splines of degree at least two are used; see Eilers and Marx (1996).) There remains the choice of the smoothing parameter. In this paper we will use mixed model methods to select the smoothing parameter but there are other possibilities: the Akaike Information Criterion (AIC) (Akaike, 1973), the Bayesian Information Criterion (BIC) (Schwarz, 1978) or Generalised Cross Validation (GCV) (Craven and Wahba, 1979), for example. The right middle panel of Fig. 1 shows the result of using BIC to select the smoothing parameter, where BIC is defined as

BIC = Dev + log(n) Tr(H)    (2.9)

where Dev is the deviance in a GLM and n is the number of observations.
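The penalized scoring step (2.4) and the trace identities (2.6) and (2.7) can be sketched together. The toy example below uses Python/numpy with simulated Poisson counts and, for self-containedness, a generic bump basis standing in for B-splines; the data, basis and the fixed λ are all invented for illustration and are not the CMI example of the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: Poisson counts with a smooth log-rate and an exposure offset.
n = 60
x = np.linspace(0, 1, n)
e = np.full(n, 500.0)                         # exposures
theta = np.exp(-2.0 + np.sin(2 * np.pi * x))  # true force of mortality
y = rng.poisson(e * theta)

# A generic smooth basis standing in for B-splines (Gaussian bumps), with a
# second-order difference penalty P = lambda * D'D and a fixed lambda.
knots = np.linspace(0, 1, 12)
B = np.exp(-0.5 * ((x[:, None] - knots[None, :]) / 0.12) ** 2)
c = B.shape[1]
D = np.diff(np.eye(c), n=2, axis=0)
lam = 10.0
P = lam * D.T @ D

# Penalized IWLS: iterate (B'WB + P) a = B'W z as in (2.4).
a = np.zeros(c)
for _ in range(100):
    mu = e * np.exp(B @ a)                    # mean, with offset log(e)
    W = mu                                    # Poisson weights, diag(mu)
    z = B @ a + (y - mu) / mu                 # working variable (offset removed)
    a_new = np.linalg.solve(B.T @ (W[:, None] * B) + P, B.T @ (W * z))
    if np.max(np.abs(a_new - a)) < 1e-10:
        a = a_new
        break
    a = a_new

fitted = np.exp(B @ a)                        # fitted force of mortality

# Effective dimension via (2.6) and via the shortcut (2.7): they must agree.
M = np.linalg.inv(B.T @ (W[:, None] * B) + P)
ed1 = np.trace(M @ B.T @ (W[:, None] * B))    # (2.6)
ed2 = c - np.trace(M @ P)                     # (2.7)
print(ed1, ed2)
```

The identity behind (2.7) is Tr[M B′ŴB] = Tr[M (B′ŴB + P)] − Tr[M P] = c − Tr[M P], which the last lines verify numerically.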
We still have c = 23 B-splines in the basis but with a second order penalty and λ chosen by BIC the degrees of freedom are reduced from 23 to about 6.5. The fitted coefficients also exhibit smoothness and demonstrate an important difference between smoothing with natural splines and P-splines: in
the former, smoothness at the edges is ensured by the use of splines linear in the tails while in the latter, B-splines are used throughout the basis and smoothness is ensured by the penalty.

2.2 P-splines with a transformed basis

The fitted coefficients with a B-spline basis have an attractive property: each coefficient is associated with a B-spline and the estimated value of the coefficient is approximately a weighted average of the observations in the vicinity of this B-spline, as the middle panels of Fig. 1 demonstrate. The second order penalty penalizes departures from linearity. An alternative strategy is to extract a linear component from the fitted trend and fit the remaining variation by fitting a smooth curve with a penalty that penalizes departures from zero. This approach has echoes of a mixed model where trend is split into a fixed part (the linear part) and a random part (the curved part); Green (1985) is an early reference to this idea which is also discussed by Verbyla et al. (1999). A transformation that achieves this decomposition with a B-spline basis and second order penalty is given by Eilers (1999) in the discussion of the Verbyla et al. paper; see also Currie et al. (2006). Such transformations not only give access to (generalized) linear mixed model methods but also allow us to deal with problems of identifiability, as we will see in section 3. Welham et al. (2007) give a comprehensive review of mixed model representations of spline models. Let B = B(x), n × c, be the regression matrix of B-splines and let D′D, c × c, define the penalty matrix. Let UΦU′ be the singular value decomposition of D′D where Φ is the diagonal matrix consisting of the eigenvalues φ_1, φ_2, ..., φ_c of D′D in ascending order. We assume that a second order penalty is used so φ_1 = φ_2 = 0.
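The decomposition developed in this and the next paragraph is easy to verify numerically: with U and Φ from the eigendecomposition of D′D, the fitted values Bθ and Xβ + Zα coincide and the penalty θ′D′Dθ collapses to α′α. A sketch (Python/numpy rather than the paper's R; B is a random stand-in for a B-spline regression matrix):

```python
import numpy as np

rng = np.random.default_rng(2)

# Eigendecomposition of D'D; for a second-order penalty the two smallest
# eigenvalues are (numerically) zero: the null space of D'D.
c = 9
D = np.diff(np.eye(c), n=2, axis=0)
phi, U = np.linalg.eigh(D.T @ D)     # eigenvalues in ascending order
Un, Us = U[:, :2], U[:, 2:]
Phi_pos = phi[2:]                    # the c-2 positive eigenvalues

B = rng.normal(size=(30, c))         # stand-in regression matrix
X = B @ Un
Z = B @ Us / np.sqrt(Phi_pos)        # Z = B Us (Phi+)^(-1/2)

theta = rng.normal(size=c)
beta = Un.T @ theta
alpha = np.sqrt(Phi_pos) * (Us.T @ theta)

lhs = B @ theta
rhs = X @ beta + Z @ alpha           # identical fitted values
pen1 = theta @ D.T @ D @ theta
pen2 = alpha @ alpha                 # the penalty becomes alpha'alpha
print(np.max(np.abs(lhs - rhs)), pen1, pen2)
```

Note the exponents: Z carries the factor (Φ⁺)^{−1/2} while α carries (Φ⁺)^{1/2}, so the two factors cancel in Zα and the quadratic form θ′UΦU′θ reduces to α′α.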
Now a linear function is in the null space of D′D so we can take U_n = [1̃ : x̃], c × 2, as an orthogonal basis for the null space where 1̃ is (1, ..., 1)′/√c and x̃ is (1, 2, ..., c)′ centred and scaled to have unit length. Let U_s, c × (c − 2), be the submatrix of U corresponding to the c − 2 non-zero eigenvalues. We take U = [U_n : U_s] and transform Bθ = Xβ + Zα, say, where X = BU_n and Z = BU_s(Φ⁺)^{−1/2}, and β = U_n′θ and α = (Φ⁺)^{1/2}U_s′θ; here Φ⁺ is the diagonal matrix consisting of the c − 2 positive eigenvalues in Φ. With this transformation, the penalty θ′D′Dθ = α′α. Furthermore, since β is unpenalized we may replace X = BU_n = B[1̃ : x̃] by [1, x] where 1 is a vector of 1's of length n and x is the vector of year values. With these definitions in place the following estimation procedures
in (2.4) are equivalent:

Regression matrix: B = B(x) → [X : Z], X = [1, x], Z = BU_s(Φ⁺)^{−1/2}    (2.10)

Penalty matrix: P_B = λD′D → P_F = λ blockdiag[O_2, I_{c−2}]    (2.11)

where O_2 is a 2 × 2 matrix of zeros and I_{c−2} is the identity matrix of size c − 2. The linear part, Xβ, is unpenalized, while the non-linear part, Zα, is penalized or shrunk towards zero. We interpret this representation as a mixed model with Xβ as the fixed part and Zα as the random part in section 2.3; see also Currie et al. (2006). Figure 1 explains how the new basis works. With c = 7 B-splines there are five basis functions in Z, as shown in the upper right panel. These new basis functions are very different from the original B-splines; first, they are no longer local functions and second, the high frequency functions have low amplitude (a consequence of the scaling by (Φ⁺)^{−1/2} shown in the lower right panel). The lower left panel shows the values of α̂, c = 23, from the unpenalized fit and the penalized fit; the shrinkage of the penalized estimates towards zero is evident. It is important to realise that although the amplitudes of the basis functions differ greatly their coefficients are equally penalized. Indeed, it would be possible to remove the high frequency/low amplitude basis functions from Z with little effect on the resulting fit, an idea exploited by Wood (2003).

2.3 A mixed model representation

Equations (2.10) and (2.11) say that fitting the penalized GLM with regression matrix B and penalty matrix P_B = λD′D is equivalent to fitting the penalized GLM with regression matrix [X : Z] and penalty matrix P_F = λ blockdiag[O_2, I_{c−2}]. With this second representation the scoring algorithm (2.4) becomes

[X′W̃X   X′W̃Z     ] [β̂]   [X′W̃]
[Z′W̃X   Z′W̃Z + λI] [α̂] = [Z′W̃] z̃,    (2.12)

where I = I_{c−2}, z̃ = Xβ̃ + Zα̃ + W̃⁻¹(y − µ̃) and W̃ = diag(µ̃) with µ̃ = e exp(Xβ̃ + Zα̃) in the Poisson case.
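For a concrete toy version of this machinery, the sketch below solves the system (2.12) in the simplest Gaussian case (W = I, no dispersion step) and updates λ with the fixed-point step of Schall's algorithm, equation (2.17) below. It is Python/numpy with simulated data and a bump basis standing in for B-splines; all names and settings are invented for illustration and this is not the authors' Appendix B code.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated Gaussian data and a generic smooth basis.
n, c = 80, 12
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
knots = np.linspace(0, 1, c)
B = np.exp(-0.5 * ((x[:, None] - knots[None, :]) / 0.1) ** 2)

# Transformed basis of section 2.2: fixed part X, random part Z.
D = np.diff(np.eye(c), n=2, axis=0)
phi, U = np.linalg.eigh(D.T @ D)
X = B @ U[:, :2]
Z = B @ U[:, 2:] / np.sqrt(phi[2:])

# Schall-type fixed-point iteration for lambda.
lam = 1.0
for _ in range(100):
    CtC = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.eye(c - 2)]])
    sol = np.linalg.solve(CtC, np.concatenate([X.T @ y, Z.T @ y]))
    alpha = sol[2:]
    T = np.linalg.inv(CtC)[2:, 2:]            # lower block of (C'C)^(-1)
    v = lam * np.trace(T)
    lam_new = (c - 2 - v) / (alpha @ alpha)   # invert the update (2.17)
    if abs(lam_new - lam) < 1e-8 * lam:
        lam = lam_new
        break
    lam = lam_new

ed = 2 + (c - 2 - v)                          # effective dimension, cf. (2.18)
print(lam, ed)
```

The effective dimension stays strictly between 2 (the unpenalized linear part) and c, and the iteration needs no derivatives of the residual likelihood, which is what makes the scheme attractive.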
We recognise (2.12) as the mixed model equations for the linear mixed model

z = Xβ + Zα + ε,  α ∼ N(0, λ⁻¹I),  ε ∼ N(0, W⁻¹);    (2.13)

see Searle et al. (1992), p. 276. Smoothing parameters may be selected by maximizing the residual log-likelihood

ℓ(λ) = −(1/2) log|V| − (1/2) log|X′V⁻¹X| − (1/2) z̃′(V⁻¹ − V⁻¹X(X′V⁻¹X)⁻¹X′V⁻¹)z̃    (2.14)
where

V = W̃⁻¹ + ZGZ′    (2.15)

and G = λ⁻¹I is the variance of the random effects. We now iterate between (2.12) and (2.14). We will return to this method in section 4 but in the present case we use the proposal of Schall (1991) for the estimation of β, α and the smoothing parameter λ in a generalized linear mixed model. With the same notation as Schall we let

C′C = [X′W̃X   X′W̃Z      ]
      [Z′W̃X   Z′W̃Z + λ̃I ]    (2.16)

and define T to be the lower (c − 2) × (c − 2) block of the inverse of C′C (corresponding to α). Schall's algorithm is

1. for given β = β̃, α = α̃ and λ = λ̃ estimate β and α from (2.12), and
2. for given β = β̃, α = α̃ and λ = λ̃ estimate λ from

λ̂⁻¹ = α̃′α̃ / (c − 2 − v)    (2.17)

where v = λ̃ Tr(T). This fixed point iteration scheme yields approximate residual maximum likelihood (REML) estimates. We have found Schall's algorithm to provide an efficient solution. Approximate maximum likelihood estimates can be obtained by defining T to be the inverse of the lower (c − 2) × (c − 2) block of C′C. It follows from (2.7), the form of P_F in (2.11) and the definition of T that the effective dimension of the fitted model can be written

ED = 2 + (c − 2 − v)    (2.18)

where 2 is the number of fixed effects, c − 2 is the number of random effects and v = λ̂ Tr(T). We can interpret c − 2 − v as the effective degrees of freedom of the non-linear component of the effect of year. In the example in section 2.1 we have c − 2 − v = 5.1 with total ED = 7.1, slightly less smoothing than obtained with λ chosen by BIC when ED = 6.5. The decomposition (2.18) extends to the smooth APC model presented in the next section (see equation (3.7)). Overdispersion is a common problem with Poisson models. If Var(y) = σ²µ then Schall's estimate of σ² reduces to

σ̂² = (y − µ̂)′Ŵ⁻¹(y − µ̂) / (n − c + v)    (2.19)
which in our present example gives σ̂² = 1.20; there is little evidence of serious overdispersion. In general, it is preferable to incorporate overdispersion directly into the estimation process and the mixed model approach enables this to happen in a natural way. The mixed model (2.13) becomes

z = Xβ + Zα + ε,  α ∼ N(0, λ⁻¹I),  ε ∼ N(0, σ²W⁻¹)    (2.20)

and Schall's algorithm is modified as follows: in step 1, replace W̃ by σ̃⁻²W̃, and add step 3, estimate σ² from (2.19). In this example there is little change: we find a fitted model with a slightly lower effective dimension of 7.01 and estimated overdispersion of σ̂² = . The model may also be fitted with standard software. We use the glmmPQL( ) function of R (R Development Core Team, 2004) in the MASS library of Venables and Ripley (2002). The fitted model has effective dimension of 6.64 (computed from (2.6) with W replaced by σ⁻²W) and estimated overdispersion of σ̂² = . We give some skeleton R code in Appendix B.

3 Smooth Age-Period-Cohort models

We suppose that we have data matrices Y and E, both n_a × n_y, of deaths and exposures respectively. The rows of Y and E are indexed by age at death x_a and the columns by year of death x_y. The classical approach to the APC model is the factor model in which the variation in the force of mortality, θ_ijk at age i in year j for cohort k, is decomposed into three components:

log θ_ijk = α_i + β_j + γ_k,  i = 1, ..., n_a,  j = 1, ..., n_y,  k = 1, ..., n_a + n_y − 1    (3.1)

where α_i, β_j and γ_k are the age, period (year) and cohort effects respectively. With Poisson errors, this is a GLM so is easily fitted with standard software such as R. However, there is a difficulty with the interpretation of the fitted parameters since of the 2n_a + 2n_y − 1 parameters in (3.1) only 2n_a + 2n_y − 4 are identifiable; see Clayton and Schifflers (1987) for a careful discussion of the dangers of over-interpretation of the fitted parameters.
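The identifiability count just quoted can be checked directly: the indicator design of (3.1) has 2n_a + 2n_y − 1 columns but rank 2n_a + 2n_y − 4, a rank deficiency of 3 (two redundant intercepts among the three factors plus the linear relation cohort = period − age). A small check in Python/numpy with illustrative grid sizes:

```python
import numpy as np

# Factor APC design (3.1): one indicator column per age, period and cohort
# level, on a small illustrative grid.
na, ny = 5, 7
age, year = np.meshgrid(np.arange(na), np.arange(ny), indexing='ij')
age, year = age.ravel(), year.ravel()
cohort = year - age + (na - 1)               # cohort index, 0 .. na+ny-2

def indicators(f, levels):
    return (f[:, None] == np.arange(levels)[None, :]).astype(float)

M = np.hstack([indicators(age, na),
               indicators(year, ny),
               indicators(cohort, na + ny - 1)])

p = M.shape[1]                               # 2*na + 2*ny - 1 columns
rank = np.linalg.matrix_rank(M)
print(p, rank, p - rank)                     # rank deficiency of 3
```

The deficiency is independent of the grid size, which is why conventional fits of (3.1) need three constraints, not just the usual one per factor.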
Instead of trying to interpret the fitted parameters we consider the fitted log(mortality) surface which is unique. The upper left panel in Fig. 2 shows the mean fitted log(mortality) by age with the linear effect of age removed, for the data, for the factor model, and for the smooth model described below; the corresponding plots for year and cohort are also given. These plots suggest that a smooth model in age, year and cohort is a natural alternative to the discrete factor model. The smoothness assumption is deceptive since we will see that this alone is sufficient to deal with the identifiability constraints. A smooth model may also deal with another problem with the APC
model: the cohort parameters which correspond to the oldest and youngest cohorts tend to be poorly estimated, a consequence of the small numbers of cells which contribute to estimates of the corner cohort parameters; the parameter estimates corresponding to the youngest cohorts in the CMI dataset are particularly unstable. A smooth model should help to deal with this instability. We assume that the parameters α, β and γ in (3.1) are smooth and define a smooth APC model as follows. Let M_a, M_y and M_c be the n_a × n_y matrices with entries age at death, year of death and year of birth (cohort) and let x_a = vec(M_a), x_y = vec(M_y) and x_c = vec(M_c). Let B_a = B(x_a) be the regression matrix of B-splines based on x_a with similar definitions for B_y and B_c. We define a smooth APC regression matrix by

B = [B_a : B_y : B_c]    (3.2)

with corresponding coefficients a = (a_a′, a_y′, a_c′)′. We impose smoothness on the coefficients a_a, a_y and a_c by the block diagonal penalty matrix

P = blockdiag[λ_a D_a′D_a, λ_y D_y′D_y, λ_c D_c′D_c]    (3.3)

where D_a, D_y and D_c are second order difference matrices and λ_a, λ_y and λ_c are the smoothing parameters for the age, year and cohort parameters respectively. The model defined by (3.2) and (3.3) is a generalized additive model (GAM) (Hastie and Tibshirani, 1990) but instead of using back-fitting we fit directly with (2.4). However, the regression matrix in (3.2) is not of full rank so some care is required. There are a number of possibilities: we could use a small ridge penalty on the system of equations, as in Marx and Eilers (1998), or we could use a generalized inverse. A third possibility is to transform B to a non-singular basis; the transformation developed in section 2.2 enables us to extract the linear components of age, year and cohort, i.e., a plane in the age-year space. An important point is that all three methods give exactly the same fitted values.
Let X_a = [1, x_a], X_y = [1, x_y] and X_c = [1, x_c] be the n_a n_y × 2 matrices corresponding to (2.10). Then removing the linear dependencies among the columns of X_a, X_y and X_c we obtain the X matrix in the transformed model as

X = [1 : x_a : x_y].    (3.4)

Note that although X is not unique the space spanned by X is, since this space equals the null space of P in (3.3). The Z matrix is given by

Z = [Z_a : Z_y : Z_c]    (3.5)
where, for example, Z_a = B_a U_{a:s}(Φ_a⁺)^{−1/2}, and U_{a:s} and Φ_a⁺ are obtained from the singular value decomposition of D_a′D_a as in section 2.2. Lastly, with the new regression matrix defined as [X : Z], the penalty transforms into

P = blockdiag[O_3, λ_a I_{c_a−2}, λ_y I_{c_y−2}, λ_c I_{c_c−2}]    (3.6)

where O_3 is the 3 × 3 matrix of 0's and c_a − 2 is the column dimension of Z_a, etc. The model may now be fitted as in section 2.3 with fixed regression matrix given by (3.4), random regression matrix by (3.5) and penalty matrix by (3.6). We fit the smooth APC model with B_a, n × 10, B_y, n × 13 and B_c, n × 28, where n = 3976, i.e., c_a = 10, c_y = 13 and c_c = 28. Schall's algorithm, (2.16) and (2.17), extends as follows: let T_a be the (c_a − 2) × (c_a − 2) block of the inverse of C′C which corresponds to the Z_a coefficients; we take similar definitions for T_y and T_c for Z_y and Z_c. Fitting the Poisson model without overdispersion we find with REML that the dimension of the fitted model is reduced from 249 for the factor model to an effective dimension of about . Generalizing (2.18) we write

ED = 3 + (c_a − 2 − v_a) + (c_y − 2 − v_y) + (c_c − 2 − v_c)    (3.7)

where there are three fixed effects, c_a − 2 is the column dimension of Z_a and v_a = λ_a Tr(T_a), etc. The non-linear components of the effects of age, year and cohort are c_a − 2 − v_a = 7.7, c_y − 2 − v_y = 8.1 and c_c − 2 − v_c = 17.0 respectively in the present example. The estimate of overdispersion using (2.19) is σ̂² = 2.00, evidence of some overdispersion. We refit the model with σ² included as part of the estimation process. With overdispersion included in the estimation process we would expect heavier smoothing since the smoothed surface will be less inclined to follow the local behaviour of the observed mortality surface. The effective dimension is further reduced to about 33.8; the estimated value of σ² is 2.00, as before. The resulting detrended mean log(mortality) curves have been added to Fig.
2; the fitted log(mortality) is also shown for age 65. Skeleton R code is provided in Appendix B. In the previous paragraph we described overdispersion as a variance effect. However, with mortality data this approach ignores effects such as cold winters which can inflate death rates. In the next section we use the approach of Perperoglou and Eilers (unpublished) where overdispersion is viewed not as a variance problem but as a problem with the linear structure of the model. They suggest the addition of individual random effects to the linear predictor as a way of dealing with the lack of fit that is otherwise modelled with overdispersion.
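The linear dependencies behind (3.4) are easy to exhibit numerically: since cohort = year − age, the six columns of [X_a : X_y : X_c] span only a three-dimensional space, the plane [1 : x_a : x_y]. A toy check (Python/numpy, illustrative grid):

```python
import numpy as np

# The fixed part of the smooth APC model: the six columns [1, xa, 1, xy, 1, xc]
# are linearly dependent because xc = xy - xa, so only a plane survives, as
# retained in (3.4).
na, ny = 5, 7
A, Y = np.meshgrid(np.arange(na, dtype=float), np.arange(ny, dtype=float),
                   indexing='ij')
xa, xy = A.ravel(), Y.ravel()
xc = xy - xa                                  # year-of-birth (cohort) index
one = np.ones_like(xa)

full = np.column_stack([one, xa, one, xy, one, xc])
X = np.column_stack([one, xa, xy])            # (3.4)
print(np.linalg.matrix_rank(full), np.linalg.matrix_rank(X))
```

Both matrices have rank 3: dropping the redundant columns changes the column space not at all, which is why any full-rank choice of X spanning the plane gives identical fitted values.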
4 Overdispersion as individual random effects

In the previous section we showed that the linear predictor for the APC model has a mixed model representation Xβ + Zα where X and Z are defined in (3.4) and (3.5) respectively. Perperoglou and Eilers (unpublished) modified the linear predictor by the addition of individual random effects to give

Xβ + Zα + γ    (4.1)

where the length of γ is the same as the number of observations, n, say. Thus, the model has more parameters than observations but a ridge penalty on γ maintains identifiability and shrinks γ towards zero; the penalty (3.6) becomes

P = blockdiag[O_3, λ_a I_{c_a−2}, λ_y I_{c_y−2}, λ_c I_{c_c−2}, κI_n] = blockdiag[O_3, P, κI_n],    (4.2)

say. We have a mixed model where the variance of the random effects α and γ is given by

G = blockdiag[λ_a⁻¹ I_{c_a−2}, λ_y⁻¹ I_{c_y−2}, λ_c⁻¹ I_{c_c−2}, κ⁻¹I_n] = blockdiag[P⁻¹, κ⁻¹I_n].    (4.3)

The mixed model equations (2.12) become

[X′W̃X   X′W̃Z      X′W̃      ] [β̂]   [X′W̃]
[Z′W̃X   Z′W̃Z + P  Z′W̃      ] [α̂] = [Z′W̃] z̃.    (4.4)
[W̃X     W̃Z        W̃ + κI_n ] [γ̂]   [W̃  ]

This is a very large system of equations but Perperoglou and Eilers (unpublished) provide a device which facilitates its solution. We define a modified weight matrix

W* = κ(W̃ + κI_n)⁻¹W̃    (4.5)

and solve (4.4) for γ̂ to get

κγ̂ = W*(z̃ − Xβ̂ − Zα̂)    (4.6)

from which it follows that (4.4) reduces to

[X′W*X   X′W*Z     ] [β̂]   [X′W*]
[Z′W*X   Z′W*Z + P ] [α̂] = [Z′W*] z̃.    (4.7)

This is the same system as obtained for the original smooth APC model but with the weight matrix W̃ replaced by W*. For given κ we optimize over the remaining parameters by using Schall's algorithm; κ is estimated by maximizing the profile residual log-likelihood ℓ(λ̂_a, λ̂_y, λ̂_c, κ) from (2.14). It is essential to avoid the inversion of large matrices such as the left hand side of (4.4) and some matrix identities to this end are provided in Appendix A.
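The reduction from (4.4) to (4.7) can be verified numerically: eliminating γ̂ with the modified weights W* leaves exactly the (β̂, α̂) block of the full solution. A toy check in Python/numpy (all matrices random, dimensions purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy dimensions: n observations, px fixed and pz penalized columns.
n, px, pz = 25, 3, 6
X = rng.normal(size=(n, px))
Z = rng.normal(size=(n, pz))
w = rng.uniform(0.5, 2.0, size=n)
W = np.diag(w)
P = 2.0 * np.eye(pz)                 # stand-in for the lambda blocks in (4.2)
kappa = 3.5
z = rng.normal(size=n)

# Full system (4.4): unknowns (beta, alpha, gamma).
top = np.block([[X.T @ W @ X, X.T @ W @ Z, X.T @ W],
                [Z.T @ W @ X, Z.T @ W @ Z + P, Z.T @ W],
                [W @ X, W @ Z, W + kappa * np.eye(n)]])
rhs = np.concatenate([X.T @ W @ z, Z.T @ W @ z, W @ z])
full = np.linalg.solve(top, rhs)

# Reduced system (4.7) with W* = kappa (W + kappa I)^(-1) W.
Ws = np.diag(kappa * w / (w + kappa))
red = np.block([[X.T @ Ws @ X, X.T @ Ws @ Z],
                [Z.T @ Ws @ X, Z.T @ Ws @ Z + P]])
small = np.linalg.solve(red, np.concatenate([X.T @ Ws @ z, Z.T @ Ws @ z]))

# The (beta, alpha) entries of the two solutions coincide.
print(np.max(np.abs(full[:px + pz] - small)))
```

The algebra behind the check: substituting γ̂ = (W̃ + κI)⁻¹W̃(z̃ − Xβ̂ − Zα̂) into the first two block rows of (4.4) turns every occurrence of W̃ − W̃(W̃ + κI)⁻¹W̃ into κW̃(W̃ + κI)⁻¹ = W*.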
Figure 3 shows the results of fitting the model. Figure 3 also shows the profile log-likelihood ℓ(λ̂_a, λ̂_y, λ̂_c, κ) plotted against log κ; evidently, the smoothing parameter which shrinks the individual random effects towards zero is sharply estimated. Values of the observed and smoothed log(mortality) together with the estimated individual effects are also shown for ages forty and sixty. There is a noticeable difference in the individual random effects at these ages. An explanation can be found in the lower right panel of Fig. 3 which gives the numbers of deaths at ages sixty and forty. Since Var(log(d/e)) ≈ 1/d the values of log(d/e) are a good estimate of the true underlying smooth log(mortality) at age sixty, but a poor estimate at age forty. It follows that the residuals log(d/e) − Xβ̂ − Zα̂ are almost entirely explained by the individual random effects at age sixty, while the stochastic element of the residual is substantial at age forty. Furthermore the individual random effects show systematic deviations between the data and the model at age forty, evidence of lack of fit at this age. The modified weight matrix W* in (4.5) deserves some comment. We note that W* is a diagonal matrix with entries w*_i = κw̃_i/(w̃_i + κ) with w̃_i = µ̃_i in the Poisson case considered here. Thurston et al. (2000) used a similar weight matrix in their algorithm to fit the negative binomial distribution, a distribution often used to model overdispersion; in their paper the weight w_i did not include the estimated random effect γ_i. For further comment on this point see Perperoglou and Eilers (unpublished).

5 Discussion

The model (4.1) with overdispersion involves choosing four smoothing parameters in the framework of a GLM with over four thousand linear parameters; Schall's algorithm combined with the modified weight method in (4.5) and (4.7) gives a low-footprint, efficient method of model fitting with simple direct coding.
Our conclusion is that Schall's (1991) algorithm is a simple and effective method of fitting in the mixed model setting. In this paper we have considered random effects acting at the individual age and year level. One other possibility arises as a result of such things as outbreaks of influenza or cold winters. Such effects can be modelled as smooth random effects which act on the mortality of a whole year. The individual random effects γ with length n = n_a n_y in (4.1) are replaced by annual random effects (I_{n_y} ⊗ B_a)γ where γ has length n_y c_s; here c_s is the column dimension of the B-spline basis B_a and ⊗ denotes the Kronecker product. Some initial results from this approach are reported in Kirkby and Currie (2007).
We have used B-splines and penalties to smooth the APC model. Transformation of the B-spline basis enables the model to be expressed as a mixed model which allows the modelling of overdispersion as individual random effects. The problem of identifiability is addressed with the same transformation. In conclusion we offer a unified approach for smoothing the APC model in a mixed model framework, dealing with non-identifiability in the APC model and modelling overdispersed counts.
References

Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika, 60.
Carstensen B (2007) Age-period-cohort models for the Lexis diagram. Statistics in Medicine, 26.
Clayton D and Schifflers E (1987) Models for temporal variation in cancer rates. II: Age-period-cohort models. Statistics in Medicine, 6.
Craven P and Wahba G (1979) Smoothing noisy data with spline functions. Numerische Mathematik, 31.
Currie ID, Durban M and Eilers PHC (2004) Smoothing and forecasting mortality rates. Statistical Modelling, 4.
Currie ID, Durban M and Eilers PHC (2006) Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society: Series B, 68.
Eilers PHC (1999) Discussion of "The analysis of designed experiments and longitudinal data by using smoothing splines" (by AP Verbyla, BR Cullis, MG Kenward and SJ Welham). Applied Statistics, 48.
Eilers PHC and Marx BD (1996) Flexible smoothing with B-splines and penalties. Statistical Science, 11.
Friedman JH and Silverman BW (1989) Flexible parsimonious smoothing and additive modeling. Technometrics, 31.
Green PJ (1985) Linear models for field trials, smoothing and cross-validation. Biometrika, 72.
Hastie TJ and Tibshirani RJ (1990) Generalized additive models. London: Chapman and Hall.
Heuer C (1997) Modeling of time trends and interactions in vital rates using restricted regression splines. Biometrics, 53.
Holford TR (1983) The estimation of age, period and cohort effects for vital rates. Biometrics, 39.
Kirkby JG and Currie ID (2007) Smooth models of mortality with period shocks. Proceedings of the 22nd International Workshop on Statistical Modelling, Barcelona, to appear.
Marx BD and Eilers PHC (1998) Direct generalized additive modeling with penalized likelihood. Computational Statistics and Data Analysis, 28.
Ogata Y, Katsura K, Keiding N, Holst C and Green A (2000) Empirical Bayes Age-Period-Cohort analysis of retrospective incidence data. Scandinavian Journal of Statistics, 27.
Perperoglou A and Eilers PHC. Overdispersion modelling with individual random effects. Unpublished manuscript.
R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Schall R (1991) Estimation in generalized linear models with random effects. Biometrika, 78.
Schwarz G (1978) Estimating the dimension of a model. Annals of Statistics, 6.
Searle SR, Casella G and McCulloch CE (1992) Variance components. New York: John Wiley & Sons.
Silverman BW (1985) Some aspects of the spline smoothing approach to nonparametric regression curve fitting (with Discussion). Journal of the Royal Statistical Society: Series B, 47.
Thurston SW, Wand MP and Wiencke JK (2000) Negative binomial additive models. Biometrics, 56.
Venables WN and Ripley BD (2002) Modern Applied Statistics with S. New York: Springer-Verlag.
Verbyla AP, Cullis BR, Kenward MG and Welham SJ (1999) The analysis of designed experiments and longitudinal data by using smoothing splines (with discussion). Applied Statistics, 48.
Wahba G (1983) Bayesian confidence intervals for the cross-validated smoothing spline. Journal of the Royal Statistical Society: Series B, 45.
Welham SJ, Cullis BR, Kenward MG and Thompson R (2007) A comparison of mixed model splines for curve fitting. Australian and New Zealand Journal of Statistics, 49.
Wood SN (2003) Thin plate regression splines. Journal of the Royal Statistical Society: Series B, 65.
Wood SN (2006) Generalized additive models: an introduction with R. London: Chapman and Hall.
Appendix A

We provide some matrix identities which allow estimation in the smooth APC model with individual random effects, model (4.1) and (4.2). In (4.1) let

$$C'C = \begin{bmatrix} X'WX & X'WZ & X'W \\ Z'WX & Z'WZ + P & Z'W \\ WX & WZ & W + \kappa I_n \end{bmatrix}. \quad (5.1)$$

This matrix is $(c_a + c_y + c_c - 3 + n) \times (c_a + c_y + c_c - 3 + n)$. For given $\kappa$, Schall's algorithm requires the leading $(c_a + c_y + c_c - 3) \times (c_a + c_y + c_c - 3)$ block of $(C'C)^{-1}$. It follows from results on the inverse of partitioned matrices and the definition of $\tilde W$ in (4.5) that this matrix is given by

$$\begin{bmatrix} X'\tilde WX & X'\tilde WZ \\ Z'\tilde WX & Z'\tilde WZ + P \end{bmatrix}^{-1}, \quad (5.2)$$

the inverse of the matrix on the left-hand side of (4.7). The Schall estimation scheme, as in section 3, is now used (conditional on $\kappa$) to estimate the remaining parameters. To estimate $\kappa$ we compute the profile residual log-likelihood $\ell(\hat\lambda_a, \hat\lambda_y, \hat\lambda_c, \kappa)$ from

$$-\tfrac{1}{2}\log|V| - \tfrac{1}{2}\log|X'V^{-1}X| - \tfrac{1}{2}\, z'\bigl(V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}\bigr)z. \quad (5.3)$$

Now, with the variance of the random effects given by (4.3), we find

$$V = W^{-1} + [Z : I_n] \begin{bmatrix} P^{-1} & O \\ O & \kappa^{-1} I_n \end{bmatrix} [Z : I_n]' \quad (5.4)$$

$$\phantom{V} = \tilde W^{-1} + ZP^{-1}Z' \quad (5.5)$$

where $P$ is defined in (4.2) and $\tilde W^{-1} = W^{-1} + \kappa^{-1} I_n$. It follows that $V^{-1}$ and $|V|$ are

$$V^{-1} = \tilde W - \tilde W Z(P + Z'\tilde W Z)^{-1} Z'\tilde W \quad (5.6)$$

and

$$|V| = \bigl(\lambda_a^{c_a-2}\, \lambda_y^{c_y-2}\, \lambda_c^{c_c-2}\bigr)^{-1}\, |\tilde W^{-1}|\, |P + Z'\tilde W Z|. \quad (5.7)$$
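The identities (5.6) and (5.7) are the Woodbury inversion formula and the matching determinant factorization for $V = \tilde W^{-1} + ZP^{-1}Z'$, and are easy to check numerically. A minimal sketch (Python with NumPy for convenience; the matrices are random stand-ins, not the paper's data, and $|P|^{-1}$ plays the role of the factor $(\lambda_a^{c_a-2}\lambda_y^{c_y-2}\lambda_c^{c_c-2})^{-1}$):

```python
import numpy as np

# Random stand-ins for Wtilde, Z and P.  P is diagonal here; in the paper
# it is block-diagonal with blocks lambda_a I, lambda_y I, lambda_c I,
# so that |P| is the lambda product appearing in (5.7).
rng = np.random.default_rng(1)
n, q = 12, 5
W = np.diag(rng.uniform(0.5, 2.0, n))
Z = rng.normal(size=(n, q))
P = np.diag(rng.uniform(0.5, 2.0, q))

V = np.linalg.inv(W) + Z @ np.linalg.inv(P) @ Z.T

# (5.6): Woodbury form of the inverse.
Vinv = W - W @ Z @ np.linalg.inv(P + Z.T @ W @ Z) @ Z.T @ W
assert np.allclose(Vinv, np.linalg.inv(V))

# (5.7): determinant factorization, |V| = |P|^{-1} |W^{-1}| |P + Z'WZ|.
detV = np.linalg.det(np.linalg.inv(W)) / np.linalg.det(P) \
       * np.linalg.det(P + Z.T @ W @ Z)
assert np.allclose(detV, np.linalg.det(V))
```

Both identities avoid inverting the $n \times n$ matrix $V$ directly: only the much smaller $q \times q$ system $P + Z'\tilde W Z$ is inverted.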
Appendix B

Skeleton code to fit the mixed model (2.13) is given below. It is assumed that the deaths and exposures are stored in the vectors Dth and Exp, and that the fixed and random effects regression matrices are X and Z respectively. The function myglmmPQL is a copy of the R function glmmPQL in which the line mcall$method <- "ML" is replaced by mcall$method <- "REML".

    library(nlme)
    library(MASS)
    Id <- factor(rep(1, length(Dth)))
    data.fr <- groupedData(Dth ~ X[,-1] | rep(1, length = length(Dth)),
                           data = data.frame(Dth, X, Z, Exp))
    fit <- myglmmPQL(Dth ~ X[,-1] + offset(log(Exp)), data = data.fr,
                     random = list(Id = pdIdent(~Z-1)), family = poisson)

Skeleton code to fit the penalized APC model in section 3 is given below. The fixed effects regression matrix is X, and the random effects regression matrices are Z.a, Z.y and Z.c.

    Id <- factor(rep(1, length(Dth)))
    Z.block <- list(list(Id = pdIdent(~Z.a-1)),
                    list(Id = pdIdent(~Z.y-1)),
                    list(Id = pdIdent(~Z.c-1)))
    Z.block <- unlist(Z.block, recursive = FALSE)
    data.fr <- groupedData(Dth ~ X[,-1] | rep(1, length = length(Dth)),
                           data = data.frame(Dth, X, Z.a, Z.y, Z.c, Exp))
    fit <- myglmmPQL(Dth ~ X[,-1] + offset(log(Exp)), data = data.fr,
                     random = Z.block, family = poisson)
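For readers without R, the inner loop that glmmPQL iterates can be written out directly: penalized iteratively reweighted least squares on a working variable, with each pdIdent block corresponding to a ridge penalty $\lambda = \sigma^2/\sigma_\alpha^2$ on its random coefficients. The sketch below (Python with simulated data; $\lambda$ is held fixed rather than updated by Schall's algorithm, and all names are illustrative) shows one such scheme for a Poisson model with an offset:

```python
import numpy as np

# Penalized IRLS for a Poisson model with offset: the core step that
# glmmPQL iterates.  Data are simulated and lambda is fixed; in the paper
# the variance ratios are estimated by Schall's algorithm.
rng = np.random.default_rng(2)
n, p, q = 40, 3, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # fixed effects
Z = rng.normal(size=(n, q))                                     # random effects
offset = np.log(rng.uniform(50, 500, n))                        # log exposure
y = rng.poisson(np.exp(offset - 4.0))                           # death counts

lam = 1.0
C = np.hstack([X, Z])
Pen = np.diag([0.0] * p + [lam] * q)        # no penalty on fixed effects
beta, alpha = np.zeros(p), np.zeros(q)
for _ in range(50):
    eta = offset + X @ beta + Z @ alpha
    mu = np.exp(eta)
    W = np.diag(mu)                         # Poisson working weights
    z = eta - offset + (y - mu) / mu        # working variable (offset removed)
    theta = np.linalg.solve(C.T @ W @ C + Pen, C.T @ W @ z)
    beta, alpha = theta[:p], theta[p:]

# At convergence the penalized score C'(y - mu) - Pen @ theta vanishes.
```

The pdIdent(~Z-1) calls in the R code above impose exactly this ridge structure, one penalty per random-effects block.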
Figure 1: (a) B-spline basis; (b) transformed basis; (c) unpenalized regression: coefficients and data; (d) penalized regression: coefficients and data; (e) unpenalized and penalized coefficients in the transformed regression; (f) scaling of basis functions, $\phi_i^{0.5}$, $i = 3, \ldots, c$.
Figure 2: Age-Period-Cohort model: detrended plots of mean log(mortality) by (a) age; (b) year; (c) cohort; (d) observed and fitted log(mortality) at age 65.
Figure 3: Age-Period-Cohort model with individual random effects: (a) profile residual log-likelihood $\ell(\hat\lambda_a, \hat\lambda_y, \hat\lambda_c, \kappa)$ against $\log \kappa$; (b) and (c) observed and fitted log(mortality), $X\hat\beta + Z\hat\alpha$, and individual random effects, $\hat\gamma$; (d) numbers of deaths.