Smoothing Age-Period-Cohort models with P -splines: a mixed model approach


Running headline: Smooth Age-Period-Cohort models

I D Currie, Department of Actuarial Mathematics and Statistics, and the Maxwell Institute for Mathematical Sciences, Heriot-Watt University, Edinburgh, EH14 4AS, UK.
J G Kirkby, Department of Actuarial Mathematics and Statistics, and the Maxwell Institute for Mathematical Sciences, Heriot-Watt University, Edinburgh, EH14 4AS, UK.
M Durban, Departamento de Estadística y Econometría, Universidad Carlos III de Madrid, Edificio Torres Quevedo, Leganés, Madrid, Spain.
P H C Eilers, Department of Methodology and Statistics, Utrecht University, 3508 TC Utrecht, The Netherlands.

Responsible author: I D Currie, I.D.Currie@hw.ac.uk

Abstract: We use smoothing with B-splines and penalties, the P-spline method of Eilers and Marx (1996), to smooth the Age-Period-Cohort (APC) model of mortality. We describe how smoothing with penalties in one dimension allows a mixed model approach to be used. We apply this method to the APC model and show that penalization gives a way of dealing with identifiability problems in the discrete APC model and leads to a mixed model representation of the model. We show that individual random effects can be used to model overdispersion and that this can also be achieved within the mixed model framework. We illustrate our methods with some mortality data provided by the UK insurance industry.

Keywords: Age-period-cohort; identifiability; mixed models; mortality; overdispersion; P-splines; Schall's algorithm; smoothing.

File name: /talks/london.2006/apc.paper/paper.tex: 20 June

1 Introduction

We suppose that we have mortality data arranged in a two-way table classified by age at death and year of death. Age-Period-Cohort (APC) models are an important class of models in the study of mortality in such tables and, more generally, of disease incidence. A difficulty with the APC model is the choice of parameterization since the model is in general not identifiable. Clayton and Schifflers (1987) give a careful discussion of parameterization in APC models and sound warnings about the dangers of over-interpreting the fitted parameters; they are equally sceptical about the wisdom of forecasting by extrapolating parameter values. Holford (1983) and Carstensen (2007) also discuss the APC model with particular reference to the problems caused by non-identifiability. The discussion in these papers revolves around the properties of different parameterizations. Our approach is different: we force smoothness on the fitted model by penalizing differences between adjacent coefficients. Penalization does two things: first, it replaces the usual identifiability constraints, and second, it allows the APC model to be cast in a mixed model framework. The purpose of this paper is to explore and illustrate these ideas.

Smoothing with P-splines was introduced by Eilers and Marx (1996) and here we apply the method to smooth the APC model. Smooth versions of the APC model have already appeared in the literature. Heuer (1997) used restricted cubic splines, or natural splines, to give a smooth version of the APC model; he also included interactions in his model by using the Kronecker product of the spline functions. An important difference between Heuer's approach and ours is that in Heuer's paper smoothness at the edges is produced by modifying the B-spline basis (natural splines) whereas in our case smoothness is produced by the use of penalties. Ogata et al. (2000) used splines in a Bayesian framework and also produced a smooth version of the APC model.
The plan of the paper is as follows. In section 2 we explain the P-spline approach in one dimension and describe a transformation of the B-spline basis which allows the model to be expressed as a mixed model; Schall's algorithm (Schall, 1991) for fitting a generalized linear mixed model is described. In section 3 we apply our transformation to the APC model and show that the transformation deals with the identifiability problems that arise in the APC model. The mixed model representation has an interpretation as an additive model where the fixed component is a plane and there are three random components which correspond to the age, period and cohort effects. There is evidence of overdispersion and in section 4 we follow Perperoglou and Eilers (unpublished) and use individual random effects to model overdispersion; the mixed model is a natural setting for this model and Schall's algorithm copes well with the computational challenge of model fitting. We use some mortality data provided by the UK insurance industry to illustrate our methods throughout. The paper ends with a short discussion.

2 Smoothing in one dimension with P-splines

Smoothing with P-splines, introduced by Eilers and Marx (1996), is based on two ideas: (a) use B-splines as the basis for a regression, and (b) use a difference penalty on adjacent regression coefficients to ensure smoothness. Estimation is by penalized likelihood. The method has two attractive features which follow from these ideas: (a) the regression nature of P-splines means that it is straightforward to introduce smooth terms into a larger regression model, and (b) the difference penalty means that the familiar least squares (LS) solution in a normal model and, more generally, the iterative weighted least squares (IWLS) algorithm in a generalized linear model (GLM) apply in the P-spline setting. A number of papers contain descriptions of the method (Eilers and Marx, 1996; Marx and Eilers, 1998; Currie et al., 2004). We present a short introduction. In section 2.1 we describe P-spline smoothing with a B-spline basis and then in section 2.2 we describe a transformation of the B-spline basis which gives an alternative way of fitting the P-spline model. This new basis has two advantages: first, it allows us to use (generalized) linear mixed model methods and second, it enables us to deal with identifiability problems in more complex models. We describe the mixed model approach in section 2.3. This approach allows simple fitting with standard software and has the further advantage that it enables us to deal easily with overdispersion.

2.1 P-splines with a B-spline basis

Our introduction is set in the context of Poisson errors, appropriate for modelling mortality data.
We suppose we have data (d_i, e_i, x_i), i = 1, ..., n, on a set of lives all aged sixty-five, say, where d_i is the number of deaths in year x_i and e_i is the exposed to risk. Let d = (d_1, ..., d_n)′, e = (e_1, ..., e_n)′, etc. We suppose that the number of deaths d_i is a realization of a Poisson distribution with mean µ_i = e_i θ_i, where θ_i is the force of mortality or hazard function in year x_i. We seek a smooth estimate of θ = (θ_i). A natural approach is to fit a GLM with Poisson errors, i.e.,

log µ = log e + log θ = log e + Xa

where X = X(x), the regression matrix, is a function of year x and log e is an offset in the linear predictor. It seems unlikely that a polynomial basis will be suitable for modelling the variability in θ and a more flexible basis is provided by a set of B-splines {B_1(x), ..., B_c(x)}; such a basis is shown in the upper left panel of Fig. 1 for c = 7.
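The B-spline regression matrix can be sketched in Python (the paper's own code is in R; this is not the authors' implementation). We assume equally spaced knots, the standard P-spline construction; the year range 1947-2002 matches the CMI data described below, and the choice of 20 intervals giving c = 23 cubic B-splines is for illustration only.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, xl, xr, ndx, deg=3):
    """Regression matrix of B-splines on [xl, xr]: ndx equal intervals,
    degree deg, with deg extra knots on each side (Eilers-Marx style)."""
    dx = (xr - xl) / ndx
    knots = np.arange(xl - deg * dx, xr + (deg + 1) * dx, dx)
    c = len(knots) - deg - 1            # number of basis functions
    B = np.zeros((len(x), c))
    for j in range(c):
        coef = np.zeros(c)
        coef[j] = 1.0                   # j-th basis spline
        B[:, j] = BSpline(knots, coef, deg)(x)
    return B

years = np.arange(1947, 2003)           # years 1947-2002
B = bspline_basis(years, 1947, 2002, 20)
print(B.shape)                          # (56, 23): 23 cubic B-splines
# on [xl, xr] the B-splines sum to one at every point (partition of unity)
print(np.allclose(B.sum(axis=1), 1.0))
```

Each row of B contains the values of the 23 basis functions at one year, exactly the regression matrix B = B(x) used in the next paragraph.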

We are still in the framework of classical regression with regression matrix B = B(x), say, where the rows of B are the values of the B-splines in the basis evaluated at each year in turn. We use some mortality data provided by the Continuous Mortality Investigation (CMI) on claim incidence in the UK insurance business to illustrate our methods. The data run from ages 20 to 90 and years 1947 to 2002; for more information on these data see Currie et al. (2004). The middle left panel of Fig. 1 shows a plot of the logarithm of the raw forces of mortality, ˆθ_i = d_i/e_i, for the sixty-five year old policy-holders in the data set. We fit an unpenalized GLM with c = 23 cubic B-splines in the basis and the resulting fit is also shown. It seems that the data have been undersmoothed; conversely, if there are too few B-splines in the basis, then the data will be oversmoothed. Thus, one approach is to optimise the number, and possibly position, of splines in the basis (Friedman and Silverman, 1989). An alternative is to consider the behaviour, not of the fitted curve, but of the fitted coefficients, â_k. The c = 23 values of â_k are plotted at their corresponding knot positions, and we see that the erratic nature of the fitted curve is a consequence of similar behaviour of the â_k. The P-spline solution (Eilers and Marx, 1996) to this problem is to use a rich basis of B-splines and then ensure smoothness of the fitted curve by penalizing the resulting roughness in the â_k with a difference penalty. For example, the second order penalty (which we will use throughout this paper) is given by

(a_1 − 2a_2 + a_3)² + ... + (a_{c−2} − 2a_{c−1} + a_c)² = a′D′Da    (2.1)

where D is a difference matrix of order 2; first and third order penalties can also be used.
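The identity in (2.1) is easy to check numerically. A minimal Python sketch (not the paper's R code), with c = 23 as in the example above:

```python
import numpy as np

c = 23                                 # number of B-spline coefficients
D = np.diff(np.eye(c), n=2, axis=0)    # (c-2) x c second-order difference matrix
P = D.T @ D                            # penalty matrix D'D (smoothing parameter omitted)

rng = np.random.default_rng(0)
a = rng.normal(size=c)                 # arbitrary coefficient vector
lhs = a @ P @ a                        # quadratic form a'D'Da
rhs = np.sum((a[:-2] - 2 * a[1:-1] + a[2:]) ** 2)  # sum of squared 2nd differences
print(np.isclose(lhs, rhs))            # True
```

Multiplying P by the smoothing parameter λ gives the penalty matrix P = λD′D used in (2.2) below.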
The log-likelihood function is modified by the penalty function and a is estimated by maximizing the penalized log-likelihood

l_p = l(a; y) − ½ a′Pa    (2.2)

where l(a; y) is the usual log-likelihood for a GLM, P = P_B = λD′D is the penalty matrix and λ is the smoothing parameter. The suffix B, as in P_B, indicates the associated basis and emphasizes that the penalty depends on the choice of basis. Other bases and penalties will be introduced below but when the context allows we will suppress the suffix and write the penalty simply as P. Maximizing (2.2) gives the penalized likelihood equations

B′(y − µ) = Pa    (2.3)

which, conditional on the value of the smoothing parameter λ, can be solved with

(B′W̃B + P)â = B′W̃z̃,    (2.4)

the penalized version of the scoring algorithm; here B is the regression matrix, P is the penalty matrix, the tilde as in ã denotes a current estimate, and similarly for µ̃; z̃ = Bã + W̃⁻¹(y − µ̃) is the working variable and W̃ = diag(µ̃) is the diagonal matrix of weights, while â denotes the updated estimate of a. The hat-matrix is

H = B(B′ŴB + P)⁻¹B′Ŵ    (2.5)

and the trace of the hat-matrix, Tr(H), a measure of the effective dimension, ED, or effective degrees of freedom of the model (Hastie and Tibshirani, 1990, p52), is

ED = Tr(H) = Tr[(B′ŴB + P)⁻¹B′ŴB];    (2.6)

a convenient alternative to (2.6) is

ED = Tr(H) = c − Tr[(B′ŴB + P)⁻¹P]    (2.7)

where c is the number of columns in B. Standard errors for â can be computed from

Var(â) ≈ (B′ŴB + P)⁻¹.    (2.8)

Wahba (1983) and Silverman (1985) used a Bayesian argument to derive (2.8); see also Wood (2006, section 4.8) for a good discussion. We also note that in the extreme cases, λ = 0 and λ → ∞, (2.8) reduces to familiar results: we get the usual asymptotic variance in an unpenalized GLM when λ = 0; when λ → ∞ the limiting fit is a straight line (on the log scale) and the variance in (2.8) reduces to the variance when the linear predictor is linear in age. (Here we assume that a second order penalty and B-splines of degree at least two are used; see Eilers and Marx (1996).)

There remains the choice of the smoothing parameter. In this paper we will use mixed model methods to select the smoothing parameter but there are other possibilities: the Akaike Information Criterion (AIC) (Akaike, 1973), the Bayesian Information Criterion (BIC) (Schwarz, 1978) or Generalised Cross Validation (GCV) (Craven and Wahba, 1979), for example. The right middle panel of Fig. 1 shows the result of using BIC to select the smoothing parameter, where BIC is defined as

BIC = Dev + (log n) Tr(H)    (2.9)

where Dev is the deviance in a GLM and n is the number of observations.
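The penalized scoring algorithm (2.4) and the effective dimension (2.7) can be sketched in Python on simulated data (the paper's own code is R; the basis, the true log-rate, the exposure of 100 and the fixed λ = 10 below are all illustrative assumptions, not taken from the CMI data).

```python
import numpy as np
from scipy.interpolate import BSpline

# a small cubic B-spline basis on [0, 10] with equally spaced knots
x = np.linspace(0, 10, 60)
deg, ndx = 3, 10
dx = 10 / ndx
knots = np.arange(-deg * dx, 10 + (deg + 1) * dx, dx)
c = len(knots) - deg - 1                         # c = 13 basis functions
B = np.column_stack([BSpline(knots, np.eye(c)[j], deg)(x) for j in range(c)])

# simulated Poisson deaths: exposure e, smooth true log-rate
rng = np.random.default_rng(4)
e = np.full(60, 100.0)
offset = np.log(e)
y = rng.poisson(e * np.exp(np.sin(x / 2) - 2))

lam = 10.0                                       # smoothing parameter held fixed
D = np.diff(np.eye(c), n=2, axis=0)
P = lam * D.T @ D

a = np.zeros(c)
for _ in range(100):                             # penalized scoring, (2.4)
    mu = np.exp(offset + B @ a)
    z = B @ a + (y - mu) / mu                    # working variable z~
    a_new = np.linalg.solve(B.T @ (mu[:, None] * B) + P, B.T @ (mu * z))
    if np.max(np.abs(a_new - a)) < 1e-8:
        a = a_new
        break
    a = a_new

# effective dimension, (2.7): ED = c - Tr[(B'WB + P)^{-1} P]
mu = np.exp(offset + B @ a)
ED = c - np.trace(np.linalg.solve(B.T @ (mu[:, None] * B) + P, P))
print(round(ED, 2))                              # between 2 (straight line) and c
```

In practice λ would be chosen by BIC (2.9) or by the mixed model methods of section 2.3 rather than fixed in advance.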
We still have c = 23 B-splines in the basis but with a second order penalty and λ chosen by BIC the degrees of freedom are reduced from 23 to about 6.5. The fitted coefficients also exhibit smoothness and demonstrate an important difference between smoothing with natural splines and P-splines: in the former, smoothness at the edges is ensured by the use of splines linear in the tails while in the latter, B-splines are used throughout the basis and smoothness is ensured by the penalty.

2.2 P-splines with a transformed basis

The fitted coefficients with a B-spline basis have an attractive property: each coefficient is associated with a B-spline and the estimated value of the coefficient is approximately a weighted average of the observations in the vicinity of this B-spline, as the middle panels of Fig. 1 demonstrate. The second order penalty penalizes departures from linearity. An alternative strategy is to extract a linear component from the fitted trend and fit the remaining variation by fitting a smooth curve with a penalty that penalizes departures from zero. This approach has echoes of a mixed model where trend is split into a fixed part (the linear part) and a random part (the curved part); Green (1985) is an early reference to this idea which is also discussed by Verbyla et al. (1999). A transformation that achieves this decomposition with a B-spline basis and second order penalty is given by Eilers (1999) in the discussion of the Verbyla et al. paper; see also Currie et al. (2006). Such transformations not only give access to (generalized) linear mixed model methods but also allow us to deal with problems of identifiability, as we will see in section 3. Welham et al. (2007) give a comprehensive review of mixed model representations of spline models.

Let B = B(x), n × c, be the regression matrix of B-splines and let D′D, c × c, define the penalty matrix. Let UΦU′ be the singular value decomposition of D′D where Φ is the diagonal matrix consisting of the eigenvalues φ_1, φ_2, ..., φ_c of D′D in ascending order. We assume that a second order penalty is used so φ_1 = φ_2 = 0.
Now a linear function is in the null space of D′D so we can take U_n = [1 : x], c × 2, as an orthogonal basis for the null space, where 1 is (1, ..., 1)′/√c and x is (1, 2, ..., c)′ centred and scaled to have unit length. Let U_s, c × (c−2), be the submatrix of U corresponding to the c−2 non-zero eigenvalues. We take U = [U_n : U_s] and transform Bθ = Xβ + Zα, say, where X = BU_n and Z = BU_s(Φ⁺)^{−0.5}, and β = U_n′θ and α = (Φ⁺)^{0.5}U_s′θ; here Φ⁺ is the diagonal matrix consisting of the c−2 positive eigenvalues in Φ. With this transformation, the penalty θ′D′Dθ = α′α. Furthermore, since β is unpenalized we may replace X = BU_n = B[1 : x] by [1, x] where 1 is a vector of 1's of length n and x is the vector of year values. With these definitions in place the following estimation procedures in (2.4) are equivalent:

Regression matrix: B = B(x) → [X : Z], X = [1, x], Z = BU_s(Φ⁺)^{−0.5}    (2.10)

Penalty matrix: P_B = λD′D → P_F = λ blockdiag[O_2, I_{c−2}]    (2.11)

where O_2 is a 2 × 2 matrix of zeros and I_{c−2} is the identity matrix of size c−2. The linear part, Xβ, is unpenalized, while the non-linear part, Zα, is penalized or shrunk towards zero. We interpret this representation as a mixed model with Xβ as the fixed part and Zα as the random part in section 2.3; see also Currie et al. (2006).

Figure 1 explains how the new basis works. With c = 7 B-splines there are five basis functions in Z, as shown in the upper right panel. These new basis functions are very different from the original B-splines: first, they are no longer local functions and second, the high frequency functions have low amplitude (a consequence of the scaling by (Φ⁺)^{−0.5} shown in the lower right panel). The lower left panel shows the values of α̂, c = 23, from the unpenalized and the penalized fits; the shrinkage of the penalized estimates towards zero is evident. It is important to realise that although the amplitudes of the basis functions differ greatly their coefficients are equally penalized. Indeed, it would be possible to remove the high frequency/low amplitude basis functions from Z with little effect on the resulting fit, an idea exploited by Wood (2003).

2.3 A mixed model representation

Equations (2.10) and (2.11) say that fitting the penalized GLM with regression matrix B and penalty matrix P_B = λD′D is equivalent to fitting the penalized GLM with regression matrix [X : Z] and penalty matrix P_F = λ blockdiag[O_2, I_{c−2}]. With this second representation the scoring algorithm (2.4) becomes

[ X′W̃X   X′W̃Z      ] [ β̂ ]   [ X′W̃ ]
[ Z′W̃X   Z′W̃Z + λI ] [ α̂ ] = [ Z′W̃ ] z̃,    (2.12)

where I = I_{c−2}, z̃ = Xβ̃ + Zα̃ + W̃⁻¹(y − µ̃) and W̃ = diag(µ̃) with µ̃ = e exp(Xβ̃ + Zα̃) in the Poisson case.
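The key property of the transformation, θ′D′Dθ = α′α, can be verified numerically. A Python sketch (the random matrix B below is a stand-in for a B-spline basis; the algebra holds for any basis):

```python
import numpy as np

rng = np.random.default_rng(1)
c = 10
D = np.diff(np.eye(c), n=2, axis=0)        # second-order difference matrix
phi, U = np.linalg.eigh(D.T @ D)            # eigenvalues in ascending order
# the first two eigenvalues are zero: linear functions are unpenalized
print(np.allclose(phi[:2], 0, atol=1e-8))   # True

Us, phi_pos = U[:, 2:], phi[2:]             # non-null part of the decomposition
B = rng.normal(size=(30, c))                # stand-in for a B-spline basis
Z = B @ Us / np.sqrt(phi_pos)               # Z = B U_s (Phi+)^(-1/2), as in (2.10)

theta = rng.normal(size=c)
alpha = np.sqrt(phi_pos) * (Us.T @ theta)   # alpha = (Phi+)^(1/2) U_s' theta
# the penalty is preserved: theta' D'D theta = alpha' alpha
print(np.isclose(theta @ D.T @ D @ theta, alpha @ alpha))  # True
```

This is why the penalty matrix in (2.11) is simply λ blockdiag[O_2, I_{c−2}]: in the new coordinates the penalty is an ordinary ridge penalty on α.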
We recognise (2.12) as the mixed model equations for the linear mixed model

z = Xβ + Zα + ɛ, α ∼ N(0, λ⁻¹I), ɛ ∼ N(0, W⁻¹);    (2.13)

see Searle et al. (1992), p276. Smoothing parameters may be selected by maximizing the residual log-likelihood

l(λ) = −½ log|V| − ½ log|X′V⁻¹X| − ½ z′(V⁻¹ − V⁻¹X(X′V⁻¹X)⁻¹X′V⁻¹)z    (2.14)

where

V = W⁻¹ + ZGZ′    (2.15)

and G = λ⁻¹I is the variance of the random effects. We now iterate between (2.12) and (2.14). We will return to this method in section 4 but in the present case we use the proposal of Schall (1991) for the estimation of β, α and the smoothing parameter λ in a generalized linear mixed model. With the same notation as Schall we let

C′C = [ X′W̃X   X′W̃Z       ]
      [ Z′W̃X   Z′W̃Z + λ̃I ]    (2.16)

and define T to be the lower (c−2) × (c−2) block of the inverse of C′C (corresponding to α). Schall's algorithm is

1. for given β = β̃, α = α̃ and λ = λ̃ estimate β and α from (2.12), and
2. for given β = β̃, α = α̃ and λ = λ̃ estimate λ from

λ̂⁻¹ = α̃′α̃ / (c − 2 − ṽ)    (2.17)

where ṽ = λ̃ Tr(T). This fixed point iteration scheme yields approximate residual maximum likelihood (REML) estimates. We have found Schall's algorithm to provide an efficient solution. Approximate maximum likelihood estimates can be obtained by defining T to be the inverse of the lower (c−2) × (c−2) block of C′C. It follows from (2.7), the form of P_F in (2.11) and the definition of T that the effective dimension of the fitted model can be written

ED = 2 + (c − 2 − v)    (2.18)

where 2 is the number of fixed effects, c − 2 is the number of random effects and v = λ Tr(T). We can interpret c − 2 − v as the effective degrees of freedom of the non-linear component of the effect of year. In the example in section 2.1 we have c − 2 − v = 5.1 with total ED = 7.1, slightly less smoothing than obtained with λ chosen by BIC when ED = 6.5. The decomposition (2.18) extends to the smooth APC model presented in the next section (see equation (3.7)).

Overdispersion is a common problem with Poisson models. If Var(y) = σ²µ then Schall's estimate of σ² reduces to

σ̂² = (y − µ̂)′Ŵ⁻¹(y − µ̂) / (n − c + v)    (2.19)

which in our present example gives σ̂² = 1.20; there is little evidence of serious overdispersion. In general, it is preferable to incorporate overdispersion directly into the estimation process and the mixed model approach enables this to happen in a natural way. The mixed model (2.13) becomes

z = Xβ + Zα + ɛ, α ∼ N(0, λ⁻¹I), ɛ ∼ N(0, σ²W⁻¹)    (2.20)

and Schall's algorithm is modified as follows: in step 1, replace W by σ⁻²W, and add step 3, estimate σ² from (2.19). In this example there is little change: we find a fitted model with a slightly lower effective dimension of 7.01 and much the same estimated overdispersion. The model may also be fitted with standard software. We use the glmmPQL( ) function of R (R Development Core Team, 2004) in the MASS library of Venables and Ripley (2002). The fitted model has effective dimension of 6.64 (computed from (2.6) with W replaced by σ⁻²W) and a similar estimate of overdispersion. We give some skeleton R code in Appendix B.

3 Smooth Age-Period-Cohort models

We suppose that we have data matrices Y and E, both n_a × n_y, of deaths and exposures respectively. The rows of Y and E are indexed by age at death x_a and the columns by year of death x_y. The classical approach to the APC model is the factor model in which the variation in the force of mortality, θ_ijk at age i in year j for cohort k, is decomposed into three components:

log θ_ijk = α_i + β_j + γ_k, i = 1, ..., n_a, j = 1, ..., n_y, k = 1, ..., n_a + n_y − 1    (3.1)

where α_i, β_j and γ_k are the age, period (year) and cohort effects respectively. With Poisson errors, this is a GLM so is easily fitted with standard software such as R. However, there is a difficulty with the interpretation of the fitted parameters since of the 2n_a + 2n_y − 1 parameters in (3.1) only 2n_a + 2n_y − 4 are identifiable; see Clayton and Schifflers (1987) for a careful discussion of the dangers of over-interpretation of the fitted parameters.
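The rank deficiency of the factor model (3.1) is easy to exhibit. A Python sketch on a small hypothetical 5 × 7 grid (not the CMI data): the dummy-variable design matrix has 2n_a + 2n_y − 1 columns but rank 2n_a + 2n_y − 4.

```python
import numpy as np

na, ny = 5, 7                          # small illustrative age x year grid
ages = np.repeat(np.arange(na), ny)    # age index of each cell
years = np.tile(np.arange(ny), na)     # year index of each cell
cohorts = years - ages + (na - 1)      # cohort index k in 0, ..., na+ny-2

# indicator (dummy) columns for every age, period and cohort level
A = np.equal.outer(ages, np.arange(na)).astype(float)
Pm = np.equal.outer(years, np.arange(ny)).astype(float)
C = np.equal.outer(cohorts, np.arange(na + ny - 1)).astype(float)
M = np.hstack([A, Pm, C])

print(M.shape[1])                      # 23 = 2*na + 2*ny - 1 parameters
print(np.linalg.matrix_rank(M))        # 20 = 2*na + 2*ny - 4 identifiable
```

The deficiency of three comes from two redundant intercepts (each block of dummies sums to one) plus the linear dependence cohort = period − age.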
Instead of trying to interpret the fitted parameters we consider the fitted log(mortality) surface, which is unique. The upper left panel in Fig. 2 shows the mean fitted log(mortality) by age, with the linear effect of age removed, for the data, for the factor model, and for the smooth model described below; the corresponding plots for year and cohort are also given. These plots suggest that a smooth model in age, year and cohort is a natural alternative to the discrete factor model. The smoothness assumption is deceptive since we will see that this alone is sufficient to deal with the identifiability constraints. A smooth model may also deal with another problem with the APC model: the cohort parameters which correspond to the oldest and youngest cohorts tend to be poorly estimated, a consequence of the small numbers of cells which contribute to estimates of the corner cohort parameters; the parameter estimates corresponding to the youngest cohorts in the CMI dataset are particularly unstable. A smooth model should help to deal with this instability.

We assume that the parameters α, β and γ in (3.1) are smooth and define a smooth APC model as follows. Let M_a, M_y and M_c be the n_a × n_y matrices with entries age at death, year of death and year of birth (cohort) and let x_a = vec(M_a), x_y = vec(M_y) and x_c = vec(M_c). Let B_a = B(x_a) be the regression matrix of B-splines based on x_a with similar definitions for B_y and B_c. We define a smooth APC regression matrix by

B = [B_a : B_y : B_c]    (3.2)

with corresponding coefficients a = (a_a′, a_y′, a_c′)′. We impose smoothness on the coefficients a_a, a_y and a_c by the block diagonal penalty matrix

P = blockdiag[λ_a D_a′D_a, λ_y D_y′D_y, λ_c D_c′D_c]    (3.3)

where D_a, D_y and D_c are second order difference matrices and λ_a, λ_y and λ_c are the smoothing parameters for the age, year and cohort parameters respectively. The model defined by (3.2) and (3.3) is a generalized additive model (GAM) (Hastie and Tibshirani, 1990) but instead of using back-fitting we fit directly with (2.4). However, the regression matrix in (3.2) is not of full rank so some care is required. There are a number of possibilities: we could use a small ridge penalty on the system of equations, as in Marx and Eilers (1998), or we could use a generalized inverse. A third possibility is to transform B to a non-singular basis; the transformation developed in section 2.2 enables us to extract the linear components of age, year and cohort, i.e., a plane in the age-year space. An important point is that all three methods give exactly the same fitted values.
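The construction of x_a, x_y and x_c, and the source of the rank deficiency, can be sketched in Python (hypothetical ages and years, not the CMI data): the cohort vector is an exact linear combination of the age and year vectors.

```python
import numpy as np

na, ny = 4, 6
ages = np.arange(60, 60 + na)          # hypothetical ages
years = np.arange(2000, 2000 + ny)     # hypothetical years

# n_a x n_y matrices of age, year and cohort (year of birth): M_a, M_y, M_c
Ma = np.tile(ages[:, None], (1, ny))
My = np.tile(years[None, :], (na, 1))
Mc = My - Ma                           # cohort = year of death - age at death

# vec() stacks the columns of a matrix
xa, xy, xc = (M.flatten(order="F") for M in (Ma, My, Mc))
print(np.array_equal(xc, xy - xa))     # True: x_c = x_y - x_a exactly

# hence [1, x_a, x_y, x_c] has rank 3, not 4: the fixed part is a plane
X4 = np.column_stack([np.ones(na * ny), xa, xy, xc])
print(np.linalg.matrix_rank(X4))       # 3
```

This is exactly the dependency removed in (3.4) below, where the fixed part of the transformed model keeps only [1 : x_a : x_y].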
Let X_a = [1, x_a], X_y = [1, x_y] and X_c = [1, x_c] be the n_a n_y × 2 matrices corresponding to (2.10). Then, removing the linear dependencies among the columns of X_a, X_y and X_c, we obtain the X matrix in the transformed model as

X = [1 : x_a : x_y].    (3.4)

Note that although X is not unique the space spanned by X is, since this space equals the null space of P in (3.3). The Z matrix is given by

Z = [Z_a : Z_y : Z_c]    (3.5)

where, for example, Z_a = B_a U_{a:s}(Φ_a⁺)^{−0.5}, and U_{a:s} and Φ_a⁺ are obtained from the singular value decomposition of D_a′D_a as in section 2.2. Lastly, with the new regression matrix defined as [X : Z], the penalty transforms into

P = blockdiag[O_3, λ_a I_{c_a−2}, λ_y I_{c_y−2}, λ_c I_{c_c−2}]    (3.6)

where O_3 is the 3 × 3 matrix of 0's and c_a − 2 is the column dimension of Z_a, etc. The model may now be fitted as in section 2.3 with fixed regression matrix given by (3.4), random regression matrix by (3.5) and penalty matrix by (3.6).

We fit the smooth APC model with B_a, n × 10, B_y, n × 13 and B_c, n × 28, where n = 3976, i.e., c_a = 10, c_y = 13 and c_c = 28. Schall's algorithm, (2.16) and (2.17), extends as follows: let T_a be the (c_a−2) × (c_a−2) block of the inverse of C′C which corresponds to the Z_a coefficients; we take similar definitions for T_y and T_c for Z_y and Z_c. Fitting the Poisson model without overdispersion we find with REML that the dimension of the fitted model is reduced from 249 for the factor model to an effective dimension of about 35.8. Generalizing (2.18) we write

ED = 3 + (c_a − 2 − v_a) + (c_y − 2 − v_y) + (c_c − 2 − v_c)    (3.7)

where there are three fixed effects, c_a − 2 is the column dimension of Z_a and v_a = λ_a Tr(T_a), etc. The non-linear components of the effects of age, year and cohort are c_a − 2 − v_a = 7.7, c_y − 2 − v_y = 8.1 and c_c − 2 − v_c = 17.0 respectively in the present example. The estimate of overdispersion using (2.19) is σ̂² = 2.00, evidence of some overdispersion. We refit the model with σ² included as part of the estimation process. With overdispersion included in the estimation process we would expect heavier smoothing since the smoothed surface will be less inclined to follow the local behaviour of the observed mortality surface. The effective dimension is further reduced to about 33.8; the estimated value of σ̂² is 2.00, as before. The resulting detrended mean log(mortality) curves have been added to Fig. 2; the fitted log(mortality) is also shown for age 65. Skeleton R code is provided in Appendix B.

In the previous paragraph we described overdispersion as a variance effect. However, with mortality data this approach ignores effects such as cold winters which can inflate death rates. In the next section we use the approach of Perperoglou and Eilers (unpublished) where overdispersion is viewed not as a variance problem but as a problem with the linear structure of the model. They suggest the addition of individual random effects to the linear predictor as a way of dealing with the lack of fit that is otherwise modelled with overdispersion.

4 Overdispersion as individual random effects

In the previous section we showed that the linear predictor for the APC model has a mixed model representation Xβ + Zα where X and Z are defined in (3.4) and (3.5) respectively. Perperoglou and Eilers (unpublished) modified the linear predictor by the addition of individual random effects to give

Xβ + Zα + γ    (4.1)

where the length of γ is the same as the number of observations, n, say. Thus, the model has more parameters than observations but a ridge penalty on γ maintains identifiability and shrinks γ towards zero; the penalty (3.6) becomes

blockdiag[O_3, λ_a I_{c_a−2}, λ_y I_{c_y−2}, λ_c I_{c_c−2}, κI_n] = blockdiag[O_3, P, κI_n],    (4.2)

say, where P = blockdiag[λ_a I_{c_a−2}, λ_y I_{c_y−2}, λ_c I_{c_c−2}] now denotes the penalty on α alone. We have a mixed model where the variance of the random effects α and γ is given by

G = blockdiag[λ_a⁻¹ I_{c_a−2}, λ_y⁻¹ I_{c_y−2}, λ_c⁻¹ I_{c_c−2}, κ⁻¹I_n] = blockdiag[P⁻¹, κ⁻¹I_n].    (4.3)

The mixed model equations (2.12) become

[ X′W̃X   X′W̃Z       X′W̃      ] [ β̂ ]   [ X′W̃ ]
[ Z′W̃X   Z′W̃Z + P   Z′W̃      ] [ α̂ ] = [ Z′W̃ ] z̃.    (4.4)
[ W̃X     W̃Z         W̃ + κI_n ] [ γ̂ ]   [ W̃   ]

This is a very large system of equations but Perperoglou and Eilers (unpublished) provide a device which facilitates its solution. We define a modified weight matrix

W* = κ(W̃ + κI_n)⁻¹W̃    (4.5)

and solve (4.4) for γ̂ to get

κγ̂ = W*(z̃ − Xβ̂ − Zα̂)    (4.6)

from which it follows that (4.4) reduces to

[ X′W*X   X′W*Z     ] [ β̂ ]   [ X′W* ]
[ Z′W*X   Z′W*Z + P ] [ α̂ ] = [ Z′W* ] z̃.    (4.7)

This is the same system as obtained for the original smooth APC model but with the weight matrix W̃ replaced by W*. For given κ we optimize over the remaining parameters by using Schall's algorithm; κ is estimated by maximizing the profile residual log-likelihood l(λ̂_a, λ̂_y, λ̂_c, κ) from (2.14). It is essential to avoid the inversion of large matrices such as the left hand side of (4.4) and some matrix identities to this end are provided in Appendix A.
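The reduction from (4.4) to (4.7) can be checked numerically. A Python sketch with small random stand-ins for X, Z, W̃ and the penalty (not the APC model itself): solving the reduced system with W* reproduces the β̂ and α̂ from the full system, and (4.6) then recovers γ̂.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 40, 3, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Z = rng.normal(size=(n, q))
W = np.diag(rng.uniform(0.5, 2.0, size=n))   # current IWLS weights W~
P = np.eye(q)                                # stand-in penalty on alpha
kappa = 4.0
z = rng.normal(size=n)                       # working variable z~

# full system (4.4) in (beta, alpha, gamma)
full = np.block([
    [X.T @ W @ X, X.T @ W @ Z, X.T @ W],
    [Z.T @ W @ X, Z.T @ W @ Z + P, Z.T @ W],
    [W @ X, W @ Z, W + kappa * np.eye(n)],
])
rhs = np.vstack([X.T @ W, Z.T @ W, W]) @ z
sol_full = np.linalg.solve(full, rhs)

# reduced system (4.7) with W* = kappa (W~ + kappa I)^{-1} W~, as in (4.5)
Wstar = kappa * np.linalg.solve(W + kappa * np.eye(n), W)
A = np.block([[X.T @ Wstar @ X, X.T @ Wstar @ Z],
              [Z.T @ Wstar @ X, Z.T @ Wstar @ Z + P]])
b = np.vstack([X.T @ Wstar, Z.T @ Wstar]) @ z
sol_red = np.linalg.solve(A, b)

print(np.allclose(sol_full[:p + q], sol_red))    # True: same beta and alpha
beta, alpha = sol_red[:p], sol_red[p:]
gamma = Wstar @ (z - X @ beta - Z @ alpha) / kappa   # (4.6)
print(np.allclose(sol_full[p + q:], gamma))      # True
```

The computational point is that the reduced system is only (c_a + c_y + c_c − 3)-dimensional, however large n is.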

Figure 3 shows the results of fitting the model. Figure 3 also shows the profile log-likelihood l(λ̂_a, λ̂_y, λ̂_c, κ) plotted against log κ; evidently, the smoothing parameter which shrinks the individual random effects towards zero is sharply estimated. Values of the observed and smoothed log(mortality) together with the estimated individual effects are also shown for ages forty and sixty. There is a noticeable difference in the individual random effects at these ages. An explanation can be found in the lower right panel of Fig. 3 which gives the numbers of deaths at ages sixty and forty. Since Var(log(d/e)) ≈ 1/d the values of log(d/e) are a good estimate of the true underlying smooth log(mortality) at age sixty, but a poor estimate at age forty. It follows that the residuals log(d/e) − Xβ̂ are almost entirely explained by the individual random effects at age sixty, while the stochastic element of the residual is substantial at age forty. Furthermore, the individual random effects show systematic deviations between the data and the model at age forty, evidence of lack of fit at this age.

The modified weight matrix W* in (4.5) deserves some comment. We note that W* is a diagonal matrix with entries w*_i = κw̃_i/(w̃_i + κ) with w̃_i = µ̃_i in the Poisson case considered here. Thurston et al. (2000) used a similar weight matrix in their algorithm to fit the negative binomial distribution, a distribution often used to model overdispersion; in their paper the weight w*_i did not include the estimated random effect γ_i. For further comment on this point see Perperoglou and Eilers (unpublished).

5 Discussion

The model (4.1) with overdispersion involves choosing four smoothing parameters in the framework of a GLM with over four thousand linear parameters; Schall's algorithm combined with the modified weight method in (4.5) and (4.7) gives a low-footprint, efficient method of model fitting with simple direct coding.
Our conclusion is that Schall's (1991) algorithm is a simple and effective method of fitting in the mixed model setting. In this paper we have considered random effects acting at the individual age and year level. One other possibility arises as a result of such things as outbreaks of influenza or cold winters. Such effects can be modelled as smooth random effects which act on the mortality of a whole year. The individual random effects γ with length n = n_a n_y in (4.1) are replaced by annual random effects (I_{n_y} ⊗ B_a)γ where γ has length n_y c_s; here c_s is the column dimension of the B-spline basis B_a and ⊗ denotes the Kronecker product. Some initial results from this approach are reported in Kirkby et al. (2007).

We have used B-splines and penalties to smooth the APC model. Transformation of the B-spline basis enables the model to be expressed as a mixed model which allows the modelling of overdispersion as individual random effects. The problem of identifiability is addressed with the same transformation. In conclusion we offer a unified approach for smoothing the APC model in a mixed model framework, dealing with non-identifiability in the APC model and modelling overdispersed counts.

References

Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60.
Carstensen B (2007) Age-period-cohort models for the Lexis diagram. Statistics in Medicine 26.
Clayton D and Schifflers E (1987) Models for temporal variation in cancer rates. II: Age-period-cohort models. Statistics in Medicine 6.
Craven P and Wahba G (1979) Smoothing noisy data with spline functions. Numerische Mathematik 31.
Currie ID, Durban M and Eilers PHC (2004) Smoothing and forecasting mortality rates. Statistical Modelling 4.
Currie ID, Durban M and Eilers PHC (2006) Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society: Series B 68.
Eilers PHC (1999) Discussion of 'The analysis of designed experiments and longitudinal data by using smoothing splines' (by AP Verbyla, BR Cullis, MG Kenward and SJ Welham). Applied Statistics 48.
Eilers PHC and Marx BD (1996) Flexible smoothing with B-splines and penalties. Statistical Science 11.
Friedman JH and Silverman BW (1989) Flexible parsimonious smoothing and additive modeling. Technometrics 31.
Green PJ (1985) Linear models for field trials, smoothing and cross-validation. Biometrika 72.
Hastie TJ and Tibshirani RJ (1990) Generalized additive models. London: Chapman and Hall.
Heuer C (1997) Modeling of time trends and interactions in vital rates using restricted regression splines. Biometrics 53.
Holford TR (1983) The estimation of age, period and cohort effects for vital rates. Biometrics 39.
Kirkby JG and Currie ID (2007) Smooth models of mortality with period shocks. Proceedings of the 22nd International Workshop on Statistical Modelling, Barcelona, to appear.
Marx BD and Eilers PHC (1998) Direct generalized additive modeling with penalized likelihood. Computational Statistics and Data Analysis 28.
Ogata Y, Katsura K, Keiding N, Holst C and Green A (2000) Empirical Bayes Age-Period-Cohort analysis of retrospective incidence data. Scandinavian Journal of Statistics 27.
Perperoglou A and Eilers PHC. Overdispersion modelling with individual random effects. Unpublished manuscript.
R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Schall R (1991) Estimation in generalized linear models with random effects. Biometrika 78.
Schwarz G (1978) Estimating the dimension of a model. Annals of Statistics 6.
Searle SR, Casella G and McCulloch CE (1992) Variance components. New York: John Wiley & Sons.
Silverman BW (1985) Some aspects of the spline smoothing approach to nonparametric regression curve fitting (with discussion). Journal of the Royal Statistical Society: Series B 47.
Thurston SW, Wand MP and Wiencke JK (2000) Negative binomial additive models. Biometrics 56.
Venables WN and Ripley BD (2002) Modern Applied Statistics with S. New York: Springer-Verlag.
Verbyla AP, Cullis BR, Kenward MG and Welham SJ (1999) The analysis of designed experiments and longitudinal data by using smoothing splines (with discussion). Applied Statistics 48.
Wahba G (1983) Bayesian confidence intervals for the cross-validated smoothing spline. Journal of the Royal Statistical Society: Series B 45.
Welham SJ, Cullis BR, Kenward MG and Thompson R (2007) A comparison of mixed model splines for curve fitting. Australian and New Zealand Journal of Statistics 49.
Wood SN (2003) Thin plate regression splines. Journal of the Royal Statistical Society: Series B 65.
Wood SN (2006) Generalized additive models: an introduction with R. London: Chapman and Hall.

Appendix A

We provide some matrix identities which allow estimation in the smooth APC model with individual random effects, model (4.1) and (4.2). In (4.1) let

    C'C = [ X'WX    X'WZ       X'W
            Z'WX    Z'WZ + P   Z'W
            WX      WZ         W + κI_n ].                          (5.1)

This matrix is (c_a + c_y + c_c − 3 + n) × (c_a + c_y + c_c − 3 + n). For given κ, Schall's algorithm requires the leading (c_a + c_y + c_c − 3) × (c_a + c_y + c_c − 3) block of (C'C)^{−1}. It follows from results on the inverse of partitioned matrices and the definition of W̃ in (4.5) that this block is given by

    [ X'W̃X    X'W̃Z
      Z'W̃X    Z'W̃Z + P ]^{−1},                                     (5.2)

the inverse of the matrix on the left-hand side of (4.7). The Schall estimation scheme, as in section 3, is now used (conditional on κ) to estimate the remaining parameters. To estimate κ we compute the profile residual log-likelihood l(λ̂_a, λ̂_y, λ̂_c, κ) from

    −½ log|V| − ½ log|X'V^{−1}X| − ½ z'(V^{−1} − V^{−1}X(X'V^{−1}X)^{−1}X'V^{−1})z.   (5.3)

Now, with the variance of the random effects given by (4.3), we find

    V = W^{−1} + [Z : I_n] [ P^{−1}   O
                             O        κ^{−1}I_n ] [Z : I_n]'        (5.4)

      = W̃^{−1} + ZP^{−1}Z',                                        (5.5)

where P is defined in (4.2) and W̃^{−1} = W^{−1} + κ^{−1}I_n. It follows that V^{−1} and |V| are

    V^{−1} = W̃ − W̃Z(P + Z'W̃Z)^{−1}Z'W̃                            (5.6)

and

    |V| = (λ_a^{c_a−2} λ_y^{c_y−2} λ_c^{c_c−2})^{−1} |W̃|^{−1} |P + Z'W̃Z|.   (5.7)
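The identities (5.5)-(5.7) are easily checked numerically. The sketch below is in Python with NumPy for convenience rather than the R of Appendix B; all dimensions are hypothetical and W, X, Z, P and the working variable z are random stand-ins. It verifies the Woodbury form (5.6) of V^{−1}, the determinant identity (5.7) in the generic form |V| = |P|^{−1}|W̃|^{−1}|P + Z'W̃Z|, and then evaluates the criterion (5.3).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 12, 3, 5                          # observations, fixed, random effects (hypothetical)

W = np.diag(rng.uniform(0.5, 2.0, n))       # diagonal GLM weight matrix
X = rng.standard_normal((n, p))             # fixed-effects design
Z = rng.standard_normal((n, q))             # random-effects design
z_vec = rng.standard_normal(n)              # working variable z
P = 1.7 * np.eye(q)                         # penalty block, lambda * I
kappa = 3.0                                 # precision of individual random effects

# W-tilde: (W^{-1} + kappa^{-1} I_n)^{-1}, absorbing the individual random effects
Wt = np.linalg.inv(np.linalg.inv(W) + np.eye(n) / kappa)

# V as in (5.5)
V = np.linalg.inv(Wt) + Z @ np.linalg.inv(P) @ Z.T

# (5.6): Woodbury form of the inverse
Vinv = Wt - Wt @ Z @ np.linalg.inv(P + Z.T @ Wt @ Z) @ Z.T @ Wt
assert np.allclose(Vinv, np.linalg.inv(V))

# (5.7): log|V| = -log|P| - log|W-tilde| + log|P + Z' W-tilde Z|
logdetV = (np.linalg.slogdet(P + Z.T @ Wt @ Z)[1]
           - np.linalg.slogdet(P)[1] - np.linalg.slogdet(Wt)[1])
assert np.isclose(logdetV, np.linalg.slogdet(V)[1])

# (5.3): profile residual log-likelihood, evaluated with the identities above
XtVinvX = X.T @ Vinv @ X
proj = Vinv - Vinv @ X @ np.linalg.inv(XtVinvX) @ X.T @ Vinv
ell = -0.5 * (logdetV + np.linalg.slogdet(XtVinvX)[1] + z_vec @ proj @ z_vec)
print(ell)
```

Evaluating this expression over a grid of κ values gives the profile of l against log κ that is used to estimate κ.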

Appendix B

Skeleton code to fit the mixed model (2.13) is given below. It is assumed that deaths and exposures are stored in vectors Dth and Exp, and the fixed and random effects regression matrices are X and Z respectively. The function myglmmPQL is a copy of the R function glmmPQL in which the line mcall$method <- "ML" is replaced by mcall$method <- "REML".

library(nlme)
library(MASS)
Id <- factor(rep(1, length(Dth)))
data.fr <- groupedData(Dth ~ X[,-1] | rep(1, length = length(Dth)),
                       data = data.frame(Dth, X, Z, Exp))
fit <- myglmmPQL(Dth ~ X[,-1] + offset(log(Exp)), data = data.fr,
                 random = list(Id = pdIdent(~Z-1)), family = poisson)

Skeleton code to fit the penalized APC model in section 3 is given below. The fixed and random effects regression matrices are X, and Z.a, Z.y and Z.c respectively.

Id <- factor(rep(1, length(Dth)))
Z.block <- list(list(Id = pdIdent(~Z.a-1)),
                list(Id = pdIdent(~Z.y-1)),
                list(Id = pdIdent(~Z.c-1)))
Z.block <- unlist(Z.block, recursive = FALSE)
data.fr <- groupedData(Dth ~ X[,-1] | rep(1, length = length(Dth)),
                       data = data.frame(Dth, X, Z.a, Z.y, Z.c, Exp))
fit <- myglmmPQL(Dth ~ X[,-1] + offset(log(Exp)), data = data.fr,
                 random = Z.block, family = poisson)
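The skeleton code assumes that X and Z have already been built by the mixed model transformation of the B-spline basis described in section 2. The sketch below illustrates that construction for a single dimension, in Python for convenience rather than R, and with a random matrix standing in for a genuine B-spline basis B; it assumes a second-order difference penalty, and all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, c = 50, 10                        # observations, number of basis functions (hypothetical)
B = rng.standard_normal((n, c))      # stand-in for an n x c B-spline basis
D = np.diff(np.eye(c), 2, axis=0)    # second-order difference matrix, (c-2) x c

# Eigen-decompose the penalty D'D: its two-dimensional null space (constant and
# linear trends) gives the fixed effects; the remaining c-2 directions, scaled by
# the inverse square roots of their eigenvalues, give the random effects.
vals, U = np.linalg.eigh(D.T @ D)
null = vals < 1e-8
X = B @ U[:, null]                           # fixed-effects design, n x 2
Z = B @ U[:, ~null] / np.sqrt(vals[~null])   # random-effects design, n x (c-2)

# Sanity checks: [X : Z] spans the same column space as B, so the transformed
# model is a reparameterization of the original penalized regression.
assert X.shape == (n, 2) and Z.shape == (n, c - 2)
assert np.linalg.matrix_rank(np.hstack([X, Z])) == np.linalg.matrix_rank(B)
```

The scaling by the inverse square roots of the non-null eigenvalues turns the difference penalty into a ridge penalty, i.e. i.i.d. random effects with a common variance, which is what the pdIdent structure encodes in the skeleton code above.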

Figure 1: (a) B-spline basis; (b) transformed basis; (c) unpenalized regression (Age = 65, Npar = 23, DF = 23): coefficients and data; (d) penalized regression (Age = 65, Npar = 23): coefficients and data; (e) unpenalized and penalized coefficients in the transformed regression; (f) scaling of basis functions, φ_i^0.5, i = 3, ..., c.

Figure 2: Age-Period-Cohort model: detrended plots of mean log(mortality) by (a) age; (b) year; (c) cohort; (d) observed and fitted log(mortality) at age 65.

Figure 3: Age-Period-Cohort model with individual random effects: (a) profile residual log-likelihood l(λ̂_a, λ̂_y, λ̂_c, κ) against log κ; (b) and (c) observed and fitted log(mortality), Xβ̂ + Zα̂, and individual random effects, γ̂; (d) numbers of deaths.


More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

COMPARING PARAMETRIC AND SEMIPARAMETRIC ERROR CORRECTION MODELS FOR ESTIMATION OF LONG RUN EQUILIBRIUM BETWEEN EXPORTS AND IMPORTS

COMPARING PARAMETRIC AND SEMIPARAMETRIC ERROR CORRECTION MODELS FOR ESTIMATION OF LONG RUN EQUILIBRIUM BETWEEN EXPORTS AND IMPORTS Applied Studies in Agribusiness and Commerce APSTRACT Center-Print Publishing House, Debrecen DOI: 10.19041/APSTRACT/2017/1-2/3 SCIENTIFIC PAPER COMPARING PARAMETRIC AND SEMIPARAMETRIC ERROR CORRECTION

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator As described in the manuscript, the Dimick-Staiger (DS) estimator

More information

Checking, Selecting & Predicting with GAMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

Checking, Selecting & Predicting with GAMs. Simon Wood Mathematical Sciences, University of Bath, U.K. Checking, Selecting & Predicting with GAMs Simon Wood Mathematical Sciences, University of Bath, U.K. Model checking Since a GAM is just a penalized GLM, residual plots should be checked, exactly as for

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Proteomics and Variable Selection

Proteomics and Variable Selection Proteomics and Variable Selection p. 1/55 Proteomics and Variable Selection Alex Lewin With thanks to Paul Kirk for some graphs Department of Epidemiology and Biostatistics, School of Public Health, Imperial

More information

Regularization Methods for Additive Models

Regularization Methods for Additive Models Regularization Methods for Additive Models Marta Avalos, Yves Grandvalet, and Christophe Ambroise HEUDIASYC Laboratory UMR CNRS 6599 Compiègne University of Technology BP 20529 / 60205 Compiègne, France

More information

CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA

CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA STATISTICS IN MEDICINE, VOL. 17, 59 68 (1998) CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA J. K. LINDSEY AND B. JONES* Department of Medical Statistics, School of Computing Sciences,

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

Outlier detection and variable selection via difference based regression model and penalized regression

Outlier detection and variable selection via difference based regression model and penalized regression Journal of the Korean Data & Information Science Society 2018, 29(3), 815 825 http://dx.doi.org/10.7465/jkdi.2018.29.3.815 한국데이터정보과학회지 Outlier detection and variable selection via difference based regression

More information

Improved Liu Estimators for the Poisson Regression Model

Improved Liu Estimators for the Poisson Regression Model www.ccsenet.org/isp International Journal of Statistics and Probability Vol., No. ; May 202 Improved Liu Estimators for the Poisson Regression Model Kristofer Mansson B. M. Golam Kibria Corresponding author

More information

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori On Properties of QIC in Generalized Estimating Equations Shinpei Imori Graduate School of Engineering Science, Osaka University 1-3 Machikaneyama-cho, Toyonaka, Osaka 560-8531, Japan E-mail: imori.stat@gmail.com

More information

Various Issues in Fitting Contingency Tables

Various Issues in Fitting Contingency Tables Various Issues in Fitting Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Complete Tables with Zero Entries In contingency tables, it is possible to have zero entries in a

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

Sparse orthogonal factor analysis

Sparse orthogonal factor analysis Sparse orthogonal factor analysis Kohei Adachi and Nickolay T. Trendafilov Abstract A sparse orthogonal factor analysis procedure is proposed for estimating the optimal solution with sparse loadings. In

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

CMSC858P Supervised Learning Methods

CMSC858P Supervised Learning Methods CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors

More information