Using P-splines to smooth two-dimensional Poisson data

Maria Durbán (1), Iain Currie (2), Paul Eilers (3)
17th IWSM, July 2002.
(1) Dept. Statistics and Econometrics, Universidad Carlos III de Madrid, Spain.
(2) Dept. Actuarial Mathematics and Statistics, Heriot-Watt University, Edinburgh, UK.
(3) Department of Medical Statistics, Leiden University Medical Center, The Netherlands.

What is this talk about?

- Introduction
  - The data
  - P-splines
  - Smoothing Poisson data with P-splines (one-dimensional case)
- Several models for two-dimensional Poisson data
  - Generalized additive model
  - Two-dimensional smoothing with penalties
  - Dimension reduction using P-splines

The data

Male policyholders; source: Continuous Mortality Investigation Bureau (CMIB).
For each calendar year (1947-1999) and each age (11-100) we have:
- Number of years lived (the exposure).
- Number of policy claims (deaths).
Mortality of male policyholders has improved rapidly over the last 30 years.
Aim: model mortality trends over time and their dependence on age.

P-splines

- Use B-splines as the basis for the regression.
- Modify the log-likelihood by a difference penalty on the regression coefficients.

$$y = f(x) + \epsilon, \qquad f(x) \approx Ba$$
$$S = (y - Ba)'(y - Ba) + \lambda a'D'Da$$
$$\hat{a} = (B'B + \lambda D'D)^{-1} B'y$$

[Figure: two panels, "B-spline basis" and "Scaled B-splines and their sum".]
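The penalized least-squares solution above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions that are not in the talk: the helper names `bspline_basis` and `pspline_fit`, equally spaced knots, cubic B-splines, a second-order difference penalty, and simulated data.

```python
import numpy as np

def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on equally spaced knots, via the Cox-de Boor recursion."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    xc = np.asarray(x, dtype=float)[:, None]
    # Degree 0: indicator functions of the knot intervals.
    B = ((xc >= knots[:-1]) & (xc < knots[1:])).astype(float)
    for d in range(1, deg + 1):
        left = (xc - knots[:-d - 1]) / (d * dx)
        right = (knots[d + 1:] - xc) / (d * dx)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (len(x), nseg + deg)

def pspline_fit(x, y, nseg=20, deg=3, pord=2, lam=1.0):
    """Solve (B'B + lam D'D) a = B'y for the B-spline coefficients a."""
    B = bspline_basis(x, x.min(), x.max(), nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)  # difference matrix
    a = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B, a

# Simulated example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
B, a = pspline_fit(x, y, lam=1.0)
fitted = B @ a
```

Increasing `lam` pulls the coefficients towards a straight line when `pord=2`; the number of segments then matters much less than the penalty.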

Poisson data and P-splines, 1D case

$E_x$ = number of years lived aged $x$
$y_x$ = number of deaths aged $x$

$$Y_x \sim \mathcal{P}(E_x \theta_x), \qquad \eta = \log(\theta_x) = Ba$$

Maximise
$$l(a; y_x) - \tfrac{1}{2} \lambda a'D'Da$$
by iterating
$$\hat{a}_{t+1} = (B'W_t B + \lambda D'D)^{-1} B'W_t z_t,$$
where $z = \eta + W^{-1}(y - \mu)$ is the working variable and $W = \mathrm{diag}(\mu)$ is the diagonal matrix of weights.
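The iteration above can be sketched as a penalized scoring loop. The code below is an illustration, not the authors' implementation: the function names, the starting values, the stopping rule, the smoothing parameter and the simulated Gompertz-like data are all assumptions. The exposure enters through $\mu = E\theta$, so the working variable is written on the scale of $\eta = \log\theta$.

```python
import numpy as np

def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on equally spaced knots, via the Cox-de Boor recursion."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    xc = np.asarray(x, dtype=float)[:, None]
    B = ((xc >= knots[:-1]) & (xc < knots[1:])).astype(float)
    for d in range(1, deg + 1):
        left = (xc - knots[:-d - 1]) / (d * dx)
        right = (knots[d + 1:] - xc) / (d * dx)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def poisson_pspline(x, deaths, exposure, nseg=10, deg=3, pord=2, lam=100.0,
                    max_iter=50, tol=1e-8):
    """Penalized scoring for deaths ~ Poisson(exposure * exp(B a))."""
    B = bspline_basis(x, x.min(), x.max(), nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)
    P = lam * D.T @ D
    eta = np.log((deaths + 0.5) / exposure)  # crude starting log-rates
    for _ in range(max_iter):
        mu = exposure * np.exp(eta)          # expected deaths
        z = eta + (deaths - mu) / mu         # working variable
        BtW = B.T * mu                       # B'W with W = diag(mu)
        a = np.linalg.solve(BtW @ B + P, BtW @ z)
        eta_new = B @ a
        converged = np.max(np.abs(eta_new - eta)) < tol
        eta = eta_new
        if converged:
            break
    return B, a

# Simulated mortality-like data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
age = np.arange(30.0, 91.0)
exposure = np.full(age.size, 50_000.0)
true_log_rate = -9.0 + 0.09 * age
deaths = rng.poisson(exposure * np.exp(true_log_rate))
B, a = poisson_pspline(age, deaths, exposure)
fitted_log_rate = B @ a
```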

1. A generalized additive model

$Y = (y_{ij})$: matrix of deaths at age $i = 1, \dots, m$ and year $j = 1, \dots, n$.
$E = (E_{ij})$: exposure.
$\Theta = (\theta_{ij})$ and $\log \theta = Ba$, with
$$a' = (\alpha, a_1', a_2'), \qquad B = (1 : B_a : B_y)$$
$B_a$, $N \times n_a$: set of B-splines for age.
$B_y$, $N \times n_y$: set of B-splines for years.
$$\hat{a}_{t+1} = (B'W_t B + P)^{-} B'W_t z_t$$
$P = \mathrm{blockdiag}(0, P_a, P_y)$; $P_a = \lambda_a D_a'D_a$ and $P_y = \lambda_y D_y'D_y$ are the penalty matrices for age and year.
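A small sketch of how the additive design matrix and block-diagonal penalty could be assembled, and why the system is singular. Everything specific here is an assumption made for illustration: the helper names, the basis sizes, the smoothing parameters, the simulated deaths, and the use of `np.linalg.pinv` as the generalized inverse. Data are vectorised with age varying fastest.

```python
import numpy as np

def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on equally spaced knots, via the Cox-de Boor recursion."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    xc = np.asarray(x, dtype=float)[:, None]
    B = ((xc >= knots[:-1]) & (xc < knots[1:])).astype(float)
    for d in range(1, deg + 1):
        left = (xc - knots[:-d - 1]) / (d * dx)
        right = (knots[d + 1:] - xc) / (d * dx)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def diff_penalty(k, lam, pord=2):
    D = np.diff(np.eye(k), n=pord, axis=0)
    return lam * D.T @ D

ages = np.arange(11.0, 101.0)
years = np.arange(1947.0, 2000.0)
m, n = ages.size, years.size
Ba1 = bspline_basis(ages, ages[0], ages[-1], nseg=10)    # m x n_a
By1 = bspline_basis(years, years[0], years[-1], nseg=6)  # n x n_y
Ba = np.kron(np.ones((n, 1)), Ba1)                       # N x n_a
By = np.kron(By1, np.ones((m, 1)))                       # N x n_y
B = np.hstack([np.ones((m * n, 1)), Ba, By])             # B = (1 : B_a : B_y)
na, ny = Ba1.shape[1], By1.shape[1]

# P = blockdiag(0, P_a, P_y)
P = np.zeros((1 + na + ny, 1 + na + ny))
P[1:1 + na, 1:1 + na] = diff_penalty(na, lam=10.0)
P[1 + na:, 1 + na:] = diff_penalty(ny, lam=10.0)

# Rows of each B-spline basis sum to one, so B loses two ranks.
rank_B = np.linalg.matrix_rank(B)

# Simulated deaths (hypothetical additive log-rates, constant exposure).
rng = np.random.default_rng(2)
eta_true = (-9.5 + 0.085 * ages[:, None] - 0.01 * (years[None, :] - 1947)).flatten(order="F")
e = np.full(m * n, 20_000.0)
y = rng.poisson(e * np.exp(eta_true))

eta = np.log((y + 0.5) / e)
for _ in range(25):
    mu = e * np.exp(eta)
    z = eta + (y - mu) / mu
    BtW = B.T * mu
    # Generalized inverse handles the singular system.
    a = np.linalg.pinv(BtW @ B + P, rcond=1e-10) @ (BtW @ z)
    eta = B @ a
```

A ridge penalty or a reparametrisation would serve the same purpose as the pseudo-inverse; the fitted values `B @ a` do not depend on which generalized inverse is used.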

Computational issues

- No need for backfitting: closed form for the hat matrix $H$.
- Singular matrix. Options:
  - ridge penalty;
  - generalized inverse;
  - use a different parametrisation.
- Number of parameters = ncol(B), much smaller than $N$.
- Fast when $N$ is large; not possible with cubic smoothing splines.
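One reading of "closed form for $H$", not spelled out on the slide: at convergence of the penalized scoring iteration the fitted linear predictor is linear in the working variable $z$, so a hat matrix can be written down directly. The superscript minus is assumed to denote a generalized inverse, since the matrix is singular.

$$\hat{\eta} = Hz, \qquad H = B(B'WB + P)^{-}B'W, \qquad \mathrm{tr}(H) = \mathrm{tr}\{(B'WB + P)^{-}B'WB\}$$

The trace only involves square matrices of order ncol(B), which is consistent with the point that the number of parameters is much smaller than $N$.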

[Figure: Model 1. log(mu) against Year, in two panels: Age 34 and Age 60.]

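2. Two-dimensional smoothing with penalties

Suppose the matrix of log mortalities is a matrix of parameters,
$$\log \Theta = A = (a_1, \dots, a_n), \qquad A' = (a^r_1, \dots, a^r_m),$$
and impose a smoothness condition on each row and column of $A$:
$$l(A; Y) - \tfrac{1}{2} \lambda_a \sum_{j=1}^{n} a_j' D_a'D_a a_j - \tfrac{1}{2} \lambda_y \sum_{i=1}^{m} a^{r\prime}_i D_y'D_y a^r_i = l(a; y) - \tfrac{1}{2} a'(\lambda_a P_a + \lambda_y P_y)a,$$
where $a = (a_1', \dots, a_n')'$, $P_a = I_n \otimes D_a'D_a$ and $P_y = D_y'D_y \otimes I_m$.
$$\hat{a}_{t+1} = (W_t + P)^{-1} W_t z_t, \qquad P = \lambda_a P_a + \lambda_y P_y$$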
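The Kronecker form of the two penalties can be checked numerically on a small example. The sketch below assumes column stacking for $a$ (age varying fastest), second-order differences, and an arbitrary random matrix in place of real log mortalities; the variable names are illustrative.

```python
import numpy as np

m, n = 6, 4                              # ages x years (toy sizes)
rng = np.random.default_rng(3)
A = rng.normal(size=(m, n))              # stand-in for the parameter matrix
a = A.flatten(order="F")                 # a = (a_1', ..., a_n')'

Da = np.diff(np.eye(m), n=2, axis=0)     # differences along age
Dy = np.diff(np.eye(n), n=2, axis=0)     # differences along year

Pa = np.kron(np.eye(n), Da.T @ Da)       # P_a = I_n (x) D_a'D_a
Py = np.kron(Dy.T @ Dy, np.eye(m))       # P_y = D_y'D_y (x) I_m

# Penalties written column by column and row by row.
col_pen = sum(A[:, j] @ Da.T @ Da @ A[:, j] for j in range(n))
row_pen = sum(A[i, :] @ Dy.T @ Dy @ A[i, :] for i in range(m))

# The same penalties as quadratic forms in the stacked vector a.
quad_a = a @ Pa @ a
quad_y = a @ Py @ a
```

Here `quad_a` should agree with `col_pen` and `quad_y` with `row_pen`, which is the equality of the two penalized log-likelihoods stated above.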

Computational issues

Algorithm: iterate between rows and columns.
Working variable to update the column estimates (element-wise division):
$$Z = (z_1, \dots, z_n) = A + (Y - M - \lambda_y A D_y'D_y)/M$$
Updated estimate of $a_j$, $j = 1, \dots, n$:
$$a_j = (\mathrm{diag}(\mu_j) + \lambda_a D_a'D_a)^{-1} \mathrm{diag}(\mu_j) z_j$$
Copes with the potential computational problems associated with two-dimensional smoothing with large data sets.
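A sketch of the row/column iteration, under several assumptions that go beyond the slide: $M = E \circ \exp(A)$ is taken as the matrix of expected deaths, the row update is obtained by applying the column update to the transposed problem, and the function names, toy data and smoothing parameters are hypothetical. The toy data use large expected counts relative to the penalties; that choice is an assumption of this example, and convergence in other settings is not checked here.

```python
import numpy as np

def column_sweep(A, Y, E, lam_a, lam_y, Da, Dy):
    """Update every column a_j using the working variable Z from the slide."""
    M = E * np.exp(A)                                  # expected deaths
    Z = A + (Y - M - lam_y * A @ Dy.T @ Dy) / M        # element-wise division
    Pa = lam_a * Da.T @ Da
    A_new = np.empty_like(A)
    for j in range(A.shape[1]):
        mu_j = M[:, j]
        A_new[:, j] = np.linalg.solve(np.diag(mu_j) + Pa, mu_j * Z[:, j])
    return A_new

def row_sweep(A, Y, E, lam_a, lam_y, Da, Dy):
    """Assumed row update: the column update applied to the transposed problem."""
    return column_sweep(A.T, Y.T, E.T, lam_y, lam_a, Dy, Da).T

# Toy data (hypothetical): 20 ages, 15 years, constant exposure.
rng = np.random.default_rng(4)
m, n = 20, 15
i, j = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
E = np.full((m, n), 1e5)
Y = rng.poisson(E * np.exp(-5.0 + 0.05 * i - 0.02 * j)).astype(float)

Da = np.diff(np.eye(m), n=2, axis=0)
Dy = np.diff(np.eye(n), n=2, axis=0)
lam_a = lam_y = 5.0

A = np.log((Y + 0.5) / E)                              # starting values
for _ in range(30):
    A = column_sweep(A, Y, E, lam_a, lam_y, Da, Dy)
    A = row_sweep(A, Y, E, lam_a, lam_y, Da, Dy)

# Penalized score; it vanishes at a fixed point of the sweeps.
score = Y - E * np.exp(A) - lam_a * Da.T @ Da @ A - lam_y * A @ Dy.T @ Dy
```

Each sweep only solves systems of order $m$ or $n$, never of order $mn$, which is the computational attraction of the scheme.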

[Figure: Model 2. log(mu) against Year, in two panels: Age 34 and Age 60.]

3. Dimension reduction using P-splines

$B_a$, $m \times n_a$: one-dimensional B-spline basis for smoothing by age for a single year.
$B_y$, $n \times n_y$: one-dimensional B-spline basis for smoothing by year for a single age.
Assume that
$$\log \theta = Ba, \qquad B = B_y \otimes B_a.$$
Equivalent to Model 2 with $a$ in matrix form:
$$A = (a_1, \dots, a_{n_y}), \qquad A' = (a^r_1, \dots, a^r_{n_a}).$$
$$l(a; y) - \tfrac{1}{2} a'(\lambda_a P_a + \lambda_y P_y)a, \qquad P_a = I_{n_y} \otimes D_a'D_a, \quad P_y = D_y'D_y \otimes I_{n_a}$$
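The Kronecker-product basis can be illustrated on a toy example. The sketch uses random matrices as stand-ins for the two B-spline bases, which is purely an assumption for brevity, and checks the standard identity $(B_y \otimes B_a)\,\mathrm{vec}(A) = \mathrm{vec}(B_a A B_y')$. That identity is a general property of Kronecker products, not a computational device described in the talk.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 8, 6          # ages, years
na, ny = 5, 4        # basis sizes

Ba = rng.random((m, na))   # stand-in for the age basis
By = rng.random((n, ny))   # stand-in for the year basis
A = rng.normal(size=(na, ny))
a = A.flatten(order="F")   # coefficients stacked column by column

B = np.kron(By, Ba)        # full basis, (m n) x (na ny)
eta_vec = B @ a            # linear predictor from the full basis
eta_mat = Ba @ A @ By.T    # same values as an m x n matrix, B never formed

# Penalties for the coefficient vector, as in Model 3.
Da = np.diff(np.eye(na), n=2, axis=0)
Dy = np.diff(np.eye(ny), n=2, axis=0)
Pa = np.kron(np.eye(ny), Da.T @ Da)
Py = np.kron(Dy.T @ Dy, np.eye(na))
```

For the CMIB-sized problem the full matrix `B` has $N = mn$ rows, whereas the matrix form only needs the two small marginal bases.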

Computational issues

- bdeg = 0, $n_a = n$, $n_y = m$ $\Rightarrow$ $B = I_{nm}$ and Model 2 = Model 3, but it is not possible to fit it.
- Matrix $B$ is $N \times n_a n_y$ $\Rightarrow$ storage problems. Solution:
  - work with the partitioned matrix $B = [B_1, B_2, B_3]$;
  - take advantage of the banded nature of $B$.

[Figure: Model 3. log(mu) against Year, in two panels: Age 34 and Age 60.]

[Figure: three surface plots of log(mu) against Year and Age.]

Conclusions and future work

- P-splines are a useful tool for modelling two-dimensional Poisson data.
- Investigate a method for approximating the value of tr(H) in Model 2.
- Develop methods for dealing with over-dispersion.
- Fit the models in the context of GLMM.
- Comparison with age-period-cohort models.
