Using P-splines to smooth two-dimensional Poisson data

Maria Durbán (1), Iain Currie (2), Paul Eilers (3)

17th IWSM, July

(1) Dept. Statistics and Econometrics, Universidad Carlos III de Madrid, Spain
(2) Dept. Actuarial Mathematics and Statistics, Heriot-Watt University, Edinburgh, UK
(3) Department of Medical Statistics, Leiden University Medical Center, The Netherlands

What is this talk about?

- Introduction
  - The data
  - P-splines
  - Smoothing Poisson data with P-splines (one-dimensional case)
- Several models for two-dimensional Poisson data
  - Generalized additive model
  - Two-dimensional smoothing with penalties
  - Dimension reduction using P-splines
- Discuss computational issues for large data sets
- Analysis of mortality data

The data

Male policyholders; source: Continuous Mortality Investigation Bureau (CMIB).

For each calendar year ( ) and each age (11-100) we have:

- Number of years lived (the exposure).
- Number of policy claims (deaths).

Mortality of male policyholders has improved rapidly over the last 30 years.

Model mortality trends over time and their dependence on age.

P-spline

- Use B-splines as the basis for the regression.
- Modify the log-likelihood by a difference penalty on the regression coefficients.

$$y = f(x) + \epsilon, \qquad f(x) \approx Ba$$
$$S = (y - Ba)'(y - Ba) + \lambda a'D'Da$$
$$\hat{a} = (B'B + \lambda D'D)^{-1} B'y$$

[Figures: B-spline basis; scaled B-splines and their sum]
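The penalized least-squares estimate on the slide above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `bspline_basis` and `pspline_fit`, the equally spaced knots, the cubic degree, the second-order difference penalty and the simulated sine data are all assumptions made for the example.

```python
import numpy as np
from math import factorial


def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on equally spaced knots (deg >= 1), one row per x value.

    Built as scaled differences of truncated power functions.
    """
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    T = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0) / (factorial(deg) * dx**deg)
    return (-1) ** (deg + 1) * T @ D.T


def pspline_fit(x, y, nseg=20, deg=3, lam=1.0, pord=2):
    """Return (B, a_hat) with a_hat = (B'B + lam D'D)^{-1} B'y."""
    B = bspline_basis(x, x.min(), x.max(), nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)  # difference matrix of order pord
    a_hat = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B, a_hat


# Simulated data (hypothetical): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
B, a_hat = pspline_fit(x, y, lam=1.0)
fitted = B @ a_hat
```

With `nseg` segments and degree `deg` the basis has `nseg + deg` columns, so the linear system stays small however many observations there are. Larger values of `lam` pull the coefficients towards a polynomial of degree `pord - 1`.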

Poisson data and P-splines, 1D case

- $E_x$ = number of years lived aged $x$
- $y_x$ = number of deaths aged $x$

$$Y_x \sim \mathcal{P}(E_x \theta_x), \qquad \eta = \log(\theta_x) = Ba$$

Maximise
$$l(a; y_x) - \tfrac{1}{2} \lambda a'D'Da$$
$$\hat{a}_{t+1} = (B'W_t B + \lambda D'D)^{-1} B'W_t z_t$$
where $z = \eta + W^{-1}(y - \mu)$ is the working variable and $W = \mathrm{diag}(\mu)$ is the diagonal matrix of weights.
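The scoring iteration above can be written out as a short loop. The sketch below is an illustration under stated assumptions: the names `bspline_basis` and `poisson_pspline`, the starting values, the convergence tolerance, the constant exposure and the simulated log-linear mortality curve are all hypothetical choices, and the mean is taken to be $\mu = E \exp(Ba)$.

```python
import numpy as np
from math import factorial


def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on equally spaced knots (deg >= 1), one row per x value."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    T = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0) / (factorial(deg) * dx**deg)
    return (-1) ** (deg + 1) * T @ D.T


def poisson_pspline(x, y, E, nseg=10, deg=3, lam=10.0, pord=2, n_iter=100, tol=1e-8):
    """Penalized scoring for y ~ Poisson(E * exp(B a)); returns (B, a_hat)."""
    B = bspline_basis(x, x.min(), x.max(), nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)
    P = lam * D.T @ D
    # Starting values (an assumption): least-squares fit to crude log rates.
    a = np.linalg.lstsq(B, np.log((y + 0.5) / E), rcond=None)[0]
    for _ in range(n_iter):
        eta = B @ a
        mu = E * np.exp(eta)
        z = eta + (y - mu) / mu          # working variable, W = diag(mu)
        BtW = B.T * mu                   # B'W without forming the diagonal matrix
        a_new = np.linalg.solve(BtW @ B + P, BtW @ z)
        converged = np.max(np.abs(a_new - a)) < tol
        a = a_new
        if converged:
            break
    return B, a


# Simulated data (hypothetical): ages 11-100, constant exposure, log-linear rates.
rng = np.random.default_rng(1)
ages = np.arange(11.0, 101.0)
exposure = np.full(ages.shape, 5000.0)
deaths = rng.poisson(exposure * np.exp(-9.0 + 0.08 * ages)).astype(float)
B, a_hat = poisson_pspline(ages, deaths, exposure)
log_rate_hat = B @ a_hat
```

Multiplying `B.T` by the vector `mu` applies the weights row by row, which avoids building the $N \times N$ matrix $W$.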

1. A generalized additive model

- $Y = (y_{ij})$: matrix of deaths at age $i = 1, \ldots, m$ and year $j = 1, \ldots, n$.
- $E = (E_{ij})$: exposure.
- $\Theta = (\theta_{ij})$ and $\log \theta = Ba$, with $a = (\alpha, a_1', a_2')'$ and $B = (1 : B_a : B_y)$.
- $B_a$, $N \times n_a$: set of B-splines for age.
- $B_y$, $N \times n_y$: set of B-splines for years.

$$\hat{a}_{t+1} = (B'W_t B + P)^{-} B'W_t z_t$$

$P = \mathrm{blockdiag}(0, P_a, P_y)$; $P_a = \lambda_a D_a'D_a$ and $P_y = \lambda_y D_y'D_y$ are the penalty matrices for age and year.

Smoothing parameter selection:
$$\mathrm{dev}(y; a, \lambda_a, \lambda_y) + \delta\, \mathrm{tr}(H)$$
with $\delta = 2$ for AIC and $\delta = \log(n)$ for BIC.

Computational issues

- No need for backfitting: closed form for $H$.
- Singular matrix. Options:
  - Ridge penalty
  - Generalized inverse
  - Use a different parametrisation
- Number of parameters = ncol(B), much smaller than $N$.
- Fast when $N$ is large, which is not possible with cubic smoothing splines.
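A sketch of Model 1 with the ridge option for the singular system follows. It is only one possible reading of the slides: the names `fit_additive` and `poisson_deviance`, the ridge size `1e-6` (applied to every coefficient), the second-order penalties, the simulated data and the choice of the number of cells $mn$ as the sample size in the BIC are all assumptions. The data are stacked column by column, so the age index runs fastest.

```python
import numpy as np
from math import factorial


def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on equally spaced knots (deg >= 1), one row per x value."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    T = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0) / (factorial(deg) * dx**deg)
    return (-1) ** (deg + 1) * T @ D.T


def poisson_deviance(y, mu):
    ratio = np.where(y > 0, y / mu, 1.0)  # y*log(y/mu) is taken as 0 when y = 0
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))


def fit_additive(Y, E, ages, years, lam_a, lam_y, nseg=5, ridge=1e-6,
                 n_iter=100, tol=1e-8):
    """Additive model log(theta) = alpha + f(age) + g(year); returns (mu, dev, tr(H))."""
    m, n = Y.shape
    Ba = bspline_basis(ages, ages.min(), ages.max(), nseg)
    By = bspline_basis(years, years.min(), years.max(), nseg)
    na, ny = Ba.shape[1], By.shape[1]
    # B = (1 : B_a : B_y), with rows ordered age-within-year.
    B = np.hstack([np.ones((m * n, 1)),
                   np.kron(np.ones((n, 1)), Ba),
                   np.kron(By, np.ones((m, 1)))])
    Da = np.diff(np.eye(na), n=2, axis=0)
    Dy = np.diff(np.eye(ny), n=2, axis=0)
    P = np.zeros((1 + na + ny, 1 + na + ny))          # blockdiag(0, P_a, P_y)
    P[1:1 + na, 1:1 + na] = lam_a * Da.T @ Da
    P[1 + na:, 1 + na:] = lam_y * Dy.T @ Dy
    P += ridge * np.eye(P.shape[0])                   # small ridge: system is singular
    y, e = Y.ravel(order="F"), E.ravel(order="F")
    a = np.zeros(B.shape[1])
    a[0] = np.log(y.sum() / e.sum())                  # start from the crude rate
    for _ in range(n_iter):
        eta = B @ a
        mu = e * np.exp(eta)
        z = eta + (y - mu) / mu
        BtW = B.T * mu
        a = np.linalg.solve(BtW @ B + P, BtW @ z)
        if np.max(np.abs(B @ a - eta)) < tol:
            break
    mu = e * np.exp(B @ a)
    BtWB = (B.T * mu) @ B
    trace_H = np.trace(np.linalg.solve(BtWB + P, BtWB))  # tr(H) in closed form
    return mu.reshape((m, n), order="F"), poisson_deviance(y, mu), trace_H


# Simulated data (hypothetical): 12 ages by 10 years.
rng = np.random.default_rng(2)
ages, years = np.arange(12.0), np.arange(10.0)
E = np.full((12, 10), 1000.0)
Y = rng.poisson(E * np.exp(-4.0 + 0.05 * ages[:, None] - 0.02 * years[None, :])).astype(float)
mu_hat, dev, trace_H = fit_additive(Y, E, ages, years, lam_a=10.0, lam_y=10.0)
aic = dev + 2.0 * trace_H
bic = dev + np.log(Y.size) * trace_H
```

The trace is obtained from a system of size `1 + n_a + n_y`, so a grid search over $(\lambda_a, \lambda_y)$ only repeats small solves.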

Model 1

[Figure: log(mu) against Year at two ages; first panel Age: 34]

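2. Two-dimensional smoothing with penalties

Suppose the log mortalities form a matrix of parameters,
$$\log \Theta = A = (a_1, \ldots, a_n), \qquad A' = (a^r_1, \ldots, a^r_m),$$
and impose a smoothness condition on each row and column of $A$:
$$l(A; Y) - \tfrac{1}{2} \lambda_a \sum_{j=1}^{n} a_j' D_a' D_a a_j - \tfrac{1}{2} \lambda_y \sum_{i=1}^{m} a^{r\prime}_i D_y' D_y a^r_i$$
$$= l(a; y) - \tfrac{1}{2} a'(\lambda_a P_a + \lambda_y P_y) a,$$
where $a' = (a_1', \ldots, a_n')$, $P_a = I_n \otimes D_a'D_a$ and $P_y = D_y'D_y \otimes I_m$.
$$\hat{a}_{t+1} = (W_t + P)^{-1} W_t z_t, \qquad P = \lambda_a P_a + \lambda_y P_y$$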
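The Kronecker form of the two penalties, and the direct scoring step built on it, can be checked numerically on a small grid. In this sketch the names `model2_penalties` and `fit_model2`, the second-order differences, the starting values and the simulated counts are assumptions; the parameter matrix is stacked column by column, and the mean is taken to be $E \exp(A)$ element-wise.

```python
import numpy as np


def model2_penalties(m, n, pord=2):
    """Return (D_a, D_y, P_a, P_y) with P_a = I_n (x) D_a'D_a and P_y = D_y'D_y (x) I_m."""
    Da = np.diff(np.eye(m), n=pord, axis=0)
    Dy = np.diff(np.eye(n), n=pord, axis=0)
    Pa = np.kron(np.eye(n), Da.T @ Da)
    Py = np.kron(Dy.T @ Dy, np.eye(m))
    return Da, Dy, Pa, Py


def fit_model2(Y, E, lam_a, lam_y, n_iter=100, tol=1e-8):
    """Direct scoring a <- (W + P)^{-1} W z; only feasible for small m * n."""
    m, n = Y.shape
    _, _, Pa, Py = model2_penalties(m, n)
    P = lam_a * Pa + lam_y * Py
    y, e = Y.ravel(order="F"), E.ravel(order="F")
    a = np.log((y + 0.5) / e)            # starting values (an assumption)
    for _ in range(n_iter):
        mu = e * np.exp(a)
        z = a + (y - mu) / mu
        a_new = np.linalg.solve(np.diag(mu) + P, mu * z)
        converged = np.max(np.abs(a_new - a)) < tol
        a = a_new
        if converged:
            break
    return a.reshape((m, n), order="F")


# The quadratic forms agree with the column-wise and row-wise sums on the slide.
rng = np.random.default_rng(3)
A = rng.normal(size=(6, 5))
Da, Dy, Pa, Py = model2_penalties(6, 5)
a = A.ravel(order="F")
column_penalty = np.sum((Da @ A) ** 2)    # sum_j a_j' D_a'D_a a_j
row_penalty = np.sum((A @ Dy.T) ** 2)     # sum_i a_i^r' D_y'D_y a_i^r
same_columns = np.isclose(a @ Pa @ a, column_penalty)
same_rows = np.isclose(a @ Py @ a, row_penalty)

# Small simulated example (hypothetical counts and exposures).
E = np.full((6, 5), 100.0)
Y = rng.poisson(40.0, size=(6, 5)).astype(float)
A_hat = fit_model2(Y, E, lam_a=1.0, lam_y=1.0)
```

The system solved here has one unknown per cell, $mn$ in total, which is why the direct form does not scale to a full age-by-year table.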

Computational issues

Algorithm: iterate between rows and columns.

Working variable to update the column estimates:
$$Z = (z_1, \ldots, z_n) = A + (Y - M - \lambda_y A D_y' D_y)/M.$$
Updated estimate of $a_j$, $j = 1, \ldots, n$, is
$$a_j = (\mathrm{diag}(\mu_j) + \lambda_a D_a' D_a)^{-1} \mathrm{diag}(\mu_j) z_j.$$

- Copes with the potential computational problems associated with two-dimensional smoothing with large data sets.
- Problem: $\mathrm{tr}(H)$ cannot be calculated, so AIC and BIC cannot be computed.
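One column sweep of this algorithm can be sketched as below. Several points are assumptions rather than statements from the slides: $M$ is taken to be the matrix of means $E \exp(A)$ with columns $\mu_j$, the division by $M$ is element-wise, and the row update (which the slide does not spell out) is obtained by applying the same formula to the transposed problem. The names `column_sweep` and `row_sweep` and the simulated data are hypothetical.

```python
import numpy as np


def column_sweep(A, Y, E, lam_a, lam_y, pord=2):
    """Update every column a_j of A once, holding the year penalty at the current A."""
    m, n = A.shape
    Da = np.diff(np.eye(m), n=pord, axis=0)
    Dy = np.diff(np.eye(n), n=pord, axis=0)
    M = E * np.exp(A)                                   # current means, column j is mu_j
    Z = A + (Y - M - lam_y * A @ Dy.T @ Dy) / M         # working variable (element-wise /)
    A_new = np.empty_like(A)
    for j in range(n):
        lhs = np.diag(M[:, j]) + lam_a * Da.T @ Da
        A_new[:, j] = np.linalg.solve(lhs, M[:, j] * Z[:, j])
    return A_new


def row_sweep(A, Y, E, lam_a, lam_y, pord=2):
    """Row update by symmetry (an assumption): a column sweep on the transposed problem."""
    return column_sweep(A.T, Y.T, E.T, lam_y, lam_a, pord).T


# Small simulated example (hypothetical counts and exposures).
rng = np.random.default_rng(4)
E = np.full((6, 5), 100.0)
Y = rng.poisson(40.0, size=(6, 5)).astype(float)
A = np.log((Y + 0.5) / E)
for _ in range(20):
    A = column_sweep(A, Y, E, lam_a=0.5, lam_y=0.5)
    A = row_sweep(A, Y, E, lam_a=0.5, lam_y=0.5)
```

Each solve involves only an $m \times m$ (or $n \times n$) matrix, never the full $mn \times mn$ system. Because the penalty in the other direction is held fixed within a sweep, this simple alternation may fail to converge when the smoothing parameters are large relative to the counts; no convergence guarantee is claimed here.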

Model 2

[Figure: log(mu) against Year at two ages; first panel Age: 34]

3. Dimension reduction using P-splines

- $B_a$, $m \times n_a$: one-dimensional B-spline basis for smoothing by age for a single year.
- $B_y$, $n \times n_y$: one-dimensional B-spline basis for smoothing by year for a single age.

Assume that
$$\log \theta = Ba, \qquad B = B_y \otimes B_a.$$
Equivalent to Model 2 with $a$ in matrix form: $A = (a_1, \ldots, a_{n_y})$, $A' = (a^r_1, \ldots, a^r_{n_a})$.
$$l(a; y) - \tfrac{1}{2} a'(\lambda_a P_a + \lambda_y P_y) a$$
$$P_a = I_{n_y} \otimes D_a'D_a \quad \text{and} \quad P_y = D_y'D_y \otimes I_{n_a}$$
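On a small grid, Model 3 can be fitted by forming the Kronecker basis explicitly. The sketch below is illustrative only: the names `bspline_basis` and `fit_model3`, the cubic degree, the number of segments, the second-order penalties, the starting values and the simulated data are assumptions. Data and coefficients are stacked column by column, so that $Ba$ corresponds to the matrix $B_a A B_y'$.

```python
import numpy as np
from math import factorial


def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on equally spaced knots (deg >= 1), one row per x value."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    T = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0) / (factorial(deg) * dx**deg)
    return (-1) ** (deg + 1) * T @ D.T


def fit_model3(Y, E, ages, years, lam_a, lam_y, nseg=4, n_iter=100, tol=1e-8):
    """Penalized scoring with B = B_y (x) B_a; returns (B_a, B_y, A_hat)."""
    Ba = bspline_basis(ages, ages.min(), ages.max(), nseg)      # m x n_a
    By = bspline_basis(years, years.min(), years.max(), nseg)   # n x n_y
    na, ny = Ba.shape[1], By.shape[1]
    B = np.kron(By, Ba)                                         # (m*n) x (n_a*n_y)
    Da = np.diff(np.eye(na), n=2, axis=0)
    Dy = np.diff(np.eye(ny), n=2, axis=0)
    P = lam_a * np.kron(np.eye(ny), Da.T @ Da) + lam_y * np.kron(Dy.T @ Dy, np.eye(na))
    y, e = Y.ravel(order="F"), E.ravel(order="F")
    a = np.linalg.lstsq(B, np.log((y + 0.5) / e), rcond=None)[0]  # start (an assumption)
    for _ in range(n_iter):
        eta = B @ a
        mu = e * np.exp(eta)
        z = eta + (y - mu) / mu
        BtW = B.T * mu
        a_new = np.linalg.solve(BtW @ B + P, BtW @ z)
        converged = np.max(np.abs(a_new - a)) < tol
        a = a_new
        if converged:
            break
    return Ba, By, a.reshape((na, ny), order="F")


# Simulated data (hypothetical): 20 ages by 15 years.
rng = np.random.default_rng(5)
ages, years = np.arange(20.0), np.arange(15.0)
E = np.full((20, 15), 2000.0)
Y = rng.poisson(E * np.exp(-5.0 + 0.06 * ages[:, None] - 0.03 * years[None, :])).astype(float)
Ba, By, A_hat = fit_model3(Y, E, ages, years, lam_a=10.0, lam_y=10.0)
log_theta_hat = Ba @ A_hat @ By.T    # fitted surface, ages in rows and years in columns
```

Here 300 cells are described by 49 coefficients, which is the dimension reduction relative to Model 2's one parameter per cell.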

Computational issues

- bdeg = 0, $n_a = m$, $n_y = n$ $\Rightarrow$ $B = I_{nm}$ and Model 2 = Model 3, but it is not possible to fit it.
- Matrix $B$ is $N \times n_a n_y$ $\Rightarrow$ storage problems.
- Solution:
  - work with the partitioned matrix $B = [B_1, B_2, B_3]$
  - take advantage of the banded nature of $B$
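One way to avoid storing the full matrix $B$ is to accumulate the normal equations block by block. The sketch below partitions $B$ by rows, one block per year, which is an assumption and may differ from the partition $[B_1, B_2, B_3]$ meant on the slide; it also does not exploit the banded structure. The function name `normal_equations_by_year` and the random test matrices are hypothetical.

```python
import numpy as np


def normal_equations_by_year(Ba, By, W, Z):
    """Accumulate B'WB and B'Wz for B = By (x) Ba without forming B.

    W and Z are m x n matrices of weights and working variables
    (ages in rows, years in columns).
    """
    m, na = Ba.shape
    n, ny = By.shape
    c = na * ny
    G = np.zeros((c, c))
    r = np.zeros(c)
    for j in range(n):
        Bj = np.kron(By[j:j + 1, :], Ba)   # the m rows of B belonging to year j
        BjtW = Bj.T * W[:, j]
        G += BjtW @ Bj
        r += BjtW @ Z[:, j]
    return G, r


# Check against the full computation on a small random example.
rng = np.random.default_rng(6)
Ba = rng.random((8, 4))
By = rng.random((6, 3))
W = rng.random((8, 6)) + 0.5
Z = rng.normal(size=(8, 6))
G, r = normal_equations_by_year(Ba, By, W, Z)

B_full = np.kron(By, Ba)
w = W.ravel(order="F")
matches = (np.allclose(G, (B_full.T * w) @ B_full)
           and np.allclose(r, (B_full.T * w) @ Z.ravel(order="F")))
```

At any time only one $m \times n_a n_y$ block is held in memory instead of the full $N \times n_a n_y$ matrix, at the cost of a loop over years.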

Model 3

[Figure: log(mu) against Year at two ages; first panel Age: 34]

[Figure: three plots of log(mu) against Year and Age]

Conclusions and future work

- P-splines are a useful tool to model two-dimensional Poisson data.
- Investigate a method for approximating the value of $\mathrm{tr}(H)$ in Model 2.
- Develop methods for dealing with over-dispersion.
- Fit the models in the context of GLMM.
- Comparison with age-period-cohort models.
