A Hierarchical Perspective on Lee-Carter Models

1 A Hierarchical Perspective on Lee-Carter Models
Paul Eilers
Leiden University Medical Centre
L-C Workshop, Edinburgh 2004

2 The vantage point
Previous presentation: Iain Currie led you upward
From Glen Gumbel to Ben Spline
Full, smooth, interaction model for log-mortality surface
Let's get a bit arrogant
Look back at Lee-Carter model(s) from this height

3 The full model
An array of tensor products of B-splines: $B_a \otimes B_t$
Think of an egg-carton for the tensor products
Give them different heights (coefficients) and sum
Coefficients in matrix $A$
Fitted log-mortality rate: $Z = B_a A B_t'$, or $Z = B A \breve{B}'$ for short
$z_{ij} = \sum_k \sum_l \alpha_{kl} B_{ik} \breve{B}_{jl}$
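The tensor-product surface above can be sketched in a few lines. This is a minimal illustration, not the code behind the slides: the helper `bspline_basis`, the age and year grids, the number of segments and the random coefficients are all assumptions chosen for the example.

```python
import numpy as np

def bspline_basis(x, n_seg=10, degree=3):
    """Equally spaced B-spline basis on [min(x), max(x)] (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    h = (hi - lo) / n_seg
    knots = lo + h * np.arange(-degree, n_seg + degree + 1)
    # Degree 0: indicator of the half-open knot interval containing x.
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    # Assign the right end point to the last interior interval.
    B[x == hi] = 0.0
    B[x == hi, degree + n_seg - 1] = 1.0
    # Raise the degree step by step; all knot spans equal h.
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[None, :-d - 1]) / (d * h)
        right = (knots[None, d + 1:] - x[:, None]) / (d * h)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

ages = np.arange(0, 100)          # hypothetical age grid
years = np.arange(1950, 2000)     # hypothetical year grid
Ba = bspline_basis(ages, n_seg=10)    # 100 x 13
Bt = bspline_basis(years, n_seg=5)    # 50 x 8

rng = np.random.default_rng(0)
A = rng.normal(size=(Ba.shape[1], Bt.shape[1]))   # made-up coefficients

Z = Ba @ A @ Bt.T                 # fitted surface, ages x years

# Same surface via the Kronecker (tensor) product and vec(A), columns stacked.
z_vec = np.kron(Bt, Ba) @ A.flatten(order="F")
```

The matrix form `Ba @ A @ Bt.T` never builds the 5000-row Kronecker basis, which is the practical reason to prefer it.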

4 Perspective view of full model
[Figure: perspective view of the fitted surface; axis label: Year]

5 Image view of full model
[Figure: image plot, panel title "Females, $\log_{10}$ mortality"; axis label: Year]

6 Illustrative data
Netherlands 1 11 Years
Males and females separately
One-year intervals
Source:

7 Additive components and interaction
Array of coefficients $A = [\alpha_{kl}]$
ANOVA-like decomposition: $\alpha_{kl} = \alpha_{k\cdot} + \alpha_{\cdot l} + \gamma_{kl}$
Row means $\alpha_{k\cdot}$ (for age)
Column means $\alpha_{\cdot l}$ (for time)
Interaction $\gamma_{kl}$
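The decomposition can be sketched as follows. One assumption goes beyond the slide: the grand mean is split off explicitly, so that the age and time effects are deviations from it and the interaction has zero row and column means. The coefficient matrix is random, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(13, 8))          # made-up coefficient matrix [alpha_kl]

grand = A.mean()                      # overall level (assumption: split off explicitly)
age_eff = A.mean(axis=1) - grand      # row effects, one per age coefficient
time_eff = A.mean(axis=0) - grand     # column effects, one per time coefficient

# Interaction: what the additive part cannot explain.
Gamma = A - grand - age_eff[:, None] - time_eff[None, :]

# The additive (age plus period) part on its own.
additive = grand + age_eff[:, None] + time_eff[None, :]
```

Folding `grand` into either the age or the time effect gives the two-term form shown on the slide.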

8 Dutch females, log-mortality surface
[Figure: two panels, "Females, $\log_{10}$ mortality" and "Females, residual surface"; axis label: Year]

9 Dutch females, age and year effects
[Figure: two panels, "Females, period component" (against Year) and "Females, age component"; vertical axes: $\log_{10}$ mortality]

10 Dutch males, log-mortality surface
[Figure: two panels, "Males, $\log_{10}$ mortality" and "Males, residual surface"; axis label: Year]

11 Dutch males, age and year effects
[Figure: two panels, "Males, period component" (against Year) and "Males, age component"; vertical axes: $\log_{10}$ mortality]

12 The Age-Period model
Forget the interaction part
Just take the average per age
And the average over time (Period)
Additive model: sum the two
Too simple: same age distribution, scaled in time
Interaction surface shows size of error
Up to a factor 2 ($\log_{10} 2 = 0.3$)
Also clear structure in interaction

13 Decomposition of the interaction
Singular value decomposition of $\Gamma = [\gamma_{kl}]$ is $\Gamma = U S V'$
Diagonal $S$, singular values $s_1$ to $s_r$ on diagonal
Rank $r$: minimum of # of rows and # of columns
Both $U$ and $V$ orthonormal: $U'U = I$, $V'V = I$
Optimal approximation property
Best rank $q$ approximation: $\hat\gamma_{kl} = \sum_{h=1}^{q} u_{kh} s_h v_{lh}$
Variance decomposition: $\sum_h s_h^2 = \sum_k \sum_l \gamma_{kl}^2$
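A short numerical check of these properties, using NumPy's `numpy.linalg.svd` on a made-up interaction matrix (the matrix and the choice $q = 1$ are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
Gamma = rng.normal(size=(13, 8))      # made-up interaction matrix

# Thin SVD: Gamma = U diag(s) V'
U, s, Vt = np.linalg.svd(Gamma, full_matrices=False)

q = 1                                 # rank of the approximation
Gamma_q = (U[:, :q] * s[:q]) @ Vt[:q]

# Share of the total sum of squares carried by each singular value.
explained = s**2 / np.sum(s**2)

# Squared error of the rank-q approximation equals the discarded s_h^2.
err = np.sum((Gamma - Gamma_q) ** 2)
```

`explained[0]` is the quantity tabulated as "% of variance explained" for the first singular vector.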

14 An alternative decomposition
Apply singular value decomposition to $Z = B A \breve{B}'$
Results are different
Tensor products blur orthogonality in SVD of $A$
Number of nonzero singular values not changed
Rank of $A$ determines rank of $Z$
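The rank statement can be checked numerically. In this sketch the two bases are random full-column-rank matrices standing in for B-spline bases, and $A$ is constructed to have rank 3; both are assumptions made only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(5)
Ba = rng.normal(size=(100, 13))       # stand-in for the age basis (full column rank)
Bt = rng.normal(size=(50, 8))         # stand-in for the year basis (full column rank)

# Coefficient matrix of rank 3 by construction.
A = rng.normal(size=(13, 3)) @ rng.normal(size=(3, 8))
Z = Ba @ A @ Bt.T

rank_A = np.linalg.matrix_rank(A)
rank_Z = np.linalg.matrix_rank(Z)

# Shares of the sum of squares per singular value, for A and for Z.
s_A = np.linalg.svd(A, compute_uv=False)
s_Z = np.linalg.svd(Z, compute_uv=False)
share_A = s_A**2 / np.sum(s_A**2)
share_Z = s_Z**2 / np.sum(s_Z**2)
```

The ranks agree, while `share_A` and `share_Z` generally differ, because the bases are not orthonormal.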

15 Singular values for Dutch data
Table of $s_q^2 / \sum_t s_t^2$ (% of variance explained)
[Table: columns Females A, Females Z, Males A, Males Z]
First singular vector gives good approximation
Approximating $Z$ is the better choice

16 Rank-one approximation
Approximate interaction by one singular vector
$z_{kl} = z_{k\cdot} + z_{\cdot l} + u_{k1} s_1 v_{l1}$
This already looks like Lee-Carter, but
smoothness is assumed
additional time series $z_{\cdot l}$ included

17 Rank-one approximation for females I
[Figure: two panels, "Females, interaction surface" and "Females, rank one approximation"; axis label: Year]

18 Rank-one approximation for females II
[Figure: two panels, "Females, interaction surface" and "Differences with rank one approximation"; axis label: Year]

19 Quality of fit for females
[Figure: four panels, "Data and rank one fit" for females in 1950, 1964, 1979 and 1999; curves: Rank one, Full model]

20 Rank-one approximation for males I
[Figure: two panels, "Males, interaction surface" and "Males, rank one approximation"; axis label: Year]

21 Rank-one approximation for males II
[Figure: two panels, "Males, interaction surface" and "Differences with rank one approximation"; axis label: Year]

22 Quality of fit for males
[Figure: four panels, "Data and rank one fit" for males in 1950, 1964, 1979 and 1999; curves: Rank one, Full model]

23 Higher rank approximation
Use two or more singular vectors
Improvement not impressive
Similar idea proposed by Hyndman
Slides on his website (Monash University)
www-personal.buseco.monash.edu.au/~hyndman

24 Direct Poisson regression
2D smoothing followed by rank-one approximation is weird
It resembles SVD of raw log-mortality
Poisson regression more logical (Brouhns, Denuit & Vermunt)
Combined with smoothing (Currie, Durban & Eilers)
Better fit?
Is the extra time series important (LC+)?
(Implemented with alternating Poisson regression)

25 Rank-one approximation, again
[Figure: four panels, "Data and rank one fit" for males in 1950, 1964, 1979 and 1999; curves: Rank one, Full model]

26 LC by smooth Poisson regression
[Figure: four panels, "Data and LC smooth fit" for males in 1950, 1964, 1979 and 1999]

27 LC+ by smooth Poisson regression
[Figure: four panels, "Data and LC smooth fit" for males in 1950, 1964, 1979 and 1999]

28 Computation
LC model (age $i$, year $j$): $z_{ij} = \alpha_i + \beta_i \kappa_j$
Assume $\alpha$, $\beta$ and $\kappa$ to be smooth
Approach 1: model them all with penalized splines
Approach 2: direct model (no bases) with penalties
General principle: alternating Poisson regressions
Efficient computation

29 Fitting with P-splines
Assume $\alpha$ and $\beta$ known
B-spline model for $\kappa$: $\kappa_j = \sum_l b_{jl} a_l$, or $\kappa = Ba$
Expectations $E(y_{ij}) = \mu_{ij} = x_{ij} \exp(z_{ij})$ (exposures in $X$)
Linear predictor $z_{ij} = \alpha_i + \beta_i \sum_l b_{jl} a_l$
Poisson log-likelihood $\ell = \sum_i \sum_j (y_{ij} z_{ij} - e^{z_{ij}})$
Subtract roughness penalty $\lambda \sum_l (\Delta^2 a_l)^2 / 2$

30 Computational outline
Stack columns of $Y$: $y = \mathrm{vec}(Y)$
Stack columns of $M = [\mu_{ij}]$: $\mu = \mathrm{vec}(M)$
Weighting matrix $W = \mathrm{diag}(\mu)$
Compute large basis $\breve{B} = \beta \otimes B$
Improve approximation $\tilde a$: $(\breve{B}' W \breve{B} + \lambda D'D)\, a = \breve{B}'(y - \mu + W \breve{B} \tilde a)$
Term $\lambda D'D$ comes from penalty
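One way to read the outline above is as a penalized scoring step for the coefficients of $\kappa$, with $\alpha$ and $\beta$ held fixed. The sketch below is an illustration under stated assumptions, not the original implementation: the function name `kappa_step`, the simulated data, the identity basis and the value of `lam` are all made up. With columns of $Y$ stacked so that age runs fastest, the large basis is formed as `np.kron(B, beta[:, None])`; the order of the Kronecker factors depends on that stacking convention.

```python
import numpy as np

def kappa_step(Y, X, alpha, beta, B, a, lam, order=2):
    """One penalized scoring update of the coefficients a (kappa = B a)."""
    Z = alpha[:, None] + np.outer(beta, B @ a)     # linear predictor, ages x years
    M = X * np.exp(Z)                              # expected counts
    y = Y.flatten(order="F")                       # stack columns of Y
    mu = M.flatten(order="F")                      # stack columns of M
    Bbig = np.kron(B, beta[:, None])               # large basis, rows ordered like y
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)   # difference matrix
    lhs = Bbig.T @ (mu[:, None] * Bbig) + lam * D.T @ D
    rhs = Bbig.T @ (y - mu + mu * (Bbig @ a))
    return np.linalg.solve(lhs, rhs)

# Simulated example (all values hypothetical).
rng = np.random.default_rng(2)
m, n = 30, 40
alpha = np.linspace(-6.0, -2.0, m)
beta = np.linspace(0.5, 1.5, m)
kappa = np.linspace(1.0, -1.0, n)
X = np.full((m, n), 1e5)                           # exposures
Y = rng.poisson(X * np.exp(alpha[:, None] + np.outer(beta, kappa)))

B = np.eye(n)                                      # identity basis keeps the sketch short
a = np.zeros(n)
for _ in range(20):
    a = kappa_step(Y, X, alpha, beta, B, a, lam=10.0)
kappa_hat = B @ a
```

In a full alternating scheme this step would be interleaved with the analogous update for $\alpha$ and $\beta$.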

31 Shortcuts
Large basis $\breve{B} = \beta \otimes B$ (50 years, 100 ages, 20 B-splines)
Full matrix $W$ very large (5000 by 5000)
(No problem in Matlab: sparse matrices)
Use a shortcut: $\sum_i \sum_j \mu_{ij} \beta_i^2 b_{jk} b_{jl} = \sum_j \left( \sum_i \mu_{ij} \beta_i^2 \right) b_{jk} b_{jl}$
Compute effective weights $v_j = \sum_i \mu_{ij} \beta_i^2$ and $V = \mathrm{diag}(v)$
Then $\breve{B}' W \breve{B} = B' V B$
Similar trick for $\breve{B}' y$
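The shortcut can be verified on a small example by building the large matrices explicitly and comparing. The sizes and random inputs are assumptions; the stacking convention (columns of $Y$ stacked, age running fastest) determines the order of the Kronecker factors used here.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, c = 6, 5, 4                      # ages, years, basis functions (small on purpose)
M = rng.uniform(1.0, 10.0, size=(m, n))    # expected counts mu_ij
beta = rng.normal(size=m)
B = rng.normal(size=(n, c))            # stand-in for the year basis
Y = rng.poisson(M).astype(float)

# Brute force: explicit large basis and full diagonal weight matrix.
Bbig = np.kron(B, beta[:, None])       # (m*n) x c
W = np.diag(M.flatten(order="F"))
full_lhs = Bbig.T @ W @ Bbig
full_rhs = Bbig.T @ Y.flatten(order="F")

# Shortcut: collapse over ages first.
v = (beta**2) @ M                      # v_j = sum_i mu_ij beta_i^2
short_lhs = B.T @ (v[:, None] * B)     # B' V B
short_rhs = B.T @ (Y.T @ beta)         # sum_i beta_i y_ij, then B'
```

The shortcut works with $n \times c$ matrices only, so neither the large basis nor $W$ is ever formed.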

32 Computation of α and β
First standardize $\kappa$: mean 0, variance 1
Let $\alpha = Bf$ and $\beta = Bg$
Large basis $\breve{B} = [1 \otimes B : \kappa \otimes B]$
One vector contains $\alpha$ and $\beta$: $c = [\alpha' : \beta']'$
Weighted regression as above, with $\mathrm{vec}(Y')$ and $\mathrm{vec}(M')$
Similar shortcuts as for $\kappa$
When $\eta_{ij} = \alpha_i + \beta_i \kappa_j + \gamma_j$ (LC+): ditto for $\gamma$ and $\kappa$

33 No bases
Use degenerate B-spline basis: identity matrix
But keep the penalties
Shortcuts essential without sparse matrices
Avoids mysterious B-splines
Penalty allows automatic extrapolation of $\kappa$ ($\gamma$)
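How a penalty extrapolates can be shown with a simplified stand-in. The sketch below uses weighted least squares with an identity basis and a second-order difference penalty, not the Poisson fit of the slides, and gives zero weight to future years. The series, the number of future years and `lam` are made up.

```python
import numpy as np

rng = np.random.default_rng(6)
n_obs, n_future = 40, 10
n = n_obs + n_future

# Noisy, roughly linear series for the observed years; future years are unknown.
t = np.arange(n_obs)
y = np.zeros(n)
y[:n_obs] = 1.0 - 0.05 * t + rng.normal(scale=0.05, size=n_obs)

w = np.zeros(n)
w[:n_obs] = 1.0                        # weight 0 marks the years to extrapolate

lam = 100.0
D = np.diff(np.eye(n), n=2, axis=0)    # second-order differences
kappa_hat = np.linalg.solve(np.diag(w) + lam * D.T @ D, w * y)

# In the zero-weight region only the penalty acts, so second differences vanish
# there and the extrapolation continues as a straight line.
d2_future = np.diff(kappa_hat, n=2)[n_obs - 2:]
```

With a second-order penalty the forecast is linear; a different penalty order would give a different extrapolation shape.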

34 Independence plots
Jones and Koch (2003) proposal for density estimators
Plot mixed derivative of log density
It is zero for (local) independence
Follows from mini-cross-table
Apply it to estimated log mortality: diff(diff(Z')')
For L-C: β i κ j + κ j β i
For tensor model more complicated (cohort effects)
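A discrete version of the mixed derivative is the double difference over age and year, which `numpy.diff` computes directly. For a surface of Lee-Carter form $\alpha_i + \beta_i \kappa_j$ the double difference equals the outer product of the differences of $\beta$ and $\kappa$, and for a purely additive surface it is zero. The parameter vectors below are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 20, 15
alpha = rng.normal(size=m)
beta = rng.normal(size=m)
kappa = rng.normal(size=n)

# Lee-Carter surface and its mixed (age x year) difference.
Z_lc = alpha[:, None] + np.outer(beta, kappa)
mixed_lc = np.diff(np.diff(Z_lc, axis=0), axis=1)

# Purely additive surface: no interaction, so the mixed difference vanishes.
Z_add = alpha[:, None] + kappa[None, :]
mixed_add = np.diff(np.diff(Z_add, axis=0), axis=1)
```

Plotting `mixed_lc` as an image gives the discrete counterpart of the independence plot for a fitted surface.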

35 Independence plot for L-C model
[Figure: two panels, "Males, log mortality fit by LC" and "Independence plot"; axis label: Year]

36 Independence plot for tensor product model
[Figure: two panels, "Males, $\log_{10}$ mortality" and "Males, independence plot (λ = 1)"; axis label: Year]

37 Discussion
Interpretation of LC model as rank-one approximation
The missing time-average
It looks neat and elegant for interpretation
Practical use limited
Direct smooth Poisson regression better than SVD

38 Lee-Carter for counts
Usually Lee-Carter is used for mortality rates
Over a large age range
If we limit age (to, say, over 70)
It also fits well to counts
Interesting alternative interpretation
Normal distributions on a transformed time scale

39 Fit of smooth LC+ for counts
[Figure: four panels for females showing the parameter curves α, κ, β and γ]

40 Parameter curves of LC+ for counts
[Figure: four panels, "Data and LC smooth fit (counts)" for females in 1950, 1964, 1979 and 1999]

41 Warping age to get normality
Can we make the (scaled) age distributions normal?
By only transforming age?
Yes, we can: $z_{ij} = E(y_{ij}) = f(g(a_i) - \phi_j)$, with $f(u) = \exp(-u^2/2)$
Counts $y_{ij}$ at age $a_i$, year $t_j$
Smooth nonparametric function $g(\cdot)$
Shift parameters $\phi$
Distributions scaled by their maximum
Expanding the square and collecting terms gives LC+
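The last step can be written out. This assumes the model is $z_{ij} = f(g(a_i) - \phi_j)$ with $f(u) = \exp(-u^2/2)$, and that the comparison with LC+ is made on the log scale, where LC+ reads $\eta_{ij} = \alpha_i + \beta_i \kappa_j + \gamma_j$:

```latex
\begin{aligned}
\log z_{ij} &= -\tfrac{1}{2}\bigl(g(a_i) - \phi_j\bigr)^2 \\
            &= -\tfrac{1}{2}\, g(a_i)^2 \;+\; g(a_i)\,\phi_j \;-\; \tfrac{1}{2}\,\phi_j^2 \\
            &= \alpha_i + \beta_i \kappa_j + \gamma_j ,
\qquad
\alpha_i = -\tfrac{1}{2}\, g(a_i)^2,\quad
\beta_i = g(a_i),\quad
\kappa_j = \phi_j,\quad
\gamma_j = -\tfrac{1}{2}\,\phi_j^2 .
\end{aligned}
```

Under this reading the four LC+ components are tied together by only two underlying curves, $g$ and $\phi$.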

42 How do we do it in a simple way?
Model warping function with B-splines: $g(a) = \sum_{k=1}^{K} B_k(a) \alpha_k$
Use roughness penalty when fitting, minimize $S = \sum_i \sum_j (y_{ij} - z_{ij})^2 + \lambda \sum_k (\Delta^2 \alpha_k)^2$
Linearize with Taylor expansion
Nickname: SWaN, Shifted Warped Normal

43 SWaN for Dutch females
[Figure: panels "Distributions, NL, Females" (deaths/year), including a version against transformed age, and "Center shift φ"]

44 SWaN for Dutch males
[Figure: panels "Distributions, NL, Males" (deaths/year), including a version against transformed age, and "Center shift φ"]

45 Improvements
Scaled distributions and least squares not optimal
Data are counts, assume Poisson distribution
Generalized linear model, expected values $\mu_{ij}$
$\eta_{ij} = \log \mu_{ij} = \gamma_j + f(g(a_i) - \beta_j)$
Time series $\gamma$ to catch the trend
Taylor linearization, (penalized) scoring algorithm
A kind of GLM with curious (i.e. normal) link function $f(\cdot)$
Results not much different from least squares approach

46 Further research
Remarkable: constant width (variance) of normal
Will variable width give much improvement?
How do transformations compare (gender, countries)?
Penalties allow automatic extrapolation of $\beta$ and $\gamma$. How good?
Raw mortalities used, population size not involved!
Why does it work so well?
