DEPARTMENT OF ACTUARIAL STUDIES RESEARCH PAPER SERIES

State space models in actuarial science

by Piet de Jong
piet.dejong@mq.edu.au

Research Paper No. 2005/02
July 2005

Division of Economic and Financial Studies
Macquarie University
Sydney NSW 2109, Australia

The Macquarie University Actuarial Studies Research Papers are written by members or affiliates of the Department of Actuarial Studies, Macquarie University. Although unrefereed, the papers are under the review and supervision of an editorial board.

Editorial Board: Piet de Jong, Leonie Tickle

Copies of the research papers are available from the World Wide Web at: http://www.actuary.mq.edu.au/research_papers/index.html

Views expressed in this paper are those of the author(s) and not necessarily those of the Department of Actuarial Studies.

State space models in actuarial science

Piet de Jong
Actuarial Studies, Macquarie University, piet.dejong@mq.edu.au

Abstract

The article discusses the usefulness of the state space model for analyzing insurance data. State space models can be viewed as a framework to describe smoothness and purge data of noise. It is the smooth features in data which are critical for inference and extrapolation. We describe simple state space models and apply them to two insurance data sets: one dealing with mortality, the other with insurance claims. The state space framework avoids the need to reinvent the wheel when dealing with the modelling, estimation, checking and projecting of smooth data features.

1 Introduction

The notion of smoothness is common in actuarial or insurance data sets. Smooth features capture the more permanent features of the data. More permanent features form the basis for inference and extrapolation. State space models have been developed to analyze smoothness and capitalize on smooth features in data. The models work by imposing a sequential correlational structure so that points which are close in time or some other variable are more highly correlated than those far apart. Close correlation formalizes the idea of smoothness. Close correlation can often be effectively modelled using very few parameters.

Figure 1 displays two insurance data sets used here to illustrate state space methods. The left panel displays observed yearly log mortality for Australian females (Booth, Maindonald, and Smith 2002) for the years 1921 through to 2000, classified according to yearly age from 0 through to 100. The right panel is cumulative insurance liabilities (Mack 1991) for a portfolio of risks classified according to accident year and development year. The accident year indicates the year of the accident whereas the development year is the number of years since the accident. Each calendar year thus yields a new diagonal of cumulatives. The mortality data set is large, containing 80 × 101 = 8080 data points. In contrast, the claims data set is small, containing just 10 + (10 × 9)/2 = 55 observations.

Invited paper: Second Brazilian Conference on Statistical Modelling in Insurance and Finance, Maresias, August 28 - September 3, 2005.

Figure 1: Left panel is log-mortality of Australian females 1921-2000. Right panel is cumulative insurance liabilities (in $1000s) classified according to accident year and development year (number of years since accident).

State space methods are effective on both large and small data sets: methods tend to be scalable. The aims in using a state space model are empirical: we wish to capture the more permanent features or signals in data and use these signals for inference and extrapolation.

State space models have other names such as unobserved component, Kalman filter, variance component or varying coefficient regression models. The first of these is perhaps the most descriptive since it emphasizes that the data is structured in terms of simple unobserved sources of variation which often have concrete interpretations. For example, in the time series setting unobserved components may correspond to level, trend and seasonal. Since each component has a concrete interpretation, it is more easily understood, thought about, communicated, modified and extrapolated. The Kalman filter terminology arises from the fact that central to the computational treatment of these models is the Kalman filter algorithm (Kalman 1960). The algorithm is central to estimation, extrapolation, smoothing and diagnostic checking of these models. A good treatment of state space models and the Kalman filter algorithm is given in Harvey (1989). The varying coefficient terminology appeals to those who like to think in terms of regression: the models can be viewed as generalizing usual regression models by allowing regression coefficients to vary with time or another variable.

Many insurance or insurance related data sets are time series and may contain serial correlation. The time series setting is the classic domain of the state space model. In this setting close correlation is called serial or autocorrelation. Forecasting is intimately tied to exploiting serial correlation (Harvey 1989). The time series setting is not the only area of application and no single view of state space models will do justice to all its ramifications and possible applications. The two applications below illustrate the flexibility of the framework.

2 Why state space models?

Harvey (1989) has argued that state space models provide a convenient and powerful framework for analyzing sequential data. The framework is convenient because it contains many of the models often used in practice and hence makes them amenable to a uniform treatment without resort to specialized arguments tailored to specific cases. This uniform treatment extends across:

1. Disciplined and unified approach to smoothness modelling. State space models can all be cast into the single set of equations set down in Section 6. These equations spell out all the key features of the smoothing problem.

2. Often useful interpretations. The unobserved components in a state space model often have useful if not important interpretations. Thus couching a model in terms of, say, level and trend is more than just a matter of modelling convenience: it aids interpretation and sensibility checks, as well as the communication of the results.

3. Unified approach to estimation. Estimation of parameters in the state space model is usually based on maximum likelihood or a closely related method. Evaluating the likelihood can be done using the Kalman filter (Harvey 1989). The Kalman filter is a recursive and scalable algorithm.

4. Extrapolation. Many insurance data sets require extrapolation, especially if the data has a time dimension. The Kalman filter is again the calculating tool to effect extrapolations and associated standard errors, provided the model is of the state space form.

5. Smoothing or signal extraction. The smoothing filter (De Jong 1989) uses output from the Kalman filter to effect in-sample smoothing. The two algorithms together, called the Kalman Filter Smoother (KFS), can be used to calculate estimated smooth features or signals and associated standard errors.

6. Diagnostic checking. De Jong and Penzer (1998) have shown that output from the KFS is critical for diagnostic checking of models. Thus after smoothing or extrapolation the KFS output can be used to judge the adequacy of the employed models.

7. Platform for more complicated situations. There are nonlinear extensions of the state space model. For example the stochastic volatility model (Harvey, Ruiz, and Shephard 1994) is a simple extension of a standard state space model. Thus the state space model can serve as a platform for more complicated modelling. In such a structure the KFS still plays a critical role except that now it is augmented with further algorithms such as MCMC sampling (De Jong and Shephard 1995).

3 Simple state space models

A simple example of a state space model is

    y_t = µ_t + λɛ_t ,    µ_{t+1} = µ_t + η_t ,    t = 1, ..., n .    (1)

Here y_t is an observed sequence or series while all other variables are unobserved. The ɛ_t and η_t are noise processes, denoted ɛ_t ~ (0, σ²) and η_t ~ (0, σ²), meaning they have mean zero, variance σ² and are uncorrelated over time and with each other. The model consists of two equations. The first is called the measurement or observation equation, the second is the so-called state or transition equation. The above model is usually called the random walk observed with error. The parameter λ is sometimes called the noise to signal ratio and allows for different variances between the variance components ɛ_t and η_t. When λ becomes very large the noise in the first equation dominates the noise arising in the second equation and for practical purposes the model corresponds to y_t ~ (µ, σ²) where µ is constant.

Model (1) is a basis for many useful generalizations. Regression effects can be added, leading to

    y_t = x_t'β + µ_t + λɛ_t ,    µ_{t+1} = µ_t + η_t .    (2)

This is the usual regression model where the error term µ_t + λɛ_t is a random walk plus noise. If λ is large then µ_t is effectively constant and serves as an intercept term.

Another generalization of (1) is where the dynamics are made more complicated, for example

    y_t = µ_t + λɛ_t ,    µ_{t+1} = µ_t + δ_t + η_t ,    δ_{t+1} = δ_t + 1.268η_t ,    (3)

where (ɛ_t, η_t)' ~ (0, σ²I). In this case µ_t and δ_t are interpreted as the level and trend or slope respectively. Note that when the variance of η_t approaches zero, µ_t approaches a straight line with fixed slope. Thus the model generalizes the simple straight line regression model. The generalization is to say that locally y_t, as a function of t, is a straight line. The straight line is allowed to evolve slowly in both its intercept µ_t and slope δ_t.

Model (3) is called the cubic spline model (Kohn and Ansley 1987). The coefficient 1.268 on the η_t component in (3) invites comment. The above evolving straight line interpretation holds for all values of this coefficient, not just 1.268. However with 1.268 (De Jong and Mazzi 2001) the path of µ_t is a cubic spline, as is the fitted µ_t, which minimizes the penalized least squares criterion

    Σ_{t=1}^n (y_t − µ_t)² + λ² ∫_0^n (µ̈_t)² dt ,    (4)

with respect to µ_t, where µ̈_t denotes the second derivative of µ_t thought of as a continuous function of t, and it is supposed there are n consecutive observations.
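To fix ideas, the following minimal Python sketch simulates the cubic spline model (3), writing the state as the pair (µ_t, δ_t). It is an illustration only; the starting level, trend, noise standard deviation and the value of λ are arbitrary choices, not values taken from the paper.

```python
import numpy as np

def simulate_cubic_spline_model(n=80, lam=2.0, sigma=0.05, seed=0):
    """Simulate model (3): y_t = mu_t + lam*eps_t,
    mu_{t+1} = mu_t + delta_t + eta_t, delta_{t+1} = delta_t + 1.268*eta_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n)        # measurement noise eps_t
    eta = rng.normal(0.0, sigma, n)        # state noise eta_t
    mu = np.empty(n)                       # level mu_t
    delta = np.empty(n)                    # slope delta_t
    mu[0], delta[0] = -3.0, -0.02          # arbitrary starting level and trend
    for t in range(n - 1):
        mu[t + 1] = mu[t] + delta[t] + eta[t]         # state (transition) equation
        delta[t + 1] = delta[t] + 1.268 * eta[t]
    y = mu + lam * eps                     # measurement (observation) equation
    return y, mu, delta

y, mu, delta = simulate_cubic_spline_model()
```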

In practical settings λ is estimated from the data, as discussed below. We illustrate this cubic spline fitting, and hence the application of model (3), in Section 4 with the Australian female mortality data presented in Figure 1. Cubic spline smoothing is closely related to Whittaker smoothing (Whittaker 1923), which minimizes the roughness of a curve (as measured through the sum of squares of the second differences) subject to fidelity (as measured through the sum of squares of differences between the interpolated rates and the observed rates).

Further generalizations are where y_t contains a vector of observations. The data sets displayed in Figure 1 both display this feature. For the Australian female mortality data each vector y_t of observations may be taken as the log mortalities (for different ages) at each of the years between 1920 and 2000. For the claims data each vector of observations may be taken as the diagonal of cumulatives classified according to accident year (or equivalently development year). The challenge is then to appropriately model the vector, capturing relevant smoothness features inherent in the data. An example approach is given in Section 5 below.

4 State space models for monitoring and forecasting mortality

Longevity has increased dramatically in most Western countries. What will be the age profile of populations in the decades ahead? This age profile has profound consequences for societies. This section displays the use of the cubic spline model (3) to smooth the log mortality displayed in Figure 1. The aim of such smoothing is twofold. First, to estimate the level and trend in log mortality for each age at each point of time, called in-sample prediction. Second, to provide a basis for projecting future mortality. For such projection the latest level and trend estimates are critical since future log mortality, under the cubic spline model, will be the current estimated level plus the lead time times the trend. We therefore focus on generating latest level and trend estimates for each age. The approach taken below is a two step process consisting of first smoothing age specific log mortalities across time and then smoothing the first stage results across age.

4.1 First stage smoothing - smoothing across time

Using the cubic spline model (3), the first stage smoothing derived from the KFS delivers smoothed level µ_t and slope δ_t estimates at each point of time t = 1, ..., n. The smoothing results depend on the smoothing parameter λ which must be estimated or chosen. If λ = 0 then log mortality is assumed to follow a random walk with drift. Thus the latest log mortalities are paramount in forecasting future log mortality. In particular the latest log mortality is the current level estimate and the latest change in log mortality is the estimated current trend.
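As an illustration of how this first stage smoothing might be carried out with off-the-shelf software, the sketch below fits a local linear trend model to a single age's log mortality series using the statsmodels library and extracts the latest smoothed level and trend. This is a stand-in rather than the KFS implementation described in the paper: statsmodels' local linear trend model estimates separate disturbance variances instead of the single λ and the 1.268 structure of model (3), and the input series here is simulated rather than the Figure 1 data.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical stand-in series: one age's yearly log mortality, 1921-2000.
# In practice this would be one column of the Figure 1 mortality array.
rng = np.random.default_rng(1)
years = np.arange(1921, 2001)
log_mort = -4.0 - 0.02 * (years - 1921) + rng.normal(0.0, 0.05, years.size)

# Local linear trend model: an off-the-shelf approximation to the cubic spline model (3).
model = sm.tsa.UnobservedComponents(log_mort, level="local linear trend")
result = model.fit(disp=False)

# Smoothed states: row 0 is the level mu_t, row 1 is the slope delta_t.
level, trend = result.smoothed_state[0], result.smoothed_state[1]
print("latest level:", level[-1], "latest annual trend:", trend[-1])

# Under the model, a projection h years ahead is level[-1] + h * trend[-1].
```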

As λ increases from zero, more and more weight is accorded to the previous observations when estimating both the level and trend. In particular when λ → ∞ the estimation is equivalent to fitting a straight line to the log mortalities at each age, with δ_n corresponding to the fitted slope and µ_n the value of the straight line at the latest time point t = n. Thus with λ = 0 only the latest data counts whereas with λ → ∞ all the historical data is counted equally.

Figure 2: First stage smoothing results for λ = 1000. (Panels: log-mortality, smoothed log-mortality, smoothing errors, and level & trend by age.)

The four panels in Figure 2 display the first stage smoothing corresponding to smoothing parameter λ = 1000. This is a large value and hence almost equal weight is given to all the historical data in determining the latest levels and trends. The top left panel reproduces the raw log mortality data as displayed in Figure 1. The top right panel indicates the smoothed estimates derived with the cubic spline applied separately to each age. Note that each age's log mortality fit is almost linear. The bottom left panel indicates the smoothing errors, that is the difference between the raw data and the smoothed estimates. Note that there appears to be significant correlational structure in these errors, suggesting the smoothing parameter λ does not permit sufficient responsiveness to the changing pattern of mortality. Finally the bottom right panel indicates the final level and trend estimates µ_n and 100δ_n at each age.

The bottom right panel of Figure 2 suggests that mortality is reducing by about 2% per annum at each age, with bigger percentage drops at the very young ages and the middle ages. Also there appears to be a sharp recent drop at the extreme old ages.

However these results depend on the given λ = 1000. In the next subsection we consider the sensitivity of the estimates to λ.

4.2 Sensitivity to the smoothing parameter

Figure 3 displays the final trend estimates δ_n across the ages for different λ values, as well as 95% confidence intervals based on normality. The top left panel is with λ = 1 and hence the slope is essentially estimated as the difference, across time, of the latest log mortalities. The wide confidence intervals with λ = 1 suggest little precise information is available about the trend under this smoothing scenario.

Figure 3: Trend estimates and confidence interval at each age for different λ. (Panels: λ = 1, 10, 100 and 1000.)

When λ increases from 1 the confidence intervals become progressively narrower. Thus there is increased apparent precision but with the possibility of bias, in that the trend estimates respond lethargically to recent changes. When λ = 1000 the trend is essentially estimated giving equal weight to the entire history and the confidence interval corresponds to the usual confidence interval on the slope in a linear regression.

4.3 Second stage smoothing: smoothing across age

The relative roughness of especially the trend estimates suggests a second stage of smoothing of the final estimates across age. The results of such smoothing, for each of the λ used in the across time smoothing, are displayed in Figure 4.

In Figure 4 the cubic spline model is again employed, but now across age. The smoothness depends on the smoothing parameter used in the across age smoothing. We use 500s_i, where s_i is the standard deviation of the final trend estimate at age i = 0, ..., 100. This indicates a generalization of the cubic spline model (3) in that the smoothing parameter varies with age. Thus age specific trend estimates are weighted by their precision 1/s_i. Each panel in Figure 4 indicates a curve of about equivalent roughness, suggesting such precision weighted smoothing delivers comparable results across the different λ.

Figure 4: Trend estimates and confidence interval at each age for different λ after smoothing across age. (Panels: λ = 1, 10, 100 and 1000.)

The differences in the four panels of Figure 4 are due to the different weight given to the historical data. In the top left panel, only the recent data are used and these data suggest that teenagers are experiencing the best improvements in mortality, with the mid 30s and extreme old ages experiencing hardly any improvement. The confidence associated with these conclusions is somewhat weak, with only the teenagers and the 60 to 80 year olds apparently having definitely improving mortality. The uncertainty arises because, effectively, only the latest data is used to estimate the trend. The other extreme is displayed in the bottom right panel of Figure 4. Here all the historical data are used equally. Based on these data, annual improvements in mortality of about 2.5% are suggested across the age range, with less improvement for the late teens and as one gets older. Good improvements are apparent for the extreme old ages. The confidence intervals are now much tighter because effectively all the historical data is used, not just the more recent.

The top right and bottom left panels in Figure 4 are intermediate cases where more weight is given to the recent log-mortality data but the historical pattern is not entirely discounted. The conclusion that emerges is that the estimated trend in mortality depends on the stance one takes on how much weight to give the historical data.

4.4 Possible extensions

The above general approach to mortality smoothing can be extended and made more sophisticated in a variety of ways. The following are example extensions:

1. More complicated dynamics. The above has assumed a cubic spline structure. But this can be generalized to include more sophisticated or even higher order dynamics, that is, dynamics involving more than a level and trend. These extensions will usually lead to more parameters which need to be estimated.

2. Regressor variables. An example is male mortality over the same period, which increased dramatically during World War II. Intervention dummy variables can be used to model the departure from the normal progress of mortality. Thus the first equation in (3) would change to the first equation in (2), with x_t containing appropriately coded indicator variables. If the effect of diet or motor vehicle usage is of interest we could, for example, include measures of diet or vehicle usage in x_t to gauge their effects.

3. Cross country comparisons. Mortality in, say, neighbouring European countries is likely to be closely related. This suggests, for example, a first stage cubic spline smoothing model that pools data from different countries, with possible parameters modelling differences.

5 Claims reserving

The forecasting of the liabilities of casualty or general insurance companies is an important practical problem requiring care and attention. For example, recently in Australia, a major general insurer collapsed with unfunded liabilities of around $US 4bn. Present forecasting methods are rudimentary. This section describes the application of state space methods to the problem. The data used to illustrate the methods is displayed in the right panel of Figure 1.

Define c_{ij} as the cumulative payments with respect to accident year i through to development year j, where i = 1, ..., n and j = 0, ..., n − i. Thus c_{ij} − c_{i,j−1} is the total actual payments made with respect to year i accidents in year i + j. The aim of claims reserving is to forecast the outstanding claims c_{i,n−1} − c_{i,n−i} for each accident year i = 2, ..., n. There is an extensive literature on claims reserving: see for example the bibliography in England and Verrall (2002) or Taylor (2000). References which take a time series or state space approach to claims reserving include De Jong and Zehnwirth (1983), Verrall (1989a), and Verrall (1989b).

The approach taken below differs from these previous approaches in that we focus on the development and assessment of a simple correlational model.

5.1 Hertig's model

A formalization of the chain ladder method (Taylor 2000) of claims reserving often used by actuaries is to assume

    y_{ij} ≡ ln(c_{ij}/c_{i,j−1}) = µ_j + σ_j ɛ_{ij} ,    i = 1, ..., n ,    j = 0, ..., n − 1 ,    (5)

where ɛ_{ij} ~ (0, 1). This model was first suggested by Hertig (1985). The change here from Hertig (1985) is that the model is also assumed to hold for j = 0, with c_{i,−1} ≡ 1 implying y_{i0} ≡ ln c_{i0} ~ (µ_0, σ_0²). This small addition is important as seen below.

Hertig's model (5) states that the percentage increases in the cumulative payments for each accident year i have mean and variance depending only on j, the development year. To estimate the µ_j and σ_j in (5), one calculates the sample mean ˆµ_j and standard deviation ˆσ_j of the y_{ij}, i = 1, ..., n − j. Future percentage changes are then forecast as ˆµ_j with standard deviation ˆσ_j. Future cumulatives are forecast by adding estimated percentage changes and applying them to the latest observed cumulative c_{i,n−i}, i = 2, ..., n.

Hertig's model (5) imposes smoothness on the Figure 1 claims surface in two ways. First, it is assumed that claims evolve in the development year j direction according to a random walk in the logs. Second, the increments in the random walk have a common mean and variance depending on the development year. Small modifications to this setup yield further practically useful models, as illustrated below. Both Hertig's model and the generalizations fit into the state space framework and hence are amenable to state space calculations.

5.2 Application of Hertig's model

Figure 5 displays the results of applying Hertig's model (5) to the claims data displayed in Figure 1. The figure displays the estimated distribution of total forecast liabilities

    Σ_{i=2}^n (ĉ_{i,n−1} − c_{i,n−i}) = Σ_{i=2}^n c_{i,n−i} ( e^{ˆµ_{n−i+1} + ... + ˆµ_{n−1}} − 1 ) .    (6)

Thus estimated future growth rates ˆµ_j are applied to each accident year's latest cumulative c_{i,n−i} to estimate the final cumulative c_{i,n−1}. Total outstanding liabilities are then estimated by summing the expected change in the cumulatives over all accident years. The distribution is simulated by recognizing that the growth rates ĝ_i ≡ ˆµ_{n−i+1} + ... + ˆµ_{n−1}, i = 2, ..., n, have an approximately multivariate normal distribution (Taylor 2000) centred on the mean vector (µ_2, ..., µ_n) and with known covariance matrix given (σ_2, ..., σ_n). In the simulation the σ_j are replaced by their estimates. The simulated liability distribution calculated from the claims data in Figure 1 and Hertig's model is displayed in Figure 5.
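Before turning to Figure 5, the following Python sketch illustrates how the estimates ˆµ_j and ˆσ_j and the point forecast (6) might be computed from a cumulative claims triangle. It is a minimal sketch under simplifying assumptions: the array `C` is a hypothetical triangle with rows as accident years and columns as development years (unobserved cells set to NaN), and the simulation of the full liability distribution is not shown here.

```python
import numpy as np

def hertig_point_forecast(C):
    """Estimate Hertig's model (5) and return the point forecast (6).
    C: (n, n) array of cumulative payments; row i is accident year i+1,
    column j is development year j; unobserved future cells are NaN."""
    n = C.shape[0]
    # y[i, j] = ln(C[i, j] / C[i, j-1]); C[i, -1] is taken as 1, so y[i, 0] = ln C[i, 0].
    prev = np.hstack([np.ones((n, 1)), C[:, :-1]])
    y = np.log(C / prev)
    # Sample mean and standard deviation over accident years, per development year
    # (the last development year has a single observation, so its s.d. is undefined).
    mu_hat = np.nanmean(y, axis=0)
    sigma_hat = np.nanstd(y, axis=0, ddof=1)
    # Point forecast (6): remaining growth applied to each accident year's latest cumulative.
    total = 0.0
    for i in range(1, n):               # rows 1..n-1, i.e. accident years 2..n
        latest = C[i, n - 1 - i]        # latest observed cumulative for this accident year
        growth = mu_hat[n - i:].sum()   # estimated growth over the unobserved development years
        total += latest * np.expm1(growth)
    return mu_hat, sigma_hat, total
```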

Figure 5: Liability distribution for the AFG data using the basic model (horizontal axis in $ thousands).

The estimated liability distribution suggests there is a reasonable probability of liabilities exceeding, say, $100,000, with the liability distribution skewed to the right. This initial conclusion is very much altered once we allow for appropriate further correlation.

5.3 Smoothness constraints for Hertig's model

The calculations leading up to the formation of (6) and the associated covariance matrix can be done by positing an appropriate state space model and using the Kalman filter. This would be of limited interest if not for the fact that the state space setup can handle many generalizations of Hertig's model without further computational complications. Also the framework automatically delivers estimation and diagnostic tools which can be used to assess fits and projections. This section points to one generalization of (5) and displays the results of the appropriate calculations.

Figure 6: Relation between c_{i0} (horizontal axis) and c_{i1}/c_{i0} (vertical axis) for the claims data in the original (left panel) and log scale (right panel).

Figure 6 displays the feature that the residuals from the fitted Hertig's model for development years 0 and 1 are highly negatively correlated. That is, a large cumulative in development year 0, relative to the mean, is almost certainly followed by a small increment in the cumulative the following year. Allowing for the correlation has a dramatic impact on the forecast liabilities relating to the final accident year. In particular y_{10,0} is about 0.25 standard deviations above the mean and hence we expect the percentage increase y_{10,1} to be about 0.25 standard deviations below the mean, with resulting reductions in the expected growth in the cumulatives c_{10,j}, j = 1, ..., 10.

To model the correlation we modify Hertig's model (5) for j = 1 to

    y_{i1} = µ_1 + σ_1 (ɛ_{i1} + θɛ_{i0}) ,    i = 1, ..., n ,    (7)

where θ is a parameter modelling the correlation between y_{i0} and y_{i1}. All other aspects of Hertig's model (5) remain unchanged. Hertig's model (5) modified to (7) is called the development correlation (DC) model.

5.4 Using the state space form for the development correlation model

State space modelling for claims data was used by De Jong and Zehnwirth (1983) and more recently in Verrall (1989b). The application below illustrates a different approach using the development correlation model, which can be cast into the state space framework. Hence all calculations can be performed using off the shelf technology and there is no need to reinvent the wheel. In particular:

1. The parameters σ_j and θ can be estimated by maximum likelihood, using the Kalman filter to evaluate the normal based likelihood. Thus the likelihood is evaluated for each of a range of σ_j and θ values, with those maximizing the likelihood chosen in the further analysis.

2. Given the σ_j and θ estimates, a Kalman filter run yields the µ_j.

3. A further backwards run with the smoothing filter is used to estimate the ɛ_{ij} and associated standard deviations.

4. The Kalman filter can be used to forecast future y_{ij}, including variances and covariances. In particular for each accident year i we forecast the total growth in the log cumulatives ĝ_i ≡ ŷ_{i,n−i} + ŷ_{i,n−i+1} + ... + ŷ_{i,n−1}, i = 2, ..., n, and associated variances and covariances.

5. Assuming the forecast growth rates ĝ_i are multivariate normally distributed, simulation is used to derive the distribution of

    Σ_{i=2}^n c_{i,n−i} ( e^{ĝ_i} − 1 ) ,

    as sketched below.
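The simulation in the final step can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: `g_hat` and `V_hat` stand for the forecast growth rates and their covariance matrix as delivered by the Kalman filter (not computed here), and `latest` holds the latest observed cumulatives c_{i,n−i} for accident years 2, ..., n.

```python
import numpy as np

def simulate_liability_distribution(latest, g_hat, V_hat, n_sims=10000, seed=0):
    """Simulate total outstanding liabilities given multivariate normal growth rates.
    latest: latest observed cumulatives c_{i,n-i}, accident years 2..n
    g_hat:  forecast total log growth for each accident year (same order as latest)
    V_hat:  covariance matrix of the forecast growth rates"""
    rng = np.random.default_rng(seed)
    g = rng.multivariate_normal(g_hat, V_hat, size=n_sims)   # simulated growth rates
    return (latest * np.expm1(g)).sum(axis=1)                # sum_i c_{i,n-i} (e^{g_i} - 1)

# Percentiles of the simulated distribution give plots such as Figures 5 and 7, e.g.
# lo, med, hi = np.percentile(simulate_liability_distribution(latest, g_hat, V_hat), [5, 50, 95])
```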

A critical point is that all variances and covariances associated with the ĝ_i are derived from the state space representation of the model using the Kalman filter, and there is no need to develop specialized expressions. Further, the diagnostics associated with the fit (errors, error bounds, standard errors, and so on) are automatically available.

Figure 7: Estimated liability distribution using the DC model (horizontal axis in $ thousands).

Figure 7 and Table 1 display the estimated conditional distribution of (6) using the claims data of Figure 1 and the DC model. Comparison with the results displayed in Figure 5 using Hertig's model reveals very material differences. For example under the DC model the probability of claims exceeding $150,000 is negligible. Table 1 compares the DC estimates with those of the chain ladder (CL) estimates developed and presented by Mack (1994, p.130), and estimates presented by England and Verrall (2002, p.45) based on an over-dispersed Poisson model with a Hoerl curve.

Table 1: Comparison between forecast liabilities

accident     ---------- mean ----------    ---- standard deviation ----
year         DC model      CL   Poisson    DC model      CL   Poisson
2                 155     154       243         146     206       486
3                 643     617       885         375     623       984
4                1702    1636      2033         753     753      1589
5                2845    2747      3582        1271    1456      2216
6                3953    3649      3849        1462    2007      2301
7                5954    5435      5393        2290    2228      2873
8               12293   10907     11091        5463    5344      4686
9               12578   10650     10568        6747    6284      5563
10              22859   16339     17654       11551   24509     12801
Total           62982   52135     55297       16260   26909     17357

Table 1 emphasizes the material differences between estimates derived from differing methods. Here and below, I argue that for these claims data, the DC model estimates, generated using the state space methods, are appropriate.

Initial insight into the relative merits of the methods is gained by examining the estimates. The CL method gives an estimate of the standard deviation of the prediction error for accident year 2 of 206, while the Poisson model gives 486. The data directly relevant to this estimate is the 0.92%, or 172, growth in liabilities in accident year 1 between development years 8 and 9. Using the CL estimates and the arguably conservative normal approximation leads us to expect liabilities in accident year 2 to grow in excess of 154 + 1.65 × 206 ≈ 500 with a probability of around 5%, while the Poisson approach would give an even more surprising 5% limit of 1045. These conclusions seem at odds with the data. Furthermore, consider the standard deviation of total liabilities under the CL method. The figure of 26 909 is only marginally higher than the standard deviation associated with the estimated accident year 10 liability. This has the counterintuitive implication that the individual accident year liability estimates are at most marginally positively correlated and probably negatively correlated. These issues and the material differences between the estimates warrant further examination and discussion.

6 The general state space model

The mortality and claims data sets were both analyzed using specific forms of the state space model. These specific forms are specialized versions of the following general model:

    y_t = X_t β + Z_t α_t + G_t ε_t ,    α_{t+1} = W_t β + T_t α_t + H_t ε_t ,    t = 1, ..., n .    (8)

This general form covers a multitude of special cases. The advantage of the general form is that all questions of estimation, inference, diagnostics, forecasting, and so on, can be answered once and for all, without resort to specialized arguments tailored to specific instances. Furthermore, all programming can be done with a single set of routines written generally. For example all the core calculations above were performed using the same general routines based on (8). The only bookkeeping issue is how to cast a specific model into the general form (8). In this general form:

- The y_t are the observations, which may be vectors. In the cubic spline model y_t was taken as the log mortality at time t, although in the across age second stage smoothing y_t is the final trend at age t. In the DC model the vector of observations at time t is the vector of the latest increments in the log cumulatives.

- ε_t ~ (0, σ²I) is a noise process, possibly a vector. Note that the noise manifests itself in both equations, depending on the matrices G_t and H_t. Noise entering the second, state, equation has a more permanent effect than that entering the first equation, which impacts only the current observation.

- The β vector is a vector of fixed effects. With the cubic spline model there is no fixed effect, although these may be introduced to model, for example, the effect of wars or diet, in which case the X_t matrix would carry appropriate regression variables, as in the usual regression context. In the DC model for claims reserving β contains the vector of means (µ_0, ..., µ_n). The matrices X_t and W_t model regression effects; note that regression effects may arise in both the measurement and the state equation.

- The α_t vectors are the unobserved state vectors. It is the state vectors which transmit the memory of the process. For the cubic spline model used with the mortality data, α_t = (µ_t, δ_t)'. For the DC model the state vector is artificial in that it carries the noise term ɛ_{t0} for transmission to the next period's observations.

- The remaining matrices such as X_t, Z_t and so on are design matrices, again constructed in bookkeeping fashion. The initial condition α_1 also plays a role and its detailed specification (for example as diffuse or fixed) depends on specific assumptions.
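To make the computations associated with the general form (8) concrete, the sketch below implements a basic Kalman filter recursion in Python for the special case with no regression effects (β = 0) and time-invariant system matrices. It is an illustrative sketch, not the general KFS routines referred to above: the initialization is an ad hoc large-variance (diffuse-like) starting state, and the example matrices cast the cubic spline model (3) into this form with an arbitrary λ and placeholder data.

```python
import numpy as np

def kalman_filter(y, Z, T, G, H, a1=None, P1=None):
    """Kalman filter for model (8) with beta = 0 and time-invariant matrices:
        y_t = Z a_t + G e_t,   a_{t+1} = T a_t + H e_t,   e_t ~ N(0, I).
    y: (n, p) array. Returns one-step-ahead state predictions and the Gaussian
    log-likelihood (prediction error decomposition, constants omitted)."""
    n, _ = y.shape
    m = T.shape[0]
    a = np.zeros(m) if a1 is None else a1.copy()        # state prediction a_t
    P = 1e6 * np.eye(m) if P1 is None else P1.copy()    # its variance (diffuse-like start)
    states, loglik = np.zeros((n, m)), 0.0
    for t in range(n):
        v = y[t] - Z @ a                                 # prediction error
        F = Z @ P @ Z.T + G @ G.T                        # its variance
        Finv = np.linalg.inv(F)
        K = (T @ P @ Z.T + H @ G.T) @ Finv               # gain, allowing correlated noises
        loglik += -0.5 * (np.log(np.linalg.det(F)) + v @ Finv @ v)
        a = T @ a + K @ v                                # next one-step-ahead prediction
        P = T @ P @ T.T + H @ H.T - K @ F @ K.T          # and its variance
        states[t] = a
    return states, loglik

# Example: the cubic spline model (3) cast into the general form, with state
# (mu_t, delta_t), an arbitrary lambda and unit-variance noise components.
lam = 2.0
Z = np.array([[1.0, 0.0]])
T = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.array([[lam, 0.0]])                    # measurement noise: lam * eps_t
H = np.array([[0.0, 1.0], [0.0, 1.268]])      # state noise: eta_t drives level and slope
rng = np.random.default_rng(0)
y = rng.normal(size=(80, 1))                  # placeholder data; in practice a log mortality series
states, ll = kalman_filter(y, Z, T, G, H)
```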

7 State space model issues

It is sometimes argued that state space models are not sufficiently parametric, that is, they do not explicitly address issues of causality or structure. For example, for the claims data in Figure 1 it may be argued that a proper analysis should take into account all the causal factors that may explain claims arriving at different times after an accident. Such causal factors may include legislative changes, explicit policy by the insurance company to speed up claims, and so on. I will not argue that such causal explanations and associated parametric approaches are never appropriate. Indeed the state space model can accommodate such drivers if appropriate. However I do wish to argue that their utility is probably much less than often claimed.

To make the argument as direct as possible, consider the situation where one is modelling a stock price. The log of the stock price is usually modelled as a random walk analogous to (1), possibly augmented with a drift term to reflect an expected rate of return. When using such a model it is not argued that causal factors are not important. For example the day to day movements in the stock price can often be well explained by such things as earnings announcements, boardroom rumors, new product announcements, economic conditions, and so on. However all such factors are evident ex post and often have no forecasting value. The random walk arises from assuming that, on an ex ante basis, the causal factors may be effectively summed up as behaving as a noise process. Thus the argument is not that causal factors are unimportant, but rather that their net effects are, when viewed as occurring in the future, unpredictable. Thus, after the fact, a particular earnings downgrade announcement may be a very good explanation for the movement in a stock price. However it does not follow that we should use this information in a forecasting model.

References

Booth, H., J. Maindonald, and L. Smith (2002). Applying Lee-Carter under conditions of mortality decline. Population Studies 56, 325-336.

De Jong, P. (1989). Smoothing and interpolation with the state-space model. Journal of the American Statistical Association 84(408), 1085-1088.

De Jong, P. and S. Mazzi (2001). Modelling and smoothing unequally spaced sequence data. Statistical Inference for Stochastic Processes 4(1), 53-71.

De Jong, P. and J. R. Penzer (1998). Diagnosing shocks in time series. Journal of the American Statistical Association 93(442), 796-806.

De Jong, P. and N. Shephard (1995). The simulation smoother for time series models. Biometrika 82, 339-350.

De Jong, P. and B. Zehnwirth (1983). Claims reserving, state-space models and the Kalman filter. Journal of the Institute of Actuaries 110, 157-181.

England, P. and R. Verrall (2002). Stochastic claims reserving in general insurance. Journal of the Institute of Actuaries 129, 1-76.

Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

Harvey, A. C., E. Ruiz, and N. Shephard (1994). Multivariate stochastic variance models. Review of Economic Studies 61, 247-264.

Hertig, J. (1985). A statistical approach to IBNR-reserves in marine reinsurance. ASTIN Bulletin 15(2), 171-183.

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, Transactions ASME D-82, 35-45.

Kohn, R. and C. F. Ansley (1987). A new algorithm for spline smoothing based on smoothing a stochastic process. SIAM Journal on Scientific and Statistical Computing 8(1), 33-48.

Mack, T. (1991). A simple parametric model for rating automobile insurance or estimating IBNR claims reserves. ASTIN Bulletin 21, 93-109.

Mack, T. (1994). Measuring the variability of chain ladder reserve estimates. Proceedings of the Casualty Actuarial Society Spring Forum, 101-182.

Taylor, G. (2000). Loss Reserving: An Actuarial Perspective. Boston: Kluwer.

Verrall, R. (1989a). Modelling claims run-off triangles with two-dimensional time series. Scandinavian Actuarial Journal.

Verrall, R. (1989b). A state space representation of the chain ladder linear model. Journal of the Institute of Actuaries 116, 589-610.

Whittaker, E. T. (1923). A new method of graduation. Proceedings of the Edinburgh Mathematical Society 41, 63-75.