INSTITUTE AND FACULTY OF ACTUARIES
Curriculum 2019
SPECIMEN SOLUTIONS
Subject CS2A
Risk Modelling and Survival Analysis
1
Sample path A: continuous time, discrete state process +½
Sample path B: discrete time, continuous state process +½
Sample path C: discrete time, discrete state process +½
Sample path D: continuous time, continuous state process +½

2
In a proportional hazards model the hazard of experiencing an event may be factorised into two components: +1
one depending only on duration since some start event, +½
which is known as the baseline hazard, +½
and the other depending only on a set of covariates and associated parameters. +½
The hazards for any two lives are therefore in the same proportion at all durations. +1
[max. 3]

3
Advantages
The method takes into account the evolution of age patterns of mortality over time (it is a two-factor model). +½
It has been programmed in several statistical packages. +½
It is stochastic, which allows users to assess the degree of uncertainty in parameter estimates, and the extent of random error in the mortality forecasts. +½
It is simple to understand. +½
It can easily be extended and adapted in various ways to suit particular contexts, for example by smoothing the age patterns of mortality using penalised regression splines. +1

Disadvantages
It is calibrated using past data, which may reflect particular period events in the recent past. Unless care is used, these events could unduly influence mortality forecasts. +1
As mortality rates for each age are forecast independently, it can produce implausible results, such as mortality rates which are forecast to decrease with age. +½
Unless observed rates are used for the forecasting, it can produce jump-off effects. +½
It assumes that the ratio of the rates of mortality change at different ages remains constant over time, when there is empirical evidence that this is not so. +½
[max. 4]

4 (i) (a) To protect itself from the risk of large claims. +1
(b) Excess of loss reinsurance, where the reinsurer pays any amount of a claim above the retention. +1
Proportional reinsurance, where the reinsurer pays a fixed proportion of any claim. +1
[3]

(ii) We must first find the parameters $\alpha$ and $\lambda$ of the Pareto distribution.
$\frac{\lambda}{\alpha-1} = 270$ and $\frac{\alpha\lambda^2}{(\alpha-1)^2(\alpha-2)} = 340^2$,
so $\frac{\alpha}{\alpha-2} = \frac{340^2}{270^2} = 1.58573388$,
so $\alpha = \frac{2 \times 1.58573388}{1.58573388 - 1} = 5.4145$ and $\lambda = 270 \times 4.4145 = 1191.920375$.
We need to find M such that $P(X > M) = 0.05$, i.e.
$\left(\frac{\lambda}{\lambda+M}\right)^{\alpha} = 0.05$
$\frac{\lambda}{\lambda+M} = 0.05^{1/\alpha}$
$\lambda = 0.05^{1/\alpha}(\lambda+M)$
$M = \frac{\lambda\left(1 - 0.05^{1/\alpha}\right)}{0.05^{1/\alpha}} = \frac{1191.920375\left(1 - 0.05^{1/5.4145}\right)}{0.05^{1/5.4145}} = 880.8$
[4]
[Total 7]

5 (i) The conventional approach in machine learning is to divide the data into two. One part of the data (usually the majority) is used to train the algorithm to choose the best hypothesis. The other is used to test the chosen hypothesis on data that the algorithm has not seen before. +1
In practice, the training data is often split into a part used to estimate the parameters of the model, and a part used to validate the model. +1
It involves three data sets:
a training data set: the sample of data used to fit the model +½
a validation data set: the sample of data used to provide an unbiased evaluation of model fit on the training data set while tuning model hyperparameters +½
a test data set: the sample of data used to provide an unbiased evaluation of the final model fit on the training data set. +½
Typically, the training data set will use around 60% of the data available, the validation data set about 20% and the test data set about 20%, +½
but these percentages may vary, and a key criterion in dividing the data is that the validation and test data sets should have enough data to be fit for purpose. +½
[max. 4]

(ii) The number of covariates which may be used to model risk prices could be very large. +½
Using conventional approaches to finding the best model (the best combination of covariates) can be very time consuming and computationally infeasible. +½
Penalised regression constrains the sum of the estimated parameter values to be less than a certain value. We exact a penalty which may be written $\lambda P(\beta_1, \ldots, \beta_J)$, where $\beta_1, \ldots, \beta_J$ are the parameters and $P()$ is some increasing function. +½
Sufficient data are needed to find the best choice of $\lambda$ (done e.g. through cross-validation). +½
Advantages for risk pricing:
model building is automated
no debates about borderline decisions
quicker speed to market
models can be continuously updated as new data appear. +1
Disadvantages:
data sets can be very large
throwing all your data at penalised regression does not work
data are not equally relevant (e.g. should we use three years' data or just two years' data?) +1
Other relevant points could score credit.
[max. 4]
[Total 8]

6 (i) $\mu$ is the hazard for a man who does not drink beer. +1
$\beta$ measures the impact on the hazard of a one glass increase in the daily amount of beer drunk. +1

(ii) The hazard for a man who drinks two glasses of beer a day is $0.03\exp(0.2 \times 2) = 0.0448$. +1

(iii) (a) The probability that a man aged 60 years who drinks three glasses of beer a day will survive to his 70th birthday is
$\exp\left(-\int_0^{10} 0.03\exp(0.2 \times 3)\,dt\right) = \exp(-0.547) = 0.579$. +1
(b) Because the hazard of death is constant, the expectation of life at age 60 years is given by
$\frac{1}{h(t)} = \frac{1}{\mu\exp(\beta x)} = \frac{1}{0.03\exp(0.2 \times 3)} = 18.3$ years. +1
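The figures in 6 (ii) and (iii) can be checked numerically. The following is a minimal sketch, assuming the constant hazard $h = \mu\exp(\beta x)$ with $\mu = 0.03$ and $\beta = 0.2$ (the values consistent with the printed answers); the function names are illustrative and not part of the marking schedule.

```python
import math

MU = 0.03    # baseline hazard for a non-drinker (value used in the solution)
BETA = 0.2   # assumed effect per extra daily glass, consistent with the printed answers


def hazard(glasses):
    """Constant hazard mu * exp(beta * x) for x glasses of beer per day."""
    return MU * math.exp(BETA * glasses)


def survival_probability(glasses, years):
    """Probability of surviving `years` under a constant hazard: exp(-h * years)."""
    return math.exp(-hazard(glasses) * years)


def life_expectancy(glasses):
    """Expectation of life under a constant hazard is 1 / h."""
    return 1.0 / hazard(glasses)


if __name__ == "__main__":
    print(round(hazard(2), 4))                    # part (ii)
    print(round(survival_probability(3, 10), 3))  # part (iii)(a)
    print(round(life_expectancy(3), 1))           # part (iii)(b)
```

Rounded to the precision used in the solution, these reproduce 0.0448, 0.579 and 18.3 years.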
(iv) The owner's total revenue R is proportional to the average number of glasses of beer drunk per day multiplied by the man's expectation of life:
$R = \frac{x}{0.03\exp(0.2x)}$. +1
We maximise R with respect to x.
$\frac{dR}{dx} = \frac{0.03\exp(0.2x) - x(0.03 \times 0.2)\exp(0.2x)}{[0.03\exp(0.2x)]^2} = \frac{1 - 0.2x}{0.03\exp(0.2x)}$. +1
This is zero when $1 - 0.2x = 0$, or when $x = 5$.
THEN EITHER
$\frac{d^2R}{dx^2} = \frac{-0.2[0.03\exp(0.2x)] - (1 - 0.2x)[0.03 \times 0.2\exp(0.2x)]}{[0.03\exp(0.2x)]^2} = \frac{-0.2 - 0.2(1 - 0.2x)}{0.03\exp(0.2x)} = \frac{0.2(0.2x - 2)}{0.03\exp(0.2x)}$
which is negative when $x = 5$, so we have a maximum. +1
[4]
OR
When x is 0, total revenue is 0. When x is strictly positive, total revenue is always greater than zero. Therefore we must have a maximum.
So the owner should sell the man five glasses of beer per day. +1
[4]
Various alternatives can be awarded credit, e.g. using the logarithm of the likelihood, or a demonstration by calculating predicted revenue for x = 0, 1, 2, 3 that x = 5 marks a maximum.
[Total 9]
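The maximisation in 6 (iv) can also be confirmed by a simple grid search, in the spirit of the "calculate predicted revenue" alternative mentioned above. This is a minimal sketch, assuming $\mu = 0.03$ and $\beta = 0.2$ and taking the constant of proportionality in R as 1; the grid spacing and function names are illustrative choices.

```python
import math

MU, BETA = 0.03, 0.2  # assumed parameter values, consistent with the printed answers


def revenue(x):
    """Glasses per day times expectation of life 1/h(x), up to a constant factor."""
    return x / (MU * math.exp(BETA * x))


def revenue_derivative(x):
    """Analytic derivative dR/dx = (1 - beta*x) / (mu * exp(beta*x))."""
    return (1 - BETA * x) / (MU * math.exp(BETA * x))


# Evaluate revenue on a grid from 0.00 to 20.00 glasses per day.
grid = [i / 100 for i in range(0, 2001)]
best_x = max(grid, key=revenue)

if __name__ == "__main__":
    print(best_x, revenue_derivative(best_x))
```

The grid maximum falls at the stationary point of the derivative, which supports x = 5 as the revenue-maximising number of glasses.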
7 (i) ALTERNATIVE 1
If the probabilities do not depend solely upon the length of the time interval t - s, the process is time-inhomogeneous. +1
OR ALTERNATIVE 2
The transition rates vary with time. +1
OR ALTERNATIVE 3
The transition rates depend upon the start and end times. +1
[1]

(ii) A model with time-inhomogeneous rates has more parameters, and there may not be sufficient data available to estimate these parameters. +1
Also, the solutions to Kolmogorov's equations may not be easy (or even possible) to find analytically. +1
Time-inhomogeneous processes are computationally harder to simulate. +1
[max. 2]

(iii) We need $P_{GG}(t)$, i.e. the probability that the process remains in G throughout time 0 to t.
This satisfies $\frac{d}{dt}P_{GG}(t) = -(0.2 + 0.04t)P_{GG}(t)$. +1
Hence $\frac{\frac{d}{dt}P_{GG}(t)}{P_{GG}(t)} = -(0.2 + 0.04t)$; $\frac{d}{dt}\left[\ln P_{GG}(t)\right] = -(0.2 + 0.04t)$. +1
Integrate both sides: $\left[\ln P_{GG}(s)\right]_{s=0}^{s=t} = \left[-0.2s - 0.02s^2\right]_{s=0}^{s=t}$.
$P_{GG}(0) = 1$, so $P_{GG}(t) = \exp(-0.2t - 0.02t^2)$. +1
[3]

(iv) Occurs when $P_{GG}(t) = 0.5$, so we have
$0.5 = \exp(-0.2t - 0.02t^2)$,
$0.02t^2 + 0.2t - 0.69315 = 0$, +1
and solving using the quadratic formula produces t = 2.724 or t = -12.724.
The answer lies between 0 and 8, so we require t = 2.724. +1

(v) $\frac{d}{dt}P_G(t) = -(0.2 + 0.04t)P_G(t) + (0.4 - 0.04t)P_N(t)$ +1
But $P_N(t) = 1 - P_G(t)$,
so $\frac{d}{dt}P_G(t) = -(0.2 + 0.04t)P_G(t) + (0.4 - 0.04t)(1 - P_G(t))$, +1
OR
$\frac{d}{dt}P_G(t) = 0.4 - 0.04t - 0.6P_G(t)$. +1
[max. 2]
[Total 10]

8 (i) We can deduce that
$Y_t = at + \sum_{i=1}^{t} e_i$ +1
and so $E(Y_t) = at$ and $\mathrm{Var}(Y_t) = t\sigma^2$. +1
Since these expressions depend on t the process is not stationary. +1
[3]

(ii) As s < t we have
$\mathrm{Cov}(Y_t, Y_{t-s}) = \mathrm{Cov}\left(at + \sum_{i=1}^{t} e_i,\; a(t-s) + \sum_{j=1}^{t-s} e_j\right) = \mathrm{Var}\left(\sum_{j=1}^{t-s} e_j\right) = (t-s)\sigma^2$, +½
which is linear in s as required. +½

(iii) First, note that the differenced series
$X_t = Y_t - Y_{t-1} = a + e_t$
is essentially a white noise process. So estimates of a and $\sigma$ can be found by constructing the sample differenced series $x_i = y_i - y_{i-1}$ for $i = 1, 2, \ldots, n$ and taking the mean and the sample variance (or its square root, for estimating $\sigma$) respectively.
[3]

(iv) In this case
$\hat{y}_n(1) = \hat{a} + y_n + 0 = \hat{a} + y_n$ +1
and $\hat{y}_n(2) = \hat{a} + \hat{y}_n(1) + 0 = 2\hat{a} + y_n$ +1
[Total 10]

9 (i) The force of mortality $\mu_{x+t}$ at age x + t is defined by the expression
$\mu_{x+t} = \lim_{dt \to 0^+} \frac{1}{dt}\Pr[T \le x+t+dt \mid T > x+t]$ +1
[1]

(ii) ${}_{15}p_0 = e^{-15\mu}$ +1
[1]

(iii) [Two survival probabilities for a life aged 15, each of the form $e^{-k\lambda}$; the durations k are illegible.] +1
Taking logarithms gives an equation in $\lambda$ involving $\log_e 2 = 0.693$, +1
so that $\lambda = 0.0693$. +1
[3]
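(iv) If $\lambda = \mu = 0.0693$, then $\mathring{e}_0 = \frac{1}{\mu} = \frac{1}{0.0693} = 14.43$ years. +1
[1]

(v) If $\lambda \ne \mu$, then
$\mathring{e}_0 = \int_0^{15} e^{-\mu s}\,ds + e^{-15\mu}\int_0^{\infty} e^{-\lambda s}\,ds$.
Evaluating the integrals gives
$\mathring{e}_0 = \left[-\frac{e^{-\mu s}}{\mu}\right]_0^{15} + e^{-15\mu}\left[-\frac{e^{-\lambda s}}{\lambda}\right]_0^{\infty} = \left(-\frac{e^{-15\mu}}{\mu} + \frac{1}{\mu}\right) + e^{-15\mu}\left(0 + \frac{1}{\lambda}\right) = \frac{e^{-15\mu}}{\lambda} + \frac{1}{\mu}\left(1 - e^{-15\mu}\right)$,
and putting $\lambda = 0.0693$ gives
$\mathring{e}_0 = \frac{e^{-15\mu}}{0.0693} + \frac{1}{\mu}\left(1 - e^{-15\mu}\right) = 14.43e^{-15\mu} + \frac{1}{\mu}\left(1 - e^{-15\mu}\right)$,
or
$\mathring{e}_0 = \frac{1}{\mu} + e^{-15\mu}\left(14.43 - \frac{1}{\mu}\right)$.

OR
If $\lambda \ne \mu$, then
$\mathring{e}_0 = \int_0^{\infty} t\,{}_tp_0\,\mu_t\,dt$
$= \int_0^{15} t e^{-\mu t}\mu\,dt + e^{-15\mu}\int_{15}^{\infty} t e^{-\lambda(t-15)}\lambda\,dt$
$= \mu\int_0^{15} t e^{-\mu t}\,dt + \lambda e^{-15(\mu-\lambda)}\int_{15}^{\infty} t e^{-\lambda t}\,dt$.
Integrating by parts:
$= \mu\left\{\left[-\frac{t e^{-\mu t}}{\mu}\right]_0^{15} + \frac{1}{\mu}\int_0^{15} e^{-\mu t}\,dt\right\} + \lambda e^{-15(\mu-\lambda)}\left\{\left[-\frac{t e^{-\lambda t}}{\lambda}\right]_{15}^{\infty} + \frac{1}{\lambda}\int_{15}^{\infty} e^{-\lambda t}\,dt\right\}$
$= -15e^{-15\mu} + \left[-\frac{e^{-\mu t}}{\mu}\right]_0^{15} + e^{-15(\mu-\lambda)}\left\{15e^{-15\lambda} + \left[-\frac{e^{-\lambda t}}{\lambda}\right]_{15}^{\infty}\right\}$
$= -15e^{-15\mu} - \frac{e^{-15\mu}}{\mu} + \frac{1}{\mu} + e^{-15(\mu-\lambda)}\left(15e^{-15\lambda} + \frac{e^{-15\lambda}}{\lambda}\right)$
$= -15e^{-15\mu} + \frac{1}{\mu}\left(1 - e^{-15\mu}\right) + 15e^{-15\mu} + \frac{e^{-15\mu}}{\lambda}$
$= \frac{e^{-15\mu}}{\lambda} + \frac{1}{\mu}\left(1 - e^{-15\mu}\right)$
$= 14.43e^{-15\mu} + \frac{1}{\mu}\left(1 - e^{-15\mu}\right)$.
[4]
[Total 10]

10 (i) The lag polynomial here is
$1 - 0.6L - 0.16L^2 = (1 - 0.8L)(1 + 0.2L)$
with roots 1.25 and -5, both greater than 1 in absolute value, therefore it is stationary. +1
Hence an ARMA(2,0) process. +1

(ii) From the stationarity condition, writing c for the constant term of the process (its value is not legible here),
$E(Y_t) = \mu = c + 0.6\mu + 0.16\mu + 0$
$\mu = \frac{c}{1 - 0.76} = \frac{c}{0.24}$

(iii) From the Yule-Walker equations for the autocorrelation function values, for lags 1, 2, 3, ... we have that
$\rho_k = 0.6\rho_{k-1} + 0.16\rho_{k-2}$.
In particular, for k = 1,
$\rho_1 = 0.6\rho_0 + 0.16\rho_{-1} = 0.6 + 0.16\rho_1$, or $\rho_1 = \frac{0.6}{1 - 0.16} = \frac{0.6}{0.84} = 0.7143 = 5/7$. +1
For k = 2 we have that
$\rho_2 = 0.6\rho_1 + 0.16\rho_0 = 0.6 \times \frac{0.6}{0.84} + 0.16 = 0.5886 = 103/175$. +1
For k = 3
$\rho_3 = 0.6\rho_2 + 0.16\rho_1 = 0.6 \times 0.5885714 + 0.16 \times 0.7142857 = 0.4674 = 409/875$ +1
For k = 4
$\rho_4 = 0.6\rho_3 + 0.16\rho_2 = 0.6 \times 0.4674286 + 0.16 \times 0.5885714 = 0.3746 = 1639/4375$ +1
For the partial autocorrelation function we have that
$\psi_1 = \rho_1 = 0.7143 = 5/7$ +1
$\psi_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} = 0.16 = 4/25$ +1
and $\psi_3 = \psi_4 = 0$ since $Y_t$ is AR(2). +1
[7]
[Total ]

11 (i) Formula for the Gumbel copula function
The inverse of $x = \psi(y) = (-\ln y)^{\alpha}$ is $y = \psi^{-1}(x) = \exp\left(-x^{1/\alpha}\right)$. +1
So the copula function is:
$C(u,v) = \psi^{-1}\left(\psi(u) + \psi(v)\right) = \exp\left(-\left[(-\ln u)^{\alpha} + (-\ln v)^{\alpha}\right]^{1/\alpha}\right)$ +1

(ii) Lower tail dependence
Using the formula given and the copula function from part (i):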
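$\lambda_L = \lim_{u \to 0^+} \frac{C(u,u)}{u} = \lim_{u \to 0^+} \frac{1}{u}\exp\left(-\left[(-\ln u)^{\alpha} + (-\ln u)^{\alpha}\right]^{1/\alpha}\right)$
$= \lim_{u \to 0^+} \frac{1}{u}\exp\left(-\left[2(-\ln u)^{\alpha}\right]^{1/\alpha}\right)$
$= \lim_{u \to 0^+} \frac{1}{u}\exp\left(-2^{1/\alpha}(-\ln u)\right)$
$= \lim_{u \to 0^+} \frac{1}{u}\exp\left(2^{1/\alpha}\ln u\right)$
$= \lim_{u \to 0^+} \frac{1}{u}\,u^{2^{1/\alpha}}$
$= \lim_{u \to 0^+} u^{2^{1/\alpha} - 1}$ +½
Since $1 < \alpha < \infty$, we know that $0 < 1/\alpha < 1$ and $0 < 2^{1/\alpha} - 1 < 1$. So $\lambda_L = 0$. +½

(iii) Extreme values
(a) In this context the extreme values are the claims in the right-hand tail of the claims distribution. +½
These claims have a low probability but are for large amounts. +½
(b) Because the payments for these claims are large, they can have a significant financial impact on the insurance company. +½
The probabilities for the right-hand tail cannot be calculated accurately using the usual techniques used for the main body of the distribution. +½

(iv) Probability that the individual claims exceed 10 million
Using the distribution functions given:
$P(X > 10) = 1 - F_X(10) = 1 - \exp\left(-\exp\left(-\frac{10 - 2.5}{5}\right)\right) = 1 - \exp\left(-e^{-1.5}\right) = 1 - 0.8000 = 0.2000$ +1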
$P(Y > 10) = 1 - F_Y(10) = 1 - \exp\left(-\exp\left(-\frac{10 - 5}{7.5}\right)\right) = 1 - \exp\left(-e^{-0.6667}\right) = 1 - 0.5984 = 0.4016$ +1

(v) Probability that both claims exceed 10 million
(a) using the product copula
The product copula implies that X and Y are independent. +½
So with this copula:
$P(X > 10, Y > 10) = P(X > 10)\,P(Y > 10) = 0.2000 \times 0.4016 = 0.0803$ +½
(b) using the Gumbel copula
With the Gumbel copula we have:
$u = P(X \le 10) = 0.8000$ and $v = P(Y \le 10) = 0.5984$.
So
$P(X \le 10, Y \le 10) = \exp\left(-\left[(-\ln 0.8000)^2 + (-\ln 0.5984)^2\right]^{0.5}\right) = 0.5713$ +1
Using the hint given:
$P(X > 10, Y > 10) = 1 - P(X \le 10) - P(Y \le 10) + P(X \le 10, Y \le 10) = 1 - 0.8000 - 0.5984 + 0.5713 = 0.1729$ +1

(vi) Comment
With the Gumbel copula the probability that both claims will exceed 10 million (17.29%) is more than twice what it would be if the claims were independent (8.03%). +½
So, if the insurance company treated the claims from these two policies as independent, it would significantly underestimate the risk. +½
[Total ]
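The joint exceedance probabilities in parts (iv) to (vi) can be reproduced numerically. The sketch below is illustrative only: it assumes Gumbel-type distribution functions $F(x) = \exp(-\exp(-(x - \text{location})/\text{scale}))$ with location 2.5 and scale 5 for X, location 5 and scale 7.5 for Y, and a copula parameter $\alpha = 2$, all of which are read off the worked figures rather than stated explicitly here. The function names are hypothetical.

```python
import math


def gumbel_cdf(x, location, scale):
    """Distribution function exp(-exp(-(x - location) / scale))."""
    return math.exp(-math.exp(-(x - location) / scale))


def gumbel_copula(u, v, alpha):
    """Gumbel copula C(u, v) = exp(-[(-ln u)^alpha + (-ln v)^alpha]^(1/alpha))."""
    s = (-math.log(u)) ** alpha + (-math.log(v)) ** alpha
    return math.exp(-(s ** (1.0 / alpha)))


def joint_exceedance(u, v, c_uv):
    """P(X > x, Y > y) = 1 - u - v + C(u, v)."""
    return 1.0 - u - v + c_uv


u = gumbel_cdf(10, 2.5, 5)   # P(X <= 10), assumed parameters
v = gumbel_cdf(10, 5, 7.5)   # P(Y <= 10), assumed parameters

independent = joint_exceedance(u, v, u * v)                 # product copula
dependent = joint_exceedance(u, v, gumbel_copula(u, v, 2))  # Gumbel copula, alpha = 2

if __name__ == "__main__":
    print(round(independent, 4), round(dependent, 4))
```

Under these assumptions the two probabilities agree with the 8.03% and 17.29% quoted in the comment, and the Gumbel figure is a little over twice the independent one.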
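12 (i) [Transition diagram with three states, 0%, 20% and 40%. Transition probabilities: 0% to 20%: $\beta$; 0% to 40%: $\beta^2$; 20% to 0%: $\beta$; 20% to 40%: $2\beta$; 40% to 20%: $\beta$; 40% to 0%: $\beta^2$. Probabilities of remaining in the same state: $1 - \beta - \beta^2$ (0%), $1 - 3\beta$ (20%), $1 - \beta - \beta^2$ (40%).]

(ii) We require each row of the transition matrix to sum to 1. +½
Here this holds for all values of $\beta$. +½
We require each of the following to lie between 0 and 1 inclusive:
$\beta$, $\beta^2$, $2\beta$, $1 - 3\beta$, $1 - \beta - \beta^2$.
The first two require that $0 \le \beta \le 1$. +½
The third requires that $0 \le \beta \le 1/2$. +½
The fourth that $\beta \le \frac{1}{3}$, $\beta \ge 0$. +½
The fifth implies $\beta \le \frac{-1 + \sqrt{5}}{2}$ as the negative root is not viable. +1
So overall $0 \le \beta \le \frac{1}{3}$. +½
[max. 3]

(iii) If $\beta > 0$ then it can reach any other state (so it is irreducible) +1
and it has a loop on each state (so it is aperiodic). +½
However if $\beta = 0$ it can never leave its current state so it is reducible. +½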
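The constraints on $\beta$ in part (ii) can be checked by building the transition matrix for a given $\beta$ and testing it directly. This is a minimal sketch, assuming the state order 0%, 20%, 40% and the entries $\beta$, $\beta^2$, $2\beta$, $1 - 3\beta$, $1 - \beta - \beta^2$ arranged as in the numerical matrix used in part (iv); the function names and tolerance are illustrative.

```python
def transition_matrix(beta):
    """One-step transition matrix for states 0%, 20%, 40% (assumed layout)."""
    b2 = beta * beta
    return [
        [1 - beta - b2, beta, b2],
        [beta, 1 - 3 * beta, 2 * beta],
        [b2, beta, 1 - beta - b2],
    ]


def is_valid(matrix, tol=1e-12):
    """True if every row sums to 1 and every entry lies in [0, 1]."""
    rows_ok = all(abs(sum(row) - 1) < tol for row in matrix)
    entries_ok = all(-tol <= p <= 1 + tol for row in matrix for p in row)
    return rows_ok and entries_ok


if __name__ == "__main__":
    for beta in (-0.01, 0.0, 0.1, 1 / 3, 0.34):
        print(beta, is_valid(transition_matrix(beta)))
```

Values of $\beta$ just outside $[0, 1/3]$ fail the check because $\beta$ itself or $1 - 3\beta$ becomes negative, which matches the binding constraints identified above.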
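(iv) In this case the matrix is
$P = \begin{pmatrix} 0.89 & 0.1 & 0.01 \\ 0.1 & 0.7 & 0.2 \\ 0.01 & 0.1 & 0.89 \end{pmatrix}$ +½
The stationary distribution $\pi$ satisfies $\pi = \pi P$. +½
We have:
$\pi_0 = 0.89\pi_0 + 0.1\pi_{20} + 0.01\pi_{40}$
$\pi_{20} = 0.1\pi_0 + 0.7\pi_{20} + 0.1\pi_{40}$
$\pi_{40} = 0.01\pi_0 + 0.2\pi_{20} + 0.89\pi_{40}$ +½
and $\pi_0 + \pi_{20} + \pi_{40} = 1$. +½
The first and third equations give
$0.22\pi_0 - 0.02\pi_{40} = 0.11\pi_{40} - 0.01\pi_0$, so $\pi_{40} = \frac{0.23}{0.13}\pi_0 = 1.769\pi_0$,
and the second equation then gives
$\pi_{20} = \frac{0.1 + 0.1769}{0.3}\pi_0 = 0.9231\pi_0$,
so $(1 + 0.9231 + 1.769)\pi_0 = 1$, and hence
$\pi_0 = 0.271 = \frac{13}{48}$
$\pi_{20} = 0.25 = \frac{1}{4}$
$\pi_{40} = 0.479 = \frac{23}{48}$
are the long-term proportions of taxpayers at each marginal rate. +1
[4]

(v) Looking for the rates two years later, these are given by $P^2$, which is
$\begin{pmatrix} 0.89 & 0.1 & 0.01 \\ 0.1 & 0.7 & 0.2 \\ 0.01 & 0.1 & 0.89 \end{pmatrix}\begin{pmatrix} 0.89 & 0.1 & 0.01 \\ 0.1 & 0.7 & 0.2 \\ 0.01 & 0.1 & 0.89 \end{pmatrix} = \begin{pmatrix} 0.8022 & 0.16 & 0.0378 \\ 0.161 & 0.52 & 0.319 \\ 0.0278 & 0.16 & 0.8122 \end{pmatrix}$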
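Both the stationary distribution and the two-step matrix can be verified with exact rational arithmetic. The following is a minimal sketch using Python's standard `fractions` module; the helper names `matmul` and `vecmat` are illustrative, and the candidate stationary vector (13/48, 12/48, 23/48) is taken from the worked solution and checked rather than derived.

```python
from fractions import Fraction as F

# One-step transition matrix for states 0%, 20%, 40%.
P = [
    [F(89, 100), F(10, 100), F(1, 100)],
    [F(10, 100), F(70, 100), F(20, 100)],
    [F(1, 100), F(10, 100), F(89, 100)],
]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def vecmat(pi, m):
    """Row vector times matrix."""
    return [sum(pi[i] * m[i][j] for i in range(len(pi)))
            for j in range(len(m[0]))]


P2 = matmul(P, P)                        # two-step transition probabilities
pi = [F(13, 48), F(12, 48), F(23, 48)]   # candidate stationary distribution

if __name__ == "__main__":
    print(vecmat(pi, P) == pi)           # stationarity check: pi = pi P
    print([float(x) for x in P2[1]])     # two-step probabilities from the 20% state
```

Using fractions avoids rounding questions: the check `pi == pi P` is exact, and the middle row of `P2` gives the two-step probabilities starting from the 20% state.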
So the required probabilities are:
(a) 0.161
(b) 0.52
(c) 0.319.
[Total 13]

END OF MARKING SCHEDULE