INSTITUTE AND FACULTY OF ACTUARIES
Curriculum 2019 SPECIMEN SOLUTIONS
Subject CS2A - Risk Modelling and Survival Analysis

1
Sample path A: continuous time, discrete state process [½]
Sample path B: discrete time, continuous state process [½]
Sample path C: discrete time, discrete state process [½]
Sample path D: continuous time, continuous state process [½]

2
In a proportional hazards model the hazard of experiencing an event may be factorised into two components: [1]
one depending only on duration since some start event, [½]
which is known as the baseline hazard, [½]
and the other depending only on a set of covariates and associated parameters. [½]
The hazards for any two lives are therefore in the same proportion at all durations. [1]
[max. 3]

3
Advantages
The method takes into account the evolution of age patterns of mortality over time (it is a two-factor model). [½]
It has been programmed in several statistical packages. [½]
It is stochastic, which allows users to assess the degree of uncertainty in parameter estimates, and the extent of random error in the mortality forecasts. [½]
It is simple to understand. [½]
It can easily be extended and adapted in various ways to suit particular contexts, for example by smoothing the age patterns of mortality using penalised regression splines. [1]

Disadvantages
It is calibrated using past data, which may reflect particular period events in the recent past. Unless care is used, these events could unduly influence mortality forecasts. [1]
As mortality rates for each age are forecast independently, it can produce implausible results, such as mortality rates which are forecast to decrease with age. [½]
Unless observed rates are used for the forecasting, it can produce jump-off effects. [½]

It assumes that the ratio of the rates of mortality change at different ages remains constant over time, when there is empirical evidence that this is not so. [½]
[max. 4]

4
(i) (a) To protect itself from the risk of large claims. [1]
(b) Excess of loss reinsurance, where the reinsurer pays any amount of a claim above the retention. [1]
Proportional reinsurance, where the reinsurer pays a fixed proportion of any claim. [1]
[3]

(ii) We must first find the parameters α and λ of the Pareto distribution.

λ/(α − 1) = 270 and αλ²/[(α − 1)²(α − 2)] = 340²

so 340²/270² = α/(α − 2) = 1.58573388

so α = 2 × 1.58573388/0.58573388 = 5.41452

and λ = 270 × 4.41452 = 1191.92.

We need to find M such that P(X > M) = 0.05, i.e.

(λ/(λ + M))^α = 0.05

λ/(λ + M) = 0.05^{1/α}

λ = 0.05^{1/α}(λ + M)

M = λ(1 − 0.05^{1/α})/0.05^{1/α} = 1191.92 × (1 − 0.05^{1/5.41452})/0.05^{1/5.41452} = 880.8
[4]
[Total 7]
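The parameter values and the retention level above can be confirmed numerically. The sketch below is a minimal check assuming, as in the working, a Pareto distribution with mean 270 and standard deviation 340; the variable names are arbitrary.

```python
import math

# Two-parameter Pareto with survival function (lam / (lam + x))**alpha:
# mean = lam/(alpha - 1), variance = alpha*lam**2 / ((alpha - 1)**2 * (alpha - 2))
mean, sd = 270.0, 340.0

r = sd**2 / mean**2                      # Var/mean^2 = alpha/(alpha - 2)
alpha = 2 * r / (r - 1)
lam = mean * (alpha - 1)

# Retention M with P(X > M) = 0.05, i.e. (lam/(lam + M))**alpha = 0.05
M = lam * (0.05 ** (-1 / alpha) - 1)

print(round(alpha, 5), round(lam, 2), round(M, 1))   # 5.41452 1191.92 880.8
```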

5
(i) The conventional approach in machine learning is to divide the data into two. One part of the data (usually the majority) is used to train the algorithm to choose the best hypothesis. The other is used to test the chosen hypothesis on data that the algorithm has not seen before. [1]
In practice, the training data is often split into a part used to estimate the parameters of the model, and a part used to validate the model. [1]
It involves three data sets:
a training data set: the sample of data used to fit the model [½]
a validation data set: the sample of data used to provide an unbiased evaluation of model fit on the training data set while tuning model hyperparameters [½]
a test data set: the sample of data used to provide an unbiased evaluation of the final model fit on the training data set. [½]
Typically, the training data set will use around 60% of the data available, the validation data set about 20% and the test data set about 20%, [½] but these percentages may vary, and a key criterion in dividing the data is that the validation and test data sets should have enough data to be fit for purpose. [½]
[max. 4]

(ii) The number of covariates which may be used to model risk prices could be very large. [½]
Using conventional approaches to finding the best model (the best combination of covariates) can be very time consuming and computationally infeasible. [½]
Penalised regression constrains the sum of the estimated parameter values to be less than a certain value. We exact a penalty which may be written λP(β₁, …, β_J), where β₁, …, β_J are the parameters and P() is some increasing function. [½]
We need sufficient data to find the best choice of λ (done e.g. through cross-validation). [½]
Advantages for risk pricing:
model building is automated
no debates about borderline decisions
quicker speed to market
models can be continuously updated as new data appear. [1]
Disadvantages:
data sets can be very large
throwing all your data at penalised regression does not work
data are not equally relevant (e.g. should we use three years' data or just two years' data?) [1]
Other relevant points could score credit.
[max. 4]
[Total 8]
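The ideas in (i) and (ii) can be illustrated with a short sketch on simulated data using scikit-learn. The 60/20/20 split, the candidate penalty values and the Lasso penalty λΣ|β_j| (one common choice of the increasing function P) are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))                      # many candidate covariates
beta = np.zeros(50)
beta[:5] = [2.0, -1.0, 0.5, 1.5, -2.0]               # only a few genuinely matter
y = X @ beta + rng.normal(scale=0.5, size=1000)

# 60% training, 20% validation, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# choose the penalty strength on the validation set, then refit
candidate_penalties = [0.01, 0.03, 0.1, 0.3, 1.0]
scores = {lam: np.mean((Lasso(alpha=lam).fit(X_train, y_train).predict(X_val) - y_val) ** 2)
          for lam in candidate_penalties}
best_lam = min(scores, key=scores.get)
final = Lasso(alpha=best_lam).fit(X_train, y_train)

print(best_lam, int(np.sum(final.coef_ != 0)))         # chosen penalty, covariates retained
print(np.mean((final.predict(X_test) - y_test) ** 2))  # test-set error of the final model
```

The held-out test set is used only once, at the end, so that the reported error is not biased by the penalty selection.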

6
(i) μ is the hazard for a man who does not drink beer. [1]
β measures the impact on the hazard of a one-glass increase in the daily amount of beer drunk. [1]

(ii) The hazard for a man who drinks two glasses of beer a day is h(t) = μ exp(βx) = 0.03 exp(0.2 × 2) = 0.0448. [1]

(iii) (a) The hazard is h(t) = μ exp(βx) = 0.03 exp(0.2 × 3), so the probability that a man aged 60 years who drinks three glasses of beer a day will survive to his 70th birthday is
exp(−∫₀¹⁰ 0.03 exp(0.2 × 3) dt) = exp(−0.547) = 0.579. [1]
(b) Because the hazard of death is constant, the expectation of life at age 60 years is given by 1/[0.03 exp(0.2 × 3)] = 18.3 years. [1]
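A quick numerical check of (ii) and (iii), using h(t) = μ exp(βx) with μ = 0.03 and β = 0.2 as in the working (a minimal sketch):

```python
import math

mu, beta = 0.03, 0.2
hazard = lambda x: mu * math.exp(beta * x)      # constant in t for a fixed daily intake x

print(round(hazard(2), 4))                      # (ii)  0.0448
print(round(math.exp(-10 * hazard(3)), 3))      # (iii)(a)  exp(-0.547) = 0.579
print(round(1 / hazard(3), 1))                  # (iii)(b)  expectation of life: 18.3 years
```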

(iv) The owner's total revenue R is proportional to the average number of glasses of beer drunk per day multiplied by the man's expectation of life:
R = x/[0.03 exp(0.2x)]. [1]
We maximise R with respect to x.
dR/dx = {0.03 exp(0.2x) − x(0.03 × 0.2) exp(0.2x)}/[0.03 exp(0.2x)]² = (1 − 0.2x)/[0.03 exp(0.2x)]. [1]
This is zero when 1 − 0.2x = 0, or when x = 5. [1]
THEN EITHER
d²R/dx² = {−0.2[0.03 exp(0.2x)] − (1 − 0.2x)[0.03 × 0.2 exp(0.2x)]}/[0.03 exp(0.2x)]²
= [−0.2 − 0.2(1 − 0.2x)]/[0.03 exp(0.2x)] = 0.2(0.2x − 2)/[0.03 exp(0.2x)],
which is negative when x = 5, so we have a maximum. [1]
OR
When x is 0, total revenue is 0. When x is strictly positive, total revenue is always greater than zero. Therefore we must have a maximum.
So the owner should sell the man five glasses of beer per day. [1]
[4]
Various alternatives can be awarded credit, e.g. using the logarithm of the likelihood, or a demonstration by calculating predicted revenue for x = 0, 1, 2, 3, … that x = 5 marks a maximum.
[Total 9]
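The maximum at x = 5 can also be confirmed by tabulating R(x) = x/[0.03 exp(0.2x)], mirroring the alternative demonstration mentioned above; the grid of 0 to 10 glasses is an arbitrary illustrative range.

```python
import math

R = lambda x: x / (0.03 * math.exp(0.2 * x))    # revenue, up to a constant of proportionality

values = {x: round(R(x), 2) for x in range(0, 11)}
print(values)                                   # rises up to x = 5, falls thereafter
print(max(values, key=values.get))              # 5
```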

7
(i) ALTERNATIVE 1
If the probabilities do not depend solely upon the length of the time interval t − s, the process is time-inhomogeneous. [1]
OR ALTERNATIVE 2
The transition rates vary with time. [1]
OR ALTERNATIVE 3
The transition rates depend upon the start and end times. [1]
[1]

(ii) A model with time-inhomogeneous rates has more parameters, and there may not be sufficient data available to estimate these parameters. [1]
Also, the solutions to Kolmogorov's equations may not be easy (or even possible) to find analytically. [1]
Time-inhomogeneous processes are computationally harder to simulate. [1]
[max. 2]

(iii) We need P_GG(t), i.e. the probability that the process remains in G throughout the time from 0 to t.
This satisfies
d/dt P_GG(t) = −(0.2 + 0.04t) P_GG(t). [1]
Hence
[1/P_GG(t)] d/dt P_GG(t) = −(0.2 + 0.04t);
d/dt [ln P_GG(t)] = −(0.2 + 0.04t). [1]
Integrate both sides:
[ln P_GG(s)]₀ᵗ = −[0.2s + 0.02s²]₀ᵗ.

Since P_GG(0) = 1,
P_GG(t) = exp(−0.2t − 0.02t²). [1]
[3]

(iv) This occurs when P_GG(t) = 0.5, so we have
0.5 = exp(−0.2t − 0.02t²),
0.02t² + 0.2t − 0.69315 = 0, [1]
and solving using the quadratic equation formula produces t = 2.724 or t = −12.724. The answer lies between 0 and 8, so we require t = 2.724. [1]

(v) d/dt P_G(t) = −(0.2 + 0.04t) P_G(t) + (0.4 − 0.04t) P_N(t). [1]
But P_N(t) = 1 − P_G(t),
so d/dt P_G(t) = −(0.2 + 0.04t) P_G(t) + (0.4 − 0.04t)(1 − P_G(t)), [1]
OR
d/dt P_G(t) = 0.4 − 0.04t − 0.6 P_G(t). [1]
[max. 2]
[Total 10]
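A numerical check of (iii) and (iv), assuming as in the working that the transition rate out of state G is 0.2 + 0.04t:

```python
import math

p_gg = lambda t: math.exp(-0.2 * t - 0.02 * t * t)   # probability of remaining in G up to time t

# (iv): solve 0.02 t^2 + 0.2 t - ln 2 = 0
a, b, c = 0.02, 0.2, -math.log(2)
t = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
print(round(t, 3), round(p_gg(t), 3))                # 2.724 0.5
```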

8
(i) We can deduce that
Y_t = at + Σ_{i=1}^t e_i,
and so E(Y_t) = at and Var(Y_t) = tσ². [1]
Since these expressions depend on t the process is not stationary. [1]
[3]

(ii) As s < t we have
Cov(Y_t, Y_s) = Cov(at + Σ_{i=1}^t e_i, as + Σ_{j=1}^s e_j) = Σ_{j=1}^s Var(e_j) = sσ², [½]
which is linear in s as required. [½]

(iii) First, note that the differenced series
X_t = Y_t − Y_{t−1} = a + e_t
is essentially a white noise process. So estimates of a and σ can be found by constructing the sample differences series x_i = y_i − y_{i−1} for i = 2, …, n and taking the mean and sample standard deviation (or its square for estimating σ²) respectively. [3]

(iv) In this case
ŷ_n(1) = â + y_n + 0 = â + y_n [1]
and
ŷ_n(2) = â + ŷ_n(1) + 0 = 2â + y_n. [1]
[Total 10]

9
(i) The force of mortality μ_{x+t} at age x + t is defined by the expression
μ_{x+t} = lim_{dt→0+} (1/dt) Pr[T ≤ x + t + dt | T > x + t]. [1]

(ii) ₅p₀ = e^{−5μ}. [1]

(iii) ₅p₅ = e^{−5λ} and ₅p₁₀ = e^{−5λ}.
But ₁₀p₅ = ₅p₅ × ₅p₁₀ = ½. [1]
Hence e^{−5λ} × e^{−5λ} = e^{−10λ} = ½, so 10λ = log_e 2 = 0.693, [1]
so that λ = 0.0693. [1]
[3]
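The value of λ in (iii), and the life expectancies used in (iv) and (v) below, can be checked directly. The sketch assumes the force of mortality is μ below age 5 and λ above age 5; the value μ = 0.05 in the last line is purely illustrative.

```python
import math

lam = math.log(2) / 10                     # (iii): 10*lam = ln 2
print(round(lam, 4))                       # 0.0693
print(round(1 / lam, 2))                   # (iv): expectation of life when mu = lam, 14.43 years

# (v): e0 = (1 - exp(-5*mu))/mu + exp(-5*mu)/lam for a general mu below age 5
e0 = lambda mu: (1 - math.exp(-5 * mu)) / mu + math.exp(-5 * mu) / lam
print(round(e0(lam), 2))                   # reduces to 14.43 when mu = lam
print(round(e0(0.05), 2))                  # illustrative mu = 0.05 only
```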

(iv) If λ = μ = 0.0693, then e₀ = 1/μ = 1/0.0693 = 14.43 years. [1]

(v) If λ ≠ μ, then
e₀ = ∫₀⁵ e^{−μs} ds + e^{−5μ} ∫₀^∞ e^{−λs} ds.
Evaluating the integrals gives
e₀ = [−(1/μ) e^{−μs}]₀⁵ + e^{−5μ} [−(1/λ) e^{−λs}]₀^∞
= (1/μ)(1 − e^{−5μ}) + (1/λ) e^{−5μ},
and putting λ = 0.0693 gives
e₀ = (1/μ)(1 − e^{−5μ}) + (1/0.0693) e^{−5μ} = 14.43 e^{−5μ} + (1/μ)(1 − e^{−5μ}).

OR

If λ ≠ μ, then e₀ = ∫₀^∞ t · ₜp₀ · μ_t dt
= ∫₀⁵ t e^{−μt} μ dt + e^{−5μ} ∫₅^∞ t e^{−λ(t−5)} λ dt.
Integrating by parts,
∫₀⁵ t μ e^{−μt} dt + e^{−5(μ−λ)} ∫₅^∞ t λ e^{−λt} dt
= [−t e^{−μt}]₀⁵ + ∫₀⁵ e^{−μt} dt + e^{−5(μ−λ)} ([−t e^{−λt}]₅^∞ + ∫₅^∞ e^{−λt} dt)

= −5 e^{−5μ} + (1/μ)(1 − e^{−5μ}) + e^{−5(μ−λ)} (5 e^{−5λ} + (1/λ) e^{−5λ})
= −5 e^{−5μ} + (1/μ)(1 − e^{−5μ}) + 5 e^{−5μ} + (1/λ) e^{−5μ}
= (1/μ)(1 − e^{−5μ}) + (1/λ) e^{−5μ}
= 14.43 e^{−5μ} + (1/μ)(1 − e^{−5μ}).
[4]
[Total 10]

10
(i) The lag polynomial here is
1 − 0.6L − 0.16L² = (1 − 0.8L)(1 + 0.2L),
with roots 1.25 and −5, both greater than 1 in absolute value, therefore it is stationary. [1]
Hence it is an ARMA(2,0) process. [1]

(ii) From the stationarity condition,
E(Y_t) = μ = c + 0.6μ + 0.16μ + 0,
where c is the constant term in the defining equation, so μ(1 − 0.76) = c and μ = c/0.24.

(iii) From the Yule–Walker equations, the autocorrelation function values for lags k = 1, 2, 3, … satisfy
ρ_k = 0.6ρ_{k−1} + 0.16ρ_{k−2}.
In particular, for k = 1,
ρ₁ = 0.6ρ₀ + 0.16ρ₁, or ρ₁ = 0.6 + 0.16ρ₁,
so ρ₁ = 0.6/(1 − 0.16) = 0.6/0.84 = 0.7143 = 5/7. [1]
For k = 2 we have that
ρ₂ = 0.6ρ₁ + 0.16ρ₀ = 0.6 × 0.714286 + 0.16 = 0.5886 = 103/175. [1]
For k = 3,
ρ₃ = 0.6ρ₂ + 0.16ρ₁ = 0.6 × 0.588571 + 0.16 × 0.714286 = 0.4674 = 409/875. [1]
For k = 4,
ρ₄ = 0.6ρ₃ + 0.16ρ₂ = 0.6 × 0.467429 + 0.16 × 0.588571 = 0.3746 = 1639/4375. [1]
For the partial autocorrelation function we have that
ψ₁ = ρ₁ = 0.7143 = 5/7, [1]
ψ₂ = (ρ₂ − ρ₁²)/(1 − ρ₁²) = 0.16 = 4/25, [1]
and ψ₃ = ψ₄ = 0 since Y_t is AR(2). [1]
[7]
[Total 11]
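The autocorrelations and the lag-2 partial autocorrelation in (iii) can be verified by iterating the Yule–Walker recursion above; the expression (ρ₂ − ρ₁²)/(1 − ρ₁²) is the standard lag-2 partial autocorrelation formula (a short sketch):

```python
rho = {0: 1.0, 1: 0.6 / (1 - 0.16)}               # rho_1 = 0.7143 = 5/7
for k in range(2, 5):
    rho[k] = 0.6 * rho[k - 1] + 0.16 * rho[k - 2]

print({k: round(v, 4) for k, v in rho.items()})   # 1.0, 0.7143, 0.5886, 0.4674, 0.3746
psi2 = (rho[2] - rho[1] ** 2) / (1 - rho[1] ** 2)
print(round(psi2, 2))                             # 0.16, the second AR coefficient
```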

11
(i) Formula for the Gumbel copula function
The inverse of x = ψ(y) = (−ln y)^α is y = ψ⁻¹(x) = exp(−x^{1/α}). [1]
So the copula function is:
C(u, v) = ψ⁻¹(ψ(u) + ψ(v)) = exp(−[(−ln u)^α + (−ln v)^α]^{1/α}). [1]

(ii) Lower tail dependence
Using the formula given and the copula function from part (i):

λ_L = lim_{u→0+} C(u, u)/u
= lim_{u→0+} (1/u) exp(−[(−ln u)^α + (−ln u)^α]^{1/α})
= lim_{u→0+} (1/u) exp(−[2(−ln u)^α]^{1/α})
= lim_{u→0+} (1/u) exp(−2^{1/α}(−ln u))
= lim_{u→0+} (1/u) exp(2^{1/α} ln u)
= lim_{u→0+} u^{2^{1/α}}/u
= lim_{u→0+} u^{2^{1/α} − 1}. [½]
Since α > 1, we know that 0 < 1/α < 1 and 0 < 2^{1/α} − 1 < 1. So λ_L = 0. [½]

(iii) Extreme values
(a) In this context the extreme values are the claims in the right-hand tail of the claims distribution. [½]
These claims have a low probability but are for large amounts. [½]
(b) Because the payments for these claims are large, they can have a significant financial impact on the insurance company. [½]
The probabilities for the right-hand tail cannot be calculated accurately using the usual techniques used for the main body of the distribution. [½]

(iv) Probability that the individual claims exceed 10 million
Using the distribution functions given:
P(X > 10) = 1 − F_X(10) = 1 − exp(−exp(−(10 − 2.5)/5)) = 1 − exp(−e^{−1.5}) = 1 − 0.8000 = 0.2000 [1]

P(Y > 10) = 1 − F_Y(10) = 1 − exp(−exp(−(10 − 5)/7.5)) = 1 − exp(−e^{−0.6667}) = 1 − 0.5984 = 0.4016 [1]

(v) Probability that both claims exceed 10 million
(a) Using the product copula
The product copula implies that X and Y are independent. [½]
So with this copula:
P(X > 10, Y > 10) = P(X > 10) P(Y > 10) = 0.2000 × 0.4016 = 0.0803 [½]
(b) Using the Gumbel copula
With the Gumbel copula we have:
u = P(X ≤ 10) = 0.8000 and v = P(Y ≤ 10) = 0.5984.
So
P(X ≤ 10, Y ≤ 10) = exp(−[(−ln 0.8000)² + (−ln 0.5984)²]^{0.5}) = 0.5713 [1]
Using the hint given:
P(X > 10, Y > 10) = 1 − P(X ≤ 10) − P(Y ≤ 10) + P(X ≤ 10, Y ≤ 10)
= 1 − 0.8000 − 0.5984 + 0.5713 = 0.1729 [1]

(vi) Comment
With the Gumbel copula the probability that both claims will exceed 10 million (17.29%) is more than twice what it would be if the claims were independent (8.03%). [½]
So, if the insurance company treated the claims from these two policies as independent, it would significantly underestimate the risk. [½]
[Total 12]
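The probabilities in (iv)–(v) can be reproduced numerically. The sketch assumes the distribution functions F_X(x) = exp(−exp(−(x − 2.5)/5)) and F_Y(y) = exp(−exp(−(y − 5)/7.5)) and Gumbel copula parameter α = 2, the values used in the working.

```python
import math

F_X = lambda x: math.exp(-math.exp(-(x - 2.5) / 5))
F_Y = lambda y: math.exp(-math.exp(-(y - 5) / 7.5))
gumbel = lambda u, v, a: math.exp(-((-math.log(u)) ** a + (-math.log(v)) ** a) ** (1 / a))

u, v = F_X(10), F_Y(10)
print(round(1 - u, 4), round(1 - v, 4))            # (iv)   0.2 0.4016
print(round((1 - u) * (1 - v), 4))                 # (v)(a) 0.0803
both = 1 - u - v + gumbel(u, v, 2)                 # (v)(b) P(X > 10, Y > 10)
print(round(gumbel(u, v, 2), 4), round(both, 4))   # 0.5713 0.1728 (0.1729 with rounded inputs)
```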

12
(i) The transition matrix, with states given by the marginal tax rates 0%, 20% and 40%, is

        0%            20%       40%
 0%  [ 1 − β − β²     β         β²         ]
20%  [ β              1 − 3β    2β         ]
40%  [ β²             β         1 − β − β² ]

(ii) We require each row of the transition matrix to sum to 1. [½]
Here this holds for all values of β. [½]
We require each of the following to lie between 0 and 1 inclusive: β, β², 2β, 1 − 3β, 1 − β − β².
The first two require that 0 ≤ β ≤ 1. [½]
The third requires that 0 ≤ β ≤ 1/2. [½]
The fourth that 1 − 3β ≥ 0, i.e. β ≤ 1/3. [½]
The fifth implies β ≤ (√5 − 1)/2, as the negative root is not viable. [1]
So overall 0 ≤ β ≤ 1/3. [½]
[max. 3]

(iii) If β > 0 then the chain can reach any other state (so it is irreducible) [1]
and it has a loop on each state (so it is aperiodic). [½]
However if β = 0 it can never leave its current state, so it is reducible. [½]
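The admissible range found in (ii) can be confirmed by checking the matrix entries over a grid of β values (a small sketch; the grid resolution is arbitrary):

```python
def entries(b):
    return [1 - b - b**2, b, b**2,
            b, 1 - 3*b, 2*b,
            b**2, b, 1 - b - b**2]

valid = lambda b: all(0 <= e <= 1 for e in entries(b))
print(max(b / 1000 for b in range(701) if valid(b / 1000)))   # 0.333, i.e. beta <= 1/3
```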

(iv) In this case (β = 0.1) the matrix is

P = [ 0.89  0.1   0.01 ]
    [ 0.1   0.7   0.2  ]
    [ 0.01  0.1   0.89 ] [½]

The stationary distribution satisfies π = πP. [½]
We have:
π₀ = 0.89π₀ + 0.1π₂₀ + 0.01π₄₀
π₂₀ = 0.1π₀ + 0.7π₂₀ + 0.1π₄₀
π₄₀ = 0.01π₀ + 0.2π₂₀ + 0.89π₄₀ [½]
and π₀ + π₂₀ + π₄₀ = 1. [½]
The first and third equations give
0.11π₀ − 0.01π₄₀ = 0.1π₂₀ and 0.11π₄₀ − 0.01π₀ = 0.2π₂₀,
so 0.13π₄₀ = 0.23π₀ and π₄₀ = 1.769π₀,
and then π₂₀ = [(0.11 − 0.01 × 1.769)/0.1]π₀ = 0.923π₀,
so (1 + 0.923 + 1.769)π₀ = 1, and hence
π₀ = 13/48 = 0.271
π₂₀ = 12/48 = 0.25
π₄₀ = 23/48 = 0.479
are the long-term proportions of taxpayers at each marginal rate. [1]
[4]

(v) Looking for the rates two years later, these are given by P², which is

[ 0.89  0.1   0.01 ] [ 0.89  0.1   0.01 ]   [ 0.8022  0.160   0.0378 ]
[ 0.1   0.7   0.2  ] [ 0.1   0.7   0.2  ] = [ 0.161   0.52    0.319  ]
[ 0.01  0.1   0.89 ] [ 0.01  0.1   0.89 ]   [ 0.0278  0.160   0.8122 ]

So the required probabilities are:
(a) 0.161
(b) 0.52
(c) 0.319.
[Total 13]
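A numerical check of the stationary distribution in (iv) and the two-step probabilities in (v), using β = 0.1; the eigen-decomposition route below is just one convenient way to obtain π.

```python
import numpy as np

P = np.array([[0.89, 0.10, 0.01],
              [0.10, 0.70, 0.20],
              [0.01, 0.10, 0.89]])

# stationary distribution: left eigenvector of P for eigenvalue 1, scaled to sum to 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()
print(np.round(pi, 4))          # [0.2708 0.25   0.4792] = 13/48, 12/48, 23/48

print(np.round(P @ P, 4))       # two-step matrix; middle row gives 0.161, 0.52, 0.319
```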

END OF MARKING SCHEDULE