Estimation for Modified Data

Similar documents
Nonparametric Model Construction

Analysis of Time-to-Event Data: Chapter 2 - Nonparametric estimation of functions of survival time

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

Chapter 4 Fall Notations: t 1 < t 2 < < t D, D unique death times. d j = # deaths at t j = n. Y j = # at risk /alive at t j = n

Estimation, Evaluation, and Selection of Actuarial Models 1

Survival Analysis: Weeks 2-3. Lu Tian and Richard Olshen Stanford University

ST745: Survival Analysis: Nonparametric methods

SPRING 2007 EXAM C SOLUTIONS

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

Censoring and Truncation - Highlighting the Differences

TMA 4275 Lifetime Analysis June 2004 Solution

Lecture 4 - Survival Models

18.465, further revised November 27, 2012 Survival analysis and the Kaplan Meier estimator

Stat 475. Solutions to Homework Assignment 1

Multistate Modeling and Applications

Multi-state models: prediction

Statistical Inference and Methods

Survival Analysis. Stat 526. April 13, 2018

STAT Sample Problem: General Asymptotic Results

Course 4 Solutions November 2001 Exams

MAS3301 / MAS8311 Biostatistics Part II: Survival

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Let us use the term failure time to indicate the time of the event of interest in either a survival analysis or reliability analysis.

Kernel density estimation in R

Exam C Solutions Spring 2005

Lecture 7. Poisson and lifetime processes in risk analysis

Chapter 7 Fall Chapter 7 Hypothesis testing Hypotheses of interest: (A) 1-sample

Exercises. (a) Prove that m(t) =

Lecture 3. Truncation, length-bias and prevalence sampling

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science

9. Estimating Survival Distribution for a PH Model

1 The problem of survival analysis

Estimating transition probabilities for the illness-death model The Aalen-Johansen estimator under violation of the Markov assumption Torunn Heggland

May 2013 MLC Solutions

e 4β e 4β + e β ˆβ =0.765

MLC Fall Model Solutions. Written Answer Questions

Right-truncated data. STAT474/STAT574 February 7, / 44

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis

Survival Regression Models

Solutions to the Spring 2015 CAS Exam ST

Multi-state Models: An Overview

Notes for Math 324, Part 20

Empirical Processes & Survival Analysis. The Functional Delta Method

Course 1 Solutions November 2001 Exams

Survival Analysis I (CHL5209H)

1. If X has density. cx 3 e x ), 0 x < 0, otherwise. Find the value of c that makes f a probability density. f(x) =

Survival Distributions, Hazard Functions, Cumulative Hazards

The Weibull in R is actually parameterized a fair bit differently from the book. In R, the density for x > 0 is

Stat 476 Life Contingencies II. Multiple Life and Multiple Decrement Models

4 Testing Hypotheses. 4.1 Tests in the regression setting. 4.2 Non-parametric testing of survival between groups

SPRING 2007 EXAM MLC SOLUTIONS

10 Introduction to Reliability

Definitions and examples Simple estimation and testing Regression models Goodness of fit for the Cox model. Recap of Part 1. Per Kragh Andersen

Lecture 5 Models and methods for recurrent event data

November 2000 Course 1. Society of Actuaries/Casualty Actuarial Society

11 Survival Analysis and Empirical Likelihood

Survival Analysis. STAT 526 Professor Olga Vitek

Subject CT4 Models. October 2015 Examination INDICATIVE SOLUTION

Introduction to repairable systems STK4400 Spring 2011

Typical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

UNIVERSITY OF CALIFORNIA, SAN DIEGO

The Design of a Survival Study

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Errata for the ASM Study Manual for Exam P, Fourth Edition By Dr. Krzysztof M. Ostaszewski, FSA, CFA, MAAA

Fall 2003 Society of Actuaries Course 3 Solutions = (0.9)(0.8)(0.3) + (0.5)(0.4)(0.7) (0.9)(0.8)(0.5)(0.4) [1-(0.7)(0.3)] = (0.

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics

INSTITUTE AND FACULTY OF ACTUARIES. Curriculum 2019 SPECIMEN SOLUTIONS

Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics.

MLC Fall 2015 Written Answer Questions. Model Solutions

Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek

Textbook: Survivial Analysis Techniques for Censored and Truncated Data 2nd edition, by Klein and Moeschberger

RMSC 2001 Introduction to Risk Management

Survival Analysis Math 434 Fall 2011

Chapter 7: Hypothesis testing

2011/04 LEUKAEMIA IN WALES Welsh Cancer Intelligence and Surveillance Unit

Notes largely based on Statistical Methods for Reliability Data by W.Q. Meeker and L. A. Escobar, Wiley, 1998 and on their class notes.

Creating New Distributions

Meei Pyng Ng 1 and Ray Watson 1

Errata and updates for ASM Exam C/Exam 4 Manual (Sixth Edition) sorted by date

Constrained estimation for binary and survival data

Machine Learning. Module 3-4: Regression and Survival Analysis Day 2, Asst. Prof. Dr. Santitham Prom-on

Frequency Distributions

Errata and updates for ASM Exam C/Exam 4 Manual (Sixth Edition) sorted by page

Philosophy and Features of the mstate package

Kaplan-Meier in SAS. filename foo url " data small; infile foo firstobs=2; input time censor; run;

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Exam C. Exam C. Exam C. Exam C. Exam C

Cox s proportional hazards model and Cox s partial likelihood

Lecture 7. Proportional Hazards Model - Handling Ties and Survival Estimation Statistics Survival Analysis. Presented February 4, 2016

STAT 6385 Survey of Nonparametric Statistics. Order Statistics, EDF and Censoring

ASM Study Manual for Exam P, Second Edition By Dr. Krzysztof M. Ostaszewski, FSA, CFA, MAAA Errata

f X (x) = λe λx, , x 0, k 0, λ > 0 Γ (k) f X (u)f X (z u)du

Introduction to BGPhazard

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Lecture 22 Survival Analysis: An Introduction

Statistical Methods for Alzheimer s Disease Studies

Extensions of Cox Model for Non-Proportional Hazards Purpose

RMSC 2001 Introduction to Risk Management

Transcription:

Definition. Estimation for Modified Data 1. Empirical distribution for complete individual data (section 11.) An observation X is truncated from below ( left truncated) at d if when it is at or below d (X d) it is not recorded, but when it is above d it is recorded at its observed value X. An observation X is truncated from above ( left truncated) at u if when it is at or above u (X u) it is not recorded, but when it is below u it is recorded at its observed value X. An observation X is censored from below ( left censored) at d if when it is at or below d (X d) it is recorded as being equal to d, but when it is above d it is recorded at its observed value X. An observation X is censored from above ( right censored) at u if when it is at or above u (X u) it is recorded as being equal to u, but when it is below u it is recorded at its observed value X. The most common occurrences are left truncation and right censoring. Left truncation occurs when an ordinary deductible of d is applied. When a policyholder has a loss below d, he or she knows no benefits will be paid and so does not inform the insurer. When the loss is above d, the amount of the loss will be reported. A policy limit is an example of right censoring. When the amount of the loss exceeds u, benefits beyond that value are not paid, and so the exact value is not recorded. However, it is known that a loss of at least u has occurred. In chapters 11 and 1, truncated means truncated from below and censored always means censored from above. 1

For the observed values {x 1,,..., x n } let y 1 < y < < y k be the unique values of the x i s. For each j we associate a so-called risk set r j as follows: For survival/mortality data, the risk set is the number of people that are alive at age y j. For loss amount data, the risk set is the number of policies with observed loss amounts (either the actual amount or the maximum amount due to a policy limit) larger than or equal to y j less those with deductibles greater than or equal to y j. Let s j be the number of times the uncensored observation y j appears in the sample. We also use the following notations: For an individual data, let d j denote the truncation point of the j-th observation with d j = 0 in the absence of truncation. In a mortality table, it is the entry time. u j denote the censored point of the j-th observation. In the mortality table, it is the withdrawal time. x j denote the uncensored value (loss amount). In a mortality table, it is the death time. Definition. In terms of a mortality study, risk set at the j-th ordered observation y j, denoted by r j, is the number of individuals who are under observation at that age. r j = #(x i y j ) + #(u i y j ) #(d i y j ) Since the total number of d i s equals the sum of the number of x i s and the number of u i s, we can calculate r j in the following way too: r j = #(d i < y j ) #(x i < y j ) #(u i < y j ) = number of those who have entered prior to y j number of those who have left prior to y j For data consisting of loss amounts:

Risk set = the number of policies with observed loss amounts (either the actual amount or the maximum amount due to a policy limit) greater than or equal to y j less those with deductibles greater than or equal to y j. Example (Finan s notes - Example 51.1). The following mortality table is given: Life Time of Entry Time of exit Reason of exit 1 0 4 End of Study 0 0.5 Death 3 0 1 Leaving Study 4 0 4 End of Study 5 1 4 End of Study 6 1. Death 7 1.5 Death 8 3 Leaving Study 9.5 4 End of Study 10 3.1 3. Death (a) Complete the following table: (b) Create a table summarizing y j, s j, and r j 3

i d i x i u i 1 3 4 5 6 7 8 9 10 Solution for part (a). Life Time of Entry Time of exit Reason of exit 1 0 4 End of Study 0 0.5 Death 3 0 1 Leaving Study 4 0 4 End of Study 5 1 4 End of Study 6 1. Death 7 1.5 Death 8 3 Leaving Study 9.5 4 End of Study 10 3.1 3. Death i d i x i u i entry death withdraw 1 0 4 0 0.5 3 0 1 4 0 4 5 1 4 6 1. 7 1.5 8 3 9.5 4 10 3.1 3. Solution for part (b). 4

The unique values of x i s are y j {0.5,, 3.}. Now form the extended table with asterisk assigned to y j values: all numbers in the table 0 0.5 1 1. 1.5.5 3 3.1 3. 4 number of d i s 4 1 1 1 1 1 1 number of x i s 1 1 number of u i s 1 1 4 Then from the formula r j = #(x i y j ) + #(u i y j ) #(d i y j ) we can complete the following table j y j s j r j 1 0.5 1 4 + 6-6 = 4 3 + 5-3 = 5 3 3. 1 1 + 4-0 = 5 5

Definition. The Kaplan-Meier (product-limit) estimate for the survival function is given by 1 0 t < y 1 S n (t) = j 1 i=1 ( ) 1 s i r i y j 1 t < y j j =,..., k k i=1 ( ) 1 s i r i or 0 t y k where for the case of s k = r k, which means that nobody survives beyond age y k, we set S n (t) = 0 for t y k. Note. This estimator is a point estimation, as it gives a value for Ŝ(t) for each time t. Note. Note that the s i = #(x i : x i = y j ) gives the number of those who die at age y j, while r j = #(x i y j ) + #(u i y j ) #(d i y j ) gives the number of those who are at y j under study before any event occurs (death or withdrawal). So, the difference 1 s j r j gives the probability of surviving past time y j. = r j s j r j Example. Find the Kaplan-Meier estimate of the survival function for the data in the previous example. j y j s j r j 1 0.5 1 4 + 6-6 = 4 3 + 5-3 = 5 3 3. 1 1 + 4-0 = 5 6

S 10 (t) = 1 0 t < 0.5 1 s 1 r 1 = 0.75 0.5 t < ( ) (0.75) 1 s r = 0.45 t < 3. ( ) (0.45) 1 s 3 r 3 = 0.36 t 3. Note. The Kaplan-Meier estimation applied to a complete individual data is the same as the empirical survival function defined by S n (x) = 1 number of observations x n Note. For a survival function S(x) associated with a variable X we have this is because P (X x) = lim t x S(t) = S(x ) P (X x) = 1 P (X < x) = 1 F (x ) = S(x ) Example ( ). You are studying the length of time attorneys are involved in settling bodily injury lawsuits. T represents the number of months from the time an attorney is assigned such a case to the time the case is settled. Nine cases were observed during the study period, two of which were not settled at the conclusion of the study. For those two cases, the time spent up to the conclusion of the study, 4 months and 6 months, was recorded instead. The observed values of T for the other seven cases are as follows: 1 3 3 5 8 8 9 Estimate P r(3 T 5) using the Product-Limit estimator. 7

i d i x i u i entry death withdraw 1 0 1 0 3 3 0 3 4 0 4 5 0 5 6 0 6 7 0 8 8 0 8 9 0 9 j y j s j r j 1 1 1 9 3 8 3 5 1 5 4 8 3 5 9 1 1 Ŝ(t) = 8 9 3 8 15 1 0 t < 1 1 1 9 = 8 9 1 t < 3 ( ) 1 8 = 3 3 t < 5 ( ) 1 1 5 = 8 15 5 t < 8 ( ) 1 3 = 8 45 8 t < 9 8 45 ( 1 1 1) = 0 t 9 P (3 T 5) = P (T 3) P (T > 5) = Ŝ(3 ) Ŝ(5) = 8 9 8 15 = 0.356 Example. The claim payments on a sample of ten policies are: 3 3 5 5 + 6 7 7 + 9 10 + + indicates that the loss exceeded the policy limit 8

Using the Product-Limit estimator, calculate the probability that the loss on a policy exceeds 8. i d i x i u i entry event censored 1 0 0 3 3 0 3 4 0 5 5 0 5 6 0 6 7 0 7 8 0 7 9 0 9 10 0 10 j y j s j r j 1 1 10 3 9 3 5 1 7 4 6 1 5 5 7 1 4 6 9 1 Ŝ(t) = 1 0 t < 1 1 10 = 0.9 t < 3 (0.9) ( 1 9) = 0.7 3 t < 5 (0.7) ( 1 9) = 0.6 5 t < 6 (0.6) ( 1 1 5) = 0.48 6 t < 7 (0.45) ( 1 4) 1 = 0.36 7 t < 9 (0.36) ( 1 ) 1 = 0.18 t 9 Ŝ(8) = Ŝ(7) = 0.36 9

Definition. As in the case of complete data, the Nelson-Åalen Estimator cumulative hazard rate function is defined by: 0 0 t < y 1 Ĥ(t) = j 1 i=1 s i r i y j 1 t < y j j =,..., k k i=1 s i r i t y k and then set Ŝ(t) = e Ĥ(t) for the estimation of the survival function. Example. Find the Nelson-Åalen estimate of the survival function. 0 0 t < 0.5 1 4 = 0.5 0.5 t < Ĥ(t) = 0.5 + 5 = 0.65 t < 3. 0.65 + 1 5 = 0.85 t 3. 1 0 t < 0.5 e 0.5 = 0.7788 0.5 t < Ŝ(t) = e 0.65 = 0.50 t < 3. e 0.85 = 0.474 t 3. 10

Mean and Variance of Nelson-Åalen Estimator (from section 1.) Suppose that events have occurred at times t i. In the Nelson-Åalen estimation j i=1 s i r i Ĥ(y j ) the value s i r i is an estimation for the hazard rate function h(t i ). If we assume a Poisson distribution for the number of events at time t i, this distribution has mean r i h(t i ), which is approximated by r i ( si r i ) = s i. So, if S i represents the random number of events at time t i, then Var(S i ) s i as S i is assumed to follow a Poisson distribution with mean approximately s i. If we assume independence between the random variables S i, then Var(Ĥ(y j)) = Var ( j i=1 S i r i ) = Note. This formula is recursive: j i=1 ( ) Si Var = r i ( ) j Var(Si = i=1 r i j i=1 ( si r i ) for Var(Ĥ(y j)) = Var(Ĥ(y j 1)) + s j r j Example (from the Finan s note)- 55.1. For a survival study with censored and truncated data, you are given: Time (t) Number at risk Failures at at time t time t 1 30 5 7 9 3 3 6 4 5 5 5 0 4 Estimate the variance of Ĥ(4). 11

Var(Ĥ(y 4 4)) = s i r i=1 i = 5 30 + 9 7 + 6 3 + 5 5 = 0.0318 Example. Fifteen cancer patients were observed from the time of diagnosis until the earlier of death or 36 months from diagnosis. Deaths occurred during the study as follows: Time in Months Number of Since Diagnosis Deaths 15 0 3 4 30 d 34 36 1 The Nelson-Aalen estimate Ĥ(35) is 1.5641. Calculate the Nelson-Aalen estimate of the variance of Ĥ(35). First of all, note that d 8 because prior to t = 30 only 8 survivals remain in the group. 15 + 3 13 + 10 + d 8 + 8 d = 1.5641 After simplifying, this reduces to: d 8 + 8 d = 1 By multiplying both sides by 8(8 d) gives: d(8 d) + 16 = 8(8 d) d 16d + 48 = 0 d = 4, 1 d 8 d = 4 The estimation for the variance will be: 1

15 + 3 13 + 10 + 4 8 + 4 = 0.341 13

Variance of Kaplan-Meier Estimator (from section 1.) The vairiance of the Kaplan-Meier estimator is: Var[Ŝ(y j)] = Ŝ(y j) j i=1 s i r i (r i s i ) and for t s other than y j s we have: { s Var[Ŝ(t)] = Ŝ(t) i r i (r i s i ) : } y i t We note that the quantity Ŝ(t) estimates the probability P (X > t). Now if one wants to estimate the variance of the estimator of a conditional probability t p s = P (X > t + s X > s), where s is one of the y j values, then the formula would be Var[P (X > t + s X > s)] = P (X > t + s X > s) { s i r i (r i s i ) : } s + 1 y i s + t Note. If the data is a complete one, then this formula reduces to the empirical approximation of the variance: Var[S n (x)] = Sn(x)(1 Sn(x)) n Example. The following payment data is given: 1500, 1700 +, 550, 800, 800, 3750 +, 5600 in here the plus sign means that the policy limit has been reached (note that in absence of deductible, the policy limit is the same as the maximum covered loss). Find the Greenwood s approximation for the variance of the Kaplan-Meier estimate of S(4000). 14

all numbers in the table 1500 1700 550 800 3750 5600 number of x i s 1 1 1 number of u i s 1 1 The unique values of y j s are 1500, 550, 800, 5600 The formula r j = #(x y j ) + #(u y j ) #(d y j ) still works fine, but since there is no d-value, it reduces to the formula r j = #(x y j ) + #(u y j ) j y j s j r j 1 1500 1 5+=7 550 1 4+1=5 3 800 3+1=4 4 5600 1 1 S(4000) = ( 6 7 )(4 5 )( 4 ) = 0.349 ) 3 Var (Ŝ(4000) = S(4000) i=1 [ s i r i (r i s i ) = 0.349 1 (7)(6) + 1 (5)(4) + 1 (4)() ] = 0.0381 Example. In a five year mortality study, you are given: y j s j r j 1 3 15 4 80 3 5 5 4 6 60 5 3 10 15

Calculate Greenwood s approximation of the conditional variance of the product limit estimator of S(4). Ŝ(4) = ( 1 15 ) ( 56 80 ) ( 0 5 ) ( ) 54 = 0.403 60 [ ] Var(Ŝ(4)) = 3 0.403 (15)(1) + 4 (80)(56) + 5 (5)(0) + 6 = 0.0055 (60)(54) Example given:. A mortality study is conducted on 50 lives observed from time zero. You are Time Number of Deaths Number Censored t d t c t (i) 15 0 17 0 3 5 4 0 30 0 c 30 3 8 0 40 0 (ii) Ŝ(35) is the Product-Limit estimate of S(35). (iii) Var(Ŝ(35)) ˆ is the estimate of Ŝ(35) using Greenwood s formula. (iv) Var(Ŝ(35)) ˆ ( Ŝ(35) ) = 0.011467 Determine c 30, the number censored at 30. 16

all numbers in the table 0 15 17 5 30 3 40 number of x i s 4 8 number of u i s 3 c number of d i s 50 Note that the initial time we d 0 = 50. We use the following formula to calculate r j s: r j = #(d < y j ) #(x < y j ) #(u < y j ) t j s j r j 15 50 5 4 45 3 8 41-c 40 33-c 0.011467 = ˆ Var(Ŝ(35)) (Ŝ(35) ) = (50)(48) + 4 (45)(41) + 8 (41 c)(33 c) (41 c)(33 c) = 945 solve quadratic equation c = 6 Note. For mortality data, the provability P (T > s + t T > s) is denoted by t p s and the probability of failing at or before time s + t given survival past s is denoted by t q s. The Kaplan-Meier estimation estimates the probabilities t p 0. If you want a conditional probability tp s, then you must calculate the Kaplan-Meier product starting at time s + 1 since tp s = P (T > s + t) P (T > s) = the product starting at s + 1 Example. For a survival study with censored and truncated data, you are given: 17

Number at Risk Time (t) at Time t Failures at Time t 1 30 5 7 9 3 3 6 4 5 5 5 0 4 The probability of failing at or before Time 4, given survival past Time 1, is 3 q 1. Calculate Greenwood s approximation of the variance of 3 q 1. 3p 1 = ( 18 7 ) ( 6 3 ) ( ) 0 = 0.4333 5 9 (7)(18) + 6 (3)(6) + 5 (5)(5) = 0.03573 Var( 3 q 1 ) = Var(1 3 p 1 ) = Var( 3 p 1 ) = (0.4333) (0.03573) = 0.006709 Definition. If S(t) is the Kaplan-Meier estimator for the survival function (the empirical estimate is a special case of this), then the 100(1 α)% linear confidence interval is defined to be the interval with endpoints: S(t) ± z p+1 Var(S(t)) Similarly, if H(t) is Nelson-Aalen estimate of the cumulative hazard function, we have the 100(1 α)% linear confidence interval H(t) ± z p+1 Var(H(t)) But, this confidence interval may include negative values, and the linear confidence interval for S(t) may include numbers larger than 1. To avoid these problems, the delta method can be 18

used (details omitted) to find the following so-call log-transformed confidence intervals : (S(t) 1 U, S(t) U ) U = exp ( z p+1 Var(S(t)) S(t) ln S(t) ) ( H(t) U, U H(t) ) U = exp ( z p+1 Var(H(t)) H(t) ) Example. In the previous example, calculate the 95% log-transformed confidence interval for H(3), based on the Nelson-Aalen estimate. Ĥ(3) = 5 30 + 9 7 + 6 3 = 0.6875 Var(Ĥ(3)) = 5 30 + 9 7 + 6 3 = 0.038 U = exp ( 1.96 ) 0.038 = exp(0.4395) = 1.5519 0.6875 confidence interval ( ) ( ) H 0.6875 U, HU = 1.5519, (0.6875)(1.5519) = (0.4430, 1.0669) Example. Twelve policyholders were monitored from the starting date of the policy to the time of first claim. The observed data are as follows: Time of First claim 1 3 4 5 6 7 Number of Claims 1 1 Using the Nelson-Aalen estimator, calculate the 95% linear confidence interval for the cumulative hazard rate function H(4.5). 19

y j r i s i 1 1 10 1 3 9 4 7 H(4.5) = s 1 r 1 + s r + s 3 r 3 + s 4 r 4 = 1 + 1 10 + 9 + 7 = 0.7746 Var(H(4.5)) = s 1 r1 + s r + s 3 r3 + s 4 r4 = 1 + 1 10 + 9 + 7 = 0.0894 linear confidence interval : 0.7746 ± 1.96 0.0894 = (0.189, 1.361) Example. For a survival study, you are given: (i) The Product-Limit estimator Ŝ(t 0) is used to construct confidence intervals for S(t 0 ). (ii) The 95% log-transformed confidence interval for S(t 0 ) is (0.695, 0.843). Determine Ŝ(t 0). S 1 U = 0.695 S U = 0.843 1 U ln(s) = ln(0.695) U ln(s) = ln(0.843) Dividing the second row by the first row gives us: U = ln(0.843) ln(0.695) S = U = ( S 1 U ) U = 0.695 0.6851 = 0.7794 ln(0.843) ln(0.695) = 0.6851 0