

GENERALIZED KAPLAN-MEIER ESTIMATOR FOR FUZZY SURVIVAL TIMES

Muhammad Shafiq
Department of Economics, Kohat University of Science & Technology, Kohat, Pakistan

Reinhard Viertl
Institute of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria

ŚLASKI
ISSN 1644-6739
e-ISSN 2449-9765
DOI: 10.15611/sps.2015.13.01

Summary: Survival analysis can be defined as a set of methods where the response of interest is the time until a specified event occurs. The most common specified event is death, and the related time is called survival time or life time in medical sciences. The Kaplan-Meier estimator is one of the popular methods for precise survival times. Life time is naturally of a continuous nature, therefore it is unrealistic to treat life time observations as precise numbers. In [Viertl 2009] it is shown that life time observations are not precise numbers, but more or less fuzzy. In this study a generalized Kaplan-Meier estimator for fuzzy survival time observations is proposed.

Keywords: characterizing function, fuzzy numbers, Kaplan-Meier estimator, non-precise data, survival time.

1. Introduction

Statistical modeling for life time data started in the 20th century, and is now known as reliability analysis or survival analysis. Reliability analysis is mainly concerned with models of life time data obtained from components and systems in engineering sciences, while survival analysis is mainly concerned with life time data obtained in biological or life sciences. Life time, survival time, failure time or event time can simply be defined as the waiting time until a specified event occurs. The event may be death in life science, failure in engineering sciences, divorce in sociology, change of residence in demography, and so on. Survival analysis techniques are mainly concerned with predicting the probability of response, the probability of survival, the mean life time, and with comparing survival functions [Deshpande and Purhit 2005].

2. Survival Function

The survival function is conventionally denoted by $S(\cdot)$ and is defined as

$$S(t) = \Pr(T > t) \quad \forall\, t \ge 0,$$

where $t$ is some specified time, $T$ is the stochastic quantity describing the time of death, and $\Pr$ stands for probability. This function gives the probability that the unit will survive time $t$, or equivalently that the event will occur after time $t$. For the survival function it is usually assumed that $S(0) = 1$ and $\lim_{t \to \infty} S(t) = 0$ [Lee and Wang 2003].

3. Kaplan-Meier Estimator

Let $0 \le t_1 \le t_2 \le t_3 \le \dots \le t_n$ be $n$ precise life times from a given population, let $n_i$ be the number of observations at risk at time $t_i$, and let $d_i$ be the number of deaths at time $t_i$. Frequently $d_i$ is either 0 or 1, but tied survival times are possible, in which case $d_i$ may be greater than 1. The Kaplan-Meier estimate can be expressed as

$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right) \quad \forall\, t \ge 0$$

[Kaplan and Meier 1958].

For example, if we have five precise complete life time observations, i.e. 10, 20, 30, 40, 50, then the Kaplan-Meier survival probabilities and survival curve are given in Table 1 and Figure 1 respectively.

Table 1. Kaplan-Meier survival probabilities

Time    d_i    n_i    1 - d_i/n_i    S(t)
0       0      5      1              1
10      1      5      0.8            0.8
20      1      4      0.75           0.6
30      1      3      0.67           0.4
40      1      2      0.5            0.2
50      1      1      0              0

Source: own elaboration.

Figure 1. Kaplan-Meier survival curve (vertical axis: S(t), from 0 to 1; horizontal axis: time, from 0 to 50). Source: own elaboration.

4. Fuzzy Information

Standard statistical procedures like estimation of parameters and testing of hypotheses are based on precise numbers. It is unrealistic to represent continuous real variables in the form of precise numbers or vectors, because exact measurements of real continuous variables are not possible; such measurements are more or less fuzzy. Several books and research papers deal with fuzzy observations, for example [Klir and Yuan 1995; Lee 2005; Viertl and Hareter 2006; Huang et al. 2006; Wu 2009]. Survival time is a non-negative valued variable, and it is shown in [Viertl 2009] that life time observations are not precise numbers but more or less fuzzy. Therefore, for the analysis of life times, fuzzy number approaches are more suitable and realistic than classical statistical tools. For fuzzy life time observations, a generalized Kaplan-Meier estimator is proposed in this paper.

5. Fuzzy Numbers

Let $t^*$ be a fuzzy observation with a so-called characterizing function $\xi(\cdot)$, which is a function of one real variable obeying the following:

1. $\xi : \mathbb{R} \to [0;1]$.
2. For all $\delta \in (0;1]$ the so-called δ-cut $C_\delta(t^*) = \{t \in \mathbb{R} : \xi(t) \ge \delta\}$ is a finite union of compact intervals $[a_{\delta,j}; b_{\delta,j}]$, i.e. $C_\delta(t^*) = \bigcup_{j=1}^{k_\delta} [a_{\delta,j}; b_{\delta,j}] \ne \emptyset$.
3. The support of $\xi(\cdot)$ is bounded, i.e. $\mathrm{supp}[\xi(\cdot)] = \{t \in \mathbb{R} : \xi(t) > 0\} \subseteq [a; b]$.

The set of all fuzzy numbers is denoted by $\mathcal{F}(\mathbb{R})$. If all δ-cuts of a fuzzy number are non-empty closed bounded intervals, the corresponding fuzzy number is called a fuzzy interval.
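The δ-cut definition above can be made concrete with a small sketch. It assumes a triangular characterizing function, which is only one convenient shape and is not prescribed by the paper; the function names `triangular` and `delta_cut` and the parameters `left`, `mode`, `right` are hypothetical and chosen for illustration.

```python
from typing import Callable, Tuple


def triangular(left: float, mode: float, right: float) -> Callable[[float], float]:
    """Return a triangular characterizing function xi (an assumed shape).

    xi rises linearly from 0 at `left` to 1 at `mode` and falls back to 0 at `right`.
    """
    if not left < mode < right:
        raise ValueError("need left < mode < right")

    def xi(t: float) -> float:
        if t <= left or t >= right:
            return 0.0
        if t <= mode:
            return (t - left) / (mode - left)
        return (right - t) / (right - mode)

    return xi


def delta_cut(left: float, mode: float, right: float, delta: float) -> Tuple[float, float]:
    """Return the delta-cut {t : xi(t) >= delta} of the triangular fuzzy number.

    For this shape every delta-cut with 0 < delta <= 1 is one compact interval,
    so the fuzzy number is a fuzzy interval in the sense defined above.
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0; 1]")
    lower = left + delta * (mode - left)
    upper = right - delta * (right - mode)
    return lower, upper
```

For a fuzzy life time "about 10" encoded as `(8, 10, 12)`, the 0.5-cut is the interval from 9 to 11, and the 1-cut shrinks to the single point 10.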

6. Fuzzy Vectors

An n-dimensional fuzzy vector $\underline{t}^*$ is determined by its so-called vector-characterizing function $\zeta(\cdot, \dots, \cdot)$, which is a real function of $n$ real variables $t_1, t_2, \dots, t_n$ obeying the following three conditions:

1. $\zeta : \mathbb{R}^n \to [0;1]$.
2. For all $\delta \in (0;1]$ the so-called δ-cut $C_\delta(\underline{t}^*) = \{\underline{t} \in \mathbb{R}^n : \zeta(\underline{t}) \ge \delta\}$ is non-empty, bounded, and a finite union of simply connected and closed sets.
3. The support of $\zeta(\cdot, \dots, \cdot)$, defined by $\mathrm{supp}[\zeta(\cdot, \dots, \cdot)] = \{\underline{t} \in \mathbb{R}^n : \zeta(\underline{t}) > 0\}$, is a bounded set.

The set of all n-dimensional fuzzy vectors is denoted by $\mathcal{F}(\mathbb{R}^n)$.

Let $T$ be a stochastic quantity with observation space $M_T \subseteq [0; \infty)$, and consider a sample of size $n$, i.e. $t_1, t_2, \dots, t_n$. Each $t_i$ is an element of the observation space, and $(t_1, t_2, \dots, t_n)$ is an element of the so-called sample space $M_T^n$, which is the Cartesian product of $n$ copies of $M_T$, i.e. $M_T^n = M_T \times M_T \times \dots \times M_T$. In the case of fuzzy observations, on the other hand, each fuzzy observation $t_i^*$, $i = 1, \dots, n$, with characterizing function $\xi_i(\cdot)$ is a fuzzy element of $M_T$, but $(t_1^*, t_2^*, \dots, t_n^*)$ is not a fuzzy element of $M_T^n$.

In order to generalize the Kaplan-Meier estimator, the aggregation of the fuzzy observations into a fuzzy element of the sample space is necessary. To construct a fuzzy element (fuzzy vector) of the sample space $M_T^n$, usually the so-called minimum t-norm is used. Applying the minimum t-norm to obtain the vector-characterizing function of the combined fuzzy sample $\underline{t}^*$, i.e.

$$\zeta(t_1, t_2, \dots, t_n) = \min\{\xi_1(t_1), \xi_2(t_2), \dots, \xi_n(t_n)\} \quad \forall\, (t_1, t_2, \dots, t_n) \in \mathbb{R}^n,$$

a fuzzy element of $M_T^n \subseteq \mathbb{R}^n$ is obtained, whose vector-characterizing function is $\zeta(\cdot, \dots, \cdot)$.

Remark: The δ-cuts of the combined fuzzy sample are obtained as the Cartesian products of the δ-cuts of the respective fuzzy observations, i.e.

$$C_\delta[\zeta(\cdot, \dots, \cdot)] = \times_{i=1}^{n} C_\delta[\xi_i(\cdot)] \quad \forall\, \delta \in (0;1]$$

[Viertl 2011].

Extension Principle: This is the generalization of an arbitrary function $g : M \to N$ for a fuzzy argument value $a^*$ in $M$. Let $a^*$ be a fuzzy element of $M$ with membership function $\mu : M \to [0;1]$; then the fuzzy value $y^* = g(a^*)$ is the fuzzy element $y^*$ in $N$ whose membership function $\theta(\cdot)$ is defined by

$$\theta(y) := \begin{cases} \sup\{\mu(a) : a \in M,\ g(a) = y\} & \text{if } \exists\, a : g(a) = y \\ 0 & \text{if } \nexists\, a : g(a) = y \end{cases} \quad \forall\, y \in N$$

[Klir and Yuan 1995].

Theorem: For a continuous function $f : \mathbb{R} \to \mathbb{R}$ and for a fuzzy interval $t^*$ the following holds true:

$$C_\delta[f(t^*)] = \left[\min_{t \in C_\delta(t^*)} f(t)\ ;\ \max_{t \in C_\delta(t^*)} f(t)\right] \quad \forall\, \delta \in (0;1],$$

where $\min_{t \in C_\delta(t^*)} f(t)$ determines the lower end of the δ-cut, and $\max_{t \in C_\delta(t^*)} f(t)$ determines the upper end of the δ-cut of the fuzzy value $f(t^*)$ [Viertl 2011].

Examples of characterizing functions of fuzzy life times are depicted in Figure 2.

Figure 2. Fuzzy sample (vertical axis: $\xi_i(t)$, from 0 to 1; horizontal axis: t (time), from 0 to 100). Source: own elaboration.

For the generalized Kaplan-Meier estimator $\hat{S}^*(t)$, upper and lower δ-level curves are obtained with the help of δ-cuts from the above mentioned theorem in the following way:

$$C_\delta(\hat{S}^*(t)) = \left[\min_{\underline{t} \in \times_{i=1}^{n} C_\delta(t_i^*)} \hat{S}(t)\ ;\ \max_{\underline{t} \in \times_{i=1}^{n} C_\delta(t_i^*)} \hat{S}(t)\right] \quad \forall\, \delta \in (0;1],$$

with $\underline{t} = (t_1, t_2, \dots, t_n) \in [0; \infty)^n$, where $\hat{S}(t)$ denotes the classical Kaplan-Meier estimate computed from the precise sample $\underline{t}$.
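For complete (uncensored) samples whose δ-cuts are intervals, the minimum and maximum in the formula above are attained at the interval endpoints. The short derivation below is a sketch under that assumption only. The notation $\hat{S}(t;\underline{t})$ for the classical estimate computed from a precise sample $\underline{t}$, and $l_{\delta,i}$, $u_{\delta,i}$ for the endpoints of $C_\delta(t_i^*)$, is introduced here for illustration and is not taken from the paper.

```latex
% Complete sample: the product-limit estimate (product over the distinct
% observed times t_i <= t) equals the empirical survival function.
\hat{S}(t;\underline{t})
  = \prod_{t_i \le t}\Bigl(1 - \frac{d_i}{n_i}\Bigr)
  = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{t_i > t\}.

% Each indicator 1{t_i > t} is non-decreasing in t_i. Hence, with
% C_\delta(t_i^*) = [l_{\delta,i};\, u_{\delta,i}] for i = 1, ..., n:
\min_{\underline{t} \in \times_{i=1}^{n} C_\delta(t_i^*)} \hat{S}(t;\underline{t})
  = \hat{S}(t;\, l_{\delta,1}, \dots, l_{\delta,n}),
\qquad
\max_{\underline{t} \in \times_{i=1}^{n} C_\delta(t_i^*)} \hat{S}(t;\underline{t})
  = \hat{S}(t;\, u_{\delta,1}, \dots, u_{\delta,n}).
```

Under this assumption two classical Kaplan-Meier computations per δ-level are enough, one on the lower endpoints and one on the upper endpoints. With censored observations the monotonicity argument would need to be re-examined.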

Here $\min_{\underline{t} \in \times_{i=1}^{n} C_\delta(t_i^*)} \hat{S}(t)$ is the lower end of the δ-cut, which defines the lower δ-level curve, and $\max_{\underline{t} \in \times_{i=1}^{n} C_\delta(t_i^*)} \hat{S}(t)$ is the upper end of the δ-cut, which defines the upper δ-level curve.

The above mathematical calculations are made through the following algorithm:

1. The values for δ are taken from 0 to 1 with a fixed increment in (0; 1).
2. For a given value of δ, calculate the δ-cut of the fuzzy combined sample $\underline{t}^*$.
3. Take the minimum and the maximum from the δ-cuts to generate hypothetical classical samples.
4. Calculate the Kaplan-Meier survival probabilities and draw the Kaplan-Meier survival curves for the fixed δ-level.
5. Perform steps 2-4 for each δ from 0 to 1 in steps of the chosen increment.

Example: For the fuzzy life time data given in Figure 2, the lower δ-level curves and upper δ-level curves of the generalized Kaplan-Meier estimator are calculated for δ = 0, 0.2, 0.4, 0.6, 0.8, 1. They are depicted in Figure 3.

Figure 3. Generalized Kaplan-Meier estimator for the fuzzy sample from Figure 2 (vertical axis: S(t), from 0 to 1; horizontal axis: time, from 0 to 100). Source: own elaboration.
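The algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes complete (uncensored) observations with triangular characterizing functions encoded as hypothetical `(left, mode, right)` triples, and it takes δ = 0 to mean the closed support. For complete samples the classical estimate does not decrease when any observation is increased, so the sample of lower endpoints gives the lower δ-level curve and the sample of upper endpoints gives the upper one. The data in the demo are made up.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding of a fuzzy life time: triangular (left, mode, right).
Triangle = Tuple[float, float, float]


def km_survival(times: Sequence[float], t: float) -> float:
    """Classical Kaplan-Meier estimate at time t for complete (uncensored) data."""
    surv = 1.0
    for ti in sorted(set(times)):
        if ti > t:
            break
        n_i = sum(1 for x in times if x >= ti)  # number at risk at ti
        d_i = sum(1 for x in times if x == ti)  # number of deaths at ti
        surv *= 1.0 - d_i / n_i
    return surv


def delta_cut(obs: Triangle, delta: float) -> Tuple[float, float]:
    """Interval delta-cut of a triangular fuzzy number.

    delta = 0 is treated as the closed support (an assumption of this sketch).
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0; 1]")
    left, mode, right = obs
    return left + delta * (mode - left), right - delta * (right - mode)


def km_delta_level(sample: Sequence[Triangle], delta: float, t: float) -> Tuple[float, float]:
    """Lower and upper ends of the delta-cut of the generalized estimate at time t."""
    cuts = [delta_cut(obs, delta) for obs in sample]       # step 2
    lower_sample = [lo for lo, _ in cuts]                  # step 3: minimum of each cut
    upper_sample = [hi for _, hi in cuts]                  # step 3: maximum of each cut
    return km_survival(lower_sample, t), km_survival(upper_sample, t)  # step 4


if __name__ == "__main__":
    # Made-up fuzzy sample, for illustration only.
    fuzzy_sample: List[Triangle] = [
        (8, 10, 12), (17, 20, 23), (27, 30, 33), (36, 40, 44), (45, 50, 55),
    ]
    for delta in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):           # steps 1 and 5
        curve = [km_delta_level(fuzzy_sample, delta, t) for t in range(0, 61, 10)]
        print(delta, curve)
```

Evaluating `km_delta_level` on a grid of time points for each δ yields the pairs of step functions that a plot such as Figure 3 would show. At δ = 1 the two curves coincide whenever every characterizing function attains the value 1 at a single point.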

The generalized estimated survival curve (generalized Kaplan-Meier estimator) is depicted in Figure 3. The functions shown are the lower and upper δ-level curves defined by the considered δ-levels.

7. Conclusion

The precise measurement of a continuous variable is impossible. Survival time observations are usually assumed to be precise numbers. However, these observations are of a continuous nature, and therefore survival time observations are more or less fuzzy. Consequently, fuzzy numbers are more suitable and realistic for describing real survival times. In this study, the classical Kaplan-Meier estimator based on precise observations is generalized for fuzzy life time observations.

References

Deshpande J.V., Purhit S.G., Life Time Data: Statistical Models and Methods, World Scientific Publishing, Singapore 2005.
Huang H.-Z., Zuo M.J., Sun Z.-Q., Bayesian reliability analysis for fuzzy lifetime data, Fuzzy Sets and Systems 2006, Vol. 157(12), pp. 1674-1686.
Kaplan E.L., Meier P., Nonparametric estimation from incomplete observations, Journal of the American Statistical Association 1958, Vol. 53(282), pp. 457-481.
Klir G., Yuan B., Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice Hall, Upper Saddle River 1995.
Lee E.T., Wang J.W., Statistical Methods for Survival Data Analysis, Wiley, New Jersey 2003.
Lee K.H., First Course on Fuzzy Theory and Applications, Springer, Heidelberg 2005.
Viertl R., On reliability estimation based on fuzzy lifetime data, Journal of Statistical Planning and Inference 2009, Vol. 139(5), pp. 1750-1755.
Viertl R., Statistical Methods for Fuzzy Data, Wiley, Chichester 2011.
Viertl R., Hareter D., Beschreibung und Analyse unscharfer Information: Statistische Methoden für unscharfe Daten, Springer, Wien 2006.
Wu H.-C., Statistical confidence intervals for fuzzy data, Expert Systems with Applications 2009, Vol. 36(2), pp. 2670-2676.

UOGÓLNIONY ESTYMATOR KAPLANA-MEIERA DLA ROZMYTEGO CZASU PRZEŻYCIA (Generalized Kaplan-Meier estimator for fuzzy survival time)

Summary: Survival analysis is defined as a set of research methods used to determine the time until a specified (random) event occurs. In particular, such an event is the death of a person. The Kaplan-Meier method is used to estimate survival time. In 2009 Viertl showed that life time cannot be determined precisely and proposed the use of fuzzy numbers. This article proposes a generalized Kaplan-Meier estimator that uses fuzzy observations.

Keywords: fuzzy numbers, Kaplan-Meier estimators, non-precise data, survival time.