Product Integration

Richard D. Gill
Mathematical Institute, University of Utrecht, Netherlands
EURANDOM, Eindhoven, Netherlands
August 9, 2001

Abstract

This is a brief survey of product-integration for biostatisticians.

1 Product-Integration

Product-integration was introduced more than 110 years ago by the Italian mathematician Vito Volterra, as a tool in the solution of a certain class of differential equations. It was studied intensively by mathematicians for half a century, but finally the subject became unfashionable and lapsed into obscurity. That is a pity, since ideas of product-integration make a very natural appearance in survival analysis, and the development of this subject (in particular, of the Kaplan-Meier estimator) could have been a lot smoother if product-integration had been a familiar topic from the start. The Kaplan-Meier estimator is the product-integral of the Nelson-Aalen estimator of the cumulative hazard function; these two estimators bear the same relation to one another as the actual survival function and the actual cumulative hazard function. There are many other applications of product-integration in survival analysis, for instance in the study of multi-state processes (connected to the theory of Markov processes), and in the theory of partial likelihood.

Ordinary integration is a generalisation of summation, and properties of integrals are often easily guessed by thinking of them as sums of very, very many terms (all or most of them being very small). Similarly, product-integration generalises the taking of products; a product-integral is a product of many, many terms (all or most of them being very close to the number 1). Thinking of product-integrals in this simplistic way is actually very helpful. Properties of product-integrals are easy to guess and to understand. The theory of product-integration can be a great help in studying the statistical properties of statistical quantities which explicitly or implicitly are defined in terms of product-integrals.

Before defining product-integrals in general and exhibiting some of their properties, we will discuss the relation, in survival analysis, between survival
function and hazard function. This will lead us naturally to the notion of product-integration in the most simple possible of contexts.

Consider a survival time $T$ with survival function $S(t) = \Pr(T > t)$, $t \ge 0$; $S(0) = 1$. Suppose $T$ is continuously distributed with a density $f(t)$ and a hazard rate $\alpha(t)$. These two functions have intuitive probabilistic meanings: for a small time interval $[t, t + h]$, the unconditional probability $\Pr(t \le T \le t + h) \approx f(t)\,h$, while the conditional probability $\Pr(t \le T \le t + h \mid T \ge t) \approx \alpha(t)\,h$. In fact, the probability density is $f(t) = -(d/dt)S(t)$ while the hazard rate is $\alpha(t) = f(t)/S(t)$.

One can mathematically recover the distribution function $F(t) = 1 - S(t)$ from the density by integration: $F(t) = \int_0^t f(s)\,ds$. Also one can recover the survival function from the hazard rate. Noting that $\alpha(t) = -(d/dt)\log S(t)$, one finds by integration (and using that $S(0) = 1$, hence $\log S(0) = 0$) that $\log S(t) = -\int_0^t \alpha(s)\,ds$, hence

$$S(t) = \exp\Bigl(-\int_0^t \alpha(s)\,ds\Bigr).$$

This is simple enough, but neither the result nor its derivation has a probabilistic interpretation.

If the survival time $T$ had a discrete distribution, one would introduce the discrete density $f(t) = \Pr(T = t)$ and the discrete hazard $\alpha(t) = \Pr(T = t \mid T \ge t) = f(t)/S(t-)$. Still the survival function can be recovered from both density and hazard, but the formula in the latter case now seems quite different:

$$S(t) = \prod_{s \le t} \bigl(1 - \alpha(s)\bigr).$$

The continuous case formula $S(t) = \exp(-\int_0^t \alpha(s)\,ds)$ has therefore two major defects: firstly, it does not have any intuitive interpretation; and secondly, it gives the wrong generalisation to the discrete case.

Here is how both formulas can be unified and made intuitively interpretable. Define the cumulative hazard $A(t)$ by, in the continuous case, $A(t) = \int_0^t \alpha(s)\,ds$, and in the discrete case, $A(t) = \sum_{s \le t} \alpha(s)$. (These two formulas are special cases of the completely general expression $A(t) = -\int_0^t dS(s)/S(s-)$.) Now we can write, both in the continuous and the discrete case,

$$S(t) = \prod_{0}^{t} \bigl(1 - dA(s)\bigr) \tag{1}$$

which can be interpreted as the product, over many small time intervals $[s, s + ds]$ making up the interval $[0, t]$, of the probability $(1 - dA(s))$.
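Equation (1) can be checked numerically. The following is a minimal sketch, not part of the original survey: it multiplies the factors $1 - (A(t_i) - A(t_{i-1}))$ over a fine uniform partition and compares the result with the exponential formula in a continuous case and with the finite product in a discrete case. The hazard $\alpha(s) = s/2$, the discrete hazard values, and all function names are assumptions chosen purely for illustration.

```python
import math


def product_integral(cum_hazard, t, n_steps):
    """Approximate the product-integral of (1 - dA) over [0, t]
    by a finite product on a uniform partition with n_steps intervals."""
    surv = 1.0
    for i in range(1, n_steps + 1):
        left = t * (i - 1) / n_steps
        right = t * i / n_steps
        surv *= 1.0 - (cum_hazard(right) - cum_hazard(left))
    return surv


# Continuous case (assumed example): alpha(s) = s / 2, so A(t) = t**2 / 4.
def continuous_cum_hazard(t):
    return 0.25 * t * t


t = 2.0
exact = math.exp(-continuous_cum_hazard(t))            # exp(-A(t))
approx = product_integral(continuous_cum_hazard, t, 100_000)

# Discrete case (assumed example): hazard mass only at times 1, 2 and 3.
discrete_hazard = {1: 0.2, 2: 0.5, 3: 0.25}


def discrete_cum_hazard(t):
    return sum(a for s, a in discrete_hazard.items() if s <= t)


discrete_exact = (1 - 0.2) * (1 - 0.5)                 # product over s <= 2
discrete_approx = product_integral(discrete_cum_hazard, 2.0, 1000)

print(exact, approx)
print(discrete_exact, discrete_approx)
```

In the continuous case the two numbers agree only approximately, and the gap shrinks as the partition is refined; in the discrete case every factor away from a jump equals 1, so the partition product reduces to the finite product.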
Since the hazard $dA(s)$ can be thought of as the probability of dying in the interval from $s$ to $s + ds$ given survival up to the beginning of that time interval, 1 minus the hazard is the probability of surviving through the small time interval given survival up to its start. Multiplying over the small time intervals making up $[0, t]$ yields the unconditional probability of surviving past $t$; in other words, equation (1) is just the limiting form of the equality

$$\Pr(T > t) = \prod_{i=1}^{k} \Pr(T > t_i \mid T > t_{i-1}) = \prod_{i=1}^{k} \bigl(1 - \Pr(T \le t_i \mid T > t_{i-1})\bigr)$$

where $0 = t_0 < t_1 < \cdots < t_k = t$ is a partition of the time interval $[0, t]$.

Consider now the statistical problem of estimating the survival curve $S(t)$ given a sample of independently censored survival times. Let $t_1 < t_2 < \cdots$
denote the distinct times when deaths are observed; let $r_j$ denote the number of individuals at risk just before time $t_j$ and let $d_j$ denote the number of observed deaths at time $t_j$. We estimate the cumulative hazard function $A$ corresponding to $S$ with the Nelson-Aalen estimator

$$\hat A(t) = \sum_{t_j \le t} \frac{d_j}{r_j}.$$

This is a discrete cumulative hazard function, corresponding to the discrete estimated hazard $\hat\alpha(t_j) = d_j / r_j$, with $\hat\alpha(t)$ zero for $t$ not an observed death time. The product-integral of $\hat A$ is then

$$\hat S(t) = \prod_{0}^{t} \bigl(1 - d\hat A\bigr) = \prod_{t_j \le t} \Bigl(1 - \frac{d_j}{r_j}\Bigr),$$

which is nothing else than the Kaplan-Meier estimator.

The actual definition of the product-integral in (1) is the following:

$$\prod_{0}^{t} \bigl(1 - dA(s)\bigr) = \lim_{\max |t_i - t_{i-1}| \to 0} \prod_{i} \bigl(1 - (A(t_i) - A(t_{i-1}))\bigr)$$

where the limit is taken over a sequence of ever finer partitions $0 = t_0 < t_1 < \cdots < t_k = t$ of the time interval $[0, t]$.

From this point we can choose either to study properties of the product-integral or define it in greater generality. Both aspects are important in applications. Let us first give a more general definition. The important generalisation is that we will define product-integrals of matrix-valued functions, rather than just scalar valued functions. The concept now really comes into its own, because when we multiply a sequence of matrices together the result will generally depend on the order in which the matrices are taken. Even in the continuous case there will not be a simple exponential formula expressing the result in terms of an ordinary integral. Multiplying products of matrices turns up in the theory of Markov processes, and this connects directly to the statistical analysis of multi-state models in survival analysis.

Suppose $X(t)$ is a $p \times p$ matrix-valued function of time. Suppose also $X$ (or if you like, each component of $X$) is right continuous with left hand limits. Let $I$ denote the identity matrix. The product-integral of $X$ over the interval $[0, t]$ is now defined as

$$\prod_{0}^{t} \bigl(I + dX(s)\bigr) = \lim_{\max |t_i - t_{i-1}| \to 0} \prod_{i} \bigl(I + (X(t_i) - X(t_{i-1}))\bigr)$$

where as always the limit is taken over a sequence of ever finer partitions $0 = t_0 < t_1 < \cdots < t_k = t$ of the time interval $[0, t]$.
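The matrix definition can be explored numerically by multiplying out the factors over a fine partition. The sketch below is illustrative only: the $2 \times 2$ function $X(t)$ with entries $X_{12}(t) = t$, $X_{21}(t) = t^2$ and zero diagonal, and the helper names, are assumptions and do not come from the text. It approximates the product-integral on two partitions of different fineness, and also multiplies the same factors in the reverse order to show that the order of the factors matters.

```python
def mat_mul(a, b):
    """Plain matrix product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def x_of_t(t):
    # Assumed example: X(t) = t * B + t**2 * C with non-commuting B and C.
    return [[0.0, t], [t * t, 0.0]]


def matrix_product_integral(x, t, n_steps, reverse=False):
    """Approximate the product-integral of (I + dX) over [0, t].

    Factors for earlier intervals stand on the left; reverse=True puts
    them on the right instead, purely to show the order dependence."""
    n = len(x(0.0))
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for step in range(1, n_steps + 1):
        x_left = x(t * (step - 1) / n_steps)
        x_right = x(t * step / n_steps)
        factor = [[(1.0 if i == j else 0.0) + x_right[i][j] - x_left[i][j]
                   for j in range(n)] for i in range(n)]
        result = mat_mul(factor, result) if reverse else mat_mul(result, factor)
    return result


coarse = matrix_product_integral(x_of_t, 1.0, 2000)
fine = matrix_product_integral(x_of_t, 1.0, 4000)
reversed_order = matrix_product_integral(x_of_t, 1.0, 4000, reverse=True)

print(coarse[0][0], fine[0][0], reversed_order[0][0])
```

The first two printed values are close to one another, while the third differs markedly, which is the non-commutativity referred to above.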
For the limit to exist, $X$ has to be of bounded variation; equivalently, each component of $X$ is the difference of two increasing functions.
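Returning to the scalar estimators described earlier in this section, the relation between the Nelson-Aalen and Kaplan-Meier estimators can be sketched in a few lines. This is a minimal illustration under assumed data: the six observation times, the censoring indicators, and the function names are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
from collections import Counter


def death_table(times, events):
    """Return [(t_j, d_j, r_j)]: distinct observed death times, number of
    deaths at t_j, and number at risk just before t_j."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    table = []
    for t_j in sorted(deaths):
        at_risk = sum(1 for t in times if t >= t_j)
        table.append((t_j, deaths[t_j], at_risk))
    return table


def nelson_aalen(times, events, t):
    """Sum of d_j / r_j over observed death times t_j <= t."""
    return sum(d / r for t_j, d, r in death_table(times, events) if t_j <= t)


def kaplan_meier(times, events, t):
    """Product of (1 - d_j / r_j) over observed death times t_j <= t,
    i.e. the product-integral of the Nelson-Aalen estimator."""
    surv = 1.0
    for t_j, d, r in death_table(times, events):
        if t_j <= t:
            surv *= 1.0 - d / r
    return surv


# Hypothetical sample: event = 1 is an observed death, 0 is a censoring.
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]

na_at_5 = nelson_aalen(times, events, 5)   # 1/6 + 1/5 + 1/3
km_at_5 = kaplan_meier(times, events, 5)   # (5/6) * (4/5) * (2/3)
print(na_at_5, km_at_5)
```

The increments of the Nelson-Aalen estimate are exactly the quantities subtracted from 1 in the Kaplan-Meier factors, which is the discrete form of equation (1).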
2 Application to Markov processes

We briefly sketch the application of product-integration to Markov processes. Suppose an individual moves between $p$ different states as time proceeds, staying in each state for some random length of time and then jumping to another. Suppose the individual has intensity $\alpha_{ij}(t)$ of jumping from state $i$ to state $j$ at time $t$, given the whole past history (in other words, the process is Markov: the intensity only depends on the present time and the present state). Define cumulative intensities $A_{ij}(t) = \int_0^t \alpha_{ij}(s)\,ds$ and negative total cumulative intensity of leaving a state $A_{ii} = -\sum_{j \ne i} A_{ij}$. Collect these into a square matrix valued function of time $A$. Then one can show that the matrix of transition probabilities $P(0, t)$, whose $ij$ component is the probability of being in state $j$ at time $t$ given the individual started at time $0$ in state $i$, is given by a product-integral of $A$:

$$P(0, t) = \prod_{0}^{t} \bigl(I + dA(s)\bigr).$$

This formula generalises the usual formula for transition probabilities of a discrete time Markov chain, since the matrix $(I + dA(s))$ can be thought of as the transition probability matrix for the small time interval $[s, s + ds]$.

Given possibly censored observations from a Markov process, one can estimate the elements of the matrix of cumulative intensities $A$ by Nelson-Aalen estimators. The corresponding estimate of the transition probabilities, the Aalen-Johansen estimator, is found by taking the product-integral of $\hat A$.

3 Mathematical properties

A very obvious property of product-integration is its multiplicativity. Defining the product-integral over an arbitrary time interval in the natural way, we have for $0 < s < t$

$$\prod_{0}^{t} (I + dX) = \prod_{0}^{s} (I + dX) \, \prod_{s}^{t} (I + dX).$$

We can guess many other useful properties of product-integrals by looking at various simple identities for finite products. For instance, it is often important to study the difference between two product-integrals. Now if $a_1, \ldots, a_k$ and $b_1, \ldots, b_k$ are two sequences of numbers, we have the identity:

$$\prod_{i} (1 + a_i) - \prod_{i} (1 + b_i) = \sum_{j} \prod_{i < j} (1 + a_i) \, (a_j - b_j) \prod_{i > j} (1 + b_i).$$

This can be easily proved by replacing the middle term on the right, $(a_j - b_j)$, by $(1 + a_j) - (1 + b_j)$.
Expanding about this difference, the right hand side becomes

$$\sum_{j} \Bigl( \prod_{i \le j} (1 + a_i) \prod_{i > j} (1 + b_i) - \prod_{i \le j-1} (1 + a_i) \prod_{i > j-1} (1 + b_i) \Bigr).$$
This is a telescoping sum; writing out the terms one by one the whole expression collapses to the two outside products, giving the left hand side of the identity. The same manipulations work for matrices. In general it is therefore no surprise, replacing sums by integrals and products by product-integrals, that

$$\prod_{0}^{t} (I + dX) - \prod_{0}^{t} (I + dY) = \int_{s=0}^{t} \prod_{0}^{s-} (I + dX) \, \bigl(dX(s) - dY(s)\bigr) \prod_{s+}^{t} (I + dY).$$

This valuable identity is called the Duhamel equation.

As an example, consider the scalar case, let $A$ be a cumulative hazard function and $\hat A$ the Nelson-Aalen estimator based on a sample of censored survival times. Let $S$ be the corresponding survival function and $\hat S$ the Kaplan-Meier estimator. The Duhamel equation then becomes the identity

$$\hat S(t) - S(t) = -\int_{s=0}^{t} \hat S(s-) \, \bigl(d\hat A(s) - dA(s)\bigr) \, \frac{S(t)}{S(s)}$$

which can be exploited to get both small sample and asymptotic results for the Kaplan-Meier estimator.

We illustrate one other important identity in a similar manner. Note that

$$\prod_{i \le j} (1 + a_i) - \prod_{i \le j-1} (1 + a_i) = \prod_{i \le j-1} (1 + a_i) \, a_j.$$

Adding over $j$ from 1 to $k$ gives us

$$\prod_{i \le k} (1 + a_i) - 1 = \sum_{j} \prod_{i \le j-1} (1 + a_i) \, a_j.$$

Now we can guess the identity

$$\prod_{0}^{t} (I + dX) - I = \int_{s=0}^{t} \prod_{0}^{s-} (I + dX) \, dX(s).$$

This is essentially Kolmogorov's forward equation from the theory of Markov processes, and it is the type of equation (solve $Y(t) = I + \int_0^t Y(s-)\,dX(s)$ for unknown $Y$, given $X$) which originally motivated Volterra to invent product-integration. $Y(t) = \prod_{0}^{t} (I + dX)$ is the unique solution of this equation. (It is also just a special case of the Duhamel equation when we take the second integrand $Y$ identically equal to zero.)

4 Concluding remarks

The product-integral seems first to have been used as a fundamental tool in modern survival analysis by Aalen and Johansen (1978), though it also appears in a more informal context in Cox (1972) and in Kalbfleisch and Prentice
(1980). Surveys of the theory of product-integration are given by Gill and Johansen (1990), Gill (1994). The former paper also pays attention to the earlier history of the subject. In particular, it is worth mentioning that a large variety of notations has been used for the product-integral, including large curly P's, product-symbols, and the ordinary integral sign embellished with a half circle over the top.

As well as playing a role in the theory of the Kaplan-Meier and the Aalen-Johansen estimators, the product-integral is also a useful way to write likelihoods and partial-likelihoods in survival analysis, since these can be usefully thought of as continuous products of conditional likelihoods for the data in each new infinitesimal time interval given the past.

The product-integral is also useful in multivariate survival analysis. In particular, Dabrowska's 1988 multivariate product-limit estimator is based on a representation of a multivariate survival function in terms of product-integrals of a collection of higher-dimensional joint and conditional hazard functions.

The book Andersen et al. (1993) gives a brief survey of the theory and many detailed applications, covering all the topics mentioned above.

References

Aalen, O. and S. Johansen (1978). An empirical transition matrix for nonhomogeneous Markov chains based on censored observations. Scandinavian Journal of Statistics 5, 141-150.

Andersen, P., Ø. Borgan, R. Gill, and N. Keiding (1993). Statistical Models Based on Counting Processes. New York: Springer-Verlag.

Cox, D. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 34, 187-220.

Dabrowska, D. (1988). Kaplan-Meier estimate on the plane. Annals of Statistics 16, 1475-1489.

Gill, R. (1994). Lectures on survival analysis. In P. Bernard (Ed.), Lectures on Probability Theory (Ecole d'Été de Probabilités de Saint Flour XXII - 1992), Berlin, pp. 115-241. Springer-Verlag (SLNM 1581).

Gill, R. and S. Johansen (1990). A survey of product-integration with a view towards application in survival analysis.
Annals of Statistics 18, 1501-1555.

Kalbfleisch, J. and R. Prentice (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.