Practical tools for survival analysis with heterogeneity-induced competing risks


J. van Baardewijk^a, H. Garmo^b, M. van Hemelrijck^b, L. Holmberg^b, A.C.C. Coolen^{a,c}

a Institute for Mathematical and Molecular Biomedicine, King's College London
b Cancer Epidemiology Group, King's College London
c London Institute for Mathematical Sciences

June 2013

When censoring by non-primary risks is informative, many conventional survival analysis methods are not applicable. The observed primary risk hazard rates are no longer estimators of what they would have been in the absence of other risks. Recovering the decontaminated primary hazard rates and survival functions from survival data is called the competing risk problem. Most competing risks studies assume implicitly that risk correlations are induced by cohort or disease heterogeneity that was not captured by covariates. If in addition one assumes that proportional hazards holds at the level of individuals, for all risks, one obtains a generic statistical description that allows us to handle the competing risk problem, and from which Cox regression, frailty and random effects models, and latent class models can all be recovered in special limits. From this we derive new practical tools for epidemiologists, such as formulae for decontaminated primary risk hazard rates and survival functions, and for retrospective assignment of patients to cohort sub-classes (if these exist). Synthetic data confirm that our approach can map a cohort's substructure, and remove heterogeneity-induced false protectivity and false exposure effects. Application to survival data, with prostate cancer as the primary risk (the ULSAM study), leads to plausible alternative explanations for previous counter-intuitive inferences.

Keywords: survival analysis; heterogeneity; competing risks

Contents

1 Introduction
2 Definitions and general identities
  2.1 Survival probability and crude cause-specific hazard rates
  2.2 Decontaminated cause-specific risk measures; the competing risk problem
3 Heterogeneity-induced competing risks
  3.1 Connection between cohort level and individual level descriptions
  3.2 Heterogeneous cohorts and different levels of risk complexity
  3.3 Implications of having heterogeneity-induced competing risks
  3.4 Canonical level of description for resolving heterogeneity-induced competing risks
  3.5 Estimation of W[h_0,\ldots,h_R|z] from survival data
4 Parametrisation of W[h_0,\ldots,h_R|z]
  4.1 Generic parametrisation
  4.2 Connection with conventional regression methods
  4.3 A simple latent class parametrisation for heterogeneity-induced competing risks
5 Application to synthetic survival data
  5.1 Cohort substructure and regression parameters
  5.2 Decontaminated survival functions
  5.3 Retrospective class identification
6 Applications to prostate cancer data
  6.1 Cohort substructure and regression parameters
  6.2 Decontaminated survival curves
7 Discussion
A Connection between cohort level and individual level cause-specific hazard rates
B Equivalence of formulae for data likelihood in terms of W[h_0,\ldots,h_R|z]
C Connection with standard regression methods
D Numerical details

1 Introduction

For general introductions to the survival analysis literature we refer to the textbooks [1, 2, 3, 4]. The competing risk problem is the question of how to handle contamination by informative censoring of those primary risk characteristics that may be inferred from survival data [5], such as cause-specific hazard rates and survival curves. One would like to know their values in the hypothetical situation where all non-primary risks were disabled. This is nontrivial, since disabling non-primary risks will generally affect the hazard rate of the primary risk. If all risks have statistically independent event times, censoring is not informative and simple methods are available for analysis and regression, such as those of [6, 7]. Unfortunately, one cannot infer the presence or absence of risk correlations from survival data alone [8], and in many cases the independence assumption is expected to be incorrect. The importance of having reliable epidemiological tools for isolating statistical features even for interrelated comorbid diseases is increasingly recognised [9]. Unaccounted-for risk correlations can lead to incorrect inferences [10, 11, 12, 13, 14], and simulations with synthetic data confirm that uncritical use of simple methods can be quite misleading; see e.g. Figure 1.

Risk correlations are often fingerprints of residual heterogeneity, i.e. heterogeneity that is not visible in the covariates. A primary and a secondary disease could share molecular pathways, or be jointly influenced by factors that were not observed. Or a given disease could in fact be a spectrum of distinct diseases, each with specific covariate associations. Many authors have tried to model residual cohort heterogeneity, usually starting from Cox-type cause-specific hazard rates, but with additional individualised risk multipliers. If the multipliers do not depend on the covariates we speak of frailty models, e.g. [15, 16, 17, 18, 19], and regard them as representing the impact of unobserved covariates, see e.g. [20]. If they depend on the covariates we speak of random effects models, e.g. [21, 11, 22, 23, 24]. If the distribution of frailty factors takes the form of discrete clusters (latent classes, [25]), we obtain the latent class models; see e.g. [26] or [27] (which combines frailty and random effects with covariate-dependent class allocation as in [28]). Further variations include time-dependent frailty factors, and models in which the latent class of each individual is known. Most frailty and random effects studies, however, quantify only the hazard rate of the primary risk. They thereby capture some consequences of cohort heterogeneity, but without modelling also the non-primary risks it is fundamentally impossible to deal with the competing risk problem.

The approach of [29] focuses on parametrising the covariate-conditioned cumulative incidence function of the primary risk. It is conceptually similar to [7]; both model the primary risk profile in the presence of all risks. Cumulative incidence functions appear more intuitive than hazard rates; they are directly measurable, and incorporate also the impact of non-primary risks. However, expressing the data likelihood in terms of cumulative incidence functions is more cumbersome than in terms of hazard rates. And while [29] quantify risks that compete, they do not address the competing risk problem. Further developments involve e.g.
alternative parametrisations [30, 31], application to the cumulative incidence of non-primary risks [32], and the inclusion of frailty factors [33]. Another community of authors has focused on identifying which mathematical constraints or conditions need to be imposed on multi-risk survival analysis models in order to circumvent Tsiatis' identifiability problem, and infer the joint event time distribution unambiguously from survival data. Examples involving survival data with covariates are [34] and [35]. However, these studies too do not take the step towards decontamination tools.

So we face the unsatisfactory situation of multiple distinct approaches to cohort heterogeneity and competing risks. Only a few address the competing risk problem, which requires modelling all risks and their correlations. None give formulae for decontaminated primary risk measures.

[Figure 1 near here: two panels, "primary risk only" (left) and "primary & secondary risk" (right), showing Kaplan-Meier estimators S_1^{KM} versus time t for the upper and lower covariate quartiles (UQ, LQ).]

Figure 1: Illustration of the dangers of using Kaplan-Meier (KM) estimators [6] in the presence of competing risks. We show KM estimators S_1^{KM} of the primary risk survival function, for upper and lower quartiles (UQ, LQ) of covariate 1, for two data sets whose primary risk characteristics are identical. They differ in that on the left only the primary risk is active, whereas on the right a second risk is activated which causes informative censoring. The KM estimators on the right suggest a strong association of the primary risk with covariate 1, which is in fact spurious. What we see is false exposure. One of the aims of this work is to develop practical formulae for decontaminated survival estimators, that for both data sets would report the correct curves (i.e. the ones on the left). Details of these synthetic data are given in a later section.

In this work we try to build a generic statistical description of competing risks and a partial resolution of the competing risk problem that unifies the various schools of thought above. Our work is based on the observation that most papers implicitly assume that correlations between competing risks are induced by residual cohort heterogeneity. We show how this simple and transparent assumption leads in a natural way to a formalism with exact formulae for decontaminated primary risk measures, in which Cox regression, frailty models, random effects models, and latent class models are all included as special cases, and which produces transparent parametrisations of the cumulative incidence function (the language of Fine and Gray).

This report is organised as follows. In section 2 we define the competing risk problem in mathematical terms. We inspect in section 3 the relation between cohort level and individual level statistical descriptions, classify different levels of risk complexity from the competing risk perspective, and define what we mean by heterogeneity-induced competing risks. We derive the implications of having heterogeneity-induced competing risks, and show that the canonical mathematical description involves the covariate-conditioned functional distribution of the individual hazard rates of all risks. In section 4 we work out the theory for a natural family of such parametrisations, that includes conventional methods (Cox regression, frailty, random effects and latent class models) in special limits. In the remaining sections we apply the formalism to synthetic data, and to real survival data from the ULSAM longitudinal study [36, 37], with prostate cancer as the primary risk. This application leads to appealing and transparent new explanations for previously counter-intuitive inferences. We end with a summary of our findings.
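The false exposure effect of Figure 1 is easy to reproduce numerically. The sketch below (ours, in Python with numpy; the class-specific rates are assumptions loosely modelled on the synthetic data of section 5, since Table 1 is not reproduced in this extraction) simulates a two-class cohort in which a secondary risk preferentially censors, at high values of covariate 1, precisely those individuals who are least susceptible to the primary risk. A hand-rolled Kaplan-Meier estimator then reports near-identical UQ and LQ curves when only the primary risk is active, and a spurious covariate association once the secondary risk is switched on.

    import numpy as np

    rng = np.random.default_rng(0)

    def km_at(t_obs, is_primary_event, t_eval=20.0):
        """Product-limit estimate of S(t_eval), treating everything else as censoring."""
        order = np.argsort(t_obs)
        e = is_primary_event[order]
        n = len(t_obs)
        factors = np.where(e, 1.0 - 1.0/(n - np.arange(n)), 1.0)
        S = np.cumprod(factors)
        idx = np.searchsorted(t_obs[order], t_eval)
        return S[idx-1] if idx > 0 else 1.0

    N = 40000
    z = rng.standard_normal(N)
    cls = rng.integers(0, 2, N)                    # two equally likely latent classes
    # Assumed class-specific rates (illustrative, not the paper's Table 1):
    lam1 = 0.05*np.exp(np.where(cls == 0,  2.0, -2.0)*z)   # primary risk
    lam2 = 0.10*np.exp(np.where(cls == 0,  0.0,  2.0)*z)   # secondary risk
    t1 = rng.exponential(1.0/lam1)
    t2 = rng.exponential(1.0/lam2)
    T_END = 50.0                                   # end-of-trial censoring time

    zq25, zq75 = np.quantile(z, [0.25, 0.75])
    for name, mask in [("UQ", z >= zq75), ("LQ", z <= zq25)]:
        # Left panel of Figure 1: primary risk only (end-of-trial censoring only).
        t_a = np.minimum(t1[mask], T_END)
        e_a = t1[mask] < T_END
        # Right panel: secondary risk active, causing informative censoring.
        t_b = np.minimum.reduce([t1[mask], t2[mask], np.full(mask.sum(), T_END)])
        e_b = (t1[mask] < t2[mask]) & (t1[mask] < T_END)
        print(f"{name}: KM S1(20) primary only = {km_at(t_a, e_a):.3f}, "
              f"with competing risk = {km_at(t_b, e_b):.3f}")
    # Primary-only KM gives (near-)identical UQ and LQ values; with the competing
    # risk active the UQ estimate collapses: spurious 'false exposure'.

Treating competing-risk events as noninformative censoring is exactly what the product-limit construction does, which is why the contamination appears.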

2 Definitions and general identities

We recall briefly the basic definitions of survival analysis, and define the competing risk problem in mathematical terms. In doing so we will try to stay as close as possible to the notation conventions and terminology of [2].

2.1 Survival probability and crude cause-specific hazard rates

We imagine a cohort of individuals who are subject to R true risks, labelled by r = 1\ldots R. We use r = 0 to indicate the end-of-trial censoring event, since for the structure of the theory there is no difference between censoring due to alternative risks and censoring due to trial termination. Most of the mathematical relations of survival analysis are derived directly from the joint distribution P(t_0,\ldots,t_R) of event times (t_0,\ldots,t_R), where t_r \geq 0 is the time at which risk r triggers an event.^1 From this distribution follow the crude cause-specific hazard rates, i.e. the probabilities per unit time that given failures occur at time t if until then none of the possible events has yet occurred:

  h_r(t) = \frac{1}{S(t)} \int_0^\infty dt_0 \cdots \int_0^\infty dt_R\; P(t_0,\ldots,t_R)\, \delta(t-t_r) \prod_{r'\neq r} \theta(t_{r'}-t)    (1)

We used the delta-distribution \delta(x), defined by the identity \int dx\, \delta(x)f(x) = f(0), and the step function, defined by \theta(x>0)=1 and \theta(x<0)=0. It is easy to show that the survival function can be written as

  S(t) = e^{-\sum_{r=0}^{R} \int_0^t ds\, h_r(s)}    (2)

The crude cause-specific hazard rates provide the link between theory and observations, since the probability density P(t,r) to find the earliest event occurring at time t and corresponding to risk r is given by

  P(t,r) = h_r(t)\, e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}(s)}    (3)

These relations hold irrespective of whether we have a large or a small cohort, or even a single individual, although the values of P(t_0,\ldots,t_R) would be different. However, at some point we will work simultaneously with cohort level and individual level descriptions, and it will then be necessary to specify with further indices to which we refer.

Conditioning on covariate information is trivial. For simplicity we assume the covariates to be discrete; for continuous covariates one finds similar formulae, with integrals instead of sums. Knowing the values z \in \mathbb{R}^p of p covariates means starting from the distribution P(t_0,\ldots,t_R|z), which gives the event time statistics of the sub-cohort of those individuals i that have covariate vector z_i = z. It is related to the previous distribution of the full cohort via P(t_0,\ldots,t_R) = \sum_z P(t_0,\ldots,t_R|z)P(z), where P(z) gives the fraction of the cohort that have covariates z. We then obtain the following covariate-conditioned survival functions and crude cause-specific hazard rates:

  S(t|z) = \int_0^\infty dt_0 \cdots \int_0^\infty dt_R\; P(t_0,\ldots,t_R|z) \prod_{r=0}^{R} \theta(t_r - t)    (4)

^1 This starting point is not fully general. It assumes that all risks will ultimately lead to failure. One can include events with a finite chance of not happening at any time, by adding for each risk r a binary variable \tau_r to indicate whether or not the calamity button is pressed at time t_r.

  h_r(t|z) = \frac{1}{S(t|z)} \int_0^\infty dt_0 \cdots \int_0^\infty dt_R\; P(t_0,\ldots,t_R|z)\, \delta(t-t_r) \prod_{r'\neq r} \theta(t_{r'}-t)    (5)

with the usual relation between survival and crude hazard rates, and the usual link to observations:

  S(t|z) = e^{-\sum_{r=0}^{R} \int_0^t ds\, h_r(s|z)}    (6)
  P(t,r|z) = h_r(t|z)\, e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}(s|z)}    (7)

If we study a cohort of N individuals, with covariate vectors \{z_1,\ldots,z_N\}, the survival data D usually consist of N samples of event time and event type pairs (t,r), viz. D = \{(t_i,r_i)\}. The probability density for an individual with covariate vector z to report (t,r) is given by (7), so the data likelihood P(D) = \prod_{i=1}^N P(t_i,r_i|z_i) obeys

  \log P(D) = \sum_{r=0}^{R} \sum_{i=1}^{N} \Big\{ \delta_{r,r_i} \log h_r(t_i|z_i) - \int_0^{t_i} dt\; h_r(t|z_i) \Big\}    (8)

2.2 Decontaminated cause-specific risk measures; the competing risk problem

The aim of survival analysis is to extract statistical patterns from the data, that allow us to make predictions for new individuals, if we know their covariates. We are often interested in one specific primary risk. Many relevant risk-specific quantities can be calculated once we know the crude hazard rates. For instance, the cause-specific cumulative incidence function F_r(t), the probability that event r has been observed at any time prior to time t, is

  F_r(t) = \int_0^t dt'\; S(t')\, h_r(t')    (9)

Although F_r(t) refers to risk r specifically, it can be heavily influenced by other risks. If it is small, this may be because event r is intrinsically unlikely, or because it tends to be preceded by events r' \neq r. One cannot tell. To obtain decontaminated information on a primary risk r one must consider the hypothetical situation where all other risks r' \neq r are disabled. This means replacing^2

  P(t_0,\ldots,t_R) \;\to\; P(t_r) \lim_{\Lambda\to\infty} \prod_{r'\neq r} \delta(t_{r'}-\Lambda)    (10)

with the marginal event time distribution P(t_r) = \int\cdots\int [\prod_{s\neq r} dt_s]\, P(t_0,\ldots,t_R). Inserting (10) into (1) gives, as expected, zero values for all non-primary crude hazard rates, but it also affects the value of the primary risk hazard rate. One now finds the following formulae for the decontaminated (conditioned) cause-specific survival function and hazard rate for risk r, indicated with tildes to distinguish them from their crude counterparts:

  \tilde S_r(t) = \int_t^\infty dt_r\; P(t_r),    \tilde h_r(t) = -\frac{d}{dt}\log \tilde S_r(t)    (11)
  \tilde S_r(t|z) = \int_t^\infty dt_r\; P(t_r|z),    \tilde h_r(t|z) = -\frac{d}{dt}\log \tilde S_r(t|z)    (12)

^2 It was noted by [39] that one cannot be sure that this statement is always appropriate; it may be that correlated risks share biochemical pathways such that they can never be deactivated independently.
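As a concrete illustration of the difference between (11,12) and their crude counterparts, consider the following worked example (ours, with assumed constant rates). Let the cohort be an equal mixture of two types of individuals, subject to R = 2 risks with constant individual hazard rates (a,a) for the first type and (b,b) for the second, so that susceptibilities to the two risks are positively correlated at cohort level while the event times are independent for each individual. The marginal is P(t_1) = \frac12(a\,e^{-at_1} + b\,e^{-bt_1}), and (11) gives

  \tilde S_1(t) = \tfrac12\big(e^{-at} + e^{-bt}\big),    \tilde h_1(t) = \frac{a\,e^{-at} + b\,e^{-bt}}{e^{-at} + e^{-bt}}

whereas the crude primary hazard rate, in which the correlated second risk also filters the cohort, comes out as

  h_1(t) = \frac{a\,e^{-2at} + b\,e^{-2bt}}{e^{-2at} + e^{-2bt}}

Both decay from (a+b)/2 towards \min(a,b), but the crude rate decays twice as fast in the exponent, so that \exp[-\int_0^t ds\, h_1(s)] = \sqrt{(e^{-2at}+e^{-2bt})/2} \geq \tilde S_1(t); treating the second risk as noninformative censoring here overstates primary risk survival, i.e. it produces false protectivity.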

In general one will indeed find that \tilde h_r(t) \neq h_r(t) and \tilde h_r(t|z) \neq h_r(t|z). Equations (11,12) tell us that to determine the decontaminated risk measures for the primary risk r we must estimate the marginal distributions P(t_r) or P(t_r|z) from survival data. Tsiatis showed [8] that this is impossible without further assumptions. For every P(t_0,\ldots,t_R) there is an alternative distribution \tilde P(t_0,\ldots,t_R) that describes independent event times, but such that P and \tilde P both generate identical cause-specific hazard rates for all risks:

  \tilde P(t_0,\ldots,t_R) = \prod_{r=0}^{R} \Big( h_r(t_r)\, e^{-\int_0^{t_r} ds\, h_r(s)} \Big)    (13)

in which \{h_r(t)\} are the cause-specific hazard rates of P(t_0,\ldots,t_R). Hence the only information that can be estimated from survival data alone are the (covariate-conditioned) crude cause-specific hazard rates. One cannot calculate P(t_0,\ldots,t_R) or P(t_0,\ldots,t_R|z) and their marginals. Without further information or assumptions there is no way to disentangle the different risks. This is the identifiability problem.

One way out is to assume that all risks are statistically independent, i.e. that P(t_0,\ldots,t_R|z) = \prod_{r=0}^R P(t_r|z). This solves trivially the competing risk problem, since now one finds that \tilde h_r(t|z) = h_r(t|z) for all r, and

  \tilde S_r(t|z) = e^{-\int_0^t ds\, h_r(s|z)}    (14)

This assumption underlies the clinical use of e.g. Cox's proportional hazards regression [7] and Kaplan-Meier estimators of the cause-specific survival function [6], which would otherwise be inappropriate tools (see Figure 1).

3 Heterogeneity-induced competing risks

We work out the consequences of assuming that event time correlations are caused by residual cohort heterogeneity. This is much weaker than assuming risk independence, but still allows us to deal with the competing risk problem.

3.1 Connection between cohort level and individual level descriptions

The standard survival analysis formalism is built solely on the starting point of a joint event time distribution; it can hence also be applied to risk at the level of individuals. Let N be the number of individuals in the cohort to which P(t_0,\ldots,t_R) refers, labelled by i = 1\ldots N. We write the joint event time distribution of individual i in this cohort as P_i(t_0,\ldots,t_R), and the crude cause-specific hazard rates of individual i as h_r^i(t). It then follows that

  h_r^i(t) = \frac{1}{S_i(t)} \int_0^\infty dt_0 \cdots \int_0^\infty dt_R\; P_i(t_0,\ldots,t_R)\, \delta(t-t_r) \prod_{r'\neq r} \theta(t_{r'}-t)    (15)
  S_i(t) = e^{-\sum_{r=0}^{R} \int_0^t ds\, h_r^i(s)}    (16)
  P_i(t,r) = h_r^i(t)\, e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}^i(s)}    (17)

S_i(t) is the survival function of individual i, and P_i(t,r) is the probability that the first event for individual i occurs at time t and corresponds to risk r. When describing a cohort, we have the added

uncertainty of not knowing which individuals were picked from the population, so the connection between the two levels is simply given by

  P(t_0,\ldots,t_R) = \frac{1}{N}\sum_{i=1}^{N} P_i(t_0,\ldots,t_R)    (18)
  P(t_0,\ldots,t_R|z) = \frac{\sum_{i,z_i=z} P_i(t_0,\ldots,t_R)}{\sum_{i,z_i=z} 1}    (19)

For quantities that depend linearly on the joint event time distribution, the link between cohort level and individual level is a simple averaging over the label i, possibly conditioned on covariates, e.g.

  S(t) = \frac{1}{N}\sum_{i=1}^{N} S_i(t),    P(t,r) = \frac{1}{N}\sum_{i=1}^{N} P_i(t,r)    (20)
  S(t|z) = \frac{\sum_{i,z_i=z} S_i(t)}{\sum_{i,z_i=z} 1},    P(t,r|z) = \frac{\sum_{i,z_i=z} P_i(t,r)}{\sum_{i,z_i=z} 1}    (21)

In contrast, quantities such as the crude cause-specific hazard rates depend in a more complicated way on P(t_0,\ldots,t_R), via their conditioning on survival. Cohort level cause-specific hazard rates, for instance, are not direct averages over their individual level counterparts. Instead one finds (see appendix A for details):

  h_r(t) = \frac{\sum_{i=1}^{N} h_r^i(t)\, e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}^i(s)}}{\sum_{i=1}^{N} e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}^i(s)}}    (22)
  h_r(t|z) = \frac{\sum_{i,z_i=z} h_r^i(t)\, e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}^i(s)}}{\sum_{i,z_i=z} e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}^i(s)}}    (23)

3.2 Heterogeneous cohorts and different levels of risk complexity

We always allow our cohorts to be heterogeneous in terms of covariates; we refer here to heterogeneity in the relation between covariates and risks. A homogeneous cohort is one in which this relation is uniform, so the distribution P_i(t_0,\ldots,t_R) can depend on i only via z_i. Hence there exists a function P(t_0,\ldots,t_R|z) such that

  P_i(t_0,\ldots,t_R) = P(t_0,\ldots,t_R|z_i)  for all i    (24)

The same is then true for the cause-specific hazard rates: h_r^i(t) = h_r(t|z_i) for all i, in which h_r(t|z) is related to P(t_0,\ldots,t_R|z) via equations (4,5). It also follows directly from (19) and (23) that the cohort level covariate-conditioned event time distribution and crude hazard rates coincide with these functions, as expected. A property of homogeneous cohorts is that uncorrelated individual level risks, i.e. P_i(t_0,\ldots,t_R) = \prod_{r=0}^{R} P_i(t_r), imply uncorrelated covariate-conditioned cohort level risks. This follows from (19):

  P(t_0,\ldots,t_R|z) = \frac{\sum_{i,z_i=z} \prod_{r=0}^{R} P_i(t_r)}{\sum_{i,z_i=z} 1} = \frac{\sum_{i,z_i=z} \prod_{r=0}^{R} P(t_r|z_i)}{\sum_{i,z_i=z} 1} = \prod_{r=0}^{R} P(t_r|z)    (25)
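The filtering mechanism behind (22,23) is easily made explicit. The following minimal sketch (ours; Python with numpy, with assumed toy hazard values) evaluates (22) for a single risk in a cohort of two equally large groups with constant individual hazard rates: although no individual's hazard depends on time, the cohort-level hazard decays as the high-risk group is removed.

    import numpy as np

    # Two equally large groups with constant individual hazards (assumed values).
    h_i = np.array([1.0, 0.1])
    w   = np.array([0.5, 0.5])

    def cohort_hazard(t):
        # Eq. (22) for a single risk: survival-weighted average of individual rates.
        surv = np.exp(-h_i*t)                 # individual survival factors e^{-h_i t}
        return (w*h_i*surv).sum()/(w*surv).sum()

    for t in [0.0, 1.0, 2.0, 5.0, 10.0]:
        print(f"t = {t:4.1f}   cohort-level hazard = {cohort_hazard(t):.3f}")
    # The printed rate falls from 0.55 at t = 0 towards 0.1: a time dependence
    # generated purely by heterogeneity, with no individual-level counterpart.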

In heterogeneous cohorts (24) does not hold; individuals have further features, not captured by covariates, that impact upon their risks. Here one will observe a gradual 'filtering': high-risk individuals will drop out early, causing time dependencies at cohort level that have no counterpart at individual level. For instance, even if all individuals have stationary hazard rates, one would according to (22,23) still find time dependent crude cohort level hazard rates. Here, having uncorrelated individual level risks no longer implies having uncorrelated covariate-conditioned cohort level risks. One can have P_i(t_0,\ldots,t_R) = \prod_{r=0}^{R} P_i(t_r), but still P(t_0,\ldots,t_R|z) \neq \prod_{r=0}^{R} P(t_r|z). Risk correlations can thus be generated at different levels, and there is a natural hierarchy of cohorts in terms of risk complexity, with implications for the applicability of methods:

Level 1: homogeneous cohort, no competing risks
  individual: P_i(t_0,\ldots,t_R) = \prod_{r=0}^{R} P(t_r|z_i)
  cohort: P(t_0,\ldots,t_R|z) = \prod_{r=0}^{R} P(t_r|z)
The members of the cohort differ in their covariates, but they are homogeneous in terms of the link between covariates and risk. For each individual, the event times of all risks are statistically independent, and their probabilities are determined by the covariates alone. Since there is no residual heterogeneity, there is no competing risk problem; crude and true cause-specific hazard rates and survival functions are identical.

Level 2: heterogeneous cohort, no competing risks
  individual: P_i(t_0,\ldots,t_R) = \prod_{r=0}^{R} P_i(t_r)
  cohort: P(t_0,\ldots,t_R|z) = \prod_{r=0}^{R} P(t_r|z)
For each individual the event times of all risks are statistically independent, but their susceptibilities are no longer determined by the covariates alone (reflecting e.g. disease sub-groups or the impact of unobserved covariates). However, this residual heterogeneity does not manifest itself in risk correlations at cohort level. One will therefore observe heterogeneity-induced effects, such as cohort filtering, but no competing risks.

Level 3: heterogeneity-induced competing risks
  individual: P_i(t_0,\ldots,t_R) = \prod_{r=0}^{R} P_i(t_r)
  cohort: P(t_0,\ldots,t_R|z) \neq \prod_{r=0}^{R} P(t_r|z)
For each individual the event times of all risks are statistically independent, and their susceptibilities are not determined by the covariates alone, similar to level 2. However, residual cohort heterogeneity now leads to risk correlations at cohort level, which cause informative censoring. Here one will therefore observe competing risks phenomena.

Level 4: individual and cohort level competing risks
  individual: P_i(t_0,\ldots,t_R) \neq \prod_{r=0}^{R} P_i(t_r)
  cohort: P(t_0,\ldots,t_R|z) \neq \prod_{r=0}^{R} P(t_r|z)
This is the most complex situation from a modelling point of view, where both at the level of

individuals and at cohort level the event times of different risks are correlated. We will again observe competing risk phenomena, but we can no longer say where these are generated. In fact, correlations amongst non-primary risks are harmless; what matters is only whether there are correlations between primary and non-primary risks. We could in principle make a further distinction between having P(t_0,\ldots,t_R|z) = \prod_{r=0}^{R} P(t_r|z) and P(t_0,\ldots,t_R|z) = P(t_r|z)\, P(t_0,\ldots,t_{r-1},t_{r+1},\ldots,t_R|z); the latter being weaker but still sufficient. Here we will not pursue this; it is clear how the theory can incorporate this distinction.

Levels 1 and 2 are those where the assumption of statistically independent risks, underlying e.g. Cox regression and Kaplan-Meier estimators, is valid. At level 2 there is still no competing risk problem, but the heterogeneity demands parametrisations of crude cohort level primary hazard rates that are more complex than those of Cox, which is the rationale behind frailty and random effects models, and the latent class models of [27]. All these approaches still only model the primary risk, and therefore cannot handle cohorts beyond level 2. Level 4 is the most complex scenario, which we will not deal with in this work. Our focus is on level 3: cohorts with heterogeneity-induced competing risks. Here the correlations between cohort level event times have their origin strictly in correlations between disease susceptibilities and covariate associations of individuals; e.g. someone with a high hazard rate for a disease A may also be likely to have a high hazard rate for B, for reasons not explained by the covariates.

3.3 Implications of having heterogeneity-induced competing risks

We now show that the assumption that competing risks are induced by residual cohort heterogeneity (level 3 in our classification) leads to a resolution of the competing risk problem. In the case of heterogeneity-induced competing risks we have independent event times at the level of individuals, hence for each individual i we know that

  P_i(t_r) = h_r^i(t_r)\, e^{-\int_0^{t_r} ds\, h_r^i(s)}    (26)

The covariate-conditioned cohort level event time marginals are therefore

  P_r(t_r|z) = \frac{\sum_{i,z_i=z} h_r^i(t_r)\, e^{-\int_0^{t_r} ds\, h_r^i(s)}}{\sum_{i,z_i=z} 1}    (27)

and via (12) we can write the decontaminated cause-specific survival function and hazard rate as

  \tilde S_r(t|z) = \frac{\sum_{i,z_i=z} e^{-\int_0^t ds\, h_r^i(s)}}{\sum_{i,z_i=z} 1}    (28)
  \tilde h_r(t|z) = \frac{\sum_{i,z_i=z} h_r^i(t)\, e^{-\int_0^t ds\, h_r^i(s)}}{\sum_{i,z_i=z} e^{-\int_0^t ds\, h_r^i(s)}}    (29)

We used \int_0^\infty ds\, h_r^i(s) = \infty for all (i,r), which follows from the normalisation of P_i(t_0,\ldots,t_R). Expressions (28,29) are similar but not identical to formulae (14,23) for the decontaminated cause-specific survival function and the crude covariate-conditioned cause-specific hazard rates which would

have been found if all risks had been independent:

  S_r(t|z) = e^{-\int_0^t ds\, h_r(s|z)}    (30)
  h_r(t|z) = \frac{\sum_{i,z_i=z} h_r^i(t)\, e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}^i(s)}}{\sum_{i,z_i=z} e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}^i(s)}}    (31)

The differences are interpreted easily. In (29) the probability that individual i survives until time t is given by \exp[-\int_0^t ds\, h_r^i(s)] (which causes the cohort 'filtering'), since no risk other than r is active. In contrast, in (31) all risks contribute to cohort filtering. Formulae (29) and (31) will therefore be non-identical, unless we have risk independence, which in (31) would give rise to an identical factor in numerator and denominator that would drop out. The differences between (28,29) and (30,31) quantify the severity of the competing risk problem in our cohort. We also see that in homogeneous cohorts one indeed recovers \tilde S_r(t|z) = S_r(t|z) and \tilde h_r(t|z) = h_r(t|z).

Similarly, we can work out the link between the theory and survival data. Inserting (17) into (21) leads us to

  P(t,r|z) = \frac{\sum_{i,z_i=z} h_r^i(t)\, e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}^i(s)}}{\sum_{i,z_i=z} 1}    (32)

Hence the assumption that competing risks (if present) are induced by heterogeneity leads to relatively simple formulae for the decontaminated cause-specific quantities of interest and for the likelihood of observing individual survival data. What remains is to identify the minimal level of description required for evaluating these formulae, and to determine how the required information can be estimated from survival data.

3.4 Canonical level of description for resolving heterogeneity-induced competing risks

The canonical level of description is the minimal set of observables in terms of which we can write the decontaminated risk-specific quantities (28,29) (so that we can calculate what we are interested in) and the data likelihood (32) (so it can be estimated). In (28,29) we need the covariate-constrained distribution of individual hazard rates for the primary risk. In (32) we need in addition the covariate-constrained distribution of the cumulative rates of the non-primary risks. In combination we see that the minimal description would be the functional distribution

  W[h_r, h_{/r}|z] = \frac{\sum_{i,z_i=z} \delta_F[h_r - h_r^i]\; \delta_F[h_{/r} - \sum_{r'\neq r} h_{r'}^i]}{\sum_{i,z_i=z} 1}    (33)

Here \delta_F denotes the functional \delta-distribution^3, defined by the functional integral identity

  \int \{df\}\; \delta_F[f]\, G[f] = G[f]\big|_{f(t)=0\ \forall t\geq 0}    (34)

^3 Where the \delta-function can be interpreted as the probability distribution for a real-valued stochastic variable x without uncertainty, that always takes the value x = 0, its functional generalisation \delta_F[f] can be interpreted as the functional probability distribution describing a real-valued function f acting on [0,\infty) that always takes the value f(t) = 0 for all t \geq 0. For illustrations of its use see e.g. [40].

W[h_r, h_{/r}|z] represents, for each possible choice of the function pair \{h_r(t), h_{/r}(t)\}, the fraction of those individuals in our cohort that have covariates z and also have the individual primary hazard rate h_r^i(t) = h_r(t) and the cumulative non-primary hazard rate \sum_{r'\neq r} h_{r'}^i(t) = h_{/r}(t). In practice it will often be advantageous to relax our requirement of a minimal description. Non-primary risks will often be mutually very different in their characteristics, so finding an efficient parametrisation for the dependence on \sum_{r'\neq r} h_{r'}^i(t) in W[h_r, h_{/r}|z] will be awkward. A slightly redundant alternative choice, but one that is more easily parametrised, would be

  W[h_0,\ldots,h_R|z] = \frac{\sum_{i,z_i=z} \prod_{r=0}^{R} \delta_F[h_r - h_r^i]}{\sum_{i,z_i=z} 1}    (35)

It gives the joint functional distribution over the cohort of all R+1 individual cause-specific hazard rates at all times. The distribution (33) follows from (35) via

  W[h_r, h_{/r}|z] = \int \{dh'_0 \cdots dh'_R\}\; W[h'_0,\ldots,h'_R|z]\; \delta_F[h_r - h'_r]\; \delta_F[h_{/r} - \sum_{r'\neq r} h'_{r'}]    (36)

For independent risks one would simply find the factorised form W[h_0,\ldots,h_R|z] = \prod_{r=0}^{R} W[h_r|z]. If we know (35) we can write the decontaminated risk-specific quantities (28,29) as

  \tilde S_r(t|z) = \int \{dh_0\cdots dh_R\}\; W[h_0,\ldots,h_R|z]\; e^{-\int_0^t ds\, h_r(s)}    (37)
  \tilde h_r(t|z) = \frac{\int \{dh_0\cdots dh_R\}\; W[h_0,\ldots,h_R|z]\; h_r(t)\, e^{-\int_0^t ds\, h_r(s)}}{\int \{dh_0\cdots dh_R\}\; W[h_0,\ldots,h_R|z]\; e^{-\int_0^t ds\, h_r(s)}}    (38)

whereas their crude counterparts, which would be reported upon assuming independent risks, are

  S_r(t|z) = e^{-\int_0^t ds\, h_r(s|z)}    (39)
  h_r(t|z) = \frac{\int \{dh_0\cdots dh_R\}\; W[h_0,\ldots,h_R|z]\; h_r(t)\, e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}(s)}}{\int \{dh_0\cdots dh_R\}\; W[h_0,\ldots,h_R|z]\; e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}(s)}}    (40)

We can quantify the impact of competing risks in the cohort by comparing (37,38) to (39,40). If the primary risk r is not correlated with the non-primary risks (i.e. if W[h_0,\ldots,h_R|z] = W[h_1|z]\, W[h_0,h_2,\ldots,h_R|z]), or if there is just one risk, the formulae (28) and (39) as well as (38) and (40) become pairwise identical, as expected. The data likelihood (32) acquires the form

  P(t,r|z) = \int \{dh_0\cdots dh_R\}\; W[h_0,\ldots,h_R|z]\; h_r(t)\, e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}(s)}    (41)

An alternative formula for P(t,r|z) follows upon combining (23) with (40). In appendix B we show that the results are identical. Finally, the covariate-conditioned cause-specific cumulative incidence functions can be written as

  F_r(t|z) = \int \{dh_0\cdots dh_R\}\; W[h_0,\ldots,h_R|z] \int_0^t dt'\; h_r(t')\, e^{-\sum_{r'=0}^{R} \int_0^{t'} ds\, h_{r'}(s)}    (42)

The level of description (35) is sufficient and necessary for handling heterogeneity-induced competing risks, apart from the trivial option to combine the non-primary risks r' \neq r into a single risk, leading

to (33). One cannot work with the crude cohort-level covariate-conditioned hazard rates alone: the latter can be calculated from W[h_0,\ldots,h_R|z] via (40), but the converse is not true. In fact, for any W[h_0,\ldots,h_R|z] there exists an alternative distribution \tilde W[h_0,\ldots,h_R|z] describing a homogeneous cohort, such that W and \tilde W give identical crude cohort-level cause-specific hazard rates, namely \tilde W[h_0,\ldots,h_R|z] = \prod_{r=0}^{R} \delta_F[h_r - h_r(z)], in which h_r(z) is the function (40).

3.5 Estimation of W[h_0,\ldots,h_R|z] from survival data

When data are limited one must determine the relevant quantities in parametrised form, to avoid overfitting. Since the data likelihood can be expressed in terms of the crude cohort-level covariate-conditioned cause-specific hazard rates, one cannot extract information from survival data on W[h_0,\ldots,h_R|z] that is not contained in \{h_r(t|z)\}. However, even relatively simple parametrisations of W[h_0,\ldots,h_R|z] will via (40) correspond to nontrivial crude conditioned hazard rates (with time dependencies caused by cohort filtering), that one would be very unlikely to propose when parametrising at the level of the crude hazard rates. We thus assume W[h_0,\ldots,h_R|z] to be a member of a parametrised family of conditioned distributions W[h_0,\ldots,h_R|z,\theta], in which \theta \in \Omega denotes the vector of parameters and \Omega its value domain. Since the probability density for an individual with covariates z to report the pair (t,r) is given by (41), the data likelihood P(D|\theta) = \prod_{i=1}^{N} P(t_i,r_i|z_i,\theta), given the parameters \theta, is

  P(D|\theta) = \prod_{i=1}^{N} \int \{dh_0\cdots dh_R\}\; W[h_0,\ldots,h_R|z_i,\theta]\; h_{r_i}(t_i)\, e^{-\sum_{r=0}^{R} \int_0^{t_i} ds\, h_r(s)}    (43)

If we concentrate all the survival data in two empirical distributions,

  \hat P(t,r|z) = \frac{\sum_{i,z_i=z} \delta(t-t_i)\,\delta_{r,r_i}}{\sum_{i,z_i=z} 1},    \hat P(z) = \frac{1}{N}\sum_{i=1}^{N} \delta_{z,z_i}    (44)

(with \delta_{ab} = 1 if a = b, and \delta_{ab} = 0 otherwise) we can write the log-likelihood L(\theta) = \log P(D|\theta) as

  L(\theta) = N \sum_z \hat P(z) \sum_{r=0}^{R} \int dt\; \hat P(t,r|z) \log \int \{dh_0\cdots dh_R\}\; W[h_0,\ldots,h_R|z,\theta]\; h_r(t)\, e^{-\sum_{r'=0}^{R} \int_0^t ds\, h_{r'}(s)}    (45)

This log-likelihood can be interpreted in terms of the dissimilarity of the empirical function \hat P(t,r|z) and the model prediction P(t,r|z,\theta), i.e. the result of substituting W[h_0,\ldots,h_R|z,\theta] into (41):

  \frac{L(\theta)}{N} = \sum_z \hat P(z) \Big\{ \sum_{r=0}^{R} \int dt\; \hat P(t,r|z) \log \hat P(t,r|z) - \sum_{r=0}^{R} \int dt\; \hat P(t,r|z) \log\Big( \frac{\hat P(t,r|z)}{P(t,r|z,\theta)} \Big) \Big\}    (46)

The first (entropic) term is independent of \theta; the second is minus the Kullback-Leibler distance D(\hat P\|P) [43] between \hat P and P. Hence finding the most probable parameters \theta is equivalent to minimising D(\hat P\|P). From this starting point one can follow different routes for estimating \theta, each with specific advantages and limitations. In maximum likelihood (ML) estimation one simply uses the value \hat\theta for which the data are most likely,

  \hat\theta^{ML} = \mathrm{argmax}_{\theta\in\Omega}\; L(\theta)    (47)
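As a minimal illustration of the ML protocol (47) (ours; Python with numpy and scipy, for a deliberately simple assumed parametrisation: two true risks with constant hazards plus end-of-trial censoring), one can maximise the log-likelihood (8) directly:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)

    # Synthetic data from an assumed toy model: two risks with constant hazards
    # theta = (h1, h2) = (0.3, 0.5), and end-of-trial censoring (r = 0) at t = 2.
    true_h = np.array([0.3, 0.5])
    N = 5000
    t12 = rng.exponential(1.0/true_h, size=(N, 2))
    t = np.minimum(t12.min(axis=1), 2.0)
    r = np.where(t12.min(axis=1) < 2.0, t12.argmin(axis=1) + 1, 0)

    def neg_loglik(log_h):
        """Minus the log-likelihood (8) for constant hazards, on a log scale."""
        h = np.exp(log_h)
        ll = 0.0
        for k in (1, 2):
            ll += np.log(h[k-1])*(r == k).sum()   # event terms: delta_{r,r_i} log h_r
            ll -= h[k-1]*t.sum()                  # exposure terms: -int_0^{t_i} h_r dt
        return -ll

    res = minimize(neg_loglik, x0=np.log([0.1, 0.1]), method="Nelder-Mead")
    print("ML estimate:", np.exp(res.x), " true:", true_h)

For this simple family the maximum can also be written down in closed form (the number of type-r events divided by the total exposure time), which provides a check on the numerical optimiser.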

In the Bayesian formalism one does not commit oneself to one choice for \theta, but one uses the full posterior distribution

  P(\theta|D) = \frac{P(\theta)\, e^{L(\theta)}}{\int_\Omega d\theta'\; P(\theta')\, e^{L(\theta')}}    (48)

Finally, in maximum a posteriori probability (MAP) estimation one uses the value \hat\theta for which P(\theta|D) is maximal,

  \hat\theta^{MAP} = \mathrm{argmax}_{\theta\in\Omega}\; [L(\theta) + \log P(\theta)]    (49)

For sufficiently large data sets the above estimation methods become equivalent, i.e. \lim_{N\to\infty} \hat\theta^{MAP} = \lim_{N\to\infty} \hat\theta^{ML} and \lim_{N\to\infty} P(\theta|D) = \delta(\theta - \theta^{ML}). This follows from the property \lim_{N\to\infty} L(\theta)/N = \lim_{N\to\infty} \log P(\theta|D)/N. Moreover, from (46) it follows that \hat\theta^{MAP} and \hat\theta^{ML} are both consistent estimators [41], provided W[h_0,\ldots,h_R|z,\theta] is an unambiguous parametrisation (i.e. the link \theta \to P(t,r|z,\theta) is one-to-one), and if the data were indeed generated from P(t,r|z,\theta), since then we will find \lim_{N\to\infty} \hat P(t,r|z) = P(t,r|z,\hat\theta) and \lim_{N\to\infty} D(\hat P\|P_{\hat\theta}) = 0. There are many variations on these protocols, see e.g. [42]. One could reduce the overfitting danger in the ML method by including Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC). Alternative Bayesian routes involve e.g. hyperparameter estimation, or variational approximations of the posterior parameter distribution to reduce computation costs, or model selection to select good parametrisations W[h_0,\ldots,h_R|z,\theta].

4 Parametrisation of W[h_0,\ldots,h_R|z]

We obtain a transparent class of parametrisations for W[h_0,\ldots,h_R|z,\theta] by assuming that proportional hazards holds at the level of individuals. We work out the relevant equations, and show how the resulting description includes the standard methods (e.g. Cox regression, frailty, random effects and latent class models) as special cases.

4.1 Generic parametrisation

For each individual i we can always write the individual cause-specific hazard rates in the form h_r^i(t) = \lambda_r^i(t) \exp(\beta_r^{0i} + \sum_{\mu=1}^{p} \beta_r^{\mu i} z_\mu^i), with all time-dependences concentrated in the \lambda_r^i(t). The parameters \beta_r^{0i} represent individual risk-specific frailties, which must be normalised to remove the redundancy due to invariance of the hazard rates under \{\lambda_r^i(t), \beta_r^{0i}\} \to \{\lambda_r^i(t)\, e^{-\zeta_r^i}, \beta_r^{0i} + \zeta_r^i\}. According to (35), we can then write W[h_0,\ldots,h_R|z] as

  W[h_0,\ldots,h_R|z,M] = \int d\beta_0 \cdots d\beta_R \int \{d\lambda_0 \cdots d\lambda_R\}\; M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z) \prod_{r=0}^{R} \delta_F\big[h_r - \lambda_r\, e^{\beta_r^0 + \sum_{\mu=1}^{p} \beta_r^\mu z_\mu}\big]    (50)

with the short-hand \beta_r = (\beta_r^0,\ldots,\beta_r^p), and with

  M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z) = \frac{\sum_{i,z_i=z} \prod_{r=0}^{R} \{\delta_F[\lambda_r - \lambda_r^i]\; \delta(\beta_r - \beta_r^i)\}}{\sum_{i,z_i=z} 1}    (51)

So in this parametrisation \theta = M. Note that (50) is still completely general. It does not yet imply a proportional hazards assumption at the level of individuals unless M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z) is independent of z. However, it is practical only if M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z) depends in a relatively simple way on the parameters \{\beta_0,\ldots,\beta_R\} and the functions \{\lambda_0,\ldots,\lambda_R\}. To compactify our notation further we introduce the short-hands \beta_r\cdot z = \beta_r^0 + \sum_{\mu=1}^{p} \beta_r^\mu z_\mu and \Lambda_r(t) = \int_0^t ds\, \lambda_r(s). Inserting (50) into (45) then gives the corresponding data log-likelihood L(M):

  L(M) = N \sum_z \hat P(z) \sum_{r=0}^{R} \int dt\; \hat P(t,r|z) \log \int d\beta_0\cdots d\beta_R \int \{d\lambda_0\cdots d\lambda_R\}\; M(\beta_0,\ldots,\beta_R;\lambda_0,\ldots,\lambda_R|z)\; \lambda_r(t)\, e^{\beta_r\cdot z - \sum_{r'=0}^{R} \Lambda_{r'}(t)\exp(\beta_{r'}\cdot z)}    (52)

This is equivalent to

  L(M) = \sum_{i=1}^{N} \log \int d\beta_0\cdots d\beta_R \int \{d\lambda_0\cdots d\lambda_R\}\; M(\beta_0,\ldots,\beta_R;\lambda_0,\ldots,\lambda_R|z_i)\; \lambda_{r_i}(t_i)\, e^{\beta_{r_i}\cdot z_i - \sum_{r=0}^{R} \Lambda_r(t_i)\exp(\beta_r\cdot z_i)}    (53)

The individual cause-specific hazard rates are written in a form reminiscent of [7], but with time-dependent factors and time-independent regression and frailty parameters for the R+1 risks, distributed according to M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z), in the spirit of frailty and random effects models. However, here this is done for all risks, so that the complexities of competing risks are captured by the correlation structure of M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z). All applications in this report are based on the generic parametrisation (50). Given (50) one obtains formulae for the decontaminated and crude cause-specific quantities of interest, which are fully exact as long as M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z) is kept general. We write the single-risk marginals of M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z) as

  M(\beta_r; \lambda_r|z) = \int \Big(\prod_{r'\neq r} d\beta_{r'}\{d\lambda_{r'}\}\Big)\; M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z)    (54)

For the decontaminated cause-specific survival functions and hazard rates we then get

  \tilde S_r(t|z) = \int d\beta_r \{d\lambda_r\}\; M(\beta_r;\lambda_r|z)\; e^{-\exp(\beta_r\cdot z)\Lambda_r(t)}    (55)
  \tilde h_r(t|z) = \frac{\int d\beta_r \{d\lambda_r\}\; M(\beta_r;\lambda_r|z)\; \lambda_r(t)\, e^{\beta_r\cdot z - \exp(\beta_r\cdot z)\Lambda_r(t)}}{\int d\beta_r \{d\lambda_r\}\; M(\beta_r;\lambda_r|z)\; e^{-\exp(\beta_r\cdot z)\Lambda_r(t)}}    (56)

The crude hazard rates and the data probability become

  h_r(t|z) = \frac{\int d\beta_0\cdots d\beta_R \{d\lambda_0\cdots d\lambda_R\}\; M(\beta_0,\ldots,\beta_R;\lambda_0,\ldots,\lambda_R|z)\; \lambda_r(t)\, e^{\beta_r\cdot z - \sum_{r'=0}^{R} \exp(\beta_{r'}\cdot z)\Lambda_{r'}(t)}}{\int d\beta_0\cdots d\beta_R \{d\lambda_0\cdots d\lambda_R\}\; M(\beta_0,\ldots,\beta_R;\lambda_0,\ldots,\lambda_R|z)\; e^{-\sum_{r'=0}^{R} \exp(\beta_{r'}\cdot z)\Lambda_{r'}(t)}}    (57)
  P(t,r|z) = \int d\beta_0\cdots d\beta_R \{d\lambda_0\cdots d\lambda_R\}\; M(\beta_0,\ldots,\beta_R;\lambda_0,\ldots,\lambda_R|z)\; \lambda_r(t)\, e^{\beta_r\cdot z - \sum_{r'=0}^{R} \exp(\beta_{r'}\cdot z)\Lambda_{r'}(t)}    (58)

and, finally, the covariate-conditioned cumulative cause-specific incidence functions are

  F_r(t|z) = \int d\beta_0\cdots d\beta_R \{d\lambda_0\cdots d\lambda_R\}\; M(\beta_0,\ldots,\beta_R;\lambda_0,\ldots,\lambda_R|z) \int_0^t dt'\; \lambda_r(t')\, e^{\beta_r\cdot z - \sum_{r'=0}^{R} \exp(\beta_{r'}\cdot z)\Lambda_{r'}(t')}    (59)

4.2 Connection with conventional regression methods

The parametrisation (50) is generic, so all regression methods compatible with assuming heterogeneity-induced competing risks will correspond to specific choices for M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z). We label the primary risk as r = 1. All methods that take primary and non-primary risks to be independent would have M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z) = M(\beta_1,\lambda_1|z)\, M(\beta_0,\beta_2,\ldots,\beta_R; \lambda_0,\lambda_2,\ldots,\lambda_R|z) with some specific M(\beta_1,\lambda_1|z), e.g.

Cox's proportional hazards regression [7]. Here one assumes that there is no variability in the parameters (\beta_1,\lambda_1) of the primary risk. Elimination of parameter redundancy then means that \beta_1^0 is absorbed into \lambda_1(t), and we find

  M(\beta_1; \lambda_1|z) = \delta_F[\lambda_1 - \hat\lambda]\; \delta(\beta_1^0) \prod_{\mu=1}^{p} \delta(\beta_1^\mu - \hat\beta^\mu)    (60)

Via maximum likelihood one can express the base hazard rate in terms of the regression coefficients \{\hat\beta^\mu\} (giving Breslow's formula), substitution of which leads to Cox's equations [7]. See appendix C for details.

Simple frailty models. In simple frailty models [16, 18], the frailty parameters of different risks are assumed statistically independent, so the heterogeneity of the cohort is concentrated in the random parameter \beta_1^0:

  M(\beta_1; \lambda_1|z) = \delta_F[\lambda_1 - \hat\lambda]\; g(\beta_1^0) \prod_{\mu=1}^{p} \delta(\beta_1^\mu - \hat\beta^\mu)    (61)

One then usually chooses the frailty distribution g(\beta_1^0) to be of a specific parametrised form that allows one to do various relevant integrals over \beta_1^0 analytically. See appendix C for details.

Simple random effects models. In simple random effects models, such as [21], one still takes the primary risk parameters to be independent of the non-primary ones, but now the regression coefficients that couple to the covariates are non-uniform:

  M(\beta_1; \lambda_1|z) = \delta_F[\lambda_1 - \hat\lambda]\; W(\beta_1)    (62)

One assumes a parametrised form for the distribution W(\beta_1) and estimates its parameters from the data.

Latent class models. The latent class models of [27] are found upon assuming the cohort to consist of a finite number of discrete sub-cohorts. Each is of the type (60), but with distinct base hazard rates

and association parameters. The probabilities for individuals to belong to each sub-cohort are allowed to depend on their covariates, as in [28]:

  M(\beta_1; \lambda_1|z) = \sum_{l=1}^{L} w(l|z)\; \delta_F[\lambda_1 - \hat\lambda_l]\; \delta(\beta_1^0) \prod_{\mu=1}^{p} \delta(\beta_1^\mu - \hat\beta^{l\mu})    (63)
  w(l|z) = \frac{e^{\alpha_0^l + \sum_{\mu=1}^{p} \alpha_\mu^l z_\mu}}{\sum_{l'=1}^{L} e^{\alpha_0^{l'} + \sum_{\mu=1}^{p} \alpha_\mu^{l'} z_\mu}}    (64)

These models all focus on the primary risk only, and thereby lose the ability to deal with the competing risk problem. Some authors have tried to characterise all risks and their parameter interactions [17, 11], but did not yet develop systematic decontamination protocols. Of course there are many variations on the above models, including versions with time-dependent covariates, and models with non-latent classes in the sense that for each individual one knows the class label. It is easy to see how they fit into the generic formulation.

4.3 A simple latent class parametrisation for heterogeneity-induced competing risks

Descriptions that include all risks and their correlations will have more parameters than those limited to the primary risk. In view of overfitting, it is vital that one limits the complexity of the chosen parametrisation. The difference between frailty and random effects models is only in whether the risk variability relates to known or unknown covariates, so it seems logical to combine both. If we take the heterogeneity to be discrete, but without the covariate dependence of class probabilities of (63), if we assume the end-of-trial risk not to depend on the covariates, and if we choose the base hazard rates of all risks to be uniform in the cohort, we obtain a simple model family in which M(\beta_0,\ldots,\beta_R; \lambda_0,\ldots,\lambda_R|z) = \delta(\beta_0)\,\delta_F[\lambda_0 - \hat\lambda_0]\, M(\beta_1,\ldots,\beta_R; \lambda_1,\ldots,\lambda_R), with

  M(\beta_1,\ldots,\beta_R; \lambda_1,\ldots,\lambda_R) = M(\beta_1,\ldots,\beta_R) \prod_{r=1}^{R} \delta_F[\lambda_r - \hat\lambda_r]    (65)
  M(\beta_1,\ldots,\beta_R) = \sum_{l=1}^{L} w_l \prod_{r=1}^{R} \delta(\beta_r - \hat\beta_r^l)    (66)

Here \hat\beta_r^l = (\hat\beta_r^{l0},\ldots,\hat\beta_r^{lp}). See Figure 2 for an illustration of what (65) means in terms of individual cause-specific hazard rates. For any choice for the number L of assumed latent classes, the remaining parameters to be estimated are: the cause-specific base hazard rates \{\hat\lambda_r(t)\} of all risks, the L class sizes w_l \in [0,1] (subject to \sum_{l=1}^{L} w_l = 1), and the regression coefficients \{\hat\beta_r^{l\mu}\} and frailty parameters \{\hat\beta_r^{l0}\} of all risks r = 1\ldots R and all latent classes. One can see (65) as a generalisation of [26] (where only frailties, as opposed to also associations, were class-dependent). The remaining parametrisation invariance is \{\hat\lambda_r(t), \hat\beta_r^{l0}\} \to \{\hat\lambda_r(t)\, e^{-\zeta_r}, \hat\beta_r^{l0} + \zeta_r\} for all l, which is removed by setting \hat\beta_r^{10} = 0 for all r. Finding the optimal value of L is a simple Bayesian model selection problem.

The log-likelihood (53) is at the core of parameter estimation. For the multi-risk parametrisation (65) it simplifies to the following expression, with our usual short-hand \hat\beta_r^l\cdot z = \hat\beta_r^{l0} + \sum_{\mu=1}^{p} \hat\beta_r^{l\mu} z_\mu, with \bar\delta_{ab} = 1 - \delta_{ab}, and with the convention \hat\beta_0^l = 0 (since the end-of-trial risk does not depend on the covariates):

  L(M) = \sum_{i=1}^{N} \log \hat\lambda_{r_i}(t_i) + \sum_{i=1}^{N} \log \Big\{ \sum_{l=1}^{L} w_l\, e^{\bar\delta_{0,r_i}\hat\beta_{r_i}^l\cdot z_i - \sum_{r=0}^{R} \hat\Lambda_r(t_i)\exp(\hat\beta_r^l\cdot z_i)} \Big\} = L_0(M) + L_{risks}(M)    (67)

[Figure 2 near here: L boxes, one per latent class; class l (cohort fraction w_l) has, for all r > 0, individual hazard rates h_r^i(t) = \hat\lambda_r(t)\, e^{\hat\beta_r^{l0} + \sum_{\mu=1}^{p} \hat\beta_r^{l\mu} z_\mu^i}.]

Figure 2: Illustration of the parametrisation (65). All individuals i in the cohort are assumed to have personalised cause-specific hazard rates h_r^i(t) which for all risks r = 1\ldots R are of the proportional hazards form. The cohort is allowed to be heterogeneous in that it may consist of L sub-cohorts (or 'latent classes'), labelled by l = 1\ldots L. Each latent class l contains individuals with risk-specific frailties \hat\beta_r^{l0} and with risk-specific regression parameters \hat\beta_r^{l\mu} that capture the impact of covariates. The base hazard rates \hat\lambda_r(t) of the risks are assumed not to vary between individuals. The class membership of the individuals in our data set is not known a priori, but can be inferred a posteriori.

The first term of (67) probes end-of-trial censoring information. The second contains the quantities related to the true risks:

  L_0(M) = \sum_{i=1}^{N} \big[ \delta_{0,r_i} \log \hat\lambda_0(t_i) - \hat\Lambda_0(t_i) \big]    (68)
  L_{risks}(M) = \sum_{i=1}^{N} \bar\delta_{0,r_i} \log \hat\lambda_{r_i}(t_i) + \sum_{i=1}^{N} \log \Big\{ \sum_{l=1}^{L} w_l\, e^{\bar\delta_{0,r_i}\hat\beta_{r_i}^l\cdot z_i - \sum_{r=1}^{R} \hat\Lambda_r(t_i)\exp(\hat\beta_r^l\cdot z_i)} \Big\}    (69)

Inserting (65) into our formulae for the decontaminated cause-specific survival function and hazard rates of the true risks r > 0 gives the relatively simple and intuitive expressions

  \tilde S_r(t|z) = \sum_{l=1}^{L} w_l\, e^{-\exp(\hat\beta_r^l\cdot z)\hat\Lambda_r(t)}    (70)
  \tilde h_r(t|z) = \hat\lambda_r(t)\, \frac{\sum_{l=1}^{L} w_l\, e^{\hat\beta_r^l\cdot z - \exp(\hat\beta_r^l\cdot z)\hat\Lambda_r(t)}}{\sum_{l=1}^{L} w_l\, e^{-\exp(\hat\beta_r^l\cdot z)\hat\Lambda_r(t)}}    (71)

The crude hazard rate and the data probability become

  h_r(t|z) = \hat\lambda_r(t)\, \frac{\sum_{l=1}^{L} w_l\, e^{\hat\beta_r^l\cdot z - \sum_{r'=1}^{R} \exp(\hat\beta_{r'}^l\cdot z)\hat\Lambda_{r'}(t)}}{\sum_{l=1}^{L} w_l\, e^{-\sum_{r'=1}^{R} \exp(\hat\beta_{r'}^l\cdot z)\hat\Lambda_{r'}(t)}}    (72)
  P(t,r|z) = \hat\lambda_r(t)\, e^{-\hat\Lambda_0(t)} \sum_{l=1}^{L} w_l\, e^{\hat\beta_r^l\cdot z - \sum_{r'=1}^{R} \exp(\hat\beta_{r'}^l\cdot z)\hat\Lambda_{r'}(t)}    (73)

From the crude cause-specific hazard rates follow the crude cause-specific survival functions for r = 1\ldots R, via the relation S_r(t|z) = \exp[-\int_0^t ds\, h_r(s|z)]. The cumulative cause-specific incidence

functions corresponding to (65) are

  F_r(t|z) = \int_0^t dt'\; \hat\lambda_r(t')\, e^{-\hat\Lambda_0(t')} \sum_{l=1}^{L} w_l\, e^{\hat\beta_r^l\cdot z - \sum_{r'=1}^{R} \exp(\hat\beta_{r'}^l\cdot z)\hat\Lambda_{r'}(t')}    (74)

The specific parametrisation (65) has two additional useful features:

• After the class sizes (w_1,\ldots,w_L) have been inferred, one obtains the effective number L_{eff} of classes via Shannon's information-theoretic entropy S [43], which takes into account any class size differences and can complement Bayesian model selection in the identification of the optimal value of L:

  L_{eff} = e^{S},    S = -\sum_{l=1}^{L} w_l \log w_l    (75)

• Since our latent classes are defined in terms of the relation between covariates and risk, one cannot predict class membership for individuals on the basis of covariate information alone. However, Bayesian arguments allow us to calculate class membership probabilities retrospectively, for any individual on which we have the covariates z and survival information (t,r). For each class label l, the model (65) gives

  P(t,r|z,l) = \hat\lambda_r(t)\, e^{\hat\beta_r^l\cdot z - \hat\Lambda_0(t) - \sum_{r'=1}^{R} \exp(\hat\beta_{r'}^l\cdot z)\hat\Lambda_{r'}(t)}    (76)

Hence, using P(t,r,l|z) = P(t,r|z,l)\, w_l and P(t,r|z) = \sum_{l'=1}^{L} P(t,r|z,l')\, w_{l'}, we obtain

  P(l|t,r,z) = \frac{w_l\, P(t,r|z,l)}{\sum_{l'=1}^{L} w_{l'}\, P(t,r|z,l')} = \frac{w_l\, e^{\hat\beta_r^l\cdot z - \sum_{r'=1}^{R} \exp(\hat\beta_{r'}^l\cdot z)\hat\Lambda_{r'}(t)}}{\sum_{l'=1}^{L} w_{l'}\, e^{\hat\beta_r^{l'}\cdot z - \sum_{r'=1}^{R} \exp(\hat\beta_{r'}^{l'}\cdot z)\hat\Lambda_{r'}(t)}}    (77)

The probability that individual i belongs to class l is P(l|t_i,r_i,z_i). Retrospective class assignment could aid the search for informative new covariates, increasing our ability to predict personalised risk in heterogeneous cohorts. Such covariates are expected to be features that patients in the same class have in common.

Finally, instead of imposing by hand the independence of the end-of-trial risk on covariates (to reduce the number of model parameters), one could treat the end-of-trial risk as any other risk. Any parameter estimation protocol should then report that \hat\beta_0^{l\mu} = 0 for all l and all \mu = 1\ldots p, which gives a sanity test of numerical implementations.

5 Application to synthetic survival data

To test our results under controlled conditions we turn to synthetic data with heterogeneity-induced competing risks, generated from populations of the type (65). Details on numerical data generation are given in Appendix D. Our method should map a cohort's risk and association substructure, if it exists, i.e. report the number and sizes of sub-cohorts and their distinct regression parameters for all risks. It should then use this information to generate correct decontaminated survival curves, and assign individuals retrospectively to their correct latent classes.
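To make the practical use of these formulae concrete, here is a minimal sketch (ours; Python with numpy) of the decontaminated survival (70), the crude hazard (72) and the retrospective class probabilities (77), for an assumed fitted model with L = 2 classes, R = 2 true risks, one covariate, and constant base hazards so that \hat\Lambda_r(t) = \hat\lambda_r t. All parameter values are illustrative assumptions only:

    import numpy as np

    w   = np.array([0.5, 0.5])                    # class sizes w_l
    lam = np.array([0.05, 0.10])                  # base hazards for r = 1, 2
    # beta[l, r] = (frailty beta^{l0}_r, regression weight beta^{l1}_r):
    beta = np.array([[[0.0,  2.0], [0.0, 0.0]],
                     [[0.0, -2.0], [0.0, 2.0]]])

    def g(l, r, z):
        """exp(beta^l_r . z) for a single covariate z."""
        return np.exp(beta[l, r, 0] + beta[l, r, 1]*z)

    def S_decont(t, z, r):
        """Eq. (70): decontaminated survival of risk r (0-indexed); only risk r filters."""
        return sum(w[l]*np.exp(-g(l, r, z)*lam[r]*t) for l in range(len(w)))

    def h_crude(t, z, r):
        """Eq. (72): crude hazard of risk r; all true risks filter the cohort."""
        tot = lambda l: sum(g(l, rr, z)*lam[rr]*t for rr in range(len(lam)))
        num = sum(w[l]*g(l, r, z)*np.exp(-tot(l)) for l in range(len(w)))
        den = sum(w[l]*np.exp(-tot(l)) for l in range(len(w)))
        return lam[r]*num/den

    def class_posterior(t, r, z):
        """Eq. (77): retrospective P(l | t, r, z) for an observed true-risk event."""
        tot = lambda l: sum(g(l, rr, z)*lam[rr]*t for rr in range(len(lam)))
        q = np.array([w[l]*g(l, r, z)*np.exp(-tot(l)) for l in range(len(w))])
        return q/q.sum()

    z = 1.0
    print("decontaminated S~_1(20|z=1):", round(S_decont(20.0, z, 0), 3))
    print("crude h_1(20|z=1):          ", round(h_crude(20.0, z, 0), 4))
    print("P(l | t=20, r=1, z=1):      ", class_posterior(20.0, 0, z).round(3))

The gap between the crude and decontaminated outputs of such a fitted model directly quantifies the contamination that naive single-risk tools would report.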

5.1 Cohort substructure and regression parameters

We generated numerically event times and event types for three heterogeneous data sets A, B and C. Each has N = 1600 individuals from two latent classes of equal size, with at most two real risks, and with end-of-trial censoring at time t = 50. Each individual i has three covariates (z_i^1, z_i^2, z_i^3), drawn randomly and independently from P(z) = (2\pi)^{-1/2} e^{-z^2/2}. All frailty parameters \beta_r^{l0} are zero. The base hazard rates of the risks are time-independent: \hat\lambda_1(t) = 0.05 (primary risk) and \hat\lambda_2(t) = 0.1 (if a secondary risk is enabled). Table 1 shows the further specifications of the data sets, together with the results of performing proportional hazards regression [7], and our generic heterogeneous regression with the latent class log-likelihood (69), where the MAP protocol was complemented with Akaike's Information Criterion as described in (111). The three data sets were constructed such that they have fully identical primary risk characteristics. In set A there is heterogeneity but no competing risk. In set B a second risk is introduced, which in one of the two classes targets individuals similar to those most sensitive to the primary risk (with respect to the first covariate). Here one expects false protectivity effects. In set C a second risk is introduced, which in one of the two classes targets individuals similar to those least sensitive to the primary risk (with respect to the first covariate). Here one expects false exposure effects.

As expected, the proportional hazards regression method [7] fails to report meaningful results, since it aims to describe the relation between covariates and the primary risk in each data set with a single regression vector (\beta_1^1, \beta_1^2, \beta_1^3). The heterogeneous regression based on (69,111) always reports the correct number of classes (L = 2), and the correct class-specific parameters (within accuracy limits determined by numerical search accuracy and finite sample size). Note that the assignment of class labels to the identified classes is in principle arbitrary; see e.g. the regression results for data set B, where the class labelled l = 2 is labelled l = 1 in the data definition.

5.2 Decontaminated survival functions

The second test of our method and its numerical implementation is to verify that for all three data sets A, B and C it can extract the correct decontaminated covariate-conditioned survival curve \tilde S_1(t|z) for the primary risk, from the survival data alone. The result should be identical in all three cases, since the data sets differ only in the interference effects of a secondary risk. For the primary risk in Table 1, the correct expression (70) simplifies to

  \tilde S_1(t|z^1) = \tfrac12 e^{-\frac{t}{20}\exp(2z^1)} + \tfrac12 e^{-\frac{t}{20}\exp(-2z^1)}    (78)

We can calculate the true primary risk survival curves for the upper and lower covariate quartiles (UQ, LQ) and the inter-quartile range (IQ). For our Gaussian-distributed covariates, with zero average and unit variance, the upper and lower quartile survival curves are identical, due to the symmetry \tilde S_1(t|z^1) = \tilde S_1(t|-z^1).
With the short-hand Dz = (2\pi)^{-1/2} e^{-z^2/2} dz, and the quartile point z_Q defined via \int_{z_Q}^{\infty} Dz = \frac14 (giving z_Q \approx 0.6745), we obtain from (78):

  LQ, UQ:  \tilde S_1(t|z^1 \in [z_Q,\infty)) = 2\int_{z_Q}^{\infty} Dz\; \big( e^{-\frac{t}{20}\exp(2z)} + e^{-\frac{t}{20}\exp(-2z)} \big)    (79)
  IQ:  \tilde S_1(t|z^1 \in [-z_Q, z_Q]) = 2\int_0^{z_Q} Dz\; \big( e^{-\frac{t}{20}\exp(2z)} + e^{-\frac{t}{20}\exp(-2z)} \big)    (80)

Figure 3 shows the true LQ, UQ and IQ survival curves (79,80) for data sets A, B and C, together with the decontaminated curves \tilde S_1 in (70) calculated from application of our regression method
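Expressions (79,80) reduce to one-dimensional Gaussian integrals, which are straightforward to evaluate numerically; a sketch (ours; Python with numpy and scipy):

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    zQ = norm.ppf(0.75)                   # quartile point, approx. 0.6745

    def integrand(z, t):
        phi = np.exp(-z*z/2)/np.sqrt(2*np.pi)      # Gaussian measure Dz
        return phi*(np.exp(-t/20*np.exp(2*z)) + np.exp(-t/20*np.exp(-2*z)))

    def S1_UQ(t):                         # eq. (79); identical to LQ by symmetry
        return 2*quad(integrand, zQ, np.inf, args=(t,))[0]

    def S1_IQ(t):                         # eq. (80)
        return 2*quad(integrand, 0.0, zQ, args=(t,))[0]

    for t in [10.0, 25.0, 50.0]:
        print(f"t = {t:5.1f}   S1_UQ = {S1_UQ(t):.3f}   S1_IQ = {S1_IQ(t):.3f}")

The resulting values trace out the reference curves against which the regression output is compared in Figure 3.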


More information

Likelihood and Fairness in Multidimensional Item Response Theory

Likelihood and Fairness in Multidimensional Item Response Theory Likelihood and Fairness in Multidimensional Item Response Theory or What I Thought About On My Holidays Giles Hooker and Matthew Finkelman Cornell University, February 27, 2008 Item Response Theory Educational

More information

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

A Regression Model For Recurrent Events With Distribution Free Correlation Structure

A Regression Model For Recurrent Events With Distribution Free Correlation Structure A Regression Model For Recurrent Events With Distribution Free Correlation Structure J. Pénichoux(1), A. Latouche(2), T. Moreau(1) (1) INSERM U780 (2) Université de Versailles, EA2506 ISCB - 2009 - Prague

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

Probability theory basics

Probability theory basics Probability theory basics Michael Franke Basics of probability theory: axiomatic definition, interpretation, joint distributions, marginalization, conditional probability & Bayes rule. Random variables:

More information

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial Jin Kyung Park International Vaccine Institute Min Woo Chae Seoul National University R. Leon Ochiai International

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

MAS3301 / MAS8311 Biostatistics Part II: Survival

MAS3301 / MAS8311 Biostatistics Part II: Survival MAS3301 / MAS8311 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-10 1 13 The Cox proportional hazards model 13.1 Introduction In the

More information

Theory of Maximum Likelihood Estimation. Konstantin Kashin

Theory of Maximum Likelihood Estimation. Konstantin Kashin Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Analysis of competing risks data and simulation of data following predened subdistribution hazards Analysis of competing risks data and simulation of data following predened subdistribution hazards Bernhard Haller Institut für Medizinische Statistik und Epidemiologie Technische Universität München 27.05.2013

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Lecture 5 Models and methods for recurrent event data

Lecture 5 Models and methods for recurrent event data Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

More information

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 Department of Statistics North Carolina State University Presented by: Butch Tsiatis, Department of Statistics, NCSU

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Bayesian Inference. Introduction

Bayesian Inference. Introduction Bayesian Inference Introduction The frequentist approach to inference holds that probabilities are intrinsicially tied (unsurprisingly) to frequencies. This interpretation is actually quite natural. What,

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

Reliability Engineering I

Reliability Engineering I Happiness is taking the reliability final exam. Reliability Engineering I ENM/MSC 565 Review for the Final Exam Vital Statistics What R&M concepts covered in the course When Monday April 29 from 4:30 6:00

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Xuelin Huang Department of Biostatistics M. D. Anderson Cancer Center The University of Texas Joint Work with Jing Ning, Sangbum

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Due Thursday, September 19, in class What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

Inference and estimation in probabilistic time series models

Inference and estimation in probabilistic time series models 1 Inference and estimation in probabilistic time series models David Barber, A Taylan Cemgil and Silvia Chiappa 11 Time series The term time series refers to data that can be represented as a sequence

More information

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative

More information

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,

More information

Notes for course EE1.1 Circuit Analysis TOPIC 4 NODAL ANALYSIS

Notes for course EE1.1 Circuit Analysis TOPIC 4 NODAL ANALYSIS Notes for course EE1.1 Circuit Analysis 2004-05 TOPIC 4 NODAL ANALYSIS OBJECTIVES 1) To develop Nodal Analysis of Circuits without Voltage Sources 2) To develop Nodal Analysis of Circuits with Voltage

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Lecture 1: Introduction to Probabilistic Modelling Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Why a

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Survival Regression Models

Survival Regression Models Survival Regression Models David M. Rocke May 18, 2017 David M. Rocke Survival Regression Models May 18, 2017 1 / 32 Background on the Proportional Hazards Model The exponential distribution has constant

More information

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks Y. Xu, D. Scharfstein, P. Mueller, M. Daniels Johns Hopkins, Johns Hopkins, UT-Austin, UF JSM 2018, Vancouver 1 What are semi-competing

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology Group Sequential Tests for Delayed Responses Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Lisa Hampson Department of Mathematics and Statistics,

More information

Introduction to Reliability Theory (part 2)

Introduction to Reliability Theory (part 2) Introduction to Reliability Theory (part 2) Frank Coolen UTOPIAE Training School II, Durham University 3 July 2018 (UTOPIAE) Introduction to Reliability Theory 1 / 21 Outline Statistical issues Software

More information

Survival Analysis. Stat 526. April 13, 2018

Survival Analysis. Stat 526. April 13, 2018 Survival Analysis Stat 526 April 13, 2018 1 Functions of Survival Time Let T be the survival time for a subject Then P [T < 0] = 0 and T is a continuous random variable The Survival function is defined

More information

MCMC 2: Lecture 2 Coding and output. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham

MCMC 2: Lecture 2 Coding and output. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham MCMC 2: Lecture 2 Coding and output Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham Contents 1. General (Markov) epidemic model 2. Non-Markov epidemic model 3. Debugging

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane c Ghane/Mori 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 5 5 5 4 6 8 4 4 6 8 4 5 5 5 5 5 5 µ, Σ) α f Learningscale is slightly Parameters is slightly larger larger

More information

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Overview of today s class Kaplan-Meier Curve

More information

g-priors for Linear Regression

g-priors for Linear Regression Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,

More information

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes: Practice Exam 1 1. Losses for an insurance coverage have the following cumulative distribution function: F(0) = 0 F(1,000) = 0.2 F(5,000) = 0.4 F(10,000) = 0.9 F(100,000) = 1 with linear interpolation

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Machine Learning Lecture Notes

Machine Learning Lecture Notes Machine Learning Lecture Notes Predrag Radivojac January 25, 205 Basic Principles of Parameter Estimation In probabilistic modeling, we are typically presented with a set of observations and the objective

More information

Paradoxical Results in Multidimensional Item Response Theory

Paradoxical Results in Multidimensional Item Response Theory UNC, December 6, 2010 Paradoxical Results in Multidimensional Item Response Theory Giles Hooker and Matthew Finkelman UNC, December 6, 2010 1 / 49 Item Response Theory Educational Testing Traditional model

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require Chapter 5 modelling Semi parametric We have considered parametric and nonparametric techniques for comparing survival distributions between different treatment groups. Nonparametric techniques, such as

More information

Hierarchical Models & Bayesian Model Selection

Hierarchical Models & Bayesian Model Selection Hierarchical Models & Bayesian Model Selection Geoffrey Roeder Departments of Computer Science and Statistics University of British Columbia Jan. 20, 2016 Contact information Please report any typos or

More information

Machine learning - HT Maximum Likelihood

Machine learning - HT Maximum Likelihood Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce

More information

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj

More information

INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA. Belfast 9 th June to 10 th June, 2011

INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA. Belfast 9 th June to 10 th June, 2011 INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA Belfast 9 th June to 10 th June, 2011 Dr James J Brown Southampton Statistical Sciences Research Institute (UoS) ADMIN Research Centre (IoE

More information

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Designs: Theory, Computation and Optimisation Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto. Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,

More information

Advanced Quantitative Methods: maximum likelihood

Advanced Quantitative Methods: maximum likelihood Advanced Quantitative Methods: Maximum Likelihood University College Dublin 4 March 2014 1 2 3 4 5 6 Outline 1 2 3 4 5 6 of straight lines y = 1 2 x + 2 dy dx = 1 2 of curves y = x 2 4x + 5 of curves y

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

MACHINE LEARNING INTRODUCTION: STRING CLASSIFICATION

MACHINE LEARNING INTRODUCTION: STRING CLASSIFICATION MACHINE LEARNING INTRODUCTION: STRING CLASSIFICATION THOMAS MAILUND Machine learning means different things to different people, and there is no general agreed upon core set of algorithms that must be

More information

CHAPTER 1 A MAINTENANCE MODEL FOR COMPONENTS EXPOSED TO SEVERAL FAILURE MECHANISMS AND IMPERFECT REPAIR

CHAPTER 1 A MAINTENANCE MODEL FOR COMPONENTS EXPOSED TO SEVERAL FAILURE MECHANISMS AND IMPERFECT REPAIR CHAPTER 1 A MAINTENANCE MODEL FOR COMPONENTS EXPOSED TO SEVERAL FAILURE MECHANISMS AND IMPERFECT REPAIR Helge Langseth and Bo Henry Lindqvist Department of Mathematical Sciences Norwegian University of

More information

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping : Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj InSPiRe Conference on Methodology

More information

Bayesian model selection for computer model validation via mixture model estimation

Bayesian model selection for computer model validation via mixture model estimation Bayesian model selection for computer model validation via mixture model estimation Kaniav Kamary ATER, CNAM Joint work with É. Parent, P. Barbillon, M. Keller and N. Bousquet Outline Computer model validation

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

CTDL-Positive Stable Frailty Model

CTDL-Positive Stable Frailty Model CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland

More information

Midterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.

Midterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so. CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

The outline for Unit 3

The outline for Unit 3 The outline for Unit 3 Unit 1. Introduction: The regression model. Unit 2. Estimation principles. Unit 3: Hypothesis testing principles. 3.1 Wald test. 3.2 Lagrange Multiplier. 3.3 Likelihood Ratio Test.

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

THESIS for the degree of MASTER OF SCIENCE. Modelling and Data Analysis

THESIS for the degree of MASTER OF SCIENCE. Modelling and Data Analysis PROPERTIES OF ESTIMATORS FOR RELATIVE RISKS FROM NESTED CASE-CONTROL STUDIES WITH MULTIPLE OUTCOMES (COMPETING RISKS) by NATHALIE C. STØER THESIS for the degree of MASTER OF SCIENCE Modelling and Data

More information

Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart

Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart 1 Motivation and Problem In Lecture 1 we briefly saw how histograms

More information