Nonparametric estimation for current status data with competing risks

Marloes Henriëtte Maathuis

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Washington, 2006

Program Authorized to Offer Degree: Statistics

University of Washington Graduate School

This is to certify that I have examined this copy of a doctoral dissertation by Marloes Henriëtte Maathuis and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made.

Co-Chairs of the Supervisory Committee: Piet Groeneboom, Jon A. Wellner

Reading Committee: Piet Groeneboom, Michael G. Hudgens, Jon A. Wellner

Date:

In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with fair use as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to Proquest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI, to whom the author has granted the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform.

Signature

Date

University of Washington

Abstract

Nonparametric estimation for current status data with competing risks

Marloes Henriëtte Maathuis

Co-Chairs of the Supervisory Committee:
Professor Piet Groeneboom, Statistics
Professor Jon A. Wellner, Statistics

We study current status data with competing risks. Such data arise naturally in cross-sectional survival studies with several failure causes. Moreover, generalizations of these data arise in HIV vaccine clinical trials.

The general framework is as follows. We analyze a system that can fail from $K$ competing risks, where $K \in \mathbb{N}$ is fixed. The random variables of interest are $(X, Y)$, where $X \in \mathbb{R}_+ = (0, \infty)$ is the failure time of the system, and $Y \in \{1,\dots,K\}$ is the corresponding failure cause. However, we cannot observe $(X, Y)$ directly. Rather, we observe the current status of the system at a single random observation time $T \in \mathbb{R}_+$, where $T$ is independent of $(X, Y)$. This means that at time $T$, we observe whether or not failure occurred, and if and only if failure occurred, we also observe the failure cause $Y$.

We study nonparametric estimation of the sub-distribution functions $F_{0k}(t) = P(X \le t, Y = k)$, $k = 1,\dots,K$, $t \in \mathbb{R}_+$. We focus on two estimators: the nonparametric maximum likelihood estimator (MLE) and the naive estimator introduced by Jewell, Van der Laan and Henneman (2003). Our main interest is in asymptotic properties of the MLE, and the naive estimator is considered for comparison.

Until now, the asymptotic properties of the MLE have been largely unknown. We resolve this issue by proving its consistency, $n^{1/3}$-rate of convergence, and limiting distribution. The limiting distribution involves a new self-induced limiting process, consisting of the convex minorants of $K$ correlated two-sided Brownian motion processes plus parabolic drifts, plus an additional term involving the difference between the sum of the $K$ drifting Brownian motions and their convex minorants.

Various other aspects that we consider include characterizations of the estimators, uniqueness, graph theory, and computational algorithms. Furthermore, we show that both the MLE and the naive estimator are asymptotically efficient for a family of smooth functionals, with $\sqrt{n}$-rate convergence to a normal limit.

Finally, we study an extension of the model, where $X$ is subject to interval censoring and $Y$ is a continuous random variable. We show that the MLE is typically inconsistent in this model, and propose a simple method to repair this inconsistency.

TABLE OF CONTENTS

List of Figures
List of Tables

Chapter 1: Introduction
    Motivation and problem description
    Overview of previous work
    Overview of new results and outline of this thesis

Chapter 2: The estimators
    Definition of the estimators
    Censored data perspective
    Graph theory and uniqueness
    Characterizations

Chapter 3: Computation
    Reduction and optimization
    Iterative convex minorant algorithms

Chapter 4: Consistency
    Hellinger consistency
    Local and uniform consistency

Chapter 5: Rate of convergence
    Hellinger rate of convergence
    Asymptotic local minimax lower bound
    Local rate of convergence
    Technical lemmas and proofs

Chapter 6: Limiting distribution
    The limiting distribution of the naive estimator
    The limiting distribution of the MLE
    Technical lemmas and proofs

Chapter 7: A family of smooth functionals
    Information bound calculations
    Asymptotic normality of functionals of the MLE

Chapter 8: Examples
    Menopause data
    Simulations

Chapter 9: An extension: interval censored continuous mark data
    The model and an explicit formula for the MLE
    Inconsistency of the MLE
    Repaired MLE via discretization of marks
    Examples

LIST OF FIGURES

2.1 The estimators: Graphical representation of the observed data
Graph theory: Intersection graph for the MLE
Convex minorant characterizations: Plots for the data in Table
Asymptotic local minimax lower bound: The perturbation $F_{nk}$
Local rate: Plot of $v_n(t)$ for various values of $\beta$
Local rate: Example clarifying the proof of Lemma
Limiting distribution: Processes for the naive estimator at $t_0 =$
Limiting distribution: Processes for the naive estimator at $t_0 =$
Limiting distribution: Processes for the MLE at $t_0 =$
Limiting distribution: Processes for the MLE at $t_0 =$
Limiting distribution: Comparison of limiting processes at $t_0 =$
Limiting distribution: Comparison of limiting processes at $t_0 =$
Menopause data: Question of the Health Examination Study
Menopause data: The MLE and the naive estimator
Simulations: The true underlying sub-distribution functions
Simulations: The estimators in a single simulation
Simulations: Pointwise bias
Simulations: Pointwise variance
Simulations: Pointwise mean squared error
Simulations: Pointwise relative efficiency
Simulations: Smooth functionals of the MLE for $t_0 =$
Simulations: Smooth functionals of the naive estimator for $t_0 =$
Simulations: Smooth functionals of the MLE for $t_0 =$
Simulations: Smooth functionals of the naive estimator for $t_0 =$
Continuous mark data: Contour lines for estimates of $F_0(x, y)$
9.2 Continuous mark data: Estimates of $F_{0X}(x)$
Continuous mark data: Estimates of $F_0(x_0, y)$

LIST OF TABLES

2.1 Censored data perspective: Example data
Censored data perspective: Estimators for the data in Table
Graph theory: Example data
Graph theory: Clique matrix for the data in Table
Convex minorant characterizations: Example data
Simulations: Pointwise bias, variance and MSE at $t =$
Continuous mark data: Summary of the examples

ACKNOWLEDGMENTS

I sincerely thank my advisors, Piet Groeneboom and Jon Wellner, for their mentorship over the past years. Their knowledge, guidance, inspiration and encouragement have been very important to me. I thank Peter Gilbert, Tilmann Gneiting, Peter Hoff and Michael Hudgens for serving on my committee, with special thanks to Michael for suggesting this research problem. I thank Bernard Deconinck for serving as the graduate school representative. I am grateful to the faculty, staff and students in our department for providing a stimulating and supportive research environment. In particular, I thank Fadoua Balabdaoui, Moulinath Banerjee and Hanna Jankowski for helpful discussions. Finally, I want to express my deep gratitude to Steven, my parents, my family and my friends, for their continuous support.

Chapter 1

INTRODUCTION

1.1 Motivation and problem description

The work in this thesis is motivated by recent clinical trials of candidate vaccines against HIV/AIDS. The main purpose of such trials is to determine the overall efficacy of a candidate vaccine. Like many viruses, HIV exhibits significant genotypic and phenotypic variation, so that it can be classified into several subtypes. Therefore, it is also of interest to determine the efficacy of a vaccine against each subtype of the virus. Establishing vaccine efficacy for certain subtypes can warrant vaccination of populations in which the given subtypes are highly prevalent. Furthermore, establishing that the vaccine is efficacious for some subtypes, but not for others, gives important information for possible improvements of the vaccine.

Thus, the variables of interest are the time of infection and the subtype of the infecting virus. These variables cannot be observed directly, because participants of a trial are only tested for the virus at several follow-up times. Since each test indicates whether or not infection happened before the time of the test, the time of infection is interval censored, i.e., only known to lie within a time interval determined by the follow-up times. Since simultaneous infections with several subtypes of a virus are rare, the subtypes are often analyzed as competing risks (see, e.g., Hudgens, Satten and Longini (2001)). Hence, these trials yield interval censored survival data with competing risks.

In this thesis, we analyze current status data with competing risks. Current status censoring is the simplest form of interval censoring, where there is exactly one
observation time for each subject. We study these data for two reasons. First, such data arise naturally in cross-sectional studies with several failure causes. Second, understanding current status data with competing risks is a first step towards understanding the more complicated interval censored data with competing risks that arise in vaccine clinical trials.

We consider the following general framework. We analyze a system that can fail from $K$ competing risks, where $K \in \mathbb{N}$ is fixed. The random variables of interest are $(X, Y)$, where $X \in \mathbb{R}_+ = (0, \infty)$ is the failure time of the system, and $Y \in \{1,\dots,K\}$ is the corresponding failure cause. Due to censoring, we cannot observe $(X, Y)$ directly. Rather, we observe the current status of the system at a single random observation time $T \in \mathbb{R}_+$, where $T$ is independent of $(X, Y)$. Thus, at time $T$ we observe whether or not failure occurred, and if and only if failure occurred, we also observe the failure cause $Y$.

Examples that fit into this framework can be found in reliability and survival analysis. For an example, see the menopause data analyzed by Krailo and Pike (1983), where $X$ is the age at menopause, $Y$ is the cause of menopause (natural or operative), and $T$ is the age at the time of the survey. In cross-sectional HIV studies we think of $X$ as the time of HIV infection, $Y$ as the subtype of the infecting HIV virus, and $T$ as the time of the HIV test. Note that one is free to choose the origin of the time scale. Common choices include the date of birth and the beginning of the study.

Given current status data with competing risks, we consider nonparametric estimation of the sub-distribution functions $F_{0k}(t) = P(X \le t, Y = k)$, $k = 1,\dots,K$. This problem, or close variants thereof, has been studied by Hudgens, Satten and Longini (2001), Jewell, Van der Laan and Henneman (2003), and Jewell and Kalbfleisch (2004). However, there are still many open problems.
In particular, until now, the asymptotic properties of the nonparametric maximum likelihood estimator (MLE) have been largely unknown. In this thesis, we resolve this problem. We prove consistency of the MLE, and we derive its rate of convergence and its limiting distribution. These asymptotic results form an important step towards making inference about the sub-distribution functions.

The outline of the remainder of this chapter is as follows. In Section 1.2 we give an overview of previous work in this area. In Section 1.3 we give an outline of this thesis, together with a discussion of our main results.

1.2 Overview of previous work

Hudgens, Satten and Longini (2001) study competing risks data subject to interval censoring and truncation. They derive the nonparametric maximum likelihood estimator (MLE) and provide an EM algorithm for its computation. They also introduce an alternative pseudo-likelihood estimator. They apply their methods to data from a cohort of injecting drug users in Thailand, where the event of interest is infection with HIV-1, and the competing risks are HIV-1 subtypes B and E.

Jewell, Van der Laan and Henneman (2003) study current status data with competing risks. They consider some simple parametric models, some ad-hoc nonparametric estimators, and the MLE. They compare these estimators in a simulation study. Furthermore, they apply their methods to data analyzed by Krailo and Pike (1983), where the event of interest is menopause and the competing risks are natural and operative menopause. Finally, the authors discuss results suggesting that the simple ad-hoc estimators might yield fully efficient estimators for smooth functionals of the sub-distribution functions.

Jewell and Kalbfleisch (2004) study maximum likelihood estimation of a series of ordered multinomial parameters. Current status data with competing risks can be viewed as a special case of this setting. The authors focus on the computation of the MLE, and introduce an iterative version of the Pool Adjacent Violators Algorithm.
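For readers unfamiliar with the Pool Adjacent Violators Algorithm mentioned above, the following is a minimal sketch of the basic (non-iterative) algorithm. It is not the iterative version of Jewell and Kalbfleisch (2004). It computes the nondecreasing least-squares fit to a sequence, which for univariate current status data, with the indicators $1\{X_i \le T_i\}$ sorted by observation time, is the classical form of the MLE at the observation times. The function name `pava` and the toy indicators are assumptions made for illustration.

```python
def pava(y, w=None):
    """Weighted isotonic (nondecreasing) least-squares fit via pool adjacent violators."""
    n = len(y)
    if w is None:
        w = [1.0] * n
    # Each block stores [weighted mean, total weight, number of pooled points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand the blocks back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


# Hypothetical toy indicators 1{X_i <= T_i}, already sorted by observation time.
delta = [0, 1, 0, 0, 1, 1, 0, 1]
fit = pava(delta)
```

The fitted values are constant on blocks of adjacent observations, which is why estimators of this type are step functions.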

1.3 Overview of new results and outline of this thesis

We focus on the following two nonparametric estimators for the sub-distribution functions: the MLE $\hat F_n = (\hat F_{n1},\dots,\hat F_{nK})$, and the naive estimator $\tilde F_n = (\tilde F_{n1},\dots,\tilde F_{nK})$ introduced by Jewell, Van der Laan and Henneman (2003) (see footnote 1). Our main interest is in asymptotic properties of the MLE, and the naive estimator is considered for comparison.

In Chapter 2 we define the estimators, and discuss the relationship between them. We show that both the MLE and the naive estimator can be viewed as maximum likelihood estimators for censored data. This observation is useful, because it allows us to use readily available theory and computational algorithms. In particular, the naive estimator can be viewed as the maximum likelihood estimator for reduced univariate current status data. Hence, many properties of the naive estimator follow straightforwardly from known results on current status data. The censored data perspective also allows us to use graph theory to study uniqueness properties of the estimators. Finally, we characterize the estimators in terms of necessary and sufficient conditions, in the form of Fenchel characterizations and (self-induced) convex minorant characterizations. These characterizations play a key role in the development of the asymptotic theory, and also lead to computational algorithms.

Computational aspects of the MLE are discussed in Chapter 3. Since there are no explicit formulas available for the MLE, we compute the MLE with an iterative algorithm. We discuss two classes of algorithms and the connections between them. The first class is based on sequential quadratic programming, where each quadratic programming problem is solved using a support reduction algorithm. The second class consists of iterative convex minorant algorithms. We prove convergence of algorithms in both classes.
Furthermore, we show that one particular iterative convex minorant algorithm can be viewed as a sequential quadratic programming method that only uses the diagonal elements of the Hessian matrix.

Footnote 1: The subscript $n$ denotes the sample size.

In Chapter 4 we discuss consistency of the estimators. We prove that both estimators are Hellinger consistent, and we use this to derive various forms of local and uniform consistency.

The rate of convergence is discussed in Chapter 5. The Hellinger rate of convergence and the local rate of convergence of the naive estimator are $n^{1/3}$. This follows from known results on current status data without competing risks. For the MLE, we prove that the Hellinger rate of convergence is $n^{1/3}$. Next, we derive a local asymptotic minimax lower bound of $n^{1/3}$, meaning that no estimator can have a better local rate of convergence than $n^{1/3}$, in a minimax sense. We proceed by proving that the local rate of convergence of the MLE is $n^{1/3}$. This result comes as no surprise given the local asymptotic minimax lower bound and the local rate of convergence of the naive estimator. However, the proof of this result turned out to be rather involved, and required new methods. The key idea is to first establish a rate result for $\sum_{k=1}^K \hat F_{nk}$ that holds uniformly on a fixed neighborhood around a point $t_0$, instead of on the usual shrinking neighborhood of order $O(n^{-1/3})$.

In Chapter 6 we discuss the limiting distribution of the estimators. The limiting distribution of the naive estimator is given by the slopes of the convex minorants of $K$ correlated two-sided Brownian motion processes plus parabolic drifts. The limiting distribution of the MLE involves a new self-induced limiting process, consisting of the convex minorants of $K$ correlated two-sided Brownian motion processes plus parabolic drifts, plus an additional term involving the difference between the sum of the $K$ drifting Brownian motion processes and their convex minorants.

In Chapter 7 we consider estimation of smooth functionals. Jewell, Van der Laan and Henneman (2003) suggested that the naive estimator yields asymptotically efficient smooth functionals.
We show that this is indeed the case, and that the same holds for the MLE.

In Chapter 8 we apply our methods to real and simulated data. We compare the MLE and the naive estimator in a simulation study, considering both pointwise estimation and the estimation of smooth functionals. For pointwise estimation, we show that the MLE is superior to the naive estimator in terms of mean squared error, both for small and large sample sizes. For the estimation of smooth functionals, we show that the behavior of the MLE and the naive estimator is similar, and in agreement with the results in Chapter 7.

Finally, in Chapter 9 we consider an extension of the model, where $X$ is subject to interval censoring case $k$, and $Y$ is a continuous random variable. This model is referred to as the interval censored continuous mark model. It is applicable to HIV vaccine clinical trials by letting $X$ be the time of HIV infection, and $Y$ be the viral distance between the infecting HIV virus and the virus present in the vaccine. We derive the limit of the MLE in this model, and show that the MLE is inconsistent in general. We also suggest a simple method for repairing the MLE by discretizing $Y$, an operation that transforms the data to interval censored data with competing risks. We illustrate the behavior of the MLE and the repaired MLE in four examples.

Chapter 2

THE ESTIMATORS

In this chapter we study finite sample properties of the MLE and the naive estimator. In Section 2.1 we formally define the model and the estimators. Since both estimators can be viewed as maximum likelihood estimators for censored data, Section 2.2 provides a general discussion on the MLE for censored data. In Section 2.3 we use a graph theoretic perspective to derive properties of the estimators. Finally, in Section 2.4, we characterize the estimators in terms of necessary and sufficient Fenchel and convex minorant conditions.

2.1 Definition of the estimators

Before we define the MLE and the naive estimator, we introduce some assumptions and notation. Recall that $K \in \mathbb{N}$ denotes the number of competing risks. The variables of interest are $(X, Y)$, where $X \in \mathbb{R}_+$ is the failure time of a system, and $Y \in \{1,\dots,K\}$ is the corresponding failure cause. We do not observe $(X, Y)$ directly. Rather, we observe the system at a random observation time $T \in \mathbb{R}_+$. At this time, we observe whether or not failure occurred, and if and only if failure occurred, we also observe the failure cause $Y$. Our goal is nonparametric estimation of the bivariate distribution function of $(X, Y)$, or equivalently, of the vector of sub-distribution functions $F_0 = (F_{01},\dots,F_{0K})$, where $F_{0k}(t) = P(X \le t, Y = k)$, $k = 1,\dots,K$. We make the following assumptions:

(a) $T$ is independent of $(X, Y)$;
(b) The system cannot fail from two or more causes at the same time.

Assumption (a) is essential for the development of the theory, and is used in the definitions of both the MLE and the naive estimator. Assumption (b) ensures that the failure cause is well defined. It can always be satisfied by defining simultaneous failure from several causes as a new failure cause. We do not make any other assumptions. In particular, we do not require that all observation times are distinct.

Notation

We denote the observed data by $Z = (T, \Delta)$, where $\Delta = (\Delta_1,\dots,\Delta_{K+1})$ and

$$\Delta_k = 1\{X \le T,\ Y = k\}, \quad k = 1,\dots,K, \tag{2.1}$$

$$\Delta_{K+1} = 1\{X > T\}. \tag{2.2}$$

Thus, for $k = 1,\dots,K$, $\Delta_k = 1$ if and only if failure happened by time $T$ and was due to cause $k$. Furthermore, $\Delta_{K+1} = 1$ if and only if failure did not happen by time $T$. Note that $\sum_{k=1}^{K+1} \Delta_k = 1$, and hence $\Delta_{K+1} = 1 - \sum_{k=1}^{K} \Delta_k$. A graphical representation of the observed data is given in Figure 2.1.

Let $Z_1,\dots,Z_n$ be $n$ i.i.d. observations of $Z$, where $Z_i = (T_i, \Delta_i)$ and $\Delta_i = (\Delta_{i1},\dots,\Delta_{i,K+1})$. We call an observation $Z_i$ right censored if $\Delta_{i,K+1} = 1$, and left censored otherwise. Let $T_{(1)},\dots,T_{(n)}$ be the order statistics of $T_1,\dots,T_n$, where ties are broken arbitrarily after ensuring that left censored observations are ordered before right censored observations. We denote the corresponding $\Delta$-vectors by $\Delta_{(1)},\dots,\Delta_{(n)}$, where $\Delta_{(i)} = (\Delta_{(i)1},\dots,\Delta_{(i),K+1})$.
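As an illustration of definitions (2.1) and (2.2), the following minimal sketch simulates observations $(T_i, \Delta_i)$. The distributions chosen for $X$, $Y$ and $T$ (exponential, uniform on the causes, and uniform on $(0, 3)$) are purely hypothetical and are not taken from the thesis; only the construction of $\Delta$ follows the text. The function name `simulate` is likewise an assumption.

```python
import random


def simulate(n, K=2, seed=0):
    """Draw n observations (T_i, Delta_i) of current status data with K competing risks."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.expovariate(1.0)   # failure time X (hypothetical Exp(1) law)
        y = rng.randint(1, K)      # failure cause Y (hypothetical uniform law)
        t = rng.uniform(0.0, 3.0)  # observation time T, drawn independently of (X, Y)
        delta = [0] * (K + 1)
        if x <= t:
            delta[y - 1] = 1       # Delta_k = 1{X <= T, Y = k}, see (2.1)
        else:
            delta[K] = 1           # Delta_{K+1} = 1{X > T}, see (2.2)
        # Only (T, Delta) is observed; X and Y themselves are discarded.
        data.append((t, tuple(delta)))
    return data


sample = simulate(5)
```

Each simulated $\Delta_i$ has exactly one nonzero entry, in line with $\sum_{k=1}^{K+1} \Delta_k = 1$.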

[Figure 2.1: four panels, one for each of $\Delta = (1, 0, 0, 0)$, $\Delta = (0, 1, 0, 0)$, $\Delta = (0, 0, 1, 0)$ and $\Delta = (0, 0, 0, 1)$, each marked with the observation time $T$.]

Figure 2.1: Graphical representation of the observed data $(T, \Delta)$ in an example with $K = 3$ competing risks. The grey sets indicate the values of $(X, Y)$ that are consistent with $(T, \Delta)$, for each of the four possible values of $\Delta$.

Let $e_k$, $k = 1,\dots,K+1$, be the $k$th unit vector in $\mathbb{R}^{K+1}$, and let

$$\mathcal{Z} = \{(t, e_k) : t \in \mathbb{R}_+,\ k = 1,\dots,K+1\}. \tag{2.3}$$

Let $G$ be the distribution of $T$, and let $G_n$ be the empirical distribution of $T_1,\dots,T_n$. Furthermore, let $\mathbb{P}_n$ be the empirical distribution of $Z_1,\dots,Z_n$, i.e., for any function $h : \mathcal{Z} \to \mathbb{R}$ we have

$$\mathbb{P}_n h(Z) = \int h(z)\, d\mathbb{P}_n(z) = \frac{1}{n} \sum_{i=1}^n h(Z_i).$$

For vectors $x = (x_1,\dots,x_K) \in \mathbb{R}^K$, we define $x_+ = \sum_{k=1}^K x_k$ and $x_{K+1} = 1 - x_+$. For example, we write $\Delta_+ = \sum_{k=1}^K \Delta_k$, $F_{0+}(t) = \sum_{k=1}^K F_{0k}(t)$ and $F_{0,K+1}(t) = 1 - F_{0+}(t)$. The only exception to the notation $x_{K+1} = 1 - x_+$ is that we do not use it for the naive estimator. The reason for this will become clear when we define the naive estimator below.

The MLE

We now define the MLE $\hat F_n = (\hat F_{n1},\dots,\hat F_{nK})$ for $F_0 = (F_{01},\dots,F_{0K})$. Note that

$$\Delta \mid T \sim \text{Multinomial}_{K+1}\bigl(1, (F_{01}(T),\dots,F_{0,K+1}(T))\bigr). \tag{2.4}$$

Hence, under $F = (F_1,\dots,F_K)$, the density for a single observation $z = (t, \delta)$ is

$$p_F(z) = \prod_{k=1}^{K+1} F_k(t)^{\delta_k}, \tag{2.5}$$

with respect to the dominating measure $\mu = G \times \#$, where $\#$ is counting measure on $\{e_k : k = 1,\dots,K+1\}$. The corresponding log likelihood (divided by $n$; see footnote 1) is

$$l_n(F) = \int \log p_F(u, \delta)\, d\mathbb{P}_n(u, \delta) = \int \sum_{k=1}^{K+1} \delta_k \log F_k(u)\, d\mathbb{P}_n(u, \delta), \tag{2.6}$$

and the MLE (if it exists; see footnote 2) is defined by

$$l_n(\hat F_n) = \max_{F \in \mathcal{F}_K} l_n(F), \tag{2.7}$$

where $\mathcal{F}_K$ is the set of all $K$-tuples of sub-distribution functions on $\mathbb{R}_+$ with pointwise sum bounded by one. Note that we can absorb $G$ in the dominating measure $\mu$ because of the assumed independence between $T$ and $(X, Y)$.

Footnote 1: In order to efficiently use the empirical process notation, we use the convention of dividing all log likelihoods by $n$.
Footnote 2: Existence of the estimators will follow from Theorem 2.1 ahead.

The naive estimator

We now define the naive estimator $\tilde F_n = (\tilde F_{n1},\dots,\tilde F_{n,K+1})$. The naive estimator $\tilde F_{nk}$ can be viewed as the MLE for the reduced current status data $Z_k = (T, \Delta_k)$. To see this, let $p_{k,F_k}(u, \delta)$ be the marginal density of the reduced current status data $Z_k$:

$$p_{k,F_k}(u, \delta) = F_k(u)^{\delta_k} \{1 - F_k(u)\}^{1 - \delta_k}.$$

Then the naive estimator $\tilde F_{nk}$ maximizes the marginal log likelihood

$$l_{nk}(F_k) = \int \log p_{k,F_k}(u, \delta)\, d\mathbb{P}_n(u, \delta) = \int \{\delta_k \log F_k(u) + (1 - \delta_k) \log(1 - F_k(u))\}\, d\mathbb{P}_n(u, \delta), \tag{2.8}$$

for $k = 1,\dots,K+1$. Thus, the naive estimators (if they exist) are defined by

$$l_{nk}(\tilde F_{nk}) = \max_{F_k \in \mathcal{F}} l_{nk}(F_k), \quad k = 1,\dots,K, \tag{2.9}$$

$$l_{n,K+1}(\tilde F_{n,K+1}) = \max_{S \in \mathcal{S}} l_{n,K+1}(S), \tag{2.10}$$

where $\mathcal{F}$ is the collection of all sub-distribution functions on $\mathbb{R}_+$, and $\mathcal{S}$ is the collection of all sub-survival functions on $\mathbb{R}_+$. Note that we can omit $G$ in the marginal log likelihood, since $T$ and $(X, Y)$ are independent.

The naive estimator provides two different estimators for the overall failure time distribution $F_{0+}$, namely $\tilde F_{n+} = \sum_{k=1}^K \tilde F_{nk}$ and $1 - \tilde F_{n,K+1}$. Since the naive estimator does not require the sum of the sub-distribution functions to be bounded by one, $\tilde F_{n+}$ may exceed one. In contrast, $1 - \tilde F_{n,K+1}$ is always bounded between zero and one. The latter estimator is simply the MLE for the overall failure time distribution when information on the failure causes is ignored. In general, $\tilde F_{n,K+1} \neq 1 - \tilde F_{n+}$, and we therefore do not use the shorthand notation $x_{K+1} = 1 - x_+$ for the naive estimator.

Comparison of the two estimators

In order to point out the similarities and differences between the MLE and the naive estimator, we give the following alternative but equivalent definition of the naive
estimator. For $F = (F_1,\dots,F_K)$, we define

$$\tilde l_n(F) = \sum_{k=1}^K \int \bigl[\delta_k \log F_k(u) + (1 - \delta_k) \log(1 - F_k(u))\bigr]\, d\mathbb{P}_n(u, \delta). \tag{2.11}$$

Then the naive estimator $\tilde F_n = (\tilde F_{n1},\dots,\tilde F_{nK})$ (if it exists) is defined by

$$\tilde l_n(\tilde F_n) = \max_{F \in \mathcal{F}^K} \tilde l_n(F), \tag{2.12}$$

where $\mathcal{F}^K$ is the space of all $K$-tuples of sub-distribution functions on $\mathbb{R}_+$. Comparing this optimization problem with the optimization problem (2.7) for the MLE, we see the following two differences:

(a) The log likelihood (2.6) for the MLE contains a term involving $F_{K+1}(u) = 1 - F_+(u)$, while the log likelihood (2.11) for the naive estimator does not include such a term;
(b) The space $\mathcal{F}_K$ for the MLE includes the constraint that the sum of the sub-distribution functions is bounded by one, while the space $\mathcal{F}^K$ for the naive estimator does not include such a constraint.

Thus, the MLE takes into account the $K$-dimensional system of sub-distribution functions, while the naive estimator ignores this aspect of the problem. In fact, since the sub-distribution functions in optimization problem (2.12) are not related to each other, the optimization problem can be split into the $K$ optimization problems defined in (2.9). Since these optimization problems correspond to the MLE for univariate current status data, both computational results and asymptotic theory follow straightforwardly from known results for current status data (see Groeneboom and Wellner (1992, Part II, Sections 1.1, 4.1 and 5.1)).

The fact that the MLE takes into account the system of sub-distribution functions leads to more complicated computation and asymptotic theory. However, these complications result in a better pointwise behavior of the MLE, as shown in the simulation study in Chapter 8.

2.2 Censored data perspective

From the definitions of the MLE and the naive estimator, we see that both estimators can be viewed as nonparametric maximum likelihood estimators for censored data. Viewing the estimators from this perspective allows us to use readily available computational algorithms and theory for the MLE for censored data.

We consider the following general framework. Let $W$ be a random variable taking values in $\mathcal{W}$. Suppose that $W$ has distribution $F_0$. Our goal is to estimate this distribution. However, we do not observe $W$ directly. Rather, we observe a vector of random sets $D = (D_1,\dots,D_p)$ that form a partition of $\mathcal{W}$, i.e., $\bigcup_{j=1}^p D_j = \mathcal{W}$ and $D_j \cap D_k = \emptyset$ for $j \neq k \in \{1,\dots,p\}$. We assume that $D$ is independent of $W$. In principle, we can allow the number of random sets to be random, but for our purposes that is not needed. Furthermore, we observe an indicator vector $\Delta = (\Delta_1,\dots,\Delta_p)$, where $\Delta_j = 1\{W \in D_j\}$, $j = 1,\dots,p$. Thus, we observe a vector $D$ containing a random partition of $\mathcal{W}$, and an indicator vector $\Delta$ indicating which set $R \in \{D_1,\dots,D_p\}$ contains the unobservable $W$. We call the set $R$ an observed set. Using the convention $0 \cdot D_j = \emptyset$, we can write $R = \bigcup_{j=1}^p \Delta_j D_j$.

Let $Z_1,\dots,Z_n$ be $n$ i.i.d. copies of $Z = (D, \Delta)$. These data define $n$ i.i.d. observed sets $R_1,\dots,R_n$. Writing the log likelihood in terms of these sets gives

$$l_n(F) = \frac{1}{n} \sum_{i=1}^n \log P_F(R_i),$$

where $P_F(R_i)$ denotes the probability mass in $R_i$ under distribution $F$. The maximum
likelihood estimator (if it exists) is defined by

$$l_n(\hat F_n) = \max_{F \in \mathcal{F}} l_n(F), \tag{2.13}$$

where $\mathcal{F}$ is the space of all distribution functions on $\mathcal{W}$.

Since $l_n(F)$ is optimized over the function space $\mathcal{F}$, the optimization problem (2.13) is infinite dimensional. However, the number of parameters can be reduced by generalizing the reasoning of Turnbull (1976) for univariate censored data. It follows that the estimators can only assign mass to a finite collection of disjoint sets $A_1,\dots,A_m$, called maximal intersections by Wong and Yu (1999).

In the literature, there are several equivalent definitions of maximal intersections. Wong and Yu (1999) define $A_j$ to be a maximal intersection if and only if it is a finite intersection of the $R_i$'s such that for each $i$, $A_j \cap R_i = \emptyset$ or $A_j \cap R_i = A_j$. Gentleman and Vandal (2002) use a graph theoretic perspective. They show that the maximal intersections correspond to maximal cliques of the intersection graph of the observed sets. We discuss this perspective in detail in the next section.

For observed sets that take the form of rectangles in $\mathbb{R}^p$, $p \in \mathbb{N}$, Maathuis (2005) introduces yet another way to view the maximal intersections, using a height map of the observed sets. This height map is a function $h : \mathbb{R}^p \to \{0, 1, \dots\}$, where $h(x)$ is defined as the number of observed sets that overlap at the point $x \in \mathbb{R}^p$. Maathuis (2005) shows that the maximal intersections are exactly the local maxima of the height map of a canonical version of the observed sets. We say that $R_1',\dots,R_n'$ are a canonical version of $R_1,\dots,R_n$ if the following three properties hold:

(i) $R_1,\dots,R_n$ and $R_1',\dots,R_n'$ have the same intersection structure, i.e., $R_i \cap R_j = \emptyset$ if and only if $R_i' \cap R_j' = \emptyset$, for all $i, j \in \{1,\dots,n\}$;
(ii) The $x$-coordinates of $R_1',\dots,R_n'$ are distinct and take values in $\{1,\dots,2n\}$;
(iii) The $y$-coordinates of $R_1',\dots,R_n'$ are distinct and take values in $\{1,\dots,2n\}$.

Thus, any ties that may have been present in $R_1,\dots,R_n$ are resolved in $R_1',\dots,R_n'$, but in a way that does not affect the intersection structure. For details on the transformation to canonical sets, see Maathuis (2005, Section 2.1).

By generalizing the reasoning of Turnbull (1976), it follows that the MLE is indifferent to the distribution of mass within the maximal intersections. As a result, the MLE is typically not uniquely defined on the maximal intersections. This type of non-uniqueness is called representational non-uniqueness by Gentleman and Vandal (2002). Thus, we can at best hope to determine the probability masses

$$\alpha_j = P_F(A_j), \quad j = 1,\dots,m.$$

We let $\alpha = (\alpha_1,\dots,\alpha_m)$ and write the probability mass in an observed set $R_i$ in terms of $\alpha$:

$$P_\alpha(R_i) = \sum_{j=1}^m \alpha_j 1\{A_j \subseteq R_i\}. \tag{2.14}$$

Then we can write the log likelihood as

$$l_n(\alpha) = \frac{1}{n} \sum_{i=1}^n \log P_\alpha(R_i) = \frac{1}{n} \sum_{i=1}^n \log\Bigl(\sum_{j=1}^m \alpha_j 1\{A_j \subseteq R_i\}\Bigr). \tag{2.15}$$

Thus, we can think of the computation of the estimators as a two step process. First, in the reduction step, we compute the maximal intersections $A_1,\dots,A_m$. Next, in the optimization step, we solve the optimization problem

$$l_n(\hat\alpha) = \max_{\alpha \in \mathcal{A}} l_n(\alpha), \tag{2.16}$$

where $\mathcal{A} = \{\alpha \in \mathbb{R}^m : \alpha_j \ge 0,\ j = 1,\dots,m,\ 1^T \alpha = 1\}$ and $1$ is the all-one vector in $\mathbb{R}^m$. This optimization problem is an $m$-dimensional convex constrained optimization problem. Existence of the MLE follows directly from standard methods in optimization theory.

Theorem 2.1 The MLE $\hat\alpha$ defined by (2.16) exists.

Proof: Letting $\log(0) = -\infty$, $l_n(\alpha)$ is a continuous extended real valued function on the nonempty compact set $\mathcal{A}$. Hence, the maximum exists by, e.g., Zeidler (1985, Corollary 38.10).

The optimization problem (2.16) may have several solutions. This forms a second source of non-uniqueness for the MLE, called mixture non-uniqueness by Gentleman and Vandal (2002). We will show in Section 2.3 that for current status data with competing risks, both the MLE and the naive estimator are mixture unique. However, we first show how both estimators fit into the censored data framework.

Censored data perspective of the MLE

For the MLE, the variable of interest is $W = (X, Y)$, taking values in the space $\mathcal{W} = \mathbb{R}_+ \times \{1,\ldots,K\}$. The observation time $T$ defines a partition of $p = K + 1$ random sets in $\mathcal{W}$:

$$ D_k = (0, T] \times \{k\}, \quad k = 1,\ldots,K, \tag{2.17} $$

$$ D_{K+1} = (T, \infty) \times \{1,\ldots,K\}. \tag{2.18} $$

Since there is a one-to-one correspondence between $D = (D_1,\ldots,D_{K+1})$ and $T$, the assumption that $T$ is independent of $(X, Y)$ is equivalent to the assumption that $D$ is independent of $(X, Y)$. Furthermore, note that $\Delta_k = 1\{X \le T, Y = k\} = 1\{(X, Y) \in D_k\}$ for $k = 1,\ldots,K$, and $\Delta_{K+1} = 1\{X > T\} = 1\{(X, Y) \in D_{K+1}\}$. Hence, the vector $\Delta$ indicates which set contains the unobservable $(X, Y)$, and the observed data $(T, \Delta)$ give exactly the same information as $(D, \Delta)$. The corresponding observed sets are $R = \bigcup_{k=1}^{K+1} \Delta_k D_k$, so that

$$ R = \begin{cases} (0, T] \times \{k\} & \text{if } \Delta_k = 1,\ k = 1,\ldots,K, \\ (T, \infty) \times \{1,\ldots,K\} & \text{if } \Delta_{K+1} = 1. \end{cases} \tag{2.19} $$
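The mapping (2.19) from an observation $(T, \Delta)$ to its observed set can be sketched as follows. The representation of a set as a triple `(lower, upper, causes)`, standing for $(\text{lower}, \text{upper}] \times \text{causes}$ (with an open upper end when `upper` is infinite), and the function name `observed_set` are illustrative assumptions, not notation from the text.

```python
import math


def observed_set(t, delta):
    """Observed set R of (2.19) for one observation (T, Delta).

    `delta` is the 0/1 vector (Delta_1, ..., Delta_{K+1}); exactly one entry
    equals 1. The result (lower, upper, causes) stands for the set
    (lower, upper] x causes, or (lower, infinity) x causes if upper is inf.
    """
    if sum(delta) != 1 or any(d not in (0, 1) for d in delta):
        raise ValueError("delta must contain exactly one 1 and otherwise 0s")
    K = len(delta) - 1
    k = delta.index(1) + 1  # 1-based position of the nonzero indicator
    if k <= K:
        # Failure from cause k occurred at or before time t.
        return (0.0, t, frozenset({k}))
    # No failure yet: the failure time exceeds t and the cause is unknown.
    return (t, math.inf, frozenset(range(1, K + 1)))


print(observed_set(2.5, [0, 1, 0]))  # failure from cause 2 before time 2.5
print(observed_set(2.5, [0, 0, 1]))  # no failure by time 2.5
```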

It follows that we can write the log likelihood (2.6) as $l_n(F) = \frac{1}{n} \sum_{i=1}^n \log P_F(R_i)$. The MLE maximizes this expression over all bivariate sub-distribution functions $F$ on $\mathbb{R}_+ \times \{1,\ldots,K\}$, or equivalently, over all $K$-tuples of sub-distribution functions $F = (F_1,\ldots,F_K)$ with pointwise sum bounded by one.

We now consider the maximal intersections of the observed sets $R_1,\ldots,R_n$. Note that the observed sets can take the form $(t, \infty) \times \{1,\ldots,K\}$ for some $t \in \mathbb{R}_+$. Such sets are not rectangles in $\mathbb{R}^2$, and hence we cannot directly use the concept of the height map of Maathuis (2005). However, by transforming such sets into $(t, \infty) \times [1, K]$, we do have rectangles in $\mathbb{R}^2$. We can then compute the maximal intersections using the concept of the height map. Afterwards we transform sets of the form $(t, \infty) \times [1, K]$ back to $(t, \infty) \times \{1,\ldots,K\}$.

Once we have computed $\hat\alpha$, we obtain $\hat F_{nk}(t)$ by summing the mass in $(0, t] \times \{k\}$, for $k = 1,\ldots,K$ and $t \in \mathbb{R}_+$. For each $k \in \{1,\ldots,K+1\}$, we call $A$ a maximal intersection for $\hat F_{nk}$ if $A$ is involved in the computation of $\hat F_{nk}$. A precise definition is given below.

Definition 2.2 Let $k \in \{1,\ldots,K\}$, and let $\mathcal{R} = \{R_1,\ldots,R_n\}$ be the observed sets as defined in (2.19). We call $A$ a maximal intersection for $\hat F_{nk}$ if it is a maximal intersection of $\mathcal{R}$ and $A \cap (\mathbb{R} \times \{k\}) \neq \emptyset$. We call $A$ a maximal intersection for $\hat F_{n+}$ (or equivalently, for $\hat F_{n,K+1}$) if $A$ is a maximal intersection for some $\hat F_{nk}$, $k = 1,\ldots,K$.

Note that maximal intersections for $\hat F_{n+}$ are sets in $\mathbb{R}_+ \times \{1,\ldots,K\}$, although $\hat F_{n+}$ is a function on $\mathbb{R}_+$. Recall from an earlier section that we order the observations such that their observation times are nondecreasing, where ties are broken arbitrarily after ensuring that left censored observations are ordered before right censored observations. Hence, if there is an observation $Z_i$ such that $T_i = T_{(n)}$ and $\Delta_{i,K+1} = 1$, then $\Delta_{(n),K+1} = 1$ holds, even if there are other observations with $T_i = T_{(n)}$ and $\Delta_{ik} = 1$ for some $k \in \{1,\ldots,K\}$.
This is used in the following lemma, which provides information on the form of the maximal intersections for $\hat F_{nk}$. The lemma follows directly

from the idea of the height map.

Lemma 2.3 Let $k \in \{1,\ldots,K\}$. Each maximal intersection $A$ for $\hat F_{nk}$ satisfies one of the following two conditions:

(i) $A = (T_{(i)}, T_{(j)}] \times \{k\}$, with $i < j$, $\Delta_{(i),K+1} = 1$, $\Delta_{(j)k} = 1$, and $\Delta_{(l),K+1} = \Delta_{(l)k} = 0$ for all $l$ such that $T_{(i)} < T_{(l)} < T_{(j)}$;

(ii) $A = (T_{(n)}, \infty) \times \{1,\ldots,K\}$, with $\Delta_{(n),K+1} = 1$.

Moreover, if a set $A$ satisfies one of these conditions, then $A$ is a maximal intersection for $\hat F_{nk}$.

Censored data perspective of the naive estimator

For the naive estimator $\tilde F_{nk}$, we consider the reduced current status data $Z^k = (T, \Delta_k)$. Define the variables

$$ W_k = X 1\{Y = k\} + \infty\, 1\{Y \neq k\}, \quad k = 1,\ldots,K, \qquad W_{K+1} = X, $$

taking values in $\overline{\mathcal{W}} = \mathbb{R}_+ \cup \{\infty\}$. Note that $F_{0k}(t) = P(W_k \le t)$ for $k = 1,\ldots,K$, and $F_{0,K+1}(t) = P(W_{K+1} > t)$. Hence we can take $W_1,\ldots,W_{K+1}$ to be our variables of interest. The observation time $T$ defines a partition of $p = 2$ random sets in $\overline{\mathcal{W}}$:

$$ D_1 = (0, T] \quad \text{and} \quad D_2 = (T, \infty]. \tag{2.20} $$

Since there is a one-to-one correspondence between $D = (D_1, D_2)$ and $T$, the assumption that $T$ is independent of $(X, Y)$ is equivalent to the assumption that $D$ is independent of $W_1,\ldots,W_{K+1}$.
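Lemma 2.3 translates into a single pass over the ordered observations, sketched below under two labeled assumptions. First, the observation times are taken to be distinct, so the tie-breaking rule plays no role. Second, the sketch uses the convention $T_{(0)} = 0$ with $\Delta_{(0),K+1} = 1$, so that an interval may start at zero; this convention is an assumption made here for illustration. Each observation is encoded as a pair `(t, cause)` with `cause` in $1,\ldots,K$ for an observed failure and `cause = K + 1` for no failure by time `t`; the encoding and the function name `mle_maximal_intersections` are hypothetical.

```python
import math


def mle_maximal_intersections(observations, k, K):
    """Maximal intersections for the MLE of F_k, following Lemma 2.3.

    `observations` is a list of (t, cause) pairs sorted by strictly
    increasing t. Sets are returned as (lower, upper, causes), standing for
    (lower, upper] x causes.
    """
    found = []
    lower = 0.0  # assumed convention: T_(0) = 0 acts as a right-censored time
    for t, cause in observations:
        if cause == K + 1:
            # A later right-censored time replaces any earlier candidate,
            # since condition (i) allows no Delta_{K+1} = 1 in between.
            lower = t
        elif cause == k and lower is not None:
            found.append((lower, t, frozenset({k})))  # condition (i)
            lower = None  # a cause-k failure closes the candidate interval
    if observations and observations[-1][1] == K + 1:
        t_last = observations[-1][0]
        found.append((t_last, math.inf, frozenset(range(1, K + 1))))  # (ii)
    return found


data = [(1.0, 1), (2.0, 3), (3.0, 2), (4.0, 1), (5.0, 3)]  # K = 2
print(mle_maximal_intersections(data, k=1, K=2))
print(mle_maximal_intersections(data, k=2, K=2))
```

For this toy data set the observed sets are $(0,1] \times \{1\}$, $(2,\infty) \times \{1,2\}$, $(0,3] \times \{2\}$, $(0,4] \times \{1\}$ and $(5,\infty) \times \{1,2\}$, and intersecting them by hand gives the same sets as the function: $(0,1] \times \{1\}$, $(2,4] \times \{1\}$ and $(5,\infty) \times \{1,2\}$ for $k = 1$, and $(2,3] \times \{2\}$ and $(5,\infty) \times \{1,2\}$ for $k = 2$.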

For $k = 1,\ldots,K$, note that $\Delta_k = 1\{X \le T, Y = k\} = 1\{W_k \le T\} = 1\{W_k \in D_1\}$. Hence, the vector $(\Delta_k, 1 - \Delta_k)$ indicates whether $D_1$ or $D_2$ contains the unobservable $W_k$, and the reduced current status data $(T, \Delta_k)$ give exactly the same information as $(D, \Delta_k)$. The corresponding observed sets are $R^{(k)} = \Delta_k D_1 \cup (1 - \Delta_k) D_2$, so that

$$ R^{(k)} = \begin{cases} (0, T] & \text{if } \Delta_k = 1, \\ (T, \infty) & \text{if } \Delta_k = 0. \end{cases} \tag{2.21} $$

We can write the log likelihood (2.8) as $l_{nk}(F_k) = \frac{1}{n} \sum_{i=1}^n \log P_F(R_i^{(k)})$. The naive estimator maximizes this expression over all sub-distribution functions $F_k$ on $\mathbb{R}_+$.

For $k = K + 1$, note that $\Delta_{K+1} = 1\{X > T\} = 1\{W_{K+1} \in D_2\}$. Hence, the vector $(1 - \Delta_{K+1}, \Delta_{K+1})$ indicates whether $D_1$ or $D_2$ contains the unobservable $X$, and the reduced current status data $(T, \Delta_{K+1})$ give exactly the same information as $(D, \Delta_{K+1})$. The corresponding observed sets are $R^{(K+1)} = (1 - \Delta_{K+1}) D_1 \cup \Delta_{K+1} D_2$, so that

$$ R^{(K+1)} = \begin{cases} (0, T] & \text{if } \Delta_{K+1} = 0, \\ (T, \infty) & \text{if } \Delta_{K+1} = 1. \end{cases} \tag{2.22} $$

We can write the log likelihood (2.8) as $l_{n,K+1}(S) = \frac{1}{n} \sum_{i=1}^n \log P_S(R_i^{(K+1)})$. The naive estimator $\tilde F_{n,K+1}$ maximizes this expression over all sub-survival functions $S$ on $\mathbb{R}_+$.

Definition 2.4 For $k = 1,\ldots,K+1$, we call $A$ a maximal intersection for $\tilde F_{nk}$ if it is a maximal intersection of the observed sets $R_1^{(k)},\ldots,R_n^{(k)}$ as defined in (2.21) and (2.22).

The maximal intersections for the naive estimator are described in Lemmas 2.5 and 2.6. Both lemmas follow directly from the idea of the height map.

Lemma 2.5 Let $k \in \{1,\ldots,K\}$. Each maximal intersection $A$ for $\tilde F_{nk}$ satisfies one of the following two conditions:

(i) $A = (T_{(i)}, T_{(j)}]$, with $(T_{(i)}, T_{(j)}) \cap \{T_1,\ldots,T_n\} = \emptyset$, $\Delta_{(i)k} = 0$, and $\Delta_{(j)k} = 1$.

(ii) $A = (T_{(n)}, \infty)$, with $\Delta_{(n)k} = 0$.

Moreover, if an interval $A$ satisfies one of these conditions, then it is a maximal intersection for $\tilde F_{nk}$.

Lemma 2.6 Each maximal intersection $A$ for $\tilde F_{n,K+1}$ satisfies one of the following two conditions:

(i) $A = (T_{(i)}, T_{(j)}]$, with $(T_{(i)}, T_{(j)}) \cap \{T_1,\ldots,T_n\} = \emptyset$, $\Delta_{(i),K+1} = 1$, and $\Delta_{(j),K+1} = 0$.

(ii) $A = (T_{(n)}, \infty)$, with $\Delta_{(n),K+1} = 1$.

Moreover, if an interval $A$ satisfies one of these conditions, then $A$ is a maximal intersection for $\tilde F_{n,K+1}$.

Comparing the maximal intersections for both estimators

Definition 2.7 For any set $A \subseteq \mathbb{R}^2$, we define the x-interval and y-interval of $A$ to be the projections of $A$ on the x-axis and y-axis. Furthermore, we define the lower and upper endpoint of $A$ to be the lower and upper endpoint of its x-interval.

We now compare the maximal intersections for $\hat F_{nk}$ and $\tilde F_{nk}$, for $k \in \{1,\ldots,K\}$.

Lemma 2.8 For each $k = 1,\ldots,K$, the number of maximal intersections for $\tilde F_{nk}$ is at least as large as the number of maximal intersections for $\hat F_{nk}$. Moreover, each upper endpoint of a maximal intersection for $\hat F_{nk}$ is an upper endpoint of a maximal intersection for $\tilde F_{nk}$.

Proof: Let $A$ be a maximal intersection for $\hat F_{nk}$. We show that there is a maximal intersection for $\tilde F_{nk}$ with the same upper endpoint. Note that $A$ must satisfy one of

the two conditions of Lemma 2.3. First, suppose that $A = (T_{(n)}, \infty) \times \{1,\ldots,K\}$ with $\Delta_{(n),K+1} = 1$. Then $\Delta_{(n)k} = 0$, and $(T_{(n)}, \infty)$ is a maximal intersection for $\tilde F_{nk}$ by Lemma 2.5. Next, suppose that $A = (T_{(i)}, T_{(j)}] \times \{k\}$, with $\Delta_{(i),K+1} = 1$, $\Delta_{(j)k} = 1$ and $\Delta_{(l)k} = \Delta_{(l),K+1} = 0$ for all $l$ such that $T_{(i)} < T_{(l)} < T_{(j)}$. Then $\Delta_{(j-1)k} = 0$, and hence $(T_{(j-1)}, T_{(j)}]$ is a maximal intersection for $\tilde F_{nk}$ by Lemma 2.5.

Lemma 2.9 The number of maximal intersections for $\tilde F_{n,K+1}$ is at most as large as the number of maximal intersections for $\hat F_{n,K+1}$. Moreover, the collection of lower endpoints of the maximal intersections for $\tilde F_{n,K+1}$ is identical to the collection of lower endpoints of the maximal intersections for $\hat F_{n,K+1}$. As a result, the number of regions on the x-axis where $\tilde F_{n,K+1}$ can put mass is identical to the number of regions on the x-axis where $\hat F_{n,K+1}$ can put mass. Finally, the union of the maximal intersections for $\tilde F_{n,K+1}$ is contained in the union of the x-intervals of the maximal intersections for $\hat F_{n,K+1}$.

Proof: Let $A$ be a maximal intersection for $\tilde F_{n,K+1}$. We show that there is a maximal intersection for $\hat F_{n,K+1}$ with the same lower endpoint. Note that $A$ must satisfy one of the two conditions of Lemma 2.6. First, suppose that $A = (T_{(i)}, T_{(j)}]$ with $(T_{(i)}, T_{(j)}) \cap \{T_1,\ldots,T_n\} = \emptyset$, $\Delta_{(i),K+1} = 1$ and $\Delta_{(j),K+1} = 0$. Since $\Delta_{(j),K+1} = 0$, there must be a $k \in \{1,\ldots,K\}$ such that $\Delta_{(j)k} = 1$. But this implies that $(T_{(i)}, T_{(j)}] \times \{k\}$ is a maximal intersection for $\hat F_{nk}$, by Lemma 2.3. Next, suppose that $A = (T_{(n)}, \infty)$ with $\Delta_{(n),K+1} = 1$. Then $(T_{(n)}, \infty) \times \{1,\ldots,K\}$ is a maximal intersection for $\hat F_{n1},\ldots,\hat F_{nK}$ by Lemma 2.3, and hence it is a maximal intersection for $\hat F_{n,K+1}$ by definition.

Next, let $A$ be a maximal intersection for $\hat F_{n,K+1}$. We show that there is a maximal intersection for $\tilde F_{n,K+1}$ with the same lower endpoint. By definition, it follows that there is a $k \in \{1,\ldots,K\}$ so that $A$ is a maximal intersection for $\hat F_{nk}$.
Hence, $A$ must satisfy one of the two conditions of Lemma 2.3. First, suppose that $A = (T_{(i)}, T_{(j)}] \times \{k\}$, with $\Delta_{(i),K+1} = 1$, $\Delta_{(j)k} = 1$ and $\Delta_{(l)k} = \Delta_{(l),K+1} = 0$ for all $l$

such that $T_{(i)} < T_{(l)} < T_{(j)}$. If $S = (T_{(i)}, T_{(j)}) \cap \{T_1,\ldots,T_n\} = \emptyset$, then $(T_{(i)}, T_{(j)}]$ is a maximal intersection for $\tilde F_{n,K+1}$ by Lemma 2.6. Otherwise, $(T_{(i)}, \min S]$ is a maximal intersection for $\tilde F_{n,K+1}$. Next, suppose that $A = (T_{(n)}, \infty) \times \{1,\ldots,K\}$ with $\Delta_{(n),K+1} = 1$. Then $(T_{(n)}, \infty)$ is a maximal intersection for $\tilde F_{n,K+1}$ by Lemma 2.6.

The last statement follows by combining the fact that the collections of lower endpoints of the maximal intersections for $\tilde F_{n,K+1}$ and $\hat F_{n,K+1}$ are identical, with the fact that maximal intersections for $\tilde F_{n,K+1}$ cannot contain observation times in their interior (Lemma 2.6).

Remark 2.10 The last statement of Lemma 2.9 has implications for representational non-uniqueness of the estimators. It shows that it is possible that the area in which the MLE $\hat F_{n,K+1}$ suffers from representational non-uniqueness is larger than the area in which $\tilde F_{n,K+1}$ suffers from representational non-uniqueness. This was also noted by Hudgens, Satten and Longini (2001), and partly motivated their pseudo-likelihood estimator. However, note that it can also happen that $\tilde F_{n,K+1}$ is non-unique over a larger area, if many of the maximal intersections for $\hat F_{n,K+1}$ get zero mass. For an example, see Tables 2.1 and 2.2.

Table 2.1: Example data with $K = 2$ competing risks, illustrating that the number of positive maximal intersections for $\tilde F_{n,K+1}$ can be larger than the number of positive maximal intersections for $\hat F_{n,K+1}$.
[Table body not recoverable; columns: $i$, $t_{(i)}$, $\delta_{(i)1}$, $\delta_{(i)2}$, $\delta_{(i)3}$, in two side-by-side blocks.]

Motivated by Remark 2.10, we now consider maximal intersections that get positive mass. We introduce the following terminology:

Definition 2.11 Let $k \in \{1,\ldots,K+1\}$. We say that $A$ is a positive maximal intersection for $\hat F_{nk}$ if $A$ is a maximal intersection for $\hat F_{nk}$ and the MLE assigns positive mass to $A$. Similarly, we say that $A$ is a positive maximal intersection for $\tilde F_{nk}$ if $A$ is a maximal intersection for $\tilde F_{nk}$ and $\tilde F_{nk}$ assigns positive mass to $A$.

After reading Lemma 2.9, one may wonder whether the number of positive maximal intersections for $\tilde F_{n,K+1}$ is at most as large as the number of positive maximal intersections for $\hat F_{n,K+1}$. This is indeed often the case in simulations, but not always. A counterexample can be found in Table 2.1. In this example, $\hat F_{n,K+1}$ has four maximal intersections, given in Table 2.2. The naive estimator $\tilde F_{n,K+1}$ has three maximal intersections, with corresponding masses given in Table 2.2. Note that the maximal intersections satisfy the statement in Lemma 2.9. However, there are only two positive maximal intersections for $\hat F_{n,K+1}$, while there are three positive maximal intersections for $\tilde F_{n,K+1}$.

Table 2.2: The estimators for the data in Table 2.1, in terms of their maximal intersections (MIs) and the corresponding probability masses.

    MLE \hat F_{n,K+1}            Naive estimator \tilde F_{n,K+1}
    MIs              mass         MIs        mass
    (0, 1] x {1}     3/10         (0, 1]     1/3
    (3, 4] x {1}     0            (3, 4]     1/6
    (5, 8] x {1}     0            (5, 6]     1/2
    (5, 6] x {2}     7/10

Graph theory and uniqueness

Gentleman and Vandal (2001), Gentleman and Vandal (2002), Maathuis (2003), and Vandal, Gentleman and Liu (2006) use a graph theoretic perspective to study properties of the maximum likelihood estimator for censored data. Before we apply these methods to our problem, we give an introduction to graph theory. This introduction

is mostly based on Golumbic (1980), and also partly given in Maathuis (2003, Section 3.3).

Introduction to graph theory for censored data

Let $G = (V, E)$ be an undirected graph, where $V$ is a set of vertices, and $E$ is a set of edges. An edge is a collection of two vertices. Two vertices $v$ and $w$ are said to be adjacent in $G$ if there is an edge between $v$ and $w$, i.e., $vw \in E$. We say that two sets of vertices $S_1$ and $S_2$ are adjacent if there is at least one pair of vertices $(v, w)$ such that $v \in S_1$, $w \in S_2$ and $vw \in E$. A subgraph of $G = (V, E)$ is defined to be any graph $G' = (V', E')$ such that $V' \subseteq V$ and $E' \subseteq E$. Given a subset $A \subseteq V$ of vertices, we define the subgraph induced by $A$ to be $G_A = (A, E_A)$, where $E_A = \{xy \in E : x \in A, y \in A\}$.

We call a subset $M \subseteq V$ of vertices a clique if every pair of distinct vertices in $M$ is adjacent. We call $M \subseteq V$ a maximal clique if there is no clique in $G$ that properly contains $M$ as a subset.[3] Every finite graph has a finite number of maximal cliques that we denote by $\mathcal{C} = \{C_1,\ldots,C_m\}$.

[3] Instead of the terms clique and maximal clique, some authors use the terms complete subgraph and clique.

Let $\mathcal{R} = \{R_1,\ldots,R_n\}$ be a family of sets. The intersection graph of $\mathcal{R}$ is obtained by representing each set in $\mathcal{R}$ by a vertex, and connecting two vertices by an edge if and only if their corresponding sets intersect. An intersection graph of a collection of intervals on a linearly ordered set is called an interval graph. Alternatively, an undirected graph $G$ is called an interval graph if it can be thought of as an intersection graph of a set of intervals on the real line. Every maximal clique $C_j$ in an intersection graph has a real representation $A_j = \bigcap_{R \in C_j} R$, given by the intersection of the sets that form the maximal clique.

A sequence of vertices $(v_0, v_1,\ldots,v_l)$ is called a cycle of length $l + 1$ if $v_{i-1} v_i \in E$ for all $i = 1,\ldots,l$ and $v_l v_0 \in E$. A cycle $(v_0,\ldots,v_l)$ is called a simple cycle if $v_i \neq v_j$

for $i \neq j$. A simple cycle $(v_0, v_1,\ldots,v_l)$ is called chordless if for all $i = 0,\ldots,l$, $v_i v_j \in E$ only for $j = (i \pm 1) \bmod (l + 1)$. A graph is called triangulated if it does not contain chordless cycles of length strictly greater than three. Hajós (1957) showed that every interval graph is triangulated.

A clique graph of $\mathcal{R}$ is an intersection graph of the maximal cliques $\mathcal{C}$. Thus, in this graph each vertex represents a maximal clique, and two vertices $C_j$ and $C_k$ are adjacent if and only if $C_j \cap C_k \neq \emptyset$, i.e., if there is at least one set in $\mathcal{R}$ that is an element of both $C_j$ and $C_k$. We define the clique matrix to be a vertices versus maximal cliques incidence matrix. For $n$ observed sets with $m$ maximal cliques, this is an $n \times m$ matrix $H$ with elements $H_{ij} = 1\{A_j \subseteq R_i\}$.[4]

[4] Note that our $H$ is the transpose of the incidence matrix defined in Gentleman and Vandal (2002, page 559).

We now return to the maximum likelihood estimator for censored data. Let $\mathcal{R} = \{R_1,\ldots,R_n\}$ be the observed sets. Gentleman and Vandal (2001) showed that the maximal intersections $A_1,\ldots,A_m$ of $\mathcal{R}$, defined in Section 2.2, are exactly the real representations of the maximal cliques of the intersection graph of $\mathcal{R}$. Hence, we can study the intersection graph to deduce properties of the MLE. In particular, Gentleman and Vandal (2002, Lemma 4) showed that $\hat\alpha$ is unique if the intersection graph is triangulated. An alternative proof can be found in Maathuis (2003, Lemma 3.13). Finally, we can use the clique matrix $H$ to rewrite the optimization problem (2.16). Namely, $P_\alpha(R_i) = (H\alpha)_i$, so that (2.16) becomes

$$ l_n(\hat\alpha) = \max_{\alpha \in \mathcal{A}} \frac{1}{n} \sum_{i=1}^n \log\bigl((H\alpha)_i\bigr). $$

Graph theoretic aspects and uniqueness of the naive estimator

For $k = 1,\ldots,K+1$, let $\mathcal{R}^{(k)} = \{R_1^{(k)},\ldots,R_n^{(k)}\}$ be the observed sets for the naive estimator $\tilde F_{nk}$, as defined in (2.21) and (2.22). The following proposition uses the structure of the intersection graph and the form of the maximal intersections to


More information

Learning discrete graphical models via generalized inverse covariance matrices

Learning discrete graphical models via generalized inverse covariance matrices Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,

More information

Nonparametric estimation of log-concave densities

Nonparametric estimation of log-concave densities Nonparametric estimation of log-concave densities Jon A. Wellner University of Washington, Seattle Seminaire, Institut de Mathématiques de Toulouse 5 March 2012 Seminaire, Toulouse Based on joint work

More information

K 4 -free graphs with no odd holes

K 4 -free graphs with no odd holes K 4 -free graphs with no odd holes Maria Chudnovsky 1 Columbia University, New York NY 10027 Neil Robertson 2 Ohio State University, Columbus, Ohio 43210 Paul Seymour 3 Princeton University, Princeton

More information

SPECIAL T K 5 IN GRAPHS CONTAINING K 4

SPECIAL T K 5 IN GRAPHS CONTAINING K 4 SPECIAL T K 5 IN GRAPHS CONTAINING K 4 A Thesis Presented to The Academic Faculty by Dawei He In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in Mathematics School of Mathematics

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

Additive Isotonic Regression

Additive Isotonic Regression Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive

More information

MINORS OF GRAPHS OF LARGE PATH-WIDTH. A Dissertation Presented to The Academic Faculty. Thanh N. Dang

MINORS OF GRAPHS OF LARGE PATH-WIDTH. A Dissertation Presented to The Academic Faculty. Thanh N. Dang MINORS OF GRAPHS OF LARGE PATH-WIDTH A Dissertation Presented to The Academic Faculty By Thanh N. Dang In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in Algorithms, Combinatorics

More information

Generating p-extremal graphs

Generating p-extremal graphs Generating p-extremal graphs Derrick Stolee Department of Mathematics Department of Computer Science University of Nebraska Lincoln s-dstolee1@math.unl.edu August 2, 2011 Abstract Let f(n, p be the maximum

More information

Models for Multivariate Panel Count Data

Models for Multivariate Panel Count Data Semiparametric Models for Multivariate Panel Count Data KyungMann Kim University of Wisconsin-Madison kmkim@biostat.wisc.edu 2 April 2015 Outline 1 Introduction 2 3 4 Panel Count Data Motivation Previous

More information

Additional Constructions to Solve the Generalized Russian Cards Problem using Combinatorial Designs

Additional Constructions to Solve the Generalized Russian Cards Problem using Combinatorial Designs Additional Constructions to Solve the Generalized Russian Cards Problem using Combinatorial Designs Colleen M. Swanson Computer Science & Engineering Division University of Michigan Ann Arbor, MI 48109,

More information

Lecture 5: January 30

Lecture 5: January 30 CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 5: January 30 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They

More information

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

More information

Fair Factorizations of the Complete Multipartite Graph and Related Edge-Colorings. Aras Erzurumluoğlu

Fair Factorizations of the Complete Multipartite Graph and Related Edge-Colorings. Aras Erzurumluoğlu Fair Factorizations of the Complete Multipartite Graph and Related Edge-Colorings by Aras Erzurumluoğlu A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the

More information

The partially monotone tensor spline estimation of joint distribution function with bivariate current status data

The partially monotone tensor spline estimation of joint distribution function with bivariate current status data University of Iowa Iowa Research Online Theses and Dissertations Summer 00 The partially monotone tensor spline estimation of joint distribution function with bivariate current status data Yuan Wu University

More information

Minimax-Regret Sample Design in Anticipation of Missing Data, With Application to Panel Data. Jeff Dominitz RAND. and

Minimax-Regret Sample Design in Anticipation of Missing Data, With Application to Panel Data. Jeff Dominitz RAND. and Minimax-Regret Sample Design in Anticipation of Missing Data, With Application to Panel Data Jeff Dominitz RAND and Charles F. Manski Department of Economics and Institute for Policy Research, Northwestern

More information

Likelihood and Fairness in Multidimensional Item Response Theory

Likelihood and Fairness in Multidimensional Item Response Theory Likelihood and Fairness in Multidimensional Item Response Theory or What I Thought About On My Holidays Giles Hooker and Matthew Finkelman Cornell University, February 27, 2008 Item Response Theory Educational

More information

The extreme points of symmetric norms on R^2

The extreme points of symmetric norms on R^2 Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2008 The extreme points of symmetric norms on R^2 Anchalee Khemphet Iowa State University Follow this and additional

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

DOUBLY PERIODIC SELF-TRANSLATING SURFACES FOR THE MEAN CURVATURE FLOW

DOUBLY PERIODIC SELF-TRANSLATING SURFACES FOR THE MEAN CURVATURE FLOW DOUBLY PERIODIC SELF-TRANSLATING SURFACES FOR THE MEAN CURVATURE FLOW XUAN HIEN NGUYEN Abstract. We construct new examples of self-translating surfaces for the mean curvature flow from a periodic configuration

More information

Multistate Modeling and Applications

Multistate Modeling and Applications Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

The Chromatic Number of Ordered Graphs With Constrained Conflict Graphs

The Chromatic Number of Ordered Graphs With Constrained Conflict Graphs The Chromatic Number of Ordered Graphs With Constrained Conflict Graphs Maria Axenovich and Jonathan Rollin and Torsten Ueckerdt September 3, 016 Abstract An ordered graph G is a graph whose vertex set

More information

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 599 604 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.599 Print ISSN 2287-7843 / Online ISSN 2383-4757 Estimation of Conditional

More information

Paradoxical Results in Multidimensional Item Response Theory

Paradoxical Results in Multidimensional Item Response Theory UNC, December 6, 2010 Paradoxical Results in Multidimensional Item Response Theory Giles Hooker and Matthew Finkelman UNC, December 6, 2010 1 / 49 Item Response Theory Educational Testing Traditional model

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Conflict-Free Colorings of Rectangles Ranges

Conflict-Free Colorings of Rectangles Ranges Conflict-Free Colorings of Rectangles Ranges Khaled Elbassioni Nabil H. Mustafa Max-Planck-Institut für Informatik, Saarbrücken, Germany felbassio, nmustafag@mpi-sb.mpg.de Abstract. Given the range space

More information

Counting independent sets of a fixed size in graphs with a given minimum degree

Counting independent sets of a fixed size in graphs with a given minimum degree Counting independent sets of a fixed size in graphs with a given minimum degree John Engbers David Galvin April 4, 01 Abstract Galvin showed that for all fixed δ and sufficiently large n, the n-vertex

More information

Graph coloring, perfect graphs

Graph coloring, perfect graphs Lecture 5 (05.04.2013) Graph coloring, perfect graphs Scribe: Tomasz Kociumaka Lecturer: Marcin Pilipczuk 1 Introduction to graph coloring Definition 1. Let G be a simple undirected graph and k a positive

More information

ARE202A, Fall 2005 CONTENTS. 1. Graphical Overview of Optimization Theory (cont) Separating Hyperplanes 1

ARE202A, Fall 2005 CONTENTS. 1. Graphical Overview of Optimization Theory (cont) Separating Hyperplanes 1 AREA, Fall 5 LECTURE #: WED, OCT 5, 5 PRINT DATE: OCTOBER 5, 5 (GRAPHICAL) CONTENTS 1. Graphical Overview of Optimization Theory (cont) 1 1.4. Separating Hyperplanes 1 1.5. Constrained Maximization: One

More information

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates Communications in Statistics - Theory and Methods ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20 Analysis of Gamma and Weibull Lifetime Data under a

More information

SZEMERÉDI S REGULARITY LEMMA FOR MATRICES AND SPARSE GRAPHS

SZEMERÉDI S REGULARITY LEMMA FOR MATRICES AND SPARSE GRAPHS SZEMERÉDI S REGULARITY LEMMA FOR MATRICES AND SPARSE GRAPHS ALEXANDER SCOTT Abstract. Szemerédi s Regularity Lemma is an important tool for analyzing the structure of dense graphs. There are versions of

More information

The Skorokhod reflection problem for functions with discontinuities (contractive case)

The Skorokhod reflection problem for functions with discontinuities (contractive case) The Skorokhod reflection problem for functions with discontinuities (contractive case) TAKIS KONSTANTOPOULOS Univ. of Texas at Austin Revised March 1999 Abstract Basic properties of the Skorokhod reflection

More information

Likelihood Based Inference for Monotone Response Models

Likelihood Based Inference for Monotone Response Models Likelihood Based Inference for Monotone Response Models Moulinath Banerjee University of Michigan September 11, 2006 Abstract The behavior of maximum likelihood estimates (MLEs) and the likelihood ratio

More information

Induced Saturation Number

Induced Saturation Number Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2012 Induced Saturation Number Jason James Smith Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd

More information

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Rani M. R, Mohith Jagalmohanan, R. Subashini Binary matrices having simultaneous consecutive

More information

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Progress, Updates, Problems William Jen Hoe Koh May 9, 2013 Overview Marginal vs Conditional What is TMLE? Key Estimation

More information

Convergence in shape of Steiner symmetrized line segments. Arthur Korneychuk

Convergence in shape of Steiner symmetrized line segments. Arthur Korneychuk Convergence in shape of Steiner symmetrized line segments by Arthur Korneychuk A thesis submitted in conformity with the requirements for the degree of Master of Science Graduate Department of Mathematics

More information

Modeling and Stability Analysis of a Communication Network System

Modeling and Stability Analysis of a Communication Network System Modeling and Stability Analysis of a Communication Network System Zvi Retchkiman Königsberg Instituto Politecnico Nacional e-mail: mzvi@cic.ipn.mx Abstract In this work, the modeling and stability problem

More information

Consistency Under Sampling of Exponential Random Graph Models

Consistency Under Sampling of Exponential Random Graph Models Consistency Under Sampling of Exponential Random Graph Models Cosma Shalizi and Alessandro Rinaldo Summary by: Elly Kaizar Remember ERGMs (Exponential Random Graph Models) Exponential family models Sufficient

More information

Small Label Classes in 2-Distinguishing Labelings

Small Label Classes in 2-Distinguishing Labelings Also available at http://amc.imfm.si ISSN 1855-3966 (printed ed.), ISSN 1855-3974 (electronic ed.) ARS MATHEMATICA CONTEMPORANEA 1 (2008) 154 164 Small Label Classes in 2-Distinguishing Labelings Debra

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Variational Inference (11/04/13)

Variational Inference (11/04/13) STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further

More information

Modulation of symmetric densities

Modulation of symmetric densities 1 Modulation of symmetric densities 1.1 Motivation This book deals with a formulation for the construction of continuous probability distributions and connected statistical aspects. Before we begin, a

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04

More information

with Current Status Data

with Current Status Data Estimation and Testing with Current Status Data Jon A. Wellner University of Washington Estimation and Testing p. 1/4 joint work with Moulinath Banerjee, University of Michigan Talk at Université Paul

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Linear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs

Linear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs Linear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs Nathan Lindzey, Ross M. McConnell Colorado State University, Fort Collins CO 80521, USA Abstract. Tucker characterized

More information

High dimensional ising model selection using l 1 -regularized logistic regression

High dimensional ising model selection using l 1 -regularized logistic regression High dimensional ising model selection using l 1 -regularized logistic regression 1 Department of Statistics Pennsylvania State University 597 Presentation 2016 1/29 Outline Introduction 1 Introduction

More information

Strongly chordal and chordal bipartite graphs are sandwich monotone

Strongly chordal and chordal bipartite graphs are sandwich monotone Strongly chordal and chordal bipartite graphs are sandwich monotone Pinar Heggernes Federico Mancini Charis Papadopoulos R. Sritharan Abstract A graph class is sandwich monotone if, for every pair of its

More information

Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics

Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee 1 and Jon A. Wellner 2 1 Department of Statistics, Department of Statistics, 439, West

More information