The partially monotone tensor spline estimation of joint distribution function with bivariate current status data

Size: px

Start display at page:

Download "The partially monotone tensor spline estimation of joint distribution function with bivariate current status data"

Allen Parsons
5 years ago
Views:

University of Iowa Iowa Research Online Theses and Dissertations Summer 00 The partially monotone tensor spline estimation of joint distribution function with

edu/etd/76 Recommended Citation Wu, Yuan. "The partially monotone tensor spline estimation of joint distribution function with bivariate current status data.

1 University of Iowa Iowa Research Online Theses and Dissertations Summer 00 The partially monotone tensor spline estimation of joint distribution function with bivariate current status data Yuan Wu University of Iowa Copyright 00 Yuan Wu This dissertation is available at Iowa Research Online: Recommended Citation Wu, Yuan. "The partially monotone tensor spline estimation of joint distribution function with bivariate current status data." PhD (Doctor of Philosophy) thesis, University of Iowa, Follow this and additional works at: Part of the Applied Mathematics Commons

2 THE PARTIALLY MONOTONE TENSOR SPLINE ESTIMATION OF JOINT DISTRIBUTION FUNCTION WITH BIVARIATE CURRENT STATUS DATA by Yuan Wu An Abstract Of a thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Applied Mathematical and Computational Sciences in the Graduate College of The University of Iowa July 00 Thesis Supervisor: Professor Ying Zhang

3 ABSTRACT The analysis of joint distribution function with bivariate event time data is a challenging problem both theoretically and numerically. This thesis develops a tensor splinebased nonparametric maximum likelihood estimation method to estimate the joint distribution function with bivariate current status data. Tensor I-splines are developed to replace the traditional tensor B-splines in approximating joint distribution function in order to simplify the restricted maximum likelihood estimation problem in computing. The generalized gradient projection algorithm is used to compute the restricted optimization problem. We show that the proposed tensor splinebased nonparametric estimator is consistent and that the rate of convergence can be as good as n /. Simulation studies with moderate sample sizes show that the finite-sample performance of the proposed estimator is generally satisfactory. Abstract Approved: Thesis Supervisor Title and Department Date

4 THE PARTIALLY MONOTONE TENSOR SPLINE ESTIMATION OF JOINT DISTRIBUTION FUNCTION WITH BIVARIATE CURRENT STATUS DATA by Yuan Wu A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Applied Mathematical and Computational Sciences in the Graduate College of The University of Iowa July 00 Thesis Supervisor: Professor Ying Zhang

5 Graduate College The University of Iowa Iowa City, Iowa CERTIFICATE OF APPROVAL PH.D. THESIS This is to certify that the Ph.D. thesis of Yuan Wu has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Applied Mathematical and Computational Sciences at the July 00 graduation. Thesis Committee: Ying Zhang, Thesis Supervisor Weimin Han Jian Huang Michael Jones Lihe Wang

6 ACKNOWLEDGEMENTS I am grateful to all those people who have made this dissertation possible and because of whom my graduate experience has been one that I will cherish forever. I deeply owe my gratitude to my advisor, Dr. Ying Zhang for his time and energy he put in developing my research skills and advising my thesis work, for his encouragement and support in setting my career goal, and for his patience in correcting my English writing. His wide knowledge and logical way of thinking have been of great value for me. Without his bright insights and personal guidance, I would never have been able to finish my dissertation. I am really grateful to my academic advisor in the Department of Mathematics, Dr. Lihe Wang for his caring during these years at the University of Iowa. I would never forget he supported me financially and in all other possible ways when I had a difficult time due to my spoken English. I would like to express my warm and sincere thanks to faculties and staffs in the Department of Mathematics, especially to Dr. Weimin Han and Dr. Yi Li. They both helped me in so many ways during my study at the University of Iowa. I am grateful to Dr. Jian Huang and Dr. Michael Jones for their detailed review and valuable comments on this work. I owe my thanks to my wife Xiaoli, who brought our newborn son, Aaron. Without her love and encouragement, it would have been impossible for me to finish this work. My special gratitude is due to my parents, my parents-in-law. ii

7 ABSTRACT The analysis of joint distribution function with bivariate event time data is a challenging problem both theoretically and numerically. This thesis develops a tensor splinebased nonparametric maximum likelihood estimation method to estimate the joint distribution function with bivariate current status data. Tensor I-splines are developed to replace the traditional tensor B-splines in approximating joint distribution function in order to simplify the restricted maximum likelihood estimation problem in computing. The generalized gradient projection algorithm is used to compute the restricted optimization problem. We show that the proposed tensor splinebased nonparametric estimator is consistent and that the rate of convergence can be as good as n /. Simulation studies with moderate sample sizes show that the finite-sample performance of the proposed estimator is generally satisfactory. iii

8 TABLE OF CONTENTS LIST OF TABLES vi LIST OF FIGURES viii CHAPTER INTRODUCTION Background of Bivariate Current Status Data Thesis Objective and Proposed Method TECHNICAL BACKGROUNDS Splines The B-splines The M-splines The I-splines Monotone Spline Functions Partially Monotone Tensor Spline Functions Results on Empirical Process Theory Some Useful Results on B-splines TENSOR SPLINE-BASED SIEVE NONPARAMETRIC MAXIMUM LIKE- LIHOOD ESTIMATION METHOD Likelihood of Bivariate Current Status Data Spline-based Maximum Likelihood Estimation B-spline-based Estimation I-spline-based Estimation ASYMPTOTIC PROPERTIES Regularity Conditions Consistency Convergence Rate Technical Lemmas NUMERICAL STUDIES Computing Algorithm iv

9 5. Simulation Studies Design of the Studies Results of the Studies DISCUSSION AND FUTURE WORK REFERENCES APPENDIX v

10 LIST OF TABLES 5. Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the maximum pseudo likelihood estimation of bivariate Clayton Copula distributed event times when the underlying distribution is correctly specified with sample size n = 00, Kendall s τ = 0.5 (the association parameter α = 5/) Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the maximum pseudo likelihood estimation of bivariate Clayton Copula distributed event times when the underlying distribution is correctly specified with sample size n = 00, Kendall s τ = 0.5 (the association parameter α = 5/) Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the maximum pseudo likelihood estimation of bivariate Clayton Copula distributed event times when the underlying distribution is correctly specified with sample size n = 00, Kendall s τ = 0.50 (the association parameter α = )) The comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the maximum pseudo likelihood estimation of bivariate Clayton Copula distributed event times when the underlying distribution is correctly specified with sample size n = 00, Kendall s τ = 0.50 (the association parameter α = ) Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the maximum pseudo likelihood estimation of bivariate Clayton Copula distributed event times when the underlying distribution is correctly specified with sample size n = 00, Kendall s τ = 0.75 (the association parameter α = 7) Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the pseudo maximum likelihood estimation of bivariate Clayton Copula distributed event times when the underlying distribution is correctly specified with sample size n = 00, Kendall s τ = 0.75 (the association parameter α = 7) vi

11 5.7 Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the maximum pseudo likelihood estimation of bivariate Gumbel Copula distributed event times when the underlying distribution is incorrectly specified as bivariate Clayton Copula model with sample size n = 00, Kendall s τ = 0.5 (the association parameter α = /) Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the maximum pseudo likelihood estimation of bivariate Gumbel Copula distributed event times when the underlying distribution is incorrectly specified as bivariate Clayton Copula model with sample size n = 00, Kendall s τ = 0.5 (the association parameter α = /) Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the maximum pseudo likelihood estimation of bivariate Gumbel Copula distributed event times when the underlying distribution is incorrectly specified as bivariate Clayton Copula model with sample size n = 00, Kendall s τ = 0.50 (the association parameter α = ) Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the maximum pseudo likelihood estimation of bivariate Gumbel Copula distributed event times when the underlying distribution is incorrectly specified as bivariate Clayton Copula model and sample size n = 00, Kendall s τ = 0.50 (the association parameter α = ) Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and the maximum pseudo likelihood estimation of bivariate Gumbel Copula distributed event times when the underlying distribution is incorrectly specified as bivariate Clayton Copula model with sample size n = 00, Kendall s τ = 0.75 (the association parameter α = ) Comparison of Bias and MSE of the proposed sieve maximum likelihood estimation and maximum pseudo likelihood estimation of bivariate Gumbel Copula distributed event times when the underlying distribution is incorrectly specified as bivariate Clayton Copula model with sample size n = 00, Kendall s τ = 0.75 (the association parameter α = ) vii

12 LIST OF FIGURES 5. Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Clayton Copula model with the underlying distribution being correctly specified when sample size n = 00, Kendall s τ = 0.5 (the association parameter α = 5/) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Clayton Copula model and n = 00, Kendall s τ = 0.5 (the association parameter α = 5/) Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Clayton Copula model with the underlying distribution being correctly specified when sample size n = 00, Kendall s τ = 0.5 (the association parameter α = 5/) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Clayton Copula model and n = 00, Kendall s τ = 0.5 (the association parameter α = 5/) Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Clayton Copula model with the underlying distribution being correctly specified when sample size n = 00, Kendall s τ = 0.50 (the association parameter α = ) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Clayton Copula model and n = 00, Kendall s τ = 0.50 (the association parameter α = ) viii

13 5.7 Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Clayton Copula model with the underlying distribution being correctly specified when sample size n = 00, Kendall s τ = 0.50 (the association parameter α = ) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Clayton Copula model and n = 00, Kendall s τ = 0.50 (the association parameter α = ) Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Clayton Copula model with the underlying distribution being correctly specified when sample size n = 00, Kendall s τ = 0.75 (the association parameter α = 7) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Clayton Copula model and n = 00, Kendall s τ = 0.75 (the association parameter α = 7) Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Clayton Copula model with the underlying distribution being correctly specified when sample size n = 00, Kendall s τ = 0.75 (the association parameter α = 7) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Clayton Copula model and n = 00, Kendall s τ = 0.75 (the association parameter α = 7) Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Clayton Copula model with the underlying distribution being correctly specified when sample size n = 00, Kendall s τ = 0.5 (the association parameter α = /) ix

14 5. Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Gumbel Copula model and n = 00, Kendall s τ = 0.5 (the association parameter α = /) Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Gumbel Copula model with the underlying distribution being incorrectly specified as bivariate Clayton Copula model when sample size n = 00, Kendall s τ = 0.5 (the association parameter α = /) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Gumbel Copula model and n = 00, Kendall s τ = 0.5 (the association parameter α = /) Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Gumbel Copula model with the underlying distribution being incorrectly specified as bivariate Clayton Copula model when sample size n = 00, Kendall s τ = 0.50 (the association parameter α = ) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Gumbel Copula model and n = 00, Kendall s τ = 0.50 (the association parameter α = ) Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Gumbel Copula model with the underlying distribution being incorrectly specified as bivariate Clayton Copula model when sample size n = 00, Kendall s τ = 0.50 (the association parameter α = ) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Gumbel Copula model and n = 00, Kendall s τ = 0.50 (the association parameter α = ) x

15 5. Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Gumbel Copula model with the underlying distribution being incorrectly specified as bivariate Clayton Copula model when sample size n = 00, Kendall s τ = 0.75 (the association parameter α = ) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Gumbel Copula model and n = 00, Kendall s τ = 0.75 (the association parameter α = ) Comparison of the proposed sieve maximum likelihood estimation ((a), (c)) and the maximum pseudo likelihood estimation ((b), (d)) for the joint distribution function of bivariate Gumbel Copula model with the underlying distribution being incorrectly specified as bivariate Clayton Copula model when sample size n = 00, Kendall s τ = 0.75 (the association parameter α = ) Comparison of the proposed sieve maximum likelihood estimation (Sieve) and the nonparametric maximum likelihood estimation (Pseudo) for the marginal distribution function of T when (T, T ) is distributed according to Gumbel Copula model and n = 00, Kendall s τ = 0.75 (the association parameter α = ) xi

16 CHAPTER INTRODUCTION. Background of Bivariate Current Status Data In some survival analysis applications, observation of the random event time T is restricted to the knowledge of whether or not T exceeds a random monitoring time C. This type of data is known as current status data, and sometimes referred to as interval censored data case I (Groenoboom and Wellner (99)). Current status data arise naturally in many applications, see for example, in animal tumorigenicity experiments (Hoel and Walburg (97), Finkelstein and Wolfe (985), and Finkelstein (986)). In these examples, for each experimental animal, T is the time from exposure to a potential carcinogen until occurrence of the tumor, and C is the time, on the same scale, of sacrifice. Upon sacrifice, the presence or absence of the tumor can be determined providing current status information on T. Another example of current status data arises in studies of the distribution of the age at weaning (Diamond et al. (986), Diamond and McDonald (99), and Grummer-Strawn (99)). Here T represents the age of a child at weaning and C the age at observation. It also arises in studies of human immunodeficiency virus (HIV) and acquired immunodeficiency syndrome (AIDS) ( Shiboski and Jewell (99), and Jewell et al. (99)). The univariate current status data have been thoroughly studied in recent years. Groenoboom and Wellner (99) studied the asymptotic properties of the nonparametric maximum likelihood estimator of the distribution function with current status data. Huang (996) considered Cox s proportional hazards model with current status data and showed

17 that the MLE of the regression parameter is asymptotically normal with n convergence rate, even through the MLE of the baseline cumulative hazard function only converges at n / rate. For a review of regression models for interval censored data, see Huang and Wellner (997). Bivariate event time data occur in many applications. For example, in an Australian twin study (Duffy et al. (990)) the researchers were interested in times to a certain event such as disease or disease-related symptoms in both twins. As in univariate case, both failure times can be censored. Some work has addressed the estimation of the joint distribution function of the correlated event times with bivariate right censored data. Analogous to the well-known Kaplan-Meier estimator for survival function, the bivariate Kaplan-Meier estimation on plane was proposed by Dabrowska (988). Kooperberg (998) discussed a tensor-spline estimation of the logarithm of joint density function with bivariate right censored data without studying the asymptotic properties of the estimator. For bivariate interval censored data case, a nonparametric two-stage method was proposed to estimate the joint distribution function in the literature. In this method, first the non-zero mass intersection rectangles are found, and then EM algorithm is applied to find the MLE of the joint distribution (Betensky and Finkelstein (999) and Yu, et al. (000)). In the second stage the EM algorithm could be replaced by the more efficient Iterative Convex Minarant method proposed by Groenoboom and Wellner (99). But this nonparametric estimation is not uniquely defined (Yu, Wong and He (000)) because the non-zero mass rectangles can not be uniquely determined. Moreover, the asymptotic properties of this type of nonparametric estimators

18 are difficulty to study. This thesis focuses on bivariate current status data. Let (T, T ) be the two event times of interest and (C, C ) the two corresponding random monitoring times. In this setting, bivariate current status data consist of (C, C, = I(T C ), = I(T C )), where I( ) is the indicator function. This data structure arises in the studies of two diseases in same patients or some common disease for two correlated subjects. For example, Wang and Ding (000) studied whether or not the onsets of hypertension and diabetes are correlated for people in two towns in Taiwan. By assuming a bivariate copula model Wang and Ding (000) proposed a two-stage estimation of the association parameter of two event times, in which the joint distribution of the failure time variables is assumed to follow a bivariate copula model (Nelsen (006)). First the nonparametric estimates of the marginal distributions are obtained, and then the association parameter is estimated by the maximum pseudo-likelihood method. This two-stage method facilitates an easy estimator of the joint distribution function through copula model, and is the only available method in the literature to estimate the joint distribution function with bivariate current status data. But if the copula model is not correctly specified, this estimator could be seriously biased. Jewell et al. (005) studied the relationship between the time to HIV infection to the partner and the time to diagnosis of AIDS for the index case by estimating smooth functionals of marginal distribution functions. In both examples, the bivariate event times have the same monitoring time, that is C = C = C. Hence the joint distribution function can be only studied on the diagonal, that is, only F (c, c) is identifiable. However, the common censoring assump-

19 tion may not be true. For example, suppose a study is conducted to explore the times to first use of marijuana between siblings. Although the interview may be conducted at same times for both siblings, the ages at interview (monitoring times) of the siblings are different and these give arise to the bivariate current status data with possibly different C and C. Ding and Wang (00) proposed a nonparametric procedure for testing marginal dependence in the general scenario when two censoring times for two events could be different. But their goal was not the estimation of joint distribution function. This thesis is concerned about the estimation of joint distribution function in the general scenario when C and C may or may not be equal.. Thesis Objective and Proposed Method In this thesis we propose to estimate the joint distribution function with current status data, using the tensor spline-based sieve maximum likelihood method. Conventionally, the estimate can be defined using the method similar to Betensky and Finkelstein (999), and Yu et al. (000). For these methods, the non-zero masses are determined for the intersection rectangles made by the collection of monitoring times. In addition to the uniqueness problem, as sample size increases, this problem is a high dimensional problem which is hard to deal with. To overcome these difficulties, we proposed a partially monotone tensor spline-based sieve estimation method for the the general bivariate current status data described in Section.. Under some regularity conditions, we show the tensor spline estimator is consistent and we derive the convergence rate of this estimator. Our simulation studies indicate that the proposed tensor spline estimation method performs very well and

20 5 better than a three-stage pseudo likelihood method extended from Wang and Ding method (000). The rest of thesis is organized as follows. In Chapter, Some technical backgrounds are introduced. First, we introduce the B-splines and the I-splines. We study the equivalency between the B-spines and the I- splines. The partially monotone tensor B-splines are used in studying the asymptotic properties of the proposed estimator. The partially monotone tensor I-splines are used in computing the restricted maximum likelihood estimate, since the constraints of the problem using the tensor I-splines is much simpler than that using the tensor B-splines. Second we introduce some basic concepts and results on empirical process theory, which will be heavily used in consistency proof and the derivation of convergence rate. Finally, we derive some useful technical results on the B-splines and the tensor B-splines. In Chapter, the tensor spline-based sieve nonparametric maximum likelihood estimation method of joint distribution function with bivariate current status data is proposed. First, we derive the likelihood of bivariate current status data. Second we derive the likelihood with the B-splines, in which the partially monotone tensor B-spline function is used for the joint distribution function and the monotone B-spline functions are used for the two marginal distribution functions and we represent the spline-based sieve nonparametric maximum likelihood estimation problem as a constrained optimization problem with respect to the coefficients of the B-splines. Finally we similarly derive the likelihood with the I-splines and represent the problem in terms of the I-splines in order to have a set of constraints that are easily dealt with in computation.

21 6 In Chapter, the asymptotic properties of the proposed estimator are studied using the technical results on the B-splines. Some regularity conditions are provided for the theoretical development regarding the joint distribution function, the censoring time distribution functions and the B-splines. First, we show that the proposed estimator is consistent. Second, we derive the convergence rate of the proposed estimator, which can be as good as n /. In Chapter 5, the numerical studies are carried out. First, the generalized gradient projection method is introduced to compute the estimate. Then, extensive simulation studies are conducted to examine the finite sample performance of the proposed method. We also compare the proposed method with the Wang and Ding s method (000). In Chapter 6, we describe the further applications of the proposed method are discussed.

22 7 CHAPTER TECHNICAL BACKGROUNDS In this chapter, we describe three versions of spline functions and their relationships. We also summarize some results on modern empirical process theory that will be heavily used in studying the asymptotic properties of the spline-based nonparametric estimator of the joint distribution function with bivariate current status data.. Splines In this section, we introduce three types of splines, the B-splines, the M-splines, and the I-splines. We describe the constructions of these splines and their relationships... The B-splines The normalized B-splines or simply the B-splines for brevity in this thesis, can be evaluated by the de Boor algorithm (de Boor, 00) as follows. For a partition or a knot sequence, that is, a nondecreasing sequence {u i }, the B-splines of order with this knot sequence are the characteristic functions given by, u i s < u i+, Ni (s) = 0, otherwise. (.) If u i = u i+, N i (s) = 0. The key characteristics of these functions are (i) N i (s) for i =,, are right continuous; (ii) i N i (s) =.

23 8 recursively by From the B-splines of order (.) the B-splines of higher order can be obtained Ni k (s) = ωi k (s)n k i (s) + ( ωi+(s))n k k i+ (s), with ωi k (s) = s u i u i+k u i, u i+k u i, 0, otherwise. If u i+k u i and u i+k u i+, we have Ni k (s) = s u i N k i (s) + u i+k s N k i+ (s). (.) u i+k u i u i+k u i+ There are some important properties for the B-splines which will be used throughout the rest of thesis and are summarized below. (B) (Theorem.8, Schumaker (98)) The B-splines have support on several knot intervals. Specifically, for the B-splines of order l, if u j s < u j+, then Ni(s) l 0, j l + i j, = 0, else. (B) (Theorem.0, Schumaker (98)) We already mentioned that the B-splines of order form a partition of unity. Actually, this property is true for the B-splines of any order. Specifically, for the B-splines of order l Ni(s) s l we have Ni(s) l =. i

24 9 (B) (Theorem 5.9, Schumaker (98)) The derivatives of the B-splines can be calculated by: N l i(s) s = l N l l i (s) N l i+ u i+l u i u i+l u (s). i+.. The M-splines In numerical analysis the M-splines are non-negative spline functions. In this thesis the M-splines are used to construct the I-splines. Curry and Schoenberg (966) proposed the construction of the M-splines as follows. Suppose a knot sequence is given by {u i }, the M-splines of order for this knot sequence are defined as Mi (s) = u i+ u i, u i s < u i+, 0, otherwise. (.) From the M-splines of order (.) the M-splines of higher order can be obtained recursively by Mi(s) l = l[(s u i)m l i (s) + (u i+l s)m l i+ (s)]. (.) (l )(u i+l u i ) Lemma.. Suppose the M-splines given by (.) and the B-splines given by (.) are associated with the same knot sequence, then they are closely related by M l i(s) = l u i+l u i N l i(s). (.5) Proof. (i) Note that M i (s) = u i+ u i N i (s), so the relationship is true for l =. (ii) Suppose the relation holds for l = k, then M k i (s) = k[(s u i)m k i (s) + (u i+k s)m k i+ (s)] (k )(u i+k u i )

25 0 = k[(s u i) k u i+k u i N k i (s) + (u i+k s) k u i+k u i+ N k i+ (s)] (k )(u i+k u i ) = = k s u i [ N k i (s) + u i+k s N k i+ u i+k u i u i+k u i u i+k u (s)] i+ k u i+k u i N k i (s), which implies (.5) holds for l = k. By induction (.5) holds for any positive integer l, which completes the proof... The I-splines Ramsay (988) proposed the I-splines, which are monotone functions constrained between 0 and. The I-splines can be used as spline basis functions for regression analysis and data transformation when monotonicity is desired. The I-splines are constructed through the M-splines as follows. Suppose the M-splines M l i s have the knot sequence {u i } p+l+ satisfying u = = u l+ < u l+ < < u p < u p+ = = u p+l+, (.6) where the first and the last l + knots are equal because the M-splines are of order l. The I-splines are defined as, i =, Ii(s) l = s M l L i(t)dt, < i p, (.7) with L s U, where L and U are the left and the right end point of the knot sequence {u i } p+l+, respectively. Since M l i(s) s given by (.) are piecewise polynomials of degree l, then I l i(s) s given by (.7) are piecewise polynomials of degree l. De Boor (00) showed U L M l i(t) =

26 for positive integers i and l, which, along with the fact that Mi(t) l are nonnegative, implies that the I-splines in (.7) are monotone non-decreasing function constrained between 0 and. Suppose the knot sequence {u i } p+l+ of the M-splines M l+ s satisfies (.6), a more convenient expression of the I-splines was given by Ramsay (988) as the following: i Ii(s) l = 0, i > j, j m=i (u m+l+ u m )M l+ m (s)/(l + ), j l + i j, (.8), i < j l +, for u j s < u j+ and i p. The following Lemma. and Lemma. together indicate that the I-splines given by (.7) are equivalent to the I-splines given by (.8). Lemma.. Suppose M l+ i s in (.8) and N l+ i s are associated with the same knot sequence, then the I-splines given by (.8) can be expressed by for i p. I l i(s) = p m=i N l+ m (s), (.9) Proof. By Lemma., (.8) can be further expressed in terms of the B-splines: Ii(s) l = 0, i > j, j m=i N l+ m (s), j l + i j, (.0), i < j l +,

27 for u j s < u j+ and i p. By the B-splines property (B) in Section... for the B-spline of order l +, if u j s < u j+, N l+ m (s) 0, j (l + ) + m j, = 0, else. So the expression of the I-splines given by (.0) can be rewritten as I l i(s) = p m=i Nm l+ (s). Lemma.. Suppose M l+ i s in (.8) and N l+ i s in (.9) are associated with the same knot sequence, the I-splines given by (.9) are equivalent to the I-splines given by (.7). Proof. (i) We show p m=i N m l+ (s) = s L M l i(t)dt for i =,..., p. We shall demonstrate this by the following two steps: (a) Prove ( p ( p m=i N m l+ (s)) = ( s L M i l(t)dt) s s m=i N m l+ s (s)) = m=i as follows. p l { N l l u m+l u m(s) N m u m+l+ u m+(s)} l m+ l = N u i+l u i(s) l + i p m=i l = N u i+l u i(s) l = Mi(s) l i = ( s L M l i(t)dt) s l l u m+l+ u m+ N l m+(s) (b) p m=i N l+ m (L) = 0 = L L M l i(t)dt.

28 (a) and (b) imply for i =,..., p, p m=i N m l+ (s) and s L M l i(t)dt have same derivative and they both are zero at the left end point of the knot sequence. Therefore p m=i N m l+ (s) = s M l L i(t)dt, for i =,..., p. (ii) For i =, it is trivial because of property (B) in Section... Remark.. (.9) provides a much easier way to compute the I-splines than using definition (.7) due to the available softwares for the B-splines. In Appendix the explicit form of the I-spline with order is given, along with the steps to to construct the I-splines through the B-splines in statistics package R... Monotone Spline Functions In the literature, {N l i : i =,,, l =,, } are referred to as the B-spline basis functions. In this thesis, we denote the B-spline functions as the linear combinations of the B-spline basis functions. Suppose we have the lth-order B-spline basis functions N l i(s) with knot sequence {u i } p+l satisfying u = = u l < u l+ < u p < u p+ = = u p+l. A B-spline function of order l is given by f(s) = p β i Ni(s). l (.) i= It is obvious by (B) that j f(s) = β i Ni(s) l i=j l+ for u j s < u j+.

29 According to (B) and (.), the derivative of f(s) is given by f s = = p N β i(s) l i s p l β i { N l i (s) u i+l u i i= i= p = i= (l )(β i+ β i ) N l i+ u i+l u (s), i+ l u i+l u i+ N l i+ (s)} (.) the last equality due to the fact that N l (u) 0 and N l p+(u) 0. Lemma.. If 0 β β β p, the B-spline function f(s) given by (.) is a nonnegative and nondecreasing function. Proof. It is obvious that f(s) is nonnegative. By (.), the derivative of f(s) is nonnegative as well. Similarly the I-spline functions in the thesis are denoted by linear combinations of the I-spline basis functions {I l i : i =,,, l =,, }. Specifically, a I-spline function of (l )th-order is given by f(s) = p i= γ i I l i (s). (.) Lemma.5. If γ i 0 for i =,, p, f(s) given by (.) is nonnegative and nondecreasing. Proof. Because the I-spline basis functions are nonnegative and nondecreasing, γ i 0 for i =,, p are sufficient to guarantee f(s) for being nonnegative and nondecreasing. Lemma.6. f(s) = p i= β ini(s) l with 0 β β β p and f(s) = p i= γ ii l i (s) with γ i 0 for i =,, p are equivalent to each other.

30 5 Proof. For a monotone spline function expressed in the I-splines by (.), substitute I l i (s) = p j=i N j(s), l it follows that f(s) = p i { γ j }Ni(s). l i= j= By γ i 0, i =,, p, we can rewrite the spline function (.) as f(s) = p β i Ni(s), l i= where β i = i j= γ j, i =,..., p. Then 0 β β β p. Remark.. Lemma.6 implies that a nonnegative and nondecreasing spline function can be equivalently expressed by either the B-splines or the I-splines...5 Partially Monotone Tensor Spline Functions Suppose we have the lth-order B-splines N (),l i (s) with knot sequence {u i } p+l satisfying u = = u l < u l+ < u p < u p+ = = u p+l and the lth-order B-splines N (),l j (t) with knot sequence {v i } q+l satisfying v = = v l < v l+ < v q < v q+ = = v q+l. The tensor B-spline functions are given by f(s, t) = p i= q j= α i,j N (),l i (s)n (),l j (t). (.) By (B) the partial derivatives of the tensor B-spline function can be expressed by f(s, t) s = p i= q j= (l )(α i+,j α i,j ) N (),l i+ (s)n (),l j (t), (.5) u i+l u i+ and f(s, t) t = q p i= j= (l )(α i,j+ α i,j ) v j+l v j+ N (),l i (s)n (),l j+ (t). (.6)

31 6 Lemma.7. If 0 α,j α,j α p,j for j =,,, q and 0 α i, α i, α i,q for i =,,, p, the tensor B-spline function f(s, t) given by (.) is nonnegative and nondecreasing in both s and t directions. Proof. It is obvious that f(s, t) is nonnegative. By (.5) and (.6), both partial derivatives of f(s, t) are nonnegative. Similarly the tensor I-spline functions are given by Lemma.8. If η i,j f(s, t) = p i= q i= η i,j I (),l i (s)i (),l j (t). (.7) 0 for i =,, p and j =,, q, f(s, t) given by (.7) is nonnegative and nondecreasing in both s and t directions. Proof. Because the I-spline basis functions are nonnegative and nondecreasing, η i,j 0 for i =,..., p and j =,..., q are sufficient to guarantee f(s, t) for being nonnegative and nondecreasing in both s and t directions. Let S I = {f(s, t) = and p i= q j= S B = {f(s, t) = η i,j I (),l i p i= q j= (s)i (),l j (t) : η i,j 0 for i =,..., p and j =,..., q}, α i,j N (),l i (s)n (),l j (t) : α i,js satisfy (.8)}, 0 α,j α,j α p,j, for j =,,, q, 0 α i, α i, α i,q, for i =,,, p, (α i+,j+ α i+,j ) (α i,j+ α i,j ) 0, (.8) for i =,,, p and j =,,, q.

32 7 Lemma.9. Set S B is equivalent to set S I. Proof. Substitute I (),l i = p x=i N x (),l and I (),l j = q y=j N y (),l in (.7), then f(s, t) = p i= q { j= i j x= y= η x,y }N (),l i (s)n (),l j (t). (.9) Let α i,j = i x= j y= η x,y, then (.9) can be written as f(s, t) = p i= q j= α i,j N (),l i (s)n (),l j (t). Then the following results (i), (ii), (iii) and (iv) can be easily obtained. (i) α, = η, 0. (ii) For i =,..., p and j =,..., q, α i,j+ α i,j = = j+ i i η x,y x= y= i η x,j+ 0. x= x= y= j x= y= (iii) For i =,..., p and j =,..., q, we have i+ j i α i+,j α i,j = η x,y = j η i+,y 0. y= j x= y= η x,y η x,y

33 8 (iv) For i =,..., p and j =,..., q, i+ j+ (α i+,j+ α i+,j ) (α i,j+ α i,j ) =( η x,y x= y= ( i+ j+ i η x,y x= y= i+ = η x,j+ x= x= y= i x= j η x,y ) i x= y= η x,j+ j η x,y ) (.0) By (i), (ii), (iii) and (iv), S I and S B are equivalent. =η i+,j+ 0. Remark.. Lemma.9 implies the partially monotone tensor I-spline functions are not equivalent to the partially monotone tensor B-spline functions, since there must be an extra condition (.0) for the partially monotone tensor B-spline function to make them equivalent. However this condition is very necessary for estimating the joint distribution function, because it corresponds to the fact that the mass of the joint distribution on any rectangle region of its domain is nonnegative.. Results on Empirical Process Theory Empirical process theory is a very powerful tool in studying the asymptotic properties of nonparametric or semiparametric estimator in Statistics. In this section we introduce some basic concepts and results which will be used in our theoretical development. Definition.. (Covering number, van der Vaart and Wellner (996), P. 8) The covering number N(ɛ, F, ) is the minimal number of balls {g : g f < ɛ} of radius ɛ needed to cover the set F. The entropy with covering H(ɛ, F, ) is the logarithm of the covering number.

34 9 Definition.. (Bracketing number, van der Vaart and Wellner (996), P. 8) Given two functions l and u, the bracket [l, u] is the set of all functions f with l f u. An ɛ-bracket is a bracket [l, u] with u l < ɛ. The bracketing number N [ ] (ɛ, F, ) is the minimum number of ɛ-brackets needed to cover F. The entropy with bracketing H [ ] (ɛ, F, ) is the logarithm of the bracketing number. Definition.. (VC-index and VC-class of sets, van der Vaart and Wellner (996), P. - 5) Let C a collection of subsets of a set X. An arbitrary set of n points {x,..., x n } possesses n subsets. Say that C picks out a certain subset from {x,..., x n } if this can be formed as a set of the form C {x,..., x n } for a C in C. The collection C is said to shatter {x,..., x n } if each of its n subsets can be picked out in this manner. The VC-index V (C) of the class C is the smallest n for which no set of size n is shattered by C. A collection of measurable sets C is called a VC-class of sets if its VC-index is finite. Definition.. (Symmetric convex hull, van der Vaart and Wellner (996), P. ) The symmetric convex hull sconv(f) of a class of functions is defined as the set of functions m i= α if i, with m i= α i and each f i contained in F. Definition.5. (VC-class of functions, van der Vaart and Wellner (996), P. ) The subgraph of a function f : X R is the subset of X R given by {(x, t) : t < f(x)}. A collection F of measurable functions on a sample space is called a VC-class of functions, if the collection of all subgraphs of the functions in F forms a VC-classes of sets in X R. Definition.6. (Envelop function, van der Vaart and Wellner (996), P. 8) An envelop function of a class F is any function x F (x) such that f(x) F (x), for every x and

35 0 f F. In the following lemmas, the L r (Q) norm associated with probability measure Q is defined by f Lr (Q) = (Q f r ) /r = ( f r dq) /r, for r > 0, the empirical measure P n of a sample of random elements x,..., x n is defined by P n f = n n f(x i ), i= and accordingly G n is denoted as n(p n P ). Lemma.0. (Theorem 5.7 in van der Vaart (998)) Let M n be random functions and let M be a fixed function of θ such that for every ɛ > 0 sup M n (θ) M(θ) P 0, θ Θ sup M(θ) < M(θ 0 ). θ:d(θ,θ 0 ) ɛ Then any sequence of estimators ˆθ n with M n (ˆθ n ) M n (θ 0 ) o P () converges in probability to θ 0. Lemma.. (Theorem.6.7 in van der Vaart and Wellner (996)) For a VC-class of functions with measurable envelop function F and r, one has for any probability measure Q with F Lr(Q) > 0, N(ɛ F Lr (Q), F, L r (Q)) KV (F)(6e) V (F) ( ɛ )r(v (F) ), for a universal constant K and 0 < ɛ <.

36 Lemma.. (Theorem.6.9 in van der Vaart and Wellner (996)) Let Q be a probability measure, and let F be a class of measurable functions with measurable square integrable envelop F such that QF < and N(ɛ F L (Q), F, L (Q)) C( ɛ )V for 0 < ɛ <. Then there exists a constant K that depends on C and V only such that N(ɛ F L (Q), sconv(f), L (Q)) K( ɛ )V/(V +). Lemma.. (Theorem.5. in van der Vaart and Wellner (996)) Let F be a class of measurable functions that satisfies 0 sup Q log N(ɛ F L (Q), F, L (Q))dɛ <, where the envelop function F of F is square integrable and the supremum is taken over all finitely discrete probability measures Q with F L (Q) > 0. Let the classes F δ = {f g : f, g F, f g L (P ) < δ} and F δ be P -measurable for every δ > 0. If P F <, then F is P -Donsker. Lemma.. (Example.0.7 in van der Vaart and Wellner (996)) If F and G are P - Donsker Classes with sup f F G P f <, then the pairwise infima F G, the pairwise suprema F G, and pairwise sums F + G are P -Donsker classes. Lemma.5. (Corollary.. in van der Vaart and Wellner (996)) Let F be a class of measurable functions and the semi-norm ρ P on F be defined as ρ P (f) = {P (f P f) } /. Then F being P -Donsker class implies that, if f n f as n in semi-norm ρ P for all f n and f in F, then (P n P )(f n f) = o P (n / ). Lemma.6. (Lemma.. in van der Vaart and Wellner (996)) Let F be the class of measurable functions such that P f < δ and f M for every f in F. Then there

37 exists K > 0, which is related to M, such that E P G n F K J [ ] {δ, F, L (P )}[ + J [ ] {δ, F, L (P )} δ ], n where J [ ] {δ, F, L (P )} = δ 0 + log N[ ] {ɛ, F, L (P )}dɛ and G n F = sup f F G n f. Lemma.7. (Theorem.. in van der Vaart and Wellner (996)) For each n, let M n and M n be stochastic processes indexed by a set Θ Let θ n Θ and 0 δ n < η be arbitrary. Suppose that, for every n and δ n < δ η E sup M n (θ) M n (θ n ) δ, δ/<d n(θ,θ n) δ,θ Θ n sup n[(mn M n )(θ) (M n M n )(θ n )] + Cφ(δ), δ/<d n (θ,θ n ) δ,θ Θ n for functions φ such that δ φ(δ)/δ α is decreasing on (δ n, η), for some α <. Let r n Cδ n satisfy r nφ( r n ) n, for every n. If the sequence ˆθ n takes its values in Θ n and satisfies M n (ˆθ n ) M n (θ n ) O P (rn ) and d n (ˆθ n, θ n ) converges to zero in probability, then r n d n (ˆθ n, θ n ) = O P ().. Some Useful Results on B-splines Lemma.8. (Jackson type Theorem, De boor (00), P. 9) Suppose g(x) is a function with the continuous derivative dw g(x). Then there exists a B-spline function dx w Ag(x) = p i= β in l i(x) with order l of the B-spline basis functions satisfying l w + and have knot sequence {u i } p with L = u = = u l < u l+ < < u p < u p+ = = u p+l = U, such that g Ag c T w dw g dx w

38 for some constant c > 0 depending on l only, where T = max l i p (u i+ u i ). Lemma.9. The result of Lemma.8 can be generalized to bivariate function and is given in Lemma.9. Suppose g(x, y) is a bivariate function with the continuous mixed derivatives of order w, w mg = w g(x,y) x m y w m for m =,,..., w. Then there exists a bivariate tensor B- spline function Ag(x, y) = p i= q j= α i,jn (),l i (x)n (),l j (y) with order l in both directions satisfying l w + and have knot sequences {u i } p+l with L = u = = u l < u l+ < < u p < u p+ = = u p+l = U, {v i } q+l with L = v = = v l < v l+ < < v q < v q+ = = v q+l = U, such that g Ag c T w ( g w, ), for some constant c > 0 depending on l only, where T = max{max l i p (u i+ u i ), max l j q (v j+ v j )}, and g w, = max 0 m w w g x m y. w m Proof. We define ω(g; h) = max{ g(x, y ) g(x, y ) : x x h, y y h, x, x [L, U ], y, y [L, U ]}. Then ω(g; h) is a monotone and subadditivity function of h, that is, ω(g; h ) ω(g; h + h ) ω(g; h ) + ω(g; h ) for nonnegative h

39 and h. The monotonicity of ω(g; h) is obvious by the definition. The proof of subadditivity is as follows. For any (x, y ) and (x, y ) with x x h + h and y y h + h, we can find (x, y ) such that x x h, y y h and x x h, y y h. Therefore, for any x x h + h and y y h + h, we have g(x, y ) g(x, y ) g(x, y ) g(x, y ) + g(x, y ) g(x, y ) max g(x, y ) g(x, y ) x x h y y h + max x x h y y h g(x, y ) g(x, y ) (.) =ω(g; h ) + ω(g; h ). By (.), ω(g; h + h ) ω(g; h ) + ω(g; h ) for nonnegative h and h, that is, subadditivity of ω(g; h) holds. By choosing τ < τ < < τ p in [L, U ] and ξ < ξ < < ξ q in [L, U ], we can construct a partially monotone tensor B-spline function Ag to approximate the smooth function g on [L, U ] [L, U ] as follows. Ag(x, y) = p i= q j= g(τ i, ξ j )N (),l i (x)n (),l j (y) = p { i= q j= N (),l j (y)g(τ i, ξ j )}N (),l i (x). For (ˆx, ŷ) in [u j, u j +] [v j, v j +] [L, U ] [L, U ], Ag(ˆx, ŷ) = j j i=j + l j=j + l g(τ i, ξ j )N (),l i (ˆx)N (),l j (ŷ), (.)

40 5 by property (B) in Section... Also by property (B) in Section.., we have g(ˆx, ŷ) = g(ˆx, ŷ) = g(ˆx, ŷ) = g(ˆx, ŷ) j i=j + l j i=j + l j N (),l i (ˆx) { j j=j + l j i=j + l j=j + l N (),l j N (),l i (ŷ)}n (),l i (ˆx) (ˆx)N (),l j (ŷ). (.) By (.) and (.), g(ˆx, ŷ) Ag(ˆx, ŷ) = Then we have g(ˆx, ŷ) Ag(ˆx, ŷ) j j i=j + l j=j + l j j i=j + l j=j + l {g(ˆx, ŷ) g(τ i, ξ j )}N (),l g(ˆx, ŷ) g(τ i, ξ j ) N (),l i max g(ˆx, ŷ) g(τ i, ξ j ). j + l i j j + l j j i (ˆx)N (),l j (ŷ). (ˆx)N (),l j (ŷ) Now choose the τ i s and ξ j appropriately. since the number of τ i s is greater than the number of subintervals made from knot sequence {u i } p+l, in order to guarantee that τ i+ τ i > 0 for i =,, p, we might choose τ i = u + (i )(u l+ u l ) l, i =,..., l, u i, i = l +,..., p. (.) Similarly, we might choose ξ j = v + (j )(v l+ v l ) l, j =,..., l, v j, j = l +,..., q, (.5)

41 6 (.) and (.5) imply τ i u i T and ξ j v j T for i =,..., p and j =,..., q. We also know u i ˆx u j + u j l+ l T for j l < i j and ˆx [u j, u j +] and v j ŷ v j + v j l+ l T for j l < j j and ŷ [v j, v j +]. So we have for j l < i j and ˆx [u j, u j +] τ i ˆx (l + ) T, and for j l < j j and ŷ [v j, v j +] ξ j ŷ (l + ) T. Therefore we have max g(ˆx, ŷ) g(τ i, ξ j ) max{ g(x, y ) g(x, y ) : j + l i j j + l j j x x (l + ) T, y y (l + ) T } (.6) =ω(g; (l + ) T ) =(l + )ω(g; T ), the last inequality is by subadditivity of ω(g; h). By (.6) we have g Ag = which means the distance between g and ψ l,l sup L x U L y U g(x, y) Ag(x, y) (l + )ω(g; T ), d(g, ψ l,l ) = inf s ψ l,l g s (l + )ω(g; T ), (.7)

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research