STAT 6385 Survey of Nonparametric Statistics. Order Statistics, EDF and Censoring

1 STAT 6385 Survey of Nonparametric Statistics Order Statistics, EDF and Censoring

2 Quantile Function
A quantile (or percentile) of a distribution is the value of $X$ such that a specified proportion of the probability lies at or below it; it divides the area under the pdf into two parts of specified amounts. The $p$-th quantile (or $100p$-th percentile) is the value of the random variable $X$, say $X_p$, such that $100p\%$ of the values of $X$ in the population are less than or equal to $X_p$, for any fraction $p$ ($0 < p < 1$). If $X$ is continuous, $X_p$ is a parameter that satisfies $\Pr(X \le X_p) = p$, or, in terms of the cdf, $F_X(X_p) = p$.

3 Quantile Function
Moreover, if $F_X$ is strictly increasing, the $p$-th quantile is the unique solution of $X_p = F_X^{-1}(p) = Q_X(p)$, say, for a given $p$. The inverse of the cdf, $Q_X(p)$, $0 < p < 1$, is called the quantile function of the random variable $X$. Since the cdf need not be strictly increasing everywhere, we define the $p$-th quantile $Q_X(p)$ as the smallest $x$ value at which the cdf is at least $p$:
$Q_X(p) = F_X^{-1}(p) = \inf\{x : F_X(x) \ge p\}, \quad 0 < p < 1.$
This definition gives a unique value for the quantile $Q_X(p)$ even when $F_X$ is flat over some interval or is a step function ($X$ discrete).
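The generalized inverse in this definition is easy to compute directly. The following Python sketch (an illustration added here, not part of the original notes) evaluates $Q_X(p) = \inf\{x : F_X(x) \ge p\}$ for a discrete distribution, where a naive inverse of the cdf would be ambiguous.

```python
import numpy as np

def quantile_from_cdf(support, cdf_values, p):
    """Return Q_X(p) = inf{x : F_X(x) >= p} for a discrete distribution.

    support    : sorted array of points where the cdf jumps
    cdf_values : F_X evaluated at each support point (nondecreasing, ends at 1)
    p          : probability level in (0, 1)
    """
    # index of the first support point whose cdf value is at least p
    idx = np.searchsorted(cdf_values, p, side="left")
    return support[idx]

# Example: X takes values 0, 1, 2 with probabilities 0.2, 0.5, 0.3,
# so F_X = (0.2, 0.7, 1.0).  The median (p = 0.5) is 1, and Q_X(0.7) = 1
# because the definition uses ">= p" (the infimum is attained at a jump).
support = np.array([0, 1, 2])
cdf_values = np.array([0.2, 0.7, 1.0])
print(quantile_from_cdf(support, cdf_values, 0.5))   # 1
print(quantile_from_cdf(support, cdf_values, 0.7))   # 1
print(quantile_from_cdf(support, cdf_values, 0.71))  # 2
```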

4 Quantile Function
The cdf and the quantile function carry the same information about the distribution, but in some situations one is more natural to work with than the other. Formulas for the moments of $X$ can be expressed in terms of the quantile function:
$E(X) = \int_0^1 Q_X(p)\,dp \quad \text{and} \quad E(X^2) = \int_0^1 Q_X(p)^2\,dp.$
Let $f_X(x) = \frac{d}{dx}F_X(x)$ be the pdf of $X$. The first and second derivatives of the quantile function $Q_X(p)$ are
$\frac{d}{dp}Q_X(p) = \frac{1}{f_X[Q_X(p)]} \quad \text{and} \quad \frac{d^2}{dp^2}Q_X(p) = -\frac{f_X'[Q_X(p)]}{\{f_X[Q_X(p)]\}^3}.$

5 Empirical Distribution Function
For a random sample $(X_1, X_2, \ldots, X_n)$ with common cdf $F(x)$, the empirical distribution function (EDF) is defined as the proportion of sample values less than or equal to the specified value $x$:
$\hat F_n(x) = \frac{1}{n}\sum_{i=1}^n 1_{\{X_i \le x\}},$
where $1_{\{A\}}$ is the indicator function, equal to $1$ if $A$ is true and $0$ otherwise. The EDF can also be written in terms of the order statistics of the sample:
$\hat F_n(x) = \begin{cases} 0 & \text{if } x < X_{1:n},\\ i/n & \text{if } X_{i:n} \le x < X_{i+1:n},\\ 1 & \text{if } x \ge X_{n:n}. \end{cases}$
Example: $n = 5$, $(x_1, x_2, \ldots, x_5) = (7, 4, 8, 12, 10)$.
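As an illustration (not part of the original notes), here is a short Python sketch that evaluates $\hat F_n$ for the example data above; the counting is done against a sorted copy of the sample.

```python
import numpy as np

def edf(sample):
    """Return a function x -> proportion of sample values <= x."""
    sorted_sample = np.sort(sample)
    n = len(sorted_sample)
    def F_hat(x):
        # number of order statistics X_{i:n} that are <= x, divided by n
        return np.searchsorted(sorted_sample, x, side="right") / n
    return F_hat

sample = [7, 4, 8, 12, 10]
F_hat = edf(sample)
for x in [3, 4, 7.5, 10, 12, 15]:
    print(x, F_hat(x))
# F_hat jumps by 1/5 at each of the ordered values 4, 7, 8, 10, 12:
# F_hat(3) = 0.0, F_hat(4) = 0.2, F_hat(7.5) = 0.4, F_hat(10) = 0.8, F_hat(12) = 1.0
```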

6 Empirical Distribution Function
$\hat F_n(x)$ is a jump (or step) function, with jumps occurring at the (distinct) ordered sample values; the height of each jump equals the reciprocal of the sample size. When there are ties, the EDF is still a step function, but it jumps only at the distinct ordered sample values $X_{j:n}$, and the height of the jump is $k/n$, where $k$ is the number of values tied at $X_{j:n}$.
Theorem 1.1. For any fixed real constant $x$, the random variable $n\hat F_n(x)$ has a binomial distribution with parameters $n$ and $F(x)$.

7 Empirical Distribution Function
Corollary 1.2. The mean and variance of the EDF $\hat F_n(x)$ are
$E[\hat F_n(x)] = F(x) \quad \text{and} \quad \mathrm{var}[\hat F_n(x)] = \frac{F(x)[1 - F(x)]}{n}.$
Thus $\hat F_n(x)$ is an unbiased estimator of $F(x)$, and $\mathrm{var}[\hat F_n(x)] \to 0$ as $n \to \infty$. By Chebyshev's inequality, $\hat F_n(x)$ is therefore a consistent estimator of $F(x)$; in other words, $\hat F_n(x)$ converges to $F(x)$ in probability.
Corollary 1.3. $E[(n\hat F_n(x))(n\hat F_n(y))] = nF(x) + n(n-1)F(x)F(y)$, for $x < y$.

8 Empirical Distribution Function
Theorem 1.4 (Glivenko-Cantelli Theorem). $\hat F_n(x)$ converges uniformly to $F(x)$ with probability 1, i.e.,
$\Pr\Big[\lim_{n\to\infty}\ \sup_{-\infty < x < \infty} |\hat F_n(x) - F(x)| = 0\Big] = 1.$
Theorem 1.5. As $n \to \infty$, the limiting distribution of the standardized $\hat F_n(x)$ is standard normal, i.e.,
$\lim_{n\to\infty} \Pr\left\{ \frac{\sqrt{n}\,[\hat F_n(x) - F(x)]}{\sqrt{F(x)[1 - F(x)]}} \le t \right\} = \Phi(t),$
where $\Phi(\cdot)$ is the cdf of the standard normal distribution.
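A small simulation sketch (my own illustration; the Exp(1) population and the sample sizes are arbitrary choices) showing the Glivenko-Cantelli behaviour: the supremum distance between $\hat F_n$ and $F$ shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def sup_distance(n):
    """One Monte Carlo draw of sup_x |F_hat_n(x) - F(x)| for an Exp(1) sample.

    Because the EDF is a step function and F is continuous, the supremum is
    attained at (or just before) an order statistic, so it suffices to compare
    i/n and (i-1)/n with F(X_{i:n}) at the ordered sample values.
    """
    x = np.sort(rng.exponential(size=n))
    F = 1.0 - np.exp(-x)                     # true cdf evaluated at the order statistics
    i = np.arange(1, n + 1)
    return max(np.max(np.abs(i / n - F)), np.max(np.abs((i - 1) / n - F)))

for n in [10, 100, 1000, 10000]:
    print(n, np.mean([sup_distance(n) for _ in range(200)]))
# The average sup-distance shrinks toward 0 as n grows (Glivenko-Cantelli theorem).
```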

9 Order Statistics
Suppose that $X_1, X_2, \ldots, X_n$ are $n$ independent variates from a population with continuous cumulative distribution function (cdf) $F(x)$ and probability density function (pdf) $f(x)$. The probability that any two or more of these random variables have equal magnitudes is zero, so there exists a unique ordered arrangement within the sample. Let
$X_{1:n} < X_{2:n} < \cdots < X_{n:n}$
denote the order statistics obtained by arranging the $n$ random variates in increasing order of magnitude. $X_{r:n}$, $r = 1, 2, \ldots, n$, is called the $r$-th order statistic.

10 Probability-Integral Transformation
One key reason why order statistics are so important in nonparametric statistics is that, for any order statistic $X_{r:n}$ from a continuous cdf $F$, the transformed random variable $U_{r:n} = F(X_{r:n})$ has the same distribution as the $r$-th order statistic from the continuous uniform distribution on $(0,1)$. In this sense, $F(X_{r:n})$ may be viewed as distribution-free. This important property of continuous order statistics is called the probability-integral transformation.
Theorem. Let $X$ be a random variable with cdf $F_X$. If $F_X$ is continuous, the random variable $Y$ produced by the transformation $Y = F_X(X)$ has the continuous uniform probability distribution over the interval $(0, 1)$.
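A quick Monte Carlo check of the probability-integral transformation (illustrative only; the Exp(1) population and the choices $n = 10$, $r = 3$ are arbitrary): applying $F$ to the $r$-th order statistic of an exponential sample reproduces the moments of the $r$-th uniform order statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, reps = 10, 3, 50000

# Draw Exp(1) samples, take the r-th order statistic, and apply F(x) = 1 - e^{-x}.
x_r = np.sort(rng.exponential(size=(reps, n)), axis=1)[:, r - 1]
u_r = 1.0 - np.exp(-x_r)

# The r-th uniform order statistic has mean r/(n+1) and
# variance r(n-r+1)/((n+1)^2 (n+2)); compare with the transformed sample.
print(u_r.mean(), r / (n + 1))
print(u_r.var(), r * (n - r + 1) / ((n + 1) ** 2 * (n + 2)))
```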

11 Distribution of Single Order Statistic
The cdf of the largest ($n$-th) order statistic $X_{n:n}$ is given by
$F_{n:n}(x) = \Pr(X_{n:n} \le x) = [F(x)]^n.$
Likewise, we have the cdf of the smallest (first) order statistic $X_{1:n}$:
$F_{1:n}(x) = \Pr(X_{1:n} \le x) = 1 - [1 - F(x)]^n.$
These are special cases of the general result for the cdf of the $r$-th order statistic,
$F_{r:n}(x) = \Pr(X_{r:n} \le x) = \sum_{i=r}^{n} \binom{n}{i} [F(x)]^{i} [1 - F(x)]^{n-i}.$
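These cdf formulas are easy to confirm by simulation. The sketch below (illustrative; the Exp(1) population, $n = 5$, and $x_0 = 1$ are arbitrary choices) compares empirical probabilities for the sample maximum and minimum with $[F(x_0)]^n$ and $1 - [1 - F(x_0)]^n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, x0 = 5, 100000, 1.0

samples = rng.exponential(size=(reps, n))
F_x0 = 1.0 - np.exp(-x0)          # F(x0) for the Exp(1) distribution

# Empirical P(max <= x0) vs [F(x0)]^n, and P(min <= x0) vs 1 - [1 - F(x0)]^n
print(np.mean(samples.max(axis=1) <= x0), F_x0 ** n)
print(np.mean(samples.min(axis=1) <= x0), 1.0 - (1.0 - F_x0) ** n)
```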

12 Distribution of Single Order Statistic
The joint density function of all $n$ order statistics is
$f_{X_{1:n},\ldots,X_{n:n}}(x_{1:n},\ldots,x_{n:n}) = n! \prod_{l=1}^{n} f(x_{l:n}), \quad x_{1:n} < \cdots < x_{n:n}.$
Upon integrating out $(n-1)$ of the variables in the joint density function, we obtain the marginal density function of a single order statistic. For instance, the largest order statistic has density
$f_{n:n}(x) = n\,[F(x)]^{n-1} f(x),$
the smallest order statistic has density
$f_{1:n}(x) = n\,[1 - F(x)]^{n-1} f(x),$
and the $r$-th order statistic has density
$f_{r:n}(x) = \frac{n!}{(r-1)!\,(n-r)!}\,[F(x)]^{r-1} [1 - F(x)]^{n-r} f(x).$

13 Joint Densities and Markovian Property
Using the same technique, integrating out the appropriate variables, we obtain the joint density functions of $(X_{1:n}, X_{2:n}, \ldots, X_{i:n})$ and of $(X_{1:n}, \ldots, X_{i:n}, X_{j:n})$, for $1 \le i < j \le n$, as
$f_{X_{1:n},\ldots,X_{i:n}}(x_{1:n},\ldots,x_{i:n}) = \frac{n!}{(n-i)!} \left\{ \prod_{l=1}^{i} f(x_{l:n}) \right\} \{1 - F(x_{i:n})\}^{n-i}, \quad x_{1:n} < \cdots < x_{i:n},$
and
$f_{X_{1:n},\ldots,X_{i:n},X_{j:n}}(x_{1:n},\ldots,x_{i:n},x_{j:n}) = \frac{n!}{(j-i-1)!\,(n-j)!} \left\{ \prod_{l=1}^{i} f(x_{l:n}) \right\} f(x_{j:n})\, \{F(x_{j:n}) - F(x_{i:n})\}^{j-i-1} \{1 - F(x_{j:n})\}^{n-j}, \quad x_{1:n} < \cdots < x_{i:n} < x_{j:n},$
respectively.

14 Joint Densities and Markovian Property
The conditional density function of $X_{j:n}$, given $(X_{1:n} = x_{1:n}, \ldots, X_{i:n} = x_{i:n})$, for $1 \le i < j \le n$, is
$f_{X_{j:n} \mid X_{1:n}=x_{1:n},\ldots,X_{i:n}=x_{i:n}}(x_{j:n} \mid x_{1:n},\ldots,x_{i:n}) = \frac{(n-i)!}{(j-i-1)!\,(n-j)!}\, \frac{\{F(x_{j:n}) - F(x_{i:n})\}^{j-i-1} \{1 - F(x_{j:n})\}^{n-j}}{\{1 - F(x_{i:n})\}^{n-i}}\, f(x_{j:n}), \quad x_{j:n} > x_{i:n}.$
Theorem 1.5. The conditional density function of $X_{i:n}$, given $X_{j:n} = x_{j:n}$, for $1 \le i < j \le n$, is exactly the same as the density function of the $i$-th order statistic in a sample of size $j-1$ from the distribution $F(\cdot)$ truncated on the right at $x_{j:n}$, i.e., from the distribution with density function $f(x)/F(x_{j:n})$, $x < x_{j:n}$.

15 Joint Densities and Markovian Property
Theorem 1.6. The order statistics $\{X_{i:n}, 1 \le i \le n\}$ from a continuous distribution form a Markov chain.
Proof. The conditional density function of $X_{j:n}$, given $(X_{1:n} = x_{1:n}, \ldots, X_{i:n} = x_{i:n})$ for $1 \le i < j \le n$, depends only on $x_{i:n}$, which implies that the order statistics form a Markov chain.
Theorem 1.7. The conditional density function of $X_{j:n}$, given $X_{i:n} = x_{i:n}$, for $1 \le i < j \le n$, is exactly the same as the density function of the $(j-i)$-th order statistic in a sample of size $n-i$ from the distribution $F(\cdot)$ truncated on the left at $x_{i:n}$, i.e., from the distribution with density function $f(x)/[1 - F(x_{i:n})]$, $x > x_{i:n}$.
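Theorem 1.7 suggests a sequential way to generate order statistics: given $X_{i:n} = x$, draw $X_{i+1:n}$ as the smallest of $n-i$ observations from $F$ truncated on the left at $x$. For the Exp(1) distribution this truncated draw is simply $x$ plus an Exp(rate $n-i$) variable, by memorylessness. The sketch below (my illustration, not from the notes) compares this sequential construction with direct sorting.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 5, 20000

def sequential_order_stats(n):
    """Generate (X_{1:n}, ..., X_{n:n}) for Exp(1) using the Markov property.

    X_{1:n} is the minimum of n Exp(1) variables, i.e. Exp(rate n); then,
    given X_{i:n} = x, the next order statistic is x plus the minimum of
    n - i left-truncated Exp(1) draws, which is x + Exp(rate n - i).
    """
    x = rng.exponential(scale=1.0 / n)
    out = [x]
    for i in range(1, n):
        x = x + rng.exponential(scale=1.0 / (n - i))
        out.append(x)
    return np.array(out)

seq = np.array([sequential_order_stats(n) for _ in range(reps)])
direct = np.sort(rng.exponential(size=(reps, n)), axis=1)

# The means of each order statistic agree between the two constructions.
print(seq.mean(axis=0))
print(direct.mean(axis=0))
```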

16 Moments
We have the single and product moments of order statistics as
$E(X_{i:n}^{l}) = \frac{n!}{(i-1)!\,(n-i)!} \int_{-\infty}^{\infty} x_{i:n}^{l}\, \{F(x_{i:n})\}^{i-1} \{1 - F(x_{i:n})\}^{n-i} f(x_{i:n})\, dx_{i:n}$
and
$E(X_{i:n}^{l_1} X_{j:n}^{l_2}) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!} \int_{-\infty}^{\infty} \int_{x_{i:n}}^{\infty} x_{i:n}^{l_1} x_{j:n}^{l_2}\, \{F(x_{i:n})\}^{i-1} \{F(x_{j:n}) - F(x_{i:n})\}^{j-i-1} \{1 - F(x_{j:n})\}^{n-j} f(x_{i:n}) f(x_{j:n})\, dx_{j:n}\, dx_{i:n},$
respectively.

17 Moments
Upon making the probability-integral transformation $u_i = F(x_{i:n})$, for $1 \le i \le n$, and noting that it is order-preserving, we can rewrite the single and product moments as
$E(X_{i:n}^{l}) = \frac{n!}{(i-1)!\,(n-i)!} \int_0^1 \{F^{-1}(u_i)\}^{l}\, u_i^{i-1} (1 - u_i)^{n-i}\, du_i$
and
$E(X_{i:n}^{l_1} X_{j:n}^{l_2}) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!} \int_0^1 \int_{u_i}^{1} \{F^{-1}(u_i)\}^{l_1} \{F^{-1}(u_j)\}^{l_2}\, u_i^{i-1} (u_j - u_i)^{j-i-1} (1 - u_j)^{n-j}\, du_j\, du_i.$

18 Results for the Uniform Distribution
For the Uniform$(0,1)$ distribution, with density function $f(x) = 1$ for $0 < x < 1$, expressions for the distributions and moments of order statistics simplify considerably. The joint density of all $n$ order statistics simply becomes
$f_{X_{1:n},\ldots,X_{n:n}}(x_{1:n},\ldots,x_{n:n}) = n!, \quad 0 < x_{1:n} < \cdots < x_{n:n} < 1.$
The joint density of $(X_{1:n},\ldots,X_{i:n})$ becomes
$f_{X_{1:n},\ldots,X_{i:n}}(x_{1:n},\ldots,x_{i:n}) = \frac{n!}{(n-i)!}\,(1 - x_{i:n})^{n-i}, \quad 0 < x_{1:n} < \cdots < x_{i:n} < 1.$
The marginal density function of $X_{i:n}$ (for $1 \le i \le n$) becomes
$f_{X_{i:n}}(x_{i:n}) = \frac{n!}{(i-1)!\,(n-i)!}\, x_{i:n}^{i-1} (1 - x_{i:n})^{n-i}, \quad 0 < x_{i:n} < 1.$

19 Results for the Uniform Distribution
This simply reveals that $X_{i:n}$ in this case is distributed as Beta$(i, n-i+1)$. We immediately obtain
$E(X_{i:n}^{k}) = \frac{n!\,(i+k-1)!}{(i-1)!\,(n+k)!},$
from which we get
$E(X_{i:n}) = \frac{i}{n+1} \quad \text{and} \quad \mathrm{Var}(X_{i:n}) = \frac{i(n-i+1)}{(n+1)^2(n+2)}.$
The joint density function of $X_{i:n}$ and $X_{j:n}$ (for $1 \le i < j \le n$) becomes
$f_{X_{i:n},X_{j:n}}(x_{i:n}, x_{j:n}) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\, x_{i:n}^{i-1} (x_{j:n} - x_{i:n})^{j-i-1} (1 - x_{j:n})^{n-j}, \quad 0 < x_{i:n} < x_{j:n} < 1.$
We can show that
$E(X_{i:n} X_{j:n}) = \frac{i(j+1)}{(n+1)(n+2)} \quad \text{and} \quad \mathrm{Cov}(X_{i:n}, X_{j:n}) = \frac{i(n-j+1)}{(n+1)^2(n+2)}.$
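A simulation check of these Beta$(i, n-i+1)$ moment formulas (illustrative only; the choices $n = 10$, $i = 3$, $j = 7$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, i, j, reps = 10, 3, 7, 200000

u = np.sort(rng.uniform(size=(reps, n)), axis=1)
ui, uj = u[:, i - 1], u[:, j - 1]

print(ui.mean(), i / (n + 1))                                            # E(U_{i:n})
print(ui.var(), i * (n - i + 1) / ((n + 1) ** 2 * (n + 2)))              # Var(U_{i:n})
print(np.mean(ui * uj), i * (j + 1) / ((n + 1) * (n + 2)))               # E(U_{i:n} U_{j:n})
print(np.cov(ui, uj)[0, 1], i * (n - j + 1) / ((n + 1) ** 2 * (n + 2)))  # Cov(U_{i:n}, U_{j:n})
```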

20 Censoring
Censoring restricts our ability to observe the random variable and is a special source of difficulty in statistical analysis. It arises when subjects are not observed for the full duration of time to failure, yielding incomplete information. Censoring may be due to loss to follow-up, early termination of the study, death from other causes, etc.

21 Left, Right, Doubly Censored Data
Right-censoring: a survival time is said to be right-censored if the event of interest occurs after the period during which the subject is observed in the study.
Left-censoring: a survival time is said to be left-censored if the event of interest occurs before the subject is first observed in the study.

22 Left, Right, Doubly Censored Data
Example of left-censoring: in early childhood learning centers, interest often focuses on testing children to determine when a child learns to accomplish certain specified tasks. The age at which a child learns the task is the time-to-event. Often, some children can already perform the task when they enter the study; such event times are left-censored.
Double censoring: in a study, it may happen that some survival times are left-censored and some are right-censored. In this case we say that the sampling scheme of the study is doubly censored.

23 Type-I Censoring
Consider an experiment that will be stopped at a pre-specified time $c$. Observations that survive beyond $c$ cannot be observed.
[Figure: observation number versus time to failure; observed failures and censored observations are marked, with the censoring time $c$ indicated.]

24 Type-I Censoring
Only failure times smaller than $c$ can be observed; this scheme is also called time censoring. Observations greater than $c$ are said to be Type-I right-censored. Under Type-I censoring the number of observed failures is a random variable, while the duration of the experiment is fixed.
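A minimal sketch of how Type-I censored data arise (illustrative only; the exponential lifetimes, $n = 20$, and $c = 1.5$ are arbitrary assumptions): each subject contributes the observed time $\min(T_i, c)$ and a censoring indicator, and the number of observed failures varies from run to run.

```python
import numpy as np

rng = np.random.default_rng(5)
n, c = 20, 1.5                           # sample size and pre-specified stopping time

T = rng.exponential(scale=2.0, size=n)   # true (possibly unobserved) failure times
observed_time = np.minimum(T, c)         # what we actually record
delta = (T <= c).astype(int)             # 1 = failure observed, 0 = Type-I censored

print(observed_time.round(2))
print(delta)
print("number of observed failures:", delta.sum())   # random, unlike Type-II censoring
```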

25 Type-I Censoring
Generalized Type-I censoring: sometimes subjects enter the study at different time points and the study is terminated at a pre-specified calendar time $c$.
[Figure: observation number versus calendar time; entry times, observed failures, and censored observations are marked, with the termination time $c$ indicated.]

26 Type-I Censoring
Of course, we can rescale all the starting times to 0, which gives the following representation:
[Figure: observation number versus time to failure after rescaling each subject's entry time to 0; observed failures and censored observations are marked.]

27 Type-II Censoring
One question in Type-I censoring is the determination of the censoring time $c$: if $c$ is large, the expense of the experiment is large; if $c$ is small, it may turn out that only a small portion of the sample is observed. To avoid these situations, one might instead terminate the experiment after the first $r$ failures have been observed. This is also called item censoring.

28 Type-II Censoring
[Diagram: the experiment starts, failures are observed at $T_{(1)} \le T_{(2)} \le \cdots \le T_{(r)}$, and the remaining $n-r$ items are censored when the experiment ends at $T_{(r)}$.]
The observations are $T_{(1)} \le T_{(2)} \le \cdots \le T_{(r)}$. Under Type-II censoring the number of observed failures is fixed, while the duration of the experiment is a random variable.

29 Interval Censoring
In some situations the lifetime is known only to lie within an interval. A survival time is said to be interval-censored if the event of interest is only known to have occurred within a given time interval. Example: the patients in a clinical trial have periodic follow-up visits.
[Timeline: follow-up times $b_1, b_2, \ldots, b_k, b_{k+1}$, with the event occurring between $b_k$ and $b_{k+1}$.]
Under this situation we only know that $t_i \in [b_k, b_{k+1})$.

30 Censoring Time as a Random Variable
Example: in a cancer research study, some patients might drop out of the study for various reasons (e.g., they pass away from a cause other than cancer). The censoring time can therefore be a random variable. Let $T_i$ denote the survival time and $C_i$ the censoring time for the $i$-th subject; the observed time is $\min(T_i, C_i)$.

31 Censoring Time as a Random Variable
Important assumption: $C_i$ and $T_i$ are independent random variables, i.e., the reason for observing a censored observation is completely unrelated to the disease process of interest. For example, a subject who moved to another city did not move in search of better patient care. The survival data are usually coded as $(X_i, \delta_i)$, $i = 1, 2, \ldots, n$, where $X_i = \min(T_i, C_i)$ and $\delta_i = 1$ if the $i$-th observation is uncensored and $0$ otherwise.
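A minimal sketch of randomly censored data coded as $(X_i, \delta_i)$ (illustrative only; the exponential distributions for $T_i$ and $C_i$ are arbitrary assumptions satisfying the independence requirement):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10

T = rng.exponential(scale=2.0, size=n)   # survival times of interest
C = rng.exponential(scale=3.0, size=n)   # independent censoring times

X = np.minimum(T, C)                     # observed time X_i = min(T_i, C_i)
delta = (T <= C).astype(int)             # delta_i = 1 if uncensored, 0 if censored

for xi, di in zip(X.round(2), delta):
    print(xi, di)
```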
