Order Statistics and Distributions

Similar documents
Statistics 3657 : Moment Approximations

1 Joint and marginal distributions

Probability and Distributions

ECE353: Probability and Random Processes. Lecture 7 -Continuous Random Variable

2 (Statistics) Random variables

2 Functions of random variables

Statistics 100A Homework 5 Solutions

Actuarial Science Exam 1/P

Problem 1. Problem 2. Problem 3. Problem 4

Review: mostly probability and some statistics

Statistics 3858 : Maximum Likelihood Estimators

HW4 : Bivariate Distributions (1) Solutions

STAT 712 MATHEMATICAL STATISTICS I

Sampling Distributions

Special Discrete RV s. Then X = the number of successes is a binomial RV. X ~ Bin(n,p).

(y 1, y 2 ) = 12 y3 1e y 1 y 2 /2, y 1 > 0, y 2 > 0 0, otherwise.

4. CONTINUOUS RANDOM VARIABLES

Random Variables and Their Distributions

APPM/MATH 4/5520 Solutions to Problem Set Two. = 2 y = y 2. e 1 2 x2 1 = 1. (g 1

Continuous random variables

Statistics 3657 : Moment Generating Functions

Delta Method. Example : Method of Moments for Exponential Distribution. f(x; λ) = λe λx I(x > 0)

Continuous Random Variables

Contents 1. Contents

Sampling Distributions

Chapter 5. Chapter 5 sections

STAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed.

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

IEOR 3106: Introduction to Operations Research: Stochastic Models. Professor Whitt. SOLUTIONS to Homework Assignment 2

Spring 2012 Math 541B Exam 1

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Solution to Assignment 3

Chapter 2: Random Variables

Transformations and Expectations

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

This does not cover everything on the final. Look at the posted practice problems for other topics.

Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area

1 Exercises for lecture 1

Recitation 2: Probability

Math 3215 Intro. Probability & Statistics Summer 14. Homework 5: Due 7/3/14

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Mathematical Statistics 1 Math A 6330

4 Pairs of Random Variables

EE4601 Communication Systems

Lecture 11: Continuous-valued signals and differential entropy

IE 230 Probability & Statistics in Engineering I. Closed book and notes. 60 minutes.

conditional cdf, conditional pdf, total probability theorem?

Northwestern University Department of Electrical Engineering and Computer Science

Statistics for Economists Lectures 6 & 7. Asrat Temesgen Stockholm University

Basics of Stochastic Modeling: Part II

Numerical Methods I Monte Carlo Methods

ECE 4400:693 - Information Theory

LIST OF FORMULAS FOR STK1100 AND STK1110

Non-parametric Inference and Resampling

7 Random samples and sampling distributions

3 Operations on One Random Variable - Expectation

Lecture 13: Conditional Distributions and Joint Continuity Conditional Probability for Discrete Random Variables

functions Poisson distribution Normal distribution Arbitrary functions

1 Inverse Transform Method and some alternative algorithms

Joint Probability Distributions and Random Samples (Devore Chapter Five)

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Chapter 4. Continuous Random Variables 4.1 PDF

Exam P Review Sheet. for a > 0. ln(a) i=0 ari = a. (1 r) 2. (Note that the A i s form a partition)

Bivariate distributions

Homework 10 (due December 2, 2009)

Probability Review. Yutian Li. January 18, Stanford University. Yutian Li (Stanford University) Probability Review January 18, / 27

Statistics for scientists and engineers

Sample Spaces, Random Variables

Probability Notes. Compiled by Paul J. Hurtado. Last Compiled: September 6, 2017

P (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University

Probability, CLT, CLT counterexamples, Bayes. The PDF file of this lecture contains a full reference document on probability and random variables.

MAS223 Statistical Inference and Modelling Exercises

More on Distribution Function

MULTIVARIATE PROBABILITY DISTRIBUTIONS

This exam is closed book and closed notes. (You will have access to a copy of the Table of Common Distributions given in the back of the text.

Lectures on Elementary Probability. William G. Faris

STAT 450: Statistical Theory. Distribution Theory. Reading in Casella and Berger: Ch 2 Sec 1, Ch 4 Sec 1, Ch 4 Sec 6.

Lecture 11. Probability Theory: an Overveiw

Chapter 4 Multiple Random Variables

Practice Questions for Final

1 Random Variable: Topics

Math Review Sheet, Fall 2008

1 Solution to Problem 2.1

Review of Probability Theory

1 Presessional Probability

Chapter 6: Functions of Random Variables

Multivariate distributions

STAT 6385 Survey of Nonparametric Statistics. Order Statistics, EDF and Censoring

Random Variables. Saravanan Vijayakumaran Department of Electrical Engineering Indian Institute of Technology Bombay

Midterm Examination. STA 215: Statistical Inference. Due Wednesday, 2006 Mar 8, 1:15 pm

Probability Theory and Statistics. Peter Jochumzen

Probability review. September 11, Stoch. Systems Analysis Introduction 1

Introduction to Probability and Statistics (Continued)

IEOR 4703: Homework 2 Solutions

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

Review of Probabilities and Basic Statistics

Things to remember when learning probability distributions:

Transcription:

Order Statistics and Distributions 1 Some Preliminary Comments and Ideas In this section we consider a random sample X 1, X 2,..., X n common continuous distribution function F and probability density f. Thus X i, i 1, 2,..., n are i.i.d. F (or f). The order statistics are then the random variables X (1) X (2)... X (k)... X (n). The random variable X (k) is thus the k-th largest of the n random variables X 1, X 2,..., X n. These are interesting for various reasons in statistics. These are the basis of how QQ plots are constructed and why they work as a statistical diagnostic tool. Since F is continuous then P (X 1 X 2 ). To see this, notice that for any any ɛ > Therefore P ( X 1 X 2 ɛ) x1+ɛ x 1 ɛ as ɛ. f(x 1 )f(x 2 )dx 2 dx 1 P ( X 1 X 2 ) lim ɛ P ( X 1 X 2 ɛ). This will imply that for any sample size n the r.v.s X 1, X 2,..., X n are all distinct with probability 1. We will not investigate this property further in this course. Note this would not occur if the X i s were discrete r.v. s. For iid r.v.s from a continuous cdf F the r.v.s are all distinct and therefore no two of them are equal. In this section we study the distribution of the order statistics. For any n the mapping g(x 1,..., x n ) (x (1), x (2),..., x (n) ) is a many to one mapping. In particular we see this quite easily when n 2 where the mapping is g(x, y) (min(x, y), max(x.y)). Thus for any a < b both (a, b) and (b, a) map to the same pair (a, b). Therefore the mapping or transformation is not a one to one invertible differential mapping so we cannot use the calculus method for these types of mappings. This means that we will need to consider some special way of finding the distribution of the order statistics, and their one and two dimensional marginal distributions. As in our earlier study of such transformations the first attempt will be to find the cdf and from this obtain the pdf. 1

Order Statistics 2 2 Marginal Distribution of X (1) The cdf, say F 1, of X (1) is easy to obtain. F 1 (x) P ( X (1) x ) 1 P ( X (1) > x ) 1 P (X 1 > x, X 2 > x,..., X n > x) n 1 P (X i > x) 1 (1 F (x)) n Differentiating w.r.t. x gives f 1 the pdf of X (1) as f 1 (x) nf(x)(1 F (x)) n 1. i1 The student should try using a similar argument to find the pdf, f n, of X (n). The answer is f n (x) nf(x)f (x) n 1. Notice that in both cases the support of f 1 and of f n is inherited from the support of f. 3 Joint Distribution of X (1), X (2) In this section we find the joint cdf of X (1), X (2), say F 1,2 and then obtain the joint pdf of X (1), X (2), say f 1,2. Even if n 2 the transformation (X 1, X 2,..., X n ) (X (1), X (2) ) is not 1 to 1. Thus we cannot find the pdf f 1,2 directly. We obtain F 1,2 and then obtain from this f 1,2. Let x < y. Consider the event B X (1) x, X (2) > y}. In words the event B occurs if and only if 1 of the n trials X i takes on a value less than or equal to x and n 1 of the trials takes on values greater than y. Notice there are ( n 1) ways of this occurring. Since the X i s are iid thus ( ) n P (B) F (x) (1 F (y)) n 1. 1 Notice that we can decompose the event X (1) x} X (1) x, X (2) y} X (1) x, X (2) > y}. Thus by the axioms by of probability we have, for x < y F 1 (x) P (X (1) x) F 1,2 (x, y) + P (B)

Order Statistics 3 Also and hence P (X (1) x) 1 (1 F (x)) n. This yields, for x < y P (X (1) > x) P (X i > x for all i 1,..., n) (1 F (x)) n F 1,2 (x, y) 1 (1 F (x)) n nf (x) (1 F (y)) n 1 Taking the second partial derivative with respect to x, y gives f 1,2 (x, y) 2 F 1,2 (x, y) x y nf(x)(n 1) (1 F (y)) n 2 ( 1)f(y) n(n 1)f(x)f(y) (1 F (y)) n 2. For x > y we also have f 1,2 (x, y). The student should think about why this is the case. Putting these together we have n(n 1)f(x)f(y) (1 F (y)) n 2 if x y f 1,2 (x, y) otherwise This method works well for finding the joint pdf of X (n 1), X (n). This direct method becomes increasingly more complicated in terms of combinatorics, and inclusion exclusion arguments. For example it is already complicated for finding the pdf of X (1), X (3). Thus the heuristic method is used to obtain the correct answer. It is helpful to notice that n(n 1) 1! 1! (n 2)! which is a combinatorial expression that comes up in the heuristic method below. A similar method of adding and subtracting appropriate regions can also be used to derive the marginal pdf of X (k) and other bivariate marginals. However this method is very complicated to keep track of all the regions. 4 Heuristic Method This is based on the idea of implementing the multinomial model for observations falling into various bins. The term heuristic refers to a process or method for attempting the solution of a problem, while giving a reasonable rule for obtaining the result may not give a technical justification for the steps involved. 4.1 Marginal Distribution Consider X (k). For a given small > P (X (k) [x, x + )) F (x) k 1 (F (x + ) F (x)) (1 F (x + )) n k} (k 1)!1!(n k)!

Order Statistics 4 There is an error in the approximation. We do not study it in this course, but this error is small compared with. The error is due to ignoring that perhaps more than one observation might fall into the interval [x, x + ). It is for this reason that the method is heuristic. Let f k be the pdf of X (k). Recall that for a pdf and therefore P (X (k) [x, x + )) P (X (k) [x, x + )) 1 x+ x x+ x f k (y)dy f k (y)dy f k (x) as +. A similar property holds of course for F and its pdf f. (There was a typo in the two lines above, now corrected.) Therefore P (X (k) [x, x + )) f k (x) lim + F (x) k 1 lim (k 1)!1!(n k)! (uses limit of product is product of limits) (k 1)!1!(n k)! F (x)k 1 f(x) (1 F (x+)) n k + (F (x + ) F (x)) } lim (1 F (x + )) n k + Notice that the method used here is to recognize that we are calculating the probability of random variables falling into one of three bins, that is for each trial of a random variable, the outcome falls into one of three bins. This of course in the multinomial distribution, which the student should have reviewed in readings earlier in the course. The method or argument is heuristic since possibly more than one r.v. can fall into the bin [x, x + ), but this last part can be shown to be small relative to, a property which we will not work through in this course. 4.2 Bivariate Marginal Distribution Consider X (k), X (l), where 1 k < l n. Let the joint pdf be f k,l. Consider the event such that for small the k and l order statistics fall into the intervals [x, x + ) and [y, y + ) respectively, with x < y. Thus approximately P (X (k) [x, x + ), X (l) [y, y + )) (k 1)!1!(l k 1)!1!(n l)! (F (y + ) F (y)) (1 F (y + )) n l} F (x) k 1 (F (x + ) F (x)) (F (y) F (x + )) l k 1 The error in this approximation is that events with 2 or more ties are omitted. We will not study this further in this course, and take it as given that this error in the approximation is small relative to limits needed below. The combinatorial term is obtained from counting the number of ways n indices in 1, 2,..., n} can be arranged into the 5 groups with k 1 of X 1, X 2,..., X n fall into (, x], 1 in [x, x + ), l k 1 into the interval [x +, y), 1 in [y, y + ) and the remaining n l into the interval [y +, ).

Order Statistics 5 Since the LHS is P (X (k) [x, x + ), X (l) [y, y + )) x+ y+ x y then dividing both sides by 2, and taking the limit as, yields f k,l (u, v)dvdu f k,l (x, y) f k,l (x, y) F (x) k 1 f(x) (F (y) F (x)) l k 1 f(y) (1 F (y)) n l} (k 1)!1!(l k 1)!1!(n l)! The coefficient in front is a generalization of the binomial coefficient and is sometimes written as ( ) n (k 1) 1 (l k 1) 1 (n l) 5 Examples Example 1 Let n 4, and that X i are iid Uniform(,1) random variables. Obtain the distribution of the sample ( ) median. Notice the sample median is thus M 1 2 X(2) + X (3). Thus we first need to obtain the density f 2,3 in the above notation. The Uniform(,1) pdf is f(x) I (,1) (x) 1 if < x < 1 if x and x 1 and the Uniform(,1) CDF is Thus f 2,3 (x, y) The CDF, say G, of M is thus given by if x < F (x) x if x < 1 1 if x 1. 4!x(1 y) if < x < y < 1 otherwise. (1) G(u) P (M u) P (X (2) + X (3) 2u) and hence M has pdf g(u) 2 f 2,3 (x, 2u x)dx. (2) The student should review the method that was used to derive the convolution formula. Aside : X (2), X (3) has pdf f 2,3. Thus T X (2) + X (3) has cdf F T and M has pdf F M (u) F T (2u). Recall we have a formula for the pdf f T in terms of an integral involving f 2,3. f M (u) df T (2u) du 2f T (2u) 2 f 2,3 (x, 2u x)dx

Order Statistics 6 End of Aside To obtain the correct limits of integration we now have to use the specific properties of the uniform pdf, specifically it support as needed in (1). For a given value of u, let A u be the support of the integrand in (2) so that g(u) 2 A u f 2,3 (x, 2u x)dx. A u x : < x < 1, x < 2u x and 2u x < 1} x : < x < 1, 2x < 2u and 2u 1 < x} We obtain these three pairs of inequalities from the support of f 2,3. If < u < 1 2 then 2u 1 < and hence If 1 2 A u x : < x < 1, x < u} (, u). < u < 1 then 1 < 2u < 2 < 2u 1 < 1, and hence A u x : < x < 1, 2u 1 < x < u} (2u 1, u). The student should complete the integral at home and find the the distribution of the sample median M in this case.

Order Statistics 7 Example 2 Find the marginal distribution of the k-th order statistic from a sample of iid Uniform(,1) random variables. By the heuristic method the pdf f k of U (k) is f k (x) (k 1)!(n k)! F (x)k 1 f(x)(1 F (x)) n k where F and f are the cdf and pdf of the Uniform(,1) distribution. There were calculate in the previous example. Thus we have f k (x) (k 1)!(n k)! xk 1 (1 x) n k if < x < 1 otherwise Remark Since pdf s integrate to 1 we obtain from example 2 the following formula for integers n > and 1 k n : x k 1 (1 x) n k (k 1)!(n k)! dx. (3) Example continued : Expected Value of k-th order statistic from a sample of n iid Uniform(,1) random variables. Suppose that U 1, U 2,..., U n are iid Unif(, 1) r.v.s. E(U (k) ) xf k (x)dx x (k 1)!(n k)! xk 1 (1 x) n k dx (k 1)!(n k)! (k 1)!(n k)! (k 1)!(n k)! k n + 1 x k+1 1 (1 x) n k dx x k+1 1 (1 x) (n+1) (k+1) dx (k + 1 1)!(n + 1 k 1)! (n + 1)! In the second last line we are using (3) with integer values k + 1 and n + 1. Example 3 Find the marginal distribution of the 2-nd order statistic from a sample of n 4 iid exponential paramter 1 random variables. By the heuristic method the pdf f k of X (k) is f 2 (x) 4! (2 1)!(4 2)! F (x)2 1 f(x)(1 F (x)) 4 2

Order Statistics 8 where F and f are the cdf and pdf of the exponential parameter 1 distribution. Specifically this gives f 2 (x) when x <, and for x we have f 2 (x) 12 ( 1 e x) e x ( 1 1 + e x) 2 12 ( 1 e x) e 3x. The student should now calculate for example P (X (2) > 1) and E(X (2) ).