
1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Random Samples > 1 2 3 4 5 6 7

6. Order Statistics

Definitions

Suppose again that we have a basic random experiment, and that $X$ is a real-valued random variable for the experiment with distribution function $F$ and probability density function $f$. We perform $n$ independent replications of the basic experiment to generate a random sample $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ of size $n$ from the distribution of $X$. Recall that this is a sequence of independent random variables, each with the distribution of $X$.

Let $X_{n,k}$ denote the $k$th smallest element of the sample $\boldsymbol{X}$. This statistic is called the order statistic of order $k$. Often the first step in a statistical study is to order the data; thus order statistics occur naturally. Our goal in this section is to study the distribution of the order statistics in terms of the sampling distribution. Note in particular that the extreme order statistics are the minimum and maximum values:

\[ X_{n,1} = \min\{X_1, X_2, \ldots, X_n\}, \qquad X_{n,n} = \max\{X_1, X_2, \ldots, X_n\} \]

1. In the order statistic experiment, use the default settings and run the experiment a few times. Note the following: The table on the left shows the values of the order statistics. The graph on the left shows the density function of the sampling distribution in blue and the sample values in red. The graph on the right shows the density function of the selected order statistic in blue and the empirical density function in red. The mean/standard deviation bar of the distribution is shown in blue while the empirical mean/standard deviation bar is shown in red. The table on the right gives numerical values of the density function and moments and the empirical density function and moments.

Distributions

The Distribution of the kth Order Statistic

Let $G_{n,k}$ denote the distribution function of $X_{n,k}$. Define

\[ N_{n,y} = \#\left(\{ i \in \{1, 2, \ldots, n\} : X_i \le y \}\right), \quad y \in \mathbb{R} \]

2. Show that $N_{n,y}$ has the binomial distribution with parameters $n$ and $F(y)$ for each $y \in \mathbb{R}$.
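The definitions above can be tried out numerically. The following is a minimal sketch, assuming a uniform sampling distribution on [0, 1] (so that F(y) = y); the function names `order_statistics`, `count_at_most`, and `mean_count` are hypothetical and not part of the original text. It sorts a sample to obtain the order statistics and averages the count N_{n,y} over many samples, which should come out close to the binomial mean n F(y).

```python
import random


def order_statistics(sample):
    """Return the sample in increasing order, so index k - 1 holds X_{n,k}."""
    return sorted(sample)


def count_at_most(sample, y):
    """N_{n,y}: the number of sample values less than or equal to y."""
    return sum(1 for x in sample if x <= y)


def mean_count(n, y, reps=20000, seed=0):
    """Average N_{n,y} over simulated uniform(0, 1) samples of size n.

    Assumption: the sampling distribution is uniform on [0, 1], so F(y) = y
    and the binomial mean is n * y.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        total += count_at_most(sample, y)
    return total / reps
```

For example, `mean_count(5, 0.3)` should be near 5 × 0.3 = 1.5, and for any sample the minimum and maximum are the first and last entries of `order_statistics(sample)`.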

3. Show that $X_{n,k} \le y$ if and only if $N_{n,y} \ge k$ for $y \in \mathbb{R}$ and $k \in \{1, 2, \ldots, n\}$.

4. Use the results of Exercises 2 and 3 to show that

\[ G_{n,k}(y) = \sum_{j=k}^{n} \binom{n}{j} F(y)^j \left(1 - F(y)\right)^{n-j}, \quad y \in \mathbb{R} \]

5. In particular, show that $G_{n,1}(y) = 1 - \left(1 - F(y)\right)^n$, $y \in \mathbb{R}$.

6. In particular, show that $G_{n,n}(y) = F(y)^n$, $y \in \mathbb{R}$.

7. Suppose now that $X$ has a continuous distribution. Show that $X_{n,k}$ has a continuous distribution with probability density function

\[ g_{n,k}(y) = \binom{n}{k-1,\, 1,\, n-k} F(y)^{k-1} \left(1 - F(y)\right)^{n-k} f(y), \quad y \in \mathbb{R} \]

Hint: Differentiate the expression in Exercise 4 with respect to $y$.

8. In the order statistic experiment, select the uniform distribution on $[0, 1]$ and $n = 5$. Vary $k$ from 1 to 5 and note the shape of the density function of $X_{n,k}$. For each value of $k$, run the simulation 1000 times with an update frequency of 10. Note the apparent convergence of the empirical density function to the true density function.

There is a simple heuristic argument for the result in Exercise 7. First, $g_{n,k}(y)\,dy$ is the probability that $X_{n,k}$ is in an infinitesimal interval of size $dy$ about $y$. On the other hand, this event means that one of the sample variables is in the infinitesimal interval, $k - 1$ sample variables are less than $y$, and $n - k$ sample variables are greater than $y$. The number of ways of choosing these variables is the multinomial coefficient

\[ \binom{n}{k-1,\, 1,\, n-k} = \frac{n!}{(k-1)!\, 1!\, (n-k)!} \]

By independence, the probability that the chosen variables are in the specified intervals is

\[ F(y)^{k-1} \left(1 - F(y)\right)^{n-k} f(y)\,dy \]

9. Consider a random sample of size $n$ from the exponential distribution with rate parameter $r$. Compute the probability density function of the $k$th order statistic $X_{n,k}$. In particular, note that the minimum of the variables $X_{n,1}$ has the exponential distribution with rate parameter $n r$.

10. In the order statistic experiment, select the exponential (1) distribution and $n = 5$. Vary $k$ from 1 to 5 and note the shape of the probability density function of $X_{n,k}$. For each value of $k$, run the simulation 1000 times with an update frequency of 10. Note the apparent convergence of the empirical density function to the true density function.
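The formulas in Exercises 4 and 7 translate directly into code. The sketch below is illustrative only: the function names and the simulation helper are hypothetical, and the caller is assumed to supply the values F(y) and f(y) of the sampling distribution. It evaluates the distribution function of the kth order statistic as a binomial tail sum, evaluates the density using the multinomial coefficient, and estimates the distribution function by simulation for comparison.

```python
import math
import random


def order_stat_cdf(n, k, F_y):
    """G_{n,k}(y) as the binomial tail sum over j = k, ..., n, given F(y)."""
    return sum(
        math.comb(n, j) * F_y**j * (1.0 - F_y) ** (n - j)
        for j in range(k, n + 1)
    )


def order_stat_pdf(n, k, F_y, f_y):
    """g_{n,k}(y) = n! / ((k-1)! 1! (n-k)!) * F^(k-1) * (1-F)^(n-k) * f."""
    coef = math.factorial(n) // (math.factorial(k - 1) * math.factorial(n - k))
    return coef * F_y ** (k - 1) * (1.0 - F_y) ** (n - k) * f_y


def empirical_order_stat_cdf(n, k, y, draw, reps=20000, seed=0):
    """Estimate P(X_{n,k} <= y) by simulation.

    `draw` is a hypothetical callback that takes a random.Random instance
    and returns one observation from the sampling distribution.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = sorted(draw(rng) for _ in range(n))
        if sample[k - 1] <= y:
            hits += 1
    return hits / reps
```

As a usage example, for the exponential distribution with rate r, passing `F_y = 1 - math.exp(-r * y)` to `order_stat_cdf(n, 1, F_y)` should agree with `1 - math.exp(-n * r * y)`, consistent with the minimum having rate parameter n r. For the uniform distribution on [0, 1], `empirical_order_stat_cdf(5, 3, 0.5, lambda rng: rng.random())` should be close to `order_stat_cdf(5, 3, 0.5)`.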

11. Consider a random sample of size $n$ from the uniform distribution on the interval $[0, 1]$. Show that $X_{n,k}$ has the beta distribution with parameters $k$ and $n - k + 1$. Give the mean and variance of $X_{n,k}$.

12. In the order statistic experiment, select the uniform distribution on $[0, 1]$ and $n = 6$. Vary $k$ from 1 to 6 and note the size and location of the mean/standard deviation bar. For each value of $k$, run the simulation 1000 times with an update frequency of 10. Note the apparent convergence of the empirical moments to the distribution moments.

13. Four fair dice are rolled. Find the probability density function of each of the order statistics.

14. In the dice experiment, select the following order statistic and die distribution. Increase the number of dice from 1 to 20, noting the shape of the probability density function at each stage. Now with $n = 4$, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency function to the density function.
a. Maximum score with fair dice.
b. Minimum score with fair dice.
c. Maximum score with ace-six flat dice.
d. Minimum score with ace-six flat dice.

Joint Distributions

Suppose again that $X$ has a continuous distribution.

15. Suppose that $j < k$. Use a heuristic argument to show that the joint density of $(X_{n,j}, X_{n,k})$ is

\[ g_{n,j,k}(y, z) = \binom{n}{j-1,\, 1,\, k-j-1,\, 1,\, n-k} F(y)^{j-1} f(y) \left(F(z) - F(y)\right)^{k-j-1} f(z) \left(1 - F(z)\right)^{n-k}, \quad y < z \]

Similar arguments can be used to obtain the joint probability density function of any number of the order statistics. Of course, we are particularly interested in the joint probability density function of all of the order statistics; the following exercise gives this joint probability density function, which has a remarkably simple form.

16. Show that $(X_{n,1}, X_{n,2}, \ldots, X_{n,n})$ has joint probability density function given by

\[ g_n(y_1, y_2, \ldots, y_n) = n!\, f(y_1) f(y_2) \cdots f(y_n), \quad y_1 < y_2 < \cdots < y_n \]

a. For each permutation $i = (i_1, i_2, \ldots, i_n)$ of $(1, 2, \ldots, n)$, let $S_i = \{x \in \mathbb{R}^n : x_{i_1} < x_{i_2} < \cdots < x_{i_n}\}$.
b. On $S_i$, the mapping $(x_1, x_2, \ldots, x_n) \mapsto (x_{i_1}, x_{i_2}, \ldots, x_{i_n})$ is one-to-one, has continuous first partial derivatives, and has Jacobian of absolute value 1.
c. The sets $S_i$, where $i$ ranges over the $n!$ permutations of $(1, 2, \ldots, n)$, are disjoint.
d. The probability that $(X_1, X_2, \ldots, X_n)$ is not in one of these sets is 0.
e. Now use the multivariate change of variables formula.

Again, there is a simple heuristic argument for the formula in Exercise 16. For each $\boldsymbol{y} \in \mathbb{R}^n$ with $y_1 < y_2 < \cdots < y_n$, there are $n!$ permutations of the coordinates of $\boldsymbol{y}$. The probability density of $(X_1, X_2, \ldots, X_n)$ at each of these points is $f(y_1) f(y_2) \cdots f(y_n)$. Hence the probability density of $(X_{n,1}, X_{n,2}, \ldots, X_{n,n})$ at $\boldsymbol{y}$ is $n!$ times this product.

17. Consider a random sample of size $n$ from the exponential distribution with rate parameter $r$. Compute the joint probability density function of the order statistics $(X_{n,1}, X_{n,2}, \ldots, X_{n,n})$.

18. Suppose that $(X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the uniform distribution on the interval $[a, b]$, where $a < b$. Show that
a. $(X_1, X_2, \ldots, X_n)$ is uniformly distributed on $[a, b]^n$.
b. $(X_{n,1}, X_{n,2}, \ldots, X_{n,n})$ is uniformly distributed on $\{x \in [a, b]^n : a < x_1 < x_2 < \cdots < x_n < b\}$.

19. Four fair dice are rolled. Find the joint probability density function of the order statistics.

Derived Statistics

We will study several important statistics that are based on order statistics.

Sample Range

The sample range is the random variable

\[ R = X_{n,n} - X_{n,1} \]

This statistic gives a simple measure of the dispersion of the sample. Note that the distribution of the sample range can be obtained from the joint distribution of $(X_{n,1}, X_{n,n})$ given earlier.

20. Consider a random sample of size $n$ from the exponential distribution with rate parameter $r$. Show that the sample range $R$ has the same distribution as the maximum of a random sample of size $n - 1$ from this exponential distribution.

21. Consider a random sample of size $n$ from the uniform distribution on $[0, 1]$.
a. Show that $R$ has the beta distribution with left parameter $n - 1$ and right parameter 2.
b. Give the mean and variance of $R$.
c. What happens as $n \to \infty$?

22. Four fair dice are rolled. Find the probability density function of the sample range.

The Sample Median

If $n$ is odd, the sample median is the middle of the ordered observations, namely

\[ X_{n,k} \text{ where } k = \frac{n+1}{2} \]

If $n$ is even, there is not a single middle observation, but rather two middle observations. Thus, the median interval is

\[ [X_{n,k}, X_{n,k+1}] \text{ where } k = \frac{n}{2} \]

In this case, the sample median is defined to be the midpoint of the median interval:

\[ \frac{1}{2}\left(X_{n,k} + X_{n,k+1}\right) \text{ where } k = \frac{n}{2} \]

In a sense, this definition is a bit arbitrary because there is no compelling reason to prefer one point in the median interval over another. For more on this issue, see the discussion of error functions in the section on Variance. In any event, the sample median is a natural statistic that is analogous to the median of the distribution. Moreover, the distribution of the sample median can be obtained from our results on order statistics.

Sample Quantiles

We can generalize the sample median discussed above to other sample quantiles. Suppose that $p \in (0, 1)$. Let $k = \lfloor (n+1)p \rfloor$, the integer part of $(n+1)p$, and let $q = (n+1)p - k$, the fractional part of $(n+1)p$. Using linear interpolation, we define the sample quantile of order $p$ to be

\[ X_{n,k} + q\left(X_{n,k+1} - X_{n,k}\right) = (1 - q) X_{n,k} + q X_{n,k+1} \]

Once again, the sample quantile of order $p$ is a natural statistic that is analogous to the distribution quantile of order $p$. Moreover, the distribution of a sample quantile can be obtained from our results on order statistics.

The sample quantile of order $\frac{1}{4}$ is known as the first sample quartile and is frequently denoted $Q_1$. The sample quantile of order $\frac{3}{4}$ is known as the third sample quartile and is frequently denoted $Q_3$. Note that the sample median is the quantile of order $\frac{1}{2}$ and is sometimes denoted $Q_2$. The interquartile range is defined to be

\[ IQR = Q_3 - Q_1 \]

The IQR is a statistic that measures the spread of the distribution about the median, but of course this number gives less information than the interval $[Q_1, Q_3]$.

Exploratory Data Analysis

The five statistics $(X_{n,1}, Q_1, Q_2, Q_3, X_{n,n})$ are often referred to as the five-number summary. Together, these statistics give a great deal of information about the distribution in terms of the center, spread, and skewness. Graphically, the five numbers are often displayed as a boxplot, which consists of a line extending from the minimum $X_{n,1}$ to the maximum $X_{n,n}$, with a rectangular box from the first quartile $Q_1$ to the third quartile $Q_3$ and tick marks at the minimum, the median $Q_2$, and the maximum.

23. In the interactive histogram, select boxplot. Construct a frequency distribution with at least 6 classes and at least 10 values. Compute the statistics in the five-number summary by hand and verify that you get the same results as the applet.

24. In the interactive histogram, select boxplot. Set the class width to 0.1 and construct a distribution with at least 30 values of each of the types indicated below. Then increase the class width to each of the other four values. As you perform these operations, note the shape of the boxplot and the relative positions of the statistics in the five-number summary:
a. A uniform distribution.
b. A symmetric, unimodal distribution.
c. A unimodal distribution that is skewed right.
d. A unimodal distribution that is skewed left.
e. A symmetric bimodal distribution.
f. A u-shaped distribution.

25. In the interactive histogram, select boxplot. Start with a distribution and add additional points as follows. Note the effect on the boxplot:
a. Add a point below $X_{n,1}$.
b. Add a point between $X_{n,1}$ and $Q_1$.
c. Add a point between $Q_1$ and $Q_2$.
d. Add a point between $Q_2$ and $Q_3$.
e. Add a point between $Q_3$ and $X_{n,n}$.
f. Add a point above $X_{n,n}$.
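The sample quantile of order p, defined earlier by linear interpolation at position (n + 1)p, and the five-number summary can be sketched as follows. The function names are hypothetical. The text does not say what to do when (n + 1)p falls below 1 or at or above n, so the clamping to the minimum and maximum used here is an assumption made only to keep the sketch total.

```python
import math


def sample_quantile(data, p):
    """Sample quantile of order p using the (n + 1) p interpolation rule."""
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p
    k = math.floor(pos)  # integer part of (n + 1) p
    q = pos - k          # fractional part of (n + 1) p
    # Assumption: clamp to the extremes when k is outside 1, ..., n - 1.
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    # xs[k - 1] is X_{n,k} and xs[k] is X_{n,k+1} (Python lists are 0-based).
    return (1 - q) * xs[k - 1] + q * xs[k]


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    xs = sorted(data)
    return (
        xs[0],
        sample_quantile(xs, 0.25),
        sample_quantile(xs, 0.5),
        sample_quantile(xs, 0.75),
        xs[-1],
    )


def interquartile_range(data):
    """IQR = Q3 - Q1."""
    return sample_quantile(data, 0.75) - sample_quantile(data, 0.25)
```

With an even number of observations such as `[4, 1, 3, 2]`, `sample_quantile(data, 0.5)` gives the midpoint of the two middle values, matching the definition of the sample median for even n.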

In the last problem, you may have noticed that when you add an additional point to the distribution, one or more of the five statistics does not change. In general, quantiles can be relatively insensitive to changes in the data.

26. Compute the five-number summary and sketch the boxplot for the velocity of light variable in Michelson's data. Compare the median with the true value of the velocity of light.

27. Compute the five-number summary and sketch the boxplot for the density of the earth variable in Cavendish's data. Compare the median with the true value of the density of the earth.

28. Compute the five-number summary and sketch the boxplot for the net weight variable in the M&M data.

29. Compute the five-number summary for the sepal length variable in Fisher's iris data, using the cases indicated below. Plot the boxplots on parallel axes, so you can compare.
a. All cases
b. Type Setosa only
c. Type Virginica only
d. Type Versicolor only
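The remark that quantiles can be relatively insensitive to changes in the data can be illustrated with a small experiment. This is a sketch on made-up numbers, not on any of the data sets named in the exercises, and the helper name `median_and_mean_shift` is hypothetical. It compares how the median and the mean move when a single extreme point is appended.

```python
import statistics


def median_and_mean_shift(data, new_point):
    """Return ((median, mean) before, (median, mean) after) adding new_point."""
    before = (statistics.median(data), statistics.fmean(data))
    extended = list(data) + [new_point]
    after = (statistics.median(extended), statistics.fmean(extended))
    return before, after


# Hypothetical data: seven evenly spaced values and one extreme addition.
before, after = median_and_mean_shift([1, 2, 3, 4, 5, 6, 7], 1000)
```

Here the median moves only from the middle value to the midpoint of the two middle values, while the mean is pulled far toward the added point.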