7/16/2009 6:06 AM
Virtual Laboratories > 6. Random Samples

6. Order Statistics

Definitions

Suppose again that we have a basic random experiment, and that $X$ is a real-valued random variable for the experiment with distribution function $F$ and probability density function $f$. We perform $n$ independent replications of the basic experiment to generate a random sample $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ of size $n$ from the distribution of $X$. Recall that this is a sequence of independent random variables, each with the distribution of $X$.

Let $X_{n,k} = X_{n,k}(\boldsymbol{X})$ denote the $k$th smallest element of the sample $\boldsymbol{X}$. This statistic is called the order statistic of order $k$. Often the first step in a statistical study is to order the data; thus order statistics occur naturally. Our goal in this section is to study the distribution of the order statistics in terms of the sampling distribution. Note in particular that the extreme order statistics are the minimum and maximum values:

$$X_{n,1} = \min\{X_1, X_2, \ldots, X_n\}, \quad X_{n,n} = \max\{X_1, X_2, \ldots, X_n\}$$

1. In the order statistic experiment, use the default settings and run the experiment a few times. Note the following:
a. The table on the left shows the values of the order statistics.
b. The graph on the left shows the density function of the sampling distribution in blue and the sample values in red.
c. The graph on the right shows the density function of the selected order statistic in blue and the empirical density function in red. The mean/standard deviation bar of the distribution is shown in blue while the empirical mean/standard deviation bar is shown in red.
d. The table on the right gives numerical values of the density function and moments and the empirical density function and moments.

Distributions

The Distribution of the kth Order Statistic

Let $G_{n,k}$ denote the distribution function of $X_{n,k}$. Define

$$N_{n,y} = \#\{i \in \{1, 2, \ldots, n\} : X_i \le y\}, \quad y \in \mathbb{R}$$

2. Show that $N_{n,y}$ has the binomial distribution with parameters $n$ and $F(y)$ for each $y \in \mathbb{R}$.
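The definitions above can be illustrated with a minimal Python sketch. The helper names `order_statistic` and `count_at_most` and the sample values are hypothetical, chosen only for illustration. The order statistics are obtained by sorting, $N_{n,y}$ by counting, and the final check confirms on this one small sample that the $k$th smallest value is at most $y$ exactly when at least $k$ sample values are at most $y$.

```python
def order_statistic(sample, k):
    """Return the k-th smallest value of the sample (k = 1, ..., n)."""
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    return sorted(sample)[k - 1]


def count_at_most(sample, y):
    """Return N_{n,y}: the number of sample values that are <= y."""
    return sum(1 for x in sample if x <= y)


# Illustrative data only (not from the text).
sample = [0.62, 0.11, 0.95, 0.40, 0.73]
n = len(sample)

minimum = order_statistic(sample, 1)  # X_{n,1}
maximum = order_statistic(sample, n)  # X_{n,n}

# Check on this sample: X_{n,k} <= y  if and only if  N_{n,y} >= k.
equivalence_holds = all(
    (order_statistic(sample, k) <= y) == (count_at_most(sample, y) >= k)
    for k in range(1, n + 1)
    for y in [0.0, 0.11, 0.3, 0.5, 0.73, 1.0]
)
```

A check on one sample is of course not a proof; it only makes the two definitions concrete.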
3. Show that $X_{n,k} \le y$ if and only if $N_{n,y} \ge k$, for $y \in \mathbb{R}$ and $k \in \{1, 2, \ldots, n\}$.

4. Use the results of Exercises 2 and 3 to show that

$$G_{n,k}(y) = \sum_{j=k}^{n} \binom{n}{j} F(y)^j (1 - F(y))^{n-j}, \quad y \in \mathbb{R}$$

5. In particular, show that $G_{n,1}(y) = 1 - (1 - F(y))^n$, $y \in \mathbb{R}$.

6. In particular, show that $G_{n,n}(y) = F(y)^n$, $y \in \mathbb{R}$.

7. Suppose now that $X$ has a continuous distribution. Show that $X_{n,k}$ has a continuous distribution with probability density function

$$g_{n,k}(y) = \binom{n}{k-1,\, 1,\, n-k} F(y)^{k-1} (1 - F(y))^{n-k} f(y), \quad y \in \mathbb{R}$$

Hint: Differentiate the expression in Exercise 4 with respect to $y$.

8. In the order statistic experiment, select the uniform distribution on $[0, 1]$ and $n = 5$. Vary $k$ from 1 to 5 and note the shape of the density function of $X_{n,k}$. For each value of $k$, run the simulation 1000 times with an update frequency of 10. Note the apparent convergence of the empirical density function to the true density function.

There is a simple heuristic argument for the result in Exercise 7. First, $g_{n,k}(y)\,dy$ is the probability that $X_{n,k}$ is in an infinitesimal interval of size $dy$ about $y$. On the other hand, this event means that one of the sample variables is in the infinitesimal interval, $k - 1$ sample variables are less than $y$, and $n - k$ sample variables are greater than $y$. The number of ways of choosing these variables is the multinomial coefficient

$$\binom{n}{k-1,\, 1,\, n-k} = \frac{n!}{(k-1)!\, 1!\, (n-k)!}$$

By independence, the probability that the chosen variables are in the specified intervals is

$$F(y)^{k-1} (1 - F(y))^{n-k} f(y)\,dy$$

9. Consider a random sample of size $n$ from the exponential distribution with rate parameter $r$. Compute the probability density function of the $k$th order statistic $X_{n,k}$. In particular, note that the minimum of the variables $X_{n,1}$ has the exponential distribution with rate parameter $n r$.

10. In the order statistic experiment, select the exponential (1) distribution and $n = 5$. Vary $k$ from 1 to 5 and note the shape of the probability density function of $X_{n,k}$. For each value of $k$, run the simulation 1000 times with an update frequency of 10. Note the apparent convergence of the empirical density function to the true density function.
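The formulas in Exercises 4 through 9 can be checked numerically with a short Python sketch. The function names `order_stat_cdf` and `order_stat_pdf` are hypothetical; they take the values $F(y)$ and $f(y)$ as plain numbers, so the sketch assumes the caller supplies them for the distribution of interest.

```python
from math import comb, exp, factorial


def order_stat_cdf(n, k, F_y):
    """G_{n,k}(y) = sum_{j=k}^{n} C(n, j) F(y)^j (1 - F(y))^(n - j)."""
    return sum(comb(n, j) * F_y**j * (1 - F_y) ** (n - j) for j in range(k, n + 1))


def order_stat_pdf(n, k, F_y, f_y):
    """g_{n,k}(y) = n! / ((k-1)! 1! (n-k)!) F(y)^(k-1) (1 - F(y))^(n-k) f(y)."""
    coeff = factorial(n) // (factorial(k - 1) * factorial(n - k))
    return coeff * F_y ** (k - 1) * (1 - F_y) ** (n - k) * f_y


# Uniform distribution on [0, 1]: F(y) = y and f(y) = 1 for 0 <= y <= 1.
n, y = 5, 0.3
cdf_min = order_stat_cdf(n, 1, y)  # should agree with 1 - (1 - y)^n
cdf_max = order_stat_cdf(n, n, y)  # should agree with y^n

# Exponential distribution with rate r: F(t) = 1 - exp(-r t), f(t) = r exp(-r t).
r, t = 2.0, 0.4
pdf_min = order_stat_pdf(n, 1, 1 - exp(-r * t), r * exp(-r * t))
# The minimum should have the exponential density with rate n * r.
pdf_min_expected = n * r * exp(-n * r * t)
```

Passing $F(y)$ and $f(y)$ as numbers keeps the sketch independent of any particular distribution library; the same two functions work for any continuous sampling distribution.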
11. Consider a random sample of size $n$ from the uniform distribution on the interval $[0, 1]$. Show that $X_{n,k}$ has the beta distribution with parameters $k$ and $n - k + 1$. Give the mean and variance of $X_{n,k}$.

12. In the order statistic experiment, select the uniform distribution on $[0, 1]$ and $n = 6$. Vary $k$ from 1 to 6 and note the size and location of the mean/standard deviation bar. For each value of $k$, run the simulation 1000 times with an update frequency of 10. Note the apparent convergence of the empirical moments to the distribution moments.

13. Four fair dice are rolled. Find the probability density function of each of the order statistics.

14. In the dice experiment, select the following order statistic and die distribution. Increase the number of dice from 1 to 20, noting the shape of the probability density function at each stage. Now with $n = 4$, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency function to the density function.
a. Maximum score with fair dice.
b. Minimum score with fair dice.
c. Maximum score with ace-six flat dice.
d. Minimum score with ace-six flat dice.

Joint Distributions

Suppose again that $X$ has a continuous distribution.

15. Suppose that $j < k$. Use a heuristic argument to show that the joint density of $(X_{n,j}, X_{n,k})$ is

$$g_{n,j,k}(y, z) = \binom{n}{j-1,\, 1,\, k-j-1,\, 1,\, n-k} F(y)^{j-1} f(y) \, (F(z) - F(y))^{k-j-1} f(z) \, (1 - F(z))^{n-k}, \quad y < z$$

Similar arguments can be used to obtain the joint probability density function of any number of the order statistics. Of course, we are particularly interested in the joint probability density function of all of the order statistics; the following exercise gives this joint probability density function, which has a remarkably simple form.

16. Show that $(X_{n,1}, X_{n,2}, \ldots, X_{n,n})$ has joint probability density function given by

$$g_n(y_1, y_2, \ldots, y_n) = n! \, f(y_1) f(y_2) \cdots f(y_n), \quad y_1 < y_2 < \cdots < y_n$$

For each permutation $i = (i_1, i_2, \ldots, i_n)$ of $(1, 2, \ldots, n)$, let $S_i = \{x \in \mathbb{R}^n : x_{i_1} < x_{i_2} < \cdots < x_{i_n}\}$.
On $S_i$, the mapping $(x_1, x_2, \ldots, x_n) \mapsto (x_{i_1}, x_{i_2}, \ldots, x_{i_n})$ is one-to-one, has continuous first partial derivatives, and has Jacobian of absolute value 1.
The sets $S_i$, where $i$ ranges over the $n!$ permutations of $(1, 2, \ldots, n)$, are disjoint. The probability that $(X_1, X_2, \ldots, X_n)$ is not in one of these sets is 0. Now use the multivariate change of variables formula.

Again, there is a simple heuristic argument for the formula in Exercise 16. For each $y \in \mathbb{R}^n$ with $y_1 < y_2 < \cdots < y_n$, there are $n!$ permutations of the coordinates of $y$. The probability density of $(X_1, X_2, \ldots, X_n)$ at each of these points is $f(y_1) f(y_2) \cdots f(y_n)$. Hence the probability density of $(X_{n,1}, X_{n,2}, \ldots, X_{n,n})$ at $y$ is $n!$ times this product.

17. Consider a random sample of size $n$ from the exponential distribution with rate parameter $r$. Compute the joint probability density function of the order statistics $(X_{n,1}, X_{n,2}, \ldots, X_{n,n})$.

18. Suppose that $(X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the uniform distribution on the interval $[a, b]$, where $a < b$. Show that
a. $(X_1, X_2, \ldots, X_n)$ is uniformly distributed on $[a, b]^n$.
b. $(X_{n,1}, X_{n,2}, \ldots, X_{n,n})$ is uniformly distributed on $\{x \in [a, b]^n : a < x_1 < x_2 < \cdots < x_n < b\}$.

19. Four fair dice are rolled. Find the joint probability density function of the order statistics.

Derived Statistics

We will study several important statistics that are based on order statistics.

Sample Range

The sample range is the random variable

$$R = X_{n,n} - X_{n,1}$$

This statistic gives a simple measure of the dispersion of the sample. Note that the distribution of the sample range can be obtained from the joint distribution of $(X_{n,1}, X_{n,n})$ given earlier.

20. Consider a random sample of size $n$ from the exponential distribution with rate parameter $r$. Show that the sample range $R$ has the same distribution as the maximum of a random sample of size $n - 1$ from this exponential distribution.

21. Consider a random sample of size $n$ from the uniform distribution on $[0, 1]$. Show that $R$ has the beta distribution with left parameter $n - 1$ and right parameter 2.
Give the mean and variance of $R$. What happens as $n \to \infty$?

22. Four fair dice are rolled. Find the probability density function of the sample range.

The Sample Median

If $n$ is odd, the sample median is the middle of the ordered observations, namely

$$X_{n,k} \quad \text{where } k = \frac{n + 1}{2}$$

If $n$ is even, there is not a single middle observation, but rather two middle observations. Thus, the median interval is

$$[X_{n,k}, X_{n,k+1}] \quad \text{where } k = \frac{n}{2}$$

In this case, the sample median is defined to be the midpoint of the median interval:

$$\frac{1}{2}(X_{n,k} + X_{n,k+1}) \quad \text{where } k = \frac{n}{2}$$

In a sense, this definition is a bit arbitrary because there is no compelling reason to prefer one point in the median interval over another. For more on this issue, see the discussion of error functions in the section on Variance. In any event, the sample median is a natural statistic that is analogous to the median of the distribution. Moreover, the distribution of the sample median can be obtained from our results on order statistics.

Sample Quantiles

We can generalize the sample median discussed above to other sample quantiles. Suppose that $p \in (0, 1)$. Let $k = \lfloor (n + 1) p \rfloor$, the integer part of $(n + 1) p$, and let $q = (n + 1) p - k$, the fractional part of $(n + 1) p$. Using linear interpolation, we define the sample quantile of order $p$ to be

$$X_{n,k} + q \, (X_{n,k+1} - X_{n,k}) = (1 - q) X_{n,k} + q X_{n,k+1}$$

Once again, the sample quantile of order $p$ is a natural statistic that is analogous to the distribution quantile of order $p$. Moreover, the distribution of a sample quantile can be obtained from our results on order statistics.

The sample quantile of order $\frac{1}{4}$ is known as the first sample quartile and is frequently denoted $Q_1$. The sample quantile of order $\frac{3}{4}$ is known as the third sample quartile and is frequently denoted $Q_3$. Note that
the sample median is the quartile of order $\frac{1}{2}$ and is sometimes denoted $Q_2$. The interquartile range is defined to be

$$\text{IQR} = Q_3 - Q_1$$

The IQR is a statistic that measures the spread of the distribution about the median, but of course this number gives less information than the interval $[Q_1, Q_3]$.

Exploratory Data Analysis

The five statistics $(X_{n,1}, Q_1, Q_2, Q_3, X_{n,n})$ are often referred to as the five-number summary. Together, these statistics give a great deal of information about the distribution in terms of the center, spread, and skewness. Graphically, the five numbers are often displayed as a boxplot, which consists of a line extending from the minimum $X_{n,1}$ to the maximum $X_{n,n}$, with a rectangular box from the first quartile $Q_1$ to the third quartile $Q_3$ and tick marks at the minimum, the median $Q_2$, and the maximum.

23. In the interactive histogram, select boxplot. Construct a frequency distribution with at least 6 classes and at least 10 values. Compute the statistics in the five-number summary by hand and verify that you get the same results as the applet.

24. In the interactive histogram, select boxplot. Set the class width to 0.1 and construct a distribution with at least 30 values of each of the types indicated below. Then increase the class width to each of the other four values. As you perform these operations, note the shape of the boxplot and the relative positions of the statistics in the five-number summary:
a. A uniform distribution.
b. A symmetric, unimodal distribution.
c. A unimodal distribution that is skewed right.
d. A unimodal distribution that is skewed left.
e. A symmetric bimodal distribution.
f. A u-shaped distribution.

25. In the interactive histogram, select boxplot. Start with a distribution and add additional points as follows. Note the effect on the boxplot:
a. Add a point below $X_{n,1}$.
b. Add a point between $X_{n,1}$ and $Q_1$.
c. Add a point between $Q_1$ and $Q_2$.
d. Add a point between $Q_2$ and $Q_3$.
e. Add a point between $Q_3$ and $X_{n,n}$.
f. Add a point above $X_{n,n}$.
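The sample quantile by linear interpolation and the five-number summary described above can be sketched in Python as follows. The function names are hypothetical, and the clamping at the extremes is an assumption added here: the text does not say what to do when $k = 0$ or $k = n$, where $X_{n,k}$ or $X_{n,k+1}$ is undefined.

```python
from math import floor


def sample_quantile(data, p):
    """Sample quantile of order p by linear interpolation.

    Uses k = floor((n + 1) p) and q = (n + 1) p - k, and returns
    (1 - q) X_{n,k} + q X_{n,k+1}.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    pos = (n + 1) * p
    k = floor(pos)
    q = pos - k
    # Assumption (not in the text): clamp to the extremes when k falls
    # outside 1, ..., n - 1, where X_{n,k} or X_{n,k+1} is undefined.
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    return (1 - q) * xs[k - 1] + q * xs[k]


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    return (
        min(data),
        sample_quantile(data, 0.25),
        sample_quantile(data, 0.5),
        sample_quantile(data, 0.75),
        max(data),
    )


# Illustrative data only (not from the text).
summary = five_number_summary([7, 3, 1, 6, 2, 5, 4])
iqr = summary[3] - summary[1]
```

For the seven values above, $(n + 1) p$ is an integer for each quartile, so no interpolation is needed; with an even sample size such as `[4, 1, 3, 2]`, the quantile of order 1/2 falls midway between the two middle values, in agreement with the definition of the sample median.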
In the last problem, you may have noticed that when you add an additional point to the distribution, one or more of the five statistics do not change. In general, quantiles can be relatively insensitive to changes in the data.

26. Compute the five-number summary and sketch the boxplot for the velocity of light variable in Michelson's data. Compare the median with the true value of the velocity of light.

27. Compute the five-number summary and sketch the boxplot for the density of the earth variable in Cavendish's data. Compare the median with the true value of the density of the earth.

28. Compute the five-number summary and sketch the boxplot for the net weight variable in the M&M data.

29. Compute the five-number summary for the sepal length variable in Fisher's iris data, using the cases indicated below. Plot the boxplots on parallel axes, so you can compare.
a. All cases
b. Type Setosa only
c. Type Virginica only
d. Type Versicolor only
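As a small illustration of the remark that quantiles can be relatively insensitive to changes in the data, the following Python sketch adds one point far above the maximum of a made-up data set (the values are illustrative only, not taken from any of the data sets named above) and compares the minimum, median, and maximum before and after. It uses `statistics.median` from the standard library, which takes the midpoint of the two middle values when the sample size is even.

```python
from statistics import median

data = [1, 2, 3, 4, 5, 6, 7]   # illustrative values only
with_extra = data + [100]      # add one point above the maximum

before = (min(data), median(data), max(data))
after = (min(with_extra), median(with_extra), max(with_extra))
```

Here the minimum is unchanged and the median moves only from 4 to 4.5, while the maximum jumps from 7 to 100.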