Metrika, Volume 28, 1981, pages 257-262. Vienna.

Estimation Problems for Rectangular Distributions (Or the Taxi Problem Revisited)

By J.S. Rao, Santa Barbara¹)

¹) J.S. Rao, Department of Mathematics, University of California, Santa Barbara, California 93106, USA.

Abstract: The problem of estimating the unknown upper bound $\theta$ on the basis of a sample of size $n$ from a uniform or rectangular distribution on $[0, \theta]$ is of considerable interest. This problem, or its analogous discrete version, is variously known as the "taxi problem" or the "German bomb (or tank) problem" and has a long history. The emphasis here is on estimating $\theta$ through the lengths of the observed gaps, or spacings, which seem natural for this problem.

1. Introduction

Let $X_1, \dots, X_n$ be a random sample from a uniform distribution on $[0, \theta]$. Estimation of the unknown upper bound $\theta$ is of interest, for instance, in connection with estimating the total number of taxis in a town on the basis of observed registration numbers, or in estimating the number of enemy bombs (or tanks) on the basis of observed serial numbers, provided, of course, that some obvious assumptions hold. See, for instance, Noether [1971, 2-5] for an elementary discussion. A continuous uniform distribution will be assumed here; it provides a good approximation to the results in the case of a discrete uniform distribution on the integers $\{1, 2, \dots, \theta\}$. In fact, analogous results may be obtained for the latter case.

When $(X_1, \dots, X_n)$ is a random sample from $R(0, \theta)$, the rectangular (or uniform) distribution on $(0, \theta)$, the following results are known and are stated for completeness. Let

$$0 < X_{(1)} < X_{(2)} < \cdots < X_{(n)} < \theta \tag{1.1}$$

denote the order statistics. The sample maximum $X_{(n)}$ is a complete sufficient statistic and has the cumulative distribution function (cdf)

$$F_{X_{(n)}}(x) = (x/\theta)^n, \qquad 0 < x < \theta. \tag{1.2}$$

From (1.2) it is seen that $E(X_{(n)}) = n\theta/(n+1)$, and hence

$$T = \frac{n+1}{n}\,X_{(n)} \tag{1.3}$$
is unbiased for $\theta$. Since this estimator is a function of the complete sufficient statistic, it follows from the Rao-Blackwell and Lehmann-Scheffé theorems that $T$ is the (essentially) unique uniformly minimum variance unbiased estimator (umvue) of $\theta$ [see, for instance, David, p. 96].

Sample spacings, or observed gaps, come naturally into play in this problem, since $X_{(n)}$ falls short of $\theta$ by an amount equal to the last gap, $\theta - X_{(n)}$. We now introduce some basic facts about spacings. Spacings are defined to be the gaps between successive observations, i.e.

$$D_{i,n} = X_{(i),n} - X_{(i-1),n}, \qquad i = 1, 2, \dots, n, \tag{1.4}$$

where we put $X_{(0),n} \equiv 0$. Since $n$ is held fixed in all subsequent discussions, we shall drop the second subscript in $X_{(i),n}$, $D_{i,n}$, etc. to simplify the notation. If one defines

$$U_i = X_{(i)}/\theta \quad\text{and}\quad T_i = D_i/\theta, \qquad i = 1, \dots, n, \tag{1.5}$$

then $(U_1, \dots, U_n)$ has the same distribution as the order statistics of a random sample of size $n$ from an $R(0, 1)$ distribution, while $(T_1, \dots, T_n)$ correspond to the "uniform spacings." These $T_i$ form an exchangeable set of random variables with a joint Dirichlet distribution. Recall that a $k$-dimensional random vector $(Y_1, \dots, Y_k)$ has a Dirichlet distribution, denoted by $D(r_1, \dots, r_k; r_{k+1})$, if it has the joint density

$$f(y_1, \dots, y_k) = \frac{\Gamma(r_1 + \cdots + r_{k+1})}{\prod_{i=1}^{k+1} \Gamma(r_i)} \left(\prod_{i=1}^{k} y_i^{\,r_i - 1}\right) (1 - y_1 - \cdots - y_k)^{r_{k+1} - 1} \tag{1.6}$$

over the simplex $S_k = \{y : y_i \ge 0,\ \sum_{i=1}^{k} y_i \le 1\}$ in $\mathbb{R}^k$. See Wilks [1962, 177-182] for an excellent discussion of the basic facts about this distribution. In particular, $(T_1, \dots, T_n)$ has an $n$-variate Dirichlet distribution $D(1, \dots, 1; 1)$ with all parameter values equal to unity, i.e., with density

$$f(t_1, \dots, t_n) = n! \tag{1.7}$$

over the simplex $S_n = \{t : t_i \ge 0,\ \sum_{i=1}^{n} t_i \le 1\}$ in $\mathbb{R}^n$ [see, for instance, David, 79-80]. From (1.7) it follows that any $T_i$ has a $D(1; n)$, i.e. a Beta$(1, n)$, distribution. From this and the fact that $D_i = \theta T_i$, it can be verified that

$$E(D_i) = \frac{\theta}{n+1}, \qquad V(D_i) = \frac{n\theta^2}{(n+1)^2(n+2)}, \qquad \operatorname{Cov}(D_i, D_j) = -\frac{\theta^2}{(n+1)^2(n+2)} \quad (i \ne j). \tag{1.8}$$

It may be noted in passing that Dirichlet random variables have an additive property,
namely that for $l \le k$, $\sum_{i=1}^{l} Y_i$ has a $D(r_1 + \cdots + r_l;\ r_{l+1} + \cdots + r_{k+1})$ distribution, which is a Beta distribution. From this, the sampling distributions of the uniform order statistics $U_r = \sum_{i=1}^{r} T_i$ and of the sample range $U_n - U_1 = \sum_{i=2}^{n} T_i$ can be written down immediately as Beta$(r, n+1-r)$ and Beta$(n-1, 2)$, respectively.

2. Estimation of $\theta$

Estimation of parameters through the use of a few or all of the order statistics has several advantages, principally their simplicity. See, for instance, Mosteller [1946] or David [1970, chapters 6 and 7]. Such estimators are especially useful in situations where trimming and censoring of the observations are part of the model, and they yield a drastic reduction in labor over the optimal methods, which can sometimes be laborious. We suggest here estimation through spacings; linear combinations of spacings are equivalent to linear functions of order statistics. For a discussion of linear estimation through order statistics, refer to David [1970, p. 102].

As pointed out earlier, the sample maximum falls short of $\theta$ by an amount equal to the last gap. Since the gaps, including this last one, are exchangeable, adding to $X_{(n)}$ the length of any one of the gaps, or the average length of any set of gaps, or merely multiplying any gap by $(n+1)$, yields unbiased estimators of $\theta$. Thus, for $r = 1, \dots, n$,

$$\begin{aligned} T_{1r} &= X_{(n)} + D_r = 2D_r + \sum_{i=1,\, i \ne r}^{n} D_i,\\ T_{2r} &= X_{(n)} + \frac{1}{r}\sum_{i=1}^{r} D_i = \left(1 + \frac{1}{r}\right)\sum_{i=1}^{r} D_i + \sum_{i=r+1}^{n} D_i,\\ T_{3r} &= (n+1)\,D_r \end{aligned} \tag{2.1}$$

are all unbiased estimators of $\theta$. From (1.8) one can verify that

$$\operatorname{Var}(T_{1r}) = \frac{2\theta^2}{(n+1)(n+2)}, \qquad \operatorname{Var}(T_{2r}) = \frac{(r+1)\,\theta^2}{r(n+1)(n+2)}, \qquad \operatorname{Var}(T_{3r}) = \frac{n\theta^2}{n+2} \tag{2.2}$$

(note that $T_{11} = T_{21}$, so the first two expressions agree at $r = 1$). Because of symmetry, the variance expressions for $\{T_{1r}\}$ and $\{T_{3r}\}$ do not depend on the specific $D_r$ that is used, while $V(T_{2r})$ decreases with $r$ and is a minimum for $r = n$, for which

$$T_{2n} = X_{(n)} + \frac{X_{(n)}}{n} = \frac{n+1}{n}\,X_{(n)} = T, \tag{2.3}$$
the umvue defined in (1.3). Also recall that if $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ denotes the sample mean, then

$$2\bar{X} = \frac{2}{n}\sum_{i=1}^{n} (n - i + 1)\,D_i \tag{2.4}$$

provides yet another unbiased estimator of $\theta$. Thus one may consider a linear combination of spacings

$$W = \sum_{i=1}^{n} b_i D_i \tag{2.5}$$

to estimate $\theta$. Unbiasedness of $W$ implies the condition

$$\sum_{i=1}^{n} b_i = n + 1, \tag{2.6}$$

which is, of course, satisfied by all the estimators in (2.1) and (2.4). It is now natural to ask for the best linear unbiased estimate of $\theta$ from among the class (2.5). By (1.8), $\operatorname{Var}(W) = \theta^2\left[(n+1)\sum b_i^2 - \left(\sum b_i\right)^2\right] / \left[(n+1)^2(n+2)\right]$, so with $\sum b_i$ fixed by (2.6), an elementary calculation shows that the variance of $W$ is minimized when $b_i = (n+1)/n$ for $i = 1, \dots, n$, with the resulting estimator (2.3). Equal weights on all the spacings are to be expected from symmetry considerations. Since (2.3) is the umvue and is also of the form (2.5), it is no surprise that it is the best linear unbiased estimate.

Indeed, equation (1.8) shows that the vector $D = (D_1, \dots, D_n)'$ follows a linear model with expectation $(n+1)^{-1}\theta\,\mathbf{1}$, where $\mathbf{1}$ is the column vector of all ones, and covariance matrix

$$Q = \left[(n+1)I - \mathbf{1}\mathbf{1}'\right] \frac{\theta^2}{(n+1)^2(n+2)}.$$

Using the fact that the inverse of $[(n+1)I - \mathbf{1}\mathbf{1}']$ is $(n+1)^{-1}[I + \mathbf{1}\mathbf{1}']$, the formal Gauss-Markov least squares estimator (in its slightly generalized version, since the covariance matrix is not diagonal) is given by

$$\hat{\theta} = \left[(n+1)^{-2}\,\mathbf{1}'Q^{-1}\mathbf{1}\right]^{-1} \left[(n+1)^{-1}\,\mathbf{1}'Q^{-1}D\right], \tag{2.7}$$

which is again the statistic in (2.3), with equal weights $b_i = (n+1)/n$.

Alternatively, one can approach the problem of estimating $\theta$ with the goal of minimizing the mean square error (MSE), where $\operatorname{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2$, and relax the condition (2.6) that the estimator $\hat{\theta}$ be unbiased. If we use equal weights, say $b$, on all $\{D_i\}$, then the problem is to find the weight $b$ for which the estimator $\sum_{i=1}^{n} b D_i = bX_{(n)}$ has the smallest MSE. It is easy to verify that

$$\operatorname{MSE}(bX_{(n)}) = E(bX_{(n)} - \theta)^2 = \theta^2 \left[\frac{n b^2}{n+2} - \frac{2nb}{n+1} + 1\right], \tag{2.8}$$
which is minimized when $b = (n+2)/(n+1)$. Thus the estimator

$$T_4 = \frac{n+2}{n+1}\,X_{(n)} \tag{2.9}$$

has the smallest MSE. It is interesting to compare this with the other competitors, namely the umvue $T_{2n} = T$ in (2.3) and the maximum likelihood estimator $X_{(n)}$. Taking $b$ to be $(n+2)/(n+1)$, $(n+1)/n$ and $1$, respectively, in (2.8), we get

$$\operatorname{MSE}(T_4) = \frac{\theta^2}{(n+1)^2}, \qquad \operatorname{MSE}(T_{2n}) = \frac{\theta^2}{n(n+2)}, \qquad \operatorname{MSE}(X_{(n)}) = \frac{2\theta^2}{(n+1)(n+2)}, \tag{2.10}$$

from which it follows that, with respect to the MSE criterion, $T_4$ given in (2.9) is uniformly better than the umvue $T_{2n}$ given in (2.3), which in turn is uniformly better than the maximum likelihood estimator $X_{(n)}$ whenever $n \ge 2$ (for $n = 1$ the last two have equal MSE). This, incidentally, is another instance of a situation where the umvue is not admissible under the quadratic loss function.

Another interesting way to improve the estimators given in (2.1) with respect to their MSEs is given by the following procedure. Since the (squared) coefficient of variation $v = \operatorname{Var}(\hat{\theta})/\theta^2$ of each of these unbiased estimators $\hat{\theta}$ is independent of $\theta$ (cf. equation (2.2)), $\theta^* = (1+v)^{-1}\hat{\theta}$ yields another estimator of $\theta$ with

$$\operatorname{MSE}(\theta^*) = \operatorname{Var}(\theta^*) + [\operatorname{Bias}(\theta^*)]^2 = \theta^2 \frac{v}{(1+v)^2} + \theta^2 \frac{v^2}{(1+v)^2} = \theta^2 \frac{v}{1+v},$$

which is uniformly smaller than the MSE $\theta^2 v$ of the original estimator $\hat{\theta}$. Thus each of the unbiased estimators in (2.1) may be improved with respect to the MSE. This yields the estimators

$$T^*_{1r} = \frac{(n+1)(n+2)}{(n+1)(n+2) + 2}\,(X_{(n)} + D_r), \qquad T^*_{2r} = \frac{r(n+1)(n+2)}{r(n+1)(n+2) + (r+1)}\,T_{2r}, \qquad T^*_{3r} = \frac{(n+2)\,D_r}{2}, \tag{2.11}$$

which have smaller MSEs than the corresponding unbiased estimators given in (2.1). While the MSEs of $T^*_{1r}$ and $T^*_{3r}$ do not depend on $r$, the MSE of $T^*_{2r}$ does depend on $r$ and is a minimum for $r = n$. It is very interesting to note that the resulting $T^*_{2n}$ is indeed $((n+2)/(n+1))\,X_{(n)}$, the estimator with minimum MSE that we obtained in (2.9).
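The variance and MSE formulas above can be checked numerically. The following is a minimal Python sketch, not part of the original paper, assuming $\theta = 1$ without loss of generality (every quantity scales with $\theta^2$). It computes exact variances of linear combinations of spacings from the covariance structure (1.8), via $\operatorname{Var}(\sum b_i D_i) = \theta^2[(n+1)\sum b_i^2 - (\sum b_i)^2]/((n+1)^2(n+2))$, evaluates the shrinkage identity $\operatorname{MSE}(\theta^*) = \theta^2 v/(1+v)$, and compares against a Monte Carlo simulation. The helper names `spacing_var`, `weights`, `shrink_mse` and `simulate` are hypothetical, introduced only for illustration; only the Python standard library is used.

```python
import random
from fractions import Fraction


def spacing_var(b, n):
    """Exact Var(sum_i b_i D_i) / theta**2 for weights b on D_1..D_n.

    Uses (1.8): Var(D_i) = n*K and Cov(D_i, D_j) = -K, with
    K = theta**2 / ((n+1)**2 (n+2)), which gives
    Var = K * [(n+1) * sum(b_i**2) - (sum(b_i))**2].
    """
    s1 = sum(b)
    s2 = sum(x * x for x in b)
    return ((n + 1) * s2 - s1 * s1) / Fraction((n + 1) ** 2 * (n + 2))


def weights(n, r):
    """Weights on D_1..D_n of T_1r, T_2r and T_3r as written in (2.1)."""
    idx = range(1, n + 1)
    w1 = [Fraction(2) if i == r else Fraction(1) for i in idx]
    w2 = [1 + Fraction(1, r) if i <= r else Fraction(1) for i in idx]
    w3 = [Fraction(n + 1) if i == r else Fraction(0) for i in idx]
    return w1, w2, w3


def shrink_mse(v):
    """MSE / theta**2 of theta* = theta_hat / (1 + v), where v = Var / theta**2."""
    return v / (1 + v)


def simulate(n, r, reps=100_000, seed=1):
    """Monte Carlo MSEs (theta = 1) of T_1r, T_2r, T_3r, T_4 and X_(n)."""
    rng = random.Random(seed)
    totals = [0.0] * 5
    for _ in range(reps):
        x = sorted(rng.random() for _ in range(n))            # X_(1) < ... < X_(n)
        d = [x[0]] + [x[i] - x[i - 1] for i in range(1, n)]   # D_1..D_n, X_(0) = 0
        xn = x[-1]
        estimates = (
            xn + d[r - 1],            # T_1r
            xn + sum(d[:r]) / r,      # T_2r
            (n + 1) * d[r - 1],       # T_3r
            (n + 2) / (n + 1) * xn,   # T_4, eq. (2.9)
            xn,                       # maximum likelihood estimator X_(n)
        )
        for k, est in enumerate(estimates):
            totals[k] += (est - 1.0) ** 2
    return [t / reps for t in totals]


if __name__ == "__main__":
    n, r = 5, 2
    exact = [spacing_var(w, n) for w in weights(n, r)]
    print("exact variances (2.2):", [str(v) for v in exact])
    print("equal-weight (BLUE) variance:", spacing_var([Fraction(n + 1, n)] * n, n))
    print("Monte Carlo MSEs:", [round(m, 4) for m in simulate(n, r)])
```

For $n = 5$ and $r = 2$ the exact variances are $1/21$, $1/28$ and $5/7$, in agreement with (2.2), and equal weights $b_i = 6/5$ give $1/35 = 1/(n(n+2))$. Applying `shrink_mse` to $v = 1/(n(n+2))$ returns $1/(n+1)^2$, which is the MSE of $T_4$ in (2.10), consistent with $T^*_{2n} = T_4$. Exact rational arithmetic via `Fraction` is used so that these identities can be compared exactly rather than up to rounding.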
But the real advantage of using spacings in the estimation of $\theta$ comes in situations of censoring, where some of the order statistics at either end or in the middle are missing. The best linear unbiased estimate based on the spacings would then put equal weights on the available, or observed, gaps. In particular, if the sample is censored so that one observes only the order statistics up to the $m$-th, $X_{(m)}$ (for $m \le n$), then the following are all unbiased estimators of $\theta$:

$$\begin{aligned} T'_{1r} &= X_{(m)} + \left[(n+1) - m\right] D_r,\\ T'_{2r} &= X_{(m)} + \frac{n+1-m}{r}\sum_{i=1}^{r} D_i,\\ T'_{3r} &= (n+1)\,D_r, \end{aligned} \qquad r = 1, \dots, m. \tag{2.12}$$

By an analysis similar to that used before, it may be shown that the best linear unbiased estimate of $\theta$ is

$$T'_m = \sum_{i=1}^{m} \frac{n+1}{m}\,D_i = \frac{n+1}{m}\,X_{(m)}, \tag{2.13}$$

with variance

$$V(T'_m) = \frac{(n - m + 1)\,\theta^2}{(n+2)\,m}. \tag{2.14}$$

Thus spacings seem to be the natural quantities to consider in the estimation of $\theta$. Sarhan and Greenberg [1959] discuss the problem of censoring at both ends in rectangular populations using order statistics. The alternative approach based on spacings yields the same results more effortlessly.

The author is very grateful to the referee for his many helpful comments and suggestions.

References

David, H.A.: Order Statistics. New York 1970.
Mosteller, F.: On some useful "inefficient" statistics. Ann. Math. Statist. 17, 1946, 377-408.
Noether, G.: Introduction to Statistics: A Fresh Approach. Boston 1971.
Sarhan, A.E., and G.B. Greenberg: Estimation of location and scale parameters for the rectangular populations from censored samples. J.R. Statist. Soc. B21, 1959, 356-363.
Wilks, S.S.: Mathematical Statistics. New York 1962.

Received April 30, 1979 (revised version December 1979)