Nonparametric Estimation of a Polarization Measure


Gordon Anderson, University of Toronto
Oliver Linton, The London School of Economics
Yoon-Jae Whang, Seoul National University

June 2009

Abstract

This paper develops methodology for nonparametric estimation of a polarization measure due to Anderson (2004) and Anderson, Ge, and Leo (2006) based on kernel estimation techniques. We give the asymptotic distribution theory of our estimator, which in some cases is nonstandard due to a boundary value problem. We also propose a method for conducting inference based on estimation of unknown quantities in the limiting distribution and show that our method yields consistent inference in all cases we consider. We investigate the finite sample properties of our methods by simulation. We give an application to the study of polarization within China in recent years.

Some key words: Kernel Estimation; Inequality; Overlap coefficient; Poissonization

JEL Classification Number: C12, C13, C14.

Acknowledgements: Thanks to David Mason for helpful comments and providing us with copies of his work. GAUSS programs that carry out the computations in this paper are available from the web site http://personal.lse.ac.uk/lintono/software.htm. We thank Sorawoot Srisuma for helpful research assistance.

Gordon Anderson: Department of Economics, University of Toronto, 150 St. George Street, N303, Canada. Email address: anderson@chass.utoronto.ca

Oliver Linton: Department of Economics, London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom. Email address: o.linton@lse.ac.uk. Thanks to the ESRC and ERC for financial support. This paper was partly written while I was a Universidad Carlos III de Madrid-Banco Santander Chair of Excellence, and I thank them for financial support. I thank seminar participants at CEMFI for helpful comments.

Yoon-Jae Whang: Department of Economics, Seoul National University, Seoul 151-742, Korea. Email address: whang@snu.ac.kr.

1 Introduction

Polarization is the process whereby a social or political group is divided into two opposing subgroups with fewer and fewer members of the group remaining neutral or holding an intermediate position. It is the subject of some interest in economics as it is both cause and consequence of much economic behavior. There have been several proposed univariate polarization indices which focus on an arbitrary number of groups (Esteban and Ray (1994), Esteban, Gradin and Ray (1998), Zhang and Kanbur (2001) and Duclos, Esteban and Ray (2004)) and a similar number of bi-polarization measures that focus on just two groups (Alesina and Spolaore (1997), Foster and Wolfson (1992), Wolfson (1994) and Wang and Tsui (2000)). All consider polarization within a distribution to be a tendency toward multiple separating modes, in essence addressing the potential for an observed distribution to be a mixture of several (or just two) unobserved sub-distributions. Gigliarano and Mosler (2008) develop a family of multivariate bi-polarization measures based upon estimates of between and within group multivariate variation and relative group size, which require observations on the two sub-distributions. The measure presented here falls into this category (though it can be readily extended to many groups) in that it reflects the polarization of the distributions of two or more identifiable groups. It is worth noting that, within applications in economics, the measure is readily adapted to general tests of distributional differences in examining issues of convergence, independence and mobility, for example.

Duclos, Esteban and Ray (2004) evaluate polarization measures on the basis of the extent to which they satisfy axioms formed around a notional univariate density $f(x)$ that is a mixture of kernels $f(x; a_i)$ that are symmetric and uni-modal on a compact support $[a_i, a_i + 2]$ with $E(x) = \mu_i = a_i + 1$. The kernels are subject to slides (location shifts) that preserve the shape of the kernels and squeezes that are location-preserving shrinkages of the kernels to their respective locations. Potential indices are evaluated in the context of such changes in terms of whether they satisfy axioms that require symmetric squeezes and outward slides to increase the polarization measure. To the extent that the kernels remain overlapped this will be the case for the index proposed here. It is also required that common population scaling of the kernels preserves the ordering, which will also be the case here. They argue that valid measures lie in the class

$$P_\alpha(f) = \int\!\!\int f(x)^{1+\alpha} f(y)\,|y - x|\,dy\,dx, \qquad \alpha \in [0.25, 1].$$

They propose an estimator of $P_\alpha(f)$ and establish its asymptotic properties.

We will focus on measurement of polarization between two well-defined groups. One technique for assessing polarization between two groups is to evaluate how much they have in common according to some objective outcome variable; such a measure corresponds to non-alienation and its negative corresponds to a degree of alienation. Anderson (2004) and Anderson, Ge, and Leo (2006) propose an overlap measure as an index of convergence and a function of its negative as a measure of alienation.

Footnote 1: An excellent summary of the properties of the univariate indices is to be found in Esteban and Ray (2007).

The extent to which two densities $f, g$ overlap is given by

$$\theta = \int \min\{f(x), g(x)\}\,dx. \qquad (1)$$

It is a number between zero and one, with zero corresponding to no overlap and one to the perfect matching of the two distributions. It follows that $1 - \theta$ is a measure of the extent to which the distributions do not match or are alienated. This quantity was first introduced by Weitzman (1970) in a comparison of income distributions by race. Note that $\theta$ is a unit-free measure, invariant to a common smooth monotonic transformation. The definition can easily be extended to the multivariate $x$ case, and indeed we treat this case below. This quantity has received a lot of attention in medical statistics, Mizuno, Yamaguchi, Fukushima, Matsuyama, and Ohashi (2005), and in ecology, where it is known as the overlap coefficient or the coefficient of community, see for example Ricklefs and Lau (1980). To provide a sense of the magnitude of $\theta$ we looked at the male-female height distributions from the National Health and Nutrition Survey of 1999 for the US age group 20-29. For these data, $\theta$ is approximately [value missing in source].

Previous work, Anderson, Ge and Leo (2006), has shown how to estimate $\theta$ and conduct inference about it when $f, g$ are parametric, albeit in the very special setting where effectively there are a finite number of cells and the frequency of each cell can be estimated at square root of sample size accuracy. The discretized setting can be expected to lose information in general. Also, there is no consensus on appropriate parametric models for income distributions, for example, and the issue of misspecification bias suggests a nonparametric approach where this can be done effectively. We propose a nonparametric estimator of $\theta$ using kernel density estimates of $f, g$ plugged into the population functional. Although these estimates and regular functionals of them are well understood, the population parameter $\theta$ is a nonsmooth functional of $f, g$ and so standard methods based on Taylor series expansion cannot be applied to treat the estimator.
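To make definition (1) concrete, the following sketch evaluates the overlap numerically for two known densities. The choice of $N(0,1)$ and $N(1,1)$, the grid limits and the helper names are illustrative assumptions, not taken from the paper; for equal-variance normals the overlap has the closed form $2\Phi(-|\mu_1-\mu_2|/(2\sigma))$, which serves as a check.

```python
import math
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated on an array."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def overlap_on_grid(f_vals, g_vals, dx):
    """Riemann-sum approximation of the integral of min{f, g}."""
    return float(np.sum(np.minimum(f_vals, g_vals)) * dx)

# Illustrative densities (an assumption for this sketch): N(0,1) and N(1,1).
grid = np.linspace(-10.0, 11.0, 200001)
dx = grid[1] - grid[0]
theta = overlap_on_grid(normal_pdf(grid, 0.0, 1.0), normal_pdf(grid, 1.0, 1.0), dx)

# Closed form for equal-variance normals: 2 * Phi(-0.5), written with erf.
theta_exact = 1.0 + math.erf(-0.5 / math.sqrt(2.0))

print(round(theta, 4), round(theta_exact, 4))
```

Both numbers agree to several decimals, and $1-\theta$ then gives the corresponding degree of alienation.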
The properties of the estimated $\theta$ can be nonstandard depending on the contact set $\{x : f(x) = g(x) > 0\}$. This set can be empty, it can contain a countable number of isolated points, or it can be a union of intervals. In the first case, the asymptotics are trivial because this implies that one density always lies strictly below the other, and is not very interesting. The second case yields standard normal type asymptotics, as in between the contact points one density estimate always prevails. The third case is a sort of boundary value case. It is of interest because it corresponds to the case where the distributions overlap perfectly over some range. This is an important case because one hypothesis of interest is that the two distributions are identical (or identical over a range), as one might believe in some applications. In that case there are binding inequality restrictions, which may be expected to induce non-normal asymptotics. We show the distribution theory for this latter case using some Poissonization techniques due to Giné et al. (2003). It turns out that the limiting distribution is normal after a bias correction. In practice, we do not know which of these three cases arises and so our inference method should be robust to these different possibilities. In addition, it can be that the two densities, while not identical, are close to each other over a range of values, and this would induce a distortion in the usual asymptotic approximation. We develop an analytical approach to inference and show that it yields consistent inference whatever the nature of the contact set.

2 Estimation

We assume that there are population random variables $X, Y$, where $X$ has density $f$ and $Y$ has density $g$. Note that $\theta$ is invariant to monotonic transformations of $X, Y$; that is, if $X^{\psi} = \psi(X)$ and $Y^{\psi} = \psi(Y)$ for a strictly increasing differentiable transformation $\psi$, and $X^{\psi}$ and $Y^{\psi}$ have densities $f^{\psi}$ and $g^{\psi}$, then $\theta = \int \min\{f^{\psi}(t), g^{\psi}(t)\}\,dt$ by standard application of the law of transformation. Note also that $\theta = 1 - \frac{1}{2}\int |f(x) - g(x)|\,dx$, which shows that $1 - \theta$ defines a pseudometric on the space of densities. An alternative representation of $\theta$ is as an expectation

$$\theta = E\left[\min\{1, \ell_{g,f}(X)\}\right] = E\left[\min\{1, \ell_{f,g}(Y)\}\right], \qquad (2)$$

where $\ell_{g,f}(x) = g(x)/f(x)$ is the likelihood ratio, which can be convenient for computing estimators, see below.

We will assume a multivariate setting where $X, Y$ are $d$-dimensional vectors. In this case we shall assume that the integral is over all of the variables. It is also possible to consider integrating with respect to a subset of variables or to consider conditional densities, but we shall leave that for future work.

We suppose that there is a sample $\{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ of size $n$ on the population. In some cases one might have different sizes $n, m$ for the two samples, but we shall leave this discussion till later. We propose to estimate $\theta$ by

$$\hat\theta = \int_C \min\{f_n(x), g_n(x)\}\,dx, \qquad (3)$$

$$f_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_b(x - X_i), \qquad g_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_b(x - Y_i),$$

where $C \subseteq \mathbb{R}^d$ is the union of the supports or some subset of interest, while $K$ is a multivariate kernel, $K_b(\cdot) = K(\cdot/b)/b^d$, and $b$ is a bandwidth sequence. For simplicity we suppose that the same bandwidth is used in both estimations and at each point $x$. When $K \ge 0$, $f_n(x), g_n(x) \ge 0$. When $X, Y$ have unbounded support, $\int f_n(x)\,dx = \int g_n(x)\,dx = 1$. There is an issue about boundary effects in the case where the support is compact and the densities are positive on the boundary.
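The plug-in estimator (3) can be sketched in a few lines for the scalar case. This is a minimal illustration rather than the authors' GAUSS implementation: the uniform kernel on $[-0.5, 0.5]$, the normal test data, the bandwidth $b = n^{-1/3}$ and the function names are all assumptions made for the example, and the integral is replaced by a Riemann sum over a grid.

```python
import math
import numpy as np

def uniform_kernel(u):
    """K(u) = 1(|u| <= 0.5), a nonnegative kernel integrating to one."""
    return (np.abs(u) <= 0.5).astype(float)

def kde_on_grid(sample, grid, b):
    """f_n(x) = n^{-1} sum_i K((x - X_i)/b) / b for d = 1."""
    u = (grid[:, None] - sample[None, :]) / b
    return uniform_kernel(u).mean(axis=1) / b

def overlap_estimate(x, y, b, grid):
    """Grid approximation of the integral of min{f_n, g_n}."""
    dx = grid[1] - grid[0]
    f_n = kde_on_grid(x, grid, b)
    g_n = kde_on_grid(y, grid, b)
    return float(np.sum(np.minimum(f_n, g_n)) * dx)

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(0.0, 1.0, n)       # illustrative data: X ~ N(0,1)
y = rng.normal(1.0, 1.0, n)       # illustrative data: Y ~ N(1,1)
b = n ** (-1.0 / 3.0)             # hypothetical bandwidth choice for the sketch
grid = np.linspace(-6.0, 7.0, 1301)

theta_hat = overlap_estimate(x, y, b, grid)
theta_true = 1.0 + math.erf(-0.5 / math.sqrt(2.0))  # 2 * Phi(-0.5)
print(round(theta_hat, 3), round(theta_true, 3))
```

The same bandwidth is used for both samples, as in the text; a boundary correction would be needed for compactly supported data with positive density at the boundary.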
Where the support is compact and the densities are positive on the boundary, one might use some boundary correction method. In practice one has to compute a multivariate integral in (3), and a simple approach is to just replace $\hat\theta$ by a sample average over a set of grid points on the support. Alternatively, one can take the sample average over the observations of the empirical version of (2) (see footnote 2).

Footnote 2: An alternative estimator here, based on the transformation idea, is $\hat\theta = \int_0^1 \min\{1, g_n^X(u)\}\,du$, where $g_n^X(x)$ is the density estimate based on transformed data $F_n^X(Y_i)$, $i = 1, \ldots, n$, where $F_n^X(\cdot) = n^{-1}\sum_{i=1}^{n} 1(X_i \le \cdot)$ is the empirical distribution function of $X$. It turns out this has identical asymptotic distribution to our estimator. Schmid and Schmidt (2006) find not much difference between the estimators in simulation experiments.

3 Asymptotic Properties

We next discuss the asymptotic behavior of $\hat\theta$ as $n \to \infty$. We treat the case where $X, Y$ have unbounded support $\mathbb{R}^d$ as this is more challenging and perhaps of more interest for applications (see footnote 3). Schmid and Schmidt (2006) have recently established consistency of $\hat\theta$ in the univariate compactly supported special case.

Footnote 3: This implicitly rules out the case $\theta = 0$.

We use the following notation. Define the contact set and its complements:

$$C_{f,g} = \{x \in \mathbb{R}^d : f(x) = g(x) > 0\}, \qquad (4)$$
$$C_f = \{x : f(x) < g(x)\}, \qquad (5)$$
$$C_g = \{x : f(x) > g(x)\}. \qquad (6)$$

Let $\lambda = (\lambda_1, \ldots, \lambda_d)^{\top}$ denote a vector of nonnegative integer constants. For such a vector, we define $|\lambda| = \sum_{i=1}^{d} \lambda_i$ and, for any function $h(x) : \mathbb{R}^d \to \mathbb{R}$, $D^{\lambda} h(x) = \partial^{|\lambda|} h(x) / (\partial x_1^{\lambda_1} \cdots \partial x_d^{\lambda_d})$, where $x = (x_1, \ldots, x_d)^{\top}$ and $x^{\lambda} = \prod_{j=1}^{d} x_j^{\lambda_j}$. For a Borel measurable set $A \subset \mathbb{R}^d$, we define $\mu(A)$ to be the Lebesgue measure of $A$ and $\mu_f(A) = \int_A f^{1/2}(x)\,dx$. Let $\|K\|_2^2 = \int_{\mathbb{R}^d} K^2(u)\,du$ and $\rho(t) = \int_{\mathbb{R}^d} K(u)K(u + t)\,du / \|K\|_2^2$.

Assumptions.

(A1) (i) $K$ is an $s$-th order kernel function having support in the closed ball of radius $1/2$ centered at zero, symmetric around zero, integrating to 1, and $s$-times continuously differentiable, where $s$ is an integer that satisfies $s > d$. (ii) The kernel satisfies $\rho(t) = 1 - c\|t\|^{\alpha} + o(\|t\|^{\alpha})$ as $\|t\| \to 0$ for some positive constants $c$ and $\alpha$.

(A2) The densities $f$ and $g$ are strictly positive on $\mathbb{R}^d$, bounded and absolutely continuous with respect to Lebesgue measure, and $s$-times continuously differentiable with uniformly bounded derivatives.

(A3) The bandwidth satisfies: (i) $nb^{2s} \to 0$; (ii) $nb^{2d} \to \infty$; (iii) $nb^{[?]d}/(\log n)^{[?]} \to \infty$; (iv) $nb^{d([?]+\gamma)/[?]}/(\log n)^{([?]+\gamma)/[?]} \to \infty$, where $\gamma$ is a positive constant that satisfies A5 when $\mu(C_{f,g}) > 0$ and $\gamma = \infty$ otherwise. (Entries marked [?] are constants lost in the source.)

(A4) $\{X_i : i \ge 1\}$ and $\{Y_i : i \ge 1\}$ are i.i.d. and independent from each other with support $\mathbb{R}^d$.

(A5) (i) Whenever $\mu(C_{f,g}) > 0$, the densities $f$ and $g$ satisfy $\mu_f(\{x : 0 < |f(x) - g(x)| \le \varepsilon\}) = O(\varepsilon^{\gamma})$ as $\varepsilon \to 0$ for some positive constant $\gamma$. (ii) $\int_{C_{f,g}} f^{1/2}(x)\,dx < \infty$.

Remarks. By the triangle inequality we have

$$|\hat\theta - \theta| \le \int |f_n(x) - f(x)|\,dx + \int |g_n(x) - g(x)|\,dx, \qquad (7)$$

so that under weaker conditions than A3, specifically just $b \to 0$ and $nb^{d} \to \infty$, we have $\hat\theta - \theta = O_p(b^{s}) + O_p(n^{-1/2} b^{-d/2})$. Assumption A3 is needed for the asymptotic normality. To fulfil A3 we require that $s > d$ and $s > d([?]+\gamma)/[?]$ (constants lost in the source); in the univariate case with $\gamma > [?]$ it suffices to have twice differentiable densities and a bandwidth in the range $n^{-1/2}$ to $n^{-1/4}$, i.e., undersmoothing relative to what would be the optimal bandwidth for estimation of the densities themselves, but not too much undersmoothing. If $s \le d$, the smoothing bias term dominates and prevents the distribution theory below from applying. In A4 we assume that the variables are mutually independent and the sample is i.i.d. This can be relaxed under further conditions. Assumption A5 controls the behavior of the density functions near the boundary of the contact set $C_{f,g}$. It has to do with the sharpness in the decrease of $h = f - g$ to zero, see Härdle, Park and Tsybakov (1995), Hall (1982), and Cuevas and Fraiman (1997) for related concepts. It is like a tail thickness condition except that it only applies in the vicinity of $C_{f,g}$. If $h$ is bounded away from zero outside of $C_{f,g}$, then $\gamma$ can be set to be $\infty$. Assumption A5 is used to derive the asymptotic distribution of $\sqrt{n}(\hat\theta - \theta)$ and to get a consistent estimator of the centering term $a_n$ in Theorem 1 below. The requirement in A5(ii) that $\int_{C_{f,g}} f^{1/2}(x)\,dx = E[f^{-1/2}(X)\,1(f(X) = g(X))] < \infty$ rules out the case where both $f, g$ are the same Cauchy density, since $\int_{C_{f,g}} f^{1/2}(x)\,dx = \infty$ in this case; condition A5(ii) is implied by the condition that $E[\|X\|^{1+\eta}\,1(f(X) = g(X))] < \infty$ for some $\eta > 0$.

Define:

$$p_0 = \Pr(X \in C_{f,g}) = E[1(f(X) = g(X))] = \Pr(Y \in C_{f,g}) = E[1(f(Y) = g(Y))],$$
$$p_f = \Pr(X \in C_f) = E[1(f(X) < g(X))], \qquad p_g = \Pr(Y \in C_g) = E[1(f(Y) > g(Y))],$$
$$\sigma_1^2 = p_f(1 - p_f) + p_g(1 - p_g), \qquad v = p_0\sigma_0^2 + \sigma_1^2.$$

Theorem 1. Suppose that Assumptions A1-A5 hold. Then, we have

$$\sqrt{n}(\hat\theta - \theta) - a_n \Longrightarrow N(0, v),$$

where

$$a_n = b^{-d/2}\,\|K\|_2 \int_{C_{f,g}} f^{1/2}(x)\,dx \cdot E\min\{Z_1, Z_2\},$$
$$\sigma_0^2 = \|K\|_2^2 \int_{T_0} \mathrm{cov}\left(\min\{Z_1, Z_2\},\ \min\left\{\rho(t)Z_1 + \sqrt{1 - \rho(t)^2}\,Z_3,\ \rho(t)Z_2 + \sqrt{1 - \rho(t)^2}\,Z_4\right\}\right) dt,$$

$Z_1, Z_2, Z_3$, and $Z_4$ are independent standard normal random variables, and $T_0 = \{t \in \mathbb{R}^d : \|t\| \le 1\}$.
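The constants in Theorem 1 can be checked numerically for the scalar uniform kernel. The sketch below rests on a closed form that is derived here and not stated in the source: writing $\min\{a,b\} = (a+b)/2 - |a-b|/2$ and using $E|UV| = (2/\pi)(\sqrt{1-\rho^2} + \rho\arcsin\rho)$ for standard normals with correlation $\rho$, the covariance inside $\sigma_0^2$ equals $\rho/2 + (\rho\arcsin\rho + \sqrt{1-\rho^2} - 1)/\pi$. The function names are hypothetical.

```python
import math
import numpy as np

def cov_min_pairs(rho):
    """cov(min{Z1, Z2}, min{W1, W2}) when (Z_j, W_j) are standard normal
    pairs with correlation rho and the two pairs are independent.
    Closed form derived for this sketch (see the lead-in)."""
    rho = np.clip(rho, -1.0, 1.0)
    return rho / 2.0 + (rho * np.arcsin(rho) + np.sqrt(1.0 - rho ** 2) - 1.0) / math.pi

def rho_uniform(t):
    """rho(t) for K(u) = 1(|u| <= 0.5): a triangle on [-1, 1]."""
    return np.maximum(1.0 - np.abs(t), 0.0)

# Trapezoid rule over T_0 = [-1, 1]; ||K||_2^2 = 1 for the uniform kernel.
t = np.linspace(-1.0, 1.0, 20001)
vals = cov_min_pairs(rho_uniform(t))
sigma0_sq = float(np.sum((vals[:-1] + vals[1:]) / 2.0) * (t[1] - t[0]))

# E min{Z1, Z2} for independent standard normals is -1/sqrt(pi).
e_min = -1.0 / math.sqrt(math.pi)

print(round(sigma0_sq, 4), round(e_min, 4))
```

The integral comes out at about 0.6134, in line with the value reported for the uniform kernel in the table of the following remarks, and $E\min\{Z_1, Z_2\} \approx -0.56$ confirms that $a_n \le 0$.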

Remarks.

1. The statistic $\hat\theta$ is consistent provided only $b \to 0$ and $nb^{d} \to \infty$, as follows from the above discussion. Our result shows under stronger conditions that the statistic is asymptotically normal after subtracting a bias term that is of order $n^{-1/2} b^{-d/2}$. The bias term depends on the integral of the square root of either density over the contact set, and this is non-zero whenever this set has some measure. In fact, $E\min\{Z_1, Z_2\} = -0.56$ and so $a_n \le 0$, so that the estimator is downward biased. The bias can be arbitrarily large depending on the magnitude of $\int_{C_{f,g}} f^{1/2}(x)\,dx$. We show below how to compute a feasible bias corrected estimator that achieves root-n consistency, but to do that we will require additional conditions.

2. The limiting variance depends on the magnitudes of the sets $C_{f,g}$, $C_f$, and $C_g$ under the relevant probability measures, along with constants that just depend on the kernel chosen. We have $v \le \|K\|_2^2\,\mu(T_0) + 1/2$ in general, and in the scalar case with the uniform kernel we have $v \le 5/2$. It is not known what is the optimal kernel here, but we suspect that the uniform kernel is optimal due to its minimum variance property. We have calculated $\sigma_0^2$ for various kernels in the univariate case and present the results below:

Kernel         K(u)                                                        sigma_0^2   ||K||_2^2
Uniform        1[|u| <= 0.5]                                               0.6135      1.000
Triangular     (2 + 4u) 1[-0.5 <= u <= 0] + (2 - 4u) 1[0 < u <= 0.5]       0.648 *     1.3334
Normal         phi(u) 1[|u| <= 0.5] / (1 - 2 Phi(-0.5))                    0.667 *     1.0014
Epanechnikov   6 (1/4 - u^2) 1[|u| <= 0.5]                                 0.675 *     1.1999
Biweight       30 (1/4 - u^2)^2 1[|u| <= 0.5]                              0.669 *     1.4275

(* One digit, a 1 or a 2, is missing from these entries in the source and cannot be recovered from context.)

In the special case that the contact set is of zero measure, $\theta = p_f + p_g$, and we see that $p_0 = 0$ and $a_n = 0$, so that $\sqrt{n}(\hat\theta - \theta) \Rightarrow N(0, \sigma_1^2)$. This asymptotic variance is actually the semiparametric efficiency bound for the case where the sets $C_f$ and $C_g$ are known, so that $\hat\theta$ is fully efficient in this case.

3. The proof of Theorem 1 uses the decomposition of the estimation error into three stochastic terms plus a remainder term:

$$\sqrt{n}(\hat\theta - \theta) = \sqrt{n}\int_{C_f} \{f_n(x) - Ef_n(x)\}\,dx + \sqrt{n}\int_{C_g} \{g_n(x) - Eg_n(x)\}\,dx + \sqrt{n}\int_{C_{f,g}} \min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\}\,dx + R_n,$$

where $R_n = O_p(\sqrt{n}\,b^{s}) = o_p(1)$. The first two terms are more or less standard in the semiparametric literature as integrals of semiparametric estimators over some domain. The final term is what causes the main issue, at least when $C_{f,g}$ has positive measure. This term is similar in spirit to what is obtained in other boundary estimation problems, Andrews (1999). For example, consider the problem of estimating $\theta = \min\{\mu_X, \mu_Y\}$, where $\mu_X = EX$ and $\mu_Y = EY$. When $\mu_X = \mu_Y$, the usual estimator $\hat\theta = \min\{\bar X, \bar Y\}$ satisfies $\sqrt{n}(\hat\theta - \theta) = \min\{\sqrt{n}(\bar X - \mu_X), \sqrt{n}(\bar Y - \mu_Y)\} \Longrightarrow \min\{Z_X, Z_Y\}$, where $[\sqrt{n}(\bar X - \mu_X), \sqrt{n}(\bar Y - \mu_Y)] \Longrightarrow [Z_X, Z_Y] = Z$ and $Z$ is bivariate normal with zero mean. In this case, the limiting distribution of $\hat\theta$ has a negative mean and is non-normal. In the case of the overlap estimator $\hat\theta$ there is a negative bias term, but after subtracting that off one has asymptotic normality. The intuitive reason is that our estimator involves averages of approximately independent random variables. The formal justification though is more complex because the behavior of the stochastic process $\nu_n(x) = (f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x))$, $x \in C_{f,g}$, is not standard. If $f_n(x), g_n(x)$ were c.d.f.'s we could apply the functional central limit and continuous mapping theorems to obtain the limiting distribution, but this is not available here even at the slower rate of the pointwise convergence of $\nu_n(x)$ because of a lack of tightness. If $\nu_n(x)$ and $\nu_n(x')$ for $x \ne x'$ were independent we could instead argue that $\int_{C_{f,g}} \min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\}\,dx$ is like a sum of independent random variables and apply a central limit theorem after recentering. Although $\nu_n(x)$ and $\nu_n(x')$ are asymptotically independent for $x \ne x'$, they are not exactly so, and in any case the integral requires that we treat also the case where $x - x' = tb$ for $\|t\| \le 1$, and for such sequences $\nu_n(x)$ and $\nu_n(x')$ can be highly dependent. In Appendix B we give a further discussion about this. The argument to exploit asymptotic independence and establish normality is based on so-called Poissonization, which was originally used by Kac (1949). The idea behind Poissonization is that the behavior of a fixed population problem should be close to that of the same problem under a Poisson model having the fixed population problem size as its mean. The additional randomness introduced by Poissonization allows for application of techniques that exploit the independence of the increments and the behavior of moments.
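The boundary example with $\hat\theta = \min\{\bar X, \bar Y\}$ is easy to reproduce by simulation. The sketch below assumes, for illustration only, that $X$ and $Y$ are independent standard normal with common mean zero, so that $Z_X, Z_Y$ are independent $N(0,1)$ and the limit $\min\{Z_X, Z_Y\}$ has mean $-1/\sqrt{\pi} \approx -0.56$; the sample size and replication count are arbitrary choices.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 20000
mu = 0.0  # mu_X = mu_Y, so theta = min{mu_X, mu_Y} = mu

# Sample means of X and Y in each replication (independent N(mu, 1) draws).
xbar = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
ybar = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)

# sqrt(n) * (theta_hat - theta) with theta_hat = min{xbar, ybar}.
stat = math.sqrt(n) * (np.minimum(xbar, ybar) - mu)

mean_stat = float(stat.mean())
limit_mean = -1.0 / math.sqrt(math.pi)  # E min{Z_X, Z_Y} for independent N(0,1)
print(round(mean_stat, 3), round(limit_mean, 3))
```

The simulated mean sits close to the negative limit mean, which illustrates why an estimator built from a minimum of two noisy quantities is downward biased when the two population quantities coincide.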
This technique has been used in a number of places including combinatorial mathematics and analysis of algorithms.

We next discuss how to conduct consistent inference on the parameter $\theta$ using the theory presented in Theorem 1. For inference we must estimate consistently the quantities $p_0$, $p_f$, and $p_g$, and estimate $\int_{C_{f,g}} f^{1/2}(x)\,dx$ consistently at a better rate than $b^{d/2}$. We require some additional conditions:

Assumptions.

(A6) A moment condition of the form $E[\|X\|^{[?]}\,1(f(X) = g(X))] < \infty$ holds for some exponent exceeding a lower bound, coupled with a rate restriction linking $n$, $b$ and that exponent (the exponent symbol, its bound and the rate restriction are lost in the source). For all $\lambda$ with $0 \le |\lambda| \le s$, $\int_{C_{f,g}} |D^{\lambda} f(x)|/f^{1/2}(x)\,dx < \infty$.

(A7) The tuning parameter satisfies $c_n \to 0$, together with one condition under which a product of $n$, a power of $b$ and a power of $c_n$ diverges and one under which such a product tends to zero (the exponents are lost in the source), where $\gamma$ is a positive constant that satisfies A5 when $\mu(C_{f,g}) > 0$ and $\gamma = \infty$ otherwise.

Condition A6 is needed in the case where $C_{f,g} = \mathbb{R}^d$, as it is used to bound the estimation error of $\int_{C_{f,g}} f^{1/2}(x)\,dx = E[f^{-1/2}(X)\,1(f(X) = g(X))]$, which can be badly affected by heavy tails. It imposes a further restriction on the bandwidth, and so for small values of the moment exponent one needs a lot of smoothness in $f, g$ to compensate. If $X, Y$ are Gaussian then only an additional logarithmic constraint is imposed on the bandwidth. Condition A7 implicitly imposes a stronger restriction on the bandwidth than A3. Generally there is both an upper and lower bound on the tuning parameter; in the case that $\gamma = \infty$ there is only a lower bound on the tuning parameter.

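Define the bias corrected estimator and asymptotic variance estimator:

$$\hat\theta^{bc} = \hat\theta - \hat a_n / n^{1/2}, \qquad \hat v = \hat p_0\,\sigma_0^2 + \hat\sigma_1^2,$$
$$\hat a_n = -0.56\,\|K\|_2\, b^{-d/2} \cdot \frac{1}{2}\left(\int_{\hat C_{f,g}} f_n^{1/2}(x)\,dx + \int_{\hat C_{f,g}} g_n^{1/2}(x)\,dx\right),$$
$$\hat C_f = \{x \in \mathbb{R}^d : f_n(x) - g_n(x) < -c_n,\ f_n(x) > 0,\ g_n(x) > 0\},$$
$$\hat C_g = \{x \in \mathbb{R}^d : f_n(x) - g_n(x) > c_n,\ f_n(x) > 0,\ g_n(x) > 0\},$$
$$\hat C_{f,g} = \{x \in \mathbb{R}^d : |f_n(x) - g_n(x)| \le c_n,\ f_n(x) > 0,\ g_n(x) > 0\},$$
$$\hat p_0 = \frac{1}{2}\left(\int_{\hat C_{f,g}} f_n(x)\,dx + \int_{\hat C_{f,g}} g_n(x)\,dx\right), \qquad \hat p_f = \int_{\hat C_f} f_n(x)\,dx, \qquad \hat p_g = \int_{\hat C_g} g_n(x)\,dx,$$
$$\hat\sigma_1^2 = \hat p_f(1 - \hat p_f) + \hat p_g(1 - \hat p_g).$$

Then, we have the following result.

Theorem 2. Suppose that Assumptions A1-A7 hold. Then, we have:

$$\sqrt{n}(\hat\theta^{bc} - \theta) \Rightarrow N(0, v), \qquad (8)$$
$$\hat v \to_p v. \qquad (9)$$

Remarks.

1. Note that the bias corrected estimator falls outside of $[0, 1]$ with positive probability, which motivates the construction of a winsorized version $\hat\theta^{bc}_w = \hat\theta^{bc}\,1(\hat\theta^{bc} \in [0, 1]) + 1(\hat\theta^{bc} > 1)$. When $\theta \in [0, 1)$, $\sqrt{n}(\hat\theta^{bc} - \hat\theta^{bc}_w) = o_p(1)$. However, if $\theta = 1$, the limiting distribution is not normal; specifically $\sqrt{n}(\hat\theta^{bc}_w - \theta)/\sqrt{\hat v} \Rightarrow \min\{Z, 0\}$, where $Z \sim N(0, 1)$.

2. This theorem can be used to construct consistent confidence intervals for $\theta$. In order to make the interval respect the fact that the parameter space here is $[0, 1]$, one can make the interval for $H(\hat\theta^{bc}_w)$ and back transform, where $H : [0, 1] \to \mathbb{R}$ is strictly increasing and continuously differentiable, for example the logit transform. Specifically, let $z_{\alpha}$ be the $\alpha$-th quantile from a standard normal distribution; then the two-sided interval

$$C_{\alpha} = H^{-1}\left[H(\hat\theta^{bc}_w) - z_{\alpha/2}\sqrt{h^2(\hat\theta)\,\hat v / n},\ \ H(\hat\theta^{bc}_w) + z_{\alpha/2}\sqrt{h^2(\hat\theta)\,\hat v / n}\right]$$

has asymptotic coverage $1 - \alpha$ and lies inside $[0, 1]$ with probability one. Here, $h(\cdot)$ is the derivative of $H$.

3. Note that the bandwidth parameter $b$ and the tuning parameter $c_n$ are asymptotically negligible, and only affect higher order properties, which are hard to analyze theoretically. We investigate the choice of these parameters in the simulation study below.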

4. If one strongly believes that $C_{f,g}$ is of measure zero, then one can conduct inference using the uncorrected estimator $\hat\theta$ and the variance estimator $\tilde v = \tilde p_f(1 - \tilde p_f) + \tilde p_g(1 - \tilde p_g)$, where

$$\tilde p_f = \int f_n(x)\,1(f_n(x) < g_n(x))\,dx, \qquad \tilde p_g = \int g_n(x)\,1(f_n(x) \ge g_n(x))\,dx,$$

that is, the tuning parameter $c_n = 0$. In this case, $\tilde v \to_p v = \sigma_1^2$. However, if it turned out that $C_{f,g}$ has positive measure then $\hat\theta$ is biased and the standard errors are inconsistent. Specifically, it can be shown that $\tilde p_f \to_p p_f + p_0/2$ and $\tilde p_g \to_p p_g + p_0/2$ using a similar argument to that used in Theorem 2.

5. If one strongly believes that $C_{f,g}$ is of measure zero and that there is a single crossing point $x_0$ such that $f(x) < g(x)$ for all $x < x_0$ and $f(x) > g(x)$ for all $x > x_0$ (scalar case), then one can estimate the crossing point by finding the minimum of $|f_n(x) - g_n(x)|$ over $x$. Under some conditions such an estimator $\hat x$ is consistent and even asymptotically normal, and satisfies $\sqrt{nb}\,(\hat x - x_0) \Rightarrow N(0, \omega)$ with $\omega = 2\|K\|_2^2 f(x_0)/(f'(x_0) - g'(x_0))^2$, see Eddy (1980).

6. The bootstrap is an alternative method for providing confidence intervals. In the special case where the contact set has zero measure, standard bootstrap resampling algorithms can be applied to conduct inference. However, as reported in Clemons and Bradley (2000), the standard bootstrap confidence intervals start performing badly when $\theta \to 1$, i.e., when the contact set has positive measure.

4 A Simulation Study

Here we look at the small-sample performance of $\hat\theta$ and $\hat\theta^{bc}$. Anderson and Ge (2004) have investigated the performance of an estimator of $\theta$ in the case where there are either one or two crossings. We consider the more interesting case where the contact set has positive measure. The design is $X_i \sim U[-0.5, 0.5]$ and $Y_i \sim U[0, 1]$, where $\{X_i\}$ and $\{Y_i\}$ are independent, so that $\theta = 0.5$ and $C_{f,g} = [0, 0.5]$. We consider sample sizes $n = 100, 200, 400, 800$, and $1600$ and take a thousand replications of each experiment.

The estimator is computed using the uniform kernel, i.e., $K(u) = 1(|u| \le 0.5)$, for which $\|K\|_2^2 = 1$ and $\rho(t) = (1 + t)\,1(-1 \le t \le 0) + (1 - t)\,1(0 \le t \le 1)$. In this case $p_f = p_g = 0$ and $p_0 = 0.5$. It follows that $a_n = -0.28\,b^{-1/2}$, while $v = p_0\sigma_0^2 = 0.3067$. The bandwidth takes two values, either the Silverman's rule of thumb value, in this case $b_s = 1.84\,s\,n^{-1/5}$, where $s$ is the sample standard deviation, or the smaller value $b_s^{3/2}$. In construction of the bias corrected estimator $\hat\theta^{bc}$ we choose the tuning parameter $c_n$ to be either the bandwidth $b$, the smaller value $b^{3/2}$, or the larger value $b_s^{[?]/3}$ (the numerator of this exponent, 1 or 2, is lost in the source). The supports of interest are estimated from the sample; specifically, the common support set in this case is estimated by the interval $[\max\{\min_{1 \le i \le n} X_i, \min_{1 \le i \le n} Y_i\},\ \min\{\max_{1 \le i \le n} X_i, \max_{1 \le i \le n} Y_i\}]$. The integrals are computed based on a grid of five hundred equally spaced points in $[-0.5, 1]$.
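One replication of this design can be sketched as follows. This is a hedged illustration, not the authors' GAUSS code: it fixes $n = 800$, uses the rule-of-thumb bandwidth $b_s$ and the tuning parameter $c_n = b$, integrates over the whole grid rather than the estimated common support (a simplification), and assumes a factor of one half averaging the two square-root integrals in $\hat a_n$. The function names are hypothetical.

```python
import math
import numpy as np

E_MIN = -1.0 / math.sqrt(math.pi)  # E min{Z1, Z2}, about -0.56
SIGMA0_SQ = 0.6135                 # uniform-kernel value of sigma_0^2 from the text
K_L2 = 1.0                         # ||K||_2 for the uniform kernel

def kde_uniform(sample, grid, b):
    """Uniform-kernel density estimate on a grid (d = 1)."""
    u = (grid[:, None] - sample[None, :]) / b
    return (np.abs(u) <= 0.5).mean(axis=1) / b

def overlap_with_correction(x, y, b, c_n, grid):
    """Return (theta_hat, bias-corrected theta, variance estimate)."""
    n = len(x)
    dx = grid[1] - grid[0]
    f_n, g_n = kde_uniform(x, grid, b), kde_uniform(y, grid, b)
    theta_hat = np.sum(np.minimum(f_n, g_n)) * dx

    # Estimated sets C_f, C_g and the contact set C_{f,g}.
    both = (f_n > 0) & (g_n > 0)
    diff = f_n - g_n
    in_cf = both & (diff < -c_n)
    in_cg = both & (diff > c_n)
    in_cfg = both & (np.abs(diff) <= c_n)

    # Average of the two square-root integrals over the contact set.
    root_mass = 0.5 * (np.sum(np.sqrt(f_n[in_cfg])) + np.sum(np.sqrt(g_n[in_cfg]))) * dx
    a_hat = E_MIN * K_L2 * b ** -0.5 * root_mass
    theta_bc = theta_hat - a_hat / math.sqrt(n)

    p0 = 0.5 * (np.sum(f_n[in_cfg]) + np.sum(g_n[in_cfg])) * dx
    p_f = np.sum(f_n[in_cf]) * dx
    p_g = np.sum(g_n[in_cg]) * dx
    v_hat = p0 * SIGMA0_SQ + p_f * (1.0 - p_f) + p_g * (1.0 - p_g)
    return float(theta_hat), float(theta_bc), float(v_hat)

rng = np.random.default_rng(2)
n = 800
x = rng.uniform(-0.5, 0.5, n)
y = rng.uniform(0.0, 1.0, n)
b = 1.84 * x.std(ddof=1) * n ** (-0.2)   # rule-of-thumb bandwidth b_s
grid = np.linspace(-0.5, 1.0, 500)

theta_hat, theta_bc, v_hat = overlap_with_correction(x, y, b, c_n=b, grid=grid)
print(round(theta_hat, 3), round(theta_bc, 3), round(v_hat, 3))
```

Because $\hat a_n \le 0$, the corrected estimate always sits at or above the raw one, moving it toward the true value $\theta = 0.5$ in this design. Repeating the draw a thousand times for each $n$ would give the bias and dispersion summaries described in the text.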

In Figure 1 we show the result of a typical sample. Although in the population the two densities are equal throughout the interval $[0, 0.5]$, this happens with probability zero in sample. In this example there are seven crossings of the two density estimates.

Figure 1. Two estimates of $f, g$ reported over the intersection of their supports for the case $n = [?]00$ (leading digit lost in the source).

We report our results in Table 1. We give the bias, the median bias (mbias), the standard deviation (std), and the interquartile range divided by 1.349 (iqr) for the two estimators for the various combinations of sample sizes, bandwidths, and tuning parameters. The results can be summarized as follows:

1. The bias is quite large compared to the standard deviation.
2. The performance measures improve with sample size at a rate roughly predicted by the theory (as can be confirmed by least squares regression of ln(-bias) on a constant and ln n).
3. The bias corrected estimator has a smaller bias and larger standard deviation.
4. The best performance for $\hat\theta$ is when the bandwidth is $b_s$, although there is not a lot of difference for the larger sample sizes.
5. The best performance in terms of standard deviation for $\hat\theta^{bc}$ is when the bandwidth is $b_s^{3/2}$, although for the smaller sample sizes bias is best at $b_s$. The best value of the tuning parameter for bias is the larger one, $b^{[?]/3}$, whereas for variance $b^{3/2}$ is better.

Finally, we look at the quality of the distributional approximation. In Figure 2 we show the QQ plot for standardized $\hat\theta$ in the case $n = 800$ with bandwidth $b_s$. The approximation seems quite good, with most discrepancy in the left tail.

Figure 2. QQ-plot of sample data versus standard normal.

In Table 2 we report the results of a multivariate simulation. The design was standard normal densities in dimensions 1, 2, 3, 4 and 5. In this case $\theta = 1$ and $C_{f,g} = \mathbb{R}^d$. We implemented as above with the best combinations of bandwidth/tuning parameter uncovered in Table 1. The results are even more dramatic in this case. The bias correction method seems to produce much better bias with a small cost in terms of increased variability.

5 Application

Much ink has been spilled on how the economic reforms in China benefited cities on the eastern seaboard relative to those in the interior. Evidence on per capita urban incomes suggests greater advances for seaboard provinces than for inland provinces. Partly the result of regional comparative advantage, it also reflected weak government regional equalization policy, imperfect capital markets, and initial preferential policies on FDI and exports, and arose from the growth of tax revenues as their development proceeded for the seaboard provinces (Anderson and Ge (2004), Gustafsson, Li, and Sicular (2007)). Urbanization also took place differentially on the seaboard and inland, with cities growing more rapidly both in size and number in the seaboard provinces than in the interior (Anderson and Ge (2006, 2008)). The question arises as to whether the consequences of the reforms have translated into an improvement in the relative wellbeing of individuals in seaboard as compared to interior provinces. To investigate this, samples of urban households in two Chinese provinces, Guangdong, an eastern seaboard province, and Shaanxi, a province in the interior (see the map of China below), taken in 1987 and 2001 are employed (see footnote 4).

Footnote 4: These data were obtained from the National Bureau of Statistics as part of the project on Income Inequality during China's Transition organized by Dwayne Benjamin, Loren Brandt, John Giles and Sangui Wang.

One approach to the relative wellbeing issue is to examine whether or not household wellbeing in central and seaboard provinces has polarized. Esteban and Ray (1994) and Duclos, Esteban and Ray (2004) (see also Wang and Tsui (2000)) posited a collection of axioms whose consequences should be reflected in a polarization measure. The axioms are founded upon a so-called Identification-Alienation nexus wherein notions of polarization are fostered jointly by an agent's sense of increasing within-group identity and between-group distance or alienation. When one distribution stochastically dominates the other it can be argued that such measures also reflect a sense of relative ill-being of the impoverished group, and when there is a multiplicity of indicators, measures of "distributional overlap" appear to perform quite well, Anderson (2008) (see footnote 5).

Footnote 5: Using a multivariate Kolmogorov-Smirnov criterion, the hypothesis that the Guangdong joint distribution first order stochastically dominates the Shaanxi joint distribution could not be rejected in both years, whereas the hypothesis that Shaanxi dominates Guangdong could (details from the authors on request).

Indicators employed to reflect household wellbeing are total expenditures per household member and household living area per household member. Table 3 presents summary statistics for the samples; some observations are appropriate. Both provinces have advanced in terms of their consumption expenditures and living space per person, so that overall wellbeing may be considered to have advanced in both provinces. The gap between expenditures, which reflects the alienation component of polarization and favors Guangdong, widened, and the gap between living space (again favoring Guangdong) remained unchanged, so that polarization may well have increased in terms of the alienation component. Movements in the dispersion of these components have less clear implications for the identification part of polarization. In Guangdong dispersion of living space per person diminished whereas in Shaanxi it increased. Dispersion of expenditures increased in both provinces, but much more so in Shaanxi than in Guangdong, to the extent that Shaanxi overtook Guangdong in its expenditure per person dispersion over the period. This suggests that little can be said about polarization by piecemeal analysis of its components.

We first show the univariate density plots, which were calculated with a Gaussian kernel and Silverman's rule of thumb bandwidth. These confirm the general trends identified in the sample statistics. Note that empirically there is only one crossing for the expenditure data, but the housing variable has several crossing points.

14 Figure 3 We next comute the univariate and multivariate olarization measures: Let b e ; b h ; and b eh denote resectively the measure comuted on the univariate exenditure data series, the univariate housing series, and the bivariate data. We comuted these quantities with a uniform kernel and bandwidth either equal to the Silverman s rule of thumb bandwidth b s or bs 3= : We also comuted the bias corrected estimators denoted with suerscrit bc using tuning arameter b =3 : We comute both our standard errors and the standard errors that assume that the contact set is of zero measure, these are denoted by sc: In this dataset there are di erent samle sizes n and m that aly to the estimation of f and g: The distribution theory for this case is only a trivial modi cation of the theory resented above. In articular, suose that m=n! (0; ); then the asymtotic distribution is as in Theorem with a n = b d= jjkjj C f;g f = (x)dx E min f ; =g = f ( f ) + g ( g )= 0() = jjkjj cov T 0 min f ; =g ; min n (t) + (t) 3 ; (t) = + o (t) 4 = dt; 3

where $E\min\{Z_1, Z_2/\tau^{1/2}\} = E\min\{Z_1, Z_2\}\,[(1+1/\tau)/2]^{1/2}$ and $\sigma_0^2(\tau) = \sigma_0^2(1)\,(1+1/\tau)/2$. For the bivariate product uniform kernel $\sigma_0^2(1) = 0.5835$. We computed the bias correction and standard errors using these modifications. The results are shown in Table 4.

The results show a substantial reduction in the value of the overlap measure for the joint distribution and also in the univariate measure for expenditure. There is also a slight decrease in the overlap of the housing variable, but this is not statistically significant. The level of the overlap is quite high in general and the bias correction increases it quite substantially. The estimators are relatively insensitive to the choice of bandwidth. The standard errors are quite small and there is not much difference between the full standard errors and the standard errors that impose zero measure on the contact set. Evidently there has been a significant polarization (reduction in overlap) between the provincial joint distributions of consumption expenditures and living space, reflecting deterioration in the wellbeing of households in Shaanxi relative to those in Guangdong.

Note that an alternative to the overlap measure could be obtained by computing the Duclos, Esteban and Ray (2004) polarization measure generalized to the multivariate case and based on the pooled distribution. This is a somewhat more general index of the multiplicity and diversity of modes and requires specifying a polarization sensitivity parameter $\alpha$, which should lie between 0.25 and 1. We computed this measure for the two years and record the results below.

[Table: Duclos, Esteban and Ray index with standard errors (se) for each of the two years, in columns $\alpha = 0.25$, $\alpha = 0.5$, $\alpha = 0.75$, $\alpha = 1$.]

Note that the index is sensitive to the choice of the polarization sensitivity parameter $\alpha$: at low levels of sensitivity the index actually diminishes over time, whereas at high levels it increases.

6 Extensions

There are some applications where the data come from a time series and one would like to allow for dependence in the observations. For example, we might like to compare two or more forecast densities.
In this case the theory becomes more complicated and it is not clear that the Poissonization method can be applied. However, in the special case that the contact set has zero measure, one can derive the limiting distribution for $\hat\theta$ based on the asymptotic representation
\[
\sqrt{n}(\hat\theta - \theta) = n^{-1/2}\sum_{i=1}^n \big(1(X_i \in C_f) - p_f\big) + n^{-1/2}\sum_{i=1}^n \big(1(Y_i \in C_g) - p_g\big) + o_p(1),
\]
assuming some restriction on the strength of the dependence.

Our theory extends to the case of $k$ densities $f_1, \ldots, f_k$ in an obvious way. In that case, one might also be interested in measuring quantities related to a partial overlap. Specifically, suppose

that $f_{i_1}(x) \le \cdots \le f_{i_k}(x)$; then $\min\{f_1(x), \ldots, f_k(x)\} = f_{i_1}(x)$ and $\max\{f_1(x), \ldots, f_k(x)\} = f_{i_k}(x)$. Then $f_{i_r}(x)$, for $1 < r < k$, represents a situation where $r$ of the densities overlap.[6]

We remark that the functional $\theta$ is related to some recent work of Kitagawa (2009), who has discussed the density envelope $\delta(f,g) = \int \max\{f(x), g(x)\}\,dx$, which arises as a quantity of interest from consideration of a partially identified model. Related to this quantity is an alternative overlap measure given by
\[
\vartheta = \frac{\int \min\{f(x), g(x)\}\,dx}{\int \max\{f(x), g(x)\}\,dx}, \tag{10}
\]
which has similar properties to $\theta$. This quantity has the advantage of being sensible for cases where $f, g$ are hazard functions or spectral densities that do not themselves integrate to one; in that case $\theta$ can take any non-negative value, whereas $\vartheta$ lies between zero and one. The distribution theory for the analogue estimator of $\vartheta$ follows by arguments similar to the ones we have given above.

A Appendix

A.1 Informal Discussion of the Proof Technique

Although the estimators and confidence intervals are easy to use in practice, the asymptotic theory to prove Theorem 1 involves several lengthy steps. Since establishing these steps requires techniques that are not commonly used in econometrics, we now give a brief informal description of our proof techniques. Specifically, our proof of Theorem 1 consists of the following three steps:
1. The asymptotic approximation of $\sqrt{n}(\hat\theta - \theta)$ by $A_n$, given by (12) below, which decomposes the estimation error into three different terms, defined over the disjoint sets $C_f$, $C_g$ and $C_{f,g}$, respectively.
2. Get the asymptotic distribution of $A_n^P(B)$, a Poissonized version of $A_n$, where the sample size $n$ is replaced by a Poisson random variable $N$ with mean $n$ that is independent of the original sequence $\{(X_i, Y_i) : i \ge 1\}$ and the integral is taken over a subset $B$ of the union of the supports of $X$ and $Y$.
3.
De-Poissonize $A_n^P(B)$ to derive the asymptotic distribution of $A_n$ and hence of $\sqrt{n}(\hat\theta - \theta)$.

[6] This is of interest in a number of biomedical applications. See for example http://collateral.knowitall.com/collateral/9539-overlapdensityheatmap.pdf

In step 1, we make the bias of kernel densities asymptotically negligible by using the smoothness assumptions on the true densities and properties of kernel functions, which allows us to write $A_n$ as a functional of the centered statistics $f_n(x) - Ef_n(x)$ and $g_n(x) - Eg_n(x)$. Also, the decomposition into three terms is related to the recent result in the moment inequality literature that, under inequality

restrictions, the asymptotic behavior of statistics of interest often depends only on binding restrictions; see, e.g., Chernozhukov, Hong, and Tamer (2007), Andrews and Guggenberger (2007) and Linton, Maasoumi and Whang (2005).

In step 2, Poissonization of the statistic $A_n$ gives a lot of convenience in our asymptotic analysis. In particular, it is well known that if $N$ is a Poisson random variable independent of the i.i.d. sequence $\{X_i : i \ge 1\}$ and $\{A_k : k \ge 1\}$ are disjoint measurable sets, then the processes $\sum_{i=0}^{N} 1(X_i \in A_k)\,\delta_{X_i}$, $k = 1, 2, \ldots$, are independent. This implies, for example, that, since the kernel function $K$ is assumed to have compact support, the Poissonized kernel densities $f_N(x)$ and $f_N(y)$ are independent if the distance between $x$ and $y$ is greater than a certain threshold. This facilitates computation of the asymptotic variance of $A_n^P(B)$. Also, since a Poisson process is infinitely divisible, we can write $\sum_{i=0}^{N} X_i \overset{d}{=} \sum_{i=0}^{n} \zeta_i$, where $\{\zeta_i : i \ge 1\}$ are i.i.d. with $\zeta \overset{d}{=} \sum_{i=0}^{\eta} X_i$ and $\eta$ is a Poisson random variable with mean 1, independent of $\{X_i : i \ge 1\}$. This fact is used repeatedly in our proofs to derive the asymptotic distribution of $A_n^P(B)$, using standard machinery including the CLT and the Berry-Esseen theorem for i.i.d. random variables.

In step 3, we need to de-Poissonize the result because the asymptotic behavior of the Poissonized variable $A_n^P(B)$ is generally different from that of $A_n$. For this purpose, we use the de-Poissonization lemma of Beirlant and Mason (1995, Theorem 2.1; see also Lemma A2 below). To illustrate the lemma in a simple context, consider a statistic $\Lambda_n = n^{-1/2}\sum_{i=1}^{n}\{1(X_i \in B) - \Pr(X \in B)\}$, where $B \subset R$ is a Borel set.
By a CLT, we know that $\Lambda_n \Rightarrow N(0, p_B(1 - p_B))$, where $p_B = \Pr(X \in B)$. Now, consider a Poissonized statistic $S_n = n^{-1/2}\{\sum_{i=1}^{N} 1(X_i \in B) - n\Pr(X \in B)\}$, in which the random count is centered at its mean $n\Pr(X \in B)$. The asymptotic distribution of $S_n$ is given by $N(0, p_B)$, which is different from that of $\Lambda_n$. However, letting $U_n = n^{-1/2}\{\sum_{i=1}^{N} 1(X_i \in C) - n\Pr(X \in C)\}$ and $V_n = n^{-1/2}\{\sum_{i=1}^{N} 1(X_i \in R \setminus C) - n\Pr(X \in R \setminus C)\}$, where $C$ with $B \subset C \subset R$ is a Borel set, and applying the de-Poissonization lemma, we get that the conditional distribution of $S_n$ given $N = n$ has the same asymptotic distribution as $\Lambda_n$.

Although the above steps closely follow those of Giné et al. (2003), we need to extend their results to general multi-dimensional variates $d \ge 1$, multiple kernel densities, and norms different from the $L_1$-norm. Such extensions, to the best of our knowledge, are not available in the literature and are not trivial.

A.2 Proof of the Main Theorems

Under our conditions, we have
\[
\sup_{x \in R^d} |f_n(x) - f(x)| = O(b^s) + O\Big(\sqrt{\frac{\log n}{n b^d}}\Big) \quad \text{a.s.}, \tag{11}
\]
by the uniform consistency theorem of Giné and Guillou (2002) and standard treatment of the bias term, and likewise for $g_n(x) - g(x)$. We use this result below.
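As a numerical companion to the Poissonization illustration that closes Section A.1, the following Monte Carlo sketch compares the variance of the fixed-sample-size statistic, of its Poissonized version, and of the Poissonized version conditional on N = n. It is only an illustration under stated assumptions: the Poissonized count is centered at n Pr(X in B), the values of n, the number of replications and Pr(X in B) = 0.3 are arbitrary choices, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, p_B = 400, 200_000, 0.3

# Random sample size N ~ Poisson(n), drawn independently of the data.
N = rng.poisson(n, size=reps)

# Number of observations falling in B among the first n and the first N draws.
count_fixed = rng.binomial(n, p_B, size=reps)
count_pois = rng.binomial(N, p_B)

# Fixed-n statistic and Poissonized statistic, both centered at n * p_B.
lam_n = (count_fixed - n * p_B) / np.sqrt(n)
s_n = (count_pois - n * p_B) / np.sqrt(n)

var_fixed = lam_n.var()               # theory: p_B * (1 - p_B)
var_poisson = s_n.var()               # theory: p_B
var_conditional = s_n[N == n].var()   # theory: p_B * (1 - p_B) again

print(var_fixed, var_poisson, var_conditional)
```

Conditioning on N = n keeps only about two percent of the replications here, which is why the number of replications is large; the conditional variance should return to the binomial value, which is the content of the de-Poissonization step.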

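Let
\[
A_n = \sqrt{n}\int_{C_{f,g}} \min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\}\,dx + \sqrt{n}\int_{C_f} [f_n(x) - Ef_n(x)]\,dx + \sqrt{n}\int_{C_g} [g_n(x) - Eg_n(x)]\,dx =: A_{1n} + A_{2n} + A_{3n}. \tag{12}
\]
We will show that the asymptotic distribution of $A_n$ is normal when suitably standardized.

Theorem A1. Under Assumptions (A1)-(A5), we have
\[
A_n - a_n \Rightarrow N(0,\ p_0\sigma_0^2 + \sigma_1^2).
\]
The proof of Theorem A1 will be given later. Given Theorem A1, we can establish Theorem 1.

Proof of Theorem 1. We will show below that
\[
\sqrt{n}(\hat\theta - \theta) = A_n + o_p(1). \tag{13}
\]
Then, this result and Theorem A1 yield the desired result of Theorem 1. To show (13), write
\[
\begin{aligned}
\sqrt{n}(\hat\theta - \theta) &= \sqrt{n}\int_{R^d} [\min\{f_n(x), g_n(x)\} - \min\{f(x), g(x)\}]\,dx \\
&= \sqrt{n}\int_{C_{f,g}} \min\{f_n(x) - f(x),\ g_n(x) - g(x)\}\,dx + \sqrt{n}\int_{C_f} \min\{f_n(x) - f(x),\ g_n(x) - f(x)\}\,dx \\
&\quad + \sqrt{n}\int_{C_g} \min\{f_n(x) - g(x),\ g_n(x) - g(x)\}\,dx \\
&=: \Delta_{1n} + \Delta_{2n} + \Delta_{3n}.
\end{aligned} \tag{14}
\]
Consider $\Delta_{1n}$ first. Write
\[
\begin{aligned}
\Delta_{1n} &= \sqrt{n}\int_{C_{f,g}} \min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\}\,dx \\
&\quad + \sqrt{n}\int_{C_{f,g}} [\min\{f_n(x) - f(x),\ g_n(x) - g(x)\} - \min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\}]\,dx \\
&=: A_{1n} + \Delta_{11n}.
\end{aligned} \tag{15}
\]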

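We have
\[
\begin{aligned}
|\Delta_{11n}| &\le 2\sqrt{n}\int_{C_{f,g}} \{|Ef_n(x) - f(x)| + |Eg_n(x) - g(x)|\}\,dx \\
&= 4\sqrt{n}\int_{C_{f,g}} |Ef_n(x) - f(x)|\,dx \\
&\le 4\sqrt{n}\,b^s \int_{C_{f,g}} \Big| \sum_{|\lambda| = s} \frac{1}{s!}\int_{R^d} D^{\lambda} f(x - \tilde{b}u)\,u^{\lambda} K(u)\,du \Big|\,dx \\
&= O(n^{1/2} b^s) \to 0,
\end{aligned} \tag{16}
\]
where the first inequality uses the elementary result $|\min\{a + c, b + d\} - \min\{a, b\}| \le 2(|c| + |d|)$, the first equality follows from the definition of $C_{f,g}$, the second inequality holds by a two term Taylor expansion with $0 < \tilde{b} < b$ under Assumptions A1 and A2, the last equality holds by the same assumptions, and the convergence to zero follows from Assumption A3.

We next consider $\Delta_{2n}$. We have
\[
\begin{aligned}
\Delta_{2n} &= \sqrt{n}\int_{C_f} \min\{[f_n(x) - Ef_n(x)],\ [g_n(x) - Eg_n(x)] + [g(x) - f(x)]\}\,dx + O(n^{1/2} b^s) \\
&= \sqrt{n}\int_{C_f} [f_n(x) - Ef_n(x)]\,dx + O(n^{1/2} b^s) + o_p(1) \\
&= A_{2n} + o_p(1),
\end{aligned} \tag{17}
\]
where the first equality follows from an argument similar to the one used to establish (16) and the second equality holds by the following argument. Let $\varepsilon > 0$ be a constant and write
\[
\begin{aligned}
\Delta_{21n} &:= \sqrt{n}\int_{C_f} \min\{[f_n(x) - Ef_n(x)],\ [g_n(x) - Eg_n(x)] + [g(x) - f(x)]\}\,dx - \sqrt{n}\int_{C_f} [f_n(x) - Ef_n(x)]\,dx \\
&= -\sqrt{n}\int_{C_f} \max\{[f_n(x) - Ef_n(x)] - [g_n(x) - Eg_n(x)] - [g(x) - f(x)],\ 0\}\,dx \\
&= -\sqrt{n}\int_{C_{f,1}(\varepsilon)} \max\{[f_n(x) - Ef_n(x)] - [g_n(x) - Eg_n(x)] - [g(x) - f(x)],\ 0\}\,dx \\
&\quad - \sqrt{n}\int_{C_{f,2}(\varepsilon)} \max\{[f_n(x) - Ef_n(x)] - [g_n(x) - Eg_n(x)] - [g(x) - f(x)],\ 0\}\,dx \\
&=: \Delta_{211n} + \Delta_{212n},
\end{aligned}
\]
where
\[
C_{f,1}(\varepsilon) = \{x \in R^d : 0 < g(x) - f(x) \le \varepsilon\}, \qquad C_{f,2}(\varepsilon) = \{x \in R^d : g(x) - f(x) > \varepsilon\}.
\]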

20 Take " = (nb d ) = (log n) for > 0: Then, we have j n j n maxf[fn (x) Ef n (x)] [g n (x) Eg n (x)] ; 0gdx C f; (") n su jf n (x) Ef n (x)j + su jg n (x) Eg n (x)j (C f; (")) xr d xr d O b d= (log n) = O nb d = (log n) = o (); (8) where the last inequality holds by the uniform consistency result () and Assumtion A5, and the convergence to zero holds by Assumtion A3. Also, for each > 0; we have Pr (j n j > ) Pr su jf n (x) Ef n (x)j + su jg n (x) Eg n (x)j > " xr d xr d nb d =! = Pr su jf n (x) Ef n (x)j + su jg n (x) Eg n (x)j > (log n) = log n xr d xr d! 0: Therefore, (7) follows from (8) and (9). Also, similarly to (7), we have 3n = A 3n + o (): (0) (9) Now, (5), (6), (7) and (0) establish (3), as desired. We rove Theorem A using the Poissonization argument of Giné et. al. (003). To do this, we need to extend some of the results of Giné et. al. (003) to the general multi-dimensional case d with multile kernel densities. Also, we need to consider norms di erent from L - norm. We rst introduce some concets used throughout the roofs. Let N be a Poisson random variable with mean n; de ned on the same robability sace as the sequence f(x i ; Y i ) : i g, and indeendent of this sequence: De ne f N (x) = n NX K b (x i= X i ) ; g N (x) = n NX K b (x Y i ) i= where b = b(n) and where the emty sum is de ned to be zero. Notice that Ef N (x) = Ef n (x) = EK b (x X) () k f;n (x) = nvar (f N (x)) = EKb (x X) () nvar (f n (x)) = EKb (x X) EK b (x X) : (3) Similar results hold for g N (x) and g n (x): Let C f;g ; C f and C g denote the sets de ned in (4)-(6). Suose 0 ; f and g are strictly ositive. (When any of 0 ; f or g is zero, we can trivially take B(M); B(M ); B(M; v); B(M ; v ); B 0 or B 0 9

for $\ell = f, g$ defined below to be an empty set in our subsequent discussion.) For a constant $M > 0$, let $B(M) \subset R^d$ denote a Borel set with nonempty interior with Lebesgue measure $\mu(B(M)) = M^d$. For $v > 0$, define $B(M, v)$ to be the $v$-contraction of $B(M)$, i.e., $B(M, v) = \{x \in B(M) : \rho(x, R^d \setminus B(M)) \ge v\}$, where $\rho(x, B) = \inf\{\|x - y\| : y \in B\}$. Let $\ell$ denote $f$ or $g$. Take $\varepsilon \in (0, p_0)$ and $\varepsilon_\ell \in (0, p_\ell)$ to be arbitrary constants. Choose $M, M_\ell, v, v_\ell > 0$ and Borel sets $B_0, B_\ell$ such that
\[
B(M) \subset C_{f,g}, \qquad B(M_\ell) \subset C_\ell, \tag{24}
\]
\[
B_0 \subset B(M, v), \qquad B_\ell \subset B(M_\ell, v_\ell), \tag{25}
\]
\[
\int_{R^{2d} \setminus T(M)} f(x) g(y)\,dx\,dy =: \alpha > 0, \tag{26}
\]
\[
\int_{B_0} f(x)\,dx = \int_{B_0} g(x)\,dx > p_0 - \varepsilon, \qquad \int_{B_\ell} \ell(x)\,dx > p_\ell - \varepsilon_\ell, \tag{27}
\]
and $f$ and $\ell$ are bounded away from 0 on $B_0$ and $B_\ell$, respectively, where
\[
T(M) = \big(B(M_f) \times R^d\big) \cup \big(B(M) \times B(M)\big) \cup \big(R^d \times B(M_g)\big) \subset R^{2d}. \tag{28}
\]
Such $M, M_\ell, v, v_\ell, B_\ell$, and $B_0$ exist by continuity of $f$ and $g$. Let $B = B_0 \cup B_f \cup B_g$. Define a Poissonization version of $A_n$ (minus its expectation, restricted to the Borel set $B$) to be
\[
A_n^P(B) = A_{1n}^P(B_0) + A_{2n}^P(B_f) + A_{3n}^P(B_g), \tag{29}
\]
where
\[
A_{1n}^P(B_0) = \sqrt{n}\int_{B_0} \min\{f_N(x) - Ef_n(x),\ g_N(x) - Eg_n(x)\}\,dx - \sqrt{n}\,E\int_{B_0} \min\{f_N(x) - Ef_n(x),\ g_N(x) - Eg_n(x)\}\,dx, \tag{30}
\]
\[
A_{2n}^P(B_f) = \sqrt{n}\int_{B_f} [f_N(x) - Ef_n(x)]\,dx, \tag{31}
\]
\[
A_{3n}^P(B_g) = \sqrt{n}\int_{B_g} [g_N(x) - Eg_n(x)]\,dx. \tag{32}
\]
Also, define the variance of the Poissonization version $A_n^P(B)$ to be
\[
\sigma_n^2(B) = \mathrm{var}\big(A_n^P(B)\big). \tag{33}
\]
To investigate the asymptotic distribution of $A_n^P(B)$, we will need the following lemma, which is related to the classical Berry-Esseen theorem.

Lemma A1. (a) Let $\{W_i = (W_{1i}, \ldots, W_{4i})^{\top} : i \ge 1\}$ be a sequence of i.i.d. random vectors in $R^4$ such that each component has mean 0, variance 1, and finite absolute moments of third order.

22 Let = ( ; : : : ; 4 ) > be multivariate normal with mean vector 0 and variance-covariance matrix = E > = EW W > 0 0 = B 0 0 A : 0 0 Then, there exist universal ositive constants A and A such that ( ) E min nx nx W i ; W i E min f ; g n n A E jw j 3 + E jw j 3 n i= and, whenever < and < ; " ( nx ) ( E nx nx n min W i ; W i min W 3i ; i= i= i= i= )# nx W 4i ( ) 3= ( ) 3= A n E jw j 3 + E jw j 3 + E jw 3 j 3 + E jw 4 j 3 : i= E [min f ; g min f 3 ; 4 g] (b) Let fw i = (W i ; : : : ; W 3i ) > : i g be a sequence of i.i.d. random vectors in R 3 such that each comonent has mean 0, variance ; and nite absolute moments of third order. Let = ( ; ; 3 ) > be multivariate normal with mean vector 0 and variance-covariance matrix 0 0 = E > = EW W > B C 0 A : Then, whenever + < ; " ( E min n nx W i ; n i= ) nx W i n i= # nx W 3i i= ( ) 3= A 3 n E jw j 3 + E jw j 3 + E jw 3 j 3 : E [min f ; g 3 ] We also need the following basic result of Beirlant and Mason (995, Theorem.), which is needed to "de-poissonize" our asymtotic results on the Poissonized random variables. Lemma A. Let N ;n and N ;n be indeendent Poisson random variables with N ;n being Poisson(n( )) and N ;n being Poisson(n) ; where (0; =): Denote N n = N ;n + N ;n and set U n = N ;n n( ) n and V n = N n n n : Let fs Nn : n g be a sequence of random variables such that (i) for each n ; the random vector (S Nn ; U n ) is indeendent of V n ; (ii) for some > 0 and such that ( ) > 0; (S Nn ; U n ) > ) N (0; ) ;

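where
\[
\Sigma = \begin{pmatrix} \sigma^2 & \xi \\ \xi & 1 - \alpha \end{pmatrix}.
\]
Then, for all $x$, we have
\[
\Pr(S_{N_n} \le x \mid N_n = n) \to \Pr\big(\sqrt{\sigma^2 - \xi^2}\; Z_1 \le x\big),
\]
where $Z_1$ denotes the standard normal random variable.

The following lemma derives the asymptotic variance of $A_n^P(B)$.

Lemma A3. Whenever Assumptions (A1)-(A4) hold and $B_0$, $B_f$, and $B_g$ satisfy (24)-(27), we have
\[
\lim_{n \to \infty} \sigma_n^2(B) = p_{0,B}\,\sigma_0^2 + \bar\sigma_{1,B}^2, \tag{34}
\]
where $p_{0,B} = \Pr(X \in B_0) = \Pr(Y \in B_0)$, $\bar\sigma_{1,B}^2 = p_{f,B} + p_{g,B} + 2 p_{f,B}\,p_{g,B}$, $p_{f,B} = \Pr(X \in B_f)$, $p_{g,B} = \Pr(Y \in B_g)$ and $\sigma_0^2$ is defined in Theorem 1.

Define
\[
U_n = n^{-1/2}\Big\{ \sum_{j=1}^{N} 1\big((X_j, Y_j) \in T(M)\big) - n\Pr\big((X, Y) \in T(M)\big) \Big\},
\]
\[
V_n = n^{-1/2}\Big\{ \sum_{j=1}^{N} 1\big((X_j, Y_j) \in R^{2d} \setminus T(M)\big) - n\Pr\big((X, Y) \in R^{2d} \setminus T(M)\big) \Big\},
\]
where $T(M)$ is defined in (28). We next establish the following convergence in distribution result.

Lemma A4. Under Assumptions (A1)-(A4), we have
\[
\big(A_n^P(B),\ U_n\big)^{\top} \Rightarrow N(0, \Sigma),
\]
where
\[
\Sigma = \begin{pmatrix} p_{0,B}\,\sigma_0^2 + \bar\sigma_{1,B}^2 & p_{f,B} + p_{g,B} \\ p_{f,B} + p_{g,B} & 1 - \alpha \end{pmatrix}
\]
and $\alpha$ is defined in (26).

The following lemma gives the asymptotic bias formula.

Lemma A5. Under Assumptions (A1)-(A4), we have
\[
\text{(a)}\quad \lim_{n \to \infty} \int_{B_0} \Big[ \sqrt{n}\,E\min\{f_N(x) - Ef_n(x),\ g_N(x) - Eg_n(x)\} - E\min\{Z_1, Z_2\}\,k_{f,n}^{1/2}(x) \Big]\,dx = 0,
\]
\[
\text{(b)}\quad \lim_{n \to \infty} \int_{B_0} \Big[ \sqrt{n}\,E\min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\} - E\min\{Z_1, Z_2\}\,k_{f,n}^{1/2}(x) \Big]\,dx = 0,
\]
where $k_{f,n}(x) = n\,\mathrm{var}(f_N(x))$ as in (22), and $Z_1$ and $Z_2$ are standard normal random variables.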
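The constant E min{Z_1, Z_2} that drives the bias term in Lemma A5 has a simple closed form when Z_1 and Z_2 are independent, which is the assumption made in this sketch. Writing min{a, b} = (a + b)/2 - |a - b|/2 and using E|N(0, s^2)| = s * sqrt(2/pi) gives E min{Z_1, Z_2/sqrt(tau)} = -sqrt((1 + 1/tau)/(2 pi)), which equals -1/sqrt(pi) at tau = 1 and reproduces the unequal-sample-size scaling factor [(1 + 1/tau)/2]^(1/2) used in the application. The code below checks the formula by simulation; the function name, the seed and the tau values are illustrative choices only.

```python
import math
import numpy as np

def expected_min_closed_form(tau=1.0):
    """E min{Z1, Z2 / sqrt(tau)} for independent standard normals Z1, Z2.

    Follows from min{a, b} = (a + b)/2 - |a - b|/2 and E|N(0, s^2)| = s * sqrt(2/pi),
    with s^2 = 1 + 1/tau the variance of Z1 - Z2 / sqrt(tau).
    """
    return -math.sqrt((1.0 + 1.0 / tau) / (2.0 * math.pi))

rng = np.random.default_rng(2)
z1 = rng.standard_normal(1_000_000)
z2 = rng.standard_normal(1_000_000)

# Monte Carlo averages next to the closed form, for equal and unequal sample sizes.
results = {}
for tau in (1.0, 2.0):
    mc = float(np.minimum(z1, z2 / math.sqrt(tau)).mean())
    results[tau] = (mc, expected_min_closed_form(tau))
    print(tau, results[tau])
```

Because the constant is negative, the plug-in overlap estimator is biased downward on the contact set, which is consistent with the bias correction raising the estimates in the application.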

Define
\[
\begin{aligned}
A_n(B) &= \sqrt{n}\int_{B_0} \big[\min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\} - E\min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\}\big]\,dx \\
&\quad + \sqrt{n}\int_{B_f} [f_n(x) - Ef_n(x)]\,dx + \sqrt{n}\int_{B_g} [g_n(x) - Eg_n(x)]\,dx.
\end{aligned} \tag{35}
\]
Using the de-Poissonization lemma (Lemma A2), we can show that the asymptotic distribution of $A_n(B)$ is normal.

Lemma A6. Under Assumptions (A1)-(A4), we have
\[
A_n(B) \Rightarrow \sqrt{p_{0,B}\,\sigma_0^2 + \sigma_{1,B}^2}\; Z_1,
\]
where $\sigma_{1,B}^2 = p_{f,B}(1 - p_{f,B}) + p_{g,B}(1 - p_{g,B})$ and $Z_1$ stands for the standard normal random variable.

The following two lemmas are useful to investigate the behavior of the difference between the statistics $A_n(B)$ and $A_n$.

Lemma A7. Let $\{X_i = (X_{1i}^{\top}, X_{2i}^{\top})^{\top} \in R^d \times R^d : i = 1, \ldots, n\}$ be i.i.d. random vectors with $E\|X\| < \infty$. Let $h_j : R^d \times R^d \to R$ be a real function such that $Eh_j(X_j, x) = 0$ for all $x \in R^d$ for $j = 1, 2$. Let
\[
T_n = \int_B \min\Big\{ \sum_{k=1}^{n} h_1(X_{1k}, x),\ \sum_{k=1}^{n} h_2(X_{2k}, x) \Big\}\,dx,
\]
where $B \subset R^d$ is a Borel set. Then, for any convex function $g : R \to R$, we have
\[
Eg(T_n - ET_n) \le Eg\Big( 4\sum_{j=1}^{2}\sum_{k=1}^{n} \varepsilon_k \int_B |h_j(X_{jk}, x)|\,dx \Big),
\]
where $\{\varepsilon_i : i = 1, \ldots, n\}$ are i.i.d. random variables with $\Pr(\varepsilon = 1) = \Pr(\varepsilon = -1) = 1/2$, independent of $\{X_i : i = 1, \ldots, n\}$.

Lemma A8. Suppose that Assumptions (A1)-(A4) hold. Then, for any Borel subset $B$ of $R^d$, we have
\[
\limsup_{n \to \infty} E\Big( \sqrt{n}\int_B \{h_n(x) - Eh_n(x)\}\,dx \Big)^2 \le D \sup_u |K(u)| \int_B [f(x) + g(x)]\,dx
\]
for some generic constant $D > 0$, where
\[
h_n(x) = \min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\}.
\]
We are now ready to prove Theorem A1.

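Proof of Theorem A1. By Lemma 6.1 of Giné et al. (2003), there exist increasing sequences of Borel sets $\{B_{0k} \subset C_{f,g} : k \ge 1\}$, $\{B_{fk} \subset C_f : k \ge 1\}$, and $\{B_{gk} \subset C_g : k \ge 1\}$, each with finite Lebesgue measure, such that
\[
\lim_{k \to \infty} \int_{C_{f,g} \setminus B_{0k}} f(x)\,dx = \lim_{k \to \infty} \int_{C_{f,g} \setminus B_{0k}} g(x)\,dx = 0, \tag{36}
\]
\[
\lim_{k \to \infty} \int_{C_f \setminus B_{fk}} f(x)\,dx = \lim_{k \to \infty} \int_{C_g \setminus B_{gk}} g(x)\,dx = 0. \tag{37}
\]
Let $B_k = B_{0k} \cup B_{fk} \cup B_{gk}$ for $k \ge 1$. Notice that for each $k \ge 1$, by Lemma A6, we have
\[
A_n(B_k) \Rightarrow \sqrt{p_{0,B_k}\,\sigma_0^2 + \sigma_{1,B_k}^2}\; Z_1 \quad \text{as } n \to \infty. \tag{38}
\]
By (36) and (37),
\[
\sqrt{p_{0,B_k}\,\sigma_0^2 + \sigma_{1,B_k}^2}\; Z_1 \Rightarrow \sqrt{p_0\,\sigma_0^2 + \sigma_1^2}\; Z_1 \quad \text{as } k \to \infty. \tag{39}
\]
Also, by Lemma A8, we have
\[
\limsup_{n \to \infty} E\Big( \sqrt{n}\int_{C_{f,g} \setminus B_{0k}} \{h_n(x) - Eh_n(x)\}\,dx \Big)^2 \le D \sup_u |K(u)| \int_{C_{f,g} \setminus B_{0k}} [f(x) + g(x)]\,dx, \tag{40}
\]
where $h_n(x) = \min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\}$. Similarly, we have
\[
\limsup_{n \to \infty} E\Big( \sqrt{n}\int_{C_f \setminus B_{fk}} \{f_n(x) - Ef_n(x)\}\,dx \Big)^2 \le D \sup_u |K(u)| \int_{C_f \setminus B_{fk}} f(x)\,dx, \tag{41}
\]
\[
\limsup_{n \to \infty} E\Big( \sqrt{n}\int_{C_g \setminus B_{gk}} \{g_n(x) - Eg_n(x)\}\,dx \Big)^2 \le D \sup_u |K(u)| \int_{C_g \setminus B_{gk}} g(x)\,dx. \tag{42}
\]
Therefore, (40), (41), and (42) imply
\[
\lim_{k \to \infty} \limsup_{n \to \infty} \Pr\big( |A_n(B_k) - (A_n - EA_n)| > \varepsilon \big) = 0 \quad \forall\, \varepsilon > 0. \tag{43}
\]
Now, by (38), (39) and (43) and Theorem 4.2 of Billingsley (1968), we have, as $n \to \infty$,
\[
\begin{aligned}
A_n - EA_n &= \sqrt{n}\int_{C_{f,g}} \big[\min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\} - E\min\{f_n(x) - Ef_n(x),\ g_n(x) - Eg_n(x)\}\big]\,dx \\
&\quad + \sqrt{n}\int_{C_f} [f_n(x) - Ef_n(x)]\,dx + \sqrt{n}\int_{C_g} [g_n(x) - Eg_n(x)]\,dx \\
&\Rightarrow \sqrt{p_0\,\sigma_0^2 + \sigma_1^2}\; Z_1.
\end{aligned} \tag{44}
\]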

26 Now, the roof of Theorem A is comlete since, similarly to Lemma A5, we have lim jea n a n j = 0: n! Proof of Theorem. To establish (8) we must show consistency of the bias correction. By the triangle inequality, we have fn = (x)dx f = (x)dx ^C f;g C f;g f = (x)dx + f = n (x) f = (x) dx ^C f;g C f;g ^C f;g = : D n + D n ; where denotes the symmetric di erence. For notational simlicity, let De ne h n (x) = f n (x) g n (x) and h(x) = f(x) g(x): ~C f;g = fx : jh(x)j c n g E n = fx : jh n (x) h(x)j c n g : We rst establish D n = o (b d= ) using an argument similar to Cuevas and Fraiman (997, Theorem ). Using C f;g ~ C f;g ; f ( ^C f;g \ ~ C c f;g \ Ec n) = 0 and f ( ^C c f;g \ C f;g \ E c n) = 0; we have D n = f ( ^C f;g C f;g ) = f ( ^C f;g \ C c f;g) + f ( ^C c f;g \ C f;g ) f ( ^C f;g \ ~ C c f;g) + f ( ~ C f;g \ C c f;g) + f ( ^C c f;g \ C f;g ) = f ( ^C f;g \ ~ C c f;g \ E n ) + f ( ~ C f;g \ C c f;g) + f ( ^C c f;g \ C f;g \ E n ) f (E n ) + & n ; (45) where & n = h(c n ): Also, by the rates of convergence result of the L -errors of kernel densities (see, e.g., Holmström and Klemelä (99)), we have that jh n (x) h(x)j f = (x)dx O b s + (nb d ) = : (46) Let n = minfb s ; (nb d ) = g: Then, for any " > 0, Pr b d= D n > " Pr f (E n ) + & n > "b d= Pr jh n (x) h(x)j f = (x)dx > "bd= & n c n Pr n jh n (x) h(x)j f = (x)dx > " nc n b d= + o()! 0; (47) 5

27 where the rst inequality holds by (45), the second inequality holds by the inequality (E n ) jh n (x) h(x)j =c n ; the third inequality follows from Assumtions A3 and A5 which imlies n c n & n! 0; and the last convergence to zero holds by (46) and n c n b d=! using Assumtion A3. This now establishes that D n = o (b d= ): We next consider D n : First note that with robability one b = minff n (x); g n (x)gdx = minff n (x); g n (x)gdx; C n where C n = [l n ; u n ] [l dn ; u dn ] with l jn = maxf min X ji; min Y jig b= in in u jn = minf max X ji; max Y jig] + b=: in in This holds since the kernel has comact suort with radius =. It follows that we can restrict any integration to sets intersected with C n. Using the identity x y = (x = y = )(y = + x = ); we have f = n (x) f = (x) jf n (x) f(x)j dx = ^C f;g \C n ^C f;g \C n f = (x) + f = (x) dx n jf n (x) f(x)j dx ^C f;g \C n f = (x) because fn = (x) 0 for x ^C f;g \C n : Then note that by the Cauchy-Schwarz and triangle inequalities nb d = jfn (x) f(x)j E jjkjj f = (x) nb d = E = jf jjkjj n (x) f(x)j f = (x) " nb d jefn (x) f(x)j # = nb d var(fn (x)) + : jjkjj f(x) jjkjj f(x) Then, we use the inequality (a + b) = + a = + b = for all ositive a; b; to obtain that nb d = " jfn (x) f(x)j nb d = # jef n (x) f(x)j nb d var(fn (x)) E + + jjkjj f = (x) jjkjj f = (x) jjkjj f(x) + 4 nb d = b X s D f(x) 3 u K(u)du5 + o(); jjkjj s! f = (x) where the second inequality follows by standard kernel arguments and is uniform in x R d. The bias term is of smaller order under our conditions given the absolute integrability of D f(x)=f = (x). Therefore, jj=s f = n (x) f = (x) jjkjj = dx (Cn ) ( + o()); ^C f;g \C n nb d b d= D n = O (n = b d r n ); 6 =

28 where r n = dy (u jn l jn ). For Gaussian X; Y; r n = O ((log n) d= ) so that this term is small under j= our conditions. More generally we can bound r n under our moment condition. Seci cally, by the Bonferroni and Markov inequalities Pr max jx jij > n in nx Pr [jx ji j > n ] n EjX jij ; i= rovided EjX ji j < : Therefore, we take n = n = L(n); where L(n)! is a slowly varying function. We then show that we can condition on the event that fr n n = L(n)g; which has robability tending to one. This comletes the roof of (8). The consistency of the standard error follows by similar arguments. n A.3 Proofs of Lemmas Proof of Lemma A. The results of Lemma A follow directly from Bhattacharya (975, Theorem). Proof of Lemma A. See Beirlant and Mason (995, Proof of Theorem.,.5-6). Proof of Lemma A3. To establish (34), we rst notice that, for kx yk > b; the random variables (f N (x) Ef n (x); g N (x) Eg n (x)) and (f N (y) Ef n (y); g N (y) Eg n (y)) are indeendent because they are functions of indeendent increments of Poisson rocesses and the kernel K vanishes outside of the closed ball of radius /. This imlies that cov A P n(b 0 ); A P n(b f ) = cov A P n(b 0 ); A P 3n(B g ) = 0: (48) On the other hand, by standard arguments for kernel densities, we have as n! var A P n(b f )! = E K b (x X) dx! f;b B f var A P 3n(B g ) = E K b (x Y ) dx! g;b (49) Bg cov A P n(b f ); A P 3n(B g )! = E K b (x X) dx K b (x Y ) dx! f;b g;b : B f Therefore, by (48) and (49), the roof of Lemma A3 is comlete if we show Bg lim var n! AP n(b 0 ) = 0;B 0: (50) 7

29 To show (50), note that var A P n(b 0 ) = n cov (min ff N (x) Ef n (x); g N (x) Eg n (x)g ; min ff N (y) Ef n (y); g N (y) Eg n (y)g) dxdy B 0 B 0 = n (kx yk b) B 0 B 0 cov (min ff N (x) Ef n (x); g N (x) Eg n (x)g ; min ff N (y) Ef n (y); g N (y) Eg n (y)g) dxdy: Let T f;n (x) = T g;n (x) = n ffn (x) Ef n (x)g ; (5) kf;n (x) n fgn (x) Eg n (x)g ; (5) kg;n (x) where k f;n (x) = nvar (f N (x)) and k g;n (x) = nvar (g N (x)) : By standard arguments, we have that, with (B 0 ) < ; q su k f;n (x) b d= jjkjj f(x) = O(b d= ) (53) xb 0 q su k g;n (x) b d= jjkjj g(x) = O(b d= ) (54) xb 0 (kx yk b) dxdy = O(b d ) (55) B 0 B 0 su jcov (minft f;n (x); T g;n (x)g; min ft f;n (y); T g;n (y)g)j = O(); (56) x;yr d where (53) and (54) holds by two term Taylor exansions and (56) follows from Cauchy-Schwarz inequality and the elementary result jminfa; bgj jaj + jbj : Therefore, from (53) - (56), we have that var A P n(b 0 ) = n;0 + o(); where n;0 = (kx yk b) cov (minft f;n (x); T g;n (x)g; min ft f;n (y); T g;n (y)g) B 0 B 0 b d jjkjj f(x)f(y)dxdy: (57) Now, let ( n (x); n (x); 3n (y); 4n (y)) for x; y R d ; be a mean zero multivariate Gaussian rocess such that for each x; y R d ; ( n (x); n (x); 3n (y); 4n (y)) and (T f;n (x); T g;n (x); T f;n (y); T g;n (y)) have the same covariance structure. That is, ( n (x); n (x); 3n (y); 4n (y)) ; ; f;n(x; y) + d = q f;n (x; y) 3 ; g;n(x; y) + q g;n(x; y) 4 ; 8

30 where ; ; 3 ; and 4 are indeendent standard normal random variables and f;n(x; y) = E [T f;n (x)t f;n (y)] g;n(x; y) = E [T g;n (x)t g;n (y)] : Let n;0 = (kx yk b) cov (minf n (x); n (x)g; min f 3n (y); 4n (y)g) B 0 B 0 b d jjkjj f(x)f(y)dxdy: (58) By a change of variables y = x + tb; we can write n;0 = (x B 0 ) (x + tb B 0 ) cov (minf n (x); n (x)g; min f 3n (x + tb); 4n (x + tb)g) T 0 B 0 jjkjj f(x)f(x + tb)dxdt; (59) where T 0 = ft R d : ktk g: Observe that, for almost every x B 0 ; we have f;n(x; x + tb) =! b d E [K ((x X)=b) K ((x X)=b + t)] b d EK ((x X)=b) EK ((x X)=b + t) R R d K(u)K(u + t)du jjkjj = (t): (60) Similarly, we have g;n(x; x + tb)! (t) as n! for almost all x B 0 : (In fact, under our assumtions, the convergence (60) holds uniformly over (x; t) B 0 T 0 :) Therefore, by the bounded convergence theorem, we have Now, the desired result (34) holds if we establish lim n! n;0 = 0;B 0: n;0 n;0! 0: (6) Set G n (x; t) = jjkjj (x B 0 ) (x + tb B 0 ) f(x)f(x + tb): Notice that B 0 T 0 G n (x; t)dxdt jjkjj (T 0 B 0 ) su xb 0 jf(x)j =: < : (6) Let " n (0; b] be a sequence such that " n =b! 0. Letting T 0;n = ft R d : " n =b ktk g; de ne n;0(" n ) = (x B 0 ) (x + tb B 0 ) cov (minft f;n (x); T g;n (x)g; min ft f;n (x + tb); T g;n (x + tb)g) T 0;n ; B 0 jjkjj f(x)f(x + tb)dxdt; 9

31 n;0(" n ) = B 0 (x B 0 ) (x + tb B 0 ) cov (minf n (x); n (x)g; min f 3n (x + tb); 4n (x + tb)g) T 0;n ; jjkjj f(x)f(x + tb)dxdt: To show (6), we rst establish We have n;0(" n ) n;0(" n ) = B 0 n;0(" n ) n;0(" n )! 0: (63) T 0;n [cov (minf n (x); n (x)g; min f 3n (x + tb); 4n (x + tb)g) cov (minft f;n (x); T g;n (x)g; min ft f;n (x + tb); T g;n (x + tb)g)] G n (x; t)dxdtj je minf n (x); n (x)ge min f 3n (x + tb); 4n (x + tb)g B 0 T 0;n E minft f;n (x); T g;n (x)ge min ft f;n (x + tb); T g;n (x + tb)gj G n (x; t)dxdt + je minf n (x); n (x)g min f 3n (x + tb); 4n (x + tb)g B 0 T 0;n E minft f;n (x); T g;n (x)g min ft f;n (x + tb); T g;n (x + tb)gj G n (x; t)dxdt = : n + n : First, consider n : Let denote a Poisson random variable with mean that are indeendent of f(x i ; Y i ) : i g and set Q f;n (x) = Q g;n (x) = 4 X x Xj K b j 4 X x Yj K b j x EK x EK Notice that, with f(x) = g(x) > 0 for all x B 0 ; we have Let Q () ;n (x); : : : ; Q(n) 3 X 5 x =s EK b 3 Y 5 x =s EK b X b Y b (64) (65) : (66) su E jq ;n (x)j 3 = O(b d= ) for = f and g: (67) xb 0 ;n (x) be i.i.d Q ;n(x) for = f and g: Clearly, we have T f;n (x) = T g;n (x) = n ffn (x) Ef n (x)g b d EK ((x X)=b) n fgn (x) Eg n (x)g b d EK ((x Y )=b) Therefore, by Lemma A(a), (67), (68), and (69), we have su je minft f;n (x); T g;n (x)g xb 0 d = d = E min f n (x); n (x)gj O 30 P n i= Q(i) f;n (x) n (68) P n i= Q(i) g;n(x) : (69) n : (70) nb d

32 The results (70) and (6) imly that n = o(). We next consider n : We have n su (x;t)b 0 T 0;n je minf n (x); n (x)g min f 3n (x + tb); 4n (x + tb)g E minft f;n (x); T g;n (x)g min ft f;n (x + tb); T g;n (x + tb)gj "n 3 O O ; (7) b nb d where the rst inequality uses (6) and the second inequality holds by Lemma A(a), (67), Assumtion A(ii), and the fact that lim n! f;n (x; x + tb) = lim n! g;n(x; x + tb) = (t) a.s. uniformly over (x; t) B 0 T 0. Since " n is arbitrary, we can choose " n = c 0 b (log n) =(6) for some constant c 0, where > 0 satis es Assumtion A(ii). This choice of " n makes the right hand side of (7) o(), using Assumtion A3. Therefore, we have n = o(), and hence (63). On the other hand, using (56) and argument similar to (70), we have n;0 n;0 n;0 (" n ) n;0(" n ) "n d O + O = o(): nb d b This and (63) establish (6), and hence (50), as desired. Proof of Lemma A4. Let n (x) = n [min ff N (x) Ef n (x); g N (x) Eg n (x)g ne min ff N (x) Ef n (x); g N (x) Eg n (x)g] ; f;n (x) = n ff N (x) Ef n (x)g ; g;n (x) = n fg N (x) Eg n (x)g We rst construct artitions of B(M); B(M f ) and B(M g ). Consider the regular grid G i = (x i ; x i +] (x id ; x id +]; where i = (i ; : : : ; i d ); i ; : : : ; i d and x i = ib for i : De ne R 0;i = G i \ B(M); R f;i = G i \ B(M f ); R g;i = G i \ B(M g ); I n = fi : R 0;i [ R f;i [ R g;i 6= ;g: Then, we see that fr 0;i : i I n d g; fr f;i : i I n d g and fr g;i : i I n d g are artitions of B(M); B(M f ) and B(M g ); resectively, with (R 0;i ) d 0 b d ; (R f;i ) d b d ; (R g;i ) d b d (7) m n = : (I n ) d 3 b d 3

33 for some ositive constants d 0 ; ::; d 3 ; see, e.g., Mason and Polonik (008) for a similar construction of artitions in a di erent context. Set i;n = (x B 0 ) n (x)dx + R 0;i (x B f ) f;n (x)dx R f;i (x B g ) g;n (x)dx; R g;i u i;n = X N [ f(x j R f;i ) [ (X j R 0;i ; Y j R 0;i ) [ (Y j R g;i )g n j= n Pr f(x R f;i ) [ (X R 0;i ; Y R 0;i ) [ (Y R g;i )g] : Then, we have Notice that A P n (B) = X ii n i;n and U n = X ii n u i;n : var(a P n (B)) = n(b) and var(u n ) = : For arbitrary ; and 3 R; let y i;n = i;n + u i;n : Notice that fy i;n : i I n g is an array of mean zero one-deendent random elds. Below we will establish that and var X ii n y i;n! = var( A P n (B) + U n ) (73)! 0;B 0 + ;B X + ( ) + ( f;b + g;b ) ; ii n E jy i;n j r = o() for some < r < 3: (74) Then, the result of Lemma A4 follows from the CLT of Shergin (990) and Cramér-Wold device. We rst establish (73). By Lemma A3, (73) holds if we have cov A P n (B); U n! f;b + g;b : (75) Recall that A P n (B) = A P n(b 0 ) + A P n(b f ) + A P 3n(B g ): Therefore, (75) holds if we have cov A P n(b f ); U n! f;b ; cov A P 3n(B f ); U n! f;b ; (76) n cov A P n(b); U n = cov min ff N (x) Ef n (x); g N (x) Eg n (x)g dx; U n (77) B 0 = O nb d 3

34 (76) holds by a standard argument. We show below (77). For any x B 0 ; we have! U n d T f;n (x); T g;n (x); = nx Q (i) Pr ((X; Y ) T (M)) n f;n (x); Q(i) g;n(x); U (i) ; (78) i= where Q (i) f;n (x); Q(i) g;n(x); U (i) for i = ; : : : ; n are i.i.d (Q f;n (x); Q g;n (x); U), with (Q f;n (x); Q g;n (x)) de ned as in (65) and (66) and 3 U = 4 X ((X j ; Y j ) T (M)) Pr ((X; Y ) T (M)) 5 = Pr ((X; Y ) T (M)): j Notice that, for = f and g; we have su jcov (Q ;n (x); U)j = O(b d= ); (79) xb 0 which in turn is less than or equal to " for all su ciently large n and any 0 < " < =: This result and Lemma A(b) imly that su xb 0 cov n min ffn (x) Ef n (x); g N (x) Eg n (x)g ; U n O which, when combined with (B 0 ) < ; yields (77) and hence (73), as desired. ; nb d We next establish (74). Notice that, with < r < 3; using Liaunov inequality and c r -inequality, we have i;n = (x B 0 ) n (x)dx + (x B f ) f;n (x)dx (x B g ) g;n (x)dx; R 0;i R f;i R g;i (E j i;n j r ) 3=r 9 B0 (x) B0 (y) B0 (z)e j n (x) n (y) n (z)j dxdydz (80) R 0;i R 0;i R 0;i + Bf (x) Bf (y) Bf (z)e j f;n (x) f;n (y) f;n (z)j dxdydz R f;i R f;i R f;i! + Bg (x) Bg (y) Bg (z)e j g;n (x) g;n (y) g;n (z)j dxdydz ; R g;i R g;i R g;i where B (x) = (x B): Also, by c r -inequality again and the elementary result jminfx; Y gj jxj + jy j ; we have: for some constant D > 0; E j n (x)j 3 Dn 3= E jf N (x) Ef n (x)j 3 + E jg N (x) Eg n (x)j 3 : (8) By the Rosenthal s inequality (see, e.g., Lemma.3 of Giné et al. (003)), we have: Ef n (x)j 3 O b + 3d= n = b d su n 3= E jf N (x) xb 0 33 : (8)

35 A similar result holds for g N : Now, Assumtion A3, (80), (8), (8), the elementary result EjXY j E (jxj + jy j + jj) 3 and the fact that (R 0;i ) d b d imly that the rst term on the right hand side of (80) is bounded by a O(b rd= ) term uniformly in i I n : Similar results hold for the other terms on the right hand side of (80). Therefore, we have E j i;n j r O(b rd= ) uniformly in i I n : (83) This imlies that On the other hand, set X ii n E j i;n j r O(m n b rd= ) = O(b (r )=d ) = o(): (84) i;n = Pr f(x R f;i ) [ (X R 0;i ; Y R 0;i ) [ (Y R g;i )g : Then, by the Rosenthal s inequality, there exists a constant D > 0 such that X X E ju i;n j r D n r= (n i;n ) r= + n i;n ii n ii n D max ii n ( i;n ) (r )= + n =! 0: Therefore, combining (84) and (85), we have (74). This comletes the roof of Lemma A4. Proof of Lemma A5. Consider art (a) rst. Let T f;n (x) and T g;n (x) be de ned as in (5) and (5), resectively. We have h ne i min ffn (x) Ef n (x); g N (x) Eg n (x)g dx E min f ; g k = f;n (x) dx B 0 su xb 0 je minft f;n (x); T g;n (x)g = O n = b d= O(b d= ) = O(n = b d ) = o(); E min f ; gj su xb 0 k = f;n (x) (B 0) by Lemma A(a), (53), Assumtion A3, and the fact (B 0 ) <. Similarly, we have h ne min ffn (x) Ef n (x); g n (x) Eg n (x)g dx E min f ; g nvar (f n (x))i dx = O(n = b d ): B 0 Therefore, art (b) also holds since, using () and (3), k = su f;n (x) nvar (fn (x)) = O(b d= ) = o(): xb 0 Proof of Lemma A6. Consider A P n (B) de ned in (9). Conditional on N = n; we have A n (B) = d n [min ff n (x) Ef n (x); g n (x) Eg n (x)g (86) B 0 E min ff N (x) Ef n (x); g N (x) Eg n (x)g] dx + n [f n (x) Ef n (x)] dx + n [g n (x) Eg n (x)] dx B f B g = : A C n (B): 34 (85)

36 By Lemmas A and A4, we have A C n (B) ) 0;B 0 + ;B ( f;b + g;b ) = q = 0;B 0 + ;B : Now, the result of Lemma A6 holds since, as n! = A C n (B) A n (B) B 0 ne min ffn (x) Ef n (x); g N (x) Eg n (x)g ne min ffn (x) Ef n (x); g n (x) Eg n (x)g dx! 0: Proof of Lemma A7. We can establish Lemma A7 by modifying the majorization inequality results of Pinelis (994). Let (X; : : : ; Xn) be an indeendent coy of (X ; : : : ; X n ): For i = ; : : : ; n; let E i and Ei denote the conditional exectations given (X ; : : : ; X i ) and (X ; : : : ; X i ; Xi ): Let i = E i T n E i T n ; (87) i = E i (T n T n; i ) ; (88) where Then, we have ( nx ) nx T n; i = min h (X k ; x); h (X k ; x) dx: B k6=i k6=i T n ET n = + n ; (89) i = i E i i ; (90) X j i j jh j (X ji ; x)j dx; (9) j= where (89) follows from (87), (90) holds by indeendence of X i s, and (9) follows from the elementary inequality jminfa + b; c + dg where Tn;i = B minfa; cgj (jbj + jdj) : Let i = E i T n;i T n; i ; ( nx min h (X k ; x) + h (Xi; x); k6=i B ) nx h (X k ; x) + h (Xi; x) dx: Notice that the random variables i and i are conditionally indeendent given (X ; : : : ; X i ); and the conditional distributions of i and i given (X ; : : : ; X i ) are equivalent. Therefore, for any k6=i 35

37 convex function q : R! R; we have E i q( i ) = E i q( i E i i ) E i q( i E i i i E i i ) E i [q( i ) + q( i )] " X! E i q 4 jh j (X ji ; x)j dx + q 4 X = Eq 4" i j= j= B B jh j (X ji ; x)j dx! ; X j= B jh j (X ji ; x)j dx where the rst inequality follows from Berger (99, Lemma.), the second inequality holds by the convexity of q and the last inequality follows from the convexity of q and (9). Now, the result of Lemma A7 holds by (89) and Lemma.6 of Berger (99). dy Proof of Lemma A8. Let R(x; r) = [x i r; x i + r] denote a closed rectangle in R d : We have 0 n B 8 0 < D : b d fh n (x) B x K D su jk(u)j u +D su jk(u)j u B = D su jk(u)j u B Eh n (x)g dxa X dxa b [f(x) + g(x)] dx i= + E b d B x K 9 Y = dxa b ; b d Pr (X R(x; b=)) f(x) dx + [f(x) + g(x)] dx + o()!# b d Pr (Y R(x; b=)) g(x) dx as n! ; where the rst inequality follows from Lemma A7. This results comletes the roof of Lemma A8. B Further Material B. Some Intuition We give below an alternative estimator/intuition about our estimator. Suose that the common suort is [0; ] and let x = b=; x = 3b=; : : : ; x T = b=; where T = =b; and suose the 36

kernel is supported on $[-0.5, 0.5]$. Then let
$$\hat\theta = \frac{1}{T}\sum_{t=1}^{T}\min\{\hat f(x_t), \hat g(x_t)\}.$$
Note that although $\hat f(x_t)$ and $\hat f(x_s)$ are not strictly independent for $s \neq t$, they are approximately so. Specifically, $\mathrm{cov}(\hat f(x_t), \hat f(x_s)) = O(n^{-1})$, since
$$E[\hat f(x_t)\hat f(x_s)] = \frac{1}{n^2}\sum_{i=1}^{n} E[K_b(x_t - X_i)K_b(x_s - X_i)] + \frac{n(n-1)}{n^2}E[K_b(x_t - X)]\,E[K_b(x_s - X)] = \frac{n-1}{n}E[\hat f(x_t)]\,E[\hat f(x_s)],$$
where the first term vanishes because the kernel windows centred at distinct grid points do not overlap.

We shall suppose for heuristic reasons that $\min\{\hat f(x_t), \hat g(x_t)\}$ is strictly independent of $\min\{\hat f(x_s), \hat g(x_s)\}$, although technically we should still apply the Poissonization technique used in this paper to proceed. Suppose we can dispose of the smoothing bias terms as before and suppose for simplicity that $C_{f,g} = [0,1]$. Then
$$\sqrt{n}(\hat\theta - \theta) = \frac{\sqrt{n}}{T}\sum_{t=1}^{T}\min\{\hat f(x_t) - E\hat f(x_t),\ \hat g(x_t) - E\hat g(x_t)\} + o_p(1) = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\xi_{nt} + o_p(1),$$
where $\xi_{nt} = T^{-1/2}n^{1/2}\min\{\hat f(x_t) - E\hat f(x_t),\ \hat g(x_t) - E\hat g(x_t)\}$. We can apply a triangular array CLT here after subtracting off the mean of $\xi_{nt}$, but to keep things simple let us take the further step of the normal approximation. Then
$$\sqrt{n}(\hat\theta - \theta) = \frac{\sqrt{n}}{\sqrt{nb}\,T}\sum_{t=1}^{T}\min\{f(x_t)^{1/2}\|K\|_2 Z_{ft},\ g(x_t)^{1/2}\|K\|_2 Z_{gt}\} + o_p(1) = \|K\|_2\, b^{1/2}\sum_{t=1}^{1/b} f(x_t)^{1/2}\min\{Z_{ft}, Z_{gt}\} + o_p(1),$$
where $Z_{ft}, Z_{gt}$ are standard normal random variables, mutually independent when $X, Y$ are but otherwise correlated. We then have the following result:
$$\sqrt{n}(\hat\theta - \theta) - \|K\|_2\, b^{-1/2}\int f(x)^{1/2}dx\ E\min\{Z_{ft}, Z_{gt}\} = \|K\|_2\, b^{1/2}\sum_{t=1}^{1/b} f(x_t)^{1/2}\left[\min\{Z_{ft}, Z_{gt}\} - E\min\{Z_{ft}, Z_{gt}\}\right] + o_p(1),$$

which is asymptotically normal with mean zero and variance $\|K\|_2^2\int f(x)dx\ \mathrm{var}[\min\{Z_{ft}, Z_{gt}\}]$. This should be compared with our $\sigma_0^2$. For independent standard normals one can show that $E[\min\{Z_1, Z_2\}] = -1/\sqrt{\pi} \approx -0.56$, so that $\mathrm{var}[\min\{Z_{ft}, Z_{gt}\}] = 1 - 1/\pi \approx 0.68$, which is bigger than $\sigma_0^2 \approx 0.6$ for the uniform kernel. This says that our more complicated estimator, which also averages over non-independent points, delivers about a 10% improvement in variance in this case. Note that the bias is the same for both estimators.

The same type of method can be applied in other semiparametric problems, and our sense is that the optimal kernel is the uniform if one only takes account of the first order effect on variance. As an example, suppose $Y_i = \theta + \varepsilon_i$; then the sample mean is BLUE. We can compute the kernel estimator against some covariates $X$ and average, thus
$$\hat\theta = \frac{1}{T}\sum_{t=1}^{T}\hat m(x_t),$$
where $\hat m(x_t)$ is a kernel estimator. Then, when $\varepsilon_i$ is independent of $X_i$ and $X_i$ is uniform on $[0,1]$, the asymptotic variance of $\hat\theta$ is $\sigma^2\|K\|_2^2/n$ with $\sigma^2 = \mathrm{var}(\varepsilon_i)$, which is minimized by taking $K$ to be uniform on $[-0.5, 0.5]$. In general the asymptotic variance of this method is worse than that of averaging over the sample points (which is equivalent to the sample mean when $K$ is uniform), but the asymptotic variance is always minimized by taking the uniform kernel. In our case, even averaging over the sample points gives a complicated asymptotic variance, but it still makes sense that the variance-minimizing kernel is the uniform.

Finally, we compare the asymptotics of the nonparametric estimator with those of a natural parametric alternative. The parametric problem generally has different asymptotics. In particular, suppose that $f, g$ are parametrically specified, so that $f = f_\phi$, $\phi \in \Phi$, and $g = g_\lambda$, $\lambda \in \Lambda$. Let $\hat\phi$ and $\hat\lambda$ denote root-$n$ consistent estimators of $\phi$ and $\lambda$ such that $[\sqrt{n}(\hat\phi - \phi)^\top, \sqrt{n}(\hat\lambda - \lambda)^\top]^\top \Rightarrow U \sim N(0, \Omega)$, and let $\hat\theta = \int_C \min\{f_{\hat\phi}(x), g_{\hat\lambda}(x)\}dx$. Then, when the contact set has measure zero, we have
$$\sqrt{n}(\hat\theta - \theta) = \left(\int_{C_f} s_\phi(x)^\top f_\phi(x)\,dx,\ \int_{C_g} s_\lambda(x)^\top g_\lambda(x)\,dx\right)U + o_p(1),$$
where $C_f = \{x \in C: f_\phi(x) < g_\lambda(x)\}$, $C_g = \{x \in C: g_\lambda(x) < f_\phi(x)\}$, and $s_\phi(x) = \partial\log f_\phi(x)/\partial\phi$ and $s_\lambda(x) = \partial\log g_\lambda(x)/\partial\lambda$ are the score functions.
When the contact set has positive measure, the asymptotic distribution is non-normal with a negative mean.
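The grid estimator discussed in this subsection can be sketched in a few lines of Python. This is only a minimal illustration under the stated simplifications (common support $[0,1]$, uniform kernel on $[-0.5, 0.5]$, $T = 1/b$ an integer); the function names `uniform_kde` and `grid_overlap` are hypothetical and are not taken from the GAUSS programs mentioned in the paper.

```python
import numpy as np

def uniform_kde(sample, grid, b):
    """Uniform-kernel density estimate at each grid point.

    Assumes K_b(u) = 1(|u| <= b/2) / b, i.e. a kernel supported on [-0.5, 0.5].
    """
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # inside[t, i] is True when observation i falls in the window around grid[t]
    inside = np.abs(grid[:, None] - sample[None, :]) <= b / 2.0
    return inside.mean(axis=1) / b

def grid_overlap(x, y, b):
    """theta_hat = (1/T) * sum_t min{f_hat(x_t), g_hat(x_t)} on [0, 1].

    The grid is b/2, 3b/2, ..., 1 - b/2 with T = 1/b points (assumed integer).
    """
    T = int(round(1.0 / b))
    grid = (np.arange(T) + 0.5) * b
    f_hat = uniform_kde(x, grid, b)
    g_hat = uniform_kde(y, grid, b)
    return float(np.mean(np.minimum(f_hat, g_hat)))
```

Because the kernel windows at distinct grid points do not overlap, each observation contributes to exactly one grid point, which is what makes the summands approximately independent in the heuristic argument.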

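B.2 Computation of kernel constants

This note concerns the computation of $\sigma_0^2$ in cases where the sample sizes are not equal. We have to calculate
$$R(\rho) = \mathrm{cov}\left(\min\{Z_1, Z_2\},\ \min\left\{\rho Z_1 + \sqrt{1-\rho^2}\,Z_3,\ \rho Z_2 + \sqrt{1-\rho^2}\,Z_4\right\}\right)$$
for given $\rho$, where the $Z_i \sim N(0, v_i)$ are independent with $v_1 = v_3 = 1$ and $v_2 = v_4 = \omega$. We have
$$\min\{Z_1, Z_2\} = \tfrac{1}{2}(U - |V|), \qquad \min\left\{\rho Z_1 + \sqrt{1-\rho^2}\,Z_3,\ \rho Z_2 + \sqrt{1-\rho^2}\,Z_4\right\} = \tfrac{1}{2}(W - |Y|),$$
where
$$U = Z_1 + Z_2, \quad V = Z_1 - Z_2, \quad W = \rho(Z_1 + Z_2) + \sqrt{1-\rho^2}\,(Z_3 + Z_4), \quad Y = \rho(Z_1 - Z_2) + \sqrt{1-\rho^2}\,(Z_3 - Z_4),$$
and $(U, V, W, Y)^\top = \sqrt{1+\omega}\,(U_0, V_0, W_0, Y_0)^\top$, where $(U_0, V_0, W_0, Y_0)$ is jointly normal with mean zero, unit variances, and $\mathrm{corr}(U_0, W_0) = \mathrm{corr}(V_0, Y_0) = \rho$. We have
$$E\min\{Z_1, Z_2\} = -\tfrac{1}{2}E|V| = -\frac{\sqrt{1+\omega}}{2}E|V_0| = -\frac{\sqrt{1+\omega}}{2}\sqrt{\frac{2}{\pi}} = -\sqrt{\frac{1+\omega}{2\pi}}.$$
Therefore,
$$R(\rho) = \tfrac{1}{4}\mathrm{cov}(U - |V|,\ W - |Y|) = \frac{1+\omega}{4}\mathrm{cov}(U_0 - |V_0|,\ W_0 - |Y_0|)$$
$$= \frac{1+\omega}{4}\left[\mathrm{cov}(U_0, W_0) - \mathrm{cov}(|V_0|, W_0) - \mathrm{cov}(U_0, |Y_0|) + \mathrm{cov}(|V_0|, |Y_0|)\right] = \frac{1+\omega}{4}\left[\rho + \mathrm{cov}(|V_0|, |Y_0|)\right],$$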

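because for zero mean normals $\mathrm{cov}(|V_0|, W_0) = \mathrm{cov}(U_0, |Y_0|) = 0$. Write $|V_0| = V_0 1(V_0 > 0) - V_0 1(V_0 \le 0)$ and $|Y_0| = Y_0 1(Y_0 > 0) - Y_0 1(Y_0 \le 0)$, so that $\mathrm{cov}(|V_0|, |Y_0|) = E[|V_0||Y_0|] - E[|V_0|]E[|Y_0|]$ with
$$E[|V_0||Y_0|] = E[V_0Y_0 1(V_0 > 0)1(Y_0 > 0)] + E[V_0Y_0 1(V_0 \le 0)1(Y_0 \le 0)] - E[V_0Y_0 1(V_0 > 0)1(Y_0 \le 0)] - E[V_0Y_0 1(V_0 \le 0)1(Y_0 > 0)]$$
$$= 2E[V_0Y_0 1(V_0 > 0)1(Y_0 > 0)] - 2E[V_0Y_0 1(V_0 \le 0)1(Y_0 > 0)]$$
by symmetry. From Rosenbaum (1961, JRSS B) we have
$$E[V_0Y_0 1(V_0 > 0)1(Y_0 > 0)] = \rho F(\rho) + \frac{\sqrt{1-\rho^2}}{2\pi},$$
$$E[V_0Y_0 1(V_0 \le 0)1(Y_0 > 0)] = E[V_0Y_0 1(V_0 > -\infty)1(Y_0 > 0)] - E[V_0Y_0 1(V_0 > 0)1(Y_0 > 0)] = \frac{\rho}{2} - \rho F(\rho) - \frac{\sqrt{1-\rho^2}}{2\pi},$$
where $F(\rho) = \Pr[V_0 > 0, Y_0 > 0]$. Therefore, since $E[|V_0|]E[|Y_0|] = 2/\pi$,
$$\mathrm{cov}(|V_0|, |Y_0|) = 4\rho F(\rho) - \rho + \frac{2}{\pi}\sqrt{1-\rho^2} - \frac{2}{\pi}.$$
In the special case that $\rho = 1$ we have $F(\rho) = 1/2$, so that $\mathrm{cov}(|V_0|, |Y_0|) = E(|V_0|^2) - 2/\pi = 1 - 2/\pi$. In the special case that $\rho = 0$ we have $\mathrm{cov}(|V_0|, |Y_0|) = 2/\pi - 2/\pi = 0$, as expected. In conclusion,
$$R(\rho) = \frac{1+\omega}{4}\left[\rho + \mathrm{cov}(|V_0|, |Y_0|)\right] = \frac{1+\omega}{4}\left[4\rho F(\rho) + \frac{2}{\pi}\sqrt{1-\rho^2} - \frac{2}{\pi}\right] = (1+\omega)\left[\rho F(\rho) + \frac{\sqrt{1-\rho^2} - 1}{2\pi}\right].$$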
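The closed form for $R(\rho)$ can be evaluated directly, as in the following minimal Python sketch. It assumes the expression $R(\rho) = \frac{1+\omega}{4}[\rho + \mathrm{cov}(|V_0|, |Y_0|)]$ with $\mathrm{cov}(|V_0|, |Y_0|) = 4\rho F(\rho) - \rho + \frac{2}{\pi}(\sqrt{1-\rho^2} - 1)$, and it uses the standard orthant probability $F(\rho) = 1/4 + \arcsin(\rho)/(2\pi)$ for a standard bivariate normal, which is not stated explicitly in the text. The function names are hypothetical.

```python
import math

def orthant_prob(rho):
    """F(rho) = Pr[V0 > 0, Y0 > 0] for standard bivariate normals with correlation rho."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)

def cov_abs(rho):
    """cov(|V0|, |Y0|) = 4*rho*F(rho) - rho + (2/pi)*(sqrt(1 - rho^2) - 1)."""
    return (4.0 * rho * orthant_prob(rho) - rho
            + (2.0 / math.pi) * (math.sqrt(1.0 - rho * rho) - 1.0))

def R(rho, omega=1.0):
    """R(rho; omega) = (1 + omega)/4 * [rho + cov(|V0|, |Y0|)]."""
    return (1.0 + omega) / 4.0 * (rho + cov_abs(rho))
```

With equal variances ($\omega = 1$) and $\rho = 1$, the function returns $\mathrm{var}[\min\{Z_1, Z_2\}] = 1 - 1/\pi$, and at $\rho = 0$ it returns zero, matching the two special cases of the derivation.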

The main point is that this shows that $R(\rho; \omega) = \frac{1+\omega}{2}R(\rho; 1)$, so that we can calculate this quantity once and for all and apply it in situations where the sample sizes $m$ and $n$ are different.

References

[1] Anderson, G.J. (2004), "Toward an Empirical Analysis of Polarization," Journal of Econometrics 122, 1-26.
[2] Anderson, G.J. and Y. Ge (2004), "Do Economic Reforms Accelerate Urban Growth: The Case of China," Urban Studies.
[3] Anderson, G.J. and Y. Ge (2005), "The Size Distribution of Chinese Cities," Regional Science and Urban Economics.
[4] Anderson, G.J., Y. Ge and T.W. Leo (2006), "Distributional Overlap: Simple Multivariate Parametric and Non-Parametric Tests for Alienation, Convergence and General Distributional Differences," mimeo, Economics Department, University of Toronto. Forthcoming, Econometric Reviews (2009).
[5] Anderson, G.J. (2008), "Polarization of the Poor: Multivariate Poverty Measurement Sans Frontiers," mimeo, University of Toronto, Department of Economics.
[6] Andrews, D.W.K. (1995), "Nonparametric Kernel Estimation for Semiparametric Models," Econometric Theory.
[7] Andrews, D.W.K. (1999), "Estimation When a Parameter Is on a Boundary," Econometrica 67.
[8] Andrews, D.W.K. and P. Guggenberger (2007), "The limit of exact size and a problem with subsampling and with the m out of n bootstrap," Working paper.
[9] Beirlant, J. and D.M. Mason (1995), "On the Asymptotic Normality of $L_p$-Norms of Empirical Functionals," Math. Methods Statist. 4, 1-19.
[10] Berger, E. (1991), "Majorization, Exponential Inequalities and Almost Sure Behavior of Vector-valued Random Variables," Annals of Probability 19.
[11] Bhattacharya, R.N. (1975), "On Errors of Normal Approximation," Annals of Probability 3.

[12] Chernozhukov, V., H. Hong, and E. Tamer (2007), "Estimation and confidence regions for parameter sets in econometric models," Econometrica 75.
[13] Clemons, T.E. and E.L. Bradley (2000), "A nonparametric measure of the overlapping coefficient," Computational Statistics & Data Analysis 34, 51-61.
[14] Cuevas, A. and R. Fraiman (1997), "A plug-in approach to support estimation," Annals of Statistics 25.
[15] Duclos, J-Y., J. Esteban and D. Ray (2004), "Polarization: Concepts, Measurement and Estimation," Econometrica.
[16] Eddy, W.F. (1980), "Optimum kernel estimators of the mode," The Annals of Statistics 8.
[17] Esteban, J-M. and D. Ray (1994), "On the Measurement of Polarization," Econometrica 62.
[18] Giné, E., D.M. Mason, and A.Y. Zaitsev (2003), "The $L_1$-Norm Density Estimator Process," Annals of Probability 31.
[19] Giné, E. and A. Guillou (2002), "Rates of strong uniform consistency for multivariate kernel density estimators," Annals of the Institute of Henri Poincaré.
[20] Gustafsson, Bjorn, Li Shi and Terry Sicular (eds) (2007), Inequality and Public Policy in China, New York and Cambridge: Cambridge University Press.
[21] Holmström and Klemelä (1992), "Asymptotic bounds for the expected $L_1$ error of a multivariate kernel density estimator," Journal of Multivariate Analysis 42.
[22] Kac, M. (1949), "On deviations between theoretical and empirical distributions," Proc. Natl. Acad. Sci. 35.
[23] Kitagawa, T. (2009), "Testing for Instrument Independence in the Selection Model," Working paper, Brown University.
[24] Knight, John, Zhao Renwei and Li Shi (2001), "A spatial analysis of wages and incomes in urban China: divergent means, convergent inequality," in Carl Riskin, Zhao Renwei and Li Shi (eds), China's Retreat from Equality: Income Distribution and Economic Transition, Armonk, New York: M.E. Sharpe.
[25] Linton, O., E. Maasoumi, and Y-J. Whang (2005), "Consistent testing for stochastic dominance under general sampling schemes," Review of Economic Studies 72.

[26] Mason, D.M. and W. Polonik (2008), "Asymptotic Normality of Plug-In Level Set Estimates," Unpublished manuscript.
[27] Mizuno, S., T. Yamaguchi, A. Fukushima, Y. Matsuyama, and Y. Ohashi (2005), "Overlap coefficient for assessing the similarity of pharmacokinetic data between ethnically different populations," Clinical Trials, Vol. 2, No. 2, 174-181.
[28] Pinelis, I.F. (1994), "On a Majorization Inequality for Sums of Independent Random Variables," Statistics and Probability Letters 19.
[29] Prakasa Rao, B.L.S. (1983), Nonparametric Functional Estimation, Academic Press.
[30] Ricklefs, R.E. and M. Lau (1980), "Bias and dispersion of overlap indices: results of some Monte Carlo simulations," Ecology 61.
[31] Schmid, F. and A. Schmidt (2006), "Nonparametric estimation of the coefficient of overlapping - theory and empirical application," Computational Statistics and Data Analysis 50.
[32] Shergin, V.V. (1990), "The Central Limit Theorem for Finitely Dependent Random Variables," in: Proc. 5th Vilnius Conf. Probability and Mathematical Statistics, Vol. II, Grigelionis, B., Prohorov, Y.V., Sazonov, V.V., and Statulevicius, V. (Eds.).
[33] Wand, M.P. and M.C. Jones (1995), Kernel Smoothing, Chapman and Hall, Boca Raton.
[34] Weitzman, M. (1970), "Measures of Overlap of Income Distributions of White and Negro Families in the U.S.," Technical Paper, Bureau of the Census.
[35] Yitzhaki, S. (1994), "Economic Distance and overlapping of distributions," Journal of Econometrics.

C Tables

Table 1. Columns: $n$ and bandwidth $b$ (expressed in terms of $b_s$), followed by bias, mbias, std and iqr for $\hat\theta$ and for $\hat\theta^{bc}$. $b_s$ is the ROT bandwidth for density estimation, $1.84\,s\,n^{-1/5}$.

Table 2. Columns: $d$ and $n$, followed by bias, mbias, std and iqr for $\hat\theta$ and for $\hat\theta^{bc}$.

Table 3. Columns: Mean, Std Deviation, Median, Maximum, Minimum. Rows, for each of the four samples Guangdong (n=600), Shaanxi (n=500), Guangdong (n=595) and Shaanxi (n=546): Ln(Exp. p.c.), Ln(Area p.c.), Family size, Expend. p.c., Sq Meters p.c.

Table 4. Columns, for each of the subscripts $e$, $h$ and $eh$: $\hat\theta$, $se_{sc}$, $\hat\theta^{bc}$ and $se$. Rows: survey year (beginning with 1987) and bandwidth ($b_s$ or $b_s^{3/2}$).

