New Consistent Integral-Type Tests for Stochastic Dominance


New Consistent Integral-Type Tests for Stochastic Dominance

Job Market Paper

Chris J. Bennett

January 5, 2008

Abstract

This paper proposes and examines several new statistics for testing stochastic dominance of any pre-specified order. The test statistics, which are one-sided analogues of the classical Cramer-von Mises and Anderson-Darling statistics, are designed to test the null hypothesis of dominance and are consistent against the general alternative of nondominance. For orders of dominance greater than two, the tests are not distribution-free and hence we propose bootstrap methods for calibrating their critical values. Using Monte Carlo methods we compare the finite sample performance of these tests to one another and to alternative tests based on the one-sided Kolmogorov-Smirnov (KS) statistics of McFadden (1989) and Barrett and Donald (2003), and to the integral-type test recently proposed in Hall and Yatchew (2005). For tests of first-order dominance, the Anderson-Darling type statistic is shown to have power greater than these alternative tests and yet requires no more computational effort than the least computationally intensive alternative. At higher orders, the one-sided Anderson-Darling statistic is shown to be admissible and also remains attractive from a computational standpoint. We illustrate the use of these tests by applying them to a comparison of income distributions using data from the Canadian Family Expenditure Survey.

I thank the members of my thesis committee and participants of the 2007 Midwest Econometrics Meetings and UWO Econometrics Working Group for helpful comments. Additionally, I thank Peter Hall and Adonis Yatchew for making available their Fortran code. This work was made possible by the SHARCNET facilities. Financial support from the Social Sciences and Humanities Research Council of Canada is also gratefully acknowledged.

Department of Economics, Social Science Centre, The University of Western Ontario, London, Ontario, N6A 5C2, Canada. cbenne2@uwo.ca

1 Introduction

Stochastic dominance rules, which define (partial) orderings over a set of probability distribution functions, have attracted substantial interest in economics primarily due to the consistency of these orderings with those obtained by the expected utility criterion. For example, in the canonical case in which a distribution F is said to first-degree (weakly) stochastically dominate another distribution G if the graph of F never lies above the graph of G, denoted F D_1 G, it can be shown that

E_F[U(X)] \ge E_G[U(X)]  whenever  F D_1 G,

for any utility function satisfying U' > 0.^1 In words, first-order stochastic dominance of F over G implies that F is ranked superior to G by every monotonically increasing differentiable utility function. Second-degree stochastic dominance, introduced by Hadar and Russell (1969), defines a partial ordering over the space of distributions by comparing integrals of distributions. Formally, F is said to second-degree (weakly) stochastically dominate G if

\int_{-\infty}^{x} F(y) \, dy \le \int_{-\infty}^{x} G(y) \, dy

for all x in the union of the supports of the distributions. The cases of third-, fourth-, and more generally n-th degree stochastic dominance have since been introduced (see, for example, Whitmore (1989) for the development of n-th degree stochastic dominance). Fishburn (1976) proves that in each instance there exists a class of utility functions for which there is a one-to-one correspondence between the ranking implied by stochastic dominance (SD) and that implied by the expected utility criterion. Atkinson (1970) and Shorrocks (1983) establish a similar correspondence between SD orderings and those obtained for entire classes of social welfare functions. Moreover, Atkinson (1987) and Foster and Shorrocks (1988a,b,c) extend these ideas to the measurement of poverty, showing that stochastic dominance orderings are consistent with the ranking implied by a large class of poverty indices for any given poverty line. It is precisely this correspondence between the respective orderings which obviates the need for a full parametric specification of preferences or poverty indices and makes stochastic dominance particularly attractive in applications ranging from the ranking of investment alternatives to the ranking of income distributions in terms of poverty, inequality, or social welfare.^2

^1 The concept of stochastic dominance and its relation to expected utility originated in the work of Lehmann (1955) and Quirk and Saposnik (1962).
^2 See e.g. Levy (1992, 2006) for an extensive review of SD with an emphasis on applications to finance, and Davies et al. (1998), Cowell (1998), and Davidson and Duclos (2000) for a review of SD and its role in applications to poverty, inequality, and social welfare.

Despite their theoretical appeal, all orderings based on stochastic dominance suffer from the practical limitation that population cdfs are unobserved and therefore comparisons must be made based on estimated distributions. When the parametric form of the distributions under consideration is assumed, an efficient testing procedure can be developed in terms of the parameter inequalities that fully characterize stochastic dominance between the distributions. On the other hand, when one wishes to avoid making parametric assumptions concerning the distributions under study, as is often the case, considerable challenges arise in the development of formal statistical procedures due to the fact that the resulting comparison is of two functions, and hence involves a test of an infinite dimensional parameter. The difficulties inherent in such tests, combined with the desire to relax parametric assumptions, have motivated considerable research into nonparametric procedures for testing stochastic dominance.^3

In this paper we contribute to this literature by proposing new nonparametric tests for any pre-specified order of stochastic dominance. Specifically, we propose integral-type tests for the hypothesis that stochastic dominance holds between a given pair of distributions, e.g. F weakly dominates G.^4 The statistics are one-sided counterparts of the classical Cramer-von Mises and Anderson-Darling statistics (see Pettitt (1976) and references therein) and as such are computed using a measure of the area between the respective curves over the regions for which the curve associated with F lies above the curve associated with G. Intuitively, this area can be regarded as the sample evidence against the hypothesis of dominance.

Consideration of the integral-type tests in the context of dominance testing is in large part motivated by findings in the related literature on goodness-of-fit testing, where it is shown that tests which measure the difference over a range, rather than the maximum difference at a single point as is the case with the popular Kolmogorov-Smirnov (KS) tests, often exhibit greater power (see e.g. Stephens (1974), Kendall and Stuart (1979), Stephens (1986), and Schmid and Trede (1995)). Indeed, as we demonstrate in this paper, the basic results on the relative power of various statistics from the goodness-of-fit literature appear to extend to the context of dominance testing.

^3 Work along these lines includes, among others, Beach and Richmond (1985), McFadden (1989), Kaur, Prakasa Rao, and Singh (1994), Anderson (1996), Barrett and Donald (2003), Linton et al. (2005), and Davidson and Duclos (1997, 2000, 2006).
^4 Tests with dominance under the null have been criticized by Levy (1992), Fisher et al. (1996), and Davidson (2006) for their inability to reach conclusive alternatives when the null is rejected. These tests, however, can be combined to yield a bidirectional test capable of distinguishing between equality, dominance (in either direction), and non-comparability (i.e. crossing of the respective curves). Moreover, the power of these one-sided tests, which is the focus of this paper, plays an important role in the power and hence misclassification rates of the bidirectional test; see Section 3.2 for more details.

In contrast to our statistics, the integral-type tests for stochastic dominance of Deshpande and Singh (1985) and Eubank, Schechtman, and Yitzhaki (1993) are computed as the area between the curves over the entire support of the distributions. In their tests, the statistic can be zero in the population despite the null of dominance being false, and hence modifying the range of integration proves to be crucial for obtaining consistency against the more general alternative of nondominance. Hall and Yatchew (2005) also propose a one-sided integral statistic in the context of testing for stochastic dominance which does satisfy consistency against the general alternative of nondominance; however, in addition to consistency, our tests are shown to be capable of yielding higher power than any other consistent testing procedure we know of within this testing framework, and are no more computationally intensive than the least costly alternative.

While stochastic dominance has been applied in many areas of economics and finance, our focus in this paper is on the application of dominance orderings for the comparison of income, wealth, and earnings distributions for the purpose of constructing social welfare and poverty rankings. Throughout the paper we assume that the samples under consideration are mutually independent and generated by continuous distributions with compact support. To be consistent with most applied settings, we also allow for the samples to be of different sizes. Our theoretical framework is thus identical to that considered by Barrett and Donald (2003). We note that the recent literature on testing for stochastic dominance using the Kolmogorov-Smirnov-type statistic has been extended to the case of temporally and mutually dependent samples (e.g. Linton et al., 2005) and to non-compact supports (e.g. Horváth et al., 2006). The theory that we develop here on integral-type statistics can also be generalized in these directions, but such an exercise is beyond the scope of this paper.

Before concluding this section we note the connection between tests of stochastic dominance and of distributional hypotheses in the treatment effects literature. In particular, our tests are also applicable to assessing the distributional consequences of a treatment on some outcome variable of interest when treatment intake is (possibly) non-randomized but there is a binary instrument available to the researcher. Abadie (2002) shows how instrumental variable methods can be used to estimate the counterfactual cumulative distribution functions of the outcome with and without the treatment. Based on these results we can test distributional hypotheses of the treatment effects simply by utilizing these counterfactual distributions in our testing procedure.

The remainder of the paper is organized as follows. In the next section we provide a brief review of the literature on statistical inference for stochastic dominance and select results from the related literature on the measurement of poverty and income inequality. In sections 3 through 5, we present our tests for stochastic dominance, develop the relevant asymptotic theory, and discuss bootstrap procedures for calibrating the tests. Section 6 offers some Monte Carlo evidence concerning the finite sample properties of the tests. In section 7 we apply the tests to Canadian income data from the Survey of Family Expenditures, and show that our proposed test is able to resolve more dominance comparisons than the popular KS test. Lastly, we discuss some possible extensions of our work and conclude in section 8.

2 Inference for Stochastic Dominance: Literature Review

The desire to avoid making parametric assumptions and the difficulties that inherently arise in testing hypotheses about infinite dimensional parameters, i.e. functions, have motivated considerable research into statistical procedures for testing stochastic dominance. Differing approaches to the formulation of an appropriate null and alternative hypothesis, combined with advances in statistical methods for infinite parameter problems, have together resulted in what is now quite a sizeable literature. In this section we provide a brief review of some of the many contributions, with particular attention being given to tests based on functionals of the distribution functions under consideration.

The following notation will prove useful. Let F^{(s)} denote the s-th integral of a distribution F; that is,

F^{(s)}(x) = \int_{-\infty}^{x} F^{(s-1)}(y) \, dy,   s \ge 2,   (1)

with the convention that F^{(1)} = F. The rule for s-th degree stochastic dominance can now be stated in terms of this iterated integral as follows:

Definition 1. For s \ge 1, F degree-s (weakly) stochastically dominates G if F^{(s)}(x) \le G^{(s)}(x) for all x \in S.

Throughout this paper we will write F D_s G to signify that distribution F s-th degree stochastically dominates distribution G. Using an inductive argument involving

integration by parts, it can be shown that (1) can be written in the equivalent form

F^{(s)}(x) = \frac{1}{(s-1)!} \int_{-\infty}^{x} (x - y)^{s-1} \, dF(y).   (2)

This reformulation, which can be traced back to Fishburn (1976), shows that F^{(s)} can be viewed as a statistical functional of F and therefore suggests using as an estimator of F^{(s)}(x) the right-hand side of equation (2) with F replaced by a suitable parametric or nonparametric estimator \hat{F}. Note that by choosing \hat{F} to be the nonparametric empirical distribution, (2) reduces at each point x to a simple sample average.

Roughly speaking, the literature on dominance testing divides into four categories: the first divide occurs between tests that are formulated with a null hypothesis of dominance and those that posit a null of nondominance. While the former is the more common approach in the stochastic dominance literature (see, for instance, McFadden (1989) and Davidson and Duclos (2000)), testing with nondominance under the null is proposed in Kaur, Prakasa Rao, and Singh (KPS) (1994) and is advocated more recently in Davidson and Duclos (2006). Further, among tests with dominance under the null (and similarly for the nondominance specification), the literature divides between test statistics that are constructed based on a finite grid comparison, as in, among others, Anderson (1996) and Davidson and Duclos (2000), and those that are constructed based on a real-valued functional whose value is zero for all distributions in the null and strictly positive for all distributions in the alternative; see for example the Kolmogorov-Smirnov type tests of McFadden (1989) and Barrett and Donald (2003). The former have distributions which are easier to characterize but also introduce the possibility of test inconsistency as they impose only a subset of the restrictions under the null. The latter tests, on the other hand, have the advantage of being consistent against general alternatives but have analytically intractable limiting distributions and therefore generally require simulation or bootstrap methods for estimating the appropriate critical values.

The KPS test is based on a null of nondominance. Defining \hat{F}^{(s)}(x) to be the obvious estimator of (2) obtained through integration with respect to the edf of the sample, it follows that in the least favourable case to the null

Z_{m,n}(x) = \frac{ \hat{F}^{(s)}(x) - \hat{G}^{(s)}(x) }{ \sqrt{ \widehat{Var}( \hat{F}^{(s)}(x) - \hat{G}^{(s)}(x) ) } } \xrightarrow{d} N(0, 1)

for all x. The KPS decision rule is then to reject the null if the infimum of Z_{m,n}(x) over the support exceeds the \alpha-level critical value associated with the standard normal distribution. The resulting test is conservative. Also, it is possible to have a distribution dominate another distribution almost everywhere and to fail to reject the null hypothesis, for any sample size.
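As an aside on computation, the estimator \hat{F}^{(s)} appearing above is simple to evaluate: replacing F in (2) with the empirical distribution turns the integral into a sample average. The following sketch (Python, purely illustrative; the function name and arguments are our own and are not taken from the paper or its Fortran code) evaluates this estimator on a grid of points.

import math
import numpy as np

def integrated_ecdf(sample, x, s=1):
    # Estimate F^(s)(x) by replacing F in (2) with the empirical distribution:
    # the average of (x - X_i)^(s-1) * 1{X_i <= x}, divided by (s-1)!.
    sample = np.asarray(sample, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    diff = x[:, None] - sample[None, :]
    terms = np.where(diff >= 0.0, diff ** (s - 1), 0.0)
    return terms.mean(axis=1) / math.factorial(s - 1)

For s = 1 this is just the empirical cdf; for s = 2 it estimates the integrated cdf used in second-degree comparisons.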

Anderson (1996) develops an analogue of the Pearson goodness of fit test. The test is developed by partitioning the support into k cells and considering statistics derived from the k x 1 vector Z, where

Z_j = \frac{1}{m} \sum_{i=1}^{m} 1\{X_i \le s_j\} - \frac{1}{n} \sum_{i=1}^{n} 1\{Y_i \le s_j\},   j = 1, ..., k.

On the boundary of the null of dominance, Z is an asymptotically mean-zero multivariate normal vector. Defining M to be a k x k lower triangular matrix of ones, a test for first-order dominance is based on the k x 1 statistic MZ. Following the convention in Bishop, Chakraborti, and Thistle (1989), Anderson infers dominance only if at least one of the k statistics is significant and no other statistic is both significant and of the opposite sign. Tests for higher orders of dominance are obtained also by premultiplying Z by appropriately defined matrices. The critical values for determining significance are obtained from the Studentized Maximum Modulus distribution. A drawback of the Anderson test is that the choice of an appropriate partition is not obvious. Also, the test, being based on a probabilistic inequality, is conservative.

In one of the earliest papers on testing for stochastic dominance, Deshpande and Singh (1985) propose an integral-type statistic to test

H_0: the sampled distribution equals F, against H_1: the sampled distribution stochastically dominates F in the second-order sense.

The test is of the one-sample variety (i.e. F is assumed known), and the particular statistic proposed is

D_n = \int d_n(x) \, dF(x),   where   d_n(x) = \int_{-\infty}^{x} ( F_n(t) - F(t) ) \, dt.

Their statistic is not distribution-free, but DS show an appropriately scaled and centered version of it to be asymptotically normally distributed. Eubank, Schechtman, and Yitzhaki (1993) extend the work of DS to the two-sample case. Their statistic takes the form

D_{nm} = \frac{1}{2} \left( \int d_{nm}(x) \, dG_m(x) + \int d_{nm}(y) \, dF_n(y) \right),

where d_{nm} is the same as before except with F replaced by the edf G_m(t). An unattractive feature of these integral tests, and perhaps the reason they have been largely abandoned, is the potential inconsistency against the less restrictive and

more attractive alternative of nondominance. In other words, the property of consistency is lost under the more desirable formulation with dominance under the null and nondominance as the alternative. In fact, Schmid and Trede (1998) demonstrate explicitly the existence of a distribution in the more general alternative of nondominance for which D_{mn} is asymptotically zero.

Concentrating on tests for first and second order dominance in the case where both c.d.f.s are unspecified, McFadden (1989) proposes a one-sided Kolmogorov-Smirnov-type (KS) statistic for testing

H_0: F^{(s)} \le G^{(s)}, s = 1, 2, against H_1: not H_0.

In particular, for s = 1, 2, the statistic is

\sqrt{n} \, \sup_x \left( F_n^{(s)}(x) - G_n^{(s)}(x) \right),

where the supremum is taken over all points in the common support which, in his paper, is assumed to be the interval [0, 1]. Although technically speaking the test involves a comparison over all points in the support and hence at an infinite number of points, the test is feasible to implement since the supremum must occur at one of the observed points in the pooled sample, and thus at most 2n points need to be checked in order to verify its location. McFadden shows that the test for first-order dominance is distribution free and provides both exact and asymptotic distributional results. A simulation procedure is suggested for obtaining critical values in the case of second-order stochastic dominance. Schmid and Trede (1998) provide what is essentially a more careful treatment of the asymptotics of McFadden's test for second-order stochastic dominance, albeit in the one-sample setting. In particular, the authors provide a rigorous derivation of the (analytically intractable) limiting distribution of the test statistic as well as a proof of consistency against the general alternative of nondominance. It is interesting to note that this work appears to have developed as a separate strand of literature, independent of the contributions made by McFadden.

Barrett and Donald (2003) extend the original work of McFadden to testing for stochastic dominance at all orders. The formulation of the null and alternative are the same as in McFadden (1989) and the statistics again take the form of a one-sided Kolmogorov-Smirnov statistic. The distributions for tests above first-order are not distribution free. Consequently, BD propose both simulation and bootstrap methods for the estimation of critical values. Consistency of the tests for any pre-specified order of dominance is also established.

Also building on the test of McFadden (1989), Klecan et al. (1991) propose a test for the hypothesis that the random variables or prospects in a given set are first-degree (second-degree, resp.) maximal, i.e. that no prospect in the set is first-degree (second-degree, resp.) weakly stochastically dominated by another prospect in the set. The test statistic is based on the fact that

d = \min_{i \ne j} \sup_x [ F_i(x) - F_j(x) ] > 0

if the prospects are first-degree stochastically maximal and

s = \min_{i \ne j} \sup_x [ F_i^{(2)}(x) - F_j^{(2)}(x) ] > 0

if the prospects are second-degree stochastically maximal. The corresponding test statistics are the sample analogues, which are merely multivariate versions of the standard one-sided Kolmogorov statistics.

The KS-type statistics have proven to be quite popular and the papers cited above have since been extended in several directions. Horváth et al. (2006) tackle the assumption of compact supports and show that the general features of the KS-type tests can be maintained without the compactness assumption by introducing an appropriately defined weight function. Linton et al. (2005) generalize these results substantially in the K-prospect case by considering tests of (residual) stochastic dominance that allow for general dependence amongst the prospects, and for the observations to be non-i.i.d. The authors of the latter paper also propose an innovative subsampling procedure for calibrating the tests which does not impose the least favourable case.

3 Hypotheses

In most practical situations it is unreasonable to assume perfect knowledge, or even the parametric form, of either of the distributions being compared. Instead, comparisons must be made when only empirical distributions are observed. In this section we develop new nonparametric statistical procedures to test the hypothesis that stochastic dominance holds between a given pair of distributions. Throughout this section we consider tests based on a null of dominance; that is, if F and G are the distributions under consideration, we assume under the null either that F D_j G or that G D_j F. Dominance under the null is the conventional approach to statistical tests of SD (see, for example, McFadden 1989 and Anderson 1996). The advantage of such a test for j \ge 2 is that rejection of, say, F D_j G leads to a rejection of F D_i G for all i \le j, therefore allowing us to rule out dominance of F over G at any lower

order. The disadvantage is that rejection of the null does not lead to any conclusive alternative. It could be the case, for instance, that rejection of the null implies that the distribution is dominated. Equally plausible, however, is that the distributions are not comparable. This would be the case whenever the population functions cross at some point. In principle, the role of F and G can be reversed and the resulting test can be combined with the original and used to distinguish between these alternatives. Alternatively, a test procedure and a corresponding decision rule can be developed to try and infer both if dominance is present and the direction of dominance. Such a strategy is taken in Bishop, Formby, and Thistle (1992) and in Knight and Satchell (2006), and is also used implicitly in Anderson (1996) and Barrett and Donald (2003). We discuss the so-called bidirectional test and the connection to our proposed tests later in this section.

3.1 Testing when dominance is assumed under the null

Let X and Y be nonnegative random variables, and let F and G denote the distributions of X and Y, respectively. We make the following assumptions concerning the distributions F and G.

Assumption 1. (i) F and G have common support S; (ii) F and G are continuous functions on S.

We assume that the common support S is the interval [0, 1]. As long as the random variables are bounded they can be shifted and rescaled to lie in the unit interval without loss of generality. The compact support assumption will be necessary here for the integrability of the test statistics for stochastic dominance above first-order. This assumption can be relaxed by introducing an appropriate weight function into the integral; however, we leave such considerations to future work. For several tests of F D_1 G that we propose the compactness assumption can be relaxed and indeed we can take S = R. Unless otherwise stated explicitly, however, Assumption 1(ii) is assumed throughout the paper. Thus, our assumption on the support is consistent with that of McFadden (1989) and Barrett and Donald (2003).

The hypothesis of F D_s G is equivalent to F^{(s)}(x) \le G^{(s)}(x) for every x \in [0, 1]. The hypothesis is compound since F D_s G is true for many distributions F with G fixed; and the probability of rejecting the null hypothesis when it is true is greatest in the limiting case F = G. Following McFadden (1989) we define

the significance level of the test to be the supremum of the rejection probabilities over all cases satisfying the null. This has the effect of making the null hypothesis

H_0: F^{(s)}(x) \le G^{(s)}(x) for all x \in [0, 1],

against

H_1: F^{(s)}(x) > G^{(s)}(x) for some x \in [0, 1],

with the significance level equal to the probability of rejecting H_0 when F = G. Let P denote the set of all continuous distributions on S, and define the sets

P_0^{(s)} = { (F, G) \in P x P : F^{(s)}(x) \le G^{(s)}(x) for all x \in S }

and P_1^{(s)} = (P x P) \setminus P_0^{(s)}. Here P_0^{(s)} denotes the set of all distribution pairs (F, G) such that F D_s G holds, and P_1^{(s)} denotes the set of ordered pairs (F, G) such that we have nondominance, i.e. F does not s-th degree stochastically dominate G.

To simplify the formulation of the hypothesis it is natural to seek a functional \theta : P x P \to R_+ that satisfies \theta(F, G) = 0 for all (F, G) \in P_0^{(s)} and \theta(F, G) > 0 otherwise. Having defined any such functional, the hypothesis of dominance can be reformulated as

H_0^{(s)}: \theta(F, G) = 0, against H_1^{(s)}: \theta(F, G) > 0.

If, say, we define

\theta_{KS}^{(s)}(F, G) = \sup_x \left( F^{(s)}(x) - G^{(s)}(x) \right),   (3)

then we recover the functional considered in McFadden (1989) and Barrett and Donald (2003) as a measure of the discrepancy or distance from the null hypothesis of dominance. In this paper we propose basing tests on the alternative class of discrepancy measures given by

\theta_{CV}^{(s)}(F, G) = \int_S \left( F^{(s)}(x) - G^{(s)}(x) \right)_+^2 \, \psi(W(x)) \, dW(x),   (4)

where (x)_+ \equiv \max\{x, 0\}, W : S \to R is a monotonically increasing bounded differentiable function, and \psi(t) is a continuous nonnegative weight function. Intuitively, \theta_{CV}^{(s)}(F, G) provides a measure of the total area between the curves F^{(s)} and G^{(s)} over the region for which F^{(s)}(x) \ge G^{(s)}(x). Under the null hypothesis, in which F^{(s)}(x) \le G^{(s)}(x) for all x \in S, this area is of course zero, whereas under the alternative it is strictly positive.^5

We point out that if we set W(x) = x, then (4) reduces to a simple Riemann integral with weight function \psi(x), which is among the general class of functionals considered in Hall and Yatchew (2005).

^5 We note that functionals other than the squared difference between the distributions could also be considered here.

In particular, if we select \psi(x) = 1, then we recover the exact functional that is investigated in their paper in the context of testing for stochastic dominance. It follows, therefore, that our class of discrepancy measures is a generalization of the Hall and Yatchew (2005) class. As we demonstrate in this paper, this generalization leads to statistics that possess greater power than either the Kolmogorov-Smirnov test of McFadden (1989) or the integral-type test of Hall and Yatchew (2005),^6 and yet are no more computationally demanding than the least computationally intensive alternative.

3.2 Simple bidirectional tests

The hypotheses as formulated above are ones in which F is assumed to (weakly) stochastically dominate G under the null. This formulation is common in the literature on testing for stochastic dominance, yet it is generally more desirable to design the null and alternative in such a way that the hypothesis which you are trying to confirm, i.e. dominance, is the alternative hypothesis. In a recent paper, Davidson and Duclos (2006) propose a test based on the empirical likelihood where nondominance is the null hypothesis and dominance, i.e. F D_j G, is the alternative. Their test, like virtually all tests proposed in the literature, presupposes a hypothesis about the direction of dominance. In practice, however, there may be no reason a priori to believe that a particular distribution dominates another and hence it may be more desirable to consider testing for dominance in both directions. It is the general problem of bidirectional testing and its connection to the one-sided tests that we now discuss.

In principle a bidirectional test for dominance can be performed by switching the role of each distribution in a given test of unidirectional stochastic dominance. Specifically, one performs separate tests of both F D_j G and G D_j F and then classifies the relationship between F and G, i.e. as that of equality, dominance, or incomparability, based on the outcomes. Such a procedure is proposed in Bishop, Formby, and Thistle (1992) and also in Knight and Satchell (2006). These tests, however, do not control for the increased likelihood of rejecting equality that results from ignoring the multiplicity of the test. In other words, by performing the tests independently of one another we can expect the probability of rejecting at least one of the hypotheses to exceed the size of the individual tests, and hence the probability of misclassifying a result as, say, that of dominance in a given direction when in fact equality holds, is greater than the nominal size of the individual tests; see Chapter 9 of Lehmann and Romano (2005) for more on multiple testing problems.

^6 To be fair, the focus of Hall and Yatchew (2005) is not on tests for stochastic dominance, but rather on the more general theory of testing functional hypotheses.

An alternative formulation of the bidirectional test that allows us to better control the misclassification error rates is the sequential test proposed in Bennett (2007). Using the notation of the previous section, define \theta_1 = \theta^{(s)}(F, G) and \theta_2 = \theta^{(s)}(G, F), where \theta is assumed to be either of the functionals discussed in the previous section. Further, define the parameters \gamma_1 = \max\{\theta_1, \theta_2\} and \gamma_2 = \min\{\theta_1, \theta_2\}. Clearly, \gamma_2 \le \gamma_1. Note that only in the case where F = G do we have \gamma_1 = 0. Moreover, when the population curves cross it must be the case that \gamma_2 > 0. In the intermediate cases the combined values of \gamma_1 and \gamma_2 can be used to distinguish among the alternatives F D_s G and G D_s F. The values of \gamma_1 and \gamma_2 can thus be used to formulate a sequential bidirectional test as follows. First, we test

H_0: (Equality) \gamma_1 = 0 against H_1: H_{1a} \cup H_{1b} \cup H_{1c},

where

H_{1a}: (G D_s F) \gamma_1 > 0, \gamma_2 = 0, \theta_1 > \theta_2
H_{1b}: (F D_s G) \gamma_1 > 0, \gamma_2 = 0, \theta_2 > \theta_1
H_{1c}: (Incomparable) \gamma_2 > 0.

If we fail to reject the null hypothesis, then there is no statistical evidence to reject equality of the distributions and hence no reason to pursue further testing for dominance. If however we reject the null of equality, then we proceed to test

H_0: \gamma_2 = 0 against H_1: \gamma_2 > 0.
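As a rough illustration of how the two stages combine, the following sketch (Python; a hypothetical helper of our own, since the paper defers the procedural details to Bennett (2007)) maps the outcomes of the two tests and the estimated discrepancies to a classification.

def classify_bidirectional(reject_equality, reject_gamma2, theta1_hat, theta2_hat):
    # Stage 1 tests gamma_1 = 0 (equality); stage 2 tests gamma_2 = 0.
    # theta1_hat estimates theta(F, G), the evidence against "F dominates G";
    # theta2_hat estimates theta(G, F), the evidence against "G dominates F".
    if not reject_equality:
        return "no evidence against equality of F and G"
    if reject_gamma2:
        return "curves cross: F and G are not comparable"
    # gamma_1 > 0 with gamma_2 = 0 not rejected: one-directional dominance,
    # in the direction whose estimated discrepancy is (near) zero.
    return "G dominates F" if theta1_hat > theta2_hat else "F dominates G"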

By considering the combination of rejection in the first stage and rejection/acceptance in the second it is possible to distinguish between the various alternatives of interest. Of course, for any given test statistic the results of the first and second tests will generally be correlated with one another and this correlation needs to be accounted for in order to properly control the size of the tests. Bennett (2007) proposes a bootstrap procedure that accounts for the possible correlation and thus enables control of the size of the second test conditional upon the rejection of the first. The resulting procedure provides greater control over the misclassification error rates of the overall test.

Instead of pursuing the details of either bidirectional test further, we instead refer the reader to the aforementioned papers and emphasize that \theta(F, G), and hence the statistic used to estimate \theta(F, G), will enter into any such procedure. More importantly, the power of the unidirectional statistic to detect departures from the boundary of the null of dominance will play a crucial role in determining the misclassification error rates of any bidirectional testing procedure of the nature discussed in this section. These considerations highlight the importance of developing powerful one-directional tests for dominance, and it is precisely the suitable choice of \theta(F, G) and a suitable estimator that we return our attention to for the remainder of the paper.

4 Test Statistics and their Asymptotics

Let X_1, ..., X_m be a random sample from F(x) and Y_1, ..., Y_n be an independent random sample from G(x). Both F(x) and G(x) are unknown. We will require the following assumptions on the distributions and the sampling process:

Assumption 2. The sampling process is such that m/N \to \lambda \in (0, 1), where N = n + m.

Under the conditions of Assumption 1, the empirical distribution functions of the X and Y samples, denoted F_m and G_n, are unbiased and consistent estimators of the respective distribution functions. Moreover, Assumptions 1 and 2 together imply that the pooled empirical distribution

H_N = \frac{1}{N} \{ m F_m + n G_n \}

consistently estimates the pooled distribution function H(x) = \lambda F(x) + (1 - \lambda) G(x). Note that on the boundary of the null where F = G, the pooled distribution H

simplifies to F. Furthermore, it is also well known that

F_m^{(s)}(x) = \frac{1}{(s-1)!} \int_{-\infty}^{x} (x - y)^{s-1} \, dF_m(y)   (5)
           = \frac{1}{m(s-1)!} \sum_{i=1}^{m} (x - X_i)^{s-1} \, 1(X_i \le x).   (6)

An analogous expression is of course available for G_n with the Y sample in place of the X sample. These results, when taken together, suggest using as empirical analogues of the discrepancy measure \theta_{CV}^{(s)} statistics of the form

\hat{\theta}_{CV}^{(s)} = \frac{nm}{N} \int_S \left( D_{mn}^{(s)}(x) \right)_+^2 \, \psi(H_N(x)) \, dH_N(x),

where D_{mn}^{(s)}(x) = F_m^{(s)}(x) - G_n^{(s)}(x). For the weight function we consider \psi(x) = [x(1 - x)]^{-q} for values of q \ge 0. Specifically, we consider the case q = 0, which results in the statistic

C_{mn}^{(s)} = \frac{nm}{N} \int_S \left( D_{mn}^{(s)}(x) \right)_+^2 \, dH_N(x);

and we also treat the cases q \in (0, 2], for s \ge 1. For the cases in which q > 0, we define

A_{mn}^{(s)} = \frac{nm}{N} \int_S \frac{ \left( D_{mn}^{(s)}(x) \right)_+^2 }{ [ H_N(x)(1 - H_N(x)) ]^{q} } \, dH_N(x).

The integrand in A_{mn}^{(s)} does not exist at the largest observation in the combined sample. Consequently, we define the integrand to be zero at the largest observation. The statistics C_{mn}^{(s)} and A_{mn}^{(s)} are simply Riemann-Stieltjes integrals of the (weighted) difference function with respect to the pooled empirical distribution function, and can thus be seen to be straightforward one-sided counterparts of the two-sample Cramer-von Mises and Anderson-Darling statistics (c.f. Anderson (1962) and Pettitt (1976)). Since these two statistics use as their weight function the e.d.f. of the ordered and pooled sample, the integral can be rewritten as a simple average and the statistic can be computed with a single pass through the sample data. Specifically, the expressions for C_{mn}^{(s)} and A_{mn}^{(s)} when s = 1 (and, for A, q = 1) simplify to (see Appendix for details)

C_{mn}^{(1)} = \frac{1}{mnN^2} \sum_{i=1}^{N} \left( M_i N - m i \right)_+^2,

A_{mn}^{(1)} = \frac{1}{mn} \sum_{i=1}^{N-1} \frac{ \left( M_i N - m i \right)_+^2 }{ i(N - i) },

where, borrowing from Pettitt (1976), M_i is defined as the number of X's less than or equal to the i-th smallest observation in the combined sample, that is, M_i = m F_m( H_N^{-1}(i/N) ), where H_N^{-1}(t) = \inf\{ x : H_N(x) \ge t \}.
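For s = 1 both statistics can indeed be computed in a single pass over the pooled sample. The sketch below (Python; illustrative only, with our own function name, and following the M_i formulas above) does exactly that.

import numpy as np

def first_order_statistics(x_sample, y_sample):
    # One-sided CvM-type (C) and AD-type (A, with q = 1) statistics for s = 1,
    # computed from M_i = #{X's <= i-th smallest pooled observation}.
    x = np.sort(np.asarray(x_sample, dtype=float))
    y = np.asarray(y_sample, dtype=float)
    m, n = len(x), len(y)
    N = m + n
    pooled = np.sort(np.concatenate([x, y]))
    M = np.searchsorted(x, pooled, side="right")   # M_i for i = 1, ..., N
    i = np.arange(1, N + 1)
    d_plus = np.maximum(M * N - m * i, 0.0)        # positive part: only regions with F_m > G_n contribute
    C1 = np.sum(d_plus ** 2) / (m * n * N ** 2)
    j = i[:-1]                                     # drop the largest observation, where the AD weight is undefined
    A1 = np.sum(d_plus[:-1] ** 2 / (j * (N - j))) / (m * n)
    return C1, A1

Because (M_i N - m i)/(mn) is the value of F_m - G_n at the i-th pooled order statistic, the positive part discards exactly the regions where G_n lies above F_m.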

For use in comparison we also consider the statistics

S_{mn}^{(s)} = \sqrt{\frac{nm}{N}} \, \sup_{x \in S} \left( D_{mn}^{(s)}(x) \right)

and

B_{mn}^{(s)} = \left( \frac{nm}{N} \right)^{p/2} \int_S \left( D_{mn}^{(s)}(x) \right)_+^{p} \, dx,   for p = 1, 2.

The statistic S_{mn}^{(s)} is that considered by McFadden (1989) and Barrett and Donald (2003); and when p = 2 the statistic B_{mn}^{(s)} is none other than that considered in Hall and Yatchew (2005). In the case of this latter statistic B_{mn}, the empirical distribution function that is used in the other integral statistics is replaced with the identity function and we obtain the standard Riemann integral.

Not surprisingly, the statistic B^{(s)} is more computationally intensive. With the aid of (5) the desired integral can be written as a simple sum. Formally, define I to be the collection of maximal intervals on which the difference function is positive, that is,

I = { (l, u) \subset S : D_{mn}^{(s)}(x) > 0 for all x \in (l, u), and for every \epsilon > 0 there exists y \in (l - \epsilon, u + \epsilon) with D_{mn}^{(s)}(y) < 0 },

and let C(I) denote the cardinality of I. Certainly, for N finite, C(I) is also finite. Letting I_i = (l_i, u_i) denote the i-th interval in I, in the case p = 1 we can write

B_{mn}^{(s)} = \sqrt{\frac{nm}{N}} \sum_{i=1}^{C(I)} \left( D_{mn}^{(s+1)}(u_i) - D_{mn}^{(s+1)}(l_i) \right).

We have a closed-form expression for the summand, and hence the computation of the statistic in this case is rather straightforward once the bounds of integration are identified. Note that when p = 2 no such closed form is possible and we must resort to numerical integration. The remaining difficulty, therefore, is the identification of the bounds of integration, i.e. the l_i and u_i. These points, however, can be identified with a single pass through the pooled sample. It can be shown that at most a single root of the function D_{mn}^{(s)}(x) can occur between any two adjacent points in the pooled sample. One pass through the sample provides a simple means of bracketing the roots, and a call to a numerical root finder such as the rtbis Fortran routine of Press et al. (1992) yields the desired estimate of the root location.
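The following sketch (Python with SciPy; purely illustrative, with our own function names, standing in for the Fortran implementation described above) brackets the sign changes of D^{(s)} between adjacent pooled observations, locates them with a standard root finder, and applies the p = 1 closed form for s >= 2.

import numpy as np
from math import factorial
from scipy.optimize import brentq

def _int_ecdf(sample, x, s):
    # s-fold integrated empirical cdf, i.e. (2) with F replaced by the edf.
    diff = np.atleast_1d(np.asarray(x, dtype=float))[:, None] - np.asarray(sample, dtype=float)[None, :]
    vals = np.where(diff >= 0.0, diff ** (s - 1), 0.0).mean(axis=1) / factorial(s - 1)
    return vals if np.ndim(x) else float(vals[0])

def b_statistic_p1(x_sample, y_sample, s=2):
    # B_mn^(s) with p = 1 for s >= 2: sum D^(s+1)(u_i) - D^(s+1)(l_i) over the
    # maximal intervals (l_i, u_i) on which D^(s) is positive.
    m, n = len(x_sample), len(y_sample)
    N = m + n
    def D(t, order):
        return _int_ecdf(x_sample, t, order) - _int_ecdf(y_sample, t, order)
    z = np.sort(np.concatenate([np.asarray(x_sample, float), np.asarray(y_sample, float)]))
    pts = [z[0]]
    for a, b in zip(z[:-1], z[1:]):
        if D(a, s) * D(b, s) < 0:          # at most one root between adjacent pooled points
            pts.append(brentq(lambda t: D(t, s), a, b))
        pts.append(b)
    total = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        if D(0.5 * (a + b), s) > 0:        # piece on which the difference function is positive
            total += D(b, s + 1) - D(a, s + 1)
    return np.sqrt(m * n / N) * total

For p = 2 one would instead integrate (D^{(s)})^2 numerically over the same intervals.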

A potentially attractive feature of the CM and AD type statistics for testing at first-order is that the asymptotic null distributions are distribution-free. This allows us to approximate the finite sampling distribution using tabulated critical values, and hence allows us to avoid potentially time consuming simulation based approaches. A characterization of the limiting distributions is the content of Proposition 1 below. For simplicity we state the limiting distribution of A_{mn}^{(1)} for the case q = 1, with the more general case being treated in the appendix.

Proposition 1. Suppose Assumptions 1 and 2 hold with S \subseteq R. Then, in the least favourable case under the null,

(i) C_{mn}^{(1)} \xrightarrow{d} \int_0^1 [ B(t) ]_+^2 \, dt, and

(ii) A_{mn}^{(1)} \xrightarrow{d} \int_0^1 \frac{ [ B(t) ]_+^2 }{ t(1 - t) } \, dt,

where B denotes the standard Brownian Bridge process on the unit interval.

Proof. See Appendix II.

McFadden's statistic is also distribution free at first-order, with^7

\lim_{m,n \to \infty} P( S_{mn}^{(1)} > c ) = P\left( \sup_{p \in [0,1]} B(p) > c \right) = \exp(-2c^2).

The distribution-free statistics are desirable in that they allow us to use tabulated critical values. However, the availability of tabulated critical values becomes less of an issue with the development of bootstrap and other simulation methods for the determination of critical values, particularly when we take into consideration the potential for increased power through the use of these methods in place of standard asymptotics. We must also consider whether the necessary constraints imposed on the design of the test statistics to obtain distribution-free statistics, i.e. confining ourselves to the class of integrals of a weighted difference function with respect to an edf, have serious implications on the power of the resulting tests.

^7 See McFadden (1989) for further details.
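Since the first-order limits in Proposition 1 are functionals of a standard Brownian bridge, their critical values can also be approximated by simulation when tables are not at hand. The sketch below (Python, illustrative only; the positive part reflects the one-sided form of the statistics) discretizes the bridge on a grid.

import numpy as np

def simulate_first_order_critical_values(alpha=0.05, reps=20000, grid=1000, seed=0):
    # Approximate the alpha-level critical values of the limiting laws of
    # C_mn^(1) and A_mn^(1) by simulating Brownian bridges on a grid.
    rng = np.random.default_rng(seed)
    t = np.arange(1, grid) / grid                  # interior grid points
    dt = 1.0 / grid
    c_draws = np.empty(reps)
    a_draws = np.empty(reps)
    for r in range(reps):
        w = np.cumsum(rng.normal(scale=np.sqrt(dt), size=grid))  # Brownian motion on the grid
        b = w[:-1] - t * w[-1]                     # Brownian bridge: B(t) = W(t) - t W(1)
        b_plus_sq = np.maximum(b, 0.0) ** 2
        c_draws[r] = np.sum(b_plus_sq) * dt
        a_draws[r] = np.sum(b_plus_sq / (t * (1.0 - t))) * dt
    return np.quantile(c_draws, 1.0 - alpha), np.quantile(a_draws, 1.0 - alpha)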

It will be useful in what follows to define the functional T_j, mapping D[0, 1] into C[0, 1], by

T_j(x, F) = \int_0^x F^{(j-1)}(y) \, dy,   x \in [0, 1].

Under Assumption 1 it is well known (see Billingsley, 1968) that

\sqrt{m}( F_m - F ) \Rightarrow B_F(F) and \sqrt{n}( G_n - G ) \Rightarrow B_G(G),

where B_F(F) and B_G(G) are independent Brownian Bridge processes evaluated at F and G, respectively. Using the continuity of the mapping T_j, Barrett and Donald (2003) establish an explicit characterization of the limiting distribution of the scaled and centered integral \sqrt{m}( T_j(\cdot, F_m) - T_j(\cdot, F) ). We state their result as Lemma 1 below:

Lemma 1. Under Assumption 1 and for j \ge 1,

\sqrt{m}( T_j(\cdot, F_m) - T_j(\cdot, F) ) \Rightarrow T_j(\cdot, B_F(F))

in C([0, 1]) (the space of continuous functions on [0, 1]), where the limit process is mean-zero Gaussian with covariance kernel given by (for x_2 \ge x_1)

\Omega_j(x_1, x_2; F) = E\left[ T_j(x_1; B_F(F)) \, T_j(x_2; B_F(F)) \right]
                     = \sum_{l=0}^{j-1} \theta_{jl} \frac{(x_2 - x_1)^l}{l!} \, T_{2j-l-1}(x_1; F) - T_j(x_1; F) T_j(x_2; F),

where \theta_{jl} = \binom{2j - l - 2}{j - 1}.

The lemma will prove useful in developing the asymptotic properties of the test statistics proposed herein. The availability of the covariance function also enables the development of a procedure for simulating the quantiles (or other characteristics) of the null distribution. Indeed, Barrett and Donald propose several simulation procedures using the results of the lemma combined with the exploitation of a multiplier central limit theorem as in Hansen (1996). In principle a similar procedure can be developed for the integral-type statistics proposed herein, but we do not pursue the details of such a procedure here.

We are now in a position to state our main result concerning the limiting distribution of the integral statistics. The asymptotic null distributions of C_{mn}^{(s)}, A_{mn}^{(s)}, and S_{mn}^{(s)} in the least favourable case for s \ge 2 are summarized in Proposition 2 below:

Proposition 2. Suppose Assumptions 1 and 2 hold. Then, for s \ge 2 the asymptotic null distributions in the least favourable case are

(i) C_{mn}^{(s)} \xrightarrow{d} \bar{C}_s = \int_0^1 [ T_s(x; B(F)) ]_+^2 \, dF,

(ii) A_{mn}^{(s)} \xrightarrow{d} \bar{A}_s = \int_0^1 \frac{ [ T_s(x; B(F)) ]_+^2 }{ [ F(x)(1 - F(x)) ]^{q} } \, dF,

(iii) S_{mn}^{(s)} \xrightarrow{d} \bar{S}_s = \sup_{x \in [0,1]} T_s(x; B(F)),

for q \in [0, 2).

Proof. See Appendix II.

A proof of (iii) is available in Barrett and Donald (2003); however, a much simpler alternative proof is presented in Appendix I. The key step in obtaining the simplified proof is recognition of the fact that \sqrt{\lambda} B_1(F) + \sqrt{1 - \lambda} B_2(G) possesses the same distribution as B_1(F) in the least favourable case where F = G. Combining this with the linearity of the operator T and well-known weak convergence results for the underlying processes completes the proof.

Proposition 3. Suppose Assumptions 1 and 2 hold. Then, for s \ge 1 and p = 1 the asymptotic null distribution in the least favourable case is

B_{mn}^{(s)} \xrightarrow{d} \bar{B}_s^F = \int_0^1 [ T_s(x; B(F)) ]_+ \, dx.

Proof. See Appendix II.

In all of the cases considered above, a large value of the test statistic represents evidence against the null hypothesis. Accordingly, letting T_{mn}^{(s)} \in \{ A_{mn}^{(s)}, B_{mn}^{(s)}, C_{mn}^{(s)}, S_{mn}^{(s)} \}, the test takes the form

Reject H_0^{(s)} if T_{mn}^{(s)} \ge c(\alpha),

where c(\alpha) denotes the \alpha-level critical value associated with the asymptotic null distribution, i.e.

\lim_{m,n \to \infty} P\left( T_{mn}^{(s)} \ge c(\alpha) \right) = \alpha.

In the case of testing for stochastic dominance at first-order we can rely on Proposition 1 to obtain the appropriate asymptotic critical values. For any other case, the asymptotic distributions are a function of the unknown distributions being compared

and hence we must rely on simulation or bootstrap procedures to consistently estimate the appropriate critical values. The following proposition demonstrates that every one of the tests is consistent against the general alternative of nondominance. For conciseness we state the result using the notation T_{mn}^{(s)} and T^{(s)}, where it is to be understood that T_{mn}^{(s)} \in \{ A_{mn}^{(s)}, B_{mn}^{(s)}, C_{mn}^{(s)}, S_{mn}^{(s)} \} and T^{(s)} is the corresponding random variable found in the above propositions for which T_{mn}^{(s)} \xrightarrow{d} T^{(s)}.

Proposition 4 (Consistency of the Integral Tests). Given Assumptions 1 and 2:

(i) if H_0^{(s)} is true,

\lim_{m,n \to \infty} P\left( T_{mn}^{(s)} > c(\alpha) \right) \le P\left( T^{(s)} > c(\alpha) \right) \le \alpha,

with equality when F(x) = G(x) for all x \in [0, 1];

(ii) if H_0^{(s)} is false,

\lim_{m,n \to \infty} P\left( T_{mn}^{(s)} > c(\alpha) \right) = 1.

The results of the proposition provide justification for using the critical values obtained from the asymptotic distributions to test the hypothesis of interest. As it stands, however, only in the case where s = 1, and for only a select few statistics, is the test operational. In all other cases we must resort to some form of resampling procedure to estimate the appropriate critical values. It is precisely the development of such a procedure that we discuss in the next section.

5 Test Calibration

We have established consistency of our proposed test statistics under the assumption that we can compute the critical values associated with the limiting distributions of these tests. However, other than for A_{mn}^{(1)}, C_{mn}^{(1)}, and S_{mn}^{(1)}, the limiting distributions will be dependent on the underlying unknown distributions, and hence the asymptotic critical values cannot be computed exactly but instead must be estimated by simulation based methods. In this section we discuss several different bootstrap procedures for performing this estimation, and we show that in each case the desirable property of consistency is maintained by the procedure.

5.1 Bootstrap Methods

The key to the implementation of bootstrap methods for the purpose of obtaining estimated p-values is ensuring that the null is imposed under the bootstrap data generating process.^8 In the context treated here there are several different ways of accomplishing this. We first outline one such bootstrap procedure where two independent random samples are drawn from the pooled observations. Then we consider bootstrap procedures that are based on resampling from the individual samples. Each of these procedures has been applied previously by Barrett and Donald (2003) to their statistic S_{mn}^{(s)}.

Conditional on the samples X = {X_i}_{i=1}^{m} and Y = {Y_i}_{i=1}^{n}, let Z_N denote the pooled sample, i.e. Z_N = {Z_1, ..., Z_{m+n}}, where Z_i for 1 \le i \le N is a distinct observation from X \cup Y.^9 Additionally, let H_N and \hat{H}_n^* denote the empirical distribution function (edf) of the pooled sample and the edf of a sample of size n generated from H_N, respectively. More precisely, \hat{H}_n^* can be viewed as the edf of a sample of n observations where each observation is generated from a sequence of independent draws from a multinomial distribution with distinct outcomes (Z_1, ..., Z_N) and probabilities p_i = 1/N, i = 1, ..., N. By using Monte Carlo methods we can repeatedly construct samples in this way, and hence obtain an entire sequence {\hat{H}_{n1}^*, \hat{H}_{n2}^*, ..., \hat{H}_{nB}^*}, where each \hat{H}_{ni}^* is a bootstrapped version of the edf of a sample of size n and B is the number of bootstrap replications. Using this procedure we can define the i-th pooled bootstrap versions of the statistics A_{mn}^{(s)}, B_{mn}^{(s)}, C_{mn}^{(s)}, and S_{mn}^{(s)} by

C_{mni}^{(s)*} = \frac{nm}{N} \int_S \left( D_{mni}^{(s)*}(x) \right)_+^2 \, dH_{Ni}^*,   (7)

A_{mni}^{(s)*} = \frac{nm}{N} \int_S \frac{ \left( D_{mni}^{(s)*}(x) \right)_+^2 }{ [ H_{Ni}^*(x)(1 - H_{Ni}^*(x)) ]^{q} } \, dH_{Ni}^*,   (8)

B_{mni}^{(s)*} = \left( \frac{nm}{N} \right)^{p/2} \int_S \left( D_{mni}^{(s)*}(x) \right)_+^{p} \, dx,   (9)

and

S_{mni}^{(s)*} = \sqrt{\frac{nm}{N}} \, \sup_{x \in S} \left( D_{mni}^{(s)*}(x) \right),   (10)

where D_{mni}^{(s)*}(x) = \hat{H}_{ni}^{(s)*}(x) - \hat{H}_{mi}^{(s)*}(x). Note that the edfs \hat{H}_{ni}^*(x) and \hat{H}_{mi}^*(x) are obtained as independent random samples from the same distribution H_N, and hence the bootstrap data generating process replicates the least favourable case under the null.

^8 See MacKinnon (2007) for a discussion of bootstrap methods for testing hypotheses.
^9 There are no ties in the data w.p.1 since the distributions are continuous.
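The pooled scheme is straightforward to put into code. The sketch below (Python; an illustration under our own naming, not the author's implementation) draws both bootstrap samples from the pooled data, so that the least favourable case F = G is imposed on the bootstrap data generating process, and returns the fraction of bootstrap statistics exceeding the observed one, i.e. the bootstrap p-value formalized later in this section.

import numpy as np

def pooled_bootstrap_pvalue(x_sample, y_sample, statistic, B=999, seed=0):
    # `statistic` maps (x_sample, y_sample) to a scalar such as C_mn^(1) or A_mn^(1).
    rng = np.random.default_rng(seed)
    x = np.asarray(x_sample, dtype=float)
    y = np.asarray(y_sample, dtype=float)
    pooled = np.concatenate([x, y])
    t_obs = statistic(x, y)
    exceed = 0
    for _ in range(B):
        x_star = rng.choice(pooled, size=len(x), replace=True)   # both bootstrap samples
        y_star = rng.choice(pooled, size=len(y), replace=True)   # are drawn from the pooled edf H_N
        if statistic(x_star, y_star) > t_obs:
            exceed += 1
    return exceed / B

For example, pooled_bootstrap_pvalue(x, y, lambda a, b: first_order_statistics(a, b)[1]) calibrates the first-order AD-type statistic from the earlier sketch; the null is rejected when the returned p-value falls below the nominal level.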

An alternative to the pooled bootstrap procedure consists of independently generating bootstrap samples X* and Y*, each from the respective samples X and Y. In particular, X* is a random sample of size m drawn with replacement from X, and Y* is obtained in an analogous way from Y. Let F_{mi}^* and G_{ni}^* be the i-th empirical distribution functions generated in this manner. The i-th two-sample bootstrap versions of the various statistics take the same form as (7) through (10) with the exception that

D_{mni}^{(s)*}(x) = \left( F_{mi}^{(s)*}(x) - G_{ni}^{(s)*}(x) \right) - \left( F_m^{(s)}(x) - G_n^{(s)}(x) \right)   (11)

and H_{Ni}^* = \frac{1}{N} \{ m F_{mi}^* + n G_{ni}^* \}. The function D_{mni}^{(s)*}(x) now contains an additional term, the difference between the empirical distributions of the original samples, which is necessary in order to recenter the test for the purpose of ensuring that the boundary case of the null is imposed asymptotically. Without the recentering, the bootstrap statistics will diverge to positive infinity whenever the population distributions F and G are unequal, implying that critical values or p-values cannot be obtained from the procedure, at least in any meaningful sense.

It is also possible to develop a third bootstrap procedure that involves resampling from only one of the original samples. If, say, we choose to work with X, then the one-sample bootstrap statistics again take the same form as (7) through (10) with the exception that

D_{mni}^{(s)*}(x) = F_{mi}^{(s)*}(x) - F_m^{(s)}(x)   (12)

and H_{Ni}^* = F_{mi}^*. It turns out that the choice of sample can have severe consequences for the size and power properties of the resulting bootstrap test, a fact which is best illustrated by considering the statistic B_{mn}^{(s)} in the case s = 1. For concreteness, suppose that the maximum in the Y sample exceeds the maximum in the X sample. The range of integration of B_{mn}^{(s)} is determined by the maximum in the pooled sample and hence here by the maximum in the Y sample. On the other hand, the range of integration of the bootstrap statistic B_{mni}^{(s)*} is determined by the maximum in the X sample if we resample only from X. It would therefore be possible to obtain significant p-values from such a bootstrap resampling scheme when either the pooled, two-sample, or one-sample bootstrap procedure from the Y sample would suggest that the value of the statistic were insignificant. This point suggests that if the one-sample bootstrap is to be used, then the sample from which the maximum in the pooled sample is drawn should be used in the resampling procedure.

Suppose T_{mn}^{(s)} \in \{ A_{mn}^{(s)}, B_{mn}^{(s)}, C_{mn}^{(s)}, S_{mn}^{(s)} \} and T_{mni}^{(s)*} is the value of the corresponding bootstrap statistic from (7) through (10) using either the pooled, two-sample, or one-sample procedures.
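The recentering in (11) is the only delicate step of the two-sample scheme. The sketch below (Python; illustrative only, with hypothetical helper names of our own) constructs the recentered difference on a grid of evaluation points; its positive part can then be plugged into the integrals (7) through (9).

import numpy as np
from math import factorial

def integrated_edf(sample, grid, s):
    # s-fold integrated empirical cdf evaluated on a grid, as in (5)-(6).
    diff = np.asarray(grid, dtype=float)[:, None] - np.asarray(sample, dtype=float)[None, :]
    return np.where(diff >= 0.0, diff ** (s - 1), 0.0).mean(axis=1) / factorial(s - 1)

def recentered_difference(x, y, x_star, y_star, grid, s):
    # D*_{mni} in (11): the bootstrap difference minus the original-sample
    # difference, so that the boundary of the null holds asymptotically.
    d_star = integrated_edf(x_star, grid, s) - integrated_edf(y_star, grid, s)
    d_hat = integrated_edf(x, grid, s) - integrated_edf(y, grid, s)
    return d_star - d_hat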

Since rejection of the null occurs when T_{mn}^{(s)} is large, the bootstrap test can be summarized as

reject H_0^{(s)} if p* < \alpha,

where \alpha is the nominal size of the test and the bootstrap p-value, denoted by p*, is computed from

p* = \frac{1}{B} \sum_{i=1}^{B} I\left( T_{mni}^{(s)*} > T_{mn}^{(s)} \right).   (13)

We denote by p_1*, p_2*, and p_3* the one-sample, two-sample, and pooled bootstrap p-values associated with T_{mn}^{(s)} that are obtained from (13). The asymptotic properties of the various bootstrap tests are stated formally in Proposition 5 below.

Proposition 5. Let Assumptions 1 and 2 hold and assume that \alpha < 1/2; then a bootstrap test for s-th order stochastic dominance based on T_{mn}^{(s)} \in \{ A_{mn}^{(s)}, B_{mn}^{(s)}, C_{mn}^{(s)}, S_{mn}^{(s)} \} using the decision rule

reject H_0^{(s)} if p_j* < \alpha, for any j \in \{1, 2, 3\},

satisfies the following:

\lim P( reject H_0^{(s)} ) \le \alpha if H_0^{(s)} is true,
\lim P( reject H_0^{(s)} ) = 1 if H_0^{(s)} is false.

Thus, according to the proposition, the resulting bootstrap tests asymptotically (i) are correctly sized on the boundary of the null, (ii) have actual size less than the nominal size of the test for all distribution pairs that are strictly in the null, and (iii) reject the null with probability one for all distribution pairs in the alternative.

6 Finite Sample Properties

In this section we conduct a small scale Monte Carlo experiment to assess the finite sample performance of the various test statistics proposed in the paper. The objective of this analysis is (i) to examine whether the tests are correctly sized under the null; and (ii) to demonstrate admissibility of the integral type tests within the existing class of consistent tests for stochastic dominance. Like any Monte Carlo design of this type, the properties of the tests are examined within parametric classes of distributions. Since the choice of both the parametric class and the particular parameter values can influence the results, one must exercise care in the design of the experiments in order to avoid criticism of attempting to rig the performance evaluation. In consideration of this, we have chosen to work with an already published design, namely that used


More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

In N we can do addition, but in order to do subtraction we need to extend N to the integers

In N we can do addition, but in order to do subtraction we need to extend N to the integers Chapter The Real Numbers.. Some Preliminaries Discussion: The Irrationality of 2. We begin with the natural numbers N = {, 2, 3, }. In N we can do addition, but in order to do subtraction we need to extend

More information

NAG Library Chapter Introduction. G08 Nonparametric Statistics

NAG Library Chapter Introduction. G08 Nonparametric Statistics NAG Library Chapter Introduction G08 Nonparametric Statistics Contents 1 Scope of the Chapter.... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric Hypothesis Testing... 2 2.2 Types

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

University of California San Diego and Stanford University and

University of California San Diego and Stanford University and First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008 K-sample Subsampling Dimitris N. olitis andjoseph.romano University of California San Diego and Stanford

More information

Tests for First-Order Stochastic Dominance

Tests for First-Order Stochastic Dominance Tests for First-Order Stochastic Dominance TERESA LEDWINA 1 AND GRZEGORZ WY LUPEK 2 1 Institute of Mathematics, Polish Academy of Sciences, Poland 2 Institute of Mathematics, University of Wroc law, Poland

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and

More information

CIRPÉE Centre interuniversitaire sur le risque, les politiques économiques et l emploi

CIRPÉE Centre interuniversitaire sur le risque, les politiques économiques et l emploi CIRPÉE Centre interuniversitaire sur le risque, les politiques économiques et l emploi Cahier de recherche/working Paper 06-09 Testing for Restricted Stochastic Dominance Russell Davidson Jean-Yves Duclos

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Bayesian vs frequentist techniques for the analysis of binary outcome data

Bayesian vs frequentist techniques for the analysis of binary outcome data 1 Bayesian vs frequentist techniques for the analysis of binary outcome data By M. Stapleton Abstract We compare Bayesian and frequentist techniques for analysing binary outcome data. Such data are commonly

More information

A Goodness-of-fit Test for Copulas

A Goodness-of-fit Test for Copulas A Goodness-of-fit Test for Copulas Artem Prokhorov August 2008 Abstract A new goodness-of-fit test for copulas is proposed. It is based on restrictions on certain elements of the information matrix and

More information

The Simplex Method: An Example

The Simplex Method: An Example The Simplex Method: An Example Our first step is to introduce one more new variable, which we denote by z. The variable z is define to be equal to 4x 1 +3x 2. Doing this will allow us to have a unified

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

The Goodness-of-fit Test for Gumbel Distribution: A Comparative Study

The Goodness-of-fit Test for Gumbel Distribution: A Comparative Study MATEMATIKA, 2012, Volume 28, Number 1, 35 48 c Department of Mathematics, UTM. The Goodness-of-fit Test for Gumbel Distribution: A Comparative Study 1 Nahdiya Zainal Abidin, 2 Mohd Bakri Adam and 3 Habshah

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

2.1.3 The Testing Problem and Neave s Step Method

2.1.3 The Testing Problem and Neave s Step Method we can guarantee (1) that the (unknown) true parameter vector θ t Θ is an interior point of Θ, and (2) that ρ θt (R) > 0 for any R 2 Q. These are two of Birch s regularity conditions that were critical

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

A better way to bootstrap pairs

A better way to bootstrap pairs A better way to bootstrap pairs Emmanuel Flachaire GREQAM - Université de la Méditerranée CORE - Université Catholique de Louvain April 999 Abstract In this paper we are interested in heteroskedastic regression

More information

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

DR.RUPNATHJI( DR.RUPAK NATH )

DR.RUPNATHJI( DR.RUPAK NATH ) Contents 1 Sets 1 2 The Real Numbers 9 3 Sequences 29 4 Series 59 5 Functions 81 6 Power Series 105 7 The elementary functions 111 Chapter 1 Sets It is very convenient to introduce some notation and terminology

More information

H 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that

H 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that Lecture 28 28.1 Kolmogorov-Smirnov test. Suppose that we have an i.i.d. sample X 1,..., X n with some unknown distribution and we would like to test the hypothesis that is equal to a particular distribution

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 7, Issue 1 2011 Article 12 Consonance and the Closure Method in Multiple Testing Joseph P. Romano, Stanford University Azeem Shaikh, University of Chicago

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

UNIVERSITY OF NOTTINGHAM. Discussion Papers in Economics CONSISTENT FIRM CHOICE AND THE THEORY OF SUPPLY

UNIVERSITY OF NOTTINGHAM. Discussion Papers in Economics CONSISTENT FIRM CHOICE AND THE THEORY OF SUPPLY UNIVERSITY OF NOTTINGHAM Discussion Papers in Economics Discussion Paper No. 0/06 CONSISTENT FIRM CHOICE AND THE THEORY OF SUPPLY by Indraneel Dasgupta July 00 DP 0/06 ISSN 1360-438 UNIVERSITY OF NOTTINGHAM

More information

large number of i.i.d. observations from P. For concreteness, suppose

large number of i.i.d. observations from P. For concreteness, suppose 1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate

More information

Goodness of Fit: an axiomatic approach

Goodness of Fit: an axiomatic approach Goodness of Fit: an axiomatic approach by Frank A. Cowell STICERD London School of Economics Houghton Street London, WC2A 2AE, UK email: f.cowell@lse.ac.uk Russell Davidson AMSE-GREQAM Department of Economics

More information

Asymptotic distribution of the sample average value-at-risk

Asymptotic distribution of the sample average value-at-risk Asymptotic distribution of the sample average value-at-risk Stoyan V. Stoyanov Svetlozar T. Rachev September 3, 7 Abstract In this paper, we prove a result for the asymptotic distribution of the sample

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Inquiry Calculus and the Issue of Negative Higher Order Informations

Inquiry Calculus and the Issue of Negative Higher Order Informations Article Inquiry Calculus and the Issue of Negative Higher Order Informations H. R. Noel van Erp, *, Ronald O. Linger and Pieter H. A. J. M. van Gelder,2 ID Safety and Security Science Group, TU Delft,

More information

One-Sample Numerical Data

One-Sample Numerical Data One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Multinomial Discrete Choice Models

Multinomial Discrete Choice Models hapter 2 Multinomial Discrete hoice Models 2.1 Introduction We present some discrete choice models that are applied to estimate parameters of demand for products that are purchased in discrete quantities.

More information

Bootstrap Tests: How Many Bootstraps?

Bootstrap Tests: How Many Bootstraps? Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002

More information

TESTS BASED ON EMPIRICAL DISTRIBUTION FUNCTION. Submitted in partial fulfillment of the requirements for the award of the degree of

TESTS BASED ON EMPIRICAL DISTRIBUTION FUNCTION. Submitted in partial fulfillment of the requirements for the award of the degree of TESTS BASED ON EMPIRICAL DISTRIBUTION FUNCTION Submitted in partial fulfillment of the requirements for the award of the degree of MASTER OF SCIENCE IN MATHEMATICS AND COMPUTING Submitted by Gurpreet Kaur

More information

Winter School, Canazei. Frank A. Cowell. January 2010

Winter School, Canazei. Frank A. Cowell. January 2010 Winter School, Canazei Frank A. STICERD, London School of Economics January 2010 Meaning and motivation entropy uncertainty and entropy and inequality Entropy: "dynamic" aspects Motivation What do we mean

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

Do Markov-Switching Models Capture Nonlinearities in the Data? Tests using Nonparametric Methods

Do Markov-Switching Models Capture Nonlinearities in the Data? Tests using Nonparametric Methods Do Markov-Switching Models Capture Nonlinearities in the Data? Tests using Nonparametric Methods Robert V. Breunig Centre for Economic Policy Research, Research School of Social Sciences and School of

More information

Chapter 2: Resampling Maarten Jansen

Chapter 2: Resampling Maarten Jansen Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

The Power of Bootstrap and Asymptotic Tests

The Power of Bootstrap and Asymptotic Tests The Power of Bootstrap and Asymptotic Tests GREQAM Centre de la Vieille Charité 2 rue de la Charité 13002 Marseille, France by Russell Davidson email: russell@ehess.cnrs-mrs.fr and James G. MacKinnon Department

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Chapter 4. Theory of Tests. 4.1 Introduction

Chapter 4. Theory of Tests. 4.1 Introduction Chapter 4 Theory of Tests 4.1 Introduction Parametric model: (X, B X, P θ ), P θ P = {P θ θ Θ} where Θ = H 0 +H 1 X = K +A : K: critical region = rejection region / A: acceptance region A decision rule

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Control of Generalized Error Rates in Multiple Testing

Control of Generalized Error Rates in Multiple Testing Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 1424-0459 Working Paper No. 245 Control of Generalized Error Rates in Multiple Testing Joseph P. Romano and

More information

MATH 117 LECTURE NOTES

MATH 117 LECTURE NOTES MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set

More information

More Empirical Process Theory

More Empirical Process Theory More Empirical Process heory 4.384 ime Series Analysis, Fall 2008 Recitation by Paul Schrimpf Supplementary to lectures given by Anna Mikusheva October 24, 2008 Recitation 8 More Empirical Process heory

More information

Advanced Statistics II: Non Parametric Tests

Advanced Statistics II: Non Parametric Tests Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Test for Parameter Change in ARIMA Models

Test for Parameter Change in ARIMA Models Test for Parameter Change in ARIMA Models Sangyeol Lee 1 Siyun Park 2 Koichi Maekawa 3 and Ken-ichi Kawai 4 Abstract In this paper we consider the problem of testing for parameter changes in ARIMA models

More information

On detection of unit roots generalizing the classic Dickey-Fuller approach

On detection of unit roots generalizing the classic Dickey-Fuller approach On detection of unit roots generalizing the classic Dickey-Fuller approach A. Steland Ruhr-Universität Bochum Fakultät für Mathematik Building NA 3/71 D-4478 Bochum, Germany February 18, 25 1 Abstract

More information

Computational Tasks and Models

Computational Tasks and Models 1 Computational Tasks and Models Overview: We assume that the reader is familiar with computing devices but may associate the notion of computation with specific incarnations of it. Our first goal is to

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Christopher J. Bennett

Christopher J. Bennett CONSISTENT AND ASYMPTOTICALLY UNBIASED MINP TESTS OF MULTIPLE INEQUALITY MOMENT RESTRICTIONS by Christopher J. Bennett Working Paper No. 09-W08 July 2009 DEPARTMENT OF ECONOMICS VANDERBILT UNIVERSITY NASHVILLE,

More information

Ch. 5 Hypothesis Testing

Ch. 5 Hypothesis Testing Ch. 5 Hypothesis Testing The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s, early 30s, complementing Fisher s work on estimation. As in estimation,

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Weak Stochastic Increasingness, Rank Exchangeability, and Partial Identification of The Distribution of Treatment Effects

Weak Stochastic Increasingness, Rank Exchangeability, and Partial Identification of The Distribution of Treatment Effects Weak Stochastic Increasingness, Rank Exchangeability, and Partial Identification of The Distribution of Treatment Effects Brigham R. Frandsen Lars J. Lefgren December 16, 2015 Abstract This article develops

More information

Chain Plot: A Tool for Exploiting Bivariate Temporal Structures

Chain Plot: A Tool for Exploiting Bivariate Temporal Structures Chain Plot: A Tool for Exploiting Bivariate Temporal Structures C.C. Taylor Dept. of Statistics, University of Leeds, Leeds LS2 9JT, UK A. Zempléni Dept. of Probability Theory & Statistics, Eötvös Loránd

More information

In N we can do addition, but in order to do subtraction we need to extend N to the integers

In N we can do addition, but in order to do subtraction we need to extend N to the integers Chapter 1 The Real Numbers 1.1. Some Preliminaries Discussion: The Irrationality of 2. We begin with the natural numbers N = {1, 2, 3, }. In N we can do addition, but in order to do subtraction we need

More information

On Backtesting Risk Measurement Models

On Backtesting Risk Measurement Models On Backtesting Risk Measurement Models Hideatsu Tsukahara Department of Economics, Seijo University e-mail address: tsukahar@seijo.ac.jp 1 Introduction In general, the purpose of backtesting is twofold:

More information

Simulating Uniform- and Triangular- Based Double Power Method Distributions

Simulating Uniform- and Triangular- Based Double Power Method Distributions Journal of Statistical and Econometric Methods, vol.6, no.1, 2017, 1-44 ISSN: 1792-6602 (print), 1792-6939 (online) Scienpress Ltd, 2017 Simulating Uniform- and Triangular- Based Double Power Method Distributions

More information

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap Patrick Breheny December 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/21 The empirical distribution function Suppose X F, where F (x) = Pr(X x) is a distribution function, and we wish to estimate

More information

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION MAGNUS BORDEWICH, KATHARINA T. HUBER, VINCENT MOULTON, AND CHARLES SEMPLE Abstract. Phylogenetic networks are a type of leaf-labelled,

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

The Lebesgue Integral

The Lebesgue Integral The Lebesgue Integral Brent Nelson In these notes we give an introduction to the Lebesgue integral, assuming only a knowledge of metric spaces and the iemann integral. For more details see [1, Chapters

More information