Optimal Statistical Hypothesis Testing for Social Choice

Size: px

Start display at page:

Download "Optimal Statistical Hypothesis Testing for Social Choice"

Victoria Fisher
5 years ago
Views:

1 Optimal Statistical Hypothesis Testing for Social Choice LIRONG XIA, RENSSELAER POLYTECHNIC INSTITUTE, We address the following question: What are the optimal statistical hypothesis testing methods for social choice? By leveraging the theory of uniformly least favorable distributions in the Neyman-Pearson framework to finite models and randomized tests, we characterize uniformly most powerful (UMP) tests for testing whether a given alternative is the winner under Mallows model and Condorcet s model, respectively. We propose natural procedures to choose a winner by combining the results of multiple tests. 1 INTRODUCTION Suppose a group of friends want to choose a restaurant for dinner. Each person uses a full ranking over the restaurants to represent his or her preferences. Assuming that their tastes are similar and their preferences are correlated and are based on their perception of the quality of the restaurants the higher the quality of a restaurant, the more likely a person will give it a high rank. How can they decide the winner? Similar problems exist in a wide range of group decision-making scenarios such as political elections [Condorcet, 1785], meta-search engines [Dwork et al., 21], recommender systems [Ghosh et al., 1999], and crowdsourcing [Mao et al., 213]. Such problems at the intersection of statistics and social choice can be dated back to Condorcet s Jury Theorem in the 18th century [Condorcet, 1785]. The Jury Theorem states that when there are two alternatives, assuming that the votes are generated i.i.d. from a simple statistical model, then the outcome of majority voting, which is the MLE of the model, converges to the ground truth as the number of voters goes to infinity. However, the Jury Theorem does not identify the optimal rule to make a decision w.r.t. the unknown ground truth, especially for three or more alternatives. From a statistical point of view, while it is straightforward to think that an optimal rule should have the best chance to reveal the ground truth, defining the measure for such chance is highly nontrivial and controversial. If we use likelihood of a parameter as the measure, then we may pursue the maximum likelihood estimation (MLE) approach. If we view the ground truth parameter as a random variable, and use expected loss w.r.t. the posterior distribution over the parameters as the measure, then we may pursue the Bayesian approach. If we believe that the ground truth is deterministic and unknown, and want to measure the performance of a given rule, then we may pursue the Frequentist approach. At a high level, the frequentist approach tries to measure and design the most robust rule, as Efron [25] noted: a frequentist is a Bayesian trying to do well, or at least not too badly, against any possible prior distribution. Most previous work in the literature of statistical approaches to social choice pursued either an MLE approach or an Bayesian approach. We are not aware of the application of another widelyapplied modern frequentists decision-making technique optimal statistical hypothesis testing to social choice. In the celebrated Neyman-Pearson framework of statistical hypothesis testing (see, e.g. the book by Lehmann and Romano [28]), a statistical model is given and the decision-maker first chooses two non-overlapping subsets of ground truth parameters H, H 1, where H is called the null hypothesis and H 1 is called the alternative hypothesis. Then the decision-maker designs a test, in the form of a critical function f, to make a binary decision in {, 1} for each observed data. Here 1 means that H should be rejected and means that there is a lack of evidence to reject H. While many generic hypothesis testing methods, such as the generalized likelihood ratio tests [Hoeffding, 1965, Zeitouni et al., 1992], can be applied, they are often not optimal. It is still unclear what are the optimal statistical hypothesis testing methods for social choice. Manuscript submitted for review to ACM Economics & Computation 218 (EC 18).

2 Lirong Xia, Rensselaer Polytechnic Institute, 2 Our Contributions. We answer the question of optimal hypothesis testing for social choice by characterizing uniformly most powerful (UMP) tests for various combinations of H and H 1 for winner determination under two popular models for rank data: Mallows model and Condorcet s model. We focus on two types of tests for a given alternative a: the non-winner tests, where H represents a being the winner 1 ; and the winner tests, where H 1 represents a being the winner. Our main results are summarized in Table 1. UMP is a strong notion of optimality for hypothesis testing. A test f is evaluated by two criteria: its size (or level of significance), which is its probability to wrongly reject H, and its power, which is its probability to correctly reject H. The power of a test is evaluated at each h 1 H 1. A level-α test f is a UMP test, if it has the highest power at every h 1 H 1 among all tests whose sizes are no more than α. Non-winner (H = {a wins}) Winner (H 1 = {a wins}) Mallows Y&N (Thm. 3, 4) Y&N (Thm. 5,6,7) Condorcet Y&N (Thm. 1, 11) Y (Thm. 12) Table 1. Existence of UMP tests for winner determination. In Table 1, Y in Condorcet-Winner means that for any < α < 1, there exists a level-α UMP winner test for H = H 1 vs. H 1, where H 1 consists of rankings where a given alternative a is ranked at the top. Y&N means that for some α, no level-α UMP test exists when the other hypothesis consists of all other parameters; but a UMP test exists for all levels for some natural cases of the other hypothesis. For example, Y&N in Mallows-Non-winner in Table 1 means that for some α, no level-α UMP test exists for H vs. H 1 = H, where H consists of rankings where a given alternative a is ranked at the top. On the other hand, for some H 1, a level-α UMP test exists for all < α < 1. In fact, Theorem 4 characterizes all such H 1 s. In particular, we obtained a complete characterizations of H 1 for which UMP non-winner tests (that is, when H models a wins ) exist, under Mallows model (Theorem 4) and under Condorcet s model (Theorem 11). Technically, to obtain the characterizations, we leverage the theory of uniformly least favorable distributions to finite models and randomized tests (Theorem 1, 2, 8, 9). These theorems generalize the key theorems by Reinhardt [1961] that only hold for continuous parameter space, and they might be of independent interest. How to apply our results? The most positive result is the existence of UMP winner tests in the Condorcet-Winner column. These tests can be used for testing whether a given alternative a is a winner by appropriately setting H while fixing H 1 to represent a wins. We also propose two procedures in Section 6 to choose a winner as the social choice by combining results of multiple tests. Interestingly, one procedure corresponds to the Borda rule. This offers a new statistical justification of Borda. Related Work and Discussions. Most previous work in statistical approaches in social choice focused on extending the Condorcet Jury Theorem and proving asymptotic results. See the survey by Gerlinga et al. [25] and a more recent one by Nitzan and Paroush [217]. In terms of decision rules, previous work focused on using commonly-studied voting rules designed for elections [Caragiannis et al., 216, Conitzer and Sandholm, 25], maximum likelihood estimators [Conitzer and Sandholm, 25, Xia and Conitzer, 211], or Bayesian estimators [Azari Soufiani et al., 214, Elkind 1 This is a non-winner test because H is often used as the devil s advocate, and the goal is often to reject H.

3 Lirong Xia, Rensselaer Polytechnic Institute, 3 and Shah, 214, Pivato, 213, Procaccia et al., 212, Xia, 216, Young, 1988]. We are not aware of a previous work that studies UMP tests under popular models for rank data. In this paper, we focus on two commonly studied statistical models for rank data, i.e. Mallows model and Condorcet s model. The goal of this paper is to setup a first framework for using hypothesis testing for social choice. How to extend the results to other models and how to compute the UMP tests are promising directions for future research. Compared to previous MLE and Bayesian approaches to social choice, optimal rules characterized in this paper are more robust because it offers the best worst-case guarantee for an adversarial who controls the ground truth parameter. As in the general Bayesian vs. Frequentists debate, this does not mean that one approach is better than another, because the measures are different. And we note that the goal of this paper is to understand optimal rules for robustness decision-making in the principled and widely-applied framework of hypothesis testing. Marden [1995] studied various properties of distance-based models, especially Mallows model, such as the probability of certain pairwise comparisons and the expected rank of a given alternative. However, as our analyses are largely based on first-order stochastic dominance (Theorem 1), Marden s results do not help and we need to develop new ones (Lemma 3 and 4). These results provide new insights into Mallows model, which might be of independent interest. 2 PRELIMINARIES Let A = {a 1,..., a m } denote a set of m alternatives and let L(A) denote the set of all linear orders over A. Let n denote the number of agents. Each agent s preferences are represented by a linear order in L(A). We often use V = [a b ] to denote a ranking, and write a V b if a is preferred to b in V. The collection P n of all agents votes is called an (n-)profile. For any profile P and any pair of alternatives a, b, we let P (a b) denote the number of votes in P where a is preferred to b. The weighted majority graph of P, denoted by WMG(P), is a directed weighted graph where the weight w P (a b) on any edge a b is w P (a b) = P (a b) P (b a). Clearly w P (a b) = w P (b a). Statistical Models for Rank Data. A statistical model M = (S, Θ, π ) has three parts: the sample space S, which is composed of all possible data; the parameter space Θ; and the probability distributions π = {π θ : θ Θ}. If both S and Θ contain finite elements, then we call M a finite model. For any pair of linear orders V,W in L(A), let KT(V,W ) denote the Kendall-tau distance, which is the total number of pairwise disagreements between V and W. Example 1. For m = 3 and A = {1, 2, 3}, the Kendall-Tau distances among the six rankings are visualized in Figure 1 (a), where 123 represents 1 2 3, and the Kendall-Tau distance between V and W is the closest distance on the circle. For example, KT(123, 312) = 2 and KT(213, 231) = Fig. 1. Kentall-Dau distance for three alternatives. For m 4, the structure of Kendall-Tau distance is much more complicated and cannot be visualized as a circle.

4 Lirong Xia, Rensselaer Polytechnic Institute, 4 DEFINITION 1 (MALLOWS MODEL [MALLOWS, 1957]). Given < φ < 1, Mallows model is denoted by M Ma = (L(A) n, L(A), π ), where n linear orders are i.i.d. generated, the parameter space is L(A) and for any V,W L(A), π W (V ) = 1 Z φkt(v,w ), where Z is the normalization factor. Let B(A) denote the set of all irreflexive, antisymmetric, and total binary relations over A. We have L(A) B(A) and the Kendall-tau distance can be easily extended to B(A) by counting the number of pairwise disagreements. DEFINITION 2 (CONDORCET S MODEL [CONDORCET, 1785, YOUNG, 1988]). Given < φ < 1, Condorcet s model is denoted by M Co = (L(A) n, B(A), π ), where the parameter space is B(A) and for any W B(A) and V L(A), π W (V ) = 1 Z φkt(v,w ), where Z is the normalization factor. Condorcet s model differs from Mallows model mainly in allowing ties in the ground truth. Statistical Hypothesis Testing: The Neyman-Pearson Framework. Given a statistical model M = (S, Θ, π ), the decision-maker first chooses two non-overlapping subsets of parameters H, H 1 Θ, where H is called the null hypothesis and H 1 is called the alternative hypothesis. The goal of hypothesis testing is to decide whether the ground truth parameter is in H (retaining the null hypothesis) or in H 1 (rejecting the null hypothesis), based on the observed data P S. To simplify notation, we let denote retain and let 1 denote reject. A test is characterized by a (randomized) critical function f : S [, 1] such that for any P S, with probability f (P) the outcome of testing is 1 (reject). When H (or H 1 ) contains a single parameter, it is called a simple hypothesis; otherwise it is called a composite hypothesis. A test f is often evaluated by its size and power. The size of f is the maximum probability for f to wrongly outputs 1 when the ground truth is in H (such cases are called Type I errors or false positives), where the max is taken over all parameters in H. More precisely, for any h H, we let Size(f,h ) = E P πh f (P), and Size(f ) = sup h H Size(f,h ). If the size of f is α, then f is called a level-α test. For any h 1 H 1, the power of f at h 1 is the probability that f correctly outputs 1 when the ground truth is h 1. More precisely, we let Power(f,h 1 ) = E P πh1 f (P), where the expectation is take over randomly generated profiles from π h1. We would like a test f to have low size and high power, but often tradeoffs must be made. Example 2. Let M Ma denote a Mallows model with m = 3 and n = 1. Let A = {1, 2, 3}, h 1 = 123, and let H denote the other rankings. Let f be a test where f (123) = 1, f (213) = f (132) =.5, and f outputs for all other rankings. We have Size(f ) = Size(f, 213) = Size(f, 132) = (.5+φ+.5φ 2 )/Z, where Z is the normalization factor. Power( f, 123) = (1 + φ)/z. There is no deterministic test f whose size is φ 3 /2, because as soon as f (V ) = 1 for any ranking V, its size becomes at least φ 3. However, the size of the randomized test f that outputs φ 2 /2 at 123 and outputs at other rankings is φ 3 /2. There is a natural analogy between hypothesis testing and the following production problem. The production problem. Let H denote a set of workers, let S denote the types of divisible items to be produced, and suppose that there is a single buyer h 1. For any P i S, any h H, producing one unit of item P i costs π h (P i ) hours of worker h. Suppose that workers have different skills and cannot be substituted, and their labor is the only cost for producing the items. The buyer only wants no more than one unit of each item, and the unit price for item P i is π h1 (P i ). Each test f corresponds to a production plan, where f (P i ) is the amount of item P i that will be produced. Size(f,h ) is the total amount of time spent by worker h in production, and Size(f ) is the maximum number of hours of any single worker. Power(f ) is the total revenue.

5 Lirong Xia, Rensselaer Polytechnic Institute, 5 Given a statistical model M, H, a single parameter h 1 H, and < α < 1, a level-α most powerful test f α is a test with the highest power among all tests whose size is no more than α. Most powerful tests correspond to the production plan with the highest revenue given that no one works for more than α hours. For finite H, a most powerful test always exists, and can be computed by the following linear program, where for each sample P i S there is a variable x i that represents f α (P i ). max π h1 (P i ) x i (maximize the power) P i S s.t. h H, π h (P i ) x i α P i S i, x i 1 (size constraints) Most powerful tests may not be unique. For composite H 1, it is possible that for different h 1 H 1, the most powerful tests are different. If there exists a level-α test f α that is most powerful for all h 1 H 1, then f is called a level-α uniformly most powerful (UMP) test for H vs. H 1. UMP is a strong notion of optimality and a UMP test may not exists. The notion of UMP test can be explained in the context of the production problem as follows. Now there are multiple potential buyers, one for each h 1 H 1, such that buyer h 1 is willing to pay π h1 (P i ) for each of P i. However, only one buyer will show up and the production plan must be made before any buyer shows up. A UMP test corresponds to a robust production plan, such that no matter which buyer shows up, its revenue is always the highest among all production plans where no one works for more than α hours. For simple H, the fundamental lemma of Neyman and Pearson characterizes the most powerful tests as likelihood ratio tests. DEFINITION 3 (LIKELIHOOD RATIO TEST). Given a model M and < α < 1. For any h,h 1 Θ with h h 1 and any P S, we let Ratio h,h 1 (P) = π h 1 (P ) π h (P ) denote the likelihood ratio of P and let LR α,h,h 1 (P) defined below denote the level-α likelihood ratio test, both w.r.t. the simple vs. simple test (h vs. h 1 ). For any P S, LR α,h,h 1 (P) = 1 if Ratio h,h 1 (P) > k α if Ratio h,h 1 (P) < k α, γ α if Ratio h,h 1 (P) = k α where k α and γ α are chosen such that Size(LR α,h,h 1 ) = α. Sometimes h and h 1 in the subscript of Ratio h,h 1 and LR α,h,h 1 are omitted without causing confusion. We note that different k α and γ α may correspond to the same likelihood ratio test. LEMMA 1 (THE NEYMAN-PEARSON LEMMA (SEE E.G. [LEHMANN AND ROMANO, 28])). For any simple vs. simple test (h vs. h 1 ) and any < α < 1, the likelihood ratio test LR α,h,h 1 is a level-α most powerful test. Moreover, any most powerful test must agree with LR α,h,h 1 except on P S with Ratio h,h 1 (P) = k α. Example 3. Given a Mallows model. Let H = {h } and H 1 = {h 1 }. For any n-profile P n, we have Ratio(P n ) = φ KT(Pn,h 1 ) = φ KT(P n,h 1 ) KT(P n,h ). Therefore, it follows from the Neyman-Pearson lemma φ KT(Pn,h ) that for any < α < 1, there exist K α and Γ α such that the following test f α is a level-α most

6 Lirong Xia, Rensselaer Polytechnic Institute, 6 powerful test: for any n-profile P n, f α (P n ) = 1 if KT(P n,h ) KT(P n,h 1 ) > K α if KT(P n,h ) KT(P n,h 1 ) < K α. Γ α if KT(P n,h ) KT(P n,h 1 ) = K α The idea behind most powerful tests for composite H is to use a distribution Λ over H to compress H into a combined parameter, denoted by h Λ. Then, the likelihood ratio test for hλ vs. h 1 is a most powerful test for H vs. h 1 when two additional conditions are satisfied. DEFINITION 4. For any M = (S, Θ, π ), any H Θ, and any h 1 (Θ H ). Let Λ denote a distribution over H and let h Λ denote a new parameter whose distribution over S is the probabilistic mixture of {π h : h Θ} according to Λ. For any < α < 1 and any P S, let Ratio Λ,h1 (P) = π h1 (P ) h H Λ(h )π h (P ) and let LR α,λ,h 1 (P) denote the likelihood ratio test for h Λ vs. h 1. Sometimes Λ and h 1 in the subscripts of Ratio Λ,h1 and LR α,λ,h1 are omitted without causing confusion. The next lemma is a generalization of the Neyman-Pearson lemma that provides necessary and sufficient conditions for LR α,λ,h1 (P) to be most powerful for H vs. h 1. Let Support(Λ) denote the support of Λ, namely the elements in H that are assigned non-zero probability by Λ. LEMMA 2 (THEOREM AND COROLLARY IN [LEHMANN AND ROMANO, 28]). For composite vs. simple test (H vs. h 1 ) and any distribution Λ over H, the likelihood ratio test LR α,λ,h1 is a level-α most powerful test if and only if the following two conditions are satisfied. (i) For any h Support(Λ), we have Size(LR α,λ,h 1,h ) = α. (ii) For any h H, we have Size(LR α,λ,h1,h ) α. Moreover, if there is no P S with Ratio Λ,h1 (P) = k α, then LR α,λ,h1 is the unique level-α most powerful test. The distribution Λ in Lemma 2 is called a least favorable distribution. If Support(Λ) = 1, then Λ is called a deterministic least favorable distribution. If Λ is a least favorable distribution for all levels of significance < α < 1, then it is called a uniformly least favorable distribution [Reinhardt, 1961]. 3 THEOREMS ON LEAST FAVORABLE DISTRIBUTIONS We first present two general theorems on least favorable distributions that will be frequently used in this paper. For any model M = (S, Θ, π ), any composite vs. simple test (H vs. h 1 ), any distribution Λ over H, and any h H, we define a random variable X Λ h : S R such that for any P S, Pr(P) = π h (P) and X Λ h (P) = log Ratio Λ,h1 (P). A random variable X weakly first-order stochastically dominates (weakly dominates for short) another random variable Y, if for any p R, Pr(X p) Pr(Y p). THEOREM 1. Λ is a uniformly least favorable distribution for H vs. h 1 if and only if for any h Support(Λ) and any h H, X Λ weakly dominates X Λ h h. All missing proofs can be found in the appendix. Example 4. Let M denote a Mallows model with m = 3 and n = 1. Let A = {1, 2, 3}, h 1 = [1 2 3] and let Λ denote the uniform distribution over {[2 1 3], [1 3 2]}. We will apply Theorem 1 to prove that Λ is a uniformly least favorable distribution for H = (L(A) {[1 2 3]}) vs. [1 2 3]. The likelihood ratios of all rankings are summarized in Table 2 in the increasing order.

7 Lirong Xia, Rensselaer Polytechnic Institute, 7 V others Ratio Λ,1 2 3 (V ): φ Table 2. Likelihood ratios. 2φ 1+φ 2 1 φ For any h 1 H, X Λ h takes three values: log 1 2φ φ, log, and log φ. The probabilities for the five 1+φ 2 random variables taking these three values are summarized in Table 3. log φ log 2φ log 1 1+φ 2 φ X1 3 2 Λ and X Λ φ 2 1+φ+φ 2 +φ 3 φ Z Z Z X2 3 1 Λ and X Λ φ 1+φ+φ 2 +φ 3 φ 2 Z Z Z X3 2 1 Λ 1 2(φ+φ 2 ) φ 3 Z Z Z Table 3. X h Λ for all h H, where Z is the normalization factor. Because < φ < 1, it is not hard to verify that X1 3 2 Λ and X Λ weakly dominate other random variables. By Theorem 1, Λ is a uniformly least favorable distribution. Our second theorem states that if we can find a deterministic uniformly least favorable distribution, then it is also uniformly least favorable for the same statistical model with multiple i.i.d. samples. THEOREM 2. Suppose Λ is a deterministic uniformly least favorable distribution for composite vs. simple test (H vs. h 1 ) under M = (S, Θ, π ). Then for any n N, Λ is also a uniformly least favorable distribution for testing H vs. h 1 under M = (S n, Θ, π ) with n i.i.d. samples. PROOF. Let Support(Λ) = {h }. For any n N and any h H, we define a random variable X n,h : S n R, where for any P n S n, Pr(P n ) = π h (P n ) = V P n π h (V ), and X n,h (P n ) = log Ratio(P n,h,h 1). It follows that X n,h = X h + X h + + X h. By Theorem 1, for any h H, } {{ } X h weakly dominates X h. Because first-order stochastic dominance is preserved under convolution [Deelstra and Plantin, 214], we have that X n,h weakly dominates X n,h. The theorem follows after applying Theorem 1. 4 UMP TESTS FOR MALLOWS MODEL Given an alternative a, for Mallows model we define L a others = {V L(A) : b A, a V b}. That is, a is ranked at the top of the ground truth parameter. We will focus on two classes of tests, one for testing whether a is not a winner and the other for testing whether a is a winner. Because H is often chosen as a devil s advocate and the goal of testing is often to reject H, when setting H = L a others under Mallows model, we are hoping to reject H, which means that a is not the winner. This idea leads to the following formal definition. DEFINITION 5. Given an alternative a, in a non-winner test of Mallows model, we let H = L a others ; in a winner test, we let H 1 = L a others. We note that the decision-maker still need to specify both H and H 1 in a winner or non-winner test and a level α. Then, the decision-maker performs a level-α test f on the data. Often this is done by computing a test statistic and computes its p-value. For any profile P, any B A, and any a (A B), we let w P (B a) = b B w P (b a), that is, the total weights on edges from B to a in WMG(P). n

8 Lirong Xia, Rensselaer Polytechnic Institute, 8 THEOREM 3. Let M Ma denote a Mallows model with any m 2, any n, and any φ. For any α, any a A, and any h 1 = [B a others] for non-empty B, let h denote the ranking that is obtained from h 1 by raising a to the top position. Then the deterministic distribution Λ at {h } is a uniformly least favorable distribution for H = L a others vs. h 1, and f α,a,b defined below is a level-α UMP test. For any n-profile P n, f α,a,b (P n ) = 1 if w Pn (B a) > K α if w Pn (B a) < K α, Γ α if w Pn (B a) = K α where K α and Γ α are chosen such that the size of f α,a,b is α. Moreover, any most powerful test must agree with f α,a,b (P n ) except on P n with w Pn (B a) = K α. Proof sketch: We have h = [a B other]. In this proof we let LR α denote LR α,h,h 1 and let Ratio denote Ratio h,h 1. We first prove the theorem for n = 1, and then extend it to any n by Theorem 2. By Theorem 1, it suffices to prove that for any h H, X Λ weakly dominates X Λ h h. For any ranking V and any pair of alternatives a,b, we let I (a V b) = 1 if a V b, otherwise I (a V b) =. For any P L(A), we have log Ratio(P) = log φ(kt(p,h 1 ) KT(P,h )) = log φ (I (d h1 c) I (d h c)) = log φ( + c P d b P a:b B a P b:b B (I (b h1 a) I (b h a)) (I (a h1 b) I (a h b))) = log φ (2w P (B a) B ) Then we prove that for any K Z, π h ({P : w P (B a) K}) π h ({P : w P (B a) K}). Consequently, following Theorem 1, we prove that this means that Λ is a uniformly least favorable distribution for n = 1. Because Λ is deterministic, by Theorem 2, Λ is also a uniformly least favorable distribution for Mallows model with any n N, which means that the corresponding likelihood ratio test LR α is most powerful. It is not hard to verify that LR α = f α,a,b. Moreover, because Λ is deterministic, any most powerful test f for H vs. h 1 must also be most powerful for the simple vs. simple test (h vs. h 1). By the Neyman-Pearson lemma (Lemma 1), f must agree with LR α except on P n such that Ratio(P n ) = k α, which corresponds to P n with w Pn (B a) = K α. The next theorem characterizes all UMP non-winner tests (H = L a others ) under Mallows model. For any B A and a (A B), we let L B a L(A) denote the set of all rankings where the set of alternatives ranked above a is exactly B. For example, when m = 4, L {c } a = {[c a b d], [c a d b]}. THEOREM 4. Let M Ma denote a Mallows model with any m 2, any n 2, and any φ. There exists a UMP test for H = L a others vs. H 1 for each < α < 1 if and only if there exists B A such that H 1 L B a. Moreover, if H 1 L B a then f α,a,b defined in Theorem 3 is UMP. PROOF. The if part. We note that f α,a,b does not depend on the orderings among alternatives in B in h 1. It follows that for all h 1 H 1, f α,a,b is a level-α most powerful test for H vs. {h 1 }, which means that f α,a,b is a UMP test.

9 Lirong Xia, Rensselaer Polytechnic Institute, 9 The only if part. Suppose there exist B, B such that B B and there exists two rankings h 1 1 = [B a others] and h2 1 = [B a others] in H 1. W.l.o.g. suppose B B. Let K α = n B.5, Γ α =, and let f α,a,b denote the most powerful test for H vs. h 1 1 guaranteed by Theorem 3. Because K α is not an integer, there does not exist P n such that w Pn (B a) = K α. This means that f α,a,b is the unique most powerful level-α test for H vs. h 1 1. We observe that for any P n, f α,a,b (P n ) is either or 1, and f α,a,b (P n ) = 1 if and only if a is ranked above B in all n rankings in P n. It follows that f α,a,b must be the unique level-α UMP test for H vs. H 1. By Theorem 3, any most powerful level-α test, in particular f α,a,b, must agree with f α,a,b except for the threshold cases w Pn (B a) = K α for some K α. Choose arbitrary b B B and b B. Let Pn be composed of n copies of [B a others] and let P n be composed of n 1 copies of [b B a others] and one copy of [b (B {b}) a others]. Because w P n (B a) = n B > K α, we have f α,a,b (Pn) = 1. This means that the threshold K α for д α,a,b is no more than w P n (B a) = n B B. Because n 2, we have w P n (B a) n( B B + 1) 1 > n B B = w P n (B a), which means that f α,a,b (P n) = 1. However, w P n (B a) = n B 1 < n B, which is a contradiction because for any profile P n, f α,a,b (P n ) = 1 if and only if B a in all n rankings in P n. We now consider UMP winner tests (H 1 Theorem 4. = L a others ). The following corollary follows after Corrollary 1. Under Mallows model, for any m 2, any n 2, and any a,b A, there exists α > such that no level-α UMP test exists for H = L b others vs. H 1 = L a others. THEOREM 5. Let M Ma denote a Mallows model with any m 2, any n and any φ. For any any α, f α,a defined below is a UMP test for H = L others a vs. H 1 = L a others. For any n-profile P n, f α,a (P n ) = 1 if w Pn (a others) > K α if w Pn (a others) < K α, Γ α if w Pn (a others) = K α where K α and Γ α are chosen such that the size of f α,a is α. PROOF. We note that f α,a is insensitive to permutations over A {a}. Therefore, to prove that f α,a is a UMP test, it suffices to prove that for some h 1 H 1, f α,a is most powerful for H vs. h 1. Choose an arbitrary h 1 H 1. Let h H be the ranking that is obtained from h 1 by moving a to the bottom position without changing the relative positions of the other alternatives. Similar to the proof of Theorem 3, it is not hard to check that f α,a is equivalent to the likelihood ratio test LR α,h,h 1. Also because f α,a is invariant to permutations over A {a}, for any h H and any permutation M over A {a}, we have Size(f α,a,h ) = Size(f α,a, M(h )). In particular, let M denote the permutation such that M(h ) = h. We have Size(f α,a,h ) = Size(f α,a,h ). It follows from Lemma 2 that f α,a is most powerful. This proves the theorem. The next two theorems identify conditions on φ in Mallows model for the UMP winner test H = (L(A) H 1 ) vs. H 1 = L a others, when n = 1. The proofs are quite involved and can be found in the appendix. THEOREM 6. Let M Ma denote a Mallows model with n = 1, any m 4, and any φ < 1/m. There exists < α < 1 such that no level-α UMP test exists for H = (L(A) H 1 ) vs. H 1 = L a others. THEOREM 7. Let M Ma denote a Mallows model with n = 1 and any m 4. There exists ϵ > such that for any φ > 1 ϵ and any α, a UMP test exists for H = (L(A) H 1 ) vs. H 1 = L a others.

10 Lirong Xia, Rensselaer Polytechnic Institute, 1 5 UMP TESTS FOR CONDORCET S MODEL For Condorcet s model, given an alternative a, we define R a others = {V B(A) : b A, a V b}, and we will focus on tests where either H = R a others (non-winner tests) or H 1 = R a others (winner tests). We first prove two general theorems on UMP tests for statistical models that combine multiple independent models, and then apply them to characterize UMP tests under Condorcet s model. Given two models M X = (S X, Θ X, π X ) and M Y = (S Y, Θ Y, π Y ), we let M X M Y = (S X S Y, Θ X Θ Y, π X π Y ), where for any (π θx, π θy ) π X π Y and any P X S X and P Y S Y, we have (π θx, π θy )(P X, P Y ) = π θx (P X ) π θy (P Y ). Example 5. Given a Condorcet s model M Co with m = 3. Let A = {1, 2, 3}. For any pair of alternatives {a,b}, we let M {a,b } = ({, 1} n, {, 1}, π ) denote the restriction of M Co on the pairwise comparison between a and b. We have M Co = M {1,2} M {2,3} M {1,3}. THEOREM 8. For any pair of models M X and M Y, suppose Λ X is a least favorable distribution for composite vs. simple test (H,X vs. x 1 ) under M X. Given y 1 Θ Y, let Λ be the distribution over H,X Θ Y where for all x H,X, Λ (x,y 1 ) = Λ X (x). Then Λ is a least favorable distribution for H,X Θ Y vs. (x 1,y 1 ) under M X M Y. Example 6. Continuing the setting of Example 5, we let M X = M {1,2}, H,X = {}, x 1 = 1, Λ X is the deterministic distribution {}, and let M Y = M {2,3} {1,3} and y 1 = (1, 1). Λ X is a least favorable distribution according to the Neyman-Pearson lemma. Then, Λ is the deterministic distribution {(, 1, 1)} and it is a least favorable distribution for {} {, 1} 2 vs. (1, 1, 1) under the Condorcet s model with m = 3. The next theorem focuses on the setting where we combine t N independent and identical statistical model M X. Given M X = (S, Θ, π ), a distribution Λ over Θ, any θ Θ, and any t N, we let (M X ) t = M X M } {{ X and define the extension of Λ to Θ } t w.r.t. θ, denoted t by Ext(Λ, θ, t), as follows. Let θ = (θ,..., θ ) Θ t. For any j t and any θ Θ, we have Ext(Λ, θ, t)(θ, [ θ ] j ) = 1 t Λ(θ ). That is, Ext(Λ, θ, t) generates a vector θ Θ t in the following two steps: first, a number j t is chosen uniformly at random. Then, we fix the components of θ to be θ, except for the j-th component, which is generated from Θ according to Λ. For any H Θ and any h 1 (Θ H ), we let h 1 = (h 1,...,h } {{ } 1 ) and let Ext(H,h 1, t) = ({H {h 1 }} t { h 1 }). t Example 7. Continuing the setting of Example 5, we let M X = M {1,2}, let Λ denote the deterministic distribution {}, let H = {} and h 1 = 1. Then Ext(Λ, 1, 3) is the uniform distribution over {(, 1, 1), (1,, 1), (1, 1, )}, h 1 = (1, 1, 1), and Ext(H, 1, 3) = ({, 1} 3 { 1}). THEOREM 9. For any model M X and any t N, suppose Λ is a uniformly least favorable distribution for composite vs. simple test (H vs. h 1 ) under M X. Then Ext(Λ,h 1, t) is a uniformly least favorable distribution for Ext(H,h 1, t) vs. h 1 in (M X ) t. Example 8. Continuing the setting of Example 7, it follows from Theorem 9 that the uniform distribution over {(, 1, 1), (1,, 1), (1, 1, )} is a uniformly least favorable distribution for testing Ext(H, 1, 3) = ({, 1} 3 { 1}) vs. h 1 = (1, 1, 1) under (M X ) 3, which is a Condorcet s model. THEOREM 1. Let M Co denote a Condorcet s model with any m 2, any n, and any φ. For any a A and any h 1 (B(A) R a others ), Let h denote the binary relation obtained from h 1 by letting a b for all b A. Then the deterministic distribution {h } is a uniformly least favorable

11 Lirong Xia, Rensselaer Polytechnic Institute, 11 distribution for H = R a others vs. h 1 and д α,a,b defined below is a most powerful test. For any 1 if w Pn (B a) > K α n-profile P n, д α,a,b (P n ) = if w Pn (B a) < K α, where B A is the set of alternatives that Γ α if w Pn (B a) = K α are preferred to a in h 1. Moreover, when there is no profile P with w Pn (B a) = K α, then д α,a,b is the unique level-α most powerful test. PROOF. Let X = {{a,b} : b a} denote the pairwise comparisons between alternatives in A that involve a and let Y denote other pairwise comparisons. Let M X = (S X, Θ X, π X ) denote Condorcet s model M Co restricted to X. That is, S X = {, 1} (m 1)n, Θ X = {, 1} m and for any θ Θ X and any P n S X, π θ (P n ) φ KT(θ,P n ). Similarly, let M Y denote Condorcet s model restricted to Y. It follows that M Co = M X M Y. Let h 1 = (x 1,y 1 ), where x 1 Θ X and y 1 Θ Y. Let x Θ X denote the vector that represents a b for all b A. By Neyman-Pearson lemma (Lemma 1), the deterministic distribution Λ X = {x } is a uniformly least favorable distribution for x vs. x 1. Therefore, by Theorem 8, the deterministic distribution Λ = {(x,y 1 )} is uniformly least favorable for {x } Θ Y vs. (x 1,y 1 ). We note that (x,y 1 ) = h and (x 1,y 1 ) = h 1. It is not hard to verify that д α,a,b is equivalent to the likelihood ratio test LR α,λ,h1, which is most powerful. Subsequently, we have the following characterization of UMP non-winner tests under Condorcet s model (H = R a others ). For any B A, we let R B a B(A) denote the set of all binary relations where the set of alternatives that are preferred to a is B. THEOREM 11. Let M Co denote a Condorcet s model with any m 2 and n 2. There exists a UMP test for H = R a others vs. H 1 for every < α < 1 if and only if there exists B A such that H 1 R B a. Moreover, if H 1 R B a then д α,a,b defined in Theorem 1 is a UMP test. The following two results characterize UMP winner tests under Condorcet s model (H 1 = R a others ). Corrollary 2. Let M Co denote a Condorcet s model with any m 2, any n 2, and any φ. For any a b, there exists α such that no UMP test exists for H = R b others vs. H 1 = R a others. THEOREM 12. Let M Co denote a Condorcet s model with any m 2, any n 2, and any φ. For any α, д α,a defined below is a level-α UMP test for H = (B(A) H 1 ) vs. H 1 = R a others. For any P n, where Ratio(P n ) = д α,a (P n ) = 1 if Ratio(P n ) > k α if Ratio(P n ) < k α, γ α if Ratio(P n ) = k α m 1 b a φ w Pn (a b), and k α and γ α are chosen such that the level of д α,a is α. PROOF. Let M 1 denote the Condorcet s model with a single sample (ranking). Let X 1,..., X m 1 denote the m 1 pairwise comparisons between a and other alternatives. Similarly to the proof of Theorem 1, we let M X1,..., M Xm 1 denote the restriction of M 1 on the m 1 pairwise comparisons, and let M Y denote the restriction of M Co on other pairwise comparisons. In fact, M X1,... M Xm 1 are the same model. It follows that M Co = M X1 M X2 M Xm 1 M Y. In M X1, let 1 denote a is more preferred in the pairwise comparison. Due to the Neyman-Pearson lemma (Lemma 1), the deterministic distribution Λ = {} is a uniformly least favorable distribution for H = {} vs. h 1 = 1. For any n N, let M X1,n denote M X1 with n i.i.d. samples. It follows from Theorem 2 that Λ is still a uniformly least favorable distribution for M X1,n. By Theorem 9,

12 Lirong Xia, Rensselaer Polytechnic Institute, 12 Ext(Λ,h 1,m 1) is a uniformly least favorable distribution for Ext(H,h 1,m 1) = ({, 1} m 1 { 1}) vs. h 1 = 1 under M X1,n M Xm 1,n. Let M Y,n denote the model obtained from M Y by using n i.i.d. samples. For any y 1 Θ Y,n, let Λ y1 denote the distribution that is obtained from Ext(Λ,h 1,m 1) by appending y 1 to each parameter. By Theorem 2, Λ y1 is a uniformly least favorable distribution for Ext(H,h 1,m 1) Θ Y,n vs. ( 1,y 1 ) under M X1,n M Xm 1,n M Y,n, which is the Condorcet s model with n i.i.d. samples. We note that Ext(H,h 1,m 1) Θ Y,n = ({, 1} m 1 { 1}) Θ Y = (B(A) R a others ). This means that the likelihood ratio test LR α,λy1, ( 1,y 1 ) is a most powerful level-α test for (B(A) R a others) vs. ( 1,y 1 ). We note that for all y 1, LR α,λy1, ( 1,y 1 ) is the same test, which means that it is also UMP. It can be verified that д α,a = LR α,λy1, ( 1,y 1 ). 6 PROCEDURES FOR SOCIAL CHOICE All UMP tests we have characterized so far are optimal in making binary decisions, such as whether a given alternative a is the winner. We now propose two natural procedures to choose the winner by combining multiple winner tests (H 1 = L a others for Mallows model and H 1 = R a others for Condorcet model) and non-winner tests (H = L a others for Mallows model and H = R a others for Condorcet model), respectively. The procedure based on winner tests. We first choose any winner test, such as a UMP test characterized in Theorem 5, then find the alternative a with the minimum α such that H is rejected in the winner test, by conducting binary search on α. 2 This corresponds at a high level to choosing the alternative that is most likely to be the winner according to the tests. The procedure based on non-winner tests. Similarly, we use binary search on α to find the alternative a with the maximum α such that H is rejected in the non-winner test. This corresponds to choosing the alternative that is mostly unlikely to be a non-winner according to the tests. Interestingly, the procedure based on winner tests corresponds to the Borda voting rule when we use the UMP winner test for H = L others a vs. H 1 = L a others under Mallows model (Theorem 5). This provides a new theoretical justification for the Borda rule; or vice versa, Borda provides a justification of the proposed procedure. 7 SUMMARY AND FUTURE WORK We characterized UMP tests for deciding whether a given alternative is the winner under Mallows model and Condorcet s model, and proposed a procedure to choose a single winner by combining the results of multiple tests. There are many open questions and directions for future research. Can we characterize UMP tests for other goals of social choice, such as pairwise comparisons? Do UMP tests exist for other statistical models, such as random utility models? How can we efficiently compute the results of the proposed tests? REFERENCES Hossein Azari Soufiani, David C. Parkes, and Lirong Xia Statistical Decision Theory Approaches to Social Choice. In Proceedings of Advances in Neural Information Processing Systems (NIPS). Montreal, Quebec, Canada. George E. P. Box and Norman R. Draper. 27. Empirical Model-Building and Response Surfaces (2nd edition ed.). John Wiley & Sons. Ioannis Caragiannis, Ariel D. Procaccia, and Nisarg Shah When Do Noisy Votes Reveal the Truth? ACM Transactions on Economics and Computation 4, 3 (216), Article No. 15. Marquis de Condorcet Essai sur l application de l analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: L Imprimerie Royale. 2 Co-winners may exist if they all reject H for the same α s.

13 Lirong Xia, Rensselaer Polytechnic Institute, 13 Vincent Conitzer and Tuomas Sandholm. 25. Common Voting Rules as Maximum Likelihood Estimators. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI). Edinburgh, UK, Griselda Deelstra and Guillaume Plantin Risk Theory and Reinsurance. Springer. Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. 21. Rank aggregation methods for the web. In Proceedings of the 1th World Wide Web Conference Bradley Efron. 25. Bayesians, Frequentists, and Scientists. J. Amer. Statist. Assoc. 1, 1 5 (25). Edith Elkind and Nisarg Shah Electing the Most Probable Without Eliminating the Irrational: Voting Over Intransitive Domains. In Proceedings of the 3th Conference on Uncertainty in Artificial Intelligence Kerstin Gerlinga, Hans Peter Grünera, Alexandra Kielc, and Elisabeth Schulte. 25. Information acquisition and decision making in committees: A survey. European Journal of Political Economy 21, 3 (25), Sumit Ghosh, Manisha Mundhe, Karina Hernandez, and Sandip Sen Voting for movies: the anatomy of a recommender system. In Proceedings of the third annual conference on Autonomous Agents Wassily Hoeffding Asymptotically Optimal Tests for Multinomial Distributions. The Annals of Mathematical Statistics 36, 2 (1965), Erich L. Lehmann and Joseph P. Romano. 28. Testing Statistical Hypotheses (third edition ed.). Springer. Colin L. Mallows Non-null ranking model. Biometrika 44, 1/2 (1957), Andrew Mao, Ariel D. Procaccia, and Yiling Chen Better Human Computation Through Principled Voting. In Proceedings of the National Conference on Artificial Intelligence (AAAI). Bellevue, WA, USA. John I. Marden Analyzing and modeling rank data. Chapman & Hall. Shmuel Nitzan and Jacob Paroush Collective Decision Making and Jury Theorems. In The Oxford Handbook of Law and Economics: Volume 1: Methodology and Concepts, Francesco Parisi (Ed.). Oxford University Press. Jacob Paroush Stay away from fair coins: A Condorcet jury theorem. Social Choice and Welfare 15, 1 (1998), Marcus Pivato Voting rules as statistical estimators. Social Choice and Welfare 4, 2 (213), Ariel D. Procaccia, Sashank J. Reddi, and Nisarg Shah A Maximum Likelihood Approach For Selecting Sets of Alternatives.. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence. H. E. Reinhardt The Use of Least Favorable Distributions in Testing Composite Hypotheses. The Annals of Mathematical Statistics 32, 4 (1961), Lirong Xia Bayesian estimators as voting rules. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence Lirong Xia and Vincent Conitzer A Maximum Likelihood Approach towards Aggregating Partial Orders. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI). Barcelona, Catalonia, Spain, H. Peyton Young Condorcet s Theory of Voting. American Political Science Review 82 (1988), Ofer Zeitouni, Jacob Ziv, and Neri Merhav When is the generalized likelihood ratio test optimal? IEEE Transactions on Information Theory 38, 5 (1992),

14 Lirong Xia, Rensselaer Polytechnic Institute, 14 8 APPENDIX: OPTIMAL STATISTICAL HYPOTHESIS TESTING FOR SOCIAL CHOICE Theorem 1. Λ is a uniformly least favorable distribution for H vs. h 1 if and only if for any h Support(Λ) and any h H, X Λ weakly dominates X Λ h h. PROOF. To simplify notation we let LR α and Ratio to denote LR α,λ,h1 and Ratio Λ,h1, respectively. For any < α < 1 and any h H, we have Size(LR α,h ) = π h (P) P S:Ratio(P )>k α + γ α P S:Ratio(P )=k α π h (P) = Pr(X Λ h > log k α ) + γ α Pr(X Λ h = log k α ) (1) =(1 γ α ) Pr(X Λ h > log k α ) + γ α Pr(X Λ h log k α ) =(1 γ α ) lim Pr(X Λ x log kα h x) + γ α Pr(X Λ h k α ) The if direction: for any h H and any h Support(Λ), because X Λ weakly dominates X Λ h h, we have that for any x R, Pr(X Λ x) Pr(X Λ h h x). It follows from (1) that Size(LR α,h 1 ),h ) Size(LR α,h ). By Lemma 2, LR α is a level-α most powerful test. Therefore Λ is a uniformly least favorable distribution. The only if direction: suppose for the sake of contradiction that this is not true. Let h H and h Support(Λ) be such that X Λ does not weakly dominate X Λ h h. It follows that there exists x R such that Pr(X Λ x) < Pr(X Λ h h x). Let α = Pr(X Λ x). Because Λ is uniformly least h favorable, the size of LR α must be α, where k α = 2 x and γ α = 1. By Lemma 2, Pr(X Λ h x) = Size(LR α,h ) Size(LR α,h ) = Pr(X Λ x), which is a contradiction. h Theorem 3. Let M Ma denote a Mallows model with any m 2, any n, and any φ. For any α, any a A, and any h 1 = [B a others] for non-empty B, let h denote the ranking that is obtained from h 1 by raising a to the top position. Then the deterministic distribution Λ at {h } is a uniformly least favorable distribution for H = L a others vs. h 1, and f α,a,b defined below is a level-α UMP test. 1 if w Pn (B a) > K α For any n-profile P n, f α,a,b (P n ) = if w Pn (B a) < K α, where K α and Γ α are chosen such Γ α if w Pn (B a) = K α that the size of f α,a,b is α. Moreover, any most powerful test must agree with f α,a,b (P n ) except on P n with w Pn (B a) = K α. PROOF. We have h = [a B other]. In this proof we let LR α denote LR α,h,h 1 and let Ratio denote Ratio h,h 1. We first prove the theorem for n = 1 then extend it to arbitrary n by Theorem 2. By Theorem 1, it suffices to prove that for any h H, X Λ weakly dominates X Λ h h.

15 Lirong Xia, Rensselaer Polytechnic Institute, 15 For any ranking V and any pair of alternatives a,b, we let I (a V b) = 1 if a V b, otherwise I (a V b) =. For any P L(A), we have log Ratio(P) = log φ(kt(p,h 1 ) KT(P,h )) = log φ (I (d h1 c) I (d h c)) = log φ( + c P d b P a:b B a P b:b B (I (b h1 a) I (b h a)) (I (a h1 b) I (a h b))) = log φ (2w P (B a) B ) Therefore, to prove that X Λ weakly dominates X Λ h h, it suffices to prove for any K Z, π h ({P : w P (B a) K}) π h ({P : w P (B a) K}) Let M denote the permutation over A such that M(h ) = h. Because h H = L a others, we have M(a) = a. Let B = M (B). Because Kendall-Tau distance is invariant to permutations, for any P L(A) we have π h (P) = π M (h ) (M(P)) and π h ({P : w P (B a) K}) =π M (h ) ({M(P) : w M (P ) (M(B) M(a)) K}) =π h ({M (P) : w M (P ) (B M(a)) K}) =π h ({P : w P (B a) K}) Therefore, it suffices to prove that π h ({P : w P (B a) K}) π h ({P : w P (B a) K}). We will prove a stronger lemma. Given any W L(A) and C,C A with C C and C = C, we say that C dominates C w.r.t. W if there exists a one-one mapping F : (C C ) (C C) such that for all c C we have c W F (c). In words, C can be obtained from C by lowering some alternatives according to W. LEMMA 3. Under a Mallows model, for any φ, any K N, any a A, any W L(A), and any C,C A such that C dominates C w.r.t. W, we have π W ({P : w P (C a) K}) π W ({P : w P (C a) K}) PROOF. We first prove the lemma for a special case where C and C differ in only one alternative, that is, C C = 1. Let c C such that c C. Let c C such that c C. Because C dominates C in W, we have c W c. Let P = {P L(A) : w P (C a) K} and P = {P L(A) : w P (C a) K}. We define the following permutation M over L(A). For any P L(A), if c P a P c then M(P) is the ranking that is obtained from P by switching c and c ; otherwise M(P) = P. Because C C = 1, it follows that for any P P P, we must have c P a P c and (C C ) P a. Therefore, M(P P ) = P P. We now prove that π W (P P ) > π W (P P). For any P P P, we have c P a P c, which means that π W (P) π W (M(P))/φ because c W c. Therefore, π W (P P ) > π W (P P) because M(P P ) = P P. We have π W (P) = π W (P P ) + π W (P P ) π W (P P ) + π W (P P) = π W (P ). Therefore, the lemma holds for the case where C C = 1. For general C and C, because C dominates C, there exists a sequence of sets C = C,C 1,...,C l = C such that for all i l 1,

16 Lirong Xia, Rensselaer Polytechnic Institute, 16 (i) C i dominates C i+1 ; (ii) C i C i+1 = 1. It follows that π W ({P : w P (C a) K}) π W ({P : w P (C 1 a) K}) π W ({P : w P (C a) K}) It follows from Lemma 3 that X Λ weakly dominates X Λ h h, which means that Λ is a uniformly least favorable distribution for n = 1 by Theorem 1. We note that Λ is deterministic. Therefore, by Theorem 2, Λ is also a uniformly least favorable distribution for Mallows model with any n N, which means that the corresponding likelihood ratio test LR α is most powerful. It is not hard to verify that LR α = f α,a,b. Moreover, because Λ is deterministic, any most powerful test f for H vs. h 1 must also be most powerful for the simple vs. simple test (h vs. h 1). By the Neyman-Pearson lemma (Lemma 1), f must agree with LR α except on P n such that Ratio(P n ) = k α, which corresponds to P n with w Pn (B a) = K α. Remarks. We note that f α,a,b does not depend on the ordering among alternatives in B in h 1. f α,a,b is computed in the following way for any given profile P n : we first build the weighted majority graph, then calculate the total weight w Pn (B a) of all edges from B to a. If the total weight is more than a threshold K α, then H is rejected; if the total weight is less than K α, then H is retained; otherwise H is rejected with probability Γ α. This procedure is intuitive because a larger w Pn (B a) corresponds to more evidence from the data that a should be ranked below B, which means that it is less likely that the ground truth is in H, where a is ranked above B. w Pn (B a) is called the test statistic and is easy to compute. The threshold K α and the value Γ α might be hard to compute. In practice such K α and Γ α are pre-computed as a look-up table, and once w Pn (B a) is computed, we can immediately obtain its p-value, which is the smallest α such that K α = w Pn (B a). Theorem 6. Let M Ma denote a Mallows model with n = 1, any m 4, and any φ < 1/m. There exists < α < 1 such that no level-α UMP test exists for H = (L(A) H 1 ) vs. H 1 = L a others. PROOF. By Lemma 6, if a UMP test exists then f α,a is also a UMP test. Therefore, it suffices to prove that f α,a is not a level-α UMP test. To this end, we explicitly construct a test f and prove that the rankings assigned value 1 are more cost-effective than that under f α,a. Let V 1,V 2,...,V m,v 2 L(A) denote m + 1 rankings defined as follows. For any j m, let V j = [a j others], where alternatives in others are ranked w.r.t. the increasing order of their subscripts. In other words, V j is obtained from V 1 by raising alternative a j to the top position. We let V 3 = [a 3 a 1 a 4 a 2 others]. We consider the following critical function f. For any V L a others, we let f (V ) = 1. For any V j with j 3, let f (V j ) = 1. We then let f (V 3 ) = f (V 3 ) = 1+φm 1+φ. Let α denote the size of f at V 2. That is, α = Size(f,V 2 ). Let T = π V2 (L a others ). It follows that α T φ φm 1 + φ (φkt(v 2,V 3 ) + φ KT(V 2,V 3 ) ) + = φm 1 + φ (φ3 + φ 4 ) + φ 4 + >1 + φ 3 + φ 4 + φ 5 m m φ KT(V 2,V j ) j=5 φ KT(V 2,V j ) j=5 For any j, j 2 such that j j, it is not hard to verify that KT(V j,v j ) = j + j 2. Moreover, KT(V 3,V 3 ) = 1, KT(V 2,V 3 ) = 4, KT(V 4,V 3 ) = 4, and for any j 5, we have KT(V 3,V j ) = j + 2. Therefore, we have the following calculations of Size(f,V 3 ), Size(f,V 3 ), and Size(f,V 4) (see

Statistical Properties of Social Choice Mechanisms

Statistical Properties of Social Choice Mechanisms Lirong Xia Abstract Most previous research on statistical approaches to social choice focused on the computation and characterization of maximum likelihood