Preference Functions That Score Rankings and Maximum Likelihood Estimation

Vincent Conitzer, Department of Computer Science, Duke University, Durham, NC 27708, USA, conitzer@cs.duke.edu
Matthew Rognlie, Duke University, Durham, NC 27708, USA, matthew.rognlie@duke.edu
Lirong Xia, Department of Computer Science, Duke University, Durham, NC 27708, USA, lxia@cs.duke.edu

Abstract

In social choice, a preference function (PF) takes a set of votes (linear orders over a set of alternatives) as input, and produces one or more rankings (also linear orders over the alternatives) as output. Such functions have many applications, for example, aggregating the preferences of multiple agents, or merging rankings (of, say, webpages) into a single ranking. The key issue is choosing a PF to use. One natural and previously studied approach is to assume that there is an unobserved correct ranking, and the votes are noisy estimates of this. Then, we can use the PF that always chooses the maximum likelihood estimate (MLE) of the correct ranking. In this paper, we define simple ranking scoring functions (SRSFs) and show that the class of neutral SRSFs is exactly the class of neutral PFs that are MLEs for some noise model. We also define composite ranking scoring functions (CRSFs) and show a condition under which these coincide with SRSFs. We study key properties such as consistency and continuity, and consider some example PFs. In particular, we study Single Transferable Vote (STV), a commonly used PF, showing that it is a CRSF but not an SRSF, thereby clarifying the extent to which it is an MLE function. This also gives a new perspective on how ties should be broken under STV. We leave some open questions.

1 Introduction

In a typical social choice setting, there is some set of alternatives, and multiple rankings of these alternatives are provided. These input rankings are called the votes.
Based on these votes, the goal is either to choose the best alternative, or to create an aggregate ranking of all the alternatives. In this paper, we will be interested in the latter goal; if it is desired to choose the best alternative, then we can simply choose the top-ranked alternative in the aggregate ranking. Formally, a preference function (PF)¹ takes a set of votes (linear orders over the alternatives) as input, and produces one or more aggregate rankings (also linear orders over the alternatives) as output. The reason for allowing multiple aggregate rankings is to account for the possibility of ties.

The key issue is to choose a rule for determining the aggregate ranking, that is, a preference function. So, we may ask the following (vague) question: What is the optimal preference function? This has been (and will likely continue to be) a topic of debate for centuries among social choice theorists. Many different PFs have been proposed, each with its own desirable properties; some of them have elegant axiomatizations.

Presumably, which PF is optimal depends on the setting at hand. For example, in some settings, the voters are agents that each have their own idiosyncratic preferences over the alternatives, and the only purpose of voting is to reach a compromise. In such a setting, no alternative can be said to be better than another alternative in any absolute sense: an alternative's quality is defined relative to the votes. Here, it makes sense to pay close attention to issues such as the manipulability of the PF.

In other settings, however, there is more of an absolute sense in which some alternatives are better than others. For example, when we wish to aggregate rankings of webpages, provided by multiple search engines in response to the same query, it is reasonable to believe that some of these pages are in fact more relevant than others. The reason that not all of the search engines agree on the ranking is that the search engines are unable to directly perceive this absolute relevance of the pages. Here, it makes sense to think of each vote as a noisy estimate of the correct, absolute ranking. Similarly, in a cooperative multiagent system, the agents may disagree about the ranking of a set of alternatives because they disagree about which alternatives are more likely to lead them to their common goal. Again, we may believe that there is a correct ranking, in the sense that some alternatives really are more likely than others to lead the agents to their common goal; and again, agents' rankings are noisy estimates of this correct ranking. Our goal is to find an aggregate ranking that is as close as possible to the correct ranking, based on these noisy estimates. This is the type of setting that we will study in this paper.

¹ We use "preference function" rather than "social welfare function" because the resulting set of strict rankings need not correspond to a weak ranking (where a set of strict rankings corresponds to a weak ranking if it consists of all the strict rankings that can be obtained by breaking the ties in the weak ranking). The term "preference function" has previously been used in this context [19].

In a 2005 paper, Conitzer and Sandholm considered the following way of making this precise [5]. There is a correct ranking $r$ of the alternatives; given $r$, for every ranking $v$, there is a conditional probability $P(v \mid r)$ that a given voter will cast vote $v$. In this paper, we do not consider the possibility that different voters' votes are drawn according to different conditional distributions. Votes are conditionally independent given $r$. Put another way, the noise that each voter experiences is i.i.d. The Bayesian network in Figure 1 illustrates this setup.

[Figure 1: A Bayesian network representation. Nodes: "correct" outcome, vote 1, vote 2, ..., vote n.]

The votes are the observed variables, and the noise that a voter experiences is represented by the conditional probability table of that vote. Under this setup, a natural goal is to find the maximum likelihood estimate (MLE) of the correct ranking. If $r$ is drawn uniformly at random, this maximum likelihood estimate also maximizes the posterior probability. The function that takes the votes as input and produces the MLE ranking(s) as output is a preference function; in a sense, it is the optimal one for the particular noise model at hand.

As pointed out by Conitzer and Sandholm, they were not the first to consider this type of setup. In fact, the basic idea dates back over two centuries to Condorcet [6], who studied one particular noise model. He solved for the MLE PF for two and three alternatives under this model; the general solution was given two centuries later by Young [18], who showed that the MLE PF for Condorcet's model coincides with a function proposed by Kemeny [10]. This has frequently been used as an argument in favor of using Kemeny's PF; however, different noise models will in general result in different MLE PFs. Several generalizations of this basic noise model have been studied [8; 7; 11; 12].
Conitzer and Sandholm considered the opposite direction: they studied a number of specific well-known PFs and they showed that for some of them, there exists a noise model such that this PF becomes the MLE, whereas for others, no such noise model can be constructed. This shows that the former PFs are in a sense more natural than the latter. Also, when a noise model can be constructed, it gives insight into the PF; moreover, if the noise model is unreasonable in a certain way, it can be modified, resulting in an improved PF.

In this paper, we continue this line of work. We provide an exact characterization of the class of (neutral) PFs for which a noise model can be constructed: we show that this class is equal to the class of (neutral) simple ranking scoring functions (SRSFs), which, for every vote, assign a score to every potential aggregate ranking, and the rankings with the highest total score win. We show that several common PFs are SRSFs (these proofs resemble the corresponding proofs by Conitzer and Sandholm that these PFs are MLEs, but the proofs are significantly simpler in the language of SRSFs). We also consider composite ranking scoring functions (CRSFs), which coincide with SRSFs except they can break ties according to another SRSF, and remaining ties according to another SRSF, etc.² We show that if there is a bound on the number of votes, then the two classes (SRSFs and CRSFs) coincide. We study some basic properties of SRSFs and CRSFs, some of them closely related to Conitzer and Sandholm's proof techniques.

Finally, we study one PF, Single Transferable Vote (STV), also known as Instant Runoff Voting, in detail. STV is used in many elections around the world; additionally, it illustrates a number of key points about our results. A noise model for STV was given by Conitzer and Sandholm. However, this noise model involves probabilities that are infinitesimally smaller than other probabilities.
We show that such infinitesimally small probabilities are in a sense necessary, by showing that STV is in fact not an SRSF (when there is no bound on the number of votes). Still, we do show that STV is a CRSF (in a way that resembles the noise model with infinitesimally small probabilities). Hence, STV is in fact an MLE PF if there is an upper bound on the number of votes. Along the way, some interesting questions arise about how ties should be broken under STV. We propose two ways of breaking ties that we believe are perhaps more sensible than the common way, although at least one of them leads to computational difficulties. We leave some open questions.

² SRSFs and CRSFs should not be confused with the extremely general Generalized Scoring Rules from Xia and Conitzer [15; 16]. For example, Proposition 5 will show some limitations of SRSFs and CRSFs.

2 Definitions

In what follows, we let $A$ be the set of alternatives, $|A| = m$, and $L(A)$ the set of linear orders over (that is, strict rankings of) these alternatives. A preference function (PF) is a function $f : \bigcup_{i=0,1,2,\ldots} L(A)^i \to 2^{L(A)}$. That is, $f$ takes as input a vector (of any length) $V$ of linear orders (votes) over the alternatives, and as output produces one or more linear orders over (aggregate rankings of) the alternatives. (On many inputs, only a single ranking is produced, but it is possible that there are ties.) Input vectors are also called profiles. We restrict our attention to PFs that are anonymous, that is, they treat all votes equally; hence, a profile can be thought of as a multiset of votes. We will study the following PFs:

Positional scoring functions. A positional scoring function is defined by a vector $(s_1, \ldots, s_m) \in \mathbb{R}^m$, with $s_1 \geq s_2 \geq \cdots \geq s_m$. An alternative receives $s_i$ points every time it is ranked $i$th. Alternatives are ranked by how many points they receive; if some alternatives end up tied, then they can be ranked in any order (and all the complete rankings that can result from this will be produced by the PF). Examples include plurality ($s_1 = 1$, $s_2 = s_3 = \cdots = s_m = 0$), veto or anti-plurality ($s_1 = s_2 = \cdots = s_{m-1} = 1$, $s_m = 0$), and Borda ($s_1 = m-1$, $s_2 = m-2$, ..., $s_m = 0$).

Kemeny. Given a vote $v$, a possible ranking $r$, and two alternatives $a, b$, let $\delta(v, r, a, b) = 1$ if $a \succ_v b$ and $a \succ_r b$, and $\delta(v, r, a, b) = 0$ otherwise. Then, $f(V) = \arg\max_{r \in L(A)} \sum_{v \in V} \sum_{a,b \in A} \delta(v, r, a, b)$, that is, we choose the rankings that maximize the total number of times the ranking agrees with a vote on a pair of alternatives.

Single Transferable Vote (STV). The alternative with the lowest plurality score (that is, the one that is ranked first by the fewest votes) is ranked last, and is removed from all the votes (so that the plurality scores change). The remainder of the ranking is determined recursively. (We will have more to say about how ties are broken later.)

A PF is neutral if it treats all alternatives equally. To be precise, a PF is neutral if for any profile $V$ and any permutation $\pi$ on the alternatives, $f(\pi(V)) = \pi(f(V))$. Here, a permutation is applied to a vector or set of rankings of the alternatives by applying it to each individual alternative in those rankings. Naturally, neutrality is a common requirement. Another common requirement for an anonymous PF is homogeneity: if we multiply the profile by some natural number $n > 0$ (that is, replace each vote by $n$ duplicates of it), then the outcome should not change. All of the above PFs are anonymous, neutral, and homogeneous.

We now define noise models and MLE PFs formally.

Definition 1 A noise model $\nu$ specifies a probability $P_\nu(v \mid r)$ for every $v, r \in L(A)$ (so that for all $r$, we have $\sum_{v \in L(A)} P_\nu(v \mid r) = 1$).

Definition 2 A noise model $\nu$ is neutral if for any $v$, $r$, and permutation $\pi$ on $A$, we have $P_\nu(v \mid r) = P_\nu(\pi(v) \mid \pi(r))$.

Definition 3 A PF $f$ is a maximum likelihood estimator (MLE) if there exists a noise model $\nu$ so that $f(V) = \arg\max_{r \in L(A)} \prod_{v \in V} P_\nu(v \mid r)$.

We now define simple ranking scoring functions. Effectively, every vote gives a number of points to every possible aggregate ranking, and the rankings with the most points win.

Definition 4 A PF $f$ is a simple ranking scoring function (SRSF) if there exists a function $s : L(A) \times L(A) \to \mathbb{R}$ such that for all $V$, $f(V) = \arg\max_{r \in L(A)} \sum_{v \in V} s(v, r)$.
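Definition 4 can be turned directly into a brute-force procedure. The following Python sketch is illustrative only: the names `srsf` and `pairwise_agreement_score` are hypothetical, rankings are assumed to be tuples listed from most to least preferred, and the score function is assumed to be the pairwise-agreement count $\sum_{a,b} \delta(v, r, a, b)$ from the Kemeny definition above. It enumerates all $m!$ rankings, so it is only usable for very small $m$.

```python
from itertools import combinations, permutations


def pairwise_agreement_score(vote, ranking):
    """s(v, r): number of pairs (a, b) with a above b in both v and r."""
    pos_r = {alt: i for i, alt in enumerate(ranking)}
    # combinations() preserves input order, so a precedes b in the vote.
    return sum(1 for a, b in combinations(vote, 2) if pos_r[a] < pos_r[b])


def srsf(profile, score, alternatives):
    """Return every ranking that maximizes the total score over the profile.

    profile:      list of votes (tuples of alternatives)
    score:        function (vote, ranking) -> number (assumed, see lead-in)
    alternatives: iterable of the m alternatives
    """
    totals = {
        r: sum(score(v, r) for v in profile)
        for r in permutations(alternatives)
    }
    best = max(totals.values())
    return {r for r, total in totals.items() if total == best}


if __name__ == "__main__":
    votes = [("A", "B", "C"), ("A", "B", "C"), ("B", "A", "C")]
    print(srsf(votes, pairwise_agreement_score, "ABC"))
```

Returning a set of rankings rather than a single ranking mirrors the definition of a PF, which may output several tied rankings.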
Definition 5 A function $s : L(A) \times L(A) \to \mathbb{R}$ is neutral if for any $v$, $r$, and permutation $\pi$ on $A$, $s(v, r) = s(\pi(v), \pi(r))$.

An SRSF can be run by explicitly computing each ranking's score, but because there are $m!$ rankings this is impractical for all but the smallest numbers of alternatives. However, typically, there are more efficient algorithms. For example, we will see that positional scoring functions and the Kemeny function are SRSFs. Positional scoring functions are of course easy to run; running the Kemeny function is NP-hard [2] (for an exact complexity analysis, see [9]), but can usually be done quite fast [4; 12; 3].

3 Equivalence of neutral MLEs and SRSFs

We now show the equivalence of MLEs and SRSFs. We only show this for neutral PFs; in fact, it is not true for PFs that are not neutral. For example, a PF that always chooses the same ranking $r^*$ regardless of the votes is an SRSF, simply by setting $s(v, r^*) = 1$ for all $v$ and setting $s(v, r) = 0$ everywhere else. However, this PF is not an MLE: given a noise model $\nu$, if we take another ranking $r \neq r^*$, we must have $\sum_{v \in L(A)} P_\nu(v \mid r) = 1 = \sum_{v \in L(A)} P_\nu(v \mid r^*)$, hence there exists some $v$ such that $P_\nu(v \mid r) \geq P_\nu(v \mid r^*)$; it follows that $r^*$ is not the (sole) winner if $v$ is the only vote.

Lemma 1 A neutral PF $f$ is an MLE (for some noise model) if and only if it is an MLE for a neutral noise model.

Proof: The "if" direction is immediate. For the "only if" direction, given a noise model $\nu$ for $f$, construct a new noise model $\nu'$ as follows: $P_{\nu'}(v \mid r) = (1/m!) \sum_{\pi} P_\nu(\pi(v) \mid \pi(r))$. (Here, $\pi$ ranges over permutations of $A$.) This is still a valid noise model because

$\sum_{v \in L(A)} P_{\nu'}(v \mid r) = \sum_{v \in L(A)} (1/m!) \sum_{\pi} P_\nu(\pi(v) \mid \pi(r)) = \sum_{\pi} (1/m!) \sum_{v \in L(A)} P_\nu(\pi(v) \mid \pi(r)) = 1$.

$\nu'$ is also neutral because

$P_{\nu'}(\pi'(v) \mid \pi'(r)) = (1/m!) \sum_{\pi} P_\nu(\pi(\pi'(v)) \mid \pi(\pi'(r))) = (1/m!) \sum_{\pi''} P_\nu(\pi''(v) \mid \pi''(r)) = P_{\nu'}(v \mid r)$.

Also, if $r^* \in \arg\max_{r} \prod_{v \in V} P_\nu(v \mid r)$, then by the neutrality of $f$, for any $\pi$, $\pi(r^*) \in \arg\max_{r} \prod_{v \in V} P_\nu(\pi(v) \mid r)$. Hence,

$r^* \in \arg\max_{r} \prod_{v \in V} (1/m!) \sum_{\pi} P_\nu(\pi(v) \mid \pi(r)) = \arg\max_{r} \prod_{v \in V} P_{\nu'}(v \mid r)$.

Conversely, it can similarly be shown that if $r^* \notin \arg\max_{r} \prod_{v \in V} P_\nu(v \mid r)$, then $r^* \notin \arg\max_{r} \prod_{v \in V} P_{\nu'}(v \mid r)$. Hence, $\nu'$ is a valid noise model for $f$.

Lemma 2 A neutral PF $f$ is an SRSF if and only if it is an SRSF for a neutral function $s$.

Proof: The "if" direction is immediate. For the "only if" direction, given a function $s$, construct a new function $s'$ as follows: $s'(v, r) = \sum_{\pi} s(\pi(v), \pi(r))$. $s'$ is neutral because

$s'(\pi'(v), \pi'(r)) = \sum_{\pi} s(\pi(\pi'(v)), \pi(\pi'(r))) = \sum_{\pi''} s(\pi''(v), \pi''(r)) = s'(v, r)$.

Also, if $r^* \in \arg\max_{r} \sum_{v \in V} s(v, r)$, then by the neutrality of $f$, for any $\pi$, $\pi(r^*) \in \arg\max_{r} \sum_{v \in V} s(\pi(v), r)$. Hence,

$r^* \in \arg\max_{r} \sum_{\pi} \sum_{v \in V} s(\pi(v), \pi(r)) = \arg\max_{r} \sum_{v \in V} \sum_{\pi} s(\pi(v), \pi(r)) = \arg\max_{r} \sum_{v \in V} s'(v, r)$.

Conversely, it can similarly be shown that if $r^* \notin \arg\max_{r} \sum_{v \in V} s(v, r)$, then $r^* \notin \arg\max_{r} \sum_{v \in V} s'(v, r)$. Hence, $s'$ is a valid function for $f$.

We can now prove the characterization result:

Theorem 1 A neutral PF $f$ is an MLE if and only if it is an SRSF.

Proof: If $f$ is an MLE, then by Lemma 1, for some neutral $\nu$,

$f(V) = \arg\max_{r} \prod_{v \in V} P_\nu(v \mid r) = \arg\max_{r} \log\left(\prod_{v \in V} P_\nu(v \mid r)\right) = \arg\max_{r} \sum_{v \in V} \log(P_\nu(v \mid r))$.

Hence it is the SRSF where $s(v, r) = \log(P_\nu(v \mid r))$ (here, $s$ is neutral). Conversely, if $f$ is an SRSF, then by Lemma 2, for some neutral $s$,

$f(V) = \arg\max_{r} \sum_{v \in V} s(v, r) = \arg\max_{r} 2^{\sum_{v \in V} s(v, r)} = \arg\max_{r} \prod_{v \in V} 2^{s(v, r)}$.

Because $s$ is neutral, we have that $\sum_{v \in L(A)} 2^{s(v, r)}$ is the same for all $r$. (This is because for any $r_1, r_2$, there exists a permutation $\pi$ on $A$ such that $\pi(r_1) = r_2$, so that we have $\sum_{v \in L(A)} 2^{s(v, r_1)} = \sum_{v \in L(A)} 2^{s(\pi(v), r_2)}$ by neutrality, which by changing the order of the summands is equal to $\sum_{v \in L(A)} 2^{s(v, r_2)}$.) It follows that $f(V) = \arg\max_{r} \prod_{v \in V} 2^{s(v, r)} / \left(\sum_{v' \in L(A)} 2^{s(v', r)}\right)$. Hence $f$ is the maximum likelihood estimator for the noise model $\nu$ defined by $P_\nu(v \mid r) = 2^{s(v, r)} / \left(\sum_{v' \in L(A)} 2^{s(v', r)}\right)$.

4 Examples of SRSFs

We now show that some common PFs are SRSFs. These proofs resemble the corresponding proofs by Conitzer and Sandholm that these functions are MLEs, but they are simpler. (These propositions also follow from [20].)

Proposition 1 Every positional scoring function is an SRSF.

Proof: Given a positional scoring function, let $t : L(A) \times A \to \mathbb{R}$ be defined as follows: $t(v, a)$ is the number of points that $a$ gets for vote $v$. Then, let $s(v, r) = \sum_{i=1}^{m} (m - i)\, t(v, r(i))$, where $r(i)$ is the alternative ranked $i$th in $r$. Let us consider the SRSF defined by this function $s$; it selects

$\arg\max_{r} \sum_{v \in V} s(v, r) = \arg\max_{r} \sum_{v \in V} \sum_{i=1}^{m} (m - i)\, t(v, r(i)) = \arg\max_{r} \sum_{i=1}^{m} (m - i) \sum_{v \in V} t(v, r(i))$.

Here, $\sum_{v \in V} t(v, r(i))$ is the total score that alternative $r(i)$ receives under the positional scoring function.
Because $m - i$ is decreasing in $i$, to maximize $\sum_{i=1}^{m} (m - i) \sum_{v \in V} t(v, r(i))$, we rank the alternative with the highest total score first, the one with the next-highest total score second, etc. If some of the alternatives are tied, they can be ranked in any order.

Not only positional scoring functions are SRSFs, however.

Proposition 2 The Kemeny PF is an SRSF.

Proof: This is almost immediate: we defined the Kemeny PF by $f(V) = \arg\max_{r} \sum_{v \in V} \sum_{a,b \in A} \delta(v, r, a, b)$, so we simply let $s(v, r) = \sum_{a,b \in A} \delta(v, r, a, b)$.

5 Composite ranking scoring functions

A composite ranking scoring function (CRSF) starts by running an SRSF, then (potentially) breaks ties according to another SRSF, and (potentially) any remaining ties according to yet another SRSF, etc. Formally:

Definition 6 A CRSF $f$ of depth $d$ consists of a CRSF $f'$ of depth $d - 1$ and a function $s_d : L(A) \times L(A) \to \mathbb{R}$. It chooses $f(V) = \arg\max_{r \in f'(V)} \sum_{v \in V} s_d(v, r)$. A CRSF of depth 0 returns the set of all rankings $L(A)$.

So, a CRSF of (finite) depth $d$ is defined by a sequence $f_1, \ldots, f_d$ of SRSFs (where $f_2$ is used to break ties in the $f_1$ score, etc.). We can think of the scores at each depth as being infinitesimally smaller than the ones at the previous depths. We can multiply the scores at depth $l$ by $\epsilon^l$ for some small $\epsilon$ and then add all the scores together to obtain an SRSF; however, this SRSF will in general be different from the CRSF. Nevertheless, if $\epsilon$ is small relative to the number of votes, then the two will coincide. This is the intuition behind the following result, whose proof we omit to save space.

Proposition 3 For any CRSF, for any natural number $N$, there exists an SRSF that agrees with the CRSF as long as there are at most $N$ votes.

Thus, for all practical purposes, we can simulate a CRSF with an SRSF. (Of course, every SRSF is also a CRSF.)

6 Properties of SRSFs/CRSFs

In this section, we study some important properties of SRSFs and CRSFs. Specifically, we study consistency and continuity.
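The composite construction of Definition 6 amounts to filtering the set of rankings stage by stage. The following Python sketch is a minimal illustration under stated assumptions: the function names are hypothetical, rankings are tuples listed from most to least preferred, the first stage is assumed to be an exact-match score (1 if the vote equals the ranking, 0 otherwise), and the second stage is assumed to be the Borda-based ranking score $s(v, r) = \sum_{i=1}^{m} (m - i)\, t(v, r(i))$ from the proof of Proposition 1.

```python
from itertools import permutations


def exact_match_score(vote, ranking):
    """Assumed first-stage score: 1 if the vote equals the ranking, else 0."""
    return 1 if tuple(vote) == tuple(ranking) else 0


def borda_ranking_score(vote, ranking):
    """Assumed tie-breaking score: sum_i (m - i) * t(v, r(i)) with Borda points."""
    m = len(vote)
    points = {alt: m - 1 - i for i, alt in enumerate(vote)}  # m-1, ..., 0
    # Position j is 0-indexed here, so the weight m - i becomes m - 1 - j.
    return sum((m - 1 - j) * points[alt] for j, alt in enumerate(ranking))


def crsf(profile, scores, alternatives):
    """Apply score functions in order; each one only breaks remaining ties.

    With an empty list of score functions (depth 0), all rankings are returned.
    """
    candidates = set(permutations(alternatives))
    for score in scores:
        totals = {r: sum(score(v, r) for v in profile) for r in candidates}
        best = max(totals.values())
        candidates = {r for r in candidates if totals[r] == best}
    return candidates


if __name__ == "__main__":
    p = [("A", "B", "C"), ("B", "C", "A"), ("C", "B", "A")]
    print(crsf(p, [exact_match_score, borda_ranking_score], "ABC"))
```

Keeping the stages as an ordered list makes the lexicographic nature of the tie-breaking explicit: a later score is consulted only among rankings that are still tied on all earlier ones.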
There are several related works that study similar properties and derive related results, but there are significant differences in the setup. Smith [14] and Young [17] study these properties in social choice rules, which select one or more alternatives as the winner(s); we will discuss their results in more detail in Section 8. However, consistency in the context of preference functions (studied previously by Young and Levenglick [19]) is significantly different from consistency in the context of social choice rules. Other related work includes Myerson [13], who extends the Smith and Young result to settings where voters do not necessarily submit a ranking of the alternatives, and Zwicker [20], who studies a general notion of scoring rules and shows these rules are equivalent to mean proximity rules, which compute the mean location of

the votes according to some embedding in space, and then choose the closest outcome(s).

An anonymous PF $f$ is consistent if for any pair of profiles $V_1$ and $V_2$, if $f(V_1) \cap f(V_2) \neq \emptyset$, then $f(V_1 \cup V_2) = f(V_1) \cap f(V_2)$. That is, if the rankings that $f$ produces given $V_1$ overlap with those that $f$ produces given $V_2$, then when $V_1$ and $V_2$ are taken together, $f$ must produce the rankings that were produced in both cases, and no others.

Proposition 4 Any CRSF is consistent.

Proof: Let $f$ be a CRSF of depth $k$, defined by a sequence of SRSFs $f_1, \ldots, f_k$ with score functions $s_1, \ldots, s_k$. For a profile $V$, write $s_i(V, r) = \sum_{v \in V} s_i(v, r)$ for the total score of $r$. For any $i \leq k$, let $F_i$ be the CRSF of depth $i$ defined by the sequence $f_1, \ldots, f_i$. Let $V_1, V_2$ be profiles such that $f(V_1) \cap f(V_2) \neq \emptyset$; this also implies that $F_i(V_1) \cap F_i(V_2) \neq \emptyset$ for all $i \leq k$. We use induction on $i$ to prove that for any $i \leq k$, $F_i(V_1 \cup V_2) = F_i(V_1) \cap F_i(V_2)$.

When $i = 1$, $F_1(V_1) = f_1(V_1)$ is the set of rankings $r$ that maximize $s_1(V_1, r)$; $F_1(V_2) = f_1(V_2)$ is the set of rankings $r$ that maximize $s_1(V_2, r)$. Therefore, $F_1(V_1) \cap F_1(V_2)$ (which we know is nonempty) is the set of rankings $r$ that maximize $s_1(V_1, r) + s_1(V_2, r) = s_1(V_1 \cup V_2, r)$.

Now, suppose that for some $i < k$, $F_i(V_1 \cup V_2) = F_i(V_1) \cap F_i(V_2)$. $F_{i+1}(V_1)$ ($F_{i+1}(V_2)$) is the set of rankings $r \in F_i(V_1)$ ($r \in F_i(V_2)$) that maximize $s_{i+1}(V_1, r)$ ($s_{i+1}(V_2, r)$). Hence, $F_{i+1}(V_1) \cap F_{i+1}(V_2)$ (which we know is nonempty) is the set of rankings $r \in F_i(V_1) \cap F_i(V_2)$ that maximize $s_{i+1}(V_1, r) + s_{i+1}(V_2, r) = s_{i+1}(V_1 \cup V_2, r)$. By the induction assumption, we have that $F_i(V_1) \cap F_i(V_2) = F_i(V_1 \cup V_2)$, and we know that the set of rankings $r \in F_i(V_1 \cup V_2)$ that maximize $s_{i+1}(V_1 \cup V_2, r)$ is equal to $F_{i+1}(V_1 \cup V_2)$. It follows that $F_{i+1}(V_1) \cap F_{i+1}(V_2) = F_{i+1}(V_1 \cup V_2)$, completing the induction step. For $i = k$, $F_k = f$, completing the proof.
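The consistency condition can be checked mechanically on any given pair of profiles. The sketch below is a hedged illustration, not a proof: the helper names are hypothetical, rankings are tuples listed from most to least preferred, profiles are lists of votes (so the multiset union is list concatenation), and the PF under test is assumed to be a brute-force SRSF with the pairwise-agreement (Kemeny) score, which is a CRSF of depth 1.

```python
from itertools import combinations, permutations


def pairwise_agreement_score(vote, ranking):
    """Number of pairs (a, b) ranked a above b by both the vote and the ranking."""
    pos_r = {alt: i for i, alt in enumerate(ranking)}
    return sum(1 for a, b in combinations(vote, 2) if pos_r[a] < pos_r[b])


def kemeny(profile, alternatives="ABC"):
    """Brute-force SRSF using the pairwise-agreement score (small m only)."""
    totals = {
        r: sum(pairwise_agreement_score(v, r) for v in profile)
        for r in permutations(alternatives)
    }
    best = max(totals.values())
    return {r for r, total in totals.items() if total == best}


def consistent_on(f, v1, v2):
    """Check the consistency condition for f on one pair of profiles."""
    common = f(v1) & f(v2)
    if not common:
        return True  # the premise f(V1) ∩ f(V2) ≠ ∅ does not hold
    return f(v1 + v2) == common


if __name__ == "__main__":
    v1 = [("A", "B", "C"), ("B", "A", "C")]
    v2 = [("A", "B", "C"), ("A", "C", "B")]
    print(kemeny(v1), kemeny(v2), kemeny(v1 + v2))
    print(consistent_on(kemeny, v1, v2))
```

A check like this can only refute consistency by finding a counterexample pair; passing on particular profiles does not establish the property in general.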
The proofs by Conitzer and Sandholm [5] that several PFs are not MLEs effectively come down to showing examples where these PFs are not consistent. By the above result, this implies they are not CRSFs (and hence not SRSFs, and hence not MLEs). Formally (we will not define these PFs here):

Proposition 5 The Bucklin, Copeland, maximin, and ranked pairs PFs are not CRSFs.

Proof: These PFs are not consistent: counterexamples can be found in the proofs of Conitzer and Sandholm [5].

Let $L(A) = \{l_1, \ldots, l_{m!}\}$. For any anonymous PF $f$, any profile $V$ can be rewritten as a linear combination of the linear orders in $L(A)$. Let $V = \sum_{i=1}^{m!} t_i l_i$, where for any $i \leq m!$, $t_i$ is a non-negative integer. If $f$ is also homogeneous, then the domain of $f$ can be extended to the set of all fractional profiles $V = \sum_{i=1}^{m!} t_i l_i$ where each $t_i$ is a nonnegative rational number, as follows. We choose $N_V > 0$, $N_V \in \mathbb{N}$ such that for every $i \leq m!$, $t_i N_V$ is an integer. Then, we let $f(V) = f(N_V V)$ (well-defined because of homogeneity). A fractional profile $V$ can be viewed as a point in the $m!$-dimensional space $(\mathbb{Q}_{\geq 0})^{m!}$ where the coefficient $t_i$ is the component of the $i$th dimension. Thus, in a slight abuse of notation, we can apply $f$ to vectors of $m!$ nonnegative rational numbers, under the interpretation that $f(t_1, \ldots, t_{m!}) = f(\sum_{i=1}^{m!} t_i l_i)$.

The extension of $f$ to $(\mathbb{Q}_{\geq 0})^{m!}$ allows us to define continuity. An anonymous PF $f$ is continuous if for any sequence of points $p_1, p_2, \ldots \in (\mathbb{Q}_{\geq 0})^{m!}$ with
1. $\lim_{i \to \infty} p_i = p$, and
2. for all $i \in \mathbb{N}$, $r \in f(p_i)$,
we have $r \in f(p)$. That is, if $f$ produces some ranking $r$ on every point along a sequence that converges to a limit point, then $f$ should also produce $r$ at the limit point.³

³ Our definition of continuity is equivalent to the correspondence being upper hemicontinuous, or closed (the two are equivalent in this context).

Proposition 6 Any SRSF is continuous.

Proof: Let $s(p, r)$ denote the total score of ranking $r$ given profile $p$. For any sequence of points $p_1, p_2, \ldots \in (\mathbb{Q}_{\geq 0})^{m!}$ with $\lim_{i \to \infty} p_i = p$, we have that for all $r \in L(A)$, $\lim_{i \to \infty} s(p_i, r) = s(p, r)$. If $r \in f(p_i)$ for all $i$, then for any $r' \in L(A)$, $s(p_i, r) \geq s(p_i, r')$, hence we have $s(p, r) = \lim_{i \to \infty} s(p_i, r) \geq \lim_{i \to \infty} s(p_i, r') = s(p, r')$. It follows that $r \in f(p)$.

In contrast, CRSFs are not necessarily continuous, as shown by the following example. Let $f_1$ be the SRSF defined by the score function $s_1$, which is defined by $s_1(v, r) = 1$ if $v = r$ and $s_1(v, r) = 0$ if $v \neq r$. Let $f_2$ be the Borda function. Let $f$ be the CRSF defined by the sequence $f_1, f_2$. Let $m = 3$ with alternatives $A$, $B$, and $C$, and let $p = \{A \succ B \succ C,\ B \succ C \succ A,\ C \succ B \succ A\}$. We have $f(p) = \{B \succ C \succ A\}$, but for any $\epsilon > 0$, $f(p + \epsilon(A \succ B \succ C)) = f_1(p + \epsilon(A \succ B \succ C)) = \{A \succ B \succ C\}$. Therefore, if we let $p_i = p + \frac{1}{i}(A \succ B \succ C)$, it follows that $\lim_{i \to \infty} p_i = p$ and for any $i$, $A \succ B \succ C \in f(p_i)$, but $A \succ B \succ C \notin f(p)$.

As we have noted before, there is generally a possibility of ties for PFs, and sometimes a PF is not defined for these cases (for example, we have not defined how they should be broken for STV). We can use the continuity property to gain some insight into how ties should be broken. For any $S \subseteq (\mathbb{Q}_{\geq 0})^{m!}$, let $C(S)$ be the closure of $S$, that is, $C(S)$ is the smallest set such that for any infinite sequence $p_1, p_2, \ldots$ in $S$, if $\lim_{i \to \infty} p_i = p$, then $p \in C(S)$. Let $f_S$ be a PF that satisfies anonymity and homogeneity, defined over $S$. That is, $f_S : S \to 2^{L(A)}$. The minimal continuous extension of $f_S$ is the PF $f_{C(S)} : C(S) \to 2^{L(A)}$ such that for any $p \in C(S)$ and any $r \in L(A)$, $r \in f_{C(S)}(p)$ if and only if there exists a sequence $p_1, p_2, \ldots$ in $S$ such that $\lim_{i \to \infty} p_i = p$ and for any $i$, $r \in f_S(p_i)$. The following lemma will be useful in our study of STV.

Lemma 3 Suppose we have two SRSFs $f$, $f_S$ that have the same score function $s$, but $f$ is defined over $(\mathbb{Q}_{\geq 0})^{m!}$, and $f_S$ over a set $S \subseteq (\mathbb{Q}_{\geq 0})^{m!}$ such that $C(S) = (\mathbb{Q}_{\geq 0})^{m!}$. If for any $r \in L(A)$, there exists a profile $p_r$ such that $f(p_r) = \{r\}$, then $f$ is the minimal continuous extension of $f_S$.

Proof: By Proposition 6, $f$ is continuous. On the other hand, for any $p \in (\mathbb{Q}_{\geq 0})^{m!}$ with $r \in f(p)$, for any $i \in \mathbb{N}$, $f(p + \frac{1}{i} p_r) = \{r\}$. Because $C(S) = (\mathbb{Q}_{\geq 0})^{m!}$, for every $i \in \mathbb{N}$, there exists a point $p_i \in S$ sufficiently close to $p + \frac{1}{i} p_r$ such that $f(p_i) = \{r\}$, because $s$ is continuous and at $p + \frac{1}{i} p_r$, for any $r' \in L(A)$ with $r' \neq r$, $s(p + \frac{1}{i} p_r, r) - s(p + \frac{1}{i} p_r, r') > 0$.

So, $p_1, p_2, \ldots$ is a sequence in $S$ such that $r \in f_S(p_i)$ for every $i$; therefore, any continuous extension must have $r \in f(p)$.

7 Single Transferable Vote (STV)

In this section, we study the Single Transferable Vote (STV) PF in detail, for two reasons. First, it is a commonly used PF, so it is of interest in its own right. Second, it gives a good illustration of a number of subtle technical phenomena, and a precise understanding of these phenomena is likely to be helpful in the analysis of other PFs.

We recall that under STV, in each round, the alternative that is ranked first (among the remaining alternatives) the fewest times is removed from all the votes and ranked the lowest among the remaining alternatives, that is, just above the previously removed alternative. We note that when an alternative is removed, all the votes that ranked it first transfer to the next remaining alternative in that vote. The number of votes ranking an alternative first is that alternative's plurality score in that round.

One key issue is determining how ties in a round should be broken, that is, what to do if multiple alternatives have the lowest plurality score in a round. We will at first ignore this and show that STV is a CRSF. (This proof resembles the Conitzer-Sandholm noise model but is much clearer in the language of scoring functions.)

Theorem 2 When restricting attention to profiles without ties, STV is a CRSF.

Proof: For $l \in L(A)$, let $l(i)$ be the $i$th-ranked alternative in $l$. Let $s_1(v, r) = 0$ if $r(m) = v(1)$, and $s_1(v, r) = 1$ otherwise. That is, a ranking receives a point for a vote if and only if the ranking does not rank the alternative ranked first in the vote last. Consider the alternative $a$ with the lowest plurality score; the rankings that win under $s_1$ are exactly the rankings that rank $a$ last. Now, let $s_2(v, r) = 0$ if either $r(m-1) = v(1)$, or $r(m) = v(1)$ and $r(m-1) = v(2)$; and $s_2(v, r) = 1$ otherwise.
That is, a ranking receives a point for a vote unless the ranking ranks the first alternative in the vote second-to-last, or the ranking ranks the first alternative in the vote last and the second alternative in the vote second-to-last. If we look at rankings that survived $s_1$ (the rankings that ranked the alternative $a$ with the lowest plurality score last), a ranking that ranks $b$ ($\neq a$) second-to-last will fail to receive a point for every vote that ranks $b$ first, and for every vote that ranks $a$ first and $b$ second. That is, it fails to receive a point for every vote that ranks $b$ first in the second iteration of STV. Hence, the rankings that survive $s_2$ are the ones that rank the alternative that receives the fewest votes in the second iteration of STV second-to-last.

More generally, let $s_k(v, r) = 0$ if, letting $b = r(m - k + 1)$, for every $a$ such that $v^{-1}(a) < v^{-1}(b)$, $r^{-1}(a) > r^{-1}(b) = m - k + 1$; and $s_k(v, r) = 1$ otherwise. (Here $v^{-1}(a)$ denotes the position of $a$ in $v$.) That is, a ranking receives a point for a vote unless the alternative $b$ ranked $k$th-to-last by $r$ is preceded in $v$ only by alternatives ranked after $b$ in $r$. Given that $r$ has not yet been eliminated and is hence consistent with STV so far, the latter condition holds if and only if $b$ receives $v$'s vote in the $k$th iteration of STV.

In fact, we can break ties in STV simply according to the scoring functions used in the proof of Theorem 2. We will call the resulting PF CRSF-STV. CRSF-STV is a CRSF and hence consistent. By Theorem 1 and Proposition 3, this means that CRSF-STV is an MLE when there is an upper bound on the number of votes.

Does there exist a tiebreaking rule for STV such that it is an SRSF, that is, so that it is an MLE without a bound on the number of votes? We will show that the answer is negative. To do so, we consider one particular tiebreaking rule. Under this rule, when multiple alternatives are tied to be eliminated, we have a choice of which one is eliminated.
A ranking is among the winning rankings if and only if there is some sequence of such choices that results in this ranking. We call the resulting PF parallel-universes tiebreaking STV (PUT-STV). (Every choice can be thought of as leading to a separate parallel universe in which STV is executed.) We omit the remaining proofs due to space constraints.

Lemma 4 PUT-STV is the minimal continuous extension of STV defined on non-tied profiles.

Lemma 5 PUT-STV is not consistent.

Corollary 1 PUT-STV is not a CRSF (hence, not an SRSF).

This allows us to prove a property of STV in general:

Theorem 3 STV is not an SRSF, even when restricting attention to non-tied profiles.

We also obtain:

Proposition 7 There is no tie-breaking mechanism for STV that makes it both continuous and consistent.

Incidentally, PUT-STV is computationally intractable (in a sense). We do not know if the same is true for CRSF-STV.

Theorem 4 It is NP-complete to determine whether, given a profile $p$ and an alternative $a$, one of the winning rankings under PUT-STV ranks $a$ first.

As it turns out, neither PUT-STV nor CRSF-STV corresponds to how ties are commonly broken under STV: rather, usually, if there is a tie, all of these alternatives are simultaneously eliminated. Mathematically, this leads to bizarre discontinuities; we omit the details due to space constraints.

8 Characterizing SRSFs/CRSFs axiomatically

Examining social choice rules (SCRs), that is, functions that output one or more alternatives as the winner(s) (rather than one or more rankings), Young found the following axiomatic characterization of positional scoring functions [17]. (A similar characterization was given by Smith [14].) He showed that all SCRs satisfying consistency, continuity, and neutrality (SCR analogues of the properties we considered) must be positional scoring functions, and all positional scoring functions satisfy these properties.
Further, dropping continuity, he found that any consistent and neutral SCR must be equivalent to a composite positional scoring function. These results lead to two natural analogous conjectures about PFs.

Conjecture 1 Any PF that is consistent, continuous, and neutral is an SRSF (and therefore, an MLE).

Conjecture 2 Any PF that is consistent and neutral is a CRSF (and therefore, an MLE if the number of votes is bounded).
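
To make the difference between the two tie-breaking rules for STV concrete, the following Python sketch evaluates both on small profiles by brute force. It is only an illustration under stated assumptions, not the authors' implementation: votes and rankings are assumed to be tuples listing alternatives from most to least preferred; the function names `s_k`, `crsf_stv`, and `put_stv` and the two example profiles are hypothetical; and the composite scores $(s_1, \ldots, s_m)$, summed over the votes, are assumed to be compared lexicographically, so that $s_{k+1}$ only breaks ties left by $s_1, \ldots, s_k$. Enumerating all $m!$ rankings is feasible only for very small $m$.

```python
from itertools import permutations


def s_k(vote, ranking, k):
    """Return 0 if every alternative preceding b in the vote is ranked
    after b in the ranking, where b is ranked kth-to-last; else 1."""
    m = len(ranking)
    b = ranking[m - k]  # 0-indexed position of the kth-to-last alternative
    pos_in_r = {alt: i for i, alt in enumerate(ranking)}
    preceding = vote[:vote.index(b)]  # alternatives the vote prefers to b
    return 0 if all(pos_in_r[a] > pos_in_r[b] for a in preceding) else 1


def crsf_stv(profile):
    """Rankings whose score vector (S_1, ..., S_m) is lexicographically
    maximal, where S_k sums s_k over all votes (assumed comparison rule)."""
    m = len(profile[0])
    scores = {
        r: tuple(sum(s_k(v, r, k) for v in profile) for k in range(1, m + 1))
        for r in permutations(sorted(profile[0]))
    }
    best = max(scores.values())  # tuple comparison is lexicographic
    return {r for r, s in scores.items() if s == best}


def put_stv(profile):
    """Rankings reachable by some sequence of tie-breaking choices."""
    results = set()

    def recurse(remaining, eliminated):
        if not remaining:
            # The alternative eliminated first is ranked last.
            results.add(tuple(reversed(eliminated)))
            return
        counts = {a: 0 for a in remaining}
        for v in profile:
            top = next(a for a in v if a in remaining)
            counts[top] += 1
        fewest = min(counts.values())
        for a in remaining:
            if counts[a] == fewest:  # branch into one universe per tied choice
                recurse(remaining - {a}, eliminated + (a,))

    recurse(frozenset(profile[0]), ())
    return results


# Hypothetical example profiles over the alternatives a, b, c.
non_tied = [("a", "b", "c")] * 4 + [("b", "c", "a")] * 3 + [("c", "b", "a")] * 2
tied = [("a", "b", "c")] * 2 + [("b", "c", "a")] * 2 + [("c", "a", "b")] * 3
```

On `non_tied`, both functions return the single ranking b > a > c. On `tied`, where a and b tie for elimination in the first round, `put_stv` returns both c > a > b and b > c > a, while `crsf_stv` keeps only c > a > b, because in that universe the alternative eliminated second receives fewer votes (2 rather than 3).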

Our study of STV corroborates these conjectures. By Proposition 7, there is no tie-breaking mechanism for STV that makes it both continuous and consistent; and indeed, we showed that STV is not an SRSF, but it is a CRSF. It does not appear that these conjectures can be easily proven using Smith's and Young's techniques.

9 Conclusions

The maximum likelihood approach provides a natural way of choosing a PF in settings where it makes sense to think there is a correct ranking. In this paper, we gave a characterization of the neutral MLE PFs, showing they coincide with the neutral SRSFs. We also considered CRSFs as a slight generalization and showed that for bounded numbers of votes they coincide with SRSFs. We considered key properties such as continuity and consistency, and gave examples of SRSFs and CRSFs. We studied STV in detail, showing that it is a CRSF but not an SRSF, and discussed the implications for breaking ties under STV. Finally, we left open questions concerning the complexity of CRSF tiebreaking for STV and whether consistency can be used to characterize the class of SRSFs/CRSFs.

We believe that these results will greatly facilitate the use of the maximum likelihood approach in (computational) social choice. Similar results can be obtained for social choice settings other than PFs, for example, for social choice rules that only choose the winning alternative(s), or for settings in which the inputs are not linear orders (but rather, for example, labelings of the alternatives as approved or not approved, or partial orders, etc.). As another example that demonstrates the general applicability of the framework, a maximum-likelihood voting approach was recently used in a computational biology application, specifically, NMR protein structure determination [1]. That work focuses on the problem of assigning resonances and NOEs (nuclear Overhauser effects) to corresponding nuclei; it does so by having multiple structures in an ensemble vote over the assignments.

Acknowledgments

We thank Felix Brandt, Zheng Li, Ariel Procaccia, and Bill Zwicker for very helpful discussions, and the reviewers for very helpful comments. The authors are supported by an Alfred P. Sloan Research Fellowship, a Johnson Research Assistantship, and a James B. Duke Fellowship, respectively; they are also supported by NSF under award number IIS-0812113.

References

[1] Mehmet Serkan Apaydin, Vincent Conitzer, and Bruce Randall Donald. Structure-based protein NMR assignments using native structural ensembles. Journal of Biomolecular NMR, 40(4):263-276, 2008.
[2] John Bartholdi, III, Craig Tovey, and Michael Trick. Voting schemes for which it can be difficult to tell who won the election. Social Choice and Welfare, 6:157-165, 1989.
[3] Nadja Betzler, Michael R. Fellows, Jiong Guo, Rolf Niedermeier, and Frances A. Rosamond. Computing Kemeny rankings, parameterized by the average KT-distance. COMSOC, 85-96, 2008.
[4] Vincent Conitzer, Andrew Davenport, and Jayant Kalagnanam. Improved bounds for computing Kemeny rankings. AAAI, 620-626, 2006.
[5] Vincent Conitzer and Tuomas Sandholm. Common voting rules as maximum likelihood estimators. UAI, 145-152, 2005.
[6] Marie Jean Antoine Nicolas de Caritat (Marquis de Condorcet). Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. 1785. Paris: L'Imprimerie Royale.
[7] Mohamed Drissi and Michel Truchon. Maximum likelihood approach to vote aggregation with variable probabilities. Technical Report 0211, Departement d'economique, Universite Laval, 2002.
[8] Michael Fligner and Joseph Verducci. Distance based ranking models. Journal of the Royal Statistical Society B, 48:359-369, 1986.
[9] Edith Hemaspaandra, Holger Spakowski, and Joerg Vogel. The complexity of Kemeny elections. Theoretical Computer Science, 349(3):382-391, 2005.
[10] John Kemeny. Mathematics without numbers. Daedalus, 88:575-591, 1959.
[11] Guy Lebanon and John Lafferty. Cranking: Combining rankings using conditional models on permutations. ICML, 363-370, 2002.
[12] Marina Meila, Kapil Phadnis, Arthur Patterson, and Jeff Bilmes. Consensus ranking under the exponential model. UAI, 2007.
[13] Roger B. Myerson. Axiomatic derivation of scoring rules without the ordering assumption. Social Choice and Welfare, 12(1):59-74, 1995.
[14] John H. Smith. Aggregation of preferences with variable electorate. Econometrica, 41(6):1027-1041, November 1973.
[15] Lirong Xia and Vincent Conitzer. Generalized scoring rules and the frequency of coalitional manipulability. EC, 109-118, 2008.
[16] Lirong Xia and Vincent Conitzer. Finite local consistency characterizes generalized scoring rules. IJCAI, 2009.
[17] H. Peyton Young. Social choice scoring functions. SIAM Journal of Applied Mathematics, 28(4):824-838, 1975.
[18] H. Peyton Young. Optimal voting rules. Journal of Economic Perspectives, 9(1):51-64, 1995.
[19] H. Peyton Young and Arthur Levenglick. A consistent extension of Condorcet's election principle. SIAM Journal of Applied Mathematics, 35(2):285-300, 1978.
[20] William S. Zwicker. Consistency without neutrality in voting rules: When is a vote an average? Mathematical and Computer Modelling, 2008.