Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows

Size: px

Start display at page:

Download "Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows"

Alyson Gregory
5 years ago
Views:

1 Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows Robert Busa-Fekete 1 Eyke Huellermeier 2 Balzs Szrnyi 3 1 MTA-SZTE Research Group on Artificial Intelligence, Tisza Lajos krt. 103., H-6720 Szeged, Hungary 2 Department of Computer Science, University of Paderborn, Warburger Str. 100, Paderborn, Germany 3 INRIA Lille - Nord Europe, SequeL project, 40 avenue Halley, Villeneuve dascq, France ICML 2014 June 24, 2014 (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

2 Outline 1 Preference-based stochastic multi-armed bandit (PB-MAB) 2 PB-MAB Framework with Statistical Models over Ranking 3 Concrete implementations with Mallows model 4 Numerical Experiments (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

3 Motivation Exploiting revealed preferences to learn a ranking Example: Crowdsourcing Amazon Mechanical Turk Widely-used platform in Natural Language Processing (NLP) to annotate training database Machine Translation: for each English sentence, there are given many possible translations Goal: either to find a ranking which reflects to the quality of the translations, or the best translation The annotators might be asked in terms of simple questions: Is translation A better than translation B? (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

4 Preference-based stochastic multi-armed bandit setup Give M items/arms: A = {a 1,..., a M } Items/arms can only be compared in a pairwise manner In a time step t, the (online) learning algorithm selects a pair of items (i t, j t ) to be compared feedback Pairwise probability for items a i and a j : p i,j = P(a i a j ) = E[I{a i a j }] (1) follows a fixed probabilistic distribution If p i,j > 1/2, then item a i is preferred to item a j If p i,j < 1/2, then item a j is preferred to item a i (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

5 Goal of the online learner (decision maker/agent) Find the best item (with high probability) Find a ranking over item (with high probability) Minimize the number of pairwise comparisons (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

6 Modeling assumptions Elicit a ranking based on probabilistic (noisy) feedback Establish a connection to statistical models of rank data Full ranking (r 1,..., r i,..., r j,..., r M ) P(. ) Observation I{r i < r j } P(. ) is a parametric probability distribution over the set of ranking S M Making inference about P based on sampled pairwise comparisons (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

7 Modeling assumptions Pairwise probabilities can be written as p i,j = P(a i a j ) = where L(r j > r i ) = {r S M r j > r i } r L(r j >r i ) P(r ) (2) Implicitly, we assume certain regularity properites on P induced by P(. ) Pairwise probability cannot be arbitrary (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

8 Modeling assumptions Strong Stochastic Transitivity For any triplet of arms such that a i a j a k, p i,k max(p i,j, p j,k ) If we know a i a j and a j a k then a i a k Nice regularity property Reduce sample complexity (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

9 Online learning framework (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

10 How to make the setup complete (MPI) Find the most preferred item (or arm) a i defined as i = arg max 1 i M E [I{r i = 1}] (3) r P(. ) (MPR) Find the most probable ranking r given the ranking model P(. ) r = arg max P(r ) (4) r S M All goals are meant to be achieved with probability at least 1 δ Based as few pairwise comparisons as possible (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

11 Mallows φ-model The probability of observing a ranking r is r = ( r 1,..., r M ) is the center ranking P(r φ, r) = 1 Z(φ) φd(r, r) (5) d(.,.) is the Kendall s rank distance defined as d(r, r) = 1 i<j M φ (0, 1] is the spread parameter φ = 1 uniform distribution φ 0 P( r φ, r) 1 (becomes more peaky) Z(φ) is the normalisation factor I{(r i r j )( r i r j ) < 0} (6) (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

12 Find the most preferred item for Mallows model (MPI) i = arg max 1 i M E [I{r i = 1}] (7) r P(. φ, r) Most preferred item (MPI) is the one for which r i = 1 The center ranking determines a total order on the set of items such that if r i < r j then p i,j > 1/2, and r i > r j then p i,j < 1/2 (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

13 Find the most preferred item for Mallows model (MPI) i = arg max 1 i M E [I{r i = 1}] (7) r P(. φ, r) Most preferred item (MPI) is the one for which r i = 1 The center ranking determines a total order on the set of items such that if r i < r j then p i,j > 1/2, and r i > r j then p i,j < 1/2 MALLOWSMPI(δ) 1 Pick a random item a i 2 Pick another item, say a j, which has not selected yet, if there is no such, then break 3 Compare a i and a j until 1/2 / [ˆp i,j c i,j, ˆp i,j + c i,j ], where 1 c i,j = log 4n2 i,j M 2n i,j δ 4 if 1/2 < ˆp i,j c i,j, then keep a i goto 2, otherwise keep a j (8) (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

14 Find the most probable ranking for Mallows model (MPR) Most probable ranking is the center ranking 1 r = arg max P(r φ, r) = arg max φd(r, r) (9) r S M r S M Z(φ) The center ranking determines a total order on the set of items such that if r i < r j then p i,j > 1/2, and r i > r j then p i,j < 1/2 (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

15 Find the most probable ranking for Mallows model (MPR) MALLOWSMPR(δ) Follow the Merge sort strategy If the sorting algorithm compares two distinct items, say a i and a j, then compare them until 1/2 / [ˆp i,j c i,j, ˆp i,j + c i,j ], where 1 c i,j = log 4n2 i,j C M 2n i,j δ Worst case performance of merge sort algorithm: C M = M log 2 M M + 1 (Theorem 1, Flajolet & Golin (1994)) (10) (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

16 Sample complexity MallowsMPI(δ) O( M ρ 2 log M ), (11) δρ where ρ = 1 φ 1+φ lim φ 0 1/ρ 2 = 1 (more peaky distribution easier task) lim φ 1 1/ρ 2 = (more uniform harder task) MALLOWSMPR(δ) O( M log 2 M ρ 2 log M log 2 M ) (12) δρ (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

17 Numerical experiments for identifying Most Preferred Item (MPI) Verify that if the model assumptions are valid, the purposed algorithm is efficient. If Mallows model is assumed best arm = most preferred item Beat the mean (in PAC setting) [Yue and Joachims, 2011] Interleaved Filter [Yue et al., 2012] Compare the algorithms in terms of sample complexity Sample complexity: number of pairwise comparison take prior to the termination 100 repetitions The confidence parameter δ was set to 0.05, and thus, the accuracy was significantly higher than 0.95 in every case. (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

18 Numerical experiments for identifying MPI (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

19 Numerical experiments for identifying Most Probable Ranking (MPR) Most probable ranking = center ranking r = arg max P(r φ, r) (13) r S M Parameter estimation method for Mallows which can handle incomplete ranking [Cheng et al., 2009] Validated on datasets that consist of pairwise comparisons Assessed the accuracy of the estimator for center ranking on datasets with various size. (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

20 Numerical experiments for identifying MPR Solid Line: Parameter estimation method by Cheng et al. [2009] Dashed vertical lines: Merge sort-based MPR algorithm (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

21 Conclusion The purposed algorithms are efficient, if the modeling assumption holds Minimize sample complexity Guarantee a certain level of confidence (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

22 The End (UCLA) Ranking Elicitation ICML 2014 June 24, / 21

Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows

Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows : The Case of Mallows Róbert Busa-Fekete BUSAROBI@INF.U-SZEGED.HU Eyke Hüllermeier EYKE@UPB.DE Balázs Szörényi,3 SZORENYI@INF.U-SZEGED.HU MTA-SZTE Research Group on Artificial Intelligence, Tisza Lajos