Crowdsourcing Pareto-Optimal Object Finding by Pairwise Comparisons

Similar documents
Arrow s Impossibility Theorem and Experimental Tests of Preference Aggregation

Partial lecture notes THE PROBABILITY OF VIOLATING ARROW S CONDITIONS

Presentation for CSG399 By Jian Wen

Aggregating Ordinal Labels from Crowds by Minimax Conditional Entropy. Denny Zhou Qiang Liu John Platt Chris Meek

Follow links for Class Use and other Permissions. For more information send to:

Recap Social Choice Fun Game Voting Paradoxes Properties. Social Choice. Lecture 11. Social Choice Lecture 11, Slide 1

Ex Post Cheap Talk : Value of Information and Value of Signals

CS-E4830 Kernel Methods in Machine Learning

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning

What do you do when you can t use money to solve your problems?

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties

Large-Margin Thresholded Ensembles for Ordinal Regression

On Optimality of Jury Selection in Crowdsourcing

A Polynomial-time Nash Equilibrium Algorithm for Repeated Games

Fast Adaptive Algorithm for Robust Evaluation of Quality of Experience

Classification Based on Logical Concept Analysis

CS 347 Parallel and Distributed Data Processing

arxiv: v2 [math.co] 14 Apr 2011

THE POSTDOC VARIANT OF THE SECRETARY PROBLEM 1. INTRODUCTION.

RAPPOR: Randomized Aggregatable Privacy- Preserving Ordinal Response

Learning MN Parameters with Alternative Objective Functions. Sargur Srihari

ON SOCIAL-TEMPORAL GROUP QUERY WITH ACQUAINTANCE CONSTRAINT

5 ProbabilisticAnalysisandRandomized Algorithms

On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

EconS Advanced Microeconomics II Handout on Subgame Perfect Equilibrium (SPNE)

: Cryptography and Game Theory Ran Canetti and Alon Rosen. Lecture 8

Introduction to Bayesian Learning. Machine Learning Fall 2018

Lecture 4. 1 Examples of Mechanism Design Problems

Knowledge-based Agents. CS 331: Artificial Intelligence Propositional Logic I. Knowledge-based Agents. Outline. Knowledge-based Agents

CS 331: Artificial Intelligence Propositional Logic I. Knowledge-based Agents

"Arrow s Theorem and the Gibbard-Satterthwaite Theorem: A Unified Approach", by Phillip Reny. Economic Letters (70) (2001),

A Database Framework for Probabilistic Preferences

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Recommendation. Tobias Scheffer

Pareto Efficiency (also called Pareto Optimality)

1 Distributional problems

Finding Pareto Optimal Groups: Group based Skyline

Algorithmic Game Theory Introduction to Mechanism Design

Complex Matrix Transformations

Question Selection for Crowd Entity Resolution

Randomized Decision Trees

USE AND OPTIMISATION OF PAID CROWDSOURCING FOR THE COLLECTION OF GEODATA

USE AND OPTIMISATION OF PAID CROWDSOURCING FOR THE COLLECTION OF GEODATA

Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services

3. When a researcher wants to identify particular types of cases for in-depth investigation; purpose less to generalize to larger population than to g

Single-peaked consistency and its complexity

Efficient query evaluation

SYSU Lectures on the Theory of Aggregation Lecture 2: Binary Aggregation with Integrity Constraints

Coalitional Structure of the Muller-Satterthwaite Theorem

From Non-Negative Matrix Factorization to Deep Learning

CSE 21 Practice Exam for Midterm 2 Fall 2017

Stable matching. Carlos Hurtado. July 5th, Department of Economics University of Illinois at Urbana-Champaign

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

from

The Benefits of a Model of Annotation

Determining the Diameter of Small World Networks

Quadratic Equations Part I

6.867 Machine learning

10-725/36-725: Convex Optimization Spring Lecture 21: April 6

Chapter 4 Optimized Implementation of Logic Functions

Data Mining Techniques

Lecture 12. Applications to Algorithmic Game Theory & Computational Social Choice. CSC Allan Borodin & Nisarg Shah 1

CMSC 451: Lecture 7 Greedy Algorithms for Scheduling Tuesday, Sep 19, 2017

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016

Satisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games

Bagging. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL 8.7

Economics 3012 Strategic Behavior Andy McLennan October 20, 2006

Crowdsourcing & Optimal Budget Allocation in Crowd Labeling

Section 3: Simple Linear Regression

Algorithms. NP -Complete Problems. Dong Kyue Kim Hanyang University

Machine Learning I Reinforcement Learning

Constitutional Rights and Pareto Efficiency

3.1 Arrow s Theorem. We study now the general case in which the society has to choose among a number of alternatives

OPTIMIZING SEARCH ENGINES USING CLICKTHROUGH DATA. Paper By: Thorsten Joachims (Cornell University) Presented By: Roy Levin (Technion)

Game Theory: Spring 2017

Stable and Pareto optimal group activity selection from ordinal preferences

Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows

Preference Orderings

CHAPTER 3. THE IMPERFECT CUMULATIVE SCALE

Price: $25 (incl. T-Shirt, morning tea and lunch) Visit:

MRI: Meaningful Interpretations of Collaborative Ratings

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16

APPLIED MECHANISM DESIGN FOR SOCIAL GOOD

Matching. Terence Johnson. April 17, University of Notre Dame. Terence Johnson (ND) Matching April 17, / 41

Algorithms for Playing and Solving games*

Fair Divsion in Theory and Practice

Matching Theory and the Allocation of Kidney Transplantations

CS 6375: Machine Learning Computational Learning Theory

Notes on Coursera s Game Theory

Against the F-score. Adam Yedidia. December 8, This essay explains why the F-score is a poor metric for the success of a statistical prediction.

6.207/14.15: Networks Lecture 24: Decisions in Groups

Real-time Scheduling of Periodic Tasks (2) Advanced Operating Systems Lecture 3

Game Theory for Linguists

Monotonicity and Nash Implementation in Matching Markets with Contracts

Iterated Strict Dominance in Pure Strategies

Computing Minmax; Dominance

Points-based rules respecting a pairwise-change-symmetric ordering

Algorithmic Mechanism Design

DECISIONS UNDER UNCERTAINTY

Combining Voting Rules Together

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Transcription:

2015 The University of Texas at Arlington. All Rights Reserved. Crowdsourcing Pareto-Optimal Object Finding by Pairwise Comparisons Abolfazl Asudeh, Gensheng Zhang, Naeemul Hassan, Chengkai Li, Gergely V. Zaruba Department of Computer Science and Engineering University of Texas at Arlington CIKM, October 21, 2015, Melbourne, Australia

2015 The University of Texas at Arlington. All Rights Reserved. Humans Obsession: Comparing Things Who is the better footballer? http://visual.ly/messi-vs-ronaldo Which is the better company to work for?

Facebook vs. Google http://www.businessinsider.com/facebook-vs-google-best-employer-2013-11 2015 The University of Texas at Arlington. All Rights Reserved.

2015 The University of Texas at Arlington. All Rights Reserved. Facebook vs. Google Multiple criteria Overall satisfaction CEO approval Employee confidence in the future Perks and salaries Interview difficulty Which one is better?

Overview 1. Ask crowd to compare objects on individual criteria 2. Derive partial knowledge about preference relations based on responses from crowd 3. Find all Pareto-optimal objects without exhausting all possible comparison questions General framework and its instantiations (algorithms), with the goal of minimizing number of questions

Preference (Better-Than) Relation P c (x, y) P c (or x c y) x is better than (preferred over) y with regard to criterion c Assumptions on data model: o o P c is a strict partial order (as opposed to a total order) Irreflexivity, Transitivity, Asymmetry No explicit attribute representation, thus no equivalence on a criterion Notations: o o (indifferent) x c y (x, y) P c (y, x) P c (not better than) x c y (x, y) P c

Pareto-optimal Objects Object dominance: consider objects O, criteria C y x c C : x c y and c C such that y c x (i.e., x is not better than y by any criterion and y is better than x by at least one criterion) x O is Pareto-optimal x is not dominated by any other object Example: o c d c story d, c music d, c acting d o Only one Pareto-optimal object: b

Deriving Preference Relations by Aggregating Crowd s Responses to Pairwise Comparisons

Pareto-Optimal Object Finding Problem statement: Given objects O and criteria C, find all Pareto-optimal objects, using pairwise comparisons by individual criteria Cost metric o Goal: as few pairwise comparison questions as possible o Simple, but reflect real-world monetary cost and time delay o Brute-force approach : C x O x ( O -1)/2 questions Assumptions on execution model o Sequential execution: get rlt(q i ) before asking q i+1 o No consideration of worker quality

Applications Collecting Public Opinion o Best companies to work for, best cities to live in Group Decision Making o Where for lunch, which product to use, which candidate to hire Information Exploration o Compare photos by color, sharpness, and landscape Back to the which one is better? o After finding Pareto-optimal objects, further actions (ranking, filtering, visualization) to find desirable objects

Related Work [12] Chen et al. WSDM13 [15] Davidson et al. ICDT13 [17] Grozet al. PODS15 [23] Lofi et al. EDBT13 [25] Polychronopoulos et al. WebDB13 [ 4] technical report of this paper Other related work: collaborative filtering, learning to rank,

Related Work Explicit attribute representation o Total order on ordinal attributes (sizes of houses, ratings of restaurants), Partial-order on categorical attributes (genres of movies) o Not always easy to model and/or for users to provide missing values (e.g., story of movies) Pairwise comparisons o Known to be easier, faster, and less error-prone. Widely used in social choice and welfare, preferences, and voting Partial order vs. Total order o A direct effect of using pairwise comparisons o Not always natural to enforce a total order

General, Iterative Algorithm Framework 4-steps in each iteration (1) Question selection (2) Outcome derivation (3) Contradiction resolution (4) Termination test

(2) Outcome Derivation The framework is agnostic to the outcome derivation method. Other conceivable method can be plugged in. Question o q = x? c y (compare x and y by criterion c) Three possible outcomes based on voting rlt(q) = { x c y y c x x c y

(3) Contradiction Resolution Assume transitivity in preference relation, and enforce it. case 1 x y z criterion c

(3) Contradiction Resolution Assume transitivity in preference relation, and enforce it. case 1 x y z criterion c assumed

(3) Contradiction Resolution Assume transitivity in preference relation, and enforce it. x y z criterion c assumed case 1 case 2 x ~ y z criterion c

(3) Contradiction Resolution Assume transitivity in preference relation, and enforce it. x ~ y assumed ~ case 1 case 2 assumed z criterion c x y z criterion c

(4) Termination Test At the end of each iteration, objects are partitioned into 3 sets, based on incomplete preference relations R + (Q) so far. o O : Pareto-optimal objects o O : Non Pareto-optimal objects o O? : R + (Q) is insufficient for discerning these objects Pareto-optimality o O? = terminate

(1) Question Selection The process of executing a question sequence Q = q1,..., qn o o Q is a terminal sequence if O? = based on R + (Q). Goal: among many terminal sequences, execute a short sequence Lower bound o Theorem 2: At ldeast ( O k) C +(k 1) 2 pairwise comparison questions are necessary, where k is the number of Pareto-optimal objects. Bad news o o Worst-case : C x O x ( O -1)/2 questions; cannot do better than brute-force E.g., suppose all objects are indifferent by every criterion. If any comparison x? c y is skipped, we will not be able to determine if x and y are indifferent or if one dominates another.

Transitivity of Object Dominance: Doesn t Hold A cost-saving property for skyline queries o Object dominance transitivity: x y, y z x z o Immediately prune a dominated object from further comparison. (Any object dominated by y is also dominated by x.)

Transitivity of Object Dominance: Doesn t Hold Fundamental reason: lack of explicit attribute representation o In skyline/preference queries: on any attribute, x >= y, y>=z x>=z. o In Pareto-optimal object finding: x c or c y, y c or c z x c or c z (not true) Even possible that an object is dominated by only one non- Pareto optimal object.

Can Still Benefit From A Similar Idea For a non-pareto optimal object, we only need to know at least one object dominates it. We don t care about which other objects also dominate it. Overriding principle of the framework: o o Identify non-pareto objects as early as possible Postpone their comparisons with other objects as much as possible

Candidate Questions Given asked questions Q = q1,..., qn, x? c y is a candidate question iff it satisfies 3 conditions: (i) The outcome of x? c y is unknown yet, i.e., rlt(x? c y) R + (Q) (ii) x O? (iii) Based on R+(Q), y x is not ruled out yet. How to rule out y x? c C such that x c y R + (Q) y x

Only Choosing from Candidate Questions Sufficient Property 2: Qcan = O? = Efficient Theorem 1: If Q contains non-candidate questions, there exists a shorter or equally long sequence Q without non-candidate questions such that Q finds at least all dominated objects found by Q.

Macro-ordering, Micro-ordering Both guided by the overriding principle Macro-ordering: When available, we choose a candidate question x? c y such that y O Micro-ordering: Several question ordering heuristics: o Random Question (RandomQ): randomly choose a candidate question x? c y o Random Pair (RandomP): randomly choose a candidate question x? c y, continue to finish all remaining candidate questions between x and y. o Fewest Remaining Questions (FRQ): Choose a pair with the fewest remaining questions. Ties are broken based on how many objects are better/worse than x and y on the criterion.

Experiments by Simulation o Used an 10000-tuple NBA dataset that records players perseason performance on 10 criteria (points-per-game, ) o Simulated Partial orders based on players performance comparison, with some perturbations.

Effectiveness of Candidate Questions and Macro- Ordering

Effectiveness of Micro-Ordering

Experiments using Amazon Mechanical Turk o Compare 100 photos of UT-Arlington campus, by color, sharpness, landscape. o All 14, 850 possible pairwise questions were partitioned into 1, 650 tasks, each containing 9 questions on a criterion. o Worker qualification: responded to at least 100 HITs before with at least 90% approval rate 2 additional validation questions mixed in each task

Experiments using Amazon Mechanical Turk

Limitations and Future Work No performance guarantee: as bad as brute-force in worst-case Non-deterministic results: due to contradiction resolution Possibly empty result: due to lack of object dominance transitivity No consideration of different levels of confidence on question outcomes or crowdsourcers' quality. Future work: Pareto-optimal objects in probabilistic sense? No consideration of parallel/batch-execution scheme Future work: Parallel scheme

2015 The University of Texas at Arlington. All Rights Reserved. Thank You! Questions? Chengkai Li http://ranger.uta.edu/~cli cli@uta.edu

(3) Contradiction Resolution How often does contradiction happen? o o Depends on data itself, k, and θ It may not happen a lot. Intuitively: as long as the underlying relation is transitive, collective wisdom of crowd should reflect it. Preference judgments of relevance in document retrieval are transitive [27, 11].

Brute-Force on the Toy Example 6 objects, 3 criteria: 45 comparisons

RandomQ on the Toy Example

RandomP on the Toy Example

FRQ on the Toy Example