Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services

Similar documents
Spatial Crowdsourcing: Challenges and Applications

On Optimality of Jury Selection in Crowdsourcing

Knapsack Auctions. Sept. 18, 2018

Computational Complexity

arxiv: v3 [cs.gt] 13 Aug 2014

Mobility Analytics through Social and Personal Data. Pierre Senellart

Fast Adaptive Algorithm for Robust Evaluation of Quality of Experience

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning

DM-Group Meeting. Subhodip Biswas 10/16/2014

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

SF2972 Game Theory Exam with Solutions March 15, 2013

CS246 Final Exam, Winter 2011

Crowdsourcing contests

On-Line Social Systems with Long-Range Goals

Cost and Preference in Recommender Systems Junhua Chen LESS IS MORE

DS504/CS586: Big Data Analytics Graph Mining II

HARNESSING THE WISDOM OF CROWDS

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms

Web Structure Mining Nodes, Links and Influence

ArcGIS. for Server. Understanding our World

Crowdsourcing Pareto-Optimal Object Finding by Pairwise Comparisons

Algorithmic Game Theory and Applications

Social Choice and Social Networks. Aggregation of General Biased Signals (DRAFT)

Use estimation strategies reasonably and fluently while integrating content from each of the other strands. PO 1. Recognize the limitations of

Informational Substitutes

Secretary Problems. Petropanagiotaki Maria. January MPLA, Algorithms & Complexity 2

Bounded Budget Betweenness Centrality Game for Strategic Network Formations

NETS 412: Algorithmic Game Theory March 28 and 30, Lecture Approximation in Mechanism Design. X(v) = arg max v i (a)

CMU Social choice: Advanced manipulation. Teachers: Avrim Blum Ariel Procaccia (this time)

Real-time Sentiment-Based Anomaly Detection in Twitter Data Streams

Arizona Mathematics Standards Articulated by Grade Level (2008) for College Work Readiness (Grades 11 and 12)

Arrow s Impossibility Theorem and Experimental Tests of Preference Aggregation

CS 598RM: Algorithmic Game Theory, Spring Practice Exam Solutions

Multi-Dimensional Online Tracking

MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL

arxiv: v3 [cs.lg] 25 Aug 2017

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Multi-Worker-Aware Task Planning in Real-Time Spatial Crowdsourcing

Crowdsourcing via Tensor Augmentation and Completion (TAC)

CS 350 Algorithms and Complexity

Economic Benefit Study on Value of Spatial Information Australian Experience

CS 350 Algorithms and Complexity

DS504/CS586: Big Data Analytics Graph Mining II

Supplementary income generation through Micro-Tasks

MATH : FINAL EXAM INFO/LOGISTICS/ADVICE

Survey of Voting Procedures and Paradoxes

Outline: Ensemble Learning. Ensemble Learning. The Wisdom of Crowds. The Wisdom of Crowds - Really? Crowd wiser than any individual

Dynamic Macroeconomic Theory Notes. David L. Kelly. Department of Economics University of Miami Box Coral Gables, FL

Solution: Since the prices are decreasing, we consider all the nested options {1,..., i}. Given such a set, the expected revenue is.

Lecture 4. 1 Examples of Mechanism Design Problems

Combinatorial Auction-Based Allocation of Virtual Machine Instances in Clouds

Statistical Quality Control for Human Computation and Crowdsourcing

Differentially Private Real-time Data Release over Infinite Trajectory Streams

Economic Growth: Lecture 8, Overlapping Generations

Generalisation and Multiple Representation of Location-Based Social Media Data

Redistribution Mechanisms for Assignment of Heterogeneous Objects

General Equilibrium and Welfare

Algorithms Exam TIN093 /DIT602

Beating Social Pulse: Understanding Information Propagation via Online Social Tagging Systems 1

Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

Sampling and Sample Size. Shawn Cole Harvard Business School

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee

Mechanism Design for Resource Bounded Agents

Yahoo! Labs Nov. 1 st, Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Lecture 10: Mechanism Design

Section #2: Linear and Integer Programming

FINANCIAL OPTIMIZATION

Use of big data, crowdsourcing and GIS in assessment of weather-related impact

Assortment Optimization under the Multinomial Logit Model with Nested Consideration Sets

MEI STRUCTURED MATHEMATICS 4751

1 ** The performance objectives highlighted in italics have been identified as core to an Algebra II course.

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

Data Structures in Java

Link Analysis Ranking

CP:

Spam ain t as Diverse as It Seems: Throttling OSN Spam with Templates Underneath

CS 188: Artificial Intelligence. Outline

Online Social Networks and Media. Opinion formation on social networks

Algorithmic Game Theory Introduction to Mechanism Design

Online Social Networks and Media. Link Analysis and Web Search

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016

Third Problem Assignment

Your Virtual Workforce. On Demand. Worldwide. COMPANY PRESENTATION. clickworker GmbH 2017

UnSAID: Uncertainty and Structure in the Access to Intensional Data

General idea. Firms can use competition between agents for. We mainly focus on incentives. 1 incentive and. 2 selection purposes 3 / 101

CSCI 5520: Foundations of Data Privacy Lecture 5 The Chinese University of Hong Kong, Spring February 2015

2. The Binomial Distribution

An example to start off with

Probabilistic Aspects of Voting

Optimal Auctions with Correlated Bidders are Easy

Intro Prefs & Voting Electoral comp. Voter Turnout Agency GIP SIP Rent seeking Partisans. 7. Special-interest politics

P P P NP-Hard: L is NP-hard if for all L NP, L L. Thus, if we could solve L in polynomial. Cook's Theorem and Reductions

Data Mining Recitation Notes Week 3

Approximating MAX-E3LIN is NP-Hard

GEOGRAPHY ADVANCED LEVEL

MATHEMATICS AND STATISTICS

Personalized Social Recommendations Accurate or Private

Second Welfare Theorem

Transcription:

Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services Caleb Chen CAO, Jieying SHE, Yongxin TONG, Lei CHEN The Hong Kong University of Science and Technology

Is Istanbul the capital of Turkey? 2

Social Network/Media services the virtualization and digitalization of people s social activities 3

Minor as dressing for a banquet Major as prediction of macro economy trends two-option decision making tasks tasks 4

Wisdom of Crowd The basic argument there, drawing on a long history of intuition about markets, is that the aggregate behavior of many people, each with limited information, can produce very accurate beliefs. D. Easley, J. Kleinberg, Networks, Crowds, and Markets

Crowdsourcing-powered DB Systems Qurk, Human powered Sorts and Joins, VLDB 2012(MIT) Deco, A System for Declarative Crowdsourcing, VLDB 2012(Stanford) CrowdDB, Answering Queries with Crowdsourcing, SIGMOD 2011(Berkeley) 6

General Crowdsourcing Platforms AMT 7

AMT Requesters MTurk workers (Photo By Andrian Chen) 8

Can we extend the magic power of Crowdsourcing onto social network? 9

Microblog Users Simple 140 characters RT + @ But comprehensive Large network Various backgrounds of users 10

Why Microblog Platform? Twitter AMT Accessibility Social Media Network Highly convenient, on all kinds of mobile devices General Purpose Platform Specific online platform Incentive Altruistic or payment Mostly monetary incentive Supported tasks Simple task as decision making Various types of tasks Communication Infrastructure Tweet and Reply are enough Complex workflow control mechanism Worker Selection Active, Enabled by @ Passively, No exact selection 11

Outline Running Example Problem Definition Jury Selection Algorithms Evaluation 12

Motivation Jury Selection Problem Running Case(1) Is Döner Kebab available in Hong Kong? Given a decision making problem, with budget $1, whom should we ask? 13

Motivation Jury Selection Problem Running Case(2) Is Döner Kebab available in Hong Kong? ε: error rate of an individual r: requirement of an individual, can be virtual Majority Voting to achieve final answer 14

Motivation Jury Selection Problem Running Case(2) Is Döner Kebab available in Hong Kong? Worker : Juror Crowds : Jury Data Quality : Jury Error Rate 15

Motivation Jury Selection Problem Running Case(3) Is Döner Kebab available in Hong Kong? If (A, B, C) are chosen(majority Voting) JER(A,B,C) = 0.1*0.2*0.2 + (1-0.1)*0.2*0.2 + 0.1* (1-0.2)*0.2 + 0.1*0.2*(1-0.2) = 0.072 Better than A(0.1), B(0.2) or C(0.2) individually 16

Motivation Jury Selection Problem Running Case(4) Is Döner Kebab available in Hong Kong? What if we enroll more JER(A,B,C,D,E) = 0.0704 < JER(A,B,C) The more the better? 17

Motivation Jury Selection Problem Running Case(5) Is Döner Kebab available in Hong Kong? What if we enroll even more? JER(A,B,C,D,E,F,G) = 0.0805 > JER(A,B,C,D,E) Hard to calculate JER 18

Motivation Jury Selection Problem Running Case(6) Is Döner Kebab available in Hong Kong? So just pick up the best combination? JER(A,B,C,D,E)=0.0704 R(A,B,C,D,E) = $1.6 > budget($1.0) 19

Motivation Jury Selection Problem Running Case(7) Worker selection for maximize the quality of a particular type of product: the reliability of voting. 20

Outline Motivation Problem Definition Jury Selection Algorithms Evaluation 21

Problem Definition Jury and Voting j 1 ε(0.1) r($0.3) j 2 ε(0.3) r($0.4) j 3 ε(0.2) r($0.2) A Jury J n = {j 1, j 2, j 3 } with 3 jurors j 1 ε(0.1) r($0.3) j 2 ε(0.3) r($0.4) j 3 ε(0.2) r($0.2) 1 0 1 A Voting V n = {1,0,1} from J n 22

Problem Definition Voting Scheme j 1 ε(0.1) r($0.3) j 2 ε(0.3) r($0.4) j 3 ε(0.2) r($0.2) 1 0 1 A Voting V n = {1,0,1} from J n MV(V n ) = 1, ( j i = 2 > 1) 1 23

Invididual Error-rate Problem Definition j 1 ε(0.1) r($0.3) j 2 ε(0.3) r($0.4) j 3 ε(0.2) r($0.2) 1 0 1 A Voting V n = {1,0,1} from J n 24

Problem Definition j 1 ε(0.1) r($0.3) j 2 ε(0.3) r($0.4) j 3 ε(0.2) r($0.2) j 1 ε(0.1) r($0.3) j 2 ε(0.3) r($0.4) j 3 ε(0.2) r($0.2) j 1 ε(0.1) r($0.3) j 2 ε(0.3) r($0.4) j 3 ε(0.2) r($0.2) j 1 ε(0.1) r($0.3) j 2 ε(0.3) r($0.4) j 3 ε(0.2) r($0.2) JJJ J 3 = 0.1*0.3*0.2 + (1-0.1)*0.3*0.2 + 0.1*(1-0.3)*0.2 + 0.1*0.3*(1-0.2) = 0.029 25

Problem Definition Crowdsourcing Models(model of candidate microblog users) 26

Problem Definition Jury Selection Problem(JSP) We hope to form a Jury J n, allowed by the budget, and with lowest JER 27

Outline Motivation Problem Definition Jury Selection Algorithms Evaluation 28

Computation of Jury Error Rate The number of careless jurors(carelessness-c) is a random variable following Poisson Binomial Distribution The naïve computation of JER is exponentially increasing 29

Computation of Jury Error Rate(2) Alg1: Dynamic Programming to compute JER in O(n 2 ) 30

Computation of Jury Error Rate(3) Alg2: Convolution-based to compute JER in O(nlll 2 n) Treat probability distribution as coefficients of polynomials Divide larger jury in two smaller juries Merge by polynomial multiplication Can be speeded up by using FFT 31

Computation of Jury Error Rate(4) Alg2: Convolution-based to compute JER in O(nlll 2 n) Divide into two smaller juries Merge, using FFT to speed up convolution 32

Computation of Jury Error Rate(5) Alg3: lower bound of JER in O(n) time Paley-Zygmund inequality 33

JSP on AltrM(1) Monotonicity with given jury size on varying individual error-rate In English: best jury comes from best jurors Decide the size 34

JSP on AltrM(2) Algorithm for JSP on AltrM Alg_AltrM{ 1. Sort according to error-rate; 2. Starting from 1 to n, increase the jury size by two; 1. Compute JER; 2. Update best current jury; 3. Output best jury; } //keep the size odd Might be convex, future work 35

JSP on PayM(1) Budget is a constraint Objective function is JER NP-hardness Reduce to an nth-order 0-1 Knapsack Problem 36

JSP on PayM(2) Approximate Algorithm Alg_PayM{ 1. Sort according to (requirement * error-rate); 2. Starting from 1 to n, increase the jury size by two; 1. Keep track of pair; //increment might be conducted by size of 1 2. Check whether adding new juror will exceed budget; 3. If so, compute and compare JER; 4. Update best current jury; 3. Output best jury; } 37

Framework 38

Outline Motivation Problem Definition Jury Selection Algorithm Evaluation 39

Parameter Estimation How to estimate such parameter is itself a research topic Individual Error Rate (ε) -- RT graph PageRank and HITS The score in rank is normalized to be the individual error rate Integrated requirement(r) account info Account Age and Account Activity 40

Data Preparation We test our algorithms on both synthetic data and real Twitter data Varying Size Mean Variance 3.4GHz Win7 PC, programmed in C++ 41

Mean = 0.5 is the turning point On the right side, truth rests in the hands of a few. 42

While the budget increases The total cost also increase The jury error rate decreases 43

Green Accurate Algorithm (test with N=20) Blue approximation algorithm O(nlogn) Good approximation on JER 44

Take-away and Future Work Take-away Cultivate a pool of candidate jurors JER deceases very fast according to the size of jury Future Work Beyond direct payment Prediction Market Beyond decision making Campaign Boosting 45

Thank You Q & A Is Döner Kebab available in Hong Kong? 46