Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

1 Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems
Sewoong Oh, Massachusetts Institute of Technology
Joint work with David R. Karger and Devavrat Shah
September 28

2 Crowdsourcing
Example tasks: image classification, character recognition, transcription, proofreading.

4 Budget-optimal Crowdsourcing
Microtasks are assigned to workers, with redundancy added to cope with errors.
Objective: get reliable answers at minimum cost.
Challenges:
1. Task allocation. Solution: random graph.
2. Inference problem. Solution: low-rank matrix approximation.

5 Previous Work on Reliable Crowdsourcing
Focuses on the inference problem: EM-based heuristics with no guarantees.
Dawid, Skene ('79); Smyth et al. ('95); Whitehill et al. ('09); Welinder et al. ('10)

6 Task Allocation
Microtasks are grouped into batches: each microtask has degree l and each batch has degree r.
Random (l, r)-regular bipartite graphs have good properties:
Locally tree-like: sharpens the analysis.
Good expander (spectral gap): high signal-to-noise ratio.
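One way to draw such an assignment is sketched below. It assumes a configuration-model style construction (random pairing of half-edges), which the slide does not specify; the function name `random_regular_bipartite` and its parameters are hypothetical. Repeated (task, batch) pairs can occur in this simple version.

```python
import random

def random_regular_bipartite(m, l, r, seed=0):
    """Pair half-edges at random: each of m tasks gets l half-edges,
    each of n = m*l/r batches gets r half-edges (assumed construction)."""
    if (m * l) % r != 0:
        raise ValueError("m * l must be divisible by r")
    n = m * l // r
    rng = random.Random(seed)
    task_stubs = [i for i in range(m) for _ in range(l)]
    batch_stubs = [j for j in range(n) for _ in range(r)]
    rng.shuffle(batch_stubs)  # random matching of task stubs to batch stubs
    return list(zip(task_stubs, batch_stubs)), n

# 6 tasks, each assigned l = 2 times; batches of size r = 3, so n = 4 batches.
edges, n = random_regular_bipartite(m=6, l=2, r=3)
```

Every task appears in exactly l edges and every batch in exactly r edges, which is the regularity the slide relies on.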

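7 Modeling the Crowd
Binary tasks: $s_i \in \{+1, -1\}$
Worker reliability: $p_j \in [0, 1]$
Response of worker $j$ on task $i$: $A_{ij} = s_i$ with probability $p_j$, and $A_{ij} = -s_i$ with probability $1 - p_j$
Assume we know whether $\frac{1}{n} \sum_j p_j > \frac{1}{2}$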
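The response model can be simulated in a few lines. This is a minimal sketch: the function name `simulate_responses`, the edge-list representation, and the toy values are illustrative assumptions, not part of the talk.

```python
import random

def simulate_responses(s, p, edges, seed=0):
    """For each assigned (task i, worker j), return s[i] with
    probability p[j] and -s[i] otherwise."""
    rng = random.Random(seed)
    A = {}
    for i, j in edges:
        A[(i, j)] = s[i] if rng.random() < p[j] else -s[i]
    return A

s = [1, -1, 1]                                     # hypothetical true answers
edges = [(i, j) for i in range(3) for j in range(2)]
always_right = simulate_responses(s, [1.0, 1.0], edges)  # p_j = 1
always_wrong = simulate_responses(s, [0.0, 0.0], edges)  # p_j = 0
```

The two extreme cases show why only the distance of $p_j$ from 1/2 matters: a worker who is always wrong is as informative as one who is always right, once the sign is known.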

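9 Inference Problem
Given: responses from the crowd $\{A_{ij}\}$
Find: estimates of the answers $\{\hat{s}_i\}$
Estimator: $\hat{s}_i = \mathrm{sign}\big(\sum_j W_{ij} A_{ij}\big)$, where $W_{ij}$ is the reliability weight and $A_{ij}$ is the response
Majority voting: $W_{ij} = 1$
Oracle estimator, who knows the $p_j$'s: $W_{ij} = \log\big(\frac{p_j}{1 - p_j}\big)$
Iterative algorithm: learns the $W_{ij}$'s
[Plot: error rate versus resources for majority voting, the iterative algorithm, and the oracle estimator]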
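The two reference estimators differ only in the weights fed to the same weighted vote. The sketch below assumes equal weights for majority voting and log-odds weights for the oracle; the helper `weighted_vote` and the toy data are hypothetical.

```python
import math

def weighted_vote(responses, weight):
    """responses: dict {(task, worker): +1 or -1}; weight: worker -> W.
    Returns {task: sign of the weighted sum}, with ties broken as +1."""
    totals = {}
    for (i, j), a in responses.items():
        totals[i] = totals.get(i, 0.0) + weight(j) * a
    return {i: (1 if t >= 0 else -1) for i, t in totals.items()}

# One task, three workers: one reliable worker disagrees with two weak ones.
p = {0: 0.9, 1: 0.6, 2: 0.6}
A = {(0, 0): 1, (0, 1): -1, (0, 2): -1}

majority = weighted_vote(A, lambda j: 1.0)
oracle = weighted_vote(A, lambda j: math.log(p[j] / (1 - p[j])))
```

Here majority voting follows the two weak workers, while the oracle weights let the single reliable worker win.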

10 Inference Problem
Given: responses from the crowd $\{A_{ij}\}$
Find: estimates of the answers $\{\hat{s}_i\}$
Estimator: $\hat{s}_i = \mathrm{sign}\big(\sum_j W_{ij} A_{ij}\big)$, where $W_{ij}$ is the reliability weight and $A_{ij}$ is the response
Iteratively learn the weights:
Task-likelihood update: $L_{ij} = \sum_{j' \neq j} A_{ij'} W_{ij'}$, where $L_{ij}$ is the likelihood and $W_{ij'}$ is the reliability. A task is likely to be $+1$ if reliable workers agree that it is $+1$.
Worker-reliability update: $W_{ij} = \sum_{i' \neq i} A_{i'j} L_{i'j}$, where $W_{ij}$ is the reliability and $L_{i'j}$ is the likelihood. A worker is reliable if the worker agreed with our belief on other tasks.
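The two updates can be sketched as message passing on the assignment graph. This sketch assumes each sum excludes the edge being updated, and it initializes every weight to 1 for reproducibility; the initialization, the iteration count, and the name `iterative_inference` are assumptions. Messages are not normalized, so this is suitable only for small examples.

```python
def iterative_inference(A, iters=10):
    """A: dict {(task i, worker j): +1 or -1}. Returns {i: estimate}."""
    workers_of, tasks_of = {}, {}
    for (i, j) in A:
        workers_of.setdefault(i, []).append(j)
        tasks_of.setdefault(j, []).append(i)
    W = {e: 1.0 for e in A}  # assumed initialization of the weights
    for _ in range(iters):
        # Task-likelihood update: sum over the other workers on task i.
        L = {(i, j): sum(A[(i, jp)] * W[(i, jp)]
                         for jp in workers_of[i] if jp != j)
             for (i, j) in A}
        # Worker-reliability update: sum over the other tasks of worker j.
        W = {(i, j): sum(A[(ip, j)] * L[(ip, j)]
                         for ip in tasks_of[j] if ip != i)
             for (i, j) in A}
    # Final decision: sign of the weighted sum of responses (ties give +1).
    return {i: (1 if sum(A[(i, j)] * W[(i, j)] for j in workers_of[i]) >= 0
                else -1)
            for i in workers_of}

# Toy data: workers 0 and 1 always answer correctly, worker 2 always flips.
s = [1, -1, 1, -1]
A = {(i, j): (s[i] if j < 2 else -s[i]) for i in range(4) for j in range(3)}
estimates = iterative_inference(A)
```

In this toy case the always-wrong worker ends up with a non-positive weight, so its flipped answers count in favor of the correct label.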

11 Iterative Algorithm as Singular Vector Computation
$A = \mathbb{E}[A \mid s, p] + \text{random perturbation}$, that is, data = low-rank signal + noise.
1. Why are the singular vectors good for inference? Good expanders have high SNR.
2. Why not use the singular vectors directly? Exploiting the tree-like structure proves a sharp bound.
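To see why the signal term is low-rank, the expectation of a single response can be worked out from the crowd model. The derivation below is a sketch under that model; the final rank-one form assumes, for illustration only, that every worker answers every task.

```latex
\mathbb{E}[A_{ij} \mid s, p]
  = p_j \, s_i + (1 - p_j)(-s_i)
  = (2 p_j - 1) \, s_i ,
\qquad\text{so}\qquad
\mathbb{E}[A \mid s, p] = s \, (2p - 1)^{\top} .
```

The left singular vector of this rank-one matrix is proportional to $s$, which suggests why the top singular vector of the observed $A$ carries information about the answers.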

12 Performance Analysis
[Figure: worker reliabilities $p_1, \dots, p_5$]
The performance depends on the worker reliability through $q \equiv \frac{1}{n} \sum_{j=1}^{n} (2p_j - 1)^2$.
Theorem. [Karger, Oh, Shah '11] In the large system limit, for $\sigma^2 \equiv \big(3 + \frac{1}{qr}\big) \frac{q^2 l r}{q^2 l r - 1}$ and $lr > 1/q^2$,
$P_{\text{error}} \le \exp\big\{ -\frac{ql}{2\sigma^2} \big\}$
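The bound is easy to evaluate numerically. The sketch below assumes the theorem reads $\sigma^2 = (3 + 1/(qr)) \, q^2 lr / (q^2 lr - 1)$ with $P_{\text{error}} \le \exp(-ql/(2\sigma^2))$ whenever $lr > 1/q^2$, which is a reading of the slide and may drop details of the published statement. The function name `error_bound` is hypothetical.

```python
import math

def error_bound(p, l, r):
    """Return (q, sigma^2, bound) for reliabilities p and degrees l, r,
    under the assumed form of the theorem stated in the lead-in."""
    q = sum((2 * pj - 1) ** 2 for pj in p) / len(p)
    if q == 0 or l * r <= 1 / q ** 2:
        raise ValueError("the bound requires l * r > 1 / q^2")
    x = q * q * l * r
    sigma2 = (3 + 1 / (q * r)) * x / (x - 1)
    return q, sigma2, math.exp(-q * l / (2 * sigma2))

# Hypothetical crowd: half perfect workers, half random guessers.
q, sigma2, bound = error_bound([1.0, 0.5], l=10, r=10)
```

Note that a worker with $p_j = 0.5$ contributes nothing to $q$, so a crowd of pure guessers makes the condition $lr > 1/q^2$ impossible to meet.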

13 How Good is the Performance?
[Plot: $P_{\text{error}}$ for majority voting, the EM algorithm, the iterative algorithm, and the oracle estimator]
Iterative algorithm ($r > 1/q$): $P_{\text{error}} \le e^{-\frac{1}{16} ql}$
Matching minimax lower bound: $\inf_{\text{Alg}, G(l)} \; \sup_{\{s_i\}, \{p_j\} \in F(q)} P_{\text{error}} \ge e^{-(ql + o(q^2 l))}$

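14 Implications
$P_{\text{error}} \le e^{-\frac{1}{16} ql}$
How much do we need to spend to achieve $P_{\text{error}} \le \epsilon$?
Sufficient to choose $l \gtrsim \frac{1}{q} \log\big(\frac{1}{\epsilon}\big)$
Necessary to have $l \gtrsim \frac{1}{q} \log\big(\frac{1}{\epsilon}\big)$
Need $q$ to determine $l$; can search for $q$ using bisection.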
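Solving the upper bound for $l$ gives a concrete redundancy level. The sketch below assumes the bound $P_{\text{error}} \le e^{-ql/16}$ is applied as is, so that $l \ge (16/q) \ln(1/\epsilon)$ suffices; the function name `redundancy_needed` and the example values are hypothetical.

```python
import math

def redundancy_needed(q, eps):
    """Smallest integer l with exp(-q * l / 16) <= eps."""
    return math.ceil(16.0 / q * math.log(1.0 / eps))

# Hypothetical crowd quality q = 0.64 (every worker has p_j = 0.9)
# and target error eps = 1%.
l = redundancy_needed(0.64, 0.01)
```

The constant 16 comes from the upper bound alone, so this is a sufficient level rather than the smallest workable one.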

16 Resource Allocation
Which crowd is better?
Cost: $c_1 = \$0.04$, $c_2 = \$0.05$
Worker quality: $P_1$, $P_2$, with $q_1 = \mathbb{E}[(2P_1 - 1)^2]$ and $q_2 = \mathbb{E}[(2P_2 - 1)^2]$
Invest all resources on $\arg\max_k \; q_k / c_k$
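The selection rule can be sketched as a comparison of quality per dollar. The costs below are the ones on the slide; the worker reliabilities, the crowd names, and the function `best_crowd` are hypothetical, and the rule is read here as maximizing $q_k / c_k$.

```python
def best_crowd(crowds):
    """crowds: dict name -> (cost per response, list of reliabilities).
    Returns the name with the largest q / cost."""
    def quality_per_dollar(item):
        cost, ps = item[1]
        q = sum((2 * p - 1) ** 2 for p in ps) / len(ps)  # empirical q_k
        return q / cost
    return max(crowds.items(), key=quality_per_dollar)[0]

crowds = {
    "crowd1": (0.04, [0.7, 0.7, 0.7]),  # q = 0.16, about 4.0 per dollar
    "crowd2": (0.05, [0.8, 0.8]),       # q = 0.36, about 7.2 per dollar
}
choice = best_crowd(crowds)
```

With these made-up reliabilities the more expensive crowd wins, because the required redundancy scales like $1/q$ and its higher quality more than offsets the price.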

17 Conclusion
Problem: reliable crowdsourcing with minimum resources.
Task allocation: random regular graphs.
Inference algorithm: low-rank matrix approximation.
The required budget is order-optimal.

Iterative Learning for Reliable Crowdsourcing Systems. David R. Karger, Sewoong Oh, Devavrat Shah. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology. Abstract: Crowdsourcing