Crowdsourcing via Tensor Augmentation and Completion (TAC)

1 Crowdsourcing via Tensor Augmentation and Completion (TAC)
Presenter: Yao Zhou. Joint work with Dr. Jingrui He.

2 Roadmap
Background; Related work; Crowdsourcing based on TAC; Experimental results; Conclusion

3 Crowdsourcing in machine learning
Training a supervised machine learning model requires training labels.
Many crowdsourcing platforms provide services for collecting label information.

4 An example of crowdsourcing
Example images: lynx (wildcat), tabby (domestic cat).

5 Key problem of crowdsourcing
How can the true labels be inferred from a large number of labels collected from the crowd?
Pros:
Low cost: collecting large amounts of labels is economical.
Cons:
Low quality: labels collected from the (non-expert) crowd are noisy.
Missing labels: some workers are not willing to label all of the items.
Figure: noisy labels and missing labels.

6 Some related work
MV: Majority Voting, a simple baseline.
DS-EM [Dawid and Skene, 1979]: infers each worker's ability matrix and the true labels.
Two-coin model for a binary labelling task.
GLAD [Whitehill et al., 2009]: infers worker ability, item difficulty, and item true labels simultaneously.
DS-MF [Liu et al., 2012]: employs variational Bayesian inference using a mean-field algorithm.
MMCE [Zhou et al., 2012]: employs the minimax entropy principle to infer worker ability, item difficulty, and true labels at the same time.
Limitation: the structural information of the labels is not utilized!
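For reference, the majority voting baseline can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `majority_vote`, the dictionary input format, and the tie-breaking rule (smallest label wins) are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(labels_per_item):
    """Return the most frequent label per item.

    labels_per_item maps an item id to the list of labels it received;
    missing labels are simply absent from the list (assumed format).
    """
    result = {}
    for item, labels in labels_per_item.items():
        if not labels:
            # No worker labeled this item, so nothing can be inferred.
            result[item] = None
            continue
        counts = Counter(labels)
        top = max(counts.values())
        # Tie-break by taking the smallest label (an arbitrary choice).
        result[item] = min(l for l, c in counts.items() if c == top)
    return result


crowd = {"img1": ["lynx", "tabby", "lynx"], "img2": ["tabby"], "img3": []}
votes = majority_vote(crowd)
```

Majority voting treats every worker as equally reliable, which is why the other methods on this slide additionally model worker ability.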

7 Roadmap
Background; Related work; Crowdsourcing based on TAC; Experimental results; Conclusion

8 Tensor augmentation
Notation:
Workers: $i = 1, 2, \dots, N_w$
Items: $j = 1, 2, \dots, N_i$
Classes: $k = 1, \dots, N_c$
Re-organize the crowd labels as a three-way tensor $\mathcal{T}$ (workers × items × classes).
Based on each worker's labelling decisions, generate an index set.
The ground truth layer: an extra tensor slice of size $N_i \times N_c$, augmented onto the tensor along the worker dimension, so that dimension grows from $N_w$ to $N_w + 1$.
Figure: tensor $\mathcal{T}$ with axes "# of workers $N_w$" and "# of items $N_i$", plus the ground truth layer.
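The augmentation step can be illustrated with a small NumPy sketch. Several details here are assumptions, since the slide's formulas did not survive: labels are one-hot encoded along the class axis, a missing label is marked with -1 in the input, the observed index set covers every class entry of a labeled (worker, item) pair, and the ground truth layer is appended as the last slice and starts out entirely unobserved. The name `build_augmented_tensor` is hypothetical.

```python
import numpy as np


def build_augmented_tensor(labels, n_classes):
    """Build a (N_w + 1) x N_i x N_c tensor from an N_w x N_i label matrix.

    labels[i, j] is the class index worker i gave item j, or -1 if the
    worker did not label the item (assumed encoding).
    """
    n_workers, n_items = labels.shape
    # One extra slice along the worker axis: the ground truth layer.
    T = np.zeros((n_workers + 1, n_items, n_classes))
    observed = np.zeros(T.shape, dtype=bool)
    w, j = np.nonzero(labels >= 0)
    T[w, j, labels[w, j]] = 1.0   # one-hot encode each given label
    observed[w, j, :] = True      # assumed: all class entries are known
    return T, observed


labels = np.array([[0, 1, -1],
                   [0, -1, 2]])
T, observed = build_augmented_tensor(labels, n_classes=3)
```

Under these assumptions, completing the tensor means filling in the unobserved entries, including the whole last slice, from which the true labels would be read off.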

9 Tensor augmentation and completion (TAC)
Goal of TAC: complete the augmented tensor.
Main principle of TAC: rank minimization is NP-hard; the tightest convex envelope of the rank is the trace norm, so TAC uses trace norm minimization.

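10 Tensor augmentation and completion (TAC)
Definition of the trace norm for an n-way tensor [Liu et al., 2012]:
$\|\mathcal{X}\|_* := \sum_{l=1}^{n} \alpha_l \|X_{(l)}\|_*$, with $\alpha_l \ge 0$ and $\sum_{l=1}^{n} \alpha_l = 1$.
Here, $X_{(l)}$ denotes the mode-$l$ unfolding of the tensor $\mathcal{X}$. The reverse operation is fold.
Example: for a tensor $\mathcal{X} \in \mathbb{R}^{3 \times 2 \times 2}$, the unfolded matrices are $X_{(1)} \in \mathbb{R}^{3 \times 4}$, $X_{(2)} \in \mathbb{R}^{2 \times 6}$, and $X_{(3)} \in \mathbb{R}^{2 \times 6}$.
Reference: Ji Liu et al. Tensor completion for estimating missing values in visual data. TPAMI.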
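The unfold and fold operations, and a tensor trace norm in the style of Liu et al. (a weighted sum of the nuclear norms of the unfoldings), can be sketched with NumPy as follows. The function names and the equal weights are illustrative assumptions, and the column ordering of the unfolded matrices here follows NumPy's default layout, which may differ from the convention used in the paper.

```python
import numpy as np


def unfold(X, mode):
    """Mode-`mode` unfolding: rows index the chosen mode."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def fold(M, mode, shape):
    """Inverse of unfold for a tensor of the given shape."""
    moved = [shape[mode]] + [s for a, s in enumerate(shape) if a != mode]
    return np.moveaxis(M.reshape(moved), 0, mode)


def tensor_trace_norm(X, alphas):
    """Weighted sum of the nuclear norms of all unfoldings."""
    return sum(a * np.linalg.norm(unfold(X, l), ord="nuc")
               for l, a in enumerate(alphas))


X = np.arange(12, dtype=float).reshape(3, 2, 2)
shapes = [unfold(X, l).shape for l in range(3)]   # 3x4, 2x6, 2x6
norm = tensor_trace_norm(X, alphas=[1 / 3, 1 / 3, 1 / 3])
```

A 3 × 2 × 2 tensor is used because its unfoldings have exactly the sizes quoted on the slide.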

11 Tensor augmentation and completion (TAC)
Relaxed objective of TAC with regularization. Its annotated components are the index of the ground truth layer, the intermediate relaxed matrices, and the regularization term.
Solution: Block Coordinate Descent (BCD) over four blocks of variables.

12 Updating $M_l$
Sub-problem: it has a closed-form solution given by Singular Value Thresholding (SVT) with threshold $\tau$, proved by [Cai et al., 2009].
Reference: Jian-Feng Cai et al. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization.
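The singular value thresholding operator itself is standard: it shrinks every singular value by $\tau$ and clips at zero, which solves $\min_M \tau \|M\|_* + \frac{1}{2}\|M - Y\|_F^2$. The sketch below shows only that operator; how $Y$ and $\tau$ are formed from the TAC variables is not recoverable from the slide, so the inputs used here are placeholders.

```python
import numpy as np


def svt(Y, tau):
    """Singular value thresholding: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    shrunk = np.maximum(s - tau, 0.0)   # shrink by tau, clip at zero
    return (U * shrunk) @ Vt            # rebuild with shrunk spectrum


# Placeholder input: a diagonal matrix makes the effect easy to see.
Y = np.diag([3.0, 1.0, 0.2])
M = svt(Y, tau=0.5)
```

Singular values below the threshold are removed entirely, which is what drives the completed unfoldings toward low rank.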

13 Updating $\mathcal{X}$
Two formulations:
Prior guided ground truth inference (PG-TAC): uses prior statistics.
Relaxed simplex ground truth inference (RS-TAC): uses a slack variable.

14 Updating $\mathcal{X}$: prior guided ground truth inference (PG-TAC)
Solution: the elements of tensor $\mathcal{X}$ can be divided into three sets $\{C_1, C_2, C_3\}$, each with its own update.
Figure: tensor $\mathcal{T}$ and the ground truth layer, marking the elements of sets $C_1$, $C_2$, and $C_3$.

15 Updating $\mathcal{X}$: relaxed simplex ground truth inference (RS-TAC)
Solution: the elements of tensor $\mathcal{X}$ can be divided into three sets $\{C_1, C_2, C_3\}$, each with its own update. Part of the update differs from PG-TAC.
Figure: tensor $\mathcal{T}$ and the ground truth layer, marking the elements of sets $C_1$, $C_2$, and $C_3$.

16 Roadmap
Background; Related work; Crowdsourcing based on TAC; Experimental results; Conclusion

17 Experimental Results
Synthetic data set.
Notation:
# of workers: $N_w$
# of items: $N_i$
# of classes: $N_c$
Probability of not giving labels: $q$
Initial configuration: $N_w = 50$, $N_i = 400$, $N_c = 4$, $q = 0.7$.
Four configurations are evaluated.
Figure: two result plots, both marked "lower is better".
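To make the initial configuration concrete, here is one way a synthetic label matrix with these sizes could be generated. The slide does not describe the generation process, so the noise model is an assumption: each worker answers correctly with a fixed probability `accuracy` (a hypothetical parameter), picks a uniformly random wrong class otherwise, and withholds a label with probability `q`. Missing labels are encoded as -1.

```python
import numpy as np


def make_synthetic(n_workers=50, n_items=400, n_classes=4,
                   q=0.7, accuracy=0.6, seed=0):
    """Generate true labels and a noisy, partially missing label matrix."""
    rng = np.random.default_rng(seed)
    truth = rng.integers(0, n_classes, size=n_items)
    shape = (n_workers, n_items)
    # Assumed noise model: correct with probability `accuracy`.
    correct = rng.random(shape) < accuracy
    # A nonzero offset modulo n_classes always yields a wrong class.
    offset = rng.integers(1, n_classes, size=shape)
    labels = np.where(correct, truth, (truth + offset) % n_classes)
    # Each label is withheld independently with probability q.
    labels[rng.random(shape) < q] = -1
    return truth, labels


truth, labels = make_synthetic()
missing_rate = (labels == -1).mean()
```

With q = 0.7, roughly 30% of the worker-item pairs carry a label, so each item receives about 15 labels on average from the 50 workers.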

18 Experimental Results
Real-world data sets.
References:
Dengyong Zhou et al. Learning from the wisdom of crowds by minimax entropy. NIPS.
Dengyong Zhou et al. Regularized minimax conditional entropy for crowdsourcing. CoRR.
Hu Han et al. Demographic estimation from face images: Human vs. machine performance. TPAMI.
Rion Snow et al. Cheap and fast but is it good?: Evaluating non-expert annotations for natural language tasks. EMNLP.

19 Experimental Results
Real-world data set results.
References:
Qiang Liu et al. Variational inference for crowdsourcing. NIPS.
Dengyong Zhou et al. Learning from the wisdom of crowds by minimax entropy. NIPS.
A. P. Dawid et al. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics.
Jacob Whitehill et al. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. NIPS.

20 Conclusion
Two novel methods, PG-TAC and RS-TAC:
Augment the data tensor with a ground truth layer.
Utilize the structural information of crowd labels.
Infer the true labels of items in binary and multi-class settings.
Experimental results:
Evaluated on six real data sets.
Outperform state-of-the-art methods.

21 Thank you! Questions?
