Uncovering the Latent Structures of Crowd Labeling

Similar documents
Learning from the Wisdom of Crowds by Minimax Entropy. Denny Zhou, John Platt, Sumit Basu and Yi Mao Microsoft Research, Redmond, WA

Crowdsourcing via Tensor Augmentation and Completion (TAC)

Improving Quality of Crowdsourced Labels via Probabilistic Matrix Factorization

Aggregating Ordinal Labels from Crowds by Minimax Conditional Entropy. Denny Zhou Qiang Liu John Platt Chris Meek

Adaptive Crowdsourcing via EM with Prior

Truth Discovery and Crowdsourcing Aggregation: A Unified Perspective

Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

Permutation Models meet Dawid-Skene: A Generalised Model for Crowdsourcing

Learning Medical Diagnosis Models from Multiple Experts

On the Impossibility of Convex Inference in Human Computation

Nonparametric Bayesian Methods (Gaussian Processes)

Non-Parametric Bayes

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014

Learning from Crowds in the Presence of Schools of Thought

Iterative Learning for Reliable Crowdsourcing Systems

Crowdsourcing label quality: a theoretical analysis

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Crowdsourcing with Multi-Dimensional Trust

Finite Mixture Model of Bounded Semi-naive Bayesian Networks Classifier

Online Bayesian Passive-Aggressive Learning

Article from Predictive Analytics and Futurism, July 2016, Issue 13

Algorithmic Learning / Machine Learning

CSCI-567: Machine Learning (Spring 2019)

Mixtures of Gaussians. Sargur Srihari

Variational Inference for Crowdsourcing

Latent Variable Models and EM algorithm

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models

Collaborative topic models: motivations cont

Aggregating Crowdsourced Ordinal Labels via Bayesian Clustering

Convergence Rate of Expectation-Maximization

Applying Latent Dirichlet Allocation to Group Discovery in Large Graphs

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem

Machine Learning for Signal Processing Bayes Classification

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Automated Segmentation of Low Light Level Imagery using Poisson MAP-MRF Labelling

Statistical Debugging with Latent Topic Models

Parts 3-6 are EXAMPLES for cse634

A Generative Block-Diagonal Model for Clustering

Learning From Crowds. Presented by: Bei Peng 03/24/15

RaRE: Social Rank Regulated Large-scale Network Embedding

Content-based Recommendation

Learning Binary Classifiers for Multi-Class Problem

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

CS 188: Artificial Intelligence. Outline

Generative Clustering, Topic Modeling, & Bayesian Inference

Chapter 14 Combining Models

Cheng Soon Ong & Christian Walder. Canberra February June 2018

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

Prediction of Geological Characteristic Using Gaussian Mixture Model

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Topic Models. Brandon Malone. February 20, Latent Dirichlet Allocation Success Stories Wrap-up

Bandit-Based Task Assignment for Heterogeneous Crowdsourcing

Structure Learning in Sequential Data

Small-variance Asymptotics for Dirichlet Process Mixtures of SVMs

Chapter 16. Structured Probabilistic Models for Deep Learning

Study Notes on the Latent Dirichlet Allocation

Recent Advances in Bayesian Inference Techniques

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

CS145: INTRODUCTION TO DATA MINING

An Approach to Classification Based on Fuzzy Association Rules

Machine Learning for natural language processing

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

Online Bayesian Passive-Aggressive Learning

Expectation Maximisation (EM) CS 486/686: Introduction to Artificial Intelligence University of Waterloo

Lecture 4 Discriminant Analysis, k-nearest Neighbors

Click Prediction and Preference Ranking of RSS Feeds

Bayes Classifiers. CAP5610 Machine Learning Instructor: Guo-Jun QI

Machine Learning Basics

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference

A Bayesian model for fusing biomedical labels

An Empirical Study of Building Compact Ensembles

Machine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang

Advanced Machine Learning

Randomized Decision Trees

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)

Latent Dirichlet Allocation (LDA)

Midterm, Fall 2003

INFINITE MIXTURES OF MULTIVARIATE GAUSSIAN PROCESSES

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Dirichlet Process Based Evolutionary Clustering

A Bayesian Approach to Concept Drift

Chapter 08: Direct Maximum Likelihood/MAP Estimation and Incomplete Data Problems

Gaussian Mixture Model

CS446: Machine Learning Fall Final Exam. December 6 th, 2016

TUTORIAL PART 1 Unsupervised Learning

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks

Short Note: Naive Bayes Classifiers and Permanence of Ratios

Course 395: Machine Learning

Artificial Neural Networks

Bayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington

A Hybrid Generative/Discriminative Approach to Semi-supervised Classifier Design

The Benefits of a Model of Annotation

Transcription:

Uncovering the Latent Structures of Crowd Labeling Tian Tian and Jun Zhu Presenter: XXX Tsinghua University 1 / 26

Motivation Outline 1 Motivation 2 Related Works 3 Crowdsourcing Latent Class 4 Experiments 5 Conclusion 2 / 26

Motivation Motivation: Background Artificial intelligence relies more and more on large-scale training datasets. Expert labeling is expensive and time-consuming. 3 / 26

Motivation Motivation: Crowdsourcing Solution Use multiple web workers to label each item. Recover the ground truth from the noisy data. Table: Different workers may give inconsistent labels to the same item.

         worker a  worker b  worker c  worker d
item 1       1         2         1         1
item 2       2         2         1         2
item 3       1         1         2         1
item 4       1         1         1         2

4 / 26

Related Works Outline 1 Motivation 2 Related Works 3 Crowdsourcing Latent Class 4 Experiments 5 Conclusion 5 / 26

Related Works Majority Voting Majority Voting Assumption For every worker, the ground truth label of an item is the label given most often, and the labels for each item are given independently.

$P(Y_m = d) = \frac{\sum_{(n,m) \in I} \delta_{W_{nm}, d}}{\sum_{d'} \sum_{(n,m) \in I} \delta_{W_{nm}, d'}}, \quad \forall m,$ (1)

where $W_{nm}$ is the label worker n gives to item m and $I$ is the set of observed (worker, item) pairs. 6 / 26
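To make the estimator concrete, here is a minimal Python (NumPy) sketch of majority voting applied to the 4-item, 4-worker label table from the earlier slide; the function and variable names are ours, not from the paper.

import numpy as np

# Rows are items, columns are workers; labels are in {1, 2}, 0 would mean "not labeled".
W = np.array([[1, 2, 1, 1],
              [2, 2, 1, 2],
              [1, 1, 2, 1],
              [1, 1, 1, 2]])

def majority_vote(W, num_labels=2):
    """Estimate P(Y_m = d) as the fraction of observed labels for item m equal to d."""
    estimates = []
    for row in W:
        observed = row[row > 0]
        counts = np.bincount(observed, minlength=num_labels + 1)[1:]
        estimates.append(counts / counts.sum())
    return np.vstack(estimates)

probs = majority_vote(W)
print(probs.argmax(axis=1) + 1)   # recovered labels: [1 2 1 1]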

Related Works Dawid-Skene Estimator Dawid-Skene estimator Assumption The performance of a worker is consistent across different items, and his or her behavior can be measured by a confusion matrix. Table: An example of a binary classification confusion matrix.

          label A  label B
label A      1        0
label B     1/3      2/3

7 / 26

Related Works Dawid-Skene Estimator Dawid-Skene estimator Assumption The performance of a worker is consistent across different items, and his or her behavior can be measured by a confusion matrix.

$P(W \mid q, p) = \prod_m \sum_d q_d \prod_{n,l} p_{ndl}^{\delta_{W_{nm}, l}},$ (2)

where $q_d$ is the prior probability of category d and $p_{ndl}$ is the probability that worker n labels an item as l when its ground truth label is d. 8 / 26
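For reference, a compact EM sketch of the classical Dawid-Skene estimator behind Eq. (2); this is an illustrative re-implementation under our own naming, not the authors' code, and it assumes labels are coded 1..D with 0 for missing.

import numpy as np

def dawid_skene(W, D, iters=50, smooth=1e-2):
    """W: (N workers, M items) matrix of labels in {1..D}, 0 = unobserved.
    Returns item posteriors T (M, D), class prior q (D,), confusion tensors p (N, D, D)."""
    N, M = W.shape
    # Initialise the item posteriors with majority-voting counts.
    T = np.array([[np.sum(W[:, m] == d) for d in range(1, D + 1)] for m in range(M)], float)
    T = (T + smooth) / (T + smooth).sum(1, keepdims=True)
    for _ in range(iters):
        # M-step: class prior and per-worker confusion matrices p[n, d, l].
        q = T.mean(0)
        p = np.full((N, D, D), smooth)
        for n in range(N):
            for m in range(M):
                if W[n, m] > 0:
                    p[n, :, W[n, m] - 1] += T[m]
        p /= p.sum(2, keepdims=True)
        # E-step: posterior over the true class of each item.
        logT = np.tile(np.log(q), (M, 1))
        for n in range(N):
            for m in range(M):
                if W[n, m] > 0:
                    logT[m] += np.log(p[n, :, W[n, m] - 1])
        T = np.exp(logT - logT.max(1, keepdims=True))
        T /= T.sum(1, keepdims=True)
    return T, q, p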

Crowdsourcing Latent Class Outline 1 Motivation 2 Related Works 3 Crowdsourcing Latent Class 4 Experiments 5 Conclusion 9 / 26

Crowdsourcing Latent Class Latent Class Assumptions Latent Class Assumptions Assumption I. Each item belongs to one specific latent class only. Assumption II. Items in the same latent class should be classified into the same category. 10 / 26

Crowdsourcing Latent Class Latent Class Assumptions Latent Class Assumptions Each item belongs to one specific latent class only. Items in the same latent class are classified into the same category. Figure: Latent classes of a binary classification task for classifying fruits and vegetables. 11 / 26

Crowdsourcing Latent Class Latent Class Dawid-Skene Estimator Latent Class Confusion Matrix Extend the original confusion matrix. An entry $p_{nkl}$ in this matrix represents the probability that worker n gives label l to an item which belongs to latent class k. Table: An example of a latent class confusion matrix.

            tomato  pumpkin  cuke  apple
Fruit         2/3      0     1/3     1
Vegetable     1/3      1     2/3     0

12 / 26

Crowdsourcing Latent Class Latent Class Dawid-Skene Estimator Chinese Restaurant Process The number of latent classes K is unknown in advance, so we choose a nonparametric prior for the latent class confusion matrix to learn the number of classes. Let $Z_i$ be the latent class assignment of item i. Under the CRP prior,

Old class: $P(Z_i = k) \propto n_k, \quad k = 1, \ldots, K,$ (3)
New class: $P(Z_i = K + 1) \propto \alpha_c,$ (4)

where $n_k$ is the number of items currently assigned to latent class k. 13 / 26
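A small sketch of the CRP assignment probabilities in Eqs. (3)-(4); `counts` holds the current class sizes n_k and `alpha_c` is the concentration parameter (names are illustrative).

import numpy as np

def crp_assignment_probs(counts, alpha_c):
    """Probability of joining each existing latent class (proportional to its size n_k)
    or of opening a new class (proportional to alpha_c), normalised to sum to one."""
    weights = np.append(np.asarray(counts, dtype=float), alpha_c)
    return weights / weights.sum()

# Three existing classes of sizes 5, 2 and 1 with concentration alpha_c = 0.5:
print(crp_assignment_probs([5, 2, 1], alpha_c=0.5))   # last entry is the new-class probability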

Crowdsourcing Latent Class Latent Class Dawid-Skene Estimator Nonparametric Dawid-Skene Model Full Bayesian Model:

Assignments: $Z \mid \alpha_c \sim \mathrm{CRP}(\alpha_c),$ (5)
Entries: $p_{nk} \mid \alpha_d \sim \mathrm{Dirichlet}(\alpha_d), \quad \forall n, k,$ (6)
Observations: $W_{nm} \mid Z, p_n \sim \mathrm{Multinomial}(A_{nm}), \quad \forall n, m,$ (7)

where $A_{nm} = (A_{nm1}, \ldots, A_{nmD})$ and $A_{nmd} = \sum_{k=1}^{K} p_{nkd} \, \delta_{Z_m, k}$. 14 / 26
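The generative story in Eqs. (5)-(7) can be forward-simulated; below is a hedged sketch, assuming a symmetric Dirichlet(alpha_d / D) on each confusion-matrix row and fully observed labels, with function and variable names of our choosing.

import numpy as np

def sample_nds(N, M, D, alpha_c=1.0, alpha_d=1.0, seed=0):
    """Draw latent classes Z, confusion rows p[n, k, :], and labels W from the model."""
    rng = np.random.default_rng(seed)
    Z, counts = [], []
    for _ in range(M):                                  # CRP over latent classes, Eq. (5)
        w = np.append(np.asarray(counts, float), alpha_c)
        k = rng.choice(len(w), p=w / w.sum())
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
        Z.append(k)
    K = len(counts)
    p = rng.dirichlet([alpha_d / D] * D, size=(N, K))   # confusion rows, Eq. (6)
    W = np.array([[rng.choice(D, p=p[n, Z[m]]) + 1 for m in range(M)]
                  for n in range(N)])                   # observed labels, Eq. (7)
    return np.array(Z), p, W

Z, p, W = sample_nds(N=5, M=12, D=2)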

Crowdsourcing Latent Class Latent Class Dawid-Skene Estimator Inference We use Gibbs sampling to infer the hidden variables. Conditional distributions. Confusion matrix parameters:

$p_{nk} \mid Z, W \sim \mathrm{Dirichlet}(B_{nk}), \quad \forall n, k,$ (8)

where $B_{nkd} = \sum_{m=1}^{M} \delta_{W_{nm}, d} \, \delta_{Z_m, k} + \alpha_d / D$. Latent class assignments, for an existing class $k \le K$:

$P(Z_m = k \mid Z_{-m}, p, W) \propto n_k \prod_{n=1}^{N} \prod_{d=1}^{D} p_{nkd}^{\delta_{W_{nm}, d}},$ (9)

and when generating a new class:

$P(Z_m = k_{\mathrm{new}} \mid Z_{-m}, p, W) \propto \alpha_c \prod_{n=1}^{N} \frac{\prod_{d=1}^{D} \Gamma(\delta_{W_{nm}, d} + \alpha_d / D)}{\Gamma(1 + \alpha_d)}.$ (10)

15 / 26
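A sketch of a single Gibbs update for Z_m following Eqs. (9)-(10), assuming every worker labeled item m; the new-class weight integrates the label probabilities out under the symmetric Dirichlet prior, and all names here are our own, not the authors' code.

import numpy as np
from scipy.special import gammaln

def resample_Z_m(m, W, Z, p, counts, alpha_c, alpha_d, rng):
    """W: (N, M) labels in {1..D}; p: (N, K, D) confusion rows; counts: list of class sizes n_k."""
    N, K, D = p.shape
    counts[Z[m]] -= 1                                   # remove item m from its current class
    labels = W[:, m] - 1                                # this item's labels from all workers
    log_w = np.empty(K + 1)
    for k in range(K):                                  # existing classes, Eq. (9)
        log_w[k] = np.log(max(counts[k], 1e-12)) + np.log(p[np.arange(N), k, labels]).sum()
    # New class, Eq. (10): per worker, Gamma(1 + a/D) * Gamma(a/D)^(D-1) / Gamma(1 + a).
    a = alpha_d
    log_w[K] = np.log(alpha_c) + N * (gammaln(1 + a / D) + (D - 1) * gammaln(a / D) - gammaln(1 + a))
    w = np.exp(log_w - log_w.max())
    k_new = rng.choice(K + 1, p=w / w.sum())
    if k_new == K:
        counts.append(0)                                # open a fresh class (its p row is drawn from the prior)
    counts[k_new] += 1
    Z[m] = k_new
    return k_new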

Crowdsourcing Latent Class Latent Class Minimax Entropy Estimator Latent Class Minimax Entropy Estimator We also extend the minimax entropy estimator. Idea: change the category constraints. New objective:

$\min_{Z} \max_{p, \tau, \sigma} \; -\sum_{n,m,d} p_{nmd} \log p_{nmd} - \frac{\alpha}{2} \sum_{m,d} \tau_{md}^2 - \frac{\beta}{2} \sum_{n,d,k} \sigma_{ndk}^2$

s.t. $\sum_{n} \left( p_{nmd} - \delta_{W_{nm}, d} \right) = \tau_{md}, \quad \forall m, d,$
     $\sum_{m} \left( p_{nmd} - \delta_{W_{nm}, d} \right) \delta_{Z_m, k} = \sigma_{ndk}, \quad \forall n, d, k,$
     $\sum_{d} p_{nmd} = 1, \quad \forall n, m.$ (11)

16 / 26
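To make the constraints of Eq. (11) concrete, here is a small helper that evaluates the item-level slacks tau_{md} and the per-worker, per-class slacks sigma_{ndk} for given probabilities p, labels W, and class assignments Z; this only computes the constraint discrepancies, it is not the authors' optimizer, and it assumes fully observed labels.

import numpy as np

def moment_slacks(P, W, Z, K):
    """P: (N, M, D) probabilities p_{nmd}; W: (N, M) labels in {1..D}; Z: (M,) latent classes.
    Returns tau (M, D) and sigma (N, D, K) as defined by the constraints of Eq. (11)."""
    N, M, D = P.shape
    delta = np.zeros((N, M, D))
    delta[np.arange(N)[:, None], np.arange(M)[None, :], W - 1] = 1.0    # delta_{W_nm, d}
    diff = P - delta
    tau = diff.sum(axis=0)                              # sum over workers for each item and label
    memb = np.eye(K)[Z]                                 # (M, K) one-hot class memberships
    sigma = np.einsum('nmd,mk->ndk', diff, memb)        # sum over the items of each latent class
    return tau, sigma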

Crowdsourcing Latent Class Category Recovery Category Recovery Regard the items in the same latent class as one imaginary item, which we call a hyper-item. We use a generalized Dawid-Skene estimator with hyper-items to estimate the category assignments:

$P(W \mid q, p) = \prod_{k} \sum_{d} q_d \prod_{n,l} p_{ndl}^{n_{nkl}},$ (12)

where $n_{nkl} = \sum_{m} \delta_{W_{nm}, l} \, \delta_{Z_m, k}$ is the number of times worker n gives label l to items of hyper-item (latent class) k. 17 / 26
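A sketch of the hyper-item construction: count how often each worker gives each label to each latent class, then score the categories of every hyper-item as in Eq. (12); here q and p are assumed to come from a Dawid-Skene style fit, and the names are illustrative.

import numpy as np

def hyper_item_counts(W, Z, K, D):
    """counts[n, k, l]: how many times worker n gave label l to items of latent class k."""
    N, M = W.shape
    counts = np.zeros((N, K, D), dtype=int)
    for n in range(N):
        for m in range(M):
            if W[n, m] > 0:
                counts[n, Z[m], W[n, m] - 1] += 1
    return counts

def hyper_item_posterior(counts, q, p):
    """Posterior over categories d for each hyper-item k, proportional to
    q_d * prod_{n,l} p[n, d, l] ** counts[n, k, l], as in Eq. (12)."""
    log_post = np.log(q)[None, :] + np.einsum('nkl,ndl->kd', counts, np.log(p))
    post = np.exp(log_post - log_post.max(1, keepdims=True))
    return post / post.sum(1, keepdims=True)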

Experiments Outline 1 Motivation 2 Related Works 3 Crowdsourcing Latent Class 4 Experiments 5 Conclusion 18 / 26

Experiments Synthetic Dataset 4 latent classes, with 40 items and separate parameters for each latent class. 2 types of workers, 20 workers of each type. Figure: (a) the number of latent classes K found by NDS over iterations for different values of α_c (0.1, 0.2, 0.4, 0.8, 1.6); (b) the average category recovery error rate for each α_c. 19 / 26

Experiments Real Flower Dataset 4 flower species, 50 pictures for each. 2 species for each category. 36 workers, 2366 labels in total. Figure: panels (a) NDS, (b) LC-ME, (c) NDS+CR, (d) LC-ME+CR. Each entry corresponds to an image and each column corresponds to a flower species; in (a)(b) the color denotes the latent class, in (c)(d) the color denotes the category. 20 / 26

Experiments Real Flower Dataset 4 flower species, 50 pictures for each. 2 species for each category. 36 workers, 2366 labels in total. Figure: Representative pictures for different latent classes (best viewed in color). 21 / 26

Experiments Category Recovery Table: Performance of several models on the flowers dataset. The average error rate over 10 trials, together with the standard deviation, is reported; the header row gives the number of workers used for estimation.

# workers        20                25                30                35
MV        0.1998 ± 0.0506   0.2383 ± 0.0216   0.2153 ± 0.0189   0.2170 ± 0.0096
DS        0.1590 ± 0.0538   0.1555 ± 0.0315   0.1310 ± 0.0213   0.1300 ± 0.0041
NDS       0.1595 ± 0.0737   0.1605 ± 0.0434   0.1330 ± 0.0371   0.1475 ± 0.0354
ME        0.1535 ± 0.0695   0.1470 ± 0.0339   0.1315 ± 0.0200   0.1335 ± 0.0078
LC-ME     0.1415 ± 0.0382   0.1430 ± 0.0286   0.1215 ± 0.0133   0.1190 ± 0.0168

22 / 26

Conclusion Outline 1 Motivation 2 Related Works 3 Crowdsourcing Latent Class 4 Experiments 5 Conclusion 23 / 26

Conclusion Conclusion We reveal latent class structures in crowdsourcing. We propose algorithms to uncover meaningful latent classes. We show that latent structures can help to achieve higher accuracy on category recovery tasks. 24 / 26

References Outline 1 Motivation 2 Related Works 3 Crowdsourcing Latent Class 4 Experiments 5 Conclusion 25 / 26

References Selected References
Dawid, A.P., and Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics (1979), 20-28.
Dempster, A.P., Laird, N.M., Rubin, D.B., et al.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39, 1 (1977), 1-38.
Li, H., Yu, B., and Zhou, D.: Error rate analysis of labeling by crowdsourcing.
Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., and Moy, L.: Learning from crowds. The Journal of Machine Learning Research 11 (2010), 1297-1322.
Sheshadri, A., and Lease, M.: SQUARE: A benchmark for research on computing crowd consensus. In First AAAI Conference on Human Computation and Crowdsourcing (2013).
Tian, Y., and Zhu, J.: Learning from crowds in the presence of schools of thought. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2012), ACM, 226-234.
Welinder, P., Branson, S., Belongie, S., and Perona, P.: The multidimensional wisdom of crowds. In Neural Information Processing Systems (2010), vol. 10, 2424-2432.
Zhou, D., Platt, J.C., Basu, S., and Mao, Y.: Learning from the wisdom of crowds by minimax entropy. In Neural Information Processing Systems (2012), 2204-2212.
26 / 26