Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning
Mingyan Liu (joint work with Yang Liu)
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI
March 2015

The power of crowdsourcing
Tapping into enormous resources in sensing and processing:
- Data collection: participatory sensing, user-generated maps
- Data processing: image labeling, annotation
- Recommendation: ratings of movies, news, restaurants, services
- Social studies: opinion surveys, the science of opinion surveys

Scenario I: recommender systems
E.g., Yelp, movie reviews, news feeds.
A user shares experience and opinions.
Measure of quality is subjective: not all ratings should be valued equally.

Scenario II: crowdsourcing markets
E.g., using AMTs (Amazon Mechanical Turks).
(Figure: labelers receive task assignments, produce labels, and the labels are aggregated over time.)
Paid workers perform computational tasks.
Measure of quality is objective but hard to evaluate: competence, bias, irresponsible behavior, etc.

Our objective
To make the most effective use of the crowdsourcing system:
- The cost of having a large amount of data labeled is non-trivial.
- There may also be time constraints.
A sequential/online learning framework:
- Over time, learn which labelers are more competent, or whose reviews/opinions should be valued more.
- Closed-loop, causal.

Multi-armed bandit (MAB) problems
A sequential decision and learning framework:
- Objective: select the best of a set of choices ("arms").
- Principle: repeatedly sample different choices ("exploration"), while controlling how often each choice is used based on its empirical quality ("exploitation").
- Performance measure: regret, the difference between an algorithm and a benchmark.
Challenge in crowdsourcing: ground truth
- The true label of the data remains unknown.
- If we view each labeler as a choice/arm, the quality of the outcome ("reward") is unknown.

Key ideas and features
Dealing with the lack of ground truth:
- Recommender system: input from others is calibrated against one's own experience.
- Labeler selection: a mild assumption on the collective quality of the crowd; the quality of an individual is estimated against the crowd.
Online and offline uses:
- Learning occurs as data/labeling tasks arrive.
- Can equally be used offline by processing data sequentially.
Performance measure:
- Weak regret: comparison against the optimal static selection.
- We will also compare with offline methods.

Outline of the talk
- The recommender system problem: formulation and main results; experiments using MovieLens data.
- The labeler selection problem: formulation and main results; experiments using a set of AMT data.
- Discussion and conclusion.

Model
Users/reviewers and options:
- M users/reviewers: $i, j \in \{1, 2, \ldots, M\}$. Each has access to N options: $k, l \in \{1, 2, \ldots, N\}$.
- At each time step a user can choose up to K options: $a_i(t)$.
Rewards:
- An IID random reward $r^i_k(t)$, both user- and option-dependent.
- Mean rewards (unknown to the user): $\mu^i_l \neq \mu^i_k$ for $l \neq k$, i.e., different options present distinct values to a user.

Performance measure
Weak regret:
$R^{i,a}(T) = T \sum_{k \in N^i_K} \mu^i_k - E\Big[\sum_{t=1}^{T} \sum_{k \in a_i(t)} r^i_k(t)\Big]$
where $N^i_K$ is user i's optimal selection (the reward-maximizing top-K set).
The general goal is to achieve $R^{i,a}(T) = o(T)$. Existing approaches can achieve logarithmic regret, uniform in time.

Example: UCB1 [Auer et al. 2002]
Single-play version; extendable to multiple play.
Initialization: for $t \le N$, play arm/choice $t$; $t = t + 1$.
While $t > N$:
- for each choice k, calculate its sample mean
  $\bar r^i_k(t) = \big(r^i_k(1) + r^i_k(2) + \cdots + r^i_k(n^i_k(t))\big) / n^i_k(t)$
  and its index
  $g^i_{k,t,n^i_k(t)} = \bar r^i_k(t) + \sqrt{L \log t / n^i_k(t)}$;
- play the arm with the highest index; $t = t + 1$.
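The single-play UCB1 steps above can be sketched as follows; this is a minimal illustration, not the talk's implementation, and the Bernoulli reward simulation and the choice L = 2 are illustrative assumptions:

```python
import math
import random

def ucb1(pull, n_arms, horizon, L=2.0):
    """Single-play UCB1: play each arm once, then the highest-index arm."""
    counts = [0] * n_arms    # n_k(t): number of times arm k was played
    sums = [0.0] * n_arms    # running reward sums, for sample means
    for t in range(1, horizon + 1):
        if t <= n_arms:
            k = t - 1        # initialization phase: play arm t
        else:
            # index g_k = sample mean + sqrt(L * log t / n_k)
            k = max(range(n_arms),
                    key=lambda a: sums[a] / counts[a]
                    + math.sqrt(L * math.log(t) / counts[a]))
        r = pull(k)
        counts[k] += 1
        sums[k] += r
    return counts

# Usage: three arms with Bernoulli rewards of means 0.2, 0.5, 0.8 (illustrative).
random.seed(0)
means = [0.2, 0.5, 0.8]
plays = ucb1(lambda k: 1.0 if random.random() < means[k] else 0.0,
             n_arms=3, horizon=2000)
# The best arm (index 2) should receive most of the plays.
```
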

Key observation
A user sees and utilizes only its own samples in learning. Can we improve on this by leveraging other users' experience? Second-hand learning in addition to first-hand learning.
Basic idea:
- Estimate the difference between two users.
- Use this to calibrate others' observations or decisions so that they can be used as one's own.

How to model information exchange
- Full information exchange: users share their decisions and the subsequent rewards $(k, r^i_k(t))$.
- Partial information exchange: users only share decisions on which options were used (k), without revealing evaluations.

Full information exchange: how to measure pairwise difference
Estimated distortion:
$\delta^{i,j}_k(t) = \dfrac{\sum_{s \le t} \log r^i_k(s) / n^i_k(t)}{\sum_{s \le t} \log r^j_k(s) / n^j_k(t)}$
Converted average reward from j:
$\pi_{i,j}(\bar r^j_k(t)) = \sum_{s \le t} \big(r^j_k(s)\big)^{\delta^{i,j}_k(t)} / n^j_k(t)$

An index algorithm
Original UCB1 index: $\bar r^i_k(t) + \sqrt{2 \log t / n^i_k(t)}$
Modified index (U_full), choose the K highest in this value:
$\dfrac{\bar r^i_k(t)\, n^i_k(t) + \sum_{j \neq i} \pi_{i,j}(\bar r^j_k(t))\, n^j_k(t)}{\sum_j n^j_k(t)} + \sqrt{\dfrac{2 \log t}{\sum_j n^j_k(t)}}$
where the first term averages one's own rewards together with the converted rewards from others,
$\pi_{i,j}(\bar r^j_k(t)) = \sum_{s \le t} \big(r^j_k(s)\big)^{\delta^{i,j}_k(t)} / n^j_k(t)$
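A sketch of the distortion estimate and the U_full index above, assuming the model's multiplicative distortion between users; the `samples[j][k]` dictionary layout and function names are illustrative choices, not from the talk:

```python
import math

def distortion(rewards_i, rewards_j):
    """delta_{i,j} = (mean of log r_i) / (mean of log r_j), per the slides."""
    num = sum(math.log(r) for r in rewards_i) / len(rewards_i)
    den = sum(math.log(r) for r in rewards_j) / len(rewards_j)
    return num / den

def u_full_index(t, samples, i, k):
    """U_full index for user i, option k.

    samples[j][k] is the list of rewards user j observed on option k.
    Others' rewards are converted via r ** delta before averaging, then a
    UCB-style bonus over the pooled sample count is added."""
    own = samples[i][k]
    total_n = sum(len(samples[j][k]) for j in samples)
    converted_sum = sum(own)                   # own rewards, unconverted
    for j in samples:
        if j == i:
            continue
        d = distortion(own, samples[j][k])
        converted_sum += sum(r ** d for r in samples[j][k])
    return converted_sum / total_n + math.sqrt(2 * math.log(t) / total_n)
```

For example, if user j's rewards are the squares of user i's, the estimated distortion is 1/2 and j's samples are mapped back onto i's scale before pooling.
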

Regret bound
Theorem. The weak regret of user i under U_full is upper bounded by
$R^i_{U_{full}}(T) \le \sum_{k \notin N^i_K} \dfrac{4 (\sqrt{2} + \kappa M^{\gamma})^2 \log T}{M \Delta^i_k} + \text{const.}$
where $\Delta^i_k = \mu^i_K - \mu^i_k$, assuming $\min_j \{ E[\log r^j_k] \, \delta^{i,j}_k \} > 0$ and $r^i_k = (r^j_k)^{\delta^{i,j}_k}$.
Compared with UCB1:
$R_{UCB1}(T) \le \sum_{k \notin N_K} \dfrac{8 \log T}{\Delta_k} + \text{const.}$
When M is large: roughly an M-fold improvement.

Partial information exchange
Only sees others' choices, not their rewards.
We will further distinguish users by their preference groups: within the same preference group, users have the same preference ordering among all choices, $\mu^i_1 > \mu^i_2 > \cdots > \mu^i_N$ for all i in the group.

Uniform group preference
Keep track of sample frequencies: track $n_k(t) = \sum_i n^i_k(t)$ and compute the frequency $\beta_k(t) := n_k(t) / \sum_l n_l(t)$.
Modified index (U_part):
$\bar r^i_k(t) + \sqrt{\dfrac{2 \log t}{n^i_k(t)}} - \underbrace{\alpha (1 - \beta_k(t)) \sqrt{\dfrac{\log t}{n^i_k(t)}}}_{\text{group recommendation}}$
where $\alpha$ is the weight given to others' choices.

Non-uniform group preferences
Additional technical hurdles: we assume a known set of preference orderings but unknown group affiliations, so we need to perform group identification:
- Keep track of the number of times user j chooses option k.
- Estimate j's preference by ordering the sample frequencies.
- Place j in the group with the best match in preference ordering.
- Discount choices made by members of a different group.
Similar results can be obtained.

Experiment I: M = 10, N = 5, K = 3
Uniform preference; rewards are exponential random variables; distortions are Gaussian.
(Figure, left: crowd regret over time, comparing full information exchange (U-FULL) with UCB1 applied individually (UCB-IND) and UCB1 applied centrally with known distortion. Right: crowd regret over time, comparing partial information exchange (U-PART) with UCB1 applied individually.)

Experiment II: MovieLens data
A good dataset, though not ideal for our intended use; collected via a movie recommendation system.
We use the MovieLens-1M dataset: 1M rating records provided by 6,040 users on 3,952 movies across 18 genres, from April 25, 2000 to February 28, 2003. Each rating is on a scale of 1-5. In general, each reviewer contributes multiple reviews: 70% have more than 50 reviews.
Can we provide better recommendations? Predict how a user is going to rate movies given his and other users' past reviews. The decision aspect of the learning algorithm is not captured.

MovieLens: methodology
- Discrete time steps clocked by the review arrivals.
- Bundle movies into 18 genres (action, adventure, comedy, etc.), each representing an option/arm. This ensures that each option remains available to every reviewer, but loses the finer distinction between movies of the same genre; the prediction is thus for a whole genre, used as a proxy for a specific movie within that genre.
- Use the full information exchange index; we only utilize users estimated to be in the same group (same preference ordering).
- Prediction performance is measured by the error and squared error, averaged over the total number of reviews received by time t.

The recommendation algorithm
At time t, given i's review $r^i_k(t)$ for movie k:
- update i's preference ranking over options/genres;
- update i's similarity group: the reviewers that share the same set of top-K preferred options as i;
- estimate the distortion between i and those in its similarity group;
- update i's rating for each option by including ratings from those in its similarity group, corrected by the estimated distortion;
- repeat for all reviews arriving at time t.
At the end of step t, we obtain estimated ratings for all reviewers and all genres.
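One piece of the update above, the similarity-group computation, can be sketched as follows; the `avg_ratings` layout and function names are illustrative assumptions, not from the talk:

```python
def top_k(ratings, K):
    """A reviewer's set of top-K preferred genres by current average rating."""
    return set(sorted(ratings, key=ratings.get, reverse=True)[:K])

def similarity_group(i, avg_ratings, K):
    """Reviewers sharing the same set of top-K preferred genres as reviewer i.

    avg_ratings[j] maps each genre to reviewer j's current average rating."""
    mine = top_k(avg_ratings[i], K)
    return [j for j in avg_ratings
            if j != i and top_k(avg_ratings[j], K) == mine]
```

For example, with three reviewers whose averages over genres 'a', 'b', 'c' differ, only those whose top-2 set matches reviewer 0's are placed in its group.
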

Algorithm used online
Prediction at each time step.
(Figure: average prediction error over time for individual learning vs. crowd learning.)
- Prediction becomes more accurate with more past samples.
- Group learning outperforms individual learning.
- The downward trend is not monotonic, due to arrivals of new movies.

Algorithm used offline
Offline estimation results; comparison with the following:
- SoCo, a social-network- and contextual-information-aided recommendation system. A random decision tree is adopted to partition the original user-item-rating matrix (user-movie-rating in our context) so that items with similar contexts are grouped.
- RPMF, Random Partition Matrix Factorization, a contextual collaborative filtering method based on a tree constructed using random partition techniques.
- MF, basic matrix factorization over the user-item matrix.

Algorithm    Crowd   Ind.    SoCo    RPMF    MF
Avg. Error   0.6880  0.8145  0.7066  0.7223  0.7668
RMSE         0.9054  1.0279  0.8722  0.8956  0.9374

Discussion
- Combination of learning from direct and indirect experience: estimating similarity groups and pairwise distortions effectively allows us to utilize a larger set of samples.
- Our algorithm does not rely on exogenous social or contextual information; however, the estimation of similarity groups introduces a type of social connectivity among users.
- In settings where it is unclear whether preferences are uniform or non-uniform, one can simply assume the latter and proceed as we did in the MovieLens experiment.

Outline of the talk
- The recommender system problem: formulation and main results; experiments using MovieLens data.
- The labeler selection problem: formulation and main results; experiments using a set of AMT data.
- Discussion and conclusion.

Labeler selection
(Figure: labelers receive task assignments, produce labels, and the labels are aggregated over time.)
- M labelers; labeler i has accuracy $p_i$ (can be task-dependent). No two are exactly the same: $p_i \neq p_j$ for $i \neq j$, and $0 < p_i < 1$ for all i.
- Collective quality: $\bar p := \sum_i p_i / M > 1/2$.
- Unlabeled tasks arrive at t = 1, 2, ...
- The user selects a subset $S_t$ of labelers for the task at time t.
- Labeling payment of $c_i$ for each task performed by labeler i.

Labeling outcome/information aggregation
Aggregating results from multiple labelers:
- A task receives a set of labels: $\{L_i(t)\}_{i \in S_t}$.
- Use simple majority voting or weighted majority voting to compute the label output $L^*(t)$.
- Probability of a correct labeling outcome: $\pi(S_t)$, a well-defined function of the $p_i$'s.
- Optimal set of labelers: the $S^*$ that maximizes $\pi(S)$.
Accuracy of the labeling outcome:
- Probability that a simple majority vote over all M labelers is correct: $a_{min} := P(\sum_i X_i / M > 1/2)$, with $X_i$ the indicator that labeler i is correct.
- If $\bar p > 1/2$ and $M > \frac{\log 2}{2 (\bar p - 1/2)^2}$, then $a_{min} > 1/2$.
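For small labeler sets, $\pi(S)$ under simple majority voting can be computed exactly by enumerating which labelers answer correctly. A sketch, assuming independent labelers as in the model; treating ties as incorrect is an assumption, since the slides do not specify tie handling:

```python
from itertools import product

def majority_correct_prob(p):
    """pi(S): probability that a strict simple majority of independent
    labelers with accuracies p is correct (ties count as incorrect here)."""
    total = 0.0
    for outcome in product([0, 1], repeat=len(p)):  # 1 = labeler correct
        if 2 * sum(outcome) > len(p):               # strict majority correct
            prob = 1.0
            for pi, ok in zip(p, outcome):
                prob *= pi if ok else 1.0 - pi
            total += prob
    return total

# Usage: three labelers of accuracy 0.7 each.
# pi(S) = 3 * 0.7**2 * 0.3 + 0.7**3 = 0.784
```

This also illustrates the a_min claim: e.g., seven independent labelers of accuracy 0.6 already give a majority-vote accuracy above 1/2.
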

Obtaining $S^*$
Assuming we know $\{p_i\}$, $S^*$ can be obtained using a simple linear search.
Theorem. Under the simple majority voting rule, $|S^*|$ is an odd number. Furthermore, $S^*$ is monotonic: if $i \in S^*$ and $j \notin S^*$, then we must have $p_i > p_j$.
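The theorem reduces the search to sorting labelers by accuracy and comparing odd prefix sizes. A sketch, assuming known accuracies and independent labelers; `pi_of` re-derives $\pi(S)$ by enumeration and is an illustrative helper:

```python
from itertools import product
from math import prod

def pi_of(p):
    """Probability a strict simple majority of independent labelers is correct."""
    return sum(
        prod(pi if ok else 1.0 - pi for pi, ok in zip(p, outcome))
        for outcome in product([0, 1], repeat=len(p))
        if 2 * sum(outcome) > len(p)
    )

def optimal_set(p):
    """Linear search for S*: by the theorem, S* consists of the
    highest-accuracy labelers and has odd size, so it suffices to
    compare odd-sized prefixes of the accuracy-sorted list."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    best, best_pi = None, -1.0
    for m in range(1, len(p) + 1, 2):      # odd sizes only
        s = order[:m]
        pi_s = pi_of([p[i] for i in s])
        if pi_s > best_pi:
            best, best_pi = set(s), pi_s
    return best, best_pi
```

Note that adding weaker labelers can hurt: with accuracies [0.9, 0.85, 0.8, 0.6], the best set is the top three, not all four or the single best.
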

An online learning algorithm
(Figure: timeline of task arrivals, alternating exploration and exploitation steps, with some tasks marked for testing.)
There is a set of tasks E(t) (of size $\sim \log t$) used for testing purposes. These, or their independent and identical variants, are repeatedly assigned to the labelers ($\sim \log t$ times).

Two types of time steps:
- Exploration: all M labelers are used. Exploration is entered if (1) the number of testers falls below a threshold ($\sim \log t$), or (2) the number of times a tester has been tested falls below a threshold ($\sim \log t$).
- Exploitation: the estimated $S^*$ is used to label the arriving task, based on the current estimates $\{\hat p_i\}$.

Three types of tasks:
- Testers: those arriving to find (1) true and (2) false. These are added to E(t) and are repeatedly used to collect independent labels whenever (2) is true subsequently.
- Throw-aways: those arriving to find (2) true. These are given a random label.
- Keepers: those arriving to find both (1) and (2) false. These are given a label outcome using the best estimated set of labelers.

Accuracy update:
- Estimated label of tester k at time t: the majority label over all test outcomes up to time t.
- $\hat p_i$ at time t: the percentage of times i's label matches the majority vote known at t, out of all tests on all testers.
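The accuracy update can be sketched as follows; the `test_labels` layout is an illustrative assumption, and ties in the majority vote are left to `Counter.most_common`'s ordering:

```python
from collections import Counter

def estimate_accuracies(test_labels):
    """Estimate each labeler's accuracy against majority votes on testers.

    test_labels[tester] is a list of {labeler: label} dicts, one per test
    round. Each labeler's estimate is the fraction of its labels matching
    the tester's current majority label, over all rounds of all testers."""
    matches, totals = Counter(), Counter()
    for rounds in test_labels.values():
        votes = Counter(lab for r in rounds for lab in r.values())
        majority = votes.most_common(1)[0][0]   # current majority label
        for r in rounds:
            for labeler, lab in r.items():
                totals[labeler] += 1
                matches[labeler] += (lab == majority)
    return {i: matches[i] / totals[i] for i in totals}
```

For example, a labeler who always agrees with the majority label gets an estimate of 1.0, regardless of the unknown ground truth.
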

Regret
Comparing with the optimal (static) selection:
$R(T) = T \pi(S^*) - E\Big[\sum_{t=1}^{T} \pi(S_t)\Big]$
Main result:
$R(T) \le \text{Const}(S^*, \Delta_{max}, \Delta_{min}, \delta_{max}, \delta_{min}, a_{min}) \log^2(T) + \text{const.}$
where $\Delta_{max} = \max_{S \neq S^*} \pi(S^*) - \pi(S)$ and $\delta_{max} = \max_{i \neq j} |p_i - p_j|$.
The first term is due to exploration; the second due to exploitation. A similar result can be obtained for the cost C(T).

Discussion
Relaxing some assumptions:
- Re-assignment of the testers after a random delay.
- Improving the bound by improving $a_{min}$: weed out bad labelers.
Weighted majority voting rule:
- Each labeler i's decision is weighted by $\log \frac{p_i}{1 - p_i}$.
- We have to account for the additional error in estimating the weights when determining the label outcome.
- A larger constant: slower convergence to a better target.
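A sketch of the weighted rule for binary labels, using the log-odds weights stated above; the dictionary-based interface is an illustrative assumption:

```python
import math

def weighted_majority(labels, p_hat):
    """Binary weighted majority vote.

    labels[i] in {0, 1} is labeler i's vote; each vote carries weight
    log(p_i / (1 - p_i)), so high-accuracy labelers count for more."""
    score = 0.0
    for i, label in labels.items():
        w = math.log(p_hat[i] / (1.0 - p_hat[i]))
        score += w if label == 1 else -w
    return 1 if score > 0 else 0

# Usage: a single 0.9-accuracy labeler outvotes two 0.6-accuracy labelers,
# whereas a simple majority vote would side with the two weaker labelers.
```
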

Experiment I: simulation with M = 5
Performance comparison: labeler selection (LS_OL) vs. full crowdsourcing (simple majority vote).
(Figure: average reward over time under LS_OL and under full crowdsourcing.)

Comparing weighted and simple majority vote
(Figure: accumulated reward over 100 time steps under simple vs. weighted majority voting.)

Experiment II: on a real AMT dataset
- Contains 1,000 images, each labeled by the same set of 5 AMTs.
- Labels are on a scale from 0 to 5, indicating how many scenes are seen in each image.
- A second dataset summarizes keywords for the scenes of each image: we use this count as the ground truth.

                    AMT1  AMT2  AMT3  AMT4  AMT5
# of disagreements  348   353   376   338   441
Table: total number of disagreements for each AMT.

Performance comparison
(Figure, left: average error rate by ordered image number, full crowd vs. LS_OL. AMT 5 was quickly weeded out; the algorithm eventually settled on the optimal set of AMTs 1, 2, and 4. Right: CDF of labeling error over all images at the end of this process; average error 1.63 for the full crowd vs. 1.36 with LS_OL.)

Conclusion
We discussed two problems:
- How to make better recommendations for a user by weighing more heavily the opinions of other like-minded users. UCB1-like group learning algorithms; outperforms individual learning.
- How to select the best set of labelers over a sequence of tasks. An algorithm that estimates labelers' quality by comparing against the (weighted) majority vote; a new regret bound.
Currently under investigation:
- A lower bound on the regret in the labeler selection problem.
- Generalization to sequential classifier design.

References
- Y. Liu and M. Liu, "An Online Learning Approach to Improving the Quality of Crowd-sourcing," to appear in ACM SIGMETRICS, June 2015, Portland, OR.
- Y. Liu and M. Liu, "Group Learning and Opinion Diffusion in a Broadcast Network," Annual Allerton Conference on Control, Communication, and Computing (Allerton), October 2013, Allerton, IL.