A Randomized Approach for Crowdsourcing in the Presence of Multiple Views


A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 -

Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion - 2 -

Feature Heterogeneity (Multi-view) Example: document classification. The same documents are represented as bags of words under different views: the first dictionary yields the first view and the second dictionary yields the second view. - 3 -

Feature Heterogeneity (Multi-view) Example: image classification. The same image set is described by different feature extractions: View 1: SIFT, View 2: HOG, View 3: Deep Features, View 4: Contour Features. - 4 -

View Consistency From the initial labeled data D_L, Classifier 1 F_1(.) is trained on View 1 and Classifier 2 F_2(.) is trained on View 2. When making predictions on new data x', the two classifiers should agree: F_1(x') = F_2(x'). - 5 -

Crowdsourcing What is crowdsourcing? Crowdfunding (Kickstarter) Collective Knowledge (data labeling, foreign language translation) Collective Creativity (analogy mining) Implicit Crowdsourcing (CAPTCHA) Crowdsourcing in Machine Learning In ML, training a (semi-)supervised model requires training labels. Many crowdsourcing platforms provide services to collect label information. - 6 -

Worker Consensus Workers label items; the predictions of different workers regarding the same item should be similar. Crowdsourcing is low-cost and efficient: a large number of labels can be collected in a short period of time. - 7 -

Research Questions Q1: How to model the multi-view learning problem using crowdsourcing labels? Q2: What is the appropriate tool to solve it? Q3: How to speed up? - 8 -

Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion - 9 -

M2VW: Formulation Weight matrix of the workers: W collects the per-worker, per-view weight vectors, where w_k^(m) indicates the weights learned for the k-th worker in the m-th view. - 10 -

M2VW: Formulation Prediction tensor: A is of size N x N_w x V, where N is the number of examples, N_w the number of workers, and V the number of views. The i-th tensor slice A_i:: is built from the feature vector of x_i, a block-diagonal matrix of the per-view worker weights, and a vector of all 1's (of size V). The entry A_ikm is the prediction of the i-th example made by the k-th worker under the m-th view. - 11 -
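
To make the slide's notation concrete, here is a minimal sketch of how such a prediction tensor could be assembled, assuming per-view feature matrices `X_views[m]` of shape (N, d_m) and per-worker, per-view weight vectors `W[k][m]`; the names and the plain linear score are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def prediction_tensor(X_views, W):
    """Assemble the N x N_w x V prediction tensor A, where A[i, k, m] is the
    (linear) prediction of example i made by worker k under view m."""
    N = X_views[0].shape[0]
    V, N_w = len(X_views), len(W)
    A = np.zeros((N, N_w, V))
    for k in range(N_w):
        for m in range(V):
            # score of every example under worker k's weights for view m
            A[:, k, m] = X_views[m] @ W[k][m]
    return A
```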

M2VW: Formulation Optimization formulation: a loss function plus a low-rank penalty on the prediction tensor. Remarks: The rank minimization problem is NP-hard [1]. The rank of a tensor is not uniquely defined [2,3]. The regularization term therefore uses the tightest convex envelope: a non-negative combination of the trace norms of the tensor matricizations [1]. [1] E. J. Candes, et al., Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 2009. [2] T. G. Kolda, et al., Tensor decompositions and applications, SIAM Review, 2009. [3] J. Liu, et al., Tensor completion for estimating missing values in visual data, IEEE TPAMI, 2012. - 12 -
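
As a concrete illustration of the low-rank surrogate mentioned above, below is a minimal sketch of the mode-n matricization and the weighted sum of trace (nuclear) norms of the unfoldings; the helper names and the equal default weights are assumptions, not the paper's exact choices.

```python
import numpy as np

def unfold(A, mode):
    """Mode-n matricization: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def sum_of_trace_norms(A, weights=(1.0, 1.0, 1.0)):
    """Non-negative combination of the nuclear norms of the three matricizations,
    the convex surrogate for tensor rank used as the low-rank regularizer."""
    return sum(w * np.linalg.norm(unfold(A, m), ord='nuc')
               for m, w in enumerate(weights))
```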

M2VW: Formulation Interpretations of the terms: Loss function: convex and monotonically decreasing. First matricization: each row stacks the vectorized predictions of one worker (1st example, 2nd example, ..., last example), giving N_w rows, one per worker. Worker Consensus: minimizing the trace norm of this matricization requires the predictions of different workers over the same item on the same view to be correlated. - 13 -

M2VW: Formulation Interpretations of the terms: Second matricization: each row stacks the vectorized predictions under one view (1st example, 2nd example, ..., last example), giving V rows, one per view. View Consistency: minimizing the trace norm of this matricization requires the predictions of different views over the same worker on the same item to be consistent. Remarks: The third matricization would require the predictions of different worker-view combinations on the same item to be correlated, which merely repeats Worker Consensus and View Consistency. - 14 -

M2VW: Formulation Interpretations of the terms: Regularization term: Group sparsity imposes group-wise weights on the features that correspond to a specific worker under a specific view. Feature sparsity imposes general sparsity on the weights across multiple workers. - 15 -
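
A minimal sketch of the two sparsity penalties as commonly defined (an L2,1-style group norm over worker/view blocks and an element-wise L1 norm); the exact grouping and weighting used in the paper may differ.

```python
import numpy as np

def group_sparsity(W_mat, groups):
    """Sum of L2 norms over feature groups (one group of row indices per
    worker/view block), encouraging whole groups of weights to vanish."""
    return sum(np.linalg.norm(W_mat[idx]) for idx in groups)

def feature_sparsity(W_mat):
    """Element-wise L1 penalty across all workers, encouraging general sparsity."""
    return np.abs(W_mat).sum()
```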

M2VW: Formulation Optimization formulation (relaxed) Key advantages of this relaxation: The interdependent trace norm terms are split, so they can be solved independently. The relaxation penalty term can be transformed into a smooth, differentiable function. The transformed terms in the objective function are also separable and parallelizable with respect to the workers. - 16 -

M2VW: Algorithm Solution: Gradient-based method (BCD) Subproblem of updating W: gradient of the loss function. - 17 -

M2VW: Algorithm Solution: Gradient-based method (BCD) Subproblem of updating W: gradient of the relaxation penalty; the conclusion follows from Lemma 1 & Lemma 2. - 18 -

M2VW: Algorithm Solution: Gradient-based method (BCD) Subproblem of updating W: gradient of the feature sparsity term; the columns can be updated worker-wise, or all entries can be updated together. - 19 -
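
Since the L1 feature-sparsity term is not smooth, one common way to handle it, whether the columns are updated worker-wise or all entries are updated together, is an element-wise soft-thresholding (proximal) step. The sketch below shows that operator under this assumption; it is not necessarily the authors' exact update rule.

```python
import numpy as np

def soft_threshold(W, thresh):
    """Proximal map of the L1 penalty: shrink every entry toward zero by
    `thresh` and clip at zero. Apply to one worker's columns or to all of W."""
    return np.sign(W) * np.maximum(np.abs(W) - thresh, 0.0)
```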

M2VW: Algorithm Solution: Gradient-based method (BCD) Subproblem of updating M_l: closed-form solution via singular value thresholding [4]: each singular value sigma_i of the corresponding matricization is shrunk by the threshold tau and only the non-negative part is kept. [4] J.-F. Cai, et al., A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, 2010. - 20 -
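
The singular value thresholding operator from [4] has a simple implementation; this sketch applies it to a generic matrix M with a given threshold tau (how tau is set in the paper is not reproduced here).

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding (Cai et al. [4]): shrink each singular value
    of M by tau and drop the ones that become non-positive."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```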

M2VW: Algorithm Solution: Gradient-based method (BCD) Computational complexity: Observation: the complexity is linear w.r.t. the number of workers. Question: how to speed it up? - 21 -

M2VW: Randomized Algorithm Remarks: - 22 -

M2VW: Randomized Algorithm Decomposition of W into N_w worker blocks: any W can be written uniquely as a sum of its worker blocks along the block coordinate directions. Block (worker) update: a gradient step is taken on the block of the k-th worker. - 23 -

M2VW: Randomized Algorithm Batch workers update: in the n-th round of the BCD iterations, assume that we select a subset of the block coordinate directions. Remarks: The optimization objective f(W) is smooth and block separable. The subset of block coordinate directions is selected uniformly at random. - 24 -
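
A minimal sketch of the uniform worker-batch sampling that drives the randomized block updates; the batch size and RNG handling shown here are illustrative assumptions.

```python
import numpy as np

def sample_worker_batch(n_workers, batch_size, seed=None):
    """Uniformly sample (without replacement) the worker blocks whose
    coordinate directions are updated in one randomized BCD round."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_workers, size=batch_size, replace=False)
```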

Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion - 25 -

Experiment Dataset: 20 Newsgroups [5]: two of its largest subsets, with 50 synthetic workers. Animal Breed [6]: a subset of ImageNet, with 31 real crowdsourcing workers. [5] T. Joachims, A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization, Computer Science Technical Report CMU-CS-96-118, Carnegie Mellon University. [6] Y. Zhou, et al., MultiC^2: an optimization framework for learning from task and worker dual heterogeneity, in SDM, 2017. - 26 -

Experiment Effectiveness results: Evaluation metric: average F1-score. Comparison methods: ConLR: Logistic Regression using concatenated features. PMC [7]: Pseudo Multi-view Co-training. VRKHS [8]: Vector-valued RKHS multi-view learning. MultiC^2 [6]: Heterogeneous classification using crowdsourcing labels. [7] M. Chen, et al., Automatic feature decomposition for single view co-training, in ICML, 2011. [8] H. Q. Minh, et al., A unifying framework for vector-valued manifold regularization and multi-view learning, in ICML, 2013. - 27 -

Experiment Term necessity and parameter sensitivity: Baseline I: loss term only. Baseline II: loss term + sparsity term. Baseline III: loss term + low-rank term. - 28 -

Experiment Efficiency: Improved performance with a proper worker batch size. Run time scales linearly w.r.t. the number of workers. - 29 -

Conclusion Dual heterogeneity learning framework: Feature heterogeneity (multi-view learning) Worker heterogeneity (crowdsourcing) Algorithms: Relaxation leads to an independent and differentiable objective. Separability of the objective leads to the RBCD solution. Experimental results: Consistently better results on synthetic and real datasets. Linear scalability w.r.t. the number of workers. - 30 -

Thank you! & Questions? - 31 -