Multi-label Active Learning with Auxiliary Learner


Multi-label Active Learning with Auxiliary Learner
Chen-Wei Hung and Hsuan-Tien Lin
National Taiwan University
ACML, November 15, 2011

Multi-label Active Learning

multi-label data set:
- labeled pool $D_l = \{(x_n, y_n)\}$
- unlabeled pool $D_u = \{x_n\}$

learning: train with the data set to get decision functions $f_k(x)$

active learning: allow asking questions (query labels)
1. query the labels of a size-$S$ set $D_s \subseteq D_u$
2. move $D_s$ and its labels from $D_u$ to $D_l$

[Diagram: pool-based active learning loop — train a classifier on the labeled pool, predict, select some data from the unlabeled pool, query the oracle, and add the newly labeled instances to the labeled pool.]

labeling is expensive, especially for multi-label data
hope: reduce labeling cost while maintaining good performance by asking key questions
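As a concrete reference, here is a minimal sketch of this pool-based loop; the `train`, `score`, and `oracle` interfaces are hypothetical stand-ins, not the authors' code:

```python
import numpy as np

def active_learning_loop(X_lab, Y_lab, X_unl, oracle, train, score,
                         S=20, rounds=50):
    """Generic pool-based multi-label active learning loop.

    train(X, Y)     -> model whose decision_function(X) has shape (n, K)
    score(model, X) -> one query score per unlabeled instance (higher = query first)
    oracle(X)       -> true label vectors in {-1, +1}^K for the queried instances
    """
    for _ in range(rounds):
        model = train(X_lab, Y_lab)                    # learn f_k from D_l
        picked = np.argsort(-score(model, X_unl))[:S]  # size-S query set D_s
        Y_new = oracle(X_unl[picked])                  # ask the oracle for labels
        X_lab = np.vstack([X_lab, X_unl[picked]])      # move D_s into D_l
        Y_lab = np.vstack([Y_lab, Y_new])
        X_unl = np.delete(X_unl, picked, axis=0)
    return train(X_lab, Y_lab)
```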

Problem Setup

Given:
- a $K$-class problem with labeled pool $D_l$ that contains (input $x_n$, label-set $y_n$), with $y_n$ expressed as a vector in $\{-1, +1\}^K$
- an unlabeled pool $D_u = \{x_n\}$

Goal: a multi-label active learning algorithm that iteratively
- learns a decent classifier $f(x) \in \mathbb{R}^K$ from $D_l$, with $\operatorname{sign}(f_k(x))$ used to predict the $k$-th label
- chooses a key subset $D_s \subseteq D_u$ to be queried

and improves the performance of $f$ efficiently w.r.t. the number of queries.

multi-label active learning: newer and less studied than binary active learning

Max. Loss Reduction with Max. Confidence (MMC): State of the Art in Multi-label Active Learning

MMC: proposed by Yang et al., "Effective Multi-label Active Learning for Text Classification", KDD 2009
- first-level learning: get $g_k(x)$ by binary relevance SVM (BRSVM) from $D_l$
- second-level learning: get $f_k(x)$ by stacked logistic regression (SLR) from $D_l$ and $g_k(x)$
- query: by maximum margin reduction (MMR) using $f_k$ and $g_k$

binary relevance SVM (BRSVM): one binary SVM per label

promising practical performance with some theoretical rationale

Motivation: how can we improve MMC?
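A minimal sketch of such a two-level learner using scikit-learn; stacking the original features together with all $K$ first-level margins is one plausible reading of the slide, not necessarily the exact construction in MMC:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

class BRSVMWithSLR:
    """First level: binary relevance, one linear SVM per label.
    Second level: per-label logistic regression stacked on the original
    features plus all K first-level margins g_k(x)."""

    def fit(self, X, Y):  # Y in {-1, +1}, shape (n, K); each label needs both classes
        K = Y.shape[1]
        self.svms = [LinearSVC().fit(X, Y[:, k]) for k in range(K)]
        Z = np.hstack([X, self.margins(X)])   # stacked feature space
        self.lrs = [LogisticRegression().fit(Z, Y[:, k]) for k in range(K)]
        return self

    def margins(self, X):                     # first-level g_k(x)
        return np.column_stack([s.decision_function(X) for s in self.svms])

    def decision_function(self, X):           # second-level f_k(x)
        Z = np.hstack([X, self.margins(X)])
        return np.column_stack([lr.decision_function(Z) for lr in self.lrs])
```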

Multi-label Active Learning with Auxiliary Learner

Idea: digest the essence of MMC, then extend it for improvement
- auxiliary learning: get $g_k(x)$ by some learner $G$ from $D_l$
- major learning: get $f_k(x)$ by some learner $F$ from $D_l$
- query by the disagreement of $g_k$ and $f_k$

proposed framework: query with two learners, major and auxiliary
- major (the original $f_k$): for accurate predictions in multi-label learning
- auxiliary: a different learner to help with query decisions

MMC = (major: SLR) + (auxiliary: BRSVM) + (criterion: MMR)

Query Criteria: Maximum Margin Reduction (MMR), Used by MMC

Intuition: query by version space reduction
$$D_s = \operatorname*{argmax}_{|D_s| = S,\; D_s \subseteq D_u} \big\{ V(G, D_l) - V(G, D_l \cup D_s \text{ labeled}) \big\}$$

$V$: size of the version space (the set of classifiers consistent with the data)
rationale: smaller $V$, less ambiguity in learning, better

MMR: with some other assumptions, $D_s \approx$ top-$S$ instances of $D_u$, ordered by
$$\sum_{k=1}^{K} \frac{1 - \operatorname{sign}(f_k(x))\, g_k(x)}{2}$$

equivalent MMR criterion: $-\sum_{k=1}^{K} \operatorname{sign}(f_k(x))\, g_k(x)$

Yang et al., "Effective Multi-label Active Learning for Text Classification", KDD 2009
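The equivalent criterion translates directly into a per-instance score; a sketch, where `f_scores` and `g_scores` are the assumed (n, K) decision values of the major and auxiliary learners:

```python
import numpy as np

def mmr_score(f_scores, g_scores):
    """MMR query score: -sum_k sign(f_k(x)) * g_k(x).
    Largest when the auxiliary margins disagree, with large magnitude,
    with the major learner's predicted signs."""
    return -np.sum(np.sign(f_scores) * g_scores, axis=1)
```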

Query Criteria: Maximum Hamming Loss Reduction (HLR)

Intuition: query by Hamming loss reduction
$$D_s = \operatorname*{argmax}_{|D_s| = S,\; D_s \subseteq D_u} \big\{ \mathrm{HL}(G, D_l) - \mathrm{HL}(G, D_l \cup D_s \text{ labeled}) \big\}$$

$\mathrm{HL}$: Hamming loss made by learner $G$
rationale: smaller $\mathrm{HL}$, better performance in learning

HLR (our proposed criterion): with some assumptions, $D_s \approx$ top-$S$ instances of $D_u$, ordered by the number of disagreeing labels
$$\sum_{k=1}^{K} \big[\!\big[ \operatorname{sign}(f_k(x)) \neq \operatorname{sign}(g_k(x)) \big]\!\big]$$

equivalent HLR criterion: $-\sum_{k=1}^{K} \operatorname{sign}(f_k(x))\, \operatorname{sign}(g_k(x))$
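The same translation for HLR, under the same assumed shapes as the MMR sketch above:

```python
import numpy as np

def hlr_score(f_scores, g_scores):
    """HLR query score: -sum_k sign(f_k(x)) * sign(g_k(x)), i.e., up to an
    affine transform, the number of labels on which major and auxiliary
    learners disagree."""
    return -np.sum(np.sign(f_scores) * np.sign(g_scores), axis=1)
```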

Query Criteria: Quick Comparison between MMR and HLR

MMR: $-\sum_{k=1}^{K} \operatorname{sign}(f_k(x))\, g_k(x)$
- rationale: reduce $V$ rapidly
- magnitude-sensitive: a few large $g_k$ that disagree with $f_k$ force a query (not robust to outliers)

HLR: $-\sum_{k=1}^{K} \operatorname{sign}(f_k(x))\, \operatorname{sign}(g_k(x))$
- rationale: reduce $\mathrm{HL}$ rapidly
- magnitude-insensitive: the useful ambiguity information in $g_k$ is lost (not aware of details)

a better criterion by combining the two? Yes!

Query Criteria: Soft Hamming Loss Reduction (SHLR)

SHLR interpolates between HLR and MMR per label, by clipping $-\operatorname{sign}(f_k(x))\, g_k(x)$ to $[-1, 1]$:
- $|g_k(x)|$ large: behaves like HLR, to be robust to magnitude
- $|g_k(x)|$ small: behaves like MMR, to keep the ambiguity information

Soft HLR: $D_s$ = top-$S$ instances of $D_u$, ordered by
$$\sum_{k=1}^{K} \operatorname{clip}\big( -\operatorname{sign}(f_k(x))\, g_k(x),\; -1,\; 1 \big)$$

which is better: SHLR, HLR, or MMR?
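The SHLR score, again as a sketch under the same assumptions as the two snippets above; the clip makes each label's contribution MMR-like inside $[-1, 1]$ and HLR-like (saturated) outside:

```python
import numpy as np

def shlr_score(f_scores, g_scores):
    """SHLR query score: sum_k clip(-sign(f_k(x)) * g_k(x), -1, 1)."""
    return np.sum(np.clip(-np.sign(f_scores) * g_scores, -1.0, 1.0), axis=1)
```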

Experiment

Query criteria compared:
- Random: uses neither the auxiliary nor the major learner
- BinMin: uses only the auxiliary learner, not the major one
- SHLR, HLR, MMR: use both the auxiliary and the major learners

Setting (same as used by Yang et al. to evaluate MMC): $D_l$ grows from an initial 500 instances to a final 1500, in steps of $S = 20$

Major/auxiliary combinations:
- major = SLR[BRSVM], auxiliary = BR(SVM): the combination used by MMC
- major = CC(SVM), auxiliary = BR(SVM)
- major = SLR[BRSVM], auxiliary = CC(SVM)

questions: can SHLR or HLR improve MMC? which criterion is best across major/auxiliary combinations?
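For concreteness, the BR(SVM) and CC(SVM) pieces could be instantiated with scikit-learn as below; this is a sketch of the learner types only, not the paper's exact SVM configuration (note that scikit-learn's multi-label estimators expect 0/1 indicator matrices rather than {-1, +1} labels):

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import ClassifierChain
from sklearn.svm import LinearSVC

# BR(SVM): binary relevance, one independent linear SVM per label
br_svm = OneVsRestClassifier(LinearSVC())

# CC(SVM): classifier chain, each SVM also sees the earlier labels in the chain
cc_svm = ClassifierChain(LinearSVC())
```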

Experiment: SLR as Major, BR as Auxiliary (rcv1, evaluated with F1-score)

[Figure: F1-score vs. number of added instances (×20) on rcv1 with SLR as the major learner and BR as the auxiliary; curves for MMR, HLR, SHLR, BinMin, and Random.]

result: SHLR > MMR ≈ HLR > BinMin > Random

Experiment: SLR as Major, BR as Auxiliary, across Data Sets (evaluated with F1-score)

[Figure: F1-score relative to the best criterion on eight data sets: rcv1, Y!Ar, Y!Bu, Y!Co, Y!Ed, Y!En, yeast, scene.]

- SHLR best: 5/8 (one tie)
- MMR best: 2/8 (one tie)
- HLR best: 2/8

relative performance to the best across data sets: SHLR > MMR ≈ HLR
better than MMC? Yes!

Experiment: CC as Major, BR as Auxiliary (rcv1, evaluated with F1-score)

[Figure: F1-score vs. number of added instances (×20) on rcv1 with CC as the major learner and BR as the auxiliary; curves for MMR, HLR, SHLR, BinMin, and Random.]

SHLR remains similarly best when changing the major learner to CC, changing the auxiliary learner to CC, or changing the performance measure to Hamming loss.

Conclusion

- a general framework for multi-label active learning: query with an auxiliary learner
- a simple query criterion via Hamming loss reduction (HLR): sometimes better
- an even better query criterion via soft Hamming loss reduction (SHLR): usually best
- future work: the major/auxiliary combination, especially the choice of the auxiliary learner

Thank you. Questions?