Multi-label Active Learning with Auxiliary Learner

Chen-Wei Hung and Hsuan-Tien Lin
National Taiwan University
November 15, 2011

C.-W. Hung & H.-T. Lin (NTU), Multi-label AL w/ Auxiliary Learner, ACML, 11/15/2011 (slide 1 / 14)
Multi-label Active Learning

multi-label data set:
- labeled pool D_l = {(x_n, y_n)}
- unlabeled pool D_u = {x_n}

learning: train with the labeled pool to get decision functions f_k(x)
active: query the labels of a size-S set D_s ⊆ D_u, then move D_s and its labels from D_u to D_l

the loop: (1) select some data from the unlabeled pool; (2) query the oracle and add the newly labeled instances to the labeled pool; train a classifier; predict; repeat

labeling is expensive, especially in the multi-label setting
active learning: allow asking questions (querying labels)
hope: reduce the labeling cost while maintaining good performance by asking key questions
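The pool-based loop above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `train`, `score`, and `oracle` callbacks are hypothetical placeholders for any concrete learner, query criterion, and labeler.

```python
import numpy as np

def active_learning_loop(X_labeled, y_labeled, X_pool, oracle, train, score,
                         S=2, rounds=3):
    """Generic pool-based active learning loop (sketch).

    train(X, y)     -> a model learned from the current labeled pool D_l
    score(model, X) -> informativeness of each pool point (higher = query first)
    oracle(X)       -> true label vectors in {-1, +1}^K for the queried points
    """
    for _ in range(rounds):
        model = train(X_labeled, y_labeled)       # learn f_k from D_l
        s = score(model, X_pool)                  # rank the unlabeled pool D_u
        idx = np.argsort(-s)[:S]                  # pick the top-S query set D_s
        X_new, y_new = X_pool[idx], oracle(X_pool[idx])
        X_labeled = np.vstack([X_labeled, X_new]) # move D_s (with labels) into D_l
        y_labeled = np.vstack([y_labeled, y_new])
        X_pool = np.delete(X_pool, idx, axis=0)   # ... and out of D_u
    return train(X_labeled, y_labeled), X_labeled, y_labeled
```

Plugging in different `score` functions yields the different query criteria discussed later in the talk.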
Multi-label Active Learning: Problem Setup

Given
- a K-class problem with labeled pool D_l that contains (input x_n, label-set y_n); y_n expressed as a vector in {-1, +1}^K
- an unlabeled pool D_u = {x_n}

Goal: a multi-label active learning algorithm that iteratively
- learns a decent classifier f(x) ∈ R^K from D_l, with sign(f_k(x)) used to predict the k-th label
- chooses a key subset D_s ⊆ D_u to be queried
and improves the performance of f_k efficiently w.r.t. the number of queries

multi-label active learning: newer and less studied (than binary active learning)
Maximum Loss Reduction with Maximum Confidence (MMC): State-of-the-art in Multi-label Active Learning

MMC: proposed by Yang et al., "Effective Multi-label Active Learning for Text Classification", KDD 2009
- first-level learning: get g_k(x) by binary relevance SVM (BRSVM) from D_l
- second-level learning: get f_k(x) by stacked logistic regression (SLR) from D_l and g_k(x)
- query: by maximum margin reduction (MMR) using f_k and g_k

binary relevance SVM (BRSVM): one binary SVM per label
promising practical performance with some theoretical rationale

Motivation: how to improve MMC?
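Binary relevance simply trains one independent binary classifier per label. A minimal sketch, with a least-squares linear model standing in for the per-label SVM (the actual MMC first level uses SVMs):

```python
import numpy as np

def train_binary_relevance(X, Y):
    """Binary relevance: one independent binary classifier per label.

    X: (n, d) inputs; Y: (n, K) labels in {-1, +1}.
    Here a least-squares linear fit stands in for the per-label SVM:
    column W[:, k] minimizes ||X w - Y[:, k]||^2 independently of the others.
    """
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def decision_values(W, X):
    """Real-valued g_k(x) = x . W[:, k]; predict the k-th label by its sign."""
    return X @ W
```

The key property, regardless of the base learner, is that labels are handled independently; the stacked second level of MMC is what re-introduces label correlations.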
Multi-label Active Learning with Auxiliary Learner

Idea: digest the essence of MMC, then extend it for improvement
- auxiliary learning: get g_k(x) by some algorithm G from D_l
- major learning: get f_k(x) by some algorithm F from D_l
- query by the disagreement of g_k and f_k

proposed framework: query with two learners, major and auxiliary
- major (the original f_k): for accurate predictions in multi-label learning
- auxiliary: a different learner that helps make query decisions

MMC = (major: SLR) + (auxiliary: BRSVM) + (criterion: MMR)
Query Criteria: Maximum Margin Reduction (MMR), Used by MMC

Intuition: query by version space reduction

D_s = argmax_{|D_s| = S, D_s ⊆ D_u} { V(G, D_l) − V(G, D_l ∪ labeled D_s) }

V: size of the version space (the set of classifiers consistent with the data)
rationale: smaller V ⇒ less ambiguity in learning ⇒ better

MMR: with some other assumptions,
D_s = top-S instances of D_u, ordered by Σ_{k=1}^K (1 − sign(f_k(x)) · g_k(x)) / 2
equivalent MMR criterion: −Σ_{k=1}^K sign(f_k(x)) · g_k(x)

(Yang et al., "Effective Multi-label Active Learning for Text Classification", KDD 2009)
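The MMR ordering above is a one-liner over the two learners' decision values. A small sketch, assuming `F` and `G` hold the major's f_k(x) and the auxiliary's g_k(x) for each pool instance:

```python
import numpy as np

def mmr_scores(F, G):
    """MMR query score per pool instance (larger = query first).

    F, G: (n, K) decision values of the major f_k and auxiliary g_k.
    Score = sum over labels of (1 - sign(f_k) * g_k) / 2, which orders
    the pool identically to -sum_k sign(f_k) * g_k.
    """
    return np.sum((1.0 - np.sign(F) * G) / 2.0, axis=1)
```

Note that the raw magnitude of g_k enters the score directly, which is the source of MMR's sensitivity to outliers discussed on the comparison slide.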
Query Criteria: Maximum Hamming Loss Reduction (HLR)

Intuition: query by Hamming loss reduction

D_s = argmax_{|D_s| = S, D_s ⊆ D_u} { HL(G, D_l) − HL(G, D_l ∪ labeled D_s) }

HL: Hamming loss made by learner G
rationale: smaller HL ⇒ better performance in learning

HLR (our proposed criterion): with some assumptions,
D_s = top-S instances of D_u, ordered by Σ_{k=1}^K ⟦sign(f_k(x)) ≠ sign(g_k(x))⟧
equivalent HLR criterion: −Σ_{k=1}^K sign(f_k(x)) · sign(g_k(x))
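The HLR ordering is even simpler: count the labels on which the two learners disagree. A minimal sketch in the same setting as before:

```python
import numpy as np

def hlr_scores(F, G):
    """HLR query score per pool instance (larger = query first).

    F, G: (n, K) decision values of the major f_k and auxiliary g_k.
    Score = number of labels where sign(f_k) and sign(g_k) disagree,
    which orders the pool identically to -sum_k sign(f_k) * sign(g_k).
    """
    return np.sum(np.sign(F) != np.sign(G), axis=1)
```

Because only the signs enter, a single wildly large g_k counts the same as a marginal one: robust to outliers, but blind to the ambiguity information in |g_k|.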
Query Criteria: Quick Comparison between MMR and HLR

MMR: −Σ_{k=1}^K sign(f_k(x)) · g_k(x)
- rationale: reduce V rapidly
- magnitude-sensitive: a few large g_k that disagree with f_k ⇒ must query (not robust to outliers)

HLR: −Σ_{k=1}^K sign(f_k(x)) · sign(g_k(x))
- rationale: reduce HL rapidly
- magnitude-insensitive: the useful ambiguity information in g_k is lost (not aware of details)

a better criterion by combining the two? Yes!
Query Criteria: Soft Hamming Loss Reduction (SHLR)

SHLR interpolates between HLR and MMR on each per-label term sign(f_k(x)) · g_k(x):
- |g_k(x)| large ⇒ behave like HLR, to be robust to magnitude
- |g_k(x)| small ⇒ behave like MMR, to keep the ambiguity information

Soft HLR: D_s = top-S instances of D_u, ordered by Σ_{k=1}^K clip(−sign(f_k(x)) · g_k(x), −1, 1)

which is better: SHLR, HLR, or MMR?
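A sketch of the SHLR score, assuming the sign convention reconstructed above (larger score = more disagreement = query first):

```python
import numpy as np

def shlr_scores(F, G):
    """SHLR query score per pool instance (larger = query first).

    Each per-label MMR term -sign(f_k) * g_k is clipped to [-1, 1]:
    confident labels (|g_k| >= 1) contribute at most +/-1, as in HLR,
    while ambiguous labels (|g_k| < 1) keep their magnitude, as in MMR.
    """
    return np.sum(np.clip(-np.sign(F) * G, -1.0, 1.0), axis=1)
```

The clipping is exactly what makes a single outlier-sized g_k unable to dominate the query decision, while near-boundary labels still contribute their graded ambiguity.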
Experiment

Query criteria
- Random: uses neither the auxiliary nor the major learner
- BinMin: uses only the auxiliary learner, not the major
- SHLR, HLR, MMR: use both the auxiliary and the major learner

Setting (same as used by Yang et al. to evaluate MMC)
- D_l size: initial 500 to final 1500, in steps of S = 20

Major/auxiliary combinations
- major = SLR[BRSVM]; auxiliary = BR(SVM): the combination used by MMC
- major = CC(SVM); auxiliary = BR(SVM)
- major = SLR[BRSVM]; auxiliary = CC(SVM)

Questions: can SHLR or HLR improve MMC? which criterion is best across major/auxiliary combinations?
Experiment on SLR as Major, BR as Auxiliary: SLR+BR, rcv1, Evaluated with F1-score

[Figure: F1-score vs. number of added instances (×20) on rcv1, for MMR, HLR, SHLR, BinMin, and Random]

SHLR > MMR ≈ HLR > BinMin > Random
Experiment on SLR as Major, BR as Auxiliary: SLR+BR across Data Sets, Evaluated with F1-score

[Figure: F1-score relative to the best, across rcv1, Y!Ar, Y!Bu, Y!Co, Y!Ed, Y!En, yeast, and scene]

- SHLR best: 5/8 (one tie)
- MMR best: 2/8 (one tie)
- HLR best: 2/8

relative performance to the best across data sets: SHLR > MMR ≈ HLR
better than MMC? YES!
Experiment on CC as Major, BR as Auxiliary: CC+BR, rcv1, Evaluated with F1-score

[Figure: F1-score vs. number of added instances (×20) on rcv1, for MMR, HLR, SHLR, BinMin, and Random]

SHLR remains similarly best when changing the major learner to CC, changing the auxiliary learner to CC, or changing the performance measure to Hamming loss
Conclusion

- a general framework for multi-label active learning: query with an auxiliary learner
- a simple query criterion via Hamming loss reduction (HLR): sometimes better
- an even better query criterion via soft Hamming loss reduction (SHLR): usually best
- future work: the major/auxiliary combination, especially the choice of auxiliary learner

Thank you. Questions?