Using Both Latent and Supervised Shared Topics for Multitask Learning

Ayan Acharya, Aditya Rawal, Raymond J. Mooney, Eduardo R. Hruschka
UT Austin, Dept. of ECE
September 21, 2013

Problem Definition
- An MTL framework that can use both attributes and class labels.
- In the training corpus, each document belongs to one of several classes and has a set of attributes ("supervised topics").
- Objective: train a model using the words, supervised topics, and class labels, then classify completely unlabeled test data (no supervised topics or class labels).
- Example attributes: "is 3D boxy?", "has torso?", "has wheels?", etc.

Transfer with Supervised Shared Attributes
- Train to infer attributes from visual features.
- Train to infer categories from attributes (Lampert et al., CVPR 2009).

Multitask Learning with Shared Latent Attributes
- Builds on work on multitask learning by R. Caruana (Machine Learning, 1997).

Transfer with Shared Latent and Supervised Attributes

Latent Dirichlet Allocation (LDA)
Reference: Blei et al., JMLR, 2003
[Plate diagram: Dirichlet prior α over per-document topic proportions θ; topic assignments z_n and words w_n inside a plate of N words, nested in a plate of M documents; K topic-word distributions β.]
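
To make the plate diagram concrete, here is a minimal sketch of LDA's generative process as sampling code, assuming numpy; the function name and interface are illustrative, not from the talk.

```python
import numpy as np

def lda_generate(alpha, beta, doc_lengths, rng=None):
    """Sample documents from the LDA generative model (Blei et al., 2003).

    alpha: Dirichlet prior over per-document topic proportions, shape (K,)
    beta:  K topic-word distributions over a V-word vocabulary, shape (K, V)
    doc_lengths: number of words N_m in each of the M documents
    """
    rng = rng or np.random.default_rng()
    K, V = beta.shape
    corpus = []
    for n_words in doc_lengths:
        theta = rng.dirichlet(alpha)                   # theta_m ~ Dir(alpha)
        z = rng.choice(K, size=n_words, p=theta)       # z_mn ~ Mult(theta_m)
        words = [rng.choice(V, p=beta[k]) for k in z]  # w_mn ~ Mult(beta_{z_mn})
        corpus.append(words)
    return corpus
```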

Labeled LDA (LLDA)
Reference: Ramage et al., EMNLP, 2009
[Plate diagram: same structure as LDA, with an observed label vector Λ restricting each document's topic proportions θ to its label set.]
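
A sketch of the one change LLDA makes to this process, using the same numpy setup as above: the observed label vector Λ restricts the Dirichlet draw to the document's own labels (illustrative code, not the authors').

```python
import numpy as np

def llda_generate_doc(alpha, beta, label_set, n_words, rng=None):
    """Labeled LDA (Ramage et al., 2009): topic proportions are supported
    only on the document's observed label set Lambda_d.

    label_set: indices of the topics (labels) active for this document
    """
    rng = rng or np.random.default_rng()
    labels = np.asarray(label_set)
    theta = rng.dirichlet(alpha[labels])           # Dirichlet over active labels only
    z = rng.choice(labels, size=n_words, p=theta)  # topics drawn from the label set
    words = [rng.choice(beta.shape[1], p=beta[k]) for k in z]
    return z, words
```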

Maximum Entropy Discriminant LDA (MedLDA)
Reference: Zhu et al., ICML, 2009
[Plate diagram: LDA augmented with an observed class label Y per document, predicted from the topic assignments z via max-margin weights r.]

Doubly Supervised LDA (DSLDA)
[Plate diagram: two Dirichlet priors α(1) and α(2) for supervised and latent topics, observed label vector Λ, mixing weight ε combining the two topic groups into θ, topic assignments z, words w, K topic-word distributions β, and class label Y predicted via max-margin weights r.]
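
Since the plate diagram does not survive transcription, here is a hedged sketch of how DSLDA composes its per-document topic proportions from the two topic groups; the exact parameterization of ε follows my reading of the model, and the names are illustrative.

```python
import numpy as np

def dslda_theta(alpha1, alpha2, active_labels, eps, rng=None):
    """DSLDA topic proportions: a convex combination of K1 supervised
    topics (restricted by the observed label vector Lambda, as in LLDA)
    and K2 latent topics (as in LDA/MedLDA).

    eps: mixing weight between the supervised and latent topic groups
    """
    rng = rng or np.random.default_rng()
    theta_sup = np.zeros(alpha1.shape[0])
    theta_sup[active_labels] = rng.dirichlet(alpha1[active_labels])
    theta_lat = rng.dirichlet(alpha2)
    # Full proportion vector over K1 + K2 topics; the class label Y is then
    # predicted from the resulting topic assignments via max-margin weights r.
    return np.concatenate([eps * theta_sup, (1.0 - eps) * theta_lat])
```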

Objective Function in DSLDA

\[
\min_{q,\,\kappa_0,\,\{\xi_n\}} \;\; \frac{1}{2}\lVert r \rVert^2 + \mathcal{L}(q(Z), \kappa_0) + C \sum_{n=1}^{N} \xi_n,
\quad \text{s.t. } \forall n,\ \forall y \neq Y_n:\ \mathbb{E}\big[r^\top \Delta f_n(y)\big] \ge 1 - \xi_n;\ \ \xi_n \ge 0.
\]

- κ_0: set of model parameters.
- Δf_n(y) = f(Y_n, \bar{z}_n) − f(y, \bar{z}_n), where f(y, \bar{z}_n) is a zero-padded feature vector.
- \mathcal{L}(q(Z), κ_0): lower bound from the variational approximation q(Z).
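
As a concrete reading of the constraint, the sketch below evaluates the slack ξ_n for one document given E[z̄_n], assuming the zero-padding in f(y, z̄_n) means r is laid out as one K-block per class (my assumption, for illustration only).

```python
import numpy as np

def slack(r, zbar, y_true, n_classes):
    """xi_n = max(0, 1 - min_{y != Y_n} E[r^T (f(Y_n, zbar_n) - f(y, zbar_n))]).

    r: max-margin weights, shape (n_classes * K,), one K-block per class
    zbar: expected topic proportions E[zbar_n], shape (K,)
    """
    K = zbar.shape[0]
    scores = r.reshape(n_classes, K) @ zbar  # E[r^T f(y, zbar_n)] for every y
    margins = scores[y_true] - scores        # margin against each competing class
    margins[y_true] = np.inf                 # exclude y = Y_n from the minimum
    return max(0.0, 1.0 - margins.min())
```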

Non-parametric Doubly Supervised LDA (NPDSLDA)
[Plate diagram: HDP-based extension of DSLDA with stick-breaking weights π and π(2), concentration parameters γ0 and δ0, topic-word distributions φ with priors η1 and η2, plus the DSLDA components α(2), Λ, ε, Y, and max-margin weights r.]
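
NPDSLDA draws its latent-topic weights from a hierarchical Dirichlet process rather than fixing K2. Below is a minimal sketch of the truncated stick-breaking construction of the top-level weights (cf. Wang et al., AISTATS 2011); the truncation level is an illustrative choice.

```python
import numpy as np

def stick_breaking(gamma0, truncation=50, rng=None):
    """Truncated stick-breaking weights pi ~ GEM(gamma0) for the
    top-level measure of the HDP over latent topics.
    """
    rng = rng or np.random.default_rng()
    v = rng.beta(1.0, gamma0, size=truncation)              # v_k ~ Beta(1, gamma0)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining                                    # pi_k = v_k * prod_{j<k} (1 - v_j)
```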

Baseline Models
1. MedLDA with one-vs-all classification (MedLDA-OVA)
2. MedLDA with multitask learning (MedLDA-MTL)
3. DSLDA with only shared supervised topics (DSLDA-OSST)
4. DSLDA with no shared latent topics (DSLDA-NSLT)
5. Majority class method (MCM)

Model        Supervised Topics   Latent Topics
MedLDA-OVA   absent              not shared
MedLDA-MTL   absent              shared
DSLDA-OSST   present             absent
DSLDA-NSLT   present             not shared
MCM          absent              absent

Description of Dataset: aYahoo
- Classes: carriage, centaur, bag, building, donkey, goat, jetski, monkey, mug, statue, wolf, and zebra.
- Supervised topics: "has head", "has wheel", "has torso", and 61 others.

Description of Dataset: ACM Conference
- Classes: first group: WWW, SIGIR, KDD, ICML; second group: ISPD, DAC. Abstracts of papers are treated as documents.
- Supervised topics: keywords provided by the authors.

Experimental Methodology
- Multitask training that evaluates the benefit of sharing information between classes on the predictive accuracy of all classes.
- Varied both the fraction of training data that contains supervised topic labels and the fraction that contains class labels.

Results from aYahoo Data
[Results plot; 50% of the training data has supervised topic labels.]

Results from Text Data
[Results plot; 50% of the training data has supervised topic labels.]

Future Work
- Active learning to efficiently query both supervised topics and class labels.
- Online training to update the model parameters.
- The general idea of double supervision could be applied to many other models, e.g., multi-layer perceptrons, latent SVMs, or deep belief networks.

Questions?

References:
1. Multitask Learning, R. Caruana, Machine Learning, 1997. [Link]
2. Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer, Lampert et al., CVPR 2009. [Link]
3. Actively Selecting Annotations Among Objects and Attributes, Kovashka et al., ICCV 2011. [Link]
4. MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification, Zhu et al., ICML 2009. [Link]
5. Online Variational Inference for the Hierarchical Dirichlet Process, Wang et al., AISTATS 2011. [Link]