ICDM Pisa, December 16, Motivation


1 Using Wikipedia for Co-clustering Based Cross-domain Text Classification
Pu Wang and Carlotta Domeniconi (George Mason University), Jian Hu (Microsoft Research Asia)
ICDM, Pisa, December 16, 2008

Motivation
Labeled data are seldom available and often too expensive to obtain.
Abundant labeled data may exist for a different but related domain.
Goal: use those labeled data as auxiliary information to accomplish the task (classification) in the target domain.

2 Main Idea
Leverage the dictionary shared by the in-domain and out-of-domain (target) documents to propagate label information.
[Diagram: $D_i$ to $D_o$, label propagation through common words.]

Main Idea
Enrich the document representation to fill the semantic gap.
[Diagram: $D_i$ to $D_o$, label propagation through common words and semantic concepts.]

3 Co-clustering based Classification
$D_i$: in-domain documents
$D_o$: out-of-domain documents
$C$: set of class labels
$W$: dictionary of all the words

Co-clustering based Classification
Co-clustering of $D_o$:
\[ C_{D_o} : \{d_1, \dots, d_m\} \to \{\hat{d}_1, \hat{d}_2, \dots, \hat{d}_{|C|}\} = \hat{D}_o \]
\[ C_W : \{w_1, \dots, w_n\} \to \{\hat{w}_1, \hat{w}_2, \dots, \hat{w}_k\} = \hat{W} \]

4 Co-clustering based Classification
[Figure: co-clustering of words and documents.]

Co-clustering based Classification: Initialization
[Figure: initialization of document clusters and initialization of word clusters.]

5 Co-clustering based Classification
Iterative algorithm that achieves
\[ \min_{\hat{D}_o, \hat{W}} \left\{ I(D_o; W) - I(\hat{D}_o; \hat{W}) + \lambda \left( I(C; W) - I(C; \hat{W}) \right) \right\} \]
$I(D_o; W) - I(\hat{D}_o; \hat{W})$: loss in mutual information between documents and words
$I(C; W) - I(C; \hat{W})$: loss in mutual information between class labels and words

Information Theoretic Co-clustering [Dhillon et al., KDD 03]
\[ I(X; Y) = \sum_x \sum_y p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \]
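The objective on this slide can be evaluated directly for a small example. The following is a minimal sketch, not the authors' implementation: the toy distributions `p_dw` and `p_cw`, the cluster assignments, and all function names are hypothetical, and mutual information is measured in nats.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats for a joint distribution given as nested lists p[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    total = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                total += pxy * math.log(pxy / (px[x] * py[y]))
    return total

def cluster_joint(joint, row_clusters, col_clusters, n_row_cl, n_col_cl):
    """Sum p(x, y) within clusters to obtain p(x_hat, y_hat)."""
    agg = [[0.0] * n_col_cl for _ in range(n_row_cl)]
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            agg[row_clusters[x]][col_clusters[y]] += pxy
    return agg

def cocc_objective(p_dw, p_cw, doc_cl, word_cl, n_doc_cl, n_word_cl, lam):
    """I(Do;W) - I(Do_hat;W_hat) + lam * (I(C;W) - I(C;W_hat))."""
    doc_loss = mutual_information(p_dw) - mutual_information(
        cluster_joint(p_dw, doc_cl, word_cl, n_doc_cl, n_word_cl))
    # Class labels are not clustered, so each class is its own "cluster".
    identity = list(range(len(p_cw)))
    label_loss = mutual_information(p_cw) - mutual_information(
        cluster_joint(p_cw, identity, word_cl, len(p_cw), n_word_cl))
    return doc_loss + lam * label_loss

# Hypothetical toy data: 4 out-of-domain documents x 4 words, and
# 2 classes x 4 words estimated from in-domain labeled documents.
p_dw = [
    [0.10, 0.10, 0.00, 0.05],
    [0.10, 0.10, 0.05, 0.00],
    [0.00, 0.05, 0.10, 0.10],
    [0.05, 0.00, 0.10, 0.10],
]
p_cw = [
    [0.20, 0.20, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.20],
]
objective = cocc_objective(p_dw, p_cw, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2, lam=0.25)
```

Both loss terms are non-negative, because merging rows or columns of a joint distribution cannot increase mutual information; the trade-off parameter `lam` only changes how strongly the class-label term influences the word clustering.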

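6
\[ f(w) = \sum_{d \in D_o} p(d, w), \qquad f(d \mid w) = p(d \mid w) = \frac{f(d, w)}{f(w)} \]
\[ f(d) = \sum_{w \in W} p(d, w), \qquad f(w \mid d) = p(w \mid d) = \frac{p(d, w)}{f(d)} \]
\[ \hat{f}(\hat{w} \mid \hat{d}) = p(\hat{w} \mid \hat{d}), \quad \hat{f}(\hat{d} \mid \hat{w}) = p(\hat{d} \mid \hat{w}), \quad \hat{f}(d \mid \hat{d}) = p(d \mid \hat{d}), \quad \hat{f}(w \mid \hat{w}) = p(w \mid \hat{w}) \]
\[ \hat{f}(d \mid \hat{w}) = \hat{f}(d \mid \hat{d})\, \hat{f}(\hat{d} \mid \hat{w}) = p(d \mid \hat{d})\, p(\hat{d} \mid \hat{w}) \]
\[ \hat{f}(w \mid \hat{d}) = \hat{f}(w \mid \hat{w})\, \hat{f}(\hat{w} \mid \hat{d}) = p(w \mid \hat{w})\, p(\hat{w} \mid \hat{d}) \]
\[ \hat{g}(c, w) = p(c, \hat{w})\, p(w \mid \hat{w}) = p(c, \hat{w})\, \frac{p(w)}{p(\hat{w})} \]
\[ g(w) = \sum_{c \in C} p(c, w), \qquad g(c \mid w) = p(c \mid w) = \frac{g(c, w)}{g(w)} \]
\[ \hat{g}(c \mid \hat{w}) = \frac{\sum_{w \in \hat{w}} p(c \mid w)\, p(w)}{p(\hat{w})} = \frac{\sum_{w \in \hat{w}} p(c \mid w)\, p(w)}{\sum_{w \in \hat{w}} p(w)} \]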

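7 Co-clustering based Classification
\[ I(D_o; W) - I(\hat{D}_o; \hat{W}) + \lambda \left( I(C; W) - I(C; \hat{W}) \right) = D\big(f(D_o, W) \,\|\, \hat{f}(D_o, W)\big) + \lambda\, D\big(g(C, W) \,\|\, \hat{g}(C, W)\big) \]
\[ D\big(p(x) \,\|\, q(x)\big) = \sum_x p(x) \log \frac{p(x)}{q(x)} \]

Co-clustering based Classification
\[ D\big(f(D_o, W) \,\|\, \hat{f}(D_o, W)\big) = \sum_{\hat{d} \in \hat{D}_o} \sum_{d \in \hat{d}} f(d)\, D\big(f(W \mid d) \,\|\, \hat{f}(W \mid \hat{d})\big) \]
\[ D\big(f(D_o, W) \,\|\, \hat{f}(D_o, W)\big) = \sum_{\hat{w} \in \hat{W}} \sum_{w \in \hat{w}} f(w)\, D\big(f(D_o \mid w) \,\|\, \hat{f}(D_o \mid \hat{w})\big) \]
\[ D\big(g(C, W) \,\|\, \hat{g}(C, W)\big) = \sum_{\hat{w} \in \hat{W}} \sum_{w \in \hat{w}} g(w)\, D\big(g(C \mid w) \,\|\, \hat{g}(C \mid \hat{w})\big) \]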
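The first identity on this slide can be checked numerically. The sketch below assumes the approximating distribution from Dhillon et al., $\hat{f}(d, w) = p(\hat{d}, \hat{w})\, p(d \mid \hat{d})\, p(w \mid \hat{w})$, which the slides use only through its conditionals; the toy joint distribution and all function names are hypothetical.

```python
import math

def marginals(joint):
    """Row and column marginals of a joint distribution p[x][y]."""
    return [sum(r) for r in joint], [sum(c) for c in zip(*joint)]

def mutual_information(joint):
    px, py = marginals(joint)
    return sum(
        pxy * math.log(pxy / (px[x] * py[y]))
        for x, row in enumerate(joint)
        for y, pxy in enumerate(row)
        if pxy > 0
    )

def cluster_joint(joint, row_cl, col_cl, n_r, n_c):
    agg = [[0.0] * n_c for _ in range(n_r)]
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            agg[row_cl[x]][col_cl[y]] += pxy
    return agg

def approx_joint(joint, row_cl, col_cl, n_r, n_c):
    """f_hat(d, w) = p(d_hat, w_hat) * p(d | d_hat) * p(w | w_hat)."""
    px, py = marginals(joint)
    agg = cluster_joint(joint, row_cl, col_cl, n_r, n_c)
    p_rhat, p_chat = marginals(agg)
    return [
        [
            agg[row_cl[x]][col_cl[y]]
            * (px[x] / p_rhat[row_cl[x]])
            * (py[y] / p_chat[col_cl[y]])
            for y in range(len(py))
        ]
        for x in range(len(px))
    ]

def kl_divergence(p, q):
    """D(p || q) over two joint distributions of the same shape."""
    return sum(
        pv * math.log(pv / qv)
        for p_row, q_row in zip(p, q)
        for pv, qv in zip(p_row, q_row)
        if pv > 0
    )

# Hypothetical toy joint distribution p(d, w) and cluster assignments.
p_dw = [
    [0.10, 0.10, 0.00, 0.05],
    [0.10, 0.10, 0.05, 0.00],
    [0.00, 0.05, 0.10, 0.10],
    [0.05, 0.00, 0.10, 0.10],
]
doc_cl, word_cl = [0, 0, 1, 1], [0, 0, 1, 1]

mi_loss = mutual_information(p_dw) - mutual_information(
    cluster_joint(p_dw, doc_cl, word_cl, 2, 2))
kl_loss = kl_divergence(p_dw, approx_joint(p_dw, doc_cl, word_cl, 2, 2))
```

Under these assumptions `mi_loss` and `kl_loss` agree up to floating-point error, which is what allows the objective to be minimized one document or one word at a time.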

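8 Co-clustering based Classification
\[ C_{D_o}^{(t)}(d) = \arg\min_{\hat{d}} D\big(f(W \mid d) \,\|\, \hat{f}^{(t-1)}(W \mid \hat{d})\big) \]
\[ C_W^{(t+1)}(w) = \arg\min_{\hat{w}} \Big[ f(w)\, D\big(f(D_o \mid w) \,\|\, \hat{f}(D_o \mid \hat{w})\big) + \lambda\, g(w)\, D\big(g(C \mid w) \,\|\, \hat{g}(C \mid \hat{w})\big) \Big] \]

Main Idea
Enrich the document representation to fill the semantic gap.
[Diagram: $D_i$ to $D_o$, label propagation through common words and semantic concepts.]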
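The document update rule can be sketched as a single reassignment pass. This is an illustrative sketch under stated assumptions, not the authors' code: it uses $\hat{f}(w \mid \hat{d}) = p(w \mid \hat{w})\, p(\hat{w} \mid \hat{d})$ from the definitions above, the toy distribution and the deliberately wrong starting assignment are hypothetical, and the word update (which adds the $\lambda$-weighted class-label term) is omitted.

```python
import math

def reassign_documents(p_dw, doc_cl, word_cl, n_doc_cl, n_word_cl):
    """One pass of d -> argmin over d_hat of D(f(W|d) || f_hat(W|d_hat))."""
    n_words = len(p_dw[0])
    p_w = [sum(col) for col in zip(*p_dw)]

    # Cluster-level statistics under the current assignment.
    p_what = [0.0] * n_word_cl
    for w in range(n_words):
        p_what[word_cl[w]] += p_w[w]
    agg = [[0.0] * n_word_cl for _ in range(n_doc_cl)]
    for d, row in enumerate(p_dw):
        for w, pdw in enumerate(row):
            agg[doc_cl[d]][word_cl[w]] += pdw
    p_dhat = [sum(row) for row in agg]

    new_assignment = []
    for d, row in enumerate(p_dw):
        p_d = sum(row)
        best, best_cost = doc_cl[d], math.inf
        for dh in range(n_doc_cl):
            if p_dhat[dh] == 0:
                continue  # skip empty clusters
            cost = 0.0
            for w, pdw in enumerate(row):
                f = pdw / p_d  # f(w | d)
                if f == 0:
                    continue
                # f_hat(w | d_hat) = p(w | w_hat) * p(w_hat | d_hat)
                q = (p_w[w] / p_what[word_cl[w]]) * (agg[dh][word_cl[w]] / p_dhat[dh])
                if q == 0:
                    cost = math.inf
                    break
                cost += f * math.log(f / q)
            if cost < best_cost:
                best, best_cost = dh, cost
        new_assignment.append(best)
    return new_assignment

# Hypothetical toy data; document 1 starts in the "wrong" cluster.
p_dw = [
    [0.10, 0.10, 0.00, 0.05],
    [0.10, 0.10, 0.05, 0.00],
    [0.00, 0.05, 0.10, 0.10],
    [0.05, 0.00, 0.10, 0.10],
]
updated = reassign_documents(p_dw, [0, 1, 1, 1], [0, 0, 1, 1], 2, 2)
```

In this toy case the pass moves document 1 into the cluster of document 0, since both concentrate their mass on the first word cluster. A full iteration would alternate this step with the analogous word step until the assignments stop changing.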

9 Building Semantic Kernels from Wikipedia: Overall Approach
Steps: Build Thesaurus from Wikipedia; Build Semantic Kernels; Search Wikipedia Concepts in Documents; Disambiguation; Enrich Document Representation with Wikipedia Concepts.

Thesaurus example:
Ambiguous concepts: "Puma", "Puma (Car)"
Redirect concepts of "Puma": "Cougar", "Mountain Lion"
Concept "Puma" belongs to category "Felidae"
Diagram labels for "Puma (Car)": category "Ford Vehicles", "Automobile"

Text document: "... The Cougar, also Puma and Mountain lion, is a New World mammal of the Felidae family..."
Candidate concepts: Puma 2 ...
Disambiguation: "Puma" here means a kind of animal, not a car or a sports brand.
Enriched document representation: Puma 2, Cougar 2, Felines ...

Wikipedia concept proximity matrix (over terms and concepts):
\[ \begin{pmatrix} 1 & a & \cdots & b \\ a & 1 & \cdots & c \\ \vdots & \vdots & \ddots & \vdots \\ b & c & \cdots & 1 \end{pmatrix} \]
\[ S = \lambda_1 S_{BOW} + \lambda_2 S_{OLC} + (1 - \lambda_1 - \lambda_2)(1 - D_{cat}) \]
$S_{BOW}$: content-based; $S_{OLC}$: outlink category-based; $D_{cat}$: distance-based

10 Proximity Matrix (over terms and concepts)
\[ \begin{pmatrix} 1 & a & \cdots & b \\ a & 1 & \cdots & c \\ \vdots & \vdots & \ddots & \vdots \\ b & c & \cdots & 1 \end{pmatrix} \]
\[ P_{ij} = \begin{cases} 1 & \text{if } c_i \text{ and } c_j \text{ are synonyms;} \\ \mu^{-depth} & \text{if } c_i \text{ and } c_j \text{ are hyponyms;} \\ S & \text{if } c_i \text{ and } c_j \text{ are associative concepts;} \\ 0 & \text{otherwise.} \end{cases} \]
\[ S = \lambda_1 S_{BOW} + \lambda_2 S_{OLC} + (1 - \lambda_1 - \lambda_2)(1 - D_{cat}) \]
$S_{BOW}$: content-based; $S_{OLC}$: outlink category-based; $D_{cat}$: distance-based

Building Semantic Kernels
Example sentence: "Machine learning, statistical learning and data mining are related subjects."
Original BOW vector: <machine:1, statistical:1, learn:2, data:1, mine:1, relate:1, subject:1>
Find Wikipedia concepts and keep them as they are: $\phi(d)$ = <relate:1, subject:1; machine learning:1, statistical learning:1, data mining:1; ...>
Multiply by the proximity matrix $P$ (rows and columns include Machine Learning, Statistical Learning, Data Mining, Artificial Intelligence, ...):
\[ \tilde{\phi}(d) = \phi(d)\, P \]
Enriched document vector representation = <relate:1, subject:1; machine learning:1, statistical learning:1, data mining:1; artificial intelligence:0.3252>
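The enrichment step is a vector-matrix product, which can be sketched with plain dictionaries. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the proximity values below are made up for the example, so they do not reproduce the 0.3252 weight shown on the slide.

```python
def combined_similarity(s_bow, s_olc, d_cat, lam1, lam2):
    """S = lam1 * S_BOW + lam2 * S_OLC + (1 - lam1 - lam2) * (1 - D_cat)."""
    return lam1 * s_bow + lam2 * s_olc + (1.0 - lam1 - lam2) * (1.0 - d_cat)

def enrich(phi, proximity):
    """Compute phi(d) P for a sparse, symmetric P with ones on the diagonal.

    phi: dict mapping a term or concept to its weight in the document.
    proximity: dict mapping an unordered pair (i, j), i != j, to P_ij,
               with each pair listed once.
    """
    enriched = dict(phi)  # diagonal entries P_ii = 1 keep the original weights
    for (i, j), p_ij in proximity.items():
        if i in phi:
            enriched[j] = enriched.get(j, 0.0) + phi[i] * p_ij
        if j in phi:
            enriched[i] = enriched.get(i, 0.0) + phi[j] * p_ij
    return enriched

# Document vector after concept detection (weights as on the slide).
phi = {
    "relate": 1.0,
    "subject": 1.0,
    "machine learning": 1.0,
    "statistical learning": 1.0,
    "data mining": 1.0,
}

# Hypothetical proximity entries, chosen only for illustration.
proximity = {
    ("machine learning", "artificial intelligence"): 0.20,
    ("statistical learning", "artificial intelligence"): 0.05,
    ("data mining", "artificial intelligence"): 0.10,
}

enriched = enrich(phi, proximity)
s_example = combined_similarity(s_bow=0.5, s_olc=0.4, d_cat=0.2, lam1=0.3, lam2=0.3)
```

With these made-up values, "artificial intelligence" enters the enriched vector with a weight of about 0.35 even though the phrase never occurs in the sentence, which is the effect the slide illustrates.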

11 Empirical Evaluation
Data sets: 20Newsgroups and SRAA
Methods: CoCC with and without enrichment; NB with and without enrichment

Cross-domain Classification Precision Rates
Table columns: Data Set; w/o enrichment (NB, CoCC); w/ enrichment (NB, CoCC)
Table rows: rec vs talk; rec vs sci; comp vs talk; comp vs sci; comp vs rec; sci vs talk; rec vs sci vs comp; rec vs talk vs sci; sci vs talk vs comp; rec vs talk vs sci vs comp; real vs simulation; auto vs aviation

12 CoCC with enrichment: Precision as a function of the number of iterations
[Figure: precision plotted against the number of iterations.]

CoCC with enrichment: Precision as a function of λ (sci vs talk vs comp)
[Figure: precision plotted against λ.]

13 CoCC with enrichment: Precision as a function of the number of word clusters (sci vs talk vs comp)
[Figure: precision plotted against the number of word clusters.]

Conclusions
Extended the co-clustering approach for cross-domain text classification by embedding background knowledge from Wikipedia.
Future work:
Explore alternative representations for a common language substrate
Cross-language text classification

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa VS model in practice Document and query are represented by term vectors Terms are not necessarily orthogonal to each other Synonymy: car v.s. automobile Polysemy:

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element

More information

Instance-based Domain Adaptation via Multi-clustering Logistic Approximation

Instance-based Domain Adaptation via Multi-clustering Logistic Approximation Instance-based Domain Adaptation via Multi-clustering Logistic Approximation FENG U, Nanjing University of Science and Technology JIANFEI YU, Singapore Management University RUI IA, Nanjing University

More information

Instance-based Domain Adaptation

Instance-based Domain Adaptation Instance-based Domain Adaptation Rui Xia School of Computer Science and Engineering Nanjing University of Science and Technology 1 Problem Background Training data Test data Movie Domain Sentiment Classifier

More information

Cross-Lingual Language Modeling for Automatic Speech Recogntion

Cross-Lingual Language Modeling for Automatic Speech Recogntion GBO Presentation Cross-Lingual Language Modeling for Automatic Speech Recogntion November 14, 2003 Woosung Kim woosung@cs.jhu.edu Center for Language and Speech Processing Dept. of Computer Science The

More information

Information Extraction from Text

Information Extraction from Text Information Extraction from Text Jing Jiang Chapter 2 from Mining Text Data (2012) Presented by Andrew Landgraf, September 13, 2013 1 What is Information Extraction? Goal is to discover structured information

More information

Lecture 9: PGM Learning

Lecture 9: PGM Learning 13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and

More information

Classification & Information Theory Lecture #8

Classification & Information Theory Lecture #8 Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu November 3, 2015 Methods to Learn Matrix Data Text Data Set Data Sequence Data Time Series Graph

More information

Kernel Learning via Random Fourier Representations

Kernel Learning via Random Fourier Representations Kernel Learning via Random Fourier Representations L. Law, M. Mider, X. Miscouridou, S. Ip, A. Wang Module 5: Machine Learning L. Law, M. Mider, X. Miscouridou, S. Ip, A. Wang Kernel Learning via Random

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu October 19, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network

More information

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information. L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission

More information

Collaborative topic models: motivations cont

Collaborative topic models: motivations cont Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.

More information

Generative Clustering, Topic Modeling, & Bayesian Inference

Generative Clustering, Topic Modeling, & Bayesian Inference Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week

More information

Bayesian decision making

Bayesian decision making Bayesian decision making Václav Hlaváč Czech Technical University in Prague Czech Institute of Informatics, Robotics and Cybernetics 166 36 Prague 6, Jugoslávských partyzánů 1580/3, Czech Republic http://people.ciirc.cvut.cz/hlavac,

More information

Notes on Latent Semantic Analysis

Notes on Latent Semantic Analysis Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically

More information

Word Sense Disambiguation (WSD) Based on Foundations of Statistical NLP by C. Manning & H. Schütze, ch. 7 MIT Press, 2002

Word Sense Disambiguation (WSD) Based on Foundations of Statistical NLP by C. Manning & H. Schütze, ch. 7 MIT Press, 2002 0. Word Sense Disambiguation (WSD) Based on Foundations of Statistical NLP by C. Manning & H. Schütze, ch. 7 MIT Press, 2002 WSD Examples 1. They have the right to bear arms. (drept) The sign on the right

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING Text Data: Topic Model Instructor: Yizhou Sun yzsun@cs.ucla.edu December 4, 2017 Methods to be Learnt Vector Data Set Data Sequence Data Text Data Classification Clustering

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Information Retrieval and Organisation Chapter 13 Text Classification and Naïve Bayes Dell Zhang Birkbeck, University of London Motivation Relevance Feedback revisited The user marks a number of documents

More information

Domain Adaptation for Word Sense Disambiguation under the Problem of Covariate Shift

Domain Adaptation for Word Sense Disambiguation under the Problem of Covariate Shift Domain Adaptation for Word Sense Disambiguation under the Problem of Covariate Shift HIRONORI KIKUCHI 1,a) HIROYUKI SHINNOU 1,b) Abstract: Word sense disambiguation(wsd) is the task of identifying the

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Information Bottleneck Co-clustering

Information Bottleneck Co-clustering Information Bottleneck Co-clustering Pu Wang Carlotta Domeniconi Kathryn Blackmond Laskey Abstract Co-clustering has emerged as an important approach for mining contingency data matrices. We present a

More information

Intelligent Systems I

Intelligent Systems I Intelligent Systems I 00 INTRODUCTION Stefan Harmeling & Philipp Hennig 24. October 2013 Max Planck Institute for Intelligent Systems Dptmt. of Empirical Inference Which Card? Opening Experiment Which

More information

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute

More information

Convolutional Dictionary Learning and Feature Design

Convolutional Dictionary Learning and Feature Design 1 Convolutional Dictionary Learning and Feature Design Lawrence Carin Duke University 16 September 214 1 1 Background 2 Convolutional Dictionary Learning 3 Hierarchical, Deep Architecture 4 Convolutional

More information

Graphical Models for Automatic Speech Recognition

Graphical Models for Automatic Speech Recognition Graphical Models for Automatic Speech Recognition Advanced Signal Processing SE 2, SS05 Stefan Petrik Signal Processing and Speech Communication Laboratory Graz University of Technology GMs for Automatic

More information

From perceptrons to word embeddings. Simon Šuster University of Groningen

From perceptrons to word embeddings. Simon Šuster University of Groningen From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written

More information

Determining Word Sense Dominance Using a Thesaurus

Determining Word Sense Dominance Using a Thesaurus Determining Word Sense Dominance Using a Thesaurus Saif Mohammad and Graeme Hirst Department of Computer Science University of Toronto EACL, Trento, Italy (5th April, 2006) Copyright cfl2006, Saif Mohammad

More information

Interpreting Deep Classifiers

Interpreting Deep Classifiers Ruprecht-Karls-University Heidelberg Faculty of Mathematics and Computer Science Seminar: Explainable Machine Learning Interpreting Deep Classifiers by Visual Distillation of Dark Knowledge Author: Daniela

More information

Generative Classifiers: Part 1. CSC411/2515: Machine Learning and Data Mining, Winter 2018 Michael Guerzhoy and Lisa Zhang

Generative Classifiers: Part 1. CSC411/2515: Machine Learning and Data Mining, Winter 2018 Michael Guerzhoy and Lisa Zhang Generative Classifiers: Part 1 CSC411/2515: Machine Learning and Data Mining, Winter 2018 Michael Guerzhoy and Lisa Zhang 1 This Week Discriminative vs Generative Models Simple Model: Does the patient

More information

Machine Learning for Data Science (CS4786) Lecture 2

Machine Learning for Data Science (CS4786) Lecture 2 Machine Learning for Data Science (CS4786) Lecture 2 Clustering Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2017fa/ REPRESENTING DATA AS FEATURE VECTORS How do we represent data? Each data-point

More information

Hou, Ch. et al. IEEE Transactions on Neural Networks March 2011

Hou, Ch. et al. IEEE Transactions on Neural Networks March 2011 Hou, Ch. et al. IEEE Transactions on Neural Networks March 2011 Semi-supervised approach which attempts to incorporate partial information from unlabeled data points Semi-supervised approach which attempts

More information

Citation for published version (APA): Andogah, G. (2010). Geographically constrained information retrieval Groningen: s.n.

Citation for published version (APA): Andogah, G. (2010). Geographically constrained information retrieval Groningen: s.n. University of Groningen Geographically constrained information retrieval Andogah, Geoffrey IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from

More information

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion

More information

Distribution-Free Distribution Regression

Distribution-Free Distribution Regression Distribution-Free Distribution Regression Barnabás Póczos, Alessandro Rinaldo, Aarti Singh and Larry Wasserman AISTATS 2013 Presented by Esther Salazar Duke University February 28, 2014 E. Salazar (Reading

More information

HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation

HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation Denis Turdakov and Dmitry Lizorkin Institute for System Programming of the Russian Academy of Sciences, 25 Solzhenitsina

More information

Machine Learning for Data Science (CS4786) Lecture 24

Machine Learning for Data Science (CS4786) Lecture 24 Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each

More information

Term Filtering with Bounded Error

Term Filtering with Bounded Error Term Filtering with Bounded Error Zi Yang, Wei Li, Jie Tang, and Juanzi Li Knowledge Engineering Group Department of Computer Science and Technology Tsinghua University, China {yangzi, tangjie, ljz}@keg.cs.tsinghua.edu.cn

More information

What is semi-supervised learning?

What is semi-supervised learning? What is semi-supervised learning? In many practical learning domains, there is a large supply of unlabeled data but limited labeled data, which can be expensive to generate text processing, video-indexing,

More information

Lecture 12: Link Analysis for Web Retrieval

Lecture 12: Link Analysis for Web Retrieval Lecture 12: Link Analysis for Web Retrieval Trevor Cohn COMP90042, 2015, Semester 1 What we ll learn in this lecture The web as a graph Page-rank method for deriving the importance of pages Hubs and authorities

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Language Models. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Language Models. Tobias Scheffer Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Language Models Tobias Scheffer Stochastic Language Models A stochastic language model is a probability distribution over words.

More information

Intelligent Systems Discriminative Learning, Neural Networks

Intelligent Systems Discriminative Learning, Neural Networks Intelligent Systems Discriminative Learning, Neural Networks Carsten Rother, Dmitrij Schlesinger WS2014/2015, Outline 1. Discriminative learning 2. Neurons and linear classifiers: 1) Perceptron-Algorithm

More information

Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang

Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang Matrix Factorization & Latent Semantic Analysis Review Yize Li, Lanbo Zhang Overview SVD in Latent Semantic Indexing Non-negative Matrix Factorization Probabilistic Latent Semantic Indexing Vector Space

More information

Topic Models and Applications to Short Documents

Topic Models and Applications to Short Documents Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text

More information

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data

More information

Variable Latent Semantic Indexing

Variable Latent Semantic Indexing Variable Latent Semantic Indexing Prabhakar Raghavan Yahoo! Research Sunnyvale, CA November 2005 Joint work with A. Dasgupta, R. Kumar, A. Tomkins. Yahoo! Research. Outline 1 Introduction 2 Background

More information

Fuzzy Systems. Introduction

Fuzzy Systems. Introduction Fuzzy Systems Introduction Prof. Dr. Rudolf Kruse Christian Moewes {kruse,cmoewes}@iws.cs.uni-magdeburg.de Otto-von-Guericke University of Magdeburg Faculty of Computer Science Department of Knowledge

More information

Multi-Task Clustering using Constrained Symmetric Non-Negative Matrix Factorization

Multi-Task Clustering using Constrained Symmetric Non-Negative Matrix Factorization Multi-Task Clustering using Constrained Symmetric Non-Negative Matrix Factorization Samir Al-Stouhi Chandan K. Reddy Abstract Researchers have attempted to improve the quality of clustering solutions through

More information

Machine Learning Linear Models

Machine Learning Linear Models Machine Learning Linear Models Outline II - Linear Models 1. Linear Regression (a) Linear regression: History (b) Linear regression with Least Squares (c) Matrix representation and Normal Equation Method

More information

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 26/26: Feature Selection and Exam Overview Paul Ginsparg Cornell University,

More information

CUSTOMER REVIEW FEATURE EXTRACTION Heng Ren, Jingye Wang, and Tony Wu

CUSTOMER REVIEW FEATURE EXTRACTION Heng Ren, Jingye Wang, and Tony Wu CUSTOMER REVIEW FEATURE EXTRACTION Heng Ren, Jingye Wang, an Tony Wu Abstract Popular proucts often have thousans of reviews that contain far too much information for customers to igest. Our goal for the

More information

Tutorial 2. Fall /21. CPSC 340: Machine Learning and Data Mining

Tutorial 2. Fall /21. CPSC 340: Machine Learning and Data Mining 1/21 Tutorial 2 CPSC 340: Machine Learning and Data Mining Fall 2016 Overview 2/21 1 Decision Tree Decision Stump Decision Tree 2 Training, Testing, and Validation Set 3 Naive Bayes Classifier Decision

More information

COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection

COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:

More information

Probabilistic Graphical Models for Image Analysis - Lecture 1

Probabilistic Graphical Models for Image Analysis - Lecture 1 Probabilistic Graphical Models for Image Analysis - Lecture 1 Alexey Gronskiy, Stefan Bauer 21 September 2018 Max Planck ETH Center for Learning Systems Overview 1. Motivation - Why Graphical Models 2.

More information

Statistical Methods for SVM

Statistical Methods for SVM Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,

More information

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata Principles of Pattern Recognition C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata e-mail: murthy@isical.ac.in Pattern Recognition Measurement Space > Feature Space >Decision

More information

Learning Objectives. c D. Poole and A. Mackworth 2010 Artificial Intelligence, Lecture 7.2, Page 1

Learning Objectives. c D. Poole and A. Mackworth 2010 Artificial Intelligence, Lecture 7.2, Page 1 Learning Objectives At the end of the class you should be able to: identify a supervised learning problem characterize how the prediction is a function of the error measure avoid mixing the training and

More information

Semi-supervised Learning

Semi-supervised Learning Semi-supervised Learning Introduction Supervised learning: x r, y r R r=1 E.g.x r : image, y r : class labels Semi-supervised learning: x r, y r r=1 R, x u R+U u=r A set of unlabeled data, usually U >>

More information

Cluster Kernels for Semi-Supervised Learning

Cluster Kernels for Semi-Supervised Learning Cluster Kernels for Semi-Supervised Learning Olivier Chapelle, Jason Weston, Bernhard Scholkopf Max Planck Institute for Biological Cybernetics, 72076 Tiibingen, Germany {first. last} @tuebingen.mpg.de

More information

Issues and Techniques in Pattern Classification

Issues and Techniques in Pattern Classification Issues and Techniques in Pattern Classification Carlotta Domeniconi www.ise.gmu.edu/~carlotta Machine Learning Given a collection of data, a machine learner eplains the underlying process that generated

More information

Sequences and Information

Sequences and Information Sequences and Information Rahul Siddharthan The Institute of Mathematical Sciences, Chennai, India http://www.imsc.res.in/ rsidd/ Facets 16, 04/07/2016 This box says something By looking at the symbols

More information

Multisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues

Multisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues Multisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues O. L. Mangasarian and E. W. Wild Presented by: Jun Fang Multisurface Proximal Support Vector Machine Classification

More information

Introduction to Logistic Regression and Support Vector Machine

Introduction to Logistic Regression and Support Vector Machine Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel

More information

Computational Learning Theory

Computational Learning Theory Computational Learning Theory Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Ordibehesht 1390 Introduction For the analysis of data structures and algorithms

More information

Lecture 15: Logistic Regression

Lecture 15: Logistic Regression Lecture 15: Logistic Regression William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 15 What we ll learn in this lecture Model-based regression and classification Logistic regression

More information

Part 9: Text Classification; The Naïve Bayes algorithm Francesco Ricci

Part 9: Text Classification; The Naïve Bayes algorithm Francesco Ricci Part 9: Text Classification; The Naïve Bayes algorithm Francesco Ricci Most of these slides comes from the course: Information Retrieval and Web Search, Christopher Manning and Prabhakar Raghavan 1 Content

More information

Do Neural Network Cross-Modal Mappings Really Bridge Modalities?

Do Neural Network Cross-Modal Mappings Really Bridge Modalities? Do Neural Network Cross-Modal Mappings Really Bridge Modalities? Language Intelligence and Information Retrieval group (LIIR) Department of Computer Science Story Collell, G., Zhang, T., Moens, M.F. (2017)

More information

The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks

The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks Yingfei Wang, Chu Wang and Warren B. Powell Princeton University Yingfei Wang Optimal Learning Methods June 22, 2016

More information

Exploiting Associations between Word Clusters and Document Classes for Cross-domain Text Categorization

Exploiting Associations between Word Clusters and Document Classes for Cross-domain Text Categorization Exploiting Associations between Word Clusters and Document Classes for Cross-domain Text Categorization Fuzhen Zhuang Ping Luo Hui Xiong Qing He Yuhong Xiong Zhongzhi Shi Abstract Cross-domain text categorization

More information

ECE 5984: Introduction to Machine Learning

ECE 5984: Introduction to Machine Learning ECE 5984: Introduction to Machine Learning Topics: (Finish) Expectation Maximization Principal Component Analysis (PCA) Readings: Barber 15.1-15.4 Dhruv Batra Virginia Tech Administrativia Poster Presentation:

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Word vectors Many slides borrowed from Richard Socher and Chris Manning Lecture plan Word representations Word vectors (embeddings) skip-gram algorithm Relation to matrix factorization

More information

Latent Dirichlet Allocation Introduction/Overview

Latent Dirichlet Allocation Introduction/Overview Latent Dirichlet Allocation Introduction/Overview David Meyer 03.10.2016 David Meyer http://www.1-4-5.net/~dmm/ml/lda_intro.pdf 03.10.2016 Agenda What is Topic Modeling? Parametric vs. Non-Parametric Models

More information

Lecture 8: PGM Inference

Lecture 8: PGM Inference 15 September 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I 1 Variable elimination Max-product Sum-product 2 LP Relaxations QP Relaxations 3 Marginal and MAP X1 X2 X3 X4

Tackling the Poor Assumptions of Naive Bayes Text Classifiers

Jason Rennie, MIT Computer Science and Artificial Intelligence Laboratory, jrennie@ai.mit.edu. Joint work with Lawrence Shih, Jaime Teevan and

Unsupervised Learning with Permuted Data

Sergey Kirshner (skirshne@ics.uci.edu), Sridevi Parise (sparise@ics.uci.edu), Padhraic Smyth (smyth@ics.uci.edu). School of Information and Computer Science, University

An Introduction to Statistical and Probabilistic Linear Models

Maximilian Mozes. Proseminar Data Mining, Fakultät für Informatik, Technische Universität München. June 07, 2017. Introduction: In statistical learning

Induction of Decision Trees

Peter Waiganjo Wagacha. These notes are for ICS320 Foundations of Learning and Adaptive Systems. Institute of Computer Science, University of Nairobi, PO Box 30197, 00200 Nairobi.

CS540 ANSWER SHEET

Final Examination. CS540-1: Introduction to Artificial Intelligence, Fall 2016. 20 questions, 5 points

DD Advanced Machine Learning

Modelling. Carl Henrik, {chek}@csc.kth.se, Royal Institute of Technology. November 4, 2015. Who do I think you are? Mathematically competent (linear algebra, multivariate calculus); OK programmers; able to extend

Statistical Methods for Data Mining

Kuangnan Fang, Xiamen University. Email: xmufkn@xmu.edu.cn. Support Vector Machines: Here we approach the two-class classification problem in a direct way: We try and find

Domain Adaptation for Regression

Corinna Cortes, Google Research, corinna@google.com. Mehryar Mohri, Courant Institute and Google, mohri@cims.nyu.edu. Motivation. Applications: distinct training and test distributions.

Neural Networks Language Models

Philipp Koehn, 10 October 2017. N-Gram Backoff Language Model: Previously, we approximated $p(W) = p(w_1, w_2, \ldots, w_n)$ by applying the chain rule $p(W) = \prod_i p(w_i \mid w_1, \ldots, w_{i-1})$

DANIEL WILSON AND BEN CONKLIN. Integrating AI with Foundation Intelligence for Actionable Intelligence

in an arms race for artificial

Part-of-Speech Tagging + Neural Networks 3: Word Embeddings CS 287

Review: Neural Networks. One-layer multi-layer perceptron architecture: $NN_{MLP1}(x) = g(xW^1 + b^1)W^2 + b^2$; $xW + b$: perceptron; $x$ is the

c 4, < y 2, 1 0, otherwise,

Fundamentals of Big Data Analytics. Univ.-Prof. Dr. rer. nat. Rudolf Mathar. Problem. Probability theory: The outcome of an experiment is described by three events A, B and C. The probabilities Pr(A) =,

Modern Information Retrieval

Chapter 8: Text Classification. Introduction; A Characterization of Text Classification; Unsupervised Algorithms; Supervised Algorithms; Feature Selection or Dimensionality Reduction

Statistical Data Mining and Machine Learning Hilary Term 2016

Dino Sejdinovic, Department of Statistics, Oxford. Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

Iterative Laplacian Score for Feature Selection

Linling Zhu, Linsong Miao, and Daoqiang Zhang. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,

Expectation Propagation for Approximate Bayesian Inference

José Miguel Hernández Lobato, Universidad Autónoma de Madrid, Computer Science Department. February 5, 2007. Bayesian Inference. Inference: Given

Lecture 2: Probability, Naive Bayes

CS 585, Fall 205: Introduction to Natural Language Processing. http://people.cs.umass.edu/~brenocon/inlp205/ Brendan O'Connor. Today: Probability Review; Naive Bayes classification

Jae-Bong Lee 1 and Bernard A. Megrey 2. International Symposium on Climate Change Effects on Fish and Fisheries

International Symposium on Climate Change Effects on Fish and Fisheries. On the utility of self-organizing maps (SOM) and k-means clustering to characterize and compare low frequency spatial and temporal

CS249: ADVANCED DATA MINING

Graph and Network. Instructor: Yizhou Sun, yzsun@cs.ucla.edu. May 31, 2017. Methods Learnt: Classification Clustering Vector Data Text Data Recommender System Decision Tree; Naïve

Outline. Motivation. Mapping the input space to the feature space Calculating the dot product in the feature space

Fabio A. González, Ph.D., Depto. de Ing. de Sistemas e Industrial, Universidad Nacional de Colombia, Bogotá. April 2, 2009. Outline: 1 Motivation; 2 The Mapping the

ECS289: Scalable Machine Learning

Cho-Jui Hsieh, UC Davis. Oct 18, 2016. Outline: One versus all/one versus one; Ranking loss for multiclass/multilabel classification; Scaling to millions of labels. Multiclass

UNSUPERVISED LEARNING

Topics: Layer-wise (unsupervised) pre-training; Restricted Boltzmann Machines; Auto-encoders. LAYER-WISE (UNSUPERVISED) PRE-TRAINING: Breakthrough in 2006. Layer-wise (unsupervised) pre-training

Symbolic methods in TC: Decision Trees

ML for NLP. Lecturer: Kevin Koidl. Assist. Lecturer: Alfredo Maldonado. https://www.cs.tcd.ie/kevin.koidl/cs4062/ kevin.koidl@scss.tcd.ie, maldonaa@tcd.ie 2016-2017

Fuzzy Systems. Introduction

Prof. Dr. Rudolf Kruse, Christoph Doell, {kruse,doell}@iws.cs.uni-magdeburg.de. Otto-von-Guericke University of Magdeburg, Faculty of Computer Science, Department of Knowledge Processing

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013

Basics. Informal Introduction. Informal Description
