Cross-Modal Retrieval: A Pairwise Classification Approach


1 Cross-Modal Retrieval: A Pairwise Classification Approach Aditya Krishna Menon (National ICT Australia; Australian National University), Didi Surian (National ICT Australia; The University of Sydney) and Sanjay Chawla (The University of Sydney; Qatar Computing Research Institute) SIAM International Conference on Data Mining 2015, April 30th, 2015

2 Overview of talk 1 Overview 2 Cross-modal retrieval: unsupervised and supervised 3 The similarity function perspective 4 Learning without annotations 5 Learning with annotations 6 Experiments 7 Conclusion

3 Background Content is increasingly available in multiple modalities (images, text, videos, ...) e.g. "Loveliest of trees, the cherry now / Is hung with bloom along the bough, / And stands about the woodland ride / Wearing white for Eastertide." and "I WANDER'D lonely as a cloud / That floats on high o'er vales and hills, / When all at once I saw a crowd, / A host, of golden daffodils; / Beside the lake, beneath the trees, / Fluttering and dancing in the breeze."

4 Cross-modal retrieval: informally Given the representation of an entity in one modality, find its best representation in all other modalities

5 Cross-modal retrieval: informally e.g. Given a piece of text, what is the image that best accompanies it? [slides 5-6 show the example poem and image pairs from slide 3]

7 Prior work: CCA Canonical Correlation Analysis (CCA): project both modalities onto a latent subspace where they exhibit maximum correlation [figure: image space R_I and text space R_T projected via maps U_I and U_T onto maximally correlated subspaces]

8 Prior work: SM Semantic Matching (SM): project both modalities into a rich supervised subspace [figure: image space R_I and text space R_T mapped into a common semantic space via per-category image and text classifiers; example text: "McAleer's tenure as part owner of the Red Sox came to a swift end. On July 15, 1913, McAleer became involved in a dispute with the AL president, Ban Johnson"]

9 This paper A pairwise classification perspective on cross-modal retrieval Goal is to score pairs of instances based on affinity Seamlessly study scenarios with and without ground-truth annotation Essentially, reduction to logistic regression on pairs of instances Unification of the CCA and SM approaches Good empirical performance on real-world retrieval tasks

10 Overview of talk 1 Overview 2 Cross-modal retrieval: unsupervised and supervised 3 The similarity function perspective 4 Learning without annotations 5 Learning with annotations 6 Experiments 7 Conclusion

11 (Unsupervised) Cross-modal retrieval: training set Training examples: D = {(a^(i), b^(i))}_{i=1}^n [slide: example (image, text) pairs]

12 (Unsupervised) Cross-modal retrieval: training set Training examples: D = {(a^(i), b^(i))}_{i=1}^n, a set of suitable pairings of entities across two modalities: (a^(i), b^(i)) ∈ A × B, where A ⊆ R^{d_a} and B ⊆ R^{d_b} are the modality representations [slide: example (image, text) pairs]

13 Cross-modal retrieval: goal Goal: Learn retrieval mappings f : A → B and g : B → A

14 Cross-modal retrieval: goal Goal: Learn retrieval mappings f : A → B and g : B → A e.g. Given a piece of text, what is the best accompanying image? f : "They call me 'The Wild Rose' / But my name was Elisa Day / Why they call me it, I do not know / For my name was Elisa Day" ↦ [image]

15 Assumption: ground-truth annotations We assume there is a set of ground-truth annotations for each entity; for our purposes, a label from some set Y = {1, 2, ..., K} [slide: example images and texts tagged with categories such as Arts, Nature, History, Politics, Science] These annotations may be unobserved, but are still used for evaluation

16 (Supervised) Cross-modal retrieval: training set Training examples: D = {((a^(i), b^(i)), ·)}_{i=1}^n

17 (Supervised) Cross-modal retrieval: training set Training examples: D = {((a^(i), b^(i)), y^(i))}_{i=1}^n, where y^(i) ∈ Y are the annotations [slide: example pairs annotated Nature, Arts, History] Goal is as before

18 Overview of talk 1 Overview 2 Cross-modal retrieval: unsupervised and supervised 3 The similarity function perspective 4 Learning without annotations 5 Learning with annotations 6 Experiments 7 Conclusion

19 Latent map approach Popular approach: learn latent maps ψ_A : A → R^k, ψ_B : B → R^k, where k ∈ N_+ is the dimensionality of some latent subspace. Compute retrieval maps f : a ↦ argmin_{b ∈ B} d(ψ_A(a), ψ_B(b)) and g : b ↦ argmin_{a ∈ A} d(ψ_A(a), ψ_B(b)), where d(·, ·) is some suitable distance function

20 Latent map approach Popular approach: learn latent maps ψ_A : A → R^k, ψ_B : B → R^k, where k ∈ N_+ is the dimensionality of some latent subspace. Compute retrieval maps f : a ↦ argmin_{b ∈ B} d(ψ_A(a), ψ_B(b)) and g : b ↦ argmin_{a ∈ A} d(ψ_A(a), ψ_B(b)), where d(·, ·) is some suitable distance function. e.g. In CCA, we learn linear projections ψ_A : a ↦ Ua and ψ_B : b ↦ Vb, and use Euclidean distance for retrieval
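
As a concrete illustration of this recipe, here is a minimal sketch using scikit-learn's CCA on toy data; the array names and sizes are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of the latent-map approach via CCA (toy data).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.RandomState(0)
A = rng.randn(100, 50)   # image features, one row per entity
B = rng.randn(100, 30)   # text features, row-aligned with A

cca = CCA(n_components=10).fit(A, B)
psi_A, psi_B = cca.transform(A, B)   # latent maps psi_A(a) = Ua, psi_B(b) = Vb

def retrieve_text(a_latent):
    """f: return the index of the text minimising Euclidean distance."""
    return int(np.argmin(np.linalg.norm(psi_B - a_latent, axis=1)))

print(retrieve_text(psi_A[0]))
```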

21 Similarity function approach Our approach: learn a similarity function s : A × B → R that assigns a high score when the given pair are related to each other. Compute retrieval maps f : a ↦ argmax_{b ∈ B} s(a, b) and g : b ↦ argmax_{a ∈ A} s(a, b). Captures the latent map approach as a special case

22 Similarity function approach Our approach: learn a similarity function s : A × B → R that assigns a high score when the given pair are related to each other. Compute retrieval maps f : a ↦ argmax_{b ∈ B} s(a, b) and g : b ↦ argmax_{a ∈ A} s(a, b). Captures the latent map approach as a special case But what is a good similarity?
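
In code, any scorer s induces both retrieval maps directly; a small sketch (make_retrieval_maps is a hypothetical helper, not from the paper):

```python
# Any similarity function induces the two retrieval maps via argmax (sketch).
def make_retrieval_maps(s, A_items, B_items):
    """Return f: a -> index of the best b, and g: b -> index of the best a."""
    f = lambda a: max(range(len(B_items)), key=lambda j: s(a, B_items[j]))
    g = lambda b: max(range(len(A_items)), key=lambda i: s(A_items[i], b))
    return f, g

# The latent-map approach is recovered with s(a, b) = -d(psi_A(a), psi_B(b)).
```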

23 What is a good similarity? s(a, b) should be high iff a, b have the same ground-truth annotation

24 What is a good similarity? s(a, b) should be high iff a, b have the same ground-truth annotation. A: random variable over A (1st modality); B: random variable over B (2nd modality); Y: random variable over Y (annotation for A); Y′: random variable over Y (annotation for B)

25 What is a good similarity? s(a, b) should be high iff a, b have the same ground-truth annotation. A: random variable over A (1st modality); B: random variable over B (2nd modality); Y: random variable over Y (annotation for A); Y′: random variable over Y (annotation for B). Here, Z := 2·1[Y = Y′] − 1 ∈ {±1} indicates whether or not the two annotations agree

26 What is a good similarity? s(a, b) should be high iff a, b have the same ground-truth annotation. A: random variable over A (1st modality); B: random variable over B (2nd modality); Y: random variable over Y (annotation for A); Y′: random variable over Y (annotation for B). Here, Z := 2·1[Y = Y′] − 1 ∈ {±1} indicates whether or not the two annotations agree. Judge s according to the probability of incorrectly assessing compatible pairs: L(s) = P_{A,B,Z}(s(A, B) · Z < 0), i.e. the sign of s should indicate whether the ground truths align

27 Reduction to pairwise classification We can view Z as a binary label on the instance pair (A, B)

28 Reduction to pairwise classification We can view Z as a binary label on the instance pair (A, B). Learning a similarity function ≡ binary classification on the paired instance space A × B

29 Reduction to pairwise classification We can view Z as a binary label on the instance pair (A, B). Learning a similarity function ≡ binary classification on the paired instance space A × B. Ideally, treat D = {((a^(i), b^(i)), z^(i))}_{i=1}^n as input to a standard binary classifier

30 Reduction to pairwise classification We can view Z as a binary label on the instance pair (A, B). Learning a similarity function ≡ binary classification on the paired instance space A × B. Ideally, treat D = {((a^(i), b^(i)), z^(i))}_{i=1}^n as input to a standard binary classifier. More effort is needed depending on whether or not we have ground-truth annotations

31 Overview of talk 1 Overview 2 Cross-modal retrieval: unsupervised and supervised 3 The similarity function perspective 4 Learning without annotations 5 Learning with annotations 6 Experiments 7 Conclusion

32 Learning without annotations Without ground-truth annotations, we only observe pairs that are linked; we do not observe pairs that do not link. Training set: D = {((a^(i), b^(i)), z^(i))} where all labels z^(i) = +1 [slide: the observed example pairs, all labelled +]

33 Learning without annotations Without ground-truth annotations, we only observe pairs that are linked; we do not observe pairs that do not link. Training set: D = {((a^(i), b^(i)), z^(i))} where all labels z^(i) = +1. Learning from positive-only data: cannot apply a standard binary classifier! [slide: the observed example pairs, all labelled +]

34 Reduction to positive and unlabelled learning Basic idea: construct all pairs of instances from each modality, D = {((a^(i), b^(j)), z^(i,j))}_{i,j=1}^n. We know z^(i,i) = +1, but z^(i,j) is unknown for i ≠ j [slide: matched pairs labelled +, all other cross-pairs labelled ?]

35 Reduction to positive and unlabelled learning Basic idea: construct all pairs of instances from each modality, D = {((a^(i), b^(j)), z^(i,j))}_{i,j=1}^n. We know z^(i,i) = +1, but z^(i,j) is unknown for i ≠ j. This is learning from positive and unlabelled data; it aims to better discriminate linked and unlinked pairs [slide: matched pairs labelled +, all other cross-pairs labelled ?]

36 Reduction to binary classification Simple recipe for positive and unlabelled learning [Elkan and Noto, 2008]: Treat unlabelled instances as negatives Learn a class-probability estimator (e.g. logistic regression) Optimal for ranking purposes!

37 Reduction to binary classification Simple recipe for positive and unlabelled learning [Elkan and Noto, 2008]: Treat unlabelled instances as negatives Learn a class-probability estimator (e.g. logistic regression) Optimal for ranking purposes! Thus, we can perform logistic regression on the labelled instances D = {((a^(i), b^(j)), 2·1[i = j] − 1)}_{i,j=1}^n [slide: matched pairs labelled +, mismatched pairs labelled −]
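
A sketch of this labelling step on toy arrays, treating every unmatched cross-pair as a negative per the Elkan-Noto recipe (build_pu_pairs is a hypothetical helper name):

```python
# Build the PU-style training set: the n matched pairs get z = +1, and the
# n^2 - n unmatched cross-pairs are treated as negatives (sketch).
import numpy as np

def build_pu_pairs(A, B):
    """All cross-modal pairs (a_i, b_j) with z_{ij} = 2*1[i == j] - 1."""
    n = A.shape[0]
    pairs = [(A[i], B[j]) for i in range(n) for j in range(n)]
    z = np.array([2 * int(i == j) - 1 for i in range(n) for j in range(n)])
    return pairs, z
```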

38 Choice of feature mapping We consider l2-regularised logistic regression with some linear scorer: min_w (1/n²) Σ_{i=1}^n Σ_{j=1}^n log(1 + e^{−z^(i,j) · s(a^(i), b^(j); w)}) + (λ/2) ‖w‖²_2, where s(a, b; w) = ⟨w, Φ(a, b)⟩ for some feature mapping Φ. Simplest Φ is concatenation: Φ(a, b) = [a; b]

39 Choice of feature mapping We consider l2-regularised logistic regression with some linear scorer: min_w (1/n²) Σ_{i=1}^n Σ_{j=1}^n log(1 + e^{−z^(i,j) · s(a^(i), b^(j); w)}) + (λ/2) ‖w‖²_2, where s(a, b; w) = ⟨w, Φ(a, b)⟩ for some feature mapping Φ. Simplest Φ is concatenation: Φ(a, b) = [a; b] Problem: induces the same ranking over b ∈ B for any a ∈ A!
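
A tiny numerical check of this failure (toy weights and items, chosen arbitrarily): with Φ(a, b) = [a; b] the score splits as w_a·a + w_b·b, so the first term is a constant offset for a fixed query and the induced ranking over b never changes.

```python
# Concatenation features: score(a, b) = w_a . a + w_b . b, so the ranking
# over b is identical for every query a (the w_a . a term is constant).
import numpy as np

rng = np.random.RandomState(0)
w_a, w_b = rng.randn(5), rng.randn(4)
B_items = rng.randn(10, 4)

def score(a, b):
    return w_a @ a + w_b @ b

for a in rng.randn(3, 5):                                # three different queries
    print(np.argsort([-score(a, b) for b in B_items]))   # same order each time
```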

40 Choice of feature mapping Better Φ: Φ(a, b) = [a_d · b_{d′}]_{d,d′}, i.e. consider the cross-product of features. The similarity function is equivalently s(a, b; w) = ⟨w, Φ(a, b)⟩ = aᵀWb; c.f. bilinear regression. CCA ≈ low-rank approximation to W

41 Choice of feature mapping Better Φ: Φ(a, b) = [a_d · b_{d′}]_{d,d′}, i.e. consider the cross-product of features. The similarity function is equivalently s(a, b; w) = ⟨w, Φ(a, b)⟩ = aᵀWb; c.f. bilinear regression. CCA ≈ low-rank approximation to W. Learn, for each pair of features across the modalities, their predictive power in determining affinity
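
Putting the pieces together, a sketch of the resulting learner: cross-product features fed to scikit-learn's l2-regularised logistic regression on the all-pairs training set (toy sizes; the paper's own optimiser and regularisation schedule may differ).

```python
# Bilinear pairwise classifier (sketch): Phi(a, b) = vec(a b^T), so that
# <w, Phi(a, b)> = a^T W b once w is reshaped into the matrix W.
import numpy as np
from sklearn.linear_model import LogisticRegression

def phi(a, b):
    return np.outer(a, b).ravel()   # one weight per (feature_a, feature_b) pair

rng = np.random.RandomState(0)
A = rng.randn(30, 5)   # toy image features
B = rng.randn(30, 4)   # toy text features, row-aligned with A
n = A.shape[0]

X = np.stack([phi(A[i], B[j]) for i in range(n) for j in range(n)])
z = np.array([2 * int(i == j) - 1 for i in range(n) for j in range(n)])

clf = LogisticRegression(C=1.0, max_iter=1000).fit(X, z)   # l2 by default
W = clf.coef_.reshape(A.shape[1], B.shape[1])              # bilinear weights
s = lambda a, b: a @ W @ b                                 # retrieval score
```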

42 Overview of talk 1 Overview 2 Cross-modal retrieval: unsupervised and supervised 3 The similarity function perspective 4 Learning without annotations 5 Learning with annotations 6 Experiments 7 Conclusion

43 Learning with annotations With ground-truth annotations, we still only observe pairs that are linked. Training set: D = {((a^(i), b^(i)), y^(i))}_{i=1}^n, where y^(i) ∈ Y is the category for the ith entity [slide: example pairs annotated Nature, Arts, History]

44 Learning with annotations With ground-truth annotations, we still only observe pairs that are linked. Training set: D = {((a^(i), b^(i)), y^(i))}_{i=1}^n, where y^(i) ∈ Y is the category for the ith entity. However, using annotations, we can more accurately infer whether other pairs should be linked or not [slide: example pairs annotated Nature, Arts, History]

45 Pairwise classification approach Basic idea: construct all pairs of instances from each modality, D = {((a^(i), b^(j)), z^(i,j))}_{i,j=1}^n, where z^(i,j) = 2·1[y^(i) = y^(j)] − 1, i.e. link a pair iff the ground-truth annotations coincide. Now learn the logistic regression model as in the case without annotations, only changing the labelling procedure
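
A sketch of the changed labelling step (pairwise_labels is a hypothetical helper name): the n × n label matrix now depends only on whether the annotations agree.

```python
# Pairwise labels from annotations: z_{ij} = 2*1[y_i == y_j] - 1 (sketch).
import numpy as np

def pairwise_labels(y):
    y = np.asarray(y)
    return np.where(y[:, None] == y[None, :], 1, -1)   # n x n label matrix

print(pairwise_labels([0, 1, 0]))
# [[ 1 -1  1]
#  [-1  1 -1]
#  [ 1 -1  1]]
```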

46 An alternate approach There is a simpler approach to learning with annotations. If well-specified, logistic regression will learn a transformation of P(Z = 1 | A = a, B = b)

47 An alternate approach There is a simpler approach to learning with annotations. If well-specified, logistic regression will learn a transformation of P(Z = 1 | A = a, B = b) = P(Y = Y′ | A = a, B = b) = Σ_{k=1}^K P(Y = k | A = a) · P(Y′ = k | B = b) = ⟨η_A(a), η_B(b)⟩, where η_A and η_B are vectors of instance-conditional distributions over labels

48 An alternate approach There is a simpler approach to learning with annotations. If well-specified, logistic regression will learn a transformation of P(Z = 1 | A = a, B = b) = P(Y = Y′ | A = a, B = b) = Σ_{k=1}^K P(Y = k | A = a) · P(Y′ = k | B = b) = ⟨η_A(a), η_B(b)⟩, where η_A and η_B are vectors of instance-conditional distributions over labels. Why not just directly learn η_A, η_B?

49 Marginal approach Basic idea: learn independent models for η_A and η_B, e.g. using multi-class logistic regression on {(a^(i), y^(i))} and {(b^(i), y^(i))}, and use their dot product as the similarity function; c.f. the use of cosine similarity in the SM method. While the individual probability models are linear, the final similarity function will be nonlinear in the inputs
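
A sketch of the marginal approach with scikit-learn (toy data; eta_A and eta_B here simply name the fitted per-modality models):

```python
# Marginal approach (sketch): one multi-class probability model per modality;
# score a pair by the dot product of the predicted class posteriors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
A = rng.randn(60, 5)                       # toy image features
B = rng.randn(60, 4)                       # toy text features
y = rng.randint(0, 3, size=60)             # shared annotations, K = 3

eta_A = LogisticRegression(max_iter=1000).fit(A, y)
eta_B = LogisticRegression(max_iter=1000).fit(B, y)

def s(a, b):
    """<eta_A(a), eta_B(b)>: estimate of P(Y = Y' | a, b)."""
    pa = eta_A.predict_proba(a.reshape(1, -1))[0]
    pb = eta_B.predict_proba(b.reshape(1, -1))[0]
    return float(pa @ pb)
```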

50 Overview of talk 1 Overview 2 Cross-modal retrieval: unsupervised and supervised 3 The similarity function perspective 4 Learning without annotations 5 Learning with annotations 6 Experiments 7 Conclusion

51 Overview of experiments Experiments on three datasets: Wikipedia: 2,173 training, 693 test pairs; 10 semantic categories. Pascal: 700 training, 300 test pairs; 20 semantic categories. TVGraz: 1,558 training, 500 test pairs; 10 semantic categories. Feature representations: Text: LDA topic assignment distributions Images: SIFT features
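
For context, a sketch of how such representations might be computed with off-the-shelf tools (gensim for LDA topics, OpenCV for SIFT); these library choices are assumptions, and the paper's exact feature pipeline may differ.

```python
# Sketch of the feature representations (assumed tooling, not the paper's
# exact pipeline).
from gensim import corpora, models

texts = [["golden", "daffodils", "lake"], ["cherry", "bloom", "bough"]]
dictionary = corpora.Dictionary(texts)
bows = [dictionary.doc2bow(t) for t in texts]
lda = models.LdaModel(bows, num_topics=10, id2word=dictionary)

# Text feature: the document's topic-assignment distribution
text_feat = lda.get_document_topics(bows[0], minimum_probability=0.0)

# Image feature: SIFT descriptors (typically quantised into a bag of visual
# words before use); requires opencv-python
# import cv2
# sift = cv2.SIFT_create()
# keypoints, descriptors = sift.detectAndCompute(gray_image, None)
```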

52 Methods compared Baselines: Random predictor OC-SVM CCA SM, SCM Cross-modal metric learning (CMML) Measure performance based on mean average precision (MAP)
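
MAP here is the mean over queries of the average precision of each ranked retrieval list; a small reference sketch (relevance = the retrieved item shares the query's ground-truth category):

```python
# Mean average precision (sketch): AP of each query's ranking, averaged
# over all queries.
import numpy as np

def average_precision(relevant):
    """relevant: binary relevance of the ranked list, in rank order."""
    relevant = np.asarray(relevant, dtype=float)
    if relevant.sum() == 0:
        return 0.0
    prec_at_k = np.cumsum(relevant) / (np.arange(len(relevant)) + 1)
    return float((prec_at_k * relevant).sum() / relevant.sum())

def mean_average_precision(scores, y_query, y_gallery):
    """scores[i, j]: similarity of query i to gallery item j."""
    y_query, y_gallery = np.asarray(y_query), np.asarray(y_gallery)
    aps = []
    for i, row in enumerate(scores):
        order = np.argsort(-row)                    # rank gallery by score
        aps.append(average_precision(y_gallery[order] == y_query[i]))
    return float(np.mean(aps))
```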

53 Results without annotation [bar charts: MAP scores for image and text queries on Wikipedia, Pascal, and TVGraz, comparing Random, OC-SVM, CCA, CMML, and the cross-product method] MAP score improvements over CCA: 2% on Wikipedia, 10% on Pascal, and 3% on TVGraz.

54 Results with annotation [bar charts: MAP scores for image and text queries on Wikipedia, Pascal, and TVGraz, comparing SM, SCM, Pairwise SM, Marginal SM, and Marginal SCM] MAP score improvements over SM: 10% on Wikipedia, 10% on Pascal, and 5% on TVGraz.

55 Precision-recall curves on Wikipedia Consistent dominance of the cross-product method over CCA, and of the marginal method over SM [precision-recall curves for image and text queries on Wikipedia: CCA vs. Cross Product, and SM vs. Marginal SM]

56 Case study: text query Given a text, find images. TEXT QUERY (Warfare): "Cobby was a member of the RAAF Reserve (also known as the Citizen Air Force) during his time with the Civil Aviation Board, and rejoined the Permanent Air Force following the outbreak of World War II" [image results for CCA and the cross-product method, labelled by category: mostly Warfare, but with Geography and Music mismatches] CCA suffers from not considering negative links

57 Case study: image query Given an image (top), find texts. IMAGE QUERY (Geography). Text results for the cross-product method and CCA, labelled by category (mostly Geography, with two Biology mismatches): "Most of the Columbia's drainage basin (which, at, is about the size of France) lies roughly between the ..."; "Until the 20th century, the slough and its side channels and associated ponds and lakes were part of the active ..."; "The Columbia begins its journey in the southern Rocky Mountain Trench in British Columbia (BC) ..."; "There are over of hiking trails at Worlds End State Park. Most of the trails are rocky and steep, so hikers are ..."; "The history of Kaziranga as a protected area can be traced back to 1904, when Mary Victoria Leiter Curzon ..."; "As of the census of 2006, there were 382,872 people, 165,743 households, and 99,114 families residing in ..."; "Various public sector organisations have an important role in the stewardship of the country's fauna ..."; "Only humans are recognized as persons and protected in law by the United Nations Universal Declaration of ..."

58 Overview of talk 1 Overview 2 Cross-modal retrieval: unsupervised and supervised 3 The similarity function perspective 4 Learning without annotations 5 Learning with annotations 6 Experiments 7 Conclusion

59 Conclusion Pairwise classification perspective on cross-modal retrieval Unified perspective in the presence and absence of ground truth annotations Logistic regression approach for learning without and with annotations Without: reduction to positive and unlabelled learning With: learn probability distributions for each modality Good empirical performance

60 Questions?
