Semantic Similarity and Relatedness


Semantic Similarity and Relatedness (based on Budanitsky & Hirst 2006 and Chapter 20 of Jurafsky/Martin, 2nd ed.; most figures taken from either source).

Many applications require a measure of the relation between words even when they are not collocated: WSD, information retrieval, query translation.

Identifying Relatedness
- Synonyms: thesaurus, WordNet {favored, popular, preferred}
- Same-category terms: thesaurus categories, other ontologies, the WordNet hierarchy (e.g. a Roget's Thesaurus category such as: admire, delight, attract, unwelcome)
- Context-based similarity

WordNet Path-Based Similarity
The simplest measure is path length: the number of edges in the shortest path between sense nodes c1 and c2.
  sim(c1, c2) = -log pathlen(c1, c2)
  wordsim(w1, w2) = max over c1 in senses(w1), c2 in senses(w2) of sim(c1, c2)
(Figure from Budanitsky & Hirst 2006.)

Path Length: Limitations
Edges at different heights carry different information (Wu & Palmer 1994). lso: lowest super-ordinate, i.e. the deepest common ancestor of the two concepts.
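Path-length similarity can be sketched in a few lines. The mini-taxonomy below (child -> parent) is made up for illustration; the node names are not actual WordNet synsets.

```python
import math

# Hypothetical mini-taxonomy (child -> parent), invented for illustration;
# these are not actual WordNet synsets.
parent = {
    "nickel": "coin", "dime": "coin", "coin": "money",
    "money": "medium_of_exchange", "credit_card": "medium_of_exchange",
    "medium_of_exchange": "entity",
}

def ancestors(c):
    """Map each ancestor of c (including c itself) to its distance from c."""
    chain, d = {}, 0
    while c is not None:
        chain[c] = d
        c, d = parent.get(c), d + 1
    return chain

def path_length(c1, c2):
    """Edges on the shortest path between c1 and c2, via a common ancestor."""
    a1, a2 = ancestors(c1), ancestors(c2)
    return min(a1[c] + a2[c] for c in a1 if c in a2)

def sim_path(c1, c2):
    # sim(c1, c2) = -log pathlen(c1, c2); identical concepts (path length 0)
    # would need special handling.
    return -math.log(path_length(c1, c2))
```

With this tree, sim_path("nickel", "dime") comes out higher than sim_path("nickel", "credit_card"), matching the intuition that siblings under a specific concept are closer than distant cousins.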

Path Length: Further Limitations
Even at the same height, all edges are not equal. Compare <credit, credit card> with <money, cash>: we need a weight on edges.

Measuring P(c): Information Content
  IC(c) = -log P(c) = log(1/P(c))
Note that c refers to a category, so all words subsumed by c are also counted. The more common a word, the lower its IC.

IC-Based Edge Weighting (Jiang & Conrath 1997)
Key idea: the semantic distance of the link connecting a child concept c to its parent concept par(c) is proportional to the conditional probability P(c | par(c)). Lin (1998) derived the same formulae slightly differently: word similarity is about commonality vs. differences.

Simplifying
With this edge-weight definition,
  EdgeWeight(c, par(c)) = IC(c) - IC(par(c))
  PathWeight(c, Root) = IC(c)
Putting PathWeight into Wu-Palmer gives Lin's measure:
  sim(c1, c2) = 2 * IC(lso(c1, c2)) / (IC(c1) + IC(c2))
Jiang-Conrath distance:
  dist(c1, c2) = {IC(c1) - IC(lso(c1, c2))} + {IC(c2) - IC(lso(c1, c2))}

WordNet-Based Similarity: Key Ideas
- All edges are not equal: edges at different heights carry different information.
- Even at the same height, all edges are not equal, so we need a weight on edges.
- Edge weight from the conditional probability P(c | par(c)): EdgeWeight(c, par(c)) = IC(c) - IC(par(c)), with IC(c) = -log P(c).
- PathWeight(c, Root) = IC(c). Integrating Wu-Palmer with PathWeight gives Lin's measure; the same formula can be derived by viewing similarity as commonality vs. differences.
- Putting EdgeWeight into sim(c1, c2) = -log pathlen(c1, c2) gives the Jiang-Conrath distance, which gives the best results on several applications with the English WordNet. Compared to Wu-Palmer, ignoring depth is fine: it is already taken care of by IC.
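The IC-based measures follow directly from the formulas. In this sketch the concept probabilities P(c) and the lowest super-ordinates passed in are made up for illustration, not real corpus counts.

```python
import math

# Made-up corpus probabilities P(c) for a few concepts (illustrative only).
p = {"entity": 1.0, "money": 0.02, "cash": 0.01, "coin": 0.005,
     "credit_card": 0.003}

def IC(c):
    return -math.log(p[c])                 # IC(c) = -log P(c)

def jiang_conrath_dist(c1, c2, lso):
    # dist(c1, c2) = {IC(c1) - IC(lso)} + {IC(c2) - IC(lso)}
    return IC(c1) + IC(c2) - 2 * IC(lso)

def lin_sim(c1, c2, lso):
    # sim(c1, c2) = 2 * IC(lso) / (IC(c1) + IC(c2)): commonality vs. differences
    return 2 * IC(lso) / (IC(c1) + IC(c2))
```

Note that when the only shared ancestor is the root, IC(root) = -log 1 = 0, so Lin similarity collapses to 0 and the Jiang-Conrath distance reduces to IC(c1) + IC(c2).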

Similarity vs. Relatedness
The previous methods may not work for words belonging to different classes, e.g. car and petrol.

Gloss Overlap as a Similarity Measure
Extended gloss overlap (concepts and figures from Banerjee & Pedersen 2003). Instead of WordNet relations, some other method can be used for extending the definitions.

Distributional Hypothesis
"Words that occur in the same contexts tend to have similar meanings." (Harris 1954)
"A word is characterized by the company it keeps." (Firth 1957)

Similarity Using Co-occurrence Vectors
Commonality vs. differences: all these measures change value with scaling.
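The distributional idea can be sketched on a toy scale: build window-based co-occurrence vectors from a made-up corpus and compare them with cosine. The corpus and the window size here are arbitrary choices for illustration.

```python
from collections import Counter

# Made-up corpus; window-based co-occurrence counts illustrate the
# distributional hypothesis ("the company a word keeps").
corpus = ("the car needs petrol . the car uses petrol . "
          "the automobile needs petrol . the automobile uses fuel").split()

def cooc_vector(target, window=2):
    """Counts of words appearing within +/- window tokens of target."""
    vec = Counter()
    for i, w in enumerate(corpus):
        if w == target:
            lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[corpus[j]] += 1
    return vec

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda x: sum(c * c for c in x.values()) ** 0.5
    return dot / (norm(u) * norm(v))
```

Even in this tiny corpus, "car" and "automobile" end up with more similar co-occurrence vectors than "car" and "petrol", because they occur in interchangeable contexts rather than merely near each other.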

Normalized Measures; the Cosine Measure
(Tables from FSNLP and from Patwardhan & Pedersen 2006.)
The right measure depends on the application:
- What happens when we scale the values?
- What happens when we add dimensions/features? Consider a vector with 4 entries: if we add 20 more entries to it, how similar is it to the original vector under the different measures?
Cosine is the de facto standard for term-vector comparison, and has several other applications: query vs. document similarity in IR, comparing documents, finding obfuscated code for virus detection or assignment-copy detection.

Tuning the Context Vectors
- Remove the very high-frequency and very low-frequency terms.
- Weigh the terms using tf*idf (or inverse definition/context frequency in this case): term weight 1 + log(tf), IDF = log(N/df).
- Use PMI instead of raw counts:
  PMI(w1, w2) = log [ P(w1, w2) / (P(w1) P(w2)) ]

Using Dependency Relations
Only use context words that have a dependency relation with the target, and use PMI conditioned on the dependency relation r. Formulae and figure below from Lin 1998:
  I(w, r, w') = log [ C(w, r, w') * C(*, r, *) / ( C(w, r, *) * C(*, r, w') ) ]
One can show this is equivalent to PMI conditioned on r. T(w): the set of pairs (r, w') such that I(w, r, w') is positive. Full parsing is expensive: use shallow parsing only (subject, object, modifier, etc.).

Using Probabilistic Measures
When counts are replaced by probabilities, cosine distance may not be the best metric: different vector entries are not cooperating but competing, so we need to compare probability distributions.
KL divergence: how well distribution Q approximates distribution P, i.e. how much information is lost if we assume Q instead of P:
  D(P || Q) = sum over x of P(x) log [ P(x) / Q(x) ]
It is asymmetric, and undefined when Q(x) = 0. The Jensen-Shannon divergence fixes both problems:
  JS(P, Q) = (1/2) D(P || M) + (1/2) D(Q || M), where M = (P + Q)/2

Second-Order Measures
Standard approach: compare two words using the cosine similarity of their co-occurrence vectors.
Knowledge-based approach: overlaps based on definitions.
Marrying the two, second-order context vectors: extend the definitions with the centroid of the context vectors of the words in the definition (Patwardhan & Pedersen 2006).
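The probabilistic measures can be sketched directly from their definitions; the inputs below are illustrative numbers, not real corpus statistics.

```python
import math

def pmi(p_xy, p_x, p_y):
    # PMI(w1, w2) = log [ P(w1, w2) / (P(w1) P(w2)) ];
    # zero when the words are independent.
    return math.log(p_xy / (p_x * p_y))

def kl(p, q):
    # D(P || Q) = sum P(x) log [ P(x) / Q(x) ];
    # terms with P(x) = 0 contribute nothing, but Q(x) = 0 with
    # P(x) > 0 would be undefined.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    # Jensen-Shannon divergence: symmetric and always defined,
    # since M = (P + Q)/2 is nonzero wherever P or Q is.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

JS inherits KL's interpretation but is symmetric by construction and bounded by log 2, which makes it a safer drop-in when comparing two word distributions.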

Similarity of Short Contexts: Applications
(From Pedersen 2009.)

Reducing Dimensions: Singular Value Decomposition (SVD)
(Figures from lion.cs.uiuc.edu and www.mathworks.com.)
Note the similarity between d2 and d3 in the reduced space: SVD discovers that cosmonaut and astronaut are related, as they both co-occur with moon. (FSNLP, ch. 15.4)
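The cosmonaut/astronaut effect can be reproduced with a plain SVD. The term-document matrix below is a reconstruction of the FSNLP chapter 15.4 example (rows: cosmonaut, astronaut, moon, car, truck; columns: d1..d6); treat the exact numbers as a sketch.

```python
import numpy as np

# Reconstruction of the FSNLP ch. 15.4 term-document matrix
# (rows: cosmonaut, astronaut, moon, car, truck; columns: d1..d6).
A = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                   # keep the 2 largest singular values
docs_k = (np.diag(s[:k]) @ Vt[:k]).T    # documents in the k-dim latent space

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# d2 (astronaut, moon) and d3 (cosmonaut) share no terms, so their raw
# cosine is 0 -- but in the latent space they become similar, because
# cosmonaut and astronaut both co-occur with moon.
raw = cos(A[:, 1], A[:, 2])
latent = cos(docs_k[1], docs_k[2])
```

The latent-space similarity is what LSA exploits: dimensionality reduction merges the cosmonaut and astronaut dimensions through their shared neighbor moon.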