Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction


1 Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction Alexander Panchenko Université catholique de Louvain & Bauman Moscow State Technical University 5th December 2011 / CLAIM Seminar, BMSTU Alexander Panchenko 1/30

2 Plan
1. Introduction
2. Methodology
3. Evaluation
4. Results
5. Conclusion and Further Research

3 Reference Papers
Panchenko A. Method for Automatic Construction of Semantic Relations Between Concepts of an Information Retrieval Thesaurus // Herald of the Voronezh State University, Series Systems Analysis and Information Technologies, 2010, vol. 2.
Panchenko A. Comparison of the Knowledge-, Corpus-, and Web-based Similarity Measures for Semantic Relations Extraction // Proceedings of the GEMS 2011 Workshop on Geometrical Models of Natural Language Semantics, EMNLP 2011, pages 11-21.
Panchenko A. Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction // Submitted to the Student Workshop of EACL.

4 Semantic Relations
$r = \langle c_i, t, c_j \rangle$: a semantic relation, where $c_i, c_j \in C$ and $t \in T$
$C$: terms, e.g. radio or receiver operating characteristic
$T$: semantic relation types, e.g. hyponymy or synonymy
$R \subseteq C \times T \times C$: set of semantic relations

6 Semantic Relations Example: Thesaurus
Figure: A part of the information retrieval thesaurus EuroVoc.
$R$ = {
⟨energy-generating product, NT, energy industry⟩,
⟨energy technology, NT, energy industry⟩,
⟨petroleum, RT, fossil fuel⟩,
⟨energy technology, RT, oil technology⟩,
... }

7 General Problem: Automatic Thesaurus Construction
Figure: A technology of automatic thesaurus construction.
How is a thesaurus used?
Query expansion and query suggestion
Navigation and browsing of the corpus
Visualization of the corpus

10 The Problem: Semantic Relations Extraction
Input: terms $C$, semantic relation types $T$
Output: extracted lexico-semantic relations $\hat{R}$, an estimate of $R$
Pattern-based relation extraction, where patterns are built manually (Hearst, 1992) or semi-automatically (Snow, 2004)
(+) High precision
(-) Complexity and cost of pattern construction
(-) Patterns are highly task and domain dependent
Similarity-based relation extraction (Philippovich and Prokhorov, 2002; Grefenstette, 1994; Curran and Moens, 2002)
(-) Less precise
(+) Little or no manual work
(+) More adaptive across domains

15 Similarity-based Relation Extraction
State of the Art:
There exist many heterogeneous similarity measures based on corpora, knowledge, the web, definitions, etc.
Various measures provide complementary types of semantic information.
This suggests combining them.
Research Questions:
Which similarity measure is the best for relation extraction?
How can similarity measures be combined efficiently so as to improve relation extraction?

16 The Key Contributions Up To Now
A protocol for the evaluation of similarity-based relation extraction
A comparison of 34 single measures
Two combination methods: similarity fusion and relation fusion
Six best combinations that outperform the single measures

20 Similarity-based Semantic Relations Extraction
Semantic Relations Extraction Algorithm
Input: Terms $C$, Similarity parameters $P$, Threshold $k$, Min. similarity value $\gamma$
Output: Semantic relations $\hat{R}$ (unlabeled)
1. $S \leftarrow$ sim($C, P$)
2. $S \leftarrow$ normalize($S$)
3. $\hat{R} \leftarrow$ threshold($S, k, \gamma$)
4. return $\hat{R}$
sim: a similarity measure
normalize: similarity score normalization
threshold: kNN thresholding, $\hat{R} = \bigcup_{i=1}^{|C|} \{ \langle c_i, t, c_j \rangle : c_j \in \text{top } k\% \text{ terms} \wedge s_{ij} \geq \gamma \}$
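The algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say which normalization is used, so min-max rescaling is assumed here, the toy scores stand in for the output of sim, and the names `Sim`, `normalize`, and `threshold` are chosen to mirror the slide rather than taken from any published code.

```python
from typing import Dict, List, Tuple

Sim = Dict[str, Dict[str, float]]  # S[c_i][c_j] = similarity score


def normalize(S: Sim) -> Sim:
    """Min-max rescale all scores to [0, 1] (an assumed normalization)."""
    values = [s for row in S.values() for s in row.values()]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {ci: {cj: (s - lo) / span for cj, s in row.items()}
            for ci, row in S.items()}


def threshold(S: Sim, k: float, gamma: float) -> List[Tuple[str, str]]:
    """For each c_i keep the top k% most similar terms whose score is >= gamma."""
    relations = []
    for ci, row in S.items():
        ranked = sorted(row.items(), key=lambda item: item[1], reverse=True)
        n_keep = max(1, round(len(ranked) * k / 100))
        relations += [(ci, cj) for cj, s in ranked[:n_keep] if s >= gamma]
    return relations


# Toy scores standing in for the output of sim(C, P); values are invented.
S = {
    "judge": {"arbitrate": 0.9, "sheriff": 0.6, "lemon": 0.1, "fare": 0.0},
    "lemon": {"judge": 0.1, "fare": 0.3, "sheriff": 0.0, "arbitrate": 0.05},
}
R_hat = threshold(normalize(S), k=50, gamma=0.2)
print(R_hat)
```

The output is a list of unlabeled pairs, matching the slide's note that the extracted relations carry no relation type.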

23 Knowledge-based Measures (6)
Data: semantic network WordNet 3.0, corpus SemCor.
Variables:
$len(c_i, c_j)$: length of the shortest path between terms $c_i$ and $c_j$
$len(c_i, lcs(c_i, c_j))$: length of the shortest path from $c_i$ to the lowest common subsumer (LCS) of $c_i$ and $c_j$
$len(c_{root}, lcs(c_i, c_j))$: length of the shortest path from the root term $c_{root}$ to the LCS of $c_i$ and $c_j$
$P(c)$: probability of the term $c$, estimated from a corpus
$P(lcs(c_i, c_j))$: probability of the LCS of $c_i$ and $c_j$
Measures: Inverted Edge Count (Jurafsky and Martin, 2009), Leacock-Chodorow (1998), Wu-Palmer (1994), Resnik (1995), Jiang-Conrath (1997), Lin (1998).
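The slide names the measures without giving their formulas. As a rough sketch, four of them can be written directly in terms of the variables above, using their standard textbook definitions; the exact variants used in the talk may differ, and the function and argument names are illustrative only.

```python
import math


def inverted_edge_count(len_ij: int) -> float:
    """1 / len(c_i, c_j); assumes two distinct terms, so len_ij >= 1."""
    return 1.0 / len_ij


def wu_palmer(len_i_lcs: int, len_j_lcs: int, len_root_lcs: int) -> float:
    """2 * depth(lcs) / (depth(c_i) + depth(c_j)), depths measured from the root."""
    twice_depth = 2.0 * len_root_lcs
    return twice_depth / (len_i_lcs + len_j_lcs + twice_depth)


def resnik(p_lcs: float) -> float:
    """Information content of the lowest common subsumer: -log P(lcs)."""
    return -math.log(p_lcs)


def lin(p_i: float, p_j: float, p_lcs: float) -> float:
    """2 * log P(lcs) / (log P(c_i) + log P(c_j))."""
    return 2.0 * math.log(p_lcs) / (math.log(p_i) + math.log(p_j))


# Invented inputs, only to show the call shape.
print(wu_palmer(len_i_lcs=1, len_j_lcs=1, len_root_lcs=2))
print(lin(p_i=0.01, p_j=0.01, p_lcs=0.1))
```

Wu-Palmer and Lin are bounded by 1, while Resnik is unbounded, which is one reason the extraction algorithm normalizes scores before thresholding.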

26 Web-based Measures (9)
Data: number of hits returned by an information retrieval system (GOOGLE, YAHOO, YAHOO BOSS, BING).
Variables:
$h_i$: number of hits returned by the query "$c_i$"
$h_{ij}$: number of hits returned by the query "$c_i$ AND $c_j$"
Measures:
NGD (Cilibrasi and Vitanyi, 2007)
PMI-IR (Turney, 2001)
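As an illustration of how hit counts turn into scores, the sketch below computes the Normalized Google Distance in its usual published form, plus a generic pointwise mutual information estimate from the same counts. The index size `total_pages` is an assumed extra parameter that the slide does not list, and the PMI function is a plain PMI estimate, not necessarily the exact PMI-IR variant used in the talk.

```python
import math


def ngd(h_i: int, h_j: int, h_ij: int, total_pages: int) -> float:
    """Normalized Google Distance; 0 means the terms always co-occur.

    Assumes h_ij > 0 and total_pages larger than every hit count.
    """
    log_i, log_j, log_ij = math.log(h_i), math.log(h_j), math.log(h_ij)
    return (max(log_i, log_j) - log_ij) / (math.log(total_pages) - min(log_i, log_j))


def pmi_from_hits(h_i: int, h_j: int, h_ij: int, total_pages: int) -> float:
    """log P(c_i, c_j) / (P(c_i) P(c_j)) with probabilities estimated as hits / total_pages."""
    return math.log(h_ij * total_pages / (h_i * h_j))


# Invented hit counts, only to show the call shape.
print(ngd(1000, 1000, 10, 10**6))
print(pmi_from_hits(1000, 1000, 10, 10**6))
```

NGD is a distance, so it has to be inverted or negated before it can be used as a similarity score in the extraction algorithm.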

29 Corpus-based Measures (13)
Data: corpora WACYPEDIA (800M tokens) and UKWAC (2000M tokens)
Variables:
$\mathbf{f}_i$: context window feature vector of term $c_i$
$\mathbf{f}^s_i$: syntactic feature vector of $c_i$
Measures:
BDA (Sahlgren, 2006)
SDA (Curran, 2003)
LSA on the TASA corpus (Landauer and Dumais, 1997)
NGD and PMI-IR on the Factiva corpus (Veksler et al., 2008)

30 Corpus-based Measures: Distributional Analysis
Distributional Similarity Measure
Input: Terms $C$, Corpus $D$, Number of features $\beta$, Min. term frequency $\theta$, Feature matrix construction parameters $P$
Output: Similarity matrix $S$ of size $|C| \times |C|$
1. $F \leftarrow$ construct_fmatrix($C, D, \beta, \theta, P$)
2. $F \leftarrow$ pmi($F$)
3. $S \leftarrow$ cos($F$)
4. return $S$
PMI normalization: $f_{ij} = \log \frac{P(c_i, f_j)}{P(c_i) P(f_j)} = \log \frac{f_{ij}}{n(c_i) \sum_i f_{ij}}$
Cosine similarity: $s_{ij} = \cos(c_i, c_j) = \frac{\mathbf{f}_i \cdot \mathbf{f}_j}{\|\mathbf{f}_i\| \, \|\mathbf{f}_j\|}$
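Steps 2 and 3 of this algorithm can be illustrated with sparse dictionaries. The sketch assumes the feature matrix has already been built (step 1 is not reproduced), estimates the PMI probabilities from raw counts divided by the total count, and leaves zero counts at zero instead of taking their logarithm; the toy counts and all names are invented for the example.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # sparse feature vector: feature -> weight


def pmi(F: Dict[str, Vector]) -> Dict[str, Vector]:
    """Replace each non-zero co-occurrence count by log P(c,f) / (P(c) P(f))."""
    total = sum(sum(row.values()) for row in F.values())
    term_count = {c: sum(row.values()) for c, row in F.items()}
    feat_count: Vector = {}
    for row in F.values():
        for f, n in row.items():
            feat_count[f] = feat_count.get(f, 0.0) + n
    return {c: {f: math.log(n * total / (term_count[c] * feat_count[f]))
                for f, n in row.items()}
            for c, row in F.items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


# Toy term-by-feature counts (invented).
F = {
    "cat": {"purr": 4, "eat": 2},
    "dog": {"bark": 4, "eat": 2},
    "car": {"drive": 6},
}
W = pmi(F)
print(cosine(W["cat"], W["dog"]), cosine(W["cat"], W["car"]))
```

In this toy matrix cat and dog share the feature eat and get a positive similarity, while cat and car share nothing and score 0.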

33 Definition-based Measures (6)
Data: definitions from WordNet, Wikipedia, and Wiktionary.
Variables:
$gloss(c)$: definition of the term $c$
$sim(gloss(c_i), gloss(c_j))$: similarity of the terms' glosses
$\mathbf{f}_i$: context vector of $c_i$, calculated on the corpus of all glosses
$\mathbf{f}_i$: bag-of-words vector, derived from the definition of $c_i$
$exist(c_i, c_j)$: a relation between $c_i$ and $c_j$ exists in the dictionary
Measures:
BDA using Wiktionary and Wikipedia
Extended Lesk using WordNet (Banerjee and Pedersen, 2003)
Gloss Vectors using WordNet (Patwardhan and Pedersen, 2006)

34 Definition-based Measures
Wiktionary-based Similarity Measure
Input: Terms $C$, UseWikipedia, Number of features $\beta$
Output: Similarity matrix $S$ of size $|C| \times |C|$
1. $D \leftarrow$ get_wiktionary_definitions($C$)
2. if UseWikipedia then
3.    $D \leftarrow D \cup$ get_wikipedia_definitions($C$)
4. $F \leftarrow$ construct_fmatrix($C, D, \beta$)
5. $F \leftarrow$ pmi($F$)
6. $S \leftarrow$ cos($F$)
7. $S \leftarrow$ update_similarity($S$)
8. return $S$

35 Combined Measures
Similarity Fusion: $S_{cmb} = \frac{1}{N} \sum_{i=1}^{N} S_i$
Relation Fusion:
Relation fusion measure
Input: Similarity matrices produced by $N$ measures $\{S_1, \ldots, S_N\}$, kNN threshold $k$
Output: Combined similarity matrix $S_{cmb}$
1. for $i = 1, \ldots, N$ do
2.    $R_i \leftarrow$ threshold($S_i, k, \gamma = 0$); $R_i \leftarrow$ relation_matrix($R_i$)
3. $S_{cmb} \leftarrow \frac{1}{N} \sum_{i=1}^{N} R_i$
4. return $S_{cmb}$
where $r_{ij} = 1$ if $\langle c_i, t, c_j \rangle \in R_k$, and $r_{ij} = 0$ otherwise.
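Both fusion methods are short enough to sketch. The code below assumes every measure scores the same term pairs and that scores are non-negative, so the $\gamma = 0$ cut-off never removes anything and only the top-k% filter matters; the toy matrices and function names are illustrative.

```python
from typing import Dict, List

Sim = Dict[str, Dict[str, float]]  # S[c_i][c_j] = similarity score


def similarity_fusion(matrices: List[Sim]) -> Sim:
    """Element-wise mean of N similarity matrices."""
    n = len(matrices)
    first = matrices[0]
    return {ci: {cj: sum(S[ci][cj] for S in matrices) / n for cj in first[ci]}
            for ci in first}


def relation_matrix(S: Sim, k: float) -> Sim:
    """Binary matrix: 1 where c_j is among the top k% neighbours of c_i, else 0."""
    R = {}
    for ci, row in S.items():
        ranked = sorted(row, key=row.get, reverse=True)
        top = set(ranked[: max(1, round(len(ranked) * k / 100))])
        R[ci] = {cj: 1.0 if cj in top else 0.0 for cj in row}
    return R


def relation_fusion(matrices: List[Sim], k: float) -> Sim:
    """Mean of the binary relation matrices, i.e. the share of measures voting for each pair."""
    return similarity_fusion([relation_matrix(S, k) for S in matrices])


# Two toy measures that disagree about the nearest neighbour of "a" (invented values).
S1 = {"a": {"b": 0.9, "c": 0.1}}
S2 = {"a": {"b": 0.2, "c": 0.4}}
print(similarity_fusion([S1, S2]))
print(relation_fusion([S1, S2], k=50))
```

Similarity fusion is sensitive to the score scale of each measure, whereas relation fusion only uses each measure's ranking, which makes it usable without normalization.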

36 Combined Measures
Which of the 34 single measures should we combine? We present combinations of three groups of measures:
Group4 = WN-Resnik, BDA, SDA, Def-WktWiki-1000
Group8 = Group4 + WN-WuPalmer, LSA-Tasa, Def-GlossVec., and Def-Ext.Lesk
Group14 = Group8 + WN-LeacockChodorow, WN-Lin, WN-JiangConrath, NGD-Factiva, NGD-Yahoo, and NGD-GoogleWiki

39 Evaluation with Human Judgments
Table columns: term $c_i$; term $c_j$; human similarity $s$; measure similarity $\hat{s}$; human rank $r$; measure rank $\hat{r}$ (values not shown).
Example pairs: tiger, cat; book, paper; computer, keyboard; possibility, girl; sugar, approach.
Human judgment datasets:
WordSim353 (Finkelstein, 2002): 353 pairs
Miller-Charles (1991): 30 pairs
Rubenstein-Goodenough (1965): 65 pairs
Pearson's correlation: $\rho = \frac{cov(s, \hat{s})}{\sigma(s) \sigma(\hat{s})}$
Spearman's correlation: $r = \frac{cov(r, \hat{r})}{\sigma(r) \sigma(\hat{r})}$
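The two correlations can be computed as follows; Spearman's coefficient is simply Pearson's coefficient applied to ranks. The scores below are invented placeholders for the five example pairs, since the slide's actual values are not available, and ties are given their mean rank, which is a common convention assumed here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """cov(x, y) / (sigma(x) * sigma(y)); assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """Rank 1 goes to the largest value; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            result[i] = mean_rank
        pos = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank-transformed scores."""
    return pearson(ranks(x), ranks(y))


# Invented scores for: tiger-cat, book-paper, computer-keyboard, possibility-girl, sugar-approach.
human = [4.0, 3.5, 3.0, 0.5, 0.2]
measure = [0.8, 0.9, 0.6, 0.1, 0.05]
print(pearson(human, measure), spearman(human, measure))
```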

40 Evaluation with Semantic Relations
Columns: target term $c_i$, relatum term $c_j$, relation type $t$
judge, adjudicate, syn
judge, arbitrate, syn
judge, assessor, syn
judge, chancellor, syn
judge, gendarmerie, syn
judge, sheriff, syn
judge, pc, random
judge, fare, random
judge, lemon, random
The number of correct and random relations is equal for each target term!
Semantic Relations Datasets:
BLESS (Baroni and Lenci, 2011): relations of types hyper, coord, mero, event, attri, random
SN (Panchenko, ?): relations of types syn, random

41 Evaluation with Semantic Relations
Let
$R$: all semantic relations that are not random
$\hat{R}$: extracted relations
$k$: kNN threshold
Evaluation Metrics
$\text{Precision} = \frac{|R \cap \hat{R}|}{|\hat{R}|}$
$\text{Recall} = \frac{|R \cap \hat{R}|}{|R|}$
$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
$MAP(M) = \frac{1}{M} \sum_{k=1}^{M} \text{Precision}(k)$

42 Example: Evaluation with Semantic Relations
Precision(50%) =
Columns: target word, relatum word, relation type, sim (similarity values not preserved)
aficionado, enthusiast, syn
aficionado, fan, syn
aficionado, admirer, syn
aficionado, addict, syn
aficionado, devotee, syn
aficionado, foundling, random
aficionado, fanatic, syn
aficionado, adherent, syn
aficionado, capital, random
aficionado, statute, random
aficionado, blot, random
aficionado, meddler, random
aficionado, enlargement, random
aficionado, bawdyhouse, random
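The precision, recall, and F1 definitions from the evaluation metrics can be applied to this example with a short script. It rests on one assumption that the transcript cannot confirm: that the rows are listed in order of decreasing similarity. Under that assumption the top 50% are the first seven relata, six of which are synonyms, so the script yields a precision of 6/7 at k = 50%.

```python
from typing import List, Tuple

# (relatum, relation type) for the target "aficionado", in slide order.
# ASSUMPTION: the slide lists them by decreasing similarity.
ranked: List[Tuple[str, str]] = [
    ("enthusiast", "syn"), ("fan", "syn"), ("admirer", "syn"), ("addict", "syn"),
    ("devotee", "syn"), ("foundling", "random"), ("fanatic", "syn"),
    ("adherent", "syn"), ("capital", "random"), ("statute", "random"),
    ("blot", "random"), ("meddler", "random"), ("enlargement", "random"),
    ("bawdyhouse", "random"),
]


def evaluate(ranked: List[Tuple[str, str]], k: float) -> Tuple[float, float, float]:
    """Precision, recall and F1 when the top k% of the ranked relata are extracted."""
    extracted = ranked[: round(len(ranked) * k / 100)]
    correct = sum(1 for _, t in extracted if t != "random")   # |R ∩ R_hat|
    n_true = sum(1 for _, t in ranked if t != "random")       # |R|
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / n_true if n_true else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


precision, recall, f1 = evaluate(ranked, k=50)
print(precision, recall, f1)
```

Because the dataset holds as many random as correct relations per target term, precision and recall coincide at k = 50%.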

43 Results on the Human Judgements Datasets

44 Results on the Semantic Relations Datasets

45 Precision-Recall Curves
Figure: PR graphs of the best single and combined measures (left) and of the Wiktionary measures (right).

46 Precision-Recall Curves
Figure: PR graph of four combined measures.

47 Conclusion
The best single measures:
   WordNet-based measure WN-Resnik
   Bag-of-words distributional measure BDA
   Syntactic distributional measure SDA
   Wiktionary measure Def-WktWiki-1000
The best combined measure:
   Relation fusion of 8 measures, Comb-Rel-810
   Its results are very close to those of the combined measures using 14 measures

50 Further Research: More Sophisticated Combination Methods
Unsupervised feature combination:
   Bag-of-words features of Distributional Analysis + Wikipedia/Wiktionary/WordNet definitions
   Feature tensor: jointly co-occurring DA features, tensor decompositions for better fusion
   Similarity tensor: yet another similarity fusion technique
Supervised linear combination of pairwise similarities
Supervised linear combination of features used by single measures

53 Further Research
Evaluation:
   Domain-specific terms and relations: Agrovoc, MeSH, etc.
   An application-based evaluation: query expansion
Methods:
   Corpus-based: DA with n-grams, surface patterns, LSA, LDA, syntactic tree kernels
   Web-based: more experiments with Google hits
   Knowledge-based: SimRank, random walks and the like on the Wikipedia/Wiktionary/WordNet category lattice
   Surface-based: edit distance, longest common substring, etc.
Relation types: supervised model trained on a set of hyponyms, synonyms, etc.

54 Questions
Thank you! Questions?


More information

Cross-lingual and temporal Wikipedia analysis

Cross-lingual and temporal Wikipedia analysis MTA SZTAKI Data Mining and Search Group June 14, 2013 Supported by the EC FET Open project New tools and algorithms for directed network analysis (NADINE No 288956) Table of Contents 1 Link prediction

More information

Natural Language Processing. Topics in Information Retrieval. Updated 5/10

Natural Language Processing. Topics in Information Retrieval. Updated 5/10 Natural Language Processing Topics in Information Retrieval Updated 5/10 Outline Introduction to IR Design features of IR systems Evaluation measures The vector space model Latent semantic indexing Background

More information

CS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya

CS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya CS 375 Advanced Machine Learning Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya Outline SVD and LSI Kleinberg s Algorithm PageRank Algorithm Vector Space Model Vector space model represents

More information

Measuring Semantic Similarity Between Digital Forensics Terminologies Using Web Search Engines

Measuring Semantic Similarity Between Digital Forensics Terminologies Using Web Search Engines Measuring Semantic Similarity Between Digital Forensics Terminologies Using Web Search Engines Nickson M. Karie Department of Computer Science, University of Pretoria, Private Bag X20, Hatfield 0028, Pretoria,

More information

Learning Features from Co-occurrences: A Theoretical Analysis

Learning Features from Co-occurrences: A Theoretical Analysis Learning Features from Co-occurrences: A Theoretical Analysis Yanpeng Li IBM T. J. Watson Research Center Yorktown Heights, New York 10598 liyanpeng.lyp@gmail.com Abstract Representing a word by its co-occurrences

More information

Mining coreference relations between formulas and text using Wikipedia

Mining coreference relations between formulas and text using Wikipedia Mining coreference relations between formulas and text using Wikipedia Minh Nghiem Quoc 1, Keisuke Yokoi 2, Yuichiroh Matsubayashi 3 Akiko Aizawa 1 2 3 1 Department of Informatics, The Graduate University

More information

FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION

FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION SunLab Enlighten the World FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION Ioakeim (Kimis) Perros and Jimeng Sun perros@gatech.edu, jsun@cc.gatech.edu COMPUTATIONAL

More information

Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking p. 1/31

Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking p. 1/31 Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking Dengyong Zhou zhou@tuebingen.mpg.de Dept. Schölkopf, Max Planck Institute for Biological Cybernetics, Germany Learning from

More information

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,

More information

Fast LSI-based techniques for query expansion in text retrieval systems

Fast LSI-based techniques for query expansion in text retrieval systems Fast LSI-based techniques for query expansion in text retrieval systems L. Laura U. Nanni F. Sarracco Department of Computer and System Science University of Rome La Sapienza 2nd Workshop on Text-based

More information

Citation for published version (APA): Andogah, G. (2010). Geographically constrained information retrieval Groningen: s.n.

Citation for published version (APA): Andogah, G. (2010). Geographically constrained information retrieval Groningen: s.n. University of Groningen Geographically constrained information retrieval Andogah, Geoffrey IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 6: Numerical Linear Algebra: Applications in Machine Learning Cho-Jui Hsieh UC Davis April 27, 2017 Principal Component Analysis Principal

More information

PROBABILISTIC LATENT SEMANTIC ANALYSIS

PROBABILISTIC LATENT SEMANTIC ANALYSIS PROBABILISTIC LATENT SEMANTIC ANALYSIS Lingjia Deng Revised from slides of Shuguang Wang Outline Review of previous notes PCA/SVD HITS Latent Semantic Analysis Probabilistic Latent Semantic Analysis Applications

More information

Evaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018

Evaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018 Evaluation Brian Thompson slides by Philipp Koehn 25 September 2018 Evaluation 1 How good is a given machine translation system? Hard problem, since many different translations acceptable semantic equivalence

More information

PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211

PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211 PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211 IIR 18: Latent Semantic Indexing Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University,

More information

III.6 Advanced Query Types

III.6 Advanced Query Types III.6 Advanced Query Types 1. Query Expansion 2. Relevance Feedback 3. Novelty & Diversity Based on MRS Chapter 9, BY Chapter 5, [Carbonell and Goldstein 98] [Agrawal et al 09] 123 1. Query Expansion Query

More information

Variable Latent Semantic Indexing

Variable Latent Semantic Indexing Variable Latent Semantic Indexing Prabhakar Raghavan Yahoo! Research Sunnyvale, CA November 2005 Joint work with A. Dasgupta, R. Kumar, A. Tomkins. Yahoo! Research. Outline 1 Introduction 2 Background

More information

HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation

HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation Denis Turdakov and Dmitry Lizorkin Institute for System Programming of the Russian Academy of Sciences, 25 Solzhenitsina

More information

Investigation of Latent Semantic Analysis for Clustering of Czech News Articles

Investigation of Latent Semantic Analysis for Clustering of Czech News Articles Investigation of Latent Semantic Analysis for Clustering of Czech News Articles Michal Rott, Petr Červa Laboratory of Computer Speech Processing 4. 9. 2014 Introduction Idea of article clustering Presumptions:

More information

Ontology-Based News Recommendation

Ontology-Based News Recommendation Ontology-Based News Recommendation Wouter IJntema Frank Goossen Flavius Frasincar Frederik Hogenboom Erasmus University Rotterdam, the Netherlands frasincar@ese.eur.nl Outline Introduction Hermes: News

More information

Cross-Lingual Language Modeling for Automatic Speech Recogntion

Cross-Lingual Language Modeling for Automatic Speech Recogntion GBO Presentation Cross-Lingual Language Modeling for Automatic Speech Recogntion November 14, 2003 Woosung Kim woosung@cs.jhu.edu Center for Language and Speech Processing Dept. of Computer Science The

More information

An Efficient Sliding Window Approach for Approximate Entity Extraction with Synonyms

An Efficient Sliding Window Approach for Approximate Entity Extraction with Synonyms An Efficient Sliding Window Approach for Approximate Entity Extraction with Synonyms Jin Wang (UCLA) Chunbin Lin (Amazon AWS) Mingda Li (UCLA) Carlo Zaniolo (UCLA) OUTLINE Motivation Preliminaries Framework

More information

16 The Information Retrieval "Data Model"

16 The Information Retrieval Data Model 16 The Information Retrieval "Data Model" 16.1 The general model Not presented in 16.2 Similarity the course! 16.3 Boolean Model Not relevant for exam. 16.4 Vector space Model 16.5 Implementation issues

More information

Collaborative NLP-aided ontology modelling

Collaborative NLP-aided ontology modelling Collaborative NLP-aided ontology modelling Chiara Ghidini ghidini@fbk.eu Marco Rospocher rospocher@fbk.eu International Winter School on Language and Data/Knowledge Technologies TrentoRISE Trento, 24 th

More information

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting Outline for today Information Retrieval Efficient Scoring and Ranking Recap on ranked retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Efficient

More information

A Neural Passage Model for Ad-hoc Document Retrieval

A Neural Passage Model for Ad-hoc Document Retrieval A Neural Passage Model for Ad-hoc Document Retrieval Qingyao Ai, Brendan O Connor, and W. Bruce Croft College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA,

More information

Knowledge Discovery in Data: Overview. Naïve Bayesian Classification. .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

Knowledge Discovery in Data: Overview. Naïve Bayesian Classification. .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar Knowledge Discovery in Data: Naïve Bayes Overview Naïve Bayes methodology refers to a probabilistic approach to information discovery

More information

Fall CS646: Information Retrieval. Lecture 6 Boolean Search and Vector Space Model. Jiepu Jiang University of Massachusetts Amherst 2016/09/26

Fall CS646: Information Retrieval. Lecture 6 Boolean Search and Vector Space Model. Jiepu Jiang University of Massachusetts Amherst 2016/09/26 Fall 2016 CS646: Information Retrieval Lecture 6 Boolean Search and Vector Space Model Jiepu Jiang University of Massachusetts Amherst 2016/09/26 Outline Today Boolean Retrieval Vector Space Model Latent

More information

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,

More information

Information Retrieval

Information Retrieval https://vvtesh.sarahah.com/ Information Retrieval Venkatesh Vinayakarao Term: Aug Dec, 2018 Indian Institute of Information Technology, Sri City Characteristic vectors representing code are often high

More information

Latent semantic indexing

Latent semantic indexing Latent semantic indexing Relationship between concepts and words is many-to-many. Solve problems of synonymy and ambiguity by representing documents as vectors of ideas or concepts, not terms. For retrieval,

More information

Collaborative topic models: motivations cont

Collaborative topic models: motivations cont Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.

More information

Manning & Schuetze, FSNLP (c) 1999,2000

Manning & Schuetze, FSNLP (c) 1999,2000 558 15 Topics in Information Retrieval (15.10) y 4 3 2 1 0 0 1 2 3 4 5 6 7 8 Figure 15.7 An example of linear regression. The line y = 0.25x + 1 is the best least-squares fit for the four points (1,1),

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search IR models: Vector Space Model IR Models Set Theoretic Classic Models Fuzzy Extended Boolean U s e r T a s k Retrieval: Adhoc Filtering Brosing boolean vector probabilistic

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 5: Multi-class and preference learning Juho Rousu 11. October, 2017 Juho Rousu 11. October, 2017 1 / 37 Agenda from now on: This week s theme: going

More information

Click Models for Web Search

Click Models for Web Search Click Models for Web Search Lecture 1 Aleksandr Chuklin, Ilya Markov Maarten de Rijke a.chuklin@uva.nl i.markov@uva.nl derijke@uva.nl University of Amsterdam Google Research Europe AC IM MdR Click Models

More information

Can Vector Space Bases Model Context?

Can Vector Space Bases Model Context? Can Vector Space Bases Model Context? Massimo Melucci University of Padua Department of Information Engineering Via Gradenigo, 6/a 35031 Padova Italy melo@dei.unipd.it Abstract Current Information Retrieval

More information

Data Mining Recitation Notes Week 3

Data Mining Recitation Notes Week 3 Data Mining Recitation Notes Week 3 Jack Rae January 28, 2013 1 Information Retrieval Given a set of documents, pull the (k) most similar document(s) to a given query. 1.1 Setup Say we have D documents

More information

Predicting Neighbor Goodness in Collaborative Filtering

Predicting Neighbor Goodness in Collaborative Filtering Predicting Neighbor Goodness in Collaborative Filtering Alejandro Bellogín and Pablo Castells {alejandro.bellogin, pablo.castells}@uam.es Universidad Autónoma de Madrid Escuela Politécnica Superior Introduction:

More information

Supervised Metric Learning with Generalization Guarantees

Supervised Metric Learning with Generalization Guarantees Supervised Metric Learning with Generalization Guarantees Aurélien Bellet Laboratoire Hubert Curien, Université de Saint-Etienne, Université de Lyon Reviewers: Pierre Dupont (UC Louvain) and Jose Oncina

More information

Vector Space Model. Yufei Tao KAIST. March 5, Y. Tao, March 5, 2013 Vector Space Model

Vector Space Model. Yufei Tao KAIST. March 5, Y. Tao, March 5, 2013 Vector Space Model Vector Space Model Yufei Tao KAIST March 5, 2013 In this lecture, we will study a problem that is (very) fundamental in information retrieval, and must be tackled by all search engines. Let S be a set

More information

Lecture 5: Web Searching using the SVD

Lecture 5: Web Searching using the SVD Lecture 5: Web Searching using the SVD Information Retrieval Over the last 2 years the number of internet users has grown exponentially with time; see Figure. Trying to extract information from this exponentially

More information

An Introduction to String Re-Writing Kernel

An Introduction to String Re-Writing Kernel An Introduction to String Re-Writing Kernel Fan Bu 1, Hang Li 2 and Xiaoyan Zhu 3 1,3 State Key Laboratory of Intelligent Technology and Systems 1,3 Tsinghua National Laboratory for Information Sci. and

More information

From ITDL to Place2Vec Reasoning About Place Type Similarity and Relatedness by Learning Embeddings From Augmented Spatial Contexts

From ITDL to Place2Vec Reasoning About Place Type Similarity and Relatedness by Learning Embeddings From Augmented Spatial Contexts From ITDL to Place2Vec Reasoning About Place Type Similarity and Relatedness by Learning Embeddings From Augmented Spatial Contexts ABSTRACT Bo Yan STKO Lab University of California, Santa Barbara boyan@geog.ucsb.edu

More information

13 Searching the Web with the SVD

13 Searching the Web with the SVD 13 Searching the Web with the SVD 13.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this

More information

Principal Component Analysis and Singular Value Decomposition. Volker Tresp, Clemens Otte Summer 2014

Principal Component Analysis and Singular Value Decomposition. Volker Tresp, Clemens Otte Summer 2014 Principal Component Analysis and Singular Value Decomposition Volker Tresp, Clemens Otte Summer 2014 1 Motivation So far we always argued for a high-dimensional feature space Still, in some cases it makes

More information

Natural Language Processing

Natural Language Processing David Packard, A Concordance to Livy (1968) Natural Language Processing Info 159/259 Lecture 8: Vector semantics and word embeddings (Sept 18, 2018) David Bamman, UC Berkeley 259 project proposal due 9/25

More information

Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data

Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data Nikita Zhiltsov 1,2 Alexander Kotov 3 Fedor Nikolaev 3 1 Kazan Federal University 2 Textocat 3 Textual Data Analytics

More information

Computer science research seminar: VideoLectures.Net recommender system challenge: presentation of baseline solution

Computer science research seminar: VideoLectures.Net recommender system challenge: presentation of baseline solution Computer science research seminar: VideoLectures.Net recommender system challenge: presentation of baseline solution Nino Antulov-Fantulin 1, Mentors: Tomislav Šmuc 1 and Mile Šikić 2 3 1 Institute Rudjer

More information

Nearest Neighbor Search with Keywords

Nearest Neighbor Search with Keywords Nearest Neighbor Search with Keywords Yufei Tao KAIST June 3, 2013 In recent years, many search engines have started to support queries that combine keyword search with geography-related predicates (e.g.,

More information

ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation

ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation Chin-Yew Lin and Franz Josef Och Information Sciences Institute University of Southern California 4676 Admiralty Way

More information

Similarity for Conceptual Querying

Similarity for Conceptual Querying Similarity for Conceptual Querying Troels Andreasen, Henrik Bulskov, and Rasmus Knappe Department of Computer Science, Roskilde University, P.O. Box 260, DK-4000 Roskilde, Denmark {troels,bulskov,knappe}@ruc.dk

More information

Midterm Examination Practice

Midterm Examination Practice University of Illinois at Urbana-Champaign Midterm Examination Practice CS598CXZ Advanced Topics in Information Retrieval (Fall 2013) Professor ChengXiang Zhai 1. Basic IR evaluation measures: The following

More information