Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction
1 Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction Alexander Panchenko Université catholique de Louvain & Bauman Moscow State Technical University 5th December 2011 / CLAIM Seminar, BMSTU Alexander Panchenko 1/30
Plan
1 Introduction
2 Methodology
3 Evaluation
4 Results
5 Conclusion and Further Research
Reference Papers
Panchenko A. Method for Automatic Construction of Semantic Relations Between Concepts of an Information Retrieval Thesaurus. In Herald of the Voronezh State University, Series: Systems Analysis and Information Technologies, vol. 2, 2010.
Panchenko A. Comparison of the Knowledge-, Corpus-, and Web-based Similarity Measures for Semantic Relations Extraction. In Proceedings of the GEMS 2011 Workshop on Geometrical Models of Natural Language Semantics, EMNLP 2011, pages 11-21.
Panchenko A. Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction. Submitted to the Student Workshop of EACL.
Semantic Relations
r = ⟨c_i, t, c_j⟩ is a semantic relation, where c_i, c_j ∈ C and t ∈ T
C: terms, e.g. "radio" or "receiver operating characteristic"
T: semantic relation types, e.g. hyponymy or synonymy
R ⊆ C × T × C: a set of semantic relations
Semantic Relations Example: Thesaurus
Figure: A part of the information retrieval thesaurus EuroVoc.
R = { ⟨energy-generating product, NT, energy industry⟩,
⟨energy technology, NT, energy industry⟩,
⟨petroleum, RT, fossil fuel⟩,
⟨energy technology, RT, oil technology⟩, ... }
General Problem: Automatic Thesaurus Construction
Figure: A technology of automatic thesaurus construction.
How is the thesaurus used?
- Query expansion and query suggestion
- Navigation and browsing of the corpus
- Visualization of the corpus
The Problem: Semantic Relations Extraction
Input: terms C, semantic relation types T
Output: lexico-semantic relations ^R ⊆ R
Pattern-based relation extraction, with patterns built manually (Hearst, 1992) or semi-automatically (Snow, 2004):
(+) High precision
(-) Complexity and cost of pattern construction
(-) Patterns are highly task- and domain-dependent
Similarity-based relation extraction (Philippovich and Prokhorov, 2002; Grefenstette, 1994; Curran and Moens, 2002):
(-) Less precise
(+) Little or no manual work
(+) More adaptive across domains
Similarity-based Relation Extraction
State of the Art:
- There exist many heterogeneous similarity measures based on corpora, knowledge, the web, definitions, etc.
- Various measures provide complementary types of semantic information.
- This suggests their combination.
Research Questions:
- Which similarity measure is the best for relation extraction?
- How to efficiently combine similarity measures so as to improve relation extraction?
The Key Contributions Up To Now
- A protocol for evaluation of similarity-based relation extraction
- A comparison of 34 single measures
- Two methods of combination: similarity fusion and relation fusion
- Six best combinations outperforming the single measures are found
Similarity-based Semantic Relations Extraction
Semantic Relations Extraction Algorithm
Input: terms C, similarity parameters P, kNN threshold k, minimum similarity value γ
Output: semantic relations ^R (unlabeled)
1 S ← sim(C, P)
2 S ← normalize(S)
3 ^R ← threshold(S, k, γ)
4 return ^R
sim: a similarity measure
normalize: similarity score normalization
threshold: kNN thresholding,
^R = ∪_{i=1}^{|C|} { ⟨c_i, t, c_j⟩ : c_j ∈ top k% of terms most similar to c_i, s_ij ≥ γ }
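The kNN thresholding step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation; the function name `threshold_knn` and the toy similarity matrix are invented for the example:

```python
import numpy as np

def threshold_knn(S, k, gamma):
    """Keep, for each term c_i, relations to its top-k% most similar
    terms c_j whose normalized similarity s_ij is at least gamma."""
    n = S.shape[0]
    top = max(1, int(round(n * k / 100.0)))  # number of neighbours kept per term
    relations = set()
    for i in range(n):
        # indices of the top-k% neighbours of c_i, self excluded
        neighbours = [j for j in np.argsort(-S[i]) if j != i][:top]
        for j in neighbours:
            if S[i, j] >= gamma:
                relations.add((i, j))  # an unlabeled relation <c_i, t, c_j>
    return relations

# Toy normalized similarity matrix over 4 terms
S = np.array([[1.0, 0.9, 0.1, 0.2],
              [0.9, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.8],
              [0.2, 0.1, 0.8, 1.0]])
print(threshold_knn(S, k=25, gamma=0.5))
```

With k = 25% each term keeps its single nearest neighbour, so the two tight pairs (terms 0-1 and 2-3) are extracted in both directions.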
Knowledge-based Measures (6)
Data: semantic network WordNet 3.0, corpus SemCor.
Variables:
- len(c_i, c_j): length of the shortest path between terms c_i and c_j
- len(c_i, lcs(c_i, c_j)): length of the shortest path from c_i to the lowest common subsumer (LCS) of c_i and c_j
- len(c_root, lcs(c_i, c_j)): length of the shortest path from the root term c_root to the LCS of c_i and c_j
- P(c): probability of the term c, estimated from a corpus
- P(lcs(c_i, c_j)): probability of the LCS of c_i and c_j
Measures: Inverted Edge Count (Jurafsky and Martin, 2009), Leacock-Chodorow (1998), Wu-Palmer (1994), Resnik (1995), Jiang-Conrath (1997), Lin (1998).
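Several of these measures are simple closed-form functions of the variables listed above. A sketch of four of them, using their standard published formulas (the probabilities and depths passed in below are hypothetical values, not taken from WordNet/SemCor):

```python
import math

def resnik(p_lcs):
    # Resnik (1995): information content of the lowest common subsumer
    return -math.log(p_lcs)

def lin(p_i, p_j, p_lcs):
    # Lin (1998): IC of the LCS normalized by the terms' own IC
    return 2 * math.log(p_lcs) / (math.log(p_i) + math.log(p_j))

def jiang_conrath(p_i, p_j, p_lcs):
    # Jiang-Conrath (1997) distance: small when terms share a specific LCS
    return -math.log(p_i) - math.log(p_j) + 2 * math.log(p_lcs)

def wu_palmer(depth_lcs, depth_i, depth_j):
    # Wu-Palmer (1994): depth of the LCS relative to the terms' depths
    return 2.0 * depth_lcs / (depth_i + depth_j)

# Hypothetical corpus probabilities: two rare terms under an LCS with P = 0.01
print(round(lin(0.001, 0.002, 0.01), 3))
```

Note that Resnik, Lin, and Jiang-Conrath need the corpus (here SemCor) only for the probability estimates, while Wu-Palmer needs only the network structure.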
Web-based Measures (9)
Data: number of hits returned by an information retrieval system (GOOGLE, YAHOO, YAHOO BOSS, BING).
Variables:
- h_i: number of hits returned by the query "c_i"
- h_ij: number of hits returned by the query "c_i AND c_j"
Measures: NGD (Cilibrasi and Vitanyi, 2007), PMI-IR (Turney, 2001).
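Both web-based measures follow directly from the hit counts h_i, h_j, h_ij. A sketch using their standard published formulas; the hit counts and index size below are made-up numbers for illustration, not real search-engine results:

```python
import math

def ngd(h_i, h_j, h_ij, n):
    """Normalized Google Distance (Cilibrasi and Vitanyi, 2007) from the
    hit counts of the two queries, their joint query, and index size n."""
    log_hi, log_hj = math.log(h_i), math.log(h_j)
    return (max(log_hi, log_hj) - math.log(h_ij)) / \
           (math.log(n) - min(log_hi, log_hj))

def pmi_ir(h_i, h_j, h_ij, n):
    """PMI-IR (Turney, 2001): pointwise mutual information estimated from
    hit counts; positive when the terms co-occur more often than chance."""
    return math.log2((h_ij * n) / (h_i * h_j))

# Hypothetical hit counts for a related term pair on a 10^10-page index
print(ngd(1_000_000, 2_000_000, 500_000, 10**10))
print(pmi_ir(1_000_000, 2_000_000, 500_000, 10**10))
```

NGD is a distance (0 for identical distributions of hits), while PMI-IR is a similarity, so in practice one of them has to be inverted before the scores are normalized and thresholded.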
Corpus-based Measures (13)
Data: corpora WACYPEDIA (800M tokens) and UKWAC (2000M tokens).
Variables:
- f_i: context window feature vector of term c_i
- f_i^s: syntactic feature vector of c_i
Measures: BDA (Sahlgren, 2006), SDA (Curran, 2003), LSA on the TASA corpus (Landauer and Dumais, 1997), NGD and PMI-IR on the Factiva corpus (Veksler et al., 2008).
Corpus-based Measures: Distributional Analysis
Distributional Similarity Measure
Input: terms C, corpus D, number of features β, minimum term frequency θ, feature matrix construction parameters P
Output: similarity matrix S [C × C]
1 F ← construct_fmatrix(C, D, β, θ, P)
2 F ← pmi(F)
3 S ← cos(F)
4 return S
PMI normalization: f_ij = log( P(c_i, f_j) / (P(c_i) P(f_j)) )
Cosine similarity: s_ij = cos(f_i, f_j) = (f_i · f_j) / (||f_i|| ||f_j||)
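The PMI reweighting and cosine steps of this algorithm can be sketched directly from the two formulas. A minimal version under stated assumptions (the toy count matrix is invented; undefined log(0) cells are simply zeroed, one common convention):

```python
import numpy as np

def pmi_weight(F):
    """Reweight a term-by-feature co-occurrence count matrix F with
    pointwise mutual information: f_ij = log P(c_i,f_j)/(P(c_i)P(f_j))."""
    total = F.sum()
    p_term = F.sum(axis=1, keepdims=True) / total   # P(c_i)
    p_feat = F.sum(axis=0, keepdims=True) / total   # P(f_j)
    p_joint = F / total                             # P(c_i, f_j)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint / (p_term * p_feat))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts get PMI 0
    return pmi

def cosine_sim(F):
    """Pairwise cosine similarity between the rows of F."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Fn = F / norms
    return Fn @ Fn.T

# Toy counts: 3 terms x 4 context features
F = np.array([[4.0, 0.0, 1.0, 0.0],
              [3.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 5.0]])
S = cosine_sim(pmi_weight(F))
print(np.round(S, 2))
```

Terms 0 and 1 share the dominant feature and come out much more similar to each other than either is to term 2, which lives in disjoint contexts.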
Definition-based Measures (6)
Data: definitions from WordNet, Wikipedia, and Wiktionary.
Variables:
- gloss(c): definition of the term c
- sim(gloss(c_i), gloss(c_j)): similarity of the terms' glosses
- f_i: context vector of c_i, calculated on the corpus of all glosses
- f_i: bag-of-words vector, derived from the definition of c_i
- exist(c_i, c_j): a relation between c_i and c_j in the dictionary
Measures: BDA using Wiktionary and Wikipedia; Extended Lesk using WordNet (Banerjee and Pedersen, 2003); Gloss Vectors using WordNet (Patwardhan and Pedersen, 2006).
Definition-based Measures
Wiktionary-based Similarity Measure
Input: terms C, flag UseWikipedia, number of features β
Output: similarity matrix S [C × C]
1 D ← get_wiktionary_definitions(C)
2 if UseWikipedia then
3   D ← D ∪ get_wikipedia_definitions(C)
4 F ← construct_fmatrix(C, D, β)
5 F ← pmi(F)
6 S ← cos(F)
7 S ← update_similarity(S)
8 return S
Combined Measures
Similarity Fusion: S_cmb = (1/N) Σ_{i=1}^{N} S_i
Relation Fusion:
Relation fusion measure
Input: similarity matrices produced by N measures {S_1, ..., S_N}, kNN threshold k
Output: combined similarity matrix S_cmb
1 for i = 1, N do
2   ^R_i ← threshold(S_i, k, γ = 0)
3   R_i ← relation_matrix(^R_i)
4 S_cmb ← (1/N) Σ_{i=1}^{N} R_i
5 return S_cmb
where r_ij = 1 if ⟨c_i, t, c_j⟩ ∈ ^R_k, and 0 otherwise.
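The two fusion schemes can be sketched side by side. Similarity fusion averages the raw (normalized) scores; relation fusion first binarizes each measure via the kNN threshold and then averages the votes. A minimal illustration with invented toy matrices, not the author's code:

```python
import numpy as np

def similarity_fusion(matrices):
    """Similarity fusion: S_cmb = (1/N) * sum of N normalized matrices."""
    return sum(matrices) / len(matrices)

def relation_fusion(matrices, k, gamma=0.0):
    """Relation fusion: each measure votes. Binarize every matrix by
    keeping, per row, its top-k% highest-scoring cells (self excluded),
    then average the resulting 0/1 relation matrices."""
    n = matrices[0].shape[0]
    top = max(1, int(round(n * k / 100.0)))
    votes = np.zeros_like(matrices[0])
    for S in matrices:
        for i in range(n):
            neighbours = [j for j in np.argsort(-S[i]) if j != i][:top]
            for j in neighbours:
                if S[i, j] >= gamma:
                    votes[i, j] += 1.0
    return votes / len(matrices)

# Two toy normalized similarity matrices over 3 terms
S1 = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]])
S2 = np.array([[1.0, 0.8, 0.3], [0.8, 1.0, 0.1], [0.3, 0.1, 1.0]])
print(relation_fusion([S1, S2], k=34))
```

Relations that every measure proposes get score 1.0; relations proposed by only some measures get a fractional score, so the combined matrix can itself be thresholded again.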
Combined Measures
Which of the 34 single measures should we combine? We present combinations of three groups of measures:
Group4 = WN-Resnik, BDA, SDA, Def-WktWiki-1000
Group8 = Group4 + WN-WuPalmer, LSA-Tasa, Def-GlossVec., and Def-Ext.Lesk
Group14 = Group8 + WN-LeacockChodorow, WN-Lin, WN-JiangConrath, NGD-Factiva, NGD-Yahoo, and NGD-GoogleWiki.
Evaluation with Human Judgments
Each word pair ⟨c_i, c_j⟩ receives a human similarity score s, a measure similarity score ^s, and the corresponding human rank r and similarity rank ^r; example pairs: ⟨tiger, cat⟩, ⟨book, paper⟩, ⟨computer, keyboard⟩, ⟨possibility, girl⟩, ⟨sugar, approach⟩.
Human judgment datasets:
- WordSim353 (Finkelstein, 2002): 353 pairs
- Miller-Charles (1991): 30 pairs
- Rubenstein-Goodenough (1965): 65 pairs
Pearson's correlation: ρ = cov(s, ^s) / (σ(s) σ(^s))
Spearman's correlation: r = cov(r, ^r) / (σ(r) σ(^r))
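The two correlations differ only in that Spearman's is Pearson's formula applied to the ranks. A minimal self-contained sketch (the human and measure scores below are made-up numbers for five hypothetical word pairs; ties in the ranking are ignored for brevity):

```python
def pearson(x, y):
    """Pearson's rho: covariance of the scores over the product of
    their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

def ranks(x):
    # rank positions (1 = largest score); ties not handled in this sketch
    order = sorted(range(len(x)), key=lambda i: -x[i])
    r = [0] * len(x)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman(x, y):
    """Spearman's r: Pearson's correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical human scores vs. measure scores for five word pairs
human = [9.0, 7.5, 7.0, 2.0, 1.0]
measure = [0.80, 0.90, 0.60, 0.30, 0.10]
print(pearson(human, measure), spearman(human, measure))
```

Spearman's correlation is the more forgiving of the two here: it only penalizes the one swapped rank, while Pearson's also reacts to the non-linear spacing of the raw scores.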
Evaluation with Semantic Relations
Example relations ⟨target term c_i, relatum term c_j, relation type t⟩ for the target "judge": adjudicate (syn), arbitrate (syn), assessor (syn), chancellor (syn), gendarmerie (syn), sheriff (syn); pc (random), fare (random), lemon (random).
The number of correct and random relations is equal for each target term!
Semantic Relations Datasets:
- BLESS (Baroni and Lenci, 2011): relations of types hyper, coord, mero, event, attri, random
- SN (Panchenko, ?): relations of types syn, random
Evaluation with Semantic Relations
Let:
- R: all semantic relations which are not random
- ^R: extracted relations
- k: kNN threshold
Evaluation Metrics:
Precision = |R ∩ ^R| / |^R|
Recall = |R ∩ ^R| / |R|
F1 = 2 · Precision · Recall / (Precision + Recall)
MAP(M) = (1/M) Σ_{k=1}^{M} Precision(k)
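These metrics reduce to set operations on the extracted and gold relations. A short sketch, reusing the "judge" examples from the previous slide (which relations count as extracted here is invented for illustration):

```python
def evaluate(extracted, gold):
    """Precision, recall and F1 of extracted relations ^R against the
    non-random gold relations R, both given as sets of term pairs."""
    tp = len(extracted & gold)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def mean_average_precision(precisions):
    """MAP over the M values of the kNN threshold k."""
    return sum(precisions) / len(precisions)

gold = {("judge", "adjudicate"), ("judge", "arbitrate"), ("judge", "sheriff")}
extracted = {("judge", "adjudicate"), ("judge", "arbitrate"), ("judge", "lemon")}
print(evaluate(extracted, gold))
```

Two of the three extracted relations are correct and two of the three gold relations are found, so precision, recall, and F1 all equal 2/3 in this toy case.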
Example: Evaluation with Semantic Relations
Relations extracted for the target word "aficionado", ordered by decreasing similarity: enthusiast (syn), fan (syn), admirer (syn), addict (syn), devotee (syn), foundling (random), fanatic (syn), adherent (syn), capital (random), statute (random), blot (random), meddler (random), enlargement (random), bawdyhouse (random).
Precision(50%) is the precision over the top 50% of this ranked list.
Results on the Human Judgements Datasets
Results on the Semantic Relations Datasets
Precision-Recall Curves
Figure: PR graphs of (left) the best single and combined measures; (right) the Wiktionary measures.
Precision-Recall Curves
Figure: PR graph of four combined measures.
Conclusion
The best single measures:
- WordNet-based measure WN-Resnik
- Bag-of-words distributional measure BDA
- Syntactic distributional measure SDA
- Wiktionary measure Def-WktWiki-1000
The best combined measure:
- Relation fusion of 8 measures, Comb-Rel-810
- Very close to the combined measures using 14 measures
Further Research: More Sophisticated Combination Methods
Unsupervised feature combination:
- Bag-of-words features of Distributional Analysis + Wikipedia/Wiktionary/WordNet definitions
- Feature tensor: jointly co-occurring DA features, tensor decompositions for better fusion
- Similarity tensor: yet another similarity fusion technique
Supervised linear combination of pairwise similarities
Supervised linear combination of the features used by the single measures
Further Research: Evaluation
- Domain-specific terms and relations: Agrovoc, MeSH, etc.
- An application-based evaluation: query expansion
Methods:
- Corpus-based: DA with n-grams, surface patterns, LSA, LDA, syntactic tree kernels
- Web-based: more experiments with Google hits
- Knowledge-based: SimRank, random walks and the like on the Wikipedia/Wiktionary/WordNet category lattice
- Surface-based: edit distance, longest common substring, etc.
- Relation types: a supervised model trained on a set of hyponyms, synonyms, etc.
Questions
Thank you! Questions?
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 6: Numerical Linear Algebra: Applications in Machine Learning Cho-Jui Hsieh UC Davis April 27, 2017 Principal Component Analysis Principal
More informationPROBABILISTIC LATENT SEMANTIC ANALYSIS
PROBABILISTIC LATENT SEMANTIC ANALYSIS Lingjia Deng Revised from slides of Shuguang Wang Outline Review of previous notes PCA/SVD HITS Latent Semantic Analysis Probabilistic Latent Semantic Analysis Applications
More informationEvaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018
Evaluation Brian Thompson slides by Philipp Koehn 25 September 2018 Evaluation 1 How good is a given machine translation system? Hard problem, since many different translations acceptable semantic equivalence
More informationPV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211
PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211 IIR 18: Latent Semantic Indexing Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University,
More informationIII.6 Advanced Query Types
III.6 Advanced Query Types 1. Query Expansion 2. Relevance Feedback 3. Novelty & Diversity Based on MRS Chapter 9, BY Chapter 5, [Carbonell and Goldstein 98] [Agrawal et al 09] 123 1. Query Expansion Query
More informationVariable Latent Semantic Indexing
Variable Latent Semantic Indexing Prabhakar Raghavan Yahoo! Research Sunnyvale, CA November 2005 Joint work with A. Dasgupta, R. Kumar, A. Tomkins. Yahoo! Research. Outline 1 Introduction 2 Background
More informationHMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation
HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation Denis Turdakov and Dmitry Lizorkin Institute for System Programming of the Russian Academy of Sciences, 25 Solzhenitsina
More informationInvestigation of Latent Semantic Analysis for Clustering of Czech News Articles
Investigation of Latent Semantic Analysis for Clustering of Czech News Articles Michal Rott, Petr Červa Laboratory of Computer Speech Processing 4. 9. 2014 Introduction Idea of article clustering Presumptions:
More informationOntology-Based News Recommendation
Ontology-Based News Recommendation Wouter IJntema Frank Goossen Flavius Frasincar Frederik Hogenboom Erasmus University Rotterdam, the Netherlands frasincar@ese.eur.nl Outline Introduction Hermes: News
More informationCross-Lingual Language Modeling for Automatic Speech Recogntion
GBO Presentation Cross-Lingual Language Modeling for Automatic Speech Recogntion November 14, 2003 Woosung Kim woosung@cs.jhu.edu Center for Language and Speech Processing Dept. of Computer Science The
More informationAn Efficient Sliding Window Approach for Approximate Entity Extraction with Synonyms
An Efficient Sliding Window Approach for Approximate Entity Extraction with Synonyms Jin Wang (UCLA) Chunbin Lin (Amazon AWS) Mingda Li (UCLA) Carlo Zaniolo (UCLA) OUTLINE Motivation Preliminaries Framework
More information16 The Information Retrieval "Data Model"
16 The Information Retrieval "Data Model" 16.1 The general model Not presented in 16.2 Similarity the course! 16.3 Boolean Model Not relevant for exam. 16.4 Vector space Model 16.5 Implementation issues
More informationCollaborative NLP-aided ontology modelling
Collaborative NLP-aided ontology modelling Chiara Ghidini ghidini@fbk.eu Marco Rospocher rospocher@fbk.eu International Winter School on Language and Data/Knowledge Technologies TrentoRISE Trento, 24 th
More informationOutline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting
Outline for today Information Retrieval Efficient Scoring and Ranking Recap on ranked retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Efficient
More informationA Neural Passage Model for Ad-hoc Document Retrieval
A Neural Passage Model for Ad-hoc Document Retrieval Qingyao Ai, Brendan O Connor, and W. Bruce Croft College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA,
More informationKnowledge Discovery in Data: Overview. Naïve Bayesian Classification. .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..
Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar Knowledge Discovery in Data: Naïve Bayes Overview Naïve Bayes methodology refers to a probabilistic approach to information discovery
More informationFall CS646: Information Retrieval. Lecture 6 Boolean Search and Vector Space Model. Jiepu Jiang University of Massachusetts Amherst 2016/09/26
Fall 2016 CS646: Information Retrieval Lecture 6 Boolean Search and Vector Space Model Jiepu Jiang University of Massachusetts Amherst 2016/09/26 Outline Today Boolean Retrieval Vector Space Model Latent
More informationLatent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology
Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,
More informationInformation Retrieval
https://vvtesh.sarahah.com/ Information Retrieval Venkatesh Vinayakarao Term: Aug Dec, 2018 Indian Institute of Information Technology, Sri City Characteristic vectors representing code are often high
More informationLatent semantic indexing
Latent semantic indexing Relationship between concepts and words is many-to-many. Solve problems of synonymy and ambiguity by representing documents as vectors of ideas or concepts, not terms. For retrieval,
More informationCollaborative topic models: motivations cont
Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.
More informationManning & Schuetze, FSNLP (c) 1999,2000
558 15 Topics in Information Retrieval (15.10) y 4 3 2 1 0 0 1 2 3 4 5 6 7 8 Figure 15.7 An example of linear regression. The line y = 0.25x + 1 is the best least-squares fit for the four points (1,1),
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search IR models: Vector Space Model IR Models Set Theoretic Classic Models Fuzzy Extended Boolean U s e r T a s k Retrieval: Adhoc Filtering Brosing boolean vector probabilistic
More informationCS-E4830 Kernel Methods in Machine Learning
CS-E4830 Kernel Methods in Machine Learning Lecture 5: Multi-class and preference learning Juho Rousu 11. October, 2017 Juho Rousu 11. October, 2017 1 / 37 Agenda from now on: This week s theme: going
More informationClick Models for Web Search
Click Models for Web Search Lecture 1 Aleksandr Chuklin, Ilya Markov Maarten de Rijke a.chuklin@uva.nl i.markov@uva.nl derijke@uva.nl University of Amsterdam Google Research Europe AC IM MdR Click Models
More informationCan Vector Space Bases Model Context?
Can Vector Space Bases Model Context? Massimo Melucci University of Padua Department of Information Engineering Via Gradenigo, 6/a 35031 Padova Italy melo@dei.unipd.it Abstract Current Information Retrieval
More informationData Mining Recitation Notes Week 3
Data Mining Recitation Notes Week 3 Jack Rae January 28, 2013 1 Information Retrieval Given a set of documents, pull the (k) most similar document(s) to a given query. 1.1 Setup Say we have D documents
More informationPredicting Neighbor Goodness in Collaborative Filtering
Predicting Neighbor Goodness in Collaborative Filtering Alejandro Bellogín and Pablo Castells {alejandro.bellogin, pablo.castells}@uam.es Universidad Autónoma de Madrid Escuela Politécnica Superior Introduction:
More informationSupervised Metric Learning with Generalization Guarantees
Supervised Metric Learning with Generalization Guarantees Aurélien Bellet Laboratoire Hubert Curien, Université de Saint-Etienne, Université de Lyon Reviewers: Pierre Dupont (UC Louvain) and Jose Oncina
More informationVector Space Model. Yufei Tao KAIST. March 5, Y. Tao, March 5, 2013 Vector Space Model
Vector Space Model Yufei Tao KAIST March 5, 2013 In this lecture, we will study a problem that is (very) fundamental in information retrieval, and must be tackled by all search engines. Let S be a set
More informationLecture 5: Web Searching using the SVD
Lecture 5: Web Searching using the SVD Information Retrieval Over the last 2 years the number of internet users has grown exponentially with time; see Figure. Trying to extract information from this exponentially
More informationAn Introduction to String Re-Writing Kernel
An Introduction to String Re-Writing Kernel Fan Bu 1, Hang Li 2 and Xiaoyan Zhu 3 1,3 State Key Laboratory of Intelligent Technology and Systems 1,3 Tsinghua National Laboratory for Information Sci. and
More informationFrom ITDL to Place2Vec Reasoning About Place Type Similarity and Relatedness by Learning Embeddings From Augmented Spatial Contexts
From ITDL to Place2Vec Reasoning About Place Type Similarity and Relatedness by Learning Embeddings From Augmented Spatial Contexts ABSTRACT Bo Yan STKO Lab University of California, Santa Barbara boyan@geog.ucsb.edu
More information13 Searching the Web with the SVD
13 Searching the Web with the SVD 13.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this
More informationPrincipal Component Analysis and Singular Value Decomposition. Volker Tresp, Clemens Otte Summer 2014
Principal Component Analysis and Singular Value Decomposition Volker Tresp, Clemens Otte Summer 2014 1 Motivation So far we always argued for a high-dimensional feature space Still, in some cases it makes
More informationNatural Language Processing
David Packard, A Concordance to Livy (1968) Natural Language Processing Info 159/259 Lecture 8: Vector semantics and word embeddings (Sept 18, 2018) David Bamman, UC Berkeley 259 project proposal due 9/25
More informationFielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data Nikita Zhiltsov 1,2 Alexander Kotov 3 Fedor Nikolaev 3 1 Kazan Federal University 2 Textocat 3 Textual Data Analytics
More informationComputer science research seminar: VideoLectures.Net recommender system challenge: presentation of baseline solution
Computer science research seminar: VideoLectures.Net recommender system challenge: presentation of baseline solution Nino Antulov-Fantulin 1, Mentors: Tomislav Šmuc 1 and Mile Šikić 2 3 1 Institute Rudjer
More informationNearest Neighbor Search with Keywords
Nearest Neighbor Search with Keywords Yufei Tao KAIST June 3, 2013 In recent years, many search engines have started to support queries that combine keyword search with geography-related predicates (e.g.,
More informationORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation
ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation Chin-Yew Lin and Franz Josef Och Information Sciences Institute University of Southern California 4676 Admiralty Way
More informationSimilarity for Conceptual Querying
Similarity for Conceptual Querying Troels Andreasen, Henrik Bulskov, and Rasmus Knappe Department of Computer Science, Roskilde University, P.O. Box 260, DK-4000 Roskilde, Denmark {troels,bulskov,knappe}@ruc.dk
More informationMidterm Examination Practice
University of Illinois at Urbana-Champaign Midterm Examination Practice CS598CXZ Advanced Topics in Information Retrieval (Fall 2013) Professor ChengXiang Zhai 1. Basic IR evaluation measures: The following
More information