Semantic distances & LSA


Stefan Trausan-Matu
"Politehnica" University of Bucharest and Romanian Academy Research Institute for Artificial Intelligence, Bucharest, Romania
stefan.trausan@cs.pub.ro
http://www.racai.ro/~trausan

Lexical chains

Venture capitalists have become culture heroes in the New Economy. They're the visionary investors who sink millions of dollars into risky start-up companies, take seats on their boards to mentor the tech nerds--and then cash out at a huge profit. The name "Sand Hill Road," the street in Menlo Park where Kleiner Perkins and some of the other famous venture capitalists have their offices, has assumed an almost magical quality, as if the partners were dispensing fairy dust rather than cash.

29-Apr-11, S. Trausan-Matu

Applications of lexical chains
Verification of text cohesion
Text segmentation
Summarization
Word sense disambiguation
Determination of discourse structure
Automatic hypertext generation
Intelligent spelling checking
Information retrieval

Building lexical chains
Text scanning and detection of semantically related words (small distance)
Constructing a set of potential lexical chains
Computational problems
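The scanning step can be illustrated with a greedy sketch. This is only one simple strategy, not necessarily the one intended on the slide, which speaks of a set of potential chains. The `RELATED` pairs and the function names are hypothetical; a real system would ask a thesaurus or WordNet whether two words are close.

```python
# Hypothetical relatedness pairs; a real system would query a thesaurus or WordNet.
RELATED = {
    frozenset(("investor", "capitalist")),
    frozenset(("investor", "profit")),
    frozenset(("cash", "profit")),
    frozenset(("street", "road")),
}

def related(w1, w2):
    """True if the two words are identical or listed as semantically related."""
    return w1 == w2 or frozenset((w1, w2)) in RELATED

def build_chains(words):
    """Greedy chaining: add each word to the first chain that holds a related word."""
    chains = []
    for word in words:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            # No existing chain accepts the word, so it starts a new chain.
            chains.append([word])
    return chains

chains = build_chains(["capitalist", "investor", "street", "profit", "road", "cash"])
print(chains)
```

A greedy pass commits each word to the first matching chain, so it cannot revise an early wrong choice; keeping several candidate chains per word avoids this at a higher computational cost.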

Low semantic distance = high similarity
bank - money, bank - river, apple - fruit, orange - fruit, pen - paper, pen - pencil, men - crowd, red - blue, man - human, woman - human, man - men, men - women, man - bike, wheel - bike, sweet - bitter, sweet - dessert, desert - storm, desert - defect, singer - song, hacker - soft

Closeness relations
Synonym
Hyponym/hypernym
Meronym/holonym
Antonym
Entailment
Typicality

High semantic distance
bike - cat, software - dog, dog - amoeba, idea - sleep, runner - paint

Semantic distance according to:
Dictionaries (Kozima and Furugori; Kozima and Ito)
Thesauri (e.g. Roget: Morris and Hirst; Bunrui Goihyo, a Japanese thesaurus: Okumura and Honda)
Semantic networks (e.g. MeSH, Medical Subject Headings: Rada)
Ontologies
WordNet, FrameNet

Chains of WordNet senses
[venture capitalist(1), hero(1), visionary(1), investor(1), mentor(1), nerd(1), profit(2), venture capitalist(1), quality(1), partner(1), fairy(1)]

Rada et al.
\[ \mathrm{dist}_R(c_1, c_2) = \text{minimum number of edges between } c_1 \text{ and } c_2 \]
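As a minimal sketch of the edge-counting distance, a breadth-first search over an undirected is-a graph returns the minimum number of edges. The toy taxonomy `PARENT` and the function names are assumptions made for illustration; they are not WordNet data.

```python
from collections import deque

# Hypothetical toy is-a taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "amoeba": "animal",
    "animal": "entity",
    "bike": "vehicle",
    "vehicle": "entity",
}

def neighbours(node):
    """Nodes one edge away: all children plus the parent, if any."""
    result = [child for child, parent in PARENT.items() if parent == node]
    if node in PARENT:
        result.append(PARENT[node])
    return result

def dist_rada(c1, c2):
    """Minimum number of edges between c1 and c2 (breadth-first search)."""
    queue = deque([(c1, 0)])
    seen = {c1}
    while queue:
        node, dist = queue.popleft()
        if node == c2:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two concepts are not connected

print(dist_rada("dog", "cat"), dist_rada("dog", "bike"))
```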

Hirst and St-Onge
The Morris and Hirst approach, applied to WordNet
Directions:
downward (cause, hyponym, holonym, entailment)
upward (hypernym, meronym)
horizontal (similar, participle_of, see_also, antonym, attribute)

Hirst and St-Onge
Relations:
extra-strong: word repetition
strong: same synset (desert - defect), antonyms (cold - hot), sub-phrase (school - elementary school)
medium-strength

Medium strength
\[
\mathrm{rel}_{HS}(c_1, c_2) =
\begin{cases}
3C & \text{for extra-strong relations} \\
2C & \text{for strong relations} \\
C - \mathrm{path\_length} - k \times \#\mathrm{changes\_in\_direction} & \text{for medium-strength relations} \\
0 & \text{otherwise}
\end{cases}
\]
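The piecewise weight can be written as a small function. This is a sketch only: the slide does not give values for the constants, so the defaults `C=8` and `k=1` are assumptions, and the string labels for the relation kinds are hypothetical.

```python
def rel_hs(kind, path_length=0, direction_changes=0, C=8, k=1):
    """Hirst and St-Onge relation weight.

    kind: "extra-strong", "strong", "medium", or anything else for no relation.
    C and k are constants; the defaults here are assumed, not taken from the slide.
    """
    if kind == "extra-strong":
        return 3 * C
    if kind == "strong":
        return 2 * C
    if kind == "medium":
        # Longer paths and more changes of direction lower the weight.
        return C - path_length - k * direction_changes
    return 0

print(rel_hs("extra-strong"), rel_hs("strong"), rel_hs("medium", 3, 1), rel_hs("none"))
```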

Forbidden sequences
No other direction can precede an upward direction
Only one direction change is allowed
Exception: upward - horizontal - downward
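One way to read the three rules is as a check on the sequence of link directions along a path. The sketch below encodes each link as a letter (`U` upward, `D` downward, `H` horizontal); this encoding and the function name are assumptions made for illustration.

```python
from itertools import groupby

def is_allowed(path):
    """Check a path given as a string over 'U', 'D', 'H', one letter per link."""
    # Collapse consecutive links in the same direction: "UUHD" -> "UHD".
    runs = "".join(key for key, _ in groupby(path))
    if not runs:
        return False
    # Rule 1: no other direction can precede an upward direction.
    if "U" in runs[1:]:
        return False
    # Rule 2: at most one change of direction ...
    if len(runs) <= 2:
        return True
    # ... with one exception: upward - horizontal - downward.
    return runs == "UHD"

for candidate in ["U", "UD", "UH", "UHD", "D", "DH", "HD", "H", "DU", "HU", "DHD"]:
    print(candidate, is_allowed(candidate))
```

Under this reading, exactly eight direction patterns pass the check: U, UD, UH, UHD, D, DH, HD and H.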

Forbidden sequences (figure)

Allowed sequences (figure)

Sussna
The weight of an edge depends on the fanout nr(c): the number of arcs leaving c.
\[ w(c_1 \to c_2) = 2 - \frac{1}{nr(c_1)} \]
If c1 and c2 are adjacent:
\[ \mathrm{dist}_S(c_1, c_2) = \frac{w(c_1 \to c_2) + w(c_2 \to c_1)}{2d} \]
where d is the depth of the edge in the ontology.
If they are not adjacent: add the distances along the shortest path.
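A short numeric sketch of the Sussna weights, assuming the fanout of each node and the depth of each edge are already known. The function names and the example numbers are hypothetical.

```python
def edge_weight(fanout):
    """w(c1 -> c2) = 2 - 1/nr(c1): between 1 (fanout 1) and almost 2 (large fanout)."""
    return 2 - 1 / fanout

def dist_adjacent(fanout_c1, fanout_c2, depth):
    """Distance between two adjacent concepts joined by an edge at the given depth."""
    return (edge_weight(fanout_c1) + edge_weight(fanout_c2)) / (2 * depth)

def dist_path(edges):
    """Distance between non-adjacent concepts: sum over the edges of the shortest path.

    edges: list of (fanout_c1, fanout_c2, depth) triples, one per edge.
    """
    return sum(dist_adjacent(f1, f2, d) for f1, f2, d in edges)

# The same pair of fanouts gives a smaller distance deeper in the ontology.
print(dist_adjacent(2, 4, 1), dist_adjacent(2, 4, 3))
```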

Wu and Palmer
c = lso(c1, c2), where lso = lowest superordinate
\[ \mathrm{sim}_{WP}(c_1, c_2) = \frac{2N}{N_1 + N_2 + 2N} \]
N_i is the length of the path from c_i to c
N is the length of the path from c to the root
\[ \mathrm{dist}_{WP}(c_1, c_2) = \frac{N_1 + N_2}{N_1 + N_2 + 2N} \]
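The Wu and Palmer measure can be sketched on a toy tree. The taxonomy `PARENT` and all function names are hypothetical; path lengths are counted in edges, which is an assumption about how N, N1 and N2 are measured.

```python
# Hypothetical toy is-a taxonomy (child -> parent); "entity" is the root.
PARENT = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "amoeba": "animal", "animal": "entity",
    "bike": "vehicle", "vehicle": "entity",
}

def ancestors(c):
    """Return [c, parent(c), ..., root]."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lso(c1, c2):
    """Lowest superordinate: first ancestor of c1 that is also an ancestor of c2."""
    up2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in up2)

def sim_wp(c1, c2):
    c = lso(c1, c2)
    n1 = ancestors(c1).index(c)   # edges from c1 up to c
    n2 = ancestors(c2).index(c)   # edges from c2 up to c
    n = len(ancestors(c)) - 1     # edges from c up to the root
    total = n1 + n2 + 2 * n
    # Both concepts being the root gives 0/0; treat that case as identical.
    return 1.0 if total == 0 else 2 * n / total

def dist_wp(c1, c2):
    return 1.0 - sim_wp(c1, c2)

print(sim_wp("dog", "cat"), sim_wp("dog", "amoeba"), sim_wp("dog", "bike"))
```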

Leacock and Chodorow
\[ \mathrm{sim}_{LC}(c_1, c_2) = -\log \frac{1 + \mathrm{length}(c_1, c_2)}{2D} \]
length(c1, c2) is Rada's shortest path
D is the height of the ontology
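A minimal sketch of the Leacock and Chodorow formula on a toy tree. The taxonomy is hypothetical, and D is taken here as the longest leaf-to-root path counted in edges, which is an assumption; the natural logarithm is also assumed.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); "entity" is the root.
PARENT = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "amoeba": "animal", "animal": "entity",
    "bike": "vehicle", "vehicle": "entity",
}

def ancestors(c):
    """Return [c, parent(c), ..., root]."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def path_length(c1, c2):
    """Shortest path in edges; in a tree it passes through the lowest common ancestor."""
    up1, up2 = ancestors(c1), ancestors(c2)
    common = next(a for a in up1 if a in up2)
    return up1.index(common) + up2.index(common)

# Height of the taxonomy: longest path from any node to the root (assumed definition).
D = max(len(ancestors(c)) - 1 for c in PARENT)

def sim_lc(c1, c2):
    return -math.log((1 + path_length(c1, c2)) / (2 * D))

print(sim_lc("dog", "cat"), sim_lc("dog", "bike"))
```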

Resnik
The similarity of two concepts is their shared information: the information content of their lowest superordinate.
The information content of c is given by -log p(c): the less frequent a concept is, the more information it contains.
\[ \mathrm{sim}_R(c_1, c_2) = -\log p(\mathrm{lso}(c_1, c_2)) \]

Lin
The similarity between arbitrary objects A and B is measured by the ratio between the amount of information needed to state their commonality and that needed to fully describe what they are.

Lin
\[ \mathrm{sim}_L(c_1, c_2) = \frac{2 \log p(\mathrm{lso}(c_1, c_2))}{\log p(c_1) + \log p(c_2)} \]

Jiang and Conrath
\[ \mathrm{dist}_{JC}(c_1, c_2) = 2 \log p(\mathrm{lso}(c_1, c_2)) - (\log p(c_1) + \log p(c_2)) \]
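The three information-content measures (Resnik, Lin, Jiang and Conrath) share the same ingredients, so they can be compared side by side. The sketch below uses made-up concept probabilities in `P` and a hypothetical four-level taxonomy; base-2 logarithms are an assumption chosen to give round numbers.

```python
import math

# Hypothetical concept probabilities; a concept's probability includes its descendants,
# so values grow towards the root, where p = 1.
P = {"entity": 1.0, "animal": 0.5, "mammal": 0.25, "dog": 0.125, "cat": 0.0625}
PARENT = {"dog": "mammal", "cat": "mammal", "mammal": "animal", "animal": "entity"}

def ancestors(c):
    """Return [c, parent(c), ..., root]."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lso(c1, c2):
    """Lowest superordinate of c1 and c2."""
    up2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in up2)

def log_p(c):
    return math.log2(P[c])

def sim_resnik(c1, c2):
    return -log_p(lso(c1, c2))

def sim_lin(c1, c2):
    # Undefined (0/0) when both concepts are the root; not handled in this sketch.
    return 2 * log_p(lso(c1, c2)) / (log_p(c1) + log_p(c2))

def dist_jc(c1, c2):
    return 2 * log_p(lso(c1, c2)) - (log_p(c1) + log_p(c2))

print(sim_resnik("dog", "cat"), sim_lin("dog", "cat"), dist_jc("dog", "cat"))
```

Resnik's value depends only on the shared ancestor, while Lin and Jiang and Conrath also take the information content of the two concepts themselves into account.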

Problems in detecting lexical chains
Under-chaining: the difficulty of finding semantically related words
Over-chaining: wrongly chaining words that are semantically unrelated

Under-chaining: named entity recognition
(illustrated on the venture capitalists passage from the "Lexical chains" slide)

Under-chaining: coreference resolution
(illustrated on the venture capitalists passage from the "Lexical chains" slide)

Over-chaining
(illustrated on the venture capitalists passage from the "Lexical chains" slide)

Semantic spaces in Latent Semantic Indexing (LSI)

Vector space model (figure)

The LSI idea
Reducing the dimensionality of the vector space, similarly to the least squares method
The effect is the creation of semantic spaces containing semantically related words
http://lsa.colorado.edu

Terms-documents array (example from Manning and Schütze, 1999)

A =
            d1  d2  d3  d4  d5  d6
cosmonaut    1   0   1   0   0   0
astronaut    0   1   0   0   0   0
moon         1   1   0   0   0   0
car          1   0   0   1   1   0
truck        0   0   0   1   0   1

Singular value decomposition (SVD)
\[ A_{t \times d} = T_{t \times n} \, S_{n \times n} \, (D_{d \times n})^{T}, \qquad n = \min(t, d) \]

T =
            Dim1   Dim2   Dim3   Dim4   Dim5
cosmonaut  -0.44  -0.30   0.57   0.58   0.25
astronaut  -0.13  -0.33  -0.59   0.00   0.73
moon       -0.48  -0.51  -0.37   0.00  -0.61
car        -0.70   0.35   0.15  -0.58   0.16
truck      -0.26   0.65  -0.41   0.58  -0.09

S = diag(2.16, 1.59, 1.28, 1.00, 0.39)

D^T =
         d1     d2     d3     d4     d5     d6
Dim1  -0.75  -0.28  -0.20  -0.45  -0.33  -0.12
Dim2  -0.29  -0.53  -0.19   0.63   0.22   0.41
Dim3   0.28  -0.75   0.45  -0.20   0.12  -0.33
Dim4   0.00   0.00   0.58   0.00  -0.58   0.58
Dim5  -0.53   0.29   0.63   0.19   0.41  -0.22

Properties of SVD
SVD is unique (up to sign changes of the singular vectors)
T and D are orthonormal: \( T^{T} T = D^{T} D = I \)
The values of S are sorted in decreasing order
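These properties can be checked numerically on the term-document example. The sketch assumes NumPy is available and uses `np.linalg.svd`, which returns the transposed document matrix directly (named `Dt` here); individual columns may come out with the opposite sign to a printed table, since the decomposition is only unique up to sign.

```python
import numpy as np

# Term-document matrix of the Manning and Schütze (1999) example.
# Rows: cosmonaut, astronaut, moon, car, truck; columns: d1..d6.
A = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

# Reduced SVD: T is 5x5, s holds the 5 singular values, Dt is 5x6.
T, s, Dt = np.linalg.svd(A, full_matrices=False)
S = np.diag(s)

print(np.round(s, 2))                      # singular values, largest first
print(np.allclose(T @ S @ Dt, A))          # A = T S D^T
print(np.allclose(T.T @ T, np.eye(5)))     # T^T T = I
print(np.allclose(Dt @ Dt.T, np.eye(5)))   # D^T D = I
```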

Reduced A
\[ \Delta = \lVert A - \hat{A} \rVert_{2} \]
where \( \hat{A} \) is the rank-k approximation of A that minimizes \( \Delta \)
SVD maps the n-dimensional space onto a k-dimensional one, with n >> k
Common values for k are 100 and 150.
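A sketch of the reduction step on the small example, assuming NumPy and k = 2 (real collections would use a much larger k). Keeping the k largest singular values gives the reduced matrix, and its error in the 2-norm equals the first discarded singular value.

```python
import numpy as np

# Rows: cosmonaut, astronaut, moon, car, truck; columns: d1..d6.
A = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

T, s, Dt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of dimensions kept (illustrative choice)
A_k = T[:, :k] @ np.diag(s[:k]) @ Dt[:k, :]   # rank-k approximation of A

delta = np.linalg.norm(A - A_k, 2)            # spectral (2-) norm of the error
print(np.round(A_k, 2))
print(delta, s[k])                            # the error equals the (k+1)-th singular value
```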

B
\[ B = S_{2 \times 2} \, D^{T}_{2 \times d} \]

B =
         d1     d2     d3     d4     d5     d6
Dim1  -1.62  -0.60  -0.04  -0.97  -0.71  -0.26
Dim2  -0.46  -0.84  -0.30   1.00   0.35   0.65

Document correlation (Manning and Schütze, 1999)
\[ A^{T}A = (T S D^{T})^{T} (T S D^{T}) = D S^{T} T^{T} T S D^{T} = (D S)(D S)^{T} = (S D^{T})^{T} (S D^{T}) = B^{T} B \]

B^T B (columns of B normalized to unit length) =
        d1     d2     d3     d4     d5     d6
d1    1.00
d2    0.78   1.00
d3    0.40   0.88   1.00
d4    0.47  -0.18  -0.62   1.00
d5    0.74   0.16  -0.32   0.94   1.00
d6    0.10  -0.54  -0.87   0.93   0.74   1.00
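A sketch of how document correlations in the reduced space can be computed, assuming NumPy and k = 2. The columns of B are normalized to unit length so that each document has correlation 1.00 with itself; this normalization is an assumption about how such a table is obtained. The result does not depend on the sign conventions of the SVD routine.

```python
import numpy as np

# Rows: cosmonaut, astronaut, moon, car, truck; columns: d1..d6.
A = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

T, s, Dt = np.linalg.svd(A, full_matrices=False)

k = 2
B = np.diag(s[:k]) @ Dt[:k, :]            # documents in the k-dimensional space (2 x 6)
B_unit = B / np.linalg.norm(B, axis=0)    # scale every document vector to length 1
corr = B_unit.T @ B_unit                  # 6 x 6 matrix of cosines between documents

print(np.round(corr, 2))
```

Documents d4 and d5 share only the term "car" in A, yet their vectors are almost parallel in the reduced space.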

Term correlation
\[ A A^{T} = T S D^{T} (T S D^{T})^{T} = T S D^{T} D S^{T} T^{T} = (T S)(T S)^{T} \]
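The identity can be verified numerically, and the same rows of TS, truncated to k dimensions, give term similarities in the reduced space. This is a sketch assuming NumPy and k = 2; the helper `cosine` is a hypothetical name.

```python
import numpy as np

# Rows: cosmonaut, astronaut, moon, car, truck; columns: d1..d6.
A = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

T, s, Dt = np.linalg.svd(A, full_matrices=False)
TS = T @ np.diag(s)

# Full-rank check of the identity A A^T = (T S)(T S)^T.
identity_holds = np.allclose(A @ A.T, TS @ TS.T)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

k = 2
terms_k = TS[:, :k]                            # terms in the k-dimensional space
raw = (A @ A.T)[0, 1]                          # cosmonaut . astronaut in the original space
reduced = cosine(terms_k[0], terms_k[1])       # same pair after the reduction

print(identity_holds, raw, round(reduced, 2))
```

"cosmonaut" and "astronaut" never occur in the same document, so their original dot product is 0, but they become similar in the reduced space because both co-occur with "moon".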