ON SEMANTIC CONCEPT SIMILARITY METHODS

Lu Yang*, Virendra Bhavsar* and Harold Boley**
*Faculty of Computer Science, University of New Brunswick, Fredericton, NB, E3B 5A3, Canada
**Institute for Information Technology, National Research Council Canada, Fredericton, NB, E3B 9W4, Canada
{lyang, bhavsar} AT unb.ca, harold.boley AT nrc.gc.ca

ABSTRACT
Semantic matching is important in research areas including databases, e-Business, case-based reasoning (CBR), information retrieval, information integration, web services and natural language processing. In particular, semantic concept matching using taxonomies is an important technique in these areas. We present a survey of semantic concept similarity algorithms that can be used for concept matching in taxonomies. We also propose a new approach that provides a finer concept granularity measure.

Keywords: semantic matching, concept similarity, taxonomy, depth scaling, shortest path length, information content.

1. INTRODUCTION
Syntactic matching algorithms match strings without considering any domain knowledge. However, the semantics of words/strings vary based on their contexts. For example, the strings "wooden chair" and "committee chair" use the same word "chair" with totally different semantics. Syntactic matching cannot distinguish such semantic differences and may lead to inaccurate or even wrong matches in document retrieval, query processing systems, and information integration systems. Therefore, semantic matching techniques, such as schema matching and concept matching in a taxonomy/ontology, are required. This paper focuses on concept matching algorithms for taxonomies.

A taxonomy structures a body of knowledge into classes (concepts) and defines relationships among them, usually the subsumption (subclass-of) relationship [1]. A taxonomy is often represented as a tree or a directed acyclic graph (DAG). Each node in a taxonomy tree (DAG) is a concept that might have superconcepts and subconcepts. Concept similarity algorithms that aim to match concepts in a taxonomy can be divided into two categories: (a) the edge-based approaches that consider the shortest path length [2] between two concepts, and (b) the node-based approaches that make use of the node information content [3].
Combinations of these two approaches have resulted in better matches, for example the work reported in [4] [5]. Enhancements to the two basic approaches take into account various factors such as link types, link strengths, edge weights [6] [7], local densities [5] and node depths [4]. We improve on the previous depth scaling methods by a finer granularity similarity measure of two concepts. We define the concept similarity in a taxonomy as follows.

Definition 1. Given a taxonomy T and a set C that contains all of the concepts in T, the similarity sim(c1, c2) of two concepts c1 ∈ C and c2 ∈ C is a mapping sim: C × C → [0.0, 1.0] (real interval).

Although there are several possible link types in a taxonomy, such as subclass-of/subsumes, part-of/has-part, type-of/has-type, and instance-of/has-instance, the subclass-of relationship is the most important one among them. This is the reason why many researchers pay more attention to taxonomies that only have the subclass-of type of links.

To make experimental results comparable, many researchers use WordNet [8] as their semantic knowledge base. WordNet is the product of a research project at Princeton University which has attempted to model the lexical knowledge of a native speaker of English. The information in WordNet is organized around logical groupings called synsets. Each synset consists of synonymous word forms and semantic pointers that describe relationships between the current synset and other synsets.
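As a minimal sketch of the notions above, a tree-shaped taxonomy can be stored as a child-to-parent map, from which superconcepts and the subsumption test follow directly. The concept names below (including the two senses of "chair" from the introduction) are hypothetical and are not taken from WordNet; a DAG would need a set of parents per concept instead of a single one.

```python
# Hypothetical toy taxonomy: each concept maps to its direct superconcept
# (None marks the root). Only subclass-of links are modelled.
PARENT = {
    "entity": None,
    "furniture": "entity",
    "person": "entity",
    "chair (seat)": "furniture",
    "table": "furniture",
    "chair (chairperson)": "person",
}

def superconcepts(concept):
    """Return all concepts above `concept`, nearest first, ending at the root."""
    result = []
    parent = PARENT[concept]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result

def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its superconcepts."""
    return general == specific or general in superconcepts(specific)

print(superconcepts("chair (seat)"))                 # ['furniture', 'entity']
print(subsumes("furniture", "chair (chairperson)"))  # False
```

In this toy tree the two senses of "chair" share only the root as a common superconcept, which is the kind of distinction a purely syntactic match on the string "chair" cannot make.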

Since the distances between arbitrary pairs of adjacent concepts are not necessarily equal, edge-weight assignment algorithms have been proposed to reveal the association degree of parent-child concepts [4] [6].

Section 2 presents previous concept similarity algorithms using taxonomies. The edge-weight assignment is discussed in Section 3. We present our improvement on the relative-depth scaling in Section 4. Finally, we give the conclusion in Section 5.

2. CONCEPT SIMILARITY ALGORITHMS
The semantic similarity of a pair of concepts in a taxonomy is computed considering the path lengths, edge weights, node information contents, and link types. The edge-based approaches are based on the shortest path length of two concepts. The node-based approaches explore the information content of nodes to compute the concept similarity. The edges in a taxonomy might have arbitrary weights due to various factors such as their local densities, link types, node depths, and link strengths. These two approaches are briefly reviewed in subsections 2.1 and 2.2. A technique by Li et al. [5] that combines the two approaches is described in subsection 2.3.

2.1. Edge-Based Approaches
Edge counting is an intuitive and natural way to measure the semantic similarity of nodes which correspond to concepts in a taxonomy. The shorter the path length between them, the more similar they are. For a taxonomy for the biomedical domain, Rada et al. [2] proposed that the concept distance between nodes is a metric which satisfies the zero property, symmetry property, and triangle inequality. The concept distance is calculated by counting the minimum number of edges separating two concepts. It is formally defined as follows:

Distance(A, B) = minimum number of edges separating a and b    (1)

where A and B are two concepts represented by the nodes a and b in an is-a semantic net.

The similarity measure introduced by Leacock and Chodorow [9] normalizes the shortest path length of two concepts by the maximum depth in the taxonomy:

$$\mathrm{sim}(c_1, c_2) = -\log\frac{\mathit{length}}{2D} \qquad (2)$$

In equation (2), length represents the shortest path length between two concepts and D stands for the maximum depth of the taxonomy.
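Equations (1) and (2) can be sketched with a breadth-first search over the is-a links. The taxonomy, the function names and the use of the natural logarithm are assumptions made for illustration; the text above does not fix the base of the logarithm, and path length is counted in edges here.

```python
import math
from collections import deque

# Hypothetical is-a taxonomy, listed as (child, parent) links.
EDGES = [
    ("vehicle", "entity"), ("animal", "entity"),
    ("car", "vehicle"), ("bicycle", "vehicle"),
    ("dog", "animal"),
]

def build_graph(edges):
    """Undirected adjacency sets, since a shortest path may go up and then down."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph

GRAPH = build_graph(EDGES)

def distance(a, b):
    """Equation (1): minimum number of edges separating a and b (BFS)."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbour in GRAPH[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError("concepts are not connected")

def leacock_chodorow(a, b, max_depth):
    """Equation (2): -log(length / (2 * D)); undefined for identical concepts."""
    length = distance(a, b)
    if length == 0:
        raise ValueError("equation (2) is undefined for a path length of 0")
    return -math.log(length / (2 * max_depth))

MAX_DEPTH = 2  # the deepest concepts sit two edges below the root in this toy taxonomy
print(distance("car", "bicycle"))  # 2
print(distance("car", "dog"))      # 4
```

With these toy data, "car" and "dog" are as far apart as the taxonomy allows (length 2D), so equation (2) evaluates to 0 for them, while the siblings "car" and "bicycle" score log 2.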
Note that in this algorithm pairs of concepts have the same similarity value as long as their shortest path lengths are identical.

Wu and Palmer [10] determine the concept similarity by the structural relation of two concepts and their least common subsumer (LCS):

$$\mathrm{sim}(c_1, c_2) = \frac{2 N_3}{N_1 + N_2 + 2 N_3} \qquad (3)$$

Figure 1 illustrates the symbols of equation (3) [10]. Here c3 is the LCS of c1 and c2. N1 and N2 represent the numbers of nodes on the path from c1 to c3 and c2 to c3, respectively. N3 is the number of nodes on the path from c3 to the taxonomy root. According to equation (3) and Figure 1, the similarity of c1 and c2 becomes smaller and smaller when c1 and c2 are deeper and deeper. This implies that for a set of concepts that share the same LCS, the more specific (i.e. deeper) the two concepts in the taxonomy, the less their similarity is.

[Figure 1. Structural relation of concepts [10]: the root, the LCS c3 at N3 nodes from the root, and the concepts c1 and c2 at N1 and N2 nodes from c3.]

The three approaches mentioned above do not differentiate the edge weights in a taxonomy. The other two approaches given in [4] [6], which are also based on the shortest path, provide edge-weight assignment strategies and sum up the weights on the shortest path to obtain the similarity/distance. Two techniques for weight assignment are discussed in Section 3.

The edge-based approaches assume that the links in a taxonomy represent uniform distances [3] [11]. However, it is common that a link may have much denser sub-taxonomies than other parts of the taxonomy. An improvement that takes this possibility into account is illustrated in the next subsection.

4th International Conference Information & Communication Technology and System

2.2. Node-Based Approaches
The idea of node information content proposed by Resnik [3] [12] forms the basis of many node-based concept similarity algorithms. Resnik [3] proposed that the concept similarity is the extent to which concepts share information in common. The common information of two concepts is carried by their superconcepts subsuming both of them. Each concept in a taxonomy is associated with a probability p(c) which refers to the probability of encountering an instance of concept c. The information content of c is quantified by the negative logarithm of this probability, -log p(c). Obviously, the higher a concept in a taxonomy, the higher its probability value is. The quantified information content decreases as the probability increases. Thus, the more abstract a concept in a taxonomy, the less its informativeness is. The similarity of two concepts is defined in the following equation [3]:

$$\mathrm{sim}(c_1, c_2) = \max_{c \in S(c_1, c_2)} [-\log p(c)] \qquad (4)$$

where S(c1, c2) is the set of concepts that subsume both c1 and c2. According to equation (4), for a pair of concepts that may have multiple subsumers, we select the one having the highest information content as their similarity.

In the above approach, any pairs of concepts have an identical similarity value as long as they have the same least common subsumer. This approach also overlooks the structure information of taxonomies and thus results in coarse similarity values, as suggested by Maguitman et al. [13].

Lin has proposed a similarity measure similar to Resnik's measure [14]. This approach assumes that two concepts are independent and uses the sum of their information content (IC) to normalize the information content of their least common subsumer. The following equation gives the formal definition of similarity:

$$\mathrm{sim}(c_1, c_2) = \frac{2 \cdot IC(LCS)}{IC(c_1) + IC(c_2)} \qquad (5)$$

where IC is the information content of a concept.
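The node-based measures of equations (4) and (5) can be sketched as follows. The taxonomy and the instance counts are made up for illustration, p(c) is estimated by relative frequency with every instance of a subconcept also counting as an instance of its superconcepts, and the natural logarithm is assumed. In a tree, the common subsumer with the highest information content is the least common subsumer, so the same value is reused for Lin's measure.

```python
import math

# Hypothetical taxonomy (child -> parent) and made-up instance counts.
PARENT = {"entity": None, "vehicle": "entity", "animal": "entity",
          "car": "vehicle", "bicycle": "vehicle", "dog": "animal"}
OWN_COUNT = {"entity": 0, "vehicle": 10, "animal": 20,
             "car": 50, "bicycle": 10, "dog": 10}

def subsumers(concept):
    """The concept itself plus all of its superconcepts up to the root."""
    result = []
    while concept is not None:
        result.append(concept)
        concept = PARENT[concept]
    return result

def probability(concept):
    """p(c): share of instances that fall under `concept` or its subconcepts."""
    total = sum(OWN_COUNT.values())
    covered = sum(n for c, n in OWN_COUNT.items() if concept in subsumers(c))
    return covered / total

def information_content(concept):
    return -math.log(probability(concept))

def resnik(c1, c2):
    """Equation (4): highest information content among the common subsumers."""
    common = set(subsumers(c1)) & set(subsumers(c2))
    return max(information_content(c) for c in common)

def lin(c1, c2):
    """Equation (5): 2 * IC(LCS) / (IC(c1) + IC(c2))."""
    return 2 * resnik(c1, c2) / (information_content(c1) + information_content(c2))

print(round(resnik("car", "bicycle"), 4))  # IC of "vehicle"
print(round(lin("car", "bicycle"), 4))
```

Any pair whose least common subsumer is "vehicle" gets the same value from equation (4), which illustrates the coarseness discussed above; equation (5) separates such pairs by the information content of the concepts themselves.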
Equation (5) has the same implication as equation (3): for a given concept c that is the LCS of concepts c1 and c2, the bigger the values of IC(c1) and IC(c2), the deeper they are in a taxonomy, and thus the smaller their similarity is.

The method presented by Jiang and Conrath [4] uses a combination of the edge-based and node-based approaches considering local density, node depth, information content and link type. The method can be simplified to a node-based approach when all factors other than the information content are overlooked. The simplified measure, in which a smaller value indicates more similar concepts, is determined as

$$\mathrm{sim}(c_1, c_2) = IC(c_1) + IC(c_2) - 2 \cdot IC(LCS) \qquad (6)$$

2.3. Combination of Edge-Based and Node-Based Approaches
A concept similarity algorithm with better performance is proposed by Li et al. [5]. The idea of this algorithm is that concepts at upper layers of the taxonomy have more general semantics and less similarity between them, while concepts at lower layers have more concrete semantics and higher similarity. Thus, the depth of the concepts is taken into account for the concept similarity computation. This idea contradicts the ideas of [10] and [14]. Li et al. [5] also argued that the previous algorithms either directly use information sources as a function of similarity or use a particular information source without considering the contribution of others. Li et al. [5] proposed non-linear (viz. exponential) functions to transform the possibly infinite information sources (e.g. infinite depth and path length) to a value within [0, 1]. Although this algorithm tends to match words contained in synsets in WordNet, the following similarity measure can be used for concept matching:

$$\mathrm{sim}(c_1, c_2) = f(f_1(l), f_2(h), f_3(d)) \qquad (7)$$

where f1(l), f2(h) and f3(d) are the functions of the shortest path length l, the depth of the least common subsumer h, and the local density d of c1 and c2, respectively. These three functions are defined as

$$f_1(l) = e^{-\alpha l} \qquad (8)$$

where α > 0.

$$f_2(h) = \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}} \qquad (9)$$
where β > 0 is a smoothing factor; when β approaches infinity, the depth of the concepts is not considered.

$$f_3(d) = \frac{e^{\lambda d} - e^{-\lambda d}}{e^{\lambda d} + e^{-\lambda d}} \qquad (10)$$

where d is the similarity of two concepts computed by the algorithm of [3] and λ > 0; when λ approaches infinity, the information content of the concepts is not considered.

Li et al. [5] experimented with every combination of the above three functions, equations (8)-(10), and previous edge-based as well as node-based approaches, and concluded that the following combination obtains the best results:

$$f = f_1(l) \cdot f_2(h) = e^{-\alpha l} \cdot \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}} \qquad (11)$$

This confirmed the assumption of Li et al. [5] that human similarity judgment is a type of non-linear process over information sources.

Miller and Charles [8] gave 30 pairs of words to 38 human subjects to rank their similarity values ranging from 0 to 4. Other researchers [5] compared the experimental results of their algorithms with the Miller-Charles human-based experiment to evaluate their algorithm performance. The higher the correlation value between them, the better the performance is. In Table 1, we give an enhanced summary of the performance of the algorithms in the literature over the one given in [5].

Table 1. Experimental results of the concept similarity algorithms

Similarity algorithm                                                          Correlation
Resnik's replication of the Miller-Charles experiment [12]                    0.9583
Combination of edge- and node-based approaches, considering concept
  depth and non-linear transformation [5]                                     0.8914
Combination of edge- and node-based approaches, considering depth,
  local density, etc. [4]                                                     0.8484
Node-based approach [14] built on [3]                                         0.823
Node-based information content [3]                                            0.745
Edge-based shortest path length [2]                                           0.664

3. EDGE-WEIGHT ASSIGNMENT IN A TAXONOMY
Adding weights to the edges in a taxonomy can improve the performance of the concept similarity measures. The approach proposed in [4] leads to higher correlation values than the results obtained with the approaches of [2] [3] [14], which do not use edge weights. Jiang and Conrath [4] concluded that the factors that affect the edge-weight assignment are: the local network density, node (concept) depth, link types and link strength. They considered all four factors for the edge-weight assignment. Sussna [6] considered the first three factors.
Richardson and Smeaton [7] took into account the first two and the last factors. However, they did not formally define a weight-assignment formula.

Jiang and Conrath suggested the following weight assignment formula [4]:

$$wt(c, p) = \left(\beta + (1 - \beta)\frac{\bar{E}}{E(p)}\right)\left(\frac{d(p) + 1}{d(p)}\right)^{\alpha}[IC(c) - IC(p)]\,T(c, p) \qquad (12)$$

where
wt(c, p): weight of the edge connecting concept c and its parent node p
E(p): number of edges in the child links (i.e. local density)
Ē: average density in the whole taxonomy
d(p): depth of the node p in the taxonomy
IC(c) - IC(p): link strength of the edge between c and p
T(c, p): the link type factor.

In the above equation, the parameters α (α ≥ 0) and β (0 ≤ β ≤ 1) control how much the node depth and density factors contribute to the edge weighting computation. The distance of two concepts is calculated by summing up the weights on the shortest path between them.

Sussna [6] observes that each link in a taxonomy has two inverse relations. For example, a link connecting vehicle and car represents both the is-a and has-a relations. Different link types have their own weight ranges represented by max_r and min_r. The local density, which is represented as the type-specific fanout factor, is the number of edges of the same type leaving a node. The two inverse weights are averaged and then
divided by the depth of the deeper of the two concepts, as stated below:

$$w(c_1, c_2) = \frac{w(c_1 \rightarrow_r c_2) + w(c_2 \rightarrow_{r'} c_1)}{2d} \qquad (13)$$

given

$$w(x \rightarrow_r y) = max_r - \frac{max_r - min_r}{n_r(x)} \qquad (14)$$

where
w(c1, c2): weight of the edge connecting concepts c1 and c2
→r: a relation of type r
→r': an inverse relation of r
d: depth of the deeper concept
max_r: maximum weight for the relation with type r
min_r: minimum weight for the relation with type r
n_r(x): number of relations with type r leaving node x.

The above method also sums up the weights on the shortest path between two concepts to obtain their relatedness.

4. IMPROVED CONCEPT DEPTH SCALING
Previous coarse-grained depth-scaling methods result in concept similarity that does not reflect differences in the granularity of concepts. For example, equation (2) given by [9] simply uses the maximum depth of a taxonomy to normalize the shortest path length between concepts, and equation (13) given in [6] presents the relative-depth scaling by dividing by the depth of the deeper concept.

[Figure 2. Concepts that have different granularities]

The exponential function was employed in [5] to non-linearly transform the depth of the least common subsumer of two concepts. However, it does not differentiate the various cases in which pairs of concepts share the same LCS. For example, in Figure 2, the two pairs of concepts (a, b) and (a, c) share the same LCS d. Therefore, the depth of d makes the same contribution to the similarity computation of (a, b) and (a, c). However, it is obvious that b, which has the same depth as a, is higher than c and thus has more general information. We argue that, in the context of the subtree rooted at d (LCS), a and b have a higher similarity value than a and c from the point of view of depth because a and b have the same granularity.

We improve the relative-depth scaling by not only considering the context of the subtree (see bold lines in Figure 3) rooted at the least common subsumer of two concepts, but also by using the upward and downward path length of each concept. (The following discussion does not consider the case of two identical concepts because they obviously have similarity 1.0.)

[Figure 3. A taxonomy tree]

For any pair of concepts c1 and c2, we consider their granularity difference in their most specific context, i.e. the subtree rooted at their least common subsumer LCS. In Figure 3, although concepts a and b are on the same level below LCS, a has a deeper subtree than b. This indicates that a is more general than b in their most specific context. Therefore, we measure the granularity of each concept based on its upward and downward path lengths. A concept's upward path length is the number of edges on the path from the concept to LCS. Its downward path length is the number of edges on the path from the concept to the deepest leaf node below. For example, the upward and downward path lengths for concept a are 1 and 3, respectively. Formally, we define the relative depth of a concept as follows:

$$RelativeDepth(c) = \frac{UpLength(c) - DownLength(c)}{UpLength(c) + DownLength(c)} \qquad (15)$$

where
RelativeDepth(c): relative depth of concept c
UpLength(c): number of edges from c to LCS
DownLength(c): number of edges from c to the deepest leaf node below c

According to equation (15), the relative depth of a concept can be a negative value. For the special cases that a concept is the same as LCS, the deepest leaf node, and in the middle of the path, respectively, we correspondingly get relative depth values -1, 1, and 0. Therefore, a concept's relative depth ranges between -1 and 1. We linearly map the relative depth of a concept to its granularity (within [0, 1]) as described in equation (16).

Granularity(c) = (1/2)(RelativeDepth(c) + 1)    (16)

From the plot (see Figure 4) of equation (16), the Granularity monotonically increases with the RelativeDepth. The higher the granularity value of a concept, the deeper it is in the tree and the more specific information it carries. The absolute difference |Granularity(c1) - Granularity(c2)| of two concepts c1 and c2 is used to contribute to their similarity computation.

[Figure 4. Plot of equation (16): Granularity (vertical axis) against RelativeDepth (horizontal axis)]

$$Sim(c_1, c_2) = \left(1 - \frac{N_s}{N_t}\right) \cdot Av \cdot e^{-|Granularity(c_1) - Granularity(c_2)|} \qquad (17)$$

In equation (17), Ns is the number of edges on the shortest path between c1 and c2. Nt stands for the total number of edges in the taxonomy. Av stands for the averaged edge weights on the shortest path. We borrow the non-linear transformation, i.e. the exponential function, of [5] to compute the granularity similarity of c1 and c2.

[Figure 5. Taxonomy tree of Programming Techniques. Root: Programming Techniques; first-level concepts: General, Applicative Programming, Automatic Programming, Concurrent Programming, Sequential Programming and Object-Oriented Programming; Distributed Programming and Parallel Programming are children of Concurrent Programming. Each edge carries a weight between 0.4 and 0.7.]

No.  Concept pair                          Eq (17)   Eq (17) No Weight
1    P Techniques, Automatic P             0.129     0.321
2    P Techniques, Concurrent P            0.265     0.531
3    P Techniques, Distributed P           0.165     0.276
4    Automatic P, Object-Oriented P        0.413     0.75
5    Concurrent P, Object-Oriented P       0.273     0.455
6    Concurrent P, Distributed P           0.225     0.321
7    Distributed P, Parallel P             0.45      0.75
8    Object-Oriented P, Distributed P      0.396     0.625
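A minimal sketch of the granularity-based measure, reading equation (17) as (1 - Ns/Nt) * Av * exp(-|Granularity(c1) - Granularity(c2)|). The tree below is the Programming Techniques taxonomy as far as it can be read from Figure 5 and the discussion of Table 2, so its exact shape is an assumption, as are all function names. The averaged edge weight Av is passed in by the caller and defaults to 1.0, the unweighted case.

```python
import math

# Assumed shape of the Figure 5 taxonomy, stored as child -> parent.
PARENT = {
    "Programming Techniques": None,
    "General": "Programming Techniques",
    "Applicative Programming": "Programming Techniques",
    "Automatic Programming": "Programming Techniques",
    "Concurrent Programming": "Programming Techniques",
    "Sequential Programming": "Programming Techniques",
    "Object-Oriented Programming": "Programming Techniques",
    "Distributed Programming": "Concurrent Programming",
    "Parallel Programming": "Concurrent Programming",
}

def path_to_root(concept):
    """The concept followed by its superconcepts, nearest first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def lcs(c1, c2):
    """Least common subsumer: first superconcept of c2 that also subsumes c1."""
    above_c1 = path_to_root(c1)
    return next(c for c in path_to_root(c2) if c in above_c1)

def down_length(concept):
    """Edges from the concept to the deepest leaf below it (0 for a leaf)."""
    children = [c for c, p in PARENT.items() if p == concept]
    return 0 if not children else 1 + max(down_length(c) for c in children)

def granularity(concept, subsumer):
    up = path_to_root(concept).index(subsumer)   # UpLength: edges up to the LCS
    down = down_length(concept)                  # DownLength
    relative_depth = (up - down) / (up + down)   # equation (15)
    return 0.5 * (relative_depth + 1)            # equation (16)

def sim(c1, c2, avg_weight=1.0):
    """Equation (17) for two distinct concepts."""
    subsumer = lcs(c1, c2)
    n_s = path_to_root(c1).index(subsumer) + path_to_root(c2).index(subsumer)
    n_t = sum(1 for parent in PARENT.values() if parent is not None)
    diff = abs(granularity(c1, subsumer) - granularity(c2, subsumer))
    return (1 - n_s / n_t) * avg_weight * math.exp(-diff)

print(round(sim("Programming Techniques", "Distributed Programming"), 3))
print(round(sim("Object-Oriented Programming", "Distributed Programming"), 3))
```

Under these assumptions the sketch gives 0.276 for (Programming Techniques, Distributed Programming) and 0.625 for (Object-Oriented Programming, Distributed Programming), in line with the unweighted column of Table 2. For two distinct concepts UpLength + DownLength is never zero, so equation (15) needs no special case here; identical concepts are excluded, as in the text.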
Table 2 shows the similarity values of various pairs of concepts in the taxonomy of Figure 5 using equation (17) with and without weights.

Table 2. Computational results for various pairs of nodes in the taxonomy tree of Figure 5

In order to save space in the above table, we use P to represent Programming. We show 8 pairs of representative concepts in Figure 5 and their similarity values. The column Eq (17) gives the similarity values using equation (17) taking into account the edge weights. The last column contains the results using equation (17) without considering the edge weights (i.e. each edge has weight 1.0). In this way, we can clearly show the impact of concept granularities.

Here, we analyze the experimental results based on the last column, Eq (17) No Weight. The first three pairs of concepts examine the similarity values of the root concept, Programming Techniques, versus other concepts on different levels. Although the concepts Automatic Programming and Concurrent Programming are
on the same level, we obtain sim(Programming Techniques, Automatic Programming) = 0.321 which is smaller than sim(Programming Techniques, Concurrent Programming) = 0.531. The reason is that Concurrent Programming still has subtrees, which indicates that it is more general than the leaf-node concept Automatic Programming in the context of their LCS Programming Techniques. The similarity of Programming Techniques and Distributed Programming is smaller than both of them because they have a longer path length. Concepts of pair no.1 have the same similarity value as pair no.6 because, in each of these two pairs, one concept is the parent of the other one, which is a leaf node. Concepts of pair no.4 have a higher similarity value than that of pair no.5 although all the concepts in them are on the same level. The reason is that the concepts of pair no.4 have identical granularity values and the no.5 concepts have different ones. No.4 concepts also have the same similarity value as no.7 because each pair contains two sibling leaf nodes that share their parent node as the least common subsumer.

5. CONCLUSION
Semantic matching techniques, such as taxonomic concept matching measures and schema matching algorithms, can to some extent remove the ambiguities resulting from using the syntactic matching algorithms. Related concepts can be located in a taxonomy which indicates their hierarchical relationships. Using a taxonomy (e.g., WordNet) as a knowledge base, taxonomic concept similarity/distance measures have been proposed. Edge-based and node-based approaches provide the foundation of various algorithms in the literature. The combinations of these approaches have provided better matching results. Furthermore, human similarity judgment was demonstrated to be a non-linear process over information sources. Among the several structural characteristics (i.e. shortest path, local density, link type, link strength, and node depth) having impact on the concept similarity, the combination of the non-linear transformations of the shortest path length and concept depth obtains the best correlation value against human rankings on word similarity.
However, previous depth scaling algorithms only took into account the absolute depth of two concepts and their least common subsumer. We have proposed a new relative-depth scaling method to compare the difference of concept granularities in the most specific context (which is the subtree rooted at their least common subsumer). We have obtained more reasonable computational results for an illustrative taxonomy.

References
[1] Institute of Electrical and Electronics Engineers. IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. New York, NY, 1990.
[2] R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and Application of a Metric on Semantic Nets. IEEE Transactions on Systems, Man, and Cybernetics, 1989, 19(1):17-30.
[3] P. Resnik. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, August 1995, 1:448-453.
[4] J. J. Jiang and D. W. Conrath. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. Proceedings of International Conference Research on Computational Linguistics (ROCLING X), Taiwan, 1997.
[5] Y. Li, Z. A. Bandar, and D. McLean. An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(4):871-882.
[6] M. Sussna. Word Sense Disambiguation for Free-text Indexing Using a Massive Semantic Network. Proceedings of the Second International Conference on Information and Knowledge Management, Arlington, VA, 1993.
[7] R. Richardson and A. F. Smeaton. Using WordNet in a Knowledge-Based Approach to Information Retrieval. Working Paper, CA-0395, School of Computer Applications, Dublin City University, Ireland, 1995.
[8] G. Miller. Nouns in WordNet: A Lexical Inheritance System. International Journal of Lexicography, 1990, 3(4):245-264.
[9] C. Leacock and M. Chodorow. Combining Local Context and WordNet Sense Similarity for Word Sense Disambiguation. In WordNet, An Electronic Lexical Database. The MIT Press, 1998.
[10] Z. Wu and M. Palmer. Verb Semantics and Lexical Selection. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, 1994, 133-138.
[11] C. Corley and R. Mihalcea. Measuring the Semantic Similarity of Texts. Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, Ann Arbor, June 2005, 13-18.
[12] P. Resnik. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research, 1999, 11:95-130.
[13] A. G. Maguitman, F. Menczer, H. Roinestad and A. Vespignani. Algorithmic Detection of Semantic Similarity. Proceedings of WWW 2005, Chiba, Japan, May 10-14, 2005.
[14] D. Lin. An Information-Theoretical Definition of Similarity. Proceedings of the Fifteenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 1998, 296-304.