Extended IR Models. Johan Bollen Old Dominion University Department of Computer Science
1 Extended IR Models. Johan Bollen, Old Dominion University, Department of Computer Science. jbollen. January 20, 2004.
2 Taxonomy of IR models (overview figure):
   User task: Retrieval, Browsing
   Classic models: Boolean, Vector, Probabilistic
   Set-theoretic extensions: Fuzzy, Extended Boolean
   Algebraic extensions: Generalized Vector, LSI, Neural Networks
   Probabilistic extensions: Inference Networks, Belief Networks
   Structured models: Non-overlapping Lists, Proximal Nodes
   Browsing: Flat, Structure Guided, Hypertext
3 Extended Set-Theoretic Models
1. Set-theoretic models (Boolean retrieval):
   (a) Do not allow partial matching
   (b) Do not allow ranking of documents
   (c) However, efficient and widespread
2. Fuzzy sets:
   (a) Allow degrees of membership to be expressed
   (b) Retain the set-theoretic origins of Boolean IR
   (c) Ranking of results
3. Extended Boolean:
   (a) Based on VSM principles
   (b) More frequently deployed than simple Boolean
   (c) Ranking of results
4 Fuzzy Sets and Information Retrieval
1. Traditional logic:
   (a) True-or-false logic
   (b) No perhaps, somewhat, largely
   (c) Cold vs. warm: balmy? hot?
   (d) Need to connect logic to linguistic variables
2. Attempts to produce multi-valued logics
3. How about infinity-valued logic?
   (a) Fuzzy sets vs. crisp sets
   (b) Introduced in the 1960s
   (c) Each element is assigned a membership value in [0,1]
   (d) Standard Boolean operators:
      i. Union: max
      ii. Intersection: min
      iii. Negation: 1 - membership value
4. Use of keyterm-keyterm similarities to define the fuzzy result set
5 Fuzzy Sets
Universe of discourse U: all possible elements.
A fuzzy subset A of U is characterized by a membership function mu_A : U -> [0,1].
Each u in U is mapped to [0,1], i.e. mu_A(u) in [0,1].
So every element is assigned a value indicating the degree to which it is a member of the set, which is a subset of the universe of discourse.
Boolean operations:
   mu_notA(u) = 1 - mu_A(u)
   mu_(A union B)(u) = max(mu_A(u), mu_B(u))
   mu_(A intersect B)(u) = min(mu_A(u), mu_B(u))
Question: where do the membership functions come from?
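As a minimal sketch in Python (the universe elements and membership values below are invented for illustration):

```python
# Fuzzy-set operations as element-wise max / min / complement over
# membership dictionaries (universe element -> degree in [0,1]).
def fuzzy_union(mu_a, mu_b):
    return {u: max(mu_a.get(u, 0.0), mu_b.get(u, 0.0))
            for u in set(mu_a) | set(mu_b)}

def fuzzy_intersection(mu_a, mu_b):
    return {u: min(mu_a.get(u, 0.0), mu_b.get(u, 0.0))
            for u in set(mu_a) | set(mu_b)}

def fuzzy_complement(mu_a):
    return {u: 1.0 - m for u, m in mu_a.items()}

# illustrative membership values
young = {"ann": 0.9, "bob": 0.4, "cee": 0.1}
likes_pr = {"ann": 0.2, "bob": 0.8, "cee": 0.7}

hip = fuzzy_intersection(young, likes_pr)  # young AND likes P&R
```

Note that with max/min as union/intersection, membership degrees are never invented, only combined; this is what keeps the model within its set-theoretic origins.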
6 Fuzzy Sets: Example
Universe of discourse U: the set of people.
The set of hip people H is a fuzzy subset of U: hip people are young and like DJs Precision and Recall.
Two fuzzy subsets: young, and likes Precision and Recall.
Membership functions:
7 Young [figure: membership function for "young"]
8 Likes Precision and Recall [figure: membership function for "likes Precision and Recall"]
9 Young AND Likes Precision and Recall [figure: min of the two membership functions]
10 Young OR Likes Precision and Recall [figure: max of the two membership functions]
11 Fuzzy Sets: Extension of the Boolean Retrieval Model
1. Partially based on a term-term correlation matrix
2. Represented as a thesaurus
3. Calculated from the ratio of documents that contain a pair of terms vs. the number of documents that contain either
Term-term correlation matrix c:
   c_{i,l} = n_{i,l} / (n_i + n_l - n_{i,l})
similar to bibliographic coupling (Kessler 1963) and co-citation (Small 1973).
A fuzzy set is defined on the basis of keyterm k_i. The membership function for document d_j is defined as:
   mu_{i,j} = 1 - prod over k_l in d_j of (1 - c_{i,l})
Document d_j belongs to the fuzzy set of k_i when at least some of the keyterms in the document are close to k_i.
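A small sketch of this construction in Python; the three toy documents and their terms are invented for illustration:

```python
# Fuzzy-set extension of Boolean retrieval: a term-term correlation
# from co-occurrence counts, then a document's degree of membership
# in the fuzzy set of keyterm k_i.
docs = {
    "d1": {"apple", "fruit"},
    "d2": {"apple", "pie"},
    "d3": {"fruit", "pie"},
}

def n(t):  # number of documents containing term t
    return sum(1 for d in docs.values() if t in d)

def n_both(t, u):  # number of documents containing both t and u
    return sum(1 for d in docs.values() if t in d and u in d)

def c(t, u):  # c_{i,l} = n_{i,l} / (n_i + n_l - n_{i,l})
    return n_both(t, u) / (n(t) + n(u) - n_both(t, u))

def membership(t, doc):  # mu_{i,j} = 1 - prod_{l in d_j} (1 - c_{i,l})
    prod = 1.0
    for l in docs[doc]:
        prod *= 1.0 - c(t, l)
    return 1.0 - prod

mu = membership("apple", "d3")  # d3 does not contain "apple" but is related
```

A document that actually contains the keyterm gets membership 1 (since c of a term with itself is 1), while related documents get a graded value, which is exactly what enables ranking.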
12 Remember the Boolean Retrieval Model
1. Boolean expression: a Boolean query expresses the user's information need
2. Converted to DNF (disjunctive normal form)
3. The query matches the DNF components
For example, the query q = k_a AND (k_b OR NOT k_c) becomes
   q_dnf = (1,1,1) OR (1,1,0) OR (1,0,0)
for the tuple (k_a, k_b, k_c).
Let cc_i be the i-th conjunctive component. Let D_a be the fuzzy set of documents with index k_a: D_a contains the documents for which mu_{a,j} > K, where K is a threshold. Same for D_b and D_c.
D_q is then the union of the fuzzy sets associated with cc_1, cc_2 and cc_3, the three conjunctive components.
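The slide's DNF can be checked mechanically by enumerating every binary assignment of (k_a, k_b, k_c):

```python
# Enumerate the conjunctive components of q = k_a AND (k_b OR NOT k_c)
# by testing each binary assignment of the tuple (k_a, k_b, k_c).
from itertools import product

def q(ka, kb, kc):
    return ka and (kb or not kc)

components = [t for t in product([1, 0], repeat=3) if q(*t)]
print(components)  # [(1, 1, 1), (1, 1, 0), (1, 0, 0)]
```

The three satisfying assignments are exactly the conjunctive components cc_1, cc_2 and cc_3 on the slide.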
13 Fuzzy Conjunctive Components [figure]
14 Remember the Boolean Retrieval Model
The membership degree mu_{q,j} is then defined as the algebraic union of the membership degrees of the conjunctive components:
   mu_{q,j} = mu_{cc1+cc2+cc3, j}
            = 1 - prod for i = 1..3 of (1 - mu_{cc_i,j})
            = 1 - (1 - mu_{a,j} mu_{b,j} mu_{c,j})
                  (1 - mu_{a,j} mu_{b,j} (1 - mu_{c,j}))
                  (1 - mu_{a,j} (1 - mu_{b,j}) (1 - mu_{c,j}))
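The formula above can be sketched directly; the membership degrees passed in at the end are invented for illustration:

```python
# Membership of document j in the fuzzy set of the query
# q = k_a AND (k_b OR NOT k_c): algebraic union of its three
# conjunctive components.
def mu_query(mu_a, mu_b, mu_c):
    cc1 = mu_a * mu_b * mu_c              # component (1,1,1)
    cc2 = mu_a * mu_b * (1 - mu_c)        # component (1,1,0)
    cc3 = mu_a * (1 - mu_b) * (1 - mu_c)  # component (1,0,0)
    return 1 - (1 - cc1) * (1 - cc2) * (1 - cc3)

score = mu_query(0.9, 0.5, 0.2)  # illustrative membership degrees
```

Note the boundary behavior: full membership in k_a, k_b and k_c gives mu_{q,j} = 1, while zero membership in k_a forces mu_{q,j} = 0, as the Boolean reading of q requires.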
15 Fuzzy Conjunctive Components [figure]
16 Fuzzy Sets in Information Retrieval
1. Limited appeal:
   (a) Few applications
   (b) Applications to recommender systems
   (c) Different ways to construct the thesaurus
   (d) Scalability: the size of the term-term matrix
2. Relations to query expansion and neural network approaches
17 Extended Boolean Model
1. Boolean model:
   (a) Retrieval is brittle
   (b) A precise Boolean query is difficult to generate
2. Extended:
   (a) Features of the vector space model
   (b) Keyterm weighting
   (c) Partial matching
18 Extended Boolean Model
1. Main principles:
   (a) Represent the document in an n-dimensional term space
   (b) x, y, z, ... coordinates are determined by term weights
   (c) Depending on conjunction or disjunction:
      i. determine the vector's distance from (0,0) (disjunction)
      ii. determine the vector's distance from (1,1) (conjunction)
   (d) Distance calculation:
      i. concept of the p-norm
      ii. varies the characteristics of the extended model
2. Discussion will focus on 2-dimensional problems
19 Term Weighting and Document Vectors in Boolean Term Space
A document is assigned coordinates by term weighting: t terms give a t-dimensional space.
For example, the x coordinate:
   w_{x,j} = f_{x,j} * idf_x / max_i idf_i
Assume two keyterms, with x and y coordinates as above.
Depending on the Boolean query type, certain regions of the space are either required or to be avoided:
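A sketch of this coordinate weighting in Python; the normalized frequencies, document frequencies and collection size below are invented:

```python
# Coordinates w_{x,j} = f_{x,j} * idf_x / max_i idf_i, with idf
# normalized by the largest idf in the vocabulary so that the
# coordinates of a document fall in the unit square/cube.
import math

def coordinates(f_j, df, N):
    """f_j: normalized term frequencies in document j; df: document
    frequencies per term; N: collection size."""
    idf = {t: math.log2(N / df[t]) for t in df}
    max_idf = max(idf.values())
    return {t: f_j.get(t, 0.0) * idf[t] / max_idf for t in df}

# two keyterms x and y: one document's position in the unit square
w = coordinates({"x": 1.0, "y": 0.5}, {"x": 2, "y": 8}, N=16)
```

The base of the logarithm is an assumption here; any base works since the max-idf normalization cancels it.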
20 [figure: regions of the term space required or avoided by AND and OR queries]
21 Query-Document Similarity Measures
Similarities are calculated on the basis of distance to the desired coordinates.
AND: distance to the right-upper corner (1,1).
OR: distance from the left-bottom corner (0,0).
Both normalized by the denominator sqrt(2):
   sim(q_or, d)  = sqrt((x^2 + y^2) / 2)
   sim(q_and, d) = 1 - sqrt(((1-x)^2 + (1-y)^2) / 2)
Characteristics: if w_{x,j} in {0,1}, the document always sits in one of the four corners of the unit square.
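The two-term case can be sketched directly from the formulas above:

```python
# Two-term extended Boolean similarities: an OR query scores by distance
# from (0,0), an AND query by distance from (1,1), both normalized by
# sqrt(2) so the result lies in [0,1].
import math

def sim_or(x, y):
    return math.sqrt((x ** 2 + y ** 2) / 2)

def sim_and(x, y):
    return 1 - math.sqrt(((1 - x) ** 2 + (1 - y) ** 2) / 2)

# with binary weights the document sits in a corner of the unit square:
print(sim_or(1, 0))   # 0.7071067811865476
print(sim_and(1, 1))  # 1.0
```

This already gives partial matching: a document containing only one of two OR'ed terms gets a score of about 0.71 rather than an all-or-nothing 1 or 0.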
22 Query-Document Similarity Measures: p-norm
Generally a t-dimensional space. Use of a generalized vector norm: the p-norm model, with a parameter 1 <= p <= infinity.
General disjunctive query (under the p-norm): q_or = k_1 OR^p k_2 OR^p ... OR^p k_m
   sim(q_or, d_j) = ((x_1^p + x_2^p + ... + x_m^p) / m)^(1/p)
General conjunctive query (under the p-norm): q_and = k_1 AND^p k_2 AND^p ... AND^p k_m
   sim(q_and, d_j) = 1 - (((1-x_1)^p + (1-x_2)^p + ... + (1-x_m)^p) / m)^(1/p)
p = 1: the norm is the (averaged) sum of the weights, similar to the vector space model.
p = infinity:
   sim(q_or, d_j) = max(x_i)
   sim(q_and, d_j) = min(x_i)
Thus: fuzzy logic! The system can be made to adapt naturally to a range of behaviors, from VSM-like to fuzzy.
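The two limiting behaviors are easy to see numerically; the weights below are invented:

```python
# p-norm similarities: p = 1 gives a VSM-like average, and as p grows
# sim_or approaches max(xs) and sim_and approaches min(xs), i.e. the
# fuzzy-logic operators.
def sim_or(xs, p):
    return (sum(x ** p for x in xs) / len(xs)) ** (1 / p)

def sim_and(xs, p):
    return 1 - (sum((1 - x) ** p for x in xs) / len(xs)) ** (1 / p)

xs = [0.9, 0.2, 0.5]
print(round(sim_or(xs, 1), 4))   # 0.5333 -- the plain average
print(sim_or(xs, 50))            # approaches max(xs) = 0.9
print(sim_and(xs, 50))           # approaches min(xs) = 0.2
```

A single parameter thus interpolates between vector-space and fuzzy behavior, which is the main theoretical appeal of the model.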
23 Extended Boolean Model
1. Promising features
2. Apparently introduced in the early 1980s; has found few applications since
3. (a) Appeal of the VSM
   (b) Interpretation of the p-norm
   (c) Provides a theoretical framework
4. Evaluation:
   (a) Criterion: user satisfaction, result relevance
   (b) No applications to large collections that I know of
   (c) Outperforms VSM? Boolean?
24 Extended Algebraic Models
1. Terms are often thought of as orthogonal in the VSM:
   (a) they provide the basis of the space
   (b) one term's weight is independent of another term's weight
2. Not true:
   (a) terms in language occur in groups
   (b) co-occurrence of term groups
   (c) a natural consequence of the semantic nature of information
3. Extended VSM:
   (a) assumes linear independence of terms
   (b) seeks orthogonal vectors to be used as a subspace
25 Minterms and Orthogonal Vectors
Given a set of t index terms {k_1, k_2, ..., k_t}.
Assume w_{i,j} is the weight associated with [k_i, d_j] and w_{i,j} in {0,1} (binary).
g_i(m_j) returns the weight of index term k_i in minterm m_j.
There are 2^t minterms, the possible patterns of term occurrence, for example:
   m_1 = (0,0,...,0)
   m_2 = (0,0,...,1)
   m_{2^t} = (1,1,...,1)
We define a set of orthogonal vectors:
   m_1 = (1,0,...,0)
   m_2 = (0,1,...,0)
   m_{2^t} = (0,0,...,1)
each associated with an element of the set of minterms.
26 Minterms and Orthogonal Vectors: Example
Assuming 3 keyterms:
   minterm    (basic) vector
   (0,0,0)    (1,0,0,0,0,0,0,0)
   (1,0,0)    (0,1,0,0,0,0,0,0)
   (0,1,0)    (0,0,1,0,0,0,0,0)
   (1,1,0)    (0,0,0,1,0,0,0,0)
   (0,0,1)    (0,0,0,0,1,0,0,0)
   (1,0,1)    (0,0,0,0,0,1,0,0)
   (0,1,1)    (0,0,0,0,0,0,1,0)
   (1,1,1)    (0,0,0,0,0,0,0,1)
Although m_i . m_j = 0, minterm m_4 captures the co-occurrence of k_1 and k_2: when there is a document that contains both k_1 and k_2, we say that minterm m_4 is active.
Number of active minterms?
27 Deriving Index Term Vectors
We define correlation factors:
   c_{i,r} = sum of w_{i,j} over the documents d_j whose term pattern matches minterm m_r (g_l(d_j) = g_l(m_r) for all l)
This is essentially a count of the frequency with which each keyterm occurred in an active minterm.
We generate keyterm vectors as a linear combination of all basic vectors corresponding to minterms having nonzero correlation factors for the specific term:
   k_i = (sum over r with g_i(m_r) = 1 of c_{i,r} m_r) / sqrt(sum over r with g_i(m_r) = 1 of c_{i,r}^2)
Once the keyterm vectors have been produced, we can translate queries etc. to minterm vectors and calculate query-document similarities.
Index term correlations then follow from the product of term vectors:
   k_i . k_j = sum over r with g_i(m_r) = 1 and g_j(m_r) = 1 of c_{i,r} c_{j,r}
28 Whhaaaa???
OK OK, that wasn't very clear. We need an example. Let's say we have a set of 6 documents and three keyterms.
Document-keyterm matrix (binary):
        d1  d2  d3  d4  d5  d6
   k1    1   0   1   0   1   0
   k2    0   1   1   1   1   1
   k3    0   0   0   1   0   0
29 Minterms
   c_{i,r} = sum of w_{i,j} over the documents d_j whose term pattern matches minterm m_r
   m_r       active   c_{1,r}  c_{2,r}  c_{3,r}
   (0,0,0)   NO
   (1,0,0)   YES      1        0        0
   (0,1,0)   YES      0        2        0
   (1,1,0)   YES      2        2        0
   (0,0,1)   NO
   (1,0,1)   NO
   (0,1,1)   YES      0        1        1
   (1,1,1)   NO
30 Basic Vectors
Number of minterms = 2^t = 8.
   minterm    basic vector
   (0,0,0)    (1,0,0,0,0,0,0,0) = b1
   (1,0,0)    (0,1,0,0,0,0,0,0) = b2
   (0,1,0)    (0,0,1,0,0,0,0,0) = b3
   (1,1,0)    (0,0,0,1,0,0,0,0) = b4
   (0,0,1)    (0,0,0,0,1,0,0,0) = b5
   (1,0,1)    (0,0,0,0,0,1,0,0) = b6
   (0,1,1)    (0,0,0,0,0,0,1,0) = b7
   (1,1,1)    (0,0,0,0,0,0,0,1) = b8
The final term vectors will be linear combinations of the basic vectors according to minterm state and term occurrence.
31 Generating Term Vectors
   k_i = (sum over r with g_i(m_r) = 1 of c_{i,r} b_r) / sqrt(sum over r with g_i(m_r) = 1 of c_{i,r}^2)
The active minterms are (1,0,0), (0,1,0), (1,1,0) and (0,1,1), i.e. b2, b3, b4 and b7 have c_{i,r} > 0:
   k_1 = (1/sqrt(5)) b2 + (2/sqrt(5)) b4
   k_2 = (2/3) b3 + (2/3) b4 + (1/3) b7
   k_3 = 1 b7
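The whole construction for this example can be sketched in a few lines; the binary matrix is the 6-document, 3-term example above:

```python
# Generalized VSM construction: find each document's term pattern
# (its minterm), accumulate correlation factors c_{i,r}, and normalize
# to obtain the term vectors k_i over the 2^t basic vectors.
import math

# rows = k1..k3, columns = d1..d6
M = [[1, 0, 1, 0, 1, 0],
     [0, 1, 1, 1, 1, 1],
     [0, 0, 0, 1, 0, 0]]
t, ndocs = len(M), len(M[0])

# minterm r has g_i(m_r) = bit i of r, matching the slide's ordering
# (0,0,0), (1,0,0), (0,1,0), (1,1,0), (0,0,1), ...
minterms = [tuple((r >> i) & 1 for i in range(t)) for r in range(2 ** t)]

# c[i][r]: sum of w_{i,j} over documents whose pattern equals minterm r
c = [[0.0] * (2 ** t) for _ in range(t)]
for j in range(ndocs):
    pattern = tuple(M[i][j] for i in range(t))
    r = minterms.index(pattern)
    for i in range(t):
        c[i][r] += M[i][j]

def term_vector(i):
    norm = math.sqrt(sum(x ** 2 for x in c[i]))
    return [x / norm for x in c[i]]

k1, k2, k3 = (term_vector(i) for i in range(t))
```

Running this reproduces the slide's vectors: k1 has weight 1/sqrt(5) on b2 and 2/sqrt(5) on b4, k2 has 2/3 on b3 and b4 and 1/3 on b7, and k3 is exactly b7.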
32 Query Matching
Let's say we have a query: k_1, k_3.
Documents are represented by linear combinations of term vectors:
   d1 = k1
   d2 = k2
   d3 = k1 + k2
   d4 = k2 + k3
   d5 = k1 + k2
   d6 = k2
The query vector is q = k1 + k3.
Use the cosine similarity measure:
   sim(q, d_j) = (q . d_j) / (|q| |d_j|)
33 Query Matching
We know that:
   k1 = (1/sqrt(5)) b2 + (2/sqrt(5)) b4, so k1 = (0, 0.447, 0, 0.894, 0, 0, 0, 0)
   k2 = (2/3) b3 + (2/3) b4 + (1/3) b7, so k2 = (0, 0, 0.667, 0.667, 0, 0, 0.333, 0)
   k3 = b7 = (0, 0, 0, 0, 0, 0, 1, 0)
Document vectors are linear combinations of k1, k2, k3:
   d1 = (0, 0.447, 0, 0.894, 0, 0, 0, 0)
   d2 = (0, 0, 0.667, 0.667, 0, 0, 0.333, 0)
   d3 = (0, 0.447, 0.667, 1.561, 0, 0, 0.333, 0)
   d4 = (0, 0, 0.667, 0.667, 0, 0, 1.333, 0)
   d5 = (0, 0.447, 0.667, 1.561, 0, 0, 0.333, 0)
   d6 = (0, 0, 0.667, 0.667, 0, 0, 0.333, 0)
Query vector:
   q = k1 + k3 = (0, 0.447, 0, 0.894, 0, 0, 1, 0)
34 Similarities
   sim(q, d_j) = (q . d_j) / (|q| |d_j|)
   sim(q, d1) = 0.707
   sim(q, d2) = 0.657
   sim(q, d3) = 0.764
   sim(q, d4) = 0.836
   sim(q, d5) = 0.764
   sim(q, d6) = 0.657
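These cosine similarities follow from the term vectors of the previous slide; a sketch that computes them (using exact fractions rather than the rounded 0.447/0.667 values):

```python
# Cosine similarities for the worked example: document vectors are
# linear combinations of the term vectors, the query is q = k1 + k3.
import math

s5 = math.sqrt(5)
k1 = [0, 1 / s5, 0, 2 / s5, 0, 0, 0, 0]
k2 = [0, 0, 2 / 3, 2 / 3, 0, 0, 1 / 3, 0]
k3 = [0, 0, 0, 0, 0, 0, 1, 0]

def add(*vs):
    return [sum(xs) for xs in zip(*vs)]

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

docs = {"d1": k1, "d2": k2, "d3": add(k1, k2),
        "d4": add(k2, k3), "d5": add(k1, k2), "d6": k2}
q = add(k1, k3)
sims = {d: round(cos(q, v), 3) for d, v in docs.items()}
```

Note that d4, which contains both query terms, comes out on top even though its vector shares no term vector with k1; the shared minterm structure carries the dependency.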
35 Comparison to the Classic Vector Space Model
We use the binary term-document matrix:
        d1  d2  d3  d4  d5  d6
   k1    1   0   1   0   1   0
   k2    0   1   1   1   1   1
   k3    0   0   0   1   0   0
Query vector for term 1 and term 2: q = (1,1,0).
We simply multiply q with the term-document matrix:
   q_r = q^T M = (1, 1, 2, 1, 2, 1)
Now divide the entries of q_r by the products of the document and query vector norms:
   p = (1.414, 1.414, 2, 2, 2, 1.414)
36 Comparison to the Classic Vector Space Model
Resulting ranking compared to the GVS:
   i     GVS     CVS
   d1    0.707   0.707
   d2    0.657   0.707
   d3    0.764   1.000
   d4    0.836   0.500
   d5    0.764   1.000
   d6    0.657   0.707
Inclusion of term dependencies makes a significant difference.
37 Evaluation of the Generalized Vector Space Model
1. Comparison to other models:
   (a) Little evidence that the model outperforms existing models
   (b) Problematic to interpret keyterm dependencies
   (c) Difficult to interpret a specific ranking
2. Large collections:
   (a) Very large number of minterms
   (b) Considerable computational overhead
   (c) Is the effort warranted?
3. Implementations:
   (a) Few systems
   (b) Little empirical basis for evaluation
4. Benefits:
   (a) The theoretical notion of keyterm dependence is exploited
   (b) Improved rankings (?)
   (c) Relatively simple extension of the CVS
38 Keyterms and Concepts [figure]
39 Exploiting Term Dependencies
1. Neural network model:
   (a) imitates the capacity of the human brain to process information
   (b) feedforward neural network
   (c) input layer: terms
   (d) output layer: documents
   (e) activation of term nodes propagates to documents
   (f) two-way communication
2. Latent Semantic Indexing:
   (a) create a lower-dimensional space of concepts
   (b) based on Singular Value Decomposition of the term-document matrix
   (c) lower-rank approximation
   (d) project the query vector into the lower-dimensional space
   (e) translate back to documents
40 Neural Network Model
1. Biological nervous systems:
   (a) Parallel computation on a massive scale:
      i. billions of neurons
      ii. complex electro-chemical interactions
      iii. adaptivity to outside stimuli
   (b) No CPU; the von Neumann architecture is absent
   (c) Speed in recognition and knowledge-processing tasks is excellent
41 Human Brain [figure]
42 Neural Network Model
1. Accomplished by:
   (a) layers of connected neurons
   (b) the neuron's cell membrane is semi-permeable to Na+, K+ and Cl- ions
   (c) it can be depolarized by chemical and electrical stimuli
   (d) depolarization produces a spike or activation level: the action potential
   (e) the spike is communicated to other neurons (frequency modulation)
43 Action Potential [figure]
44 Neural Networks
1. Simulations:
   (a) Artificial Neural Networks: a simplified representation
   (b) directed, weighted graphs
   (c) nodes = neurons
   (d) nodes have activation levels
   (e) activation of a node = weighted sum of the connecting nodes' activation levels
2. Applied in:
   (a) image and speech recognition
   (b) adaptive control systems
   (c) models of human behavior and learning
3. Connectionist theory:
   (a) learning to relate stimulus-response activation patterns
   (b) two learning paradigms:
      i. supervised
      ii. unsupervised
   (c) implementation of specific learning algorithms
   (d) structure of networks:
      i. recurrent (e.g. SOM)
      ii. feedforward (perceptron)
45 Neural Network Model [figure]
46 Neural Network Model [figure]
47 Retrieval: Activation Propagation from the Input to the Document Layer
1. Retrieval:
   (a) activation levels are set at the query term layer
   (b) activation spreads to the document term layer
   (c) modulated by the query term weights
   (d) activation propagates to the document layer
   (e) modulated by the term weights in documents
   (f) the essentials of the VSM
2. Iterative procedure:
   (a) activation can move from the document layer back to the document term layer
   (b) this casts document activation back into the term layer and back again, etc.
   (c) activation will wane (weights < 1)
48 Query Term and Document Term Weights
Start at the query term layer: all activated query nodes get activation = 1 (the maximum).
Use of the normal vector space model query term weights:
   w_{i,q} = (freq_{i,q} / max_l freq_{l,q}) * log(N / n_i)
but normalized:
   w'_{i,q} = w_{i,q} / sqrt(sum over i = 1..t of w_{i,q}^2)
Transmission of activation values from document terms to the document layer uses the document term weights:
   w_{i,j} = f_{i,j} * log(N / n_i)
with normalization of weights:
   w'_{i,j} = w_{i,j} / sqrt(sum over i = 1..t of w_{i,j}^2)
Ranking:
   sum over i = 1..t of w'_{i,q} w'_{i,j}
And again!
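One forward pass of this propagation can be sketched as follows; the tiny 3-term, 2-document weight matrix is invented for illustration:

```python
# One forward pass of the neural-network IR model: query-term activations
# (set to 1) spread to document-term nodes via normalized query weights,
# then to document nodes via normalized document weights. The first pass
# therefore reproduces the cosine/VSM score.
import math

def normalise(ws):
    norm = math.sqrt(sum(w * w for w in ws))
    return [w / norm for w in ws] if norm else ws

# normalized query-term weights (3 terms)
wq = normalise([1.0, 0.5, 0.0])

# per-document normalized term-weight columns (2 documents)
W = [normalise(col) for col in ([1.0, 0.0, 1.0], [0.0, 1.0, 1.0])]

# document activation = sum_i w'_{i,q} * w'_{i,j}
doc_act = [sum(q * w for q, w in zip(wq, col)) for col in W]
```

Further iterations would feed `doc_act` back through the same (transposed) weights to the term layer and forward again; since all normalized weights are below 1, the activation wanes with each round.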
49 NN Model Discussion
1. NN: an interesting concept:
   (a) the concept of parallel, distributed computation
   (b) conceptually close to the VSM
   (c) its recursive nature allows the network to fine-tune the ranking
2. Evaluation:
   (a) few applications
   (b) mostly a theoretical contribution
   (c) refinements have been formulated
3. Future:
   (a) activation propagation in document networks
   (b) remove the keyterm layer
   (c) problem: creation of large document networks
More informationNeural Networks DWML, /25
DWML, 2007 /25 Neural networks: Biological and artificial Consider humans: Neuron switching time 0.00 second Number of neurons 0 0 Connections per neuron 0 4-0 5 Scene recognition time 0. sec 00 inference
More informationCOMP304 Introduction to Neural Networks based on slides by:
COMP34 Introduction to Neural Networks based on slides by: Christian Borgelt http://www.borgelt.net/ Christian Borgelt Introduction to Neural Networks Motivation: Why (Artificial) Neural Networks? (Neuro-)Biology
More informationUSING SAT FOR COMBINATIONAL IMPLEMENTATION CHECKING. Liudmila Cheremisinova, Dmitry Novikov
International Book Series "Information Science and Computing" 203 USING SAT FOR COMBINATIONAL IMPLEMENTATION CHECKING Liudmila Cheremisinova, Dmitry Novikov Abstract. The problem of checking whether a
More informationINTRODUCTION TO ARTIFICIAL INTELLIGENCE
v=1 v= 1 v= 1 v= 1 v= 1 v=1 optima 2) 3) 5) 6) 7) 8) 9) 12) 11) 13) INTRDUCTIN T ARTIFICIAL INTELLIGENCE DATA15001 EPISDE 8: NEURAL NETWRKS TDAY S MENU 1. NEURAL CMPUTATIN 2. FEEDFRWARD NETWRKS (PERCEPTRN)
More informationPart I: Web Structure Mining Chapter 1: Information Retrieval and Web Search
Part I: Web Structure Mining Chapter : Information Retrieval an Web Search The Web Challenges Crawling the Web Inexing an Keywor Search Evaluating Search Quality Similarity Search The Web Challenges Tim
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 18: Latent Semantic Indexing Hinrich Schütze Center for Information and Language Processing, University of Munich 2013-07-10 1/43
More informationLecture 7 Artificial neural networks: Supervised learning
Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in
More informationFuzzy Rules and Fuzzy Reasoning (chapter 3)
Fuzzy ules and Fuzzy easoning (chapter 3) Kai Goebel, Bill Cheetham GE Corporate esearch & Development goebel@cs.rpi.edu cheetham@cs.rpi.edu (adapted from slides by. Jang) Fuzzy easoning: The Big Picture
More informationPropositions. c D. Poole and A. Mackworth 2010 Artificial Intelligence, Lecture 5.1, Page 1
Propositions An interpretation is an assignment of values to all variables. A model is an interpretation that satisfies the constraints. Often we don t want to just find a model, but want to know what
More informationIntroduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa
Introduction to Search Engine Technology Introduction to Link Structure Analysis Ronny Lempel Yahoo Labs, Haifa Outline Anchor-text indexing Mathematical Background Motivation for link structure analysis
More informationThe Perceptron. Volker Tresp Summer 2016
The Perceptron Volker Tresp Summer 2016 1 Elements in Learning Tasks Collection, cleaning and preprocessing of training data Definition of a class of learning models. Often defined by the free model parameters
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationArtificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!
Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error
More informationAn Evolution Strategy for the Induction of Fuzzy Finite-state Automata
Journal of Mathematics and Statistics 2 (2): 386-390, 2006 ISSN 1549-3644 Science Publications, 2006 An Evolution Strategy for the Induction of Fuzzy Finite-state Automata 1,2 Mozhiwen and 1 Wanmin 1 College
More informationCS276A Text Information Retrieval, Mining, and Exploitation. Lecture 4 15 Oct 2002
CS276A Text Information Retrieval, Mining, and Exploitation Lecture 4 15 Oct 2002 Recap of last time Index size Index construction techniques Dynamic indices Real world considerations 2 Back of the envelope
More informationLatent Dirichlet Allocation Introduction/Overview
Latent Dirichlet Allocation Introduction/Overview David Meyer 03.10.2016 David Meyer http://www.1-4-5.net/~dmm/ml/lda_intro.pdf 03.10.2016 Agenda What is Topic Modeling? Parametric vs. Non-Parametric Models
More informationImproved Algorithms for Module Extraction and Atomic Decomposition
Improved Algorithms for Module Extraction and Atomic Decomposition Dmitry Tsarkov tsarkov@cs.man.ac.uk School of Computer Science The University of Manchester Manchester, UK Abstract. In recent years modules
More informationInvestigation of Latent Semantic Analysis for Clustering of Czech News Articles
Investigation of Latent Semantic Analysis for Clustering of Czech News Articles Michal Rott, Petr Červa Laboratory of Computer Speech Processing 4. 9. 2014 Introduction Idea of article clustering Presumptions:
More informationCSE 494/598 Lecture-4: Correlation Analysis. **Content adapted from last year s slides
CSE 494/598 Lecture-4: Correlation Analysis LYDIA MANIKONDA HT TP://WWW.PUBLIC.ASU.EDU/~LMANIKON / **Content adapted from last year s slides Announcements Project-1 Due: February 12 th 2016 Analysis report:
More informationPart 8: Neural Networks
METU Informatics Institute Min720 Pattern Classification ith Bio-Medical Applications Part 8: Neural Netors - INTRODUCTION: BIOLOGICAL VS. ARTIFICIAL Biological Neural Netors A Neuron: - A nerve cell as
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationA Zadeh-Norm Fuzzy Description Logic for Handling Uncertainty: Reasoning Algorithms and the Reasoning System
1 / 31 A Zadeh-Norm Fuzzy Description Logic for Handling Uncertainty: Reasoning Algorithms and the Reasoning System Judy Zhao 1, Harold Boley 2, Weichang Du 1 1. Faculty of Computer Science, University
More informationCSE446: Neural Networks Spring Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer
CSE446: Neural Networks Spring 2017 Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer Human Neurons Switching time ~ 0.001 second Number of neurons 10 10 Connections per neuron 10 4-5 Scene
More informationFinancial Informatics XI: Fuzzy Rule-based Systems
Financial Informatics XI: Fuzzy Rule-based Systems Khurshid Ahmad, Professor of Computer Science, Department of Computer Science Trinity College, Dublin-2, IRELAND November 19 th, 28. https://www.cs.tcd.ie/khurshid.ahmad/teaching.html
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationDRAFT CONCEPTUAL SOLUTION REPORT DRAFT
BASIC STRUCTURAL MODELING PROJECT Joseph J. Simpson Mary J. Simpson 08-12-2013 DRAFT CONCEPTUAL SOLUTION REPORT DRAFT Version 0.11 Page 1 of 18 Table of Contents Introduction Conceptual Solution Context
More informationGeneric Text Summarization
June 27, 2012 Outline Introduction 1 Introduction Notation and Terminology 2 3 4 5 6 Text Summarization Introduction Notation and Terminology Two Types of Text Summarization Query-Relevant Summarization:
More informationAssignment 3. Latent Semantic Indexing
Assignment 3 Gagan Bansal 2003CS10162 Group 2 Pawan Jain 2003CS10177 Group 1 Latent Semantic Indexing OVERVIEW LATENT SEMANTIC INDEXING (LSI) considers documents that have many words in common to be semantically
More informationUSING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING
POZNAN UNIVE RSIY OF E CHNOLOGY ACADE MIC JOURNALS No. 80 Electrical Engineering 2014 Hussam D. ABDULLA* Abdella S. ABDELRAHMAN* Vaclav SNASEL* USING SINGULAR VALUE DECOMPOSIION (SVD) AS A SOLUION FOR
More informationARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92
ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000
More informationThe Perceptron. Volker Tresp Summer 2014
The Perceptron Volker Tresp Summer 2014 1 Introduction One of the first serious learning machines Most important elements in learning tasks Collection and preprocessing of training data Definition of a
More informationData Mining Techniques
Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationTecniche di Verifica. Introduction to Propositional Logic
Tecniche di Verifica Introduction to Propositional Logic 1 Logic A formal logic is defined by its syntax and semantics. Syntax An alphabet is a set of symbols. A finite sequence of these symbols is called
More information