vector space retrieval: many slides courtesy James Allan, UMass Amherst


vector space retrieval many slides courtesy James Allan@umass Amherst 1

what is a retrieval model?
- a model is an idealization or abstraction of an actual process
- mathematical models are used to study the properties of the process, draw conclusions, make predictions
- conclusions derived from a model depend on whether the model is a good approximation of the actual situation
- statistical models represent repetitive processes and make predictions about frequencies of interesting events
- retrieval models can describe the computational process, e.g. how documents are ranked (how documents or indexes are stored is implementation)
- retrieval models can attempt to describe the human process, e.g. the information need, interaction; few do so meaningfully
- retrieval models have an explicit or implicit definition of relevance 2

retrieval models: boolean, vector space, latent semantic indexing, statistical language, inference network 3

outline review: geometry, linear algebra; vector space model; vector selection; similarity; weighting schemes; latent semantic indexing 4

linear algebra 5

vectors [figure] 6

subspaces [figure] 7

linear independence, basis, dimension, rank
- vector x is linearly dependent on vectors $y_1, y_2, \ldots, y_t$ if there exist real numbers $c_1, c_2, \ldots, c_t$ such that $x = c_1 y_1 + c_2 y_2 + \ldots + c_t y_t$
- a basis of a vector space is a maximal set of linearly independent vectors; all bases of a given space have the same number of vectors (the dimension of the space)
- rank(A) = maximum number of linearly independent rows/columns
- rank(A) = dimension of the subspace spanned by A 8
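
To make rank and linear (in)dependence concrete, here is a minimal numpy sketch (the matrix values are made up; the third column is the sum of the first two):

```python
import numpy as np

# Third column = first column + second column, so the columns are
# linearly dependent and the rank is 2, not 3.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0]])

print(np.linalg.matrix_rank(A))   # 2 = dimension of the subspace spanned by A
```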

matrix multiplication 9

dot product, norm
- the dot product of two vectors of the same dimension is simply the matrix product, with a real number as result: if $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$, then $\langle x, y \rangle = x y^T = \sum_{i=1}^{n} x_i y_i$
- $L_2$ norm: $\|x\| = \sqrt{\langle x, x \rangle}$
- normalization: $\hat{x} = \frac{x}{\|x\|}$, so that $\|\hat{x}\| = 1$ 10
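
A minimal numpy sketch of these three operations (the example vectors are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

dot = x @ y                # <x, y> = sum_i x_i * y_i  -> 11.0
norm_x = np.sqrt(x @ x)    # L2 norm, same as np.linalg.norm(x)  -> 3.0
x_hat = x / norm_x         # normalized vector; np.linalg.norm(x_hat) == 1.0
print(dot, norm_x, x_hat)
```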

cosine computation: $\cos(\theta) = \cos(\beta - \alpha) = \cos(\beta)\cos(\alpha) + \sin(\beta)\sin(\alpha)$ 11

orthogonality 12

projections [figure] 13

outline vector space model review: geometry, linear algebra; vector space model; vector selection; similarity; weighting schemes; latent semantic indexing 14

vector space represent documents and queries as vectors in the term space issue: find the right coefficients (many variants) use a geometric similarity measure, often angle-related issue: normalization 15

mapping to vectors
- terms: an axis for every term; vectors corresponding to terms are canonical vectors
- documents: sum of the vectors corresponding to terms in the doc
- queries: treated the same as documents 16
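
A small sketch of this mapping (the vocabulary and documents are made up for illustration):

```python
import numpy as np

vocab = ["cat", "dog", "lion"]
axis = {t: i for i, t in enumerate(vocab)}    # one axis per term

def to_vector(tokens):
    """Document (or query) vector = sum of the canonical vectors of its terms."""
    v = np.zeros(len(vocab))
    for t in tokens:
        v[axis[t]] += 1.0     # adding e_t once per occurrence gives raw tf
    return v

doc = to_vector(["cat", "cat", "dog"])    # -> [2., 1., 0.]
query = to_vector(["cat", "lion"])        # -> [1., 0., 1.]
```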

coefficients
- the coefficients (vector lengths, term weights) represent term presence, importance, or aboutness: the magnitude along each dimension
- the model gives no guidance on how to set term weights
- some common choices: binary (1 = term is present, 0 = term not present in document); tf (the frequency of the term in the document); tf.idf (idf indicates the discriminatory power of the term)
- tf.idf is far and away the most common; numerous variations 17

example: raw tf weights cat cat cat cat cat cat cat lion lion cat cat lion dog cat cat lion dog dog 18

tf = term frequency
- raw tf (called tf) = count of term in document
- Robertson tf (okapi tf): $\text{okapi tf} = \frac{tf}{tf + 0.5 + 1.5 \cdot doclen / avgdoclen}$
  - based on a set of simple criteria loosely connected to the 2-Poisson model
  - basic formula is tf/(k+tf) where k is a constant (approx. 1-2)
  - document length introduced as a verbosity factor
- many variants 19
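
A direct transcription of the Okapi tf formula above into Python (the numbers in the example calls are arbitrary):

```python
def okapi_tf(tf, doclen, avgdoclen):
    """Robertson/Okapi tf: saturates raw counts and penalizes long documents."""
    return tf / (tf + 0.5 + 1.5 * doclen / avgdoclen)

# The same raw count contributes less in a long document than in a short one:
print(okapi_tf(3, doclen=50, avgdoclen=100))    # ~0.71
print(okapi_tf(3, doclen=300, avgdoclen=100))   # 0.375
```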

Robertson tf 20

IDF weights
- Inverse Document Frequency: used to weight terms based on frequency in the corpus (or language)
- fixed, so it can be precomputed for every term
- (basic) $IDF(t) = \log\frac{N}{N_t}$, where N = # of docs and $N_t$ = # of docs containing term t 21

TF.IDF (in fact tf*idf)
- the weight on every term is tf(t,d) * idf(t)
- often: $IDF = \log(N/df) + 1$, where N is the number of documents in the collection and df is the number of documents the term occurs in
- $IDF = \log\frac{1}{p}$, where p is the term probability
- sometimes normalized when used in the TF.IDF combination, e.g. for INQUERY: $\frac{\log\frac{N+0.5}{df}}{\log(N+1.0)}$
- TF and IDF combined using multiplication
- no satisfactory model behind these combinations 22
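
A minimal sketch of the basic tf*idf weight described above, using raw tf and the log(N/df) + 1 variant of IDF (the collection statistics are made up):

```python
import math

def idf(N, df):
    """Basic inverse document frequency: log(N / df) + 1."""
    return math.log(N / df) + 1.0

def tfidf(tf, N, df):
    """TF and IDF combined by multiplication."""
    return tf * idf(N, df)

# A term appearing 3 times in a document:
print(tfidf(3, N=1_000_000, df=100))       # rare term -> large weight
print(tfidf(3, N=1_000_000, df=500_000))   # common term -> small weight
```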

outline review: geometry, linear algebra; vector space model; vector selection; similarity; weighting schemes; latent semantic indexing 23

similarity, normalized
- similarity = intersection of set 1 and set 2
- same intersection area but different fraction of the sets: the size of the intersection alone is meaningless, so it is often divided by the sizes of the sets
- same for vectors, using the norm
- by normalizing vectors, the cosine does not change 24
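
A small numpy check of the last point: the cosine computed from the raw vectors equals the dot product of the normalized vectors (the vectors are arbitrary):

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity: dot product divided by the product of the norms."""
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([2.0, 1.0, 0.0])
y = np.array([1.0, 0.0, 1.0])

x_hat, y_hat = x / np.linalg.norm(x), y / np.linalg.norm(y)
print(cosine(x, y), x_hat @ y_hat)   # same value, up to round-off error
```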

common similarity measures 25

similarity: weighted features 26

vector similarity: cosine [figure: angles θ1, θ2] 27

cosine, normalization 28

cosine similarity: example [figure] 29

cosine example, normalized [figure] (round-off error; should be the same) 30

tf-idf base similarity formula
$$\mathrm{sim}(query, doc) = \frac{\sum_t \big(TF_{query}(t) \cdot IDF_{query}(t)\big) \cdot \big(TF_{doc}(t) \cdot IDF_{doc}(t)\big)}{\|doc\| \cdot \|query\|}$$
- many options for $TF_{query}$ and $TF_{doc}$: raw tf, Robertson tf, Lucene, etc.; try to come up with yours
- some options for $IDF_{doc}$; $IDF_{query}$ sometimes not considered
- normalization is critical 31
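
A sketch of the base formula, assuming raw tf and the log(N/df) + 1 idf from the earlier slides; the function name, documents, and collection statistics are all made up for illustration:

```python
import math
from collections import Counter

def tfidf_cosine(query_tokens, doc_tokens, N, df):
    """Score a document against a query with tf*idf weights and cosine normalization."""
    q_tf, d_tf = Counter(query_tokens), Counter(doc_tokens)
    idf = {t: math.log(N / df[t]) + 1.0 for t in set(q_tf) | set(d_tf)}
    q_w = {t: q_tf[t] * idf[t] for t in q_tf}      # TF_query * IDF_query
    d_w = {t: d_tf[t] * idf[t] for t in d_tf}      # TF_doc * IDF_doc
    num = sum(q_w[t] * d_w.get(t, 0.0) for t in q_w)
    q_norm = math.sqrt(sum(w * w for w in q_w.values()))
    d_norm = math.sqrt(sum(w * w for w in d_w.values()))
    return num / (q_norm * d_norm)

score = tfidf_cosine(["cat", "lion"], ["cat", "cat", "dog"],
                     N=1000, df={"cat": 100, "dog": 300, "lion": 10})
print(score)
```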

Lucene comparison [figure] 32

other term weighting schemes Lucene augmented tf-idf cosine 33

outline review: geometry, linear algebra; vector space model; vector selection; similarity; weighting schemes; latent semantic indexing 34

more linear algebra 35

A = LDU factorization
- for any m×n matrix A, there exist a permutation matrix P, a lower triangular matrix L with unit diagonal, and an m×n echelon matrix U such that PA = LU
- for any n×n matrix A, there exist L, U lower and upper triangular with unit diagonals, D a diagonal matrix of pivots, and P a permutation matrix such that PA = LDU
- if A is symmetric ($A = A^T$) then there is no need for P, and $U = L^T$: $A = LDL^T$ 36
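
A sketch with scipy (the matrix is arbitrary; note that scipy's lu returns the factors in the A = P L U convention, i.e. the slide's PA = LU with P replaced by its transpose):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

P, L, U = lu(A)                    # L has a unit diagonal, U is echelon
print(np.allclose(A, P @ L @ U))   # True

# For the LDU form, pull the pivots of U out into a diagonal matrix D:
D = np.diag(np.diag(U))
U_unit = np.linalg.inv(D) @ U      # upper triangular with unit diagonal
print(np.allclose(A, P @ L @ D @ U_unit))   # True
```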

eigenvalues and eigenvectors
- λ is an eigenvalue for matrix A iff $\det(A - \lambda I) = 0$
- every eigenvalue has a corresponding nonzero eigenvector x that satisfies $(A - \lambda I)x = 0$, or $Ax = \lambda x$; in other words, Ax and x have the same direction
- sum of eigenvalues = trace(A) = sum of the diagonal
- product of eigenvalues = det(A)
- the eigenvalues of an upper/lower triangular matrix are the diagonal entries 37
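
A quick numpy check of these facts on an arbitrary 2x2 matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

vals, vecs = np.linalg.eig(A)           # eigenvectors are the columns of vecs
x, lam = vecs[:, 0], vals[0]

print(np.allclose(A @ x, lam * x))                  # Ax = λx: same direction
print(np.isclose(vals.sum(), np.trace(A)))          # sum of eigenvalues = trace
print(np.isclose(vals.prod(), np.linalg.det(A)))    # product of eigenvalues = det
```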

matrix diagonal form
- if A has linearly independent eigenvectors $y_1, y_2, \ldots, y_n$ and S is the matrix having those as columns, $S = [y_1\ y_2 \ldots y_n]$, then S is invertible and $S^{-1} A S = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, the diagonal matrix of eigenvalues of A; equivalently $A = S \Lambda S^{-1}$
- no repeated eigenvalues ⇒ independent eigenvectors
- A symmetric ($A^T = A$) ⇒ S can be chosen orthogonal: $S^T S = I$
- S is not unique
- $AS = S\Lambda$ holds iff S has eigenvectors as columns
- not all matrices are diagonalizable 38
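
Continuing the numpy sketch with the same arbitrary matrix A: the eigenvector matrix S diagonalizes A.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

vals, S = np.linalg.eig(A)               # S: eigenvectors as columns
Lam = np.linalg.inv(S) @ A @ S           # S^-1 A S = Λ
print(np.allclose(Lam, np.diag(vals)))                        # True
print(np.allclose(A, S @ np.diag(vals) @ np.linalg.inv(S)))   # A = S Λ S^-1
```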

singular value decomposition
- if A is an m×n (m > n) real matrix then it can be decomposed as $A = U D V^T$, where U is m×n and D, V are n×n
- U, V are orthogonal: $U^T U = V^T V = I_{n \times n}$
- D is diagonal; its entries are the square roots of the eigenvalues of $A^T A$ 39
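
A numpy sketch of the thin SVD and of the relation between singular values and the eigenvalues of A^T A (the matrix is arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])       # m = 3, n = 2

U, d, Vt = np.linalg.svd(A, full_matrices=False)    # "thin" SVD: U is m x n
print(np.allclose(A, U @ np.diag(d) @ Vt))          # A = U D V^T

# Singular values are the square roots of the eigenvalues of A^T A:
eigvals = np.linalg.eigvalsh(A.T @ A)               # ascending order
print(np.allclose(np.sort(d**2), eigvals))          # True
```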

latent semantic indexing
- variant of the vector space model
- uses Singular Value Decomposition (a dimensionality reduction technique) to identify uncorrelated, significant basis vectors or factors, rather than non-independent terms
- replace original words with a subset of the new factors (say 100) in both documents and queries
- compute similarities in this new space
- computationally expensive, uncertain effectiveness 40

dimensionality reduction
- when the representation space is rich but the data lies in a small-dimension subspace: that is when some eigenvalues are zero
- non-exact: ignore the smallest eigenvalues, even if they are not zero 41
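
A sketch of the non-exact reduction via truncated SVD (the toy matrix is made up; it has rank 2, so the dropped singular value is exactly zero and nothing is lost here):

```python
import numpy as np

def truncate_svd(X, k):
    """Keep only the k largest singular values: the best rank-k approximation of X."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], d[:k], Vt[:k, :]

# X: a made-up term-document matrix (terms = rows, documents = columns)
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
U_k, d_k, Vt_k = truncate_svd(X, k=2)
print(np.allclose(X, U_k @ np.diag(d_k) @ Vt_k))   # True: X already has rank 2
```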

latent semantic indexing
- $T_0$, $D_0$: orthogonal matrices with unit-length columns ($T_0^T T_0 = I$)
- $S_0$: diagonal matrix of singular values
- m is the rank of X 42

LSI: example [figure] 43

LSI: example [figure] 44

LSI
- T has orthogonal unit-length columns ($T^T T = I$)
- D has orthogonal unit-length columns ($D^T D = I$)
- S: diagonal matrix of singular values
- m is the rank of X
- t = # of rows in X; d = # of columns in X
- k = chosen number of dimensions of the reduced model 45


LSI: example 47

original vs LSI [figure] 48

using LSI
- D is the new doc vectors (k dimensions); T provides term vectors
- given a query $Q = q_1 q_2 \ldots q_t$, we want to compare it to docs
- convert Q from t dimensions to k: $Q_0 = Q^T_{1 \times t}\, T_{t \times k}\, S^{-1}_{k \times k}$
- can now compare to doc vectors
- the same basic approach can be used to add new docs to the database 49
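
A sketch of folding a query into the reduced space, following Q_0 = Q^T T S^-1 (the term-document matrix, the query, and k are made up; D holds document vectors as rows):

```python
import numpy as np

# Made-up term-document matrix X (t terms x d documents) and reduced rank k.
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
k = 2

T_full, s, Dt_full = np.linalg.svd(X, full_matrices=False)
T, S, D = T_full[:, :k], np.diag(s[:k]), Dt_full[:k, :].T   # truncated factors

# Fold the query into the k-dimensional space: Q0 = Q^T T S^-1
Q = np.array([1.0, 1.0, 0.0, 0.0])        # raw query vector over the t terms
Q0 = Q @ T @ np.linalg.inv(S)

# Rank documents (rows of D) by cosine similarity to the folded-in query
sims = D @ Q0 / (np.linalg.norm(D, axis=1) * np.linalg.norm(Q0))
print(np.argsort(-sims))                  # documents by decreasing similarity
```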

LSI: does it work?
- decomposes language into basis vectors; in a sense, it is looking for core concepts
- in theory, this means that the system will retrieve documents using synonyms of your query words: the magic that appeals to people
- from a demo at lsi.research.telcordia.com (they hold the patent on LSI) 50

vector space retrieval: summary
- standard vector space: each dimension corresponds to a term in the vocabulary; vector elements are real-valued, reflecting term importance; any vector (document, query, ...) can be compared to any other; cosine correlation is the similarity metric used most often
- Latent Semantic Indexing (LSI): each dimension corresponds to a basic concept; documents and queries are mapped into basic concepts; same as standard vector space after that; whether it is good depends on what you want 51

vector space model: disadvantages
- assumed independence relationship among terms (though this is a very common retrieval model assumption)
- lack of justification for some vector operations, e.g. choice of similarity function, choice of term weights
- barely a retrieval model: doesn't explicitly model relevance, a person's information need, language models, etc.
- assumes a query and a document can be treated the same (symmetric) 52

vector space model: advantages
- simplicity
- ability to incorporate term weights: any type of term weights can be added; no model that has to justify the use of a weight
- ability to handle distributed term representations, e.g. LSI
- can measure similarities between almost anything: documents and queries, documents and documents, queries and queries, sentences and sentences, etc. 53