Problems. Looks for literal term matches.

Similar documents
Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Latent Semantic Models. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Latent Semantic Analysis. Hongning Wang

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

Information Retrieval

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

CSE 494/598 Lecture-6: Latent Semantic Indexing. **Content adapted from last year's slides

Latent Semantic Analysis. Hongning Wang

CS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya

Linear Algebra Background

PROBABILISTIC LATENT SEMANTIC ANALYSIS

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

Manning & Schuetze, FSNLP (c) 1999,2000

Singular Value Decomposition

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

Let A be an n × n real nonsymmetric matrix. The eigenvalue problem: λ 1 = 1 with eigenvector u 1 = ( ) λ 2 = 2 with eigenvector u 2 = ( 1

EIGENVALUE PROBLEMS AND THE SVD. [5.1 TO 5.3 & 7.4]

Principal Components Analysis (PCA)

A few applications of the SVD

CS 572: Information Retrieval

Latent semantic indexing

Singular Value Decomposition

Manning & Schuetze, FSNLP, (c)

Computational Methods. Eigenvalues and Singular Values

Introduction to Information Retrieval

Latent Semantic Analysis (Tutorial)

Numerical Methods I Singular Value Decomposition

Fast LSI-based techniques for query expansion in text retrieval systems

Matrix decompositions and latent semantic indexing

Multivariate Statistical Analysis

CSE 494/598 Lecture-4: Correlation Analysis. **Content adapted from last year's slides

COMP 558 lecture 18 Nov. 15, 2010

Latent Semantic Indexing

EE731 Lecture Notes: Matrix Computations for Signal Processing

The Singular Value Decomposition

PV211: Introduction to Information Retrieval

14 Singular Value Decomposition

Basic Calculus Review

Notes on Latent Semantic Analysis

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

15 Singular Value Decomposition

UNIT 6: The singular value decomposition.

Variable Latent Semantic Indexing

Data Mining Lecture 4: Covariance, EVD, PCA & SVD

Assignment 3. Latent Semantic Indexing

Singular Value Decomposition

Linear Methods in Data Mining

Linear Algebra Review. Fei-Fei Li

CS47300: Web Information Search and Management

Linear Algebra - Part II

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction

Conceptual Questions for Review

Functional Analysis Review

Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson

CSE 554 Lecture 7: Alignment

Chapter 3 Transformations

Fall TMA4145 Linear Methods. Exercise set Given the matrix 1 2

Review problems for MA 54, Fall 2004.

Dimensionality Reduction

Positive Definite Matrix

Probabilistic Latent Semantic Analysis

(a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: dim N(A) ≥ 1, since rank(A) ≤ 3. Ax =

Linear Least Squares. Using SVD Decomposition.

Jeffrey D. Ullman Stanford University

Linear Algebra Review. Fei-Fei Li

Principal Component Analysis

Image Registration Lecture 2: Vectors and Matrices

Singular Value Decomposition. 1 Singular Value Decomposition and the Four Fundamental Subspaces

Singular Value Decomposition

Singular Value Decomposition

Parallel Singular Value Decomposition. Jiaxing Tan

Matrices, Vector Spaces, and Information Retrieval

1 Linearity and Linear Systems

9 Searching the Internet with the SVD

be a Householder matrix. Then prove the following: H = I − 2uuᵀ/(uᵀu), Hu = (I − 2uuᵀ/(uᵀu))u = u − 2uuᵀu/(uᵀu)

CS 143 Linear Algebra Review

Deep Learning Book Notes Chapter 2: Linear Algebra

Matrix decompositions

Problem # Max points possible Actual score Total 120

Large Scale Data Analysis Using Deep Learning

The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)

Linear Least-Squares Data Fitting

Machine learning for pervasive systems Classification in high-dimensional spaces

Dimensionality Reduction

Computational Methods CMSC/AMSC/MAPL 460. Eigenvalues and Eigenvectors. Ramani Duraiswami, Dept. of Computer Science

Linear Algebra. Shan-Hung Wu. Department of Computer Science, National Tsing Hua University, Taiwan. Large-Scale ML, Fall 2016

SVD, PCA & Preprocessing

IV. Matrix Approximation using Least-Squares

Linear Algebra & Geometry why is linear algebra useful in computer vision?

Linear Algebra: Matrix Eigenvalue Problems

Foundations of Computer Vision

7 Principal Component Analysis

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Linear Algebra for Machine Learning. Sargur N. Srihari

Face Recognition. Face Recognition. Subspace-Based Face Recognition Algorithms. Application of Face Recognition

Notes on Eigenvalues, Singular Values and QR

Lecture 5: Web Searching using the SVD

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining

Transcription:

Problems
Keyword retrieval looks for literal term matches, but terms in queries (especially short ones) don't always capture the user's information need well. Problems:
Synonymy: other words with the same meaning, e.g. "car" and "automobile", or 电脑 vs. 计算机 (two different Chinese words for "computer").
What if we could match against concepts that represent related words, rather than against the words themselves?

Latent Semantic Indexing (LSI)
Key idea: instead of representing documents as vectors in a space whose dimensions are terms, represent them (and the terms themselves) as vectors in a lower-dimensional space whose axes are concepts that effectively group together similar words. These axes are the principal components from PCA.
Suppose we have the keywords car, automobile, driver, elephant. We want queries on "car" to also retrieve documents about drivers and automobiles, but not about elephants. Relevant documents may not contain the query terms, but may contain many related terms.

LSI via SVD
The matrix A can be decomposed into 3 matrices (SVD) as follows: A = U Σ V^T.
U is the matrix of orthogonal eigenvectors of A A^T. V is the matrix of orthogonal eigenvectors of A^T A. The eigenvalues λ1 ≥ ... ≥ λr of A A^T are also the eigenvalues of A^T A. Σ = diag(σ1, ..., σr) is an r × r diagonal matrix of singular values, where σi = √λi and r is the rank of A.
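As a quick sanity check of these relationships, here is a minimal sketch (added for illustration, not part of the original slides; the small matrix is made up) that decomposes a term-document matrix with numpy and verifies that A = U Σ V^T and that the squared singular values are the eigenvalues of A^T A:

```python
import numpy as np

# A small, made-up term-document matrix (rows = terms, columns = documents).
A = np.array([[1., 1., 0., 0.],
              [1., 0., 0., 1.],
              [0., 1., 1., 0.],
              [0., 0., 1., 1.]])

# Thin SVD: A = U @ diag(s) @ Vt, singular values s in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The factors reproduce A exactly.
assert np.allclose(A, U @ np.diag(s) @ Vt)

# sigma_i = sqrt(lambda_i): squared singular values are the eigenvalues of A^T A.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s**2, eigvals)
print("singular values:", s)
```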

Example
term               ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8
controllability     1   1   0   0   1   0   0   1
observability       1   0   0   0   1   1   0   1
realization         1   0   1   0   1   0   1   0
feedback            0   1   0   0   0   1   0   0
controller          0   1   0   0   1   1   0   0
observer            0   1   1   0   1   1   0   0
transfer function   0   0   0   0   1   1   0   0
polynomial          0   0   0   0   1   0   1   0
matrices            0   0   0   0   1   0   1   1

This happens to be a rank-7 matrix, so only 7 dimensions are required.

U (9x7) =
 0.3996 -0.1037  0.5606 -0.3717 -0.3919 -0.3482  0.1029
 0.4180 -0.0641  0.4878  0.1566  0.5771  0.1981 -0.1094
 0.3464 -0.4422 -0.3997 -0.5142  0.2787  0.0102 -0.2857
 0.1888  0.4615  0.0049 -0.0279 -0.2087  0.4193 -0.6629
 0.3602  0.3776 -0.0914  0.1596 -0.2045 -0.3701 -0.1023
 0.4075  0.3622 -0.3657 -0.2684 -0.0174  0.2711  0.5676
 0.2750  0.1667 -0.1303  0.4376  0.3844 -0.3066  0.1230
 0.2259 -0.3096 -0.3579  0.3127 -0.2406 -0.3122 -0.2611
 0.2958 -0.4232  0.0277  0.4305 -0.3800  0.5114  0.2010

Σ (7x7) = diag(3.9901, 2.2813, 1.6705, 1.3522, 1.1818, 0.6623, 0.6487)

V (8x7) =
 0.2917 -0.2674  0.3883 -0.5393  0.3926 -0.2112 -0.4505
 0.3399  0.4811  0.0649 -0.3760 -0.6959 -0.0421 -0.1462
 0.1889 -0.0351 -0.4582 -0.5788  0.2211  0.4247  0.4346
-0.0000 -0.0000 -0.0000 -0.0000  0.0000 -0.0000  0.0000
 0.6838 -0.1913 -0.1609  0.2535  0.0050 -0.5229  0.3636
 0.4134  0.5716 -0.0566  0.3383  0.4493  0.3198 -0.2839
 0.2176 -0.5151 -0.4369  0.1694 -0.2893  0.3161 -0.5330
 0.2791 -0.2591  0.6442  0.1593 -0.1648  0.5455  0.2998
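The example can be reproduced numerically with the sketch below (added here for illustration, not from the slides). It builds the 9x8 term-document matrix above and computes its SVD; the printed singular values should match Σ up to rounding, while the columns of U and V may differ from the slide by a sign flip:

```python
import numpy as np

terms = ["controllability", "observability", "realization", "feedback",
         "controller", "observer", "transfer function", "polynomial", "matrices"]

# 9 x 8 term-document matrix from the example (rows = terms, columns = ch1..ch8).
A = np.array([[1, 1, 0, 0, 1, 0, 0, 1],
              [1, 0, 0, 0, 1, 1, 0, 1],
              [1, 0, 1, 0, 1, 0, 1, 0],
              [0, 1, 0, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 1, 1, 0, 0],
              [0, 1, 1, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 0, 1, 0],
              [0, 0, 0, 0, 1, 0, 1, 1]], dtype=float)

print("rank:", np.linalg.matrix_rank(A))     # expected 7, as stated on the slide
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("singular values:", np.round(s, 4))
# ~ 3.9901, 2.2813, 1.6705, 1.3522, 1.1818, 0.6623, 0.6487, plus one value near 0
# (the empty ch4 column makes the matrix rank-deficient).
```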

Dimension Reduction in LSI
In the matrix Σ, select only the k largest singular values, and keep the corresponding columns in U and V. The resulting matrix A_k is given by A_k = U_k Σ_k V_k^T, where k (k < r) is the dimensionality of the concept space.
The parameter k should be large enough to allow fitting the characteristics of the data, but small enough to filter out the non-relevant representational detail.

Computing and Using LSI
Terms x Documents: A (m x n) = U (m x r) Σ (r x r) V^T (r x n); after truncation, Â_k (m x n) ≈ U_k (m x k) Σ_k (k x k) V_k^T (k x n).
Singular Value Decomposition (SVD): convert the term-document matrix into the 3 matrices U, Σ and V.
Reduce dimensionality: throw out the low-order rows and columns.
Recreate the matrix: multiply to produce the approximate term-document matrix, and use the new matrix to process queries. OR, better, map the query into the reduced space.
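These steps can be written as a short helper (an added sketch, not from the slides; the function name lsi_truncate is my own), which truncates the SVD to k dimensions and rebuilds the approximate term-document matrix:

```python
import numpy as np

def lsi_truncate(A, k):
    """Return U_k, s_k, Vt_k and the rank-k approximation A_k = U_k diag(s_k) Vt_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]   # keep the k largest singular values
    A_k = U_k @ np.diag(s_k) @ Vt_k               # approximate term-document matrix
    return U_k, s_k, Vt_k, A_k

# Example usage with the 9 x 8 matrix A defined above:
# U2, s2, Vt2, A2 = lsi_truncate(A, k=2)   # A2 is the 9 x 8 rank-2 approximation
```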

Low-rank Approximation: Reduction
SVD can be used to compute optimal low-rank approximations.
Approximation problem: find the matrix A_k of rank k such that A_k = argmin_{X : rank(X) = k} ||A − X||_F (Frobenius norm), where A_k and X are both m × n matrices. Typically we want k << r.

Low-rank Approximation: Solution via SVD
A_k = U diag(σ1, ..., σk, 0, ..., 0) V^T, i.e. set the smallest r − k singular values to zero.
In column notation: A_k = Σ_{i=1}^{k} σi u_i v_i^T, a sum of k rank-1 matrices.
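The two formulations are equivalent, as the following added check illustrates (it reuses the 9x8 matrix A from the example): summing the first k rank-1 terms σi u_i v_i^T gives the same matrix as zeroing out the trailing singular values.

```python
import numpy as np

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A: 9 x 8 example matrix
k = 2

# Formulation 1: set the smallest r - k singular values to zero.
A_k = U @ np.diag(np.concatenate([s[:k], np.zeros(len(s) - k)])) @ Vt

# Formulation 2: sum of k rank-1 matrices sigma_i * u_i * v_i^T.
A_k_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

assert np.allclose(A_k, A_k_sum)
```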

Approximation error
It's the best possible approximation, measured by the Frobenius norm of the error: min_{X : rank(X) = k} ||A − X||_F = ||A − A_k||_F = sqrt(σ_{k+1}^2 + ... + σ_r^2) (and in the spectral norm the error is σ_{k+1}), where the σ_i are ordered such that σ_i ≥ σ_{i+1}. This suggests why the error drops as k is increased.
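Both error formulas are easy to confirm numerically (an added sketch reusing the example matrix A): the Frobenius error of the rank-k truncation equals the root-sum-of-squares of the discarded singular values, and the spectral-norm error equals σ_{k+1}.

```python
import numpy as np

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A: 9 x 8 example matrix
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

fro_err = np.linalg.norm(A - A_k, 'fro')   # Frobenius-norm error
spec_err = np.linalg.norm(A - A_k, 2)      # spectral-norm (largest singular value) error

assert np.isclose(fro_err, np.sqrt(np.sum(s[k:] ** 2)))
assert np.isclose(spec_err, s[k])          # s[k] is sigma_{k+1} (0-based indexing)
```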

SVD Low-rank Approximation
Whereas the term-document matrix A may have m = 50,000 and n = 10 million (and rank close to 50,000), we can construct an approximation A_100 with rank 100. Of all rank-100 matrices, it has the lowest Frobenius error.

Following the Example
(The term-document matrix and its SVD factors U (9x7), Σ (7x7) and V (8x7) are the same as on the Example slide above.)

Formally, this will be the rank-k (k = 2) matrix that is closest to the original term-document matrix in the matrix-norm sense.
(U (9x7), Σ (7x7) and V (8x7) are as on the Example slide above.)

U2 (9x2) =
 0.3996 -0.1037
 0.4180 -0.0641
 0.3464 -0.4422
 0.1888  0.4615
 0.3602  0.3776
 0.4075  0.3622
 0.2750  0.1667
 0.2259 -0.3096
 0.2958 -0.4232

Σ2 (2x2) = diag(3.9901, 2.2813)

V2 (8x2) =
 0.2917 -0.2674
 0.3399  0.4811
 0.1889 -0.0351
-0.0000 -0.0000
 0.6838 -0.1913
 0.4134  0.5716
 0.2176 -0.5151
 0.2791 -0.2591

U2 Σ2 V2^T will be a 9x8 matrix that approximates the original matrix.

What should be the value of k?
With no components ignored (k = 7), U7 Σ7 V7^T reproduces the original term-document matrix (shown on the Example slide above) exactly. Ignoring 1 component (k = 6), 3 components (k = 4), or 5 components (k = 2) gives progressively coarser approximations.

U2 Σ2 V2^T (k = 2) =
 0.52835834  0.42813724   0.30949408  0.0  1.1355368   0.5239192     0.46880865   0.5063048
 0.5256176   0.49655432   0.3201918   0.0  1.1684579   0.6059082     0.4382505    0.50338876
 0.6729299  -0.015529543  0.29650056  0.0  1.1381099  -0.0052356124  0.82038856   0.6471
-0.0617774   0.76256883   0.10535021  0.0  0.3137232   0.9132189    -0.37838274  -0.06253
 0.18889774  0.90294445   0.24125765  0.0  0.81799114  1.0865396    -0.1309748    0.17793834
 0.25334513  0.95019233   0.27814224  0.0  0.9537667   1.1444798    -0.071810216  0.2397161
 0.21838559  0.55592346   0.19392742  0.0  0.6775683   0.6709899     0.042878807  0.2077163
 0.4517898  -0.033422917  0.19505836  0.0  0.75146574 -0.031091988   0.55994695   0.4345
 0.60244554 -0.06330189   0.25684044  0.0  0.99175954 -0.06392482    0.75412846   0.5795

(The slide also shows the corresponding 9x8 approximations U6 Σ6 V6^T and U4 Σ4 V4^T; as k increases, the entries move closer to the original 0/1 values.)
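To see how the choice of k trades fidelity for compression, this added sketch (reusing the example matrix A) rebuilds A_k for several k and prints the Frobenius error of each; the k = 2 reconstruction should reproduce the U2 Σ2 V2^T values above up to rounding.

```python
import numpy as np

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A: 9 x 8 example matrix
for k in (2, 4, 6, 7):
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    print(f"k={k}: Frobenius error {np.linalg.norm(A - A_k, 'fro'):.4f}")
# k = 7 keeps every non-zero singular value, so its error is essentially 0.
```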

Querying
To query for "feedback controller", the query vector would be q = [0 0 0 1 1 0 0 0 0]' (' indicates transpose; feedback and controller are the 4th and 5th terms).
Let q be the query vector. Then the document-space vector corresponding to q is given by Dq = q' * U2 * inv(Σ2). This is a point at the centroid of the query terms' positions in the new space. For the "feedback controller" query vector, the result is Dq = [0.1376 0.3678].
To find the best document match, we compare the Dq vector against all the document vectors in the 2-dimensional V2 space. The document vector that is nearest in direction to Dq is the best match. The cosine values for the eight document vectors and the query vector are:
-0.3747 0.9671 0.1735 -0.9413 0.0851 0.9642 -0.7265 -0.3805
(U2, Σ2, V2 and the term-document matrix are as on the preceding slides.)
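The fold-in can be written directly in numpy as well (an added sketch reusing A and terms from the earlier snippets; the variable names are my own). It maps the query into the 2-dimensional concept space and ranks the eight chapters by cosine similarity. The coordinates of Dq may differ from the slide by a sign flip, since SVD factors are only determined up to sign, but the cosine scores are unaffected:

```python
import numpy as np

U, s, Vt = np.linalg.svd(A, full_matrices=False)     # A: 9 x 8 example matrix
k = 2
U2, S2, V2 = U[:, :k], np.diag(s[:k]), Vt[:k, :].T   # rows of V2 = document vectors

# Query "feedback controller": 1s at the positions of 'feedback' and 'controller'.
q = np.zeros(len(terms))
q[terms.index("feedback")] = 1.0
q[terms.index("controller")] = 1.0

# Fold the query into the concept space: Dq = q' * U2 * inv(S2).
Dq = q @ U2 @ np.linalg.inv(S2)
print("Dq:", np.round(Dq, 4))                        # ~ [0.1376, 0.3678] up to sign

# Cosine similarity between Dq and each document vector (row of V2).
norms = np.linalg.norm(V2, axis=1) * np.linalg.norm(Dq)
cosines = (V2 @ Dq) / np.maximum(norms, 1e-12)       # guard against the empty ch4
for doc, c in enumerate(cosines, start=1):
    print(f"ch{doc}: {c:+.4f}")
# ch4 contains none of the indexed terms, so its vector is ~0 and its score is noise.
```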

What LSI can do
LSI analysis effectively performs dimensionality reduction, noise reduction, correlation analysis, and query expansion (with related words). Some of these individual effects can be achieved with simpler techniques (e.g. thesaurus construction); LSI does them together.
LSI handles synonymy well, but not so much polysemy.
Challenge: the SVD is expensive to compute (O(n^3)), and it needs to be updated as new documents are found or updated.
2010-5-18, Wang Houfeng, ICL, PKU