Multimedia Databases - 68A6 Final Term - exercises


Exercises for the preparation of the final term, June, the 1th 00

Quiz

1. approximation of cosine similarity
An approximate computation of the cosine similarity is based on grouping the documents into master and slave documents. If a given collection contains 1 million documents, how many times does the cosine similarity need to be computed for a given query?
(a) 1 million;
(b) less than 1,000;
(c) log 1,000 that is less than 0;
(d) none of the previous answers.
Answer: b.

2. inverted indexes: space
Suppose we want to create an inverted index capable of supporting full-text retrieval. Which one of the following hypotheses concerning the space of the index is more reasonable?
(a) It is about 10% of the space of the documents;
(b) It is at least as large as the document collection;
(c) It is at least 10 times as big as the document collection.
Answer: b.

Exercises

1. meta-search engines
Meta-search is a technique for searching the Web by collecting the results of different search engines. Which one of the following hypotheses do you consider more reasonable? Explain your answer.

(a) Meta-search engines can cover a larger part of the Web than traditional search engines thanks to the merging of their results. Hence, they are likely to represent the best Web search service in the next few years;
(b) Meta-searching is very effective in principle, but the current market situation does not suggest it will replace today's best search engines.

2. counterexample for approximate cosine
Provide a counterexample to the statement: the master-slave cosine similarity scheme yields the same similarity as the classic cosine similarity.

3. hubs and authorities
Compute the hub and authority degrees for the hub-like structure represented in Fig. 1. What happens if we also connect the dashed links?
Figure 1: Compute the degree of hub and authorities for the hub-like structure.
Solution. We start by considering the dashed links disconnected. We solve by directly writing down the equations:
$$a_1(t+1) = a_1(t), \quad a_2(t+1) = h_1(t), \quad a_3(t+1) = h_1(t), \quad a_4(t+1) = h_1(t)$$
and
$$h_1(t+1) = a_2(t+1) + a_3(t+1) + a_4(t+1), \quad h_2(t+1) = h_2(t), \quad h_3(t+1) = h_3(t), \quad h_4(t+1) = h_4(t)$$

Considering that a(0) = h(0) = [1,1,1,1], by induction on $t$ we find, for $t \ge 1$,
$$a_1(t) = 1, \quad a_2(t) = a_3(t) = a_4(t) = 3^{t-1}, \quad h_1(t) = 3^t, \quad h_2(t) = h_3(t) = h_4(t) = 1.$$
Hence, after normalization, as $t \to \infty$ the degrees are proportional to a = [0,1,1,1] and h = [1,0,0,0]. The same result can be found by constructing the adjacency matrix of the graph
$$A = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
If we construct $M = A^\top A$ and compute its principal eigenvector, we find the same result. Now suppose we attach the dashed links. The hub and authority degrees of nodes 1, 2, 3, 4 do not change and, obviously, the remaining authority and hub degrees (up to $h_7$) are $0$.

4. loop
Consider the graph represented in Fig. 2. Determine the hub and authority degrees.
Figure 2: Compute the degree of hub and authorities.
Solution. We write down the hub and authority equations:
$$a_1(t+1) = h_2(t), \quad a_2(t+1) = h_1(t), \quad h_1(t+1) = a_2(t+1), \quad h_2(t+1) = a_1(t+1)$$
Beginning from the initial values a(0) = h(0) = [1,1] and considering the normalization, induction on $t$ gives a = [1,1] and h = [1,1]. Notice that in this case there is no principal eigenvector: there are two eigenvectors with the same eigenvalue, since $AA^\top = I$.
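The solutions for the hub-like structure (dashed links ignored) and for the loop can be checked numerically. The following is a minimal Python/NumPy sketch, assuming the standard HITS update of Kleinberg's algorithm (authorities from $A^\top h$, hubs from $A a$, L2 normalization at every step); the function name `hits` and the number of steps are illustrative choices, not taken from the exercises. Unlike the hand-written equations for the hub-like structure, this update sets the authority of node 1 and the hubs of nodes 2, 3, 4 to zero right away instead of keeping them constant, but the normalized limits are the same.

```python
import numpy as np


def hits(A, n_steps=50):
    """Standard HITS iteration; A[i, j] = 1 if page i links to page j.

    Returns L2-normalized authority and hub vectors after n_steps.
    a(0) is not needed: the first step only uses h(0) = [1, ..., 1].
    """
    A = np.asarray(A, dtype=float)
    h = np.ones(A.shape[0])
    for _ in range(n_steps):
        a = A.T @ h                  # authority: sum of hub degrees of the pages linking in
        h = A @ a                    # hub: sum of authority degrees of the pages linked to
        a = a / np.linalg.norm(a)
        h = h / np.linalg.norm(h)
    return a, h


# Hub-like structure, dashed links ignored: node 1 -> nodes 2, 3, 4 (row 0 is node 1).
star = [[0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
a, h = hits(star)
print(a * np.sqrt(3))   # approximately [0, 1, 1, 1]
print(h)                # approximately [1, 0, 0, 0]

# Loop: nodes 1 and 2 link to each other.
loop = [[0, 1],
        [1, 0]]
a, h = hits(loop)
print(a * np.sqrt(2), h * np.sqrt(2))   # approximately [1, 1] and [1, 1]
```

Node indices in the arrays start at 0, so node 1 of the figures corresponds to row and column 0.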

5. layer
Consider the graph represented in Fig. 3. Determine the hub and authority degrees.
Figure 3: Compute the degree of hub and authorities.
Answer: a = [0,0,0,1,1,1], h = [1,1,1,0,0,0].

6. circular hub
Consider the graph represented in Fig. 4. Determine the hub and authority degrees.
Figure 4: Compute the degree of hub and authorities.
Answer: a = [0,0.,0.,0.,0.],

h = 0 0 [,1,1,1,1]

7. a larger example
Consider the graph represented in Fig. 5. Determine the hub and authority degrees.
Figure 5: Compute the degree of hub and authorities.
Figure 6: Distribution of hub and authority.
The principal eigenvectors associated with $A^\top A$ and $AA^\top$ are, respectively,
a = [0.1,0.00,0.00,0.0,0.9,0.00,0.19,0.,0.8,0.0,0.7,0.00,0.8,0.,0.]
h = [0.1,0.,0.,0.1,0.1,0.0,0.0,0.00,0.08,0.08,0.00,0.0,0.00,0.0,0.00]
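The larger example is solved through the principal eigenvectors of $A^\top A$ and $AA^\top$. Since the graph of that figure is not reproduced in the text, the sketch below (Python/NumPy, with hypothetical helper names `principal_eigvec` and `hits_eig`) is run on the hub-like structure and on the loop instead. It relies on `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order together with the matching eigenvectors as columns, and it refuses to answer when the top eigenvalue is repeated, which is exactly the loop case where $AA^\top = I$. The tolerance `tol` is an assumption.

```python
import numpy as np


def principal_eigvec(M, tol=1e-9):
    """Return (lambda_1, v_1) for a symmetric matrix M, with unit-norm v_1 and
    nonnegative entry sum. Raises ValueError when the top eigenvalue is repeated."""
    w, V = np.linalg.eigh(M)          # eigenvalues in ascending order
    if len(w) > 1 and w[-1] - w[-2] < tol:
        raise ValueError("top eigenvalue is repeated: no principal eigenvector")
    v = V[:, -1]
    if v.sum() < 0:                   # eigenvectors are only defined up to sign
        v = -v
    return w[-1], v


def hits_eig(A):
    """Authority and hub vectors as principal eigenvectors of A^T A and A A^T."""
    A = np.asarray(A, dtype=float)
    _, a = principal_eigvec(A.T @ A)
    _, h = principal_eigvec(A @ A.T)
    return a, h


star = [[0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
a, h = hits_eig(star)     # a close to [0, 1, 1, 1] / sqrt(3), h close to [1, 0, 0, 0]

try:
    hits_eig([[0, 1],
              [1, 0]])    # loop: A A^T = I, so eigenvalue 1 is repeated
except ValueError as err:
    print(err)
```

The sign fix is safe when the top eigenvalue is simple: for a symmetric matrix with nonnegative entries, that eigenvector can be taken with all entries nonnegative.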

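Hubs and authorities
They are computed according to the following algorithm by Kleinberg:
(a) $\mathbf{1} \leftarrow [1,\dots,1]$
(b) $a(0) = \mathbf{1}$
(c) $h(0) = \mathbf{1}$
(d) for $k = 1$ to $n$ do
    i. $a(k) \leftarrow A^\top h(k-1)$
    ii. $h(k) \leftarrow A\,a(k)$
    iii. $a(k) \leftarrow a(k)/\|a(k)\|$
    iv. $h(k) \leftarrow h(k)/\|h(k)\|$

Proposition 0.1. After $k$ steps:
$$a(k) \propto [A^\top A]^{k-1} A^\top \mathbf{1}, \qquad h(k) \propto [AA^\top]^{k} \mathbf{1}.$$
Proof. Base ($k = 1$): by definition, $a(1) \propto A^\top \mathbf{1}$ and $h(1) \propto AA^\top \mathbf{1}$.
Induction step: from the induction hypothesis,
$$a(k-1) \propto [A^\top A]^{k-2} A^\top \mathbf{1}, \qquad h(k-1) \propto [AA^\top]^{k-1} \mathbf{1}.$$
Using one iteration of the algorithm,
$$a(k) \propto A^\top h(k-1) \propto A^\top [AA^\top]^{k-1} \mathbf{1} = [A^\top A]^{k-1} A^\top \mathbf{1},$$
$$h(k) \propto A\,a(k) \propto A [A^\top A]^{k-1} A^\top \mathbf{1} = [AA^\top]^{k} \mathbf{1}.$$

Proposition 0.2. Let $M = A^\top A$ have a principal eigenvalue, that is, $\lambda_1(M) > \lambda_i(M)$ for all $i > 1$. Then Kleinberg's algorithm converges as $k \to \infty$.
Proof. Let the initial point be $z \in \mathbb{R}^n$. Since $M$ is symmetric, its eigenvectors $v_i$ can be chosen to form an orthonormal basis of $\mathbb{R}^n$ (repeated in case of multiplicity), so $z = \sum_i \alpha_i v_i$. Then
$$Mz = \sum_i \alpha_i M v_i = \sum_i \alpha_i \lambda_i v_i.$$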

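By induction on $j$,
$$M^j z = \sum_i \alpha_i \lambda_i^j v_i.$$
Since $M$ is positive semidefinite, $0 \le \lambda_i < \lambda_1$ for $i > 1$, so $(\lambda_i/\lambda_1)^j \to 0$ and, provided $\alpha_1 \neq 0$ (choosing the sign of $v_1$ so that $\alpha_1 > 0$),
$$\frac{M^j z}{\|M^j z\|} \to \frac{v_1}{\|v_1\|} \quad \text{as } j \to \infty.$$
By Proposition 0.1, taking $z = A^\top \mathbf{1}$ shows that $a(k)$ converges to the normalized principal eigenvector of $A^\top A$; the same argument with $AA^\top$ applies to $h(k)$.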
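The two propositions above can be checked numerically. The sketch below (Python/NumPy) runs Kleinberg's algorithm on a small hypothetical graph, not one from the exercises, verifies at every step that $a(k)$ equals the normalized vector $[A^\top A]^{k-1} A^\top \mathbf{1}$, and measures the distance between $a(k)$ and the principal eigenvector $v_1$ of $M = A^\top A$. The helper name `kleinberg` and the graph itself are illustrative assumptions.

```python
import numpy as np


def kleinberg(A, n_steps):
    """Run n_steps of Kleinberg's algorithm and return the list a(1), ..., a(n_steps)."""
    A = np.asarray(A, dtype=float)
    h = np.ones(A.shape[0])                  # h(0) = 1
    history = []
    for _ in range(n_steps):
        a = A.T @ h                          # step i
        h = A @ a                            # step ii
        a = a / np.linalg.norm(a)            # step iii
        h = h / np.linalg.norm(h)            # step iv
        history.append(a)
    return history


# Hypothetical 4-node graph: 0->1, 0->2, 1->2, 2->0, 3->2 (A[i, j] = 1 if i links to j).
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)
M = A.T @ A
ones = np.ones(A.shape[0])
history = kleinberg(A, 20)

# Proposition 0.1: a(k) is the normalized M^(k-1) A^T 1.
for k, a_k in enumerate(history, start=1):
    z = np.linalg.matrix_power(M, k - 1) @ A.T @ ones
    assert np.allclose(a_k, z / np.linalg.norm(z))

# Proposition 0.2: a(k) converges to the principal eigenvector v_1 of M.
w, V = np.linalg.eigh(M)                     # ascending eigenvalues
v1 = V[:, -1] * np.sign(V[:, -1].sum())      # fix the arbitrary sign
errors = [np.linalg.norm(a_k - v1) for a_k in history]
print(errors[-1])                            # close to 0
print(w[-2] / w[-1])                         # lambda_2 / lambda_1
```

For this graph $\lambda_2/\lambda_1 = 1/(2+\sqrt{2}) \approx 0.29$, so the error shrinks roughly by that factor per step. A ratio close to 1 would mean slow convergence, and a ratio of exactly 1, as in the loop, means there is no principal eigenvector.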