Multimedia Databases - 8A
Final Term - exercises
Exercises in preparation for the final term
June, the 1th 00

quiz

1. approximation of cosine similarity
An approximate computation of the cosine similarity is based on grouping the documents into master and slave documents. If a given collection contains 1 million documents, how many times does the cosine similarity need to be computed for a given query?
(a) 1 million;
(b) less than 1,000;
(c) log 1,000 that is less than 0;
(d) none of the previous answers.
Answer: b.

2. inverted indices: space
Suppose we want to create an inverted index capable of supporting full-text retrieval. Which one of the following hypotheses is the most reasonable concerning the space occupied by the index?
(a) It is about 10% of the space of the documents;
(b) It is at least the same size as the document collection;
(c) It is at least 10 times as big as the document collection.
Answer: b.

Exercises

1. meta-search engines
Meta-search is a technique for searching the Web by collecting the results of different search engines. Which one of the following hypotheses do you consider more reasonable? The answer needs an explanation.
(a) Meta-search engines can cover a more significant part of the Web than traditional search engines thanks to the merging of their results. Hence, they are likely to represent the best Web search service in the next few years;
(b) Meta-searching is very effective in principle, but the current market situation does not suggest that it will replace today's best search engines.

2. counterexample for approximate cosine
Provide a counterexample to the statement: the master-slave cosine similarity scheme yields the same similarity as the classic cosine similarity.

3. hubs and authorities
Compute the degree of hub and authorities for the hub-like structure represented in Fig. 1. What happens if we also connect the dashed links?

Figure 1: Compute the degree of hub and authorities for the hub-like structure.

Solution
We start by considering the dashed links disconnected. We solve by directly writing down the equations:
$$a_1(t+1) = a_1(t), \quad a_2(t+1) = h_1(t), \quad a_3(t+1) = h_1(t), \quad a_4(t+1) = h_1(t)$$
and
$$h_1(t+1) = a_2(t+1) + a_3(t+1) + a_4(t+1), \quad h_2(t+1) = h_2(t), \quad h_3(t+1) = h_3(t), \quad h_4(t+1) = h_4(t)$$
Considering that $a(0) = h(0) = [1,1,1,1]$, by induction on $t$ we find
$$a_1(t) = 1, \quad a_2(t) = a_3(t) = a_4(t) = 3^{t-1}, \quad h_1(t) = 3^{t}, \quad h_2(t) = h_3(t) = h_4(t) = 1$$
Hence, considering the normalization, as $t \to \infty$
$$a = [0,1,1,1], \quad h = [1,0,0,0]$$
The same result can be found by constructing the graph incidence matrix
$$A = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
If we construct $M \doteq A^\top A$ and compute the principal eigenvector, we find the same result.
Now suppose we attach the dashed links. The hub and authority values of nodes 1, 2, 3, 4 do not change and, obviously, $a_5 = a_6 = a_7 = h_5 = h_6 = h_7 = 0$.

4. loop
Consider the graph represented in the figure. Determine the hub and the authority of its nodes.

Figure: Compute the degree of hub and authorities.

Solution
We write down the equations of hub and authority:
$$a_1(t+1) = h_2(t), \quad a_2(t+1) = h_1(t), \quad h_1(t+1) = a_2(t+1), \quad h_2(t+1) = a_1(t+1)$$
Starting from the initial values $a(0) = h(0) = [1,1]$ and considering the normalization, we have, by induction on $t$:
$$a = [1,1], \quad h = [1,1]$$
Notice that in this case there is no principal eigenvector (there are two eigenvectors with the same eigenvalue, since $AA^\top = I$).
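The induction in exercises 3 and 4 can be checked numerically by iterating the update equations without normalization. The following is a minimal sketch under the assumptions that node 1 links to nodes 2, 3 and 4 in exercise 3 (dashed links disconnected) and that the two nodes of exercise 4 link to each other; the function names are illustrative and not part of the original exercises.

```python
# Sketch: iterate the unnormalized update equations of exercises 3 and 4.
# Assumptions: exercise 3 has links 1->2, 1->3, 1->4 (dashed links off);
# exercise 4 has links 1->2 and 2->1.


def iterate_hub_structure(steps):
    """Apply the exercise 3 equations `steps` times from a(0) = h(0) = [1, 1, 1, 1]."""
    a = [1, 1, 1, 1]
    h = [1, 1, 1, 1]
    for _ in range(steps):
        # a_1 has no incoming link and keeps its value; a_2..a_4 copy h_1(t).
        a = [a[0], h[0], h[0], h[0]]
        # h_1 sums the updated authorities of its targets; h_2..h_4 keep their value.
        h = [a[1] + a[2] + a[3], h[1], h[2], h[3]]
    return a, h


def iterate_loop(steps):
    """Apply the exercise 4 equations `steps` times from a(0) = h(0) = [1, 1]."""
    a = [1, 1]
    h = [1, 1]
    for _ in range(steps):
        a = [h[1], h[0]]  # a_1(t+1) = h_2(t), a_2(t+1) = h_1(t)
        h = [a[1], a[0]]  # h_1(t+1) = a_2(t+1), h_2(t+1) = a_1(t+1)
    return a, h


def scale_by_max(v):
    """Divide by the largest entry so that the limiting direction is easy to read."""
    largest = max(v)
    return [x / largest for x in v]


for t in (1, 2, 5):
    a, h = iterate_hub_structure(t)
    # Closed forms found by induction: a_2(t) = 3**(t-1) and h_1(t) = 3**t.
    assert a == [1, 3 ** (t - 1), 3 ** (t - 1), 3 ** (t - 1)]
    assert h == [3 ** t, 1, 1, 1]

a, h = iterate_hub_structure(30)
print(scale_by_max(a))  # first entry is close to 0, the others equal 1.0
print(scale_by_max(h))  # first entry equals 1.0, the others are close to 0
print(iterate_loop(10))  # ([1, 1], [1, 1])
```

Scaling by the largest entry is used here only to make the limit visible; any other normalization yields the same direction.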
5. layer
Consider the graph represented in the figure. Determine the hub and the authority of its nodes.

Figure: Compute the degree of hub and authorities.

Answer
$$a = [0,0,0,1,1,1], \quad h = [1,1,1,0,0,0]$$

6. circular hub
Consider the graph represented in the figure. Determine the hub and the authority of its nodes.

Figure: Compute the degree of hub and authorities.

Answer
a = [0,0.,0.,0.,0.]
h = 0 0 [,1,1,1,1]

7. a larger example
Consider the graph represented in the figure. Determine the hub and the authority of its nodes.

Figure: Compute the degree of hub and authorities.
Figure: Distribution of hub and authority.

The principal eigenvectors associated with $A^\top A$ and $AA^\top$ are, respectively,
a = [0.1,0.00,0.00,0.0,0.9,0.00,0.19,0.,0.8,0.0,0.7,0.00,0.8,0.,0.]
h = [0.1,0.,0.,0.1,0.1,0.0,0.0,0.00,0.08,0.08,0.00,0.0,0.00,0.0,0.00]
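The answers to exercises 5 to 7 are eigenvectors of $A^\top A$ (authorities) and $AA^\top$ (hubs), so a stated answer can be checked by a single matrix-vector product. The sketch below does this for the layer graph of exercise 5. Since the figure is not reproduced here, the edge set is an assumption: every node of the first layer (1, 2, 3) is taken to link to every node of the second layer (4, 5, 6). The helper names are illustrative.

```python
# Sketch: check that a candidate vector is an eigenvector of A^T A or A A^T.
# Assumption: the layer graph has an edge from each of nodes 1-3 to each of nodes 4-6.


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    y_cols = transpose(y)
    return [[sum(p * q for p, q in zip(row, col)) for col in y_cols] for row in x]


def matvec(m, v):
    return [sum(p * q for p, q in zip(row, v)) for row in m]


def eigenvalue_of(m, v, tol=1e-12):
    """Return lam if m v = lam v holds within tol, otherwise None."""
    w = matvec(m, v)
    # Read the candidate eigenvalue from the first nonzero component of v.
    i = next(k for k, x in enumerate(v) if abs(x) > tol)
    lam = w[i] / v[i]
    if all(abs(wk - lam * vk) <= tol for wk, vk in zip(w, v)):
        return lam
    return None


# Incidence matrix of the assumed layer graph (0-based indices).
layer = [[1 if (i < 3 and j >= 3) else 0 for j in range(6)] for i in range(6)]
authority_matrix = matmul(transpose(layer), layer)  # A^T A
hub_matrix = matmul(layer, transpose(layer))        # A A^T

lam_a = eigenvalue_of(authority_matrix, [0, 0, 0, 1, 1, 1])
lam_h = eigenvalue_of(hub_matrix, [1, 1, 1, 0, 0, 0])
print(lam_a, lam_h)  # 9.0 9.0

# A^T A is positive semidefinite, so its eigenvalues are nonnegative and sum
# to the trace. The trace equals lam_a, hence lam_a is the only nonzero
# eigenvalue and the checked vector is the principal eigenvector.
assert sum(authority_matrix[i][i] for i in range(6)) == lam_a
```

The same check applies to the other exercises once their incidence matrices are written down; a return value of `None` means that the candidate is not an eigenvector.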
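Hub and authorities
They are computed according to the following Kleinberg's algorithm:
(a) $\mathbf{1} \leftarrow [1,\ldots,1]$
(b) $a(0) \leftarrow \mathbf{1}$
(c) $h(0) \leftarrow \mathbf{1}$
(d) for $k = 1$ to $n$ do
    i. $a(k) \leftarrow A^\top h(k-1)$
    ii. $h(k) \leftarrow A\,a(k)$
    iii. $a(k) \leftarrow a(k)/\|a(k)\|$
    iv. $h(k) \leftarrow h(k)/\|h(k)\|$

Proposition 0.1 After $k$ steps:
$$a(k) \propto [A^\top A]^{k-1} A^\top \mathbf{1}, \qquad h(k) \propto [AA^\top]^{k}\,\mathbf{1}$$
Proof:
Base: by definition.
Induction step: from the induction hypothesis:
$$a(k-1) \propto [A^\top A]^{k-2} A^\top \mathbf{1}, \qquad h(k-1) \propto [AA^\top]^{k-1}\,\mathbf{1}$$
Using an iteration of the algorithm:
$$a(k) \propto A^\top h(k-1) \propto A^\top [AA^\top]^{k-1}\,\mathbf{1} = [A^\top A]^{k-1} A^\top \mathbf{1}$$
$$h(k) \propto A\,a(k) \propto A\,[A^\top A]^{k-1} A^\top \mathbf{1} = [AA^\top]^{k}\,\mathbf{1}$$

Proposition 0.2 Let $M \doteq AA^\top$ have a principal eigenvalue; that is, $\forall k \neq 1: \lambda_1(M) > \lambda_k(M)$. Then Kleinberg's algorithm converges as $k \to \infty$.
Proof: Let the initial point be $z = \mathbf{1} \in \mathbb{R}^n$. Since $M$ is symmetric, $z$ can be written as
$$z = \sum_i \alpha_i v_i$$
where $v_i$ are the eigenvectors of $M$, which span $\mathbb{R}^n$ (repeated in case of multiplicity). Then
$$Mz = \sum_i \alpha_i M v_i = \sum_i \alpha_i \lambda_i v_i$$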
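By induction on $j$:
$$M^j z = \sum_i \alpha_i \lambda_i^j v_i$$
Since $\lambda_1 > \lambda_k$ for all $k \neq 1$, the term $\alpha_1 \lambda_1^j v_1$ dominates the sum (provided $\alpha_1 > 0$), so as $j \to \infty$
$$\frac{M^j z}{\|M^j z\|} \to \frac{v_1}{\|v_1\|}$$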
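Kleinberg's algorithm and the two propositions can be illustrated with a short, dependency-free sketch. The graph used below is hypothetical (links 1->2, 1->3, 2->3, 3->1) and was chosen only because its $A^\top A$ has a unique largest eigenvalue; the function names are likewise illustrative. The code assumes that the graph has at least one edge, otherwise the normalization would divide by zero.

```python
import math

# Sketch of Kleinberg's algorithm with a numerical check of Propositions 0.1 and 0.2.
# Assumption: `adj_example` is a made-up graph, not one of the exercise figures.


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(p * q for p, q in zip(row, v)) for row in m]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def kleinberg(adj, steps):
    """Run steps (d).i to (d).iv of the algorithm `steps` times."""
    adj_t = transpose(adj)
    n = len(adj)
    a, h = [1.0] * n, [1.0] * n
    for _ in range(steps):
        a = matvec(adj_t, h)  # a(k) <- A^T h(k-1)
        h = matvec(adj, a)    # h(k) <- A a(k)
        a, h = normalize(a), normalize(h)
    return a, h


def closed_form(adj, k):
    """Directions (A^T A)^(k-1) A^T 1 and (A A^T)^k 1 from Proposition 0.1."""
    adj_t = transpose(adj)
    ones = [1.0] * len(adj)
    a = matvec(adj_t, ones)
    for _ in range(k - 1):
        a = matvec(adj_t, matvec(adj, a))
    h = ones
    for _ in range(k):
        h = matvec(adj, matvec(adj_t, h))
    return normalize(a), normalize(h)


adj_example = [
    [0, 1, 1],  # node 1 links to nodes 2 and 3
    [0, 0, 1],  # node 2 links to node 3
    [1, 0, 0],  # node 3 links to node 1
]

# Proposition 0.1: the iterates agree with the closed form up to rounding.
a5, h5 = kleinberg(adj_example, 5)
ca5, ch5 = closed_form(adj_example, 5)
assert all(abs(x - y) < 1e-9 for x, y in zip(a5 + h5, ca5 + ch5))

# Proposition 0.2: the iterates settle on the principal eigenvector of A^T A,
# which for this graph is proportional to [0, 1, (1 + sqrt(5)) / 2].
a50, h50 = kleinberg(adj_example, 50)
golden = (1 + math.sqrt(5)) / 2
assert abs(a50[0]) < 1e-9 and abs(a50[2] / a50[1] - golden) < 1e-9
```

For a graph such as the loop of exercise 4, where $AA^\top = I$, the iteration does not select a direction: every starting vector is already an eigenvector and is returned unchanged after normalization.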