A Latent Variable Graphical Model Derivation of Diversity for Set-based Retrieval

Keywords: Graphical Models::Directed Models, Foundations::Other Foundations

Abstract

Diversity has been heavily motivated as an objective criterion for result sets in the information retrieval literature, and various ad-hoc heuristics have been proposed to explicitly optimize for it. In this paper, we start from first principles and show that optimizing a simple criterion of set-based relevance in a latent variable graphical model (a framework we refer to as probabilistic latent accumulated relevance, or PLAR) leads to diversity as a naturally emergent property of the solution. PLAR derives variants of latent semantic indexing (LSI) kernels for relevance and diversity and does not require ad-hoc tuning parameters to balance them. PLAR also directly motivates the general form of many other ad-hoc diversity heuristics in the literature, albeit with important modifications that we show can lead to improved performance on a diversity testbed from the TREC 6-8 Interactive Track.

1 Introduction

One of the basic tenets of set-based information retrieval is to minimize redundancy (hence maximize diversity) in the result set to increase the chance that the results will contain items relevant to the user's query [9]. However, among the numerous proposals for result set diversification in the literature, most typically employ ad-hoc heuristics for balancing relevance and diversity. In this paper, we argue that diversification should emerge naturally from the optimization of a simple set-based relevance criterion and hence should form an integral part of relevance rather than a separate ad-hoc objective. By formalizing set-based retrieval as a sequence of query optimizations in a latent variable graphical model (a framework we refer to as probabilistic latent accumulated relevance, or PLAR) we show that we can indeed derive diversity as an emergent phenomenon.

One of PLAR's key contributions is to show that we can derive (and formally motivate) variants of many ad-hoc relevance and diversity heuristics commonly used in the information retrieval literature. Here we briefly discuss these areas along with the formal contributions of PLAR:

- Latent semantic indexing (LSI) [8] is a fundamental technique for determining document similarity in a latent space. PLAR directly derives an LSI-like latent similarity kernel as well as a novel variant LSI-like kernel for latent diversity.

- Maximal marginal relevance (MMR) [5] is one of the staples of result set reranking and diversification in the information retrieval literature. PLAR derives a variant of MMR that uses a softer expected marginal relevance in place of the original maximal marginal relevance.

- Set-covering heuristics that ensure a result set optimally covers words or topics are a common heuristic for diversity and can be directly optimized in a supervised setting [20]. While PLAR is unsupervised, its derivation of expected marginal relevance has a direct interpretation as a latent topic set-covering approach.

- Portfolio theory is used in the information retrieval literature to motivate result set diversification based on the choice of uncorrelated items [18] to minimize investment risk (e.g., if one item is suboptimal, the others will not be related). PLAR can be directly interpreted as minimizing an expected risk metric derived from latent topic correlations of items in the result set.
The rest of the paper proceeds as follows: we begin with the formal problem definition for set-based information retrieval and then introduce the PLAR framework and derive its solution; next we discuss related work on diversity metrics, in particular the relation of PLAR to the prototypical but ad-hoc methods discussed above for balancing relevance and diversity; finally, we empirically demonstrate the improved performance of PLAR over all of these ad-hoc diversification objectives on a diversity testbed from the TREC 6-8 Interactive Track.

2 Probabilistic Latent Accumulated Relevance

Our objective in this section is to answer the following questions from first principles: suppose we have a set of items $D$ (e.g., in information retrieval (IR), $D$ could be a set of documents) and we want to recommend a small subset $S_k \subseteq D$ (where $|S_k| = k$ and $k \ll |D|$) relevant to a given query $q$ (itself an item; e.g., in IR, $q$ would be a document-like list of terms); then (Q1) how do we define an appropriate set-based relevance criterion for $S_k$, and (Q2) how do we efficiently find the set $S_k^*$ that optimizes it?

We start with question (Q1) and seek to provide an optimization criterion for set-based retrieval under relevance uncertainty; while each document is either relevant or not, we do not directly observe document relevance and instead treat it as a random variable. Thus we formalize document relevance in a latent variable graphical model as follows:

[Figure 1: Graphical model used in PLAR. Observed (shaded) nodes: the query $q$ and the selected items $s_1, s_2, s_3, \ldots$; latent (unshaded) nodes: the query topic $t'$, the item topics $t_1, t_2, t_3, \ldots$, and the relevance indicators $r_1, r_2, r_3, \ldots$]

In Figure 1, shaded nodes represent observed variables while unshaded nodes are latent. The observed variables are the vector of query terms $q$ and the selected items $s_i$ (where for $1 \le i \le k$, $s_i \in D$). For the latent variables, let $T$ be a discrete topic set. Then variables $t_i \in T$ represent topics for the respective $s_i$, and $t' \in T$ represents a topic for the query $q$. The $r_i$ are binary variables that indicate whether the respective selected items $s_i$ are relevant (1) or not (0). In what follows, we do not differentiate between random variables and specific realizations, assuming the latter are evident from context, e.g., $r_1 = 0$.

The conditional probability tables (CPTs) in this discrete directed graphical model are defined as follows. $P(t_i \mid s_i)$ represents the topic model of item $s_i$, and $P(t' \mid q)$ represents a topic model of the query. There are a variety of ways to learn these topic CPTs based on the nature of the items and query; for an item set $D$ consisting of text documents and a query that can be treated as a text document, one natural probabilistic model for $P(t_i \mid s_i)$ and $P(t' \mid q)$ can be derived from latent Dirichlet allocation (LDA) [4]. We note that unsupervised topic models such as LDA can be trained offline for Figure 1; that is, there is no need to modify LDA inference to take into account the dependencies introduced in Figure 1, since the relevance variables $r_i$ in PLAR are unobserved and hence D-separate all topic variables.

With the CPTs for $P(t_i \mid s_i)$ and $P(t' \mid q)$ learned offline by LDA, we now define the CPTs for the relevance $r_i$ of item $s_i$, which have a very natural definition:

$$P(r_1 = 1 \mid t', t_1) = \begin{cases} 1 & \text{if } t_1 = t' \\ 0 & \text{if } t_1 \neq t' \end{cases}$$

$$\forall j > i: \quad P(r_j = 1 \mid t', r_i = 0, t_i, t_j) = \begin{cases} 1 & \text{if } (t_i \neq t_j) \wedge (t_j = t') \\ 0 & \text{if } (t_i = t_j) \vee (t_j \neq t') \end{cases}$$

Simply, $s_1$ is relevant if its topic $t_1$ matches the query topic $t'$. For $j > i$, $s_j$ is relevant under the same conditions, with the addition that if $s_i$ was not relevant ($r_i = 0$), topic $t_j$ for $s_j$ should also not match $t_i$. We omit a definition of $P(r_j \mid t', r_i = 1, t_i, t_j)$ since it will not be needed in the following derivation. Finally, for $j > i$, we factorize the CPT for the relevance $r_j$ given the irrelevance of all previous items (to be explained) as follows:

$$P(r_j \mid t', t_1, \ldots, t_j, r_1 = 0, \ldots, r_{j-1} = 0) = \prod_{i=1}^{j-1} P(r_j \mid t', r_i = 0, t_i, t_j)$$

By inspection, we note that this factorized perspective is equivalent to a joint CPT where $r_j$ is relevant only if $t_j = t'$ and $\forall i < j$, $t_j \neq t_i$. However, we use the equivalent factorized view since it will facilitate the variable elimination used to simplify probabilistic queries in the graphical model.
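To make these deterministic CPTs concrete, the following is a minimal Python sketch (illustrative only; the function name and signature are ours, not the paper's) of the joint-CPT view just described:

```python
def p_relevant(t_k: int, t_query: int, prev_topics: list[int]) -> float:
    """Deterministic relevance CPT from Section 2: item k is relevant (r_k = 1)
    iff its topic t_k matches the query topic t' AND t_k differs from the topic
    of every previously examined, irrelevant item."""
    if t_k != t_query:
        return 0.0
    return 0.0 if t_k in prev_topics else 1.0
```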
We define the accumulated relevance $\mathrm{AccRel}(S_k, q)$ of set $S_k$ with query $q$ as follows, where, following the click chain model [11] of result set browsing, we assume item $i$ is only examined if all previous items were irrelevant ($r_{<i} = 0$):

$$\mathrm{AccRel}(S_k, q) = \sum_{i=1}^{k} P(r_i = 1 \mid r_1 = 0, \ldots, r_{i-1} = 0, s_1, \ldots, s_i, q)$$

Hence, the formal set-based relevance optimization is to find the set $S_k^* \subseteq D$ that maximizes $\mathrm{AccRel}(S_k, q)$:

$$S_k^* = \arg\max_{S_k = \{s_1, \ldots, s_k\}} \mathrm{AccRel}(S_k, q) \quad (1)$$

Because it is intractable to directly optimize (1) over all possible subsets $S_k \subseteq D$ for large $|D|$, we instead opt for a greedy optimization approach to address (Q2), where we choose each $s_i^*$ in order, conditioned on the previously chosen $s_{<i}^*$. This approach is consistent with all of the result set diversification approaches we cite in Related Work.
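The greedy approach can be summarized as the following generic selection loop (a sketch under our own naming; PLAR instantiates the marginal-gain oracle with the conditional relevance probability derived below):

```python
from typing import Callable, Hashable, Iterable

def greedy_select(
    candidates: Iterable[Hashable],
    k: int,
    marginal: Callable[[Hashable, list], float],
) -> list:
    """Generic greedy subset selection: at step i, add the candidate with the
    highest marginal objective given the items already chosen. PLAR uses
    marginal(s_i, selected) = P(r_i = 1 | r_{<i} = 0, s_1, ..., s_i, q)."""
    selected: list = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda s: marginal(s, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```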

To demonstrate our greedy optimization of accumulated relevance, we begin with the selection of the first item $s_1^*$:

$$s_1^* = \arg\max_{s_1} \mathrm{AccRel}(\{s_1\}, q) = \arg\max_{s_1} P(r_1 = 1 \mid s_1, q) = \arg\max_{s_1} \frac{P(r_1 = 1, s_1, q)}{\underbrace{\sum_{r_1} P(r_1, s_1, q)}_{= Z \text{ (indep. of } s_1 \text{, c.f. Appendix A)}}}$$
$$= \arg\max_{s_1} \sum_{t_1, t'} P(r_1 = 1 \mid t', t_1)\, P(t' \mid q)\, P(t_1 \mid s_1) = \arg\max_{s_1} \sum_{t'} P(t' \mid q)\, P(t_1 = t' \mid s_1) \quad (2)$$

This intuitive result follows by exploiting the structure of $P(r_1 \mid t', t_1)$ and draws parallels to latent semantic indexing (LSI) [8], because it chooses $s_1^*$ so as to maximize the similarity between its topic model and that of the query $q$.

Continuing, we next want to select $s_2^*$ given $s_1^*$ in order to maximize $\mathrm{AccRel}(\{s_1^*, s_2\}, q)$. Here, we need only maximize over the second summation term ($i = 2$) in (1), since the first term ($i = 1$) is a constant given $s_1^*$:

$$s_2^* = \arg\max_{s_2 \in D \setminus \{s_1^*\}} \mathrm{AccRel}(\{s_1^*, s_2\}, q) = \arg\max_{s_2 \in D \setminus \{s_1^*\}} P(r_2 = 1 \mid s_1^*, s_2, q, r_1 = 0)$$
$$= \arg\max_{s_2} \frac{P(r_2 = 1, s_1^*, s_2, q, r_1 = 0)}{\underbrace{\sum_{r_2} P(r_2, s_1^*, s_2, q, r_1 = 0)}_{= Z \text{ (indep. of } s_2 \text{, c.f. Appendix A)}}}$$
$$= \arg\max_{s_2} \sum_{t_1, t_2, t'} P(r_2 = 1 \mid r_1 = 0, t_1, t_2, t')\, P(t_1 \mid s_1^*)\, P(r_1 = 0 \mid t_1, t')\, P(t_2 \mid s_2)\, P(t' \mid q) \quad (3)$$

Here we simply rewrote the conditional query $P(r_2 = 1 \mid s_1^*, s_2, q, r_1 = 0)$ in terms of the joint distribution and noted that the denominator is a constant $Z$ independent of the maximization argument, hence irrelevant, as derived in Appendix A. Then we expanded the query into the appropriate marginalization over the joint factored distribution given by the graphical model.

Given (3), we provide the remainder of the fascinating but tedious derivation in Appendix B (actually for the more general case of $k > 1$) and directly provide the final solution for the optimal selection of $s_2^*$:

$$s_2^* = \arg\max_{s_2 \in D \setminus \{s_1^*\}} \underbrace{\left[\sum_{t'} P(t' \mid q)\, P(t_2 = t' \mid s_2)\right]}_{\text{relevance}} - \underbrace{\left[\sum_{t'} P(t' \mid q)\, P(t_1 = t' \mid s_1^*)\, P(t_2 = t' \mid s_2)\right]}_{\text{diversity}} \quad (4)$$

We pause at this point to note that one of our primary objectives in the paper has been achieved. By directly optimizing a set-based relevance criterion, diversity has automatically emerged in the solution. That is, the relevance term measures the similarity between the topic distribution for the query and $s_2$, while the diversity term penalizes $s_2$ for having too similar a topic distribution to the already chosen $s_1^*$, where the diversity term reweights topics according to their probability given $q$. As discussed later, this query-based topic reweighting in the latent diversity term is an important modification of the ad-hoc diversity metrics we compare to.

Continuing onto the most general case, we show the derivation for selecting $s_k^*$ ($k > 1$) once $S_{k-1}^* = \{s_1^*, \ldots, s_{k-1}^*\}$ has been selected:

$$s_k^* = \arg\max_{s_k \in D \setminus S_{k-1}^*} \mathrm{AccRel}(\{s_1^*, \ldots, s_{k-1}^*, s_k\}, q) = \arg\max_{s_k} P(r_k = 1 \mid s_1^*, \ldots, s_{k-1}^*, s_k, r_1 = 0, \ldots, r_{k-1} = 0, q)$$
$$= \arg\max_{s_k} \frac{P(r_k = 1, s_1^*, \ldots, s_{k-1}^*, s_k, r_1 = 0, \ldots, r_{k-1} = 0, q)}{\underbrace{\sum_{r_k} P(r_k, s_1^*, \ldots, s_{k-1}^*, s_k, r_1 = 0, \ldots, r_{k-1} = 0, q)}_{= Z \text{ (indep. of } s_k \text{, c.f. Appendix A)}}}$$
$$= \arg\max_{s_k \in D \setminus S_{k-1}^*} \sum_{t_1, \ldots, t_k, t'} P(t_k \mid s_k)\, P(t' \mid q) \prod_{i=1}^{k-1} \left[ P(r_k = 1 \mid r_i = 0, t', t_i, t_k)\, P(t_i \mid s_i^*) \prod_{j=1}^{i-1} P(r_i = 0 \mid r_j = 0, t', t_i, t_j) \right] \quad (5)$$

Again, we simply rewrote the conditional query $P(r_k = 1 \mid s_1^*, \ldots, s_{k-1}^*, s_k, r_1 = 0, \ldots, r_{k-1} = 0, q)$ in terms of the joint distribution, noted the independence of the denominator w.r.t. the maximization argument as derived in Appendix A, then expanded the query into the appropriate marginalization over the joint factored distribution given by the graphical model.
Given (5), we again refer to the remainder of the derivation in Appendix B, and directly present the final solution for the optimal selection of $s_k^*$ that generally defines probabilistic latent accumulated relevance (PLAR):

$$s_k^* = \arg\max_{s_k \in D \setminus S_{k-1}^*} \sum_{t'} P(t_k = t' \mid s_k)\, P(t' \mid q) \prod_{i=1}^{k-1} \left[1 - P(t_i = t' \mid s_i^*)\right] \quad (6)$$

We note there are many nice properties of this general solution. First, given a learned topic model, it is very efficient and easy to compute. Second, as we will later show, there is a natural set-covering interpretation of the optimization in (6), demonstrating that the best $s_k^*$ covers the most latent topic space not already covered by $\{s_1^*, \ldots, s_{k-1}^*\}$.
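To illustrate how cheaply (6) can be computed given a learned topic model, the following numpy sketch (a minimal illustration of ours, not the authors' released code) maintains the product term incrementally across greedy steps:

```python
import numpy as np

def plar_select(query_topics: np.ndarray, item_topics: np.ndarray, k: int) -> list[int]:
    """Greedy PLAR selection via Eq. (6).

    query_topics: shape (T,), the topic distribution P(t' | q).
    item_topics:  shape (n, T), row i is P(t_i | s_i) for candidate item i.
    Returns the indices of the k selected items in selection order.
    """
    n, _ = item_topics.shape
    uncovered = np.ones(item_topics.shape[1])    # prod_{i<k} (1 - P(t_i = t' | s_i*))
    selected: list[int] = []
    for _ in range(min(k, n)):
        weights = query_topics * uncovered       # query-reweighted, not-yet-covered mass
        scores = item_topics @ weights           # Eq. (6) for every candidate at once
        scores[selected] = -np.inf               # exclude already chosen items
        best = int(np.argmax(scores))
        selected.append(best)
        uncovered *= 1.0 - item_topics[best]     # update latent topic coverage
    return selected
```

Each greedy step costs $O(|D| \cdot |T|)$ and, unlike the heuristics discussed next, there is no trade-off parameter to tune.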

3 Comparison of PLAR to Related Work

As early as 1964, Goffman [9] noted that the relevance of each document in a list has to depend on the documents preceding it. More recently, work on maximal marginal relevance (MMR) [5] was one of the first to formalize diversification as a mathematical optimization criterion, and MMR has proved one of the most popular diversity approaches. Aside from this work, two other notable works are [20], which formalizes a structured SVM loss function based on a set-covering objective, and [18], which borrows concepts from portfolio theory in economics to treat result set diversification as optimization of a risk minimization objective. PLAR as derived in the last section formally motivates these last three somewhat ad-hoc diversification approaches, and we discuss these connections more deeply in the following sections.

Before we discuss specifics though, we provide a general summary of the result set diversification literature in Table 1. Here we break down the different proposals according to whether they are probabilistic, use latent models for determining similarity, use adaptive learning techniques, and finally whether they are unsupervised, i.e., they do not require labeled data or feedback. We note that PLAR is the first proposal (that we are aware of) to combine all four traits in the affirmative; respectively, we believe these four traits provide robustness, generalization with sparse data, data adaptivity, and generality for situations without the availability of manual labeling or user feedback. To strengthen these claims, we note that each trait plays some role in the improved performance of PLAR over the various diversification approaches in the empirical results section.

[Table 1: Dimensions of diversified set-based retrieval systems. Columns: Prob. (uses probabilistic models?), Latent (uses latent topic models?), Learn. (uses some form of learning?), Unsup. (does not require labeled topic data or feedback?). Rows: Carbonell [5], Anagnostopoulos [2], Radlinski [12], Radlinski [13], Clarke [6], Yue [20], Bai [3], Sanderson [16], Yu [19], Agrawal [1], Gollapudi [10], Clough [7], Song [17], Wang [18], and PLAR (this paper). PLAR is the only entry affirmative in all four columns.]

3.1 Latent Semantic Indexing

We noted previously that PLAR directly re-derives latent similarity metrics introduced by latent semantic indexing (LSI) [8]. To clarify, we introduce the relevance metric $\mathrm{Sim}_1^{PLAR}$ given by PLAR in both (2) and (4), where we let $\vec{T}'$ and $\vec{T}_i$ be the respective topic probability vectors for query $q$ and item $s_i$, with vector elements $T'_j = P(t' = j \mid q)$ and $T_{ij} = P(t_i = j \mid s_i)$, and use $\langle \cdot, \cdot \rangle$ for the inner product:

$$\mathrm{Sim}_1^{PLAR}(q, s_i) = \sum_{t'} P(t' \mid q)\, P(t_i = t' \mid s_i) = \langle \vec{T}', \vec{T}_i \rangle$$

We note that $\mathrm{Sim}_1^{PLAR}$ is precisely the unnormalized LSI similarity kernel, directly derived in (2) and (4). More interestingly, using $\langle \cdot, \cdot \rangle_w$ to denote the reweighted inner product where the $l$th term of the dot product is multiplied by $w_l$, we note that a similar analysis applies to the diversity metric $\mathrm{Sim}_2^{PLAR}$ given by the second (diversity) term of PLAR in (4):

$$\mathrm{Sim}_2^{PLAR}(q, s_i, s_j) = \sum_{t'} P(t' \mid q)\, P(t_i = t' \mid s_i)\, P(t_j = t' \mid s_j) = \langle \vec{T}_i, \vec{T}_j \rangle_{\vec{T}'}$$

This analysis shows that PLAR derives a diversity metric based on the LSI similarity metric, but reweighted by the query topic probability $P(t' \mid q)$. This is a fascinating and intuitive result reflecting the intuition that latent diversity between items should be emphasized according to the latent topic probability of the query for which the set-based relevance criterion is being optimized. We note that no other diversity metrics that we are aware of use such a query-reweighted approach at retrieval time.
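Both kernels reduce to a few lines given dense topic vectors; the following sketch (our own naming) makes the reweighted inner product explicit:

```python
import numpy as np

def sim_plar1(q_topics: np.ndarray, i_topics: np.ndarray) -> float:
    """Unnormalized LSI-style relevance kernel: <T', T_i>."""
    return float(q_topics @ i_topics)

def sim_plar2(q_topics: np.ndarray, i_topics: np.ndarray, j_topics: np.ndarray) -> float:
    """Query-reweighted latent diversity kernel: <T_i, T_j>_{T'},
    i.e., each topic's contribution is multiplied by P(t' | q)."""
    return float(np.sum(q_topics * i_topics * j_topics))
```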
3.2 Maximal Marginal Relevance

Maximal marginal relevance (MMR) [5] is perhaps one of the most popular methods for balancing relevance and diversity in set-based information retrieval systems and has been cited over 530 times since its publication in 1998, according to Google Scholar. Like PLAR, MMR proposes to build $S_k^*$ in a greedy manner by selecting $s_k^*$ given $S_{k-1}^* = \{s_1^*, \ldots, s_{k-1}^*\}$ (where as usual $S_k^* = S_{k-1}^* \cup \{s_k^*\}$) according to the following criterion:

$$s_k^* = \arg\max_{s_k \in D \setminus S_{k-1}^*} \left[ \lambda\, \mathrm{Sim}_1(s_k, q) - (1 - \lambda) \max_{s_i^* \in S_{k-1}^*} \mathrm{Sim}_2(s_k, s_i^*) \right] \quad (7)$$

where $\mathrm{Sim}_1(\cdot, \cdot)$ measures the relevance between an item and a query, $\mathrm{Sim}_2(\cdot, \cdot)$ measures the similarity between two items, and the manually tuned $\lambda \in [0, 1]$ trades off relevance and diversity.
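For reference, here is a minimal sketch of (7) instantiated with the PLAR kernels, i.e., the PLAR-LDA-Max variant evaluated later (naming is ours):

```python
import numpy as np

def mmr_select(q_topics: np.ndarray, item_topics: np.ndarray, k: int,
               lam: float = 0.5) -> list[int]:
    """MMR reranking, Eq. (7), with Sim1 = Sim1^PLAR and Sim2 = Sim2^PLAR.
    lam manually trades off relevance and diversity; PLAR needs no such parameter."""
    n = item_topics.shape[0]
    selected: list[int] = []
    for _ in range(min(k, n)):
        best, best_score = -1, -np.inf
        for s in range(n):
            if s in selected:
                continue
            rel = float(q_topics @ item_topics[s])                       # Sim1^PLAR
            div = max((float(np.sum(q_topics * item_topics[s] * item_topics[i]))
                       for i in selected), default=0.0)                  # max_i Sim2^PLAR
            score = lam * rel - (1.0 - lam) * div
            if score > best_score:
                best, best_score = s, score
        selected.append(best)
    return selected
```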

In the case of $s_1^*$, the second term disappears. While MMR is a popular algorithm, it was specified in a rather ad-hoc manner, and good performance typically relies on careful tuning of the $\lambda$ parameter. Furthermore, MMR is agnostic to the specific similarity metrics used, which indeed allows for flexibility, but it gives no indication as to which choices of $\mathrm{Sim}_1$ and $\mathrm{Sim}_2$ are compatible with each other and appropriate for good performance.

We note that a very natural mapping to the MMR algorithm in (7) has emerged automatically from the derivation that maximized accumulated relevance. This is especially apparent for the optimal $s_2^*$ in the form of (4), where we note that MMR and PLAR exactly match when $\lambda = 1/2$, $\mathrm{Sim}_1 = \mathrm{Sim}_1^{PLAR}$, and $\mathrm{Sim}_2 = \mathrm{Sim}_2^{PLAR}$; making these three substitutions into MMR, we might arrive at a general algorithm for PLAR, derived from first principles. However, we note that MMR and PLAR are divergent when considering $S_k^*$ for $k > 2$. Here, MMR chooses to penalize each newly selected item $s_k$ by its minimal diversity (maximal similarity) with respect to items in $S_{k-1}^*$, whereas PLAR (as we will demonstrate next) penalizes each $s_k$ according to its expected, query-weighted overlap with the topic coverage of all items in $S_{k-1}^*$. We formalize this more clearly next, but note that this difference leads to an improvement of PLAR over MMR, even when MMR uses the same similarity and diversity metrics ($\mathrm{Sim}_1^{PLAR}$ and $\mathrm{Sim}_2^{PLAR}$) as used by PLAR.

3.3 Set Covering Heuristics

As an exemplar of set-covering heuristics for result set diversity optimization, we describe the general form of the loss function used by [20] for training SVMs with structured output. The weighted subtopic loss function $\mathrm{WSL}(S_k, q)$ in this work is the weighted percentage of distinct subtopics in the topic set of query $q$ not covered by the retrieved set $S_k$. Formally, we define the set of topics for query $q$ as $T$ and denote the set of topics in $T$ covered by $S_k$ as $T_k$. Then we penalize a result set $S_k$ by the uncovered topics $T \setminus T_k$ according to the sum of their weights, where topic $i$ is weighted according to the count $n_i$ of documents in $D$ covered by topic $i$ (normalized by $\sum_i n_i$), as follows:

$$\mathrm{WSL}(S_k, q) = \frac{\sum_{i \in T \setminus T_k} n_i}{\sum_{i \in T} n_i} \quad (8)$$
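A minimal sketch of (8), assuming subtopic labels are available as in the TREC testbed (function and argument names are ours):

```python
def wsl(covered_topics: set, query_topics: set, topic_doc_counts: dict) -> float:
    """Weighted subtopic loss, Eq. (8): the weighted fraction of the query's
    subtopics left uncovered by the result set, where subtopic i is weighted
    by the number n_i of documents in D covering it."""
    total = sum(topic_doc_counts[t] for t in query_topics)
    missed = sum(topic_doc_counts[t] for t in query_topics - covered_topics)
    return missed / total
```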
While WSL provides a hard set-covering view of diversity, we now demonstrate that an expansion of (6) provides a soft latent set-covering interpretation, with the fascinating insight that $s_k^*$ is chosen so as to best cover the latent topic space not already covered by $\{s_1^*, \ldots, s_{k-1}^*\}$, again with topic importance reweighted by the topic probability given the query $q$.

[Figure 2: Inclusion-exclusion principle. The sets represent candidate items $s^*$ for a query, and the area covered by each set is the information covered by that item. Numbers on the different areas indicate the number of sets that share them.]

Formally expanding the product $\prod_{i=1}^{k-1} (1 - P(t_i = t' \mid s_i^*))$ in (6), collecting terms, and writing it as a series, we arrive at a form that reflects the inclusion-exclusion principle applied to the problem of tabulating the joint topic probability mass for topic $t'$ covered by $\{s_1^*, \ldots, s_{k-1}^*\}$:

$$\prod_{i=1}^{k-1} \left(1 - P(t_i = t' \mid s_i^*)\right) = 1 - \left[ \sum_{i=1}^{k-1} P(t_i = t' \mid s_i^*) - \sum_{i=1}^{k-1} \sum_{j=i+1}^{k-1} P(t_i = t' \mid s_i^*)\, P(t_j = t' \mid s_j^*) + \cdots + (-1)^{k} \prod_{i=1}^{k-1} P(t_i = t' \mid s_i^*) \right] \quad (9)$$

The inclusion-exclusion calculation provided by the term in $[\,\cdot\,]$ is illustrated in Figure 2. In words, this expression is calculating the total topic probability coverage of $t'$ by all selected items $\{s_1^*, \ldots, s_{k-1}^*\}$, properly applying the inclusion-exclusion principle to ensure that overlapping probability coverage is not double counted. Then, referring back to (6), we note that $s_k^*$ is chosen by maximizing a weighted sum over topics, where each topic's weight is determined by its relevance to the query $q$ and the item $s_k$, and is penalized (i.e., due to the $1 - [\,\cdot\,]$) by the topic coverage of $t'$ by the set $\{s_1^*, \ldots, s_{k-1}^*\}$, naturally encouraging diversity. We note that this is a soft probabilistic version of the hard "in or out" topic coverage approach of WSL.
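The equivalence in (9) between the product form and the inclusion-exclusion series is easy to verify numerically; a short check (ours, for illustration):

```python
from itertools import combinations

import numpy as np

def coverage_inclusion_exclusion(p: np.ndarray) -> float:
    """Total probability mass of topic t' covered by the selected items, via
    inclusion-exclusion over the per-item coverages p_i = P(t_i = t' | s_i*)."""
    total = 0.0
    for r in range(1, len(p) + 1):
        for idx in combinations(range(len(p)), r):
            total += (-1) ** (r + 1) * np.prod(p[list(idx)])
    return total

p = np.array([0.3, 0.5, 0.2])
# 1 - coverage equals the product form on the left-hand side of Eq. (9).
assert np.isclose(1.0 - coverage_inclusion_exclusion(p), np.prod(1.0 - p))
```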

3.4 Portfolio Theory

As a final connection to the literature, we note that recent work [18] motivates diversification in set-based information retrieval by a risk-minimizing portfolio selection approach. Rewriting the diversification objective of [18] to match the notation of our paper, they note that the following greedy approach to selecting item $s_k^*$ worked best among the alternatives they tried:

$$s_k^* = \arg\max_{s_k \in D \setminus S_{k-1}^*} \left[ \mathrm{Sim}_{BM25}(s_k, q) - \lambda \sum_{i=1}^{k-1} \omega_i\, \mathrm{Sim}_{TFIDF}(s_i^*, s_k) \right] \quad (10)$$

Here, $\lambda$ is a manually tuned weight parameter; $\omega_1, \ldots, \omega_k$ are the weights of each position in the result set (for lack of a more informed choice, we chose a uniform weighting); $\mathrm{Sim}_{BM25}(s_k, q)$ is the BM25 [14] probabilistic relevance score of document $s_k$ w.r.t. query $q$; and $\mathrm{Sim}_{TFIDF}(s_i^*, s_k)$ is a TFIDF [15] similarity metric between two documents represented as term frequency vectors.

Like MMR, this Portfolio objective directly trades off a relevance and a diversity metric, but it uses a softer diversity metric that penalizes $s_k$ according to the sum of its pairwise similarities with all previous documents $\{s_1^*, \ldots, s_{k-1}^*\}$. This is motivated by portfolio theory, where the risk of investing in a new item $s_k$ is determined by its correlation with all previous items (the theory being that if one of the previous items was suboptimal and very similar to $s_k$, then there is higher risk in choosing $s_k$).

We note that, like this Portfolio approach, PLAR also uses a soft diversity metric. And returning to (4), we note that we can rewrite it in a similar form to show that PLAR can also be viewed as formally motivating a Portfolio-like latent expected risk minimization approach to diversification:

$$s_2^* = \arg\max_{s_2 \in D \setminus \{s_1^*\}} \sum_{t'} P(t' \mid q) \Big[ \underbrace{P(t_2 = t' \mid s_2)}_{\text{relevance}} - \underbrace{P(t_1 = t' \mid s_1^*)\, P(t_2 = t' \mid s_2)}_{\text{correlation risk}} \Big]$$

We note that an ad-hoc generalization of this Portfolio form for $s_2^*$ to the general case of $s_k^*$ might sum over all pairwise correlation risks. However, the previous set-covering interpretation of PLAR indicates that this would actually double count the risk because of overlap in the latent topic space that is not adjusted for using the principle of inclusion-exclusion. In this sense, the Portfolio objective runs the risk of being overly conservative by overestimating risk, and indeed we will see such results in the experiments.
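A sketch of the $s_2$ risk form above (our own naming); extending it naively to $k > 2$ by summing pairwise risks would reproduce exactly the double counting just discussed:

```python
import numpy as np

def plar_risk_score(q_topics: np.ndarray, s2_topics: np.ndarray,
                    s1_topics: np.ndarray) -> float:
    """PLAR's selection rule for s_2, Eq. (4), in portfolio form: expected
    query-weighted relevance minus latent correlation risk with s_1*."""
    relevance = q_topics * s2_topics
    risk = q_topics * s1_topics * s2_topics      # correlation with s_1* per topic
    return float(np.sum(relevance - risk))
```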
4 Experimental Comparison

Since we derived the diversification approach in PLAR from first principles in a latent variable graphical model, one interesting question to ask is how PLAR compares to the more commonly used ad-hoc result set diversification approaches discussed previously. For this purpose, we note that the TREC 6-8 diversity testbed described by [20] is the only publicly available diversity testbed that we found.

We report experiments on a subset of TREC 6-8 data, following the same experimental setup as [20], who measure the weighted subtopic loss (WSL) (8) of item sets; in brief, WSL gives a higher penalty for not covering popular subtopics. We do not compare directly to [20] as their method is supervised, while MMR, the Portfolio approach, and PLAR are inherently unsupervised and do not require manually labeled data.

Standard query and item similarity metrics used in MMR applied to text data include the cosine of the term frequency (TF) and TF-inverse document frequency (TFIDF) vector space models [15]. We denote these variants of MMR as MMR-TF and MMR-TFIDF and found $\lambda = 0.5$ to work best for both. We also implemented the BM25 Portfolio theory-based diversification objective described in [18] with uniform $\omega$ and best $\lambda = 1$. PLAR specifically suggests the use of the LSI-based similarity metrics defined in the last section, and we used LDA to train the topic models $P(t_i \mid s_i)$ and $P(t' \mid q)$ for a resulting algorithm we refer to as PLAR-LDA. LDA was trained with $\alpha = 2.0$, $\beta = 0.5$, and $|T| = 15$; we note the results were not highly sensitive to these parameter choices. We also define PLAR-LDA-Max as MMR using the PLAR similarity and diversity metrics, respectively $\mathrm{Sim}_1^{PLAR}$ and $\mathrm{Sim}_2^{PLAR}$.

[Table 2: Weighted subtopic loss (WSL) of the five methods (Portfolio Theory, MMR-TF, MMR-TFIDF, PLAR-LDA-Max, PLAR-LDA), using all words and the first 10 words of each document. Standard error estimates are shown for the PLAR-LDA methods since their topic models are sampled, which may lead to small variances in performance.]

Average WSL scores are shown in Table 2 on the 17 queries examined by [20]. We use both full documents and also just the first 10 words of each document. Due to the power of the latent topic model and its derived similarity metrics, PLAR-LDA performs better than MMR with standard TF and TFIDF metrics, and without a $\lambda$ parameter to be tuned. Furthermore, we note that PLAR with the MMR-motivated maximum (PLAR-LDA-Max) performs worse on one evaluation, indicating that PLAR-LDA's soft latent set-covering approach may serve as a better diversity metric than a hard MMR-like diversity metric. The Portfolio approach showed somewhat unstable performance; we hypothesize it did worse with more data due to increased similarity and thus double counting in its risk-minimization metric. We note that PLAR-LDA works comparatively very well with short documents (i.e., the results for documents restricted to their first 10 words), since intrinsic document and query similarities are automatically derived from the latent PLAR relevance and diversity metrics.
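The paper does not name its LDA implementation; a sketch of the offline topic-model training step, assuming the gensim library and mirroring the reported hyperparameters ($\alpha = 2.0$, $\beta = 0.5$, $|T| = 15$), might look as follows:

```python
import numpy as np
from gensim import corpora, models

def train_topic_model(token_lists, num_topics=15):
    """Fit LDA offline on tokenized documents; relevance variables need not be modeled."""
    dictionary = corpora.Dictionary(token_lists)
    bow = [dictionary.doc2bow(toks) for toks in token_lists]
    lda = models.LdaModel(bow, id2word=dictionary, num_topics=num_topics,
                          alpha=[2.0] * num_topics, eta=0.5)
    return lda, dictionary

def topic_distribution(lda, dictionary, tokens, num_topics=15):
    """Dense topic vector P(t | d) for a document or query, usable as input
    to the plar_select sketch above."""
    theta = np.zeros(num_topics)
    for t, p in lda.get_document_topics(dictionary.doc2bow(tokens),
                                        minimum_probability=0.0):
        theta[t] = p
    return theta
```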

These results show that PLAR is robust (it does as well as or better than the other diversification algorithms), its latent topic modeling helps in situations with small context, its topic models automatically adapt to the data at hand, and it is fully unsupervised, requiring no manual tuning of diversity.

5 Concluding Remarks

We formalized a relevance criterion for set-based information retrieval in a latent variable graphical model called probabilistic latent accumulated relevance (PLAR) and derived from it a novel solution yielding diversity as an emergent property. We showed that PLAR derives an LSI [8] similarity metric and a novel LSI variant for diversity. We also related the PLAR derivation to three important definitions of diversity in the literature: (a) maximal marginal relevance (MMR) [5], (b) set-covering heuristics [20], and (c) portfolio theory [18], and we showed how PLAR directly motivates all of them, but makes important modifications that led to improved empirical results on the TREC 6-8 diversity testbed. This provides evidence that formally optimizing the correct set-based relevance criterion in information retrieval can lead to natural and effective diversification strategies, demonstrating a path to move information retrieval away from ad-hoc diversity metrics towards formally motivated and better-performing solutions.

References

[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, pages 5-14, New York, NY, USA, 2009. ACM.

[2] A. Anagnostopoulos, A. Z. Broder, and D. Carmel. Sampling search-engine results. In Proceedings of the 14th International Conference on World Wide Web, Chiba, Japan, May 2005. ACM.

[3] J. Bai and J.-Y. Nie. Adapting information retrieval to query contexts. Information Processing and Management, 44(6), November 2008.

[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.

[5] J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, August 1998.

[6] C. L. A. Clarke, M. Kolla, G. V. Cormack, and O. Vechtomova. Novelty and diversity in information retrieval evaluation. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2008.

[7] P. Clough, M. Sanderson, M. Abouammoh, S. Navarro, and M. Paramita. Multiple approaches to analysing query diversity. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 2009. ACM.

[8] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.

[9] W. Goffman. A searching procedure for information retrieval. Information Storage and Retrieval, 2:73-78, 1964.

[10] S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In Proceedings of the 18th International Conference on World Wide Web. ACM, April 2009.

[11] F. Guo, C. Liu, A. Kannan, T. Minka, M. Taylor, Y.-M. Wang, and C. Faloutsos. Click chain model in web search. In Proceedings of the 2009 International Conference on the World Wide Web, pages 11-20, Madrid, Spain, 2009. ACM.

[12] F. Radlinski and S. Dumais.
Improving personalized web search using result diversification. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 2006. ACM.

[13] F. Radlinski, R. Kleinberg, and T. Joachims. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.

[14] S. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Proceedings of the Third Text REtrieval Conference (TREC 1994), Gaithersburg, USA, 1994.

[15] G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.

[16] M. Sanderson. Ambiguous queries: Test collections need more sense. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, July 2008.

[17] R. Song, Z. Luo, J.-Y. Nie, Y. Yu, and H.-W. Hon. Identification of ambiguous queries in web search. Information Processing and Management, 45(2), March 2009.

[18] J. Wang and J. Zhu. Portfolio theory of information retrieval. In Proceedings of SIGIR '09. ACM, July 2009.

[19] C. Yu, L. Lakshmanan, and S. Amer-Yahia. It takes variety to make a world: diversification in recommender systems. In EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology, volume 360. ACM, 2009.

[20] Y. Yue and T. Joachims. Predicting diverse subsets using structural SVMs. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, New York, NY, USA, 2008. ACM.

A Independence of Normalization Constant

Here we derive the normalization constant $Z$ in (5):

$$Z = \sum_{r_k} P(r_k, s_1^*, \ldots, s_{k-1}^*, s_k, r_1 = 0, \ldots, r_{k-1} = 0, q) = \sum_{r_k} \sum_{t_1, \ldots, t_k, t'} P(t_k \mid s_k)\, P(t' \mid q) \prod_{i=1}^{k-1} \left[ P(r_k \mid r_i = 0, t', t_i, t_k)\, P(t_i \mid s_i^*) \prod_{j=1}^{i-1} P(r_i = 0 \mid r_j = 0, t', t_i, t_j) \right]$$

As described in Appendix B, the term $\prod_{j=1}^{i-1} P(r_i = 0 \mid r_j = 0, t', t_i, t_j)$ is implied by the support of the remaining factors. Thus we can omit it for the further derivation and add the constraint $\forall i < k,\ t_i \neq t'$. Hence, we have

$$Z = \sum_{r_k} \sum_{t_1 \neq t', \ldots, t_{k-1} \neq t', t_k, t'} P(t_k \mid s_k)\, P(t' \mid q) \prod_{i=1}^{k-1} P(r_k \mid r_i = 0, t', t_i, t_k)\, P(t_i \mid s_i^*)$$
$$= \sum_{t_1 \neq t', \ldots, t_{k-1} \neq t', t_k, t'} P(t_k \mid s_k)\, P(t' \mid q) \left[ \prod_{i=1}^{k-1} P(r_k = 1 \mid r_i = 0, t', t_i, t_k)\, P(t_i \mid s_i^*) + \prod_{i=1}^{k-1} P(r_k = 0 \mid r_i = 0, t', t_i, t_k)\, P(t_i \mid s_i^*) \right]$$
$$= \sum_{t_1 \neq t', \ldots, t_{k-1} \neq t', t_k, t'} P(t_k \mid s_k)\, P(t' \mid q) \prod_{i=1}^{k-1} P(t_i \mid s_i^*) \left[ \prod_{i=1}^{k-1} P(r_k = 1 \mid r_i = 0, t', t_i, t_k) + \prod_{i=1}^{k-1} P(r_k = 0 \mid r_i = 0, t', t_i, t_k) \right]$$

We note that $\prod_{i=1}^{k-1} P(r_k = 0 \mid r_i = 0, t', t_i, t_k) = 1$ implies that $(t_k \neq t') \vee (\exists i < k,\ t_k = t_i)$ is true, while $\prod_{i=1}^{k-1} P(r_k = 1 \mid r_i = 0, t', t_i, t_k) = 1$ implies that $\neg\big((t_k \neq t') \vee (\exists i < k,\ t_k = t_i)\big)$ is true. Hence, we have

$$\prod_{i=1}^{k-1} P(r_k = 0 \mid r_i = 0, t', t_i, t_k) = 1 - \prod_{i=1}^{k-1} P(r_k = 1 \mid r_i = 0, t', t_i, t_k)$$

Therefore, we obtain

$$Z = \sum_{t_1 \neq t', \ldots, t_{k-1} \neq t', t_k, t'} P(t_k \mid s_k)\, P(t' \mid q) \prod_{i=1}^{k-1} P(t_i \mid s_i^*) = \sum_{t_1 \neq t', \ldots, t_{k-1} \neq t', t'} P(t' \mid q) \prod_{i=1}^{k-1} P(t_i \mid s_i^*) \sum_{t_k} P(t_k \mid s_k)$$
$$= \sum_{t'} P(t' \mid q) \prod_{i=1}^{k-1} \sum_{t_i \neq t'} P(t_i \mid s_i^*) = \sum_{t'} P(t' \mid q) \prod_{i=1}^{k-1} \left(1 - P(t_i = t' \mid s_i^*)\right)$$

The crucial observation here is that $Z$ is independent of the chosen $s_k$; thus, when choosing the maximizing $s_k^*$, $Z$ is irrelevant and can be ignored.

B PLAR Derivation for $s_k$, $k > 1$

Following is the derivation of PLAR from (5) through to the final result in (6):

$$s_k^* = \arg\max_{s_k \in D \setminus S_{k-1}^*} P(r_k = 1, s_1^*, \ldots, s_{k-1}^*, s_k, r_1 = 0, \ldots, r_{k-1} = 0, q)$$
$$= \arg\max_{s_k} \sum_{t_1, \ldots, t_k, t'} P(t_k \mid s_k)\, P(t' \mid q) \prod_{i=1}^{k-1} \left[ P(r_k = 1 \mid r_i = 0, t', t_i, t_k)\, P(t_i \mid s_i^*) \prod_{j=1}^{i-1} P(r_i = 0 \mid r_j = 0, t', t_i, t_j) \right]$$

For all $i < k$, $P(r_k = 1 \mid r_i = 0, t', t_i, t_k)$ is 1 only when $t_k = t'$ and $t_i \neq t'$ hold. Since $t_i \neq t'$ implies that $P(r_i = 0 \mid r_j = 0, t', t_i, t_j) = 1$, the term $\prod_{j=1}^{i-1} P(r_i = 0 \mid r_j = 0, t', t_i, t_j)$ is redundant. Hence we can omit it (adding the constraint $t_i \neq t'$) and obtain

$$s_k^* = \arg\max_{s_k} \sum_{t_1 \neq t', \ldots, t_{k-1} \neq t', t_k, t'} P(t_k \mid s_k)\, P(t' \mid q) \prod_{i=1}^{k-1} P(r_k = 1 \mid r_i = 0, t', t_i, t_k)\, P(t_i \mid s_i^*)$$
$$= \arg\max_{s_k} \sum_{t_k, t'} P(t_k \mid s_k)\, P(t' \mid q) \prod_{i=1}^{k-1} \left( \sum_{t_i \neq t'} P(r_k = 1 \mid r_i = 0, t', t_i, t_k)\, P(t_i \mid s_i^*) \right)$$
$$= \arg\max_{s_k \in D \setminus S_{k-1}^*} \sum_{t'} P(t_k = t' \mid s_k)\, P(t' \mid q) \prod_{i=1}^{k-1} \left(1 - P(t_i = t' \mid s_i^*)\right)$$
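As a sanity check on Appendices A and B, the closed form of the conditional relevance probability implied by (6) and the Appendix A form of $Z$ can be verified against brute-force enumeration over the graphical model of Figure 1; a small sketch (ours, for illustration):

```python
import itertools

import numpy as np

def marginal_bruteforce(q, sel, cand):
    """P(r_k = 1 | r_1 = 0, ..., r_{k-1} = 0, s_1, ..., s_k, q) by exhaustive
    enumeration of topic assignments. On the event r_{<k} = 0, no previously
    selected item's topic may equal t' (Appendix A)."""
    T = len(q)
    num = Z = 0.0
    for ts in itertools.product(range(T), repeat=len(sel) + 2):
        t_prime, t_k, prev = ts[0], ts[1], ts[2:]
        if any(t == t_prime for t in prev):   # would force some r_i = 1
            continue
        w = q[t_prime] * cand[t_k]
        for i, t in enumerate(prev):
            w *= sel[i][t]
        Z += w
        if t_k == t_prime:                    # r_k = 1 iff t_k = t' (and all t_i != t_k)
            num += w
    return num / Z

def marginal_closed_form(q, sel, cand):
    """The same quantity via Eq. (6) divided by the Appendix A form of Z."""
    uncovered = np.ones(len(q))
    for s in sel:
        uncovered *= 1.0 - s
    return float(np.sum(q * cand * uncovered)) / float(np.sum(q * uncovered))

rng = np.random.default_rng(0)
q = rng.dirichlet(np.ones(4))
sel = [rng.dirichlet(np.ones(4)) for _ in range(2)]
cand = rng.dirichlet(np.ones(4))
assert np.isclose(marginal_bruteforce(q, sel, cand),
                  marginal_closed_form(q, sel, cand))
```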


Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval Sargur Srihari University at Buffalo The State University of New York 1 A Priori Algorithm for Association Rule Learning Association

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, etworks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Machine learning for pervasive systems Classification in high-dimensional spaces

Machine learning for pervasive systems Classification in high-dimensional spaces Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version

More information

Predicting Neighbor Goodness in Collaborative Filtering

Predicting Neighbor Goodness in Collaborative Filtering Predicting Neighbor Goodness in Collaborative Filtering Alejandro Bellogín and Pablo Castells {alejandro.bellogin, pablo.castells}@uam.es Universidad Autónoma de Madrid Escuela Politécnica Superior Introduction:

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction

More information

Lecture 5: Web Searching using the SVD

Lecture 5: Web Searching using the SVD Lecture 5: Web Searching using the SVD Information Retrieval Over the last 2 years the number of internet users has grown exponentially with time; see Figure. Trying to extract information from this exponentially

More information

topic modeling hanna m. wallach

topic modeling hanna m. wallach university of massachusetts amherst wallach@cs.umass.edu Ramona Blei-Gantz Helen Moss (Dave's Grandma) The Next 30 Minutes Motivations and a brief history: Latent semantic analysis Probabilistic latent

More information

Latent Semantic Tensor Indexing for Community-based Question Answering

Latent Semantic Tensor Indexing for Community-based Question Answering Latent Semantic Tensor Indexing for Community-based Question Answering Xipeng Qiu, Le Tian, Xuanjing Huang Fudan University, 825 Zhangheng Road, Shanghai, China xpqiu@fudan.edu.cn, tianlefdu@gmail.com,

More information

13 : Variational Inference: Loopy Belief Propagation and Mean Field

13 : Variational Inference: Loopy Belief Propagation and Mean Field 10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction

More information

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract)

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles

More information

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting Outline for today Information Retrieval Efficient Scoring and Ranking Recap on ranked retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Efficient

More information

Comparative Document Analysis for Large Text Corpora

Comparative Document Analysis for Large Text Corpora Comparative Document Analysis for Large Text Corpora Xiang Ren Yuanhua Lv Kuansan Wang Jiawei Han University of Illinois at Urbana-Champaign, Urbana, IL, USA Microsoft Research, Redmond, WA, USA {xren7,

More information

13 Searching the Web with the SVD

13 Searching the Web with the SVD 13 Searching the Web with the SVD 13.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this

More information

9 Searching the Internet with the SVD

9 Searching the Internet with the SVD 9 Searching the Internet with the SVD 9.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this

More information

ChengXiang ( Cheng ) Zhai Department of Computer Science University of Illinois at Urbana-Champaign

ChengXiang ( Cheng ) Zhai Department of Computer Science University of Illinois at Urbana-Champaign Axiomatic Analysis and Optimization of Information Retrieval Models ChengXiang ( Cheng ) Zhai Department of Computer Science University of Illinois at Urbana-Champaign http://www.cs.uiuc.edu/homes/czhai

More information

1 [15 points] Frequent Itemsets Generation With Map-Reduce

1 [15 points] Frequent Itemsets Generation With Map-Reduce Data Mining Learning from Large Data Sets Final Exam Date: 15 August 2013 Time limit: 120 minutes Number of pages: 11 Maximum score: 100 points You can use the back of the pages if you run out of space.

More information

Large-scale Information Processing, Summer Recommender Systems (part 2)

Large-scale Information Processing, Summer Recommender Systems (part 2) Large-scale Information Processing, Summer 2015 5 th Exercise Recommender Systems (part 2) Emmanouil Tzouridis tzouridis@kma.informatik.tu-darmstadt.de Knowledge Mining & Assessment SVM question When a

More information

Medical Question Answering for Clinical Decision Support

Medical Question Answering for Clinical Decision Support Medical Question Answering for Clinical Decision Support Travis R. Goodwin and Sanda M. Harabagiu The University of Texas at Dallas Human Language Technology Research Institute http://www.hlt.utdallas.edu

More information

Lecture 2 August 31, 2007

Lecture 2 August 31, 2007 CS 674: Advanced Language Technologies Fall 2007 Lecture 2 August 31, 2007 Prof. Lillian Lee Scribes: Cristian Danescu Niculescu-Mizil Vasumathi Raman 1 Overview We have already introduced the classic

More information

Prediction of Citations for Academic Papers

Prediction of Citations for Academic Papers 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Statistical Ranking Problem

Statistical Ranking Problem Statistical Ranking Problem Tong Zhang Statistics Department, Rutgers University Ranking Problems Rank a set of items and display to users in corresponding order. Two issues: performance on top and dealing

More information

Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA) Latent Dirichlet Allocation (LDA) A review of topic modeling and customer interactions application 3/11/2015 1 Agenda Agenda Items 1 What is topic modeling? Intro Text Mining & Pre-Processing Natural Language

More information

DM-Group Meeting. Subhodip Biswas 10/16/2014

DM-Group Meeting. Subhodip Biswas 10/16/2014 DM-Group Meeting Subhodip Biswas 10/16/2014 Papers to be discussed 1. Crowdsourcing Land Use Maps via Twitter Vanessa Frias-Martinez and Enrique Frias-Martinez in KDD 2014 2. Tracking Climate Change Opinions

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Learning MN Parameters with Alternative Objective Functions. Sargur Srihari

Learning MN Parameters with Alternative Objective Functions. Sargur Srihari Learning MN Parameters with Alternative Objective Functions Sargur srihari@cedar.buffalo.edu 1 Topics Max Likelihood & Contrastive Objectives Contrastive Objective Learning Methods Pseudo-likelihood Gradient

More information