Concept Tracking for Microblog Search

Size: px
Start display at page:

Download "Concept Tracking for Microblog Search"

Transcription

1 1,a) 1,b) 1,c) , Tweets2011 Twitter Concept Tracking for Microblog Search Taiki Miyanishi 1,a) Kazuhiro Seki 1,b) Kuniaki Uehara 1,c) Received: December 22, 2013, Accepted: April 7, 2014 Abstract: Incorporating the temporal property of words into query expansion methods based on relevance feedback has been shown to have a significant positive effect on microblog searching. In this paper, we propose a concept-based query expansion method based on a temporal relevance model that uses the temporal variation of concepts (e.g., terms or phrases) on microblogs. Our model naturally extends an extremely effective existing concept-based relevance model by tracking the concept frequency over time. Moreover, the proposed model produces important concepts that are frequently used within a particular time period associated with a given topic, which have more power to discriminate between relevant and non-relevant microblog documents than words. Our experiments using a corpus of microblog data (the Tweets2011 corpus) show that the proposed concept-based query expansion method improves search performance significantly, especially when retrieving highly relevant documents. Keywords: Twitter, microblog search, pseudo-relevance feedback, query expansion 1. 1 Graduate School of System Informatics, Kobe University, Kobe, Hyogo , Japan a) miyanishi@ai.cs.kobe-u.ac.jp b) seki@cs.kobe-u.ac.jp c) uehara@kobe-u.ac.jp [10], [11], [12], [20], [21], [30] [8], [23], [24], [27], [28] WebDB 2013 c 2014 Information Processing Society of Japan 1

2 [27] [1], [2], [3], [4], [5], [16], [18], [25], [26] Web Latent Concept Expansion; LCE [26] wrm [17] ctrm TREC White House spokesman replaced MB044 1 Joseph R. Diden Jr. Jay Carney wrm ctrm Lexical ctrm Temporal 1 wrm jay carney ctrm Lexical news spokesman press secretary jay carney 1 Table 1 MB044: White House spokesman replaced Example of expanded words and concepts about a topic MB044: White House spokesman replaced suggested by a word-based PRF (wrm) and a concept-based temporal one (ctrm). wrm ctrm (Lexical) ctrm (Temporal) jay jay carney carney carney jay qantas qantas press secretary new new spokesman jay carney obama new biden spokesman ctrm ctrm Temporal ctrm Lexical ctrm LCE TREC ,600 tweet Twitter Tweets2011 * [35] Li [19] *1 c 2014 Information Processing Society of Japan 2

3 Efron [10] Dakka [9] Keikha [14] Lin [21] Miyanishi [27] Jones [13] 2 Efron [12] Miyanishi [28] tweet 1 Twitter 2.2 Markov Random Field; MRF MRF Web Metzler [25] MRF MRF [26] LCE Bendersky [3] Wikipedia Bendersky [4], [5] Bendersky [2] TREC Web Wikipedia 3. Ponte [31] 3.1 Q D P (D Q) P (D Q) P (Q D) P (D) P (D Q) P (Q D)P (D) P (D) D n q 1,q 2,...,q n Q P (Q D) =P (q 1,q 2,...,q n D) n P (Q D) = P (q i D) (1) i=1 n q i c 2014 Information Processing Society of Japan 3

4 Q i P (q D) f(w;d) P ml (w D) = w V f(w ;D) f(w; D) D w w V f(w ; D) D V (1) 1 P (Q D) 0 [36] P (w D) = D D + μ P μ ml(w D)+ D + μ P (w C) P (w C) C w μ 3.2 Lavrenko [17] P (w R) w w P (w R) P (w Q) = D R P (w, D Q) = 1 P (Q) P (w, Q D)P (D) D R D R P (D)P (w D) n P (q i D), (2) R Q M P (Q) P (w Q) D w q i (2) P (w Q) w k 3.3 Metzler [26] LCE LCE 1 LCE i [25] LCE Bendersky [4] LCE LCE R M c S LCE (c, Q) D R exp{γ 1 φ 1 (Q, D)+γ 2 φ 2 (c, D) γ 3 φ 3 (c, C)}(3) φ 1 (Q, D) Q D c φ 2 (c, D) c D φ 3 (c, C) C c γ 1 γ 2 γ 3 (3) γ 1 = γ 2 = γ 3 =1 φ 1 (Q, D) = log P (Q D) φ 2 (c, D) =logp (c D) φ 3 (c, C) =0 *2 D Q ˆq 1, ˆq 2,...,ˆq m c bag-of-concepts c Q S crm (c, Q) S crm (c, Q) D R P (D)P (c D) m P (ˆq j D) (4) ˆq j Q j (4) (3) (4) (2) Lavrenko [17] (2) (4) 3.4 [8], [23], [24], [27], [28] *2 [22] φ 3 (c, C) =0 j c 2014 Information Processing Society of Japan 4

5 P (c R) 2 c R l R t P (c Q) P (c Q) = P (c, D l,d t Q) D l R l D t R t = P (D l c, D t,q)p(c, D t Q) (5) D t R t D l R l D l R l D t R t Efron [10] D t D l (5) D t P (Q) P (c Q) P (c Q) = P (D l c, Q) P (c, D t Q) D l R l D t R t 1 = P (c, D l Q) P (c, D t Q) P (c Q) D l R t D t R t 1 P (D l )P (c, Q D l ) P (c Q) D l R t P (D t )P (c, Q D t ) D t R t bag-of-concepts D l D t Q ˆq 1, ˆq 2,...,ˆq m c 1 m P (c Q) P (D l )P (c D l ) P (ˆq j D l ) P (c Q) D l R l j m P (D t )P (c D t ) P (ˆq j D t ) D t R t j P (c Q) c Q S ct RM (c, Q) c { m S ct RM (c, Q) rank = P (D l )P (c D l ) P (ˆq j D l ) D l R l j }{{} Lexical m } 1/2 P (D t )P (c D t ) P (ˆq j D t ), D t R t j }{{} Temporal (6) P (c D t ) t c P (ˆq j D t ) t ˆq j (6) P (c D t ) m j P (ˆq j D t ) c ˆq 1, ˆq 2,...,ˆq m (6) R t c P (ˆq j D t ) P (c D t ) D l R l P (D l )P (c D l )P (Q D l ) (4) LCE (6) (1) P (c D t ) 0 P (c D t ) P (c D t )= D t ˆPml (c D t )+ P (c C) (7) D t + μ t D t + μ t D t D t f(c;d t) P ml (c D t )= c Vc f(c ;D V t) c f(c; D t ) t c μ t S ct RM (c, Q) k TREC Tweets2011 tweet Tweets ,600 tweet <num> <title> <querytime> TREC <title> TREC tweet tweet <num> MB001 <title> BBC World Service staff cuts <querytime> Tue Feb 08 12:30: TREC 2011 Fig. 1 Example topic from the TREC 2011 microblog track. μ t c 2014 Information Processing Society of Japan 5

6 2 Table 2 TREC Summary of TREC collections and topics used for evaluation. Name Type #Topics Topic Numbers TREC 2011 allrel highrel 33 1, 10-30, 32, 36-38, 40-42, 44-46, 49 TREC 2012 allrel , highrel 56 51, 52, 54-68, 70-75, , tweet allrel tweet highrel TREC 2011 TREC 2012 allrel highrel tweet tweet Indri [34] stopword Krovetz stemmer [15] stemming Tweet Tweet Indri [36] [12] μ = 2500 LM μ t allrel μ t = 150 highrel μ t = 350 LM tweet ldig *3 retweet RT tweet *4 TREC [29], [33] retweet [8] retweet retweet *3 *4 tweet tweet retweet retweet HTTP tweet 1, tweet sequential-dependence [25] w 1 w 2 #1(w 1,w 2 ) w 1 w 2 8 #8(w 1,w 2 ) (7) P (c C) df (c)/n P (c C) df (c) c N df (c)/n P (c C) μ t (4) ctrm 2 wtrm wtrm (4) 1 wtrm ctrm wtrm ctrm wtrm ctrm wrm [17] wrm 2 2 c 2014 Information Processing Society of Japan 6

7 3 Lexical Temporal Concept Table 3 Summary of the evaluated retrieval methods. Method Lexical Temporal Concept wrm crm wtrm ctrm crm LCE (3) LCE Bendersky [4] LCE crm ctrm ctrm crm R t 3 wrm crm wtrm ctrm 3 crm ctrm wrm wtrm wtrm ctrm wrm crm wrm crm wtrm ctrm tweet Tweets2011 tweet wrm crm wtrm ctrm # tweet P (D l ) P (D t ) S crm (c, Q) S ct RM (c, Q) k S crm (c, Q) S ct RM (c, Q) k 2 #weight( λ 1 #combine(bbc world service staff cuts) λ 2 #weight( s 1 #1(service outlines) s 2 #uw8(bbc outlines) s 3 outlines... s k #1(weds bbcworldservice))) TREC 2011 MB001: BBC world service staff cuts Fig. 2 Example of query reformulation of topic MB001: BBC world service staff cuts from TREC 2011 Microblog track queries. 4 Table 4 Parameters for IR models. allrel highrel Method Year M N K M N K wtrm ctrm wtrm ctrm Indri [34] 2 i ScT RM (ci,q) s i = k j ScT RM (cj,q) 1:1 λ 1 = λ 2 = M wtrm ctrm N k M N k allrel highrel 1,000 Average Precision AP AP Precision [6]TREC ,000 AP TREC 2011 TREC 2012 TREC wtrm ctrm M N k TREC c 2014 Information Processing Society of Japan 7

8 Table 5 5 The performance comparison of the word-based PRFs. allrel highrel Method AP P@30 bpref AP P@30 bpref LM wrm α α α α α wtrm α β α α β α α α Table 6 6 The performance comparison of the concept-based PRFs. allrel highrel Method AP P@30 bpref AP P@30 bpref LM crm α α α α α α ctrm α α α β α β α α 4.3 Average Precision AP 30 Precision P@30 binary preference bpref [7] P@30 TREC 2011 TREC 2012 P@30 AP bpref AP Precision TREC 2012 highrel allrel highrel t [32] 100,000 p< wtrm ctrm wtrm ctrm wrm crm LM [17] wrm wtrm AP P@30 bpref LM wrm wtrm α β γ 5 wrm wtrm allrel highrel LM allrel highrel wtrm wrm allrel AP bpref [4] crm Table 7 7 The performance comparison of the proposed wordand concept-based temporal PRF methods. allrel highrel Method AP P@30 bpref AP P@30 bpref wtrm β ctrm ctrm AP P@30 bpref LM crm ctrm α β γ 6 crm ctrm allrel highrel LM ctrm crm allrel bref highrel AP wtrm ctrm AP P@30 bpref wtrm ctrm α β 7 allrel wtrm ctrm P@30 P@30 highrel ctrm wtrm tweet Gasland *5 wtrm oscar *5 Gasland MB c 2014 Information Processing Society of Japan 8

9 nomination ctrm oscar nomination nomination oscar nomination ctrm wtrm allrel highrel 5. 2 TREC [1] Bendersky, M. and Croft, W.B.: Discovering key concepts in verbose queries, SIGIR, pp (2008). [2] Bendersky, M. and Croft, W.B.: Modeling higher-order term dependencies in information retrieval using query hypergraphs, SIGIR, pp (2012). [3] Bendersky, M., Metzler, D. and Croft, W.B.: Learning concept importance using a weighted dependence model, WSDM, pp (2010). [4] Bendersky, M., Metzler, D. and Croft, W.B.: Parameterized concept weighting in verbose queries, SIGIR, pp (2011). [5] Bendersky, M., Metzler, D. and Croft, W.B.: Effective query formulation with multiple information sources, WSDM, pp (2012). [6] Buckley, C. and Voorhees, E.M.: Evaluating evaluation measure stability, SIGIR, pp (2000). [7] Buckley, C. and Voorhees, E.M.: Retrieval evaluation with incomplete information, SIGIR, pp (2004). [8] Choi, J. and Croft, W.B.: Temporal models for microblogs, CIKM, pp (2012). [9] Dakka, W., Gravano, L. and Ipeirotis, P.G.: Answering general time-sensitive queries, TKDE, Vol.24, No.2, pp (2012). [10] Efron, M. and Golovchinsky, G.: Estimation methods for ranking recent information, SIGIR, pp (2011). [11] Efron, M.: Query-specific recency ranking: Survival analysis for improved microblog retrieval, #TAIA (2012). [12] Efron, M., Organisciak, P. and Fenlon, K.: Improving retrieval of short texts through document expansion, SI- GIR, pp (2012). [13] Jones, R. and Diaz, F.: Temporal profiles of queries, TOIS, Vol.25, No.3 (2007). [14] Keikha, M., Gerani, S. and Crestani, F.: Time-based relevance models, SIGIR, pp (2011). [15] Krovetz, R.: Viewing morphology as an inference process, SIGIR, pp (1993). [16] Lang, H., Metzler, D., Wang, B. and Li, J.-T.: Improved latent concept expansion using hierarchical Markov random fields, CIKM, pp (2010). [17] Lavrenko, V. and Croft, W.B.: Relevance based language models, SIGIR, pp (2001). [18] Lease, M.: An improved Markov random field model for supporting verbose queries, SIGIR, pp (2009). [19] Li, X. and Croft, W.: Time-based language models, CIKM, pp (2003). [20] Liang, F., Qiang, R. and Yang, J.: Exploiting real-time information retrieval in the microblogosphere, JCDL, pp (2012). [21] Lin, J. and Efron, M.: Temporal relevance profiles for tweet search, #TAIA (2013). [22] Macdonald, C. and Ounis, I.: Global statistics in proximity weighting models, SIGIR Web N-gram Workshop (2010). [23] Massoudi, K., Tsagkias, M., de Rijke, M. and Weerkamp, W.: Incorporating query expansion and quality indicators in searching microblog posts, ECIR, pp (2011). [24] Metzler, D., Cai, C. and Hovy, E.: Structured event retrieval over microblog archives, HLT/NAACL, pp (2012). [25] Metzler, D. and Croft, W.B.: A Markov random field model for term dependencies, SIGIR, pp (2005). [26] Metzler, D. and Croft, W.B.: Latent concept expansion using Markov random fields, SIGIR, pp (2007). [27] Miyanishi, T., Seki, K. and Uehara, K.: Combining recency and topic-dependent temporal variation for microblog search, ECIR, pp (2013). [28] Miyanishi, T., Seki, K. and Uehara, K.: Improving pseudo-relevance feedback via tweet selection, CIKM, pp (2013). [29] Ounis, I., Macdonald, C., Lin, J. and Soboroff, I.: Overview of the TREC-2011 microblog track, TREC (2011). [30] Peetz, M.H., Meij, E., de Rijke, M. and Weerkamp, W.: Adaptive temporal query modeling, ECIR, pp (2012). [31] Ponte, J. and Croft, W.: A language modeling approach to information retrieval, SIGIR, pp (1998). [32] Smucker, M.D., Allan, J. and Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation, CIKM, pp (2007). [33] Soboroff, I., Ounis, I. and Lin, J.: Overview of the TREC-2012 microblog track, TREC (2012). [34] Strohman, T., Metzler, D., Turtle, H. and Croft, W.: Indri: A language model-based search engine for complex queries, ICIA, pp.2 6 (2005). [35] Teevan, J., Ramage, D. and Morris, M.: #Twitter- Search: A comparison of microblog search and web search, WSDM, pp (2011). [36] Zhai, C. and Lafferty, J.: A study of smoothing methc 2014 Information Processing Society of Japan 9

10 ods for language models applied to information retrieval, TOIS, Vol.22, No.2, pp (2004) Web Ph.D AAAI c 2014 Information Processing Society of Japan 10

Mining the Temporal Statistics of Query Terms for Searching Social Media Posts

Mining the Temporal Statistics of Query Terms for Searching Social Media Posts Mining the Temporal Statistics of Query Terms for Searching Social Media Posts Jinfeng Rao, 1 Ferhan Ture, 2 Xing Niu, 1 and Jimmy Lin 3 1 Department of Computer Science, University of Maryland 2 Comcast

More information

Using temporal bursts for query modeling

Using temporal bursts for query modeling DOI 10.1007/s10791-013-9227-2 Using temporal bursts for query modeling Maria-Hendrike Peetz Edgar Meij Maarten de Rijke Received: 2 October 2012 / Accepted: 23 February 2013 Ó Springer Science+Business

More information

5 10 12 32 48 5 10 12 32 48 4 8 16 32 64 128 4 8 16 32 64 128 2 3 5 16 2 3 5 16 5 10 12 32 48 4 8 16 32 64 128 2 3 5 16 docid score 5 10 12 32 48 O'Neal averaged 15.2 points 9.2 rebounds and 1.0 assists

More information

Utilizing Passage-Based Language Models for Document Retrieval

Utilizing Passage-Based Language Models for Document Retrieval Utilizing Passage-Based Language Models for Document Retrieval Michael Bendersky 1 and Oren Kurland 2 1 Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts,

More information

A Neural Passage Model for Ad-hoc Document Retrieval

A Neural Passage Model for Ad-hoc Document Retrieval A Neural Passage Model for Ad-hoc Document Retrieval Qingyao Ai, Brendan O Connor, and W. Bruce Croft College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA,

More information

Generalized Inverse Document Frequency

Generalized Inverse Document Frequency Generalized Inverse Document Frequency Donald Metzler metzler@yahoo-inc.com Yahoo! Research 2821 Mission College Blvd. Santa Clara, CA 95054 ABSTRACT Inverse document frequency (IDF) is one of the most

More information

Lecture 13: More uses of Language Models

Lecture 13: More uses of Language Models Lecture 13: More uses of Language Models William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 13 What we ll learn in this lecture Comparing documents, corpora using LM approaches

More information

arxiv: v1 [cs.ir] 1 May 2018

arxiv: v1 [cs.ir] 1 May 2018 On the Equivalence of Generative and Discriminative Formulations of the Sequential Dependence Model arxiv:1805.00152v1 [cs.ir] 1 May 2018 Laura Dietz University of New Hampshire dietz@cs.unh.edu John Foley

More information

Ranking-II. Temporal Representation and Retrieval Models. Temporal Information Retrieval

Ranking-II. Temporal Representation and Retrieval Models. Temporal Information Retrieval Ranking-II Temporal Representation and Retrieval Models Temporal Information Retrieval Ranking in Information Retrieval Ranking documents important for information overload, quickly finding documents which

More information

Query Performance Prediction: Evaluation Contrasted with Effectiveness

Query Performance Prediction: Evaluation Contrasted with Effectiveness Query Performance Prediction: Evaluation Contrasted with Effectiveness Claudia Hauff 1, Leif Azzopardi 2, Djoerd Hiemstra 1, and Franciska de Jong 1 1 University of Twente, Enschede, the Netherlands {c.hauff,

More information

Within-Document Term-Based Index Pruning with Statistical Hypothesis Testing

Within-Document Term-Based Index Pruning with Statistical Hypothesis Testing Within-Document Term-Based Index Pruning with Statistical Hypothesis Testing Sree Lekha Thota and Ben Carterette Department of Computer and Information Sciences University of Delaware, Newark, DE, USA

More information

A Study of the Dirichlet Priors for Term Frequency Normalisation

A Study of the Dirichlet Priors for Term Frequency Normalisation A Study of the Dirichlet Priors for Term Frequency Normalisation ABSTRACT Ben He Department of Computing Science University of Glasgow Glasgow, United Kingdom ben@dcs.gla.ac.uk In Information Retrieval

More information

Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA

Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA Rutgers, The State University of New Jersey Nov. 12, 2012 Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA Motivation Modeling Social Streams Future

More information

ChengXiang ( Cheng ) Zhai Department of Computer Science University of Illinois at Urbana-Champaign

ChengXiang ( Cheng ) Zhai Department of Computer Science University of Illinois at Urbana-Champaign Axiomatic Analysis and Optimization of Information Retrieval Models ChengXiang ( Cheng ) Zhai Department of Computer Science University of Illinois at Urbana-Champaign http://www.cs.uiuc.edu/homes/czhai

More information

CS 646 (Fall 2016) Homework 3

CS 646 (Fall 2016) Homework 3 CS 646 (Fall 2016) Homework 3 Deadline: 11:59pm, Oct 31st, 2016 (EST) Access the following resources before you start working on HW3: Download and uncompress the index file and other data from Moodle.

More information

Language Models, Smoothing, and IDF Weighting

Language Models, Smoothing, and IDF Weighting Language Models, Smoothing, and IDF Weighting Najeeb Abdulmutalib, Norbert Fuhr University of Duisburg-Essen, Germany {najeeb fuhr}@is.inf.uni-due.de Abstract In this paper, we investigate the relationship

More information

Time-Aware Rank Aggregation for Microblog Search

Time-Aware Rank Aggregation for Microblog Search Time-Aware Rank Aggregation for Microblog Search Shangsong Liang s.liang@uva.nl Edgar Meij emeij@yahoo-inc.com Zhaochun Ren z.ren@uva.nl Wouter Weerkamp wouter@904labs.com Maarten de Rijke derijke@uva.nl

More information

Yahoo! Labs Nov. 1 st, Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University

Yahoo! Labs Nov. 1 st, Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Yahoo! Labs Nov. 1 st, 2012 Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Motivation Modeling Social Streams Future work Motivation Modeling Social Streams

More information

Query-document Relevance Topic Models

Query-document Relevance Topic Models Query-document Relevance Topic Models Meng-Sung Wu, Chia-Ping Chen and Hsin-Min Wang Industrial Technology Research Institute, Hsinchu, Taiwan National Sun Yat-Sen University, Kaohsiung, Taiwan Institute

More information

Is Document Frequency important for PRF?

Is Document Frequency important for PRF? Author manuscript, published in "ICTIR 2011 - International Conference on the Theory Information Retrieval, Bertinoro : Italy (2011)" Is Document Frequency important for PRF? Stéphane Clinchant 1,2 and

More information

The Combination and Evaluation of Query Performance Prediction Methods

The Combination and Evaluation of Query Performance Prediction Methods The Combination and Evaluation of Query Performance Prediction Methods Claudia Hauff 1, Leif Azzopardi 2, and Djoerd Hiemstra 1 1 University of Twente, The Netherlands {c.hauff,d.hiemstra}@utwente.nl 2

More information

Using the Past To Score the Present: Extending Term Weighting Models Through Revision History Analysis

Using the Past To Score the Present: Extending Term Weighting Models Through Revision History Analysis Using the Past To Score the Present: Extending Term Weighting Models Through Revision History Analysis Ablimit Aji Yu Wang Emory University {aaji,yuwang}@emoryedu Eugene Agichtein Emory University eugene@mathcsemoryedu

More information

INFO 630 / CS 674 Lecture Notes

INFO 630 / CS 674 Lecture Notes INFO 630 / CS 674 Lecture Notes The Language Modeling Approach to Information Retrieval Lecturer: Lillian Lee Lecture 9: September 25, 2007 Scribes: Vladimir Barash, Stephen Purpura, Shaomei Wu Introduction

More information

Language Models. Web Search. LM Jelinek-Mercer Smoothing and LM Dirichlet Smoothing. Slides based on the books: 13

Language Models. Web Search. LM Jelinek-Mercer Smoothing and LM Dirichlet Smoothing. Slides based on the books: 13 Language Models LM Jelinek-Mercer Smoothing and LM Dirichlet Smoothing Web Search Slides based on the books: 13 Overview Indexes Query Indexing Ranking Results Application Documents User Information analysis

More information

Information Retrieval

Information Retrieval Information Retrieval Online Evaluation Ilya Markov i.markov@uva.nl University of Amsterdam Ilya Markov i.markov@uva.nl Information Retrieval 1 Course overview Offline Data Acquisition Data Processing

More information

Markov Precision: Modelling User Behaviour over Rank and Time

Markov Precision: Modelling User Behaviour over Rank and Time Markov Precision: Modelling User Behaviour over Rank and Time Marco Ferrante 1 Nicola Ferro 2 Maria Maistro 2 1 Dept. of Mathematics, University of Padua, Italy ferrante@math.unipd.it 2 Dept. of Information

More information

Featured tweet search: Modeling time and social influence for microblog retrieval

Featured tweet search: Modeling time and social influence for microblog retrieval Featured tweet search: Modeling time and social influence for microblog retrieval Lamjed Ben Jabeur, Lynda Tamine and Mohand Boughanem IRIT, Paul Sabatier University 118 Route de Narbonne F-3162 TOULOUSE

More information

Predicting Neighbor Goodness in Collaborative Filtering

Predicting Neighbor Goodness in Collaborative Filtering Predicting Neighbor Goodness in Collaborative Filtering Alejandro Bellogín and Pablo Castells {alejandro.bellogin, pablo.castells}@uam.es Universidad Autónoma de Madrid Escuela Politécnica Superior Introduction:

More information

Dynamic Clustering of Streaming Short Documents

Dynamic Clustering of Streaming Short Documents Dynamic Clustering of Streaming Short Documents Shangsong Liang shangsong.liang@ucl.ac.uk Emine Yilmaz emine.yilmaz@ucl.ac.uk University College London, London, United Kingdom University of Amsterdam,

More information

Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data

Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data Nikita Zhiltsov 1,2 Alexander Kotov 3 Fedor Nikolaev 3 1 Kazan Federal University 2 Textocat 3 Textual Data Analytics

More information

Boolean and Vector Space Retrieval Models

Boolean and Vector Space Retrieval Models Boolean and Vector Space Retrieval Models Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE) who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) 1

More information

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic

More information

Chapter 10: Information Retrieval. See corresponding chapter in Manning&Schütze

Chapter 10: Information Retrieval. See corresponding chapter in Manning&Schütze Chapter 10: Information Retrieval See corresponding chapter in Manning&Schütze Evaluation Metrics in IR 2 Goal In IR there is a much larger variety of possible metrics For different tasks, different metrics

More information

Injecting User Models and Time into Precision via Markov Chains

Injecting User Models and Time into Precision via Markov Chains Injecting User Models and Time into Precision via Markov Chains Marco Ferrante Dept. Mathematics University of Padua, Italy ferrante@math.unipd.it Nicola Ferro Dept. Information Engineering University

More information

Boolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK).

Boolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). Boolean and Vector Space Retrieval Models 2013 CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). 1 Table of Content Boolean model Statistical vector space model Retrieval

More information

CS630 Representing and Accessing Digital Information Lecture 6: Feb 14, 2006

CS630 Representing and Accessing Digital Information Lecture 6: Feb 14, 2006 Scribes: Gilly Leshed, N. Sadat Shami Outline. Review. Mixture of Poissons ( Poisson) model 3. BM5/Okapi method 4. Relevance feedback. Review In discussing probabilistic models for information retrieval

More information

Semantic Term Matching in Axiomatic Approaches to Information Retrieval

Semantic Term Matching in Axiomatic Approaches to Information Retrieval Semantic Term Matching in Axiomatic Approaches to Information Retrieval Hui Fang Department of Computer Science University of Illinois at Urbana-Champaign ChengXiang Zhai Department of Computer Science

More information

Motivation. User. Retrieval Model Result: Query. Document Collection. Information Need. Information Retrieval / Chapter 3: Retrieval Models

Motivation. User. Retrieval Model Result: Query. Document Collection. Information Need. Information Retrieval / Chapter 3: Retrieval Models 3. Retrieval Models Motivation Information Need User Retrieval Model Result: Query 1. 2. 3. Document Collection 2 Agenda 3.1 Boolean Retrieval 3.2 Vector Space Model 3.3 Probabilistic IR 3.4 Statistical

More information

Measuring the Variability in Effectiveness of a Retrieval System

Measuring the Variability in Effectiveness of a Retrieval System Measuring the Variability in Effectiveness of a Retrieval System Mehdi Hosseini 1, Ingemar J. Cox 1, Natasa Millic-Frayling 2, and Vishwa Vinay 2 1 Computer Science Department, University College London

More information

Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter

Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter Daichi Koike Yusuke Takahashi Takehito Utsuro Grad. Sch. Sys. & Inf. Eng., University of Tsukuba, Tsukuba, 305-8573,

More information

The Use of Dependency Relation Graph to Enhance the Term Weighting in Question Retrieval

The Use of Dependency Relation Graph to Enhance the Term Weighting in Question Retrieval The Use of Dependency Relation Graph to Enhance the Term Weighting in Question Retrieval Weinan Zhang 1 Zhao- yan M ing 2 Yu Zhang 1 Liqiang N ie 2 T ing Liu 1 Tat-Seng Chua 2 (1) School of Computer Science

More information

Modeling Controversy Within Populations

Modeling Controversy Within Populations Modeling Controversy Within Populations Myungha Jang, Shiri Dori-Hacohen and James Allan Center for Intelligent Information Retrieval (CIIR) University of Massachusetts Amherst {mhjang, shiri, allan}@cs.umass.edu

More information

Comparing Relevance Feedback Techniques on German News Articles

Comparing Relevance Feedback Techniques on German News Articles B. Mitschang et al. (Hrsg.): BTW 2017 Workshopband, Lecture Notes in Informatics (LNI), Gesellschaft für Informatik, Bonn 2017 301 Comparing Relevance Feedback Techniques on German News Articles Julia

More information

III.6 Advanced Query Types

III.6 Advanced Query Types III.6 Advanced Query Types 1. Query Expansion 2. Relevance Feedback 3. Novelty & Diversity Based on MRS Chapter 9, BY Chapter 5, [Carbonell and Goldstein 98] [Agrawal et al 09] 123 1. Query Expansion Query

More information

Modeling the Score Distributions of Relevant and Non-relevant Documents

Modeling the Score Distributions of Relevant and Non-relevant Documents Modeling the Score Distributions of Relevant and Non-relevant Documents Evangelos Kanoulas, Virgil Pavlu, Keshi Dai, and Javed A. Aslam College of Computer and Information Science Northeastern University,

More information

Reducing the Risk of Query Expansion via Robust Constrained Optimization

Reducing the Risk of Query Expansion via Robust Constrained Optimization Reducing the Risk of Query Expansion via Constrained Optimization Kevyn Collins-Thompson Microsoft Research 1 Microsoft Way Redmond, WA 98-6399 U.S.A. kevynct@microsoft.com ABSTRACT We introduce a new

More information

Chap 2: Classical models for information retrieval

Chap 2: Classical models for information retrieval Chap 2: Classical models for information retrieval Jean-Pierre Chevallet & Philippe Mulhem LIG-MRIM Sept 2016 Jean-Pierre Chevallet & Philippe Mulhem Models of IR 1 / 81 Outline Basic IR Models 1 Basic

More information

Positional Language Models for Information Retrieval

Positional Language Models for Information Retrieval Positional Language Models for Information Retrieval Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer

More information

Discovering Geographical Topics in Twitter

Discovering Geographical Topics in Twitter Discovering Geographical Topics in Twitter Liangjie Hong, Lehigh University Amr Ahmed, Yahoo! Research Alexander J. Smola, Yahoo! Research Siva Gurumurthy, Twitter Kostas Tsioutsiouliklis, Twitter Overview

More information

Blog Distillation via Sentiment-Sensitive Link Analysis

Blog Distillation via Sentiment-Sensitive Link Analysis Blog Distillation via Sentiment-Sensitive Link Analysis Giacomo Berardi, Andrea Esuli, Fabrizio Sebastiani, and Fabrizio Silvestri Istituto di Scienza e Tecnologie dell Informazione, Consiglio Nazionale

More information

Optimal Mixture Models in IR

Optimal Mixture Models in IR Optimal Mixture Models in IR Victor Lavrenko Center for Intelligent Information Retrieval Department of Comupter Science University of Massachusetts, Amherst, MA 01003, lavrenko@cs.umass.edu Abstract.

More information

Language Models. Hongning Wang

Language Models. Hongning Wang Language Models Hongning Wang CS@UVa Notion of Relevance Relevance (Rep(q), Rep(d)) Similarity P(r1 q,d) r {0,1} Probability of Relevance P(d q) or P(q d) Probabilistic inference Different rep & similarity

More information

A Probabilistic Model for Online Document Clustering with Application to Novelty Detection

A Probabilistic Model for Online Document Clustering with Application to Novelty Detection A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 jian.zhang@cs.cmu.edu Zoubin

More information

Towards Collaborative Information Retrieval

Towards Collaborative Information Retrieval Towards Collaborative Information Retrieval Markus Junker, Armin Hust, and Stefan Klink German Research Center for Artificial Intelligence (DFKI GmbH), P.O. Box 28, 6768 Kaiserslautern, Germany {markus.junker,

More information

On the Foundations of Diverse Information Retrieval. Scott Sanner, Kar Wai Lim, Shengbo Guo, Thore Graepel, Sarvnaz Karimi, Sadegh Kharazmi

On the Foundations of Diverse Information Retrieval. Scott Sanner, Kar Wai Lim, Shengbo Guo, Thore Graepel, Sarvnaz Karimi, Sadegh Kharazmi On the Foundations of Diverse Information Retrieval Scott Sanner, Kar Wai Lim, Shengbo Guo, Thore Graepel, Sarvnaz Karimi, Sadegh Kharazmi 1 Outline Need for diversity The answer: MMR But what was the

More information

Ranked Retrieval (2)

Ranked Retrieval (2) Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF

More information

Research Methodology in Studies of Assessor Effort for Information Retrieval Evaluation

Research Methodology in Studies of Assessor Effort for Information Retrieval Evaluation Research Methodology in Studies of Assessor Effort for Information Retrieval Evaluation Ben Carterette & James Allan Center for Intelligent Information Retrieval Computer Science Department 140 Governors

More information

An Efficient Framework for Constructing Generalized Locally-Induced Text Metrics

An Efficient Framework for Constructing Generalized Locally-Induced Text Metrics Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence An Efficient Framework for Constructing Generalized Locally-Induced Text Metrics Saeed Amizadeh Intelligent Systems

More information

Correlation, Prediction and Ranking of Evaluation Metrics in Information Retrieval

Correlation, Prediction and Ranking of Evaluation Metrics in Information Retrieval Correlation, Prediction and Ranking of Evaluation Metrics in Information Retrieval Soumyajit Gupta 1, Mucahid Kutlu 2, Vivek Khetan 1, and Matthew Lease 1 1 University of Texas at Austin, USA 2 TOBB University

More information

Language Models and Smoothing Methods for Collections with Large Variation in Document Length. 2 Models

Language Models and Smoothing Methods for Collections with Large Variation in Document Length. 2 Models Language Models and Smoothing Methods for Collections with Large Variation in Document Length Najeeb Abdulmutalib and Norbert Fuhr najeeb@is.inf.uni-due.de, norbert.fuhr@uni-due.de Information Systems,

More information

Topic Models and Applications to Short Documents

Topic Models and Applications to Short Documents Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text

More information

A Framework of Detecting Burst Events from Micro-blogging Streams

A Framework of Detecting Burst Events from Micro-blogging Streams , pp.379-384 http://dx.doi.org/10.14257/astl.2013.29.78 A Framework of Detecting Burst Events from Micro-blogging Streams Kaifang Yang, Yongbo Yu, Lizhou Zheng, Peiquan Jin School of Computer Science and

More information

The Axiometrics Project

The Axiometrics Project The Axiometrics Project Eddy Maddalena and Stefano Mizzaro Department of Mathematics and Computer Science University of Udine Udine, Italy eddy.maddalena@uniud.it, mizzaro@uniud.it Abstract. The evaluation

More information

Latent Semantic Tensor Indexing for Community-based Question Answering

Latent Semantic Tensor Indexing for Community-based Question Answering Latent Semantic Tensor Indexing for Community-based Question Answering Xipeng Qiu, Le Tian, Xuanjing Huang Fudan University, 825 Zhangheng Road, Shanghai, China xpqiu@fudan.edu.cn, tianlefdu@gmail.com,

More information

The effect on accuracy of tweet sample size for hashtag segmentation dictionary construction

The effect on accuracy of tweet sample size for hashtag segmentation dictionary construction The effect on accuracy of tweet sample size for hashtag segmentation dictionary construction Laurence A. F. Park and Glenn Stone School of Computing, Engineering and Mathematics Western Sydney University,

More information

A Study of Smoothing Methods for Language Models Applied to Information Retrieval

A Study of Smoothing Methods for Language Models Applied to Information Retrieval A Study of Smoothing Methods for Language Models Applied to Information Retrieval CHENGXIANG ZHAI and JOHN LAFFERTY Carnegie Mellon University Language modeling approaches to information retrieval are

More information

Bayesian Query-Focused Summarization

Bayesian Query-Focused Summarization Bayesian Query-Focused Summarization Hal Daumé III and Daniel Marcu Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina del Rey, CA 90292 me@hal3.name,marcu@isi.edu Abstract We present

More information

Advances in IR Evalua0on. Ben Cartere6e Evangelos Kanoulas Emine Yilmaz

Advances in IR Evalua0on. Ben Cartere6e Evangelos Kanoulas Emine Yilmaz Advances in IR Evalua0on Ben Cartere6e Evangelos Kanoulas Emine Yilmaz Course Outline Intro to evalua0on Evalua0on methods, test collec0ons, measures, comparable evalua0on Low cost evalua0on Advanced user

More information

A Risk Minimization Framework for Information Retrieval

A Risk Minimization Framework for Information Retrieval A Risk Minimization Framework for Information Retrieval ChengXiang Zhai a John Lafferty b a Department of Computer Science University of Illinois at Urbana-Champaign b School of Computer Science Carnegie

More information

Information Retrieval CS Lecture 03. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval CS Lecture 03. Razvan C. Bunescu School of Electrical Engineering and Computer Science Information Retrieval CS 6900 Lecture 03 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Statistical Properties of Text Zipf s Law models the distribution of terms

More information

Advanced Topics in Information Retrieval 5. Diversity & Novelty

Advanced Topics in Information Retrieval 5. Diversity & Novelty Advanced Topics in Information Retrieval 5. Diversity & Novelty Vinay Setty (vsetty@mpi-inf.mpg.de) Jannik Strötgen (jtroetge@mpi-inf.mpg.de) 1 Outline 5.1. Why Novelty & Diversity? 5.2. Probability Ranking

More information

Medical Question Answering for Clinical Decision Support

Medical Question Answering for Clinical Decision Support Medical Question Answering for Clinical Decision Support Travis R. Goodwin and Sanda M. Harabagiu The University of Texas at Dallas Human Language Technology Research Institute http://www.hlt.utdallas.edu

More information

Applying the Divergence From Randomness Approach for Content-Only Search in XML Documents

Applying the Divergence From Randomness Approach for Content-Only Search in XML Documents Applying the Divergence From Randomness Approach for Content-Only Search in XML Documents Mohammad Abolhassani and Norbert Fuhr Institute of Informatics and Interactive Systems, University of Duisburg-Essen,

More information

Fall CS646: Information Retrieval. Lecture 6 Boolean Search and Vector Space Model. Jiepu Jiang University of Massachusetts Amherst 2016/09/26

Fall CS646: Information Retrieval. Lecture 6 Boolean Search and Vector Space Model. Jiepu Jiang University of Massachusetts Amherst 2016/09/26 Fall 2016 CS646: Information Retrieval Lecture 6 Boolean Search and Vector Space Model Jiepu Jiang University of Massachusetts Amherst 2016/09/26 Outline Today Boolean Retrieval Vector Space Model Latent

More information

BEWARE OF RELATIVELY LARGE BUT MEANINGLESS IMPROVEMENTS

BEWARE OF RELATIVELY LARGE BUT MEANINGLESS IMPROVEMENTS TECHNICAL REPORT YL-2011-001 BEWARE OF RELATIVELY LARGE BUT MEANINGLESS IMPROVEMENTS Roi Blanco, Hugo Zaragoza Yahoo! Research Diagonal 177, Barcelona, Spain {roi,hugoz}@yahoo-inc.com Bangalore Barcelona

More information

IPSJ SIG Technical Report Vol.2014-MPS-100 No /9/25 1,a) 1 1 SNS / / / / / / Time Series Topic Model Considering Dependence to Multiple Topics S

IPSJ SIG Technical Report Vol.2014-MPS-100 No /9/25 1,a) 1 1 SNS / / / / / / Time Series Topic Model Considering Dependence to Multiple Topics S 1,a) 1 1 SNS /// / // Time Series Topic Model Considering Dependence to Multiple Topics Sasaki Kentaro 1,a) Yoshikawa Tomohiro 1 Furuhashi Takeshi 1 Abstract: This pater proposes a topic model that considers

More information

A General Linear Mixed Models Approach to Study System Component Effects

A General Linear Mixed Models Approach to Study System Component Effects A General Linear Mixed Models Approach to Study System Component Effects ABSTRACT Nicola Ferro Department of Information Engineering University of Padua Padua, Italy nicolaferro@unipdit Topic variance

More information

USING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING

USING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING POZNAN UNIVE RSIY OF E CHNOLOGY ACADE MIC JOURNALS No. 80 Electrical Engineering 2014 Hussam D. ABDULLA* Abdella S. ABDELRAHMAN* Vaclav SNASEL* USING SINGULAR VALUE DECOMPOSIION (SVD) AS A SOLUION FOR

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element

More information

Ranking Sequential Patterns with Respect to Significance

Ranking Sequential Patterns with Respect to Significance Ranking Sequential Patterns with Respect to Significance Robert Gwadera, Fabio Crestani Universita della Svizzera Italiana Lugano, Switzerland Abstract. We present a reliable universal method for ranking

More information

A Note on the Expectation-Maximization (EM) Algorithm

A Note on the Expectation-Maximization (EM) Algorithm A Note on the Expectation-Maximization (EM) Algorithm ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign March 11, 2007 1 Introduction The Expectation-Maximization

More information

A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank

A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank Shoaib Jameel Shoaib Jameel 1, Wai Lam 2, Steven Schockaert 1, and Lidong Bing 3 1 School of Computer Science and Informatics,

More information

CS276A Text Information Retrieval, Mining, and Exploitation. Lecture 4 15 Oct 2002

CS276A Text Information Retrieval, Mining, and Exploitation. Lecture 4 15 Oct 2002 CS276A Text Information Retrieval, Mining, and Exploitation Lecture 4 15 Oct 2002 Recap of last time Index size Index construction techniques Dynamic indices Real world considerations 2 Back of the envelope

More information

Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search

Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Xutao Li 1 Michael Ng 2 Yunming Ye 1 1 Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology,

More information

Analysis of Patents for Prior Art Candidate Search

Analysis of Patents for Prior Art Candidate Search Analysis of Patents for Prior Art Candidate Search Sébastien Déjean, Nicolas Faessel, Lucille Marty, Josiane Mothe, Samantha Sadala and Soda Thiam IMT, Toulouse, France Email: sebastien.dejean@math.univ-toulouse.fr

More information

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 26/26: Feature Selection and Exam Overview Paul Ginsparg Cornell University,

More information

Effectiveness of complex index terms in information retrieval

Effectiveness of complex index terms in information retrieval Effectiveness of complex index terms in information retrieval Tokunaga Takenobu, Ogibayasi Hironori and Tanaka Hozumi Department of Computer Science Tokyo Institute of Technology Abstract This paper explores

More information

Learning Topical Transition Probabilities in Click Through Data with Regression Models

Learning Topical Transition Probabilities in Click Through Data with Regression Models Learning Topical Transition Probabilities in Click Through Data with Regression Xiao Zhang, Prasenjit Mitra Department of Computer Science and Engineering College of Information Sciences and Technology

More information

Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures

Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures Long Xia Jun Xu Yanyan Lan Jiafeng Guo Xueqi Cheng CAS Key Lab of Network Data Science and Technology, Institute

More information

Integrating Logical Operators in Query Expansion in Vector Space Model

Integrating Logical Operators in Query Expansion in Vector Space Model Integrating Logical Operators in Query Expansion in Vector Space Model Jian-Yun Nie, Fuman Jin DIRO, Université de Montréal C.P. 6128, succursale Centre-ville, Montreal Quebec, H3C 3J7 Canada {nie, jinf}@iro.umontreal.ca

More information

Click Models for Web Search

Click Models for Web Search Click Models for Web Search Lecture 1 Aleksandr Chuklin, Ilya Markov Maarten de Rijke a.chuklin@uva.nl i.markov@uva.nl derijke@uva.nl University of Amsterdam Google Research Europe AC IM MdR Click Models

More information

ECIR Tutorial 30 March Advanced Language Modeling Approaches (case study: expert search) 1/100.

ECIR Tutorial 30 March Advanced Language Modeling Approaches (case study: expert search)   1/100. Advanced Language Modeling Approaches (case study: expert search) http://www.cs.utwente.nl/~hiemstra 1/100 Overview 1. Introduction to information retrieval and three basic probabilistic approaches The

More information

Language Models. CS6200: Information Retrieval. Slides by: Jesse Anderton

Language Models. CS6200: Information Retrieval. Slides by: Jesse Anderton Language Models CS6200: Information Retrieval Slides by: Jesse Anderton What s wrong with VSMs? Vector Space Models work reasonably well, but have a few problems: They are based on bag-of-words, so they

More information

1 Information retrieval fundamentals

1 Information retrieval fundamentals CS 630 Lecture 1: 01/26/2006 Lecturer: Lillian Lee Scribes: Asif-ul Haque, Benyah Shaparenko This lecture focuses on the following topics Information retrieval fundamentals Vector Space Model (VSM) Deriving

More information

Passage language models in ad hoc document retrieval

Passage language models in ad hoc document retrieval Passage language models in ad hoc document retrieval Research Thesis In Partial Fulfillment of The Requirements for the Degree of Master of Science in Information Management Engineering Michael Bendersky

More information

A Query Algebra for Quantum Information Retrieval

A Query Algebra for Quantum Information Retrieval A Query Algebra for Quantum Information Retrieval Annalina Caputo 1, Benjamin Piwowarski 2, Mounia Lalmas 3 1 University of Bari, Italy. acaputo@di.uniba.it 2 University of Glasgow, UK. benjamin@bpiwowar.net

More information

Lower-Bounding Term Frequency Normalization

Lower-Bounding Term Frequency Normalization Lower-Bounding Term Frequency Normalization Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 68 ylv@uiuc.edu ChengXiang Zhai Department of Computer Science

More information

Fast LSI-based techniques for query expansion in text retrieval systems

Fast LSI-based techniques for query expansion in text retrieval systems Fast LSI-based techniques for query expansion in text retrieval systems L. Laura U. Nanni F. Sarracco Department of Computer and System Science University of Rome La Sapienza 2nd Workshop on Text-based

More information

Faceted Models of Blog Feeds

Faceted Models of Blog Feeds Faceted Models of Blog Feeds Lifeng Jia Department of Computer Science University of Illinois at Chicago 851 S. Morgan Street Chicago, IL, USA, 60607 ljia2@uic.edu Clement Yu Department of Computer Science

More information

AN INTRODUCTION TO TOPIC MODELS

AN INTRODUCTION TO TOPIC MODELS AN INTRODUCTION TO TOPIC MODELS Michael Paul December 4, 2013 600.465 Natural Language Processing Johns Hopkins University Prof. Jason Eisner Making sense of text Suppose you want to learn something about

More information