Concept Tracking for Microblog Search
|
|
- Phyllis Lewis
- 5 years ago
- Views:
Transcription
1 1,a) 1,b) 1,c) , Tweets2011 Twitter Concept Tracking for Microblog Search Taiki Miyanishi 1,a) Kazuhiro Seki 1,b) Kuniaki Uehara 1,c) Received: December 22, 2013, Accepted: April 7, 2014 Abstract: Incorporating the temporal property of words into query expansion methods based on relevance feedback has been shown to have a significant positive effect on microblog searching. In this paper, we propose a concept-based query expansion method based on a temporal relevance model that uses the temporal variation of concepts (e.g., terms or phrases) on microblogs. Our model naturally extends an extremely effective existing concept-based relevance model by tracking the concept frequency over time. Moreover, the proposed model produces important concepts that are frequently used within a particular time period associated with a given topic, which have more power to discriminate between relevant and non-relevant microblog documents than words. Our experiments using a corpus of microblog data (the Tweets2011 corpus) show that the proposed concept-based query expansion method improves search performance significantly, especially when retrieving highly relevant documents. Keywords: Twitter, microblog search, pseudo-relevance feedback, query expansion 1. 1 Graduate School of System Informatics, Kobe University, Kobe, Hyogo , Japan a) miyanishi@ai.cs.kobe-u.ac.jp b) seki@cs.kobe-u.ac.jp c) uehara@kobe-u.ac.jp [10], [11], [12], [20], [21], [30] [8], [23], [24], [27], [28] WebDB 2013 c 2014 Information Processing Society of Japan 1
2 [27] [1], [2], [3], [4], [5], [16], [18], [25], [26] Web Latent Concept Expansion; LCE [26] wrm [17] ctrm TREC White House spokesman replaced MB044 1 Joseph R. Diden Jr. Jay Carney wrm ctrm Lexical ctrm Temporal 1 wrm jay carney ctrm Lexical news spokesman press secretary jay carney 1 Table 1 MB044: White House spokesman replaced Example of expanded words and concepts about a topic MB044: White House spokesman replaced suggested by a word-based PRF (wrm) and a concept-based temporal one (ctrm). wrm ctrm (Lexical) ctrm (Temporal) jay jay carney carney carney jay qantas qantas press secretary new new spokesman jay carney obama new biden spokesman ctrm ctrm Temporal ctrm Lexical ctrm LCE TREC ,600 tweet Twitter Tweets2011 * [35] Li [19] *1 c 2014 Information Processing Society of Japan 2
3 Efron [10] Dakka [9] Keikha [14] Lin [21] Miyanishi [27] Jones [13] 2 Efron [12] Miyanishi [28] tweet 1 Twitter 2.2 Markov Random Field; MRF MRF Web Metzler [25] MRF MRF [26] LCE Bendersky [3] Wikipedia Bendersky [4], [5] Bendersky [2] TREC Web Wikipedia 3. Ponte [31] 3.1 Q D P (D Q) P (D Q) P (Q D) P (D) P (D Q) P (Q D)P (D) P (D) D n q 1,q 2,...,q n Q P (Q D) =P (q 1,q 2,...,q n D) n P (Q D) = P (q i D) (1) i=1 n q i c 2014 Information Processing Society of Japan 3
4 Q i P (q D) f(w;d) P ml (w D) = w V f(w ;D) f(w; D) D w w V f(w ; D) D V (1) 1 P (Q D) 0 [36] P (w D) = D D + μ P μ ml(w D)+ D + μ P (w C) P (w C) C w μ 3.2 Lavrenko [17] P (w R) w w P (w R) P (w Q) = D R P (w, D Q) = 1 P (Q) P (w, Q D)P (D) D R D R P (D)P (w D) n P (q i D), (2) R Q M P (Q) P (w Q) D w q i (2) P (w Q) w k 3.3 Metzler [26] LCE LCE 1 LCE i [25] LCE Bendersky [4] LCE LCE R M c S LCE (c, Q) D R exp{γ 1 φ 1 (Q, D)+γ 2 φ 2 (c, D) γ 3 φ 3 (c, C)}(3) φ 1 (Q, D) Q D c φ 2 (c, D) c D φ 3 (c, C) C c γ 1 γ 2 γ 3 (3) γ 1 = γ 2 = γ 3 =1 φ 1 (Q, D) = log P (Q D) φ 2 (c, D) =logp (c D) φ 3 (c, C) =0 *2 D Q ˆq 1, ˆq 2,...,ˆq m c bag-of-concepts c Q S crm (c, Q) S crm (c, Q) D R P (D)P (c D) m P (ˆq j D) (4) ˆq j Q j (4) (3) (4) (2) Lavrenko [17] (2) (4) 3.4 [8], [23], [24], [27], [28] *2 [22] φ 3 (c, C) =0 j c 2014 Information Processing Society of Japan 4
5 P (c R) 2 c R l R t P (c Q) P (c Q) = P (c, D l,d t Q) D l R l D t R t = P (D l c, D t,q)p(c, D t Q) (5) D t R t D l R l D l R l D t R t Efron [10] D t D l (5) D t P (Q) P (c Q) P (c Q) = P (D l c, Q) P (c, D t Q) D l R l D t R t 1 = P (c, D l Q) P (c, D t Q) P (c Q) D l R t D t R t 1 P (D l )P (c, Q D l ) P (c Q) D l R t P (D t )P (c, Q D t ) D t R t bag-of-concepts D l D t Q ˆq 1, ˆq 2,...,ˆq m c 1 m P (c Q) P (D l )P (c D l ) P (ˆq j D l ) P (c Q) D l R l j m P (D t )P (c D t ) P (ˆq j D t ) D t R t j P (c Q) c Q S ct RM (c, Q) c { m S ct RM (c, Q) rank = P (D l )P (c D l ) P (ˆq j D l ) D l R l j }{{} Lexical m } 1/2 P (D t )P (c D t ) P (ˆq j D t ), D t R t j }{{} Temporal (6) P (c D t ) t c P (ˆq j D t ) t ˆq j (6) P (c D t ) m j P (ˆq j D t ) c ˆq 1, ˆq 2,...,ˆq m (6) R t c P (ˆq j D t ) P (c D t ) D l R l P (D l )P (c D l )P (Q D l ) (4) LCE (6) (1) P (c D t ) 0 P (c D t ) P (c D t )= D t ˆPml (c D t )+ P (c C) (7) D t + μ t D t + μ t D t D t f(c;d t) P ml (c D t )= c Vc f(c ;D V t) c f(c; D t ) t c μ t S ct RM (c, Q) k TREC Tweets2011 tweet Tweets ,600 tweet <num> <title> <querytime> TREC <title> TREC tweet tweet <num> MB001 <title> BBC World Service staff cuts <querytime> Tue Feb 08 12:30: TREC 2011 Fig. 1 Example topic from the TREC 2011 microblog track. μ t c 2014 Information Processing Society of Japan 5
6 2 Table 2 TREC Summary of TREC collections and topics used for evaluation. Name Type #Topics Topic Numbers TREC 2011 allrel highrel 33 1, 10-30, 32, 36-38, 40-42, 44-46, 49 TREC 2012 allrel , highrel 56 51, 52, 54-68, 70-75, , tweet allrel tweet highrel TREC 2011 TREC 2012 allrel highrel tweet tweet Indri [34] stopword Krovetz stemmer [15] stemming Tweet Tweet Indri [36] [12] μ = 2500 LM μ t allrel μ t = 150 highrel μ t = 350 LM tweet ldig *3 retweet RT tweet *4 TREC [29], [33] retweet [8] retweet retweet *3 *4 tweet tweet retweet retweet HTTP tweet 1, tweet sequential-dependence [25] w 1 w 2 #1(w 1,w 2 ) w 1 w 2 8 #8(w 1,w 2 ) (7) P (c C) df (c)/n P (c C) df (c) c N df (c)/n P (c C) μ t (4) ctrm 2 wtrm wtrm (4) 1 wtrm ctrm wtrm ctrm wtrm ctrm wrm [17] wrm 2 2 c 2014 Information Processing Society of Japan 6
7 3 Lexical Temporal Concept Table 3 Summary of the evaluated retrieval methods. Method Lexical Temporal Concept wrm crm wtrm ctrm crm LCE (3) LCE Bendersky [4] LCE crm ctrm ctrm crm R t 3 wrm crm wtrm ctrm 3 crm ctrm wrm wtrm wtrm ctrm wrm crm wrm crm wtrm ctrm tweet Tweets2011 tweet wrm crm wtrm ctrm # tweet P (D l ) P (D t ) S crm (c, Q) S ct RM (c, Q) k S crm (c, Q) S ct RM (c, Q) k 2 #weight( λ 1 #combine(bbc world service staff cuts) λ 2 #weight( s 1 #1(service outlines) s 2 #uw8(bbc outlines) s 3 outlines... s k #1(weds bbcworldservice))) TREC 2011 MB001: BBC world service staff cuts Fig. 2 Example of query reformulation of topic MB001: BBC world service staff cuts from TREC 2011 Microblog track queries. 4 Table 4 Parameters for IR models. allrel highrel Method Year M N K M N K wtrm ctrm wtrm ctrm Indri [34] 2 i ScT RM (ci,q) s i = k j ScT RM (cj,q) 1:1 λ 1 = λ 2 = M wtrm ctrm N k M N k allrel highrel 1,000 Average Precision AP AP Precision [6]TREC ,000 AP TREC 2011 TREC 2012 TREC wtrm ctrm M N k TREC c 2014 Information Processing Society of Japan 7
8 Table 5 5 The performance comparison of the word-based PRFs. allrel highrel Method AP P@30 bpref AP P@30 bpref LM wrm α α α α α wtrm α β α α β α α α Table 6 6 The performance comparison of the concept-based PRFs. allrel highrel Method AP P@30 bpref AP P@30 bpref LM crm α α α α α α ctrm α α α β α β α α 4.3 Average Precision AP 30 Precision P@30 binary preference bpref [7] P@30 TREC 2011 TREC 2012 P@30 AP bpref AP Precision TREC 2012 highrel allrel highrel t [32] 100,000 p< wtrm ctrm wtrm ctrm wrm crm LM [17] wrm wtrm AP P@30 bpref LM wrm wtrm α β γ 5 wrm wtrm allrel highrel LM allrel highrel wtrm wrm allrel AP bpref [4] crm Table 7 7 The performance comparison of the proposed wordand concept-based temporal PRF methods. allrel highrel Method AP P@30 bpref AP P@30 bpref wtrm β ctrm ctrm AP P@30 bpref LM crm ctrm α β γ 6 crm ctrm allrel highrel LM ctrm crm allrel bref highrel AP wtrm ctrm AP P@30 bpref wtrm ctrm α β 7 allrel wtrm ctrm P@30 P@30 highrel ctrm wtrm tweet Gasland *5 wtrm oscar *5 Gasland MB c 2014 Information Processing Society of Japan 8
9 nomination ctrm oscar nomination nomination oscar nomination ctrm wtrm allrel highrel 5. 2 TREC [1] Bendersky, M. and Croft, W.B.: Discovering key concepts in verbose queries, SIGIR, pp (2008). [2] Bendersky, M. and Croft, W.B.: Modeling higher-order term dependencies in information retrieval using query hypergraphs, SIGIR, pp (2012). [3] Bendersky, M., Metzler, D. and Croft, W.B.: Learning concept importance using a weighted dependence model, WSDM, pp (2010). [4] Bendersky, M., Metzler, D. and Croft, W.B.: Parameterized concept weighting in verbose queries, SIGIR, pp (2011). [5] Bendersky, M., Metzler, D. and Croft, W.B.: Effective query formulation with multiple information sources, WSDM, pp (2012). [6] Buckley, C. and Voorhees, E.M.: Evaluating evaluation measure stability, SIGIR, pp (2000). [7] Buckley, C. and Voorhees, E.M.: Retrieval evaluation with incomplete information, SIGIR, pp (2004). [8] Choi, J. and Croft, W.B.: Temporal models for microblogs, CIKM, pp (2012). [9] Dakka, W., Gravano, L. and Ipeirotis, P.G.: Answering general time-sensitive queries, TKDE, Vol.24, No.2, pp (2012). [10] Efron, M. and Golovchinsky, G.: Estimation methods for ranking recent information, SIGIR, pp (2011). [11] Efron, M.: Query-specific recency ranking: Survival analysis for improved microblog retrieval, #TAIA (2012). [12] Efron, M., Organisciak, P. and Fenlon, K.: Improving retrieval of short texts through document expansion, SI- GIR, pp (2012). [13] Jones, R. and Diaz, F.: Temporal profiles of queries, TOIS, Vol.25, No.3 (2007). [14] Keikha, M., Gerani, S. and Crestani, F.: Time-based relevance models, SIGIR, pp (2011). [15] Krovetz, R.: Viewing morphology as an inference process, SIGIR, pp (1993). [16] Lang, H., Metzler, D., Wang, B. and Li, J.-T.: Improved latent concept expansion using hierarchical Markov random fields, CIKM, pp (2010). [17] Lavrenko, V. and Croft, W.B.: Relevance based language models, SIGIR, pp (2001). [18] Lease, M.: An improved Markov random field model for supporting verbose queries, SIGIR, pp (2009). [19] Li, X. and Croft, W.: Time-based language models, CIKM, pp (2003). [20] Liang, F., Qiang, R. and Yang, J.: Exploiting real-time information retrieval in the microblogosphere, JCDL, pp (2012). [21] Lin, J. and Efron, M.: Temporal relevance profiles for tweet search, #TAIA (2013). [22] Macdonald, C. and Ounis, I.: Global statistics in proximity weighting models, SIGIR Web N-gram Workshop (2010). [23] Massoudi, K., Tsagkias, M., de Rijke, M. and Weerkamp, W.: Incorporating query expansion and quality indicators in searching microblog posts, ECIR, pp (2011). [24] Metzler, D., Cai, C. and Hovy, E.: Structured event retrieval over microblog archives, HLT/NAACL, pp (2012). [25] Metzler, D. and Croft, W.B.: A Markov random field model for term dependencies, SIGIR, pp (2005). [26] Metzler, D. and Croft, W.B.: Latent concept expansion using Markov random fields, SIGIR, pp (2007). [27] Miyanishi, T., Seki, K. and Uehara, K.: Combining recency and topic-dependent temporal variation for microblog search, ECIR, pp (2013). [28] Miyanishi, T., Seki, K. and Uehara, K.: Improving pseudo-relevance feedback via tweet selection, CIKM, pp (2013). [29] Ounis, I., Macdonald, C., Lin, J. and Soboroff, I.: Overview of the TREC-2011 microblog track, TREC (2011). [30] Peetz, M.H., Meij, E., de Rijke, M. and Weerkamp, W.: Adaptive temporal query modeling, ECIR, pp (2012). [31] Ponte, J. and Croft, W.: A language modeling approach to information retrieval, SIGIR, pp (1998). [32] Smucker, M.D., Allan, J. and Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation, CIKM, pp (2007). [33] Soboroff, I., Ounis, I. and Lin, J.: Overview of the TREC-2012 microblog track, TREC (2012). [34] Strohman, T., Metzler, D., Turtle, H. and Croft, W.: Indri: A language model-based search engine for complex queries, ICIA, pp.2 6 (2005). [35] Teevan, J., Ramage, D. and Morris, M.: #Twitter- Search: A comparison of microblog search and web search, WSDM, pp (2011). [36] Zhai, C. and Lafferty, J.: A study of smoothing methc 2014 Information Processing Society of Japan 9
10 ods for language models applied to information retrieval, TOIS, Vol.22, No.2, pp (2004) Web Ph.D AAAI c 2014 Information Processing Society of Japan 10
Mining the Temporal Statistics of Query Terms for Searching Social Media Posts
Mining the Temporal Statistics of Query Terms for Searching Social Media Posts Jinfeng Rao, 1 Ferhan Ture, 2 Xing Niu, 1 and Jimmy Lin 3 1 Department of Computer Science, University of Maryland 2 Comcast
More informationUsing temporal bursts for query modeling
DOI 10.1007/s10791-013-9227-2 Using temporal bursts for query modeling Maria-Hendrike Peetz Edgar Meij Maarten de Rijke Received: 2 October 2012 / Accepted: 23 February 2013 Ó Springer Science+Business
More information5 10 12 32 48 5 10 12 32 48 4 8 16 32 64 128 4 8 16 32 64 128 2 3 5 16 2 3 5 16 5 10 12 32 48 4 8 16 32 64 128 2 3 5 16 docid score 5 10 12 32 48 O'Neal averaged 15.2 points 9.2 rebounds and 1.0 assists
More informationUtilizing Passage-Based Language Models for Document Retrieval
Utilizing Passage-Based Language Models for Document Retrieval Michael Bendersky 1 and Oren Kurland 2 1 Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts,
More informationA Neural Passage Model for Ad-hoc Document Retrieval
A Neural Passage Model for Ad-hoc Document Retrieval Qingyao Ai, Brendan O Connor, and W. Bruce Croft College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA,
More informationGeneralized Inverse Document Frequency
Generalized Inverse Document Frequency Donald Metzler metzler@yahoo-inc.com Yahoo! Research 2821 Mission College Blvd. Santa Clara, CA 95054 ABSTRACT Inverse document frequency (IDF) is one of the most
More informationLecture 13: More uses of Language Models
Lecture 13: More uses of Language Models William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 13 What we ll learn in this lecture Comparing documents, corpora using LM approaches
More informationarxiv: v1 [cs.ir] 1 May 2018
On the Equivalence of Generative and Discriminative Formulations of the Sequential Dependence Model arxiv:1805.00152v1 [cs.ir] 1 May 2018 Laura Dietz University of New Hampshire dietz@cs.unh.edu John Foley
More informationRanking-II. Temporal Representation and Retrieval Models. Temporal Information Retrieval
Ranking-II Temporal Representation and Retrieval Models Temporal Information Retrieval Ranking in Information Retrieval Ranking documents important for information overload, quickly finding documents which
More informationQuery Performance Prediction: Evaluation Contrasted with Effectiveness
Query Performance Prediction: Evaluation Contrasted with Effectiveness Claudia Hauff 1, Leif Azzopardi 2, Djoerd Hiemstra 1, and Franciska de Jong 1 1 University of Twente, Enschede, the Netherlands {c.hauff,
More informationWithin-Document Term-Based Index Pruning with Statistical Hypothesis Testing
Within-Document Term-Based Index Pruning with Statistical Hypothesis Testing Sree Lekha Thota and Ben Carterette Department of Computer and Information Sciences University of Delaware, Newark, DE, USA
More informationA Study of the Dirichlet Priors for Term Frequency Normalisation
A Study of the Dirichlet Priors for Term Frequency Normalisation ABSTRACT Ben He Department of Computing Science University of Glasgow Glasgow, United Kingdom ben@dcs.gla.ac.uk In Information Retrieval
More informationLiangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA
Rutgers, The State University of New Jersey Nov. 12, 2012 Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA Motivation Modeling Social Streams Future
More informationChengXiang ( Cheng ) Zhai Department of Computer Science University of Illinois at Urbana-Champaign
Axiomatic Analysis and Optimization of Information Retrieval Models ChengXiang ( Cheng ) Zhai Department of Computer Science University of Illinois at Urbana-Champaign http://www.cs.uiuc.edu/homes/czhai
More informationCS 646 (Fall 2016) Homework 3
CS 646 (Fall 2016) Homework 3 Deadline: 11:59pm, Oct 31st, 2016 (EST) Access the following resources before you start working on HW3: Download and uncompress the index file and other data from Moodle.
More informationLanguage Models, Smoothing, and IDF Weighting
Language Models, Smoothing, and IDF Weighting Najeeb Abdulmutalib, Norbert Fuhr University of Duisburg-Essen, Germany {najeeb fuhr}@is.inf.uni-due.de Abstract In this paper, we investigate the relationship
More informationTime-Aware Rank Aggregation for Microblog Search
Time-Aware Rank Aggregation for Microblog Search Shangsong Liang s.liang@uva.nl Edgar Meij emeij@yahoo-inc.com Zhaochun Ren z.ren@uva.nl Wouter Weerkamp wouter@904labs.com Maarten de Rijke derijke@uva.nl
More informationYahoo! Labs Nov. 1 st, Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University
Yahoo! Labs Nov. 1 st, 2012 Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Motivation Modeling Social Streams Future work Motivation Modeling Social Streams
More informationQuery-document Relevance Topic Models
Query-document Relevance Topic Models Meng-Sung Wu, Chia-Ping Chen and Hsin-Min Wang Industrial Technology Research Institute, Hsinchu, Taiwan National Sun Yat-Sen University, Kaohsiung, Taiwan Institute
More informationIs Document Frequency important for PRF?
Author manuscript, published in "ICTIR 2011 - International Conference on the Theory Information Retrieval, Bertinoro : Italy (2011)" Is Document Frequency important for PRF? Stéphane Clinchant 1,2 and
More informationThe Combination and Evaluation of Query Performance Prediction Methods
The Combination and Evaluation of Query Performance Prediction Methods Claudia Hauff 1, Leif Azzopardi 2, and Djoerd Hiemstra 1 1 University of Twente, The Netherlands {c.hauff,d.hiemstra}@utwente.nl 2
More informationUsing the Past To Score the Present: Extending Term Weighting Models Through Revision History Analysis
Using the Past To Score the Present: Extending Term Weighting Models Through Revision History Analysis Ablimit Aji Yu Wang Emory University {aaji,yuwang}@emoryedu Eugene Agichtein Emory University eugene@mathcsemoryedu
More informationINFO 630 / CS 674 Lecture Notes
INFO 630 / CS 674 Lecture Notes The Language Modeling Approach to Information Retrieval Lecturer: Lillian Lee Lecture 9: September 25, 2007 Scribes: Vladimir Barash, Stephen Purpura, Shaomei Wu Introduction
More informationLanguage Models. Web Search. LM Jelinek-Mercer Smoothing and LM Dirichlet Smoothing. Slides based on the books: 13
Language Models LM Jelinek-Mercer Smoothing and LM Dirichlet Smoothing Web Search Slides based on the books: 13 Overview Indexes Query Indexing Ranking Results Application Documents User Information analysis
More informationInformation Retrieval
Information Retrieval Online Evaluation Ilya Markov i.markov@uva.nl University of Amsterdam Ilya Markov i.markov@uva.nl Information Retrieval 1 Course overview Offline Data Acquisition Data Processing
More informationMarkov Precision: Modelling User Behaviour over Rank and Time
Markov Precision: Modelling User Behaviour over Rank and Time Marco Ferrante 1 Nicola Ferro 2 Maria Maistro 2 1 Dept. of Mathematics, University of Padua, Italy ferrante@math.unipd.it 2 Dept. of Information
More informationFeatured tweet search: Modeling time and social influence for microblog retrieval
Featured tweet search: Modeling time and social influence for microblog retrieval Lamjed Ben Jabeur, Lynda Tamine and Mohand Boughanem IRIT, Paul Sabatier University 118 Route de Narbonne F-3162 TOULOUSE
More informationPredicting Neighbor Goodness in Collaborative Filtering
Predicting Neighbor Goodness in Collaborative Filtering Alejandro Bellogín and Pablo Castells {alejandro.bellogin, pablo.castells}@uam.es Universidad Autónoma de Madrid Escuela Politécnica Superior Introduction:
More informationDynamic Clustering of Streaming Short Documents
Dynamic Clustering of Streaming Short Documents Shangsong Liang shangsong.liang@ucl.ac.uk Emine Yilmaz emine.yilmaz@ucl.ac.uk University College London, London, United Kingdom University of Amsterdam,
More informationFielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data Nikita Zhiltsov 1,2 Alexander Kotov 3 Fedor Nikolaev 3 1 Kazan Federal University 2 Textocat 3 Textual Data Analytics
More informationBoolean and Vector Space Retrieval Models
Boolean and Vector Space Retrieval Models Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE) who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) 1
More informationRETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic
More informationChapter 10: Information Retrieval. See corresponding chapter in Manning&Schütze
Chapter 10: Information Retrieval See corresponding chapter in Manning&Schütze Evaluation Metrics in IR 2 Goal In IR there is a much larger variety of possible metrics For different tasks, different metrics
More informationInjecting User Models and Time into Precision via Markov Chains
Injecting User Models and Time into Precision via Markov Chains Marco Ferrante Dept. Mathematics University of Padua, Italy ferrante@math.unipd.it Nicola Ferro Dept. Information Engineering University
More informationBoolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK).
Boolean and Vector Space Retrieval Models 2013 CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). 1 Table of Content Boolean model Statistical vector space model Retrieval
More informationCS630 Representing and Accessing Digital Information Lecture 6: Feb 14, 2006
Scribes: Gilly Leshed, N. Sadat Shami Outline. Review. Mixture of Poissons ( Poisson) model 3. BM5/Okapi method 4. Relevance feedback. Review In discussing probabilistic models for information retrieval
More informationSemantic Term Matching in Axiomatic Approaches to Information Retrieval
Semantic Term Matching in Axiomatic Approaches to Information Retrieval Hui Fang Department of Computer Science University of Illinois at Urbana-Champaign ChengXiang Zhai Department of Computer Science
More informationMotivation. User. Retrieval Model Result: Query. Document Collection. Information Need. Information Retrieval / Chapter 3: Retrieval Models
3. Retrieval Models Motivation Information Need User Retrieval Model Result: Query 1. 2. 3. Document Collection 2 Agenda 3.1 Boolean Retrieval 3.2 Vector Space Model 3.3 Probabilistic IR 3.4 Statistical
More informationMeasuring the Variability in Effectiveness of a Retrieval System
Measuring the Variability in Effectiveness of a Retrieval System Mehdi Hosseini 1, Ingemar J. Cox 1, Natasa Millic-Frayling 2, and Vishwa Vinay 2 1 Computer Science Department, University College London
More informationTime Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter
Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter Daichi Koike Yusuke Takahashi Takehito Utsuro Grad. Sch. Sys. & Inf. Eng., University of Tsukuba, Tsukuba, 305-8573,
More informationThe Use of Dependency Relation Graph to Enhance the Term Weighting in Question Retrieval
The Use of Dependency Relation Graph to Enhance the Term Weighting in Question Retrieval Weinan Zhang 1 Zhao- yan M ing 2 Yu Zhang 1 Liqiang N ie 2 T ing Liu 1 Tat-Seng Chua 2 (1) School of Computer Science
More informationModeling Controversy Within Populations
Modeling Controversy Within Populations Myungha Jang, Shiri Dori-Hacohen and James Allan Center for Intelligent Information Retrieval (CIIR) University of Massachusetts Amherst {mhjang, shiri, allan}@cs.umass.edu
More informationComparing Relevance Feedback Techniques on German News Articles
B. Mitschang et al. (Hrsg.): BTW 2017 Workshopband, Lecture Notes in Informatics (LNI), Gesellschaft für Informatik, Bonn 2017 301 Comparing Relevance Feedback Techniques on German News Articles Julia
More informationIII.6 Advanced Query Types
III.6 Advanced Query Types 1. Query Expansion 2. Relevance Feedback 3. Novelty & Diversity Based on MRS Chapter 9, BY Chapter 5, [Carbonell and Goldstein 98] [Agrawal et al 09] 123 1. Query Expansion Query
More informationModeling the Score Distributions of Relevant and Non-relevant Documents
Modeling the Score Distributions of Relevant and Non-relevant Documents Evangelos Kanoulas, Virgil Pavlu, Keshi Dai, and Javed A. Aslam College of Computer and Information Science Northeastern University,
More informationReducing the Risk of Query Expansion via Robust Constrained Optimization
Reducing the Risk of Query Expansion via Constrained Optimization Kevyn Collins-Thompson Microsoft Research 1 Microsoft Way Redmond, WA 98-6399 U.S.A. kevynct@microsoft.com ABSTRACT We introduce a new
More informationChap 2: Classical models for information retrieval
Chap 2: Classical models for information retrieval Jean-Pierre Chevallet & Philippe Mulhem LIG-MRIM Sept 2016 Jean-Pierre Chevallet & Philippe Mulhem Models of IR 1 / 81 Outline Basic IR Models 1 Basic
More informationPositional Language Models for Information Retrieval
Positional Language Models for Information Retrieval Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer
More informationDiscovering Geographical Topics in Twitter
Discovering Geographical Topics in Twitter Liangjie Hong, Lehigh University Amr Ahmed, Yahoo! Research Alexander J. Smola, Yahoo! Research Siva Gurumurthy, Twitter Kostas Tsioutsiouliklis, Twitter Overview
More informationBlog Distillation via Sentiment-Sensitive Link Analysis
Blog Distillation via Sentiment-Sensitive Link Analysis Giacomo Berardi, Andrea Esuli, Fabrizio Sebastiani, and Fabrizio Silvestri Istituto di Scienza e Tecnologie dell Informazione, Consiglio Nazionale
More informationOptimal Mixture Models in IR
Optimal Mixture Models in IR Victor Lavrenko Center for Intelligent Information Retrieval Department of Comupter Science University of Massachusetts, Amherst, MA 01003, lavrenko@cs.umass.edu Abstract.
More informationLanguage Models. Hongning Wang
Language Models Hongning Wang CS@UVa Notion of Relevance Relevance (Rep(q), Rep(d)) Similarity P(r1 q,d) r {0,1} Probability of Relevance P(d q) or P(q d) Probabilistic inference Different rep & similarity
More informationA Probabilistic Model for Online Document Clustering with Application to Novelty Detection
A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 jian.zhang@cs.cmu.edu Zoubin
More informationTowards Collaborative Information Retrieval
Towards Collaborative Information Retrieval Markus Junker, Armin Hust, and Stefan Klink German Research Center for Artificial Intelligence (DFKI GmbH), P.O. Box 28, 6768 Kaiserslautern, Germany {markus.junker,
More informationOn the Foundations of Diverse Information Retrieval. Scott Sanner, Kar Wai Lim, Shengbo Guo, Thore Graepel, Sarvnaz Karimi, Sadegh Kharazmi
On the Foundations of Diverse Information Retrieval Scott Sanner, Kar Wai Lim, Shengbo Guo, Thore Graepel, Sarvnaz Karimi, Sadegh Kharazmi 1 Outline Need for diversity The answer: MMR But what was the
More informationRanked Retrieval (2)
Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF
More informationResearch Methodology in Studies of Assessor Effort for Information Retrieval Evaluation
Research Methodology in Studies of Assessor Effort for Information Retrieval Evaluation Ben Carterette & James Allan Center for Intelligent Information Retrieval Computer Science Department 140 Governors
More informationAn Efficient Framework for Constructing Generalized Locally-Induced Text Metrics
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence An Efficient Framework for Constructing Generalized Locally-Induced Text Metrics Saeed Amizadeh Intelligent Systems
More informationCorrelation, Prediction and Ranking of Evaluation Metrics in Information Retrieval
Correlation, Prediction and Ranking of Evaluation Metrics in Information Retrieval Soumyajit Gupta 1, Mucahid Kutlu 2, Vivek Khetan 1, and Matthew Lease 1 1 University of Texas at Austin, USA 2 TOBB University
More informationLanguage Models and Smoothing Methods for Collections with Large Variation in Document Length. 2 Models
Language Models and Smoothing Methods for Collections with Large Variation in Document Length Najeeb Abdulmutalib and Norbert Fuhr najeeb@is.inf.uni-due.de, norbert.fuhr@uni-due.de Information Systems,
More informationTopic Models and Applications to Short Documents
Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text
More informationA Framework of Detecting Burst Events from Micro-blogging Streams
, pp.379-384 http://dx.doi.org/10.14257/astl.2013.29.78 A Framework of Detecting Burst Events from Micro-blogging Streams Kaifang Yang, Yongbo Yu, Lizhou Zheng, Peiquan Jin School of Computer Science and
More informationThe Axiometrics Project
The Axiometrics Project Eddy Maddalena and Stefano Mizzaro Department of Mathematics and Computer Science University of Udine Udine, Italy eddy.maddalena@uniud.it, mizzaro@uniud.it Abstract. The evaluation
More informationLatent Semantic Tensor Indexing for Community-based Question Answering
Latent Semantic Tensor Indexing for Community-based Question Answering Xipeng Qiu, Le Tian, Xuanjing Huang Fudan University, 825 Zhangheng Road, Shanghai, China xpqiu@fudan.edu.cn, tianlefdu@gmail.com,
More informationThe effect on accuracy of tweet sample size for hashtag segmentation dictionary construction
The effect on accuracy of tweet sample size for hashtag segmentation dictionary construction Laurence A. F. Park and Glenn Stone School of Computing, Engineering and Mathematics Western Sydney University,
More informationA Study of Smoothing Methods for Language Models Applied to Information Retrieval
A Study of Smoothing Methods for Language Models Applied to Information Retrieval CHENGXIANG ZHAI and JOHN LAFFERTY Carnegie Mellon University Language modeling approaches to information retrieval are
More informationBayesian Query-Focused Summarization
Bayesian Query-Focused Summarization Hal Daumé III and Daniel Marcu Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina del Rey, CA 90292 me@hal3.name,marcu@isi.edu Abstract We present
More informationAdvances in IR Evalua0on. Ben Cartere6e Evangelos Kanoulas Emine Yilmaz
Advances in IR Evalua0on Ben Cartere6e Evangelos Kanoulas Emine Yilmaz Course Outline Intro to evalua0on Evalua0on methods, test collec0ons, measures, comparable evalua0on Low cost evalua0on Advanced user
More informationA Risk Minimization Framework for Information Retrieval
A Risk Minimization Framework for Information Retrieval ChengXiang Zhai a John Lafferty b a Department of Computer Science University of Illinois at Urbana-Champaign b School of Computer Science Carnegie
More informationInformation Retrieval CS Lecture 03. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Information Retrieval CS 6900 Lecture 03 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Statistical Properties of Text Zipf s Law models the distribution of terms
More informationAdvanced Topics in Information Retrieval 5. Diversity & Novelty
Advanced Topics in Information Retrieval 5. Diversity & Novelty Vinay Setty (vsetty@mpi-inf.mpg.de) Jannik Strötgen (jtroetge@mpi-inf.mpg.de) 1 Outline 5.1. Why Novelty & Diversity? 5.2. Probability Ranking
More informationMedical Question Answering for Clinical Decision Support
Medical Question Answering for Clinical Decision Support Travis R. Goodwin and Sanda M. Harabagiu The University of Texas at Dallas Human Language Technology Research Institute http://www.hlt.utdallas.edu
More informationApplying the Divergence From Randomness Approach for Content-Only Search in XML Documents
Applying the Divergence From Randomness Approach for Content-Only Search in XML Documents Mohammad Abolhassani and Norbert Fuhr Institute of Informatics and Interactive Systems, University of Duisburg-Essen,
More informationFall CS646: Information Retrieval. Lecture 6 Boolean Search and Vector Space Model. Jiepu Jiang University of Massachusetts Amherst 2016/09/26
Fall 2016 CS646: Information Retrieval Lecture 6 Boolean Search and Vector Space Model Jiepu Jiang University of Massachusetts Amherst 2016/09/26 Outline Today Boolean Retrieval Vector Space Model Latent
More informationBEWARE OF RELATIVELY LARGE BUT MEANINGLESS IMPROVEMENTS
TECHNICAL REPORT YL-2011-001 BEWARE OF RELATIVELY LARGE BUT MEANINGLESS IMPROVEMENTS Roi Blanco, Hugo Zaragoza Yahoo! Research Diagonal 177, Barcelona, Spain {roi,hugoz}@yahoo-inc.com Bangalore Barcelona
More informationIPSJ SIG Technical Report Vol.2014-MPS-100 No /9/25 1,a) 1 1 SNS / / / / / / Time Series Topic Model Considering Dependence to Multiple Topics S
1,a) 1 1 SNS /// / // Time Series Topic Model Considering Dependence to Multiple Topics Sasaki Kentaro 1,a) Yoshikawa Tomohiro 1 Furuhashi Takeshi 1 Abstract: This pater proposes a topic model that considers
More informationA General Linear Mixed Models Approach to Study System Component Effects
A General Linear Mixed Models Approach to Study System Component Effects ABSTRACT Nicola Ferro Department of Information Engineering University of Padua Padua, Italy nicolaferro@unipdit Topic variance
More informationUSING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING
POZNAN UNIVE RSIY OF E CHNOLOGY ACADE MIC JOURNALS No. 80 Electrical Engineering 2014 Hussam D. ABDULLA* Abdella S. ABDELRAHMAN* Vaclav SNASEL* USING SINGULAR VALUE DECOMPOSIION (SVD) AS A SOLUION FOR
More informationLatent Semantic Analysis. Hongning Wang
Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element
More informationRanking Sequential Patterns with Respect to Significance
Ranking Sequential Patterns with Respect to Significance Robert Gwadera, Fabio Crestani Universita della Svizzera Italiana Lugano, Switzerland Abstract. We present a reliable universal method for ranking
More informationA Note on the Expectation-Maximization (EM) Algorithm
A Note on the Expectation-Maximization (EM) Algorithm ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign March 11, 2007 1 Introduction The Expectation-Maximization
More informationA Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank
A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank Shoaib Jameel Shoaib Jameel 1, Wai Lam 2, Steven Schockaert 1, and Lidong Bing 3 1 School of Computer Science and Informatics,
More informationCS276A Text Information Retrieval, Mining, and Exploitation. Lecture 4 15 Oct 2002
CS276A Text Information Retrieval, Mining, and Exploitation Lecture 4 15 Oct 2002 Recap of last time Index size Index construction techniques Dynamic indices Real world considerations 2 Back of the envelope
More informationHub, Authority and Relevance Scores in Multi-Relational Data for Query Search
Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Xutao Li 1 Michael Ng 2 Yunming Ye 1 1 Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology,
More informationAnalysis of Patents for Prior Art Candidate Search
Analysis of Patents for Prior Art Candidate Search Sébastien Déjean, Nicolas Faessel, Lucille Marty, Josiane Mothe, Samantha Sadala and Soda Thiam IMT, Toulouse, France Email: sebastien.dejean@math.univ-toulouse.fr
More informationINFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from
INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 26/26: Feature Selection and Exam Overview Paul Ginsparg Cornell University,
More informationEffectiveness of complex index terms in information retrieval
Effectiveness of complex index terms in information retrieval Tokunaga Takenobu, Ogibayasi Hironori and Tanaka Hozumi Department of Computer Science Tokyo Institute of Technology Abstract This paper explores
More informationLearning Topical Transition Probabilities in Click Through Data with Regression Models
Learning Topical Transition Probabilities in Click Through Data with Regression Xiao Zhang, Prasenjit Mitra Department of Computer Science and Engineering College of Information Sciences and Technology
More informationLearning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures
Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures Long Xia Jun Xu Yanyan Lan Jiafeng Guo Xueqi Cheng CAS Key Lab of Network Data Science and Technology, Institute
More informationIntegrating Logical Operators in Query Expansion in Vector Space Model
Integrating Logical Operators in Query Expansion in Vector Space Model Jian-Yun Nie, Fuman Jin DIRO, Université de Montréal C.P. 6128, succursale Centre-ville, Montreal Quebec, H3C 3J7 Canada {nie, jinf}@iro.umontreal.ca
More informationClick Models for Web Search
Click Models for Web Search Lecture 1 Aleksandr Chuklin, Ilya Markov Maarten de Rijke a.chuklin@uva.nl i.markov@uva.nl derijke@uva.nl University of Amsterdam Google Research Europe AC IM MdR Click Models
More informationECIR Tutorial 30 March Advanced Language Modeling Approaches (case study: expert search) 1/100.
Advanced Language Modeling Approaches (case study: expert search) http://www.cs.utwente.nl/~hiemstra 1/100 Overview 1. Introduction to information retrieval and three basic probabilistic approaches The
More informationLanguage Models. CS6200: Information Retrieval. Slides by: Jesse Anderton
Language Models CS6200: Information Retrieval Slides by: Jesse Anderton What s wrong with VSMs? Vector Space Models work reasonably well, but have a few problems: They are based on bag-of-words, so they
More information1 Information retrieval fundamentals
CS 630 Lecture 1: 01/26/2006 Lecturer: Lillian Lee Scribes: Asif-ul Haque, Benyah Shaparenko This lecture focuses on the following topics Information retrieval fundamentals Vector Space Model (VSM) Deriving
More informationPassage language models in ad hoc document retrieval
Passage language models in ad hoc document retrieval Research Thesis In Partial Fulfillment of The Requirements for the Degree of Master of Science in Information Management Engineering Michael Bendersky
More informationA Query Algebra for Quantum Information Retrieval
A Query Algebra for Quantum Information Retrieval Annalina Caputo 1, Benjamin Piwowarski 2, Mounia Lalmas 3 1 University of Bari, Italy. acaputo@di.uniba.it 2 University of Glasgow, UK. benjamin@bpiwowar.net
More informationLower-Bounding Term Frequency Normalization
Lower-Bounding Term Frequency Normalization Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 68 ylv@uiuc.edu ChengXiang Zhai Department of Computer Science
More informationFast LSI-based techniques for query expansion in text retrieval systems
Fast LSI-based techniques for query expansion in text retrieval systems L. Laura U. Nanni F. Sarracco Department of Computer and System Science University of Rome La Sapienza 2nd Workshop on Text-based
More informationFaceted Models of Blog Feeds
Faceted Models of Blog Feeds Lifeng Jia Department of Computer Science University of Illinois at Chicago 851 S. Morgan Street Chicago, IL, USA, 60607 ljia2@uic.edu Clement Yu Department of Computer Science
More informationAN INTRODUCTION TO TOPIC MODELS
AN INTRODUCTION TO TOPIC MODELS Michael Paul December 4, 2013 600.465 Natural Language Processing Johns Hopkins University Prof. Jason Eisner Making sense of text Suppose you want to learn something about
More information