Ranking-II. Temporal Representation and Retrieval Models. Temporal Information Retrieval


Ranking in Information Retrieval
- Ranking documents is important for coping with information overload: quickly finding the documents that are relevant to the query.
- Notion of relevance: in IR, documents are ranked according to how relevant they are to the issued query. The higher the rank, the more relevant the document.
- Given a query, produce an ordered sequence (a ranking) of relevant documents based on this notion of relevance, e.g., q -> 1) d1, 2) d5, 3) d3, 4) d2, ...
- Interpretations and modelling of relevance:
  - Geometric interpretation: vector space similarity
  - Probabilistic interpretation: probabilistic IR, statistical language models

Vector Space Similarity
- Given a document collection D = {d1, d2, ...} and a query q.
- Represent a document d as a vector whose dimensions are the vocabulary V of the collection (all distinct words in the collection).
- The weights of the dimensions can be boolean (term presence), integral (tf), or scalar (tf-idf or BM25).
- Represent the query as a vector over the same vocabulary space.
- The relevance of document d given q is the cosine similarity (normalised dot product) between these two vectors.

Vector Space Similarity (example)
Query and document vectors over the vocabulary (aardvark, apple, ..., zzz), weighted by tf in the query and tf in the document, respectively:
q = (0, 1, ..., 0), d1 = (2, 0, ..., 0), d2 = (0, 4, ..., 1)

sim(d, q) = \frac{\sum_{i \in V} w_d(i) \, w_q(i)}{\sqrt{\sum_{i \in V} w_d(i)^2} \; \sqrt{\sum_{i \in V} w_q(i)^2}}
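The cosine computation above can be sketched as follows; this is a minimal illustration using raw tf weights, with the function name and the Counter-based vector representation chosen for the example rather than taken from the slides:

```python
import math
from collections import Counter

def cosine_sim(doc_tokens, query_tokens):
    """Cosine similarity between raw term-frequency vectors."""
    wd, wq = Counter(doc_tokens), Counter(query_tokens)
    # Dot product: only terms appearing in the query can contribute.
    dot = sum(wd[t] * wq[t] for t in wq)
    norm_d = math.sqrt(sum(v * v for v in wd.values()))
    norm_q = math.sqrt(sum(v * v for v in wq.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)
```

Swapping the Counter values for tf-idf or BM25 weights changes the weighting scheme without touching the similarity computation.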

Term Weighting
- Boolean retrieval: 0/1 weights encoding presence or absence of a term.
- Weighted retrieval: tf-idf score (idf captures the discriminative nature of a term).
- Okapi BM25: probabilistic notion, with document-length normalisation:

score(d, q) = \sum_{t \in q} \log\frac{N}{df_t} \cdot \frac{(k_1 + 1) \, tf_{td}}{k_1 \left( (1 - b) + b \, L_d / L_{ave} \right) + tf_{td}}

where \log(N / df_t) is the idf component, tf_{td} the term frequency of t in d, and k_1((1-b) + b \, L_d / L_{ave}) the length normalisation.

- These weights plug into the cosine similarity:

sim(d, q) = \frac{\sum_{i \in V} w_d(i) \, w_q(i)}{\sqrt{\sum_{i \in V} w_d(i)^2} \; \sqrt{\sum_{i \in V} w_q(i)^2}}
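The BM25 formula above can be sketched directly; the defaults k1 = 1.2 and b = 0.75 are common choices, not values given in the slides, and the dictionary-based interface is illustrative:

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, N, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query.

    doc_tf: term -> frequency in the document
    df:     term -> document frequency in the collection of N documents
    """
    score = 0.0
    for t in query_terms:
        if t not in doc_tf or t not in df:
            continue  # term absent from document or collection: no contribution
        idf = math.log(N / df[t])
        tf = doc_tf[t]
        # Length normalisation: long documents are penalised relative to average.
        norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
        score += idf * ((k1 + 1) * tf) / (norm + tf)
    return score
```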

Statistical Language Models for Ranking
- Notion of relevance: how likely is it that document D generated query Q?

P(D|Q) \propto P(Q|D) \cdot P(D)

- Jelinek-Mercer (linear interpolation) smoothing, mixing the document contribution with the corpus contribution:

P(Q|D) = \prod_{w_i \in Q} \left[ \lambda \, P(w_i|D) + (1 - \lambda) \, P(w_i|C) \right]

- Dirichlet smoothing:

P(Q|D) = \prod_{w_i \in Q} \frac{tf(w_i; D) + \mu \, P(w_i|C)}{|D| + \mu}

- How do we improve retrieval models by adding temporal information?
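Both smoothing variants can be sketched in log space (summing log-probabilities avoids numerical underflow on long queries); the function names and default parameter values are illustrative assumptions:

```python
import math

def lm_jm_score(query_terms, doc_tf, doc_len, corpus_prob, lam=0.5):
    """log P(Q|D) with Jelinek-Mercer (linear interpolation) smoothing.

    corpus_prob: term -> P(w|C), the corpus (background) language model.
    """
    logp = 0.0
    for w in query_terms:
        p_doc = doc_tf.get(w, 0) / doc_len          # document contribution
        p = lam * p_doc + (1 - lam) * corpus_prob[w]  # mix with corpus contribution
        logp += math.log(p)
    return logp

def lm_dirichlet_score(query_terms, doc_tf, doc_len, corpus_prob, mu=2000):
    """log P(Q|D) with Dirichlet smoothing (pseudo-count mu of corpus mass)."""
    logp = 0.0
    for w in query_terms:
        logp += math.log((doc_tf.get(w, 0) + mu * corpus_prob[w]) / (doc_len + mu))
    return logp
```

Because the corpus term keeps every factor strictly positive, a document is never assigned zero probability for a query term it does not contain.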

Temporal Representation
- Valid time vs. transaction time:
  - Valid time: the period of time during which the events occur in the real world.
  - Transaction time: the time when the fact was stored (say, in a DB or corpus).
- Focus time: the time mentioned or implicitly referred to in the content; multiple focus times per document are possible.
- Reading time: in web search, the reading time is assumed to be the same as the time the search query was issued.

Temporal Expressions
Temporal expressions are natural-language text with a temporal meaning:
- Explicit temporal expressions denote a precise moment in time and can be anchored on a timeline without further knowledge, e.g., "december 2014".
- Implicit temporal expressions are associated with events carrying an implicit temporal nature, e.g., "christmas day". We need to look at text proximity or explicit features for disambiguation.
- Relative temporal expressions are implicit mentions dependent on a reference timepoint (the publication date) or a previous explicit mention, e.g., "yesterday", "next monday", "in 2 hours", "in the 1980s".


Recency-Aware Ranking
- The freshness of documents can be utilised to improve ranking. How do we include freshness?

P(D|Q) \propto P(Q|D) \cdot P(D)

- Querying time (hitting time): the time when the query was issued.
- Recently published documents are more relevant than older documents.
- Exponential decay to penalise old documents, with freshness parameter \lambda and publication time t_D:

P(D) = e^{-\lambda (t_{now} - t_D)}
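The exponential-decay prior can be sketched as follows; the time unit (days) and the default decay rate are assumptions for illustration:

```python
import math

def recency_prior(t_now, t_pub, decay=0.01):
    """Exponential-decay document prior P(D) proportional to e^{-lambda (t_now - t_D)}.

    t_now, t_pub: timestamps in days; decay: the freshness parameter lambda.
    """
    return math.exp(-decay * (t_now - t_pub))
```

A document published today gets prior 1.0, and the prior shrinks exponentially with age; multiplying it into P(Q|D) demotes stale documents without reordering equally fresh ones.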

Recency-Aware Ranking: Timeliness
- The prior P(D) = e^{-\lambda (t_{now} - t_D)} applies equal decay to all queries.
- But some queries are more timely than others: "hairstyle fashion trends", "olympics", "house of cards".
- Estimate different decay parameters for different queries, typically from user assessments (next lecture):

P(D) = e^{-\lambda_Q (t_{now} - t_D)}

- Refer to: Estimation Methods for Ranking Recent Information, Miles Efron and Gene Golovchinsky, SIGIR '11.

Exploiting Temporal References for Ranking
- Temporal information from the text is a good indicator of the focus time of a document:
  - FIFA World Cup tournaments of the 1990s
  - Movies that won an Academy Award in 2007
  - Crusades of the 12th century
  - London Summer Olympics 2012
- Four-tuple representation: T = (tb_l, tb_u, te_l, te_u), the lower and upper bounds for the begin and end times.
- "in 1998", e.g., is represented as (1998/01/01, 1998/12/31, 1998/01/01, 1998/12/31).

Exploiting Temporal References for Ranking
- Distinguish between the temporal and the textual part of the document, generated independently:

P(Q|D) = P(Q_{text}|D_{text}) \cdot P(Q_{time}|D_{time})

- Independent generation of the temporal query expressions:

P(Q_{time}|D_{time}) = \prod_{q \in Q_{time}} P(q|D_{time})

- Two-step generation: draw a temporal expression T from the document at random, then generate q from T:

P(q|D_{time}) = \frac{1}{|D_{time}|} \sum_{T \in D_{time}} P(q|T)

Matching Temporal Expressions
How do we match two temporal expressions?
- Exact match: the intervals should match exactly (after normalisation), so P(q|T) = 1 or 0.
  query: "football in the 80s"; temporal expression: "within 1980 to 1990".
  This might miss almost-similar expressions.
- Partial match: credit proportional to the overlap between the query interval q and the temporal expression T:
  query: "football in the 80s"; temporal expression: "in 1982 to 1989".

P(q|T) = \frac{|T \cap q|}{|T| \cdot |q|}

plugged into P(q|D_{time}) = \frac{1}{|D_{time}|} \sum_{T \in D_{time}} P(q|T).
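The partial-match score can be sketched over integer-valued intervals (here, years; in practice days via the four-tuple bounds); the helper names are illustrative:

```python
def overlap_len(a, b):
    """Length of the overlap of two inclusive (begin, end) integer intervals."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0, hi - lo + 1)

def partial_match(query_iv, expr_iv):
    """P(q|T) = |T intersect q| / (|T| * |q|): overlap normalised by both lengths."""
    inter = overlap_len(query_iv, expr_iv)
    len_q = query_iv[1] - query_iv[0] + 1
    len_t = expr_iv[1] - expr_iv[0] + 1
    return inter / (len_t * len_q)
```

For the slide's example, query (1980, 1990) and expression (1982, 1989) overlap in 8 of 11 query years, so the partial match is non-zero where an exact match would score 0.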

Summary
- Rank documents using P(D|Q) \propto P(Q|D) \cdot P(D) as scores.
- Exploiting freshness from publication times: P(D) = e^{-\lambda_Q (t_{now} - t_D)}
- Exploiting temporal expressions:
  P(Q|D) = P(Q_{text}|D_{text}) \cdot P(Q_{time}|D_{time})
  P(Q_{time}|D_{time}) = \prod_{q \in Q_{time}} P(q|D_{time})
  P(q|D_{time}) = \frac{1}{|D_{time}|} \sum_{T \in D_{time}} P(q|T)
- Index scores in a temporal index for faster retrieval.

References and Further Readings
- Information Retrieval (http://www.ir.uwaterloo.ca/book/), Stefan Büttcher (Google Inc.), Charles L. A. Clarke (Univ. of Waterloo), Gordon V. Cormack (Univ. of Waterloo).
- Foundations of Information Retrieval: Manning, Schütze, Raghavan.
- Estimation Methods for Ranking Recent Information, Miles Efron and Gene Golovchinsky, SIGIR '11.
- Temporal Search in Web Archives, Klaus Berberich, 2010.