Markov Precision: Modelling User Behaviour over Rank and Time

Marco Ferrante (1), Nicola Ferro (2), Maria Maistro (2)
(1) Dept. of Mathematics, University of Padua, Italy, ferrante@math.unipd.it
(2) Dept. of Information Engineering, University of Padua, Italy, ferro@dei.unipd.it, maistro@dei.unipd.it
IIR 2015, 6th Italian Information Retrieval Workshop, Cagliari, 25-26 May 2015

Outline
1. User Models in IR Evaluation Measures
2. A Markovian Approach to Evaluation Measures
3. Experimental Evaluation

User Models

Information Retrieval (IR) evaluation measures, implicitly or explicitly, embed a user model describing how the user interacts with the ranked result list. User models may be more or less artificial, and more or less correlated with actual user behaviour and preferences.

User Models: Precision

Precision: the user visits exactly the first n rank positions and then stops.

Prec[n] = (1/n) * Σ_{m=1}^{n} a_m

where a_m = 1 if the document at rank m is considered relevant, and a_m = 0 otherwise.
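As a quick sanity check, precision at cutoff n can be computed directly from the binary relevance judgements. A minimal sketch, not code from the paper (the function name is ours):

```python
def precision_at_n(rels, n):
    """Prec[n] = (1/n) * sum of the relevance indicators a_m for m = 1..n."""
    return sum(rels[:n]) / n

# a_m = 1 if the document at rank m is relevant, 0 otherwise.
rels = [1, 0, 1, 1, 0]
print(precision_at_n(rels, 3))  # 2 relevant documents in the top 3 -> 0.666...
```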

User Models: RBP

Rank-Biased Precision (RBP): the user starts from the top ranked document and, with probability p, called persistence, moves to the next document, or stops with probability 1 − p.

RBP = (1 − p) * Σ_{i∈R} p^{i−1}

where R is the set of the ranks of the retrieved relevant documents.
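The RBP formula translates directly into code. A minimal sketch (function name ours), assuming R is given as a list of relevant ranks:

```python
def rbp(rel_ranks, p):
    """RBP = (1 - p) * sum over relevant ranks i of p**(i - 1)."""
    return (1 - p) * sum(p ** (i - 1) for i in rel_ranks)

# Relevant documents at ranks 1 and 3, persistence p = 0.8:
# 0.2 * (0.8**0 + 0.8**2) = 0.2 * 1.64 = 0.328
print(rbp([1, 3], 0.8))
```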

User Models: NCP

Normalized Cumulative Precision (NCP) [1] is the expectation (average) of the precision at the ranks of the retrieved relevant documents, according to a distribution p_s(·):

NCP(p_s) = E_{p_s}[Prec(n)] = Σ_{n=1}^{+∞} p_s(d_n) Prec(n)

[1] S. E. Robertson, E. Kanoulas, E. Yilmaz: A New Interpretation of Average Precision. In SIGIR 2008, pp. 689-690. ACM Press, USA

User Models: AP

Average Precision (AP): the user stops his/her search at a given relevant document in the ranked list (satisfaction point), according to a probability law that is fixed and independent of the specific run he/she is considering (a uniform distribution).

AP = (1/RB) * Σ_{i∈R} Prec(i) = (r/RB) * (1/r) * Σ_{i∈R} Prec(i)

where RB is the Recall Base, i.e. the total number of relevant documents, R is the set of the ranks of the retrieved relevant documents, and r = |R| is the number of relevant retrieved documents.
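A worked sketch of the AP formula (our own illustration, not the authors' code); note that Prec(i) is evaluated only at the relevant retrieved ranks:

```python
def average_precision(rels, recall_base):
    """AP = (1/RB) * sum of Prec(i) over the relevant retrieved ranks i."""
    total, hits = 0.0, 0
    for i, a in enumerate(rels, start=1):
        if a:
            hits += 1
            total += hits / i   # Prec(i) = (# relevant in top i) / i
    return total / recall_base

# Relevant at ranks 1 and 3, recall base RB = 2: (1/1 + 2/3) / 2 = 0.8333...
print(average_precision([1, 0, 1], recall_base=2))
```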

Criticisms of AP

AP is the gold standard measure in IR, but its user model has some weaknesses:
- it is considered artificial: the uniform distribution that determines the stopping point in a given search is difficult to interpret, since it means that any relevant document in a ranked list of retrieved documents has the same probability of being the stopping point;
- the probability that a user stops his/her search at a given document depends on a probability distribution defined over the whole set of relevant documents, which assumes perfect knowledge of the relevance of each document in the collection.

Goals (I)
- More flexible modelling of user behaviour within an evaluation measure: not just one user model, even if parametric in some attribute
- A powerful and unifying framework for expressing alternative user models within evaluation measures
- The possibility of bridging towards actual user behaviour

Time

Time-Biased Gain (TBG):

TBG = Σ_{i∈R} g(i) * exp(−T(i) * ln 2 / h)

where g(i) is the gain value of each document, h is the half-life of the exponential decay function, and T(i) is the expected time spent to reach rank i.

While the time users spend in inspecting and interacting with the result list is a key dimension, it is often overlooked in traditional IR evaluation measures. TBG focuses on how much time the user spends on each document in order to accumulate gain over rank positions. TBG embeds a user model similar to that of RBP and is complementary to the standard IR measures.
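The TBG decay can be sketched as follows; this is our own illustration, with the gains, times, and the half-life h chosen arbitrarily rather than taken from the paper:

```python
import math

def tbg(gains, times, h):
    """TBG = sum_i g(i) * exp(-T(i) * ln 2 / h): gain decays with half-life h."""
    return sum(g * math.exp(-t * math.log(2) / h) for g, t in zip(gains, times))

# After exactly one half-life, a unit gain is worth ~0.5.
print(tbg([1.0], [100.0], h=100))
```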

Goals (II)
- Unify rank-based and time-based evaluation measures into a single coherent framework, avoiding the need to resort to pairs of measures to capture different angles
- The possibility of embedding alternative user models into a time-based evaluation measure

A Markovian Approach to Evaluation Measures

Markovian User Model

We assume that each user:
- starts from a chosen document at rank X_0 and considers this document for a random time T_0;
- then decides, according to a probability law independent of the random time spent on the first document, to move to another document in the list, at position X_1;
- considers this new document for a random time T_1;
- then moves, independently, to a third document, and so on.

After a random number of forward or backward movements along the ranked list, the user ends his/her search, and we evaluate the total utility provided by the system to him/her.


Mathematical Framework

Notation:
- T = {1, 2, ..., T} is the state space of document ranks;
- X_0, X_1, X_2, ... is the random sequence of document ranks visited by the user;
- T_0, T_1, T_2, ... are the random times spent visiting, respectively, the first document considered, the second one, and so on.

Assumption: the probability of passing from the document at rank i to the document at rank j depends only on the starting rank i and not on the whole sequence of previously visited documents:

P[X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, ..., X_0 = i_0] = P[X_{n+1} = j | X_n = i] = p_{i,j}

for any n ∈ N and i, j, i_0, ..., i_{n−1} ∈ T.

Structure of the Markov Chain

[Figure: a fully connected chain over the documents d_1, d_2, ..., d_T, with forward and backward transition probabilities p_{i,j} between every pair of ranks.]

Definition of the Markov Chain

Fixing a starting distribution λ, the random variables (X_n)_{n∈N} define a time-homogeneous discrete-time Markov chain with state space T, initial distribution λ, and transition matrix P = (p_{i,j})_{i,j∈T}.

Under the assumption that P is irreducible, i.e. the user can move in a finite number of steps from any document to any other document with positive probability, there exists a unique invariant distribution π, independent of the initial distribution λ, which gives the probability of the user being at any given document in the long run, i.e. as n → ∞. In practice, the convergence is quite fast and n ≈ 10 is usually enough.

The invariant distribution is the unique left eigenvector of the transition matrix for the eigenvalue 1, and can be computed by solving the linear system π = πP.
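The remark about fast convergence can be seen with simple power iteration, π ← πP; a minimal pure-Python sketch (one could equally solve π = πP as a left-eigenvector problem):

```python
def invariant_distribution(P, steps=50):
    """Iterate pi <- pi P from the uniform distribution; for an irreducible
    (and aperiodic) chain this converges to the unique invariant distribution."""
    T = len(P)
    pi = [1.0 / T] * T
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(T)) for j in range(T)]
    return pi

# A small 3-state chain; each row of P sums to 1.
P = [[0.0, 0.5, 0.5],
     [0.3, 0.0, 0.7],
     [0.4, 0.6, 0.0]]
pi = invariant_distribution(P)
print(pi)  # satisfies pi = pi P up to numerical precision
```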

Structure of the sub-markov Chain p 1,T p 2,T p 1,2 d 1 d 2 d 3 d T-1 d T p T,1 p T,2

Discrete-Time Markov Precision

Let (Y_n)_{n∈N} denote the sub-chain of (X_n)_{n∈N} that considers just the visits to the judged relevant documents at ranks R. Note that this sub-chain has, in general, a transition matrix P' different from P, which can be computed from P by solving a linear system.

We define Markov Precision (MP) as:

MP = Σ_{i∈R} π_i Prec(i)

i.e. the mean of the precision at each relevant retrieved document, weighted by the probability of the user visiting that document.
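Given the invariant distribution of the sub-chain, MP is a weighted sum. A minimal sketch (names ours), where rel_ranks lists the relevant retrieved ranks in increasing order and pi holds the corresponding invariant probabilities:

```python
def markov_precision(rel_ranks, pi):
    """MP = sum over relevant ranks i of pi_i * Prec(i)."""
    mp = 0.0
    for k, (rank, w) in enumerate(zip(rel_ranks, pi), start=1):
        prec = k / rank        # Prec(rank): k relevant documents in the top `rank`
        mp += w * prec
    return mp

# Relevant at ranks 1 and 3; the user is equally likely to be at either
# in the long run: MP = 0.5 * 1 + 0.5 * (2/3) = 0.8333...
print(markov_precision([1, 3], [0.5, 0.5]))
```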

Markov Precision: Considerations
- MP does not depend on the recall base, knowledge that is not available to users;
- MP allows users to move forward and backward in the result list;
- the transition matrix allows several alternative user models to be plugged into MP, either by pre-defining them or by learning them from logs and user interaction data; this also makes it possible to compare different user models within the same measure, rather than comparing different measures, each with a different user model;
- the (fast converging) invariant distribution frees us from assuming that the user starts browsing from a specific document, typically the first one;
- MP relies on the assumption of a memoryless Markovian process; this differs from measures like Expected Reciprocal Rank (ERR), which takes into account the history of all the visited documents, but it is similar to measures like RBP, where the decision on whether or not to continue visiting the list is independent of previously visited documents.

Markov Precision and NCP

Recall the NCP model:

NCP(p_s) = E_{p_s}[Prec(n)] = Σ_{n=1}^{+∞} p_s(d_n) Prec(n)

MP fits this model if the invariant distribution is used as the probability distribution:

p_s(d_n) = a_n π_n

where a_n = 1 if the document at rank n is considered relevant, and a_n = 0 otherwise.

AP and MP

Consider the following transition probabilities among the relevant documents in a given ranked list:

P[X_{n+1} = j | X_n = i] = 1/(r − 1)   for i, j ∈ R, i ≠ j

where r = |R| is the number of relevant retrieved documents. Then the invariant distribution is (1/r, 1/r, ..., 1/r) and we obtain

MP = (1/r) * Σ_{i∈R} Prec(i)

which is equal to AP once multiplied by the recall r/RB.
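This special case is easy to check numerically: with p_ij = 1/(r − 1) off the diagonal, the uniform vector is invariant, so MP reduces to (1/r) Σ Prec(i). A small self-contained sketch (ours, not the paper's code):

```python
def uniform_subchain(r):
    """Transition matrix with p_ij = 1/(r - 1) for i != j over r relevant docs."""
    return [[0.0 if i == j else 1.0 / (r - 1) for j in range(r)] for i in range(r)]

def invariant_distribution(P, steps=50):
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

pi = invariant_distribution(uniform_subchain(4))
print(pi)  # ~uniform: MP = (1/r) * sum Prec(i), i.e. AP * RB/r
```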

AP Markov Model: Considerations
- We explain AP with a slightly richer user model, where the user can move forward and backward among any of the relevant documents;
- MP is not AP unless it is provided with the same amount of information AP has about the recall base;
- as we will see in the evaluation part, other MP user models are very highly correlated with AP, providing further alternative explanations of it.

Continuous-Time Markov Precision

To obtain a continuous-time Markov chain, we assume that the holding times T_n all have exponential distribution:

P[T_n ≤ t] = 0 for t < 0,   P[T_n ≤ t] = 1 − exp(−µ_i t) for t ≥ 0

where µ_i is a positive real number that may depend on the specific state i of the chain the user is visiting at that time.

Let (X_t)_{t≥0} be the continuous-time Markov chain with jump chain (X_n)_{n∈N} and holding times T_n; then the Continuous-Time Markov Precision is:

MP_cont = Σ_{i∈R} π̃_i Prec(i)

where π̃_i = (π_i µ_i^{−1}) / (Σ_{j∈R} π_j µ_j^{−1}).
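The continuous-time reweighting divides each π_i by the exit rate µ_i (a slower exit rate means a longer expected holding time 1/µ_i) and renormalises. A minimal sketch (names ours):

```python
def continuous_time_weights(pi, mu):
    """pi~_i = (pi_i / mu_i) / sum_j (pi_j / mu_j)."""
    w = [p / m for p, m in zip(pi, mu)]
    s = sum(w)
    return [x / s for x in w]

# Two relevant documents, equally likely in the jump chain, but the second
# is held twice as long (mu = 0.5 -> mean holding time 2): it gets weight 2/3.
print(continuous_time_weights([0.5, 0.5], [1.0, 0.5]))
```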

Experimented User Models

We analyse three possible choices:
- state space: the state space of the Markov chain (X_n)_{n∈N} is either the whole set T, indicated with AD (all documents model), or the set R, indicated with OR (only relevant documents model);
- connectedness: the nonzero transition probabilities are either among all the documents, indicated with GL (global model), or only among adjacent documents, indicated with LO (local model);
- transition probabilities: the transition probabilities are proportional either to the inverse of the distance, indicated with ID (inverse distance model), or to the inverse of the logarithm of the distance, indicated with LID (logarithmic inverse distance model).
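A sketch of how a GL (global) transition matrix with ID or LID weights could be built; the exact LID form in the paper may differ (here we use 1/log(1 + d) so that the weight is finite at distance 1, which is our own smoothing choice):

```python
import math

def transition_matrix(T, weight="ID"):
    """GL model over T documents: p_ij proportional to 1/d (ID) or
    1/log(1 + d) (LID), where d = |i - j|; rows are normalised to sum to 1."""
    P = []
    for i in range(1, T + 1):
        row = []
        for j in range(1, T + 1):
            if i == j:
                row.append(0.0)
            else:
                d = abs(i - j)
                row.append(1.0 / d if weight == "ID" else 1.0 / math.log(1 + d))
        s = sum(row)
        P.append([x / s for x in row])
    return P

P = transition_matrix(5, "ID")
print([round(x, 3) for x in P[0]])  # probabilities decay with rank distance
```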

Experimental Collections

Feature            | TREC 7       | TREC 8       | TREC 10  | TREC 14
-------------------|--------------|--------------|----------|---------
Track              | Ad Hoc       | Ad Hoc       | Web      | Robust
Corpus             | Tipster 4, 5 | Tipster 4, 5 | WT10g    | AQUAINT
# Documents        | 528K         | 528K         | 1.7M     | 1.0M
# Topics           | 50           | 50           | 50       | 50
# Runs             | 103          | 129          | 95       | 74
Run Length         | 1,000        | 1,000        | 1,000    | 1,000
Relevance Degrees  | Binary       | Binary       | 3 Grades | 3 Grades
Pool Depth         | 100          | 100          | 100      | 55
Minimum # Relevant | 7            | 6            | 2        | 9
Average # Relevant | 93.48        | 94.56        | 67.26    | 131.22
Maximum # Relevant | 361          | 347          | 372      | 376

Correlation Analysis

[Figure: correlation analysis on TREC 07, 1998, Ad Hoc, comparing AP, RBP, P@10, bpref and Rprec with the MP variants GL_OR_ID, GL_OR_LID, LO_OR_ID and LO_OR_LID.]

Correlation Analysis

[Figure: correlation analysis on TREC 08, 1999, Ad Hoc. Left panel: AP, RBP, P@10, bpref and Rprec compared with GL_OR_ID, GL_OR_LID, LO_OR_ID and LO_OR_LID. Right panel: the same measures compared with the @Rec variants GL_OR_ID@Rec, GL_OR_LID@Rec, LO_OR_ID@Rec and LO_OR_LID@Rec.]

Effect of Incompleteness on Absolute Performances

[Figure: performances averaged over topics and runs on TREC 08, 1999, Ad Hoc, as the pool reduction rate decreases from 100% to 10%, for GL_AD_ID, GL_AD_ID@Rec(T), LO_AD_ID, LO_AD_ID@Rec(T), AP, P@10, Rprec, bpref and RBP.]

Effect of Incompleteness on Rank Correlation

[Figure: Kendall's τ correlation on TREC 08, 1999, Ad Hoc, as the pool reduction rate decreases from 100% to 10%, for GL_AD_ID, GL_AD_ID@Rec(T), LO_AD_ID, LO_AD_ID@Rec(T), AP, P@10, Rprec, bpref and RBP.]

Click-Log Data Set
- Click logs made available by Yandex in the context of the Relevance Prediction Challenge: http://imat-relpred.yandex.ru/en/
- 340,796,067 records with 30,717,251 unique queries, each retrieving 10 URLs;
- the training set contains 5,191 assessed queries, which correspond to 30,741,907 records;
- we selected the queries which appear in at least 100 sessions each to calibrate the time.

On the basis of the click logs, 21% of the observed transitions are backward.

Time Calibration

We estimated the parameters of the exponential holding times by using an unbiased estimator of the inverse of the sample mean of the time spent by the users visiting these states. We used the GL_AD_ID model, i.e. considering all the retrieved documents (AD), allowing for transitions among all of them (GL), and with transition probability proportional to the inverse of the distance (ID).

Run                   | µ1     | µ2     | µ3     | µ4     | µ5     | µ6     | µ7     | µ8     | µ9     | µ10    | disc MP | cont MP
----------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|---------|--------
(1,1,1,1,0,0,0,1,0,0) | 0.2000 | 0.0357 | 0.2000 | 0.0400 | 0.0056 | 0.0005 | 0.0035 | 0.0017 | 0.0034 | 0.0024 | 0.9205  | 0.6603
(1,1,1,0,1,0,0,0,1,0) | 0.0177 | 0.0047 | 0.0037 | 0.0015 | 0.0041 | 0.0031 | 0.0057 | 0.0022 | 0.0061 | 0.0045 | 0.8668  | 0.8710
(1,1,0,1,1,0,0,0,0,1) | 0.0056 | 0.0051 | 0.0062 | 0.0031 | 0.0046 | 0.0025 | 0.0050 | 0.0022 | 0.0070 | 0.0050 | 0.8120  | 0.8001

Reproducibility Source code for running the experiments is available at: http://matters.dei.unipd.it/

Conclusions

We introduced a new measure, Markov Precision (MP), which models the user interaction with the result list via Markov chains:
- it provides a single and coherent framework for plugging alternative user models into the same measure;
- it allows for both rank-based and time-based evaluation of IR systems;
- it provides alternative interpretations of average precision.

Future Work

Markov Precision just scratches the tip of the iceberg:
- study different user models, also learned from logs, together with their time-based calibration;
- address the memoryless property of Markov chains in order to also take into account previously visited documents;
- study its relationship with click-based measures;
- further study its properties, e.g. discriminative power.