Score Distribution Models


Evangelos Kanoulas, Virgil Pavlu, Keshi Dai, Javed Aslam

Score Distributions
[Figures: example score distributions for a ranked list of retrieval scores (9.6592, 9.5761, 9.4919, 9.4784, ...)]

Score Distributions: Applications
Normalization for multiple sources:
- Information filtering (e.g., news retrieval)
- Recall-oriented IR (e.g., legal and patent IR)
- Distributed IR (multiple data collections)
- Diversity/faceted IR (news, images, video, web pages, feeds)
- Meta-search
To be useful, score distribution models must be reasonably accurate.

Modeling Score Distributions
- Modeling score distributions is key to inference
- EM is used to fit the model to the data
- Dozens of models exist in the literature:
  - Negative exponential (non-relevant) & Gaussian (relevant)
  - Gamma & Gaussian
  - Two Poissons
  - Two Gaussians
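As a minimal sketch of the EM fitting step, assuming the classical negative exponential (non-relevant) plus Gaussian (relevant) model from the list above and non-negative scores, the stdlib-only Python below fits the four mixture parameters. The function names, the crude initialization and the synthetic data are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random

def exp_pdf(x, lam):
    """Negative exponential density; zero for negative scores."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_exp_gauss(scores, iters=200):
    """EM for pi * Exp(lam) + (1 - pi) * N(mu, sigma^2).

    The exponential component plays the non-relevant role and the
    Gaussian the relevant one. Returns (pi, lam, mu, sigma).
    """
    n = len(scores)
    mean = sum(scores) / n
    top = sorted(scores)[-max(1, n // 10):]      # crude init: top 10% of scores
    pi, lam = 0.5, 1.0 / mean
    mu = sum(top) / len(top)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    for _ in range(iters):
        # E-step: responsibility of the Gaussian (relevant) component per score
        r = []
        for x in scores:
            a = pi * exp_pdf(x, lam)
            b = (1.0 - pi) * normal_pdf(x, mu, sigma)
            r.append(b / max(a + b, 1e-300))
        # M-step: responsibility-weighted maximum-likelihood updates
        w_rel = sum(r)
        w_non = n - w_rel
        pi = w_non / n
        lam = w_non / sum((1.0 - ri) * x for ri, x in zip(r, scores))
        mu = sum(ri * x for ri, x in zip(r, scores)) / w_rel
        var = sum(ri * (x - mu) ** 2 for ri, x in zip(r, scores)) / w_rel
        sigma = max(math.sqrt(var), 1e-6)
    return pi, lam, mu, sigma

# Synthetic scores: 900 "non-relevant" Exp(1) plus 100 "relevant" N(5, 1)
rng = random.Random(0)
scores = [rng.expovariate(1.0) for _ in range(900)] + \
         [rng.gauss(5.0, 1.0) for _ in range(100)]
pi, lam, mu, sigma = fit_exp_gauss(scores)
print(pi, lam, mu, sigma)
```

Each M-step is a per-component weighted maximum-likelihood estimate, so swapping in another pair from the list changes only the component densities and their weighted updates.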

Motivation
What is wrong with the negative exponential & Gaussian model?
- It simply does not fit the data
- It has undesirable IR properties
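One commonly cited undesirable property, offered here as a hedged illustration rather than the slide's own argument: the Gaussian tail decays faster than the exponential one, so the implied probability of relevance P(rel | s) rises and then falls back toward zero as the score grows. A higher score can therefore look less likely to be relevant. The function name and parameter values below are hypothetical.

```python
import math

def p_relevant(s, pi_non, lam, mu, sigma):
    """P(rel | s) under pi_non * Exp(lam) + (1 - pi_non) * N(mu, sigma^2), s >= 0."""
    non = pi_non * lam * math.exp(-lam * s)
    z = (s - mu) / sigma
    rel = (1 - pi_non) * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    return rel / (rel + non)

# Hypothetical parameters: 90% non-relevant ~ Exp(1), relevant ~ N(5, 1)
for s in (2.0, 5.0, 8.0, 15.0):
    print(s, p_relevant(s, 0.9, 1.0, 5.0, 1.0))
```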


Our work (some previous)
- New model
- Theoretical basis
- Fits the data better
- Focus on getting it right rather than making it simple

Overview
- Many related problems; the hardest is modeling the scores of [TREC] relevant documents
- This talk addresses three of these problems:
  1. Theory
  2. BM25 and LM
  3. Relevant-document score distribution via PR curves

1. DL/TF variable: a case for a gamma-mixture-based distribution model

Why DL/TF
[Slide: the BM25 and LM scoring formulas]
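The slide's formulas did not survive extraction. As a hedged reconstruction of why X = DL/TF is the natural variable, take the standard BM25 term weight with length normalization b = 1 (an assumption made to keep the case exact) and the Jelinek-Mercer smoothed LM term score written with the CTF/TN notation of the LM slide. Dividing numerator and denominator by TF shows that both depend on a document only through X:

```latex
\begin{align*}
w_{\mathrm{BM25}} &= \mathrm{idf}\cdot\frac{(k_1+1)\,\mathrm{TF}}{\mathrm{TF} + k_1\,\mathrm{DL}/\mathrm{ADL}}
  = \mathrm{idf}\cdot\frac{k_1+1}{1 + k_1 X/\mathrm{ADL}} && (b = 1),\\
w_{\mathrm{LM}} &= \log\!\Big(\lambda\,\frac{\mathrm{TF}}{\mathrm{DL}} + (1-\lambda)\,\frac{\mathrm{CTF}}{\mathrm{TN}}\Big)
  = \log\!\Big(\frac{\lambda}{X} + (1-\lambda)\,\frac{\mathrm{CTF}}{\mathrm{TN}}\Big).
\end{align*}
```

With b < 1 the BM25 denominator gains a k_1(1 - b)/TF term, so the weight then also depends on TF beyond the ratio X.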

Quality classes and term frequency
- Quality class = set of documents for which the query terms are consistently generated by a Poisson process; it can model aspects/facets, document types, etc.
- Distance between term occurrences = waiting time between Poisson events
[Figure: term occurrences along a time axis; the waiting times are exponentially distributed around an average waiting time]

DL/TF variable
- θ = average waiting time between term occurrences; it depends on the class quality Q, the query generality (hardness) g, the collection size, etc.
- ADL = average document length
- For each class, model the DL/TF variable separately for each TF value k
- DL = sum of waiting times
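A small simulation of the waiting-time view, assuming DL is exactly the sum of k i.i.d. exponential waiting times with mean θ. Under that assumption DL ~ Gamma(shape k, scale θ), so for a fixed TF value k the DL/TF variable is Gamma(k, θ/k), with mean θ and variance θ²/k. The function name and parameter values are hypothetical.

```python
import random
import statistics

def sample_dl_given_tf(k, theta, n, rng):
    """Draw n document lengths for documents with TF = k.

    Assumes DL is the sum of k i.i.d. exponential waiting times with
    mean theta, so DL ~ Gamma(shape=k, scale=theta).
    """
    return [sum(rng.expovariate(1.0 / theta) for _ in range(k)) for _ in range(n)]

rng = random.Random(0)
k, theta = 4, 250.0                   # hypothetical TF value and mean waiting time
dl = sample_dl_given_tf(k, theta, 20000, rng)
ratio = [d / k for d in dl]           # the DL/TF variable for this TF value
print(statistics.mean(ratio))         # close to theta
print(statistics.variance(ratio))     # close to theta**2 / k
```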

Mixture over TF values
- P_Q[k] = geometric mixture over TF values k, with rate 1 - p
- Example: relevant class p = 0.1, non-relevant class p = 0.7
- Average TF = mean(P_Q) = 1/p
- Model DL/TF as a mixture of gammas

DL/TF per quality class
- For a geometric P[ ], the mixture is actually a single gamma
- Multiple query terms: this requires a proportionality that is usually not achievable in practice, but the result can be approximated by a gamma with a higher shape
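A numerical check of one reading of the single-gamma claim, offered as a sketch rather than the authors' derivation. Assume the weights are P_Q[k] = p(1 - p)^(k-1), which matches "average TF = 1/p", and assume the component for TF value k is Gamma(shape k, scale θ), the sum of k waiting times. Because the components share a common scale, the geometric mixture then collapses to a single Gamma(1, θ/p). All function names are hypothetical.

```python
import math

def geometric_weight(k, p):
    """Hypothetical parametrization P_Q[k] = p * (1 - p)**(k - 1), k = 1, 2, ..."""
    return p * (1 - p) ** (k - 1)

def gamma_pdf(x, shape, scale):
    """Gamma density with the given shape and scale, for x > 0."""
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def mixture_pdf(x, p, theta, k_max=400):
    """Geometric mixture over k of Gamma(shape=k, scale=theta), truncated at k_max."""
    return sum(geometric_weight(k, p) * gamma_pdf(x, k, theta)
               for k in range(1, k_max + 1))

def collapsed_pdf(x, p, theta):
    """Closed form of the same mixture: a single Gamma(shape=1, scale=theta/p)."""
    return gamma_pdf(x, 1, theta / p)

for p in (0.1, 0.7):                  # relevant / non-relevant examples from the slide
    mean_tf = sum(k * geometric_weight(k, p) for k in range(1, 2000))
    worst = max(abs(mixture_pdf(x, p, 1.0) - collapsed_pdf(x, p, 1.0))
                for x in (0.5, 2.0, 10.0, 30.0))
    print(p, mean_tf, worst)          # mean TF close to 1/p; worst close to 0
```

If each component were instead rescaled to DL/k, the scales would differ across k and the mixture would no longer reduce to a single gamma. The common-scale assumption is what makes the collapse exact.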

Gamma mixture for the DL/TF mixture
[Figure: empirical histogram of DL/TF with an MLE gamma fit; the mixture is approximated with a single gamma]

Score transformations
- r = non-decreasing differentiable function applied to the modeled variable
- f(x) = the distribution being modeled
- Many basic transformations preserve a gamma-like distribution shape

Score transform: inversion
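The density of a transformed score follows from the change-of-variables formula f_S(s) = f_X(r⁻¹(s)) · |dr⁻¹(s)/ds|. The absolute value covers decreasing maps such as inversion. The sketch below applies it to S = 1/X for a gamma-distributed X and checks the result against the closed-form inverse-gamma density. The gamma parameters and helper names are hypothetical.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def transformed_pdf(s, f_x, r_inv, r_inv_deriv):
    """Density of S = r(X) for a monotone differentiable r."""
    return f_x(r_inv(s)) * abs(r_inv_deriv(s))

SHAPE, SCALE = 3.0, 2.0               # hypothetical gamma parameters for X

def f_x(x):
    return gamma_pdf(x, SHAPE, SCALE)

def inverted_pdf(s):
    """Inversion S = 1/X: r^{-1}(s) = 1/s and d r^{-1}/ds = -1/s**2."""
    return transformed_pdf(s, f_x, lambda t: 1.0 / t, lambda t: -1.0 / t ** 2)

def inverse_gamma_pdf(s, a, b):
    """Closed form: if X ~ Gamma(a, scale b), then 1/X ~ InvGamma(a, scale 1/b)."""
    return (1.0 / b) ** a / math.gamma(a) * s ** (-a - 1) * math.exp(-1.0 / (b * s))

h = 0.001                             # midpoint rule over (0, 50]
total = sum(inverted_pdf((i + 0.5) * h) for i in range(50000)) * h
print(total)                          # close to 1: still a proper density
print(inverted_pdf(0.2), inverse_gamma_pdf(0.2, SHAPE, SCALE))
```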

Score transformations
- Saturators r (Robertson's TF) can make the distribution more hill-like
[Figure: Robertson's TF versus TF for k1 = 1, k1 = 3, ...; frequency histograms of the resulting BM25 scores]
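A toy illustration of the "more hill-like" effect, not a reproduction of the slide's plots. Take the saturator to be r(x) = x/(x + k1), which matches Robertson's TF up to the constant factor (k1 + 1), and feed it an exponentially distributed input with rate 1, whose mode is at 0. By change of variables the output density has an interior mode at s = 1 - k1/2 whenever k1 < 2, and stays monotone decreasing otherwise. The helper names are hypothetical.

```python
import math

def robertson_tf(x, k1):
    """Saturating transform r(x) = x / (x + k1)."""
    return x / (x + k1)

def saturated_density(s, k1, rate=1.0):
    """Density of S = r(X) for X ~ Exponential(rate), on 0 <= s < 1.

    Change of variables: x = k1 * s / (1 - s), dx/ds = k1 / (1 - s)**2.
    """
    x = k1 * s / (1.0 - s)
    return rate * math.exp(-rate * x) * k1 / (1.0 - s) ** 2

def mode_on_grid(k1, steps=1000):
    """Grid argmax of the saturated density over 0 <= s < 1."""
    grid = [i / steps for i in range(steps)]
    return max(grid, key=lambda s: saturated_density(s, k1))

print(mode_on_grid(1.0))   # interior mode: a hill
print(mode_on_grid(3.0))   # mode stays at 0: still decreasing
```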

2. Popular retrieval functions: BM25 and LM

Theory models: three fits
- Mixture of gammas, inverted, with score transformations
- Data-driven approach: maximum-likelihood gamma fit
- Analytical approach:
  - traditional ranking functions (TF-IDF, BM25, LM)
  - make basic assumptions about the low-level components
  - derive the score distribution
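For the data-driven fit, here is a stdlib-only sketch of a maximum-likelihood gamma fit. The MLE shape a solves log(a) - ψ(a) = log(mean) - mean(log x). The left side is strictly decreasing in a, so bisection suffices, and the scale is then mean/a. The digamma routine (recurrence plus asymptotic series) and all names are illustrative, not the authors' code.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0: shift x upward, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (
        1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))

def fit_gamma_mle(data):
    """Maximum-likelihood (shape, scale) for positive data."""
    n = len(data)
    mean = sum(data) / n
    s = math.log(mean) - sum(math.log(x) for x in data) / n   # > 0 by Jensen
    lo, hi = 1e-3, 1e4
    for _ in range(100):              # bisection on a log scale
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid                  # root lies at a larger shape
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    return shape, mean / shape

rng = random.Random(1)
data = [rng.gammavariate(3.0, 2.0) for _ in range(5000)]   # synthetic scores
shape, scale = fit_gamma_mle(data)
print(shape, scale)
```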

Analytical approach: BM25
[Figure: BM25 score distribution for the query "Ireland Peace Talks"]

BM25 with X = DL/TF

BM25
[Figure: BM25 score histogram compared with the analytically derived (numerical) distribution, the MLE gamma fit, and the theoretical model; axes: BM25 score vs. frequency]
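A minimal sketch of the analytical approach for a single query term. Assume X = DL/TF is gamma distributed and the BM25 weight is written as a function of X with b = 1, which makes it strictly decreasing in X. Then P(S ≤ s) = P(X ≥ x(s)), where x(s) inverts the weight. An integer shape is assumed so that the gamma survival function has a closed (Erlang) form using only the standard library. The idf, k1, ADL and gamma parameters are hypothetical, and a Monte Carlo sample checks the result.

```python
import math
import random

def bm25_weight(x, idf, k1=1.2, adl=500.0):
    """BM25 term weight as a function of x = DL/TF (assumes b = 1)."""
    return idf * (k1 + 1.0) / (1.0 + k1 * x / adl)

def erlang_survival(x, k, theta):
    """P(X > x) for X ~ Gamma(integer shape k, scale theta)."""
    z = x / theta
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= z / j
        total += term
    return math.exp(-z) * total

def bm25_score_cdf(s, k, theta, idf, k1=1.2, adl=500.0):
    """P(S <= s) for S = bm25_weight(X), X ~ Gamma(k, theta)."""
    s_max = idf * (k1 + 1.0)              # weight at x = 0
    if s <= 0.0:
        return 0.0
    if s >= s_max:
        return 1.0
    x_s = adl / k1 * (s_max / s - 1.0)    # inverse of bm25_weight
    return erlang_survival(x_s, k, theta)

rng = random.Random(2)
K, THETA, IDF = 3, 100.0, 2.0             # hypothetical per-class parameters
scores = [bm25_weight(rng.gammavariate(K, THETA), IDF) for _ in range(20000)]
for s in (1.0, 2.0, 3.0):
    empirical = sum(v <= s for v in scores) / len(scores)
    print(s, empirical, bm25_score_cdf(s, K, THETA, IDF))
```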

Analytical approach: LM
[Figure: query "Ireland Peace Talks"; per-term panels (ireland, 6.155, 7698 docs; peac, 3.876; talk, 2.777) show histograms of normalized TF, log(normalized TF), log(normalized TF + CTF/TN), and log(lambda*normalized TF + (1 - lambda)*ctf/tn), alongside the BM25 and language-model (Jelinek-Mercer smoothing) score histograms]

LM (Jelinek-Mercer smoothing)
[Figure: score histogram compared with the analytically derived (numerical) distribution, the MLE gamma fit, and the theoretical model; y-axis: frequency]

3. Inferring the relevant distribution using a precision-recall model

Precision-recall curves
[Figure: model precision-recall curves for various values of rp; axes: recall vs. precision]

Score distribution for relevant docs: previous work
- Input: score distribution of relevant documents and score distribution of non-relevant documents
- Output: PR-curve model
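The "previous work" direction can be sketched directly. Sweeping a score threshold t gives recall R(t) = P(S_rel > t) and fallout F(t) = P(S_non > t), and expected precision is n_rel·R / (n_rel·R + n_non·F). For illustration the sketch assumes the classical exponential (non-relevant) and Gaussian (relevant) pair, with hypothetical counts and parameters. The helper names are made up.

```python
import math

def exp_survival(t, lam):
    """Fallout F(t) = P(S_non > t) for Exponential(rate lam) scores."""
    return 1.0 if t <= 0 else math.exp(-lam * t)

def normal_survival(t, mu, sigma):
    """Recall R(t) = P(S_rel > t) for N(mu, sigma^2) scores."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))

def pr_curve(thresholds, n_rel, n_non, lam, mu, sigma):
    """(recall, precision) pairs obtained by sweeping the threshold t."""
    points = []
    for t in thresholds:
        recall = normal_survival(t, mu, sigma)
        retrieved_rel = n_rel * recall
        retrieved_non = n_non * exp_survival(t, lam)
        denom = retrieved_rel + retrieved_non
        if denom == 0.0:                  # both tails underflowed
            continue
        points.append((recall, retrieved_rel / denom))
    return points

for recall, precision in pr_curve([0.0, 2.0, 4.0, 5.0, 6.0, 8.0], 100, 900, 1.0, 5.0, 1.0):
    print(round(recall, 3), round(precision, 3))
```

The work described on the next slide runs this mapping in the other direction. It starts from the non-relevant distribution and a fitted PR-curve model and recovers the relevant distribution.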

Score distribution of relevant docs
- Input:
  - score distribution of non-relevant documents
  - gamma-based distribution fit to all scores
  - PR-curve model; its parameter is obtained by fitting the model to the data (a ranked list of relevant and non-relevant documents)
- Output: score distribution of relevant documents
- For now, a very nice/simple model for PR curves; the derivation is still messy
- Uses recall and fallout as defined by S. Robertson

Inferred relevant distribution
[Figure: inferred relevant-document score distribution for the query "Estonia economy"]

Conclusions
- Quality classes: a concept that relates relevance to Poisson-process parameters
  - goes beyond relevance-grade assessments
  - can model aspects (diversity), document types, etc.
- The models fit better than traditional ones
- The relevant-class distribution needs more work
  - it can be approximately inferred from a precision-recall model
  - the PR models used are too simple (for now)
Thank you! Questions?

Summation over query terms
- Applies to scores computed as sums of term components: BM25, LM, TF-IDF
- Non-relevant documents (low quality Q): each term component is distributed approximately as a Gamma(low shape, low scale). If the scales are approximately equal, their sum follows a Gamma distribution with the same scale (and the shapes add up).
- Relevant documents: the mixture has more effective components, so the sum is a rich mixture, usually with multiple hills
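A quick Monte Carlo check of the summation claim, assuming independent term components that share exactly the same scale. The sum of Gamma(a_i, θ) components is then Gamma(Σa_i, θ). The check below confirms that the first two moments match Σa_i·θ and Σa_i·θ². The shapes and scale are hypothetical low-shape values chosen to mimic non-relevant documents.

```python
import random
import statistics

def sum_of_gamma_components(shapes, scale, n, rng):
    """Draw n scores, each a sum of independent Gamma(shape_i, scale) components."""
    return [sum(rng.gammavariate(a, scale) for a in shapes) for _ in range(n)]

rng = random.Random(42)
shapes, scale = [0.5, 0.8, 1.2], 1.5     # hypothetical per-term components
scores = sum_of_gamma_components(shapes, scale, 50000, rng)
total_shape = sum(shapes)                # shapes add up when scales are equal
print(statistics.mean(scores), total_shape * scale)
print(statistics.variance(scores), total_shape * scale ** 2)
```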

Modeling the Score Distributions of Relevant and Non-relevant Documents

Evangelos Kanoulas, Virgil Pavlu, Keshi Dai, and Javed A. Aslam
College of Computer and Information Science, Northeastern University