Probabilistic Ranking Principle. Hongning Wang

1 Probabilistic Ranking Principle Hongning Wang

2 Recap: latent semantic analysis. Solution: low-rank matrix approximation. Imagine this is the *true* concept-document matrix; imagine this is our observed term-document matrix, with random noise over the word selection in each document. CS@UVa CS4501: Information Retrieval 2

3 Recap: latent semantic analysis. Solve LSA by SVD:
$$Z^* = \arg\min_{Z:\,\operatorname{rank}(Z)=k} \|C - Z\|_F = \arg\min_{Z:\,\operatorname{rank}(Z)=k} \sqrt{\sum_{i=1}^{M}\sum_{j=1}^{N}\left(C_{ij} - Z_{ij}\right)^2}$$
Procedure of LSA (map to a lower-dimensional space): 1. Perform SVD on the document-term adjacency matrix $C_{M\times N}$. 2. Construct $\hat{C}_{M\times N}$ by keeping only the $k$ largest singular values in $\Sigma$ non-zero. CS@UVa CS4501: Information Retrieval 3
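
To make the low-rank approximation concrete, here is a minimal numpy sketch (not from the slides; the toy matrix and the rank k are made up) of LSA via truncated SVD:

```python
import numpy as np

# Toy document-term matrix A (N documents x M terms), raw counts.
A = np.array([
    [2, 1, 0, 0],
    [0, 1, 2, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

k = 2  # number of latent concepts to keep

# Step 1: SVD of the document-term adjacency matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 2: keep only the k largest singular values in Sigma non-zero.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A_k is the best rank-k approximation of A in Frobenius norm.
print(np.linalg.norm(A - A_k, "fro"))
```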

4 Recap: latent semantic analysis. Visualization (figure omitted). CS4501: Information Retrieval 4

5 Recap: LSA for retrieval. Project queries into the new document space: $\hat{q} = q^\top V_k \Sigma_k^{-1}$, i.e., treat the query as a pseudo-document given by its term vector. Rank by cosine similarity between query and documents in this lower-dimensional space. CS@UVa CS4501: Information Retrieval 5
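
Continuing the numpy sketch above, query folding and cosine ranking might look as follows; the query vector is hypothetical:

```python
# Document i's latent representation is row i of U_k @ Sigma_k.
doc_vecs = U[:, :k] * s[:k]                      # N docs x k

# Fold a query (term vector over the same M terms) into the same space:
# q_hat = q^T V_k Sigma_k^{-1}
q = np.array([1.0, 0.0, 0.0, 1.0])
q_hat = q @ Vt[:k, :].T @ np.diag(1.0 / s[:k])

# Rank documents by cosine similarity in the lower-dimensional space.
sims = doc_vecs @ q_hat / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat))
print(np.argsort(-sims))                         # document ranking
```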

6 Recap: LSA for retrieval. Example (figure omitted): documents about HCI vs. graph theory, with query q: "human computer interaction". CS4501: Information Retrieval 6

7 Notion of relevance: Relevance ≈ P(REL | Rep(q), Rep(d)). Three branches: (1) Similarity, with different representations & similarity measures: vector space model (Salton et al. 75), regression model (Fox 83), prob. distr. model (Wong & Yao 89). (2) Probability of relevance, P(r=1|q,d), r ∈ {0,1}: via doc generation, the generative / classical prob. model (Robertson & Sparck Jones 76) — today's lecture; via query generation, the LM approach (Ponte & Croft 98; Lafferty & Zhai 01a). (3) Probabilistic inference, P(d→q) or P(q→d), with different inference systems: prob. concept space model (Wong & Yao 95), inference network model (Turtle & Croft 91). CS@UVa CS 4501: Information Retrieval 7

8 Basic concepts in probability. Random experiment: an experiment with an uncertain outcome, e.g., tossing a coin, picking a word from text. Sample space S: all possible outcomes of an experiment, e.g., tossing 2 fair coins, S = {HH, HT, TH, TT}. Event E ⊆ S: E happens iff the outcome is in E, e.g., E = {HH} (all heads), E = {HH, TT} (same face); impossible event {}, certain event S. Probability of an event: 0 ≤ P(E) ≤ 1. CS@UVa CS 4501: Information Retrieval 8

9 Essential probability concepts. Probability of events. Mutually exclusive events: $P(A \cup B) = P(A) + P(B)$. General events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. Independent events: $P(A \cap B) = P(A)P(B)$; the joint probability $P(A \cap B)$ is often written simply as $P(A, B)$. CS@UVa CS 4501: Information Retrieval 9

10 Essential probability concepts. Conditional probability: $P(B|A) = P(A \cap B)/P(A)$. Bayes' rule: $P(B|A) = P(A|B)P(B)/P(A)$. For independent events: $P(B|A) = P(B)$. Total probability: if $A_1, \ldots, A_n$ form a non-overlapping partition of S, then $P(B \cap S) = P(B \cap A_1) + \cdots + P(B \cap A_n)$ and
$$P(A_i|B) = \frac{P(B|A_i)\,P(A_i)}{P(B|A_1)P(A_1) + \cdots + P(B|A_n)P(A_n)}$$
This allows us to compute $P(A_i|B)$ based on $P(B|A_i)$. CS@UVa CS 4501: Information Retrieval 10
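
A small numeric illustration (mine, with made-up numbers) of Bayes' rule with total probability in the denominator:

```python
# Hypothetical partition A1, A2 of the sample space, with P(B|Ai) given.
prior = {"A1": 0.3, "A2": 0.7}          # P(Ai); must sum to 1
likelihood = {"A1": 0.9, "A2": 0.2}     # P(B|Ai)

# Total probability: P(B) = sum_i P(B|Ai) * P(Ai)
p_b = sum(likelihood[a] * prior[a] for a in prior)

# Bayes' rule: P(Ai|B) = P(B|Ai) * P(Ai) / P(B)
posterior = {a: likelihood[a] * prior[a] / p_b for a in prior}
print(posterior)   # e.g. P(A1|B) ~ 0.66
```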

11 Interpretation of Bayes' rule. Hypothesis space: $H = \{H_1, \ldots, H_n\}$; evidence: $E$. $$P(H_i|E) = \frac{P(E|H_i)\,P(H_i)}{P(E)}$$ If we want to pick the most likely hypothesis $H^*$, we can drop $P(E)$: $P(H_i|E) \propto P(E|H_i)P(H_i)$, where $P(H_i|E)$ is the posterior probability of $H_i$, $P(H_i)$ the prior probability of $H_i$, and $P(E|H_i)$ the likelihood of the data/evidence given $H_i$. CS@UVa CS 4501: Information Retrieval 11

12 Theoretical justification of ranking, as stated by William Cooper: "If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data." Ranking by probability of relevance leads to optimal retrieval performance. CS@UVa CS 4501: Information Retrieval 12

13 Justification from decision theory. Two types of loss: Loss(retrieved | non-relevant) = $a_1$; Loss(not retrieved | relevant) = $a_2$. Let $\varphi(d,q)$ be the probability of $d$ being relevant to $q$. Expected loss for the decision of including $d$ in the final results: retrieve: $(1 - \varphi(d,q))\,a_1$; not retrieve: $\varphi(d,q)\,a_2$. Your decision criterion? CS@UVa CS 4501: Information Retrieval 13

14 Justification from decision theory. We make the decision: if $(1 - \varphi(d,q))\,a_1 < \varphi(d,q)\,a_2$, retrieve $d$; otherwise, do not retrieve $d$. Equivalently, check whether $\varphi(d,q) > \frac{a_1}{a_1 + a_2}$. Ranking documents in descending order of $\varphi(d,q)$ minimizes the loss. CS@UVa CS 4501: Information Retrieval 14
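
The decision rule reduces to a single threshold check; a sketch with illustrative loss values a1 and a2:

```python
def retrieve(phi, a1=1.0, a2=10.0):
    """Retrieve d iff the expected loss of retrieving is smaller:
    (1 - phi) * a1 < phi * a2, i.e. phi > a1 / (a1 + a2)."""
    return phi > a1 / (a1 + a2)

# With a1=1 (cost of a false alarm) and a2=10 (cost of a miss),
# any document with P(relevance) above 1/11 is worth returning.
print(retrieve(0.05), retrieve(0.2))   # False True
```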

15 According to the PRP, all we need is a relevance measure function $F(q,d)$ such that for all $q, d_1, d_2$: $F(q,d_1) > F(q,d_2)$ iff $p(Rel|q,d_1) > p(Rel|q,d_2)$. Assumptions: independent relevance; sequential browsing. Most existing research on IR models so far has fallen into this line of thinking. Limitations? CS@UVa CS 4501: Information Retrieval 15

16 Probability of relevance. Three random variables: query $Q$, document $D$, relevance $R \in \{0,1\}$. Goal: rank $D$ based on $P(R=1|Q,D)$. Compute $P(R=1|Q,D)$; actually, one only needs to compare $P(R=1|Q,D_1)$ with $P(R=1|Q,D_2)$, i.e., rank documents. There are several different ways to define $P(R=1|Q,D)$. CS@UVa CS 4501: Information Retrieval 16

17 Conditional models for $P(R=1|Q,D)$. Basic idea: relevance depends on how well a query matches a document: $P(R=1|Q,D) = g(Rep(Q,D), \theta)$, where $Rep(Q,D)$ is a feature representation of the query-doc pair, e.g., #matched terms, highest IDF of a matched term, doclen. Use training data with known relevance judgments to estimate the parameters $\theta$; apply the model to rank new documents. Special case: logistic regression (a functional form). CS@UVa CS 4501: Information Retrieval 17

18 Regression for ranking? Linear regression: $y = w^\top X$, the relationship between a scalar dependent variable $y$ and one or more explanatory variables. In a ranking problem: $X$ — features about the query-document pair; $y$ — relevance label of the document for the given query. CS@UVa CS 4501: Information Retrieval 18

19 Features/attributes for ranking. Typical features considered in ranking problems, where $M$ is the number of terms in common between query and document, $N$ the collection size, and $n_t$ the number of documents containing term $t$: $X_1 = \frac{1}{M}\sum_j \log QAF(t_j)$ — average absolute query frequency; $X_2 = QL$ — query length; $X_3 = \frac{1}{M}\sum_j \log DAF(t_j)$ — average absolute document frequency; $X_4 = DL$ — document length; $X_5 = \frac{1}{M}\sum_j \log \frac{N}{n_{t_j}}$ — average inverse document frequency; $X_6 = \log M$ — number of terms in common between query and document. CS@UVa CS 4501: Information Retrieval 19
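
As an illustration, a sketch of computing a few of these features for one query-document pair; the whitespace tokenization, helper name, and toy collection statistics are my assumptions, not the slides':

```python
import math
from collections import Counter

def ranking_features(query, doc, df, n_docs):
    """X2 = query length, X4 = doc length, X6 = log(#common terms),
    X5 = average inverse document frequency over common terms.
    df: document frequency n_t per term; n_docs: collection size N."""
    q_tokens, d_tokens = query.split(), doc.split()
    common = set(q_tokens) & set(d_tokens)
    M = len(common)
    log_m = math.log(M) if M else 0.0
    avg_idf = sum(math.log(n_docs / df[t]) for t in common) / M if M else 0.0
    return {"QL": len(q_tokens), "DL": len(d_tokens),
            "logM": log_m, "avgIDF": avg_idf}

df = Counter({"information": 50, "retrieval": 20, "models": 40})
print(ranking_features("information retrieval",
                       "probabilistic models for information retrieval",
                       df, n_docs=100))
```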

20 Regression for ranking. Linear regression: $y = w^\top X$, the relationship between a scalar dependent variable $y$ and one or more explanatory variables. But $y$ is discrete in a ranking problem! Predict $y = 1$ when $w^\top X > 0.5$. (Figure omitted: optimal regression model fit to binary relevance labels.) What if we have an outlier? CS@UVa CS 4501: Information Retrieval 20

21 Regression for ranking. Logistic regression: $$P(R=1|Q,D) = \sigma(w^\top X) = \frac{1}{1 + \exp(-w^\top X)}$$ Directly modeling the posterior of document relevance. (Figure omitted: the sigmoid function.) What if we have an outlier? CS@UVa CS 4501: Information Retrieval 21
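
A minimal scoring sketch of this model; the feature vector and weights below are made up (in practice $w$ is fit on relevance judgments):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_relevant(x, w, b):
    """P(R=1|Q,D) = sigma(w^T x + b) for feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical features [#matched terms, highest matched IDF, doclen]
# and hand-picked weights, for illustration only.
x = [3.0, 4.2, 120.0]
w = [0.8, 0.5, -0.01]
print(p_relevant(x, w, b=-2.0))   # a probability in (0, 1)
```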

22 Conditional models for $P(R=1|Q,D)$: pros & cons. Advantages: absolute probability of relevance available; may re-use all the past relevance judgments. Problems: performance heavily depends on the selection of features; little guidance on feature selection. Will be covered in more detail in later learning-to-rank discussions. CS@UVa CS 4501: Information Retrieval 22

23 Generative models for $P(R=1|Q,D)$. Basic idea: compute $Odds(R=1|Q,D)$ using Bayes' rule:
$$Odds(R=1|Q,D) = \frac{P(R=1|Q,D)}{P(R=0|Q,D)} = \frac{P(Q,D|R=1)\,P(R=1)}{P(Q,D|R=0)\,P(R=0)}$$
The ratio $P(R=1)/P(R=0)$ is ignored for ranking. Assumption: relevance is a binary variable. Variants: document generation, $P(Q,D|R) = P(D|Q,R)\,P(Q|R)$; query generation, $P(Q,D|R) = P(Q|D,R)\,P(D|R)$. CS@UVa CS 4501: Information Retrieval 23

24 Document generation model. $$Odds(R=1|Q,D) \propto \frac{P(D|Q,R=1)}{P(D|Q,R=0)}$$ — a model of relevant docs for $Q$ over a model of non-relevant docs for $Q$. Assume independent attributes $A_1, \ldots, A_k$ of $D$ (why?). Let $D = d_1 \ldots d_k$, where $d_i \in \{0,1\}$ is the value of attribute $A_i$; similarly $Q = q_1 \ldots q_k$. The odds factor over terms that occur in the doc and terms that do not. Notation: for a relevant document ($R=1$), a term is present ($A_i=1$) with probability $p_i$ and absent ($A_i=0$) with probability $1-p_i$; for a non-relevant document ($R=0$), present with probability $u_i$ and absent with probability $1-u_i$. CS@UVa CS 4501: Information Retrieval 24

25 Document generation model (derivation):
$$Odds(R=1|Q,D) = \prod_{i=1}^{k} \frac{P(d_i|Q,R=1)}{P(d_i|Q,R=0)} = \prod_{i:\,d_i=1} \frac{p_i}{u_i} \prod_{i:\,d_i=0} \frac{1-p_i}{1-u_i} = \prod_{i:\,d_i=q_i=1} \frac{p_i(1-u_i)}{u_i(1-p_i)} \prod_{i:\,q_i=1} \frac{1-p_i}{1-u_i}$$
The first product runs over query terms occurring in the doc; the second, over all query terms, is document-independent. Important trick/assumption: terms not occurring in the query are equally likely to occur in relevant and non-relevant documents, i.e., $p_t = u_t$. (Same table as above: $p_i, u_i$ for term present; $1-p_i, 1-u_i$ for term absent.) CS 4501: Information Retrieval 25

26 obertson-sparc Jones Model obertson & Sparc Jones 76 logo 1 D an 1 d q 1 p 1 u log u 1 p 1 d q 1 p log 1 p 1 u log u SJ model Two parameters for each term : p = =1 =1: prob. that term occurs n a relevant doc u = =1 =0: prob. that term occurs n a non-relevant doc How to estmate these parameters? Suppose we have relevance judgments pˆ # rel. doc wth 0.5 # rel. doc 1 uˆ # nonrel. doc wth 0.5 # nonrel. doc and +1 can be justfed by Bayesan estmaton as prors CS@UVa Per-query estmaton! CS 4501: Informaton etreval 26

27 Parameter estimation. General setting: given a hypothesized probabilistic model that governs the random experiment, the model gives a probability $p(D|\theta)$ of any data, depending on the parameter $\theta$. Now, given actual sample data $X = \{x_1, \ldots, x_n\}$, what can we say about the value of $\theta$? Intuitively, take our best guess of $\theta$, where "best" means best explaining/fitting the data. Generally an optimization problem. CS@UVa CS 4501: Information Retrieval 27

28 Maximum likelihood vs. Bayesian. Maximum likelihood estimation: "best" means the data likelihood reaches its maximum: $\hat\theta = \arg\max_\theta P(X|\theta)$. Issue: small sample size. (ML: the frequentist's point of view.) Bayesian estimation: "best" means being consistent with our prior knowledge and explaining the data well: $\hat\theta = \arg\max_\theta P(\theta|X) = \arg\max_\theta P(X|\theta)\,P(\theta)$, a.k.a. maximum a posteriori estimation. Issue: how to define the prior? (MAP: the Bayesian's point of view.) CS@UVa CS 4501: Information Retrieval 28

29 Illustration of Bayesian estimation. Posterior: $p(\theta|X) \propto p(X|\theta)\,p(\theta)$; likelihood: $p(X|\theta)$ with $X = (x_1, \ldots, x_N)$; prior: $p(\theta)$. (Figure omitted: $\theta_0$, the prior mode; $\theta_{MAP}$, the posterior mode; $\theta_{ML}$, the ML estimate.) CS@UVa CS 4501: Information Retrieval 29

30 Maximum likelihood estimation. Data: a document $d$ with counts $c(w_1), \ldots, c(w_N)$. Model: multinomial distribution $p(W|\theta)$ with parameters $\theta_i = p(w_i)$. (Figure omitted: example word distribution — computer 30%, science 31%, information 20%, retrieval 10%, literature 8%, relevant 1%.) Maximum likelihood estimator: $\hat\theta = \arg\max_\theta p(W|\theta)$.
$$p(W|\theta) \propto \prod_{i=1}^{N} \theta_i^{c(w_i)} \quad\Rightarrow\quad \log p(W|\theta) = \sum_{i=1}^{N} c(w_i)\log\theta_i$$
Using the Lagrange multiplier approach with the probability constraint $\sum_{i=1}^{N}\theta_i = 1$, we maximize
$$L(W,\theta) = \sum_{i=1}^{N} c(w_i)\log\theta_i + \lambda\Big(\sum_{i=1}^{N}\theta_i - 1\Big)$$
Setting partial derivatives to zero: $\frac{\partial L}{\partial \theta_i} = \frac{c(w_i)}{\theta_i} + \lambda = 0 \Rightarrow \theta_i = -\frac{c(w_i)}{\lambda}$. Since $\sum_i \theta_i = 1$, we have $\lambda = -\sum_{i=1}^{N} c(w_i)$, giving the ML estimate $$\hat\theta_i = \frac{c(w_i)}{\sum_{j=1}^{N} c(w_j)}$$ CS@UVa CS 4501: Information Retrieval 30
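
The closed-form estimate is a one-liner in code; the counts below mirror the example distribution on the slide:

```python
from collections import Counter

# Data: a document with word counts c(w_1) ... c(w_N).
counts = Counter({"computer": 30, "science": 31, "information": 20,
                  "retrieval": 10, "literature": 8, "relevant": 1})

# ML estimate for a multinomial: theta_i = c(w_i) / sum_j c(w_j)
total = sum(counts.values())
theta = {w: c / total for w, c in counts.items()}
print(theta["retrieval"])   # 0.1
```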

31 obertson-sparc Jones Model obertson & Sparc Jones 76 logo 1 D an 1 d q 1 p 1 u log u 1 p 1 d q 1 p log 1 p 1 u log u SJ model Two parameters for each term : p = =1 =1: prob. that term occurs n a relevant doc u = =1 =0: prob. that term occurs n a non-relevant doc How to estmate these parameters? Suppose we have relevance judgments pˆ # rel. doc wth 0.5 # rel. doc 1 uˆ # nonrel. doc wth 0.5 # nonrel. doc and +1 can be justfed by Bayesan estmaton as prors CS@UVa Per-query estmaton! CS 4501: Informaton etreval 31

32 RSJ model without relevance info (Croft & Harper 79). Suppose we do not have relevance judgments. We assume $p_i$ to be a constant and estimate $u_i$ by assuming all documents to be non-relevant. Then
$$\log Odds(R=1|Q,D) \stackrel{rank}{=} \sum_{i:\,d_i=q_i=1} \log \frac{N - n_i + 0.5}{n_i + 0.5}$$
Reminder: $IDF = 1 + \log\frac{N}{n_i}$, where $N$ is the number of documents in the collection and $n_i$ the number of documents in which term $A_i$ occurs. An IDF-weighted Boolean model? CS@UVa CS 4501: Information Retrieval 32
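
A sketch of this relevance-free term weight, showing that rarer terms get larger, IDF-like weights:

```python
import math

def croft_harper_weight(n_t, N):
    """Term weight with no relevance info: log((N - n_t + 0.5)/(n_t + 0.5));
    n_t = # docs containing the term, N = collection size."""
    return math.log((N - n_t + 0.5) / (n_t + 0.5))

# Rarer terms get larger weights, mirroring IDF.
for n_t in (1, 10, 100):
    print(n_t, round(croft_harper_weight(n_t, N=1000), 2))
```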

33 RSJ model: summary. The most important classic probabilistic IR model. Uses only term presence/absence, and is thus also referred to as the Binary Independence Model; essentially Naïve Bayes for doc ranking. Designed for short catalog records. Without relevance judgments, the model parameters must be estimated in an ad-hoc way. Performance isn't as good as tuned VS models. CS@UVa CS 4501: Information Retrieval 33

34 Improving RSJ: adding TF. Let $D = d_1 \ldots d_k$, where $d_i$ is the frequency count of term $A_i$. Then
$$Odds(R=1|Q,D) = \prod_{i=1}^{k} \frac{P(d_i|Q,R=1)}{P(d_i|Q,R=0)}$$
Model each term frequency with a 2-Poisson mixture over term eliteness $E_i$ (whether the term is about the concept asked in the query):
$$P(A_i = f\,|\,Q,R) = p(E_i|Q,R)\,\frac{\lambda^f e^{-\lambda}}{f!} + p(\bar{E}_i|Q,R)\,\frac{\mu^f e^{-\mu}}{f!}$$
Many more parameters to estimate (how many exactly?), compounded with document length! CS 4501: Information Retrieval 34

35 BM25/Okapi approximation (Robertson et al. 94). Idea: approximate $P(R=1|Q,D)$ with a simpler function that approximates the 2-Poisson mixture model. Observations: $\log Odds(R=1|Q,D)$ is a sum of term weights $W_i$ over terms occurring in both query and document; $W_i = 0$ if $TF_i = 0$; $W_i$ increases monotonically with $TF_i$; $W_i$ has an asymptotic limit. The simple function is
$$W_i = \frac{TF_i\,(k_1+1)}{K + TF_i}\,\log \frac{p_i(1-u_i)}{u_i(1-p_i)}$$ CS@UVa CS 4501: Information Retrieval 35
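
A quick numeric look (my sketch, with illustrative k1 and K) at how the Okapi TF transform saturates:

```python
def okapi_tf(tf, k1=1.2, K=1.5):
    """(k1 + 1) * tf / (K + tf): zero at tf=0, monotone, asymptote k1 + 1."""
    return (k1 + 1) * tf / (K + tf)

for tf in (0, 1, 2, 5, 20, 100):
    print(tf, round(okapi_tf(tf), 3))   # approaches 2.2 but never exceeds it
```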

36 Adding doc length. Motivation: the 2-Poisson model assumes equal document length. Implementation: penalize long docs with
$$W_i = \frac{TF_i\,(k_1+1)}{K + TF_i}\,\log \frac{p_i(1-u_i)}{u_i(1-p_i)}, \qquad K = k_1\Big((1-b) + b\,\frac{|d|}{avgdl}\Big)$$
i.e., pivoted document length normalization. CS@UVa CS 4501: Information Retrieval 36

37 Adding query TF. Motivation: a natural symmetry between document and query. Implementation: a similar TF transformation as for document TF:
$$W_i = \frac{QTF_i\,(k_3+1)}{k_3 + QTF_i} \cdot \frac{TF_i\,(k_1+1)}{K + TF_i} \cdot \log \frac{p_i(1-u_i)}{u_i(1-p_i)}$$
The final formula is called BM25, achieving top TREC performance (BM: best match). CS@UVa CS 4501: Information Retrieval 37

38 The BM25 formula (Okapi TF/BM25 TF). (Figure omitted: the annotated scoring formula.) The RSJ term weight becomes IDF when no relevance info is available. CS@UVa CS 4501: Information Retrieval 38

39 The BM25 formula, a closer look:
$$rel(q,D) = \sum_{i=1}^{n} IDF(q_i) \cdot \frac{tf(q_i,D)\,(k_1+1)}{tf(q_i,D) + k_1\big(1 - b + b\,\frac{|D|}{avgdl}\big)} \cdot \frac{qtf(q_i)\,(k_2+1)}{qtf(q_i) + k_2}$$
The first fraction is the TF-IDF component for the document; the second is the TF component for the query. $k_1$ is usually set in [1.2, 2.0], $b$ is usually set to 0.75, and $k_2$ is usually set in (0, 1000]. A vector space model with a TF-IDF schema! CS@UVa CS 4501: Information Retrieval 39
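
Putting the pieces together, a compact BM25 scorer under the no-relevance-info IDF; the naive whitespace tokenization and the toy collection statistics are assumptions for illustration:

```python
import math
from collections import Counter

def bm25(query, doc, df, n_docs, avgdl, k1=1.2, b=0.75, k2=100.0):
    """rel(q,D) = sum over query terms of
    IDF * (document TF component) * (query TF component)."""
    tf_d = Counter(doc.split())
    tf_q = Counter(query.split())
    dl = sum(tf_d.values())
    score = 0.0
    for term, qtf in tf_q.items():
        if term not in tf_d or term not in df:
            continue
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
        tf = tf_d[term]
        doc_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
        query_part = qtf * (k2 + 1) / (qtf + k2)
        score += idf * doc_part * query_part
    return score

df = {"probabilistic": 5, "ranking": 12, "principle": 8}
print(bm25("probabilistic ranking principle",
           "the probabilistic ranking principle in ir",
           df, n_docs=1000, avgdl=6.0))
```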

40 Extensions of doc generation models. Capture term dependence [Rijsbergen & Harper 78]; alternative ways to incorporate TF [Croft 83, Kalt 96]; feature/term selection for feedback [Okapi's TREC reports]; estimate the relevance model based on pseudo feedback [Lavrenko & Croft 01] — to be covered later. CS@UVa CS 4501: Information Retrieval 40

41 Query generation models.
$$Odds(R=1|Q,D) = \frac{P(Q,D|R=1)}{P(Q,D|R=0)} = \frac{P(Q|D,R=1)\,P(D|R=1)}{P(Q|D,R=0)\,P(D|R=0)} \propto P(Q|D,R=1)\,\frac{P(D|R=1)}{P(D|R=0)}$$
assuming $P(Q|D,R=0) \approx P(Q|R=0)$, i.e., the query is independent of the document given non-relevance. $P(Q|D,R=1)$ is the query likelihood $p(q|d)$, and $P(D|R=1)/P(D|R=0)$ acts as a document prior. Assuming a uniform document prior, we have $Odds(R=1|Q,D) \propto P(Q|D,R=1)$. Now the question is how to compute $P(Q|D,R=1)$. Generally this involves two steps: (1) estimate a language model based on $D$; (2) compute the query likelihood according to the estimated model. Language models — we will cover them in the next lecture! CS@UVa CS 4501: Information Retrieval 41

42 What you should know. Essential concepts in probability; justification of ranking by relevance; derivation of the RSJ model; maximum likelihood estimation; the BM25 formula. CS@UVa CS 4501: Information Retrieval 42

43 Today's reading. Chapter 11: Probabilistic information retrieval. 11.2 The Probability Ranking Principle; 11.3 The Binary Independence Model; Okapi BM25: a non-binary model. CS@UVa CS 4501: Information Retrieval 43
