Machine Learning for IR. Outline. Learning to Rank. MAP vs Accuracy. Mean Average Precision 3/9/2010. Information Retrieval as Structured Prediction


Information Retrieval as Structured Prediction
CS 6784, March 4th, 2010
Yisong Yue, Cornell University
Joint work with: Thorsten Joachims, Filip Radlinski, and Thomas Finley

Machine Learning for IR
- Machine learning often used (learning to rank)
- First generate features $x_{q,d} = \phi(q, d)$, for example:
  - q appears in title
  - q appears in first paragraph
  - q appears in anchor text linking to d
  - cos(q, d)
  - pagerank(d)

Learning to Rank
- Design a retrieval function $f(x) = w^T x$ (weighted average of features)
- For each query q:
  - Score all $s_{q,d} = w^T x_{q,d}$
  - Sort by $s_{q,d}$ to produce ranking
- Which weight vector $w$ is best?

Outline
- Optimizing ranking measures
  - Learning to Rank
  - Structured loss function
  - Mean average precision
- Diversified retrieval
  - Coverage problem
  - Structured prediction problem

Mean Average Precision
- Consider rank position of each relevant doc: $K_1, K_2, \ldots, K_R$
- Compute Precision@K for each $K_1, K_2, \ldots, K_R$
- Average precision = average of P@K
- Ex: [example ranking and its AvgPrec value not recoverable]
- MAP is Average Precision across multiple queries/rankings

MAP vs Accuracy
[Table residue: relevance labels (Rel?) for two rankings H, compared by MAP and Best Acc; values not recoverable]
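The Learning to Rank and Mean Average Precision slides can be made concrete with a small sketch. The function names (`rank_by_score`, `average_precision`, `mean_average_precision`) and the toy inputs are illustrative assumptions, not part of the lecture.

```python
from typing import List, Sequence


def rank_by_score(w: Sequence[float], features: Sequence[Sequence[float]]) -> List[int]:
    """Score each document with s = w^T x and return doc indices, best first."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda d: -scores[d])


def average_precision(relevance: Sequence[int]) -> float:
    """Average of Precision@K over the rank positions K of the relevant docs.

    `relevance` holds binary labels in ranked order (1 = relevant).
    """
    hits = 0
    precisions: List[float] = []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # Precision@K at a relevant position
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """MAP: mean of average precision across several queries/rankings."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical ranking with relevant docs at positions 1, 3 and 5.
example_ap = average_precision([1, 0, 1, 0, 1])  # (1/1 + 2/3 + 3/5) / 3
```

Accuracy would treat every position equally, whereas average precision weights errors near the top of the ranking more heavily, which is the contrast the MAP vs Accuracy slide draws.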

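Optimizing Pairwise Agreements
[Figure: example ranking annotated with its number of pairwise disagreements]

Pairwise Preferences SVM
$$\arg\min_{w,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i,j} \xi_{i,j}$$
Such that:
$$w^T x_i \ge w^T x_j + 1 - \xi_{i,j} \quad \forall i,j : y_i > y_j, \qquad \xi_{i,j} \ge 0$$
- Large Margin Ordinal Regression [Herbrich et al., 1999]
- Can be reduced to $O(n \log n)$ time [Joachims, 2005]
- Pairs can be reweighted to more closely model IR goals [Cao et al., 2006]

MAP vs ROC-area
[Table residue: relevance labels (Rel?) for two rankings H, compared by MAP and ROC-area; values not recoverable]

Linear Discriminant for Ranking
- Let $x = (x_1, \ldots, x_n)$ denote candidate documents (features)
- Let $y_{jk} \in \{+1, -1\}$ encode pairwise rank orders
- Feature map is linear combination of documents:
$$\Psi(y, x) = \sum_{i:\text{rel}} \sum_{j:!\text{rel}} y_{ij}\,(x_i - x_j)$$
- Prediction made by sorting on document scores $w^T x_i$:
$$\hat{y} = \arg\max_{y} \; w^T \Psi(y, x)$$

Structural SVM for MAP
- Let $x$ denote a structured input (candidate documents)
- Let $y$ denote a structured output (ranking)
- Standard objective function: minimize
$$\frac{1}{2}\|w\|^2 + \frac{C}{N} \sum_{i} \xi_i$$
- Constraints are defined for each incorrect labeling $y'$ over the set of documents $x$. Subject to
$$\forall y' \ne y^{(i)} : \quad w^T \Psi(y^{(i)}, x^{(i)}) \ge w^T \Psi(y', x^{(i)}) + \Delta(y') - \xi_i$$
where
$$\Psi(y, x) = \sum_{i:\text{rel}} \sum_{j:!\text{rel}} y_{ij}\,(x_i - x_j) \quad (y_{ij} \in \{-1, +1\}) \qquad \text{and} \qquad \Delta(y') = 1 - \text{AvgPrec}(y')$$
- Sum of slacks is smooth upper bound on MAP loss.
[Yue, Finley, Radlinski, Joachims; SIGIR 2007]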
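As a rough illustration of the linear discriminant for ranking, the sketch below evaluates the joint score $w^T \Psi(y, x)$ for every ranking of four documents and checks that sorting by document score attains the maximum. It assumes the unnormalized feature map written as a double sum over relevant/non-relevant pairs; the document names and score values are hypothetical.

```python
from itertools import permutations


def avg_prec(order, relevant):
    """Average precision of a ranking `order` given the set of relevant docs."""
    hits, total = 0, 0.0
    for k, doc in enumerate(order, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def map_loss(order, relevant):
    """Delta(y') = 1 - AvgPrec(y')."""
    return 1.0 - avg_prec(order, relevant)


def joint_score(order, scores, relevant):
    """w^T Psi(y, x): sum over (relevant i, non-relevant j) of y_ij * (s_i - s_j),
    where s = w^T x and y_ij = +1 if i is ranked above j, else -1."""
    pos = {doc: rank for rank, doc in enumerate(order)}
    total = 0.0
    for i in relevant:
        for j in scores:
            if j in relevant:
                continue
            y_ij = 1 if pos[i] < pos[j] else -1
            total += y_ij * (scores[i] - scores[j])
    return total


scores = {"a": 2.0, "b": 1.5, "c": 1.0, "d": -1.0}  # hypothetical w^T x_i values
relevant = {"a", "c"}                                # hypothetical labels

by_score = tuple(sorted(scores, key=scores.get, reverse=True))
best = max(joint_score(p, scores, relevant) for p in permutations(scores))
```

Here the score-sorted ranking places non-relevant `b` above relevant `c`, so it maximizes the joint score while still incurring a non-zero MAP loss; that gap is what the structural SVM constraints penalize through the slack variables.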


Too Many Constraints!
- For Average Precision, the true labeling is a ranking where the relevant documents are all ranked in the front, e.g., [figure]
- An incorrect labeling would be any other ranking, e.g., [figure]
- This ranking has Average Precision of about 0.8 with $\Delta(y') \approx 0.2$
- Intractable number of rankings, thus an intractable number of constraints!

Cutting Plane Training
- Original SVM Problem
  - Exponential constraints
  - Most are dominated by a small set of important constraints
- Structural SVM Approach
  - Repeatedly finds the next most violated constraint
  - until set of constraints is a good approximation.

Finding Most Violated Constraint
$$\arg\max_{y'} \; \Delta(y') + \sum_{i:\text{rel}} \sum_{j:!\text{rel}} y'_{ij}\,(w^T x_i - w^T x_j)$$
Observations
- MAP is invariant on the order of documents within a relevance class
  - Swapping two relevant or non-relevant documents does not change MAP.
- Joint SVM score is optimized by sorting by document score, $w^T x$
- Reduces to finding an interleaving between two sorted lists of documents
[Yue et al., SIGIR 2007]

Finding Most Violated Constraint
$$\arg\max_{y'} \; \Delta(y') + \sum_{i:\text{rel}} \sum_{j:!\text{rel}} y'_{ij}\,(w^T x_i - w^T x_j)$$
- Start with perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
- Find the best feasible ranking of the non-relevant document
- Repeat for next non-relevant document
  - Never want to swap past previous non-relevant document
- Repeat until all non-relevant documents have been considered
[Yue et al., SIGIR 2007]

Proof (Sketch)
$$H(y') = \Delta(y') + \sum_{i:\text{rel}} \sum_{j:!\text{rel}} y'_{ij}\,(w^T x_i - w^T x_j)$$
- Assume relevant and non-relevant docs are sorted
- Define $\delta_j(i, i')$ as the change in $H$ when the highest ranked relevant document after $x_j$ changes from $x_i$ to $x_{i'}$
  - $i$ and $i'$ index relevant documents ($i < i'$)
  - $j$ indexes non-relevant document
- Need to show: [inequality between two $\delta$ terms; not recoverable]

Experiments
- Used TREC 9 & 10 Web Track corpus.
- Features of document/query pairs computed from outputs of existing retrieval functions. (Indri Retrieval Functions & TREC Submissions)
- Goal is to learn a recombination of outputs which improves mean average precision.
[Yue et al., SIGIR 2007]
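The observations above (only the interleaving of two score-sorted lists matters) suggest a simple way to check a solver on small inputs. The sketch below is a brute-force enumeration over all interleavings, not the efficient swap-based procedure described on the slides. The function name `most_violated_ranking` and the unnormalized joint score are assumptions made for illustration.

```python
from itertools import combinations


def most_violated_ranking(rel_scores, nonrel_scores):
    """Return (H, labels) maximizing H(y') = Delta(y') + w^T Psi(y', x).

    `rel_scores` / `nonrel_scores` are w^T x values (at least one relevant doc).
    `labels` gives 1/0 (relevant/non-relevant) per rank position; within each
    class, documents are assumed to appear in descending score order.
    """
    rel = sorted(rel_scores, reverse=True)
    non = sorted(nonrel_scores, reverse=True)
    n = len(rel) + len(non)
    best = None
    for rel_positions in combinations(range(n), len(rel)):
        slots = set(rel_positions)
        labels = tuple(1 if p in slots else 0 for p in range(n))

        # Delta(y') = 1 - AvgPrec(y')
        hits, ap_sum = 0, 0.0
        for k, lab in enumerate(labels, start=1):
            if lab:
                hits += 1
                ap_sum += hits / k
        delta = 1.0 - ap_sum / len(rel)

        # w^T Psi(y', x) = sum over pairs of y'_ij * (s_i - s_j)
        non_positions = [p for p in range(n) if p not in slots]
        score = 0.0
        for s_i, r_pos in zip(rel, rel_positions):
            for s_j, n_pos in zip(non, non_positions):
                y_ij = 1 if r_pos < n_pos else -1
                score += y_ij * (s_i - s_j)

        h = delta + score
        if best is None or h > best[0]:
            best = (h, labels)
    return best
```

With all scores equal the loss term dominates and the worst ranking is returned; with well-separated scores the perfect ranking is returned, meaning no constraint is violated beyond the margin already achieved. The enumeration is exponential in the number of documents, which is exactly why the slides look for an efficient interleaving algorithm.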


Comparison with other SVM methods
[Bar chart: Mean Average Precision by dataset for SVM-MAP, SVM-ROC, SVM-ACC, SVM-ACC2, SVM-ACC3, and SVM-ACC4. Datasets: TREC 9 Indri, TREC 10 Indri, TREC 9 Submissions, TREC 10 Submissions, TREC 9 Submissions (without best), TREC 10 Submissions (without best).]

Finding Most Violated Constraint
- Required for structural SVM training
- Depends on structure of loss function
- Depends on structure of the feature map
- Efficient algorithms exist despite intractable number of constraints
- More than one approach [Yue et al., 2007] [Chapelle et al., 2007]

Story so Far
- Optimizing ranking measures
  - Learning to Rank
  - Structured loss function
  - Mean average precision
- Diversified retrieval
  - Coverage problem
  - Structured prediction problem

Not Diversified
[Figure residue: "Bobby Kleinberg"; "the curious high school student"]

Choose top documents
[Figure: document coverage example contrasting the Individual Relevance selection with the Coverage Solution; the number of documents chosen and most document indices are not recoverable (D4 and D5 are legible)]

Submodular Functions
- For set $S$, $F : 2^S \to \mathbb{R}$ is submodular if
$$F(A \cup B) + F(A \cap B) \le F(A) + F(B)$$

Budgeted Maximum Coverage Problem
- Documents cover some amount of information
- Documents overlap in information covered
- Documents have uniform cost
- Select K docs that collectively maximize information
- Greedy has $(1 - 1/e)$ approximation bound
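A minimal sketch of the coverage idea, assuming each document is represented as a set of abstract information units. The document names and contents are hypothetical. It checks the submodularity inequality by brute force on a toy instance and runs the greedy selection for uniform-cost documents.

```python
from itertools import combinations


def coverage(docs, chosen):
    """F(S): number of distinct information units covered by the chosen docs."""
    covered = set()
    for d in chosen:
        covered |= docs[d]
    return len(covered)


def greedy_max_coverage(docs, k):
    """Pick k docs, each time adding the one with the largest coverage gain."""
    chosen = []
    for _ in range(min(k, len(docs))):
        remaining = [d for d in docs if d not in chosen]
        best = max(remaining, key=lambda d: coverage(docs, chosen + [d]))
        chosen.append(best)
    return chosen


def is_submodular(docs):
    """Brute-force check of F(A | B) + F(A & B) <= F(A) + F(B) on all subsets."""
    names = list(docs)
    subsets = [
        frozenset(c)
        for r in range(len(names) + 1)
        for c in combinations(names, r)
    ]
    return all(
        coverage(docs, a | b) + coverage(docs, a & b)
        <= coverage(docs, a) + coverage(docs, b)
        for a in subsets
        for b in subsets
    )


# Hypothetical candidate set: D3 is individually useful but redundant with D1.
docs = {
    "D1": {"a", "b", "c"},
    "D2": {"c", "d"},
    "D3": {"a", "b"},
    "D4": {"d", "e"},
}
selection = greedy_max_coverage(docs, 2)
```

The greedy choice skips the redundant document even though it covers as much on its own as the one selected second, which is the behaviour the diversified-retrieval slides are after.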

Diversity as Coverage Problem
- Given a good representation of information
  - Retrieve documents to maximize coverage

How to Represent Information?
- All the words (title words, anchor text, etc)
- Cluster memberships (topic models / dim reduction)
- Taxonomy memberships (ODP)
- Learning approach to automatically learn coverage representation
  - Used to make predictions on new test examples
  - Structural SVMs

Weighted Word Coverage
- More distinct words = more information
- Weight word importance
- Goal: select K documents which collectively cover as many distinct (weighted) words as possible
  - Greedy algorithm, $(1 - 1/e)$ approximation bound (submodular)
  - Need good weighting function (learning problem).
[Yue & Joachims, ICML 2008]

Example
[Table residue: Document Word Counts (documents D by words V, cells marked X), Word Benefit per word (V4 = 4 and V5 = 5 are legible), and Marginal Benefit of each document per greedy iteration with the best document chosen; remaining values not recoverable]

How to Weight Words?
- Not all words created equal
  - "the"
- Conditional on the query
  - "computer" is normally fairly informative
  - but not for the query "ACM"
- Weighting function based on the candidate set (for a query)

Prior Work
- Essential Pages [Swaminathan et al., 2008]
- Uses fixed function of word benefit
- Depends on word frequency in candidate set
  - Local version of TF-IDF
  - Frequent words low weight (not important for diversity)
  - Rare words low weight (not representative)
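The weighted word coverage procedure can be sketched as a greedy loop that records the marginal benefit of every remaining document at each iteration. The word benefits and document contents below are hypothetical stand-ins, since the values in the slide's example table did not survive extraction.

```python
def greedy_weighted_coverage(doc_words, benefit, k):
    """Greedily select k docs maximizing the total benefit of distinct covered words.

    Returns (selected, trace), where trace holds one (marginal_gains, best_doc)
    pair per iteration.
    """
    covered, selected, trace = set(), [], []
    for _ in range(k):
        # Marginal benefit: total weight of the words a doc would newly cover.
        gains = {
            d: sum(benefit[v] for v in words - covered)
            for d, words in doc_words.items()
            if d not in selected
        }
        if not gains:
            break
        best = max(gains, key=gains.get)
        trace.append((dict(gains), best))
        selected.append(best)
        covered |= doc_words[best]
    return selected, trace


# Hypothetical word benefits and document contents (not the slide's values).
benefit = {"v1": 1, "v2": 2, "v3": 3, "v4": 4, "v5": 5}
doc_words = {
    "D1": {"v3", "v4", "v5"},
    "D2": {"v2", "v4", "v5"},
    "D3": {"v1", "v2", "v3", "v4"},
}
selected, trace = greedy_weighted_coverage(doc_words, benefit, 2)
```

In this toy run the second-best document of the first iteration loses almost all of its marginal benefit once the first pick is made, so a different document is chosen next. Each word's benefit is counted only once, which is what makes the objective reward diversity.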

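Word Frequency Features
- $x = (x_1, x_2, \ldots, x_n)$ - candidate documents
- $v$ - an individual word
- $\phi(v, x)$ is a vector of binary features of the form (threshold percentages only partly legible):
  - [v appears in a given percentage of x]
  - [v appears in a given percentage of titles in x]
  - [v appears in a given percentage of anchors in x]
  - [v appears in a given percentage of meta in x]
  - ...
- We'll use thousands of such features
- Benefit of covering word $v$ is $w^T \phi(v, x)$
[Yue & Joachims, ICML 2008]

Structured Prediction for Maximizing Coverage
- $x = (x_1, x_2, \ldots, x_n)$ - candidate documents
- $y$ - subset of $x$ of size K (the prediction)
- $V(y)$ - union of words from $y$
- Discriminant Function:
$$w^T \Psi(x, y) = \sum_{v \in V(y)} w^T \phi(v, x)$$
- Benefit of covering word $v$ is $w^T \phi(v, x)$
$$\hat{y} = \arg\max_{y} \; w^T \Psi(x, y)$$
[Yue & Joachims, ICML 2008]

Structured Prediction for Maximizing Coverage
$$w^T \Psi(x, y) = \sum_{v \in V(y)} w^T \phi(v, x)$$
- Does NOT reward redundancy
  - Benefit of each word only counted once
- Greedy has $(1 - 1/e)$-approximation bound
- More sophisticated structure in experiments
- Train using structural SVM approach
  - Optimizes empirical risk & generalization bound
[Yue & Joachims, ICML 2008]

More Sophisticated Discriminant
- Documents cover words to different degrees
  - A document with 5 copies of "Microsoft" might cover it better than another document with fewer copies.
- Use multiple word sets, $V_1(y), V_2(y), \ldots, V_L(y)$
- Each $V_i(y)$ contains only words satisfying certain importance criteria.
[Yue, Joachims; ICML 2008]

More Sophisticated Discriminant
$$\Psi(x, y) = \begin{pmatrix} \sum_{v \in V_1(y)} \phi_1(v, x) \\ \vdots \\ \sum_{v \in V_L(y)} \phi_L(v, x) \end{pmatrix}$$
- Separate $\phi_i$ for each importance level $i$.
- Joint feature map $\Psi$ is vector composition of all $\phi_i$
$$\hat{y} = \arg\max_{y} \; w^T \Psi(x, y)$$
- Greedy has $(1 - 1/e)$-approximation bound.
- Still uses linear feature space.
[Yue, Joachims; ICML 2008]

Structural Support Vector Machine
- Let $x$ denote a structured input (candidate documents)
- Let $y$ denote a structured output (subset of size K)
- Standard SVM objective function:
$$\min_{w,\xi} \; \frac{1}{2}\|w\|^2 + \frac{C}{N} \sum_{i} \xi_i$$
- Constraints are defined for each incorrect labeling $y'$ over the set of documents $x$:
$$\forall y' \ne y^{(i)} : \quad w^T \Psi(x^{(i)}, y^{(i)}) \ge w^T \Psi(x^{(i)}, y') + \Delta(y') - \xi_i$$
[Tsochantaridis et al., 2005]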
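To illustrate how the word frequency features feed the discriminant, the sketch below computes a small binary feature vector $\phi(v, x)$ per word and sums $w^T \phi(v, x)$ over the distinct words covered by a chosen subset. The thresholds, weights, and toy documents are all hypothetical. The lecture mentions thousands of features and separate title, anchor, and meta fields, none of which are modeled here.

```python
THRESHOLDS = (0.1, 0.25, 0.5)  # hypothetical cut-offs, not the slide's values


def phi(word, candidates):
    """Binary features: does `word` appear in at least a fraction t of the docs?"""
    frac = sum(word in doc for doc in candidates) / len(candidates)
    return [1.0 if frac >= t else 0.0 for t in THRESHOLDS]


def discriminant(w, candidates, chosen):
    """w^T Psi(x, y): sum of w^T phi(v, x) over v in V(y), the union of words
    in the chosen documents. Each covered word contributes exactly once."""
    covered = set().union(*(candidates[i] for i in chosen))
    return sum(
        sum(wi * fi for wi, fi in zip(w, phi(v, candidates)))
        for v in covered
    )


# Hypothetical candidate set; each document is a set of words.
candidates = [
    {"svm", "rank"},
    {"svm", "map"},
    {"svm", "rank", "cover"},
    {"greedy"},
]
# Hypothetical learned weights: the last feature penalizes very frequent words.
w = [1.0, 0.5, -2.0]

redundant_pair = discriminant(w, candidates, [0, 1])
diverse_pair = discriminant(w, candidates, [1, 3])
```

Because a word shared by both chosen documents is counted once, the pair with more distinct informative words scores higher. In the structural SVM setting the weights `w` would be learned rather than set by hand.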

Weighted Subtopic Loss
- Example: three documents covering one, three, and two subtopics respectively
[Table residue: # Docs and Loss per subtopic; only the loss value 1/6 is legible]
- Motivation
  - Higher penalty for not covering popular subtopics
  - Mitigates label noise in the tail
[Yue & Joachims, ICML 2008]

Finding Most Violated Constraint
$$\hat{y} = \arg\max_{y'} \; \Delta(y') + w^T \Psi(x, y')$$
- Encode each subtopic as an additional word to be covered.
- Use greedy prediction to find approximate most violated constraint.

Approximate Constraint Generation
- Theoretical guarantees still hold.
  - Constant factor approximation to finding optimal cutting plane
  - $(1 - 1/e)$ approximation for solving coverage problems
- Performs well in practice.

Diversity Training Data
- TREC 6-8 Interactive Track
  - Queries with explicitly labeled subtopics
  - E.g., "Use of robots in the world today"
    - Nanorobots
    - Space mission robots
    - Underwater robots
  - Manual partitioning of the total information regarding a query

[Bar chart: Missing Subtopic Error Rate for Random, Okapi, Unweighted, Essential Pages, and SVM-div; trained & tested via cross validation; retrieving 5 documents]

Learning Coverage Representations
- Training set with gold standard labels
- Learn automatic representation
  - Does not require gold standard labels
- Maximize coverage on new problem instances
- Inverse of prediction problem
  - Given gold standard, can predict a good covering
  - Learn automatic representation that agrees with gold standard solution
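A small sketch of a weighted subtopic loss, under the assumption that each subtopic's weight is the number of documents covering it divided by the total number of (document, subtopic) coverings, and that the loss of a prediction is the total weight of the subtopics it misses. The document and subtopic names are hypothetical.

```python
from fractions import Fraction


def subtopic_weights(doc_subtopics):
    """Weight of each subtopic: share of all (doc, subtopic) coverings (assumed)."""
    counts = {}
    for topics in doc_subtopics.values():
        for t in topics:
            counts[t] = counts.get(t, 0) + 1
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in counts.items()}


def weighted_subtopic_loss(doc_subtopics, chosen):
    """Total weight of the subtopics not covered by the chosen documents."""
    weights = subtopic_weights(doc_subtopics)
    covered = set()
    for d in chosen:
        covered |= doc_subtopics[d]
    return sum((wt for t, wt in weights.items() if t not in covered), Fraction(0))


# Hypothetical candidate set with one popular and two rarer subtopics.
doc_subtopics = {
    "x1": {"t1"},
    "x2": {"t1", "t2", "t3"},
    "x3": {"t1", "t3"},
}
weights = subtopic_weights(doc_subtopics)
```

Missing the popular subtopic costs more than missing a rare one, so a rare subtopic that was labeled by mistake contributes little to the loss. This matches the stated motivation of mitigating label noise in the tail.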
