A General Magnitude-Preserving Boosting Algorithm for Search Ranking


Chenguang Zhu 1*, Weizhu Chen 2, Zeyuan Allen Zhu 3, Gang Wang 2, Dong Wang 1, Zheng Chen 2
1 Institute for Theoretical Computer Science, Tsinghua University, Beijing, China 100084, {zcg.cs60, wd890415}@gmail.com
2 Microsoft Research Asia, No. 49 Zhichun Road, Haidian District, Beijing, China 100080, {v-chezhu, wzchen, v-zezhu, gawa, v-dongmw, zhengc}@microsoft.com
3 Fundamental Science Class, Department of Physics, Tsinghua University, Beijing, China 100084, zhuzeyuan@hotmail.com
* This work was done when the first author was visiting Microsoft Research Asia.

ABSTRACT
Traditional boosting algorithms for ranking problems usually employ the pairwise approach and convert the document rating preference into a binary-value label, like RankBoost. However, such a pairwise approach ignores the information about the magnitude of preference in the learning process. In this paper, we present the directed distance function (DDF) as a substitute for binary labels in the pairwise approach to preserve the magnitude of preference, and propose a new boosting algorithm called MPBoost, which applies GentleBoost optimization and directly incorporates DDF into the exponential loss function. We give the boundedness property of MPBoost through theoretic analysis. Experimental results demonstrate that MPBoost not only leads to better NDCG accuracy as compared to state-of-the-art ranking solutions on both public and commercial datasets, but also has good properties of avoiding the overfitting problem in the task of learning ranking functions.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; I.2.6 [Artificial Intelligence]: Learning

General Terms
Algorithms, Performance, Experimentation

Keywords
Directed distance function (DDF), magnitude-preserving, pairwise-preference

1. INTRODUCTION
Boosting [9, 10] is one of the state-of-the-art algorithms in machine learning. Based on boosting, many algorithms have been proposed to solve supervised learning tasks, including [10] for classification, [24] for facial recognition and [12] for spam filtering. All of these have theoretically and empirically demonstrated that the boosting algorithm and its variants have excellent advantages in terms of convergence of the loss function, low generalization error, little restriction on the form of weak learners, etc.

As an interesting and significant application, boosting is also successfully employed in the learning to rank problem for Web search, where a set of queries Q and a set of documents D_q for each query q ∈ Q are given. For each document of a given query, there is also a rating r to encode the relevance level of this document. The current popular strategy to apply boosting for ranking, like RankBoost [7, 8], is to regard the pairwise preference information as binary classification data, which belongs to the pairwise approach for learning to rank [11]. Specifically, if document x_1 is rated higher than another document x_2 with respect to the same given query q, then (x_1, x_2) is labeled as a positive instance. Otherwise, (x_1, x_2) is labeled as a negative instance. However, this kind of pairwise transformation from ranking to classification neglects the scale or magnitude of the difference between rating pairs, like the magnitude value of r_1 - r_2. As a matter of fact, far richer representations can be achieved if the magnitude of rating differences can be considered in the boosting algorithm. Therefore, designing a new boosting algorithm for ranking with consideration of the magnitude of difference between rating pairs is very desirable. Some attempts have been made to address similar issues.
In the RankBoost algorithm [7, 8], the magnitude issue is mentioned in the introduction of the feedback function Φ, but later the theoretical and empirical analysis focus on the case where Φ ∈ {−1, 0, 1}, which is under the binary-label model.

[21] leveraged multiple hyperplanes to preserve the magnitude of rating differences on the basis of the RankSVM algorithm [13, 17] and demonstrated the importance of preference magnitude. More recently, [4] analyzed the stability bounds of magnitude-preserving loss functions for generalization error and proposed two magnitude-preserving ranking algorithms, MPRank and SVRank, with reports of improvement on mis-ordering loss.

In this paper, we observe that it is fairly straightforward to apply the preference magnitude within the exponential loss function of boosting to improve the accuracy of ranking. To preserve the magnitude of rating differences, we propose the directed distance function (DDF) as the substitute for binary labels in the pairwise approach to ranking. The exact form of DDF can vary with different representations under only two basic requirements. Thus, we present three kinds of DDFs and list their appropriate scopes of usage. Then, on the basis of DDF, we propose a novel ranking algorithm, MPBoost, based on GentleBoost [10]. It directly leverages the exponential loss function with DDF as the substitute for binary labels, which makes this algorithm suitable for magnitude-preserving ranking. We also prove a theorem about its effectiveness on the training set. Experimental results on two public and one commercial datasets all illustrate that MPBoost with DDF can significantly outperform traditional pairwise ranking algorithms as well as state-of-the-art ranking methods like ListNet. The experiments also demonstrate that the application of DDF can lead MPBoost to avoid the overfitting problem.

The rest of the paper is organized as follows. In Section 2 we first review some related works. In Section 3 we present the concept of magnitude-preserving labels, the directed distance function (DDF), with three exemplary functions. In Section 4 we propose the MPBoost algorithm, which applies exponential loss functions with DDF. We report the experimental results in Section 5 and conclude the paper in Section 6.

2. RELATED WORKS
2.1 Learning to Rank
Learning to rank is a popular topic in both machine learning and information retrieval research. One of the main approaches to the ranking problem is referred to as the pairwise approach. In the pairwise approach, the learning to rank task is transformed into a binary classification task based on document pairs (whether the first document or the second should be ranked first given a query). [13, 17] proposed using SVM techniques to build the classification model, which is referred to as RankSVM. [7, 8] proposed performing the task in a similar way but via the AdaBoost algorithm. [1] also adopted the approach and developed a method called RankNet, which leverages the cross entropy as the loss function and gradient descent as the algorithm to train a neural network model.

Recently, the concept of magnitude-preserving ranking was introduced in [4, 21]. [21] leveraged multiple hyperplanes to preserve the magnitude of rating differences on the basis of RankSVM and proposed a method called Multiple Hyperplane Ranker (MHR). In [4], the authors analyzed the stability bounds of magnitude-preserving loss functions for generalization error. Based on the results, they proposed two algorithms, MPRank and SVRank, and reported empirical results which showed improvements on mis-ordering loss. However, the loss functions in [4] are regularization-based, and the σ-admissibility requirement on the cost function limits the forms of functions to some extent. For example, exponential cost functions can hardly meet the requirement with a small constant σ.

2.2 Boosting
AdaBoost [9] is a state-of-the-art classification algorithm which stage-wise combines a number of weak learners and generates a strong hypothesis. [10, 22] analyzed the theoretical advantage of the boosting algorithm from the aspect of margin and VC-dimension. [20] presents the abstraction for different versions of boosting algorithms.
In real applications, GentleBoost [10] is a variation of the boosting algorithm which employs the Newton step to minimize the exponential loss function. It has been shown in [10] that GentleBoost has similar performance to AdaBoost and often outperforms the latter, especially when stability is an issue.

3. MAGNITUDE-PRESERVING LABELS
Since our work is under the pairwise ranking approach, we will first walk through the basic concepts in pairwise learning to rank. Next, we will propose the directed distance function to preserve the magnitude of rating differences and illustrate three examples.

3.1 The Pairwise Approach for Ranking
The pairwise approach for ranking is defined as follows in [3]. A ranking dataset includes a set of queries q ∈ Q and a set of documents x ∈ D_q for each query. The associated relevance rating of document x for query q is represented as r_q. The relevance ratings are discrete and ordered, with values such as {Probably Relevant, Possibly Relevant, Not Relevant}. Furthermore, there exists a total ordering > between the various relevance levels, e.g. Probably Relevant > Possibly Relevant > Not Relevant. In this paper, we assume that numerical rating values are available, because ranking performance measurements often take ratings as a component in calculation. Also, for the learning tasks, documents are represented by a vector of feature weights obtained by some query-document function Φ. For instance, the document x for the query q is represented as $\Phi(q, x) = x_q = (x_{q1}, \ldots, x_{qd})$.

The goal of the learning to rank procedure in the pairwise framework is to introduce a score function $s(\cdot)$ over the document space such that:

$$ r_{qi} > r_{qj} \;\Rightarrow\; s(x_{qi}) > s(x_{qj}) \qquad (1) $$

i.e., if document $x_{qi}$ is preferred over $x_{qj}$, the score value for $x_{qi}$ should be larger than that for $x_{qj}$. This clarifies the relationship between pairwise-preference learning and binary classification: a classifier can be introduced to maintain the given preference relations on the left of inequality (1). Thus, for the pairwise approach, we define a preference set containing document pairs for each query q:

$$ S_q = \{((x_{qi}, x_{qj}), y_{qij}) \mid r_{qi} \neq r_{qj}\} $$

where the binary label $y_{qij}$ satisfies:

$$ y_{qij} = \begin{cases} +1 & r_{qi} > r_{qj} \\ -1 & r_{qj} > r_{qi} \end{cases} \qquad (2) $$
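As a concrete illustration of this construction, a minimal Python sketch follows. It is ours, not part of the paper; the function and variable names are hypothetical.

```python
from itertools import combinations

def binary_preference_set(docs, ratings):
    """Build the preference set S_q of equation (2) for one query.

    docs    -- list of document feature vectors x_qi
    ratings -- list of numerical ratings r_qi, aligned with docs
    Pairs with equal ratings are skipped, matching the condition r_qi != r_qj.
    """
    pairs = []
    for i, j in combinations(range(len(docs)), 2):
        if ratings[i] != ratings[j]:
            # y_qij = +1 if r_qi > r_qj, -1 otherwise (equation (2))
            y = 1 if ratings[i] > ratings[j] else -1
            pairs.append(((docs[i], docs[j]), y))
    return pairs
```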

3.2 Directed Distance Function (DDF)
Although the binary label defined in (2) is suitable for leveraging well-studied classification tools, this transformation from ranking to classification also loses a considerable amount of information. For example, mis-ranking two documents with ratings 5 and 1 should receive more punishment than mis-ranking two documents with ratings 5 and 4, because the former will cause a larger decrease in ranking measurements like NDCG [16]. In this sense, the binary label constructed by (2) only reflects the desired order of two documents, omitting the useful information hidden in the magnitude of rating differences. On the other hand, far richer representations can be achieved if the magnitude of rating differences is considered. Therefore, we need to modify the traditional definition of labels in (2) to preserve the magnitude information.

Specifically, we define the label to be the directed distance from one rating to another, $dist(r_a, r_b)$, which depicts the impetus of placing a document with rating $r_a$ before another document with rating $r_b$. As shown in Figure 1, although the concrete values of directed distances can vary, the magnitude of rating differences should be preserved, as well as the advantage of placing highly-rated documents in front. Now we formally present the requirements (3) and (4) on the concept of the directed distance function (DDF), $dist(\cdot, \cdot)$, which will be used as an extension of the traditional binary labels (2) in the subsequent analysis:

$$ \mathrm{sign}(dist(r_i, r_j)) = \mathrm{sign}(r_i - r_j) \qquad (3) $$

where $\mathrm{sign}(x) = \begin{cases} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0 \end{cases}$

$$ r_i - r_j > r_{i'} - r_{j'} \;\Rightarrow\; dist(r_i, r_j) > dist(r_{i'}, r_{j'}) \qquad (4) $$

Then DDF is directly applied as the substitute for binary labels, and the modified preference set is (5), which will be leveraged in the exponential loss function in our boosting approach for learning to rank:

$$ S_q' = \{((x_{qi}, x_{qj}), dist(r_{qi}, r_{qj})) \mid r_{qi} \neq r_{qj}\} \qquad (5) $$

Compared to the σ-admissibility presented in Definition 2 of [4], the basic requirements on DDF allow many more candidate functions in real applications. For instance, we will propose three possible directed distance functions which have performed superiorly in the empirical analysis.

Figure 1. Three ratings $r_a > r_b > r_c$, where the directed distances $dist(r_c, r_a)$ and $dist(r_a, r_b)$ are marked. The exact values of the distances can vary, but they should follow that: 1. $|dist(r_c, r_a)| > |dist(r_a, r_b)|$, to preserve the magnitude of rating differences; 2. $dist(r_c, r_a) < 0$ and $dist(r_a, r_b) > 0$, to present the advantage of placing documents with higher ratings in front.

Figure 2. Curves of LDD, LOGDD and LOGTDD under different values of rating differences. The parameters are α = 0.2, λ = 3 and β = 0.5.

3.2.1 Linear Directed Distance (LDD)
An intuitive integration of preference magnitude into DDF is a linear function based on the preference difference. We call this function the Linear Directed Distance (LDD) and show its equation in formula (6). The coefficient α is a positive constant for regularizing the scope of $dist(\cdot, \cdot)$ and is used to meet the requirements (3) and (4).

$$ dist(r_i, r_j) = \alpha (r_i - r_j) \qquad (6) $$

LDD is an easy and simple function for considering the preference magnitude. But when the numerical difference between ratings is large, it is very difficult to tune a good α value to map all rating differences into a proper interval. We will show these findings in the experiments. In this case, functions that can smooth large rating differences should help, as proposed in Sections 3.2.2 and 3.2.3.

3.2.2 Logarithmic Directed Distance (LOGDD)
The Logarithmic Directed Distance function (7) takes the form of logarithms in order to smooth the grading difference between ratings. Like LDD, LOGDD utilizes a positive parameter λ.

$$ dist(r_i, r_j) = \mathrm{sign}(r_i - r_j) \cdot \log(1 + \lambda |r_i - r_j|) \qquad (7) $$

Compared with LDD, LOGDD can be utilized when the rating values come from a large range or the grades of ratings are non-uniform. It is also obvious to note that, due to the logarithmic nature, the output range of LOGDD will eventually be smaller than that of LDD, and be more suitable to apply in the exponential loss function. On the other hand, LOGDD is much smoother than LDD in terms of the output value, as shown in Figure 2.
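Each of these distances is a one-line function. The following sketch (our own naming, not the authors' code) uses the parameter values of Figure 2 (α = 0.2, λ = 3, β = 0.5) as defaults and includes the logistic variant LOGTDD defined in Section 3.2.3 below:

```python
import math

def sign(x):
    """The three-valued sign function of requirement (3)."""
    return (x > 0) - (x < 0)

def ldd(r_i, r_j, alpha=0.2):
    """Linear Directed Distance, equation (6)."""
    return alpha * (r_i - r_j)

def logdd(r_i, r_j, lam=3.0):
    """Logarithmic Directed Distance, equation (7)."""
    return sign(r_i - r_j) * math.log(1.0 + lam * abs(r_i - r_j))

def logtdd(r_i, r_j, beta=0.5):
    """Logistic Directed Distance, equation (8) of Section 3.2.3 below."""
    return sign(r_i - r_j) / (1.0 + math.exp(-beta * abs(r_i - r_j)))
```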

3.2.3 Logistic Directed Distance (LOGTDD)
The Logistic Directed Distance function (8) leverages the well-studied logistic function. Like LDD and LOGDD, LOGTDD applies a positive parameter β.

$$ dist(r_i, r_j) = \mathrm{sign}(r_i - r_j) \cdot \frac{1}{1 + e^{-\beta |r_i - r_j|}} \qquad (8) $$

Compared with LDD and LOGDD, the output range of LOGTDD is always in $(-1, -0.5] \cup [0.5, 1)$, which makes LOGTDD smoother than both of the above functions. It follows that it is easier to tune the related parameter when considering the preference magnitude. Also note that when applying LOGTDD, the only parameter β should be carefully chosen to make $dist(r_i, r_j)$ non-negligibly different for disparate values of $r_i - r_j$. For example, if the grading differences between ratings are usually large, a relatively small β should be selected to avoid saturation, and we will use the validation dataset to tune this parameter in our experiments.

In summary, the introduction of the directed distance function is to substitute conventional binary-value labels and assign the magnitude-preserving property to a ranking algorithm, like the one we will introduce in Section 4.

4. MPBOOST ALGORITHM
Following the introduction to DDF in Section 3, we will present a novel boosting algorithm, MPBoost. Specifically, we apply the GentleBoost approach [10] to define the loss function and design the optimization method. For convenience, we combine the preference sets over all queries, $S = \bigcup_q S_q'$, and define the index set over $S$:

$$ I = \{(i, j) \mid ((x_i, x_j), dist(r_i, r_j)) \in S\} $$

Note that, for the sake of simplicity, we will sometimes omit the query subscript in the following discussion, i.e., whenever $(x_i, x_j)$ is involved, we assume that the documents $x_i$ and $x_j$ belong to the same query $q$ and $r_{qi} \neq r_{qj}$. Similarly, $dist(r_i, r_j)$ means $dist(r_{qi}, r_{qj})$.

Next, to leverage DDF in the MPBoost algorithm, we require that the condition

$$ |dist(r_i, r_j)| \leq 1 \qquad (9) $$

is satisfied, which can be attained by carefully setting the parameters within the applied directed distance functions, like α, λ and β in LDD, LOGDD and LOGTDD. This condition will be utilized in the following analysis. Now the loss function MPBoost employs is as follows:

$$ J(F) = \sum_{(i,j) \in I} e^{-dist(r_i, r_j)(F(x_i) - F(x_j))} \qquad (10) $$

where the strong hypothesis $F(x)$ is a score function based on an additive model. In other words, F is initially set to 0. Then in the t-th round, $F(x) \leftarrow F(x) + f_t(x)$, where $f_t(x)$ is called the weak learner in the t-th round. Thus, when the MPBoost algorithm ends after m rounds, $F(x) = \sum_{t=1}^{m} f_t(x)$.

Algorithm 1. MPBoost algorithm for generating a ranking function.
Input: query set Q and $\{(x_{qi}, r_{qi})\}_{i=1}^{n_q}$ for each q ∈ Q.
Output: ranking function F(x)
1: Generate $S = \bigcup_q \{((x_{qi}, x_{qj}), dist(r_{qi}, r_{qj})) \mid r_{qi} \neq r_{qj}\}$
2: Generate the index set $I = \{(i, j) \mid ((x_i, x_j), dist(r_i, r_j)) \in S\}$
3: Initialize $w_{ij}^{(1)} = 1/|I|$ for $(i, j) \in I$
4: for t = 1, ..., T do
5:   Fit the weak ranker $f_t$ such that $f_t = \arg\min_f J_{wse}(f) = \arg\min_f \sum_{(i,j) \in I} w_{ij}^{(t)} [dist(r_i, r_j) - (f(x_i) - f(x_j))]^2$
6:   Update: $w_{ij}^{(t+1)} = w_{ij}^{(t)} e^{-dist(r_i, r_j)(f_t(x_i) - f_t(x_j))} / Z_t$, where $Z_t = \sum_{(i,j) \in I} w_{ij}^{(t)} e^{-dist(r_i, r_j)(f_t(x_i) - f_t(x_j))}$
7: end for
8: Output the final ranking function $F(x) = \sum_{t=1}^{T} f_t(x)$

In order to minimize the loss function, MPBoost needs to find the best weak learner in each round. For example, if the current hypothesis is F and the next weak learner to be added is f, then the additive loss function should be:

$$ \begin{aligned} J(F + f) &= \sum_{(i,j) \in I} e^{-dist(r_i, r_j)[(F(x_i) + f(x_i)) - (F(x_j) + f(x_j))]} \\ &\approx \sum_{(i,j) \in I} e^{-dist(r_i, r_j)(F(x_i) - F(x_j))} \left( 1 - dist(r_i, r_j)(f(x_i) - f(x_j)) + \tfrac{1}{2} [dist(r_i, r_j)(f(x_i) - f(x_j))]^2 \right) \\ &\approx \tfrac{1}{2} \sum_{(i,j) \in I} e^{-dist(r_i, r_j)(F(x_i) - F(x_j))} \left( [dist(r_i, r_j) - (f(x_i) - f(x_j))]^2 + 1 \right) \end{aligned} \qquad (11) $$

where we apply a second-order Taylor approximation and the condition (9). Thus we can determine the best f to be added to F by minimizing J(F + f), which is equivalent to minimizing a weighted squared loss:

$$ J_{wse}(f) = \sum_{(i,j) \in I} w_{ij} [dist(r_i, r_j) - (f(x_i) - f(x_j))]^2 \qquad (12) $$

where $w_{ij} = e^{-dist(r_i, r_j)(F(x_i) - F(x_j))}$ is the weight assigned to each document pair.
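Putting the steps together, a hedged sketch of one possible implementation of Algorithm 1 follows (our own variable names and data layout; `fit_weak_ranker` stands for any routine minimizing the weighted squared loss (12), such as the decision stumps of the appendix):

```python
import math

def mpboost(pairs, dists, fit_weak_ranker, T=100):
    """Sketch of Algorithm 1, assuming:
    pairs -- list of document pairs (x_i, x_j) over all queries (index set I)
    dists -- list of directed distances dist(r_i, r_j), aligned with pairs
    fit_weak_ranker(pairs, dists, w) -- returns a scoring function f that
        minimizes the weighted squared loss J_wse of equation (12)
    """
    m = len(pairs)
    w = [1.0 / m] * m                               # step 3: uniform initial weights
    rankers = []
    for _ in range(T):                              # steps 4-7
        f = fit_weak_ranker(pairs, dists, w)        # step 5
        rankers.append(f)
        # step 6: exponential reweighting, normalized by Z_t (equation (13))
        w = [w_ij * math.exp(-d * (f(x_i) - f(x_j)))
             for w_ij, d, (x_i, x_j) in zip(w, dists, pairs)]
        z = sum(w)
        w = [w_ij / z for w_ij in w]
    return lambda x: sum(f(x) for f in rankers)     # step 8: F(x) = sum_t f_t(x)
```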

Now we formally present our algorithm, MPBoost, which is based on GentleBoost [10] with changes in the loss function calculation and the weight modification. In MPBoost, the initial weight of each document pair is uniform. During each iteration, a weak ranker is chosen to minimize (12). Then the weights are updated with the help of the normalizer $Z_t$:

$$ Z_t = \sum_{(i,j) \in I} w_{ij}^{(t)} e^{-dist(r_i, r_j)(f_t(x_i) - f_t(x_j))} \qquad (13) $$

The final ranking function is the summation over all weak rankers. The procedural details of MPBoost are presented in Algorithm 1. Note that MPBoost bears some resemblance to RankBoost, which applies the optimization scheme of AdaBoost. In the RankBoost framework, a bound on the ranking loss is offered in Theorem 1 of [8]. Now, with the employment of the GentleBoost approach and the DDF $dist(\cdot, \cdot)$, MPBoost still inherits the bounded ranking loss property of RankBoost. We formalize this in the following theorem.

Theorem 1. Assuming the notation of Algorithm 1, the normalized ranking loss (mis-ordering) of F is bounded:

$$ \sum_{\{(i,j) \mid r_i > r_j\}} w_{ij}^{(1)} [[F(x_i) \leq F(x_j)]] + \sum_{\{(i,j) \mid r_i < r_j\}} w_{ij}^{(1)} [[F(x_i) \geq F(x_j)]] \;\leq\; \prod_{t=1}^{T} Z_t $$

where $[[\pi]]$ is defined to be 1 if predicate π holds and 0 otherwise.

Proof: The proof is similar to the one for Theorem 1 in [8], but with the introduction of DDF. Note that $[[x \geq 0]] \leq e^{\alpha x}$ and $[[x \leq 0]] \leq e^{-\alpha x}$ hold for all α > 0 and all real x. Furthermore, $dist(r_i, r_j)$ has the same sign as $r_i - r_j$. Thus

$$ \begin{aligned} & \sum_{\{(i,j) \mid r_i > r_j\}} w_{ij}^{(1)} [[F(x_i) \leq F(x_j)]] + \sum_{\{(i,j) \mid r_i < r_j\}} w_{ij}^{(1)} [[F(x_i) \geq F(x_j)]] \\ &\leq \sum_{\{(i,j) \mid r_i > r_j\}} w_{ij}^{(1)} e^{-dist(r_i, r_j)(F(x_i) - F(x_j))} + \sum_{\{(i,j) \mid r_i < r_j\}} w_{ij}^{(1)} e^{-dist(r_i, r_j)(F(x_i) - F(x_j))} \\ &= \sum_{(i,j) \in I} w_{ij}^{(1)} e^{-dist(r_i, r_j)(F(x_i) - F(x_j))} = \sum_{(i,j) \in I} w_{ij}^{(T+1)} \prod_{t=1}^{T} Z_t = \prod_{t=1}^{T} Z_t \end{aligned} \qquad (14) $$

In view of the bound established in Theorem 1, we are guaranteed to produce a combined ranking with low ranking loss if on each round we choose a weak ranker $f_t$ to minimize $Z_t$. Actually, minimizing $J_{wse}(f)$ (12) is exactly minimizing a second-order Taylor approximation of $Z_t$ (13) when (9) is satisfied:

$$ \begin{aligned} Z_t &= \sum_{(i,j) \in I} w_{ij}^{(t)} e^{-dist(r_i, r_j)(f_t(x_i) - f_t(x_j))} \\ &\approx \sum_{(i,j) \in I} w_{ij}^{(t)} \left( 1 - dist(r_i, r_j)(f_t(x_i) - f_t(x_j)) + \tfrac{1}{2} [dist(r_i, r_j)(f_t(x_i) - f_t(x_j))]^2 \right) \\ &\approx \sum_{(i,j) \in I} w_{ij}^{(t)} \left( \tfrac{1}{2} [dist(r_i, r_j) - (f_t(x_i) - f_t(x_j))]^2 + \tfrac{1}{2} \right) = \tfrac{1}{2} J_{wse}(f_t) + \tfrac{1}{2} \end{aligned} \qquad (15) $$

Therefore, in theory the MPBoost algorithm can achieve a low mis-ordering loss (16) via a stage-wise gradient descent method. And it has been proved that minimizing the number of mis-orderings is equivalent to maximizing a lower bound on ranking performance metrics [6]. The empirical analysis in Section 5 also substantiates this result.

$$ \mathrm{MisOrder}(F) = \sum_{\{(i,j) \mid r_i > r_j\}} [[F(x_i) \leq F(x_j)]] + \sum_{\{(i,j) \mid r_i < r_j\}} [[F(x_i) \geq F(x_j)]] \qquad (16) $$
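For reference, the mis-ordering loss (16) that the theorem bounds can be computed directly. A small sketch follows (our own naming, with the rating pairs carried alongside each document pair):

```python
def mis_order(F, pairs, rating_pairs):
    """Mis-ordering loss of equation (16).

    F            -- scoring function
    pairs        -- list of document pairs (x_i, x_j)
    rating_pairs -- list of rating pairs (r_i, r_j), aligned with pairs
    """
    loss = 0
    for (x_i, x_j), (r_i, r_j) in zip(pairs, rating_pairs):
        if r_i > r_j and F(x_i) <= F(x_j):
            loss += 1
        elif r_i < r_j and F(x_i) >= F(x_j):
            loss += 1
    return loss
```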
5. EXPERIMENTAL RESULTS
5.1 Data Collections
We utilized three datasets in the experiments: OHSUMED [14], a benchmark dataset for document retrieval downloadable from LETOR 3.0 [18, 19]; Web-1, a Russian web search dataset [15]; and Web-2, an English web search dataset obtained from a popular search engine.

OHSUMED [14] is a collection for information retrieval research. It is a subset of MEDLINE, a database on medical publications. OHSUMED contains a total of 348,566 records (out of over 7 million) from 270 medical journals during the period of 1987-1991. The fields of a record embrace title, abstract, MeSH indexing terms, author, source and publication type. In OHSUMED there are 106 queries, each with a number of associated documents. Also in the dataset are a total of 16,140 query-document pairs, each of which is described by 45 features. In OHSUMED, relevance grades are from {0, 1, 2}. We conducted experiments on each of the 5 subfolders in OHSUMED, each containing training/validation/test data.

Web-1 is the public training data from the Internet Mathematics 2009 contest [15]. This dataset contains computed and normalized features of query-document pairs, as well as relevance judgments made by Yandex search engine assessors. There are 97,290 query-document pairs within a total of 9,124 queries. Each query-document pair is described by 245 features. All features are either binary values from {0, 1} or continuous values from [0, 1]. In Web-1, relevance grades are continuous values from the range [0, 4], with higher values representing higher relevance. We used five-fold cross validation with 3+1+1 splits between train/validation/test sets.

Web-2 is from a commercial English search engine. There are 50,000 query-document pairs within a total of 467 queries. Each query-document pair is described by 1779 features. In Web-2, relevance grades are from {0, 1, 2, 3, 4}. Again, we used five-fold cross validation with 3+1+1 splits between train/validation/test sets.
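A 3+1+1 five-fold rotation over queries could look like the following sketch; this is our own construction, since the paper does not specify how queries are assigned to folds:

```python
def five_fold_splits(queries):
    """Yield 3+1+1 train/validation/test splits over queries, rotating folds."""
    folds = [queries[i::5] for i in range(5)]       # five roughly equal folds
    for t in range(5):
        train = [q for i in range(3) for q in folds[(t + i) % 5]]
        valid = folds[(t + 3) % 5]
        test = folds[(t + 4) % 5]
        yield train, valid, test
```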

5.2 Performance Measures
In the experiments we apply the Normalized Discounted Cumulative Gain (NDCG) [16] as the performance measure. NDCG can handle multiple levels of relevance, and it favors algorithms that give higher ranks to highly relevant documents than to marginally relevant ones. Furthermore, a lower ranking position is of less value to this metric, since it has less chance to be examined by a user. In accordance with these principles, computing NDCG values follows four steps:
1. Compute the gain of each document.
2. Discount the gain of each document by its ranking position in the list.
3. Cumulate the discounted gains over the list.
4. Normalize the discounted cumulative gain of the list.
Therefore, the NDCG of a ranking list at position n is calculated as follows:

$$ N(n) = Z_n \sum_{j=1}^{n} \begin{cases} 2^{r(j)} - 1 & j = 1 \\ \dfrac{2^{r(j)} - 1}{\log_2 j} & j > 1 \end{cases} $$

where $r(j)$ is the rating of the j-th document in the list, and the normalization constant $Z_n$ is chosen so that a perfect ranking list receives an NDCG score of 1. The final NDCG score is the average over all queries. In addition, in the validation phase of our experiments, we set NDCG@5 as the criterion for selecting the best parameters.
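A direct transcription of this formula is sketched below; this is our own helper (LETOR ships its own official evaluation script), under the reconstruction of the discount given above:

```python
import math

def ndcg_at_n(ratings_in_rank_order, n):
    """NDCG@n for one query: ratings r(j) of the returned list, best-first.
    Position 1 is undiscounted; position j > 1 is discounted by log2(j)."""
    def dcg(rs):
        total = 0.0
        for pos, r in enumerate(rs[:n], start=1):
            gain = 2.0 ** r - 1.0
            total += gain if pos == 1 else gain / math.log2(pos)
        return total
    ideal = dcg(sorted(ratings_in_rank_order, reverse=True))
    return dcg(ratings_in_rank_order) / ideal if ideal > 0 else 0.0
```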
5.3 Experimental Results on NDCG
We present the ranking accuracy of the MPBoost algorithm on the three datasets, along with some state-of-the-art baseline ranking methods. Specifically, on the OHSUMED dataset we apply RankBoost [7, 8], ListNet [2] and AdaRank-NDCG [23] as the baseline methods. The measurements for these algorithms come from [19]. Also, we use MPBoost with binary labels as another baseline method, which is represented by MPBoost.BINARY in the figures. In the following figures, MPBoost.LDD, MPBoost.LOGDD and MPBoost.LOGTDD respectively represent the MPBoost algorithm with the DDF in forms (6), (7) and (8). These three versions of the algorithm leverage the magnitude-preserving property. The parameters tuned in the experiments include α in (6), λ in (7), β in (8), and the number of boosting rounds T. Note that for MPBoost, the condition (9) should be respected during the tuning. We use the validation set to tune these parameters independently. In the experiments, we leverage decision stumps as the weak ranker for the MPBoost algorithm. The definition and optimization process of decision stumps is presented in the appendix.

5.3.1 Experiments on OHSUMED
On OHSUMED, we conducted five-fold cross validation experiments using the data split provided in LETOR. Every ranking measurement was calculated as the mean over the five folds. As shown in Figure 3, the three versions of the magnitude-preserving MPBoost algorithm outperform nearly all baselines on NDCG@1 to NDCG@10, by gains of 1 point to 6 points. Although MPBoost.LDD and MPBoost.LOGDD lag behind ListNet and AdaRank-NDCG by about 0.5% in NDCG@1, MPBoost.LOGTDD consistently achieves the best NDCG accuracy across NDCG@1 to NDCG@10. Furthermore, MPBoost.BINARY, which is the MPBoost algorithm with binary labels, outperforms RankBoost on all metrics except for NDCG@6 and NDCG@7. Thus we leverage MPBoost.BINARY as the baseline in the following experiments.

5.3.2 Experiments on Web-1
We conducted experiments on Web-1 via five-fold cross validation, and the reported result is the average over the five folds. As shown in Figure 4, the advantage of MPBoost with magnitude-preserving loss functions is not as clear as that on OHSUMED. Still, an average NDCG advantage of 0.16% and 0.13% is gained by MPBoost.LOGDD and MPBoost.LOGTDD over MPBoost.BINARY. However, MPBoost.LDD lags behind MPBoost.BINARY by an average of 0.4% NDCG.

5.3.3 Experiments on Web-2
The experiment on Web-2 was also conducted via five-fold cross validation, and the reported result is the average over the five folds. From Figure 5 we can see that MPBoost with magnitude-preserving loss functions significantly outperforms the version with binary labels, by an average of 1.5% (MPBoost.LDD), 2.2% (MPBoost.LOGDD) and 1.8% (MPBoost.LOGTDD); MPBoost.LOGTDD performs the best at NDCG@1 and NDCG@2, while MPBoost.LOGDD achieves the best at the remaining positions.

5.4 Discussion
In this section we try to present some analysis in light of the consistent and excellent experimental results generated by our MPBoost algorithm as compared to the state-of-the-art ranking algorithms.

Firstly, the loss function (10) applied in MPBoost utilizes the concept of the directed distance function over ratings, thus preserving the magnitude of rating differences even in classification. In the experiments on three datasets, all three versions of MPBoost with DDF outperform MPBoost with binary labels. This substantiates that magnitude-preserving labels can lead the algorithm to grasp the inherent pattern in document vectors and, in general, yield higher performance.

Secondly, MPBoost pursues the ranking problem with a quasi-GentleBoost approach. In [10], GentleBoost was empirically shown to perform better than AdaBoost, while RankBoost is constructed on the basis of AdaBoost. Thus MPBoost.BINARY and the other three versions of MPBoost with the magnitude-preserving property consistently outperform RankBoost on the OHSUMED dataset.

It is interesting to observe that, among the three versions of MPBoost applying LDD, LOGDD and LOGTDD, the logarithm-based one achieved the best performance on two datasets, while the linear-based one falls behind the other two on all three datasets. We attribute this result to the ability to depict rating differences. The linear-based DDF, LDD, tends to output too-large values for large rating grades, thus forcing the parameter α to take pretty small values (e.g., the best α on the Web-2 dataset was on average 0.06).

The consequence is that the ability to differentiate smaller rating differences degrades. The logistic-based DDF, LOGTDD, has a similar problem in that the overall output interval is $(-1, -0.5] \cup [0.5, 1)$, and the saturation problem appears quite often. On the contrary, LOGDD generates a moderate range of output while preserving the original magnitude properly.

Figure 3. Ranking accuracies on the OHSUMED dataset.
Figure 4. Ranking accuracies on the Web-1 dataset.
Figure 5. Ranking accuracies on the Web-2 dataset.
Figure 6. Average NDCG@5 on the test set over 5 folds of the Web-2 dataset. For MPBoost.LDD, MPBoost.LOGDD and MPBoost.LOGTDD, the parameters α, λ and β are set as the ones achieving the best performance on the validation set.

5.5 Overfitting Issues
In the boosting method, overfitting is a thorny issue that needs to be handled [5]. Specifically, while the empirical error keeps decreasing in the training phase, the performance on the test set may degrade as the number of training rounds increases. In our experiments, we observe that MPBoost with binary labels suffers from overfitting, while the versions of MPBoost with the magnitude-preserving property perform well as training goes on.

To demonstrate this, in the previous experiment on Web-2 we recorded the performance of the trained model on the test set every 10 iterations. Figure 6 shows the average NDCG@5 on the test set over the five folds of the Web-2 dataset. As shown in the figure, MPBoost.BINARY suffers from a serious overfitting problem after peaking at around the 90th iteration. However, MPBoost.LDD, MPBoost.LOGDD and MPBoost.LOGTDD achieve comparatively stable ranking accuracies as the number of training rounds increases, and avoid the overfitting issue to some extent.

We attribute this phenomenon to the incompleteness of the traditional binary-label pairwise approach to ranking, combined with the distribution of the different magnitudes of ratings. On one hand, the calculation of metrics such as NDCG deals with the magnitude of ratings, not directly with the pairwise order. On the other hand, in the Web-2 dataset the relevance ratings come from {0, 1, 2, 3, 4}, which gives considerable impetus to place documents with ratings like 3 and 4 in front. Thus, by ignoring the magnitude of ratings and only retaining the relative order, binary labels lose a large amount of information during the transformation from ranking to classification. The trained model therefore deviates from correctly capturing the way to improve metrics like NDCG, and the test data substantiates this incompleteness.

On the contrary, the other three versions of magnitude-preserving MPBoost all apply loss functions with DDF. Thus, in the training phase, these three versions can learn a more suitable model to match the goal of ranking in the sense of improving NDCG. Hence the performance on the test set does not degrade after a sufficiently long time of training.

6. CONCLUSION AND FUTURE WORK
In this paper we propose a new approach to magnitude-preserving ranking: the directed distance function (DDF). Compared to previous schemes [4], DDF imposes less restriction on the form of functions while retaining the magnitude of rating differences. We also present three kinds of directed distance functions, LDD, LOGDD and LOGTDD, which can be applied under different circumstances due to their output ranges. The parameters in these DDFs can be easily adapted to meet the requirements of different ranking algorithms.

Based on DDF, we propose a new boosting method for the ranking problem: MPBoost. MPBoost incorporates the directed distance function into the exponential loss function and applies GentleBoost-like optimization. The ranking loss, or mis-ordering, of MPBoost is still bounded, like that of RankBoost, which is based on AdaBoost. Experimental results on three datasets indicate that the MPBoost method, when combined with magnitude-preserving DDF, outperforms binary-label-based MPBoost and existing state-of-the-art approaches like RankBoost, ListNet and AdaRank-NDCG. Furthermore, MPBoost with DDF tends to avoid overfitting in training.

For future work, we plan to study the theoretical advantage of our method. We also intend to apply mixed directed distances for different pairs of ratings, to more accurately depict the magnitude issue.

7. ACKNOWLEDGEMENT
The work of Chenguang Zhu was supported in part by the National Basic Research Program of China Grants 2007CB807900 and 2007CB807901, the National Natural Science Foundation of China Grants 60604033 and 60553001, and the Hi-Tech Research and Development Program of China Grant 2006AA10Z216. We would also like to express our sincere acknowledgement to Xiaohu Wu and Teng Gao for their great help with our work.

8. REFERENCES
[1] Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and Hullender, G. (2005). Learning to rank using gradient descent. Proceedings of ICML 2005, 89-96.
[2] Cao, Z., Qin, T., Liu, T. Y., Tsai, M.-F., and Li, H. (2007). Learning to Rank: From Pairwise Approach to Listwise Approach. Proceedings of ICML 2007, 129-136.
[3] Carvalho, V. R., Elsas, J. L., Cohen, W. W., and Carbonell, J. G. (2008). A Meta-Learning Approach for Robust Rank Learning. SIGIR 2008 Workshop on Learning to Rank for Information Retrieval (LR4IR 2008).
[4] Cortes, C., Mohri, M., and Rastogi, A. (2007). Magnitude-Preserving Ranking Algorithms. Proceedings of ICML 2007, 169-176.
[5] Dietterich, T. G. (1999). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting and randomization. Machine Learning, 40(2) (1999).
[6] Elsas, J., Carvalho, V. R., and Carbonell, J. G. (2008). Fast learning of document ranking functions with the committee perceptron. Proceedings of the ACM International Conference on Web Search and Data Mining, 2008.
[7] Freund, Y., Iyer, R., Schapire, R. E., and Singer, Y. (1998). An Efficient Boosting Algorithm for Combining Preferences. Proceedings of ICML 1998.
[8] Freund, Y., Iyer, R., Schapire, R. E., and Singer, Y. (2003). An Efficient Boosting Algorithm for Combining Preferences. Journal of Machine Learning Research, 4 (2003), 933-969.
[9] Freund, Y. and Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory: Eurocolt '95, 23-37.
[10] Friedman, J., Hastie, T., and Tibshirani, R. (2000). Additive Logistic Regression: a Statistical View of Boosting. Annals of Statistics 2000, Vol. 28.
[11] Fürnkranz, J. and Hüllermeier, E. (2003). Pairwise Preference Learning and Ranking. Proceedings of ECML 2003 (pp. 145-156).
[12] He, J. and Bo, T. (2007). Asymmetric gradient boosting with application to spam filtering. Proceedings of the Fourth Conference on Email and Anti-Spam, CEAS 2007.
[13] Herbrich, R., Graepel, T., and Obermayer, K. (1999). Support vector learning for ordinal regression. Proceedings of ICANN 1999, 97-102.
[14] Hersh, W. R., Buckley, C., Leone, T. J., and Hickam, D. H. (1994). OHSUMED: An interactive retrieval evaluation and new large test collection for research. Proceedings of SIGIR 1994, 192-201.
[15] Internet Mathematics Contest 2009 training data (Learning to Rank). http://download.yandex.ru/imat2009/imat2009.tar.bz2. Accessed 20 May 2009.
[16] Jarvelin, K. and Kekäläinen, J. (2000). IR evaluation methods for retrieving highly relevant documents. Proceedings of SIGIR 2000, 41-48.
[17] Joachims, T. (2002). Optimizing search engines using clickthrough data. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 133-142.
[18] Liu, T. Y., Qin, T., Xu, J., Xiong, W. Y., and Li, H. (2007). LETOR: Benchmark dataset for research on learning to rank for information retrieval. Proceedings of SIGIR 2007.
[19] Liu, T. Y. and Zhang, R. C. (2008). Learning to Rank (LETOR). http://research.microsoft.com/en-us/um/beijing/projects/letor/index.html. Accessed 17 April 2009.
[20] Mason, L., Baxter, J., Bartlett, P., and Frean, M. (2000). Boosting algorithms as Gradient Descent. Proceedings of NIPS 12, 512-518.
[21] Qin, T., Liu, T. Y., Lai, W., Zhang, X. D., Wang, D. S., and Li, H. Ranking with Multiple Hyperplanes. Proceedings of the 30th Annual International ACM SIGIR Conference (2007), 279-286.

[22] Schapire, R. E., Freund, Y., Bartlett, P. L., and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651-1686.
[23] Xu, J. and Li, H. AdaRank: A Boosting Algorithm for Information Retrieval. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2007), 391-398.
[24] Yang, P., Shan, S., Gao, W., Li, S., and Zhang, D. (2004). Face Recognition Using Ada-Boosted Gabor Features. IEEE Int. Conf. on Automatic Face and Gesture Recognition (FG 2004), 356-361.

9. APPENDIX - THE WEAK RANKER: DECISION STUMPS
The weak ranker based on decision stumps, $rd(x)$, is defined as follows:

$$ rd(x) = \begin{cases} a & x_k > \theta \\ 0 & x_k \leq \theta \end{cases} \qquad (17) $$

Here k is the serial number of the feature $rd(x)$ selects, θ is the threshold for this feature, and a and 0 are the only two possible values $rd(x)$ can take. We choose 0 here because, in the pairwise approach, only the difference between the two values $rd(x)$ can take matters. In other words, we only need $rd(x_i) - rd(x_j)$ in calculation.

To find the best $rd(x)$, we can iterate over k. When k is determined, we only have to iterate over (n + 1) values for θ: $-\infty, x_{1k}, x_{2k}, \ldots, x_{nk}$. Thus, the only problem is to find the best a when k and θ are determined. Notice that both $rd(x_i)$ and $rd(x_j)$ can possibly take two values, yielding 4 combinations. Therefore, we should split the sum into 4 cases. Define:

$$ \begin{aligned} A_1 &= \{(i,j) \mid x_{ik} > \theta,\; x_{jk} \leq \theta\} & \qquad A_2 &= \{(i,j) \mid x_{ik} > \theta,\; x_{jk} > \theta\} \\ B_1 &= \{(i,j) \mid x_{ik} \leq \theta,\; x_{jk} \leq \theta\} & \qquad B_2 &= \{(i,j) \mid x_{ik} \leq \theta,\; x_{jk} > \theta\} \end{aligned} \qquad (18) $$

Suppose the best a for a specific k and θ is $a_{k\theta}$. We have:

$$ \begin{aligned} J_{wse}(f) &= \sum_{A_1} w_{ij} \left( dist(r_i, r_j) - (a_{k\theta} - 0) \right)^2 + \sum_{A_2} w_{ij} \left( dist(r_i, r_j) - (a_{k\theta} - a_{k\theta}) \right)^2 \\ &\quad + \sum_{B_1} w_{ij} \left( dist(r_i, r_j) - (0 - 0) \right)^2 + \sum_{B_2} w_{ij} \left( dist(r_i, r_j) - (0 - a_{k\theta}) \right)^2 \\ &= \sum_{A_1} w_{ij} \left( dist(r_i, r_j) - a_{k\theta} \right)^2 + \sum_{A_2} w_{ij}\, dist(r_i, r_j)^2 + \sum_{B_1} w_{ij}\, dist(r_i, r_j)^2 + \sum_{B_2} w_{ij} \left( dist(r_i, r_j) + a_{k\theta} \right)^2 \end{aligned} \qquad (19) $$

Consequently, in order to minimize $J_{wse}(f)$, we take the partial derivative and obtain the best $a_{k\theta}$:

$$ a_{k\theta} = \frac{\sum_{A_1} w_{ij}\, dist(r_i, r_j) - \sum_{B_2} w_{ij}\, dist(r_i, r_j)}{\sum_{A_1} w_{ij} + \sum_{B_2} w_{ij}} \qquad (20) $$

Thus, after iterating through all possible k's and θ's, we can get the minimum $J_{wse}(f)$ and the corresponding best parameters. Then

$$ f_t(x) = \begin{cases} a_{k\theta} & x_k > \theta \\ 0 & x_k \leq \theta \end{cases} $$
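The exhaustive search of equations (17)-(20) translates into a short routine. The following is a sketch with our own data layout (documents as indexable feature vectors), not the authors' code:

```python
import math

def fit_stump(pairs, dists, w, n_features):
    """Weighted least-squares decision stump of equations (17)-(20).

    pairs, dists, w -- aligned lists, as in Algorithm 1
    Returns a ranking function f(x) = a if x[k] > theta else 0.
    """
    best_loss, best_kta = float("inf"), None
    for k in range(n_features):
        # candidate thresholds: -inf plus every observed value of feature k
        values = {x[k] for x_i, x_j in pairs for x in (x_i, x_j)}
        for theta in [-math.inf] + sorted(values):
            num = den = 0.0
            for (x_i, x_j), d, w_ij in zip(pairs, dists, w):
                if x_i[k] > theta >= x_j[k]:        # case A1 of (18)
                    num += w_ij * d
                    den += w_ij
                elif x_j[k] > theta >= x_i[k]:      # case B2 of (18)
                    num -= w_ij * d
                    den += w_ij
            if den == 0.0:
                continue
            a = num / den                           # closed form of equation (20)
            # weighted squared loss (19) for this choice of (k, theta, a)
            loss = sum(w_ij * (d - a * ((x_i[k] > theta) - (x_j[k] > theta))) ** 2
                       for (x_i, x_j), d, w_ij in zip(pairs, dists, w))
            if loss < best_loss:
                best_loss, best_kta = loss, (k, theta, a)
    k, theta, a = best_kta
    return lambda x: a if x[k] > theta else 0.0
```

This routine plugs directly into the `fit_weak_ranker` slot of the `mpboost` sketch in Section 4.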