Learning to Identify Unexpected Instances in the Test Set
Xiao-Li Li, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore, xlli@i2r.a-star.edu.sg
Bing Liu, Department of Computer Science, University of Illinois at Chicago, 851 South Morgan Street, Chicago, IL, liub@cs.uic.edu
See-Kiong Ng, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore, skng@i2r.a-star.edu.sg

Abstract

Traditional classification involves building a classifier using labeled training examples from a set of predefined classes and then applying the classifier to classify test instances into the same set of classes. In practice, this paradigm can be problematic because the test data may contain instances that do not belong to any of the previously defined classes. Detecting such unexpected instances in the test set is an important issue in practice. The problem can be formulated as learning from positive and unlabeled examples (PU learning). However, current PU learning algorithms require a large proportion of negative instances in the unlabeled set to be effective. This paper proposes a novel technique to solve this problem in the text classification domain. The technique first generates a single artificial negative document A_N. The sets P and {A_N} are then used to build a naïve Bayesian classifier. Our experimental results show that this method is significantly better than existing techniques.

1 Introduction

Classification is a well-studied problem in machine learning. Traditionally, to build a classifier, a user first collects a set of training examples that are labeled with predefined or known classes. A classification algorithm is then applied to the training data to build a classifier that is subsequently employed to assign the predefined classes to instances in a test set (for evaluation) or to future instances (in practice). This paradigm can be problematic in practice because some of the test or future instances may not belong to any of the predefined classes of the original training set. The test set may contain additional unknown subclasses, or new subclasses may arise as the underlying domain evolves over time. For example, in cancer classification, the training set consists of data from currently known cancer subtypes.
However, since cancer is a complex and heterogeneous disease, and still a perplexing one to date, it is likely that the test data contain cancer subtypes that are not yet medically classified (they are therefore not covered in the training data). Even if the training data do contain all the current cancer subtypes, new subtypes may be formed at a later stage as the disease evolves due to mutations or other cancer-causing agents. This phenomenon is not uncommon even in seemingly simpler application domains. For example, in document classification, topics are often heterogeneous and new topics evolve over time. A document classifier built for classifying, say, computer science papers, could face similar problems as the cancer classifier described above. This is because computer science is a heterogeneous and increasingly cross-disciplinary domain; it is also a rapidly evolving one with new topics being created over time. Thus, a classifier created based on the notion of a fixed set of predefined classes is bound to be inadequate in the complex and dynamic real world in the long run, requiring the user to manually go through the classification results to remove the unexpected instances. In practice, a competent classifier should learn to identify unexpected instances in the test set so as to automatically set these unclassifiable instances apart. In some applications, this can be important in itself. For example, in the cancer example above, detection of the unexpected instances can alert the scientists that some new medical discovery (a new cancer subtype) may have occurred. In recent years, researchers have studied the problem of learning from positive and unlabeled examples (or PU learning). Given a positive set P and an unlabeled set U, a PU learning algorithm learns a classifier that can identify hidden positive documents in the unlabeled set U. Our problem of identifying unexpected instances in the test set can be modeled as a PU learning problem by treating all the training data as the positive set P and the test set as the unlabeled set U.
A classifier can then be learned using PU learning algorithms to classify the test set, identifying those unexpected (or negative) instances before applying a traditional classifier to classify the remaining instances into the original predefined classes. However, as the current PU techniques operate by trying to identify an adequate set of reliable negative data from the unlabeled set U to learn from, they require a large proportion of unexpected instances in the unlabeled set U to be effective. In practice, the number of unexpected instances in the test data can be very small since they are most likely to be arising from an emerging class. This means that the classifiers built with existing PU learning techniques will
perform poorly due to the small number of unexpected (negative) instances in U. In this paper, we propose a novel technique called LGN (PU Learning by Generating Negative examples), and we study the problem using text classification. LGN uses an entropy-based method to generate a single artificial negative document A_N based on the information in P and U, in which the features' frequency distributions correspond to their degrees of negativeness in terms of their respective entropy values. A more accurate classifier (we use the naïve Bayesian method) can be built to identify unexpected instances with the help of the artificial negative document A_N. Experimental results on the benchmark 20 Newsgroup data showed that LGN outperforms existing methods dramatically.

2 Related Work

PU learning was investigated by several researchers in recent years. A study of PAC learning from positive and unlabeled examples under the statistical query model was given in [Denis, 1998]. [Liu et al., 2002] reported sample complexity results and showed how the problem may be solved. Subsequently, a number of practical algorithms [Liu et al., 2002; Yu et al., 2002; Li and Liu, 2003] were proposed. They all conformed to the theoretical results in [Liu et al., 2002], following a two-step strategy: (1) identifying a set of reliable negative documents from the unlabeled set; and (2) building a classifier using EM or SVM iteratively. Their specific differences in the two steps are as follows. S-EM, proposed in [Liu et al., 2002], is based on naïve Bayesian classification and the EM algorithm [Dempster et al., 1977]. The main idea was to first use a spying technique to identify some reliable negative documents from the unlabeled set, and then to run EM to build the final classifier. PEBL [Yu et al., 2002] uses a different method (1-DNF) to identify reliable negative examples and then runs SVM iteratively to build a classifier. More recently, [Li and Liu, 2003] reported a technique called Roc-SVM. In this technique, reliable negative documents are extracted by using the information retrieval technique Rocchio [Rocchio, 1971], and SVM is used in the second step. In [Fung et al., 2005], a method called PN-SVM is proposed to deal with the situation when the positive set is small.
All these existing methods require that the unlabeled set have a large number of hidden negative instances. In this paper, we deal with the opposite problem, i.e., the number of hidden negative instances is very small. Another line of related work is learning from only positive data. In [Scholkopf et al., 1999], a one-class SVM is proposed. It was also studied in [Manevitz and Yousef, 2001] and [Crammer and Chechik, 2004]. One-class SVM builds a classifier by treating the training data as the positive set P. Those instances in the test set that are classified as negative by the classifier can be regarded as unexpected instances. However, our experiments show that its results are poorer than PU learning, which indicates that unlabeled data helps classification.

3 The Proposed Algorithm

Given a training set {c_i} (i = 1, 2, ..., n) of multiple classes, our target is to automatically identify those unexpected instances in a test set T that do not belong to any of the training classes. In the next subsection (Section 3.1), we describe a baseline algorithm that directly applies PU learning techniques to identify unexpected instances. Then, in Section 3.2, we present our proposed LGN algorithm.

3.1 Baseline Algorithms: PU Learning

To recapitulate, our problem of identifying unexpected instances in the test set can be formulated as a PU learning problem as follows. The training instances of all classes are first combined to form the positive set P. The test set T then forms the unlabeled set U, which contains both positive instances (i.e., those belonging to training classes) and negative/unexpected instances in T (i.e., those not belonging to any training class). Then, PU learning techniques can be employed to build a classifier to classify the unlabeled set U (test set T) to identify negative instances in U (the unexpected instances). Figure 1 gives the detailed framework for generating baseline algorithms based on PU learning techniques.

1. UE = ∅;
2. P = training examples from all classes (treated as positive);
3. U = T (test set); ignore the class labels in T if present;
4. Run an existing PU learning algorithm with P and U to build a classifier Q;
5. For each instance d in U (which is the same as T)
6.   Use the classifier Q to classify d
7.   If d is classified as negative then
8.     UE = UE ∪ {d};
9.
output UE

Figure 1. Directly applying existing PU learning techniques

In the baseline algorithm, we use a set UE to store the negative (unexpected) instances identified. Step 1 initializes UE to the empty set, while Steps 2-3 initialize the positive set P and the unlabeled set U as described above. In Step 4, we run an existing PU learning algorithm (various PU learning techniques can be applied to build different classifiers) to construct a classifier Q. We then employ the classifier Q to classify the test instances in U in Steps 5 to 8. Those instances that are classified by Q as the negative class are added to UE as unexpected instances. After we have iterated through all the test instances, Step 9 outputs the unexpected set UE.

3.2 The Proposed Technique: LGN

In traditional classification, the training and test instances are drawn independently according to some fixed distribution D over X × Y, where X denotes the set of possible documents in our text classification application, and Y = {c_1, c_2, ..., c_n} denotes the known classes. Theoretically, for each class c_i, if its training and test instances follow the same distribution, a classifier learned from the training instances can be used to classify the test instances into the n known classes. In our problem, the training set Tr, with instances from classes c_1, c_2, ..., c_n, is still drawn from the distribution D. However, the test set T consists of two subsets, T.P (called positive instances in T) and T.N (called unexpected/negative instances in T). The instances in T.P are independently drawn
from D, but the instances in T.N are drawn from an unknown and different distribution D_u. Our objective is to identify all the instances drawn from this unknown distribution D_u, or in other words, to identify all the hidden instances in T.N. Let us now formally reformulate this problem as a two-class classification problem without labeled negative training examples. We first rename the training set Tr as the positive set P by changing every class label c_i ∈ Y to + (the positive class). We then rename the test set T as the unlabeled set U, which comprises both hidden positive instances and hidden unexpected instances. The unexpected instances in U (or T) are now called negative instances, with the class label − (bear in mind that there are many hidden positive instances in U). A learning algorithm will select a function f from a class of functions F: X → {+, −} to be used as a classifier that can identify the unexpected (negative) instances from U. The problem here is that there are no labeled negative examples for learning. Thus, it becomes a problem of learning from positive and unlabeled examples (PU learning). As discussed in the previous section, this problem has been studied by researchers in recent years, but existing PU techniques perform poorly when the number of negative (unexpected) instances in U is very small. To address this, we will propose a technique to generate artificial negative documents based on the given data. Let us analyze the problem from a probabilistic point of view. In our text classification problem, documents are commonly represented by the frequencies of words w_1, w_2, ..., w_|V| that appear in the document collection, where V is called the vocabulary. Let w+ represent a positive word feature that characterizes the instances in P, and let w- represent a negative feature that characterizes negative (unexpected) instances in U. If U contains a large proportion of positive instances, then the feature w+ will have similar distributions in both P and U. However, for the negative feature w-, its probability distributions in the sets P and U are very different.
Our strategy is to exploit this difference to generate an effective set of artificial negative documents N so that it can be used together with the positive set P for classifier training to identify negative (unexpected) documents in U accurately. Given that we use the naïve Bayesian framework in this work, before going further, we now introduce the naïve Bayesian classifier for text classification.

NAÏVE BAYESIAN CLASSIFICATION

Naïve Bayesian (NB) classification has been shown to be an effective technique for text classification [Lewis and Gale, 1994; McCallum and Nigam, 1998]. Given a set of training documents D, each document is considered an ordered list of words. We use w_{d_i,k} to denote the word in position k of document d_i, where each word is from the vocabulary V = {w_1, w_2, ..., w_|V|}. The vocabulary is the set of all words we consider for classification. We also have a set of predefined classes, C = {c_1, c_2, ..., c_|C|}. In order to perform classification, we need to compute the posterior probability Pr(c_j | d_i), where c_j is a class and d_i is a document. Based on the Bayesian probability and the multinomial model, we have

  Pr(c_j) = Σ_{i=1}^{|D|} Pr(c_j | d_i) / |D|    (1)

and, with Laplacian smoothing,

  Pr(w_t | c_j) = (1 + Σ_{i=1}^{|D|} N(w_t, d_i) Pr(c_j | d_i)) / (|V| + Σ_{s=1}^{|V|} Σ_{i=1}^{|D|} N(w_s, d_i) Pr(c_j | d_i))    (2)

where N(w_t, d_i) is the count of the number of times the word w_t occurs in document d_i, and Pr(c_j | d_i) ∈ {0, 1} depending on the class label of the document. Finally, assuming that the probabilities of the words are independent given the class, we obtain the NB classifier:

  Pr(c_j | d_i) = Pr(c_j) Π_{k=1}^{|d_i|} Pr(w_{d_i,k} | c_j) / Σ_{r=1}^{|C|} Pr(c_r) Π_{k=1}^{|d_i|} Pr(w_{d_i,k} | c_r)    (3)

In the naïve Bayesian classifier, the class with the highest Pr(c_j | d_i) is assigned as the class of the document.

GENERATING NEGATIVE DATA

In this subsection, we present our algorithm to generate the negative data. Given that in a naïve Bayesian framework the conditional probabilities Pr(w_t | −) (Equation (2)) are computed based on the accumulative frequencies of all the documents in the negative class, a single artificial negative instance A_N would work equally well for Bayesian learning.
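As a concrete reference for Equations (1)-(3), multinomial NB with Laplacian smoothing can be sketched in a few lines. This is a minimal illustration of the standard formulas, not the paper's implementation; the function names and data layout are our own.

```python
import math
from collections import Counter

def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes with Laplacian smoothing (Equations (1)-(2)).
    docs: list of token lists; labels: parallel list of class names."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(docs) for c in classes}        # Equation (1)
    cond = {}
    for c in classes:
        counts = Counter(w for d, y in zip(docs, labels) if y == c for w in d)
        total = sum(counts[w] for w in vocab)
        # Equation (2): (1 + N(w, class)) / (|V| + sum over vocab of N(w_s, class))
        cond[c] = {w: (1 + counts[w]) / (len(vocab) + total) for w in vocab}
    return prior, cond, classes

def classify(doc, prior, cond, classes):
    """Assign the class with the highest posterior (Equation (3));
    summing log-probabilities avoids floating-point underflow on long documents."""
    score = {c: math.log(prior[c]) + sum(math.log(cond[c][w]) for w in doc)
             for c in classes}
    return max(score, key=score.get)
```

Since the denominator of Equation (3) is shared by all classes, comparing the (log) numerators suffices, which is what `classify` does.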
In other words, we need to generate the negative document A_N in such a way as to ensure Pr(w+ | +) − Pr(w+ | −) > 0 for a positive feature w+ and Pr(w- | +) − Pr(w- | −) < 0 for a negative feature w-. We use an entropy-based method to estimate whether a feature w in U has significantly different conditional probabilities in P and in U (i.e., Pr(w | +) and Pr(w | −)). The entropy equation is:

  entropy(w) = − Σ_{c ∈ {+, −}} Pr(w | c) · log(Pr(w | c))    (4)

The entropy values show the relative discriminatory power of the word features: the bigger a feature's entropy is, the more likely it has similar distributions in both P and U (i.e., it is less discriminatory). This means that for a negative feature w-, its entropy entropy(w-) is small, as Pr(w- | −) (w- mainly occurring in U) is significantly larger than Pr(w- | +), while entropy(w+) is large, as Pr(w+ | +) and Pr(w+ | −) are similar. The entropy (and its conditional probabilities) can therefore indicate whether a feature belongs to the positive or the negative class. We generate features for A_N based on the entropy information, weighted as follows:

  q(w_i) = 1 − entropy(w_i) / max_{j=1,2,...,|V|} entropy(w_j)    (5)

If q(w_i) = 0, it means that w_i uniformly occurs in both P and U, and we therefore do not generate w_i in A_N. If q(w_i) = 1, we can be almost certain that w_i is a negative feature, and we generate it for A_N based on its distribution in U. In this way, those features that are deemed more discriminatory will be generated more frequently in A_N. For those features with q(w_i) between the two extremes, their frequencies in A_N are generated proportionally. We generate the artificial negative document A_N as follows. Given the positive set P and the unlabeled set U, we compute each word feature's entropy value. The feature's frequency in the negative document A_N is then randomly generated following a Gaussian distribution according to q(w_i) = 1 − entropy(w_i) / max(entropy(w_j), w_j ∈ V). The detailed algorithm is shown in Figure 2.
1. A_N = ∅;
2. P = training documents from all classes (treated as positive);
3. U = T (test set); ignore the class labels in T if present;
4. For each feature w_i ∈ U
5.   Compute the frequency of w_i in each document d_k: freq(w_i, d_k), d_k ∈ U;
6.   Let the mean μ_i = Σ_{d_k ∈ D_i} freq(w_i, d_k) / |D_i|, where D_i ⊆ U is the set of documents containing w_i;
7.   Let the variance σ_i² = Σ_{d_k ∈ D_i} (freq(w_i, d_k) − μ_i)² / (|D_i| − 1);
8. For each feature w_i ∈ V
9.   Compute Pr(w_i | +) and Pr(w_i | −) using Equation (2), assuming that all the documents in U are negative;
10.  Let entropy(w_i) = − Σ_{c ∈ {+, −}} Pr(w_i | c) · log(Pr(w_i | c));
11. Let m = max(entropy(w_i)), i = 1, ..., |V|;
12. For each feature w_i ∈ V
13.   q(w_i) = 1 − entropy(w_i) / m;
14.   For j = 1 to |D_i| · q(w_i)
15.     Generate a frequency f_j(w_i) using the Gaussian distribution (1 / (σ_i √(2π))) e^{−(x − μ_i)² / (2σ_i²)};
16.     A_N = A_N ∪ {(w_i, f_j(w_i))};
17. Output A_N

Figure 2. Generating the negative document A_N

In the algorithm, Step 1 initializes the negative document A_N (which consists of a set of feature-frequency pairs) to the empty set, while Steps 2 and 3 initialize the positive set P and the unlabeled set U. From Step 4 to Step 7, for each feature w_i that appears in U, we compute its frequency in each document, and then calculate the frequency mean and variance over those documents D_i that contain w_i. This information is used to generate A_N later. From Step 8 to Step 10, we compute the entropy of w_i using Pr(w_i | +) and Pr(w_i | −) (which are computed using Equation (2)) by assuming that all the documents in U are negative. After obtaining the maximal entropy value in Step 11, we generate the negative document A_N in Steps 12 to 16. In particular, Step 13 computes q(w_i), which shows how negative a feature w_i is in terms of how different w_i's distributions in U and in P are: the bigger the difference, the higher the frequency with which we generate the feature. Steps 14 to 16 form an inner loop, and |D_i| · q(w_i) decides the number of times we generate a frequency for word w_i. Thus, if q(w_i) is small, it means that w_i has occurred in both P and U with similar probabilities, and we generate fewer occurrences of w_i. Otherwise, w_i is quite likely to be a negative feature, and we generate it with a distribution similar to the one in U. In each iteration, Step 15 uses a Gaussian distribution with the corresponding μ_i and σ_i to generate a frequency f_j(w_i) for w_i. Step 16 places the pair (w_i, f_j(w_i)) into the negative document A_N.
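The steps of Figure 2 can be sketched compactly as follows. This is our own minimal reading of the algorithm, applying Equation (2) once to P and once to U treated as all-negative; names such as `generate_AN` and details the paper leaves open (e.g. the log base) are assumptions.

```python
import math
import random
from collections import Counter

def cond_probs(docs, vocab):
    """Pr(w | class) via Equation (2), treating all of `docs` as one class."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts[w] for w in vocab)
    return {w: (1 + counts[w]) / (len(vocab) + total) for w in vocab}

def generate_AN(P, U, seed=0):
    """Generate the artificial negative document A_N (Figure 2).
    P, U: lists of token lists.  Returns a list of (word, frequency) pairs."""
    rng = random.Random(seed)
    vocab = {w for d in P + U for w in d}
    pr_pos = cond_probs(P, vocab)            # Pr(w|+), estimated from P
    pr_neg = cond_probs(U, vocab)            # Pr(w|-): all of U assumed negative
    # Steps 8-11: entropy of each feature (Equation (4)) and the maximum m
    entropy = {w: -sum(p * math.log(p) for p in (pr_pos[w], pr_neg[w]))
               for w in vocab}
    m = max(entropy.values())
    AN = []
    for w in vocab:
        q = 1 - entropy[w] / m               # Step 13, Equation (5)
        # Steps 5-7: mean/variance of w's frequency over docs in U containing w
        freqs = [d.count(w) for d in U if w in d]
        if not freqs:
            continue
        mu = sum(freqs) / len(freqs)
        var = (sum((f - mu) ** 2 for f in freqs) / (len(freqs) - 1)
               if len(freqs) > 1 else 0.0)
        # Steps 14-16: draw |D_i| * q(w) Gaussian frequencies for w
        for _ in range(int(len(freqs) * q)):
            AN.append((w, rng.gauss(mu, math.sqrt(var))))
    return AN
```

Note the boundary behavior: when P and U have identical word distributions, every feature attains the maximal entropy, q(w) = 0 throughout, and A_N comes out empty; the smaller a word's entropy relative to the maximum, the more (word, frequency) pairs it contributes.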
Finally, Step 17 outputs our generated negative document A_N. Note that the frequency for each feature in A_N may not be of an integer value, as it is generated by a Gaussian distribution. A_N is essentially a randomly generated aggregate document that summarizes the unlabeled data set, but with the features indicative of the positive class dramatically reduced.

BUILDING THE FINAL NB CLASSIFIER

Finally, we describe how to build an NB classifier with the positive set P and the generated single negative document A_N to identify unexpected document instances. The detailed algorithm is shown in Figure 3.

1. UE = ∅;
2. Build a naïve Bayesian classifier Q with P and {A_N} using Equations (1) and (2);
3. For each document d_i ∈ U
4.   Use Q to classify d_i using Equation (3);
5.   If Pr(− | d_i) > Pr(+ | d_i)
6.     UE = UE ∪ {d_i};
7. Output UE

Figure 3. Building the final NB classifier

UE stores the set of unexpected documents identified in U (or test set T), initialized to the empty set in Step 1. In Step 2, we use Equations (1) and (2) to build an NB classifier Q by computing the prior probabilities Pr(+) and Pr(−), and the conditional probabilities Pr(w_t | +) and Pr(w_t | −). Clearly, Pr(w_t | +) and Pr(w_t | −) can be computed based on the positive set P and the single negative document A_N respectively (A_N can be regarded as the average document of a set of virtual negative documents). However, the problem is how to compute the prior probabilities Pr(+) and Pr(−). It turns out that this is not a major issue: we can simply assume that we have generated a negative document set that has the same number of documents as the positive set P. We will report experimental results that support this in the next section. After building the NB classifier Q, we use it to classify each test document in U (Steps 3-6). The final output is the set UE, which stores all the identified unexpected documents in U.

4 Empirical Evaluation

In this section, we evaluate our proposed technique LGN. We compare it with both one-class SVM (OSVM; we used LIBSVM) and existing PU learning methods: S-EM [Liu et al., 2002], PEBL [Yu et al., 2002] and Roc-SVM [Li and Liu, 2003]. S-EM and Roc-SVM are publicly available. We implemented PEBL, as it is not available from its authors.
4.1 Datasets

For evaluation, we used the benchmark 20 Newsgroup collection, which consists of documents from 20 different UseNet discussion groups. The 20 groups were also categorized into 4 main categories: computer, recreation, science, and talk. We first performed the following two sets of experiments:

2-classes: This set of experiments simulates the case in which the training data has two classes, i.e., our positive set P contains two classes. The two classes of data were chosen
from two main categories, computer and science, in which the computer group has five subgroups, and the science group has four subgroups. Every subgroup consists of 1,000 documents. Each data set for training and testing is then constructed as follows: The positive documents for both training and testing consist of documents from one subgroup (or class) in computer and one subgroup (or class) in science. This gives us 20 data sets. For each class (or subgroup), we partitioned its documents into two standard subsets: 70% for training and 30% for testing. That is, each positive set P for training contains 1,400 documents of two classes, and each test set U contains 600 positive documents of the same two classes. We then add negative (unexpected) documents to U, which are randomly selected from the remaining 18 groups. In order to create different experimental settings, we vary the number of unexpected documents, which is controlled by a parameter α, a percentage of |U|; i.e., the number of unexpected documents added to U is α|U|.

3-classes: This set of experiments simulates the case in which the training data has three different classes, i.e., our positive set P contains three classes of data. We use the same 20 data sets formed above and added another class to each, for both P and U. The added third class was randomly selected from the remaining 18 groups. For each data set, the unexpected documents in U were then randomly selected from the remaining 17 newsgroups. All other settings were the same as for the 2-classes case.

4.2 Experimental Results

2-classes: We performed experiments using all possible c_1 and c_2 combinations (i.e., 20 data sets). For each technique, namely OSVM, S-EM, Roc-SVM, PEBL and LGN, we performed 5 random runs to obtain the average results. In each run, the training and test document sets from c_1 and c_2, as well as the unexpected document instances from the other 18 classes, were selected randomly. We varied α from 5% to 100%. Table 1 shows the classification results of the various techniques in terms of F-score (for the negative class) when α = 5%. The first column of Table 1 lists the 20 different combinations of c_1 and c_2. Columns 2 to 5 show the results of the four techniques OSVM, S-EM, Roc-SVM and PEBL respectively.
Column 6 gives the corresponding results of our technique LGN. We observe from Table 1 that LGN produces the best results consistently for all data sets, achieving an F-score of 77.0% on average, which is 54.8%, 32.8%, 60.2% and 76.5% higher than the F-scores of the four existing techniques (OSVM, S-EM, Roc-SVM and PEBL respectively) in absolute terms. We also see that LGN is highly consistent across different data sets. In fact, we checked the first step of the three existing PU learning techniques and found that most of the extracted negative documents were wrong. As a result, in their respective second steps, SVM and EM were unable to build accurate classifiers due to the very noisy negative data. Since the S-EM algorithm has a parameter, we tried different values, but the results were similar.

Table 1. Experimental results for α = 5%. Rows are the 20 data sets, each pairing one computer subgroup (graphic, os, mac.hardware, ibm.hardware, windows) with one science subgroup (crypt, electro, med, space); columns give the F-scores of OSVM, S-EM, Roc-SVM, PEBL and LGN, with an Average row at the bottom. (The numeric entries of the table were lost in extraction; the averages are summarized in the text.)

Figure 4 shows the macro-averaged results over all α values (from 5% to 100%) for all five techniques in the 2-classes experiments. Our method LGN outperformed all others significantly for α ≤ 60%. When α was increased to 80% and 100%, Roc-SVM achieved slightly better results than LGN. We also observe that OSVM, S-EM and Roc-SVM outperformed PEBL, since they were able to extract more reliable negatives than the 1-DNF method used in PEBL. PEBL needed a higher α (200%) to achieve similarly good results.

Figure 4. The comparison results with different percentages of unexpected documents in U in the 2-classes experiments (F-score vs. α from 5% to 100% for LGN, S-EM, Roc-SVM, PEBL and OSVM; the plotted values were lost in extraction).

3-classes: Figure 5 shows the 3-classes results, where LGN still performed much better than the other methods when the proportion of unexpected documents is small (α ≤ 60%), and comparably with S-EM and Roc-SVM when the proportion is larger.
OSVM's results are much worse than those of S-EM, Roc-SVM and LGN when α is larger, showing that PU learning is better than one-class SVM for this problem. Again, PEBL required a much larger proportion of unexpected documents to produce comparable results.
In summary, we conclude that LGN is significantly better (with high F-scores) than the other techniques when α is small (α ≤ 60%), which indicates that it can be used to effectively extract unexpected documents from the test set even in the challenging scenarios in which their presence in U is non-obvious. The other methods all failed badly when α is small. LGN also performed comparably when the proportion of unexpected instances is large (α ≥ 80%). Finally, we also conducted 10-classes experiments, in which ten different classes from both the 20 Newsgroups and Reuters collections (with the same experimental settings as for the 3-classes case) were used. The behaviors of the algorithms for 10 classes were the same as for 2 classes and 3 classes. Using the Reuters collection with 10 classes and α set to 5%, 10%, 15%, 20% and 40%, our algorithm LGN achieved 32.77%, 32.14%, 27.82%, 18.43% and 11.11% higher F-scores respectively than the best results of the existing methods (OSVM, S-EM, Roc-SVM and PEBL). Similarly, using 10 classes from the 20 Newsgroup collection, LGN achieved 10.56%, 4.80%, 5.46%, 6.20% and 4.00% higher F-scores for α = 5%, 10%, 15%, 20% and 40% of unexpected documents respectively than the best of the four other existing methods.

Effect of priors: Recall that in Section 3 we left the prior probabilities as a parameter, since we only generate a single artificial negative document. To check the effect of priors, we also varied the prior in our experiments by changing the proportion of negative documents as a percentage of the number of positive documents in P. We tried 40%, 60%, 80% and 100%. The results were virtually the same, with average differences only within ±1%. Thus, we simply chose 100% as the default of our system, which gives us Pr(+) = Pr(−) = 0.5. All the experimental results reported here were obtained using this default setting.

Figure 5. The comparison results with different percentages of unexpected documents in U in the 3-classes experiments (F-score vs. α from 5% to 100% for LGN, S-EM, Roc-SVM, PEBL and OSVM; the plotted values were lost in extraction).

5 Conclusion
In real-world classification applications, the test data may differ from the training data because unexpected instances that do not belong to any of the predefined classes may be present (or emerge in the long run), and they cannot be identified by traditional classification techniques. We have shown that the problem can be addressed by formulating it as a PU learning problem. However, directly applying existing PU learning algorithms performs poorly, as they require a large proportion of unexpected instances to be present in the unlabeled test data, which is often not the case in practice. We then proposed a novel technique, LGN, which identifies unexpected documents by generating a single artificial negative document to help train a classifier to better detect unexpected instances. Our experimental results in document classification demonstrate that LGN performed significantly better than existing techniques when the proportion of unexpected instances is low, and that the method is robust irrespective of the proportion of unexpected instances present in the test set. Although our current experiments were performed in the text classification application using an NB classifier, we believe that the approach is also applicable to other domains. Using a single artificial negative document, however, will not be suitable for other learning algorithms. In our future work, we plan to generate a large set of artificial documents so that other learning methods may also be applied.

References

[Crammer and Chechik, 2004] K. Crammer and G. Chechik. A needle in a haystack: local one-class optimization. ICML, 2004.
[Dempster et al., 1977] A. Dempster, N. Laird and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 1977.
[Denis, 1998] F. Denis. PAC learning from positive statistical queries. ALT, 1998.
[Denis et al., 2002] F. Denis, R. Gilleron and M. Tommasi. Text classification from positive and unlabeled examples. IPMU, 2002.
[Fung et al., 2005] G. Fung, J. Yu, H. Lu and P. Yu. Text classification without labeled negative documents. ICDE, 2005.
[Lewis and Gale, 1994] D. Lewis and W. Gale. A sequential algorithm for training text classifiers. SIGIR, 1994.
[Li and Liu, 2003] X. Li and B. Liu. Learning to classify text using positive and unlabeled data.
IJCAI, 2003.
[Liu et al., 2002] B. Liu, W. Lee, P. Yu and X. Li. Partially supervised classification of text documents. ICML, 2002.
[Manevitz and Yousef, 2001] L. Manevitz and M. Yousef. One-class SVMs for document classification. Journal of Machine Learning Research, 2, 2001.
[McCallum and Nigam, 1998] A. McCallum and K. Nigam. A comparison of event models for naïve Bayes text classification. AAAI, 1998.
[Muggleton, 2001] S. Muggleton. Learning from positive data. Machine Learning, 2001.
[Rocchio, 1971] J. Rocchio. Relevance feedback in information retrieval. In G. Salton (ed.), The SMART Retrieval System: Experiments in Automatic Document Processing, 1971.
[Scholkopf et al., 1999] B. Scholkopf, J. Platt, J. Shawe-Taylor, A. Smola and R. Williamson. Estimating the support of a high-dimensional distribution. Technical Report MSR-TR-99-87, Microsoft Research, 1999.
[Yu et al., 2002] H. Yu, J. Han and K. Chang. PEBL: Positive example based learning for Web page classification using SVM. KDD, 2002.
More informationMultigradient for Neural Networks for Equalizers 1
Multgradent for Neural Netorks for Equalzers 1 Chulhee ee, Jnook Go and Heeyoung Km Department of Electrcal and Electronc Engneerng Yonse Unversty 134 Shnchon-Dong, Seodaemun-Ku, Seoul 1-749, Korea ABSTRACT
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationENTROPIC QUESTIONING
ENTROPIC QUESTIONING NACHUM. Introucton Goal. Pck the queston that contrbutes most to fnng a sutable prouct. Iea. Use an nformaton-theoretc measure. Bascs. Entropy (a non-negatve real number) measures
More informationMachine Learning: and 15781, 2003 Assignment 4
ahne Learnng: 070 and 578, 003 Assgnment 4. VC Dmenson 30 onts Consder the spae of nstane X orrespondng to all ponts n the D x, plane. Gve the VC dmenson of the followng hpothess spaes. No explanaton requred.
More informationEvaluation for sets of classes
Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton
More informationECE 422 Power System Operations & Planning 2 Synchronous Machine Modeling
ECE 422 Power System Operatons & Plannng 2 Synhronous Mahne Moelng Sprng 219 Instrutor: Ka Sun 1 Outlne 2.1 Moelng of synhronous generators for Stablty Stues Synhronous Mahne Moelng Smplfe Moels for Stablty
More informationA New Thresholding Algorithm for Hierarchical Text Classification
A New Thresholdng Algorthm for Herarhal Text Classfaton Donato Malerba, Mhelangelo Ce, Mhele Lap, Gulo Altn Dpartmento d Informata, Unverstà degl Stud va Orabona, 4-716 Bar - Italy {malerba, e, lap}@d.unba.t
More informationExact Inference: Introduction. Exact Inference: Introduction. Exact Inference: Introduction. Exact Inference: Introduction.
Exat nferene: ntroduton Exat nferene: ntroduton Usng a ayesan network to ompute probabltes s alled nferene n general nferene nvolves queres of the form: E=e E = The evdene varables = The query varables
More informationThe corresponding link function is the complementary log-log link The logistic model is comparable with the probit model if
SK300 and SK400 Lnk funtons for bnomal GLMs Autumn 08 We motvate the dsusson by the beetle eample GLMs for bnomal and multnomal data Covers the followng materal from hapters 5 and 6: Seton 5.6., 5.6.3,
More informationMixture o f of Gaussian Gaussian clustering Nov
Mture of Gaussan clusterng Nov 11 2009 Soft vs hard lusterng Kmeans performs Hard clusterng: Data pont s determnstcally assgned to one and only one cluster But n realty clusters may overlap Soft-clusterng:
More informationHopfield Training Rules 1 N
Hopfeld Tranng Rules To memorse a sngle pattern Suppose e set the eghts thus - = p p here, s the eght beteen nodes & s the number of nodes n the netor p s the value requred for the -th node What ll the
More informationEfficient Sampling for Gaussian Process Inference using Control Variables
Effent Samplng for Gaussan Proess Inferene usng Control Varables Mhals K. Ttsas, Nel D. Lawrene and Magnus Rattray Shool of Computer Sene, Unversty of Manhester Manhester M 9PL, UK Abstrat Samplng funtons
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More informationLarge-Scale Data-Dependent Kernel Approximation Appendix
Large-Scale Data-Depenent Kernel Approxmaton Appenx Ths appenx presents the atonal etal an proofs assocate wth the man paper [1]. 1 Introucton Let k : R p R p R be a postve efnte translaton nvarant functon
More informationAPLSSVM: Hybrid Entropy Models for Image Retrieval
Internatonal Journal of Intellgent Informaton Systems 205; 4(2-2): 9-4 Publshed onlne Aprl 29, 205 (http://www.senepublshnggroup.om/j/js) do: 0.648/j.js.s.205040202.3 ISSN: 2328-7675 (Prnt); ISSN: 2328-7683
More informationApproaches to Modeling Clinical PK of ADCs
Sesson 4b: PKPD Mong of ntboy-drug onjugate (Symposum) Otober 4, 24, Las egas, NE pproahes to Mong lnal PK of Ds Leon Gbansy QuantPharm LL ntboy-drug onjugates Ø ntboy (or antboy fragment) lne (through
More informationDOAEstimationforCoherentSourcesinBeamspace UsingSpatialSmoothing
DOAEstmatonorCoherentSouresneamspae UsngSpatalSmoothng YnYang,ChunruWan,ChaoSun,QngWang ShooloEletralandEletronEngneerng NanangehnologalUnverst,Sngapore,639798 InsttuteoAoustEngneerng NorthwesternPoltehnalUnverst,X
More informationSome remarks about the transformation of Charnes and Cooper by Ezio Marchi *)
Some remars about the transformaton of Charnes an Cooper b Eo Marh * Abstrat In ths paper we eten n a smple wa the transformaton of Charnes an Cooper to the ase where the funtonal rato to be onsere are
More informationImage retrieval at low bit rates: BSP Trees vs. JPEG
mage retreval at low bt rates: Trees vs. Mhal Sth, an Geral Shaefer Shool of Computng an Tehnology The ottngham Trent Unversty, ottngham, U.K. Dept. of Computng, Eletrons an Automate Control Slesan Unversty
More informationLecture Nov
Lecture 18 Nov 07 2008 Revew Clusterng Groupng smlar obects nto clusters Herarchcal clusterng Agglomeratve approach (HAC: teratvely merge smlar clusters Dfferent lnkage algorthms for computng dstances
More informationI. INTRODUCTION. Keywords Web Mining, Web Usage Mining, Page Rank, Web Map
An Extended Algorm of Page Rankng Consderng Chronologal Dmenson of Searh Sandeep Gupta #, Mohd. Husan * # Computer Sene and Engneerng,NIMS Unversty, Japur, Rajasan, Inda * Dretor, AZAD IET, Luknow, UP,
More informationFusion of Neural Classifiers for Financial Market Prediction
Fuson of Neural Classfers for Fnanal Market Predton Trsh Keaton Dept. of Eletral Engneerng (136-93) Informaton Senes Laboratory (RL 69) Calforna Insttute of Tehnology HRL Laboratores, LLC Pasadena, CA
More informationA new mixed integer linear programming model for flexible job shop scheduling problem
A new mxed nteger lnear programmng model for flexble job shop shedulng problem Mohsen Zaee Department of Industral Engneerng, Unversty of Bojnord, 94531-55111 Bojnord, Iran Abstrat. In ths paper, a mxed
More informationGaussian Mixture Models
Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous
More informationSupport Vector Machines
CS 2750: Machne Learnng Support Vector Machnes Prof. Adrana Kovashka Unversty of Pttsburgh February 17, 2016 Announcement Homework 2 deadlne s now 2/29 We ll have covered everythng you need today or at
More information2. High dimensional data
/8/00. Hgh mensons. Hgh mensonal ata Conser representng a ocument by a vector each component of whch correspons to the number of occurrences of a partcular wor n the ocument. The Englsh language has on
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationRegularized Discriminant Analysis for Face Recognition
1 Regularzed Dscrmnant Analyss for Face Recognton Itz Pma, Mayer Aladem Department of Electrcal and Computer Engneerng, Ben-Guron Unversty of the Negev P.O.Box 653, Beer-Sheva, 845, Israel. Abstract Ths
More informationExpectation Maximization Mixture Models HMMs
-755 Machne Learnng for Sgnal Processng Mture Models HMMs Class 9. 2 Sep 200 Learnng Dstrbutons for Data Problem: Gven a collecton of eamples from some data, estmate ts dstrbuton Basc deas of Mamum Lelhood
More informationUsing Artificial Neural Networks and Support Vector Regression to Model the Lyapunov Exponent
Usng Artfal Neural Networks and Support Vetor Regresson to Model the Lyapunov Exponent Abstrat: Adam Maus* Aprl 3, 009 Fndng the salent patterns n haot data has been the holy gral of Chaos Theory. Examples
More informationFAULT DETECTION AND IDENTIFICATION BASED ON FULLY-DECOUPLED PARITY EQUATION
Control 4, Unversty of Bath, UK, September 4 FAUL DEECION AND IDENIFICAION BASED ON FULLY-DECOUPLED PARIY EQUAION C. W. Chan, Hua Song, and Hong-Yue Zhang he Unversty of Hong Kong, Hong Kong, Chna, Emal:
More informationAn Internet Traffic Identification Approach Based on GA and PSO-SVM
JOURAL OF COMPUERS, VOL. 7, O., JAUARY 202 9 An Internet raff Ientfaton Approah Base on GA an PSO-SVM Jun an Shool of Coputer Sene, Shuan Unversty, Chengu, Chna Eal: hnatanjun@gal.o Xngshu Chen an Mn Du
More informationA Theorem of Mass Being Derived From Electrical Standing Waves (As Applied to Jean Louis Naudin's Test)
A Theorem of Mass Beng Derved From Eletral Standng Waves (As Appled to Jean Lous Naudn's Test) - by - Jerry E Bayles Aprl 5, 000 Ths Analyss Proposes The Neessary Changes Requred For A Workng Test Ths
More informationA Tutorial on Data Reduction. Linear Discriminant Analysis (LDA) Shireen Elhabian and Aly A. Farag. University of Louisville, CVIP Lab September 2009
A utoral on Data Reducton Lnear Dscrmnant Analss (LDA) hreen Elhaban and Al A Farag Unverst of Lousvlle, CVIP Lab eptember 009 Outlne LDA objectve Recall PCA No LDA LDA o Classes Counter eample LDA C Classes
More informationA HYDROPHOBICITY BASED NEURAL NETWORK METHOD FOR PREDICTING TRANSMEMBRANE SEGMENTS IN PROTEIN SEQUENCES
A HYDROPHOBICITY BASED NEURAL NETWORK METHOD FOR PREDICTING TRANSMEMBRANE SEGMENTS IN PROTEIN SEQUENCES Zhongqang Chen, Q Lu, Ysheng Zhu, Yxue, L*, Yuhong Xu Department of Bomeal Engneerng, Shangha Jaotong
More informationPattern Classification: An Improvement Using Combination of VQ and PCA Based Techniques
Ameran Journal of Appled Senes (0): 445-455, 005 ISSN 546-939 005 Sene Publatons Pattern Classfaton: An Improvement Usng Combnaton of and PCA Based Tehnques Alok Sharma, Kuldp K. Palwal and Godfrey C.
More informationCorrespondence Rules for Motion Detection using Randomized Methods
Egyptan Computer Sene Journal Corresponene Rules or Moton Deteton usng Ranomze Methos Amr Gone an Howaa Nagu Department o Computer Sene & Engneerng, the Ameran Unversty n Caro, Caro, Egypt Astrat Parametr
More informationA Particle Filter Algorithm based on Mixing of Prior probability density and UKF as Generate Importance Function
Advanced Scence and Technology Letters, pp.83-87 http://dx.do.org/10.14257/astl.2014.53.20 A Partcle Flter Algorthm based on Mxng of Pror probablty densty and UKF as Generate Importance Functon Lu Lu 1,1,
More informationApproximations for a Fork/Join Station with Inputs from Finite Populations
Approxmatons for a Fork/Jon Staton th Inputs from Fnte Populatons Ananth rshnamurthy epartment of ecson Scences ngneerng Systems Rensselaer Polytechnc Insttute 0 8 th Street Troy NY 80 USA Rajan Sur enter
More informationOutline. Classification Methods. Feature Selection/Reduction: Entropy Minimization Algorithm
Outlne lassfaton Methos Xaoun Q Feature Seleton: Entropy Mnzaton Algorth an Karhunen-Loève Epanson luster Seeng: K-Means algorth rnpal oponents Analyss A lassfaton: Lnear Dsrnant Analyss LDA Statstal lassfaton:
More informationExplicit bounds for the return probability of simple random walk
Explct bouns for the return probablty of smple ranom walk The runnng hea shoul be the same as the ttle.) Karen Ball Jacob Sterbenz Contact nformaton: Karen Ball IMA Unversty of Mnnesota 4 Ln Hall, 7 Church
More informationRetrieval Models: Language models
CS-590I Informaton Retreval Retreval Models: Language models Luo S Department of Computer Scence Purdue Unversty Introducton to language model Ungram language model Document language model estmaton Maxmum
More informationPhysics 2B Chapter 17 Notes - Calorimetry Spring 2018
Physs 2B Chapter 17 Notes - Calormetry Sprng 2018 hermal Energy and Heat Heat Capaty and Spe Heat Capaty Phase Change and Latent Heat Rules or Calormetry Problems hermal Energy and Heat Calormetry lterally
More informationPrediction suffix trees for supervised classification of sequences
Predton suffx trees for supervsed lassfaton of sequenes Chrstne Largeron - Leténo EURISE - Unversté Jean Monnet Sant-Etenne 6, rue Basse des Rves 42023 Sant-Etenne edex 2 Tel : (33) 04 77 42 19 60 Fax
More information1 GSW Iterative Techniques for y = Ax
1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn
More informationAccurate Online Support Vector Regression
Aurate Onlne Support Vetor Regresson Junshu Ma, James Theler, and Smon Perkns MS-D436, NIS-2, Los Alamos Natonal Laboratory, Los Alamos, NM 87545, USA {junshu, jt, s.perkns}@lanl.gov Abstrat Conventonal
More informationDistance-Based Approaches to Inferring Phylogenetic Trees
Dstance-Base Approaches to Inferrng Phylogenetc Trees BMI/CS 576 www.bostat.wsc.eu/bm576.html Mark Craven craven@bostat.wsc.eu Fall 0 Representng stances n roote an unroote trees st(a,c) = 8 st(a,d) =
More informationIterative Discovering of User s Preferences Using Web Mining
Internatonal Journal of Computer Sene & Applatons Vol. II, No. II, pp. 57-66 2005 Tehnomathemats Researh Foundaton Iteratve Dsoverng of User s Preferenes Usng Web Mnng Mae Kewra Futsu Serves, Span, Camno
More informationVoltammetry. Bulk electrolysis: relatively large electrodes (on the order of cm 2 ) Voltammetry:
Voltammetry varety of eletroanalytal methods rely on the applaton of a potental funton to an eletrode wth the measurement of the resultng urrent n the ell. In ontrast wth bul eletrolyss methods, the objetve
More informationSemantically Enhanced Uyghur Information Retrieval Model
JOURNAL OF SOFTWARE, VOL. 7, NO. 6, JUNE 202 35 Semantally Enhane Uyghur Informaton Retreval Moel Bo Ma Researh Center for Multlngual Informaton Tehnology, Xnang Tehnal Insttute of Physs an Chemstry, Chnese
More informationPhase Transition in Collective Motion
Phase Transton n Colletve Moton Hefe Hu May 4, 2008 Abstrat There has been a hgh nterest n studyng the olletve behavor of organsms n reent years. When the densty of lvng systems s nreased, a phase transton
More informationCSLDA and LDA fusion based face recognition
uhamma Imran RAZZAK,, uhamma Khurram KHA, Khale ALGHAHBAR, Rubyah YUSOF 3 CoEIA, Kng Sau Unversty, Sau Araba (, Internatonal Islam Unversty, Pastan (, CAIRO, Unverst enolog alaysa (3. CSLDA an LDA fuson
More informationtechnische universiteit eindhoven Analysis of one product /one location inventory control models prof.dr. A.G. de Kok 1
TU/e tehnshe unverstet endhoven Analyss of one produt /one loaton nventory ontrol models prof.dr. A.G. de Kok Aknowledgements: I would lke to thank Leonard Fortun for translatng ths ourse materal nto Englsh
More informationThe Origin of Aromaticity
The Orgn of Aromatty Ths problem s one of the hstores of organ hemstry. Many researhers have propose ther own unerstanng methos amng at soluton. owever, t s not hear that the problem of the orgn why aromat
More informationSome Results on the Counterfeit Coins Problem. Li An-Ping. Beijing , P.R.China Abstract
Some Results on the Counterfet Cons Problem L An-Png Bejng 100085, P.R.Chna apl0001@sna.om Abstrat We wll present some results on the ounterfet ons problem n the ase of mult-sets. Keywords: ombnatoral
More informationHopfield networks and Boltzmann machines. Geoffrey Hinton et al. Presented by Tambet Matiisen
Hopfeld networks and Boltzmann machnes Geoffrey Hnton et al. Presented by Tambet Matsen 18.11.2014 Hopfeld network Bnary unts Symmetrcal connectons http://www.nnwj.de/hopfeld-net.html Energy functon The
More informationENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition
EG 880/988 - Specal opcs n Computer Engneerng: Pattern Recognton Memoral Unversty of ewfoundland Pattern Recognton Lecture 7 May 3, 006 http://wwwengrmunca/~charlesr Offce Hours: uesdays hursdays 8:30-9:30
More informationCS47300: Web Information Search and Management
CS47300: Web Informaton Search and Management Probablstc Retreval Models Prof. Chrs Clfton 7 September 2018 Materal adapted from course created by Dr. Luo S, now leadng Albaba research group 14 Why probabltes
More informationCooperative Self Encoded Spread Spectrum in Fading Channels
I. J. Communatons, etwork an Sstem Senes, 9,, 9-68 Publshe Onlne Ma 9 n SRes (http://www.srp.org/journal/jns/). Cooperatve Self Enoe Sprea Spetrum n Fang Channels Kun HUA, Won Mee JAG, Lm GUYE Unverst
More informationSimulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests
Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth
More informationOptimal Resource Allocation in Satellite Networks: Certainty Equivalent Approach versus Sensitivity Estimation Algorithms
Optmal Resoure Alloaton n Satellte Networks: Certanty Equvalent Approah versus Senstvty Estmaton Algorthms Frano Davol*, Maro Marhese, Maurzo Mongell* * DIST - Department of Communatons, Computer an Systems
More informationIntroduction to Molecular Spectroscopy
Chem 5.6, Fall 004 Leture #36 Page Introduton to Moleular Spetrosopy QM s essental for understandng moleular spetra and spetrosopy. In ths leture we delneate some features of NMR as an ntrodutory example
More information15-381: Artificial Intelligence. Regression and cross validation
15-381: Artfcal Intellgence Regresson and cross valdaton Where e are Inputs Densty Estmator Probablty Inputs Classfer Predct category Inputs Regressor Predct real no. Today Lnear regresson Gven an nput
More informationCONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING INTRODUCTION
CONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING N. Phanthuna 1,2, F. Cheevasuvt 2 and S. Chtwong 2 1 Department of Electrcal Engneerng, Faculty of Engneerng Rajamangala
More informationp(z) = 1 a e z/a 1(z 0) yi a i x (1/a) exp y i a i x a i=1 n i=1 (y i a i x) inf 1 (y Ax) inf Ax y (1 ν) y if A (1 ν) = 0 otherwise
Dustn Lennon Math 582 Convex Optmzaton Problems from Boy, Chapter 7 Problem 7.1 Solve the MLE problem when the nose s exponentally strbute wth ensty p(z = 1 a e z/a 1(z 0 The MLE s gven by the followng:
More informationAdaptive Microphone Arrays for Noise Suppression in the Frequency Domain
Seon Cost 9 Workshop on Aaptve Algorthms n Communatons, Boreaux, 3.9-..99 Aaptve Mrophone Arrays for ose Suppreo the Frequeny Doman K. U. Smmer, A. Wasleff Department of Physs an Eletral Engneerng Unversty
More informationSupport Vector Machines CS434
Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? Intuton of Margn Consder ponts A, B, and C We
More informationTIME-VARYING LINEAR PREDICTION FOR SPEECH ANALYSIS
5th European Sgnal roessng Conferene (EUSICO 7), oznan, olan, September 3-7, 7, opyrght by EURASI IME-VARYIG LIEAR REDICIO FOR SEECH AALYSIS Karl Shnell an Arl Laro Insttute of Apple hyss, Goethe-Unversty
More informationHandwriting Recognition Using Position Sensitive Letter N-Gram Matching
Handwrtng Reognton Usng Poston Senstve Letter N-Gram Mathng Adnan El-Nasan, Srharsha Veeramahanen, George Nagy DoLab, Rensselaer Polytehn Insttute, Troy, NY 12180 elnasan@rp.edu Abstrat We propose further
More informationSupport Vector Machines. Vibhav Gogate The University of Texas at dallas
Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest
More informationMAXIMUM A POSTERIORI TRANSDUCTION
MAXIMUM A POSTERIORI TRANSDUCTION LI-WEI WANG, JU-FU FENG School of Mathematcal Scences, Peng Unversty, Bejng, 0087, Chna Center for Informaton Scences, Peng Unversty, Bejng, 0087, Chna E-MIAL: {wanglw,
More informationChapter 2 Transformations and Expectations. , and define f
Revew for the prevous lecture Defnton: support set of a ranom varable, the monotone functon; Theorem: How to obtan a cf, pf (or pmf) of functons of a ranom varable; Eamples: several eamples Chapter Transformatons
More informationEnsemble Validation: Selectivity has a Price, but Variety is Free
Enseble Valdaton: Seletvty has a e, but Varety s Free Er Bax Verzon baxhoe@yahoo.o Farshad Koot Faebook arxv:60.0234v2 [stat.ml] 25 Apr 208 Abstrat Suppose soe lassfers are seleted fro a set of hypothess
More informationDiscriminative classifier: Logistic Regression. CS534-Machine Learning
Dscrmnatve classfer: Logstc Regresson CS534-Machne Learnng robablstc Classfer Gven an nstance, hat does a probablstc classfer do dfferentl compared to, sa, perceptron? It does not drectl predct Instead,
More informationAnalysis of Heterocatalytic Reactor Bed Based on Catalytic Pellet Models
III Internatonal Intersplnary Tehnal Conferene of Young entsts 19-1 May 010, Pozna, Polan Analyss of Heteroatalyt Reator Be Base on Catalyt Pellet Moels yörgy Rá, Unversty of Pannona Tamás Varga, Unversty
More informationESCI 341 Atmospheric Thermodynamics Lesson 10 The Physical Meaning of Entropy
ESCI 341 Atmospherc Thermodynamcs Lesson 10 The Physcal Meanng of Entropy References: An Introducton to Statstcal Thermodynamcs, T.L. Hll An Introducton to Thermodynamcs and Thermostatstcs, H.B. Callen
More informationGraphical representation of constitutive equations
Graphal representaton of onsttutve equatons Ger Guehus, o. Prof. em. Dr.-Ing. Dr. h.. Professor Emertus Unversty of Karlsruhe Insttute of Sol Mehans an Rok Mehans Engler-unte-Rng 4 763 Karlsruhe, Germany
More informationAdaptive Multilayer Neural Network Control of Blood Pressure
Proeedng of st Internatonal Symposum on Instrument Sene and Tenology. ISIST 99. P4-45. 999. (ord format fle: ISIST99.do) Adaptve Multlayer eural etwork ontrol of Blood Pressure Fe Juntao, Zang bo Department
More informationNew Liu Estimators for the Poisson Regression Model: Method and Application
New Lu Estmators for the Posson Regresson Moel: Metho an Applcaton By Krstofer Månsson B. M. Golam Kbra, Pär Sölaner an Ghaz Shukur,3 Department of Economcs, Fnance an Statstcs, Jönköpng Unversty Jönköpng,
More information