Conclusion

Within the framework of the Bayesian learning theory, we analyze the generalization ability of a classifier for recognition on a finite set of events. It was shown that the obtained results can be applied to classification tree pruning. Numerical experiments showed that Bayesian pruning is at least as efficient as standard reduced error pruning and, at the same time, is more resistant to overtraining.

Acknowledgements

This work was supported by the Russian Foundation of Basic Research, grant 04-01-00858a.

Bibliography

[1] Lbov G.S., Startseva N.G. About statistical robustness of decision functions in pattern recognition problems. Pattern Recognition and Image Analysis, 1994. Vol. 4, No. 3, pp. 97-106.
[2] Berikov V.B., Litvinenko A.G. The influence of prior knowledge on the expected performance of a classifier. Pattern Recognition Letters, Vol. 24/15, 2003, pp. 2537-2548.
[3] UCI Machine Learning Database Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
[4] Quinlan J.R. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1989.
[5] Berikov V.B. A priori estimates of recognition quality for discrete features. Pattern Recognition and Image Analysis, V. 12, N 3, 2002, pp. 235-242.

Author's Information

Vladimir Berikov, Sobolev Institute of Mathematics SD RAS, Koptyug pr. 4, Novosibirsk, Russia, 630090; e-mail: berikov@math.nsc.ru

EXTREME SITUATIONS PREDICTION BY MULTIDIMENSIONAL HETEROGENEOUS TIME SERIES USING LOGICAL DECISION FUNCTIONS ¹

Svetlana Nedel'ko

Abstract: A method for prediction of multidimensional heterogeneous time series using logical decision functions is suggested. The method implements simultaneous prediction of several goal variables. It uses a deciding function construction algorithm that performs a directed search for a partitioning of the variable space in the class of logical deciding functions. To estimate the quality of a deciding function, a realization of the informativity criterion for the conditional distribution in the goal variables' space is offered. As an indicator of extreme states, the occurrence of a transition with small probability is suggested.
Keywords: multidimensional heterogeneous time series analysis, data mining, pattern recognition, classification, statistical robustness, deciding functions.

ACM Classification Keywords: G.3 Probability and Statistics: Time series analysis; H.2.8 Database Applications: Data mining; I.5.1 Pattern Recognition: Statistical Models

¹ The work is supported by RFBR, grant 04-01-00858-a
Introduction

The specific feature of multidimensional heterogeneous time series analysis is the simultaneous prediction of several goal variables. However, most known algorithms construct a decision function for each goal variable separately. Such an approach loses some information about feature interdependences [Mirenkova, 2002].

The next problem is the strong increase of dimensionality as the length of the analysed window increases. So one has either to simplify the class of decision functions or to make the window shorter. The problem of insufficient sample size becomes much more essential [Raudys, 2001] when rare events are to be predicted.

In this work we suggest an algorithm for prediction of multidimensional heterogeneous time series based on finding a certain partitioning that maximizes the informativity criterion [Lbov, Nedel'ko, 2001] for the matrix of transitions between partitioning areas. This makes it possible to avoid increasing complexity when the window gets longer, although prediction loses accuracy.

Extreme situations are characterised by a low number of precedents in the period under observation. Therefore, one needs statistically robust methods for forecasting multidimensional heterogeneous time series.

It might also be interesting to predict events having only a few precedents, or perhaps no precedents at all. In this case it seems impossible to forecast the extreme situations themselves, but one could detect a change in the probabilistic model of the time series and consider this as an indicator of abnormal process behaviour.

Problem Definition

Let a random n-dimensional process $Z(t) = (Z_1(t), \ldots, Z_n(t))$ with discrete time be given. Features may be both continuous and discrete (with ordered or unordered values). Suppose that at a time moment $t$ the values of the $n$ variables depend on their values at the $l$ previous time moments, i.e. on a window of length $l$.

Most algorithms for prediction of multidimensional time series replace the time series sample by a sample in the form of a data table. This is done via new notation: goal values are designated as $Y_j(t) = Z_j(t)$, and previous values (prehistory) as

$$X_j(t) = Z_j(t-1), \quad X_{j+n}(t) = Z_j(t-2), \quad \ldots, \quad X_{j+n(l-1)}(t) = Z_j(t-l), \qquad j = 1, \ldots, n.$$

Now any time series realization $Z(t)$, $t = 1, \ldots, T$, may be represented as a sample $v = \{(x^i, y^i) \mid i = 1, \ldots, N\}$, where $N = T - l$ is the sample size. Here $y^i = (y^i_1, \ldots, y^i_n)$, $y^i_j = Y_j(i+l)$, $j = 1, \ldots, n$, and $x^i = (x^i_1, \ldots, x^i_m)$, $x^i_j = X_j(i+l)$, $j = 1, \ldots, m$, where $m = nl$ is the predictor space dimensionality. Note that the first $l$ time moments have no prehistory of length $l$.

Such notation allows using all data mining methods to predict each feature separately. These may be, for example, classification or regression analysis methods in the class of logical decision functions [Lbov, Startseva, 1999]. But this approach neglects feature interdependence, so it is possible to construct examples where separate decision functions give an incompatible forecast [Mirenkova, 2002].

Let us consider an example that shows the weakness of separate feature forecasting. Suppose two discrete features are given and the probabilistic measure on them is as shown in figure 1. Each of the black points has probability 0.25; the other points have probability zero. Methods that make a decision for every feature separately give the predicted value marked by the white circle. But such a combination of values will never occur.

Fig. 1. (figure not reproduced: two discrete features, axis values 1 to 3)

This example shows the need for methods that construct a decision rule for all features together, because interdependences are important. One also needs to use a decision in the form of an area (in the example such an area contains the four black points) rather than a single point.

In this work, we suggest not separating features into X and Y but building a partitioning in the space Z directly.
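The replacement of a time series by a data table described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `windowed_sample` and the representation of the series as a list of tuples are hypothetical. It follows the ordering $X_j(t) = Z_j(t-1)$, $X_{j+n}(t) = Z_j(t-2)$, and so on, and yields $N = T - l$ rows.

```python
def windowed_sample(series, l):
    """Turn a series Z(1..T) of n-dimensional tuples into N = T - l pairs (x, y).

    y holds the goal values Z(t); x holds the prehistory, ordered as
    Z(t-1), Z(t-2), ..., Z(t-l), each contributing n values, so len(x) = n * l.
    The name and data layout are illustrative assumptions.
    """
    if l < 1:
        raise ValueError("window length l must be at least 1")
    sample = []
    # The first l time moments have no prehistory of length l, so start at index l.
    for t in range(l, len(series)):
        y = tuple(series[t])
        x = tuple(value
                  for lag in range(1, l + 1)
                  for value in series[t - lag])
        sample.append((x, y))
    return sample


# Heterogeneous features: one numeric, one unordered categorical.
series = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
table = windowed_sample(series, l=2)
```

Features stay as plain tuple entries, so continuous and discrete (ordered or unordered) values can be mixed in one row.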
Quality Criterion

Let us introduce a quality criterion for a decision in the form of areas of the goal features space. Criteria of this type were proposed in [Rostovtsev, 1978].

It is convenient here to consider again separately $D_X$, the space of predictors, and $D_Y$, the goal features space.

Let $P(E_Y)$ and $P(E_Y \mid x)$ be the unconditional and conditional measures for $E_Y \subseteq D_Y$. Suppose a set $B = \{E_Y^i \subseteq D_Y \mid i = 1, \ldots, k\}$ of non-intersecting areas is given. Then the quality criterion will be

$$K(B) = \sum_{i=1}^{k} \left( P(E_Y^i \mid x) - P(E_Y^i) \right).$$

The optimal decision in $x$ will be $B^* = \arg\max_B K(B)$.

The quality criterion for the conditional probabilistic measure may be defined as $K(P[D_Y \mid x]) = \max_B K(B)$.

This criterion is a kind of distance between the conditional (given $x$) and unconditional measures on the goal features space. There are known modifications that use a uniform distribution instead of the unconditional one.

If $B$ is a partitioning of $D_Y$, one needs to use the modified criterion

$$K(B) = \sum_{i=1}^{k} \left| P(E_Y^i \mid x) - P(E_Y^i) \right|. \qquad (1)$$

It differs in taking absolute values.

When the distribution is unknown and we have only a sample, we cannot estimate the criterion for each $x \in D_X$, so one needs to build some partitioning $\lambda_X$ of $D_X$. Then

$$K(\lambda_X) = \sum_{E_X \in \lambda_X} K(P[D_Y \mid E_X]) \, P(E_X)$$

will be the integral decision quality criterion. All probabilities in this expression may be estimated on the sample.

Algorithm

The suggested algorithm makes a partitioning directly in the space $D_Z = \prod_{j=1}^{n} D_j$, where $D_j$ is the set of all values of feature $Z_j$.

Once a partitioning $\lambda = \{E^i \subseteq D_Z \mid i = 1, \ldots, k\}$ is fixed, the initial time series $Z(t)$ may be represented by one symbolic sequence $\beta(t) \in \{\beta^i \mid i = 1, \ldots, k\}$, where $\beta^i$ is the symbol corresponding to area $E^i$, and $\beta(t) = \beta^i$ when $Z(t) \in E^i$.

Criterion (1) may be applied to the transition matrix of the process $\beta(t)$:

$$K(\lambda) = \sum_{i_0=1}^{k} \cdots \sum_{i_l=1}^{k} \left| \, p_{i_0 \ldots i_l} - \sum_{j_0=1}^{k} p_{j_0 i_1 \ldots i_l} \sum_{j_1=1}^{k} \cdots \sum_{j_l=1}^{k} p_{i_0 j_1 \ldots j_l} \right|, \qquad (2)$$

where

$$p_{i_0 \ldots i_l} = P\left( \bigwedge_{\tau=0}^{l} \beta(t-\tau) = \beta^{i_\tau} \right) = P\left( \bigwedge_{\tau=0}^{l} Z(t-\tau) \in E^{i_\tau} \right)$$

is the probability of the given prehistory of length $l$.

To obtain a sample estimate of the criterion one needs to replace $p_{i_0 \ldots i_l}$ by $N_{i_0 \ldots i_l} / N$, the rate of appearance of the prehistory in the sample.

Transition probabilities for partitioning areas are a kind of multi-variant decision function [Lbov, Nedel'ko, 2001].

Note that a partitioning $\lambda$ may be constructed in any appropriate class, e.g. by linear discriminating functions or by logical deciding functions (decision trees).
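A sample estimate of criterion (2) can be sketched as below, assuming the partitioning is already fixed and the series has been encoded as a symbolic sequence. The function name `transition_informativity` and the encoding of symbols as arbitrary hashable values are hypothetical. The sketch reads the criterion as the sum, over every prehistory and every current symbol, of the absolute difference between the joint frequency and the product of the two marginal frequencies, with probabilities replaced by rates of appearance in the sample.

```python
from collections import Counter


def transition_informativity(beta, l=1):
    """Sample estimate of the transition-matrix informativity for prehistory length l.

    beta is the symbolic sequence; each probability is replaced by its
    rate of appearance among the N = len(beta) - l available windows.
    """
    n_windows = len(beta) - l
    if l < 1 or n_windows <= 0:
        raise ValueError("need l >= 1 and a sequence longer than l")

    joint = Counter()  # counts of (current symbol, prehistory)
    for t in range(l, len(beta)):
        history = tuple(beta[t - tau] for tau in range(1, l + 1))
        joint[(beta[t], history)] += 1

    current = Counter()  # marginal counts of the current symbol
    past = Counter()     # marginal counts of the prehistory
    for (cur, history), count in joint.items():
        current[cur] += count
        past[history] += count

    total = 0.0
    # Unobserved symbols or prehistories contribute zero, so the observed ones suffice.
    for history, n_hist in past.items():
        for cur, n_cur in current.items():
            p_joint = joint[(cur, history)] / n_windows
            p_indep = (n_hist / n_windows) * (n_cur / n_windows)
            total += abs(p_joint - p_indep)
    return total


k_alternating = transition_informativity(list("ababababa"), l=1)
k_constant = transition_informativity(list("aaaa"), l=1)
```

A strictly alternating sequence gives a larger value than a constant one, since in the constant case the prehistory carries no additional information about the next symbol.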
Logical Decision Functions

For constructing a partitioning $\lambda$ we shall use the algorithm LRP [Lbov, Startseva, 1999], which builds a decision tree. This algorithm was first designed for the classification task and was then applied to various data analysis tasks by using special quality criteria.

The algorithm builds a partitioning into multidimensional intervals. Here an interval is a set of neighbouring values when an order is defined, or any subset of values if the feature values are unordered. A multidimensional interval is a Cartesian product of intervals.

Algorithm LRP sequentially partitions the space $D$ into a given number of areas. Given the partitioning $\{E^1, \ldots, E^i, \ldots, E^s\}$, $E^i \subseteq D$, constructed at step $s - 1$, at step $s$ the algorithm goes over all the areas and selects the one that, being split in all possible ways into two sub-areas, provides the maximum of the criterion. These sub-areas then replace the initial area, and the process is repeated until $k$ areas have been produced.

The partitioning may be represented by a decision tree. Each non-terminal node corresponds to some predicate $P(z)$: $z_j \in E_j$, $E_j \subset D_j$. Each terminal node corresponds to an area of the partitioning $\lambda$.

Rare Events Prediction

Extreme situations are characterised by a low number of precedents in a sample. Therefore, the statistical robustness of the methods used is especially important.

The proposed method of multidimensional heterogeneous time series prediction has shown high robustness. Nevertheless, it may not be enough if there are only several precedents. Moreover, it might be interesting to predict events having no precedents. Obviously, in this case reliable prediction is impossible, but one could try to mark time moments where an extreme situation is probable.

One indicator of such time moments may be a change in the probabilistic model of the time series. Since we represent the initial time series by a corresponding Markov chain, all related mathematical results are available. So, the moment of a change in the probabilistic model can be revealed.

Another indicator of process abnormality might be the occurrence, in the corresponding symbolic chain, of a transition with small probability.
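The second indicator, a transition with small probability in the symbolic chain, can be illustrated with the following sketch. It is only an assumption-laden example: the function name `rare_transition_moments`, the restriction to a first-order chain, and the threshold value are all hypothetical and are not taken from the paper. Transition probabilities are estimated as frequencies from the same sequence.

```python
from collections import Counter


def rare_transition_moments(beta, threshold=0.05):
    """Return (t, previous, current, probability) for low-probability transitions.

    Transition probabilities are estimated from the sequence itself as
    count(previous -> current) / count(previous). The threshold is an
    illustrative assumption, not a value suggested by the source.
    """
    pair_counts = Counter(zip(beta, beta[1:]))
    prev_counts = Counter(beta[:-1])
    flagged = []
    for t in range(1, len(beta)):
        prev, cur = beta[t - 1], beta[t]
        prob = pair_counts[(prev, cur)] / prev_counts[prev]
        if prob < threshold:
            flagged.append((t, prev, cur, prob))
    return flagged


# A mostly alternating chain with a single unusual a -> a transition.
chain = list("ab" * 15 + "aab")
moments = rare_transition_moments(chain, threshold=0.1)
```

In practice the probabilities would more plausibly be estimated on an earlier part of the series and applied to new observations. A single sequence is used here only to keep the example short.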
Conclusion

Methods of simultaneous prediction of all variables of a multidimensional heterogeneous time series make it possible to use information about feature interdependence, in contrast to the method of constructing a separate decision function for each feature. It is also possible to build a decision based on a partitioning of the initial features space, which decreases algorithm complexity.

As a quality criterion the method uses the transition matrix informativity that was introduced.

The proposed method represents the initial time series by a corresponding Markov chain, which makes it possible to avoid a great increase in complexity when the considered prehistory length increases. This is especially important for predicting rare events. Such a representation also allows applying all mathematical results related to Markov chains.

To predict time moments when extreme situations have higher probability, the use of changes in the probabilistic model of the time series was suggested.

Bibliography

[Lbov, Startseva, 1999] Lbov G.S., Startseva N.G. Logical deciding functions and questions of statistical stability of decisions. Novosibirsk: Institute of Mathematics, 1999. 211 p. (in Russian).
[Rostovtsev, 1978] P.S. Rostovtsev. Typology constructing algorithm for big sets of social-economy information. // Models for aggregating a social-economy information. Proceedings, publ. IE and SPP SB AS USSR, 1978. (in Russian).
[Lbov, Nedel'ko, 2001] G.S. Lbov, V.M. Nedel'ko. A Maximum informativity criterion for the forecasting several variables of different types. // Computer data analysis and modeling. Robustness and computer intensive methods. Minsk, 2001, vol. 2, p. 43-48.
[Raudys, 2001] Raudys S. Statistical and neural classifiers. Springer, 2001.
[Mirenkova, 2002] S.V. Mirenkova (Nedel'ko). A method for prediction multidimensional heterogeneous time series in class of logical decision functions // Artificial Intelligence, No 2, 2002, p. 197-201. (in Russian).

Author's Information

Svetlana Valeryevna Nedel'ko, Institute of Mathematics SB RAS, Laboratory of Data Analysis, 630090, pr. Koptyuga, 4, Novosibirsk, Russia, e-mail: nedelko@math.nsc.ru

EVALUATING MISCLASSIFICATION PROBABILITY USING EMPIRICAL RISK ¹

Victor Nedel'ko

Abstract: The goal of the paper is to estimate the misclassification probability of a decision function from a training sample. We present the results of an investigation of the empirical risk bias for nearest neighbours, linear and decision tree classifiers in comparison with exact bias estimates for a discrete (multinomial) case. This makes it possible to find out how far the Vapnik-Chervonenkis risk estimates are off for the considered decision function classes and to choose optimal complexity parameters for the constructed decision functions. A comparison of the capacities of linear classifiers and decision trees is also performed.

Keywords: pattern recognition, classification, statistical robustness, deciding functions, complexity, capacity, overtraining problem.

ACM Classification Keywords: I.5.1 Pattern Recognition: Statistical Models

Introduction

One of the most important problems in classification is estimating the quality of the decision built. As a quality measure, the misclassification probability is usually used. This value is also known as the risk. There are many methods for estimating the risk: validation set, leave-one-out method, etc. But these methods have some disadvantages: for example, the first one decreases the volume of the sample available for building a decision function, and the second one takes extra computational resources and is unable to estimate the risk deviation.

So, the most attractive way is to evaluate the quality of a decision function directly from the training sample. But the empirical risk, i.e. the rate of misclassified objects from the training sample, appears to be a biased risk estimate, because the quality of a decision function evaluated on the training sample usually appears much better than its real quality. This fact is known as the overtraining problem.
To solve this problem, the concept of capacity (a complexity measure) of a set of decision rules was introduced in [Vapnik, Chervonenkis, 1974]. The authors obtained universal estimates of decision quality, but these VC estimates are not accurate and suggest pessimistic risk expectations.

For the case of a discrete feature, exact estimates of the empirical risk bias were obtained in [Nedel'ko, 2003]. This makes it possible to find out how far the VC estimates are off.

The goal of this paper is to extrapolate the result to the continuous case, including linear and decision tree classifiers.

¹ The work is supported by RFBR, grant 04-01-00858-a