2011 International Conference on Document Analysis and Recognition

Stroke-Based Performance Metrics for Handwritten Mathematical Expressions

Richard Zanibbi (rlaz@cs.rit.edu), Amit Pillay (p2731@rit.edu)
Department of Computer Science, Rochester Institute of Technology, NY, USA

Harold Mouchère (harold.mouchere@univ-nantes.fr), Christian Viard-Gaudin (christian.viard-gaudin@univ-nantes.fr)
LUNAM, Université de Nantes, IRCCyN, France

Dorothea Blostein (blostein@cs.queensu.ca)
School of Computing, Queen's University, Canada

Abstract: Evaluating mathematical expression recognition involves a complex interaction of input primitives (e.g. pen/finger strokes), recognized symbols, and recognized spatial structure. Existing performance metrics simplify this problem by separating the assessment of spatial structure from the assessment of symbol segmentation and classification. These metrics do not characterize the overall accuracy of pen-based mathematics recognition, making it difficult to compare math recognition algorithms, and preventing the use of machine learning algorithms that require a criterion function characterizing overall system performance. To address this problem, we introduce performance metrics that bridge the gap from handwritten strokes to spatial structure. Our metrics are computed using bipartite graphs that represent classification, segmentation and spatial structure at the stroke level. Overall correctness of an expression is measured by counting the number of relabelings of nodes and edges needed to make the bipartite graph for a recognition result match the bipartite graph for ground truth. This metric may also be used with other primitive types (e.g. image pixels).

Keywords-Performance Evaluation; Math Recognition; Handwriting Recognition; Graphics Recognition

I. INTRODUCTION

Evaluating the performance of document analysis systems is an important and difficult problem. As recently summarized by Silva [1], much of the difficulty stems from diversity in goals, input types, input domains, concepts, output granularity, evaluation moments, and evaluation metrics. Our work focuses on addressing performance evaluation issues that arise due to diversity in granularity.
Performance metrics are simpler to define for computations in which input and output items have similar granularity. In the case of math recognition, the granularity differs markedly, with a spatial arrangement of strokes or symbols as input, and a hierarchical layout description (e.g. LaTeX, see Figure 1) and/or a representation of meaning (e.g. Content MathML, OpenMath) as output. There is a need for standard metrics that permit meaningful comparison of math recognition results [2], [3], both for comparing systems, and for use with machine learning algorithms that optimize system performance.

Our primary contribution is a bipartite graph-based representation of expression structure at the level of primitives (e.g. strokes) that captures errors in both recognized symbols and layout. We were motivated to use a representation at the primitive rather than symbol level because the distinction between symbol segmentation and structure recognition is sometimes blurred. For example, the configuration = can be viewed either as a single symbol consisting of two spatially separated strokes, or as two symbols (short horizontal lines) with one atop another. We consider isolated expressions in this paper, but our method may be adapted in a straightforward manner for multiple expressions, flowcharts, tables, and even images using pixel regions such as connected components or image patches.

The second contribution of the paper is a set of new error metrics based on our bipartite representation. This includes metrics that characterize overall recognition accuracy, providing a criterion function for expressions based on strokes as primitives; existing criterion functions use symbols as primitives, as discussed in Section II.

We assume that it is possible to define a ground truth interpretation for a given set of strokes. This excludes undersegmented strokes, in which a single stroke is used to produce two items that must be represented separately in the ground truth interpretation (e.g. two cursive symbols written using a single stroke).

[Figure 1: a) Input, b) LaTeX: \frac{a}{b^d}, c) Symbol Layout Tree (U: up, D: down, S: superscript)]
Figure 1. Handwritten Expression Containing Five Strokes and Four Symbols. A LaTeX string (b) or equivalent Symbol Layout Tree (c) may be used to represent symbol arrangement.

For detailed performance analysis, different metrics are needed to characterize accuracy for specific tasks: segmentation, classification, and parsing. However, in some situations a single value characterizing performance is needed, such as when using machine learning algorithms to optimize system performance as a whole. Thus, in this paper, we discuss several component metrics (Section IV) and also discuss methods of combining these into a single overall estimate of performance (Section V).

1520-5363/11 $26.00 © 2011 IEEE. DOI 10.1109/ICDAR.2011.75

II. PREVIOUS WORK IN EVALUATING MATH RECOGNITION

Mathematical expression recognition is an active research field, both for on-line and off-line data [4], [5], [6]. Ground-truthed datasets are now available (e.g. [7] (off-line) and [8] (on-line)). As pointed out by both Lapointe and Blostein [2] and Awal et al. [3], the math recognition domain now needs standard evaluation metrics to support comparisons of existing and newly developed systems.

Most existing approaches to evaluating math recognition compute a distance between the recognized expression and the ground truth, according to different aspects. The expression recognition rate is common, but global and relatively uninformative, as it counts only expressions that precisely match ground truth. Symbol recognition rate does not consider symbol layout; baseline recognition checks only if symbols appear on the correct baseline relative to a symbol. Some metrics, such as the average performance index [9], weight errors depending on the depth of nesting for baselines in an expression.

A difficulty in defining an accuracy metric for symbol layout in math is that the tree-based representation needed to represent symbol layout is unsuitable for use with classical metrics used for text recognition, such as the Levenshtein edit distance. One solution is to use a tree-edit distance, but this is not used in practice, because of the NP complexity of the existing algorithms to match both tree edges and nodes. An interesting solution by Garain et al. [10] proposes to transform the tree into a token string, which allows one to use edit distance.
The drawback of this approach is that it loses some of the edit operations offered by tree edits (like swapping the children of a node), leading them to incur a high cost in the string-based representation.

Our main contribution is a bipartite graph representation of expression structure at the level of strokes, from which metrics based on Hamming distances may be simply defined, with an intuitive interpretation. Given a set of input primitives for a test expression, our representation removes the need to match individual strokes, so that only stroke labels and layout relationships between strokes need to be matched, as described in the next section.

III. EXPRESSION REPRESENTATION

In our approach, the recognizer output and ground truth interpretations must first be converted into bipartite graphs. This is illustrated in Figures 2 and 3. The bipartite representation is shown in Figure 3a), where the nodes of the graph represent each stroke in the expression twice: as an unlabeled input stroke (at left), and with an assigned symbol label and detected relationships (at right, with spatial relationships shown as incoming edges). Note that there are $N(N-1)$ edges in this graph, where $N$ is the number of strokes, and we omit edges from strokes to themselves. For legibility, edges representing no relationship are not drawn.

This bipartite graph is constructed from a symbol layout tree: strokes in symbol nodes are split into separate stroke nodes (see Figure 2a), with each stroke possessing the spatial relationships of its associated symbol. All stroke nodes inherit the spatial relationships of their ancestors in the layout tree. In Figure 2, the two strokes corresponding to the d inherit the Down spatial relationship of the single-stroke b. Note that this inheritance applies to all spatial relationships, including containment by square roots, and horizontal adjacency: for example, in k + m, m inherits the Right relationship of the + relative to the k. The information presented in the DAG in Figure 2b) can then be converted directly into a bipartite graph, as shown in Figure 3a). Note that strokes with the same set of incoming spatial relationships in Figure 3 are symbols, i.e. stroke relationships induce a segmentation of strokes into symbols. It is easier to visualize differences in interpretations using the bipartite representation (as node positions in the graph may be fixed across interpretations), but it is easier to visualize interpreted layout from the DAG representation.

To evaluate symbol segmentation separately from layout, we may use a second bipartite graph: see Figure 3b). An undirected edge is placed between all pairs of non-identical strokes belonging to a symbol. Thus a symbol composed of 3 strokes is represented by 6 edges (each pair of strokes is counted in both directions); one isolated stroke corresponding to a symbol is not connected.

[Figure 2: a) Stroke Layout Tree, b) Stroke Layout DAG]
Figure 2. Stroke-level Ground Truth Representations for the Expression in Figure 1a). Strokes are labeled using s<num>. In b), the stroke layout tree is converted to a DAG by adding the incoming spatial relationships at a node to all of its descendants. Spatial relationships are represented by U: up, D: down, S: superscript, Sub: subscript, and R: right.
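The construction described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `subtree` and `stroke_graphs`, the dictionary-based input format, and the stroke assignment for the k + m example (a two-stroke "+") are all hypothetical. Stroke pairs that do not appear in the returned layout dictionary stand for "no relationship".

```python
from itertools import permutations


def subtree(node, children):
    """Return `node` together with all of its descendants in the layout tree."""
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(child for child, _ in children.get(current, []))
    return found


def stroke_graphs(symbols, tree_edges):
    """Build stroke-level labels, layout edges and segmentation edges.

    symbols:    {symbol_id: (class_label, [stroke_id, ...])}
    tree_edges: [(parent_symbol, child_symbol, relation), ...]
    """
    children = {}
    for parent, child, relation in tree_edges:
        children.setdefault(parent, []).append((child, relation))

    # Node labels: every stroke carries the class label of its symbol.
    labels = {s: label for label, strokes in symbols.values() for s in strokes}

    # Layout edges: the relation on a tree edge parent -> child is inherited
    # by every descendant of child, then copied to all stroke pairs.
    layout = {}
    for parent, child, relation in tree_edges:
        for target in subtree(child, children):
            for p in symbols[parent][1]:
                for q in symbols[target][1]:
                    layout[(p, q)] = relation

    # Segmentation edges: ordered pairs of distinct strokes of one symbol.
    segmentation = set()
    for _, strokes in symbols.values():
        segmentation.update(permutations(strokes, 2))

    return labels, layout, segmentation


# Hypothetical stroke assignment for "k + m": the "+" is drawn with two strokes.
symbols = {"k": ("k", ["s1"]), "plus": ("+", ["s2", "s3"]), "m": ("m", ["s4"])}
tree_edges = [("k", "plus", "R"), ("plus", "m", "R")]
labels, layout, segmentation = stroke_graphs(symbols, tree_edges)
print(layout[("s1", "s4")])   # R: m inherits the Right relationship from "+"
print(sorted(segmentation))   # [('s2', 's3'), ('s3', 's2')]
```

Storing only the relationships that are present keeps the graphs sparse, while the full set of $N(N-1)$ labeled edges remains implicit.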
[Figure 3: a) Stroke labels and layout, b) Segmentation]
Figure 3. Bipartite Graphs Representing the Expression in Figure 1. Nodes represent strokes, and labeled edges represent spatial relationships. In a), symbol classes are shown using node labels, and spatial relationships using edge labels. This graph represents the same information as in Figure 2b). In b) a segmentation graph is shown, in which strokes belonging to a symbol are connected.

IV. METRICS FOR SPECIFIC ERROR TYPES

Given two bipartite graphs representing recognizer output and ground truth for the strokes in a handwritten math expression, recognition errors may be found directly as disagreeing node or edge labels. A number of examples are provided in Figure 4; incorrect labels and relationships relative to ground truth are shown in red.

a) mislabeled stroke (1 error, $\Delta_C$)
b) misrecognized relationships (2 errors, $\Delta_L$)
c) segmentation error, where the d has been split into two symbols. There are two mislabeled strokes (classification errors), and a spurious spatial relationship between the two resulting symbols (3 errors)
d) error similar to that in c), but with the stem of the d misrecognized as being above the fraction line; there is also a missing relationship between the b and the stem of the d (5 errors)

[Figure 4: a) Classification Error, b) Layout Error, c) Classification and Segmentation, d) Classification, Segmentation and Layout]
Figure 4. Example Recognition Errors for the Expression in Figure 1. In the DAGs errors are shown in red, and by filled nodes and edges in the bipartite graphs. In the bipartite graph thumbnails, nodes correspond to strokes as shown in Figure 3. In part d) there are five errors: the d has been separated into two mis-classified strokes, with two spurious spatial relationships (U and Sub), and one missing relationship (the superscript between the b and the vertical line in the d).

These errors are summarized in Table I. Additionally, one may count the number of disagreeing stroke pairings in segmentation graphs as illustrated in Figure 3b) ($\Delta_S$). In Figure 4c) and d), two segmentation graph edges from ground truth are missing.

For a set of strokes $S$ and two expressions defined on $S$ represented by bipartite graphs $E_1$ and $E_2$, we define the metrics below for specific stroke properties. Let $D$ be the set of all non-identical stroke pairs: $D = \{(p,q) \in S \times S \mid p \neq q\}$, where $|D| = |S|(|S|-1)$. Each metric below is a Hamming distance, specifically the number of disagreeing labels/relationships. As such, each satisfies the four requirements for a metric [11]: non-negativity, symmetry ($\Delta(E_1,E_2) = \Delta(E_2,E_1)$), $\Delta(E_1,E_1) = 0$, and the triangle inequality: $\Delta(E_1,E_3) \leq \Delta(E_1,E_2) + \Delta(E_2,E_3)$.

Classification ($\Delta_C$): The number of strokes with different symbol labels in the expression graphs $E_1$ and $E_2$:

$\Delta_C(E_1,E_2) = |\{s \in S \mid l(s,E_1) \neq l(s,E_2)\}|$    (1)

Layout ($\Delta_L$): Let $L_1$ and $L_2$ be the sets of labeled edges in expression graphs $E_1$ and $E_2$, and write $L_i(p,q)$ for the label of edge $(p,q)$ in $L_i$. Layout disagreement is the number of disagreeing edge labels between non-identical strokes:

$\Delta_L(E_1,E_2) = |\{(p,q) \in D \mid L_1(p,q) \neq L_2(p,q)\}|$    (2)

Segmentation ($\Delta_S$): This is defined similarly to layout, but using the undirected segmentation bipartite graphs (see Figure 3b) $B_1$ and $B_2$ constructed on the set of strokes for each symbol relation tree:

$\Delta_S(E_1,E_2) = |\{(p,q) \in D \mid B_1(p,q) \neq B_2(p,q)\}|$    (3)

V. EXPRESSION-LEVEL DISTANCE METRICS

We now combine our metrics for specific error types (classification, segmentation, and layout) into Expression-Level Distance Metrics that define a single distance measure for two interpretations of an expression. First consider the distance metric $\Delta_B \in [0,1]$, defined from the number of disagreeing stroke labels and spatial relationships, such as shown in the thumbnail images of Figure 4. This is a Hamming distance, with $|S|^2$ elements in each vector of node/edge labels for a graph.

$\Delta_B(E_1,E_2) = \frac{\Delta_C + \Delta_L}{|S|^2}$    (4)

This metric is unweighted, and as a result will produce less distance for classification errors than for errors in layout and segmentation (represented implicitly in the layout relationships). As an absolute measure of the difference between two bipartite graphs $\Delta_B$ is sufficient, but one may want to weight errors to make classification errors proportional to segmentation and layout errors. In particular, when comparing algorithms for use in practice, or when using machine learning to optimize the complete recognition system, one may want to weight the different error types. We define a metric $\Delta_E \in [0,1]$ as the average of the per-stroke classification, segmentation and layout errors:

$\Delta_E(E_1,E_2) = \frac{1}{3}\left(\frac{\Delta_C(E_1,E_2)}{|S|} + \sqrt{\frac{\Delta_S(E_1,E_2)}{|D|}} + \sqrt{\frac{\Delta_L(E_1,E_2)}{|D|}}\right)$    (5)

We use the square root of the segmentation and spatial relationship distances in order to make them proportional to $|S|$ rather than $|S|^2$ (one could instead divide $\Delta_L$ and $\Delta_S$ by $|S|-1$ for the same reason). This prevents differences in segments and spatial relationships from being weighted less heavily than differences in stroke (symbol) classification labels. As each component distance is in $[0,1]$, $\Delta_E$ also lies in the interval $[0,1]$.

$\Delta_B$ and $\Delta_E$ are proper metrics. They are non-negative, symmetric, and the distance from a layout tree to itself is 0. As the square root of non-negative values is an order-preserving (monotonic) and subadditive function, the square root of a metric is also a metric. Given that $\Delta_C$, $\Delta_S$ and $\Delta_L$ are proper metrics, their sum obeys the triangle inequality by definition. Similarly, using the average of their sum does not invalidate the metric property.

Both $\Delta_B$ and $\Delta_E$ require $O(|S|^2)$ time to compute. In practice, $|S|$ tends to be relatively small, and so the quadratic complexity is not a significant concern. Further, absent spatial relationships need never be explicitly compared: we can simply count labels and relationships present in at least one of the two input graphs.

In order to illustrate and compare these metrics, Table I shows the five distances between each erroneous interpretation in Figure 4 and its ground truth. Notice that classification errors are weighted more heavily by $\Delta_E$, and that in general the computed distance/error value is higher for $\Delta_E$ than for $\Delta_B$.

Table I
DISTANCE BETWEEN EXPRESSIONS IN FIGURE 4 AND GROUND-TRUTH IN FIGURES 2B) AND 3A)

Fig. 4   Δ_C   Δ_S   Δ_L   Δ_B    Δ_E
a)       1     0     0     0.04   0.067
b)       0     0     2     0.08   0.105
c)       2     2     1     0.12   0.313
d)       2     2     3     0.2    0.368

VI. GENERALIZATION: SEGMENTS AND PIXELS

We have assumed that no stroke corresponds to more than one symbol in the input (i.e. no stroke is under-segmented). This assumption may be removed if we use finer-grained primitives, such as line segments rather than whole strokes. A single stroke containing two symbols can then be partitioned, and the resulting segmentation evaluated. Document images often have some symbols overlapping within a single connected component, such as in $\frac{y}{x}$, where the fraction line and y may intersect. In this case we can deconstruct connected components into smaller subcomponents that correspond to small contiguous regions or, as a more extreme approach, take pixels to be the primitives. Using the smallest possible primitives (e.g. pixels) is attractive because under-segmentation cannot occur; however, efficiency may become a problem, as the bipartite graphs/DAGs would be very large. Pixel-level ground truth is imprecise; however, this level of ground truthing is common in computer vision, where it is understood that the human interpretation involved in constructing ground truth results in residual errors for decisions within ambiguous regions (e.g. identifying the specific split point between two connected symbols drawn with a single stroke).

With appropriate primitives, the metrics presented may be used as criterion functions for machine learning algorithms. In most cases, losses for errors in stroke labeling and relationships will need to be softened to values in [0, 1] rather than {0, 1}, e.g. to avoid discontinuities in the error surface when using algorithms based on gradient descent. These soft errors may be obtained using additional metrics for stroke labels and relationships (e.g. probabilities or fuzzy values).

VII. CONCLUSION

We have presented new metrics for comparing the similarity of two interpretations of a set of online strokes, with application to pen-based mathematics recognition. Our approach is novel in that it uses strokes rather than symbols as the basis for comparing symbol and structure recognition results. This has the advantage of providing a broader characterization of system performance, allowing expression-level performance to be assessed in terms of input primitives. Our metrics can be efficiently computed, in time $O(n^2)$, where $n$ is the number of strokes. Note that handwritten expressions typically consist of a relatively small number of strokes. The approach can also be easily adapted to other pen-based domains, such as recognition of flowcharts, and for use in images.

An open question is whether the metric can be usefully applied when one cannot assume that the sets of primitives for the two recognition results being compared match (e.g. for Mathematical Information Retrieval (MIR) applications). A related issue is defining metrics for evaluation of mathematical content (i.e. the mathematical syntax of a recognized expression); the method presented in this paper addresses only evaluation of layout. As mathematical content is normally represented hierarchically by operator trees, it may be possible to employ a bipartite graph-based approach to evaluation there as well, again using input primitives as the nodes in the graph.

Acknowledgements: This material is based upon work supported by the National Science Foundation under Grant No. IIS-1016815, the Natural Sciences and Engineering Research Council of Canada, the Xerox Foundation, and the Center for Emerging and Innovative Sciences (NYSTAR).

REFERENCES

[1] A. Silva, "Metrics for evaluating performance in document analysis: application to tables," Int'l J. Document Analysis and Recognition, vol. 14, pp. 101-109, 2011.
[2] A. Lapointe and D. Blostein, "Issues in performance evaluation: A case study of math recognition." IEEE Computer Society, 2009, pp. 1355-1359.
[3] A.-M. Awal, H. Mouchere, and C. Viard-Gaudin, "The problem of handwritten mathematical expression recognition evaluation," in Int'l Conf. on Frontiers in Handwriting Recognition, Kolkata, India, 2010, pp. 646-651.
[4] D. Blostein and A. Grbavec, "Recognition of mathematical notation," in Handbook of Character Recognition and Document Image Analysis. World Scientific Publishing Company, 1997, pp. 557-582.
[5] K.-F. Chan and D.-Y. Yeung, "Mathematical expression recognition: a survey," International Journal on Document Analysis and Recognition, vol. 3, pp. 3-15, Aug 2000.
[6] U. Garain and B. Chaudhuri, OCR of Printed Mathematical Expressions. Springer, 2007, pp. 235-259.
[7] S. Uchida, A. Nomura, and M. Suzuki, "Quantitative analysis of mathematical documents," Int'l J. Document Analysis and Recognition, vol. 7, no. 4, pp. 211-218, 2005.
[8] S. MacLean, G. Labahn, E. Lank, M. Marzouk, and D. Tausky, "Grammar-based techniques for creating ground-truthed sketch corpora," Int'l J. Document Analysis and Recognition, vol. 14, no. 1, pp. 65-74, 2011.
[9] U. Garain and B. Chaudhuri, "A corpus for OCR research on mathematical expressions," Int'l J. Document Analysis and Recognition, vol. 7, no. 4, pp. 241-259, 2005.
[10] K. Sain, A. Dasgupta, and U. Garain, "Emers: a tree matching-based performance evaluation of mathematical expression recognition systems," Int'l J. Document Analysis and Recognition, no. 14, pp. 75-85, 2011.
[11] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. Wiley, 2001.