A NEW METRIC FOR MEASURING METAMODELS QUALITY-OF-FIT FOR DETERMINISTIC SIMULATIONS. Husam Hamad

Proceedngs of the 6 Wnter Smulaton Conference L. F. Perrone, F. P. Weland, J. Lu, B. G. Lawson, D. M. Ncol, and. M. Fujmoto, eds. A NEW METIC FO MEASUING METAMODELS QUALITY-OF-FIT FO DETEMINISTIC SIMULATIONS Husam Hamad Electronc Engneerng Department Hjjaw College of Engneerng Technology Yarmouk Unversty Irbd, JODAN ABSTACT Metamodels are used to provde smpler predcton means than the complex smulaton models they approxmate. Accuracy of a metamodel s one fundamental crteron that s used as the bass for acceptng or rejectng a metamodel. Average-based metrcs such as root-mean-square error and -square are often used. Lke all other average-based statstcs, these measures are senstve to sample szes unless the number of test ponts n these samples s adequate. We ntroduce n ths paper a new metrc that can be used to measure metamodels ft qualty, called metamodel acceptablty score. The proposed metrc gves readly nterpretable meanng to metamodels acceptablty. Furthermore, ntal studes show that s less senstve to test sample szes compared to average-based valdaton measures. 1 INTODUCTION Metamodels are approxmatons to smulaton models. They are bult and valdated usng smulaton results for samples of data ponts n the nput space. Two fundamental crtera are used as the bass for acceptng or rejectng a metamodel: effcency and accuracy. Effcency s ndcatve of how expedtously predctons can be obtaned; accuracy s ndcatve of how good these predctons are. Effcency of a metamodel can be determned pror to metamodel constructon, and wthout any computatonal cost n terms of the smulaton runs needed, e.g., the tme taken to evaluate a second-order polynomal metamodel n a gven number of dmensons s the same regardless of the underlyng smulaton model. On the other hand, determnng the accuracy of a metamodel s closely lnked to the number of data ponts used n error calculatons. The accuracy of a metamodel s determned usng objectve methods or subjectve methods (Balc 1989, Sargent 4, and Hamad 5). Objectve methods are based on statstcal tests that make certan assumptons about the data's correlaton, dstrbuton, etc. A central assumpton for the applcaton of these statstcs s related to the number of data ponts used. Even when other assumptons are satsfed, statstcal tests are meanngful only f the data used s suffcent n number. It s often the case that, even f the system s completely observable, obtanng a suffcent number of observatons s mpractcally expensve. For such cases, average-based metrcs such as and -square -two of the most popular statstcs used for metamodel accuracy assessment-may be senstve to the number of observatons used. The objectve of ths paper s to ntroduce a new measure to assess the acceptablty of metamodels qualty of ft based on ther accuracy of predctons. The term gven to ths metrc s : metamodel acceptablty score. can be used partcularly for stuatons where there s not enough valdaton test data or the valdaton data s too expensve to generate. The proposed metrc gves readly nterpretable meanng to metamodels acceptablty. Furthermore, and by comparson to average-based measures such as and -square, ntal studes show that s less senstve to test sample szes. The remander of ths paper s organzed as follows. In secton, the metrc s ntroduced and contrasted wth the and -square metrcs. Test results usng these three metrcs are compared by examples n secton. The paper s then concluded by secton 4. : METAMODEL ACCEPTABILITY SCOE The proposed metrc s defned n ths secton as a measure of metamodel acceptablty wth regard to predcton accuracy. The dscusson of ths new metrc s presented n the context of the two average-based statstcs of and -square. In the followng dscusson, y denotes the response modeled by ŷ for the data pont n th a valdaton test sample havng n observatons. 1-444-51-7/6/$. 6 IEEE 88

.1 Average-Based Statstcs Two of the more mportant measures used for model accuracy assessment ncludng determnstc smulaton models are and -square. They are defned by n (ŷ y ) = 1 = (1) n MSE = 1 () σ where MSE s the mean square error and σ s the varance. Note that results returned by cannot be nterpreted wthout referrng to the context of the problem. For example, f n a test problem s found to be.5 and 5 n another, then t could be that the metamodel s more accurate for the latter problem; e.g., f y values are wthn the range.4 to.6 for the frst problem and 1 to n the second one. Furthermore, the sze of s strongly nfluenced by the number of observatons n n the test sample f the sample sze s nadequate. Varatons wth sample sze n n the -square statstc can be n some cases less pronounced, snce there s a chance that the nfluence of such varatons cancels out n the numerator and domnator of Equaton above. However, care must be taken when nterpretng results returned by -square for response values y whch are nearly constant. In such cases, the domnator of the second term n Equaton s small, and even f the metamodel returns nearly zero MSE, the value of the second term n Equaton resultng from dvdng two small numbers s not wthout numercal problems. A smlar stuaton leadng to questonng the valdty of -square results that s dealt wth n the lterature rses for the case when n s close to the number of coeffcents q n the metamodel. For such cases, - square s adjusted to accommodate the relatve sze of n to q; see (Klejnn 7).. Metamodel Acceptablty Score Metamodel acceptablty score s defned n ths subsecton. It wll be shown how ntutve thnkng lead naturally to development and, hopefully, ts subsequent adopton for determnstc smulaton metamodel valdaton. The startng step n metamodel valdaton actvtes s a lst of n responses y and a correspondng lst of n metamodel ŷ values. To determne how close the consttuents of each of these n response/metamodel pars are to each other ether the dfference ( ŷ y ) or the metamodel-to-response rato ( ŷ ) may be used, broadly speakng. A dfference ( ŷ y ) of zero or a rato ( ŷ ) of one ndcate a perfect match for the th par. Leave out the case y equals zero for now; ths ssue wll be addressed n a short whle. Lke the case mentoned above for, a gven dfference ( ŷ y ) value s not readly nterpretable wthout context. On the contrary, and agan exemptng the case of zero response y for the moment, results for the rato ( ŷ ) requre no context for ther nterpretaton; a rato of.999 s always taken postvely whle a rato of 999 s on the other hand always taken negatvely, regardless of the response modeled. Before gong down the lst of n response/metamodel pars, a crteron s set for acceptablty of a gven par. Optons for such a crteron are numerous and applcaton dependant. For example, a par may be accepted f ts response and metamodel values are wthn % of each other,.e., f (.8 ŷ 1.). In another applcaton, a conservatve metamodel may be favored, thus the acceptablty crteron may change to (.8 ŷ 1.) so that t s guaranteed that for worst case a hgher value for the response s obtaned, e.g., to optmze an engneerng system desgn. Now, gven a requred acceptablty crteron, the lst of n response/metamodel pars s traversed and the number of accepted pars s counted. Let ths number be m; s then defned as m = 1 () n Gven a certan level along wth the acceptablty crteron used, there s no ambguty n nterpretng the results leadng to acceptance or rejecton of the metamodel. For nstance, a of 9% may be set as the mnmum acceptable level for a gven metamodel wth the acceptablty crteron of (.8 ŷ 1.). For ths stuaton, a metamodel wth = 65%, say, s rejected because 5% of ts tested space fals vs-à-vs the gven acceptablty crteron. In addton to nterpretablty of results, senstvty to varatons n sample-to-sample sze should be to a lesser degree n comparson to average-based metrcs. represents a relatve count of test ponts throughout the nput space. Changng the test sample sze n leads to a correspondng change n the number of acceptable ponts m, provded that the dstrbuton of test ponts over the space n the new sample s not changed; e.g., test ponts are 88

drawn from the entre space n all cases wthout puttng more emphass for one sub regon over another.. To clarfy ths pont, suppose that a gven test sample has n =1 observatons gvng a certan level. Now, f only 1 ponts are used nstead, but wth a dstrbuton n the nput space smlar to the dstrbuton of the 1-ponts sample, e.g., coverng the entre nput space not only the frst half, then for most cases the dfference n levels should not be detrmental. Of course, such an argument stll needs further scentfc justfcatons, may be beneftng from the well-establshed samplng theorems such as those used for dgtal sgnal processng methods n electronc engneerng. Nonetheless, ntal nvestgatons show that s less senstve to test sample szes compared to average-based valdaton measures such as, as expected for the reasons just mentoned, and as wll be shown n the examples of the next secton. We now return to the stuaton where for one or more of the n observatons the response y s zero. Ths ssue s treated further n (Hamad 6). Let the number of such ponts be ν. Then as far as calculaton s concerned, two approaches may be taken dependng on the sze of ν relatve to n: If ν s only a small fracton of n, as may be set by the analyst, then these ν ponts are smply removed. Ths should have almost no effect on levels provded that n >> 1. If on the other hand ν s a szable fracton of n, then, f valdaton usng levels s planned at the outset, an approach may be based on modelng usng a 'shft-transformaton' of the response y y to s = y + δ, by addng a constant δ to all of the n data ponts y. Here, δ s chosen to make y s the shfted response greater than zero. Ths way, valdaton usng results may be carred out for the complete lst of n data ponts after transformaton. The sample-to-sample varablty of results are contrasted to those of and -square for the examples n the next secton. EXAMPLES We compare n ths secton varatons wth sample sze for on one hand, and and -square on the other hand, va two examples. The frst example uses a onedmensonal analytc functon for the response. For ths example, two metamodels havng dfferent number of coeffcents q are studed. The second example nvolves smulaton results for an electronc crcut wth three desgn varables (nputs). For these examples: Polynomals metamodels are used. The number of coeffcents q for a polynomal n k dmensons wth a degree d s q = (k + d)! k! d! Latn hypercube valdaton test samples are used. Sample szes of ωq are used, where the multpler ω s vared n steps of 1 startng at ω = 1. Latn hypercube samplng s used to provde flexblty wth sample szes and good unformty over the nput space. The sample-to-sample varaton s measured by the dfference-mode to common-mode rato DMCM. The DMCM for the two quanttes ς 1 and ς s defned by DMCM = ς1 ς 1 1 ( ς1 + ς ) Absolute values are taken to calculate the common-mode component n the domnator of Equaton 4..1 Example 1 The followng response s defned for the space ( 1) 1 (4) x [-1,1]: y = 1 1-5 e x + (5) Two metamodels are derved for ths response: the frst one s a second-order polynomal bult usng a mnmum bas desgn havng four ponts, and the second metamodel s a ffth-order polynomal derved usng another mnmum bas desgn wth 1 ponts. Accuracy tests are carred out usng ffty samples for each metamodel. The number of observatons for these samples are ωq ; ω = 1,,, 5. The number of coeffcents q for the second-order and ffth-order polynomals s three and sx, respectvely. Calculatons of, -square, and are carred out for the second-order polynomal metamodel usng the ffty test samples n turn. Acceptablty crteron s set at (.8 ŷ 1.) for calculatons. esults are shown n Fgure 1-(c) for one tral, whle part (d) of the fgure shows the results usng three other trals for each of the ffty Latn hypercube test samples for calculatons. 884

15.8.6 -square.4 1. 5 -. 1 95 1 95 tral1 tral tral 9 9 85 85 (c) 8 (d) 8 Fgure 1: Valdaton esults for the Second-Order Polynomal Metamodel vs. the Number of Coeffcents Multpler ω for, -Square, and (c) ; (d) Three Trals for Each of the 5 Latn Hypercube Test Samples for Note from the fgure that each metrc settles at a fnal value shown by sold lnes crossng the plots n the mddle. However, varaton around the fnal value s smallest for, as can be seen from Fgure 1c whch shows that levels are at nearly (87 ± 6) after ω = 1. supercedes not only n terms of senstvty to sample sze varaton, but also n terms of nterpretablty of results. To clarfy, t can be seen by reference to Fgure 1a that s at nearly 7 for adequate sample szes. Now, wth ths sze n mnd, s the second-order polynomal a good metamodel? Movng on to part of the fgure, -square values reveal that the metamodel s nadequate. For results n Fgure 1c on the other hand, nformaton about the metamodel qualty-of-ft s not ambguous. The fgure reveals that for around 87% of the tested sample observatons the metamodel s wthn ± % of the response-the crteron used n the example for calculatons as mentoned. Wth such a readly nterpretable result, the decson of whether to accept or reject the metamodel becomes far more easer. Note that - square results pont out to a completely contradctory concluson. Ths s because y n Equaton 5 s almost constant for more than 85% of the nput space, a condton whch s not favorable for -square applcaton as mentoned n the prevous secton. In order to better compare sample-to-sample varatons for the three metrcs, DMCM s correspondng to the data of Fgure 1 are calculated and plotted n Fgure. The same scales can now be used for all three metrcs as shown. Fgure a shows that can vary by as much as 17% for small sample szes. The sample-to-sample varaton for -square can be as hgh as % for the smaller sample szes, as revealed by Fgure b. On the other hand, sample-to-sample varaton for s less than 1% for worst cases, as depcted n Fgure c, wth only -% changes for sample szes as small as three to four tmes the number of polynomal coeffcents q. The order of the metamodel polynomal s changed to fve, and the correspondng varatons of, -square and are calculated. esults for DMCMs are shown n Fgure supermposed for the three metrcs. The same scales as n Fgure are used for easy comparson. As seen n Fgure, although sample-to-sample varatons has mproved for, however, DMCM for s stll the worst, settlng down to small percentages after nearly ω > 15 ;.e., for sample szes wth ωq > 15 6 = 9 observatons. 885

-square 1 1-1 -1 - - 1 1 -square -1-1 (c) - (d) - Fgure : Sample-to-Sample Varatons Expressed by DMCM for the Second-Order Polynomal Metamodel vs. the Number of Coeffcents Multpler ω for, -square, and (c) ; (d) Supermposton of the Three Plots 1 8 6 4 -square The porton H s dependant upon the three desgn varables 1,, and connected as shown n the fgure. Usng a crcut smulator gves results whch are dentcal to those gven by the followng equaton obtaned from elementary crcut analyss technques: - -4 Fgure : DMCMs for the Ffth-Order Polynomal Metamodel vs. ω. Example The three-dmensonal problem n ths subsecton s an engneerng problem that relates the porton H of the nput sgnal that appears as an output n the crcut of Fgure 4. 1 Ω Input Sgnal 1 Output Sgnal H + 1 1 = (6) 1 + 1 + + 1 + 1 A second-order polynomal s constructed from a mnmum bas expermental desgn havng seventeen ponts n the space [1,1]. Accuracy tests are then carred out usng ffty samples. The number of observatons for these samples are ωq ; ω = 1,,, 5. The number of coeffcents q for ths case s ten. Calculatons of, - square, and are carred out for the metamodel usng the ffty test samples n turn. Acceptablty crteron s set agan at (.8 ŷ 1.) for calculatons. esults are shown for and n Fgure 5 for one tral for each of the ffty Latn hypercube test samples used; - square results are omtted to save space. Fgure 4: Electrc Crcut for Example 886

.4.5..5..15.1 1 98 96 94 9 Fgure 5: vs. ω vs. ω Sample-to-sample varatons are determned for the three metrcs va DMCM calculatons. esults are shown n Fgure 6a for the three metrcs supermposed for comparson, whle Fgure 6b shows DMCM results for. 8 6 4 -square - -4 8 6 4 - -4 Fgure 6: DMCM for Supermposton for the Three Metrcs for Only Fgure 5a shows that vares from a lttle over.5 to.4 for sample szes up to about ; s ths metamodel accurate based on these results? Note that the answer to ths queston s readly obtaned by reference to results n Fgure 5b. The fgure shows that the metamodel s accepted for (95 ± )% of the observatons for any test sample sze greater than ω =1 tmes the number of coeffcents q;.e., for sample szes greater than ten n ths case. Sample-to-sample varatons for are depcted by Fgure 6a, showng that DMCM for any change n sample sze s less than 5%. Note from Fgure 6b that sample-tosample varaton s worse for, wth the -square measure performng nearly as good as. 4 CONCLUSION Ths paper presented a new metrc as a measure of metamodel acceptablty wth regard to predcton accuracy. The term used for ths metrc s metamodel acceptablty score or. The dscusson of ths new metrc s presented n the context of the two well-known average-based statstcs of and -square. dffers from such exstng statstcs n nature; a level for a test sample s establshed by countng ponts n the sample whch satsfy a requred metamodel performance rather than takng average performance across the sample ponts. Ths aspect of makes t less senstve to sample sze varatons. Furthermore, results are extremely easy to nterpret, leadng to a decson for acceptng or rejectng a metamodel based on sold grounds. EFEENCES Balc, O. 1989. How to assess the acceptablty and credblty of smulaton results. In Proc. 1989 Wnter Smulaton Conf., ed. E. A. MacNar, K. J. Musselman, and P. Hedelberger, 6-71. Hamad, H, and S. Al-Hamdan. 5. Two new subjectve valdaton methods usng data dsplays. In Proc. 5 Wnter Smulaton Conf., ed. M. E. Kuhl, N. M. Steger, F. B. Armstrong, and J. A. Jones, 54-545. Hamad, H., and S. Al-Hamdan. 6. Dscoverng metamodels qualty-of-ft va graphcal technques, European Journal of Operatonal esearch (In Press). [onlne]. Avalable va <www.scencedrect.com> (do: 1.116/j.ejor.6.1.6) [accessed Aprl, 6]. Klejnen, J. P. C., and D. Deflandre. 6. Valdaton of regresson metamodels n smulaton: bootstrap approach, European Journal of Operatonal esearch, 17(1), pp. 1-11. Sargent,. G. 4. Valdaton and verfcaton of smulaton models. In Proc. 4 Wnter Smulaton Conf., ed.. G. Ingalls, M. D. ossett, J. S. Smth, and B. A. Peters, 1-4. 887

AUTHO BIOGAPHY HUSAM HAMAD s an assstant professor n the Electronc Engneerng Department at Yarmouk Unversty n Jordan. He receved hs B.S. n Electrcal Engneerng from Oklahoma State Unversty n 1984, M.S. n Devce Electroncs from Lousana State Unversty n 1985, and PhD n Electronc Systems Engneerng from the Unversty of Essex, England, n 1995. He was a member of PHI KAPPA PHI Honor Socety durng hs study n the U.S. Hs research nterests nclude analog ntegrated crcuts analyss and desgn, electronc desgn automaton, CAD, sgnal processng, and metamodel fttng and valdaton technques. Hs e-mal address s <husam@yu.edu.jo>. 888