Normalisation with respect to pattern

Iwona Müller-Frączek
Nicolaus Copernicus University in Toruń, Poland

Abstract

The article presents a new normalisation method of diagnostic variables: normalisation with respect to the pattern. The normalisation preserves some important descriptive characteristics of variables: skewness (up to sign), kurtosis and the absolute values of the Pearson correlation coefficients. It is particularly useful in dynamical analysis, when we work with the whole population of objects rather than a sample, for example in regional studies. After the proposed transformation, variables are comparable not only between themselves but also across time. Then we can use them, for example, to construct composite variables.

keywords: normalisation, standardisation, composite variable, synthetic measure

Author's address: I. Müller-Frączek, Faculty of Economic Sciences and Management, Nicolaus Copernicus University, ul. Gagarina 13a, 87-100 Toruń, Poland, e-mail: muller@umk.pl

1 Introduction

In regional studies we often need to compare regions (objects) with respect to an analyzed complex (or composite) phenomenon. A complex phenomenon is a qualitative phenomenon that is characterized by some quantitative features, called diagnostic variables. Each object is then identified with a point of the multidimensional real space. One of the tools of regional research is the composite variable (or synthetic measure). A composite variable is created to reflect multidimensional points (objects) in one-dimensional space. Many advanced methods of constructing synthetic variables have been developed; however, the simplest methods are often used in practice. There are many such examples (see [2]). One of them is the very popular Human Development Index (HDI), which ranks countries into four tiers of socio-economic development. Until 2010 the HDI was a uniformly weighted sum of three indicators describing life expectancy, education, and income per capita.

One of the steps in the construction of a synthetic measure is bringing diagnostic variables to comparability, called normalisation or standardisation. Normalisation deprives variables of their units and unifies their ranges. There are many normalisation formulas (see [4], [5], [8]). Choosing a proper method is important because normalisation influences the results of object ordering.

The usual stochastic approach can be used to determine the parameters needed for normalisation. Then we treat the observed values of a variable as a randomly selected sample of the population. This approach should not be used in regional research, where we work with the whole population of objects. In this case we should use a descriptive (deterministic) approach.

Normalisation formulas are most often given for static analysis, that is, for a fixed point in time. A normalisation problem appears when we want to compare the situations of regions at several time points. Then the variables should also be comparable across time. To achieve this effect in the stochastic approach one can use all values of a variable (both for objects and for time) to determine the parameters needed for normalisation. However, this solution is controversial in the descriptive approach (see [9]). In addition, it requires incessant conversion of results when later observations occur. In this case we should rather use current observations, so after the usual normalisation variables are not comparable across time. Then we cannot compare the values of synthetic measures; we can only compare rankings. To solve this problem in the mentioned Human Development Index, the parameters of feature scaling are fixed at levels that are not related to the variable distribution. The levels are justified by substantive reasons. For example, the age of 85 was established as the maximum life expectancy at birth.

The article proposes a new method of feature normalisation: normalisation with respect to the pattern, or pattern normalisation for short. This name was inspired by Hellwig's paper (see [3], [1]). The method is consistent with the static approach, but it can be used to compare objects at different time points. The method meets the requirements of normalisation that are suggested in the literature (see [4], [6]). It preserves skewness (up to sign) and kurtosis. Moreover, the absolute values of the Pearson correlation coefficients are not changed after normalisation.

In the first step of the pattern normalisation the nature of a variable is determined in the context of the analyzed complex phenomenon. We distinguish stimulants and destimulants. A stimulant is a diagnostic variable that has a positive impact on the analyzed complex phenomenon, while a destimulant has a negative one. In regional research determining the nature of variables is natural. Most often, before normalisation, we turn destimulants into stimulants using their inverse values. Unfortunately, the variables after conversion lose their interpretation and their distributions are changed. In the presented method, we do not convert destimulants before normalisation. Destimulants and stimulants are normalized in different ways.

Determining the nature of a variable allows us to choose the most beneficial observation among all values of the variable: the maximum for a stimulant and the minimum for a destimulant. We call this value a pattern. Next we convert all values with respect to this pattern. After the transformation we get comparable variables. All of them are destimulants with a clear interpretation.

Pattern normalisation can be used in the common construction of composite variables instead of other methods of normalisation. A possible application is shown in [7].

2 Definition of pattern normalisation

Suppose that a complex phenomenon observed for $n \in \mathbb{N}$ regions is analyzed. Assume that we cannot measure this phenomenon, whereas we know a collection of measurable diagnostic variables that characterize it. Assume that the diagnostic variables meet both substantive and statistical requirements; for more details see, for example, [9]. Let us consider one such variable $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, which is a stimulant (then we write $x \in S$, where $S$ denotes the set of stimulants) or a destimulant ($x \in D$, analogously).

In the first step we choose a pattern: the most beneficial of all values of the variable $x$. The pattern is common to all objects and is described by the formula:

$$x^+ = \begin{cases} \max_i x_i & \text{if } x \in S, \\ \min_i x_i & \text{if } x \in D. \end{cases} \tag{1}$$

After specifying the pattern $x^+$ we can consider, instead of the variable $x$, a new variable $u^+$ given by:

$$u_i^+ = \frac{|x_i - x^+|}{\sum_{j=1}^n |x_j - x^+|} = \begin{cases} \dfrac{x^+ - x_i}{\sum_{j=1}^n (x^+ - x_j)} & \text{if } x \in S, \\[2ex] \dfrac{x_i - x^+}{\sum_{j=1}^n (x_j - x^+)} & \text{if } x \in D. \end{cases} \tag{2}$$

Formula (2) is well defined only when $x$ is not constant, so that the denominator is positive.

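Formulas (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `pattern_normalise`, the `stimulant` flag and the two small data sets are assumptions made for this example and do not come from the article.

```python
from typing import List, Sequence


def pattern_normalise(x: Sequence[float], stimulant: bool = True) -> List[float]:
    """Normalise x with respect to its pattern, following formula (2)."""
    # Formula (1): the pattern is the most beneficial value,
    # the maximum for a stimulant and the minimum for a destimulant.
    pattern = max(x) if stimulant else min(x)
    # Distance of every object from the pattern.
    distances = [abs(value - pattern) for value in x]
    total = sum(distances)
    if total == 0:
        # A constant variable has no distance to share out.
        raise ValueError("constant variable: every object equals the pattern")
    # Formula (2): share of each object in the total distance.
    return [d / total for d in distances]


# Hypothetical data for four regions (illustrative values only).
income = [10.0, 20.0, 30.0, 40.0]       # treated as a stimulant
unemployment = [4.0, 6.0, 10.0, 12.0]   # treated as a destimulant

u_income = pattern_normalise(income, stimulant=True)
u_unemployment = pattern_normalise(unemployment, stimulant=False)

print(u_income)
print(u_unemployment)
```

In both outputs the best region receives 0 and the values sum to 1, so the two variables can be read on the same scale even though one was a stimulant and the other a destimulant.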
Formula (2) determines a certain transformation of the initial variable $x = (x_1, x_2, \ldots, x_n)$ into a new variable $u^+ = (u_1^+, u_2^+, \ldots, u_n^+)$. We call it normalisation with respect to the pattern. After this transformation the new variable describes the same aspect of the complex phenomenon as $x$ does. So $u^+$ is a diagnostic variable of this phenomenon.

The pattern normalisation (2) is not just a technical procedure. The new variable has a clear interpretation: $u_i^+$ specifies the share of the distance between the $i$-th object and the pattern in the total distance of all objects from the pattern. The situation of the $i$-th object is better when the value of $u_i^+$ is lower.

The values of the variable $u^+$ characterize the positions of objects in the whole system. This is the same as for other forms of normalisation, but the system is specified in a different way. In the case of the pattern normalisation the system is represented by the sum of distances between objects and the pattern, while in common normalisations descriptive characteristics of the distribution of $x$ are used for this purpose.

3 Properties of variable after normalisation

The quantitative description of an immeasurable qualitative phenomenon is obtained using synthetic measures. Bringing diagnostic variables to comparability is the first step in the construction of such a measure. The pattern normalisation can be used for this purpose. Assume that the diagnostic variables are transformed with respect to their patterns. Then the new set of variables has advantages that are expected when creating synthetic variables. These properties and some proofs are presented below.

A. Basic properties

A1. All variables after pattern normalisation are unitless, non-negative and limited to the interval $[0, 1]$. Because of that, the new set of diagnostic variables contains comparable elements.

A2. Irrespective of its initial nature, a variable after the pattern normalisation becomes a destimulant. It means that the situation of the $i$-th object is better when the value $u_i^+$ is lower. In this sense the pattern normalisation unifies the nature of diagnostic variables.

A3. Transforming the variables does not affect the ordering of objects: the ranking from the best to the worst object is the same before and after normalisation (for a stimulant the numerical order of values is reversed, because lower values of $u^+$ are better).

B. Extreme values after pattern normalisation

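B1. The variable $u^+$ takes the value zero only for the pattern object. Since the pattern is chosen among the values of the variable $x$, the zero value is always attained.

$$u_i^+ = 0 \iff x_i = x^+.$$

$$u_i^+ = 0 \iff \frac{|x_i - x^+|}{\sum_{j=1}^n |x_j - x^+|} = 0 \iff |x_i - x^+| = 0 \iff x_i = x^+.$$

B2. The value $u_i^+$ equals 1 when all objects except the $i$-th one are patterns. This situation is rather unrealistic.

$$u_i^+ = 1 \iff \forall_{j \neq i}\; x_j = x^+.$$

$$u_i^+ = 1 \iff \frac{|x_i - x^+|}{\sum_{j=1}^n |x_j - x^+|} = 1 \iff |x_i - x^+| = \sum_{j=1}^n |x_j - x^+| \iff \sum_{j \neq i} |x_j - x^+| = 0 \iff \forall_{j \neq i}\; x_j = x^+.$$

B3. The maximum value of $u^+$ depends on the nature of the variable $x$ and is expressed by:

$$\max_i u_i^+ = \begin{cases} \dfrac{\max_i x_i - \min_i x_i}{\sum_{j=1}^n (\max_i x_i - x_j)} & \text{if } x \in S, \\[2ex] \dfrac{\max_i x_i - \min_i x_i}{\sum_{j=1}^n (x_j - \min_i x_i)} & \text{if } x \in D. \end{cases}$$

If $x \in S$, then:

$$\max_i u_i^+ = \max_i \frac{x^+ - x_i}{\sum_{j=1}^n (x^+ - x_j)} = \frac{x^+ - \min_i x_i}{\sum_{j=1}^n (x^+ - x_j)} = \frac{\max_i x_i - \min_i x_i}{\sum_{j=1}^n (\max_i x_i - x_j)}.$$

If $x \in D$, then:

$$\max_i u_i^+ = \max_i \frac{x_i - x^+}{\sum_{j=1}^n (x_j - x^+)} = \frac{\max_i x_i - x^+}{\sum_{j=1}^n (x_j - x^+)} = \frac{\max_i x_i - \min_i x_i}{\sum_{j=1}^n (x_j - \min_i x_i)}.$$
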
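Properties B1 to B3 can be checked numerically on a small example. The sketch below assumes a hypothetical helper `pattern_normalise` that implements formula (2); the data are invented for illustration and the check is not a substitute for the derivations above.

```python
from typing import List, Sequence


def pattern_normalise(x: Sequence[float], stimulant: bool = True) -> List[float]:
    """Hypothetical helper implementing formula (2)."""
    pattern = max(x) if stimulant else min(x)
    distances = [abs(value - pattern) for value in x]
    total = sum(distances)
    return [d / total for d in distances]


# A stimulant whose pattern (7.0) is attained by two objects.
x = [3.0, 7.0, 5.0, 7.0]
u = pattern_normalise(x, stimulant=True)

# B1: zeros appear exactly at the objects equal to the pattern.
zero_objects = [i for i, value in enumerate(u) if value == 0.0]

# B3 (stimulant case): closed form for the largest normalised value.
b3_maximum = (max(x) - min(x)) / sum(max(x) - value for value in x)

# B2: when every object but one equals the pattern, that object gets 1.
degenerate = pattern_normalise([7.0, 7.0, 2.0, 7.0], stimulant=True)

print(zero_objects, max(u), b3_maximum, degenerate)
```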
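C. Descriptive characteristics of normalised variables

C1. The mean value of $u^+$ depends only on the number of objects and is inversely proportional to this number. It is expressed by:

$$\bar{u}^+ \overset{\text{def}}{=} \frac{1}{n} \sum_{i=1}^n u_i^+ = \frac{1}{n}.$$

$$\bar{u}^+ = \frac{1}{n} \sum_{i=1}^n \frac{|x_i - x^+|}{\sum_{j=1}^n |x_j - x^+|} = \frac{1}{n} \cdot \frac{\sum_{i=1}^n |x_i - x^+|}{\sum_{j=1}^n |x_j - x^+|} = \frac{1}{n}.$$

C2. The variance of $u^+$ is described by:

$$S^2(u^+) \overset{\text{def}}{=} \frac{1}{n} \sum_{i=1}^n \left(u_i^+ - \bar{u}^+\right)^2 = \frac{S^2(x)}{n^2 (x^+ - \bar{x})^2}.$$

$$S^2(u^+) = \frac{1}{n} \sum_{i=1}^n \left( \frac{|x_i - x^+|}{\sum_{j=1}^n |x_j - x^+|} - \frac{1}{n} \right)^2.$$

If $x \in S$, then:

$$\begin{aligned}
S^2(u^+) &= \frac{1}{n} \sum_{i=1}^n \left( \frac{x^+ - x_i}{\sum_{j=1}^n (x^+ - x_j)} - \frac{1}{n} \right)^2
= \frac{1}{n^3} \sum_{i=1}^n \left( \frac{x^+ - x_i}{x^+ - \frac{1}{n}\sum_{j=1}^n x_j} - 1 \right)^2 \\
&= \frac{1}{n^3} \sum_{i=1}^n \left( \frac{x^+ - x_i}{x^+ - \bar{x}} - 1 \right)^2
= \frac{1}{n^3} \sum_{i=1}^n \left( \frac{\bar{x} - x_i}{x^+ - \bar{x}} \right)^2
= \frac{S^2(x)}{n^2 (x^+ - \bar{x})^2}.
\end{aligned}$$

The proof is similar when $x \in D$.

C3. The standard deviation of $u^+$ depends on the nature of the variable $x$ and is expressed by:

$$S(u^+) \overset{\text{def}}{=} \sqrt{S^2(u^+)} = \begin{cases} \dfrac{S(x)}{n (x^+ - \bar{x})} & \text{if } x \in S, \\[2ex] \dfrac{S(x)}{n (\bar{x} - x^+)} & \text{if } x \in D. \end{cases}$$
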
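The closed forms in C1 to C3 can be compared with values computed directly from normalised data. The sketch below uses Python's standard `statistics` module with population (not sample) variance, in line with the descriptive approach; the helper `pattern_normalise` and the data are assumptions made for this illustration.

```python
from statistics import mean, pstdev, pvariance
from typing import List, Sequence


def pattern_normalise(x: Sequence[float], stimulant: bool = True) -> List[float]:
    """Hypothetical helper implementing formula (2)."""
    pattern = max(x) if stimulant else min(x)
    distances = [abs(value - pattern) for value in x]
    total = sum(distances)
    return [d / total for d in distances]


# Illustrative destimulant observed for n = 4 objects.
x = [4.0, 6.0, 10.0, 12.0]
n = len(x)
pattern = min(x)
u = pattern_normalise(x, stimulant=False)

# C1: the mean of u+ should equal 1/n.
mean_u = mean(u)

# C2: variance of u+ against S^2(x) / (n^2 (x+ - mean(x))^2).
variance_u = pvariance(u)
variance_predicted = pvariance(x) / (n ** 2 * (pattern - mean(x)) ** 2)

# C3: standard deviation; abs() covers both the S and the D case.
std_u = pstdev(u)
std_predicted = pstdev(x) / (n * abs(pattern - mean(x)))

print(mean_u, variance_u, variance_predicted, std_u, std_predicted)
```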
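C4. The coefficient of variation of $u^+$ is given by:

$$CV(u^+) \overset{\text{def}}{=} \frac{S(u^+)}{\bar{u}^+} = \begin{cases} \dfrac{S(x)}{x^+ - \bar{x}} & \text{if } x \in S, \\[2ex] \dfrac{S(x)}{\bar{x} - x^+} & \text{if } x \in D. \end{cases}$$

C5. The 3rd central moment of $u^+$ is given by:

$$\mu_3(u^+) \overset{\text{def}}{=} \frac{1}{n} \sum_{i=1}^n \left(u_i^+ - \bar{u}^+\right)^3 = -\frac{\mu_3(x)}{n^3 (x^+ - \bar{x})^3}.$$

$$\mu_3(u^+) = \frac{1}{n} \sum_{i=1}^n \left( \frac{|x_i - x^+|}{\sum_{j=1}^n |x_j - x^+|} - \frac{1}{n} \right)^3.$$

If $x \in S$, then:

$$\begin{aligned}
\mu_3(u^+) &= \frac{1}{n} \sum_{i=1}^n \left( \frac{x^+ - x_i}{\sum_{j=1}^n (x^+ - x_j)} - \frac{1}{n} \right)^3
= \frac{1}{n^4} \sum_{i=1}^n \left( \frac{x^+ - x_i}{x^+ - \frac{1}{n}\sum_{j=1}^n x_j} - 1 \right)^3 \\
&= \frac{1}{n^4} \sum_{i=1}^n \left( \frac{x^+ - x_i}{x^+ - \bar{x}} - 1 \right)^3
= \frac{1}{n^4} \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{\bar{x} - x^+} \right)^3
= \frac{\mu_3(x)}{n^3 (\bar{x} - x^+)^3}
= -\frac{\mu_3(x)}{n^3 (x^+ - \bar{x})^3}.
\end{aligned}$$

The proof is similar when $x \in D$.

C6. The absolute value of the coefficient of skewness does not change after the pattern normalisation:

$$A(u^+) \overset{\text{def}}{=} \frac{\mu_3(u^+)}{S^3(u^+)} = \begin{cases} -A(x) & \text{if } x \in S, \\ A(x) & \text{if } x \in D. \end{cases}$$

C7. The 4th central moment of $u^+$ is given by:

$$\mu_4(u^+) \overset{\text{def}}{=} \frac{1}{n} \sum_{i=1}^n \left(u_i^+ - \bar{u}^+\right)^4 = \frac{\mu_4(x)}{n^4 (x^+ - \bar{x})^4}.$$
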
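By definition and formula (2),

$$\mu_4(u^+) = \frac{1}{n} \sum_{i=1}^n \left( \frac{|x_i - x^+|}{\sum_{j=1}^n |x_j - x^+|} - \frac{1}{n} \right)^4.$$

If $x \in S$, then:

$$\begin{aligned}
\mu_4(u^+) &= \frac{1}{n} \sum_{i=1}^n \left( \frac{x^+ - x_i}{\sum_{j=1}^n (x^+ - x_j)} - \frac{1}{n} \right)^4
= \frac{1}{n^5} \sum_{i=1}^n \left( \frac{x^+ - x_i}{x^+ - \frac{1}{n}\sum_{j=1}^n x_j} - 1 \right)^4 \\
&= \frac{1}{n^5} \sum_{i=1}^n \left( \frac{x^+ - x_i}{x^+ - \bar{x}} - 1 \right)^4
= \frac{1}{n^5} \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{\bar{x} - x^+} \right)^4
= \frac{\mu_4(x)}{n^4 (\bar{x} - x^+)^4}
= \frac{\mu_4(x)}{n^4 (x^+ - \bar{x})^4}.
\end{aligned}$$

The proof is similar when $x \in D$.

C8. The kurtosis of $u^+$ does not change after the pattern normalisation:

$$K(u^+) \overset{\text{def}}{=} \frac{\mu_4(u^+)}{S^4(u^+)} = K(x).$$

D. Linear relation between variables after normalisation

Assume that two diagnostic variables $x_1$, $x_2$ are transformed with respect to their patterns. Denote by $u_1^+$ and $u_2^+$ the variables after normalisation, and by $x_{i1}$, $x_{i2}$ the values of $x_1$, $x_2$ for the $i$-th object.

D1. The covariance between $u_1^+$ and $u_2^+$ equals:

$$\mathrm{cov}(u_1^+, u_2^+) \overset{\text{def}}{=} \frac{1}{n} \sum_{i=1}^n \left(u_{i1}^+ - \bar{u}_1^+\right)\left(u_{i2}^+ - \bar{u}_2^+\right) = \begin{cases} \dfrac{\mathrm{cov}(x_1, x_2)}{n^2\, |x_1^+ - \bar{x}_1|\, |x_2^+ - \bar{x}_2|} & \text{if } x_1, x_2 \in S \text{ or } x_1, x_2 \in D, \\[2ex] -\dfrac{\mathrm{cov}(x_1, x_2)}{n^2\, |x_1^+ - \bar{x}_1|\, |x_2^+ - \bar{x}_2|} & \text{otherwise.} \end{cases}$$

By definition and formula (2),

$$\mathrm{cov}(u_1^+, u_2^+) = \frac{1}{n} \sum_{i=1}^n \left( \frac{|x_{i1} - x_1^+|}{\sum_{j=1}^n |x_{j1} - x_1^+|} - \frac{1}{n} \right) \left( \frac{|x_{i2} - x_2^+|}{\sum_{j=1}^n |x_{j2} - x_2^+|} - \frac{1}{n} \right).$$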
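Assume that $x_1$ and $x_2$ are stimulants. The proof in the other cases is similar.

$$\begin{aligned}
\mathrm{cov}(u_1^+, u_2^+) &= \frac{1}{n} \sum_{i=1}^n \left( \frac{x_1^+ - x_{i1}}{\sum_{j=1}^n (x_1^+ - x_{j1})} - \frac{1}{n} \right) \left( \frac{x_2^+ - x_{i2}}{\sum_{j=1}^n (x_2^+ - x_{j2})} - \frac{1}{n} \right) \\
&= \frac{1}{n^3} \sum_{i=1}^n \left( \frac{x_1^+ - x_{i1}}{x_1^+ - \bar{x}_1} - 1 \right) \left( \frac{x_2^+ - x_{i2}}{x_2^+ - \bar{x}_2} - 1 \right) \\
&= \frac{1}{n^3} \sum_{i=1}^n \frac{(\bar{x}_1 - x_{i1})(\bar{x}_2 - x_{i2})}{(x_1^+ - \bar{x}_1)(x_2^+ - \bar{x}_2)}
= \frac{\frac{1}{n} \sum_{i=1}^n (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2)}{n^2 (x_1^+ - \bar{x}_1)(x_2^+ - \bar{x}_2)} \\
&= \frac{\mathrm{cov}(x_1, x_2)}{n^2 (x_1^+ - \bar{x}_1)(x_2^+ - \bar{x}_2)}.
\end{aligned}$$

D2. The absolute value of the Pearson correlation coefficient of diagnostic variables is preserved after the normalisation:

$$\mathrm{corr}(u_1^+, u_2^+) \overset{\text{def}}{=} \frac{\mathrm{cov}(u_1^+, u_2^+)}{S(u_1^+)\, S(u_2^+)} = \begin{cases} \mathrm{corr}(x_1, x_2) & \text{if } x_1, x_2 \in S \text{ or } x_1, x_2 \in D, \\ -\mathrm{corr}(x_1, x_2) & \text{otherwise.} \end{cases}$$

E. Dynamic approach

Assume that the diagnostic variable $x$ is observed in two periods of time (then we write $x^1$ and $x^2$ respectively). For each period we choose a pattern and transform $x^1$ and $x^2$ into $u^{1+}$ and $u^{2+}$ according to formula (2).

E1. The values of the variables $u^{1+}$ and $u^{2+}$ are comparable.

Substantiation. The system is characterized by the sum of distances between objects and the pattern. It changes over time. For a given object, if the value of the transformed variable increases over time, this means that the share of the distance from this object to the pattern in the sum of all distances increases, so the situation of this object becomes worse in comparison with the situations of other objects.
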
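The statements C6, C8 and D2, and the reading of E1, can be illustrated numerically. The following sketch is only a sanity check under stated assumptions: the helper names (`pattern_normalise`, `central_moment`, `skewness`, `kurtosis`, `correlation`) and all data are hypothetical and chosen for this example.

```python
from typing import List, Sequence


def pattern_normalise(x: Sequence[float], stimulant: bool) -> List[float]:
    """Hypothetical helper implementing formula (2)."""
    pattern = max(x) if stimulant else min(x)
    distances = [abs(value - pattern) for value in x]
    total = sum(distances)
    return [d / total for d in distances]


def central_moment(v: Sequence[float], k: int) -> float:
    """k-th central moment with the 1/n (population) convention."""
    m = sum(v) / len(v)
    return sum((t - m) ** k for t in v) / len(v)


def skewness(v: Sequence[float]) -> float:
    return central_moment(v, 3) / central_moment(v, 2) ** 1.5


def kurtosis(v: Sequence[float]) -> float:
    return central_moment(v, 4) / central_moment(v, 2) ** 2


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((s - ma) * (t - mb) for s, t in zip(a, b)) / len(a)
    return cov / (central_moment(a, 2) * central_moment(b, 2)) ** 0.5


# Illustrative data: x1 is treated as a stimulant, x2 as a destimulant.
x1 = [1.0, 2.0, 4.0, 9.0]
x2 = [3.0, 5.0, 4.0, 12.0]
u1 = pattern_normalise(x1, stimulant=True)
u2 = pattern_normalise(x2, stimulant=False)

# C6: skewness flips sign for a stimulant, is unchanged for a destimulant.
print(skewness(x1), skewness(u1), skewness(x2), skewness(u2))
# C8: kurtosis is unchanged.
print(kurtosis(x1), kurtosis(u1))
# D2: variables of different nature, so the correlation changes sign only.
print(correlation(x1, x2), correlation(u1, u2))

# E1: the same stimulant observed in two periods, normalised separately.
period_1 = [10.0, 20.0, 30.0, 40.0]
period_2 = [12.0, 20.0, 36.0, 50.0]
u_t1 = pattern_normalise(period_1, stimulant=True)
u_t2 = pattern_normalise(period_2, stimulant=True)
# A positive change means the object's share of the total distance grew.
change = [later - earlier for earlier, later in zip(u_t1, u_t2)]
print(change)
```

In the two-period example the second object keeps its raw value while the pattern moves away, so its share of the total distance grows; the first object's share shrinks even though its raw distance to the pattern increased.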
4 Summary

The normalisation of diagnostic variables described by formula (2) plays a double role in the construction of a synthetic measure. First, it unifies the nature of variables (A2). Secondly, it brings variables to comparability (A1). So, after pattern normalisation diagnostic variables become comparable destimulants.

The normalisation with respect to the pattern preserves two important characteristics of the distribution of diagnostic variables: skewness, up to sign (C6), and kurtosis (C8). Moreover, this conversion does not disrupt the linear relation between variables: the absolute value of the Pearson correlation coefficient is not changed (D2). These advantages are expected of normalisations used for bringing variables to comparability.

Unlike other methods, the pattern normalisation is not just a technical procedure; it has a clear interpretation. However, the major advantage of the pattern normalisation over other normalisation methods appears in the dynamic approach. Although the current data are the sole data used to convert variables, the transformed variables are comparable in time (E1).

The normalisation with respect to the pattern seems to be a useful tool in multidimensional comparative analysis. It can be applied whenever variables need to be comparable, for example in the synthetic analysis of a complex phenomenon. The proposed construction can have various modifications; for example, we can change the measure of distance or the method of choosing the pattern.

References

[1] FANCHETTE, S. (1972), "Synchronic and diachronic approaches in the Unesco project on human resources indicators - Wroclaw taxonomy and bivariate diachronic analysis", UNESCO document, SHS/WS/209, Paris.

[2] FREUDENBERG, M. (2003), "Composite Indicators of Country Performance: A Critical Assessment", OECD Science, Technology and Industry Working Papers, No. 2003/16, OECD Publishing, Paris.

[3] HELLWIG, Z. (1968), "Procedure of Evaluating High-Level Manpower Data and Typology of Countries by Means of the Taxonomic Method", unpublished UNESCO working paper, COM/WS/91, Paris.

[4] JAJUGA, K., WALESIAK, M. (2000), "Standardisation of Data Set Under Different Measurement Scales", in Classification and Information Processing at the Turn of the Millennium. Studies in Classification, Data Analysis, and Knowledge Organization, eds. Decker R., Gaul W., Springer-Verlag, Berlin, Heidelberg, 105-112.

[5] MILLIGAN, G.W., COOPER, M.C. (1988), "A Study of Standardization of Variables in Cluster Analysis", Journal of Classification 5, 181-204.

[6] MŁODAK, A. (2006), "Multilateral Normalisations of Diagnostic Features", Statistics in Transition 7(5), 1125-1139.

[7] MÜLLER-FRĄCZEK, I. (2017), "Propozycja miary syntetycznej" [Proposition of Synthetic Measure], Przegląd Statystyczny, 64(4), 413-428.

[8] STEINLEY, D. (2004), "Standardizing Variables in K-means Clustering", in Classification, Clustering, and Data Mining Applications. Studies in Classification, Data Analysis, and Knowledge Organisation, eds. Banks D., McMorris F.R., Arabie P., Gaul W., Springer, Berlin, Heidelberg.

[9] ZELIAŚ, A. (2002), "Some Notes on the Selection of Normalisation of Diagnostic Variables", Statistics in Transition 5(5), 787-802.