Least-Squares Fitting of a Hyperplane

Robert K. Moniot
October 20, 2002

Abstract

A method is developed for fitting a hyperplane to a set of data by least-squares, allowing for independent uncertainties in all coordinates of each data point, and including an error analysis.

Note: This paper is adapted from a technical report I wrote as a graduate student in the Department of Physics, University of California, Berkeley, in 1976.

Copyright © 2002, by Robert K. Moniot. All rights reserved.

1 Introduction

A recurring computational problem in the field of isotopic studies of terrestrial and extraterrestrial materials has been the interpretation of the observed mass spectra in terms of mixtures of various source components. Each source component is characterized by fixed, but often unknown, isotopic ratios, but it is present in variable amounts in different measured samples. One would like to verify the hypothesis that a given number of components is adequate to account for all observations, and, if possible, not only to determine the source component compositions, but also to resolve each measured sample into its original components, in order to separate different processes in its origin for study. This paper treats only the first steps in this sequence of analysis, i.e., the investigation into the number and compositions of possible source components.

The measured mass spectra may be denoted by $Y_i(\mu)$, where $i = 1, \ldots, n$ is the sample number, and $\mu$ is the mass number. Since $\mu$ takes on only discrete values $\mu_k$, $k = 1, \ldots, p$, each mass spectrum can be represented by a vector in a $p$-dimensional vector space, $Y_{ik} = Y_i(\mu_k)$. These are assumed to be made up of linear combinations of the component spectra,

$$Y_i(\mu) = \sum_{j=1}^{m} \alpha_{ij}\, g_j(\mu) \tag{1}$$

where the $\alpha_{ij}$ are scalars between 0 and 1 subject to the normalization condition

$$\sum_{j=1}^{m} \alpha_{ij} = 1, \quad i = 1, \ldots, n$$

and the $g_j(\mu)$ are the $m$ different component spectra.

The problem is analogous to that of curve resolution encountered, for example, in chromatography or spectrophotometry, where it has been treated with considerable success using the technique of principal component analysis. Lawton and Sylvestre (1971), for example, have considered the case of two source components and have developed a method for computing two bands of curves, each containing one of the source components.

The method of principal component analysis, however, runs into difficulties if the data are characterized by widely different experimental uncertainties. This is often the case with mass spectroscopic data. Even if the relative uncertainties in isotopic ratios are similar, the ratios can vary by orders of magnitude. As Anderson (1963) pointed out, the method of principal component analysis is justified only if the ratio of the uncertainty variance to the systematic, i.e. correlation, variance is the same for all components of the data. Nonconformity with this requirement may be remedied to some degree by rescaling the data according to their respective uncertainties. Here we abandon the method of principal component analysis for an alternative approach that is on a better statistical footing in that it takes full account of the estimated uncertainties of the data.

It is easily shown that data points consisting of linear combinations of components according to Equation (1) must lie in an $(m-1)$-dimensional subspace of the full $p$-dimensional vector space. This subspace is defined by the simplex whose vertices are the $m$ distinct components. This paper deals with only the first step in component resolution, namely the determination of the parameters of this subspace. Furthermore, it considers only the simplest case, in which $p = m$, that is, the number of components is the same as the number of coordinates of the space (e.g. the number of isotopic ratios measured in each sample). Thus for 2-dimensional data we seek the equation of a straight line, for 3-dimensional data a plane, and in general a hyperplane of dimension one less than the space in which it is embedded. The general case of arbitrary $m \le p$ is to be dealt with in a future paper.

2 Definition of problem

In accordance with the foregoing discussion, we assume that the measured data points should ideally lie on a hyperplane of $m-1$ dimensions in a space of $m$ dimensions.
If we let $y = [y_1, y_2, \ldots, y_m]$ denote a point on this hyperplane, the equation which the data point ideally should satisfy can be written

$$f(y) = \sum_{k=1}^{m-1} a_k y_k + a_m - y_m = 0 \tag{2}$$

The measured data consist of a set of $n$ vectors $Y_i$, $i = 1, \ldots, n$. Each coordinate $Y_{ik}$ of each data point has an associated experimental uncertainty $\sigma_{ik}$. (The $\sigma_{ik}$ may or may not be known a priori. We assume that at least their relative magnitudes are known.) The experimental errors will cause the data points $Y_i$ to lie scattered off the hyperplane of Equation (2). Therefore one seeks a best fit to the data.

The method of principal component analysis, referred to above, is equivalent to seeking the hyperplane that minimizes the sum of squares of perpendicular distances from the measured points to the hyperplane. As already mentioned, this method is unable to take proper account of the experimental uncertainties of the data, and is not invariant under a change of scale of one or more axes.

The usual method that one finds in the literature for obtaining a best fit of this kind is based on minimizing a sum of squares of the residuals $f(Y_i)$. This is called a regression of $y_m$ against $y_1$ through $y_{m-1}$. The sum of squares of these residuals is either unweighted or weighted by $1/\sigma_{im}^2$ (Bevington and Robinson, 1991). This approach ignores the uncertainties in coordinates $y_1$ through $y_{m-1}$. It also gives different results depending on which coordinate is chosen as the dependent coordinate $y_m$.

The correct treatment that properly takes account of the experimental uncertainties was formulated by Deming (1943). Given the data $Y_{ik}$ with associated uncertainties $\sigma_{ik}$, a set of corresponding adjusted values $y_{ik}$ are sought which lie exactly on the hyperplane (2) and minimize the variance

$$S = \sum_{i=1}^{n} \sum_{k=1}^{m} \frac{1}{\sigma_{ik}^2} \left( Y_{ik} - y_{ik} \right)^2 \tag{3}$$

The solution of this formulation of the problem is not straightforward. York (1966) first devised an approach, later improved by Williamson (1968), for the straight-line case. Here we extend Williamson's solution to arbitrary $m \ge 1$.

3 Derivation of solution

We begin with the constraint that the adjusted points $y_i$ are required to satisfy the hyperplane equation, i.e.

$$f(y_i) = 0, \quad i = 1, \ldots, n$$

where $f$ is as defined in Equation (2). Defining residuals $V_{ik} = Y_{ik} - y_{ik}$ allows this constraint to be rewritten as

$$f(y_i) = f(Y_i) - \sum_{k=1}^{m-1} a_k V_{ik} + V_{im} = 0, \quad i = 1, \ldots, n \tag{4}$$

Now, given any values of the parameters $a_k$, we seek those values of $y_{ik}$ that will minimize $S$ for that choice of the $a_k$, subject to the constraint (4). Thus we require

$$\delta S = 0 = \sum_{i=1}^{n} \sum_{k=1}^{m} \frac{V_{ik}}{\sigma_{ik}^2}\, \delta V_{ik} \tag{5}$$

From (4) we have

$$\delta f(y_i) = 0 = -\sum_{k=1}^{m-1} a_k\, \delta V_{ik} + \delta V_{im}, \quad i = 1, \ldots, n \tag{6}$$

Multiplying each of the $n$ equations (6) by its own undetermined multiplier $\lambda_i$ and adding them all to Equation (5), we obtain

$$\sum_{i=1}^{n} \left[ \sum_{k=1}^{m-1} \left( \frac{V_{ik}}{\sigma_{ik}^2} - \lambda_i a_k \right) \delta V_{ik} + \left( \frac{V_{im}}{\sigma_{im}^2} + \lambda_i \right) \delta V_{im} \right] = 0 \tag{7}$$

Since the $V_{ik}$ are independent, the coefficients of $\delta V_{ik}$ in this equation must individually be zero, giving

$$V_{ik} = \lambda_i a_k \sigma_{ik}^2, \quad i = 1, \ldots, n, \quad k = 1, \ldots, m-1$$
$$V_{im} = -\lambda_i \sigma_{im}^2, \quad i = 1, \ldots, n \tag{8}$$

Substituting this result into Equation (4) and solving for $\lambda_i$ yields

$$\lambda_i = W_i f(Y_i) \tag{9}$$

where

$$W_i = \left[ \sum_{k=1}^{m-1} a_k^2 \sigma_{ik}^2 + \sigma_{im}^2 \right]^{-1} \tag{10}$$

This allows $S$ to be rewritten as

$$S = \sum_{i=1}^{n} W_i f(Y_i)^2 \tag{11}$$

This expresses $S$ in the form of a weighted sum of the residuals as defined for a conventional regression, but with weights that properly take account of the individual uncertainties $\sigma_{ik}$.

Now $S$ is to be minimized with respect to the parameters $a_k$. Setting $\partial S / \partial a_k = 0$ leads to the following set of equations analogous to the normal equations of the conventional regression:

$$\sum_{i=1}^{n} W_i f(Y_i)\, y_{ik} = 0, \quad k = 1, \ldots, m-1$$
$$\sum_{i=1}^{n} W_i f(Y_i) = 0 \tag{12}$$

Since the parameters $a_k$ occur in $W_i$, $f(Y_i)$, and $y_{ik}$, these equations are nonlinear and cannot be solved in closed form. However, recognizing that $W_i$ and $y_{ik}$ are only weakly dependent on the parameters of the hyperplane, we can linearize Equations (12) by treating those quantities as constants. Writing out $f$ in terms of the parameters, then, we obtain

$$\sum_{k=1}^{m-1} \left( \sum_{i} W_i y_{ij} Y_{ik} \right) a_k + \left( \sum_{i} W_i y_{ij} \right) a_m = \sum_{i} W_i y_{ij} Y_{im}, \quad j = 1, \ldots, m-1$$
$$\sum_{k=1}^{m-1} \left( \sum_{i} W_i Y_{ik} \right) a_k + \left( \sum_{i} W_i \right) a_m = \sum_{i} W_i Y_{im} \tag{13}$$

Identifying the parenthesized terms in Equation (13) as the elements of an $m \times m$ matrix $M$ and the right-hand sides as elements of a vector $b$ of length $m$, this set of equations is seen as a linear system of the form $Ma = b$.

Solution proceeds iteratively. Starting with an initial guess for the vector of coefficients $a$, the matrix $M$ and vector $b$ are evaluated and the system $Ma = b$ is solved for the new value of $a$. This new value is used to re-evaluate $M$ and $b$, and the equation is solved again. The iteration is continued until convergence is obtained.

In practice, it is not necessary to have a good starting guess for $a$. From Equation (10) it can be seen that the initial choice $a = 0$ gives, as the result of the first iteration, the same parameters as would be obtained if the $\sigma_{ik}$ were zero for all $k$ except $m$. This is the same result as would be given by the conventional weighted regression of $y_m$ against the other coordinates. Incidentally, this means that weighted averages ($m = 1$) are computed correctly in one iteration.
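The iteration just described can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `fit_hyperplane`, the array layout (one row per data point, dependent coordinate in the last column), the stopping rule, and the default tolerances are all hypothetical choices. The uncertainties in the last column are assumed to be nonzero so that the weights of Equation (10) stay finite.

```python
import numpy as np

def fit_hyperplane(Y, sigma, max_iter=50, tol=1e-10):
    """Iterate the linearized normal equations (13), starting from a = 0.

    Y, sigma : (n, m) arrays of measured coordinates and their uncertainties.
    Returns a of length m, where y_m = sum_{k<m} a_k * y_k + a_m.
    """
    Y = np.asarray(Y, dtype=float)
    var = np.asarray(sigma, dtype=float) ** 2
    n, m = Y.shape
    ones = np.ones((n, 1))
    a = np.zeros(m)                      # initial guess a = 0
    for _ in range(max_iter):
        slopes = a[:-1]
        # Weights W_i, Eq. (10)
        W = 1.0 / (var[:, :-1] @ slopes**2 + var[:, -1])
        # Residuals f(Y_i) of the measured points, Eq. (2)
        f = Y[:, :-1] @ slopes + a[-1] - Y[:, -1]
        # Adjusted coordinates y_ik = Y_ik - V_ik, from Eqs. (8) and (9)
        y = Y[:, :-1] - (W * f)[:, None] * slopes * var[:, :-1]
        # Matrix M and vector b of Eq. (13), with W and y held constant
        left = np.hstack([y, ones])
        right = np.hstack([Y[:, :-1], ones])
        M = left.T @ (W[:, None] * right)
        b = left.T @ (W * Y[:, -1])
        a_new = np.linalg.solve(M, b)
        converged = np.max(np.abs(a_new - a)) <= tol * (1.0 + np.max(np.abs(a_new)))
        a = a_new
        if converged:
            break
    return a
```

Called with a single column (m = 1), the sketch reduces to the weighted average noted above; with noise-free points lying exactly on a plane it should return that plane's coefficients after the first pass, since the first iteration is the conventional weighted regression.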
Experience has shown that the convergence is rapid for data sets where the fit is justified, and only a few iterations are necessary to obtain the coefficients to accuracies that are well within their uncertainties.

3.1 Refinement

It should be mentioned that for the sake of numerical stability, the measured points $Y_i$ should be translated if necessary into a coordinate system whose origin is close to the mean of the points. This stability can be achieved automatically, and the computation simplified somewhat, by reformulating the solution in the following way. First, solve for $a_m$ from the last of the equations (13):

$$a_m = \bar{Y}_m - \sum_{k=1}^{m-1} a_k \bar{Y}_k \tag{14}$$

where

$$\bar{Y}_k = \frac{\sum_{i} W_i Y_{ik}}{\sum_{i} W_i}, \quad k = 1, \ldots, m \tag{15}$$

This shows that the point $\bar{Y}$ lies on the best-fit hyperplane. Now define $Y'_i = Y_i - \bar{Y}$ and $z_i = y_i - \bar{Y}$. From Equations (8) and (9) we have

$$z_{ij} = Y'_{ij} - W_i a_j \sigma_{ij}^2 f(Y_i), \quad i = 1, \ldots, n, \quad j = 1, \ldots, m-1 \tag{16}$$

where we can express $f(Y_i)$ as

$$f(Y_i) = \sum_{k=1}^{m-1} a_k Y'_{ik} - Y'_{im}, \quad i = 1, \ldots, n \tag{17}$$

Then upon inserting (14) into the remaining equations (13) we find

$$\sum_{k=1}^{m-1} \left( \sum_{i} W_i z_{ij} Y'_{ik} \right) a_k = \sum_{i} W_i z_{ij} Y'_{im}, \quad j = 1, \ldots, m-1 \tag{18}$$

This reformulation has improved the numerical stability and reduced the order of the set of equations that needs to be solved on each iteration by 1.

4 Error analysis

The variances of the hyperplane parameters can be found by evaluating

$$\sigma^2(a_j) = \sum_{i=1}^{n} \sum_{k=1}^{m} \sigma_{ik}^2 \left( \frac{\partial a_j}{\partial Y_{ik}} \right)^2 \tag{19}$$

(This equation assumes the data $Y_{ik}$ are uncorrelated.) Since the dependence of $a_j$ on $Y_{ik}$ is not linear as Equation (13) suggests, due to the dependence of $W_i$ and $y_{ik}$ on $a_j$, evaluation of this expression is very complicated. The original version of this paper contained an error in the result of this calculation, and a corrected calculation has not yet been done. To first order, however, ignoring the nonlinearity one obtains the approximation

$$\sigma^2(a_j) \approx \left( M^{-1} \right)_{jj} \tag{20}$$

that is, the variances of the parameters are given simply by the diagonal elements of the inverse of the normal matrix defined in Equation (13). (The off-diagonal elements of this matrix are the covariances of the parameters.) For well-behaved data such as those used for illustration by York (1966), this approximation is good to within a few percent.
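The first-order estimate of Equation (20) can be illustrated with a short Python sketch using NumPy. The function names `normal_matrix` and `parameter_covariance` and the array layout (one row per data point, dependent coordinate in the last column) are assumptions made for this example. The sketch rebuilds the matrix M of Equation (13) at a given set of coefficients and inverts it; it inherits the approximation stated above and is not the exact evaluation of Equation (19).

```python
import numpy as np

def normal_matrix(Y, sigma, a):
    """Matrix M of Eq. (13), evaluated at the coefficients a (length m)."""
    Y = np.asarray(Y, dtype=float)
    var = np.asarray(sigma, dtype=float) ** 2
    a = np.asarray(a, dtype=float)
    n = Y.shape[0]
    slopes = a[:-1]
    # Weights W_i, Eq. (10); sigma in the last column must be nonzero
    W = 1.0 / (var[:, :-1] @ slopes**2 + var[:, -1])
    # Residuals f(Y_i), Eq. (2), and adjusted coordinates from Eqs. (8), (9)
    f = Y[:, :-1] @ slopes + a[-1] - Y[:, -1]
    y = Y[:, :-1] - (W * f)[:, None] * slopes * var[:, :-1]
    ones = np.ones((n, 1))
    left = np.hstack([y, ones])
    right = np.hstack([Y[:, :-1], ones])
    return left.T @ (W[:, None] * right)

def parameter_covariance(Y, sigma, a):
    """First-order estimate of Eq. (20): the inverse of M.

    Diagonal entries approximate the variances of a_1..a_m;
    off-diagonal entries approximate their covariances.
    """
    return np.linalg.inv(normal_matrix(Y, sigma, a))
```

For a weighted average (m = 1) this reduces to the familiar result that the variance of the mean is the reciprocal of the sum of the weights.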

If the experimenter does not have standard errors $\sigma_{ik}$ for the measured quantities $Y_{ik}$, but only relative uncertainties, the resulting fit is the same using these relative uncertainties, but the variances in the fitted parameters are given by expression (20) multiplied by $S/\nu$, where $\nu = n - m$ is the number of degrees of freedom of the problem.

If the errors $\sigma_{ik}$ are known a priori, then the goodness of fit can be inferred from the value of $S/\nu$, which should be close to unity for normally distributed errors. This constitutes a test of the $m$-component hypothesis as set forth in the introduction.

5 Bibliography

Anderson, T. W. (1963). Asymptotic theory for principal component analysis. Annals of Mathematical Statistics, 34, 122-148.

Bevington, P. R. and Robinson, D. K. (1991). Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, New York.

Deming, W. E. (1943). Statistical Adjustment of Data, Wiley, New York.

Lawton, W. H. and Sylvestre, E. A. (1971). Self modeling curve resolution. Technometrics, 13, 617-633.

Williamson, J. H. (1968). Least-squares fitting of a straight line. Canadian Journal of Physics, 46, 1845-1846.

York, D. (1966). Least-squares fitting of a straight line. Canadian Journal of Physics, 44, 1079-1086.