Linear Approximation with Regularization and Moving Least Squares


Igor Grešovnik

May 2007

Revision 4.6 (Revision 1: March 2004)

Contents:

1 Linear Fitting
  1.1 Weighted Least Squares in Function Approximation
    1.1.1 Solution of over-determined system of equations by QR decomposition
    1.1.2 Statistical background
  1.2 Weighted Least Squares Approximation of Function Values and Gradients
  1.3 Regularization of the Problem
    1.3.1 Addition of fictitious points
    1.3.2 Addition of minimizing conditions with respect to coefficient size
    1.3.3 Regularization by Adding the Minimizing Conditions with Respect to the Difference in Coefficients Obtained by Related Approximations
  1.4 Comments on Choice of Weights
  1.5 General Weighted Least Squares
2 Low Order Polynomial Approximations
  2.1 Constant and Linear Basis
  2.2 Quadratic Basis
    2.2.1 One dimension
    2.2.2 Two dimensions
    2.2.3 Three dimensions
    2.2.4 Old (alternative) mapping of coefficients
  2.3 Cubic Basis
  2.4 Linear Fitting with Gradient data
  2.5 Quadratic Fitting with Gradient data
3 Moving least squares (MLS) approximation
4 Spatial derivatives of the MLS approximation and approximation with gradient information
  4.1 Normal system of equations
    4.1.1 Implementation remarks
    4.1.2 Second order derivatives (normal system)
    4.1.3 Approximation with values and gradients (normal system)
  4.2 Over-determined system of equations
5 Appendix
  5.1 Quick reminder
    5.1.1 WLS approximation
    5.1.2 MLS approximation
    5.1.3 Implementation remarks
  5.2 Formulas for function gradients
6 Sandbox

Change of notation:

- m → N_v - number of points where the approximated function is evaluated
- m_g → N_g - number of points where the gradient of the approximated function is evaluated
- n → N_b - number of basis functions

Use of indices:

- i - index of sampling points
- j, k - indices of components of approximation coefficients, of components of the right-hand side vector, and of components of the system matrix in systems of equations
- l, m - components of co-ordinate derivatives
- t - components of gradients of the sampled (approximated) function

1 LINEAR FITTING

1.1 Weighted Least Squares in Function Approximation

We have values of some function $f(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^N$, in $N_v$ points:

$$f(\mathbf{x}_i) = y_i, \quad i = 1, \ldots, N_v. \qquad (1)$$

We would like to evaluate the coefficients of a linear combination of $N_b$ functions $f_1(\mathbf{x}), \ldots, f_{N_b}(\mathbf{x})$ such that

$$\tilde{f}(\mathbf{x}; \mathbf{a}) = a_1 f_1(\mathbf{x}) + a_2 f_2(\mathbf{x}) + \ldots + a_n f_n(\mathbf{x}) = \sum_k a_k f_k(\mathbf{x}), \qquad (2)$$

$$\tilde{f}(\mathbf{x}_i) \approx f(\mathbf{x}_i) = y_i, \quad i = 1, \ldots, N_v, \qquad (3)$$

i.e. we want the linear approximation to agree as closely as possible with the values of $f(\mathbf{x})$ in all points $\mathbf{x}_i$. We look for the best agreement in the weighted least squares sense, i.e. we minimize the function

$$\chi^2(\mathbf{a}) = \phi(\mathbf{a}) = \sum_{i=1}^{N_v} w_i \left( \tilde{y}(\mathbf{x}_i) - y_i \right)^2 = \sum_{i=1}^{N_v} w_i \left( \sum_k a_k f_k(\mathbf{x}_i) - y_i \right)^2 \qquad (4)$$

with respect to the parameters of approximation $\mathbf{a}$. Here $\mathbf{w}$ is the $m$-dimensional vector of weights, which weight the significance of the points $\mathbf{x}_i$. The minimum is the stationary point of $\phi$, where

$$\frac{\mathrm{d}\phi}{\mathrm{d}a_j} = 0, \quad j = 1, \ldots, n. \qquad (5)$$

The derivatives of $\phi$ are

$$\frac{\mathrm{d}\phi}{\mathrm{d}a_j} = 2 \sum_{i=1}^{N_v} w_i \left( \sum_k a_k f_k(\mathbf{x}_i) - y_i \right) f_j(\mathbf{x}_i). \qquad (6)$$

Equation (5) therefore gives the following system of equations for the unknown coefficients $\mathbf{a}$:

$$\sum_k a_k \sum_{i=1}^{N_v} w_i f_k(\mathbf{x}_i) f_j(\mathbf{x}_i) = \sum_{i=1}^{N_v} w_i y_i f_j(\mathbf{x}_i), \quad j = 1, \ldots, n. \qquad (7)$$

The coefficients $\mathbf{a}$ can therefore be obtained by solving the linear system of equations

$$\mathbf{C}\,\mathbf{a} = \mathbf{d}, \qquad (8)$$

where

$$C_{jk} = \sum_{i=1}^{N_v} w_i f_j(\mathbf{x}_i) f_k(\mathbf{x}_i) \qquad (9)$$

and

$$d_j = \sum_{i=1}^{N_v} w_i f_j(\mathbf{x}_i)\, y_i. \qquad (10)$$

The system of equations (8) for the calculation of the approximation coefficients is called the normal system of equations. It can be shown that $\mathbf{C}$ is positive-semidefinite. If $\mathbf{C}$ has full rank $n$ then it is positive-definite, and the system can be solved by the Cholesky factorization

$$\mathbf{C} = \mathbf{V}^T \mathbf{V} \qquad (11)$$

(where $\mathbf{V}$ is upper triangular), followed by the solution of a lower triangular system,

$$\mathbf{V}^T \mathbf{y} = \mathbf{d}, \qquad (12)$$

and an upper triangular system,

$$\mathbf{V}\,\mathbf{a} = \mathbf{y}. \qquad (13)$$
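As a concrete illustration of eqs. (8)-(13), the following sketch assembles the normal matrix and right-hand side for a small one-dimensional quadratic basis and solves the system by Cholesky factorization. This is a minimal sketch under assumed basis functions, data and weights (all invented for the example), not the implementation accompanying this text.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Hypothetical quadratic basis f_1(x)=1, f_2(x)=x, f_3(x)=x^2 (1D example).
def basis(x):
    return np.array([1.0, x, x * x])

# Invented sample: points x_i, noisy values y_i, weights w_i.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 2.0, 7)
ys = 1.0 + 0.5 * xs - 0.3 * xs**2 + 0.01 * rng.standard_normal(xs.size)
ws = np.ones_like(xs)

n = 3
C = np.zeros((n, n))  # normal matrix, eq. (9)
d = np.zeros(n)       # right-hand side, eq. (10)
for x, y, w in zip(xs, ys, ws):
    f = basis(x)
    C += w * np.outer(f, f)
    d += w * y * f

V = cholesky(C)                               # C = V^T V, V upper triangular, eq. (11)
y_aux = solve_triangular(V.T, d, lower=True)  # forward substitution, eq. (12)
a = solve_triangular(V, y_aux)                # back substitution, eq. (13)
print("approximation coefficients a =", a)
```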

1.1.1 Solution of over-determined system of equations by QR decomposition

Here we point out the relation between the least squares formulation (4), (8) and the direct solution of the over-determined system of equations (3). We introduce a matrix $\mathbf{A}$ and a vector $\mathbf{b}$ such that

$$A_{ij} = \sqrt{w_i}\, f_j(\mathbf{x}_i) \qquad (14)$$

and

$$b_i = \sqrt{w_i}\, y_i. \qquad (15)$$

Then the equation

$$\mathbf{A}_{(m \times n)}\, \tilde{\mathbf{a}}_{(n \times 1)} = \mathbf{b}_{(m \times 1)} \qquad (16)$$

reads component-wise as

$$\sqrt{w_i} \sum_j f_j(\mathbf{x}_i)\, \tilde{a}_j = \sqrt{w_i}\, y_i,$$

or

$$\sqrt{w_i} \sum_j \tilde{a}_j\, f_j(\mathbf{x}_i) = \sqrt{w_i}\, y_i, \qquad (17)$$

which is exactly (3) if we take into account (2) and denote the coefficients by $\tilde{\mathbf{a}}$ instead of $\mathbf{a}$. Equation (16) (or (17) in component-wise notation) is an over-determined system; we therefore cannot divide both sides of each equation by $\sqrt{w_i}$, because this would affect the significance of the individual equations and therefore the solution (the system in this case does not have an exact solution, and therefore the relative significance of the equations is important).

Now we can show that the system (16) is in some sense equivalent to the system (8). This is seen by observing that

$$\mathbf{C} = \mathbf{A}^T \mathbf{A}, \quad \mathbf{d} = \mathbf{A}^T \mathbf{b}, \qquad (18)$$

i.e. the least squares system of equations (8) (also referred to as the normal system of equations) is obtained by left-multiplying the over-determined system (16) by $\mathbf{A}^T$. It can be shown that we can obtain the solution of the normal system (8) by performing the QR decomposition of the matrix $\mathbf{A}$ from the system (16) [1]:

$$\mathbf{A}_{(N_v \times n)} = \mathbf{Q}_{(N_v \times N_v)}\, \mathbf{U}_{(N_v \times n)}. \qquad (19)$$

We denote by $\mathbf{z}$ the solution of the orthogonal system $\mathbf{Q}\,\mathbf{z} = \mathbf{b}$, i.e.

$$\mathbf{z} = \mathbf{Q}^T \mathbf{b}. \qquad (20)$$

The matrix $\mathbf{U}$ is upper trapezoidal (by the QR decomposition). We write $\mathbf{U}$ and $\mathbf{z}$ in block form,

$$\mathbf{U}_{(N_v \times n)} = \begin{bmatrix} \mathbf{V}_{(n \times n)} \\ \mathbf{0}_{((N_v - n) \times n)} \end{bmatrix}, \qquad (21)$$

$$\mathbf{z}_{(N_v \times 1)} = \begin{bmatrix} \mathbf{y}_{(n \times 1)} \\ \mathbf{w}_{((N_v - n) \times 1)} \end{bmatrix}. \qquad (22)$$

Now it follows from $\mathbf{C} = \mathbf{A}^T \mathbf{A}$ (taking into account the decomposition and the block form) that $\mathbf{C} = \mathbf{A}^T \mathbf{A} = \mathbf{V}^T \mathbf{V}$, i.e. $\mathbf{V}$ is a Cholesky factor of the normal matrix $\mathbf{C} = \mathbf{A}^T \mathbf{A}$. With the QR factorization we have thus avoided the explicit calculation of $\mathbf{C}$ and its Cholesky factorization. We can further verify that

$$\mathbf{d} = \mathbf{A}^T \mathbf{b} = \mathbf{V}^T \mathbf{y}. \qquad (23)$$

This means that $\mathbf{y}$, which is the upper part of the transformed $\mathbf{z}$, is the solution of the lower triangular system (12). The least squares solution is therefore obtained (according to (13)) by solving the upper triangular system

$$\mathbf{V}\,\tilde{\mathbf{a}} = \mathbf{y}. \qquad (24)$$

The advantage of using the QR factorization is that the matrix $\mathbf{A}$ is better conditioned than $\mathbf{A}^T\mathbf{A}$. If the spectral sensitivity (condition number) of $\mathbf{A}^T\mathbf{A} = \mathbf{B}$ is $\lambda_1/\lambda_n$, where $\lambda_1$ is the largest and $\lambda_n$ the smallest eigenvalue of $\mathbf{B}$, then the spectral sensitivity of $\mathbf{A}$ is only $\sqrt{\lambda_1/\lambda_n}$.
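Under the same assumed basis and invented data as in the Cholesky sketch above, the following sketch solves the weighted over-determined system (16) via QR decomposition, following eqs. (19)-(24); again a minimal illustration, not the author's code. In reduced mode, np.linalg.qr returns directly the upper block $\mathbf{V}$ of $\mathbf{U}$ from eq. (21).

```python
import numpy as np

# Same hypothetical quadratic basis as in the previous sketch.
def basis(x):
    return np.array([1.0, x, x * x])

xs = np.linspace(0.0, 2.0, 7)
ys = 1.0 + 0.5 * xs - 0.3 * xs**2
ws = np.ones_like(xs)

sw = np.sqrt(ws)
A = sw[:, None] * np.array([basis(x) for x in xs])  # A_ij = sqrt(w_i) f_j(x_i), eq. (14)
b = sw * ys                                         # b_i = sqrt(w_i) y_i, eq. (15)

# Reduced QR: Q is N_v x n and the returned triangular factor is V, eq. (21).
Q, V = np.linalg.qr(A)
y = Q.T @ b                # upper part of z = Q^T b, eqs. (20), (22)
a = np.linalg.solve(V, y)  # back substitution V a = y, eq. (24)
print("approximation coefficients a =", a)
```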

===========================

From (2) we can see that

$$\tilde{y}(\mathbf{x}_i, \mathbf{a}) = \sum_k a_k f_k(\mathbf{x}_i), \qquad (25)$$

and therefore

$$\frac{\mathrm{d}\,\tilde{y}(\mathbf{x}_i, \mathbf{a})}{\mathrm{d}a_j} = f_j(\mathbf{x}_i) = \frac{A_{ij}}{\sqrt{w_i}}. \qquad (26)$$

We see that

$$A_{ij} = \sqrt{w_i}\, \frac{\mathrm{d}\,\tilde{y}(\mathbf{x}_i, \mathbf{a})}{\mathrm{d}a_j}. \qquad (27)$$

Sometimes we define a matrix $\mathbf{X}$ so that

$$X_{ij} = \frac{\mathrm{d}\,\tilde{y}(\mathbf{x}_i, \mathbf{a})}{\mathrm{d}a_j} = f_j(\mathbf{x}_i) = \frac{A_{ij}}{\sqrt{w_i}}. \qquad (28)$$

1.1.2 Statistical background

We have a model that predicts a set of measurements (observations) $y_i$, which depends on a set of unknown parameters $\mathbf{a}$:

$$y_i(\mathbf{a}). \qquad (29)$$

In function approximation we have a model for a function of one or a set of independent variables,

$$y_i(\mathbf{a}) = \tilde{y}(\mathbf{x}_i; \mathbf{a}). \qquad (30)$$

From the point of view of parameter estimation this is the same as (29), because the independent variables $\mathbf{x}_i$ are used just to distinguish between distinct measurements (to index the measurements, the same as the index $i$ in (29)); the actual functional relations are not used.

In the least squares formulation, the parameters $\mathbf{a}$ are estimated by minimizing the sum of squares,

$$\min_{\mathbf{a}} \chi^2(\mathbf{a}) = \sum_{i=1}^{N_v} \left( y_i(\mathbf{a}) - y_i \right)^2. \qquad (31)$$

Note that for linear models, or in function approximation,

$$y_i(\mathbf{a}) = \sum_k a_k f_{ki} \qquad (32)$$

or

$$y_i(\mathbf{a}) = \tilde{y}(\mathbf{x}_i; \mathbf{a}) = \sum_k a_k f_k(\mathbf{x}_i). \qquad (33)$$

Both forms are equivalent, which can easily be seen if we write the second form (33) for $\mathbf{x} = \mathbf{x}_i$.

1.1.2.1 Statistical background

The statistical background described here applies to general least squares fitting (also nonlinear). To found the least squares procedures, we must assume that the measurement errors are independently random and normally distributed:

$$y_i \sim \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{\left( y_i - \mu_i \right)^2}{2\sigma_i^2} \right). \qquad (34)$$

When fitting the parameters, we would like to find the parameters that are most likely to be correct. It is not meaningful to ask e.g. "What is the probability that the given parameters $\mathbf{a}$ are correct?" However, intuition tells us that parameters for which the model data do not look like the measured data are unlikely. We can instead ask the question "Given a particular set of parameters, what is the probability that this specific data set could have occurred?" If the $y_i$ take continuous values, then we must ask for the probability that values within $y_i \pm \Delta y$ occur. If this probability is very small, then we conclude that the parameters under consideration are unlikely to be right. Conversely, intuition tells us that the data should not be too improbable for the correct model parameters. In other words, we intuitively identify the probability of the data given the parameters with the likelihood of the parameters given the data. This is based on intuition and has no mathematical background! (The point is that there is just one model, the correct one, and there is a statistical universe of data sets that are drawn from that model.) We look for the parameters that maximize the likelihood defined in the above way, and this form of estimation is the maximum likelihood estimation.

According to assumption (34), the probability of the data set is the product of the probabilities of the individual data points:

$$P \propto \prod_{i=1}^{m} \exp\left( -\frac{\left( y_i - \tilde{y}(\mathbf{x}_i) \right)^2}{2\sigma_i^2} \right) \Delta y. \qquad (35)$$

Maximizing this probability is equivalent to minimizing the negative of its logarithm,

$$\sum_{i=1}^{m} \frac{\left( y_i - \tilde{y}(\mathbf{x}_i) \right)^2}{2\sigma_i^2} - m \ln \Delta y.$$

Since the last term is constant, minimizing this expression is equivalent to minimizing (31).

Remark: The discussion is limited to statistical errors, which we can average away (to a desired extent) if we take enough data. Measurements are also susceptible to systematic errors, which cannot be annihilated by any amount of averaging. (E.g. the calibration of a measurement device can depend on temperature; if we perform all measurements at a wrong temperature, then averaging will not reduce the systematic error.)

In equations (8)-(10) and (14)-(16), regarding the statistical argumentation, we must set the weights to

$$w_i = \frac{1}{\sigma_i^2}. \qquad (36)$$

Let us now estimate the uncertainties of the estimated parameters. The variance associated with $a_j$ can be found from

$$\sigma^2(a_j) = \sum_{i=1}^{m} \sigma_i^2 \left( \frac{\partial a_j}{\partial y_i} \right)^2. \qquad (37)$$

From (8) we have

$$a_j = \sum_{k=1}^{n} \left( C^{-1} \right)_{jk} d_k = \sum_{k=1}^{n} \left( C^{-1} \right)_{jk} \sum_{i=1}^{m} \frac{y_i\, f_k(\mathbf{x}_i)}{\sigma_i^2}. \qquad (38)$$

Since $\mathbf{C}^{-1}$ is independent of $y_i$,

$$\frac{\partial a_j}{\partial y_i} = \sum_{k=1}^{n} \left( C^{-1} \right)_{jk} \frac{f_k(\mathbf{x}_i)}{\sigma_i^2}. \qquad (39)$$

We write $\mathbf{C}^{-1} = \mathbf{W}$. Consequently,

$$\sigma^2(a_j) = \sum_{k=1}^{n} \sum_{l=1}^{n} W_{jk} W_{jl} \left( \sum_{i=1}^{m} \frac{f_k(\mathbf{x}_i)\, f_l(\mathbf{x}_i)}{\sigma_i^2} \right). \qquad (40)$$

The final term in brackets in the above equation is just $C_{kl}$. Since $\mathbf{C}$ is the inverse of $\mathbf{W}$, the equation reduces to $W_{jj}$, i.e.

$$\sigma^2(a_j) = \left( C^{-1} \right)_{jj}. \qquad (41)$$

The off-diagonal elements of $\mathbf{C}^{-1}$ are the covariances between $a_j$ and $a_k$:

$$\mathrm{Cov}\left( a_j, a_k \right) = \left( C^{-1} \right)_{jk}. \qquad (42)$$
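The relations (36)-(42) translate directly into code. The sketch below, under the same invented basis and data conventions as the earlier examples (the noise level sigma is an assumption), sets the statistical weights to 1/sigma_i^2 and reads the parameter variances and covariances from the inverse of the normal matrix.

```python
import numpy as np

# Same hypothetical quadratic basis as above; sigma_i is assumed for the example.
def basis(x):
    return np.array([1.0, x, x * x])

rng = np.random.default_rng(1)
xs = np.linspace(0.0, 2.0, 7)
sigma = 0.05 * np.ones_like(xs)                 # measurement standard deviations
ys = 1.0 + 0.5 * xs - 0.3 * xs**2 + sigma * rng.standard_normal(xs.size)
ws = 1.0 / sigma**2                             # statistical weights, eq. (36)

F = np.array([basis(x) for x in xs])            # F[i, j] = f_j(x_i)
C = F.T @ (ws[:, None] * F)                     # normal matrix, eq. (9)
d = F.T @ (ws * ys)                             # right-hand side, eq. (10)

a = np.linalg.solve(C, d)                       # fitted coefficients
cov = np.linalg.inv(C)                          # Cov(a_j, a_k) = (C^-1)_jk, eq. (42)
std_err = np.sqrt(np.diag(cov))                 # sigma(a_j) = sqrt((C^-1)_jj), eq. (41)
print("a =", a)
print("standard errors =", std_err)
```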

1.1.2.2 Non-normal distribution of errors

In the case of non-normal errors, we often do the following things, which are derived from the assumption that the error distribution is normal:

- Fit the parameters by minimizing $\chi^2$.
- Use contours of constant $\chi^2$ as the boundary of the confidence region.
- Use Monte Carlo simulations or analytical calculations to determine which contour of $\chi^2$ is the correct one for the desired confidence level.
- Give the covariance matrix $\mathbf{C}^{-1}$ as the formal covariance matrix of the fit, on the assumption of normally distributed errors.
- Interpret the diagonal elements of $\mathbf{C}^{-1}$ as the actual squared standard errors of the parameter estimation.

1.2 Weighted Least Squares Approximation of Function Values and Gradients

Sometimes we have gradient information besides the values of a function in a given set of points, and we want to construct an approximation that best fits the specified values and gradients. In this section, equations are derived for approximations that take both value and gradient data into account.

We have values of some function $f(\mathbf{x})$ and its gradients in $m$ points:

$$f(\mathbf{x}_i) = y_i, \quad i = 1, \ldots, N_v; \qquad \nabla f(\mathbf{x}_{g\,i}) = \mathbf{g}_i, \quad i = 1, \ldots, N_g. \qquad (43)$$

We would like to evaluate the coefficients of a linear combination of $N_b$ functions $f_1(\mathbf{x}), \ldots, f_{N_b}(\mathbf{x})$ such that

$$\tilde{f}(\mathbf{x}) = a_1 f_1(\mathbf{x}) + a_2 f_2(\mathbf{x}) + \ldots + a_n f_n(\mathbf{x}) = \sum_k a_k f_k(\mathbf{x}), \qquad (44)$$

$$\tilde{f}(\mathbf{x}_i) \approx f(\mathbf{x}_i) = y_i, \quad i = 1, \ldots, N_v; \qquad \nabla\tilde{f}(\mathbf{x}_{g\,i}) \approx \nabla f(\mathbf{x}_{g\,i}) = \mathbf{g}_i, \quad i = 1, \ldots, N_g. \qquad (45)$$

In order to keep the derivation general, we will allow throughout the text that the values and the gradients of the approximated function $f(\mathbf{x})$ are evaluated in different sets of points (which may, however, partially or fully coincide). We will denote the $t$-th component of $\mathbf{g}_i$ by $g_{it}$.

The gradient of the approximation is simply

$$\nabla \tilde{f}(\mathbf{x}) = \sum_{k=1}^{N_b} a_k \nabla f_k(\mathbf{x}). \qquad (46)$$
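Before deriving the normal equations for this combined fit (eqs. (47)-(49) below), here is a minimal Python sketch of the idea: gradient observations contribute additional terms to the normal system, with the basis derivatives taking the place of the basis functions. The quadratic basis and the data are invented for illustration; this is not the implementation referred to in this text.

```python
import numpy as np

# Hypothetical quadratic basis and its derivative in 1D (assumptions).
def basis(x):
    return np.array([1.0, x, x * x])

def basis_deriv(x):  # d f_k / dx
    return np.array([0.0, 1.0, 2.0 * x])

# Invented data from f(x) = 1 + 0.5 x - 0.3 x^2: values at N_v points,
# exact derivatives at N_g (different) points, unit weights throughout.
xs = np.linspace(0.0, 2.0, 5)
ys = 1.0 + 0.5 * xs - 0.3 * xs**2
xg = np.array([0.5, 1.5])
gs = 0.5 - 0.6 * xg

n = 3
C = np.zeros((n, n))
d = np.zeros(n)
for x, y in zip(xs, ys):   # value terms, as in eqs. (9)-(10)
    f = basis(x)
    C += np.outer(f, f)
    d += y * f
for x, g in zip(xg, gs):   # gradient terms: basis derivatives replace the basis
    df = basis_deriv(x)
    C += np.outer(df, df)
    d += g * df

a = np.linalg.solve(C, d)
print("a =", a)            # recovers (1.0, 0.5, -0.3) exactly in this noise-free case
```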

We want the linear approximation to agree as closely as possible with the values of $f(\mathbf{x})$, and its gradient to agree as closely as possible with the gradients of $f(\mathbf{x})$, in all points $\mathbf{x}_i$. We look for the best agreement in the weighted least squares sense, i.e. we minimize the function

$$\phi(\mathbf{a}) = \sum_{i=1}^{N_v} w_i \left( \tilde{y}(\mathbf{x}_i) - y_i \right)^2 + \sum_{i=1}^{N_g} \sum_{t=1}^{N} w_{g\,it} \left( \frac{\partial \tilde{y}}{\partial x_t}(\mathbf{x}_{g\,i}) - g_{it} \right)^2 = \sum_{i=1}^{N_v} w_i \left( \sum_k a_k f_k(\mathbf{x}_i) - y_i \right)^2 + \sum_{i=1}^{N_g} \sum_{t=1}^{N} w_{g\,it} \left( \sum_k a_k \frac{\partial f_k}{\partial x_t}(\mathbf{x}_{g\,i}) - g_{it} \right)^2 \qquad (47)$$

with respect to the parameters of approximation $\mathbf{a}$. The $w_i$ are $N_v$ weights that weigh the significance of the values in the points $\mathbf{x}_i$, and the $w_{g\,it}$ are $N_g \cdot N$ weights that weigh the significance of the individual gradient components in $\mathbf{x}_{g\,i}$. The minimum is the stationary point of $\phi$, where

$$\frac{\mathrm{d}\phi}{\mathrm{d}a_j} = 0, \quad j = 1, \ldots, n. \qquad (48)$$

The derivatives of $\phi$ are

$$\frac{\mathrm{d}\phi}{\mathrm{d}a_j} = 2 \sum_{i=1}^{N_v} w_i \left( \sum_k a_k f_k(\mathbf{x}_i) - y_i \right) f_j(\mathbf{x}_i) + 2 \sum_{i=1}^{N_g} \sum_{t=1}^{N} w_{g\,it} \left( \sum_k a_k \frac{\partial f_k}{\partial x_t}(\mathbf{x}_{g\,i}) - g_{it} \right) \frac{\partial f_j}{\partial x_t}(\mathbf{x}_{g\,i}). \qquad (49)$$

Equation (48) therefore gives the following system of equations for the unknown coefficients $\mathbf{a}$: