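Least Squares Page 1

1 Linear Least Squares

I will try to be consistent in notation, with $n$ being the number of data points, and $m < n$ being the number of parameters in a model function. We are interested in solving an inconsistent set of equations, $A\mathbf{x} = \mathbf{b}$, where $A$ is an $n \times m$ matrix with components $a_{i,j}$, $\mathbf{x}$ is an $m$-vector (with $m < n$) with components $x_i$, and $\mathbf{b}$ is an $n$-vector with components $b_i$. Notice that if $m = n$, this could be solved (provided a solution exists) using Gaussian elimination. When $m < n$ the system is inconsistent (has no solution): the number of unknown parameters $m$ is less than the number of equations $n$, so it is unlikely that $\mathbf{x}$ can be chosen such that $A\mathbf{x} = \mathbf{b}$.

If $m = n$, we would want to determine $\mathbf{x}$ such that $\|A\mathbf{x} - \mathbf{b}\|_2 = 0$, where we work with the Euclidean norm.

Goal: If $m < n$, we want to determine $\mathbf{x}$ such that $\|A\mathbf{x} - \mathbf{b}\|_2$ is minimized.

Notation: Given $A = [a_{i,j}]_{n \times m}$, we define the transpose of $A$ as $A^T = [a_{j,i}]_{m \times n}$.

Useful results from linear algebra:
$(A + B)^T = A^T + B^T$
$(AB)^T = B^T A^T$
$\|\mathbf{x} - \mathbf{y}\|_2^2 = \sum_i (x_i - y_i)^2 = (\mathbf{x} - \mathbf{y})^T (\mathbf{x} - \mathbf{y})$
$i$th row of $A\mathbf{x}$ is $\sum_k a_{i,k} x_k$
$i$th row of $A^T\mathbf{b}$ is $\sum_k a_{k,i} b_k$

Definition: Two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ are orthogonal if $\mathbf{x}^T\mathbf{y} = 0$.

I want to attack this in a different manner than the text, using directional derivatives.

$$
\begin{aligned}
\|A\mathbf{x} - \mathbf{b}\|_2^2 &= (A\mathbf{x} - \mathbf{b})^T (A\mathbf{x} - \mathbf{b}) = \left((A\mathbf{x})^T - \mathbf{b}^T\right)(A\mathbf{x} - \mathbf{b}) \\
&= (\mathbf{x}^T A^T - \mathbf{b}^T)(A\mathbf{x} - \mathbf{b}) = \mathbf{x}^T A^T A\mathbf{x} - \mathbf{b}^T A\mathbf{x} - \mathbf{x}^T A^T \mathbf{b} + \mathbf{b}^T\mathbf{b}
\end{aligned}
$$

Each term above is a scalar, since the norm is a scalar. For this to be a minimum, the directional derivative in any direction $\mathbf{u}$ must be zero (from Calculus III):

$$
D_{\mathbf{u}} \|A\mathbf{x} - \mathbf{b}\|_2^2 = \frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^T A^T A\mathbf{x} - \mathbf{b}^T A\mathbf{x} - \mathbf{x}^T A^T \mathbf{b} + \mathbf{b}^T\mathbf{b}\right) \cdot \mathbf{u} = 0
$$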
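As a quick numerical check of the expansion above, the sketch below evaluates both sides for a small made-up $3 \times 2$ system. It uses plain Python lists, so it needs no external libraries. The helper names (`matvec`, `dot`, `norm_sq_direct`, `norm_sq_expanded`) and the data are illustrative assumptions and do not come from the notes.

```python
# Check ||Ax - b||^2 = x^T A^T A x - b^T A x - x^T A^T b + b^T b
# on a small made-up system (n = 3 data points, m = 2 parameters).

def matvec(A, x):
    """Return Ax; the ith entry is sum_k a_{i,k} x_k."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]

def dot(u, v):
    """Return the scalar u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm_sq_direct(A, x, b):
    """||Ax - b||_2^2 computed from the residual vector directly."""
    r = [ax_i - b_i for ax_i, b_i in zip(matvec(A, x), b)]
    return dot(r, r)

def norm_sq_expanded(A, x, b):
    """Sum of the four scalar terms in the expansion."""
    Ax = matvec(A, x)
    xT_AT_A_x = dot(Ax, Ax)   # x^T A^T A x = (Ax)^T (Ax)
    bT_A_x = dot(b, Ax)       # b^T A x
    xT_AT_b = dot(Ax, b)      # x^T A^T b = (Ax)^T b, the same scalar as b^T A x
    bT_b = dot(b, b)
    return xT_AT_A_x - bT_A_x - xT_AT_b + bT_b

A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0]
x = [0.5, 0.5]
print(norm_sq_direct(A, x, b), norm_sq_expanded(A, x, b))  # prints 0.25 0.25
```

The code computes $\mathbf{x}^T A^T \mathbf{b}$ and $\mathbf{b}^T A\mathbf{x}$ from the same product $(A\mathbf{x})^T\mathbf{b}$. Because they are the same scalar, the two middle terms contribute identically when they are differentiated.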

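where¹

$$\frac{\partial \alpha}{\partial \mathbf{x}} = \left[\frac{\partial \alpha}{\partial x_1} \; \cdots \; \frac{\partial \alpha}{\partial x_m}\right]^T$$

Let's work out each of these derivatives in turn, starting with the easy ones, and looking at the $j$th row, $\partial \alpha / \partial x_j$. We will use $\partial x_i / \partial x_j = \delta_{ij}$.

For $\alpha = \mathbf{b}^T\mathbf{b}$:

$$\frac{\partial \alpha}{\partial x_j} = 0 \quad\Longrightarrow\quad \frac{\partial}{\partial \mathbf{x}}\left(\mathbf{b}^T\mathbf{b}\right) = \mathbf{0}$$

For $\alpha = \mathbf{x}^T A^T \mathbf{b} = \sum_{i=1}^m x_i \sum_{k=1}^n a_{k,i} b_k$:

$$\frac{\partial \alpha}{\partial x_j} = \sum_{k=1}^n a_{k,j} b_k = \sum_{k=1}^n (A^T)_{j,k}\, b_k = j\text{th row of } A^T\mathbf{b} \quad\Longrightarrow\quad \frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^T A^T \mathbf{b}\right) = A^T\mathbf{b}$$

For $\alpha = \mathbf{b}^T A\mathbf{x} = \sum_{i=1}^n b_i \sum_{k=1}^m a_{i,k} x_k$:

$$\frac{\partial \alpha}{\partial x_j} = \sum_{i=1}^n b_i\, a_{i,j} = \sum_{i=1}^n (A^T)_{j,i}\, b_i = j\text{th row of } A^T\mathbf{b} \quad\Longrightarrow\quad \frac{\partial}{\partial \mathbf{x}}\left(\mathbf{b}^T A\mathbf{x}\right) = A^T\mathbf{b}$$

For $\alpha = \mathbf{x}^T A^T A\mathbf{x} = (A\mathbf{x})^T (A\mathbf{x}) = \sum_{i=1}^n \left(\sum_{q=1}^m a_{i,q} x_q\right)\left(\sum_{k=1}^m a_{i,k} x_k\right)$:

$$
\begin{aligned}
\frac{\partial \alpha}{\partial x_j} &= \sum_{i=1}^n \left[a_{i,j} \sum_{q=1}^m a_{i,q} x_q + \left(\sum_{k=1}^m a_{i,k} x_k\right) a_{i,j}\right] && \text{(product rule of derivatives)} \\
&= 2 \sum_{i=1}^n a_{i,j} \sum_{k=1}^m a_{i,k} x_k = 2 \sum_{i=1}^n (A^T)_{j,i} \sum_{k=1}^m a_{i,k} x_k && \text{($\textstyle\sum_k a_{i,k} x_k$ is the $i$th row of $A\mathbf{x}$)} \\
&= 2\,(j\text{th row of } A^T A\mathbf{x})
\end{aligned}
$$

so that $\dfrac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^T A^T A\mathbf{x}\right) = 2A^T A\mathbf{x}$.

Substituting these results back, the two middle terms each contribute $-A^T\mathbf{b}$, and we find that

$$\left(2A^T A\mathbf{x} - 2A^T\mathbf{b}\right) \cdot \mathbf{u} = 0$$

¹ Note that this is the denominator layout convention, which you can learn more about here: …calculus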
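You can check the gradient formula $2A^T A\mathbf{x} - 2A^T\mathbf{b}$ numerically by comparing it with a central finite-difference approximation of each partial derivative $\partial\alpha/\partial x_j$. The data, the step size `h`, and all function names below are illustrative assumptions.

```python
# Compare the derived gradient of alpha(x) = ||Ax - b||^2 with finite differences.

def matvec(A, x):
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def objective(A, x, b):
    """alpha(x) = ||Ax - b||_2^2."""
    r = [ai - bi for ai, bi in zip(matvec(A, x), b)]
    return sum(ri * ri for ri in r)

def analytic_gradient(A, x, b):
    """2 A^T A x - 2 A^T b, the formula derived in the notes."""
    At = transpose(A)
    AtAx = matvec(At, matvec(A, x))
    Atb = matvec(At, b)
    return [2 * p - 2 * q for p, q in zip(AtAx, Atb)]

def fd_gradient(A, x, b, h=1e-6):
    """Central differences: entry j approximates d alpha / d x_j."""
    grad = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        grad.append((objective(A, xp, b) - objective(A, xm, b)) / (2 * h))
    return grad

def directional_derivative(A, x, b, u):
    """D_u alpha = (gradient) . u"""
    return sum(g * ui for g, ui in zip(analytic_gradient(A, x, b), u))

A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0]
x = [0.5, 0.5]
print(analytic_gradient(A, x, b))                    # [-1.0, -2.0]
print(fd_gradient(A, x, b))                          # close to [-1.0, -2.0]
print(directional_derivative(A, x, b, [0.6, 0.8]))  # nonzero: x is not the minimizer
print(analytic_gradient(A, [2 / 3, 0.5], b))         # approximately [0, 0]
```

$\alpha$ is quadratic in $\mathbf{x}$, so the central differences agree with the analytic gradient up to rounding. For this made-up data, $A^T A\mathbf{x} = A^T\mathbf{b}$ has the solution $\mathbf{x} = (2/3, 1/2)$. The gradient vanishes there, so the directional derivative is zero in every direction.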

and for this to be true for all directions $\mathbf{u}$, we must have

$$A^T A\mathbf{x} = A^T\mathbf{b} \qquad \text{(the normal equations)}$$

The solution $\mathbf{x}$ to the normal equations minimizes the Euclidean norm of the residual $\mathbf{r} = A\mathbf{x} - \mathbf{b}$. You can solve the normal equations using Gaussian elimination, or by finding the inverse of a matrix:

$$\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b},$$

where $(A^T A)^{-1} A^T$ is the pseudoinverse of the matrix $A$. This form requires $A^T A$ to be invertible, that is, the columns of $A$ must be linearly independent.

Fitting Models to Data

Suppose we are given $n$ data points $(x_i, y_i)$, $1 \le i \le n$, and have a linear model we wish to fit,

$$f(x) = \sum_{j=1}^m c_j f_j(x),$$

where $m < n$. Then we can use the normal equations to determine the values of the parameters $c_j$ that best fit the data. Substituting the data points into the model gives the equations $f(x_i) = y_i$. These are written as $A\mathbf{c} = \mathbf{b}$, with $a_{i,j} = f_j(x_i)$ and $b_i = y_i$. "Best fit" is taken to mean minimizing the quantity $\|A\mathbf{c} - \mathbf{b}\|_2$.

This process is actually quite flexible. You can construct whatever model might best fit the data, even something like

$$f(x, y) = c_1 + c_2 \cos x + c_3 \sin y + c_4 xy.$$

You simply use this equation to create the associated normal equations for the related system $A\mathbf{c} = \mathbf{b}$. The process is "linear" in the sense that the model is linear in the parameters $c_i$ being determined; the model function $f$ itself need not be linear. Section 4.2 shows quite a few models, so make sure to read it.

Improvements When Solving The Normal Equations Fails: QR Algorithm

Sometimes solving the normal equations fails because Gaussian elimination is not successful. Forming $A^T A$ squares the condition number of $A$, so $A^T A$ can be numerically singular even when $A$ has linearly independent columns. In those cases it is useful to employ the QR factorization.
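The sketch below fits a small model two ways: once through the normal equations and once through a QR factorization. It first builds $A$ from a list of basis functions ($a_{i,j} = f_j(x_i)$). It then solves $A^T A\mathbf{c} = A^T\mathbf{b}$ by Gaussian elimination with partial pivoting. Separately, it factors $A$ with modified Gram-Schmidt and solves the upper triangular system $\hat R\mathbf{c} = \hat Q^T\mathbf{b}$ by back substitution.

This uses the reduced ("thin") factorization, with $\hat Q$ of size $n \times m$, rather than the full $n \times n$ one. $\hat Q^T\mathbf{b}$ equals the upper $m$ entries of $Q^T\mathbf{b}$, so the triangular system is the same. The model $c_1 + c_2 x + c_3\cos x$, the data, and all function names are hypothetical.

```python
import math

def design_matrix(basis, xs):
    """a_{i,j} = f_j(x_i): one row per data point, one column per basis function."""
    return [[f(x) for f in basis] for x in xs]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(a * vk for a, vk in zip(row, v)) for row in M]

def back_substitute(R, d):
    """Solve an upper triangular system R c = d, from the last row up."""
    m = len(d)
    c = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(R[i][k] * c[k] for k in range(i + 1, m))
        c[i] = (d[i] - s) / R[i][i]
    return c

def normal_equations_fit(A, b):
    """Solve A^T A c = A^T b by Gaussian elimination with partial pivoting."""
    At = transpose(A)
    AtA = [[sum(p * q for p, q in zip(ri, rj)) for rj in At] for ri in At]
    aug = [row + [rhs] for row, rhs in zip(AtA, matvec(At, b))]
    m = len(aug)
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= factor * aug[col][k]
    return back_substitute([row[:m] for row in aug], [row[m] for row in aug])

def gram_schmidt_qr(A):
    """Reduced QR by modified Gram-Schmidt: A = Qhat Rhat, Qhat is n x m."""
    m = len(A[0])
    q = []                                  # orthonormal columns of Qhat
    R = [[0.0] * m for _ in range(m)]       # Rhat, upper triangular
    for j, v in enumerate(transpose(A)):    # v starts as column j of A
        for i in range(j):
            R[i][j] = sum(qi * vi for qi, vi in zip(q[i], v))
            v = [vi - R[i][j] * qi for vi, qi in zip(v, q[i])]
        R[j][j] = math.sqrt(sum(vi * vi for vi in v))
        q.append([vi / R[j][j] for vi in v])
    return transpose(q), R

def qr_fit(A, b):
    """Solve Rhat c = d, where d = Qhat^T b (the upper m entries of Q^T b)."""
    Q, R = gram_schmidt_qr(A)
    return back_substitute(R, matvec(transpose(Q), b))

# Hypothetical model f(x) = c1 + c2 x + c3 cos x with noise-free synthetic data
basis = [lambda t: 1.0, lambda t: t, math.cos]
xs = [0.5 * i for i in range(7)]            # 0.0, 0.5, ..., 3.0
ys = [2.0 - 0.5 * t + 3.0 * math.cos(t) for t in xs]
A = design_matrix(basis, xs)
c_ne = normal_equations_fit(A, ys)
c_qr = qr_fit(A, ys)
print(c_ne)
print(c_qr)
```

The data here is consistent, so both routes should recover roughly $(2, -0.5, 3)$. When the basis is badly conditioned, for example high powers of $x$ on a wide interval, the QR route is generally the more reliable of the two.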

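Gram-Schmidt Orthogonalization

I think the text switched the meaning of $m$ and $n$ here, so I will keep my convention: $n$ is the number of data points and $m < n$ is the number of parameters (this is how the Mathematica notebook is set up). Gram-Schmidt orthogonalization takes a set of $m$ linearly independent vectors and creates a new set of $m$ vectors that is orthogonalized, meaning each vector is perpendicular to all the other vectors. The algorithm is fairly straightforward to implement, and is discussed in the Mathematica file.

The full QR decomposition is

$$A_{[n \times m]} = Q_{[n \times n]}\, R_{[n \times m]},$$

where $R$ is upper triangular and $Q^T = Q^{-1}$, so that

$$Q^T Q = Q^{-1} Q = I.$$

This is why we need the full decomposition: $Q$ must be square for its inverse to exist, since the inverse is only defined for square matrices.

Once we have the full QR decomposition of $A$, we do linear least squares on the $n \times m$ system $A\mathbf{c} = \mathbf{b}$, for which we want $\|A\mathbf{c} - \mathbf{b}\|_2$ to be a minimum, in the following manner:

$$
\begin{aligned}
A\mathbf{c} &= \mathbf{b} \\
QR\mathbf{c} &= \mathbf{b} \\
Q^T Q R\mathbf{c} &= Q^T\mathbf{b} && \text{(left multiply by } Q^T) \\
R\mathbf{c} &= Q^T\mathbf{b} && \text{(since } Q^T Q = I) \\
\hat{R}\mathbf{c} &= \mathbf{d}
\end{aligned}
$$

Here $\hat{R}$ is the upper $m \times m$ part of $R$, and $\mathbf{d}$ is the upper $m$ entries of $Q^T\mathbf{b}$.

The last step is where the minimization happens. Multiplying by $Q^T$ does not change the 2-norm, so $\|A\mathbf{c} - \mathbf{b}\|_2 = \|R\mathbf{c} - Q^T\mathbf{b}\|_2$. The bottom $n - m$ rows of $R$ are zero, so the best we can do is make the top $m$ rows agree exactly. Solve this upper triangular system for the model parameters $\mathbf{c}$ by back substitution.

2 Nonlinear Regression: Gauss-Newton Method

If the model function does not depend linearly on the parameters, then the previous linear-algebra-based methods will not work. We can instead examine the problem from the point of view of minimizing the error by taking the gradient.

General Set Up

We have $n$ data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, which we want to fit to a model function with $m$ adjustable parameters $\alpha_j$, $j = 1, 2, \ldots, m$:

$$f(x) = f(x; \alpha_1, \ldots, \alpha_m).$$

We actually have a great deal of choice in what type of function we want to minimize. It can be anything that measures how well the model function matches the data. The vector which compares the data to the model function at each point is given by

$$
\begin{bmatrix}
y_1 - f(x_1; \alpha_1, \ldots, \alpha_m) \\
y_2 - f(x_2; \alpha_1, \ldots, \alpha_m) \\
\vdots \\
y_n - f(x_n; \alpha_1, \ldots, \alpha_m)
\end{bmatrix}
$$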
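The Gauss-Newton method named in the section title works directly with the residual vector above. At each step it linearizes $f$ about the current parameters. This amounts to Newton's method with the second derivatives of $f$ dropped, and each step becomes a small linear least squares problem $J\boldsymbol{\delta} \approx \mathbf{r}$, solved here via $J^T J\boldsymbol{\delta} = J^T\mathbf{r}$. $J$ holds the partial derivatives $\partial f(x_i)/\partial\alpha_j$, and $\mathbf{r}$ is the residual vector.

The sketch below is minimal and rests on several assumptions. The exponential model $f(x; \alpha_1, \alpha_2) = \alpha_1 e^{\alpha_2 x}$, the synthetic data, the starting guess, the stopping rule, and all function names are hypothetical. With only two parameters, the $2 \times 2$ step is solved with Cramer's rule.

```python
import math

def model(x, a1, a2):
    """Hypothetical nonlinear model f(x; a1, a2) = a1 * exp(a2 * x)."""
    return a1 * math.exp(a2 * x)

def model_gradient(x, a1, a2):
    """[df/da1, df/da2] at a single data point x."""
    e = math.exp(a2 * x)
    return [e, a1 * x * e]

def gauss_newton(xs, ys, a1, a2, tol=1e-12, max_iter=50):
    """Gauss-Newton iteration starting from the guess (a1, a2)."""
    for _ in range(max_iter):
        # Residual vector: r_i = y_i - f(x_i; a1, a2)
        r = [y - model(x, a1, a2) for x, y in zip(xs, ys)]
        # Jacobian J, one row [df/da1, df/da2] per data point
        J = [model_gradient(x, a1, a2) for x in xs]
        # Linearizing f about (a1, a2) turns the step into a linear least
        # squares problem J delta ~ r, solved via (J^T J) delta = J^T r.
        s11 = sum(row[0] * row[0] for row in J)
        s12 = sum(row[0] * row[1] for row in J)
        s22 = sum(row[1] * row[1] for row in J)
        g1 = sum(row[0] * ri for row, ri in zip(J, r))
        g2 = sum(row[1] * ri for row, ri in zip(J, r))
        det = s11 * s22 - s12 * s12
        # Two unknowns, so Cramer's rule is enough for the 2 x 2 system
        d1 = (g1 * s22 - s12 * g2) / det
        d2 = (s11 * g2 - s12 * g1) / det
        a1, a2 = a1 + d1, a2 + d2
        if abs(d1) + abs(d2) < tol:
            break
    return a1, a2

# Synthetic, noise-free data generated with (a1, a2) = (2, 0.5)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a1, a2 = gauss_newton(xs, ys, 1.5, 0.4)
print(a1, a2)
```

Plain Gauss-Newton has no step-size control, so a poor starting guess can make it diverge. Damped variants such as Levenberg-Marquardt are the usual remedy.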

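We can minimize this vector based on a variety of different norms:

$\ell_1$ norm: $\sum_{i=1}^n \left|y_i - f(x_i; \alpha_1, \ldots, \alpha_m)\right|$
$\ell_p$ norm: $\left(\sum_{i=1}^n \left|y_i - f(x_i; \alpha_1, \ldots, \alpha_m)\right|^p\right)^{1/p}$
$\ell_\infty$ norm: $\max_{i=1,\ldots,n} \left|y_i - f(x_i; \alpha_1, \ldots, \alpha_m)\right|$

What is typically done is to use the $\ell_2$ norm, since it is the Euclidean space norm, and to minimize the square of the norm:

$$\text{minimize } E(\alpha_1, \ldots, \alpha_m) = \sum_{i=1}^n \left(y_i - f(x_i; \alpha_1, \ldots, \alpha_m)\right)^2$$

This is just a multivariable unconstrained minimization problem. Setting $\partial E / \partial \alpha_k = 0$ and dividing out the factor $-2$ yields the system of equations

$$\sum_{i=1}^n \left(y_i - f(x_i; \alpha_1, \ldots, \alpha_m)\right) \frac{\partial f(x_i; \alpha_1, \ldots, \alpha_m)}{\partial \alpha_k} = 0, \qquad k = 1, \ldots, m,$$

which must be solved for the $m$ unknowns $\alpha_j$. You can solve these equations numerically using Newton's method for systems.

Applying this technique to Linear Regression

For now we are interested in fitting a polynomial of degree $m$, which means we have $m + 1$ parameters:

$$f(x; \alpha_0, \ldots, \alpha_m) = \sum_{j=0}^m \alpha_j x^j.$$

The system of equations we solve becomes

$$
\begin{aligned}
\sum_{i=1}^n \left(y_i - \sum_{j=0}^m \alpha_j x_i^j\right) \frac{\partial}{\partial \alpha_k}\left(\sum_{j=0}^m \alpha_j x_i^j\right) &= 0, & k &= 0, \ldots, m \\
\sum_{i=1}^n \left(y_i - \sum_{j=0}^m \alpha_j x_i^j\right) \sum_{j=0}^m \delta_{jk}\, x_i^j &= 0, & k &= 0, \ldots, m \\
\sum_{i=1}^n \left(y_i - \sum_{j=0}^m \alpha_j x_i^j\right) x_i^k &= 0, & k &= 0, \ldots, m \\
\sum_{j=0}^m \alpha_j \sum_{i=1}^n x_i^{j+k} &= \sum_{i=1}^n y_i x_i^k, & k &= 0, \ldots, m
\end{aligned}
$$

This is a system of $m + 1$ equations in the $m + 1$ unknowns $\alpha_j$. It is a linear system and can be solved using Cramer's rule, which gives the fits you may have seen before, especially for $f(x) = \alpha_0 + \alpha_1 x$.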
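The last set of equations above can be assembled and solved directly. The sketch below builds the $(m+1) \times (m+1)$ matrix with entries $\sum_i x_i^{j+k}$ and the right-hand side $\sum_i y_i x_i^k$. It then solves the system with Gaussian elimination (partial pivoting) rather than Cramer's rule, which becomes impractical as $m$ grows. The function name `polyfit_normal` and the data are illustrative assumptions.

```python
def polyfit_normal(xs, ys, m):
    """Fit f(x) = sum_{j=0}^m alpha_j x^j by solving
    sum_j alpha_j * (sum_i x_i^(j+k)) = sum_i y_i * x_i^k,  k = 0, ..., m."""
    size = m + 1
    # Row k holds equation k; column j multiplies alpha_j; last column is the RHS.
    aug = [[sum(x ** (j + k) for x in xs) for j in range(size)]
           + [sum(y * x ** k for x, y in zip(xs, ys))]
           for k in range(size)]
    # Forward elimination with partial pivoting (used here instead of Cramer's rule)
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the resulting upper triangular system
    alpha = [0.0] * size
    for k in range(size - 1, -1, -1):
        s = sum(aug[k][c] * alpha[c] for c in range(k + 1, size))
        alpha[k] = (aug[k][size] - s) / aug[k][k]
    return alpha

# Straight line (m = 1) through three points that are not collinear
line = polyfit_normal([1.0, 2.0, 3.0], [1.0, 2.0, 2.0], 1)
# Quadratic (m = 2) recovered from noise-free data y = 1 - 2x + 0.5x^2
qx = [-1.0, 0.0, 1.0, 2.0, 3.0]
quad = polyfit_normal(qx, [1.0 - 2.0 * x + 0.5 * x ** 2 for x in qx], 2)
print(line)   # approximately [0.6667, 0.5]
print(quad)   # approximately [1.0, -2.0, 0.5]
```

For $m = 1$, applying Cramer's rule to the $2 \times 2$ system gives the familiar slope $\alpha_1 = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$.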

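3 Aside: Orthogonal Polynomials and Approximating Functions

Now we consider a function $f(x)$ and fit a curve $P(x; \alpha_0, \ldots, \alpha_n)$ to it. We want to minimize the error between the two, and again choose the least squares approximation:

$$\text{minimize } E(\alpha_0, \ldots, \alpha_n) = \int_a^b w(t)\left[f(t) - P(t; \alpha_0, \ldots, \alpha_n)\right]^2 dt,$$

where $w(t)$ is a weight function. The weight function gives greater weight to certain parts of the interval $[a, b]$ on which it is defined. Minimizing this is again a multivariable unconstrained minimization problem, which yields

$$\frac{\partial E}{\partial \alpha_k} = -2\int_a^b w(t)\left[f(t) - P(t; \alpha_0, \ldots, \alpha_n)\right] \frac{\partial P(t; \alpha_0, \ldots, \alpha_n)}{\partial \alpha_k}\, dt = 0, \qquad k = 0, \ldots, n.$$

We choose the function $P(t; \alpha_0, \ldots, \alpha_n) = \sum_{j=0}^n \alpha_j \phi_j(t)$, where $\{\phi_j(t)\}$ is a set of linearly independent functions on $[a, b]$. Since $\partial P / \partial \alpha_k = \phi_k(t)$, we get

$$
\begin{aligned}
\int_a^b w(t) f(t) \phi_k(t)\, dt &= \int_a^b w(t)\left(\sum_{j=0}^n \alpha_j \phi_j(t)\right) \frac{\partial}{\partial \alpha_k}\left(\sum_{j=0}^n \alpha_j \phi_j(t)\right) dt \\
&= \int_a^b w(t)\left(\sum_{j=0}^n \alpha_j \phi_j(t)\right) \sum_{j=0}^n \delta_{kj}\, \phi_j(t)\, dt \\
&= \sum_{j=0}^n \alpha_j \int_a^b w(t) \phi_j(t) \phi_k(t)\, dt, \qquad k = 0, \ldots, n.
\end{aligned}
$$

Suppose the functions $\phi_k(t)$ can be chosen so that they are orthogonal:

$$\int_a^b w(t) \phi_j(t) \phi_k(t)\, dt = \begin{cases} 0, & j \ne k \\ \beta_j, & j = k \end{cases}$$

Then we get

$$
\int_a^b w(t) f(t) \phi_k(t)\, dt = \sum_{j=0}^n \alpha_j \int_a^b w(t) \phi_j(t) \phi_k(t)\, dt = \sum_{j=0}^n \alpha_j \delta_{jk} \beta_j = \alpha_k \beta_k
\quad\Longrightarrow\quad
\alpha_k = \frac{1}{\beta_k} \int_a^b w(t) f(t) \phi_k(t)\, dt,
$$

and we have determined the $\alpha_j$ which minimize $E$. The Gram-Schmidt process describes how to construct orthogonal polynomials for a specific weight function $w(x)$.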
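As a concrete instance, take $w(t) = 1$ on $[a, b] = [-1, 1]$. Applying Gram-Schmidt to $1, t, t^2, \ldots$ with this weight gives, up to scaling, the Legendre polynomials. They are written $\phi_k$ here to avoid a clash with $P(t; \ldots)$, and satisfy $\beta_k = \int_{-1}^{1} \phi_k(t)^2\, dt = 2/(2k+1)$.

The sketch below evaluates $\phi_k$ with the standard three-term recurrence. It then computes $\alpha_k = \frac{1}{\beta_k}\int w f \phi_k\, dt$ with composite Simpson's rule. The quadrature rule, the panel count, and the function names are assumptions; the notes do not specify how to evaluate the integrals.

```python
import math

def legendre(k, t):
    """phi_k(t) via the recurrence (j+1) phi_{j+1} = (2j+1) t phi_j - j phi_{j-1}."""
    p_prev, p = 1.0, t
    if k == 0:
        return p_prev
    for j in range(1, k):
        p_prev, p = p, ((2 * j + 1) * t * p - j * p_prev) / (j + 1)
    return p

def simpson(g, a, b, panels=200):
    """Composite Simpson's rule; panels must be even."""
    h = (b - a) / panels
    total = g(a) + g(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def legendre_coefficients(f, n):
    """alpha_k = (1/beta_k) * integral_{-1}^{1} w f phi_k dt with w(t) = 1."""
    coeffs = []
    for k in range(n + 1):
        beta_k = 2.0 / (2 * k + 1)   # integral of phi_k^2 over [-1, 1]
        integral = simpson(lambda t: f(t) * legendre(k, t), -1.0, 1.0)
        coeffs.append(integral / beta_k)
    return coeffs

# t^2 = (1/3) phi_0 + (2/3) phi_2, so expect roughly [1/3, 0, 2/3]
coeffs = legendre_coefficients(lambda t: t * t, 2)
print(coeffs)

# Degree-3 least squares approximation of exp on [-1, 1], evaluated at t = 0.5
c = legendre_coefficients(math.exp, 3)
approx = sum(ck * legendre(k, 0.5) for k, ck in enumerate(c))
print(approx, math.exp(0.5))
```

The basis is orthogonal, so each $\alpha_k$ is computed independently of the others. Raising the degree adds new coefficients without changing the ones already found, unlike refitting with a monomial basis.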

