Dr.-Ing. Sudchai Boonto, Department of Control System and Instrumentation Engineering, King Mongkut's University of Technology Thonburi, Thailand
Linear Least-Squares Problems

Given a measurement signal $y$, find the least-squares solution $x$:
$$\min_x \; \epsilon^T \epsilon \quad \text{s.t. } y = Ax + \epsilon,$$
or, in more standard form,
$$\min_x \; \|Ax - y\|^2,$$
where $A \in \mathbb{R}^{m \times n}$, $y \in \mathbb{R}^m$, and $x \in \mathbb{R}^n$ is an unknown parameter vector.
Linear Least-Squares Problems (cont.)

The cost function of the least-squares minimization problem can be expanded as
$$V(x) = \|Ax - y\|^2 = (Ax - y)^T (Ax - y) = x^T A^T A x - x^T A^T y - y^T A x + y^T y.$$
The gradient is
$$\nabla_x V(x) = \begin{bmatrix} \frac{\partial V(x)}{\partial x_1} \\ \vdots \\ \frac{\partial V(x)}{\partial x_n} \end{bmatrix} = 2 A^T A x - 2 A^T y.$$
Linear Least-Squares Problems (cont.)

The solution $\hat{x}$ to the least-squares problem is found by setting the gradient equal to zero. Thus
$$A^T A \hat{x} = A^T y \quad \text{(normal equation)}, \qquad \text{or} \qquad A^T (A\hat{x} - y) = 0.$$
The minimal value of the cost function $\|Ax - y\|^2$ can be found from
$$V(\hat{x}) = \hat{x}^T A^T A \hat{x} - \hat{x}^T A^T y - y^T A \hat{x} + y^T y = \hat{x}^T A^T (A\hat{x} - y) - y^T (A\hat{x} - y).$$
By the normal equation, the first term is zero, so
$$\min_x V(x) = -y^T (A\hat{x} - y) = y^T (y - A\hat{x}).$$
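The normal equation can be checked numerically. The following is a minimal sketch assuming NumPy is available; the matrix $A$ and vector $y$ are randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # m = 8 measurements, n = 3 unknowns (made-up data)
y = rng.standard_normal(8)

# Solve the normal equation A^T A x = A^T y directly
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# np.linalg.lstsq solves the same problem via SVD, which is numerically safer
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True: both yield the same x_hat
```

In practice `lstsq` (or a QR factorization) is preferred over forming $A^T A$ explicitly, since squaring the matrix squares its condition number.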
Geometric interpretation

[Figure: vectors $a_1$ and $a_2$ spanning the column space of $A$; the measurement $y$, its projection $\hat{y}$, and the error $\epsilon = y - \hat{y}$.]

We are looking for the projection $\hat{y}$ of the vector $y$ onto the space spanned by the measurement vectors $a_i$. $\hat{y}$ is the vector closest to $y$ if the error $\epsilon$ is orthogonal to this space.
Geometric interpretation (cont.)

$\epsilon$ is orthogonal to this space if $a_i^T \epsilon = 0$, $i = 1, \ldots, n$; in matrix form,
$$A^T \epsilon = 0 \qquad \text{or} \qquad A^T (y - \hat{y}) = A^T (y - A\hat{x}) = 0.$$
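The orthogonality of the residual to the columns of $A$ is easy to verify numerically; a small sketch assuming NumPy, with made-up $A$ and $y$:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 0.0, 0.0])

x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
eps = y - A @ x_hat               # residual epsilon = y - y_hat

# The residual is orthogonal to every column a_i of A, i.e. A^T eps = 0
print(np.allclose(A.T @ eps, 0.0))  # True
```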
Completion of Squares

The completion-of-squares formula for quadratic polynomials is
$$a x^2 + 2 b x y + c y^2 = a \left( x + \frac{b}{a} y \right)^2 + \left( c - \frac{b^2}{a} \right) y^2.$$
When $a > 0$, this tells us the minimum with respect to $x$ for fixed $y$:
$$\min_{x \in \mathbb{R}} \left( a x^2 + 2 b x y + c y^2 \right) = \left( c - \frac{b^2}{a} \right) y^2,$$
which is achieved when $x = -\frac{b}{a} y$.
Completion of Squares for matrices

If $A \in \mathbb{R}^{n \times n}$ and $D \in \mathbb{R}^{m \times m}$ are symmetric matrices and $B \in \mathbb{R}^{n \times m}$, then
$$\begin{bmatrix} x \\ y \end{bmatrix}^T \begin{bmatrix} A & B \\ B^T & D \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = (x + A^{-1} B y)^T A (x + A^{-1} B y) + y^T (D - B^T A^{-1} B) y.$$
Compare with
$$a x^2 + 2 b x y + c y^2 = a \left( x + \frac{b}{a} y \right)^2 + \left( c - \frac{b^2}{a} \right) y^2.$$
Completion of Squares for matrices (cont.)

This gives a general formula for quadratic optimization: if $A > 0$, then
$$\min_x \begin{bmatrix} x \\ y \end{bmatrix}^T \begin{bmatrix} A & B \\ B^T & D \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = y^T (D - B^T A^{-1} B) y,$$
and the minimizing $x$ is $\hat{x} = -A^{-1} B y$.
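The block completion-of-squares identity can be spot-checked with random matrices; a sketch assuming NumPy (all matrices and vectors below are randomly generated):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2

# Symmetric positive definite A, symmetric D, arbitrary B
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)
B = rng.standard_normal((n, m))
S = rng.standard_normal((m, m))
D = S + S.T

x = rng.standard_normal(n)
y = rng.standard_normal(m)

# Left-hand side: [x; y]^T [[A, B], [B^T, D]] [x; y]
lhs = x @ A @ x + 2.0 * (x @ B @ y) + y @ D @ y

# Right-hand side: completed square plus the Schur-complement term
z = x + np.linalg.solve(A, B @ y)
rhs = z @ A @ z + y @ (D - B.T @ np.linalg.solve(A, B)) @ y

print(np.isclose(lhs, rhs))  # True: the identity holds
```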
Application to Least Squares

$$V(x) = \|Ax - y\|^2 = x^T A^T A x - x^T A^T y - y^T A x + y^T y = \begin{bmatrix} x \\ 1 \end{bmatrix}^T \underbrace{\begin{bmatrix} A^T A & -A^T y \\ -y^T A & y^T y \end{bmatrix}}_{M} \begin{bmatrix} x \\ 1 \end{bmatrix}.$$
With the help of the Schur complement,
$$V(x) = \begin{bmatrix} x \\ 1 \end{bmatrix}^T \begin{bmatrix} I & 0 \\ -\hat{x}^T & 1 \end{bmatrix} \begin{bmatrix} A^T A & 0 \\ 0 & y^T y - y^T A \hat{x} \end{bmatrix} \begin{bmatrix} I & -\hat{x} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ 1 \end{bmatrix},$$
for $\hat{x}$ satisfying $\hat{x} = (A^T A)^{-1} A^T y$, and
$$V(x) = (x - \hat{x})^T A^T A (x - \hat{x}) + (y^T y - y^T A \hat{x}).$$
Regression or curve fitting

Model using a linear combination of functions:
$$f(t) = x_1 f_1(t) + x_2 f_2(t) + \cdots + x_n f_n(t).$$
Collect $m$ data samples $y_i = f(t_i)$, $i = 1, \ldots, m$, and write them in matrix form:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} f_1(t_1) & \cdots & f_n(t_1) \\ f_1(t_2) & \cdots & f_n(t_2) \\ \vdots & & \vdots \\ f_1(t_m) & \cdots & f_n(t_m) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
Find the least-squares estimate of $x$ by $x_{\text{est}} = (A^T A)^{-1} A^T y$. This is called curve fitting or linear regression; the functions $f_i$ are called regressors.
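As an illustration, the regressors $f_1(t) = 1$, $f_2(t) = t$, $f_3(t) = t^2$ give a quadratic fit. A sketch assuming NumPy, with synthetic data generated from known coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 20)
# Synthetic samples of f(t) = 2 - 3 t + 0.5 t^2 plus small measurement noise
y = 2.0 - 3.0 * t + 0.5 * t**2 + 0.01 * rng.standard_normal(t.size)

# Regressor matrix: column i holds f_i evaluated at the sample times t_1, ..., t_m
A = np.column_stack([np.ones_like(t), t, t**2])
x_est = np.linalg.solve(A.T @ A, A.T @ y)   # x_est = (A^T A)^{-1} A^T y

print(x_est)   # close to the true coefficients [2, -3, 0.5]
```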
Solutions if the matrix A has full column rank

If $A$ has full column rank, then $A^T A$ is square and invertible, and the solution is
$$\hat{x} = (A^T A)^{-1} A^T y.$$
$(A^T A)^{-1} A^T$ is called the pseudo-inverse of $A$ because $\left( (A^T A)^{-1} A^T \right) A = I$.
Example

Given
$$A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 1 & 1 \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},$$
consider the set of three equations in two unknowns, $Ax = y$. The least-squares solution is given by
$$\hat{x} = (A^T A)^{-1} A^T y = \begin{bmatrix} 6 & 4 \\ 4 & 3 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -1/2 \\ 1 \end{bmatrix}.$$
The least-squares residual is $\|\epsilon\|^2 = \|A\hat{x} - y\|^2 = \frac{1}{2}$.
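The numbers in this example can be reproduced with a few lines of NumPy (a quick check, not part of the original derivation):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ y)   # normal-equation solution
res2 = np.sum((A @ x_hat - y) ** 2)         # squared residual norm

print(x_hat)   # [-0.5, 1.0]
print(res2)    # 0.5
```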
Solutions if the matrix A does not have full column rank

If $A$ does not have full column rank, the problem is changed to
$$\min_{x \in X} \|x\|^2 \quad \text{with} \quad X = \{ x : x = \arg\min_z \|Az - y\|^2 \}.$$
Using the singular value decomposition
$$A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} = U_1 \Sigma V_1^T,$$
the minimization problem can be written as
$$\min_z \|Az - y\|^2 = \min_z \|U_1 \Sigma V_1^T z - y\|^2.$$
Solutions if the matrix A does not have full column rank (cont.)

Define the partitioned vector
$$\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix} = \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} z.$$
The problem becomes
$$\min_z \|Az - y\|^2 = \min_{\xi_1} \|U_1 \Sigma \xi_1 - y\|^2,$$
with $\hat{\xi}_1 = \Sigma^{-1} U_1^T y$ from the normal equation. $\xi_2$ does not change the value of the minimization problem and can be chosen arbitrarily.
Solutions if the matrix A does not have full column rank (cont.)

The solutions become
$$\hat{z} = \begin{bmatrix} V_1 & V_2 \end{bmatrix} \begin{bmatrix} \hat{\xi}_1 \\ \hat{\xi}_2 \end{bmatrix} = V_1 \Sigma^{-1} U_1^T y + V_2 \hat{\xi}_2,$$
thus
$$X = \{ x : x = V_1 \Sigma^{-1} U_1^T y + V_2 \hat{\xi}_2, \; \hat{\xi}_2 \in \mathbb{R}^{n-r} \},$$
$$\|x\|^2 = \|V_1 \Sigma^{-1} U_1^T y\|^2 + \|V_2 \hat{\xi}_2\|^2.$$
Selecting $\hat{\xi}_2 = 0$ gives the minimum-norm solution $\hat{x} = V_1 \Sigma^{-1} U_1^T y$.
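The construction $V_1 \Sigma^{-1} U_1^T$ is exactly what the Moore–Penrose pseudo-inverse computes. A sketch assuming NumPy, with a made-up rank-deficient $A$:

```python
import numpy as np

# Rank-deficient A: the third column is the sum of the first two (made-up data)
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                  # numerical rank

# Minimum-norm least-squares solution x_hat = V1 Sigma^{-1} U1^T y
x_min = Vt[:r].T @ ((U[:, :r].T @ y) / s[:r])

# np.linalg.pinv builds the same V1 Sigma^{-1} U1^T map
print(np.allclose(x_min, np.linalg.pinv(A) @ y))  # True
```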
Example

Given
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 0 \end{bmatrix},$$
the minimum-norm solution is
$$\hat{x} = A^T (A A^T)^{-1} y = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 6 & 4 \\ 4 & 3 \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}.$$
The 2-norm of this solution is $\|\hat{x}\| = \sqrt{3/2}$.
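This example can likewise be checked numerically (a sketch assuming NumPy):

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.0, 1.0]])
y = np.array([1.0, 0.0])

# A has full row rank, so the minimum-norm solution uses the right pseudo-inverse
x_hat = A.T @ np.linalg.solve(A @ A.T, y)

print(x_hat)                        # [-0.5, 1.0, -0.5]
print(np.linalg.norm(x_hat) ** 2)   # 1.5
```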
References

1. Stephen Boyd, Lecture Notes on Introduction to Linear Dynamical Systems, Stanford University.
2. Michel Verhaegen and Vincent Verdult, Filtering and System Identification: A Least Squares Approach, Cambridge University Press, 2007.