A new method for searching an $L_1$ solution of an overdetermined system of linear equations and applications

Goran Kušec (1), Ivana Kuzmanović (2), Kristian Sabo (2), Rudolf Scitovski (2)
(1) Faculty of Agriculture, University of Osijek; (2) Department of Mathematics, University of Osijek

LAD Problem

Given a matrix $A \in \mathbb{R}^{m\times n}$ ($m \ge n$) and a vector $b \in \mathbb{R}^m$.

system of linear equations $Ax = b$: $b \in \mathcal{R}(A)$

overdetermined system of linear equations $Ax \approx b$: $b \notin \mathcal{R}(A)$

$\min_{x \in \mathbb{R}^n} \|b - Ax\|$

Euclidean $\ell_2$-norm: least squares (LS) solution
$\ell_1$-norm: least absolute deviations (LAD) solution
$\ell_\infty$-norm: Chebyshev solution

applications in various fields of applied research:

J. A. Cadzow, Minimum $\ell_1$, $\ell_2$ and $\ell_\infty$ norm approximate solutions to an overdetermined system of linear equations, Digital Signal Processing 12 (2002) 524-560
S. Dasgupta, S. K. Mishra, Least absolute deviation estimation of linear econometric models: A literature review, Munich Personal RePEc Archive, 1-26, 2004
C. Gurwitz, Weighted median algorithms for $L_1$ approximation, BIT 30 (1990) 301-310
Y. Dodge (Ed.), Statistical Data Analysis Based on the $L_1$-norm and Related Methods, Proceedings of the Third International Conference on Statistical Data Analysis Based on the $L_1$-norm and Related Methods, Elsevier, Neuchâtel, 1997
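To make the norm choices concrete, here is a minimal sketch (assuming NumPy and SciPy are available; `lad_solve` is an illustrative helper name, not the method presented in this talk) that computes the $\ell_2$ and $\ell_1$ solutions of a small overdetermined system; the $\ell_1$ problem is posed as the standard linear program $\min \sum_i t_i$ subject to $|b - Ax| \le t$:

```python
# A minimal sketch (not the authors' algorithm): compare the l2 (least squares)
# and l1 (LAD) solutions of an overdetermined system Ax ~ b.
import numpy as np
from scipy.optimize import linprog

def lad_solve(A, b):
    """Minimize ||b - Ax||_1 via the standard LP reformulation:
    variables z = (x, t), minimize sum(t) subject to |b - Ax| <= t."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])    # objective: sum of t
    I = np.eye(m)
    A_ub = np.block([[A, -I], [-A, -I]])             # Ax - t <= b and -Ax - t <= -b
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m    # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n]

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
b[0] += 50.0                                         # one gross outlier
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]          # l2 solution: pulled by the outlier
x_lad = lad_solve(A, b)                              # l1 solution: essentially recovers x_true
print(x_ls)
print(x_lad)
```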

A vector $x^* \in \mathbb{R}^n$ such that $\min_{x \in \mathbb{R}^n} \|b - Ax\|_1 = \|b - Ax^*\|_1$ is called the best LAD-solution of the overdetermined system $Ax \approx b$.

operational research literature: a parameter estimation problem for a hyperplane on the basis of a given set of points

N. M. Korneenko, H. Martini, Hyperplane approximation and related topics, in: New Trends in Discrete and Computational Geometry (J. Pach, Ed.), Springer-Verlag, Berlin, 1993
A. Schöbel, Locating Lines and Hyperplanes: Theory and Algorithms, Springer-Verlag, Berlin, 1999

statistics literature: a parameter estimation problem for linear regression

S. Dasgupta, S. K. Mishra, Least absolute deviation estimation of linear econometric models: A literature review, Munich Personal RePEc Archive, 1-26, 2004
E. Z. Demidenko, Optimization and Regression, Nauka, Moscow, 1989 (in Russian)
Y. Dodge (Ed.), Statistical Data Analysis Based on the $L_1$-norm and Related Methods, Proceedings of the Third International Conference on Statistical Data Analysis Based on the $L_1$-norm and Related Methods, Elsevier, Neuchâtel, 1997

The principle is considered to have been proposed by the Croatian mathematician J. R. Bošković in the mid-eighteenth century.

D. Birkes, Y. Dodge, Alternative Methods of Regression, Wiley, New York, 1993
P. Bloomfield, W. Steiger, Least Absolute Deviations: Theory, Applications, and Algorithms, Birkhäuser, Boston, 1983
Y. Dodge (Ed.), Statistical Data Analysis Based on the $L_1$-norm and Related Methods, Proceedings of the Third International Conference on Statistical Data Analysis Based on the $L_1$-norm and Related Methods, Elsevier, Neuchâtel, 1997

Numerical methods: classical nondifferentiable minimization methods cannot be applied directly (unreasonably long computing time); specialized algorithms are used instead:

J. A. Cadzow, Minimum $\ell_1$, $\ell_2$ and $\ell_\infty$ norm approximate solutions to an overdetermined system of linear equations, Digital Signal Processing 12 (2002) 524-560
E. Castillo, R. Mínguez, C. Castillo, A. S. Cofiño, Dealing with the multiplicity of solutions of the $\ell_1$ and $\ell_\infty$ regression models, European J. Oper. Res. 188 (2008) 460-484
Y. Li, A maximum likelihood approach to least absolute deviation regression, EURASIP Journal on Applied Signal Processing 12 (2004) 1762-1769
N. T. Trendafilov, G. A. Watson, The $\ell_1$ oblique Procrustes problem, Statistics and Computing 14 (2004) 39-51

LAD problem. On the basis of the given set of points $\Lambda = \{T_i = (x_1^{(i)},\dots,x_{n-1}^{(i)}, z^{(i)}) \in \mathbb{R}^n : i \in I\}$, $I = \{1,\dots,m\}$, the vector $a^* = (a_0^*, a_1^*,\dots, a_{n-1}^*) \in \mathbb{R}^n$ of optimal parameters of the hyperplane $f(x; a) = a_0 + \sum_{j=1}^{n-1} a_j x_j$ should be determined such that
$$G(a^*) = \min_{a \in \mathbb{R}^n} G(a), \qquad G(a) = \sum_{i=1}^m \Big| z^{(i)} - a_0 - \sum_{j=1}^{n-1} a_j x_j^{(i)} \Big|.$$

LAD Problem, n = 2

Data points $\Lambda = \{(x_1^{(i)}, z^{(i)}) : i = 1,\dots,m\}$; the LAD line $z = a_0^* + a_1^* x_1$ satisfies
$$\min_{(a_0,a_1) \in \mathbb{R}^2} \sum_{i=1}^m \big| z^{(i)} - a_0 - a_1 x_1^{(i)} \big| = \sum_{i=1}^m \big| z^{(i)} - a_0^* - a_1^* x_1^{(i)} \big|.$$
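For this $n = 2$ case, the same LP helper sketched earlier can fit a LAD line directly, using the design matrix $[\mathbf{1}\;x_1]$ (again an illustration under the assumptions above, not the algorithm of this talk):

```python
# n = 2 case of the earlier sketch: fit the LAD line z = a0 + a1*x1
# (lad_solve is the illustrative LP helper defined above).
import numpy as np

x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
z = np.array([0.1, 1.1, 1.9, 3.0, 10.0])    # last point is an outlier
A = np.column_stack([np.ones_like(x1), x1])  # design matrix [1, x1]
a0, a1 = lad_solve(A, z)
print(a0, a1)  # a best LAD line through two of the data points, largely ignoring the outlier
```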

A LAD solution always exists.

N. M. Korneenko, H. Martini, Hyperplane approximation and related topics, in: New Trends in Discrete and Computational Geometry (J. Pach, Ed.), Springer-Verlag, Berlin, 1993
A. Pinkus, On $L_1$-Approximation, Cambridge University Press, New York, 1993
G. A. Watson, Approximation Theory and Numerical Methods, John Wiley & Sons, Chichester, 1980

LAD solution is generally not unique

n = 2, LAD solution is not unique
[Sequence of figures: the same data set in $[0,1] \times [0,1]$ shown with several different best LAD lines.]

Definition. A matrix $A \in \mathbb{R}^{m\times n}$ ($m \ge n$) is said to satisfy the Haar condition if every $n \times n$ submatrix of $A$ is nonsingular.

Theorem 1. Let $A \in \mathbb{R}^{m\times n}$, $m > n$, be a matrix of full column rank and $b = (b_1,\dots,b_m)^T \in \mathbb{R}^m$ a given vector. Then there exists a permutation matrix $\Pi \in \mathbb{R}^{m\times m}$ such that
$$\Pi A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}, \qquad \Pi b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \qquad A_1 \in \mathbb{R}^{n\times n},\; A_2 \in \mathbb{R}^{(m-n)\times n},\; b_1 \in \mathbb{R}^n,\; b_2 \in \mathbb{R}^{m-n},$$
whereby $A_1$ is a nonsingular matrix, and there exists a LAD-solution $x^* \in \mathbb{R}^n$ such that $A_1 x^* = b_1$. Furthermore, if the matrix $[A\;b]$ satisfies the Haar condition, then the solution $x^* \in \mathbb{R}^n$ of the system $A_1 x = b_1$ is a solution of the LAD problem if and only if the vector $v = (A_1^T)^{-1} A_2^T s$, $s_i := \operatorname{sign}(b_2 - A_2 x^*)_i$, satisfies $\|v\|_\infty \le 1$. Furthermore, $x^*$ is the unique solution of the LAD problem if and only if $\|v\|_\infty < 1$.

A. Pinkus, On $L_1$-Approximation, Cambridge University Press, New York, 1993
G. A. Watson, Approximation Theory and Numerical Methods, John Wiley & Sons, Chichester, 1980

Parameter estimation problem of a hyperplane for a set of points:

n = 2: for a given set of points in the plane, provided not all of them lie on a line parallel to the second axis, there exists a best LAD-line passing through at least two distinct data points;

n = 3: for a given set of points in space, provided not all of them lie on a plane parallel to the third axis, there exists a best LAD-plane passing through at least three distinct data points.

Y. Li, A maximum likelihood approach to least absolute deviation regression, EURASIP Journal on Applied Signal Processing 12 (2004) 1762-1769
R. Scitovski, K. Sabo, I. Kuzmanović, I. Vazler, R. Cupec, R. Grbić, Three points method for searching the best least absolute deviations plane, 4th Croatian Mathematical Congress, Osijek, June 17-20, 2008
A. Schöbel, Locating Lines and Hyperplanes: Theory and Algorithms, Springer-Verlag, Berlin, 1999

LAD Problem I. Kuzmanović, K. Sabo, R. Scitovski, Method for searching the best least absolute deviations solution of an overdetermined system of linear equations, submitted

numerical methods for searching the best LAD-solution:

general minimization methods: Hooke-Jeeves method, differential evolution, Nelder-Mead, random search, simulated annealing
J. A. Nelder, R. Mead, A simplex method for function minimization, Comput. J. 7 (1965) 308-313
C. T. Kelley, Iterative Methods for Optimization, SIAM, Philadelphia, 1999

various specializations of the Gauss-Newton method
E. Z. Demidenko, Optimization and Regression, Nauka, Moscow, 1989 (in Russian)
M. R. Osborne, Finite Algorithms in Optimization and Data Analysis, Wiley, Chichester, 1985

linear programming methods:
A. Barrodale, F. D. K. Roberts, An improved algorithm for discrete $\ell_1$ linear approximation, SIAM J. Numer. Anal. 10 (1973) 839-848
N. M. Korneenko, H. Martini, Hyperplane approximation and related topics, in: New Trends in Discrete and Computational Geometry (J. Pach, Ed.), Springer-Verlag, Berlin, 1993

some specialized methods: Barrodale-Roberts, Bartels-Conn-Sinclair, Bloomfield-Steiger
N. T. Trendafilov, G. A. Watson, The $\ell_1$ oblique Procrustes problem, Statistics and Computing 14 (2004) 39-51
P. Bloomfield, W. Steiger, Least Absolute Deviations: Theory, Applications, and Algorithms, Birkhäuser, Boston, 1983
Y. Dodge (Ed.), Statistical Data Analysis Based on the $L_1$-norm and Related Methods, Proceedings of the Third International Conference on Statistical Data Analysis Based on the $L_1$-norm and Related Methods, Elsevier, Neuchâtel, 1997

Lemma 1. Let $y_1 \le y_2 \le \dots \le y_m$ be real numbers with corresponding weights $w_i > 0$, $W := \sum_{i=1}^m w_i$, and let $I = \{1,\dots,m\}$, $m \ge 2$, be a set of indices. Denote
$$\nu_0 = \begin{cases} \max\Big\{\nu \in I : \sum_{i=1}^{\nu} w_i \le \tfrac{W}{2}\Big\}, & \text{if } \exists\, \nu \in I : \sum_{i=1}^{\nu} w_i \le \tfrac{W}{2}, \\ 0, & \text{otherwise.} \end{cases}$$
Furthermore, let $F : \mathbb{R} \to \mathbb{R}$ be the function defined by the formula $F(\alpha) = \sum_{i=1}^m w_i |y_i - \alpha|$. Then
(i) if $\sum_{i=1}^{\nu_0} w_i < \tfrac{W}{2}$, the minimum of the function $F$ is attained at the point $\alpha^* = y_{\nu_0+1}$;
(ii) if $\sum_{i=1}^{\nu_0} w_i = \tfrac{W}{2}$, the minimum of the function $F$ is attained at every point $\alpha^*$ of the segment $[y_{\nu_0}, y_{\nu_0+1}]$.
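Lemma 1 is constructive: the weighted median is found by sorting and scanning cumulative weights. A minimal sketch, assuming NumPy (`weighted_median` is our illustrative name):

```python
# A direct sketch of Lemma 1: the weighted median minimizes
# F(alpha) = sum_i w_i * |y_i - alpha|.
import numpy as np

def weighted_median(y, w):
    """Return a minimizer of F(alpha) = sum(w_i * |y_i - alpha|), w_i > 0.

    Per Lemma 1: with y sorted and nu0 = max{nu : sum(w_1..w_nu) <= W/2}
    (0 if none), y_{nu0+1} is a minimizer; in the equality case every point
    of [y_{nu0}, y_{nu0+1}] minimizes F and we return the right endpoint.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y)
    y, w = y[order], w[order]
    csum = np.cumsum(w)
    # nu0 = number of prefix sums <= W/2 (exact float comparison; fine for a sketch)
    nu0 = int(np.searchsorted(csum, csum[-1] / 2.0, side="right"))
    return y[nu0]          # 0-based index nu0 is the (nu0+1)-th sorted value

# F is piecewise linear, so the minimum is attained at a data point;
# check against evaluating F at every y_i:
y = [3.0, -1.0, 2.0, 7.0]
w = [1.0, 4.0, 2.0, 1.0]
alpha = weighted_median(y, w)
F = lambda a: sum(wi * abs(yi - a) for yi, wi in zip(y, w))
assert all(F(alpha) <= F(yi) for yi in y)
print(alpha)
```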

Lemma 2. Let $x_0$ be the unique solution of the system $Ax = b$, where $A \in \mathbb{R}^{n\times n}$ is a square nonsingular matrix and $b \in \mathbb{R}^n$ a given vector. If the $j$-th equation $a_j^T x = b_j$ of this system is replaced with the equation $u^T x = \beta$, whereby $u^T d_j \ne 0$, where $d_j \in \mathbb{R}^n$ is the $j$-th column of the matrix $A^{-1}$, then the solution of the new system is given by
$$x = x_0 + \alpha d_j, \qquad \alpha = \frac{\beta - u^T x_0}{u^T d_j}.$$
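Lemma 2 gives a cheap update of the solution when a single equation is exchanged (since $A d_j = e_j$, the update changes only the $j$-th residual). A quick numerical check of the formula, assuming NumPy; for illustration $d_j$ is read off an explicit inverse, whereas a real implementation would reuse a factorization of $A$:

```python
# Numerical check of Lemma 2: replacing the j-th equation a_j^T x = b_j
# of a nonsingular system by u^T x = beta shifts the solution along d_j,
# the j-th column of A^{-1}.
import numpy as np

rng = np.random.default_rng(1)
n, j = 4, 2
A = rng.normal(size=(n, n))
b = rng.normal(size=n)
x0 = np.linalg.solve(A, b)

u = rng.normal(size=n)           # new equation u^T x = beta
beta = rng.normal()
d_j = np.linalg.inv(A)[:, j]     # j-th column of A^{-1}; assume u @ d_j != 0

alpha = (beta - u @ x0) / (u @ d_j)
x_new = x0 + alpha * d_j         # Lemma 2 update

A_mod = A.copy(); A_mod[j] = u   # replace the j-th equation directly
b_mod = b.copy(); b_mod[j] = beta
assert np.allclose(x_new, np.linalg.solve(A_mod, b_mod))
print(x_new)
```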

Definition. A vector $x^*_{J_k} \in \mathbb{R}^n$, $J_k = \{i_1,\dots,i_k\} \subseteq I$, $k \le n$, is called the best $J_k$-LAD-solution of the problem $\min_{x \in \mathbb{R}^n} \|b - Ax\|_1$ if the minimum of the functional $x \mapsto \|r(x)\|_1$ subject to the condition $r_i(x^*_{J_k}) := b_i - (Ax)_i = 0$, $i \in J_k$, is attained at $x^*_{J_k}$.

The best $J_n$-LAD-solution is obtained by solving the system of linear equations $r_i(x) = 0$, $i = i_1,\dots,i_n$; the best $J_{n-1}$-LAD-solution is obtained as the solution of one weighted median problem.

Lemma 3. Let $A \in \mathbb{R}^{m\times n}$ ($m > n$) be a matrix of full column rank, $b \in \mathbb{R}^m$ a given vector and $e = (1,\dots,m)^T$ a vector of indices. Furthermore, let $\Pi \in \mathbb{R}^{m\times m}$ be a permutation matrix such that
$$\Pi A = \begin{bmatrix} A_1^- \\ A_2^+ \end{bmatrix} = \begin{bmatrix} A_{11} & a_{1k} & A_{12} \\ A_{21} & a_{2k} & A_{22} \end{bmatrix}, \qquad \Pi b = \begin{bmatrix} b_1^- \\ b_2^+ \end{bmatrix},$$
where $A_{11} \in \mathbb{R}^{(n-1)\times(k-1)}$, $A_{12} \in \mathbb{R}^{(n-1)\times(n-k)}$, $A_{21} \in \mathbb{R}^{(m-n+1)\times(k-1)}$, $A_{22} \in \mathbb{R}^{(m-n+1)\times(n-k)}$, $a_{1k}, b_1^- \in \mathbb{R}^{n-1}$, $a_{2k}, b_2^+ \in \mathbb{R}^{m-n+1}$, whereby the index $k$ is such that $A_{1k} := [A_{11}\;A_{12}]$ is a nonsingular matrix. Let $A_{2k} := [A_{21}\;A_{22}]$, $I^- = \{(\Pi e)_k : k = 1,\dots,n-1\}$ and $I^+ = \{(\Pi e)_k : k = n, n+1,\dots,m\}$. Then there exist $i_0 \in I^+$ and the best $J_n$-LAD-solution $x^* \in \mathbb{R}^n$ such that $r_i(x^*) = 0$ for all $i \in I^- \cup \{i_0\}$. The $k$-th component $x_k^*$ of the vector $x^*$ is thereby a solution of the weighted median problem
$$\big\| A_{2k}\eta - b_2^+ - \alpha\,(A_{2k}\xi - a_{2k}) \big\|_1 \to \min_{\alpha}, \qquad (1)$$
whereby $\xi$, i.e. $\eta$, is the unique solution of the system
$$A_{1k}\,\xi = a_{1k}, \quad \text{i.e.} \quad A_{1k}\,\eta = b_1^-, \qquad (2)$$
whereas the other components of the vector $x^*$ are obtained by solving the system
$$A_{1k}\,(x_1^*,\dots,x_{k-1}^*,x_{k+1}^*,\dots,x_n^*)^T = b_1^- - x_k^*\, a_{1k}. \qquad (3)$$
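Problem (1) is exactly a weighted median problem in the sense of Lemma 1: writing $c = A_{2k}\eta - b_2^+$ and $d = A_{2k}\xi - a_{2k}$, we have $\|c - \alpha d\|_1 = \sum_{d_i \ne 0} |d_i|\,|c_i/d_i - \alpha| + \text{const}$. A sketch of this reduction, reusing the `weighted_median` helper above (the function and argument names mirror the lemma's symbols and are our own):

```python
# Sketch: the inner problem (1) reduces to the weighted median of Lemma 1.
# With c = A2k @ eta - b2p and d = A2k @ xi - a2k,
#   ||c - alpha*d||_1 = sum(|d_i| * |c_i/d_i - alpha|) + const (terms with d_i = 0),
# so the optimal alpha = x_k is a weighted median.
import numpy as np

def solve_inner_median(A1k, a1k, b1m, A2k, a2k, b2p):
    """Solve the systems (2) for xi, eta, then (1) for x_k via Lemma 1."""
    xi = np.linalg.solve(A1k, a1k)    # A1k @ xi  = a1k   ... system (2)
    eta = np.linalg.solve(A1k, b1m)   # A1k @ eta = b1^-  ... system (2)
    c = A2k @ eta - b2p
    d = A2k @ xi - a2k
    nz = np.abs(d) > 1e-12            # terms with d_i = 0 are constant in alpha
    return weighted_median(c[nz] / d[nz], np.abs(d[nz]))
```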

Assumption: the matrices $A$ and $[A\;b]$ satisfy the Haar condition.

Algorithm

Step 0 (Input): $m$, $n$, $e = (1,\dots,m)^T$, $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $\Pi_0 \in \mathbb{R}^{m\times m}$.

Step 1 (Defining the first submatrix): Determine an index $k$ such that $A_{1k} := [A_{11}\;A_{12}] \in \mathbb{R}^{(n-1)\times(n-1)}$ is a nonsingular matrix, where
$$H_0 := \Pi_0 A = \begin{bmatrix} A_1^- \\ A_2^+ \end{bmatrix} = \begin{bmatrix} A_{11} & a_{1k} & A_{12} \\ A_{21} & a_{2k} & A_{22} \end{bmatrix}, \qquad \Pi_0 b = \begin{bmatrix} b_1^- \\ b_2^+ \end{bmatrix}.$$
Set $I^+ = \{(\Pi_0 e)_i : i = n, n+1,\dots,m\}$ and $e := \Pi_0 e$.

Step 2 (Searching for a new equation): According to Lemma 3, determine $i_0 \in I^+$ and the solution $x^{(0)}$. Calculate $G_0 = G(x^{(0)}) := \|b - Ax^{(0)}\|_1$. Let $A_1 = \begin{bmatrix} A_1^- \\ a_{i_0}^T \end{bmatrix} \in \mathbb{R}^{n\times n}$, $b_1 = \begin{bmatrix} b_1^- \\ b_{i_0} \end{bmatrix} \in \mathbb{R}^n$. Furthermore, let $A_2 \in \mathbb{R}^{(m-n)\times n}$ be the matrix obtained from $A_2^+$ by dropping the $i_0$-th row, and $b_2 \in \mathbb{R}^{m-n}$ the vector obtained from $b_2^+$ by dropping the $i_0$-th component. Solve the system $A_1^T v = A_2^T s$, $s_i := \operatorname{sign}(b_2 - A_2 x)_i$, $i = 1,\dots,m-n$, and denote the solution by $v^{(0)}$. Determine $j_0$ such that $|v^{(0)}_{j_0}| = \max_{i=1,\dots,n} |v^{(0)}_i| =: v_M$. If $v_M \le 1$, STOP; otherwise go to Step 3.

Step 3 (Equations exchange): Let $\Pi_1 \in \mathbb{R}^{m\times m}$ be the permutation matrix which in the matrix $H_0$ replaces the $j_0$-th row with the $i_0$-th row. Determine an index $k$ such that $A_{1k} := [A_{11}\;A_{12}] \in \mathbb{R}^{(n-1)\times(n-1)}$ is a nonsingular matrix, where
$$H_1 := \Pi_1 H_0 = \begin{bmatrix} A_1^- \\ A_2^+ \end{bmatrix} = \begin{bmatrix} A_{11} & a_{1k} & A_{12} \\ A_{21} & a_{2k} & A_{22} \end{bmatrix}, \qquad \Pi_1 b = \begin{bmatrix} b_1^- \\ b_2^+ \end{bmatrix}.$$
Set $I^+ = \{(\Pi_1 e)_i : i = n, n+1,\dots,m\}$ and $e := \Pi_1 e$.

Step 4 (Searching for a new equation): According to Lemma 2, determine $i_0 \in I^+$ and the solution $x^{(1)}$. Calculate $G_1 = G(x^{(1)}) := \|b - Ax^{(1)}\|_1$.

Step 5 (Preparation for a new loop): Let $A_1 = \begin{bmatrix} A_1^- \\ a_{i_0}^T \end{bmatrix} \in \mathbb{R}^{n\times n}$, $b_1 = \begin{bmatrix} b_1^- \\ b_{i_0} \end{bmatrix} \in \mathbb{R}^n$. Furthermore, let $A_2 \in \mathbb{R}^{(m-n)\times n}$ be the matrix obtained from $A_2^+$ by dropping the $i_0$-th row, and $b_2 \in \mathbb{R}^{m-n}$ the vector obtained from $b_2^+$ by dropping the $i_0$-th component. Solve the system $A_1^T v = A_2^T s$, $s_i := \operatorname{sign}(b_2 - A_2 x)_i$, $i = 1,\dots,m-n$, and denote the solution by $v^{(1)}$. Determine $j_0$ such that $|v^{(1)}_{j_0}| = \max_{i=1,\dots,n} |v^{(1)}_i| =: v_M$. If $v_M \le 1$, STOP; otherwise put $H_0 = H_1$ and go to Step 3.
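The stopping test of Steps 2 and 5 can be written compactly; a minimal sketch assuming NumPy (`optimality_check` is an illustrative name; ties with $\operatorname{sign}(0) = 0$ are ignored, since under the Haar condition exact zero residuals off the active set are nongeneric):

```python
# Sketch of the stopping test used in Steps 2 and 5:
# given the active n x n block A1, the remaining rows A2, b2,
# the dual-type vector v solves A1^T v = A2^T s.
import numpy as np

def optimality_check(A1, b1, A2, b2):
    """Return (x, v_M, j0): x solves A1 x = b1; x is a best LAD solution if v_M <= 1."""
    x = np.linalg.solve(A1, b1)
    s = np.sign(b2 - A2 @ x)                # s_i = sign(b2 - A2 x)_i
    v = np.linalg.solve(A1.T, A2.T @ s)     # A1^T v = A2^T s
    j0 = int(np.argmax(np.abs(v)))          # row of A1 to exchange if v_M > 1
    return x, float(np.abs(v[j0])), j0
```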

Initialization: the initial matrix $A_1^-$

The matrix $A$ satisfies the Haar condition, so the first $n-1$ rows of $A$ are taken ($\Pi_0 = I$, the identity matrix). A nonsingular submatrix $A_{1k}$ of the matrix
$$H_0 := \Pi_0 A = \begin{bmatrix} A_1^- \\ A_2^+ \end{bmatrix} = \begin{bmatrix} A_{11} & a_{1k} & A_{12} \\ A_{21} & a_{2k} & A_{22} \end{bmatrix}$$
is determined in Step 1 by applying QR factorization with column pivoting:
$$A_1^- \Pi := [A_{11}\;a_{1k}\;A_{12}]\,\Pi = Q\,[R\;\rho],$$
where $\Pi \in \mathbb{R}^{n\times n}$ is a permutation matrix, $Q \in \mathbb{R}^{(n-1)\times(n-1)}$ an orthogonal matrix, $R \in \mathbb{R}^{(n-1)\times(n-1)}$ an upper triangular nonsingular matrix, and $\rho \in \mathbb{R}^{n-1}$.

$A_{1k} := [A_{11}\;A_{12}] = Q R \Pi_1^T$, $a_{1k} = Q\rho$, where $\Pi_1$ is the matrix obtained from $\Pi$ by dropping the $k$-th row and the $n$-th column.
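A sketch of this initialization using SciPy's pivoted QR (the helper name `choose_A1k` and its return convention are ours): `scipy.linalg.qr(..., pivoting=True)` returns $Q$, $[R\;\rho]$ and the column permutation, and the column pivoted to the last position plays the role of $a_{1k}$:

```python
# Sketch of the initialization: QR with column pivoting on the first n-1 rows
# exposes a well-conditioned (n-1) x (n-1) submatrix A1k.
import numpy as np
from scipy.linalg import qr

def choose_A1k(A1_minus):
    """A1_minus: the (n-1) x n block. Returns the index k of the column left
    out of A1k (the last pivoted column) together with the Q, R, rho factors."""
    Q, Rfull, piv = qr(A1_minus, pivoting=True)   # A1_minus[:, piv] = Q @ Rfull
    k = int(piv[-1])          # least significant column -> plays the role of a1k
    R = Rfull[:, :-1]         # [R rho] split: R is (n-1) x (n-1) upper triangular
    rho = Rfull[:, -1]
    return k, Q, R, rho
```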

Theorem 2. Let $x$ be the approximation of the LAD problem obtained in Step 2, i.e. Step 4, of the Algorithm as the solution of the system $A_1 x = b_1$, where $A_1 = \begin{bmatrix} A_1^- \\ a_{i_0}^T \end{bmatrix} \in \mathbb{R}^{n\times n}$, $b_1 = \begin{bmatrix} b_1^- \\ b_{i_0} \end{bmatrix} \in \mathbb{R}^n$. Furthermore, let $A_2 \in \mathbb{R}^{(m-n)\times n}$ be the matrix obtained from $A_2^+$ by dropping the $i_0$-th row, $b_2 \in \mathbb{R}^{m-n}$ the vector obtained from $b_2^+$ by dropping the $i_0$-th component, and $v \in \mathbb{R}^n$ the solution of the system $A_1^T v = A_2^T s$, $s_i := \operatorname{sign}(b_2 - A_2 x)_i$, $i = 1,\dots,m-n$. Denote $|v_{j_0}| = \max_{i=1,\dots,n} |v_i|$.
I. If $|v_{j_0}| \le 1$, then $x$ is the best LAD-solution;
II. If $|v_{j_0}| > 1$, then $j_0 \le n$, and the maximal decrease of the minimizing function value is attained when in Step 3 the $j_0$-th row of the matrix $A_1$ is replaced.

Convergence

Theorem 4. Let $A \in \mathbb{R}^{m\times n}$ be a matrix and $b \in \mathbb{R}^m$ a given vector, $m \ge n$, such that the matrices $A$ and $[A\;b]$ satisfy the Haar condition. Then the sequence $(x^{(n)})$ defined by the iterative procedure in the Algorithm converges in finitely many steps to the best LAD-solution.

Example 1 (m = 12, n = 6)

Table 1.
  i |   A                          |   b
  1 |  42    7  -28   34   13  -49 |   6
  2 |  31   46  -31  -12   17   18 | -42
  3 |  13  -11   -1  -35   31  -33 |  48
  4 |  45  -35   43   -9  -36   -8 | -16
  5 |  25  -34   37  -30   -2   32 |  13
  6 | -37  -24   -9  -35  -35   21 | -16
  7 | -12   26  -49  -49  -11  -22 | -32
  8 | -42   31   28   -5  -45   47 | -28
  9 |  19    2  -29  -33    0  -46 |  35
 10 | -21  -26  -45  -44    5   26 |  47
 11 |  28   26  -38  -39   -8  -16 |  -6
 12 |  32   27   -7    9  -43  -22 | -38

Iteration progress, $G_k = \|b - Ax^{(k)}\|_1$: $G_0 = 236.346$, $G_1 = 180.613$, $G_2 = 171.134$, $G_3 = 148.12$, $G_4 = 146.663$.

Application

Data points $T_i = (x_1^{(i)}, x_2^{(i)}, z^{(i)})$, $i = 1,\dots,m$:
$x_1^{(i)}$ - fat thickness, $x_2^{(i)}$ - lumbar muscle thickness, $z^{(i)}$ - muscle percentage of the $i$-th pig.
Model: $z = a_0 + a_1 x_1 + a_2 x_2$. The data contain outliers.

Root-Mean-Square Error of Prediction (RMSEP)

D. Causeur, G. Daumas, T. Dhorne, B. Engel, M. Font i Furnols, S. Højsgaard, Statistical Handbook for Assessing Pig Classification Methods: Recommendations from the EUPIGCLASS project group

EU reference method:
P. Walstra, G. S. M. Merkus, Procedure for assessment of the lean meat percentage as a consequence of the new EU reference dissection method in pig carcass classification, DLO-Research Institute for Animal Science and Health (ID-DLO), Research Branch Zeist, P.O. Box 501, 3700 AM Zeist, The Netherlands, 1995
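For reference, RMSEP in its basic form is the root of the mean squared prediction residual; a one-function sketch assuming NumPy (the EUPIGCLASS handbook also prescribes cross-validated variants, which this does not implement):

```python
# RMSEP in its basic form: root-mean-square of the prediction residuals.
import numpy as np

def rmsep(z, z_hat):
    z, z_hat = np.asarray(z, float), np.asarray(z_hat, float)
    return float(np.sqrt(np.mean((z - z_hat) ** 2)))

# e.g. for the model z = a0 + a1*x1 + a2*x2:
# z_hat = a0 + a1 * x1 + a2 * x2
```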

Example 2 ($m = 145$)

EU reference method (RMSEP): $a_2 = 0.71259$, $a_1 = 0.02312$, $a_0 = 67.77137$
LAD method: $a_2 = 0.696471$, $a_1 = 0.0176915$, $a_0 = 67.8906$