Optimization Models
EE 127 / EE 227AT
Laurent El Ghaoui, EECS Department, UC Berkeley
Spring 2015

LECTURE 7: Least Squares and Variants

"If others would but reflect on mathematical truths as deeply and continuously as I have, they would make my discoveries." — C.F. Gauss (1777-1855)

Outline
1 Least Squares and minimum-norm solutions
2 Solving systems of linear equations and LS problems
3 Direct and inverse mapping of a unit ball
4 Variants of the Least-Squares problem (l2-regularized LS)
5 Examples

Least Squares
When y ∉ R(A), the system of linear equations Ax = y is infeasible: there is no x such that Ax = y (as happens frequently for overdetermined systems). In such cases it may however make sense to determine an approximate solution to the system, that is, a solution that renders the residual vector r := Ax − y as small as possible. In the most common case, we measure the residual via the Euclidean norm, whence the problem becomes

  min_x ||Ax − y||_2^2,

that is, we seek a solution that minimizes the sum of the squares of the equation residuals:

  min_x ||Ax − y||_2^2 = min_x Σ_{i=1}^m r_i^2,  r_i = a_i^T x − y_i,

where a_i^T denotes the i-th row of A.
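As a quick illustration (not from the slides), a minimal NumPy sketch of the problem min_x ||Ax − y||_2^2 on made-up data; the matrix A and vector y here are arbitrary:

```python
import numpy as np

# Overdetermined system: m = 4 equations, n = 2 unknowns (illustrative data).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.1, 0.9, 2.1, 2.9])

# Least-squares solution minimizes ||Ax - y||_2^2.
x_ls, residual_ss, rank, svals = np.linalg.lstsq(A, y, rcond=None)

# Residual vector r = Ax - y; lstsq reports its squared norm, which equals
# the sum of the squared per-equation residuals r_i = a_i^T x - y_i.
r = A @ x_ls - y
print(np.isclose(residual_ss[0], np.sum(r**2)))
```

Here `np.linalg.lstsq` returns, along with the minimizer, the sum of squared residuals, matching the objective value of the problem above.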
Least Squares: geometric interpretation
Since the vector Ax lies in R(A), the problem amounts to determining the point ỹ = Ax* in R(A) at minimum distance from y. The Projection Theorem then tells us that this point is indeed the orthogonal projection of y onto the subspace R(A).

Least Squares: solution
y − Ax* ⊥ R(A), and R(A)^⊥ = N(A^T), hence A^T(y − Ax*) = 0. Solutions to the LS problem must therefore satisfy the normal equations:

  A^T A x = A^T y.

This system always admits a solution. If A is full column rank (i.e., rank(A) = n), then the solution is unique, and it is given by

  x* = (A^T A)^{-1} A^T y.

Minimum-norm solutions
When the matrix A has more columns than rows (m < n: underdetermined) and y ∈ R(A), we have dim N(A) ≥ n − m > 0, hence the system Ax = y has infinitely many solutions, and the set of solutions is

  S = {x : x = x̄ + z, z ∈ N(A)},

where x̄ is any vector such that Ax̄ = y. We single out from S the one solution with minimal Euclidean norm. That is, we solve

  min_{x: Ax = y} ||x||_2,  which is equivalent to  min_{x ∈ S} ||x||_2.

The solution x* must be orthogonal to N(A) or, equivalently, x* ∈ R(A^T), which means that x* = A^T ξ for some suitable ξ. Since x* must solve the system of equations, it must hold that Ax* = y, i.e., AA^T ξ = y. If A is full row rank, AA^T is invertible and the unique ξ that solves the previous equation is ξ = (AA^T)^{-1} y. This finally gives us the unique minimum-norm solution of the system:

  x* = A^T (AA^T)^{-1} y.

LS solutions and the pseudoinverse
Corollary 1 (Set of solutions of the LS problem). The set of optimal solutions of the LS problem

  p* = min_x ||Ax − y||_2

can be expressed as

  X_opt = A† y + N(A),

where A† y is the minimum-norm point in the optimal set. The optimal value p* is the norm of the projection of y onto the orthogonal complement of R(A): for x* ∈ X_opt,

  p* = ||y − Ax*||_2 = ||(I_m − AA†)y||_2 = ||P_{R(A)^⊥} y||_2,

where the matrix P_{R(A)^⊥} is the projector onto R(A)^⊥. If A is full column rank, then the solution is unique, and equal to x* = A† y = (A^T A)^{-1} A^T y.
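A short NumPy check of the minimum-norm formula x* = A^T (AA^T)^{-1} y against the pseudoinverse; the random full-row-rank matrix and right-hand side are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined system: m = 2 equations, n = 4 unknowns, full row rank
# with probability one for a random Gaussian matrix.
A = rng.standard_normal((2, 4))
y = rng.standard_normal(2)

# Minimum-norm solution x* = A^T (A A^T)^{-1} y.
x_mn = A.T @ np.linalg.solve(A @ A.T, y)

# The pseudoinverse yields the same point: x* = A† y.
x_pinv = np.linalg.pinv(A) @ y

print(np.allclose(A @ x_mn, y))    # x* solves the system
print(np.allclose(x_mn, x_pinv))   # and agrees with A† y
```

This confirms numerically that, for full-row-rank A, the minimum-norm solution coincides with the pseudoinverse solution A† y.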
Solving systems of linear equations and LS problems: direct methods
We discuss techniques for solving a square and nonsingular system of equations of the form

  Ax = y,  A ∈ R^{n,n}, A nonsingular.

If A ∈ R^{n,n} has a special structure, such as upper (resp., lower) triangular, then the algorithm of backward substitution (resp., forward substitution) can be applied directly. If A is not triangular, then the method of Gaussian elimination applies a sequence of elementary operations that reduce the system to upper triangular form; backward substitution can then be applied to the transformed triangular system. A possible drawback of these methods is that they work simultaneously on the coefficient matrix A and on the right-hand-side term y, hence the whole process has to be redone if one needs to solve the system for several different right-hand sides.

Factorization-based methods
Another common approach for solving Ax = y is the so-called factor-solve method. The coefficient matrix A is first factored into a product of matrices having particular structure (such as orthogonal, diagonal, or triangular), and the solution is then found by solving a sequence of simpler systems of equations, where the special structure of the factor matrices can be exploited. An advantage of factorization methods is that, once the factorization is computed, it can be reused to solve systems for many different values of the right-hand side y.

Factor-solve via SVD
SVD of A ∈ R^{n,n}: A = UΣV^T, where U, V ∈ R^{n,n} are orthogonal and Σ is diagonal and nonsingular. We write the system Ax = y as a sequence of systems:

  Uw = y,  Σz = w,  V^T x = z.

These are readily solved sequentially as

  w = U^T y,  z = Σ^{-1} w,  x = Vz.

Factor-solve via QR
Any nonsingular matrix A ∈ R^{n,n} can be factored as A = QR, where Q ∈ R^{n,n} is orthogonal and R is upper triangular with positive diagonal entries.
Then, the linear equations Ax = y can be solved by first multiplying both sides on the left by Q^T, obtaining

  Q^T A x = Rx = ỹ,  ỹ = Q^T y,

and then solving the triangular system Rx = ỹ by backward substitution. This factor-solve process is represented graphically in the figure below.
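Both factor-solve schemes can be sketched in a few NumPy lines; the random square matrix is an illustrative assumption (nonsingular with probability one), and note that `np.linalg.solve` does not itself exploit the triangular structure of R (a dedicated triangular solver such as SciPy's `solve_triangular` would):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))   # generic square matrix
y = rng.standard_normal(n)

# Factor-solve via QR: A = QR, then solve Rx = Q^T y by back-substitution.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ y)

# Factor-solve via SVD: A = U Sigma V^T, then x = V Sigma^{-1} U^T y.
U, s, Vt = np.linalg.svd(A)
x_svd = Vt.T @ ((U.T @ y) / s)

print(np.allclose(A @ x_qr, y))
print(np.allclose(x_qr, x_svd))
```

Once Q, R (or U, s, Vt) are computed, solving for a new right-hand side y costs only matrix-vector products and one triangular (or diagonal) solve, which is the advantage of factor-solve methods noted above.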
SVD method for non-square systems
Consider the linear equations Ax = y, where A ∈ R^{m,n} and y ∈ R^m, and let A = U Σ̃ V^T be an SVD of A. We can completely describe the set of solutions via the SVD, as follows. Pre-multiply the linear equation by the inverse of U, which is U^T; then

  Σ̃ x̃ = ỹ,  x̃ = V^T x,

where ỹ = U^T y. Due to the diagonal form of Σ̃, the above writes

  σ_i x̃_i = ỹ_i,  i = 1,..., r;
  0 = ỹ_i,  i = r + 1,..., m.

Two cases can occur:
1. If the last m − r components of ỹ are not zero, then the second set of conditions in the last expression is not satisfied, hence the system is infeasible and the solution set is empty. This occurs when y is not in the range of A.
2. If y is in the range of A, then the second set of conditions in the last expression holds, and we can solve for x̃ with the first set of conditions, obtaining

  x̃_i = ỹ_i / σ_i,  i = 1,..., r.

The last n − r components of x̃ are free; this corresponds to elements in the nullspace of A. If A is full column rank (its nullspace is reduced to {0}), then there is a unique solution. Once the vector x̃ is obtained, the actual unknown x can be recovered as x = V x̃.

Solving LS problems
Given A ∈ R^{m,n} and y ∈ R^m, we discuss the solution of the LS problem

  min_x ||Ax − y||_2.

All solutions of the LS problem are solutions of the system of normal equations A^T A x = A^T y. Therefore, LS solutions can be obtained by applying either direct or factor-solve methods to the normal equations.

Direct and inverse mapping of a unit ball
We focus on the linear map y = Ax, A ∈ R^{m,n}, where x ∈ R^n is the input vector and y ∈ R^m is the output. We consider two problems that we call the direct and the inverse (or estimation) problem.
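The SVD recipe above can be sketched directly in NumPy for a wide, full-row-rank system (an illustrative assumption: here m = 3, n = 5, so r = 3 and the last n − r = 2 components of x̃ are free; setting them to zero yields the minimum-norm solution):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))      # m = 3, n = 5, full row rank w.p. 1
y = rng.standard_normal(3)

U, s, Vt = np.linalg.svd(A)          # s holds sigma_1, ..., sigma_r
r = len(s)

y_tilde = U.T @ y                    # rotate the right-hand side
x_tilde = np.zeros(5)
x_tilde[:r] = y_tilde[:r] / s        # x_tilde_i = y_tilde_i / sigma_i
# x_tilde[r:] is free; zero gives the minimum-norm solution.

x = Vt.T @ x_tilde                   # recover x = V x_tilde
print(np.allclose(A @ x, y))
print(np.allclose(x, np.linalg.pinv(A) @ y))
```

With the free components set to zero, the recovered x agrees with the pseudoinverse solution A† y, consistent with the minimum-norm discussion earlier.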
Direct and inverse mapping of a unit ball: direct problem
In the direct problem, we assume that the input x lies in the unit Euclidean ball centered at zero,

  B_n = {z ∈ R^n : ||z||_2 ≤ 1},

and we ask where the output y is. That is, we want to find the output set

  E_y = {y : y = Ax, x ∈ B_n}.

This set is a bounded but possibly degenerate ellipsoid (flat along R(A)^⊥), with the axis directions given by the left singular vectors u_i and the semi-axis lengths given by σ_i, i = 1,..., n.

Inverse problem
Suppose the output is constrained to the unit ball, y = Ax ∈ B_m; we ask what is the set of input vectors that would yield such outputs. Formally, we seek

  E_x = {x ∈ R^n : Ax ∈ B_m}.

Since Ax ∈ B_m if and only if x^T A^T A x ≤ 1, we obtain that E_x is

  E_x = {x ∈ R^n : x^T (A^T A) x ≤ 1}.

This ellipsoid is unbounded along directions in the nullspace of A. The axes of E_x are along the directions of the right singular vectors v_i, and the semi-axis lengths are given by σ_i^{-1}, i = 1,..., n.

Variants of the Least-Squares problem: linear equality-constrained LS
A generalization of the basic LS problem allows for the addition of linear equality constraints on the x variable, resulting in the constrained problem

  min_x ||Ax − y||_2^2  s.t.  Cx = d,

where C ∈ R^{p,n} and d ∈ R^p. This problem can be converted into a standard LS one by eliminating the equality constraints, via a standard procedure. Suppose the problem is feasible, and let x̄ be such that Cx̄ = d. All feasible points can be expressed as x = x̄ + Nz, where N contains by columns a basis for N(C), and z is a new variable. The problem becomes unconstrained in the variable z:

  min_z ||Āz − ȳ||_2^2,  where Ā := AN, ȳ := y − Ax̄.

Variants of the Least-Squares problem: weighted LS
The standard LS objective is a sum of squared equation residuals:

  ||Ax − y||_2^2 = Σ_{i=1}^m r_i^2,  r_i = a_i^T x − y_i.
In some cases, the equation residuals may not all be given the same importance, and this relative importance can be modeled by introducing weights into the LS objective, that is,

  f_0(x) = Σ_{i=1}^m w_i^2 r_i^2,

where w_i ≥ 0 are the given weights. This objective can be rewritten as

  f_0(x) = ||W(Ax − y)||_2^2 = ||A_w x − y_w||_2^2,

where

  W = diag(w_1,..., w_m),  A_w := WA,  y_w := Wy.

The weighted LS problem thus still has the structure of a standard LS problem, with row-weighted matrix A_w and vector y_w.
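The row-weighting reduction can be sketched as follows; the data and the weight values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 6, 2
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
w = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])   # made-up weights, w_i >= 0

# Row-weighted problem: A_w = W A, y_w = W y, with W = diag(w).
W = np.diag(w)
A_w, y_w = W @ A, W @ y

# Solve the weighted problem as a standard LS problem.
x_w, *_ = np.linalg.lstsq(A_w, y_w, rcond=None)

# Cross-check against the weighted normal equations A^T W^2 A x = A^T W^2 y.
x_ne = np.linalg.solve(A.T @ W**2 @ A, A.T @ W**2 @ y)
print(np.allclose(x_w, x_ne))
```

Equations with larger w_i contribute more to the objective, so the solution fits those rows more tightly than the others.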
Variants of the Least-Squares problem: l2-regularized LS
Regularized LS refers to a class of problems of the form

  min_x ||Ax − y||_2^2 + φ(x),

where a regularization, or penalty, term φ(x) is added to the usual LS objective. In the most usual cases, φ is proportional either to the l1 or to the l2 norm of x. The l1-regularized case gives rise to the LASSO problem, which is discussed in more detail later. The l2-regularized case is instead discussed next:

  min_x ||Ax − y||_2^2 + γ ||x||_2^2,  γ ≥ 0.

Recalling that the squared Euclidean norm of a block-partitioned vector is equal to the sum of the squared norms of the blocks, i.e.,

  ||[a; b]||_2^2 = ||a||_2^2 + ||b||_2^2,

we see that the regularized LS problem can be rewritten in the format of a standard LS problem as follows:

  ||Ax − y||_2^2 + γ ||x||_2^2 = ||Ãx − ỹ||_2^2,

where

  Ã := [A; √γ I_n],  ỹ := [y; 0_n],

and γ ≥ 0 is a tradeoff parameter. Interpretation: tradeoff between output tracking accuracy and input effort.

Examples
Linear regression via least squares.
Auto-regressive (AR) models for time-series prediction.
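The augmented-matrix reformulation of l2-regularized LS can be verified numerically; the data and the value of γ are illustrative assumptions, and the closed-form cross-check uses the regularized normal equations (A^T A + γ I)x = A^T y, which follow from the standard normal equations applied to Ã and ỹ:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
gamma = 0.5                        # tradeoff parameter, chosen arbitrarily

# Standard-LS reformulation: stack A over sqrt(gamma) * I_n, and y over 0_n.
A_tilde = np.vstack([A, np.sqrt(gamma) * np.eye(n)])
y_tilde = np.concatenate([y, np.zeros(n)])
x_aug, *_ = np.linalg.lstsq(A_tilde, y_tilde, rcond=None)

# Closed form from the regularized normal equations.
x_cf = np.linalg.solve(A.T @ A + gamma * np.eye(n), A.T @ y)
print(np.allclose(x_aug, x_cf))
```

Increasing γ shrinks ||x||_2 at the expense of a larger residual ||Ax − y||_2, which is the tracking-accuracy vs. input-effort tradeoff mentioned above.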