LECTURE 7. Least Squares and Variants. Optimization Models EE 127 / EE 227AT.

Optimization Models EE 127 / EE 227AT
Laurent El Ghaoui, EECS Department, UC Berkeley, Spring 2015

LECTURE 7: Least Squares and Variants

"If others would but reflect on mathematical truths as deeply and continuously as I have, they would make my discoveries." (C.F. Gauss, 1777-1855)

Outline

1. Least Squares and minimum-norm solutions
2. Solving systems of linear equations and LS problems
3. Direct and inverse mapping of a unit ball
4. Variants of the Least-Squares problem: l2-regularized LS
5. Examples

Least Squares

When y ∉ R(A), the system of linear equations is infeasible: there is no x such that Ax = y (as happens frequently for overdetermined systems). In such cases it may however make sense to determine an approximate solution to the system, that is, a solution that renders the residual vector r := Ax - y as small as possible. In the most common case, we measure the residual via the Euclidean norm, whence the problem becomes

    min_x ||Ax - y||_2^2,

that is, we seek a solution that minimizes the sum of the squares of the equation residuals:

    min_x ||Ax - y||_2^2 = sum_{i=1}^m r_i^2,   r_i = a_i^T x - y_i,

where a_i^T denotes the i-th row of A.
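
As a quick numerical illustration (not part of the original slides), the sketch below solves a small overdetermined LS problem with NumPy on made-up data; np.linalg.lstsq is one of several ways to compute the minimizer.

```python
import numpy as np

# Minimal sketch with made-up data: solve min_x ||Ax - y||_2^2 for an
# overdetermined system (m = 5 equations, n = 2 unknowns).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
y = rng.standard_normal(5)

x_ls, _, _, _ = np.linalg.lstsq(A, y, rcond=None)

r = A @ x_ls - y                       # residual vector r = Ax - y
print("LS solution:", x_ls)
print("sum of squared residuals:", np.sum(r**2))
```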

Least Squares: Geometric interpretation

Since the vector Ax lies in R(A), the problem amounts to determining a point ỹ = Ax* in R(A) at minimum distance from y. The Projection Theorem then tells us that this point is indeed the orthogonal projection of y onto the subspace R(A).

Least Squares: Solution

Since y - Ax* ∈ R(A)^⊥ = N(A^T), we have A^T (y - Ax*) = 0. Solutions to the LS problem must therefore satisfy the Normal Equations:

    A^T A x = A^T y.

This system always admits a solution. If A is full column rank (i.e., rank(A) = n), then the solution is unique, and it is given by

    x* = (A^T A)^{-1} A^T y.

Minimum-norm solutions

When the matrix A has more columns than rows (m < n: underdetermined), and y ∈ R(A), we have dim N(A) ≥ n - m > 0, hence the system y = Ax has infinitely many solutions, and the set of solutions is

    S = {x : x = x̄ + z, z ∈ N(A)},

where x̄ is any vector such that Ax̄ = y. We single out from S the one solution with minimal Euclidean norm. That is, we solve

    min_{x: Ax = y} ||x||_2,   which is equivalent to   min_{x ∈ S} ||x||_2.

The solution x* must be orthogonal to N(A) or, equivalently, must lie in R(A^T), which means that x* = A^T ξ for some suitable ξ. Since x* must solve the system of equations, it must hold that Ax* = y, i.e., AA^T ξ = y. If A is full row rank, AA^T is invertible and the unique ξ that solves the previous equation is ξ = (AA^T)^{-1} y. This finally gives us the unique minimum-norm solution of the system:

    x* = A^T (AA^T)^{-1} y.

LS solutions and the pseudoinverse

Corollary 1 (Set of solutions of the LS problem). The set of optimal solutions of the LS problem

    p* = min_x ||Ax - y||_2

can be expressed as

    X_opt = A† y + N(A),

where A† y is the minimum-norm point in the optimal set. The optimal value p* is the norm of the projection of y onto the orthogonal complement of R(A): for any x* ∈ X_opt,

    p* = ||y - Ax*||_2 = ||(I_m - AA†) y||_2 = ||P_{R(A)^⊥} y||_2,

where the matrix P_{R(A)^⊥} is the projector onto R(A)^⊥. If A is full column rank, then the solution is unique, and equal to x* = A† y = (A^T A)^{-1} A^T y.
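
The formulas above are easy to check numerically. The sketch below (illustrative only, random made-up data) solves the normal equations in the full-column-rank case and computes the minimum-norm solution in the full-row-rank case, verifying that the latter agrees with the pseudoinverse solution A† y.

```python
import numpy as np

rng = np.random.default_rng(1)

# Full column rank: x* = (A^T A)^{-1} A^T y satisfies the normal equations.
A = rng.standard_normal((6, 3))
y = rng.standard_normal(6)
x_star = np.linalg.solve(A.T @ A, A.T @ y)
print(np.allclose(A.T @ (y - A @ x_star), 0))     # A^T (y - Ax*) = 0

# Underdetermined, full row rank: minimum-norm solution x* = A^T (AA^T)^{-1} y.
B = rng.standard_normal((3, 6))
z = B @ rng.standard_normal(6)                    # z lies in R(B) by construction
x_mn = B.T @ np.linalg.solve(B @ B.T, z)
print(np.allclose(B @ x_mn, z))                   # solves the system
print(np.allclose(x_mn, np.linalg.pinv(B) @ z))   # equals the pseudoinverse solution B† z
```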

Solving systems of linear equations and LS problems: Direct methods

We discuss techniques for solving a square and nonsingular system of equations of the form

    Ax = y,   A ∈ R^{n,n}, A nonsingular.

If A ∈ R^{n,n} has a special structure, such as an upper (resp., lower) triangular matrix, then the algorithm of backward substitution (resp., forward substitution) can be directly applied. If A is not triangular, then the method of Gaussian elimination applies a sequence of elementary operations that reduce the system to upper triangular form. Then, backward substitution can be applied to this transformed system in triangular form. A possible drawback of these methods is that they work simultaneously on the coefficient matrix A and on the right-hand side term y, hence the whole process has to be redone if one needs to solve the system for several different right-hand sides.

Solving systems of linear equations and LS problems: Factorization-based methods

Another common approach for solving Ax = y is the so-called factor-solve method. The coefficient matrix A is first factored into the product of matrices having particular structure (such as orthogonal, diagonal, or triangular), and then the solution is found by solving a sequence of simpler systems of equations, where the special structure of the factor matrices can be exploited. An advantage of factorization methods is that, once the factorization is computed, it can be used to solve systems for many different values of the right-hand side y.

Factor-solve via SVD

SVD of A ∈ R^{n,n}: A = U Σ V^T, where U, V ∈ R^{n,n} are orthogonal, and Σ is diagonal and nonsingular. We write the system Ax = y as a sequence of systems:

    U w = y,   Σ z = w,   V^T x = z.

These are readily solved sequentially as

    w = U^T y,   z = Σ^{-1} w,   x = V z.

Factor-solve via QR

Any nonsingular matrix A ∈ R^{n,n} can be factored as A = QR, where Q ∈ R^{n,n} is orthogonal, and R is upper triangular with positive diagonal entries. Then, the linear equations Ax = y can be solved by first multiplying both sides on the left by Q^T, obtaining

    Q^T A x = R x = ỹ,   ỹ = Q^T y,

and then solving the triangular system R x = ỹ by backward substitution.
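
Below is a small sketch (not from the slides) of both factor-solve schemes on a made-up 4x4 nonsingular system, using NumPy's svd and qr and SciPy's triangular solver for the backward-substitution step.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))        # almost surely nonsingular
y = rng.standard_normal(4)

# SVD factor-solve: w = U^T y, z = Sigma^{-1} w, x = V z.
U, s, Vt = np.linalg.svd(A)
x_svd = Vt.T @ ((U.T @ y) / s)

# QR factor-solve: solve R x = Q^T y by backward substitution.
Q, R = np.linalg.qr(A)
x_qr = solve_triangular(R, Q.T @ y, lower=False)

print(np.allclose(x_svd, x_qr), np.allclose(A @ x_qr, y))
```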

SVD method for non-square systems

Consider the linear equations Ax = y, where A ∈ R^{m,n} and y ∈ R^m, and let A = U Σ V^T be an SVD of A. We can completely describe the set of solutions via the SVD, as follows. Pre-multiply the linear equation by the inverse of U, that is by U^T; then

    Σ x̃ = ỹ,   x̃ = V^T x,   where ỹ = U^T y.

Due to the diagonal form of Σ, the above reads

    σ_i x̃_i = ỹ_i,   i = 1, ..., r;
    0 = ỹ_i,          i = r + 1, ..., m.

Two cases can occur:

1. If the last m - r components of ỹ are not zero, then the second set of conditions in the last expression is not satisfied, hence the system is infeasible, and the solution set is empty. This occurs when y is not in the range of A.

2. If y is in the range of A, then the second set of conditions in the last expression holds, and we can solve for x̃ with the first set of conditions, obtaining

    x̃_i = ỹ_i / σ_i,   i = 1, ..., r.

The last n - r components of x̃ are free. This corresponds to elements in the nullspace of A. If A is full column rank (its nullspace is reduced to {0}), then there is a unique solution. Once the vector x̃ is obtained, the actual unknown x can then be recovered as x = V x̃.

Solving LS problems

Given A ∈ R^{m,n} and y ∈ R^m, we discuss the solution of the LS problem

    min_x ||Ax - y||_2.

All solutions of the LS problem are solutions of the system of normal equations A^T A x = A^T y. Therefore, LS solutions can be obtained by applying either direct or factor-solve methods to the normal equations.

Direct and inverse mapping of a unit ball

We focus on the linear map y = Ax, A ∈ R^{m,n}, where x ∈ R^n is the input vector and y ∈ R^m is the output. We consider two problems, which we call the direct and the inverse (or estimation) problem.
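
Returning to the SVD method for non-square systems described above, the following sketch (illustrative, made-up rank-deficient data) checks feasibility of Ax = y via the last m - r components of ỹ = U^T y and, when feasible, recovers one particular solution x = V x̃.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5x4 matrix of rank 2
y = A @ rng.standard_normal(4)                                  # feasible by construction

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))                 # numerical rank

y_tilde = U.T @ y
feasible = np.allclose(y_tilde[r:], 0, atol=1e-10)   # last m - r components must vanish

x_tilde = np.zeros(A.shape[1])           # free components (indices r, ..., n-1) set to zero
x_tilde[:r] = y_tilde[:r] / s[:r]
x = Vt.T @ x_tilde                       # recover x = V x_tilde
print(feasible, np.allclose(A @ x, y))
```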

Direct and inverse mapping of a unit ball: Direct problem

In the direct problem, we assume that the input x lies in a unit Euclidean ball centered at zero, that is x ∈ B_n, with

    B_n = {z ∈ R^n : ||z||_2 ≤ 1},

and we ask where the output y is. That is, we want to find the output set

    E_y = {y : y = Ax, x ∈ B_n}.

This set is a bounded but possibly degenerate ellipsoid (flat along R(A)^⊥), with the axes directions given by the left singular vectors u_i and the semi-axes lengths given by σ_i, i = 1, ..., n.

Direct and inverse mapping of a unit ball: Inverse problem

Suppose the output y lies in the unit ball B_m; we ask what is the set of input vectors that would yield such a set as output. Formally, we seek

    E_x = {x ∈ R^n : Ax ∈ B_m}.

Since Ax ∈ B_m if and only if x^T A^T A x ≤ 1, we obtain that E_x is

    E_x = {x ∈ R^n : x^T (A^T A) x ≤ 1}.

This ellipsoid is unbounded along directions in the nullspace of A. The axes of E_x are along the directions of the right singular vectors v_i, and the semi-axes lengths are given by σ_i^{-1}, i = 1, ..., n.

Variants of the Least-Squares problem: Linear equality-constrained LS

A generalization of the basic LS problem allows for the addition of linear equality constraints on the variable x, resulting in the constrained problem

    min_x ||Ax - y||_2^2,   s.t. Cx = d,

where C ∈ R^{p,n} and d ∈ R^p. This problem can be converted into a standard LS one by eliminating the equality constraints, via a standard procedure. Suppose the problem is feasible, and let x̄ be such that Cx̄ = d. All feasible points are expressed as x = x̄ + Nz, where N contains by columns a basis for N(C), and z is a new variable. The problem then becomes unconstrained in the variable z:

    min_z ||Āz - ȳ||_2^2,   where Ā := AN, ȳ := y - Ax̄.

Variants of the Least-Squares problem: Weighted LS

The standard LS objective is a sum of squared equation residuals:

    ||Ax - y||_2^2 = sum_{i=1}^m r_i^2,   r_i = a_i^T x - y_i.

In some cases, the equation residuals may not be given the same importance, and this relative importance can be modeled by introducing weights into the LS objective, that is

    f_0(x) = sum_{i=1}^m w_i^2 r_i^2,

where w_i ≥ 0 are the given weights. This objective is rewritten as

    f_0(x) = ||W(Ax - y)||_2^2 = ||A_w x - y_w||_2^2,

where

    W = diag(w_1, ..., w_m),   A_w := WA,   y_w := Wy.

The weighted LS problem still has the structure of a standard LS problem, with row-weighted matrix A_w and vector y_w.
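
Both variants above reduce to a standard LS problem. The sketch below (made-up data, illustrative only) eliminates the constraints Cx = d via a nullspace basis of C obtained from its SVD, and solves a weighted LS problem by row-scaling A and y.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 4)); y = rng.standard_normal(8)
C = rng.standard_normal((2, 4)); d = rng.standard_normal(2)

# Equality-constrained LS: x = x_bar + N z, with N a basis of N(C).
x_bar = np.linalg.lstsq(C, d, rcond=None)[0]      # a particular solution of Cx = d
_, _, Vt = np.linalg.svd(C)
rank_C = np.linalg.matrix_rank(C)
N = Vt[rank_C:].T                                 # columns span the nullspace of C
z = np.linalg.lstsq(A @ N, y - A @ x_bar, rcond=None)[0]
x_eq = x_bar + N @ z
print(np.allclose(C @ x_eq, d))                   # constraints are satisfied

# Weighted LS: minimize ||W(Ax - y)||_2^2 with W = diag(w).
w = rng.uniform(0.5, 2.0, size=8)
x_w = np.linalg.lstsq(w[:, None] * A, w * y, rcond=None)[0]
print("weighted LS solution:", x_w)
```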

Variants of the Least-Squares problem: l2-regularized LS

Regularized LS refers to a class of problems of the form

    min_x ||Ax - y||_2^2 + φ(x),

where a regularization, or penalty, term φ(x) is added to the usual LS objective. In the most usual cases, φ is proportional either to the l1 or to the l2 norm of x. The l1-regularized case gives rise to the LASSO problem, which is discussed in more detail later. The l2-regularized case is instead discussed next:

    min_x ||Ax - y||_2^2 + γ ||x||_2^2,   γ ≥ 0.

Recalling that the squared Euclidean norm of a block-partitioned vector is equal to the sum of the squared norms of the blocks, i.e.,

    || [a; b] ||_2^2 = ||a||_2^2 + ||b||_2^2

(where [a; b] denotes the vertical concatenation of a and b), we see that the regularized LS problem can be rewritten in the format of a standard LS problem as follows:

    ||Ax - y||_2^2 + γ ||x||_2^2 = ||Ãx - ỹ||_2^2,

where

    Ã := [A; sqrt(γ) I_n],   ỹ := [y; 0_n],

and γ ≥ 0 is a tradeoff parameter. Interpretation: γ trades off output tracking accuracy against input effort.

Examples

- Linear regression via least squares.
- Auto-regressive (AR) models for time-series prediction.
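
As a final sketch (not from the slides, made-up data), the stacked formulation of the l2-regularized LS problem above can be handed directly to a standard LS solver; the result agrees with the closed-form solution (A^T A + γ I)^{-1} A^T y.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((10, 3)); y = rng.standard_normal(10)
gamma = 0.5

# Stack A with sqrt(gamma) * I and pad y with zeros, then solve as a standard LS problem.
A_tilde = np.vstack([A, np.sqrt(gamma) * np.eye(3)])
y_tilde = np.concatenate([y, np.zeros(3)])
x_reg = np.linalg.lstsq(A_tilde, y_tilde, rcond=None)[0]

# Agrees with the closed form (A^T A + gamma I)^{-1} A^T y.
x_closed = np.linalg.solve(A.T @ A + gamma * np.eye(3), A.T @ y)
print(np.allclose(x_reg, x_closed))
```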