
Least Squares. Tom Lyche, Centre of Mathematics for Applications, Department of Informatics, University of Oslo. October 26, 2010.

Linear system. $Ax = b$, with $A \in \mathbb{C}^{m,n}$, $b \in \mathbb{C}^m$, $x \in \mathbb{C}^n$. Under-determined ($m < n$): no solution, or an infinite number of solutions. Square ($m = n$): a unique solution if $A$ is nonsingular; otherwise either no solution or an infinite number of solutions. Over-determined ($m > n$): either no solution, a unique solution, or an infinite number of solutions, but an over-determined system usually has no solution.

The Least Squares Problem. Definition: Given $A \in \mathbb{C}^{m,n}$ and $b \in \mathbb{C}^m$, we call an $x \in \mathbb{C}^n$ which minimizes $\|r(x)\|_2^2 = \|Ax - b\|_2^2$ a least squares solution of $Ax = b$. We set $E(x) := \|Ax - b\|_2^2 = \|r(x)\|_2^2$. To find an $x$ which minimizes $E(x)$ is called the Least Squares Problem (LSQ). Since the square root function is monotone, minimizing $E(x)$ or $\sqrt{E(x)}$ is equivalent.

Example. Consider the over-determined system $x_1 = 1$, $x_1 = 1$, $x_1 = 2$, i.e. $A = [1, 1, 1]^T$, $x = [x_1]$, $b = [1, 1, 2]^T$. Then $\|Ax - b\|_2^2 = (x_1 - 1)^2 + (x_1 - 1)^2 + (x_1 - 2)^2$. Setting the first derivative with respect to $x_1$ equal to zero we obtain $2(x_1 - 1) + 2(x_1 - 1) + 2(x_1 - 2) = 0 \Rightarrow x_1 = 4/3$, the average of $b_1, b_2, b_3$. The second derivative is positive, so $x_1 = 4/3$ is a global minimum.

The Normal Equations. $E(x) = \|Ax - b\|_2^2 = x^* B x - 2\,\mathrm{Re}(c^* x) + \beta$, where $B = A^* A$, $c = A^* b$, $\beta = b^* b$. The gradient is $\nabla E(x) := [\partial E/\partial x_1, \ldots, \partial E/\partial x_n]^T = 2(Bx - c)$. Thus $x$ a least squares solution $\Rightarrow Bx - c = 0 \Leftrightarrow A^* A x = A^* b$ (the Normal Equations). We will show the converse: $A^* A x = A^* b \Rightarrow x$ is a least squares solution.
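As a quick numerical check (a sketch with my own variable names, not from the notes), the normal equations for the example above reproduce $x_1 = 4/3$, and the residual is orthogonal to $\mathrm{span}(A)$:

```python
import numpy as np

# Example: A = [1, 1, 1]^T, b = [1, 1, 2]^T. The normal equations A^T A x = A^T b
# reduce to 3*x1 = 4, i.e. x1 = 4/3.
A = np.array([[1.0], [1.0], [1.0]])
b = np.array([1.0, 1.0, 2.0])
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)                            # [1.3333...]
r = A @ x - b
print(np.allclose(A.T @ r, 0.0))    # True: residual orthogonal to the columns of A
```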

Example 2. We choose $m$ abscissas $t_1, \ldots, t_m$, $n$ functions $\varphi_1, \varphi_2, \ldots, \varphi_n$ defined for $t \in \{t_1, t_2, \ldots, t_m\}$, and $m$ positive numbers $w_1, \ldots, w_m$. Given data values $y_1, \ldots, y_m$, we want to find $x = [x_1, x_2, \ldots, x_n]^T$ such that $E(x) := \sum_{k=1}^m w_k \bigl[\sum_{j=1}^n x_j \varphi_j(t_k) - y_k\bigr]^2$ is as small as possible. Typical examples of functions might be polynomials, trigonometric functions, exponential functions, or splines. The numbers $w_k$ are called weights.

Example 2 is a Least Squares Problem. With $A = [\sqrt{w_k}\,\varphi_j(t_k)]_{k,j} \in \mathbb{R}^{m,n}$ and $b = [\sqrt{w_k}\, y_k]_{k=1}^m \in \mathbb{R}^m$ we get $E(x) := \sum_{k=1}^m w_k \bigl[\sum_{j=1}^n x_j \varphi_j(t_k) - y_k\bigr]^2 = \|Ax - b\|_2^2$, with $B = A^T A = \bigl[\sum_{k=1}^m w_k \varphi_i(t_k)\varphi_j(t_k)\bigr]_{i,j=1}^n \in \mathbb{R}^{n,n}$ and $c = A^T b = \bigl[\sum_{k=1}^m w_k \varphi_i(t_k)\, y_k\bigr]_{i=1}^n \in \mathbb{R}^n$.

Polynomial fitting: $w_i = 1$, $i = 1, \ldots, m$, $\varphi_1(t) = 1$, $\varphi_2(t) = t$, ..., $\varphi_n(t) = t^{n-1}$. The normal equations for $n = 3$:
$$B_3 x = \begin{bmatrix} m & \sum t_k & \sum t_k^2 \\ \sum t_k & \sum t_k^2 & \sum t_k^3 \\ \sum t_k^2 & \sum t_k^3 & \sum t_k^4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \sum y_k \\ \sum t_k y_k \\ \sum t_k^2 y_k \end{bmatrix}.$$
$B$ is symmetric positive definite if there are at least $n$ distinct $t_k$'s.
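As an illustration, here is a small NumPy sketch (my own function and variable names, not part of the notes) that forms $B = A^T A$ and $c = A^T b$ for the weighted polynomial fit above and solves the normal equations:

```python
import numpy as np

def weighted_poly_lsq(t, y, w, n):
    """Fit p(t) = x_1 + x_2*t + ... + x_n*t^(n-1) minimizing sum_k w_k (p(t_k) - y_k)^2."""
    sw = np.sqrt(w)
    A = sw[:, None] * np.vander(t, n, increasing=True)  # rows sqrt(w_k)*[1, t_k, ..., t_k^(n-1)]
    b = sw * y                                           # entries sqrt(w_k)*y_k
    B = A.T @ A                                          # normal-equations matrix
    c = A.T @ b                                          # right-hand side
    return np.linalg.solve(B, c)

t = np.linspace(0.0, 1.0, 7)
y = 1.0 + 2.0 * t + 0.5 * t**2
print(weighted_poly_lsq(t, y, np.ones_like(t), 3))       # close to [1.0, 2.0, 0.5]
```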

Hilbert Matrix. Take $t_i = (i-1)/(m-1)$, $i = 1, \ldots, m$. Then $\frac{1}{m} B \approx H_n$, the $n \times n$ Hilbert matrix with entries $(H_n)_{ij} = 1/(i+j-1)$, e.g.
$$H_3 = \begin{bmatrix} 1 & \tfrac12 & \tfrac13 \\ \tfrac12 & \tfrac13 & \tfrac14 \\ \tfrac13 & \tfrac14 & \tfrac15 \end{bmatrix},$$
since $\frac{1}{m}\sum_{k=1}^m t_k^{i+j-2} = \frac{1}{m}\sum_{k=1}^m \bigl(\tfrac{k-1}{m-1}\bigr)^{i+j-2} \approx \int_0^1 x^{i+j-2}\,dx = \frac{1}{i+j-1}$. We have $K_1(H_6) \approx 3 \cdot 10^7$: extremely ill-conditioned even for moderate $n$. Use a different basis for the polynomials.
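The ill-conditioning is easy to observe numerically; a minimal sketch (the helper name is mine) that builds $H_n$ and prints 1-norm condition numbers:

```python
import numpy as np

def hilbert(n):
    i = np.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1)   # (H_n)_{ij} = 1/(i+j-1)

for n in (3, 6, 10):
    print(n, f"{np.linalg.cond(hilbert(n), 1):.2e}")   # K_1(H_n)
# K_1(H_6) comes out around 3e7, in line with the slide.
```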

What is next? The Pseudoinverse Orthogonal Projections LSQ: Existence and Uniqueness Numerical Solution Methods Perturbation Theory

Recall SVD, SVF, and Outer Product Form. $A = U\Sigma V^* = U_1\Sigma_1 V_1^* = \sigma_1 u_1 v_1^* + \cdots + \sigma_r u_r v_r^*$, with $U \in \mathbb{C}^{m,m}$, $\Sigma \in \mathbb{R}^{m,n}$, $V \in \mathbb{C}^{n,n}$, $U_1 \in \mathbb{C}^{m,r}$, $\Sigma_1 \in \mathbb{R}^{r,r}$, $V_1 \in \mathbb{C}^{n,r}$. [Figure: Three forms of SVD, decomposition (center left), factorization (center right), and outer product (right).]

The Pseudoinverse. Suppose $A = U_1 \Sigma_1 V_1^* \in \mathbb{C}^{m,n}$ is a singular value factorization of $A$. The matrix $A^\dagger := V_1 \Sigma_1^{-1} U_1^*$ is called the pseudo-inverse of $A$. If $A \in \mathbb{C}^{m,n}$ then $A^\dagger \in \mathbb{C}^{n,m}$. If (1) $ABA = A$, (2) $BAB = B$, (3) $(BA)^* = BA$, and (4) $(AB)^* = AB$, then $B = A^\dagger$. Hence $A^\dagger$ is independent of the factorization chosen to represent it. If $A$ is square and nonsingular then $A^\dagger A = A A^\dagger = I$ and $A^\dagger$ is the usual inverse of $A$. Any matrix has a pseudo-inverse, and so $A^\dagger$ is a generalization of the usual inverse.
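A minimal sketch (my own helper name) of the singular value factorization route to $A^\dagger$, checked against the four conditions above and NumPy's pinv, using the matrix from the example on the next slide:

```python
import numpy as np

def pseudoinverse(A, tol=1e-12):
    # Singular value factorization A = U1 diag(s1) V1^*, keeping the r singular
    # values above tol; then A^+ = V1 diag(1/s1) U1^*.
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    U1, s1, V1 = U[:, :r], s[:r], Vh[:r, :].conj().T
    return V1 @ np.diag(1.0 / s1) @ U1.conj().T

A = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
B = pseudoinverse(A)
# The four conditions characterizing A^+:
print(np.allclose(A @ B @ A, A), np.allclose(B @ A @ B, B),
      np.allclose((B @ A).conj().T, B @ A), np.allclose((A @ B).conj().T, A @ B))
print(np.allclose(B, np.linalg.pinv(A)))   # agrees with NumPy's pinv
```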

Example.
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{bmatrix}\,[2]\,\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix},$$
$$B := A^\dagger = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}\,[1/2]\,\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}.$$
If we guess a candidate $B$ for $A^\dagger$ and verify (1) $ABA = A$, (2) $BAB = B$, (3) $(BA)^* = BA$, and (4) $(AB)^* = AB$, then $B = A^\dagger$.

What is next? The Pseudoinverse Orthogonal Projections LSQ: Existence and Uniqueness Numerical Solution Methods Perturbation Theory

Recall Direct Sum and Orthogonal Sum. Suppose $S$ and $T$ are subspaces of $\mathbb{R}^n$ or $\mathbb{C}^n$. We define Sum: $X := S + T := \{s + t : s \in S \text{ and } t \in T\}$; Direct Sum: if $S \cap T = \{0\}$, then $S \oplus T := S + T$; Orthogonal Sum: suppose $\langle \cdot, \cdot \rangle$ is an inner product on $\mathbb{R}^n$ or $\mathbb{C}^n$; $S \oplus T$ is an orthogonal sum if $\langle s, t \rangle = 0$ for all $s \in S$ and all $t \in T$. $\mathrm{span}(A) \oplus \ker(A^*)$ is an orthogonal sum with respect to the usual inner product $\langle s, t \rangle := s^* t$. For if $y = Ax \in \mathrm{span}(A)$ and $z \in \ker(A^*)$ then $y^* z = (Ax)^* z = x^* (A^* z) = 0$. Orthogonal complement: $T = S^\perp := \{x \in X : \langle s, x \rangle = 0 \text{ for all } s \in S\}$.

Basic facts. Suppose $S$ and $T$ are subspaces of $\mathbb{R}^n$ or $\mathbb{C}^n$. Then $S + T = T + S$ and $S + T$ is a subspace of $\mathbb{R}^n$ or $\mathbb{C}^n$. $\dim(S + T) = \dim S + \dim T - \dim(S \cap T)$, and $\dim(S \oplus T) = \dim S + \dim T$. $\mathbb{C}^m = \mathrm{span}(A) \oplus \ker(A^*)$. Every $v \in S \oplus T$ can be decomposed uniquely as $v = s + t$, where $s \in S$ and $t \in T$. If $S \oplus T$ is an orthogonal sum then $s$ is called the orthogonal projection of $v$ into $S$.

Pythagoras. [Figure: a vector $v = s + t$ with $s \in S$ and $t$ orthogonal to $S$.] If $\langle s, t \rangle = 0$ then $\|s + t\|^2 = \|s\|^2 + \|t\|^2$. Here $\|v\| := \sqrt{\langle v, v \rangle}$.

Orthogonal bases for span(A) and ker(A*). $A = U\Sigma V^*$, $A^* = V \Sigma^T U^*$, so $AV = U\Sigma$ and $A^* U = V \Sigma^T$:
$$A\,[V_1\ V_2] = [U_1\ U_2]\begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad A^*[U_1\ U_2] = [V_1\ V_2]\begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}.$$
Thus $AV_1 = U_1\Sigma_1$, $AV_2 = 0$, $A^* U_1 = V_1 \Sigma_1$, $A^* U_2 = 0$. The columns of $U_1$ form an orthonormal basis for $\mathrm{span}(A)$ and the columns of $U_2$ an orthonormal basis for $\ker(A^*)$.

Orthogonal Projections and SVD.
$$A = [U_1\ U_2]\begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} V_1^* \\ V_2^* \end{bmatrix},$$
where the columns of $U_1$ form an orthonormal basis for $\mathrm{span}(A)$ and the columns of $U_2$ an orthonormal basis for $\ker(A^*)$. Let $b \in \mathbb{C}^m$. Then $b = UU^* b = [U_1\ U_2]\begin{bmatrix} U_1^* \\ U_2^* \end{bmatrix} b = U_1(U_1^* b) + U_2(U_2^* b)$. $b_1 := U_1 U_1^* b \in \mathrm{span}(A)$ is the orthogonal projection of $b$ into $\mathrm{span}(A)$, and $b_2 := U_2 U_2^* b \in \ker(A^*)$ is the orthogonal projection into $\ker(A^*)$.

Orthogonal Projections and SVD. $b_1 = AA^\dagger b$, $b_2 = (I - AA^\dagger) b$. [Figure: $b$ decomposed into $b_1 \in \mathrm{span}(A)$ and $b_2$ orthogonal to $\mathrm{span}(A)$.]

Example. The singular value decomposition of $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$ is $A = I_3 A I_2$. Then $A^\dagger = I_2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$, and
$$AA^\dagger = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad I_3 - AA^\dagger = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
If $b = [b_1, b_2, b_3]^T$, then $b_1 = AA^\dagger b = [b_1, b_2, 0]^T$ and $b_2 = (I_3 - AA^\dagger) b = [0, 0, b_3]^T$.
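A quick numerical check of these projections (a sketch; it uses NumPy's pinv in place of the explicit SVD formula, with my own test vector):

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
b = np.array([3.0, -2.0, 5.0])

P = A @ np.linalg.pinv(A)        # orthogonal projector onto span(A), here diag(1, 1, 0)
b1 = P @ b                       # projection into span(A): [3, -2, 0]
b2 = (np.eye(3) - P) @ b         # projection into ker(A^*): [0, 0, 5]
print(b1, b2, np.isclose(b1 @ b2, 0.0))
```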

What is next? The Pseudoinverse Orthogonal Projections LSQ: Existence and Uniqueness Numerical Solution Methods Perturbation Theory

Existence, Uniqueness, Characterization. Theorem: The least squares problem always has a solution. The solution is unique if and only if $A$ has linearly independent columns. Moreover, the following are equivalent: 1. $x$ is a solution of the least squares problem. 2. $A^* A x = A^* b$. 3. $x = A^\dagger b + z$ for some $z \in \ker(A)$, where $A^\dagger$ is the pseudo-inverse of $A$. We have $\|x\|_2 \ge \|A^\dagger b\|_2$ for all solutions $x$ of the least squares problem.

Proof: Existence. Let $b = b_1 + b_2$, where $b_1 \in \mathrm{span}(A)$ and $b_2 \in \ker(A^*)$ are the orthogonal projections of $b$ into $\mathrm{span}(A)$ and $\ker(A^*)$, respectively. Since $b_2^* v = 0$ for any $v \in \mathrm{span}(A)$, we have $b_2^*(b_1 - Ax) = 0$ for any $x \in \mathbb{C}^n$. Therefore, for $x \in \mathbb{C}^n$, $\|b - Ax\|_2^2 = \|(b_1 - Ax) + b_2\|_2^2 = \|b_1 - Ax\|_2^2 + \|b_2\|_2^2 \ge \|b_2\|_2^2$, with equality if and only if $Ax = b_1$. Since $b_1 \in \mathrm{span}(A)$ we can always find such an $x$, and existence follows.

Proof: 1, 2, 3 and Uniqueness. 1 $\Leftrightarrow$ 2: By what we have shown, $x$ solves the least squares problem if and only if $Ax = b_1$, so that $b - Ax = b_1 + b_2 - Ax = b_2 \in \ker(A^*)$, or $A^*(b - Ax) = 0$, or $A^* A x = A^* b$. 1 $\Rightarrow$ 3: Suppose $Ax = b_1$ and define $z := x - A^\dagger b$. Then $Az = Ax - AA^\dagger b = b_1 - b_1 = 0$ and $z \in \ker(A)$. 3 $\Rightarrow$ 1: If $x = A^\dagger b + z$ with $z \in \ker(A)$ then $Ax = AA^\dagger b + Az = b_1$. If $A$ has linearly independent columns then $\ker(A) = \{0\}$ and $x = A^\dagger b$ is the unique solution.

Proof: Minimum norm property. Suppose $x = A^\dagger b + z$, with $z \in \ker(A)$, is a solution. The right singular vectors of $A$ are $[v_1, \ldots, v_r, v_{r+1}, \ldots, v_n] = [V_1, V_2]$, where the columns of $V_2$ form a basis for $\ker(A)$ and $V_2^* V_1 = 0$. If $A^\dagger = V_1 \Sigma_1^{-1} U_1^*$ and $z \in \ker(A)$ then $z = V_2 y$ for some $y \in \mathbb{C}^{n-r}$ and $z^* A^\dagger b = y^* V_2^* V_1 \Sigma_1^{-1} U_1^* b = 0$. Thus $z$ and $A^\dagger b$ are orthogonal, so that $\|x\|_2^2 = \|A^\dagger b + z\|_2^2 = \|A^\dagger b\|_2^2 + \|z\|_2^2 \ge \|A^\dagger b\|_2^2$.

What is next? The Pseudoinverse Orthogonal Projections LSQ: Existence and Uniqueness Numerical Solution Methods Normal Equations QR Decomposition and Factorization SVD and SVF Perturbation Theory

Numerical Solution: Normal Equations. Assume $A \in \mathbb{R}^{m,n}$, $b \in \mathbb{R}^m$, and that $A$ has linearly independent columns ($m \ge n$). Then $B := A^T A$ is symmetric positive definite, and we can use the $R^T R$ (Cholesky) factorization of $B$.

Computing $A^T A$. Write $A = [a_{:1}, \ldots, a_{:n}]$ in terms of its columns, and let $a_{i:}^T$ denote row $i$ of $A$. 1. $(A^T A)_{ij} = a_{:i}^T a_{:j}$, $(A^T b)_i = a_{:i}^T b$ (inner product form). 2. $A^T A = \sum_{i=1}^m a_{i:} a_{i:}^T$, $A^T b = \sum_{i=1}^m b_i a_{i:}$ (outer product form). The outer product form is suitable for large problems since it uses only one pass through the data, importing one row of $A$ at a time from some separate storage.
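A sketch of the outer product form as a streaming accumulation (function and variable names are my own):

```python
import numpy as np

def normal_eq_streaming(rows, rhs, n):
    """Accumulate B = A^T A and c = A^T b one row of A at a time (outer product form)."""
    B = np.zeros((n, n))
    c = np.zeros(n)
    for a_i, b_i in zip(rows, rhs):   # one pass through the data
        B += np.outer(a_i, a_i)       # B += a_i a_i^T
        c += b_i * a_i                # c += b_i a_i
    return B, c

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
B, c = normal_eq_streaming(A, b, 2)
print(np.allclose(B, A.T @ A), np.allclose(c, A.T @ b))   # True True
```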

Complexity. Consider the number of operations needed to compute $B := A^T A$. We need about $2m$ flops to find each inner product $a_{:i}^T a_{:j}$. Since $B$ is symmetric we only need to compute $n(n+1)/2$ such inner products. It follows that $B$ can be computed in $O(mn^2)$ flops. The computation of $B$ using outer products can also be done in $O(mn^2)$ flops by computing only one half of each outer product (the lower triangular part of $B$). In conclusion, the number of operations is $O(mn^2)$ to find $B$, $2mn$ to find $A^T b$, $O(n^3/3)$ to find $R$, $O(n^2)$ to solve $R^T y = c$, and $O(n^2)$ to solve $Rx = y$. Since $m \ge n$, the bulk of the work is to find $B$.
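Putting the pieces together, a sketch of the full normal-equations pipeline (my own function name; SciPy is used for the triangular solves):

```python
import numpy as np
from scipy.linalg import solve_triangular

def lsq_normal_equations(A, b):
    """Solve min ||Ax - b||_2 via B = A^T A = R^T R (Cholesky), R upper triangular."""
    B = A.T @ A
    c = A.T @ b
    L = np.linalg.cholesky(B)                      # B = L L^T, so R = L^T
    y = solve_triangular(L, c, lower=True)         # solve R^T y = c
    return solve_triangular(L.T, y, lower=False)   # solve R x = y

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
print(lsq_normal_equations(A, b))   # agrees with np.linalg.lstsq(A, b, rcond=None)[0]
```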

Condition Number Issue: Squaring the Trouble. A problem with the normal equations approach is that the linear system can be poorly conditioned. The 2-norm condition number of $B := A^T A$ is the square of the condition number of $A$: $K_2(B) = (\sigma_1/\sigma_n)^2 = K_2(A)^2$. One difficulty which can be encountered is that the computed $A^T A$ might not be positive definite.

Numerical Solution using the QR Factorization. Assume $A \in \mathbb{R}^{m,n}$, $b \in \mathbb{R}^m$, and that $A$ has linearly independent columns ($m \ge n$). Suppose $A = Q_1 R_1$ is a QR factorization of $A$. Then $A^T A = R_1^T Q_1^T Q_1 R_1 = R_1^T R_1$ and $A^T b = R_1^T Q_1^T b$. Since $A$ has rank $n$ the matrix $R_1^T$ is nonsingular and can be canceled. Thus $A^T A x = A^T b \iff R_1 x = c_1$, where $c_1 := Q_1^T b$.

$R_1 x = c_1$, $c_1 := Q_1^T b$. We can use Householder transformations or Givens rotations to find $R_1$ and $c_1$. Consider using the Householder triangulation algorithm. We find $R = Q^T A$ and $c = Q^T b$, where $A = QR$ is the QR decomposition of $A$. The matrices $R_1$ and $c_1$ are located in the first $n$ rows of $R$ and $c$.
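A sketch of this QR-based solve in NumPy (numpy.linalg.qr is LAPACK's Householder-based factorization); the data is taken from the example that follows:

```python
import numpy as np
from scipy.linalg import solve_triangular

def lsq_qr(A, b):
    """Solve min ||Ax - b||_2 via a QR factorization A = Q1 R1."""
    Q1, R1 = np.linalg.qr(A, mode='reduced')       # Q1: m x n, R1: n x n
    c1 = Q1.T @ b
    return solve_triangular(R1, c1, lower=False)   # solve R1 x = c1

A = np.array([[1.0, 3.0, 1.0], [1.0, 3.0, 7.0], [1.0, -1.0, -4.0], [1.0, -1.0, 2.0]])
b = np.array([1.0, 1.0, 1.0, 1.0])
print(lsq_qr(A, b))   # [1, 0, 0], the least squares solution found in the example
```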

Example. Consider the least squares problem with
$$A = \begin{bmatrix} 1 & 3 & 1 \\ 1 & 3 & 7 \\ 1 & -1 & -4 \\ 1 & -1 & 2 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.$$
A QR decomposition $A = QR$ is
$$\begin{bmatrix} 1 & 3 & 1 \\ 1 & 3 & 7 \\ 1 & -1 & -4 \\ 1 & -1 & 2 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & 1 & 1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \\ 0 & 0 & 0 \end{bmatrix}.$$

Example (continued). A QR factorization $A = Q_1 R_1$ is obtained by dropping the last column of $Q$ and the last row of $R$, so that
$$A = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix} = Q_1 R_1.$$
The least squares solution $x$ is found by solving the system
$$\begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = Q_1^T b = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ -1 & 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix};$$
we find $x = [1, 0, 0]^T$.

Complexity. The leading term in the number of flops to compute a QR decomposition by Householder transformations is approximately $2mn^2 - 2n^3/3$. The number of flops needed to form the normal equations, taking advantage of symmetry, is approximately $mn^2$. Thus for $m$ much larger than $n$, Householder triangulation requires about twice as many flops as an approach based on the normal equations. Also, Householder triangulation has problems taking advantage of the structure in sparse problems.

Condition Number: less trouble. The 2-norm condition number for the system $R_1 x = c_1$ is $K_2(R_1) = K_2(Q_1 R_1) = K_2(A) = \sqrt{K_2(A^T A)}$, the square root of the condition number for the normal equations. Thus if $A$ is mildly ill-conditioned, the normal equations can be quite ill-conditioned and solving the normal equations can give inaccurate results. The QR factorization approach is quite stable.

Numerical Solution using the Singular Value Factorization. This method can be used even if $A$ does not have full rank. It requires knowledge of the pseudo-inverse of $A$. $x = A^\dagger b + z$ is a least squares solution for any $z \in \ker(A)$. When $\mathrm{rank}(A)$ is less than the number of columns of $A$, then $\ker(A) \ne \{0\}$ and we have a choice of $z$. One possible choice is $z = 0$, giving the minimal norm solution $A^\dagger b$.

Example. Find all least squares solutions of $Ax = b$ with $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix}$. The pseudo-inverse of $A$ is $A^\dagger = \frac{1}{4}\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}$, and $[1, -1]^T$ is a basis for $\ker(A)$. If $b = [b_1, b_2, b_3]^T$, then for any $z \in \mathbb{R}$ the vector
$$x = \frac{1}{4}\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} + z\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
is a solution of $\min_x \|Ax - b\|_2$, and this gives all solutions. $z = 0$ gives the minimal norm solution.
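A sketch verifying this with NumPy's SVD-based pinv (the right-hand side and the choice of z are my own):

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
b = np.array([1.0, 3.0, 2.0])

x_min = np.linalg.pinv(A) @ b                    # minimal norm solution A^+ b = [1, 1]
x_other = x_min + 0.7 * np.array([1.0, -1.0])    # add any element of ker(A)
print(np.linalg.norm(A @ x_min - b), np.linalg.norm(A @ x_other - b))  # equal residuals
print(np.linalg.norm(x_min) <= np.linalg.norm(x_other))                # True
```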

Matlab. x = lscov(A,b) returns the ordinary least squares solution to the linear system of equations A*x = b; b can also be an m-by-k matrix, and lscov returns one solution for each column of b. When rank(A) < n, lscov sets the maximum possible number of elements of x to zero to obtain a basic solution, which is not the same as the minimal norm solution. lscov uses a QR factorization.

What is next? The Pseudoinverse Orthogonal Projections LSQ Theory: Existence and Uniqueness Numerical Solution Methods Perturbation Theory Perturbing the right hand side

Perturbing the right hand side. Theorem: Suppose $A \in \mathbb{C}^{m,n}$ has linearly independent columns, and let $b, e \in \mathbb{C}^m$. Let $x, y \in \mathbb{C}^n$ be the solutions of $\min_x \|Ax - b\|_2$ and $\min_y \|Ay - (b + e)\|_2$. Finally, let $b_1, e_1$ be the orthogonal projections of $b$ and $e$ on $\mathrm{span}(A)$. If $b_1 \ne 0$, we have for any operator norm
$$\frac{1}{K(A)} \frac{\|e_1\|}{\|b_1\|} \le \frac{\|y - x\|}{\|x\|} \le K(A) \frac{\|e_1\|}{\|b_1\|}, \qquad K(A) = \|A\|\,\|A^\dagger\|.$$
[Figure: $b$ and the perturbation $e$ with their components $b_1, e_1$ in $\mathrm{span}(A)$ and $b_2$ in $\ker(A^*)$.]

Proof. Subtracting $x = A^\dagger b$ from $y = A^\dagger b + A^\dagger e$ gives $y - x = A^\dagger e = A^\dagger e_1$, since $A^\dagger e = A^\dagger e_1$. Thus $\|y - x\| = \|A^\dagger e_1\| \le \|A^\dagger\|\,\|e_1\|$. Moreover, $b_1 = Ax$, so $\|b_1\| \le \|A\|\,\|x\|$. Therefore $\|y - x\|/\|x\| \le \|A\|\,\|A^\dagger\|\,\|e_1\|/\|b_1\|$, proving the rightmost inequality. From $A(y - x) = e_1$ and $x = A^\dagger b_1$ we obtain the leftmost inequality.
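A numerical check of the bounds in the 2-norm (a sketch with my own random test data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))            # full column rank with probability 1
b = rng.standard_normal(6)
e = 1e-3 * rng.standard_normal(6)

Adag = np.linalg.pinv(A)
x, y = Adag @ b, Adag @ (b + e)
P = A @ Adag                               # projector onto span(A)
b1, e1 = P @ b, P @ e
K = np.linalg.cond(A)                      # sigma_1/sigma_n = ||A||_2 ||A^+||_2

rel_err = np.linalg.norm(y - x) / np.linalg.norm(x)
ratio = np.linalg.norm(e1) / np.linalg.norm(b1)
print(ratio / K <= rel_err <= K * ratio)   # True
```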