Computational Methods. Least Squares Approximation/Optimization

Manfred Huber, 2011

Least Squares

Least squares methods are aimed at finding approximate solutions when no exact solution exists: find the solution that minimizes the residual error of the system.
Least squares can be used to fit a model to noisy data points or to fit a simpler model to complex data.
This amounts to projecting higher-dimensional data onto a lower-dimensional space.

Linear Least Squares

Linear least squares attempts to find a least squares solution for an overdetermined linear system (i.e. a linear system described by an m x n matrix A with more equations than parameters):
    A x \approx b
Least squares minimizes the squared Euclidean norm of the residual:
    \min_x \|b - A x\|_2^2 = \min_x \|r\|_2^2
For data fitting on m data points using a linear combination of n basis functions \varphi_j this corresponds to
    \min_\alpha \sum_{i=1}^m \Big( y_i - \sum_{j=1}^n \alpha_j \varphi_j(x_i) \Big)^2

Existence and Uniqueness

The linear least squares problem always has a solution.
The solution is unique if and only if A has full rank, i.e. rank(A) = n.
If rank(A) < n, then A is rank-deficient and the solution of the least squares problem is not unique.
The squared norm of the residual vector can be expressed through A as
    \|r\|_2^2 = r^T r = (b - Ax)^T (b - Ax) = b^T b - 2 x^T A^T b + x^T A^T A x
This is minimized where its derivative is 0:
    -2 A^T b + 2 A^T A x = 0

Normal Equations

The optimization reduces to an n x n system of (linear) normal equations:
    A^T A x = A^T b
The linear least squares solution can be found by solving this system of linear equations.
The solution can also be found through the pseudoinverse:
    A x \approx b  \Rightarrow  x = A^+ b,   A^+ = (A^T A)^{-1} A^T
The condition number with respect to A can be expressed as in the case of solving linear equations:
    cond(A) = \|A\|_2 \|A^+\|_2
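
As a concrete illustration (not part of the original slides), the sketch below solves a small overdetermined system via the normal equations and via the pseudoinverse, and compares both to NumPy's built-in least squares routine; the random test data is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))          # m = 20 equations, n = 3 unknowns
b = rng.standard_normal(20)

# Normal equations: A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Pseudoinverse: x = A^+ b
x_pinv = np.linalg.pinv(A) @ b

# Reference solution from NumPy's least squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq), np.allclose(x_pinv, x_lstsq))
```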

Data Fitting

A common use of the linear least squares solution is to fit a given type of function to noisy data points.
A polynomial fit of degree n-1 (n parameters) using the monomial basis leads to A \alpha \approx y with
    A = \begin{pmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^{n-1} \end{pmatrix},   y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}
Solving the system of normal equations provides the best fit in terms of the Euclidean norm:
    A^T A \alpha = A^T y
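
A minimal sketch of such a fit, assuming NumPy and synthetic noisy samples of the line 1 + 2x (the data is invented for illustration): the Vandermonde matrix supplies the monomial basis and the normal equations give the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(x.size)   # noisy samples of 1 + 2x

A = np.vander(x, 2, increasing=True)        # columns: 1, x
alpha = np.linalg.solve(A.T @ A, A.T @ y)   # normal equations A^T A alpha = A^T y
print(alpha)                                # should be close to [1, 2]
```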

Condition Number and Sensitivity

Sensitivity also depends on b. The influence of b can be expressed in terms of the angle \theta between b and y = Ax:
    \cos(\theta) = \frac{\|y\|_2}{\|b\|_2} = \frac{\|Ax\|_2}{\|b\|_2}
A bound on the error in the solution due to a perturbation \Delta b in b can be expressed as
    \frac{\|\Delta x\|_2}{\|x\|_2} \le cond(A) \, \frac{1}{\cos(\theta)} \, \frac{\|\Delta b\|_2}{\|b\|_2}

Condition Number and Sensitivity

A bound on the error in the solution due to a perturbation \Delta A in A can be expressed as
    \frac{\|\Delta x\|_2}{\|x\|_2} \le \big( (cond(A))^2 \tan(\theta) + cond(A) \big) \frac{\|\Delta A\|_2}{\|A\|_2}
For small residuals the condition number for least squares is approximately cond(A); for large residuals it can be the square of cond(A) or worse.

Condition Number and Sensitivity

The conditioning of the normal equations solution is
    cond(A^T A) = (cond(A))^2
For large systems of equations the condition number of the formulation using the normal equations (or the pseudoinverse) increases rapidly.
- Much of the increased sensitivity is due to the need to multiply A and A^T in order to be able to apply a solution algorithm for the system of equations.
- The conditioning of the normal equations is potentially significantly worse than the conditioning of the original system.
The algorithm is therefore not very stable for large numbers of equations/data points.
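
The squaring of the condition number is easy to observe numerically; the following small sketch (illustrative only, using a moderately ill-conditioned Vandermonde matrix chosen for the example) prints cond(A) and cond(A^T A).

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)
A = np.vander(x, 8, increasing=True)      # moderately ill-conditioned basis

print(np.linalg.cond(A))                  # cond(A)
print(np.linalg.cond(A.T @ A))            # approximately cond(A)^2
```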

Augmented System Method

The augmented system method transforms the least squares problem into an equation solving problem by adding equations, and can be used to improve the conditioning.
The matrix size is increased to a square (m+n) x (m+n) matrix by including the residual equations:
    r + A x = b,   A^T r = 0
    \begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} r \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}
- Greater freedom to choose pivots and maintain stability.
- Substantially higher complexity for systems with m >> n due to the need to solve an (n+m) x (n+m) system.
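
A minimal sketch of the augmented system approach, with made-up random data: the (m+n) x (m+n) block matrix is assembled explicitly and the x-part of its solution is compared against a standard least squares solve.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)
m, n = A.shape

# Augmented system [[I, A], [A^T, 0]] [r; x] = [b; 0]
K = np.block([[np.eye(m), A],
              [A.T, np.zeros((n, n))]])
rhs = np.concatenate([b, np.zeros(n)])

sol = np.linalg.solve(K, rhs)
r, x = sol[:m], sol[m:]
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```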

QR Factorization

The augmented system method addresses stability but adds very high cost since it expands the matrix.
QR factorization changes the system matrix into a solvable form without computing A^T A and without expanding the matrix:
    A = Q \begin{pmatrix} R \\ 0 \end{pmatrix},   R = \begin{pmatrix} R_{1,1} & R_{1,2} & \cdots & R_{1,n} \\ 0 & R_{2,2} & \cdots & R_{2,n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & R_{n,n} \end{pmatrix}
    Q^T A x = Q^T b

QR Factorization

The important part of Q for the solution consists of its first n columns (the first n rows of Q^T), since the remaining ones are multiplied by 0:
    Q = (Q_1 \; Q_2)  \Rightarrow  Q_1^T A x = R x = Q_1^T b
QR factorization factors the system matrix A into Q and R, where R is upper triangular.
As in LU factorization, Q^T represents a sequence of solution-preserving transformations.
- In LU factorization only the solution of the equations has to be preserved, which can be done using elimination matrices.
- In QR factorization the least squares property has to be preserved, which requires the use of orthogonal transformations.
R is an upper triangular matrix that can be used with backward substitution to compute the solution.
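
A minimal sketch of least squares via a reduced QR factorization, assuming NumPy and SciPy on made-up random data: Q_1 and R come from numpy.linalg.qr and the triangular system R x = Q_1^T b is solved by back substitution.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 4))
b = rng.standard_normal(30)

Q1, R = np.linalg.qr(A, mode='reduced')   # Q1: m x n, R: n x n upper triangular
x = solve_triangular(R, Q1.T @ b)         # back substitution on R x = Q1^T b

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```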

Orthogonal Transformations

Orthogonal transformations preserve the least squares solution:
    Q Q^T = I
    \|Q v\|_2^2 = (Q v)^T (Q v) = v^T Q^T Q v = v^T v = \|v\|_2^2
For QR factorization we need to find a set of orthogonal transformations that can transform A into an upper triangular matrix R and that are numerically stable:
- Householder transformations
- Givens rotations
- Gram-Schmidt orthogonalization

QR Factorization with Householder Transformations

Householder transformations allow zeroing all entries below a chosen position in a vector a.
Applying this consecutively for every column of the matrix A, choosing the diagonal element as the one below which everything is to be zeroed out, we can construct R.
The Householder transformation can be computed from a given vector (the vector of column values) a, setting all the values that are not to be changed (the entries above the diagonal) to 0:
    Q = H = I - 2 \frac{v v^T}{v^T v},   v = a - \alpha e_i,   \alpha = \pm \|a\|_2
The sign of \alpha can be chosen to avoid cancellation (loss of significance) in the computation of v.
H is orthogonal and symmetric: H = H^T = H^{-1}.

QR Factorization with Householder Transformations

In QR factorization with Householder transformations, successive columns are brought into upper triangular form by zeroing all entries below the diagonal element.
The Householder transform does not affect the entries in the columns to the left of and the rows above the currently chosen diagonal element, since v contains only 0s above the chosen diagonal position and the columns to the left are already 0 below it.
Applying the Householder transform only to the columns to the right saves significant processing time:
    H a_j = \Big( I - 2 \frac{v v^T}{v^T v} \Big) a_j = a_j - 2 \frac{v v^T a_j}{v^T v} = a_j - 2 \frac{v^T a_j}{v^T v} v
Applying it to individual columns also eliminates the need to compute the full matrix H; only v = a_j - \alpha e_j is needed.
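
The following is a compact illustrative implementation of Householder QR along these lines (the function name householder_qr and the final check are made up, and no pivoting or blocking is attempted): each step forms v from the subdiagonal part of the current column and applies the reflection only to the remaining columns.

```python
import numpy as np

def householder_qr(A):
    """Factor A (m x n, m >= n) as A = Q R using Householder reflections."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.eye(m)
    for j in range(n):
        a = A[j:, j]
        norm_a = np.linalg.norm(a)
        if norm_a == 0.0:
            continue
        # Pick the sign of alpha opposite to a[0] to avoid cancellation in v.
        alpha = -norm_a if a[0] >= 0 else norm_a
        v = a.copy()
        v[0] -= alpha                      # v = a_j - alpha * e_j
        beta = v @ v
        # Apply H = I - 2 v v^T / (v^T v) to the remaining columns only.
        A[j:, j:] -= (2.0 / beta) * np.outer(v, v @ A[j:, j:])
        # Accumulate Q = H_1 H_2 ... H_n (only its trailing columns change).
        Q[:, j:] -= (2.0 / beta) * np.outer(Q[:, j:] @ v, v)
    return Q, A                            # A has been overwritten with R

# Check on the 5 x 3 matrix of the worked example below:
A = np.vander([-1.0, -0.5, 0.0, 0.5, 1.0], 3, increasing=True)
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(np.triu(R[:3]), R[:3]))
```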

QR Factorization Example

Quadratic polynomial fit to 5 data points:
    Data: (-1, 1), (-0.5, 0.5), (0, 0), (0.5, 0.5), (1, 2)
Design the linear system for data fitting, A \alpha \approx y:
    A = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \\ 1 & x_4 & x_4^2 \\ 1 & x_5 & x_5^2 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 1 \\ 1 & -0.5 & 0.25 \\ 1 & 0 & 0 \\ 1 & 0.5 & 0.25 \\ 1 & 1 & 1 \end{pmatrix},   y = \begin{pmatrix} 1 \\ 0.5 \\ 0 \\ 0.5 \\ 2 \end{pmatrix}
Starting with the first column, eliminate the entries below the diagonal using appropriate Householder transforms, choosing the sign of \alpha to avoid cancellation.

QR Factorization Example

Householder elimination by column, 1st column (negative sign on \alpha because the potentially cancelling term is +1):
    v_1 = a_1 - \alpha e_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} + \sqrt{5} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 3.236 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
    H_1 A = \begin{pmatrix} -2.236 & 0 & -1.118 \\ 0 & -0.191 & -0.405 \\ 0 & 0.309 & -0.655 \\ 0 & 0.809 & -0.405 \\ 0 & 1.309 & 0.345 \end{pmatrix},   H_1 y = \begin{pmatrix} -1.789 \\ -0.362 \\ -0.862 \\ -0.362 \\ 1.138 \end{pmatrix}

QR Factorization Example

Householder elimination by column, 2nd column (positive sign on \alpha because the potentially cancelling term is -0.191):
    v_2 = a_2 - \alpha e_2 = \begin{pmatrix} 0 \\ -0.191 \\ 0.309 \\ 0.809 \\ 1.309 \end{pmatrix} - \sqrt{2.5} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ -1.772 \\ 0.309 \\ 0.809 \\ 1.309 \end{pmatrix}
    H_2 H_1 A = \begin{pmatrix} -2.236 & 0 & -1.118 \\ 0 & 1.581 & 0 \\ 0 & 0 & -0.725 \\ 0 & 0 & -0.589 \\ 0 & 0 & 0.047 \end{pmatrix},   H_2 H_1 y = \begin{pmatrix} -1.789 \\ 0.632 \\ -1.035 \\ -0.816 \\ 0.404 \end{pmatrix}

QR Factorization Example

Householder elimination by column, 3rd column (positive sign on \alpha because the potentially cancelling term is -0.725):
    v_3 = a_3 - \alpha e_3 = \begin{pmatrix} 0 \\ 0 \\ -0.725 \\ -0.589 \\ 0.047 \end{pmatrix} - \sqrt{0.875} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -1.66 \\ -0.589 \\ 0.047 \end{pmatrix}
    H_3 H_2 H_1 A = \begin{pmatrix} -2.236 & 0 & -1.118 \\ 0 & 1.581 & 0 \\ 0 & 0 & 0.935 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},   H_3 H_2 H_1 y = \begin{pmatrix} -1.789 \\ 0.632 \\ 1.336 \\ 0.026 \\ 0.337 \end{pmatrix}

QR Factorization Example

Backward substitution with the upper triangular matrix yields the parameters of the least squares fit, a second order polynomial:
    \alpha = \begin{pmatrix} 0.086 \\ 0.4 \\ 1.429 \end{pmatrix},   p(x) = 0.086 + 0.4 x + 1.429 x^2
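
For reference, a one-off numerical check of this result (assuming NumPy; the printed coefficients are approximate):

```python
import numpy as np

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.vander(x, 3, increasing=True)      # columns: 1, x, x^2

alpha, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(alpha, 3))                 # approximately [0.086, 0.4, 1.429]
```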

Other Orthogonal Transformations

Other transformations can be used for QR factorization:
- Givens rotations
- Gram-Schmidt orthogonalization
Householder transformations generally achieve the best performance and stability tradeoff.
The complexity of QR factorization with Householder transformations is approximately m n^2 - n^3/3 multiplications.
- Depending on the size of m (data points / equations) this is between the same and two times the work of the normal equations.
The conditioning of QR factorization with Householder transformations is optimal:
    cond(A) + \|r\|_2 \, (cond(A))^2

QR Factorization

The transformations for QR factorization are numerically more stable than the elimination steps in LU factorization.
- The choice of sign in the Householder transformations allows cancellation (and thus instabilities in individual transforms) to be avoided.
- Row pivoting is usually not necessary.
This stability leads to QR factorization also frequently being used instead of LU factorization to solve nonsingular systems of linear equations.
- The increased complexity is traded off against stability (and thus precision) of the solution.

Nonlinear Least Squares

As for equation solving, finding solutions for general nonlinear systems requires iterative methods.
The goal is to find the approximate solution with the smallest squared residual \varphi(x) for the system of functions f(x):
    r_i(x) = b_i - f_i(x),   \varphi(x) = \frac{1}{2} r^T(x) r(x)
At the minimum, the gradient of the squared residual function is 0:
    \nabla \varphi(x) = J_r^T(x) r(x) = 0
One could use the multivariate Newton method to find the root of the derivative, which would require the second derivative, the Hessian of the squared error:
    H_\varphi(x) = J_r^T(x) J_r(x) + \sum_{i=1}^m r_i(x) H_{r_i}(x)

Gauss-Newton Method

The multivariate Newton method would require solving the following linear system in each iteration:
    \Big( J_r^T(x) J_r(x) + \sum_{i=1}^m r_i(x) H_{r_i}(x) \Big) s = -J_r^T(x) r(x)
- This requires frequent computation of the Hessian, which is expensive and reduces stability.
Gauss-Newton avoids this by dropping the second order term:
    J_r^T(x) J_r(x) s = -J_r^T(x) r(x)
This is best solved by converting it into the corresponding linear least squares problem and using QR factorization:
    J_r(x) s \approx -r(x)
Once the step is computed, Gauss-Newton operates like the multivariate Newton method:
    x^{(t+1)} = x^{(t)} + s
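
A minimal sketch of the Gauss-Newton iteration described above, assuming NumPy; the function name gauss_newton, the fixed iteration count, and the use of numpy.linalg.lstsq for the inner linear least squares solve are choices made for illustration.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iterations=10):
    """Iterate x <- x + s, where s solves the linear LS problem J_r(x) s ~= -r(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        r = residual(x)
        J = jacobian(x)
        s, *_ = np.linalg.lstsq(J, -r, rcond=None)  # linear least squares step
        x = x + s
    return x
```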

Gauss-Newton Method

The Gauss-Newton method replaces the nonlinear least squares problem with a sequence of linear least squares problems whose solutions converge to the solution of the nonlinear system.
It converges if the residual at the solution is not too large.
- A large residual at the solution can lead to large values in the Hessian, potentially leading to slow convergence or, in extreme cases, non-convergence.
If it does not converge (large residual at the solution), other methods have to be used:
- The Levenberg-Marquardt method, which uses an additional scalar parameter \mu (and a separate, function-specific strategy to choose it) to modify the step size:
    \big( J_r^T(x) J_r(x) + \mu I \big) s = -J_r^T(x) r(x)   \Leftrightarrow   \begin{pmatrix} J_r(x) \\ \sqrt{\mu} I \end{pmatrix} s \approx \begin{pmatrix} -r(x) \\ 0 \end{pmatrix}
- General optimization using the complete Hessian (significant increase in computational complexity).
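
A sketch of a single Levenberg-Marquardt step in the stacked least squares form shown above (assuming NumPy; the helper name lm_step is invented, and the strategy for choosing mu is deliberately left out):

```python
import numpy as np

def lm_step(J, r, mu):
    """Solve the damped problem [J; sqrt(mu) I] s ~= [-r; 0] for the step s."""
    n = J.shape[1]
    J_aug = np.vstack([J, np.sqrt(mu) * np.eye(n)])
    rhs = np.concatenate([-r, np.zeros(n)])
    s, *_ = np.linalg.lstsq(J_aug, rhs, rcond=None)
    return s
```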

Gauss-Newton Example

Fit an exponential function to data:
    Data: (0, 2), (1, 0.7), (2, 0.3), (3, 0.1),   f(\beta, x) = \beta_1 e^{\beta_2 x}
The residual function for data fitting is given by
    r(\beta) = \begin{pmatrix} 2 - f(\beta, 0) \\ 0.7 - f(\beta, 1) \\ 0.3 - f(\beta, 2) \\ 0.1 - f(\beta, 3) \end{pmatrix}
The resulting Jacobian is
    J_r(\beta) = \begin{pmatrix} -1 & 0 \\ -e^{\beta_2} & -\beta_1 e^{\beta_2} \\ -e^{2\beta_2} & -2\beta_1 e^{2\beta_2} \\ -e^{3\beta_2} & -3\beta_1 e^{3\beta_2} \end{pmatrix}

Gauss-Newton Example

First iteration, starting at \beta^{(0)} = (1, 0)^T.
Initial squared residual:
    \|r((1, 0)^T)\|_2^2 = \sum_{i=1}^4 (y_i - 1 \cdot e^{0 \cdot x_i})^2 = 2.39
Solve for the step:
    J_r s = \begin{pmatrix} -1 & 0 \\ -1 & -1 \\ -1 & -2 \\ -1 & -3 \end{pmatrix} s \approx \begin{pmatrix} -1 \\ 0.3 \\ 0.7 \\ 0.9 \end{pmatrix}  \Rightarrow  s = \begin{pmatrix} 0.69 \\ -0.61 \end{pmatrix}
Update the parameter vector for the next iteration:
    \beta^{(1)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0.69 \\ -0.61 \end{pmatrix} = \begin{pmatrix} 1.69 \\ -0.61 \end{pmatrix}

Gauss-Newton Example

Second iteration.
Squared residual:
    \|r((1.69, -0.61)^T)\|_2^2 = \sum_{i=1}^4 (y_i - 1.69 e^{-0.61 x_i})^2 = 0.212
Solve for the next step:
    J_r s = \begin{pmatrix} -1 & 0 \\ -0.543 & -0.918 \\ -0.295 & -0.998 \\ -0.16 & -0.813 \end{pmatrix} s \approx \begin{pmatrix} -0.31 \\ 0.218 \\ 0.199 \\ 0.171 \end{pmatrix}  \Rightarrow  s = \begin{pmatrix} 0.285 \\ -0.32 \end{pmatrix}
Update the parameter vector for the next iteration:
    \beta^{(2)} = \begin{pmatrix} 1.69 \\ -0.61 \end{pmatrix} + \begin{pmatrix} 0.285 \\ -0.32 \end{pmatrix} = \begin{pmatrix} 1.975 \\ -0.93 \end{pmatrix}

Gauss-Newton Example

Third iteration.
Squared residual:
    \|r((1.975, -0.93)^T)\|_2^2 = \sum_{i=1}^4 (y_i - 1.975 e^{-0.93 x_i})^2 = 0.007
Solve for the next step:
    J_r s = \begin{pmatrix} -1 & 0 \\ -0.395 & -0.779 \\ -0.156 & -0.615 \\ -0.061 & -0.364 \end{pmatrix} s \approx \begin{pmatrix} -0.025 \\ 0.079 \\ 0.007 \\ 0.02 \end{pmatrix}  \Rightarrow  s = \begin{pmatrix} 0.019 \\ -0.074 \end{pmatrix}
Update the parameter vector for the next iteration:
    \beta^{(3)} = \begin{pmatrix} 1.975 \\ -0.93 \end{pmatrix} + \begin{pmatrix} 0.019 \\ -0.074 \end{pmatrix} = \begin{pmatrix} 1.994 \\ -1.004 \end{pmatrix}
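
The iterates above can be reproduced with a few explicit Gauss-Newton steps (assuming NumPy); the first three printed iterates should agree with the worked example up to rounding.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 0.7, 0.3, 0.1])

beta = np.array([1.0, 0.0])                      # beta^(0)
for k in range(5):
    e = np.exp(beta[1] * t)
    r = y - beta[0] * e                          # residual r(beta)
    J = np.column_stack([-e, -beta[0] * t * e])  # Jacobian of r
    s, *_ = np.linalg.lstsq(J, -r, rcond=None)   # Gauss-Newton step
    beta = beta + s
    print(k + 1, np.round(beta, 3))
```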

Least Squares Approximation

Least squares approximation is used to determine approximate solutions for a system of equations or to fit an approximate function to a set of data points.
As opposed to interpolation, the data points are not met precisely; this makes it applicable to noisy data points and allows a less complex function to be fitted to the data points.
Least squares for linear functions has direct solution methods:
- The normal equations method is not very stable for large numbers of equations.
- QR factorization provides a more stable alternative that can also be used for equation solving instead of LU factorization.
Least squares for nonlinear systems requires iterative solution:
- Gauss-Newton provides a robust solution for problems with small residuals.
- Otherwise general, more expensive optimization methods have to be used.