Non-polynomial Least-squares fitting


Applied Math 205. Last time: piecewise polynomial interpolation, least-squares fitting. Today: underdetermined least squares, nonlinear least squares. Homework 1 (and subsequent homeworks) has several parts that are listed as optional. These parts are truly optional: no extra credit is given.

Non-polynomial Least-squares fitting. So far we have dealt with approximations based on polynomials, but we can also develop non-polynomial approximations. We just need the model to depend linearly on the parameters. Example: approximate e^x cos(4x) using f_n(x; b) ≡ Σ_{k=-n}^{n} b_k e^{kx}. (Note that f_n is linear in b: f_n(x; γa + σb) = γ f_n(x; a) + σ f_n(x; b).)
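
Since the model is linear in b, the coefficients can be found with an ordinary linear least-squares solve. A minimal sketch in NumPy (the sample points and interval here are assumptions, not the lecture's exact setup):

```python
import numpy as np

# Assumed setup: sample y = exp(x) * cos(4x) at equally spaced points on [-1, 1]
x = np.linspace(-1.0, 1.0, 100)
y = np.exp(x) * np.cos(4.0 * x)

n = 3                              # 2n + 1 = 7 terms in the approximation
k = np.arange(-n, n + 1)           # exponents k = -n, ..., n
A = np.exp(np.outer(x, k))         # design matrix: A[i, j] = exp(k_j * x_i)

# Best-fit coefficients b via linear least squares
b = np.linalg.lstsq(A, y, rcond=None)[0]
print("residual 2-norm:", np.linalg.norm(y - A @ b))
```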

Non-polynomial Least-squares fitting. [Figure: 3 terms in the approximation (n = 1): samples of e^x cos(4x) and the best fit; ‖r(b)‖_2 = 4.16 × 10^-1]

Non-polynomial Least-squares fitting. [Figure: 7 terms in the approximation (n = 3): samples of e^x cos(4x) and the best fit; ‖r(b)‖_2 = 1.44 × 10^-3]

Non-polynomial Least-squares fitting. [Figure: 11 terms in the approximation (n = 5): samples of e^x cos(4x) and the best fit; ‖r(b)‖_2 = 7.46 × 10^-6]

Pseudoinverse. Recall that from the normal equations we have A^T A b = A^T y. This motivates the idea of the pseudoinverse for A ∈ R^{m×n}: A^+ ≡ (A^T A)^{-1} A^T ∈ R^{n×m}. Key point: A^+ generalizes A^{-1}, i.e. if A ∈ R^{n×n} is invertible, then A^+ = A^{-1}. Proof: A^+ = (A^T A)^{-1} A^T = A^{-1} (A^T)^{-1} A^T = A^{-1}.

Pseudoinverse. Also: even when A is not invertible we still have A^+ A = I. In general AA^+ ≠ I (hence A^+ is called a left inverse). And it follows from our definition that b = A^+ y, i.e. A^+ ∈ R^{n×m} gives the least-squares solution. Note that we define the pseudoinverse differently in different contexts.
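
A quick numerical check of these properties (just a sketch with random data, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))          # tall matrix, full column rank (almost surely)
y = rng.standard_normal(8)

A_plus = np.linalg.inv(A.T @ A) @ A.T    # pseudoinverse (A^T A)^{-1} A^T

print(np.allclose(A_plus @ A, np.eye(3)))                             # left inverse: A+ A = I
print(np.allclose(A_plus @ y, np.linalg.lstsq(A, y, rcond=None)[0]))  # A+ y is the LS solution
```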

Underdetermined Least Squares. So far we have focused on overconstrained systems (more constraints than parameters). But least squares also applies to underconstrained systems: Ab = y with A ∈ R^{m×n}, m < n, i.e. we have a short, wide matrix A.

Underdetermined Least Squares. For φ(b) = ‖r(b)‖_2^2 = ‖y − Ab‖_2^2, we can apply the same argument as before (i.e. set ∇φ = 0) to again obtain A^T A b = A^T y. But in this case A^T A ∈ R^{n×n} has rank at most m (where m < n), why? (Because rank(A^T A) = rank(A) ≤ min(m, n) = m.) Therefore A^T A must be singular! Typical case: there are infinitely many vectors b that give r(b) = 0, and we want to be able to select one of them.

Underdetermined Least Squares. First idea: pose this as a constrained optimization problem to find the feasible b with minimum 2-norm: minimize b^T b subject to Ab = y. This can be treated using Lagrange multipliers (discussed later in the Optimization section). The idea is that the constraint restricts us to an (n − m)-dimensional hyperplane of R^n on which b^T b has a unique minimum.

Underdetermined Least Squares. We will show later that the Lagrange multiplier approach for the above problem gives b = A^T (AA^T)^{-1} y. As a result, in the underdetermined case the pseudoinverse is defined as A^+ ≡ A^T (AA^T)^{-1} ∈ R^{n×m}. Note that now AA^+ = I, but A^+ A ≠ I in general (i.e. this is a right inverse).
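
The analogous check for the underdetermined case (again just a sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 8))          # short, wide matrix (m < n)
y = rng.standard_normal(3)

A_plus = A.T @ np.linalg.inv(A @ A.T)    # pseudoinverse A^T (A A^T)^{-1}
b = A_plus @ y                           # minimum-norm solution

print(np.allclose(A @ A_plus, np.eye(3)))   # right inverse: A A+ = I
print(np.allclose(A @ b, y))                # constraints Ab = y satisfied
# np.linalg.lstsq(A, y) returns this same minimum-norm solution when m < n
```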

Underdetermined Least Squares. Here we consider an alternative approach for solving the underconstrained case: let's modify φ so that there is a unique minimum! For example, let φ(b) ≡ ‖r(b)‖_2^2 + ‖Sb‖_2^2, where S ∈ R^{n×n} is a scaling matrix. This is called regularization: we make the problem well-posed ("more regular") by modifying the objective function.

Underdetermined Least Squares. Calculating ∇φ = 0 in the same way as before leads to the system (A^T A + S^T S) b = A^T y. We need to choose S in some way to ensure that (A^T A + S^T S) is invertible. It can be proved that if S^T S is positive definite then (A^T A + S^T S) is invertible. The simplest positive definite regularizer is S = µI ∈ R^{n×n} for µ ∈ R_{>0}.

Underdetermined Least Squares. Example: find the least-squares fit of a degree 11 polynomial to 5 samples of y = cos(4x) for x ∈ [0, 1]. 12 parameters, 5 constraints ⟹ A ∈ R^{5×12}. We express the polynomial using the monomial basis (we can't use the Lagrange basis since m ≠ n): A is a submatrix of a Vandermonde matrix. If we naively use the normal equations we see that cond(A^T A) = 4.78 × 10^17, i.e. singular to machine precision! Let's see what happens when we regularize the problem with some different choices of S.
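
A sketch of the regularized solve for this example (the 5 sample locations are assumed to be equally spaced on [0, 1], which is a guess, so the exact numbers may differ from the slides):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)              # assumed sample locations
y = np.cos(4.0 * x)
A = np.vander(x, 12, increasing=True)     # 5 x 12 monomial (Vandermonde) matrix

mu = 0.001
S = mu * np.eye(12)                       # regularizer S = mu * I

M = A.T @ A + S.T @ S
b = np.linalg.solve(M, A.T @ y)           # regularized normal equations
print("cond(A^T A + S^T S):", np.linalg.cond(M))
print("||r||_2:", np.linalg.norm(y - A @ b), " ||b||_2:", np.linalg.norm(b))
```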

Underdetermined Least Squares. Find the least-squares fit of a degree 11 polynomial to 5 samples of y = cos(4x) for x ∈ [0, 1]. Try S = 0.001I (i.e. µ = 0.001): ‖r(b)‖_2 = 1.07 × 10^-4, ‖b‖_2 = 4.40, cond(A^T A + S^T S) = 1.54 × 10^7. [Figure: regularized fit and the 5 samples] The fit is good since the regularization term is small, but the condition number is still large.

Underdetermined Least Squares. Find the least-squares fit of a degree 11 polynomial to 5 samples of y = cos(4x) for x ∈ [0, 1]. Try S = 0.5I (i.e. µ = 0.5): ‖r(b)‖_2 = 6.60 × 10^-1, ‖b‖_2 = 1.15, cond(A^T A + S^T S) = 62.3. [Figure: regularized fit and the 5 samples] The regularization term now dominates: small condition number and small ‖b‖_2, but a poor fit to the data!

Underdetermined Least Squares. Find the least-squares fit of a degree 11 polynomial to 5 samples of y = cos(4x) for x ∈ [0, 1]. Try S = diag(0.1, 0.1, 0.1, 10, 10, ..., 10): ‖r(b)‖_2 = 4.78 × 10^-1, ‖b‖_2 = 4.27, cond(A^T A + S^T S) = 5.90 × 10^3. [Figure: regularized fit and the 5 samples] We strongly penalize b_3, b_4, ..., b_11, hence the fit is close to parabolic.

Underdetermined Least Squares. Find the least-squares fit of a degree 11 polynomial to 5 samples of y = cos(4x) for x ∈ [0, 1]: ‖r(b)‖_2 = 1.03 × 10^-15, ‖b‖_2 = 7.18. [Figure: fit passing through all 5 samples] The Python routine gives the Lagrange multiplier based solution, hence it satisfies the constraints to machine precision.

Nonlinear Least Squares. So far we have looked at finding a best-fit solution to a linear system (linear least squares). A more difficult situation is when we consider least squares for nonlinear systems. Key point: we are referring to linearity in the parameters, not linearity of the model (e.g. the polynomial p_n(x; b) = b_0 + b_1 x + ... + b_n x^n is nonlinear in x, but linear in b!). In nonlinear least squares, we fit functions that are nonlinear in the parameters.

Nonlinear Least Squares: Example. Suppose we have a radio transmitter at b̂ = (b̂_1, b̂_2) somewhere in [0, 1]^2, and 10 receivers at locations (x_1^1, x_2^1), (x_1^2, x_2^2), ..., (x_1^10, x_2^10) ∈ [0, 1]^2. Receiver i returns a measurement of the distance y_i to the transmitter, but there is some error/noise ε. [Figure: the transmitter b̂ and the receivers x^i in the unit square; receiver i measures y_i + ε]

Nonlinear Least Squares: Example. Let b be a candidate location for the transmitter. The distance from b to (x_1^i, x_2^i) is d_i(b) ≡ √((b_1 − x_1^i)^2 + (b_2 − x_2^i)^2). We want to choose b to match the data as well as possible, hence minimize the residual r(b) ∈ R^10, where r_i(b) = y_i − d_i(b).

Nonlinear Least Squares: Example. In this case, r_i(α + β) ≠ r_i(α) + r_i(β), hence nonlinear least squares! Define the objective function φ(b) = ½ ‖r(b)‖_2^2, where r(b) ∈ R^10 is the residual vector. The 1/2 factor in φ(b) has no effect on the minimizing b, but leads to slightly cleaner formulae later on.

Nonlinear Least Squares. As in the linear case, we seek to minimize φ by finding b such that ∇φ = 0. We have φ(b) = ½ Σ_{j=1}^m [r_j(b)]^2. Hence for the i-th component of the gradient vector, we have ∂φ/∂b_i = ½ ∂/∂b_i Σ_{j=1}^m r_j^2 = Σ_{j=1}^m r_j ∂r_j/∂b_i.

Nonlinear Least Squares. This is equivalent to ∇φ = J_r(b)^T r(b), where J_r(b) ∈ R^{m×n} is the Jacobian matrix of the residual: {J_r(b)}_{ij} = ∂r_i(b)/∂b_j. Exercise: show that J_r(b)^T r(b) = 0 reduces to the normal equations when the residual is linear.
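
For reference, here is one way to work the linear case of the exercise, consistent with the earlier normal-equations slides:

```latex
r(b) = y - Ab
\;\Rightarrow\;
\{J_r(b)\}_{ij} = \frac{\partial r_i}{\partial b_j} = -a_{ij}
\;\Rightarrow\;
J_r(b) = -A,
\qquad
J_r(b)^T r(b) = -A^T (y - Ab) = 0
\;\Longleftrightarrow\;
A^T A\, b = A^T y ,
```

which are exactly the normal equations.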

Nonlinear Least Squares. Hence we seek b ∈ R^n such that J_r(b)^T r(b) = 0. This has n equations and n unknowns; in general it is a nonlinear system that we have to solve iteratively. An important recurring theme is that linear systems can be solved in one shot, whereas nonlinear systems generally require iteration. We will briefly introduce Newton's method for solving this system and defer detailed discussion until the optimization section.

Nonlinear Least Squares. Recall Newton's method for a function of one variable: find x^* ∈ R such that f(x^*) = 0. Let x_k be our current guess and x_k + Δx = x^*; then Taylor expansion gives 0 = f(x_k + Δx) = f(x_k) + Δx f'(x_k) + O((Δx)^2). It follows that f'(x_k) Δx ≈ −f(x_k) (approximate, since we neglect the higher-order terms). This motivates Newton's method: f'(x_k) Δx_k = −f(x_k), where x_{k+1} = x_k + Δx_k.
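
A minimal sketch of this scalar iteration (the test function here is made up for illustration):

```python
def newton_1d(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method for f(x) = 0: solve f'(x_k) dx_k = -f(x_k), then x_{k+1} = x_k + dx_k."""
    x = x0
    for _ in range(max_iter):
        dx = -f(x) / fprime(x)
        x += dx
        if abs(dx) < tol:
            break
    return x

# Example: the positive root of f(x) = x^2 - 2
print(newton_1d(lambda x: x**2 - 2.0, lambda x: 2.0 * x, 1.0))  # approx. 1.414213...
```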

Nonlinear Least Squares. This argument generalizes directly to functions of several variables. For example, suppose F : R^n → R^n; then we find x such that F(x) = 0 via J_F(x_k) Δx_k = −F(x_k), where J_F is the Jacobian of F, Δx_k ∈ R^n, and x_{k+1} = x_k + Δx_k.

Nonlinear Least Squares. In the case of nonlinear least squares, to find a stationary point of φ we need to find b such that J_r(b)^T r(b) = 0. That is, we want to solve F(b) = 0 for F(b) ≡ J_r(b)^T r(b). We apply Newton's method, hence we need to find the Jacobian, J_F, of the function F : R^n → R^n.

Nonlinear Least Squares. Consider ∂F_i/∂b_j (this will be the ij entry of J_F): ∂F_i/∂b_j = ∂/∂b_j [J_r(b)^T r(b)]_i = ∂/∂b_j Σ_{k=1}^m (∂r_k/∂b_i) r_k = Σ_{k=1}^m (∂r_k/∂b_i)(∂r_k/∂b_j) + Σ_{k=1}^m (∂^2 r_k/∂b_i ∂b_j) r_k.

Gauss-Newton Method. It is generally a pain to deal with the second derivatives in the previous formula; second derivatives get messy! Key observation: as we approach a good fit to the data, the residual values r_k(b), 1 ≤ k ≤ m, should be small. Hence we omit the term Σ_{k=1}^m r_k ∂^2 r_k/(∂b_i ∂b_j).

Gauss-Newton Method. Note that Σ_{k=1}^m (∂r_k/∂b_j)(∂r_k/∂b_i) = (J_r(b)^T J_r(b))_{ij}, so that when the residual is small, J_F(b) ≈ J_r(b)^T J_r(b). Then, putting all the pieces together, we obtain the iteration b_{k+1} = b_k + Δb_k, where J_r(b_k)^T J_r(b_k) Δb_k = −J_r(b_k)^T r(b_k), k = 1, 2, 3, ... This is known as the Gauss-Newton algorithm for nonlinear least squares.

Gauss-Newton Method. This looks similar to the normal equations at each iteration, except now the matrix J_r(b_k) comes from linearizing the residual. Gauss-Newton is equivalent to solving the linear least squares problem J_r(b_k) Δb_k ≅ −r(b_k) at each iteration. This is a common refrain in Scientific Computing: replace a nonlinear problem with a sequence of linearized problems.
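
A generic sketch of this iteration (the residual r and Jacobian J_r are passed in as callables; this mirrors the update above rather than any particular course code):

```python
import numpy as np

def gauss_newton(r, Jr, b0, tol=1e-10, max_iter=100):
    """Gauss-Newton: at each step solve the linearized problem J_r(b_k) db_k ~= -r(b_k)."""
    b = np.array(b0, dtype=float)
    for _ in range(max_iter):
        # Solving this linear least-squares problem is equivalent to
        # J_r(b_k)^T J_r(b_k) db_k = -J_r(b_k)^T r(b_k)
        db = np.linalg.lstsq(Jr(b), -r(b), rcond=None)[0]
        b = b + db
        if np.linalg.norm(db) < tol:
            break
    return b
```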

Computing the Jacobian. To use Gauss-Newton in practice, we need to be able to compute the Jacobian matrix J_r(b_k) for any b_k ∈ R^n. We can do this by hand, e.g. in our transmitter/receiver problem we would have [J_r(b)]_{ij} = −∂/∂b_j √((b_1 − x_1^i)^2 + (b_2 − x_2^i)^2). Differentiating by hand is feasible in this case, but it can become impractical if r(b) is more complicated. Or perhaps our mapping b ↦ y is a black box, with no closed-form equations, hence it is not possible to differentiate the residual!

Computing the Jacobian. So, what is the alternative to differentiation by hand? Finite difference approximation: for h ≪ 1 we have [J_r(b_k)]_{ij} ≈ (r_i(b_k + e_j h) − r_i(b_k)) / h. This avoids tedious, error-prone differentiation of r by hand! Also, it can be used for differentiating black-box mappings, since we only need to be able to evaluate r(b).
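
A sketch of such a finite-difference Jacobian (forward differences; the step size h is a tunable assumption):

```python
import numpy as np

def fd_jacobian(r, b, h=1e-7):
    """Approximate J_r(b) column by column: [J_r]_{:, j} ~ (r(b + h e_j) - r(b)) / h."""
    b = np.asarray(b, dtype=float)
    r0 = np.asarray(r(b), dtype=float)
    J = np.empty((r0.size, b.size))
    for j in range(b.size):
        bp = b.copy()
        bp[j] += h                              # perturb the j-th parameter only
        J[:, j] = (np.asarray(r(bp)) - r0) / h
    return J
```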

Gauss-Newton Method. We derived the Gauss-Newton method in a natural way: apply Newton's method to solve ∇φ = 0, then neglect the second-derivative terms that arise. However, Gauss-Newton is not widely used in practice since it doesn't always converge reliably.

Levenberg-Marquardt Method. A more robust variation of Gauss-Newton is the Levenberg-Marquardt algorithm, which uses the update [J_r(b_k)^T J_r(b_k) + µ_k diag(S^T S)] Δb_k = −J_r(b_k)^T r(b_k), where S = I or S = J_r(b_k), and some heuristic is used to choose µ_k. (In this context, diag(A) means zero out the off-diagonal part of A.) This looks like our regularized underdetermined linear least-squares formulation!

Levenberg-Marquardt Method. Key point: the regularization term µ_k diag(S^T S) improves the reliability of the algorithm in practice. Levenberg-Marquardt is implemented in Python and in Matlab's optimization toolbox. We need to pass the residual to the routine, and we can also pass the Jacobian matrix or ask for a finite-differenced Jacobian. Now let's solve our transmitter/receiver problem.

Nonlinear Least Squares: Example. Python example: using lsqnonlin.py we provide an initial guess and converge to the best-fit solution. [Figure: receivers, initial guess, and converged transmitter location in [0, 1]^2]
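
The course's lsqnonlin.py is not reproduced here, but a minimal sketch of the same kind of solve with scipy.optimize.least_squares (method='lm' selects Levenberg-Marquardt; the receiver locations, noise level, and true transmitter position below are invented for illustration) looks like this:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
b_true = np.array([0.4, 0.4])                  # hypothetical "true" transmitter location
X = rng.uniform(0.0, 1.0, size=(10, 2))        # 10 receiver locations in [0, 1]^2
y = np.linalg.norm(X - b_true, axis=1) + 0.05 * rng.standard_normal(10)  # noisy distances

def residual(b):
    return y - np.linalg.norm(X - b, axis=1)   # r_i(b) = y_i - d_i(b)

sol = least_squares(residual, x0=np.array([0.5, 0.5]), method='lm')
print("best-fit transmitter location:", sol.x)
```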

Nonlinear Least Squares: Example. Levenberg-Marquardt minimizes φ(b), as we see from the contour plot of φ(b) below. Recall that the figure marks both the true transmitter location and our best fit to the data; the best fit attains the smaller objective value, φ(best fit) = 0.0248 < 0.0386 = φ(true location). [Figure: contours of φ(b) over [0, 1]^2 with the true and best-fit locations marked] These contours are quite different from what we get in linear problems.

Linear Least-Squares Contours. Two examples of linear least-squares contours for φ(b) = ‖y − Ab‖_2^2, b ∈ R^2. [Figure: two contour plots] In linear least squares, φ(b) is quadratic, hence the contours are hyperellipses.