Lecture 6. Regularized least-squares and minimum-norm methods


EE263 Autumn 2004

- multi-objective least-squares
- regularized least-squares
- nonlinear least-squares & Gauss-Newton method
- minimum-norm solution of underdetermined equations
- relation to regularized least-squares

Multi-objective least-squares

- in many problems we have two (or more) objectives
- we want $J_1 = \|Ax - y\|^2$ small and also $J_2 = \|Fx - g\|^2$ small ($x \in \mathbf{R}^n$ is the variable)
- usually the objectives are competing: we can make one smaller at the expense of making the other larger
- common example: $F = I$, $g = 0$: we want $\|Ax - y\|$ small, with small $x$

[figure: plot of $(J_2, J_1)$ for every $x$, showing three example points $x^{(1)}$, $x^{(2)}$, $x^{(3)}$]

- shaded area shows $(J_2, J_1)$ achieved by some $x \in \mathbf{R}^n$
- clear area shows $(J_2, J_1)$ not achieved by any $x \in \mathbf{R}^n$
- boundary of the region is called the optimal trade-off curve
- corresponding $x$ are called Pareto optimal (for the two objectives $\|Ax - y\|^2$, $\|Fx - g\|^2$)
- three example choices of $x$: $x^{(1)}$, $x^{(2)}$, $x^{(3)}$
  - $x^{(3)}$ is worse than $x^{(2)}$ on both counts ($J_2$ and $J_1$)
  - $x^{(1)}$ is better than $x^{(2)}$ in $J_2$, but worse in $J_1$

Weighted-sum objective

- to find Pareto optimal points, i.e., $x$'s on the optimal trade-off curve, we minimize the weighted-sum objective
  $$J_1 + \mu J_2 = \|Ax - y\|^2 + \mu\|Fx - g\|^2$$
- parameter $\mu \ge 0$ gives the relative weight between $J_1$ and $J_2$
- points where the weighted sum is constant, $J_1 + \mu J_2 = \alpha$, correspond to a line with slope $-\mu$ in the $(J_2, J_1)$ plane

[figure: trade-off curve with the line $J_1 + \mu J_2 = \alpha$ touching it at $x^{(2)}$]

- $x^{(2)}$ minimizes the weighted-sum objective for the $\mu$ shown
- by varying $\mu$ from $0$ to $+\infty$, we can sweep out the entire optimal trade-off curve

Minimizing weighted-sum objective

- can express the weighted-sum objective as an ordinary least-squares objective:
  $$\|Ax - y\|^2 + \mu\|Fx - g\|^2 = \left\|\begin{bmatrix} A \\ \sqrt{\mu}\,F \end{bmatrix} x - \begin{bmatrix} y \\ \sqrt{\mu}\,g \end{bmatrix}\right\|^2 = \|\tilde{A}x - \tilde{y}\|^2$$
  where $\tilde{A} = \begin{bmatrix} A \\ \sqrt{\mu}\,F \end{bmatrix}$, $\tilde{y} = \begin{bmatrix} y \\ \sqrt{\mu}\,g \end{bmatrix}$
- hence the solution is (assuming $\tilde{A}$ full rank)
  $$x = (\tilde{A}^T\tilde{A})^{-1}\tilde{A}^T\tilde{y} = (A^TA + \mu F^TF)^{-1}(A^Ty + \mu F^Tg)$$
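A minimal NumPy sketch of this stacking trick (the matrices $A$, $F$ and data $y$, $g$ below are random placeholders, not from the lecture example); it checks the stacked least-squares solution against the normal-equation formula:

```python
import numpy as np

def weighted_sum_ls(A, y, F, g, mu):
    """Minimize ||A x - y||^2 + mu ||F x - g||^2 by stacking into one LS problem."""
    A_tilde = np.vstack([A, np.sqrt(mu) * F])
    y_tilde = np.concatenate([y, np.sqrt(mu) * g])
    x, *_ = np.linalg.lstsq(A_tilde, y_tilde, rcond=None)
    return x

# illustration with random data
rng = np.random.default_rng(0)
A, y = rng.standard_normal((30, 5)), rng.standard_normal(30)
F, g = np.eye(5), np.zeros(5)          # common example: F = I, g = 0
x = weighted_sum_ls(A, y, F, g, mu=0.1)

# agrees with (A^T A + mu F^T F)^{-1} (A^T y + mu F^T g)
x_check = np.linalg.solve(A.T @ A + 0.1 * F.T @ F, A.T @ y + 0.1 * F.T @ g)
assert np.allclose(x, x_check)
```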

Example

- unit mass at rest, subject to forces $x_i$ for $i - 1 < t \le i$, $i = 1, \ldots, 10$
- $y \in \mathbf{R}$ is the position at $t = 10$; $y = a^Tx$ where $a \in \mathbf{R}^{10}$
- $J_1 = (y - 1)^2$ (final position error squared)
- $J_2 = \|x\|^2$ (sum of squares of forces)
- weighted-sum objective: $(a^Tx - 1)^2 + \mu\|x\|^2$
- optimal $x$: $x = (aa^T + \mu I)^{-1}a$

[figure: optimal trade-off curve, $J_1 = (y - 1)^2$ versus $J_2 = \|x\|^2$]

- upper left corner of the optimal trade-off curve corresponds to $x = 0$
- bottom right corresponds to the input that yields $y = 1$, i.e., $J_1 = 0$
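The slide does not spell out the vector $a$; assuming the standard unit-mass discretization in which force $x_i$ acts on $(i-1, i]$ and contributes $(10.5 - i)\,x_i$ to the position at $t = 10$, this sketch takes $a_i = 10.5 - i$ and sweeps $\mu$ to sample points on the trade-off curve:

```python
import numpy as np

# assumed kernel: force x_i acts on (i-1, i]; its contribution to position at
# t = 10 is (10.5 - i) * x_i for a unit mass starting at rest
a = 10.5 - np.arange(1, 11)

for mu in [1e-6, 1e-4, 1e-2, 1.0]:
    x = np.linalg.solve(np.outer(a, a) + mu * np.eye(10), a)  # x = (a a^T + mu I)^{-1} a
    J1 = (a @ x - 1) ** 2          # final position error squared
    J2 = x @ x                     # sum of squares of forces
    print(f"mu = {mu:8.1e}:  J1 = {J1:.2e},  J2 = {J2:.2e}")
```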

Regularized least-squares

- when $F = I$, $g = 0$, the objectives are $J_1 = \|Ax - y\|^2$, $J_2 = \|x\|^2$
- the minimizer of the weighted-sum objective,
  $$x = (A^TA + \mu I)^{-1}A^Ty,$$
  is called the regularized least-squares (approximate) solution of $Ax \approx y$
- also called Tychonov regularization
- for $\mu > 0$, works for any $A$ (no restrictions on shape, rank, ...)
- estimation/inversion application:
  - $Ax - y$ is the sensor residual
  - prior information: $x$ small
  - or, model only accurate for $x$ small
  - regularized solution trades off sensor fit, size of $x$
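A small sketch of the "works for any $A$" point: even when $A$ is fat and rank-deficient, so $A^TA$ alone is singular, the regularized solution is well defined for $\mu > 0$. The data here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(6)
B, C = rng.standard_normal((5, 2)), rng.standard_normal((2, 12))
A = B @ C                                     # fat (5 x 12) and only rank 2
y = rng.standard_normal(5)

mu = 1e-2
x_reg = np.linalg.solve(A.T @ A + mu * np.eye(12), A.T @ y)   # invertible for any mu > 0
print(np.linalg.norm(A @ x_reg - y), np.linalg.norm(x_reg))
```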

Nonlinear least-squares

- nonlinear least-squares (NLLS) problem: find $x \in \mathbf{R}^n$ that minimizes
  $$\|r(x)\|^2 = \sum_{i=1}^m r_i(x)^2,$$
  where $r : \mathbf{R}^n \to \mathbf{R}^m$
- $r(x)$ is a vector of residuals
- reduces to (linear) least-squares if $r(x) = Ax - b$

example: estimate position $x \in \mathbf{R}^2$ from approximate distances to beacons at locations $b_1, \ldots, b_m \in \mathbf{R}^2$, without linearizing

- we measure $\rho_i = \|x - b_i\| + v_i$ ($v_i$ is range error, unknown but assumed small)
- NLLS estimate: choose $\hat{x}$ to minimize
  $$\sum_{i=1}^m r_i(x)^2 = \sum_{i=1}^m \left(\rho_i - \|x - b_i\|\right)^2$$
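A small sketch of the range-residual setup, with hypothetical beacon locations and a noise level chosen to echo the example on the following slides:

```python
import numpy as np

# hypothetical beacon locations and noisy range measurements for illustration
rng = np.random.default_rng(1)
beacons = rng.uniform(-5, 5, size=(10, 2))       # b_1, ..., b_m in R^2
x_true = np.array([-3.6, 3.2])
rho = np.linalg.norm(x_true - beacons, axis=1) + rng.uniform(-0.5, 0.5, 10)

def residual(x):
    """r_i(x) = rho_i - ||x - b_i||  (vector of range residuals)."""
    return rho - np.linalg.norm(x - beacons, axis=1)

print("NLLS objective at the true position:", np.sum(residual(x_true) ** 2))
```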

Gauss-Newton method for NLLS

- NLLS: find $x \in \mathbf{R}^n$ that minimizes $\|r(x)\|^2 = \sum_{i=1}^m r_i(x)^2$, where $r : \mathbf{R}^n \to \mathbf{R}^m$
- in general, very hard to solve exactly
- many good heuristics to compute a locally optimal solution

Gauss-Newton method:

    given starting guess for x
    repeat
        linearize r near current guess
        new guess is linear LS solution, using linearized r
    until convergence

Gauss-Newton method (more detail):

- linearize $r$ near the current iterate $x^{(k)}$:
  $$r(x) \approx r(x^{(k)}) + Dr(x^{(k)})(x - x^{(k)})$$
  where $Dr$ is the Jacobian: $(Dr)_{ij} = \partial r_i / \partial x_j$
- rewrite the linearized approximation as
  $$r(x^{(k)}) + Dr(x^{(k)})(x - x^{(k)}) = A^{(k)}x - b^{(k)},$$
  $$A^{(k)} = Dr(x^{(k)}), \qquad b^{(k)} = Dr(x^{(k)})x^{(k)} - r(x^{(k)})$$
- at the $k$th iteration, we approximate the NLLS problem by the linear LS problem:
  $$\|r(x)\|^2 \approx \|A^{(k)}x - b^{(k)}\|^2$$
- the next iterate solves this linearized LS problem:
  $$x^{(k+1)} = \left(A^{(k)T}A^{(k)}\right)^{-1}A^{(k)T}b^{(k)}$$
  (although you probably wouldn't compute $x^{(k+1)}$ using this formula ...)
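A self-contained sketch of the Gauss-Newton iteration for the beacon problem (beacon locations and measurements are hypothetical stand-ins; with other random data the iteration may stop at a local minimum):

```python
import numpy as np

rng = np.random.default_rng(2)
beacons = rng.uniform(-5, 5, size=(10, 2))
x_true = np.array([-3.6, 3.2])
rho = np.linalg.norm(x_true - beacons, axis=1) + rng.uniform(-0.5, 0.5, 10)

def r(x):
    return rho - np.linalg.norm(x - beacons, axis=1)

def Dr(x):
    # Jacobian of r: row i is -(x - b_i)^T / ||x - b_i||
    diff = x - beacons
    return -diff / np.linalg.norm(diff, axis=1, keepdims=True)

x = np.array([1.2, 1.2])                      # initial guess
for k in range(10):
    A_k = Dr(x)
    b_k = A_k @ x - r(x)                      # b^(k) = Dr(x^(k)) x^(k) - r(x^(k))
    x, *_ = np.linalg.lstsq(A_k, b_k, rcond=None)   # linearized LS step
    print(k, x, np.sum(r(x) ** 2))
```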

Gauss-Newton example

- 10 beacons; true position $(-3.6, 3.2)$; initial guess $(1.2, 1.2)$
- range estimates accurate to $\pm 0.5$

[figure: beacon locations and true position in the square $[-5, 5] \times [-5, 5]$]

[figure: NLLS objective $\|r(x)\|^2$ versus $x$ over the square $[-5, 5] \times [-5, 5]$]

- for a linear LS problem, the objective would be a nice quadratic bowl
- the bumps in the objective are due to the strong nonlinearity of $r$

[figure: objective $\|r(x)\|^2$ of the Gauss-Newton iterates versus iteration number]

- $x^{(k)}$ converges to the (in this case, global) minimum of $\|r(x)\|^2$
- convergence takes only five or so steps
- final estimate is $\hat{x} = (-3.3, 3.3)$
- estimation error is $\|\hat{x} - x\| = 0.31$ (substantially smaller than the range accuracy!)

[figure: convergence of the Gauss-Newton iterates in the plane, iterates labeled 1 through 6]

useful variation on Gauss-Newton: add a regularization term
$$\|A^{(k)}x - b^{(k)}\|^2 + \mu\|x - x^{(k)}\|^2$$
so that the next iterate is not too far from the previous one (hence, the linearized model is still pretty accurate)
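A sketch of one regularized Gauss-Newton step, solved by the same stacking trick used for the weighted-sum objective earlier; the residual value, Jacobian, and $\mu$ are supplied by the caller and are not fixed by the slides:

```python
import numpy as np

def damped_gn_step(x_k, r_k, Dr_k, mu):
    """One regularized Gauss-Newton step: minimize ||A x - b||^2 + mu ||x - x_k||^2,
    with A = Dr(x_k) and b = Dr(x_k) x_k - r(x_k), solved by stacking."""
    n = x_k.size
    A, b = Dr_k, Dr_k @ x_k - r_k
    A_tilde = np.vstack([A, np.sqrt(mu) * np.eye(n)])
    b_tilde = np.concatenate([b, np.sqrt(mu) * x_k])
    x_next, *_ = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)
    return x_next
```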

Underdetermined linear equations

- we consider $y = Ax$ where $A \in \mathbf{R}^{m \times n}$ is fat ($m < n$), i.e.,
  - there are more variables than equations
  - $x$ is underspecified, i.e., many choices of $x$ lead to the same $y$
- we'll assume that $A$ is full rank ($m$), so for each $y \in \mathbf{R}^m$ there is a solution
- the set of all solutions has the form
  $$\{x \mid Ax = y\} = \{x_p + z \mid z \in \mathcal{N}(A)\}$$
  where $x_p$ is any ("particular") solution, i.e., $Ax_p = y$
- $z$ characterizes the available choices in the solution
- the solution has $\dim \mathcal{N}(A) = n - m$ degrees of freedom
- can choose $z$ to satisfy other specs or optimize among solutions

Least-norm solution

- one particular solution is
  $$x_{\mathrm{ln}} = A^T(AA^T)^{-1}y$$
  ($AA^T$ is invertible since $A$ is full rank)
- in fact, $x_{\mathrm{ln}}$ is the solution of $y = Ax$ that minimizes $\|x\|$
- to see this, suppose $Ax = y$, so $A(x - x_{\mathrm{ln}}) = 0$ and
  $$(x - x_{\mathrm{ln}})^Tx_{\mathrm{ln}} = (x - x_{\mathrm{ln}})^TA^T(AA^T)^{-1}y = \left(A(x - x_{\mathrm{ln}})\right)^T(AA^T)^{-1}y = 0$$
  i.e., $(x - x_{\mathrm{ln}}) \perp x_{\mathrm{ln}}$, so
  $$\|x\|^2 = \|x_{\mathrm{ln}} + x - x_{\mathrm{ln}}\|^2 = \|x_{\mathrm{ln}}\|^2 + \|x - x_{\mathrm{ln}}\|^2 \ge \|x_{\mathrm{ln}}\|^2$$
  i.e., $x_{\mathrm{ln}}$ has the smallest norm of any solution
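A quick numerical check of these two facts (random fat $A$ for illustration): $x_{\mathrm{ln}}$ solves $Ax = y$, and any other solution, obtained by adding a null-space component, has norm at least as large:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 8))               # fat, full-rank A (m = 3, n = 8)
y = rng.standard_normal(3)

x_ln = A.T @ np.linalg.solve(A @ A.T, y)      # x_ln = A^T (A A^T)^{-1} y
assert np.allclose(A @ x_ln, y)

# another solution x_ln + z with z in N(A); its norm is no smaller
z = (np.eye(8) - np.linalg.pinv(A) @ A) @ rng.standard_normal(8)
x_other = x_ln + z
assert np.allclose(A @ x_other, y)
print(np.linalg.norm(x_ln), "<=", np.linalg.norm(x_other))
```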

[figure: solution set $\{x \mid Ax = y\}$, the null space $\mathcal{N}(A) = \{x \mid Ax = 0\}$, and $x_{\mathrm{ln}}$]

- orthogonality condition: $x_{\mathrm{ln}} \perp \mathcal{N}(A)$
- projection interpretation: $x_{\mathrm{ln}}$ is the projection of $0$ onto the solution set $\{x \mid Ax = y\}$
- $A^T(AA^T)^{-1}$ is called the pseudo-inverse of (full rank, fat) $A$
- $A^T(AA^T)^{-1}$ is a right inverse of $A$
- least-norm solution via QR factorization: apply Gram-Schmidt to $A^T$, so $A^T = QR$; then
  $$x_{\mathrm{ln}} = A^T(AA^T)^{-1}y = QR^{-T}y \qquad (R^{-T} = (R^{-1})^T)$$
  and $\|x_{\mathrm{ln}}\| = \|R^{-T}y\|$
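A sketch of the QR route (random data again): factor $A^T = QR$ and apply $R^{-T}$ via a linear solve; the result matches the direct formula:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 8))
y = rng.standard_normal(3)

# least-norm solution via QR of A^T:  A^T = Q R,  x_ln = Q R^{-T} y
Q, R = np.linalg.qr(A.T)                  # "economy" QR: Q is n x m, R is m x m
x_ln = Q @ np.linalg.solve(R.T, y)        # R^{-T} y (a generic solve; correct, if not
                                          # exploiting the triangular structure of R)

assert np.allclose(x_ln, A.T @ np.linalg.solve(A @ A.T, y))
```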

Derivation via Lagrange multipliers

- the least-norm solution solves the optimization problem
  $$\text{minimize } x^Tx \quad \text{subject to } Ax = y$$
- introduce Lagrange multipliers: $L(x, \lambda) = x^Tx + \lambda^T(Ax - y)$
- the optimality conditions are
  $$\frac{\partial L}{\partial x} = 2x^T + \lambda^TA = 0, \qquad \frac{\partial L}{\partial \lambda} = (Ax - y)^T = 0$$
- from the first condition, $x = -A^T\lambda/2$
- substitute into the second to get $\lambda = -2(AA^T)^{-1}y$
- hence $x = A^T(AA^T)^{-1}y$

Example: transferring mass unit distance

- unit mass at rest, subject to forces $x_i$ for $i - 1 < t \le i$, $i = 1, \ldots, 10$
- $y_1$ is the position at $t = 10$, $y_2$ is the velocity at $t = 10$
- $y = Ax$ where $A \in \mathbf{R}^{2 \times 10}$ ($A$ is fat)
- find the least-norm force that transfers the mass unit distance with zero final velocity, i.e., $y = (1, 0)$

[figure: least-norm force $x_{\mathrm{ln}}$, and the resulting position and velocity, versus $t$]
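A sketch of this example under the same assumed discretization as before ($(10.5 - i)$ for the position row and $1$ for the velocity row, which the slide does not spell out):

```python
import numpy as np

# assumed dynamics: force x_i acts on (i-1, i]; it contributes (10.5 - i) x_i
# to the position and x_i to the velocity at t = 10 (unit mass, starting at rest)
i = np.arange(1, 11)
A = np.vstack([10.5 - i, np.ones(10)])        # A in R^{2 x 10}, fat
y = np.array([1.0, 0.0])                      # unit distance, zero final velocity

x_ln = A.T @ np.linalg.solve(A @ A.T, y)      # least-norm force program
print(np.round(x_ln, 4))                      # linearly decreasing: positive, then negative
assert np.allclose(A @ x_ln, y)
```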

Relation to regularized least-squares

- suppose $A \in \mathbf{R}^{m \times n}$ is fat, full rank
- define $J_1 = \|Ax - y\|^2$, $J_2 = \|x\|^2$
- the least-norm solution minimizes $J_2$ with $J_1 = 0$
- the minimizer of the weighted-sum objective $J_1 + \mu J_2 = \|Ax - y\|^2 + \mu\|x\|^2$ is
  $$x_\mu = (A^TA + \mu I)^{-1}A^Ty$$
- fact: $x_\mu \to x_{\mathrm{ln}}$ as $\mu \to 0$, i.e., the regularized solution converges to the least-norm solution as $\mu \to 0$
- in matrix terms: as $\mu \to 0$,
  $$(A^TA + \mu I)^{-1}A^T \to A^T(AA^T)^{-1}$$
  (for full rank, fat $A$)
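A quick numerical check of the limit $x_\mu \to x_{\mathrm{ln}}$ as $\mu \to 0$, with a random fat, full-rank $A$:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 8))               # fat, full rank (with probability 1)
y = rng.standard_normal(3)

x_ln = A.T @ np.linalg.solve(A @ A.T, y)      # least-norm solution
for mu in [1e-1, 1e-3, 1e-6]:
    x_mu = np.linalg.solve(A.T @ A + mu * np.eye(8), A.T @ y)
    print(f"mu = {mu:.0e}:  ||x_mu - x_ln|| = {np.linalg.norm(x_mu - x_ln):.2e}")
```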