EE263 Autumn 2004

Lecture 6: Regularized least-squares and minimum-norm methods

- multi-objective least-squares
- regularized least-squares
- nonlinear least-squares & Gauss-Newton method
- minimum-norm solution of underdetermined equations
- relation to regularized least-squares
Multi-objective least-squares

- in many problems we have two (or more) objectives
- we want J_1 = ||Ax - y||^2 small
- and also J_2 = ||Fx - g||^2 small
  (x ∈ R^n is the variable)
- usually the objectives are competing: we can make one smaller, at the expense of making the other larger
- common example: F = I, g = 0; we want ||Ax - y|| small, with small x
plot (J_2, J_1) for every x:

[figure: achievable region in the (J_2, J_1) plane, with example points x^(1), x^(2), x^(3)]

- shaded area shows (J_2, J_1) achieved by some x ∈ R^n
- clear area shows (J_2, J_1) not achieved by any x ∈ R^n
- boundary of region is called the optimal trade-off curve
- corresponding x are called Pareto optimal (for the two objectives ||Ax - y||^2, ||Fx - g||^2)
- three example choices of x: x^(1), x^(2), x^(3)
  - x^(3) is worse than x^(2) on both counts (J_2 and J_1)
  - x^(1) is better than x^(2) in J_2, but worse in J_1
Weighted-sum objective

- to find Pareto optimal points, i.e., x's on the optimal trade-off curve, we minimize the weighted-sum objective

  J_1 + µJ_2 = ||Ax - y||^2 + µ||Fx - g||^2

- parameter µ ≥ 0 gives relative weight between J_1 and J_2
- points where the weighted sum is constant, J_1 + µJ_2 = α, correspond to a line with slope -µ:

[figure: trade-off curve with the line J_1 + µJ_2 = α touching it at x^(2)]

- x^(2) minimizes the weighted-sum objective for the µ shown
- by varying µ from 0 to +∞, we can sweep out the entire optimal trade-off curve
Minimizing weighted-sum objective

- can express the weighted-sum objective as an ordinary least-squares objective:

  ||Ax - y||^2 + µ||Fx - g||^2 = || [ A; √µ F ] x - [ y; √µ g ] ||^2 = ||Ãx - ỹ||^2

  where (stacked by rows)

  Ã = [ A; √µ F ],   ỹ = [ y; √µ g ]

- hence the solution is (assuming Ã full rank)

  x = (Ã^T Ã)^{-1} Ã^T ỹ = (A^T A + µF^T F)^{-1} (A^T y + µF^T g)

  (a numerical check of this equivalence follows below)
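As a sanity check, here is a minimal numpy sketch (with randomly generated data, not from the lecture) comparing the stacked least-squares formulation against the closed-form normal-equations expression:

```python
import numpy as np

def weighted_sum_ls(A, y, F, g, mu):
    """Minimize ||Ax - y||^2 + mu*||Fx - g||^2 via the stacked LS problem."""
    A_til = np.vstack([A, np.sqrt(mu) * F])          # [A; sqrt(mu) F]
    y_til = np.concatenate([y, np.sqrt(mu) * g])     # [y; sqrt(mu) g]
    x, *_ = np.linalg.lstsq(A_til, y_til, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5)); y = rng.standard_normal(20)
F = np.eye(5); g = np.zeros(5); mu = 0.1

x1 = weighted_sum_ls(A, y, F, g, mu)
x2 = np.linalg.solve(A.T @ A + mu * F.T @ F, A.T @ y + mu * F.T @ g)
print(np.allclose(x1, x2))  # True: both formulas give the same minimizer
```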
Example

[figure: unit mass with applied force f(t)]

- unit mass at rest subject to forces x_i for i - 1 < t ≤ i, i = 1, ..., 10
- y ∈ R is position at t = 10; y = a^T x where a ∈ R^10
- J_1 = (y - 1)^2 (final position error squared)
- J_2 = ||x||^2 (sum of squares of forces)
- weighted-sum objective: (a^T x - 1)^2 + µ||x||^2
- optimal x: x = (aa^T + µI)^{-1} a
optimal trade-off curve:

[figure: optimal trade-off curve; vertical axis J_1 = (y - 1)^2 from 0 to 1, horizontal axis J_2 = ||x||^2 from 0 to 3.5×10^{-3}]

- upper left corner of optimal trade-off curve corresponds to x = 0
- bottom right corresponds to the input that yields y = 1, i.e., J_1 = 0
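The curve can be traced numerically by sweeping µ. A small sketch, assuming the usual piecewise-constant force model for this setup, in which the position response works out to a_i = 10.5 - i (this value of a is reconstructed from the setup, not given explicitly in the slides):

```python
import numpy as np

# a_i = integral of (10 - t) over (i-1, i] = 10.5 - i  (position response to x_i)
a = 10.5 - np.arange(1, 11)

for mu in [1e-6, 1e-4, 1e-2, 1.0]:
    x = np.linalg.solve(np.outer(a, a) + mu * np.eye(10), a)
    J1 = (a @ x - 1) ** 2       # final position error squared
    J2 = x @ x                  # sum of squares of forces
    print(f"mu = {mu:g}:  J1 = {J1:.3e},  J2 = {J2:.3e}")
```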
Regularized least-squares

- when F = I, g = 0, the objectives are J_1 = ||Ax - y||^2, J_2 = ||x||^2
- minimizer of the weighted-sum objective,

  x = (A^T A + µI)^{-1} A^T y,

  is called the regularized least-squares (approximate) solution of Ax ≈ y
- also called Tychonov regularization
- for µ > 0, works for any A (no restrictions on shape, rank, ...)

estimation/inversion application:

- Ax - y is sensor residual
- prior information: x small
- or, model only accurate for x small
- regularized solution trades off sensor fit, size of x
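A minimal sketch of the formula; the point is that A^T A + µI is invertible for any µ > 0, even when A itself is fat or rank deficient (the example matrix here is made up for illustration):

```python
import numpy as np

def tychonov(A, y, mu):
    """Regularized LS solution x = (A^T A + mu I)^{-1} A^T y; any shape/rank A, mu > 0."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ y)

# works even for a fat, rank-deficient A, where plain least-squares breaks down
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1
y = np.array([1.0, 2.0])
print(tychonov(A, y, mu=1e-3))
```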
Nonlinear least-squares

- nonlinear least-squares (NLLS) problem: find x ∈ R^n that minimizes

  ||r(x)||^2 = sum_{i=1}^m r_i(x)^2,

  where r : R^n → R^m

- r(x) is a vector of residuals
- reduces to (linear) least-squares if r(x) = Ax - b

example: estimate position x ∈ R^2 from approximate distances to beacons at locations b_1, ..., b_m ∈ R^2, without linearizing

- we measure ρ_i = ||x - b_i|| + v_i (v_i is range error, unknown but assumed small)
- NLLS estimate: choose x̂ to minimize

  sum_{i=1}^m r_i(x)^2 = sum_{i=1}^m (ρ_i - ||x - b_i||)^2
Gauss-Newton method for NLLS

- NLLS: find x ∈ R^n that minimizes ||r(x)||^2 = sum_{i=1}^m r_i(x)^2, where r : R^n → R^m
- in general, very hard to solve exactly
- many good heuristics to compute a locally optimal solution

Gauss-Newton method:

  given starting guess for x
  repeat
      linearize r near current guess
      new guess is linear LS solution, using linearized r
  until convergence
Gauss-Newton method (more detail):

- linearize r near current iterate x^(k):

  r(x) ≈ r(x^(k)) + Dr(x^(k)) (x - x^(k))

  where Dr is the Jacobian: (Dr)_ij = ∂r_i/∂x_j

- rewrite the linearized approximation as

  r(x^(k)) + Dr(x^(k)) (x - x^(k)) = A^(k) x - b^(k)

  A^(k) = Dr(x^(k)),   b^(k) = Dr(x^(k)) x^(k) - r(x^(k))

- at the kth iteration, we approximate the NLLS problem by the linear LS problem:

  ||r(x)||^2 ≈ ||A^(k) x - b^(k)||^2

- next iterate solves this linearized LS problem:

  x^(k+1) = (A^(k)T A^(k))^{-1} A^(k)T b^(k)

  (although you probably wouldn't compute x^(k+1) using this formula ...)

(a sketch of the iteration for the beacon-ranging example follows below)
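A minimal numpy sketch of the iteration for the beacon-ranging example. The beacon locations and noise here are randomly generated stand-ins, not the data behind the plots that follow:

```python
import numpy as np

def gauss_newton_range(beacons, rho, x0, iters=10):
    """Gauss-Newton for the ranging NLLS: r_i(x) = rho_i - ||x - b_i||."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = x - beacons                    # displacements x - b_i, shape (m, 2)
        dist = np.linalg.norm(d, axis=1)   # ||x - b_i||
        r = rho - dist                     # residual vector r(x)
        Dr = -d / dist[:, None]            # Jacobian: row i is -(x - b_i)^T / ||x - b_i||
        # linearized LS: choose the step minimizing ||Dr*step + r||^2
        step, *_ = np.linalg.lstsq(Dr, -r, rcond=None)
        x = x + step
    return x

rng = np.random.default_rng(1)
beacons = rng.uniform(-5, 5, size=(10, 2))
x_true = np.array([-3.6, 3.2])
rho = np.linalg.norm(x_true - beacons, axis=1) + rng.uniform(-0.5, 0.5, 10)
print(gauss_newton_range(beacons, rho, x0=[1.2, 1.2]))
```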
Gauss-Newton example

- 10 beacons
- true position (-3.6, 3.2); initial guess (1.2, 1.2)
- range estimates accurate to ±0.5

[figure: beacon locations and true position, plotted over the square [-5, 5] × [-5, 5]]
NLLS objective ||r(x)||^2 versus x:

[figure: surface plot of ||r(x)||^2 over [-5, 5] × [-5, 5]]

- for a linear LS problem, the objective would be a nice quadratic bowl
- bumps in the objective are due to the strong nonlinearity of r
objective of Gauss-Newton iterates:

[figure: ||r(x)||^2 versus iteration number, for iterations 1 through 10]

- x^(k) converges to the (in this case, global) minimum of ||r(x)||^2
- convergence takes only five or so steps
- final estimate is x̂ = (-3.3, 3.3)
- estimation error is ||x̂ - x|| = 0.31 (substantially smaller than range accuracy!)
convergence of Gauss-Newton iterates:

[figure: iterates x^(1), ..., x^(6) converging in the plane, over the square [-5, 5] × [-5, 5]]

useful variation on Gauss-Newton: add regularization term

  ||A^(k) x - b^(k)||^2 + µ||x - x^(k)||^2

so that the next iterate is not too far from the previous one (hence, the linearized model is still pretty accurate); a sketch of this step appears below
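In terms of the step d = x - x^(k), the regularized update solves (Dr^T Dr + µI) d = -Dr^T r, since A^(k) x - b^(k) = Dr(x^(k)) d + r(x^(k)). A minimal sketch of this damped step (it is the idea behind the Levenberg-Marquardt method):

```python
import numpy as np

def regularized_gn_step(Dr, r, mu):
    """One damped Gauss-Newton step: minimize ||Dr*d + r||^2 + mu*||d||^2 over d = x - x_k."""
    n = Dr.shape[1]
    return np.linalg.solve(Dr.T @ Dr + mu * np.eye(n), -Dr.T @ r)
```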
Underdetermined linear equations

we consider y = Ax where

- A ∈ R^{m×n} is fat (m < n), i.e., there are more variables than equations
- x is underspecified, i.e., many choices of x lead to the same y

we'll assume that A is full rank (m), so for each y ∈ R^m there is a solution

- set of all solutions has the form

  { x | Ax = y } = { x_p + z | z ∈ N(A) }

  where x_p is any ("particular") solution, i.e., Ax_p = y
- z characterizes the available choices in the solution
- solution has dim N(A) = n - m "degrees of freedom"
- can choose z to satisfy other specs or optimize among solutions

(a numerical illustration of this parametrization follows below)
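A minimal numpy sketch of the solution-set parametrization, using the pseudo-inverse for a particular solution and the SVD for a null-space basis (random data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 7))      # fat: m = 3 equations, n = 7 variables
y = rng.standard_normal(3)

x_p = np.linalg.pinv(A) @ y          # one particular solution
_, _, Vt = np.linalg.svd(A)
Z = Vt[3:].T                         # columns form a basis of N(A), dim n - m = 4

w = rng.standard_normal(4)
print(np.allclose(A @ (x_p + Z @ w), y))   # True: x_p + Z w also solves Ax = y
```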
Least-norm solution

one particular solution is

  x_ln = A^T (AA^T)^{-1} y

(AA^T is invertible since A is full rank)

in fact, x_ln is the solution of y = Ax that minimizes ||x||:

- suppose Ax = y, so A(x - x_ln) = 0 and

  (x - x_ln)^T x_ln = (x - x_ln)^T A^T (AA^T)^{-1} y = (A(x - x_ln))^T (AA^T)^{-1} y = 0

  i.e., (x - x_ln) ⊥ x_ln
- so ||x||^2 = ||x_ln + x - x_ln||^2 = ||x_ln||^2 + ||x - x_ln||^2 ≥ ||x_ln||^2
- i.e., x_ln has the smallest norm of any solution
[figure: solution set { x | Ax = y } and null space N(A) = { x | Ax = 0 }, with x_ln the point of the solution set closest to 0]

- orthogonality condition: x_ln ⊥ N(A)
- projection interpretation: x_ln is the projection of 0 on the solution set { x | Ax = y }
- A^T (AA^T)^{-1} is called the pseudo-inverse of (full rank, fat) A
- A^T (AA^T)^{-1} is a right inverse of A
- least-norm solution via QR factorization: apply Gram-Schmidt to A^T, so A^T = QR; then

  x_ln = A^T (AA^T)^{-1} y = Q R^{-T} y   (where R^{-T} = (R^{-1})^T)

  and ||x_ln|| = ||R^{-T} y||

(a numerical check of both routes follows below)
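A minimal sketch computing the least-norm solution both ways, via the right inverse and via QR of A^T (random data, for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 7))
y = rng.standard_normal(3)

# least-norm solution via the right inverse A^T (A A^T)^{-1}
x_ln = A.T @ np.linalg.solve(A @ A.T, y)

# same solution via QR of A^T: with A^T = QR, x_ln = Q R^{-T} y
Q, R = np.linalg.qr(A.T)             # reduced QR: Q is 7x3, R is 3x3
x_qr = Q @ np.linalg.solve(R.T, y)

print(np.allclose(x_ln, x_qr), np.allclose(A @ x_ln, y))  # True True
```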
Derivation via Lagrange multipliers

- least-norm solution solves the optimization problem

  minimize   x^T x
  subject to Ax = y

- introduce Lagrange multipliers: L(x, λ) = x^T x + λ^T (Ax - y)
- optimality conditions are

  ∂L/∂x = 2x^T + λ^T A = 0,   ∂L/∂λ = (Ax - y)^T = 0

- from the first condition, x = -A^T λ/2
- substitute into the second to get λ = -2(AA^T)^{-1} y
- hence x = A^T (AA^T)^{-1} y
Example: transferring mass unit distance

[figure: unit mass with applied force f(t)]

- unit mass at rest subject to forces x_i for i - 1 < t ≤ i, i = 1, ..., 10
- y_1 is position at t = 10, y_2 is velocity at t = 10
- y = Ax where A ∈ R^{2×10} (A is fat)
- find the least-norm force that transfers the mass unit distance with zero final velocity, i.e., y = (1, 0)

[figure: least-norm force x_ln versus t (a ramp decreasing through zero at t = 5), and the resulting position and velocity versus t]
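A small numpy sketch, assuming the same piecewise-constant force model as before, so the position row of A is a_i = 10.5 - i and the velocity row is all ones (this construction of A is reconstructed from the setup, not given explicitly in the slides):

```python
import numpy as np

i = np.arange(1, 11)
A = np.vstack([10.5 - i,            # row 1: position at t = 10 due to each x_i
               np.ones(10)])        # row 2: final velocity (each x_i adds 1)
y = np.array([1.0, 0.0])            # unit distance, zero final velocity

x_ln = A.T @ np.linalg.solve(A @ A.T, y)
print(x_ln)          # forces form a ramp: accelerate, then symmetrically brake
print(A @ x_ln)      # recovers (1, 0)
```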
Relation to regularized least-squares

- suppose A ∈ R^{m×n} is fat, full rank
- define J_1 = ||Ax - y||^2, J_2 = ||x||^2
- least-norm solution minimizes J_2 with J_1 = 0
- minimizer of weighted-sum objective J_1 + µJ_2 = ||Ax - y||^2 + µ||x||^2 is

  x_µ = (A^T A + µI)^{-1} A^T y

- fact: x_µ → x_ln as µ → 0, i.e., the regularized solution converges to the least-norm solution as µ → 0
- in matrix terms: as µ → 0,

  (A^T A + µI)^{-1} A^T → A^T (AA^T)^{-1}

  (for full rank, fat A)

(a numerical demonstration follows below)
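A minimal sketch demonstrating the convergence numerically (random data, for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 7))
y = rng.standard_normal(3)

x_ln = A.T @ np.linalg.solve(A @ A.T, y)
for mu in [1e-1, 1e-3, 1e-6]:
    x_mu = np.linalg.solve(A.T @ A + mu * np.eye(7), A.T @ y)
    print(f"mu = {mu:g}:  ||x_mu - x_ln|| = {np.linalg.norm(x_mu - x_ln):.2e}")
```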