Matrix Secant Methods


Equation Solving: $g(x) = 0$

Newton-Like Iterations: $x_{k+1} := x_k - J_k\, g(x_k)$, where $J_k \approx g'(x_k)^{-1}$.

3700 years ago the Babylonians used the secant method in 1D:

$$J_k = \frac{x_k - x_{k-1}}{g(x_k) - g(x_{k-1})}.$$

This gives the error

$$g'(x_k)^{-1} - J_k = \frac{g(x_{k-1}) - \bigl[g(x_k) + g'(x_k)(x_{k-1} - x_k)\bigr]}{g'(x_k)\bigl[g(x_{k-1}) - g(x_k)\bigr]}.$$

If $g'(\bar x) \neq 0$, then there exists $\alpha > 0$ such that

$$\bigl|g'(x_k)^{-1} - J_k\bigr| \;\le\; \frac{(L/2)\,|x_{k-1} - x_k|^2}{\alpha\,|g'(x_k)|\,|x_{k-1} - x_k|} \;\le\; K\,|x_{k-1} - x_k|$$

by the Quadratic Bound Lemma, so $x_k \to \bar x$ at a 2-step quadratic rate.
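
For reference, the 1D iteration looks like this in code (a minimal sketch; the function `secant`, the test problem, and the starting points are illustrative choices, not from the slides):

```python
def secant(g, x0, x1, tol=1e-12, max_iter=50):
    """Classical 1D secant method: x_{k+1} = x_k - J_k g(x_k) with the secant slope J_k."""
    for _ in range(max_iter):
        J = (x1 - x0) / (g(x1) - g(x0))   # secant approximation to g'(x_k)^{-1}
        x0, x1 = x1, x1 - J * g(x1)
        if abs(g(x1)) < tol:
            break
    return x1

# Babylonian-style example: a root of g(x) = x^2 - 2.
print(secant(lambda x: x * x - 2.0, 1.0, 2.0))   # approx 1.41421356...
```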

Can we apply the secant method in dimensions higher than 1? In 1D,

$$J_k = \frac{x_k - x_{k-1}}{g(x_k) - g(x_{k-1})}.$$

Multiplying on the right by $g(x_k) - g(x_{k-1})$ gives

$$J_k\,\bigl(g(x_k) - g(x_{k-1})\bigr) = x_k - x_{k-1}.$$

This is called the matrix secant equation (MSE), or quasi-Newton equation. Here the matrix $J_k$ is unknown ($n^2$ unknowns) and must satisfy the $n$ linear equations of the MSE. There are not enough equations to nail down all of the entries of $J_k$, so further conditions are required. What should they be?
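
The underdetermination is easy to see numerically. The following sketch (assuming NumPy; `y` and `s` are arbitrary stand-ins for $g(x_k) - g(x_{k-1})$ and $x_k - x_{k-1}$) exhibits two different matrices satisfying the same MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
y = rng.standard_normal(n)          # stands in for g(x_k) - g(x_{k-1})
s = rng.standard_normal(n)          # stands in for x_k - x_{k-1}

# One solution of J y = s: a rank-one "secant" matrix.
J1 = np.outer(s, y) / (y @ y)

# Adding any matrix whose rows are orthogonal to y leaves the MSE intact.
q = rng.standard_normal(n)
q -= (q @ y) / (y @ y) * y          # make q orthogonal to y
J2 = J1 + np.outer(rng.standard_normal(n), q)

print(np.allclose(J1 @ y, s), np.allclose(J2 @ y, s))   # True True
print(np.allclose(J1, J2))                               # False
```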

To better understand what further conditions on $J_k$ are sensible, we revert to discussing the matrices $B_k = J_k^{-1}$, so the MSE becomes

$$B_k\,(x_k - x_{k-1}) = g(x_k) - g(x_{k-1}).$$

Key Idea: Think of the matrices $B_k$ as part of an iteration scheme $\{x_k, B_k\}$. As the iteration proceeds, it is hoped that $B_k$ becomes a better approximation to $g'(\bar x)$. With this in mind, we assume that, in the long run, $B_k$ is already close to $g'(\bar x)$, so $B_{k+1}$ should not differ from $B_k$ by too much, i.e. $B_k$ should be close to $B_{k+1}$.

What do we mean by "close" in this context? Should $\|B_k - B_{k+1}\|$ be small? We take "close" to mean algebraically close, in the sense that $B_{k+1}$ should be easy to compute from $B_k$ and yet still satisfy the MSE. Let's start by assuming that $B_{k+1}$ is a rank-one update to $B_k$. That is, there exist $u, v \in \mathbb{R}^n$ such that

$$B_{k+1} = B_k + u v^T.$$

Set $s_k := x_{k+1} - x_k$ and $y_k := g(x_{k+1}) - g(x_k)$. The MSE becomes $B_{k+1} s_k = y_k$. Multiplying the rank-one update by $s_k$ gives

$$y_k = B_{k+1} s_k = B_k s_k + u\, v^T s_k.$$

Hence, if $v^T s_k \neq 0$, we obtain

$$u = \frac{y_k - B_k s_k}{v^T s_k}$$

and

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, v^T}{v^T s_k}.$$

This equation determines a whole class of rank-one updates that satisfy the MSE, one for each choice of $v \in \mathbb{R}^n$ with $v^T s_k \neq 0$. One choice is $v = s_k$, giving

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}.$$

This is known as Broyden's update. It turns out that the Broyden update is also analytically close to $B_k$.
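
A quick numerical sanity check (a sketch assuming NumPy; `B`, `s`, `y` are arbitrary test data rather than quantities from an actual iteration) confirms that the Broyden update satisfies the MSE:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))     # current approximation B_k
s = rng.standard_normal(n)          # step s_k
y = rng.standard_normal(n)          # function difference y_k

# Broyden's rank-one update.
B_next = B + np.outer(y - B @ s, s) / (s @ s)

print(np.allclose(B_next @ s, y))   # True: the MSE B_{k+1} s_k = y_k holds
```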

Theorem. Let $A \in \mathbb{R}^{n\times n}$ and $s, y \in \mathbb{R}^n$ with $s \neq 0$. Then for any pair of matrix norms $\|\cdot\|$ and $\|\cdot\|'$ such that $\|MN\| \le \|M\|\,\|N\|'$ for all $M, N \in \mathbb{R}^{n\times n}$ and $\left\|\dfrac{vv^T}{v^Tv}\right\|' \le 1$ for all $v \neq 0$, the solution to

$$\min\;\{\,\|B - A\| : Bs = y\,\}$$

is

$$A_+ = A + \frac{(y - As)\,s^T}{s^Ts}.$$

In particular, $A_+$ solves the given optimization problem when $\|\cdot\|$ is the $\ell_2$ matrix norm, and $A_+$ is the unique solution when $\|\cdot\|$ is the Frobenius norm.

Proof: Let $B \in \{B \in \mathbb{R}^{n\times n} : Bs = y\}$. Then, since $Bs = y$,

$$\|A_+ - A\| = \left\|\frac{(y - As)\,s^T}{s^Ts}\right\| = \left\|\frac{(B - A)\,ss^T}{s^Ts}\right\| \le \|B - A\|\,\left\|\frac{ss^T}{s^Ts}\right\|' \le \|B - A\|.$$
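
The Frobenius-norm minimality can also be illustrated numerically (a sketch assuming NumPy, with randomly generated test data; the projector construction is just one way to generate other feasible matrices and is not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
s = rng.standard_normal(n)
y = rng.standard_normal(n)

A_plus = A + np.outer(y - A @ s, s) / (s @ s)

# Any feasible B can be written A_plus + E with E s = 0; build a few such E.
P = np.eye(n) - np.outer(s, s) / (s @ s)         # projector onto the complement of s
for _ in range(3):
    E = rng.standard_normal((n, n)) @ P           # rows of E are orthogonal to s
    B = A_plus + E
    assert np.allclose(B @ s, y)                  # B is feasible
    print(np.linalg.norm(A_plus - A) <= np.linalg.norm(B - A) + 1e-12)   # True
```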

Broyden's Methods

Algorithm: Broyden's Method
Initialization: $x_0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n\times n}$.
Having $(x_k, B_k)$, compute $(x_{k+1}, B_{k+1})$ as follows:
Solve $B_k s_k = -g(x_k)$ for $s_k$ and set
$x_{k+1} := x_k + s_k$
$y_k := g(x_{k+1}) - g(x_k)$
$B_{k+1} := B_k + \dfrac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$.
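
A compact sketch of this algorithm in Python (NumPy assumed; the helper `broyden`, the tolerance, the test problem, and the choice of $B_0$ as the Jacobian at $x_0$ are illustrative, not part of the slides):

```python
import numpy as np

def broyden(g, x0, B0, tol=1e-10, max_iter=100):
    """Broyden's method for g(x) = 0 with direct updating of B_k."""
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float)
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) < tol:
            break
        s = np.linalg.solve(B, -gx)                 # solve B_k s_k = -g(x_k)
        x_new = x + s
        y = g(x_new) - gx
        B = B + np.outer(y - B @ s, s) / (s @ s)    # Broyden update
        x = x_new
    return x

# Illustrative test problem: intersection of the unit circle with the line x1 = x2.
g = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
B0 = np.array([[2.0, 1.0], [1.0, -1.0]])            # Jacobian of g at x0 = (1, 0.5)
print(broyden(g, x0=[1.0, 0.5], B0=B0))             # approx [0.7071, 0.7071]
```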

Sherman-Morrison-Woodbury Formula for Matrix Inversion

Suppose $A \in \mathbb{R}^{n\times n}$ and $U, V \in \mathbb{R}^{n\times k}$ are such that both $A^{-1}$ and $(I + V^T A^{-1} U)^{-1}$ exist. Then

$$(A + UV^T)^{-1} = A^{-1} - A^{-1}U\,(I + V^TA^{-1}U)^{-1}\,V^TA^{-1}.$$
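
A quick numerical check of the formula (a sketch assuming NumPy; the dimensions and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably invertible
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ V.T)
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.eye(k) + V.T @ Ainv @ U) @ V.T @ Ainv
print(np.allclose(lhs, rhs))   # True
```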

Inverse Broyden Update

If $B_k^{-1} = J_k$ exists and $s_k^T J_k y_k = s_k^T B_k^{-1} y_k \neq 0$, then

$$J_{k+1} = \left[B_k + \frac{(y_k - B_k s_k)\,s_k^T}{s_k^T s_k}\right]^{-1} = B_k^{-1} + \frac{(s_k - B_k^{-1} y_k)\, s_k^T B_k^{-1}}{s_k^T B_k^{-1} y_k} = J_k + \frac{(s_k - J_k y_k)\, s_k^T J_k}{s_k^T J_k y_k}.$$

Under suitable hypotheses, $x_k \to \bar x$ superlinearly.
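
One can confirm numerically that the inverse update reproduces the inverse of the directly updated matrix (a sketch with arbitrary test data, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
B = rng.standard_normal((n, n)) + n * np.eye(n)    # invertible B_k
s = rng.standard_normal(n)
y = rng.standard_normal(n)

J = np.linalg.inv(B)
B_next = B + np.outer(y - B @ s, s) / (s @ s)              # direct Broyden update
J_next = J + np.outer(s - J @ y, s @ J) / (s @ J @ y)      # inverse Broyden update
print(np.allclose(J_next, np.linalg.inv(B_next)))          # True
```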

Broyden Updates

Algorithm: Broyden's Method (Inverse Updating)
Initialization: $x_0 \in \mathbb{R}^n$, $J_0 \in \mathbb{R}^{n\times n}$.
Having $(x_k, J_k)$, compute $(x_{k+1}, J_{k+1})$ as follows:
$s_k := -J_k\, g(x_k)$
$x_{k+1} := x_k + s_k$
$y_k := g(x_{k+1}) - g(x_k)$
$J_{k+1} := J_k + \dfrac{(s_k - J_k y_k)\, s_k^T J_k}{s_k^T J_k y_k}$.
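
The inverse-updating variant replaces the linear solve at each step with a matrix-vector product. A minimal sketch under the same assumptions as before (NumPy; illustrative names and test problem, with $J_0$ taken as the inverse Jacobian at $x_0$):

```python
import numpy as np

def broyden_inverse(g, x0, J0, tol=1e-10, max_iter=100):
    """Broyden's method for g(x) = 0, maintaining J_k ~ g'(x_k)^{-1} directly."""
    x = np.asarray(x0, dtype=float)
    J = np.asarray(J0, dtype=float)
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) < tol:
            break
        s = -J @ gx                                          # s_k := -J_k g(x_k), no linear solve
        x_new = x + s
        y = g(x_new) - gx
        J = J + np.outer(s - J @ y, s @ J) / (s @ J @ y)     # inverse Broyden update
        x = x_new
    return x

# Same illustrative test problem as before; J0 is the inverse Jacobian of g at x0 = (1, 0.5).
g = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J0 = np.linalg.inv(np.array([[2.0, 1.0], [1.0, -1.0]]))
print(broyden_inverse(g, x0=[1.0, 0.5], J0=J0))              # approx [0.7071, 0.7071]
```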