Convex Optimization CMU-10725


Convex Optimization CMU-10725: Quasi-Newton Methods. Barnabás Póczos & Ryan Tibshirani

Quasi-Newton Methods

Outline
- Modified Newton Method
- Rank one correction of the inverse
- Rank two correction of the inverse
- Davidon-Fletcher-Powell Method (DFP)
- Broyden-Fletcher-Goldfarb-Shanno Method (BFGS)

Books to Read
- David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming
- Nesterov: Introductory Lectures on Convex Optimization
- Bazaraa, Sherali, Shetty: Nonlinear Programming
- Dimitri P. Bertsekas: Nonlinear Programming

Motivation: evaluating and using the Hessian matrix is often impractical or costly. Idea: use an approximation to the inverse Hessian instead. Quasi-Newton methods sit somewhere between steepest descent and Newton's method.

Modified Newton Method [Method of Deflected Gradients]. Goal: minimize f(x). The gradient descent, Newton, and modified Newton update rules, together with the special cases, are reconstructed below.
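The formulas on this slide did not survive the transcription; the standard forms, following Luenberger & Ye, are presumably:

Gradient descent:   x_{k+1} = x_k - \alpha_k \nabla f(x_k)
Newton's method:    x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)
Modified Newton:    x_{k+1} = x_k - \alpha_k S_k \nabla f(x_k),  with S_k symmetric positive definite

Special cases: S_k = I recovers gradient descent; S_k = [\nabla^2 f(x_k)]^{-1} recovers Newton's method (up to the step size).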

Modified Newton Method. Lemma [descent direction]: see the reconstruction below. Proof: we know that if a vector has negative inner product with the gradient vector, then that direction is a descent direction.
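The lemma statement itself was not transcribed; it is presumably the standard one:

Lemma: if S_k \succ 0 and \nabla f(x_k) \neq 0, then d_k = -S_k \nabla f(x_k) is a descent direction, since
\nabla f(x_k)^T d_k = -\nabla f(x_k)^T S_k \nabla f(x_k) < 0.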

Quadratic problem. Modified Newton method update rule. Lemma [\alpha_k in quadratic problems].

Quadratic problem. Lemma [\alpha_k in quadratic problems]. Proof [\alpha_k].
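The quadratic setup and the step-size formula were not transcribed; under the usual assumptions (f(x) = \frac{1}{2} x^T Q x - b^T x with Q \succ 0, and g_k = Q x_k - b) they read:

Update rule:        x_{k+1} = x_k - \alpha_k S_k g_k
Exact line search:  \alpha_k = \frac{g_k^T S_k g_k}{g_k^T S_k Q S_k g_k}

One-line derivation: set \frac{d}{d\alpha} f(x_k - \alpha S_k g_k) = -(g_k - \alpha Q S_k g_k)^T S_k g_k = 0 and solve for \alpha.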

Convergence rate (quadratic case). Theorem [convergence rate of the modified Newton method]: for the modified Newton method, the bound below holds at every step k. Corollary: if S_k^{-1} is close to Q, then b_k is close to B_k, and convergence is fast. Proof: omitted for lack of time.
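The bound was not transcribed; the statement in Luenberger & Ye's treatment of the modified Newton method, which is presumably what the slide shows, is: with E(x) = \frac{1}{2}(x - x^*)^T Q (x - x^*), and b_k, B_k the smallest and largest eigenvalues of S_k Q,

E(x_{k+1}) \le \left( \frac{B_k - b_k}{B_k + b_k} \right)^2 E(x_k).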

Classical modified Newton's method: the Hessian at the initial point x_0 is used throughout the process, i.e. S_k = [\nabla^2 f(x_0)]^{-1} for every k. Its effectiveness depends on how fast the Hessian is changing.

Construction of the inverse of the Hessian

Construction of the inverse. Idea behind quasi-Newton methods: construct an approximation of the inverse Hessian using information gathered during the process. We show how the inverse Hessian can be built up from gradient information obtained at various points. The notation and the quadratic-case identity are reconstructed below.
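The notation did not survive the transcription; the standard choices, consistent with the rest of the deck, are:

p_k = x_{k+1} - x_k,   q_k = g_{k+1} - g_k,   where g_k = \nabla f(x_k).

In the quadratic case f(x) = \frac{1}{2} x^T Q x - b^T x:  q_k = Q p_k, equivalently p_k = Q^{-1} q_k.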

Quadratic case: construction of the inverse. Goal: stated below.
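A reconstruction of the goal, presumably the standard secant requirement: find H_{k+1} such that

H_{k+1} q_i = p_i  for  i = 0, 1, \dots, k.

If p_0, \dots, p_{n-1} are linearly independent, any H_n with this property must equal Q^{-1}.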

Symmetric rank one correction (SR1). We want an update of H_k such that the secant condition below holds, and we look for the update in the form of a rank one correction. Theorem [rank one update of H_k]: see the formula below.
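The missing equations are presumably the secant condition, the rank one ansatz, and the resulting SR1 formula:

Condition:  H_{k+1} q_k = p_k.
Ansatz:     H_{k+1} = H_k + a_k z_k z_k^T.
Theorem [rank one update of H_k]:

H_{k+1} = H_k + \frac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{q_k^T (p_k - H_k q_k)}.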

Symmetric rank one correction (SR1). Proof: we already know that H_{k+1} must satisfy H_{k+1} q_k = p_k. Applying the claimed update to q_k gives H_k q_k + (p_k - H_k q_k) = p_k, so the condition holds. Q.E.D.

Symmetric rank one correction (SR1). We still have to prove that this update works well for us. Theorem [the H_k update works]. Corollary.
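The statements were not transcribed; for a quadratic objective with Hessian Q they are presumably:

Theorem: with the SR1 update, H_{k+1} q_i = p_i for all i = 0, \dots, k, i.e. previously satisfied secant equations are preserved.
Corollary: if p_0, \dots, p_{n-1} are linearly independent, then H_n = Q^{-1}.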

Symmetric rank one correction (SR1). Algorithm [modified Newton method with rank one correction]: a sketch and its practical issues are shown below.
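As an illustration, here is a minimal Python sketch of the modified Newton iteration with the SR1 correction. The function name sr1_minimize, the Armijo backtracking line search, and the safeguards are my own additions, not part of the lecture; the comments flag the two classic SR1 issues: the update denominator can vanish, and H_k need not stay positive definite.

import numpy as np

def sr1_minimize(f, grad, x0, max_iter=100, tol=1e-8):
    # Illustrative sketch: quasi-Newton iteration with an SR1 update of the
    # inverse-Hessian estimate H (f and grad are assumed to be callables).
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)              # initial inverse-Hessian estimate
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                  # quasi-Newton direction
        if g @ d >= 0:              # SR1 issue: H may lose positive definiteness,
            d = -g                  # so fall back to the gradient direction
        alpha, f_x = 1.0, f(x)
        while f(x + alpha * d) > f_x + 1e-4 * alpha * (g @ d):  # Armijo backtracking
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        p, q = x_new - x, g_new - g
        r = p - H @ q
        denom = r @ q
        # SR1 issue: the denominator can vanish; skip the update when it is tiny
        if abs(denom) > 1e-8 * np.linalg.norm(r) * np.linalg.norm(q):
            H = H + np.outer(r, r) / denom
        x, g = x_new, g_new
    return x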

Davidon-Fletcher-Powell Method [rank two correction]
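The rank two correction formula itself was not transcribed; the standard DFP update is:

H_{k+1} = H_k + \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k q_k^T H_k}{q_k^T H_k q_k}.

The two correction terms are each symmetric and of rank one, matching the description on the next slide.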

Davidon-Fletcher-Powell Method. For a quadratic objective, it simultaneously generates the directions of the conjugate gradient method while constructing the inverse Hessian. At each step the inverse Hessian is updated by the sum of two symmetric rank one matrices [rank two correction procedure]. The method is also often referred to as the variable metric method.

Davidon-Fletcher-Powell Method
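The body of this slide did not survive the transcription; the usual DFP iteration (as in Luenberger & Ye), starting from some x_0 and a symmetric positive definite H_0 (e.g. H_0 = I), is:

1. d_k = -H_k g_k
2. \alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha d_k)  (line search)
3. x_{k+1} = x_k + \alpha_k d_k,  p_k = \alpha_k d_k
4. g_{k+1} = \nabla f(x_{k+1}),  q_k = g_{k+1} - g_k
5. update H_{k+1} by the DFP formula above, and repeat.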

Davidon-Fletcher-Powell Method. Theorem [H_k is positive definite]. Theorem [DFP is a conjugate direction method]. Corollary [finite step convergence for quadratic functions].
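The statements were not transcribed; presumably they are the standard DFP results:

Theorem (positive definiteness): if H_k \succ 0 and p_k^T q_k > 0 (which an exact line search guarantees), then H_{k+1} \succ 0.
Theorem (conjugate directions): for a quadratic with Hessian Q \succ 0 and exact line searches, the DFP directions d_0, \dots, d_{n-1} are Q-conjugate and H_n = Q^{-1}.
Corollary: DFP minimizes an n-dimensional quadratic in at most n steps.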

Broyden-Fletcher-Goldfarb-Shanno. In DFP, at each step the inverse Hessian is updated by the sum of two symmetric rank one matrices. In BFGS we estimate the Hessian Q instead of its inverse. In the quadratic case we already proved that q_k = Q p_k, and to estimate H = Q^{-1} we used updates enforcing H_{k+1} q_k = p_k. Therefore, if we switch q and p, Q can be estimated in the same way by a matrix Q_k.

BFGS. Similarly, the DFP update rule for H, with q and p switched, can also be used to estimate Q. In the minimization algorithm, however, we need an estimator of Q^{-1}. To get an update for H_{k+1}, we apply the Sherman-Morrison formula twice; the resulting formulas are given below.
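The update formulas were not transcribed; in the deck's notation (Q_k for the Hessian estimate, H_k for the inverse estimate) the standard BFGS formulas are presumably:

Hessian estimate (DFP formula with p and q switched):
Q_{k+1} = Q_k + \frac{q_k q_k^T}{q_k^T p_k} - \frac{Q_k p_k p_k^T Q_k}{p_k^T Q_k p_k}.

Applying the Sherman-Morrison formula twice to invert this rank two update yields the BFGS update of the inverse estimate:
H_{k+1} = H_k + \left(1 + \frac{q_k^T H_k q_k}{q_k^T p_k}\right) \frac{p_k p_k^T}{p_k^T q_k} - \frac{p_k q_k^T H_k + H_k q_k p_k^T}{q_k^T p_k}.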

Sherman-Morrison matrix inversion formula
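The formula itself was not transcribed; for reference:

(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u},  valid whenever 1 + v^T A^{-1} u \neq 0.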

BFGS Algorithm. BFGS is almost the same as DFP; only the H update is different. In practice BFGS seems to work better than DFP.
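To make the comparison with DFP concrete, here is a minimal Python sketch of the BFGS iteration using the inverse update above. The function name bfgs_minimize, the Armijo backtracking line search, and the curvature safeguard are my own illustrative choices, not part of the lecture.

import numpy as np

def bfgs_minimize(f, grad, x0, max_iter=200, tol=1e-8):
    # Illustrative sketch: quasi-Newton iteration with the BFGS update of the
    # inverse-Hessian estimate H (f and grad are assumed to be callables).
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                  # quasi-Newton search direction
        alpha, f_x = 1.0, f(x)
        while f(x + alpha * d) > f_x + 1e-4 * alpha * (g @ d):  # Armijo backtracking
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        p, q = x_new - x, g_new - g
        pq = p @ q
        if pq > 1e-12:              # curvature condition keeps H positive definite
            Hq = H @ q
            H = (H + (1.0 + (q @ Hq) / pq) * np.outer(p, p) / pq
                   - (np.outer(p, Hq) + np.outer(Hq, p)) / pq)
        x, g = x_new, g_new
    return x

# Example use on a simple convex quadratic (illustrative only):
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_star = bfgs_minimize(f, grad, np.array([5.0, -3.0]))   # converges to the origin

The only line that changes for DFP is the H update; in both methods the curvature check p^T q > 0 is what keeps H positive definite.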

Summary
- Modified Newton Method
- Rank one correction of the inverse
- Rank two correction of the inverse
- Davidon-Fletcher-Powell Method (DFP)
- Broyden-Fletcher-Goldfarb-Shanno Method (BFGS)