An Iterative Descent Method


Conjugate Gradient: An Iterative Descent Method

The Plan: review iterative descent, then introduce the Conjugate Gradient method.

Review: Iterative Descent. Iterative descent is an unconstrained optimization process x(k+1) = x(k) + αΔx, where Δx is the descent direction, α > 0 is the step size along the descent direction, and an initial point x(0) must be chosen.
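As a reference point, this update rule can be written as one small generic loop (a sketch assuming NumPy; `grad` and `step` are illustrative callables, not anything defined in the slides):

```python
import numpy as np

def iterative_descent(grad, step, x0, tol=1e-8, max_iter=1000):
    """Generic descent loop: x(k+1) = x(k) + alpha * dx.

    `step(x, g)` returns (dx, alpha) -- the descent direction and step size --
    and is what distinguishes one descent method from another.
    """
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop when the gradient is small
            break
        dx, alpha = step(x, g)
        x = x + alpha * dx
    return x
```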

Review: Descent Direction. Gradient Descent (1st-order approximation): Δx = −∇f(x). Newton's Method (2nd-order approximation): Δx = −[∇²f(x)]⁻¹ ∇f(x).
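Plugged into the loop above, these two directions correspond to two choices of `step` (again only a sketch; `hessian` is an assumed callable returning ∇²f(x), and the fixed alpha is illustrative):

```python
import numpy as np

def gradient_descent_step(alpha=0.1):
    """Delta x = -grad f(x), with a fixed step size alpha."""
    def step(x, g):
        return -g, alpha
    return step

def newton_step(hessian):
    """Delta x = -[grad^2 f(x)]^{-1} grad f(x); alpha = 1 is the pure Newton step."""
    def step(x, g):
        dx = np.linalg.solve(hessian(x), -g)   # solve a linear system rather than invert
        return dx, 1.0
    return step
```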

Review: Calculating α. Line search methods for calculating α: exact, backtracking, and others.
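For instance, backtracking can be sketched as follows (the shrink factor rho and constant c are conventional defaults, not values from the slides):

```python
import numpy as np

def backtracking_alpha(f, grad, x, dx, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the sufficient-decrease (Armijo) condition holds:
    f(x + alpha*dx) <= f(x) + c * alpha * grad(x)^T dx."""
    alpha = alpha0
    g = grad(x)
    for _ in range(50):                      # cap the number of shrinks
        if f(x + alpha * dx) <= f(x) + c * alpha * (g @ dx):
            break
        alpha *= rho
    return alpha
```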

Review: Convergence. Gradient Descent: linear rate. Newton's Method: quadratic rate near x* if certain conditions are met; otherwise, linear rate.

Conjugate Gradient (CG) is an alternative iterative descent algorithm.

CG: Focus. To more easily explain CG, I will focus on minimizing the quadratic function f(x) = ½xᵀAx − bᵀx + c.

CG: Assumptions. A is square, symmetric, and positive definite. Theorem: x* minimizes f(x) = ½xᵀAx − bᵀx + c iff x* solves Ax = b. So we will focus on solving Ax = b.
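A one-line justification of the theorem, using the symmetry of A:

```latex
f(x) = \tfrac{1}{2}x^{\mathsf T}Ax - b^{\mathsf T}x + c
\quad\Longrightarrow\quad
\nabla f(x) = Ax - b, \qquad \nabla^2 f(x) = A .
```

Setting ∇f(x*) = 0 gives Ax* = b, and since the Hessian A is positive definite, this stationary point is the unique minimizer.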

CG: Philosophy. Avoid taking steps in the same direction, as Gradient Descent can. Using the coordinate axes as search directions only works if you already know the answer! So CG, instead of orthogonality, uses A-orthogonality.

CG: A-orthogonality. For descent directions, the CG method uses an A-orthogonal set of nonzero search direction vectors {v(1), …, v(n)}: ⟨v(i), Av(j)⟩ = 0 if i ≠ j. Each search direction is used exactly once, and a step size of just the right length is applied in order to line up with x*.
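As a quick illustration (a sketch assuming NumPy; the function name is mine), A-orthogonality of a set of direction vectors can be checked directly:

```python
import numpy as np

def is_A_orthogonal(V, A, tol=1e-10):
    """Check <v(i), A v(j)> = 0 for all i != j, where the columns of V are the v(i)."""
    G = V.T @ A @ V                          # Gram matrix in the A-inner product
    off_diagonal = G - np.diag(np.diag(G))
    return bool(np.all(np.abs(off_diagonal) < tol))
```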

CG: A-orthogonality. Note: with respect to a symmetric, positive definite A, A-orthogonal vectors are linearly independent. So, they can be used as basis vectors for components of the algorithm, such as errors and residuals.

CG: General Idea. At each step, eliminate the component of the error (e(k) = x(k) − x*) associated with one A-orthogonal basis vector. The new error, e(k+1), is then A-orthogonal to all of the basis vectors considered so far.
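Concretely, choosing t_k so that the new error has no component along v(k), and using r(k−1) = b − Ax(k−1) = −Ae(k−1), gives

```latex
e^{(k)} = e^{(k-1)} + t_k v^{(k)}, \qquad
\langle v^{(k)}, A e^{(k)} \rangle = 0
\;\Longrightarrow\;
t_k = -\frac{\langle v^{(k)}, A e^{(k-1)} \rangle}{\langle v^{(k)}, A v^{(k)} \rangle}
    = \frac{\langle v^{(k)}, r^{(k-1)} \rangle}{\langle v^{(k)}, A v^{(k)} \rangle},
```

which is computable without knowing x*. This is the sense in which the step size is "generated, not searched for" in the algorithm below.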

CG: Algorithm
1. Choose an initial point x(0).
2. Generate the first search direction v(1).
3. For k = 1, 2, …, n:
   1. Generate (do not search for) the step size t_k.
   2. Set x(k) = x(k-1) + t_k v(k).
   3. Generate the next A-orthogonal search direction v(k+1).
4. x(n) is the solution to Ax = b.

CG: Algorithm
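In concrete terms, this is the standard linear CG iteration. A minimal sketch (assuming NumPy; this common variant takes the first search direction to be the initial residual, with each β chosen so that successive directions stay A-orthogonal):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve Ax = b for symmetric positive definite A; at most n steps in exact arithmetic."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x                      # residual r(0) = b - A x(0)
    if np.linalg.norm(r) < tol:
        return x
    v = r.copy()                       # first search direction v(1) = r(0)
    max_iter = n if max_iter is None else max_iter
    for _ in range(max_iter):
        Av = A @ v
        t = (r @ r) / (v @ Av)         # step size t_k, generated rather than searched for
        x = x + t * v                  # x(k) = x(k-1) + t_k v(k)
        r_new = r - t * Av             # updated residual
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)   # keeps v(k+1) A-orthogonal to v(k)
        v = r_new + beta * v               # next search direction
        r = r_new
    return x
```

For example, on the small SPD system A = [[4, 1], [1, 3]], b = [1, 2], this reaches the exact solution (1/11, 7/11) ≈ (0.0909, 0.6364) in two iterations, matching the "n steps" claim.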

CG: Notes. We generate the A-orthogonal vectors as we iterate and don't have to keep the old ones around, which is important for storage on large problems. Convergence takes fewer than n iterations if A has repeated eigenvalues.

CG: Notes. For a well-conditioned matrix A, a good approximation can sometimes be reached in far fewer than n steps. CG is a good approximation method for solving large sparse systems Ax = b with the nonzero entries of A occurring in predictable patterns.

CG: Reality Intrudes. Accumulated roundoff error causes the residual, r(i) = b − Ax(i), to gradually lose accuracy, and cancellation error causes the search vectors to lose A-orthogonality. If A is ill-conditioned, CG is highly subject to rounding errors; we can precondition the matrix to ameliorate this.

CG: Nonlinear f(x). CG can be used whenever the gradient ∇f(x) can be computed. Use of CG in this case is guided by the idea that near the solution point every problem is approximately quadratic. So, instead of approximating a solution to the actual problem, we approximate a solution to an approximating (quadratic) problem.

CG: Nonlinear f(x). Convergence behavior is similar to that for the pure quadratic situation. Changes to the CG algorithm: the recursive formula for the residual r can't be used (the residual becomes −∇f(x)); the step size t_k is more complicated to compute (it now requires a line search); and there are multiple ways to compute the A-orthogonal descent directions. CG is not guaranteed to converge to the global minimum if f has many local minima.
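Two standard ways to generate the successive directions in the nonlinear case (these particular formulas are not named on the slide) are Fletcher–Reeves and Polak–Ribière:

```latex
v^{(k+1)} = -\nabla f(x^{(k)}) + \beta_k v^{(k)}, \qquad
\beta_k^{\mathrm{FR}} =
  \frac{\nabla f(x^{(k)})^{\mathsf T}\,\nabla f(x^{(k)})}
       {\nabla f(x^{(k-1)})^{\mathsf T}\,\nabla f(x^{(k-1)})}, \qquad
\beta_k^{\mathrm{PR}} =
  \frac{\nabla f(x^{(k)})^{\mathsf T}\bigl(\nabla f(x^{(k)}) - \nabla f(x^{(k-1)})\bigr)}
       {\nabla f(x^{(k-1)})^{\mathsf T}\,\nabla f(x^{(k-1)})}.
```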

CG: Nonlinear f(x). Nonlinear CG doesn't have the same convergence guarantees as linear CG. Because CG can only generate n A-orthogonal vectors in n-space, it makes sense to restart CG every n iterations. Preconditioning can be used to speed up convergence.
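Putting these pieces together, a minimal Fletcher–Reeves sketch with a backtracking line search and a restart every n iterations might look like the following (function names, defaults, and the safeguard are illustrative, not from the slides):

```python
import numpy as np

def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=1000):
    """Fletcher-Reeves nonlinear CG with backtracking and periodic restarts."""

    def backtracking(x, g, d, alpha=1.0, rho=0.5, c=1e-4):
        # Shrink alpha until the sufficient-decrease (Armijo) condition holds.
        for _ in range(50):
            if f(x + alpha * d) <= f(x) + c * alpha * (g @ d):
                break
            alpha *= rho
        return alpha

    x = x0.astype(float)
    n = x.size
    g = grad(x)
    d = -g                                   # start with steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                       # safeguard: not a descent direction
            d = -g
        alpha = backtracking(x, g, d)        # step size now needs a line search
        x = x + alpha * d
        g_new = grad(x)
        if (k + 1) % n == 0:
            d = -g_new                       # restart every n iterations
        else:
            beta = (g_new @ g_new) / (g @ g) # Fletcher-Reeves beta
            d = -g_new + beta * d
        g = g_new
    return x
```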

CG: Roundup. CG typically performs better than gradient descent, but not as well as Newton's method. However, CG avoids Newton's information requirements associated with the evaluation, storage, and inversion of the Hessian (or at least the solution of a corresponding system of equations).

References: Boyd & Vandenberghe, Convex Optimization; Burden & Faires, Numerical Analysis, 8th edition; Chong & Zak, An Introduction to Optimization, 1st edition; Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, Edition 1¼ (online); Luenberger, Linear and Nonlinear Programming, 3rd edition.