Convex Optimization CMU-10725 Newton Method Barnabás Póczos & Ryan Tibshirani

Administrivia: scribing, projects, HW1 solutions, feedback about lectures / solutions on Blackboard

Books to read: Boyd and Vandenberghe: Convex Optimization, Chapter 9.5; Nesterov: Introductory Lectures on Convex Optimization; Bazaraa, Sherali, Shetty: Nonlinear Programming; Dimitri P. Bertsekas: Nonlinear Programming; Wikipedia; http://www.chiark.greenend.org.uk/~sgtatham/newton/

Goal of this lecture: Newton's method: finding a root; unconstrained minimization; motivation with quadratic approximation; rate of Newton's method; Newton fractals. Next lectures: conjugate gradients, quasi-Newton methods

Newton method for finding a root

Newton method for finding a root. Newton's method was originally developed for finding a root of a function; it is also known as the Newton-Raphson method.

History. Babylonian method for computing the square root of S: x_{k+1} = (x_k + S/x_k)/2. This is a special case of Newton's method (applied to f(x) = x^2 - S). Example: S = 100, with starting values x_0 = 50, x_0 = 1, and x_0 = 5.
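For the S = 100 case, starting from x_0 = 50, the recursion gives (rounded):

```latex
x_0 = 50,\quad x_1 = 26,\quad x_2 \approx 14.92,\quad x_3 \approx 10.81,\quad
x_4 \approx 10.03,\quad x_5 \approx 10.00005, \qquad \sqrt{100} = 10 .
```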

History. 1669, Isaac Newton: finding roots of polynomials. 1690, Joseph Raphson: finding roots of polynomials. 1740, Thomas Simpson: solving general nonlinear equations, generalization to systems of two equations, solving optimization problems (gradient = zero). 1879, Arthur Cayley: generalizing Newton's method to finding complex roots of polynomials.

Newton Method for Finding a Root. Goal: find x with f(x) = 0. Linear approximation (1st order Taylor approximation) around the current iterate x_k: f(x) ≈ f(x_k) + f'(x_k)(x - x_k). Setting this approximation to zero, therefore x_{k+1} = x_k - f(x_k)/f'(x_k).
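A minimal Python sketch of this scalar iteration (the example function, tolerance, and iteration cap are illustrative choices):

```python
# Scalar Newton iteration: x_{k+1} = x_k - f(x_k) / f'(x_k).
def newton_root(f, fprime, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:                  # close enough to a root
            return x
        d = fprime(x)
        if d == 0.0:                       # derivative vanished: step undefined
            raise ZeroDivisionError("f'(x) = 0: Newton step undefined")
        x = x - fx / d                     # Newton update
    return x                               # may not have converged

# Example: the positive root of f(x) = x^2 - 2, i.e. sqrt(2).
print(newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0))
```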

Illustration of Newton's method. Goal: finding a root. In the next step we will linearize at the new point.

Example: Finding a Root. http://en.wikipedia.org/wiki/newton%27s_method

Newton Method for Finding a Root. This can be generalized to multivariate functions F: R^n -> R^n: linearizing F(x) ≈ F(x_k) + J_F(x_k)(x - x_k) and setting it to zero, therefore x_{k+1} = x_k - J_F(x_k)^{-1} F(x_k). [Use the pseudoinverse if there is no inverse.]
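The same sketch in the multivariate case, with the pseudoinverse used when the Jacobian has no inverse (the example system is illustrative):

```python
# Multivariate Newton: x_{k+1} = x_k - J_F(x_k)^+ F(x_k).
import numpy as np

def newton_system(F, J, x0, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            return x
        x = x - np.linalg.pinv(J(x)) @ Fx   # pseudoinverse handles a singular Jacobian
    return x

# Example: intersect the circle x^2 + y^2 = 4 with the line x = y.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] - v[1]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])
print(newton_system(F, J, [1.0, 0.5]))      # approximately [sqrt(2), sqrt(2)]
```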

Newton method for minimization

Newton method for minimization: x_{k+1} = x_k - [∇²f(x_k)]^{-1} ∇f(x_k). Iterate until convergence, or until the maximum number of iterations is exceeded (divergence, loops, and division by zero / singular Hessians might happen).
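A sketch of the pure minimization iteration, with the failure modes mentioned above guarded only crudely (the gradient, Hessian, and quadratic example are illustrative):

```python
# Pure Newton for minimization: x_{k+1} = x_k - [Hess f(x_k)]^{-1} grad f(x_k).
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # (near-)stationary point reached
            return x
        try:
            step = np.linalg.solve(hess(x), g)   # solve Hess * step = grad
        except np.linalg.LinAlgError:
            raise RuntimeError("degenerate Hessian: Newton step undefined")
        x = x - step
    return x                                     # may have diverged or cycled

# Example: f(x, y) = (x - 1)^2 + 10 (y + 2)^2, minimized at (1, -2).
grad = lambda v: np.array([2.0 * (v[0] - 1.0), 20.0 * (v[1] + 2.0)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_minimize(grad, hess, [5.0, 5.0]))
```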

How good is the Newton method?

Descent direction. Lemma [Descent direction]: if ∇²f(x_k) is positive definite, then the Newton direction d_k = -[∇²f(x_k)]^{-1} ∇f(x_k) is a descent direction. Proof: ∇f(x_k)^T d_k = -∇f(x_k)^T [∇²f(x_k)]^{-1} ∇f(x_k) < 0, and we know that if a vector has negative inner product with the gradient vector, then that direction is a descent direction.

Newton method properties: Quadratic convergence in the neighborhood of a strict local minimum [under some conditions]. It can break down if ∇²f(x_k) is degenerate [no inverse]. It can diverge. It can be trapped in a loop. It can converge to a loop.

Motivation with Quadratic Approximation

Motivation with Quadratic Approximation. Second order Taylor approximation: f(y) ≈ f(x_k) + ∇f(x_k)^T (y - x_k) + (1/2)(y - x_k)^T ∇²f(x_k)(y - x_k). Assume that ∇²f(x_k) is positive definite. Minimizing the quadratic approximation over y gives the Newton step: x_{k+1} = x_k - [∇²f(x_k)]^{-1} ∇f(x_k).
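Written out, the step is obtained by setting the gradient of the quadratic model to zero:

```latex
m_k(y) = f(x_k) + \nabla f(x_k)^\top (y - x_k)
         + \tfrac{1}{2}(y - x_k)^\top \nabla^2 f(x_k)\,(y - x_k),
\qquad
\nabla_y m_k(y) = \nabla f(x_k) + \nabla^2 f(x_k)\,(y - x_k) = 0
\;\Longrightarrow\;
x_{k+1} = x_k - \bigl[\nabla^2 f(x_k)\bigr]^{-1} \nabla f(x_k).
```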

Motivation with Quadratic Approximation

Convergence rate (f: R -> R case)

Rates. Linear rate: |x_{k+1} - x*| <= c |x_k - x*| for some constant c in (0,1). Superlinear rate: |x_{k+1} - x*| / |x_k - x*| -> 0. Sublinear rate: slower than every linear rate. Quadratic rate: |x_{k+1} - x*| <= c |x_k - x*|^2. Example sequences for each rate are sketched below.
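Typical example sequences for each rate (standard illustrations):

```latex
\text{linear:}\quad |x_{k}-x^*| = 2^{-k};\qquad
\text{sublinear:}\quad |x_{k}-x^*| = 1/k;\\
\text{superlinear:}\quad |x_{k}-x^*| = 1/k!,\ \ \frac{|x_{k+1}-x^*|}{|x_k-x^*|}=\frac{1}{k+1}\to 0;\qquad
\text{quadratic:}\quad |x_{k}-x^*| = 2^{-2^{k}} = |x_{k-1}-x^*|^2 .
```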

Finding a root: Convergence rate. Goal: f(x*) = 0. Assumption: f'(x*) ≠ 0 and f'' is continuous near x*. Taylor theorem: 0 = f(x*) = f(x_k) + f'(x_k)(x* - x_k) + (1/2) f''(ξ_k)(x* - x_k)^2 for some ξ_k between x_k and x*. Therefore, combining this with the Newton step x_{k+1} = x_k - f(x_k)/f'(x_k), we get x_{k+1} - x* = f''(ξ_k)(x_k - x*)^2 / (2 f'(x_k)).

Finding a root: Convergence rate. We have seen that |x_{k+1} - x*| = (|f''(ξ_k)| / (2 |f'(x_k)|)) |x_k - x*|^2, so near a root with nonzero derivative the error is squared in every step: quadratic convergence.

Problematic cases

Finding a root: chaotic behavior. Let f(x) = x^3 - 2x^2 - 11x + 12. Goal: find the roots, (-3, 1, 4), using Newton's method. Starting from 2.35287527 the method converges to 4; from 2.35284172 it converges to -3; from 2.35283735 it converges to 4; from 2.352836327 it converges to -3; from 2.352836323 it converges to 1.
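This sensitivity is easy to reproduce with a small script (the iteration count is an arbitrary choice):

```python
# Newton's method on f(x) = x^3 - 2x^2 - 11x + 12, whose roots are -3, 1, 4,
# started from the five nearly identical points listed above.
def newton_cubic(x, n_iter=100):
    for _ in range(n_iter):
        x = x - (x**3 - 2 * x**2 - 11 * x + 12) / (3 * x**2 - 4 * x - 11)
    return x

for x0 in (2.35287527, 2.35284172, 2.35283735, 2.352836327, 2.352836323):
    print(x0, "->", round(newton_cubic(x0), 6))
```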

Finding a root: Cycle. Goal: find the roots of a given function. Newton's method can get trapped in a cycle: the starting point is important! (A standard example of this behavior is f(x) = x^3 - 2x + 2 started at x_0 = 0, whose iterates alternate between 0 and 1.)

Finding a root: divergence everywhere (except at the root). Newton's method might never converge (except when started exactly at the root)! (A standard example is f(x) = x^{1/3}: the Newton iteration becomes x_{k+1} = -2 x_k, which moves away from the root 0 from every other starting point.)

Finding a root: Linear convergence only. If the first derivative is zero at the root, then convergence might be only linear (not quadratic). For example, for f(x) = x^2 the Newton iteration is x_{k+1} = x_k / 2: linear convergence with rate 1/2. Linear convergence only!

Difficulties in minimization: quadratic convergence holds only inside the range of quadratic convergence, a neighborhood of the minimum, which can be small.

Generalizations. Newton method in Banach spaces: Newton method on the Banach space of functions; we need Fréchet derivatives. Newton method on curved manifolds, e.g. on the space of orthonormal matrices. Newton method on complex numbers.

Newton Fractals

Gradient descent

f(z) = z^4 - 1, roots: -1, +1, -i, +i. Complex functions: color the starting point according to which root it ended up at. http://www.chiark.greenend.org.uk/~sgtatham/newton/

f(z) = z^4 - 1, roots: -1, +1, -i, +i. Basins of attraction: shading according to how many iterations were needed until convergence. http://www.chiark.greenend.org.uk/~sgtatham/newton/
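A minimal sketch of how such pictures can be produced for f(z) = z^4 - 1 (the grid size, tolerance, and iteration cap are arbitrary choices; plotting the two arrays is left out):

```python
# Newton fractal data for f(z) = z^4 - 1: run Newton's method from every pixel,
# record which root it reached (for the coloring) and after how many iterations
# (for the shading).
import numpy as np

def newton_fractal(n=400, extent=2.0, max_iter=50, tol=1e-6):
    roots = np.array([1.0, -1.0, 1j, -1j])
    xs = np.linspace(-extent, extent, n)
    z = xs[None, :] + 1j * xs[:, None]            # one starting point per pixel
    root_idx = np.full(z.shape, -1)               # index of the root reached (-1: none)
    iters = np.zeros(z.shape, dtype=int)          # iterations needed until convergence
    for k in range(max_iter):
        z = z - (z**4 - 1) / (4 * z**3)           # Newton step for z^4 - 1
        dist = np.abs(z[..., None] - roots)       # distance to each of the four roots
        newly = (dist.min(axis=-1) < tol) & (root_idx == -1)
        root_idx[newly] = dist.argmin(axis=-1)[newly]
        iters[newly] = k
    return root_idx, iters

root_idx, iters = newton_fractal()
print(np.bincount(root_idx.ravel() + 1))          # pixel count per basin (first entry: none)
```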

Basins of attraction: f(z) = (z-1)^4 (z+1)^4

No convergence: a polynomial f with roots at +i, -i, -2.3 and +2.3. Black circles: no convergence, an attracting cycle with period 2.

Avoiding divergence

Damped Newton method. In order to avoid possible divergence, do a line search with backtracking along the Newton direction, i.e. take x_{k+1} = x_k - t_k [∇²f(x_k)]^{-1} ∇f(x_k) with a step size t_k in (0, 1] chosen by backtracking. We already know that the Newton direction is a descent direction.
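A sketch of the damped iteration with backtracking (the Armijo constants alpha and beta, and the log-cosh example, for which pure Newton overshoots from a distant start, are illustrative choices):

```python
# Damped Newton: Newton direction + backtracking line search.
import numpy as np

def damped_newton(f, grad, hess, x0, alpha=0.25, beta=0.5, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x
        d = -np.linalg.solve(hess(x), g)                    # Newton (descent) direction
        t = 1.0
        while f(x + t * d) > f(x) + alpha * t * (g @ d):    # backtrack until sufficient decrease
            t *= beta
        x = x + t * d
    return x

# Example: f(x, y) = log(e^x + e^{-x}) + y^2, minimized at (0, 0).
f = lambda v: np.log(np.exp(v[0]) + np.exp(-v[0])) + v[1]**2
grad = lambda v: np.array([np.tanh(v[0]), 2.0 * v[1]])
hess = lambda v: np.array([[1.0 / np.cosh(v[0])**2, 0.0], [0.0, 2.0]])
print(damped_newton(f, grad, hess, [3.0, 1.0]))
```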

Classes of differentiable functions

Classes of Differentiable functions. Any continuous (but otherwise ugly, nondifferentiable) function can be approximated arbitrarily well by a smooth (k times differentiable) function. Assuming smoothness alone is therefore not enough to talk about rates; we need stronger assumptions on the derivatives [e.g. that their magnitudes behave nicely].

The C_L^{k,p}(Q) Class. Definition [Lipschitz continuous p-th order derivative]: f ∈ C_L^{k,p}(Q) if f is k times continuously differentiable on Q and its p-th derivative is Lipschitz continuous with constant L, i.e. ||f^{(p)}(x) - f^{(p)}(y)|| <= L ||x - y|| for all x, y ∈ Q. Notation: C^k(Q) is the class of k times continuously differentiable functions on Q.

Trivial Lemmas. Lemma [Linear combinations]: if f_1 ∈ C_{L_1}^{k,p}(Q) and f_2 ∈ C_{L_2}^{k,p}(Q), then αf_1 + βf_2 ∈ C_{L_3}^{k,p}(Q) with L_3 = |α| L_1 + |β| L_2. Lemma [Class hierarchy]: if q >= k, then C_L^{q,p}(Q) ⊆ C_L^{k,p}(Q).

Relation between 1st and 2nd derivatives. Lemma: for twice continuously differentiable f, ∇f(y) - ∇f(x) = ∫_0^1 ∇²f(x + τ(y - x)) (y - x) dτ. Proof: apply the fundamental theorem of calculus to φ(τ) = ∇f(x + τ(y - x)). Therefore, ||∇f(y) - ∇f(x)|| <= sup_z ||∇²f(z)|| ||y - x||. Special case: n = 1, |f'(y) - f'(x)| <= sup_z |f''(z)| |y - x|. Q.E.D.

C_L^{2,1}(R^n) and the norm of f''. Lemma [C_L^{2,1} and the norm of f'']: f ∈ C_L^{2,1}(R^n) if and only if ||f''(x)|| <= L for all x. Proof (one direction). Q.E.D.

C_L^{2,1}(R^n) and the norm of f''. Proving the other direction. Q.E.D.

Examples

Error of the 1st order Taylor approximation in C_L^{1,1}. Lemma [1st order Taylor approximation in C_L^{1,1}]: if f ∈ C_L^{1,1}(R^n), then |f(y) - f(x) - ∇f(x)^T (y - x)| <= (L/2) ||y - x||^2. Proof:

Error of the 1st order Taylor approximation in C_L^{1,1} (continued). Q.E.D.
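The bound follows the standard integral argument; a sketch:

```latex
f(y) - f(x) - \nabla f(x)^\top (y-x)
   = \int_0^1 \bigl(\nabla f(x + \tau (y-x)) - \nabla f(x)\bigr)^\top (y-x)\, d\tau ,
\qquad\text{hence}\qquad
\bigl| f(y) - f(x) - \nabla f(x)^\top (y-x) \bigr|
   \le \int_0^1 L\,\tau\, \|y-x\|^2 \, d\tau \;=\; \frac{L}{2}\,\|y-x\|^2 .
```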

Sandwiching with quadratic functions in C_L^{1,1}. We have proved: Corollary [Sandwiching C_L^{1,1} functions with quadratic functions]: f(x) + ∇f(x)^T (y - x) - (L/2) ||y - x||^2 <= f(y) <= f(x) + ∇f(x)^T (y - x) + (L/2) ||y - x||^2 for all x, y, i.e. function f can be lower and upper bounded with quadratic functions.

The C_L^{2,2}(R^n) Class. Lemma [Properties of the C_L^{2,2} class]: for f ∈ C_L^{2,2}(R^n), (1) ||f''(x) - f''(y)|| <= L ||x - y|| [definition]; (2) error of the 1st order approximation of f': ||∇f(y) - ∇f(x) - ∇²f(x)(y - x)|| <= (L/2) ||y - x||^2; (3) error of the 2nd order approximation of f: |f(y) - f(x) - ∇f(x)^T (y - x) - (1/2)(y - x)^T ∇²f(x)(y - x)| <= (L/6) ||y - x||^3. Proof: (*1) definition; (*2) same as the previous lemma [with f' instead of f]; (*3) similar [homework].

Sandwiching f''(y) in C_L^{2,2}(R^n). Corollary [Sandwiching the f''(y) matrix]: for f ∈ C_L^{2,2}(R^n), f''(x) - L ||y - x|| I ⪯ f''(y) ⪯ f''(x) + L ||y - x|| I in the positive semidefinite order. Proof: by definition ||f''(y) - f''(x)|| <= L ||y - x||. Q.E.D.
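Spelled out, with r = ||y - x|| and I the identity matrix:

```latex
\|\nabla^2 f(y) - \nabla^2 f(x)\| \le L r
\;\Longrightarrow\;
\nabla^2 f(x) - L r\, I \;\preceq\; \nabla^2 f(y) \;\preceq\; \nabla^2 f(x) + L r\, I ,
\qquad\text{since } |v^\top(\nabla^2 f(y) - \nabla^2 f(x))\,v| \le L r\,\|v\|^2 \ \ \text{for all } v .
```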

Convergence rate of Newton's method

Convergence rate of Newton's method. Assumptions [local convergence only]: f ∈ C_M^{2,2}(R^n); x* is a local minimum with ∇f(x*) = 0 and ∇²f(x*) ⪰ μI for some μ > 0; the starting point x_0 is sufficiently close to x*.

Convergence rate of Newton's method. Newton step: x_{k+1} = x_k - [∇²f(x_k)]^{-1} ∇f(x_k). We already know that ∇f(x*) = 0; therefore the error x_{k+1} - x* can be rewritten using the trivial identity ∇f(x_k) = ∇f(x_k) - ∇f(x*) = ∫_0^1 ∇²f(x* + τ(x_k - x*)) (x_k - x*) dτ.

Convergence rate of Newton's method (continued)

Convergence rate of Newton's method (continued)

Convergence rate of Newton's method (continued)

Convergence rate of Newton's method (continued). Now we have that the error doesn't increase!

Convergence rate of Newton's method. We have proved the following theorem. Theorem [Rate of Newton's method]: quadratic rate!
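In the form given in Nesterov's Introductory Lectures (one of the recommended readings above), the local result reads approximately as follows, with M the Lipschitz constant of the Hessian and μ the smallest eigenvalue of ∇²f(x*):

```latex
\text{If } \|x_0 - x^*\| \le \tfrac{2\mu}{3M}, \text{ then for every } k:\qquad
\|x_{k+1} - x^*\| \;\le\; \frac{M\,\|x_k - x^*\|^2}{2\bigl(\mu - M\,\|x_k - x^*\|\bigr)} .
```

The iterates stay inside this region, so the error is squared at every step: quadratic rate.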

Summary: Newton method; finding a root; unconstrained minimization; motivation with quadratic approximation; rate of Newton's method; Newton fractals; classes of differentiable functions