# Optimization Tutorial 1. Basic Gradient Descent

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra. In many real-world applications of computer science, engineering, and economics, one often encounters numerical optimization problems of the following form, where the goal is to find the minimum of a realvalued function f : R d R over a (closed) constraint set C R d : min f(x). x C In this three-part tutorial, we shall discuss theory and algorithms for solving such problems. We begin in this part with the basic gradient descent method for solving unconstrained optimization problems where C = R d ; in the next two parts of the tutorial, we shall focus on constrained optimization problems. 1 One of the simplest examples of a unconstrained optimization problem is the following quadratic optimization problem, which we shall use as a running example in this document: min x R d f quad (x) = 1 2 x Ax b x, (OP1) where A R d d and b R d. Before we proceed to discuss methods for solving problems of the above form, we will find it useful to first provide some preliminaries on vector calculus and linear algebra. 1 Preliminaries Throughout this document, we shall assume that the given function f : R d R that we wish to optimize is bounded below, i.e., B R s.t. f(x) B, x R d. Gradient and Hessian. The gradient of a multi-dimensional function f : R d R is a generalization of the first derivative of a one-dimensional function, and is given by a vector of first-order partial derivatives of f (if they exist): f(x) = x 1 x 2. x d The Hessian of f is a generalization of the second derivative in one-dimension, and is a matrix of second-order partial derivatives of f (if they exist): 2 f(x) = x 2 1. x d x x 1 x d x 2 d 1 While we focus on minimization problems in the tutorial, the theory and methods discussed apply to maximization problems as well.. 1

2 2 Optimization Tutorial 1: Basic Gradient Descent Function value at x Original function First order approx. Second order approx x Figure 1: Plot of f(x) = e x and its first-order and second-order Taylor approximations at x 0 = 0. It can be verified that for the quadratic function in (OP1), f(x) = Ax b and 2 f(x) = A. Also, let C k denote the set of all real-valued functions whose k th -order partial derivatives exist and are continuous. Taylor series. We will often require to approximate a function in terms of its gradient and Hessian at a particular point x 0 R d. For this, we shall resort to the Taylor expansion of the functions at x 0. More specifically, the first-order Taylor expansion of a function f : R d R in C 1 is given by f(x) = f(x 0 ) + f(x 0 ) (x x 0 ) + o( x x 0 2 ), (1) while the second-order Taylor expansion of a function f in C 2 is given by f(x) = f(x 0 ) + f(x 0 ) (x x 0 ) (x x 0) 2 f(x 0 )(x x 0 ) + o( x x 0 2 2), u(t) where u(t) = o(v(t)) shall denote lim t 0 v(t) = 0, i.e., u(t) goes to 0 faster than v(t) as t 0; this tells us that when x x is sufficiently small, the last term in the above expansions can be effectively ignored. For the one-dimensional function f(x) = e x, the first-order Taylor approximation at 0 is 1 x, while the second-order Taylor approximation at 0 is 1 x x2 ; see Figure 1 for an illustration. Positive (semi-)definiteness. Finally, a matrix A R d d is called positive semi-definite (p.s.d.) if x R d, x 0, x Ax 0 and positive definite (p.d.) if this inequality is strict; it can be verified that a p.s.d (p.d.) matrix has non-negative (positive) eigen values. 2 Necessary and Sufficient Conditions for Optimality As a first step towards solving an unconstrained optimization problem, we provide necessary and sufficient conditions for the minimizer of a function in C 2. While our ideal goal is to find the global minimizer of f, i.e., a point x R d such that f(x ) f(x), x R d, due to practical considerations, often one needs to settle with a solution that is locally optimum: Definition 1. A point x R d is said to be a local minimizer of a function f : R d R if there exists ɛ > 0 such that f(x ) f(x) for all x R d within an ɛ-distance of x, i.e., x x 2 ɛ. The following characterization of a local minimizer of f is well-known; see for example [3]. Proposition 1 (Necessary and sufficient conditions for local optimality). Let f : R d R be a function in C 2. If x R d is a local minimizer of f, then 1. f(x ) = f(x ) is p.s.d. Moreover, if for a given x, f(x ) = 0 and 2 f(x ) is p.d., then x is a local minimizer of f. Let us apply the above characterization to the optimization problem in (OP1). Equating the gradient of f quad in this problem to zero we have f quad (x ) = Ax b = 0. If A is invertible, then x = A 1 b; further, if 2 f quad (x ) = A is p.d., we have that x is a local minimizer of f quad. Clearly, in order to minimize a function f over R d, we will have to find a point with zero gradient; the Hessian at this point can then be used to determine if this point is indeed a (local) minimizer of f. In the next section, we describe a technique for finding a point in R d where the gradient of f is zero.

4 4 Optimization Tutorial 1: Basic Gradient Descent α = 0 α = 0.25 α = 1 f(x t η t f(x t )) f(x t ) αη t f(x t ) η t Figure 3: Illustration of backtracking line search for an example function f : R d R. The solid black curve is a plot of f(x t+1 ) = f(x t η t f(x t )) as a function of step-size η t. The dotted blue curves are plots of the linear function f(x t ) αη t f(x t ) 2 2 as a function of η t for different values of α. When α = 1, this gives us a line that lies below the solid curve; when α < 1, we get a line that has portions above the solid curve, with α = 0 corresponding to a horizontal line. Given a value of α (0, 1), the backtracking line search scheme aims to find a value of η t for which f(x t η t f(x t )) f(x t ) αη t f(x t ) 2 2; as seen from the plots, for any value of α (0, 1), such a η t always exists. Note that the above sub-problem is itself an optimization problem (involving a single variable). In some cases, this sub-problem can be solved analytically, resulting in a simple closed-form solution for η t. For example, in the case of the quadratic problem in (OP1), the above exact line search scheme gives us: η t = f(x t ) 2 2 f(x t ) A f(x t ). However, in many optimization problems of practical importance, it is often quite expensive to solve this inner line search problem exactly. In such cases, one resorts to an approximate approach for solving Eq. (2); we next describe one such scheme. Inexact line search. An alternate to the above exact line search approach is a simple heuristic where one starts with a large value of η and multiplicatively decreases this value until f(x t η f(x t )) < f(x t ). However, this scheme may not yield sufficient progress in each iteration to provide convergence guarantees for the gradient descent method. Instead, consider the following sophisticated version of this heuristic that does indeed guarantee a sufficient decrease in the function value after each iteration [1]: Inexact (Backtracking) Line Search Input: f : R d R, x t Parameter: α (0, 1), β (0, 1) η t = 1 while f(x t η t f(x t )) > f(x t ) αη t f(x t ) 2 2 η t = βη t Output: η t Recall that the first order Taylor approximation of f(x t η t f(x t )) is f(x t ) η t f(x t ) 2 2. The above scheme aims to find a step-size η t that is lesser than a scaled version of this linear approximation: f(x t ) αη t f(x t ) 2 2. As seen from Figure 3, such a value of η t always exists, ensuring that the above procedure terminates. Moreover, at termination, we will have f(x t+1 ) = f(x t η t f(x t )) f(x t ) αη t f(x t ) 2 2 < f(x t ), as desired. Note that the parameter α now gives us a handle on the amount decrease in function value after each iteration and will determine the speed of convergence of the gradient descent method. Under (a slight variant of) the above backtracking line search scheme (and under a certain smoothness assumption on the function being optimized), the gradient descent method can be shown to generate a sequence of iterates whose gradients converge to zero; see for example [2] for a formal guarantee. Moreover, when the function being optimized satisfies certain additional properties, one can give explicit rates of convergence for the gradient descent method (under both the exact and backtracking line search schemes), i.e., bound the number of iterations required by the method to reach a solution that is within a certain factor of the optimal function value [1]. We shall discuss these guarantees in the second part of this tutorial.

### Gradient Descent. Dr. Xiaowei Huang

Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

### Constrained Optimization and Lagrangian Duality

CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

### min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

### Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

### Optimization Methods. Lecture 19: Line Searches and Newton s Method

15.93 Optimization Methods Lecture 19: Line Searches and Newton s Method 1 Last Lecture Necessary Conditions for Optimality (identifies candidates) x local min f(x ) =, f(x ) PSD Slide 1 Sufficient Conditions

### Lecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent

10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 5: Gradient Descent Scribes: Loc Do,2,3 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for

### Numerical optimization

Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

### MATH 4211/6211 Optimization Basics of Optimization Problems

MATH 4211/6211 Optimization Basics of Optimization Problems Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 A standard minimization

### Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

### ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained

### Constrained Optimization

1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

### Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

### Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

### Unconstrained optimization

Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

### Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

### E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

### Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

### The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

### Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

### Nonlinear Optimization

Nonlinear Optimization (Com S 477/577 Notes) Yan-Bin Jia Nov 7, 2017 1 Introduction Given a single function f that depends on one or more independent variable, we want to find the values of those variables

Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

### Gradient Descent. Ryan Tibshirani Convex Optimization /36-725

Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like

### Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

### Scientific Computing: Optimization

Scientific Computing: Optimization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 March 8th, 2011 A. Donev (Courant Institute) Lecture

### 10. Unconstrained minimization

Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation

### MVE165/MMG631 Linear and integer optimization with applications Lecture 13 Overview of nonlinear programming. Ann-Brith Strömberg

MVE165/MMG631 Overview of nonlinear programming Ann-Brith Strömberg 2015 05 21 Areas of applications, examples (Ch. 9.1) Structural optimization Design of aircraft, ships, bridges, etc Decide on the material

### AM 205: lecture 18. Last time: optimization methods Today: conditions for optimality

AM 205: lecture 18 Last time: optimization methods Today: conditions for optimality Existence of Global Minimum For example: f (x, y) = x 2 + y 2 is coercive on R 2 (global min. at (0, 0)) f (x) = x 3

### Numerical Optimization of Partial Differential Equations

Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada

### CSCI 1951-G Optimization Methods in Finance Part 09: Interior Point Methods

CSCI 1951-G Optimization Methods in Finance Part 09: Interior Point Methods March 23, 2018 1 / 35 This material is covered in S. Boyd, L. Vandenberge s book Convex Optimization https://web.stanford.edu/~boyd/cvxbook/.

### Nonlinear Programming

Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

### Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

### Lecture V. Numerical Optimization

Lecture V Numerical Optimization Gianluca Violante New York University Quantitative Macroeconomics G. Violante, Numerical Optimization p. 1 /19 Isomorphism I We describe minimization problems: to maximize

### 8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

### Optimality Conditions for Constrained Optimization

72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

### Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

### Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

### Generalization to inequality constrained problem. Maximize

Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum

Gradient Descent Mohammad Emtiyaz Khan EPFL Sep 22, 201 Mohammad Emtiyaz Khan 2014 1 Learning/estimation/fitting Given a cost function L(β), we wish to find β that minimizes the cost: min β L(β), subject

### 1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:

Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion

### 1 Kernel methods & optimization

Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions

### Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

### OPER 627: Nonlinear Optimization Lecture 2: Math Background and Optimality Conditions

OPER 627: Nonlinear Optimization Lecture 2: Math Background and Optimality Conditions Department of Statistical Sciences and Operations Research Virginia Commonwealth University Aug 28, 2013 (Lecture 2)

### 15-859E: Advanced Algorithms CMU, Spring 2015 Lecture #16: Gradient Descent February 18, 2015

5-859E: Advanced Algorithms CMU, Spring 205 Lecture #6: Gradient Descent February 8, 205 Lecturer: Anupam Gupta Scribe: Guru Guruganesh In this lecture, we will study the gradient descent algorithm and

### Selected Topics in Optimization. Some slides borrowed from

Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model

### Lecture 5: September 15

10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS

### Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

### Lecture 14: October 17

1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

### Constrained optimization. Unconstrained optimization. One-dimensional. Multi-dimensional. Newton with equality constraints. Active-set method.

Optimization Unconstrained optimization One-dimensional Multi-dimensional Newton s method Basic Newton Gauss- Newton Quasi- Newton Descent methods Gradient descent Conjugate gradient Constrained optimization

### TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM H. E. Krogstad, IMF, Spring 2012 Karush-Kuhn-Tucker (KKT) Theorem is the most central theorem in constrained optimization, and since the proof is scattered

CHAPTER 2: QUADRATIC PROGRAMMING Overview Quadratic programming (QP) problems are characterized by objective functions that are quadratic in the design variables, and linear constraints. In this sense,

### 5 Handling Constraints

5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

### Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method

Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex

### 2.098/6.255/ Optimization Methods Practice True/False Questions

2.098/6.255/15.093 Optimization Methods Practice True/False Questions December 11, 2009 Part I For each one of the statements below, state whether it is true or false. Include a 1-3 line supporting sentence

### Computational Finance

Department of Mathematics at University of California, San Diego Computational Finance Optimization Techniques [Lecture 2] Michael Holst January 9, 2017 Contents 1 Optimization Techniques 3 1.1 Examples

### Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

### CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

### Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

### Lecture Notes: Geometric Considerations in Unconstrained Optimization

Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections

### Gradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Gradient descent Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Gradient descent First consider unconstrained minimization of f : R n R, convex and differentiable. We want to solve

### minimize x subject to (x 2)(x 4) u,

Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

### ARE202A, Fall 2005 CONTENTS. 1. Graphical Overview of Optimization Theory (cont) Separating Hyperplanes 1

AREA, Fall 5 LECTURE #: WED, OCT 5, 5 PRINT DATE: OCTOBER 5, 5 (GRAPHICAL) CONTENTS 1. Graphical Overview of Optimization Theory (cont) 1 1.4. Separating Hyperplanes 1 1.5. Constrained Maximization: One

### Optimization. Yuh-Jye Lee. March 21, Data Science and Machine Intelligence Lab National Chiao Tung University 1 / 29

Optimization Yuh-Jye Lee Data Science and Machine Intelligence Lab National Chiao Tung University March 21, 2017 1 / 29 You Have Learned (Unconstrained) Optimization in Your High School Let f (x) = ax

### Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through

### 1 Numerical optimization

Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

### Optimization methods

Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

### Appendix A Taylor Approximations and Definite Matrices

Appendix A Taylor Approximations and Definite Matrices Taylor approximations provide an easy way to approximate a function as a polynomial, using the derivatives of the function. We know, from elementary

### An Iterative Descent Method

Conjugate Gradient: An Iterative Descent Method The Plan Review Iterative Descent Conjugate Gradient Review : Iterative Descent Iterative Descent is an unconstrained optimization process x (k+1) = x (k)

### Descent methods. min x. f(x)

Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained

### Constrained Optimization Theory

Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

### ICS-E4030 Kernel Methods in Machine Learning

ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This

### Roles of Convexity in Optimization Theory. Efor, T. E and Nshi C. E

IDOSR PUBLICATIONS International Digital Organization for Scientific Research ISSN: 2550-7931 Roles of Convexity in Optimization Theory Efor T E and Nshi C E Department of Mathematics and Computer Science

### 14. Nonlinear equations

L. Vandenberghe ECE133A (Winter 2018) 14. Nonlinear equations Newton method for nonlinear equations damped Newton method for unconstrained minimization Newton method for nonlinear least squares 14-1 Set

### 1 Overview. 2 A Characterization of Convex Functions. 2.1 First-order Taylor approximation. AM 221: Advanced Optimization Spring 2016

AM 221: Advanced Optimization Spring 2016 Prof. Yaron Singer Lecture 8 February 22nd 1 Overview In the previous lecture we saw characterizations of optimality in linear optimization, and we reviewed the

### Lecture 5 : Projections

Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

### Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)

### Introduction to Scientific Computing

Introduction to Scientific Computing Benson Muite benson.muite@ut.ee http://kodu.ut.ee/ benson https://courses.cs.ut.ee/2018/isc/spring 26 March 2018 [Public Domain,https://commons.wikimedia.org/wiki/File1

### Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

### Optimization Problems with Constraints - introduction to theory, numerical Methods and applications

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Dr. Abebe Geletu Ilmenau University of Technology Department of Simulation and Optimal Processes (SOP)

### 10-725/36-725: Convex Optimization Prerequisite Topics

10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

### Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006

Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in

### Optimality conditions for unconstrained optimization. Outline

Optimality conditions for unconstrained optimization Daniel P. Robinson Department of Applied Mathematics and Statistics Johns Hopkins University September 13, 2018 Outline 1 The problem and definitions

### 4 Newton Method. Unconstrained Convex Optimization 21. H(x)p = f(x). Newton direction. Why? Recall second-order staylor series expansion:

Unconstrained Convex Optimization 21 4 Newton Method H(x)p = f(x). Newton direction. Why? Recall second-order staylor series expansion: f(x + p) f(x)+p T f(x)+ 1 2 pt H(x)p ˆf(p) In general, ˆf(p) won

### CPSC 540: Machine Learning

CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in

### Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

### Part 2: Linesearch methods for unconstrained optimization. Nick Gould (RAL)

Part 2: Linesearch methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

### Lecture 5: September 12

10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS

### Lecture 7 Unconstrained nonlinear programming

Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,

### Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

### Optimization for Machine Learning

Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

### Numerical Optimization

Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,

### Least Mean Squares Regression. Machine Learning Fall 2018

Least Mean Squares Regression Machine Learning Fall 2018 1 Where are we? Least Squares Method for regression Examples The LMS objective Gradient descent Incremental/stochastic gradient descent Exercises

### NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

### AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

### Trust Regions. Charles J. Geyer. March 27, 2013

Trust Regions Charles J. Geyer March 27, 2013 1 Trust Region Theory We follow Nocedal and Wright (1999, Chapter 4), using their notation. Fletcher (1987, Section 5.1) discusses the same algorithm, but