# Optimization Tutorial 1. Basic Gradient Descent

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra. In many real-world applications of computer science, engineering, and economics, one often encounters numerical optimization problems of the following form, where the goal is to find the minimum of a realvalued function f : R d R over a (closed) constraint set C R d : min f(x). x C In this three-part tutorial, we shall discuss theory and algorithms for solving such problems. We begin in this part with the basic gradient descent method for solving unconstrained optimization problems where C = R d ; in the next two parts of the tutorial, we shall focus on constrained optimization problems. 1 One of the simplest examples of a unconstrained optimization problem is the following quadratic optimization problem, which we shall use as a running example in this document: min x R d f quad (x) = 1 2 x Ax b x, (OP1) where A R d d and b R d. Before we proceed to discuss methods for solving problems of the above form, we will find it useful to first provide some preliminaries on vector calculus and linear algebra. 1 Preliminaries Throughout this document, we shall assume that the given function f : R d R that we wish to optimize is bounded below, i.e., B R s.t. f(x) B, x R d. Gradient and Hessian. The gradient of a multi-dimensional function f : R d R is a generalization of the first derivative of a one-dimensional function, and is given by a vector of first-order partial derivatives of f (if they exist): f(x) = x 1 x 2. x d The Hessian of f is a generalization of the second derivative in one-dimension, and is a matrix of second-order partial derivatives of f (if they exist): 2 f(x) = x 2 1. x d x x 1 x d x 2 d 1 While we focus on minimization problems in the tutorial, the theory and methods discussed apply to maximization problems as well.. 1

2 2 Optimization Tutorial 1: Basic Gradient Descent Function value at x Original function First order approx. Second order approx x Figure 1: Plot of f(x) = e x and its first-order and second-order Taylor approximations at x 0 = 0. It can be verified that for the quadratic function in (OP1), f(x) = Ax b and 2 f(x) = A. Also, let C k denote the set of all real-valued functions whose k th -order partial derivatives exist and are continuous. Taylor series. We will often require to approximate a function in terms of its gradient and Hessian at a particular point x 0 R d. For this, we shall resort to the Taylor expansion of the functions at x 0. More specifically, the first-order Taylor expansion of a function f : R d R in C 1 is given by f(x) = f(x 0 ) + f(x 0 ) (x x 0 ) + o( x x 0 2 ), (1) while the second-order Taylor expansion of a function f in C 2 is given by f(x) = f(x 0 ) + f(x 0 ) (x x 0 ) (x x 0) 2 f(x 0 )(x x 0 ) + o( x x 0 2 2), u(t) where u(t) = o(v(t)) shall denote lim t 0 v(t) = 0, i.e., u(t) goes to 0 faster than v(t) as t 0; this tells us that when x x is sufficiently small, the last term in the above expansions can be effectively ignored. For the one-dimensional function f(x) = e x, the first-order Taylor approximation at 0 is 1 x, while the second-order Taylor approximation at 0 is 1 x x2 ; see Figure 1 for an illustration. Positive (semi-)definiteness. Finally, a matrix A R d d is called positive semi-definite (p.s.d.) if x R d, x 0, x Ax 0 and positive definite (p.d.) if this inequality is strict; it can be verified that a p.s.d (p.d.) matrix has non-negative (positive) eigen values. 2 Necessary and Sufficient Conditions for Optimality As a first step towards solving an unconstrained optimization problem, we provide necessary and sufficient conditions for the minimizer of a function in C 2. While our ideal goal is to find the global minimizer of f, i.e., a point x R d such that f(x ) f(x), x R d, due to practical considerations, often one needs to settle with a solution that is locally optimum: Definition 1. A point x R d is said to be a local minimizer of a function f : R d R if there exists ɛ > 0 such that f(x ) f(x) for all x R d within an ɛ-distance of x, i.e., x x 2 ɛ. The following characterization of a local minimizer of f is well-known; see for example [3]. Proposition 1 (Necessary and sufficient conditions for local optimality). Let f : R d R be a function in C 2. If x R d is a local minimizer of f, then 1. f(x ) = f(x ) is p.s.d. Moreover, if for a given x, f(x ) = 0 and 2 f(x ) is p.d., then x is a local minimizer of f. Let us apply the above characterization to the optimization problem in (OP1). Equating the gradient of f quad in this problem to zero we have f quad (x ) = Ax b = 0. If A is invertible, then x = A 1 b; further, if 2 f quad (x ) = A is p.d., we have that x is a local minimizer of f quad. Clearly, in order to minimize a function f over R d, we will have to find a point with zero gradient; the Hessian at this point can then be used to determine if this point is indeed a (local) minimizer of f. In the next section, we describe a technique for finding a point in R d where the gradient of f is zero.

4 4 Optimization Tutorial 1: Basic Gradient Descent α = 0 α = 0.25 α = 1 f(x t η t f(x t )) f(x t ) αη t f(x t ) η t Figure 3: Illustration of backtracking line search for an example function f : R d R. The solid black curve is a plot of f(x t+1 ) = f(x t η t f(x t )) as a function of step-size η t. The dotted blue curves are plots of the linear function f(x t ) αη t f(x t ) 2 2 as a function of η t for different values of α. When α = 1, this gives us a line that lies below the solid curve; when α < 1, we get a line that has portions above the solid curve, with α = 0 corresponding to a horizontal line. Given a value of α (0, 1), the backtracking line search scheme aims to find a value of η t for which f(x t η t f(x t )) f(x t ) αη t f(x t ) 2 2; as seen from the plots, for any value of α (0, 1), such a η t always exists. Note that the above sub-problem is itself an optimization problem (involving a single variable). In some cases, this sub-problem can be solved analytically, resulting in a simple closed-form solution for η t. For example, in the case of the quadratic problem in (OP1), the above exact line search scheme gives us: η t = f(x t ) 2 2 f(x t ) A f(x t ). However, in many optimization problems of practical importance, it is often quite expensive to solve this inner line search problem exactly. In such cases, one resorts to an approximate approach for solving Eq. (2); we next describe one such scheme. Inexact line search. An alternate to the above exact line search approach is a simple heuristic where one starts with a large value of η and multiplicatively decreases this value until f(x t η f(x t )) < f(x t ). However, this scheme may not yield sufficient progress in each iteration to provide convergence guarantees for the gradient descent method. Instead, consider the following sophisticated version of this heuristic that does indeed guarantee a sufficient decrease in the function value after each iteration [1]: Inexact (Backtracking) Line Search Input: f : R d R, x t Parameter: α (0, 1), β (0, 1) η t = 1 while f(x t η t f(x t )) > f(x t ) αη t f(x t ) 2 2 η t = βη t Output: η t Recall that the first order Taylor approximation of f(x t η t f(x t )) is f(x t ) η t f(x t ) 2 2. The above scheme aims to find a step-size η t that is lesser than a scaled version of this linear approximation: f(x t ) αη t f(x t ) 2 2. As seen from Figure 3, such a value of η t always exists, ensuring that the above procedure terminates. Moreover, at termination, we will have f(x t+1 ) = f(x t η t f(x t )) f(x t ) αη t f(x t ) 2 2 < f(x t ), as desired. Note that the parameter α now gives us a handle on the amount decrease in function value after each iteration and will determine the speed of convergence of the gradient descent method. Under (a slight variant of) the above backtracking line search scheme (and under a certain smoothness assumption on the function being optimized), the gradient descent method can be shown to generate a sequence of iterates whose gradients converge to zero; see for example [2] for a formal guarantee. Moreover, when the function being optimized satisfies certain additional properties, one can give explicit rates of convergence for the gradient descent method (under both the exact and backtracking line search schemes), i.e., bound the number of iterations required by the method to reach a solution that is within a certain factor of the optimal function value [1]. We shall discuss these guarantees in the second part of this tutorial.

### Optimization Methods. Lecture 19: Line Searches and Newton s Method

15.93 Optimization Methods Lecture 19: Line Searches and Newton s Method 1 Last Lecture Necessary Conditions for Optimality (identifies candidates) x local min f(x ) =, f(x ) PSD Slide 1 Sufficient Conditions

### Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

### Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

### Nonlinear Optimization

Nonlinear Optimization (Com S 477/577 Notes) Yan-Bin Jia Nov 7, 2017 1 Introduction Given a single function f that depends on one or more independent variable, we want to find the values of those variables

### 10. Unconstrained minimization

Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation

Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

### 8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

### OPER 627: Nonlinear Optimization Lecture 2: Math Background and Optimality Conditions

OPER 627: Nonlinear Optimization Lecture 2: Math Background and Optimality Conditions Department of Statistical Sciences and Operations Research Virginia Commonwealth University Aug 28, 2013 (Lecture 2)

### Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

### Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

### 1 Kernel methods & optimization

Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions

### Optimization methods

Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

CHAPTER 2: QUADRATIC PROGRAMMING Overview Quadratic programming (QP) problems are characterized by objective functions that are quadratic in the design variables, and linear constraints. In this sense,

### Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

### 10-725/36-725: Convex Optimization Prerequisite Topics

10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

### Lecture 1: Introduction. Outline. B9824 Foundations of Optimization. Fall Administrative matters. 2. Introduction. 3. Existence of optima

B9824 Foundations of Optimization Lecture 1: Introduction Fall 2009 Copyright 2009 Ciamac Moallemi Outline 1. Administrative matters 2. Introduction 3. Existence of optima 4. Local theory of unconstrained

### Optimization Part 1 P. Agius L2.1, Spring 2008

Optimization Part 1 Contents Terms and definitions, formulating the problem Unconstrained optimization conditions FONC, SONC, SOSC Searching for the solution Which direction? What step size? Constrained

### Trust Regions. Charles J. Geyer. March 27, 2013

Trust Regions Charles J. Geyer March 27, 2013 1 Trust Region Theory We follow Nocedal and Wright (1999, Chapter 4), using their notation. Fletcher (1987, Section 5.1) discusses the same algorithm, but

### Optimization for Machine Learning

Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

### Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006

Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in

### Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2

### Lecture 6: Conic Optimization September 8

IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

### SECTION C: CONTINUOUS OPTIMISATION LECTURE 9: FIRST ORDER OPTIMALITY CONDITIONS FOR CONSTRAINED NONLINEAR PROGRAMMING

Nf SECTION C: CONTINUOUS OPTIMISATION LECTURE 9: FIRST ORDER OPTIMALITY CONDITIONS FOR CONSTRAINED NONLINEAR PROGRAMMING f(x R m g HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 5, DR RAPHAEL

### Quasi-Newton Methods

Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

### On Nesterov s Random Coordinate Descent Algorithms - Continued

On Nesterov s Random Coordinate Descent Algorithms - Continued Zheng Xu University of Texas At Arlington February 20, 2015 1 Revisit Random Coordinate Descent The Random Coordinate Descent Upper and Lower

### Optimization 2. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng

Optimization 2 CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Optimization 2 1 / 38

### Lecture 1: Introduction. Outline. B9824 Foundations of Optimization. Fall Administrative matters. 2. Introduction. 3. Existence of optima

B9824 Foundations of Optimization Lecture 1: Introduction Fall 2010 Copyright 2010 Ciamac Moallemi Outline 1. Administrative matters 2. Introduction 3. Existence of optima 4. Local theory of unconstrained

### 2. Quasi-Newton methods

L. Vandenberghe EE236C (Spring 2016) 2. Quasi-Newton methods variable metric methods quasi-newton methods BFGS update limited-memory quasi-newton methods 2-1 Newton method for unconstrained minimization

### Constrained optimization: direct methods (cont.)

Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a

Quadratic Programming Outline Linearly constrained minimization Linear equality constraints Linear inequality constraints Quadratic objective function 2 SideBar: Matrix Spaces Four fundamental subspaces

### Introduction to Nonlinear Optimization Paul J. Atzberger

Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,

### 5 Overview of algorithms for unconstrained optimization

IOE 59: NLP, Winter 22 c Marina A. Epelman 9 5 Overview of algorithms for unconstrained optimization 5. General optimization algorithm Recall: we are attempting to solve the problem (P) min f(x) s.t. x

### Numerical Optimization Techniques

Numerical Optimization Techniques Léon Bottou NEC Labs America COS 424 3/2/2010 Today s Agenda Goals Representation Capacity Control Operational Considerations Computational Considerations Classification,

### Module 04 Optimization Problems KKT Conditions & Solvers

Module 04 Optimization Problems KKT Conditions & Solvers Ahmad F. Taha EE 5243: Introduction to Cyber-Physical Systems Email: ahmad.taha@utsa.edu Webpage: http://engineering.utsa.edu/ taha/index.html September

### CPSC 540: Machine Learning

CPSC 540: Machine Learning Gradient Descent, Newton-like Methods Mark Schmidt University of British Columbia Winter 2017 Admin Auditting/registration forms: Submit them in class/help-session/tutorial this

### MS&E 318 (CME 338) Large-Scale Numerical Optimization

Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

### FIXED POINT ITERATION

FIXED POINT ITERATION The idea of the fixed point iteration methods is to first reformulate a equation to an equivalent fixed point problem: f (x) = 0 x = g(x) and then to use the iteration: with an initial

### The fundamental theorem of calculus for definite integration helped us to compute If has an anti-derivative,

Module 16 : Line Integrals, Conservative fields Green's Theorem and applications Lecture 47 : Fundamental Theorems of Calculus for Line integrals [Section 47.1] Objectives In this section you will learn

### Numerical Optimization

Unconstrained Optimization (II) Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Unconstrained Optimization Let f : R R Unconstrained problem min x

### Convex Optimization CMU-10725

Convex Optimization CMU-10725 Quasi Newton Methods Barnabás Póczos & Ryan Tibshirani Quasi Newton Methods 2 Outline Modified Newton Method Rank one correction of the inverse Rank two correction of the

### The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know

The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is

### Optimality Conditions

Chapter 2 Optimality Conditions 2.1 Global and Local Minima for Unconstrained Problems When a minimization problem does not have any constraints, the problem is to find the minimum of the objective function.

### Nonlinear Programming (NLP)

Natalia Lazzati Mathematics for Economics (Part I) Note 6: Nonlinear Programming - Unconstrained Optimization Note 6 is based on de la Fuente (2000, Ch. 7), Madden (1986, Ch. 3 and 5) and Simon and Blume

### Line Search Methods. Shefali Kulkarni-Thaker

1 BISECTION METHOD Line Search Methods Shefali Kulkarni-Thaker Consider the following unconstrained optimization problem min f(x) x R Any optimization algorithm starts by an initial point x 0 and performs

### Non-Convex Optimization. CS6787 Lecture 7 Fall 2017

Non-Convex Optimization CS6787 Lecture 7 Fall 2017 First some words about grading I sent out a bunch of grades on the course management system Everyone should have all their grades in Not including paper

### Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

### Math 273a: Optimization Basic concepts

Math 273a: Optimization Basic concepts Instructor: Wotao Yin Department of Mathematics, UCLA Spring 2015 slides based on Chong-Zak, 4th Ed. Goals of this lecture The general form of optimization: minimize

### Unconstrained Multivariate Optimization

Unconstrained Multivariate Optimization Multivariate optimization means optimization of a scalar function of a several variables: and has the general form: y = () min ( ) where () is a nonlinear scalar-valued

### Analytic Center Cutting-Plane Method

Analytic Center Cutting-Plane Method S. Boyd, L. Vandenberghe, and J. Skaf April 14, 2011 Contents 1 Analytic center cutting-plane method 2 2 Computing the analytic center 3 3 Pruning constraints 5 4 Lower

### Solution Methods. Richard Lusby. Department of Management Engineering Technical University of Denmark

Solution Methods Richard Lusby Department of Management Engineering Technical University of Denmark Lecture Overview (jg Unconstrained Several Variables Quadratic Programming Separable Programming SUMT

### Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

### 1 Convexity, concavity and quasi-concavity. (SB )

UNIVERSITY OF MARYLAND ECON 600 Summer 2010 Lecture Two: Unconstrained Optimization 1 Convexity, concavity and quasi-concavity. (SB 21.1-21.3.) For any two points, x, y R n, we can trace out the line of

### LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

### Self-Concordant Barrier Functions for Convex Optimization

Appendix F Self-Concordant Barrier Functions for Convex Optimization F.1 Introduction In this Appendix we present a framework for developing polynomial-time algorithms for the solution of convex optimization

### Quasi-Newton Methods

Quasi-Newton Methods Werner C. Rheinboldt These are excerpts of material relating to the boos [OR00 and [Rhe98 and of write-ups prepared for courses held at the University of Pittsburgh. Some further references

### 8 Barrier Methods for Constrained Optimization

IOE 519: NL, Winter 2012 c Marina A. Epelman 55 8 Barrier Methods for Constrained Optimization In this subsection, we will restrict our attention to instances of constrained problem () that have inequality

### , b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are

Quadratic forms We consider the quadratic function f : R 2 R defined by f(x) = 2 xt Ax b T x with x = (x, x 2 ) T, () where A R 2 2 is symmetric and b R 2. We will see that, depending on the eigenvalues

### 3E4: Modelling Choice. Introduction to nonlinear programming. Announcements

3E4: Modelling Choice Lecture 7 Introduction to nonlinear programming 1 Announcements Solutions to Lecture 4-6 Homework will be available from http://www.eng.cam.ac.uk/~dr241/3e4 Looking ahead to Lecture

### Chapter 1. Optimality Conditions: Unconstrained Optimization. 1.1 Differentiable Problems

Chapter 1 Optimality Conditions: Unconstrained Optimization 1.1 Differentiable Problems Consider the problem of minimizing the function f : R n R where f is twice continuously differentiable on R n : P

### Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE7C (Spring 018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu February

### Optimization and Calculus

Optimization and Calculus To begin, there is a close relationship between finding the roots to a function and optimizing a function. In the former case, we solve for x. In the latter, we solve: g(x) =

### Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Sub-Sampled Newton Methods I: Globally Convergent Algorithms arxiv:1601.04737v3 [math.oc] 26 Feb 2016 Farbod Roosta-Khorasani February 29, 2016 Abstract Michael W. Mahoney Large scale optimization problems

### Vector Calculus. Lecture Notes

Vector Calculus Lecture Notes Adolfo J. Rumbos c Draft date November 23, 211 2 Contents 1 Motivation for the course 5 2 Euclidean Space 7 2.1 Definition of n Dimensional Euclidean Space........... 7 2.2

### CE 191: Civil and Environmental Engineering Systems Analysis. LEC 05 : Optimality Conditions

CE 191: Civil and Environmental Engineering Systems Analysis LEC : Optimality Conditions Professor Scott Moura Civil & Environmental Engineering University of California, Berkeley Fall 214 Prof. Moura

### IE 5531 Midterm #2 Solutions

IE 5531 Midterm #2 s Prof. John Gunnar Carlsson November 9, 2011 Before you begin: This exam has 9 pages and a total of 5 problems. Make sure that all pages are present. To obtain credit for a problem,

### Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

### Module 5 : Linear and Quadratic Approximations, Error Estimates, Taylor's Theorem, Newton and Picard Methods

Module 5 : Linear and Quadratic Approximations, Error Estimates, Taylor's Theorem, Newton and Picard Methods Lecture 14 : Taylor's Theorem [Section 141] Objectives In this section you will learn the following

### Primal-Dual Interior-Point Methods. Ryan Tibshirani Convex Optimization /36-725

Primal-Dual Interior-Point Methods Ryan Tibshirani Convex Optimization 10-725/36-725 Given the problem Last time: barrier method min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h i, i = 1,...

### 1. Method 1: bisection. The bisection methods starts from two points a 0 and b 0 such that

Chapter 4 Nonlinear equations 4.1 Root finding Consider the problem of solving any nonlinear relation g(x) = h(x) in the real variable x. We rephrase this problem as one of finding the zero (root) of a

### Key words. nonlinear programming, pattern search algorithm, derivative-free optimization, convergence analysis, second order optimality conditions

SIAM J. OPTIM. Vol. x, No. x, pp. xxx-xxx c Paper submitted to Society for Industrial and Applied Mathematics SECOND ORDER BEHAVIOR OF PATTERN SEARCH MARK A. ABRAMSON Abstract. Previous analyses of pattern

The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

### Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

### FIXED POINT ITERATIONS

FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

### High Order Methods for Empirical Risk Minimization

High Order Methods for Empirical Risk Minimization Alejandro Ribeiro Department of Electrical and Systems Engineering University of Pennsylvania aribeiro@seas.upenn.edu IPAM Workshop of Emerging Wireless

### Empirical Risk Minimization and Optimization

Statistical Machine Learning Notes 3 Empirical Risk Minimization and Optimization Instructor: Justin Domke Contents 1 Empirical Risk Minimization 1 2 Ingredients of Empirical Risk Minimization 4 3 Convex

### Improved Damped Quasi-Newton Methods for Unconstrained Optimization

Improved Damped Quasi-Newton Methods for Unconstrained Optimization Mehiddin Al-Baali and Lucio Grandinetti August 2015 Abstract Recently, Al-Baali (2014) has extended the damped-technique in the modified

### Introduction to Optimization

Introduction to Optimization Gradient-based Methods Marc Toussaint U Stuttgart Gradient descent methods Plain gradient descent (with adaptive stepsize) Steepest descent (w.r.t. a known metric) Conjugate

### Linear Classification: Perceptron

Linear Classification: Perceptron Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong 1 / 18 Y Tao Linear Classification: Perceptron In this lecture, we will consider

### Mathematics for Economics

Mathematics for Economics third edition Michael Hoy John Livernois Chris McKenna Ray Rees Thanasis Stengos The MIT Press Cambridge, Massachusetts London, England c 2011 Massachusetts Institute of Technology

### Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017

Support Vector Machines: Training with Stochastic Gradient Descent Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem

### 10 Numerical methods for constrained problems

10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

### Experiment 4 Free Fall

PHY9 Experiment 4: Free Fall 8/0/007 Page Experiment 4 Free Fall Suggested Reading for this Lab Bauer&Westfall Ch (as needed) Taylor, Section.6, and standard deviation rule ( t < ) rule in the uncertainty

### TAYLOR POLYNOMIALS DARYL DEFORD

TAYLOR POLYNOMIALS DARYL DEFORD 1. Introduction We have seen in class that Taylor polynomials provide us with a valuable tool for approximating many different types of functions. However, in order to really

### Lecture 8: February 9

0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

### Second Order Optimization Algorithms I

Second Order Optimization Algorithms I Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 7, 8, 9 and 10 1 The

### Conic Linear Programming. Yinyu Ye

Conic Linear Programming Yinyu Ye December 2004, revised October 2017 i ii Preface This monograph is developed for MS&E 314, Conic Linear Programming, which I am teaching at Stanford. Information, lecture

### Optimization for Machine Learning

Optimization for Machine Learning Elman Mansimov 1 September 24, 2015 1 Modified based on Shenlong Wang s and Jake Snell s tutorials, with additional contents borrowed from Kevin Swersky and Jasper Snoek

### Review for Exam 2 Ben Wang and Mark Styczynski

Review for Exam Ben Wang and Mark Styczynski This is a rough approximation of what we went over in the review session. This is actually more detailed in portions than what we went over. Also, please note

### EC /11. Math for Microeconomics September Course, Part II Lecture Notes. Course Outline

LONDON SCHOOL OF ECONOMICS Professor Leonardo Felli Department of Economics S.478; x7525 EC400 20010/11 Math for Microeconomics September Course, Part II Lecture Notes Course Outline Lecture 1: Tools for

### A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization

A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization Yinyu Ye Department is Management Science & Engineering and Institute of Computational & Mathematical Engineering Stanford

### Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

### Lagrange Multipliers

Lagrange Multipliers (Com S 477/577 Notes) Yan-Bin Jia Nov 9, 2017 1 Introduction We turn now to the study of minimization with constraints. More specifically, we will tackle the following problem: minimize

### Convex Functions. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Convex Functions Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Definition convex function Examples

### Lecture 3 Introduction to optimality conditions

TMA947 / MMG621 Nonlinear optimisation Lecture 3 Introduction to optimality conditions Emil Gustavsson October 31, 2013 Local and global optimality We consider an optimization problem which is that to

### Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Solution. If we does not need the pointwise limit of

### The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017

The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the

### Solving Dual Problems

Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem

### The Newton-Raphson Algorithm

The Newton-Raphson Algorithm David Allen University of Kentucky January 31, 2013 1 The Newton-Raphson Algorithm The Newton-Raphson algorithm, also called Newton s method, is a method for finding the minimum

### Lec10p1, ORF363/COS323

Lec10 Page 1 Lec10p1, ORF363/COS323 This lecture: Conjugate direction methods Conjugate directions Conjugate Gram-Schmidt The conjugate gradient (CG) algorithm Solving linear systems Leontief input-output

### Convergence Rates on Root Finding

Convergence Rates on Root Finding Com S 477/577 Oct 5, 004 A sequence x i R converges to ξ if for each ǫ > 0, there exists an integer Nǫ) such that x l ξ > ǫ for all l Nǫ). The Cauchy convergence criterion