Lecture Notes: Geometric Considerations in Unconstrained Optimization
James T. Allison
February 15, 2006

The primary objectives of this lecture on unconstrained optimization are to:

- Establish connections between optimality conditions and problem geometry
- Provide several motivations for the gradient method and Newton's method
- Illustrate these concepts with numerical examples

The derivation of optimality conditions using abstract means is important, but the intuition gained through a geometric understanding of optimality conditions can also be very useful. This additional insight can contribute to more effective implementation of optimization theory. A brief derivation of first and second order conditions is provided, followed by a discussion of function approximation models and a geometric explanation of optimality conditions. Finally, the impact of problem condition and scaling on optimization algorithms is discussed.

1 Optimality Conditions

For a point x_0 to be a minimum, perturbations about this point (Δx = x − x_0) must result only in objective function increases:

    Δf = f(x) − f(x_0) ≥ 0    (1)

Finite-term Taylor series expansions of a function are accurate near the point of expansion. Combining a first order expansion with equation 1, we can derive a necessary condition for optimality [1]:

    Δf = ∇f(x*)ᵀΔx + o(‖Δx‖)
    Δf ≥ 0 ∀Δx  ⇒  ∇f(x*)ᵀΔx ≥ 0 ∀Δx  ⇒  ∇f(x*) = 0    (2)

A point that meets this condition is a stationary point (x*), but it is unknown whether this point is a minimum, a maximum, or a saddle point. Evaluation of this first order necessary condition involves the solution of a system of nonlinear equations (equation 2). A second order expansion about a known stationary point provides curvature information via a quadratic approximation of the function, and enables the determination of whether the stationary point is in fact a minimum.
If we apply equation 1 to a second order expansion about a stationary point, noting that the linear term is zero there, we arrive at the following condition:

    ΔxᵀHΔx > 0  ∀Δx ≠ 0    (3)

H is the Hessian of the objective function (also written ∇²f(x)). The satisfaction of this condition together with the stationarity condition of equation 2 comprises a second order sufficiency condition; i.e., if both conditions are met, the point in question is known to be a minimum.

[1] Note that in this document vectors are considered to be column vectors, and gradients are also considered to be column vectors. The transpose of a vector x is denoted xᵀ.

Copyright © 2006 by James T. Allison
Evaluating this condition for all possible perturbations directly would be very difficult. However, it is known from linear algebra that equation 3 is satisfied if and only if the objective function Hessian matrix is positive definite. A positive definite matrix is often denoted with the expression H ≻ 0. A matrix is positive definite if and only if all of its eigenvalues are positive, and eigenvalues are easily computed numerically. The relationship between positive definiteness, positive eigenvalues, and function geometry will be clarified in these lecture notes.

2 Function Models

The Taylor series expansions used in deriving the above optimality conditions can be viewed as function approximation models. Both linear and quadratic models were used, and a geometric understanding of these models adds insight into optimality conditions and optimization algorithms.

Linear Function Models

A linear function model characterizes the slope of a function in the neighborhood of a point. In R a linear model is a line tangent to the function, and in R^n [2] it is a hyperplane tangent to the function. If the tangent plane is not horizontal, then directions of descent exist, as does an improved objective function value. Therefore, an optimal point must have a horizontal tangent plane. The gradient of the objective function is zero when the tangent plane, defined by a linear Taylor series expansion, is horizontal. This verifies equation 2. This geometric description also motivates the gradient method for unconstrained optimization.

Gradient Method Algorithm:

1. Build a linear model of the function at the current point, and if descent directions exist, move in the direction of steepest descent (−∇f) until the objective function stops improving.
2. Update the linear model and repeat until ∇f = 0.
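The two steps above can be sketched in a few lines. This is a minimal illustration, not the implementation from the lecture: the quadratic test function, starting point, fixed step size, and tolerance are all assumed, and a fixed α stands in for the exact line search.

```python
import numpy as np

def grad_f(x):
    # gradient of the assumed test function f(x) = x1^2 + 10*x2^2
    return np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([5.0, 1.0])           # assumed starting point
alpha = 0.04                       # fixed step size standing in for an exact line search
for _ in range(500):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:   # stop once the gradient is numerically zero
        break
    x = x - alpha * g              # steepest-descent step: x <- x - alpha * grad f(x)
```

With an exact line search, α would instead be chosen at every iteration by minimizing f(x − αg); the fixed step is the simplest stand-in.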
The iterative formula for the gradient method, where k is the iteration number and α is the step size, is:

    x^{k+1} = x^k − α∇f(x^k)    (4)

The gradient method converts a multidimensional minimization problem into a sequence of one-dimensional line searches. During each of these line searches we are looking at a slice of the objective function surface. This is illustrated in the following example.

Example 1: Consider the quadratic function:

    f(x) = 7x_1² + x_1x_2 + x_2²    (5)

The contours of the function level sets are shown in the first plot of Figure 1. We can see by inspection of the objective function that the minimum is at x* = [0 0]ᵀ. If we start at x^0 = [10 5]ᵀ and perform the line search min_α f(x^0 − α∇f(x^0)), the objective function along the search direction appears as shown in the second plot of Figure 1. The search direction is illustrated in the first plot, and the second plot is a slice of the objective function surface in this search direction.

Quadratic Function Models

A quadratic model can capture curvature information about a function in the neighborhood of a point. In R a quadratic model is a parabola, and in R^n it is a paraboloid. Constructing a quadratic model of a function facilitates the approximation of the function's stationary point, since the quadratic model has its own stationary point; linear models (hyperplanes) do not have stationary points. The closer to quadratic a function's shape is, the better this approximation will be, and it will of course be exact for quadratic objective functions.

[2] R is the set of all real numbers, and R^n is the set of all real-valued vectors of length n.
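The exact line search of Example 1 can be sketched numerically. This assumes the quadratic is f(x) = 7x_1² + x_1x_2 + x_2², written as f(x) = xᵀAx with A = [[7, 0.5], [0.5, 1]], and uses the starting point x^0 = [10, 5]; for a quadratic, the minimizing step length along −∇f has a closed form.

```python
import numpy as np

A = np.array([[7.0, 0.5],
              [0.5, 1.0]])            # f(x) = x^T A x (assumed form of the example)

def f(x):
    return x @ A @ x

x0 = np.array([10.0, 5.0])            # starting point from Example 1
g = 2.0 * A @ x0                      # gradient of x^T A x is 2 A x
# Exact line search: phi(a) = f(x0 - a*g) is quadratic in a,
# and its minimizer is a* = (g.g) / (2 g^T A g).
alpha = (g @ g) / (2.0 * (g @ A @ g))
x1 = x0 - alpha * g
```

At x1 the directional derivative along the search direction is zero, which is why consecutive exact-line-search directions turn out to be orthogonal.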
Figure 1: Contour and line search plots for the quadratic function of Example 1

Iterative approximation of a function's stationary point forms the basis of Newton's method for unconstrained optimization. Sequential quadratic modeling is the first of three motivations for Newton's method for optimization that will be discussed in these lecture notes.

Newton's Method:

1. Build a quadratic model of the function at the current point, and use the stationary point of this model as the approximation of the objective function's stationary point.
2. Check for convergence, and iterate if not converged.

The iterative formula for Newton's method, where H^{-1} is the inverse of the objective function's Hessian, is:

    x^{k+1} = x^k − H^{-1}∇f(x^k)    (6)

Newton's method exhibits very fast (quadratic) local convergence compared to the slower linear convergence of the gradient method. Newton's method, however, can be unstable. It may converge to a maximum instead of a minimum, since it does not have a descent property: it seeks a stationary point, but has no ability to distinguish between a maximum and a minimum. In contrast, the gradient method will always decrease the objective function at each iteration because it always moves in a descent direction. The gradient method will find a stationary point that is either a minimum, or a saddle point that is an improvement over the starting point. In other words, the gradient method is effective at moving in a descent direction even far from a stationary point, while Newton's method is effective at converging quickly to a stationary point when one is near. Quasi-Newton methods combine the good global convergence of the gradient method with the rapid local convergence of Newton's method: such methods begin with gradient-like iterations and gradually transform into Newton-like iterations.
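Equation 6 can be sketched as follows. The smooth convex test function f(x) = e^{x_1} + e^{−x_1} + x_2², whose minimum is at the origin, is an assumed example, and the linear system H p = ∇f is solved rather than forming H^{-1} explicitly.

```python
import numpy as np

def grad(x):
    # gradient of f(x) = exp(x1) + exp(-x1) + x2^2
    return np.array([np.exp(x[0]) - np.exp(-x[0]), 2.0 * x[1]])

def hess(x):
    # Hessian of the same function (diagonal here)
    return np.array([[np.exp(x[0]) + np.exp(-x[0]), 0.0],
                     [0.0, 2.0]])

x = np.array([1.0, 3.0])                        # assumed starting point
for _ in range(10):
    x = x - np.linalg.solve(hess(x), grad(x))   # Newton step: solve H p = grad f
```

A handful of iterations suffice here; the error roughly squares at each step, illustrating the quadratic local convergence mentioned above.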
You may be familiar with another form of Newton's method that is used for finding the roots of a function. The one-dimensional root-finding formula is:

    x^{k+1} = x^k − f(x^k) / f′(x^k)
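A minimal sketch of this one-dimensional iteration; the test function f(x) = x² − 2 and the starting point are assumed, so the iterates should approach √2.

```python
def newton_root(f, fprime, x0, iters=20):
    # x_{k+1} = x_k - f(x_k) / f'(x_k)
    x = x0
    for _ in range(iters):
        x = x - f(x) / fprime(x)
    return x

# root of f(x) = x^2 - 2 starting from x0 = 1 (assumed example)
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```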
An extension of Newton's method to multiple dimensions takes the form:

    x^{k+1} = x^k − J^{-1}f(x^k)    (7)

This multidimensional version seeks to solve the system of equations f(x) = 0. Note that here f(x) is a vector-valued function. The matrix J is the Jacobian of f(x): a matrix in which each row is the transpose of the gradient of one component of the vector function f(x). This next concept establishes the connection between Newton's method for root finding (i.e., for solving systems of nonlinear equations) and Newton's method for unconstrained optimization. Recall that if we are seeking a stationary point of an objective function f(x), we need to solve the system of equations ∇f(x) = 0. If we use Newton's method for solving nonlinear systems to solve ∇f(x) = 0, we replace f(x) in equation 7 with the vector-valued function ∇f(x), and replace J^{-1} with the inverse of the Jacobian of ∇f(x). Observe that the Jacobian of ∇f(x) is in fact the Hessian of f(x), so its inverse is H^{-1}. Hence, by applying Newton's method for solving systems of equations to the problem of finding a stationary point, we have derived Newton's method for unconstrained optimization as defined in equation 6. This is the second of three motivations for Newton's method discussed in this document.

3 Quadratic Forms and Geometry

Quadratic models can take either of two general shapes: a paraboloid (convex or concave) or a hyperboloid (a saddle). Example 1 exhibited a function with a convex parabolic shape, and a hyperboloid will be illustrated shortly. First, a brief review of quadratic forms. A function has a quadratic form if it is a linear combination of x_i x_j terms. It can be written in matrix form as f(x) = xᵀAx, where A is a symmetric matrix that defines the quadratic function. The conversion of the function in equation 8 will be illustrated.
    f(x) = 2x_1² + x_1x_2 + x_2² + x_2x_3 + x_3²    (8)

First, the coefficient of each squared term (i.e., i = j) is placed on the diagonal at location (i, i). Then, since each cross (or "interaction") term is split across two off-diagonal entries, its coefficient is divided by two and placed in entries (i, j) and (j, i). Finally, any quadratic terms that do not appear in the original function are assigned a value of zero in the matrix. The function in equation 8, rewritten in matrix form, is:

                            [ 2   1/2   0 ] [x_1]
    f(x) = [x_1  x_2  x_3]  [1/2   1  1/2 ] [x_2]  = xᵀAx    (9)
                            [ 0   1/2   1 ] [x_3]

The correctness of this representation can be verified by performing the vector and matrix multiplications in equation 9 and observing that the result simplifies to equation 8. Certain properties of the quadratic form indicate what type of shape the function has. Three general shapes are possible.
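The verification suggested above can also be done numerically. This sketch assumes the quadratic of equation 8 reads f(x) = 2x_1² + x_1x_2 + x_2² + x_2x_3 + x_3², and checks that xᵀAx reproduces it at a random test point.

```python
import numpy as np

A = np.array([[2.0, 0.5, 0.0],     # squared-term coefficients on the diagonal;
              [0.5, 1.0, 0.5],     # each cross-term coefficient is halved and
              [0.0, 0.5, 1.0]])    # placed in the two symmetric off-diagonal slots

def f_poly(x):
    # the polynomial form assumed for equation 8
    return 2*x[0]**2 + x[0]*x[1] + x[1]**2 + x[1]*x[2] + x[2]**2

rng = np.random.default_rng(0)
x = rng.standard_normal(3)                        # arbitrary test point
match = bool(np.isclose(x @ A @ x, f_poly(x)))    # x^T A x should equal the polynomial
```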
Figure 2: Illustration of convex, concave, and hyperbolic quadratic functions (surface and contour plots)

- If xᵀAx > 0 for all x ≠ 0, A is positive definite, and the quadratic function is convex.
- If xᵀAx < 0 for all x ≠ 0, A is negative definite, and the quadratic function is concave.
- If xᵀAx is positive for some x and negative for others, A is indefinite, and the quadratic function is hyperbolic.

Figure 2 illustrates each of these three cases using both surface and contour plots of convex, concave, and hyperbolic quadratic functions. The quadratic functions corresponding to the left, center, and right plots are f_1(x) = xᵀA_1x, f_2(x) = xᵀA_2x, and f_3(x) = xᵀA_3x, where

    A_1 = [ 7   1.2 ]
          [1.2   1  ]

and A_2 and A_3 are, respectively, the negative definite and indefinite matrices that generate the concave and saddle-shaped surfaces.

It can also be demonstrated that if a matrix has all positive eigenvalues, i.e., λ_i > 0 ∀i, the matrix is positive definite. Similarly, if λ_i < 0 ∀i, the matrix is negative definite, and if the eigenvalues take both positive and negative values, the matrix is indefinite. Eigenvalues provide a convenient way of evaluating properties of a quadratic function, but what exactly is the connection between eigenvalues and function geometry? We will demonstrate an intuitive interpretation of eigenvalues, and illustrate it with a numerical example.
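The eigenvalue test just described can be sketched directly. A_1 below is the convex example matrix from Figure 2; the negated and diagonal matrices used to exercise the other two cases are assumed stand-ins.

```python
import numpy as np

def classify(A):
    # the sign pattern of the eigenvalues determines the shape of x^T A x
    eig = np.linalg.eigvalsh(A)    # eigvalsh: eigenvalues of a symmetric matrix
    if np.all(eig > 0):
        return "positive definite (convex)"
    if np.all(eig < 0):
        return "negative definite (concave)"
    return "indefinite (hyperbolic)"

A1 = np.array([[7.0, 1.2],
               [1.2, 1.0]])        # convex example from Figure 2
kind = classify(A1)
```

For instance, classify(-A1) reports the concave case, and a diagonal matrix with mixed signs reports the indefinite (saddle) case.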
Eigenvalues, Eigenvectors, and Geometry

An eigenvalue λ and corresponding eigenvector v of a matrix A satisfy the relation:

    Av = λv    (10)

Eigenvectors are vectors that yield a scalar multiple of themselves when pre-multiplied by the associated matrix. We can gain geometric intuition for eigenvalues and eigenvectors by shifting and rotating the coordinate system in which we view a quadratic function. This is a lengthy process, but the end result provides significant geometric insight. We start with a general quadratic function (including constant and linear terms), and translate the coordinate axes to be centered at the function's stationary point by defining new coordinates z = x − x*. Note that this function's gradient is b + 2Ax, so the stationary point is x* = −(1/2)A^{-1}b. The vectors x and b have length n, and the matrix A has dimension n × n.

    f(x) = f_0 + xᵀb + xᵀAx
    f(z) = f_0 + (z + x*)ᵀb + (z + x*)ᵀA(z + x*)
    f(z) = (f_0 + x*ᵀb + x*ᵀAx*) + zᵀAz + zᵀ(b + 2Ax*)
    f(z) = f* + zᵀAz

For convenience f* is defined as the function value at x*, and the last term of the third equation was dropped because of stationarity (b + 2Ax* = 0). The coordinate system can be rotated by transforming (multiplying) the coordinate variables with a matrix. Consider the matrix V = [v_1 v_2 ... v_n], whose columns are the normalized eigenvectors of A. Using V to rotate the coordinates will align the coordinates with the eigenvectors of A. This rotation is effected through the multiplication p = Vᵀz, where p are the new coordinates. Since the normalized eigenvectors of a symmetric matrix form an orthonormal basis, the matrix V is orthogonal, and the following identities hold:

    Vᵀ = V^{-1},  VᵀV = VVᵀ = I

I is the identity matrix, an n × n matrix with ones on the diagonal. We can use these identities and the definition of p to write z in terms of the rotated coordinates p.
It will also be helpful to know zᵀ in terms of p:

    z = Iz = VVᵀz = Vp
    zᵀ = (Vp)ᵀ = pᵀVᵀ

Substituting these expressions for z and zᵀ into the last equation for f(z), we arrive at a new form of the original quadratic function in terms of the translated and rotated coordinates p:

    f(p) = f* + pᵀVᵀAVp

This expression can be further simplified by defining the matrix Λ = VᵀAV, which turns out to be a diagonal matrix whose entries are the eigenvalues of A. The function can be rewritten as:

    f(p) = f* + pᵀΛp    (11)

Since all off-diagonal terms are zero, the function can be written as a simple summation (equation 12). This final result will enable geometric interpretation of eigenvalues.

    f(p) = f* + Σ_{i=1}^{n} λ_i p_i²    (12)

This form provides an excellent geometric interpretation of eigenvalues and eigenvectors. If we move along an eigenvector direction (i.e., vary p_i), the function will decrease if λ_i < 0, and increase if λ_i > 0. This interpretation is congruent with the geometry associated with positive definite, negative definite, and indefinite matrices. If an eigenvalue is large, the rate of change in the associated direction is large. The eigenvalues and eigenvectors of the functions in Figure 2 are shown below, and the eigenvector directions are plotted in Figure 3. Eigenvectors point along the axes of the level-set contour ellipses. Note that the eigenvectors associated with the larger eigenvalues point along the minor axes of the level-set ellipses, since the function is steepest in those directions.
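Equation 12 can be checked numerically. The quadratic data (A, b, f_0) below are assumed for illustration; the check translates a test point to the stationary point, rotates it into the eigenvector basis, and compares f computed both ways.

```python
import numpy as np

A = np.array([[7.0, 1.2],
              [1.2, 1.0]])                 # assumed quadratic data
b = np.array([1.0, -2.0])
f0 = 3.0

def f(x):
    return f0 + x @ b + x @ A @ x

x_star = -0.5 * np.linalg.solve(A, b)      # stationary point: x* = -(1/2) A^{-1} b
lam, V = np.linalg.eigh(A)                 # eigenvalues and orthonormal eigenvectors

x = np.array([0.7, -1.3])                  # arbitrary test point
p = V.T @ (x - x_star)                     # translate (z = x - x*), then rotate (p = V^T z)
f_diag = f(x_star) + np.sum(lam * p**2)    # equation 12: f* + sum_i lambda_i * p_i^2
ok = bool(np.isclose(f(x), f_diag))
```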
Figure 3: Eigenvector directions of the quadratic functions from Figure 2

Function 1: λ_1 = 0.769 with v_1 = [0.189 −0.982]ᵀ, and λ_2 = 7.23 with v_2 = [0.982 0.189]ᵀ (eigenvector signs are arbitrary). Function 2: both eigenvalues are negative, consistent with the concave case. Function 3: λ_1 = 5.86 with v_1 = [0.314 0.949]ᵀ, and a negative λ_2, consistent with the indefinite case.

4 Problem Condition and Scaling

An objective function is more difficult to minimize when it is highly elliptical. A quantitative measure of this is the condition number C of a function (equation 13), defined as the ratio between the maximum and minimum eigenvalues of the function's Hessian. A perfectly conditioned function has a condition number of 1, while ill-conditioned problems have very large condition numbers.

    C = λ_max / λ_min    (13)

Recall that large eigenvalues correspond to very steep function response. Thus, an ill-conditioned function changes rapidly in some directions and very little in others. Also note that these directions of disparate sensitivity are not necessarily aligned with the coordinate axes.

The gradient method has particular difficulty with poorly conditioned problems. The influence of steep directions can drown out the influence of relatively flat directions. For example, if the algorithm is evaluating a point in a long, narrow valley, the gradient method can be numerically fooled into thinking that the gradient is zero, even if the point is far from the minimum. The directional derivative in the steep direction may in fact be zero if the point is at the low point of the valley, but since the derivative in the nearly flat direction is so small, machine precision limitations may incorrectly identify a zero gradient.
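Equation 13 is a one-line computation. This sketch uses the convex matrix A_1 from Figure 2; the Hessian of f_1(x) = xᵀA_1x is 2A_1, but the eigenvalue ratio is unchanged by the factor of 2, so A_1 is used directly.

```python
import numpy as np

A1 = np.array([[7.0, 1.2],
               [1.2, 1.0]])
lam = np.linalg.eigvalsh(A1)    # eigenvalues of the symmetric matrix, ascending
C = lam.max() / lam.min()       # condition number, equation 13
```

With eigenvalues 0.769 and 7.23, this quadratic has a condition number of roughly 9.4, so its level sets are noticeably elongated ellipses.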
When the gradient method is stuck in such a valley, not much progress is made with each step because of the relatively small gradient. Whether algorithm convergence is based on a zero gradient or a sufficiently small step size, the gradient method may terminate before finding the solution because of poor scaling. In addition, it can be shown that each search direction of the gradient method (with exact line search) is orthogonal to the previous search direction. This results in a zig-zag route to the solution that requires many iterations.

What can be done to address this issue when using the gradient method on ill-conditioned problems? A common approach is to scale the variables such that the objective function is approximately equally sensitive to all variables. A simple scaling approach is to multiply each variable by a scalar such that its nominal value (or starting point value) becomes equal to one. These scalar multipliers can be collected into a scaling vector, so that the scaled variables are calculated with an element-wise product: y = s ∘ x. Here y is the vector of scaled variables, and s is the scaling vector. Similarly, in constrained optimization it is important to scale the objective and constraint function values such that they have similar magnitudes.

Scaling each variable individually works well when the eigenvectors are nearly aligned with the coordinate axes. Recall that this might not be the case; a function may have a steep direction that points somewhere between the coordinate axes. Such a function requires more sophisticated scaling to achieve reasonable conditioning. The interaction between variables must be considered, and a scaling matrix may be used to accomplish this, since the off-diagonal terms of a matrix can account for variable interaction. A useful class of scaling matrices are those that are symmetric and positive definite.
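The simple per-variable scaling described above can be sketched in two lines; the starting point, with deliberately disparate magnitudes, is assumed.

```python
import numpy as np

x0 = np.array([2000.0, 0.004])   # assumed starting point with disparate magnitudes
s = 1.0 / x0                     # one scalar multiplier per variable
y0 = s * x0                      # element-wise product: every scaled variable starts at 1
```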
For convenience we define S^{-1} as the scaling matrix, and write:

    x = Sy    (14)

If we define the objective function in the new variable space as h(y) = f(Sy), then the new gradient method iteration becomes:

    y^{k+1} = y^k − α∇h(y^k)    (15)

Although we could proceed using this formula, obtain a solution in terms of y, and convert it back to the original variable space using equation 14, it is instructive to recast equation 15 in terms of x. If we premultiply this equation by S, define S² = D_k, and use the chain rule to obtain the relation ∇h(y) = S∇f(x), we find after algebraic manipulation that:

    x^{k+1} = x^k − αD_k∇f(x^k)    (16)

This result is in fact a scaled gradient method iteration. The gradient scaling matrix for iteration k, D_k, will ensure descent if it is symmetric and positive definite. It turns out that the best scaling results are obtained by setting D_k to the inverse of the function's Hessian evaluated at x^k, i.e., D_k = H(x^k)^{-1}. Observe that when this is the case, and if we set the step size α = 1, the scaled steepest descent algorithm becomes Newton's method for optimization, as defined in equation 6. This is the third and final motivation for Newton's method for unconstrained optimization that will be discussed in these lecture notes.

If the Hessian of an objective function is positive definite at a point, then Newton's method will produce descent for that iteration, since the scaled gradient method is guaranteed descent when D_k ≻ 0. The Hessian, however, may not be positive definite. Geometrically, when Newton's method is operating in a convex region of the objective function (i.e., H(x^k) ≻ 0) that includes a minimum, it will iteratively descend to that minimum. Conversely, if the region is concave (i.e., H(x^k) ≺ 0) with an associated maximum, Newton's method will ascend to the maximum. Ideal scaling removes any function ellipticity, transforming elliptical level sets of a function's contour plot into circular level sets.
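The equivalence claimed above is easy to check on an assumed quadratic: a scaled gradient step (equation 16) with D_k = H^{-1} and α = 1 matches the Newton step of equation 6, and for a quadratic both land on the minimum in one step.

```python
import numpy as np

A = np.array([[7.0, 1.2],
              [1.2, 1.0]])

def grad(x):
    return 2.0 * A @ x                 # gradient of f(x) = x^T A x

H = 2.0 * A                            # constant Hessian of the quadratic
x = np.array([10.0, 5.0])              # assumed starting point

D = np.linalg.inv(H)                   # ideal scaling matrix D_k = H^{-1}
x_scaled = x - 1.0 * (D @ grad(x))     # scaled gradient step (equation 16), alpha = 1
x_newton = x - np.linalg.solve(H, grad(x))   # Newton step (equation 6)
same = bool(np.allclose(x_scaled, x_newton))
```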
Scaling a quadratic function with the inverse of its Hessian results in a perfectly conditioned function with circular level sets. Applying the gradient method to such a function will locate the minimum in one step, since −∇f(x^k) points directly at the minimum. Since using the inverse of the Hessian to scale a function for the gradient method is the same as using Newton's method, this scenario is equivalent to applying Newton's method to the minimization of a quadratic function. Recall that Newton's method will find the minimum of a quadratic function in one step, since the quadratic approximation model is exact. Whether we view this situation as Newton's method applied to a quadratic function, or as the gradient method with ideal scaling, the result is the same: the solution is identified in one step.
5 Summary

A connection was established between optimality conditions and a geometric understanding of functions. Three approaches were used to motivate the use of Newton's method:

1. Sequential second-order function approximations
2. Newton's method for root finding applied to solving ∇f(x) = 0
3. Use of the objective function's Hessian to provide ideal scaling
More informationEAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science
EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Taylor s Theorem Can often approximate a function by a polynomial The error in the approximation
More informationMath 411 Preliminaries
Math 411 Preliminaries Provide a list of preliminary vocabulary and concepts Preliminary Basic Netwon s method, Taylor series expansion (for single and multiple variables), Eigenvalue, Eigenvector, Vector
More informationNotes on Some Methods for Solving Linear Systems
Notes on Some Methods for Solving Linear Systems Dianne P. O Leary, 1983 and 1999 and 2007 September 25, 2007 When the matrix A is symmetric and positive definite, we have a whole new class of algorithms
More information1 Overview. 2 A Characterization of Convex Functions. 2.1 First-order Taylor approximation. AM 221: Advanced Optimization Spring 2016
AM 221: Advanced Optimization Spring 2016 Prof. Yaron Singer Lecture 8 February 22nd 1 Overview In the previous lecture we saw characterizations of optimality in linear optimization, and we reviewed the
More informationEAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science
EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Multidimensional Unconstrained Optimization Suppose we have a function f() of more than one
More informationUnconstrained Multivariate Optimization
Unconstrained Multivariate Optimization Multivariate optimization means optimization of a scalar function of a several variables: and has the general form: y = () min ( ) where () is a nonlinear scalar-valued
More informationArc Search Algorithms
Arc Search Algorithms Nick Henderson and Walter Murray Stanford University Institute for Computational and Mathematical Engineering November 10, 2011 Unconstrained Optimization minimize x D F (x) where
More informationPenalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way.
AMSC 607 / CMSC 878o Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 3: Penalty and Barrier Methods Dianne P. O Leary c 2008 Reference: N&S Chapter 16 Penalty and Barrier
More informationMath 118, Fall 2014 Final Exam
Math 8, Fall 4 Final Exam True or false Please circle your choice; no explanation is necessary True There is a linear transformation T such that T e ) = e and T e ) = e Solution Since T is linear, if T
More informationConvex Functions and Optimization
Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized
More information7.2 Steepest Descent and Preconditioning
7.2 Steepest Descent and Preconditioning Descent methods are a broad class of iterative methods for finding solutions of the linear system Ax = b for symmetric positive definite matrix A R n n. Consider
More informationCHAPTER 2: QUADRATIC PROGRAMMING
CHAPTER 2: QUADRATIC PROGRAMMING Overview Quadratic programming (QP) problems are characterized by objective functions that are quadratic in the design variables, and linear constraints. In this sense,
More informationGradient Descent. Sargur Srihari
Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors
More informationECE580 Partial Solution to Problem Set 3
ECE580 Fall 2015 Solution to Problem Set 3 October 23, 2015 1 ECE580 Partial Solution to Problem Set 3 These problems are from the textbook by Chong and Zak, 4th edition, which is the textbook for the
More information1 Computing with constraints
Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)
More informationMath (P)refresher Lecture 8: Unconstrained Optimization
Math (P)refresher Lecture 8: Unconstrained Optimization September 2006 Today s Topics : Quadratic Forms Definiteness of Quadratic Forms Maxima and Minima in R n First Order Conditions Second Order Conditions
More informationg(x,y) = c. For instance (see Figure 1 on the right), consider the optimization problem maximize subject to
1 of 11 11/29/2010 10:39 AM From Wikipedia, the free encyclopedia In mathematical optimization, the method of Lagrange multipliers (named after Joseph Louis Lagrange) provides a strategy for finding the
More informationCourse Notes: Week 4
Course Notes: Week 4 Math 270C: Applied Numerical Linear Algebra 1 Lecture 9: Steepest Descent (4/18/11) The connection with Lanczos iteration and the CG was not originally known. CG was originally derived
More informationEcon Slides from Lecture 8
Econ 205 Sobel Econ 205 - Slides from Lecture 8 Joel Sobel September 1, 2010 Computational Facts 1. det AB = det BA = det A det B 2. If D is a diagonal matrix, then det D is equal to the product of its
More informationNumerical optimization
Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal
More informationMeaning of the Hessian of a function in a critical point
Meaning of the Hessian of a function in a critical point Mircea Petrache February 1, 2012 We consider a function f : R n R and assume for it to be differentiable with continuity at least two times (that
More informationCE 191: Civil and Environmental Engineering Systems Analysis. LEC 05 : Optimality Conditions
CE 191: Civil and Environmental Engineering Systems Analysis LEC : Optimality Conditions Professor Scott Moura Civil & Environmental Engineering University of California, Berkeley Fall 214 Prof. Moura
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationNumerical Optimization
Numerical Optimization Unit 2: Multivariable optimization problems Che-Rung Lee Scribe: February 28, 2011 (UNIT 2) Numerical Optimization February 28, 2011 1 / 17 Partial derivative of a two variable function
More informationLecture V. Numerical Optimization
Lecture V Numerical Optimization Gianluca Violante New York University Quantitative Macroeconomics G. Violante, Numerical Optimization p. 1 /19 Isomorphism I We describe minimization problems: to maximize
More informationChapter 7. Extremal Problems. 7.1 Extrema and Local Extrema
Chapter 7 Extremal Problems No matter in theoretical context or in applications many problems can be formulated as problems of finding the maximum or minimum of a function. Whenever this is the case, advanced
More informationBasic Math for
Basic Math for 16-720 August 23, 2002 1 Linear Algebra 1.1 Vectors and Matrices First, a reminder of a few basic notations, definitions, and terminology: Unless indicated otherwise, vectors are always
More informationLeast Squares Optimization
Least Squares Optimization The following is a brief review of least squares optimization and constrained optimization techniques, which are widely used to analyze and visualize data. Least squares (LS)
More informationTangent spaces, normals and extrema
Chapter 3 Tangent spaces, normals and extrema If S is a surface in 3-space, with a point a S where S looks smooth, i.e., without any fold or cusp or self-crossing, we can intuitively define the tangent
More informationMATH 5720: Unconstrained Optimization Hung Phan, UMass Lowell September 13, 2018
MATH 57: Unconstrained Optimization Hung Phan, UMass Lowell September 13, 18 1 Global and Local Optima Let a function f : S R be defined on a set S R n Definition 1 (minimizers and maximizers) (i) x S
More informationNonlinear equations and optimization
Notes for 2017-03-29 Nonlinear equations and optimization For the next month or so, we will be discussing methods for solving nonlinear systems of equations and multivariate optimization problems. We will
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationOptimization and Root Finding. Kurt Hornik
Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding
More information1 Numerical optimization
Contents Numerical optimization 5. Optimization of single-variable functions.............................. 5.. Golden Section Search..................................... 6.. Fibonacci Search........................................
More informationAppendix A Taylor Approximations and Definite Matrices
Appendix A Taylor Approximations and Definite Matrices Taylor approximations provide an easy way to approximate a function as a polynomial, using the derivatives of the function. We know, from elementary
More informationIntroduction to unconstrained optimization - direct search methods
Introduction to unconstrained optimization - direct search methods Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Structure of optimization methods Typically Constraint handling converts the
More informationAPPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.
APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product
More information8. Diagonalization.
8. Diagonalization 8.1. Matrix Representations of Linear Transformations Matrix of A Linear Operator with Respect to A Basis We know that every linear transformation T: R n R m has an associated standard
More information10.34 Numerical Methods Applied to Chemical Engineering Fall Quiz #1 Review
10.34 Numerical Methods Applied to Chemical Engineering Fall 2015 Quiz #1 Review Study guide based on notes developed by J.A. Paulson, modified by K. Severson Linear Algebra We ve covered three major topics
More informationarxiv: v1 [math.na] 5 May 2011
ITERATIVE METHODS FOR COMPUTING EIGENVALUES AND EIGENVECTORS MAYSUM PANJU arxiv:1105.1185v1 [math.na] 5 May 2011 Abstract. We examine some numerical iterative methods for computing the eigenvalues and
More informationCourse Notes: Week 1
Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues
More informationAlgorithms for Constrained Optimization
1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic
More informationMachine Learning Brett Bernstein. Recitation 1: Gradients and Directional Derivatives
Machine Learning Brett Bernstein Recitation 1: Gradients and Directional Derivatives Intro Question 1 We are given the data set (x 1, y 1 ),, (x n, y n ) where x i R d and y i R We want to fit a linear
More informationWeek 4: Differentiation for Functions of Several Variables
Week 4: Differentiation for Functions of Several Variables Introduction A functions of several variables f : U R n R is a rule that assigns a real number to each point in U, a subset of R n, For the next
More informationECE580 Fall 2015 Solution to Midterm Exam 1 October 23, Please leave fractions as fractions, but simplify them, etc.
ECE580 Fall 2015 Solution to Midterm Exam 1 October 23, 2015 1 Name: Solution Score: /100 This exam is closed-book. You must show ALL of your work for full credit. Please read the questions carefully.
More informationReview of Classical Optimization
Part II Review of Classical Optimization Multidisciplinary Design Optimization of Aircrafts 51 2 Deterministic Methods 2.1 One-Dimensional Unconstrained Minimization 2.1.1 Motivation Most practical optimization
More informationEcon 204 Supplement to Section 3.6 Diagonalization and Quadratic Forms. 1 Diagonalization and Change of Basis
Econ 204 Supplement to Section 3.6 Diagonalization and Quadratic Forms De La Fuente notes that, if an n n matrix has n distinct eigenvalues, it can be diagonalized. In this supplement, we will provide
More informationSECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS
SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss
More informationSeptember Math Course: First Order Derivative
September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which
More informationMobile Robotics 1. A Compact Course on Linear Algebra. Giorgio Grisetti
Mobile Robotics 1 A Compact Course on Linear Algebra Giorgio Grisetti SA-1 Vectors Arrays of numbers They represent a point in a n dimensional space 2 Vectors: Scalar Product Scalar-Vector Product Changes
More informationNumerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems
1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of
More information6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE. Three Alternatives/Remedies for Gradient Projection
6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE Three Alternatives/Remedies for Gradient Projection Two-Metric Projection Methods Manifold Suboptimization Methods
More informationREVIEW OF DIFFERENTIAL CALCULUS
REVIEW OF DIFFERENTIAL CALCULUS DONU ARAPURA 1. Limits and continuity To simplify the statements, we will often stick to two variables, but everything holds with any number of variables. Let f(x, y) be
More informationRecitation 1. Gradients and Directional Derivatives. Brett Bernstein. CDS at NYU. January 21, 2018
Gradients and Directional Derivatives Brett Bernstein CDS at NYU January 21, 2018 Brett Bernstein (CDS at NYU) Recitation 1 January 21, 2018 1 / 23 Initial Question Intro Question Question We are given
More informationOptimization Methods
Optimization Methods Categorization of Optimization Problems Continuous Optimization Discrete Optimization Combinatorial Optimization Variational Optimization Common Optimization Concepts in Computer Vision
More informationConstrained optimization: direct methods (cont.)
Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a
More informationComputational Finance
Department of Mathematics at University of California, San Diego Computational Finance Optimization Techniques [Lecture 2] Michael Holst January 9, 2017 Contents 1 Optimization Techniques 3 1.1 Examples
More informationTranspose & Dot Product
Transpose & Dot Product Def: The transpose of an m n matrix A is the n m matrix A T whose columns are the rows of A. So: The columns of A T are the rows of A. The rows of A T are the columns of A. Example:
More informationQuadratic Programming
Quadratic Programming Outline Linearly constrained minimization Linear equality constraints Linear inequality constraints Quadratic objective function 2 SideBar: Matrix Spaces Four fundamental subspaces
More informationPart 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)
Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where
More information