Vector Derivatives and the Gradient

Lecture 10, ECE 275A: Vector Derivatives and the Gradient
(ECE 275AB, Fall 2008, V1.1, © K. Kreutz-Delgado, UC San Diego)

Objective Functions l(x)

Let l(x) be a real-valued function (aka functional) of an n-dimensional real vector x ∈ X = Rⁿ,

    l(·) : X = Rⁿ → V = R,

where X is an n-dimensional real Hilbert space with metric matrix Ω = Ωᵀ > 0. Henceforth vectors in X are represented as column vectors in Rⁿ. Because here the value space V = R is one-dimensional, wlog we can take V to be Cartesian (all metric weightings on R are necessarily positive scalars, and therefore yield inner products and norms that are equivalent up to an overall positive scaling).

We will call l(x) an objective function. If l(x) is a cost, loss, or penalty function, then we assume that it is bounded from below and we attempt to minimize it wrt x. If l(x) is a profit, gain, or reward function, then we assume that it is bounded from above and we attempt to maximize it wrt x.

For example, suppose we wish to match a model pdf p_x(y) to a true, but unknown, density p_{x₀}(y) for an observed random vector, where we assume that p_x(y) > 0 wherever p_{x₀}(y) > 0, for all x, so that the ratio below is well defined. We can then use as a penalty function of x a measure of (non-averaged, or instantaneous) divergence or discrepancy D_I(x₀‖x) of the model pdf p_x(y) from the true pdf p_{x₀}(y), defined by

    D_I(x₀‖x) ≜ log( p_{x₀}(y) / p_x(y) ) = log p_{x₀}(y) − log p_x(y).
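
To make the definition concrete, here is a minimal numerical sketch of the instantaneous divergence for a scalar unit-variance Gaussian model density; the function names and parameter values are illustrative, not from the notes.

```python
import numpy as np

# Instantaneous divergence D_I(x0 || x) = log p_{x0}(y) - log p_x(y),
# sketched for a scalar Gaussian model p_x(y) = N(y; x, 1).
def log_gaussian(y, mean, var=1.0):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (y - mean) ** 2 / var

def instantaneous_divergence(y, x0, x):
    return log_gaussian(y, x0) - log_gaussian(y, x)

y = 1.2           # a single observed sample
x0, x = 1.0, 0.5  # true vs. model parameter
print(instantaneous_divergence(y, x0, x))  # > 0 here, but can be negative
# Its expectation under p_{x0} is the KL divergence, which is >= 0.
```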

Instantaneous Divergence & Negative Log-Likelihood

Note that minimizing the instantaneous divergence D_I(x₀‖x) is equivalent to maximizing the log-likelihood log p_x(y), or minimizing the negative log-likelihood

    l(x) = −log p_x(y).

A special case arises from use of the nonlinear Gaussian additive-noise model with known noise covariance C and known mean function h(·),

    y = h(x) + n,   n ∼ N(0, C)   ⟹   y ∼ N(h(x), C),

which yields the nonlinear weighted least-squares problem

    l(x) = −log p_x(y) ≐ ‖y − h(x)‖²_W,   W = C⁻¹.

Further setting h(x) = Ax gives the linear weighted least-squares problem we have already discussed:

    l(x) = −log p_x(y) ≐ ‖y − Ax‖²_W,   W = C⁻¹.

The symbol ≐ denotes the fact that we are ignoring additive terms and multiplicative factors which are irrelevant for the purpose of obtaining an extremum (here, a minimum) of a loss function. Of course we cannot ignore these terms if we are interested in the optimal value of the loss function itself.
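
A small sketch of the correspondence between the negative log-likelihood and the weighted least-squares loss, assuming a known covariance C; the particular h, C, and sample values below are arbitrary stand-ins.

```python
import numpy as np

# Negative log-likelihood of y = h(x) + n, n ~ N(0, C), up to constants:
# l(x) = ||y - h(x)||_W^2 with W = C^{-1}.
rng = np.random.default_rng(0)
C = np.array([[2.0, 0.5], [0.5, 1.0]])   # known noise covariance
W = np.linalg.inv(C)

def h(x):                                 # an arbitrary nonlinear model
    return np.array([x[0] ** 2, x[0] + np.sin(x[1])])

def loss(x, y):
    e = y - h(x)
    return e @ W @ e                      # ||e||_W^2 = e^T W e

x_true = np.array([0.8, -0.3])
y = h(x_true) + rng.multivariate_normal(np.zeros(2), C)
print(loss(x_true, y), loss(np.zeros(2), y))
```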

Stationary Points and the Vector Partial Derivative

Henceforth let the real scalar function l(x) be twice partial-differentiable with respect to all components of x ∈ Rⁿ. A necessary condition for x to be a local extremum (maximum or minimum) of l(x) is that

    ∂l(x)/∂x = ( ∂l/∂x₁  ⋯  ∂l/∂xₙ ) = 0   (a 1 × n row vector),

where the vector partial derivative operator ∂/∂x ≜ ( ∂/∂x₁ ⋯ ∂/∂xₙ ) is defined as a row operator. (See the extensive discussion in the Lecture Supplement on Real Vector Derivatives.)

A vector that satisfies ∂l(x)/∂x = 0 is known as a stationary point of l(x). Stationary points are points at which l(x) has a local maximum, minimum, or inflection. Sufficient conditions for a stationary point to be a local extremum require that we develop a theory of vector differentiation that will allow us to clearly and succinctly discuss second-order derivative properties of objective functions.
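
The stationary-point condition can be checked numerically. Below is an illustrative sketch that approximates the 1 × n row of partials by central differences; partial_row and the test function are invented for the example.

```python
import numpy as np

# Check the first-order necessary condition dl/dx = 0 at a candidate point
# via central finite differences.
def partial_row(l, x, eps=1e-6):
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x)); e[i] = eps
        g[i] = (l(x + e) - l(x - e)) / (2 * eps)
    return g  # the 1 x n row (dl/dx1, ..., dl/dxn)

l = lambda x: (x[0] - 1) ** 2 + 3 * (x[1] + 2) ** 2
print(partial_row(l, np.array([1.0, -2.0])))  # ~0: a stationary point
print(partial_row(l, np.array([0.0, 0.0])))   # nonzero: not stationary
```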

Derivative of a Vector-Valued Function: The Jacobian

Let f(x) ∈ Rᵐ have elements f_i(x), i = 1, …, m, which are all differentiable with respect to the components of x ∈ Rⁿ. We define the vector partial derivative of the vector function f(x) as the m × n matrix

    J_f(x) ≜ ∂f(x)/∂x = [ ∂f_i(x)/∂x_j ],   i = 1, …, m,  j = 1, …, n,

whose i-th row is the row vector ∂f_i(x)/∂x = ( ∂f_i/∂x₁ ⋯ ∂f_i/∂xₙ ).

The matrix J_f(x) = ∂f(x)/∂x is known as the Jacobian matrix (or operator) of the mapping f(x). It is the linearization of the nonlinear mapping f(x) at the point x. Often we write y = f(x) and denote the corresponding Jacobian by J_y(x).

If m = n, then y = f(x) can be viewed as a change of variables, in which case det J_y(x) is known as the Jacobian of the transformation. det J_y(x) plays a fundamental role in the change-of-variables formula for pdfs.
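
A small finite-difference sketch of the m × n Jacobian, with an analytic check; jacobian and f are illustrative names, not part of the notes.

```python
import numpy as np

# Central finite-difference approximation of the m x n Jacobian J_f(x),
# whose rows are the row vectors df_i/dx.
def jacobian(f, x, eps=1e-6):
    m, n = len(f(x)), len(x)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

f = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 3])
x = np.array([1.0, 2.0])
print(jacobian(f, x))
# Analytic check: [[x1, x0], [cos(x0), 0], [0, 3*x1^2]]
```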

The Jacobian (Cont.)

A function f : Rⁿ → Rᵐ is locally one-to-one in an open neighborhood of x if and only if its Jacobian (linearization) J_f(ξ) = ∂f(ξ)/∂ξ is a one-to-one matrix for all points ξ ∈ Rⁿ in the open neighborhood of x.

A function f : Rⁿ → Rᵐ is locally onto an open neighborhood of y = f(x) if and only if its Jacobian (linearization) J_f(ξ) is an onto matrix for all points ξ in the corresponding neighborhood of x.

A function f : Rⁿ → Rᵐ is locally invertible in an open neighborhood of x if and only if it is locally one-to-one and onto there, which is true if and only if its Jacobian (linearization) J_f(ξ) is a one-to-one and onto (and hence invertible) matrix for all points ξ in the open neighborhood of x. This is known as the Inverse Function Theorem.
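
As a computational sketch of the Inverse Function Theorem (assuming the square case m = n), one can test local invertibility by checking that the Jacobian determinant is nonzero; the map f below is invented for illustration.

```python
import numpy as np

# f(x) = (x0 + x1^2, x0 * x1); its analytic Jacobian has rows df_i/dx.
def J_f(x):
    return np.array([[1.0,  2 * x[1]],
                     [x[1], x[0]]])

for x in (np.array([1.0, 0.5]), np.array([0.0, 0.0])):
    det = np.linalg.det(J_f(x))
    print(x, det, "locally invertible" if abs(det) > 1e-12 else "singular")
```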

Vector Derivative Identities

    ∂/∂x (cᵀx) = cᵀ                                   for an arbitrary vector c
    ∂/∂x (Ax) = A                                     for an arbitrary matrix A
    ∂/∂x (gᵀ(x)h(x)) = gᵀ(x) ∂h(x)/∂x + hᵀ(x) ∂g(x)/∂x    (gᵀ(x)h(x) scalar)
    ∂/∂x (xᵀAx) = xᵀA + xᵀAᵀ                          for an arbitrary matrix A
    ∂/∂x (xᵀΩx) = 2xᵀΩ                                when Ω = Ωᵀ
    ∂/∂x h(g(x)) = (∂h/∂g)(∂g/∂x)                     (Chain Rule)

Note that the last identity is a statement about Jacobians and can be restated in an illuminating manner as

    J_{h∘g} = J_h J_g   (1)

which says that the linearization of a composition is the composition of the linearizations.
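
Two of the identities above can be verified numerically; the sketch below compares central-difference partials with the stated closed forms (all names illustrative).

```python
import numpy as np

# Numerical check of:
#   d/dx (x^T A x) = x^T A + x^T A^T,   d/dx (x^T O x) = 2 x^T O for O = O^T.
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
Omega = A @ A.T + n * np.eye(n)          # symmetric positive definite
x = rng.standard_normal(n)

def fd_row(l, x, eps=1e-6):
    return np.array([(l(x + eps * e) - l(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

print(np.allclose(fd_row(lambda z: z @ A @ z, x), x @ A + x @ A.T, atol=1e-5))
print(np.allclose(fd_row(lambda z: z @ Omega @ z, x), 2 * x @ Omega, atol=1e-5))
```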

Application to the Linear Gaussian Model

Stationary points of l(x) = −log p_x(y) ≐ ‖e(x)‖²_W, where e(x) = y − Ax and W = C⁻¹, satisfy

    0 = ∂l/∂x = (∂l/∂e)(∂e/∂x) = (2eᵀW)(−A) = −2(y − Ax)ᵀWA   ⟺   AᵀW(y − Ax) = 0,

or

    e(x) = y − Ax ∈ N(A*)   with   A* = AᵀW.

Therefore stationary points satisfy the Normal Equation

    AᵀWAx = AᵀWy   ⟺   A*Ax = A*y.
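
A minimal numerical sketch of the normal equation, assuming A has full column rank; the residual at the solution is verified to lie in N(AᵀW). (In serious numerical work one would factor W^{1/2}A rather than form AᵀWA explicitly.)

```python
import numpy as np

# Solve the weighted least-squares normal equation A^T W A x = A^T W y.
rng = np.random.default_rng(2)
m, n = 8, 3
A = rng.standard_normal((m, n))          # full column rank w.p. 1
W = np.diag(rng.uniform(0.5, 2.0, m))    # W = C^{-1}, here diagonal
y = rng.standard_normal(m)

x_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
residual = y - A @ x_hat
print(A.T @ W @ residual)                # ~0: e(x_hat) is in N(A^T W)
```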

The Hessian of an Objective Function

The Hessian, or matrix of second partial derivatives of l(x), is defined by

    H(x) ≜ (∂/∂x)ᵀ (∂l(x)/∂x) = [ ∂²l(x)/∂x_i ∂x_j ]   (an n × n matrix).

Because mixed partial derivatives commute,

    ∂²l/∂x_i∂x_j = ∂²l/∂x_j∂x_i,

the Hessian is obviously symmetric:

    H(x) = Hᵀ(x).
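
The Hessian and its symmetry can be checked with a standard four-point central-difference formula; this is an illustrative sketch, not part of the notes.

```python
import numpy as np

# Finite-difference Hessian H_ij = d^2 l / dx_i dx_j, plus a symmetry check.
def hessian(l, x, eps=1e-5):
    n = len(x)
    H = np.zeros((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            H[i, j] = (l(x + eps*I[i] + eps*I[j]) - l(x + eps*I[i] - eps*I[j])
                       - l(x - eps*I[i] + eps*I[j]) + l(x - eps*I[i] - eps*I[j])
                       ) / (4 * eps**2)
    return H

l = lambda x: x[0]**2 * x[1] + np.exp(x[0] * x[1])
H = hessian(l, np.array([0.3, -0.7]))
print(np.allclose(H, H.T, atol=1e-4))    # True: mixed partials commute
```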

Vector Taylor Series Expansion

Taylor series expansion of a scalar-valued function l(x) about a point x₀, to second order in Δx = x − x₀:

    l(x₀ + Δx) = l(x₀) + (∂l(x₀)/∂x) Δx + ½ Δxᵀ H(x₀) Δx + h.o.t.,

where H is the Hessian of l(x).

Taylor series expansion of a vector-valued function h(x) about a point x₀, to first order in Δx = x − x₀:

    h(x) = h(x₀ + Δx) = h(x₀) + (∂h(x₀)/∂x) Δx + h.o.t.

To obtain notationally uncluttered expressions for higher-order expansions, one switches to the use of tensor notation.
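
A quick numerical confirmation of the second-order expansion for a smooth scalar function, using analytic first and second derivatives; the particular l is arbitrary.

```python
import numpy as np

# Verify l(x0+dx) ~ l(x0) + (dl/dx) dx + 0.5 dx^T H dx
# for l(x) = exp(x[0]) * sin(x[1]).
def l(x):      return np.exp(x[0]) * np.sin(x[1])
def dl_row(x): return np.array([np.exp(x[0]) * np.sin(x[1]),
                                np.exp(x[0]) * np.cos(x[1])])
def H(x):
    e = np.exp(x[0])
    return np.array([[ e * np.sin(x[1]),  e * np.cos(x[1])],
                     [ e * np.cos(x[1]), -e * np.sin(x[1])]])

x0 = np.array([0.2, 0.9])
dx = 1e-2 * np.array([1.0, -2.0])
taylor2 = l(x0) + dl_row(x0) @ dx + 0.5 * dx @ H(x0) @ dx
print(abs(l(x0 + dx) - taylor2))   # O(||dx||^3), tiny
```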

Sufficient Condition for an Extremum

Let x₀ be a stationary point of l(x), ∂l(x₀)/∂x = 0. Then from the second-order expansion of l(x) about x₀ we have

    Δl(x) = l(x) − l(x₀) ≈ ½ (x − x₀)ᵀ H(x₀) (x − x₀) = ½ Δxᵀ H(x₀) Δx,

assuming that Δx = x − x₀ is small enough in norm that higher-order terms in the expansion can be neglected. (That is, we consider only local excursions away from x₀.) We see that if the Hessian is positive definite, then all local excursions of x away from x₀ increase the value of l(x), and thus

    Suff. Cond. for Stationary Point x₀ to be a Unique Local Min:  H(x₀) > 0.

Contrariwise, if the Hessian is negative definite, then all local excursions of x away from x₀ decrease the value of l(x), and thus

    Suff. Cond. for Stationary Point x₀ to be a Unique Local Max:  H(x₀) < 0.

If the Hessian H(x₀) is full rank and indefinite at a stationary point x₀, then x₀ is a saddle point. If H(x₀) ≥ 0 then x₀ is a non-unique local minimum. If H(x₀) ≤ 0 then x₀ is a non-unique local maximum.
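
Computationally, this classification reduces to inspecting the eigenvalues of the (symmetric) Hessian at a stationary point. A sketch:

```python
import numpy as np

# Classify a stationary point from the Hessian's eigenvalues:
# all > 0 -> local min, all < 0 -> local max, mixed & nonsingular -> saddle.
def classify(H, tol=1e-10):
    eig = np.linalg.eigvalsh(H)          # H is symmetric
    if np.all(eig > tol):  return "local minimum"
    if np.all(eig < -tol): return "local maximum"
    if np.all(np.abs(eig) > tol): return "saddle point"
    return "semidefinite: possibly non-unique extremum"

print(classify(np.diag([2.0, 1.0])))     # local minimum
print(classify(np.diag([-2.0, -1.0])))   # local maximum
print(classify(np.diag([2.0, -1.0])))    # saddle point
print(classify(np.diag([2.0, 0.0])))     # semidefinite
```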

Application to the Linear Gaussian Model (Cont.)

Note that the scalar loss function

    l(x) ≐ ‖y − Ax‖²_W = (y − Ax)ᵀW(y − Ax) = yᵀWy − 2yᵀWAx + xᵀAᵀWAx

has an exact quadratic expansion. Thus the arguments given on the previous slide hold for arbitrarily large (global) excursions away from a stationary point x₀. In particular, if H is positive definite, the stationary point x₀ must be a unique global minimum.

Having shown that

    ∂l/∂x = 2xᵀAᵀWA − 2yᵀWA,

we determine the Hessian to be

    H(x) = (∂/∂x)ᵀ(∂l/∂x) = 2AᵀWA ≥ 0.

Therefore stationary points (which, as we have seen, necessarily satisfy the normal equation) are global minima of l(x). Furthermore, if A is one-to-one (has full column rank), then H(x) = 2AᵀWA > 0 and there is only one unique stationary point (i.e., weighted least-squares solution) which minimizes l(x). Of course this could not be otherwise, as we know from our previous analysis of the weighted least-squares problem using Hilbert space theory.
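
A numerical sketch confirming that H = 2AᵀWA is positive definite when A has full column rank and W > 0; the random A and diagonal W are illustrative.

```python
import numpy as np

# For the linear Gaussian model, H = 2 A^T W A; full column rank A and
# W > 0 make H positive definite, so the stationary point is the unique
# global minimum.
rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))         # full column rank w.p. 1
W = np.diag(rng.uniform(0.5, 2.0, 10))
H = 2 * A.T @ W @ A
print(np.linalg.eigvalsh(H))             # all strictly positive
```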

The Gradient of an Objective Function l(x)

Note that

    dl(x)/dt = (∂l(x)/∂x) (dx/dt)

or, setting l̇ = dl/dt and v = dx/dt,

    l̇(v) = (∂l/∂x) v = (∂l/∂x) Ωₓ⁻¹ Ωₓ v = ⟨ [Ωₓ⁻¹ (∂l/∂x)ᵀ], v ⟩ = ⟨ ∇ₓl(x), v ⟩,

with the Gradient of l(x) defined by

    ∇ₓl(x) ≜ Ωₓ⁻¹ (∂l(x)/∂x)ᵀ = Ωₓ⁻¹ ( ∂l/∂x₁, …, ∂l/∂xₙ )ᵀ,

with Ωₓ a local Riemannian metric. Note that l̇(v) = ⟨∇ₓl(x), v⟩ is a linear functional of the velocity vector v.

In the vector space structure we have seen to date, Ωₓ is independent of x, Ωₓ = Ω, and represents the metric of the (globally) defined metric space containing x. Furthermore, if Ω = I, then the space is a (globally) Cartesian vector space. When considering spaces of smoothly parameterized regular-family probability density functions, a natural Riemannian (local, non-Cartesian) metric is provided by the Fisher Information Matrix, to be discussed later in this course.
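
Computationally, the metric-dependent gradient is obtained by solving Ωₓ g = (∂l/∂x)ᵀ rather than inverting Ωₓ. A small sketch with invented numbers:

```python
import numpy as np

# Gradient under a metric O: grad l = O^{-1} (dl/dx)^T, compared with
# the Cartesian case O = I.
Omega = np.array([[4.0, 1.0], [1.0, 2.0]])     # O = O^T > 0
dl_row = np.array([1.0, -2.0])                 # dl/dx at some point (row)

grad_cartesian = dl_row                        # O = I
grad_metric = np.linalg.solve(Omega, dl_row)   # O^{-1} (dl/dx)^T
print(grad_cartesian, grad_metric)             # generally different directions
```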

The Gradient as the Direction of Steepest Ascent

What unit velocities, ‖v‖ = 1, in the domain space X result in the fastest rate of change of l(x), as measured by l̇(v)? Equivalently, what unit velocity directions v in the domain space X result in the fastest rate of change of l(x), as measured by l̇(v)?

From the Cauchy-Schwarz inequality, we have

    |l̇(v)| = |⟨∇ₓl(x), v⟩| ≤ ‖∇ₓl(x)‖ ‖v‖ = ‖∇ₓl(x)‖,

or

    −‖∇ₓl(x)‖ ≤ l̇(v) ≤ ‖∇ₓl(x)‖.

Note that

    v = c ∇ₓl(x) with c = ‖∇ₓl(x)‖⁻¹  ⟹  l̇(v) = ‖∇ₓl(x)‖  ⟹  ∇ₓl(x) = direction of steepest ascent;

    v = −c ∇ₓl(x) with c = ‖∇ₓl(x)‖⁻¹  ⟹  l̇(v) = −‖∇ₓl(x)‖  ⟹  −∇ₓl(x) = direction of steepest descent.
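
The steepest-ascent property can be sanity-checked by random search over Ω-unit velocities; no random direction should beat the (normalized) gradient. An illustrative sketch:

```python
import numpy as np

# Among unit velocities ||v||_O = 1, v proportional to grad l maximizes
# the rate of change ldot(v) = <grad l, v>_O = (dl/dx) v.
rng = np.random.default_rng(4)
Omega = np.array([[4.0, 1.0], [1.0, 2.0]])
dl_row = np.array([1.0, -2.0])
grad = np.linalg.solve(Omega, dl_row)         # O^{-1} (dl/dx)^T

def rate(v):                                  # ldot(v) = (dl/dx) v
    v = v / np.sqrt(v @ Omega @ v)            # normalize: ||v||_O = 1
    return dl_row @ v

best_random = max(rate(rng.standard_normal(2)) for _ in range(10000))
print(rate(grad), ">=", best_random)          # gradient direction wins
```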

The Cartesian or Naive Gradient

In a Cartesian vector space the gradient of a cost function l(x) corresponds to taking Ω = I, in which case we have the Cartesian gradient

    ∇ᶜₓl(x) = (∂l(x)/∂x)ᵀ = ( ∂l/∂x₁, …, ∂l/∂xₙ )ᵀ.

Often one naively assumes that the gradient takes this form even if it is not evident that the space is, in fact, Cartesian. In this case one might more accurately refer to the gradient shown as the standard gradient or, even, the naive gradient.

Because the Cartesian gradient is the standard form assumed in many applications, it is common to just refer to it as "the gradient," even if it is not the correct, true gradient. (Assuming that we agree that the true gradient must give the direction of steepest ascent and therefore depends on the metric Ωₓ.) This is the terminology adhered to by Amari and his colleagues, who then refer to the true Riemannian metric-dependent gradient as the natural gradient.

As mentioned earlier, when considering spaces of smoothly parameterized regular-family probability density functions, a natural Riemannian metric is provided by the Fisher Information Matrix. Amari is one of the first researchers to consider parametric estimation from this Information Geometry perspective. He has argued that the use of the natural (true) gradient can significantly improve the performance of statistical learning algorithms.
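
A toy sketch of Amari's point: on an ill-conditioned problem where the metric Ω is taken equal to the Hessian of the loss, the natural step Ω⁻¹(∂l/∂x)ᵀ tolerates a much larger stable step size than the naive step. The step sizes and the 100-iteration budget are arbitrary illustrative choices, not from the notes.

```python
import numpy as np

# Naive vs. natural gradient descent on a quadratic. Here the metric Omega
# equals the Hessian of the loss, so the natural step fully preconditions
# the descent.
Omega = np.diag([100.0, 1.0])                # ill-conditioned metric/Hessian
l = lambda x: 0.5 * x @ Omega @ x            # dl/dx = x^T Omega (a row)

x_naive = np.array([1.0, 1.0]); x_nat = x_naive.copy()
lr_naive = 0.01                              # stability needs lr < 2/100
lr_nat = 0.5                                 # natural dynamics are isotropic
for _ in range(100):
    x_naive = x_naive - lr_naive * (x_naive @ Omega)
    x_nat = x_nat - lr_nat * np.linalg.solve(Omega, x_nat @ Omega)
print(l(x_naive), l(x_nat))                  # natural gradient: far smaller
```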
