Vector Derivatives and the Gradient
|
|
- Jodie McKinney
- 5 years ago
- Views:
Transcription
1 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 10 ECE 275A Vector Derivatives and the Gradient
2 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 2/1 Objective Functions l(x) Let l(x) be a real-valued function (aka functional) of an n-dimensional real vector x X = R n l( ) : X = R n V = R where X is an n-dimensional real Hilbert space with metric matrix Ω = Ω T > 0. Note that henceforth vectors in X are represented a column vectors in R n. Because here the value space Y = R is one-dimensional, wlog we can take V to be Cartesian (because all (necessarily positive scalar) metric weightings yield inner products and norms that are equivalent up to an overall positive scaling). We will call l(x) an objective function. If l(x) is a cost, loss, or penalty function, then we assume that it is bounded from below and we attempt to minimize it wrt x. If l(x) is a profit, gain, or reward function, then we assume that it is bounded from above and we attempt to maximize it wrt x. For example, suppose we wish to match a model pdf p x (y) to a true, but unknown, density p x 0(y) for an observed random vector, where we assume p x (y) p x 0(y), x. We can then use a penalty function of x to be given by a measure of (non-averaged or instantaneous) divergence or discrepancy D I (x 0 x) of the model pdf p x (y) from the true pdf p x 0(y) defined by ( ) D I (x 0 px 0(y) x) log p x (y) = log p x 0(y) log p x (y)
3 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 3/1 Instantaneous Divergence & Negative Log-likelihood Note that minimizing the instantaneous divergence D I (x 0 x) is equivalent to maximizing the log-likelihood p x (y) or minimizing the negative log-likelihood l(x) = log p x (y) A special case arises from use of the nonlinear Gaussian additive noise model with known noise covariance C and known mean function h( ) y = h(x) + n, n N(0, C) y N(h(x), C) which yields the nonlinear weighted least-squares problem l(x) = log p x (y). = y h(x) 2 W, W = C 1 Further setting h(x) = Ax is the linear weighted least-squares problem we have already discussed l(x) = log p x (y) =. y Ax 2 W, W = C 1 The symbol. = denotes that fact that we are ignoring additive terms and multiplicative factors which are irrelevant for the purposes of obtaining a extremum (here, a minimum) of a loss function. Of course we cannot ignore these terms if we are interested in the optimal value of the loss function itself.
4 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 4/1 Stationary Points and the Vector Partial Derivative Henceforth let the real scalar function l(x) be twice partial differentiable with respect to all components of x R n. A necessary condition for x be be a local extremum (maximum or minimum) of l(x) is that l(x) (... 1 n }{{} 1 n ) = 0 where the vector partial derivative operator ( 1... n ) is defined as a row operator. (See the extensive discussion in the Lecture Supplement on Real Vector Derivative.) A vector that satisfies l(x) = 0 is known as a stationary point of l(x). Stationary points are points at which l(x) has a local maximum, minimum, or inflection. Sufficient conditions for a stationary point to be a local extremum require that we develop a theory of vector differentiation that will allow us to clearly and succinctly discuss second-order derivative properties of objective functions.
5 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 5/1 Derivative of a Vector-Valued Function The Jacobian Let f(x) R m have elements f i (x), i = 1,, m, which are all differentiable with respect to the components of x R n. We define the vector partial derivative of the vector function f(x) as J f (x) f(x) f 1(x). f m(x) = f 1 (x) f 1 (x) 1 n f m (x) f m (x) 1 n }{{} m n The matrix J f (x) = f(x) is known as the Jacobian matrix or operator of the mapping f(x). It is the linearization of the nonlinear mapping f(x) at the point x. Often we write y = f(x) and the corresponding Jacobian as J y (x). If m = n, then y = f(x) can be viewed as a change of variables, in which case det J y (x) is known as the Jacobian of the transformation. det J y (x) plays a fundamental role in the change of variable formula of pdf s.
6 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 6/1 The Jacobian Cont. A function f : R n R m is locally one-to-one in an open neighborhood of x if and only if its Jacobian (linearization) J f (ξ) = f(ξ) is a one-to-one matrix for all points ξ R n in the open neighborhood of x. A function f : R n R m is locally onto an open neighborhood of y = f(x) if and only if its Jacobian (linearization) J f (ξ) = f(ξ) is an onto matrix for all points ξ R n in the corresponding neighborhood of x. A function f : R n R m is locally invertible in an open neighborhood of x if and only if it is locally one-to-one and onto in the open neighborhood of x which is true if and only if its Jacobian (linearization) J f (ξ) = f(ξ) is a one-to-one and onto (and hence invertible) matrix for all points ξ R n in the open neighborhood of x. This is known as the Inverse Function Theorem.
7 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 7/1 Vector Derivative Identities c T x Ax g T (x)h(x) T Ax T Ωx h(g(x)) = c T for an arbitrary vector c = A for an arbitrary matrix A = g T (x) h(x) + ht (x) g(x), = x T A + x T A T for an arbitrary matrix A = 2x T Ω when Ω = Ω T = h g g (Chain Rule) g(x)t h(x) scalar Note that the last identity) is a statement about Jacobians and can be restated in an illuminating manner as J h g = J h J g (1) which says that the linearization of a composition is the composition of the linearizations.
8 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 8/1 Application to Linear Gaussian Model Stationary points of l(x) = log p x (y). = e(x) 2 W where e(x) = y A(x) and W = C 1 satisfy or 0 = l = l e e = (2eT W)( A) = 2(y Ax) T WA A T W(y Ax) = 0 e(x) = y Ax N(A ) with A = A T W Therefore stationary points satisfy the Normal Equation A T WAx = A T y A A = A y
9 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 9/1 The Hessian of an Objective Function The Hessian, or matrix of second partial derivatives of l(x), is defined by H(x) ( ) T l(x) = n n n n }{{} n n As a consequence of the fact that the Hessian is obviously symmetric i j = j i H(x) = H T (x)
10 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 10/1 Vector Taylor Series Expansion Taylor series expansion of a scalar-valued function l(x) about a point x 0 to second-order in x = x x 0 : l(x 0 + x) = l(x 0 ) + l(x 0) x xt H(x 0 ) x + h.o.t. where H is the Hessian of l(x). Taylor series expansion of a vector-valued function h(x) about a point x 0 to first-order in x = x x 0 : h(x) = h(x 0 + x) = h(x 0 ) + h(x 0) x + h.o.t. To obtain notationally uncluttered expressions for higher order expansions, one switches to the use of tensor notation.
11 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 11/1 Sufficient Condition for an Extremum Let x 0 be a stationary value of l(x), l(x 0) of l(x) about x 0 we have = 0. Then from the second-order expansion l(x) = l(x) l(x 0 ) 1 2 (x x 0) T H(x 0 )(x x 0 ) = 1 2 xt H(x 0 ) x assuming that x = x x 0 is small enough in norm so that higher order terms in the expansion can be neglected. (That is, we consider only local excursions away from x 0.) We see that If the Hessian is positive definite, then all local excursions of x away from x 0 increase the value of l(x) and thus Suff. Cond. for Stationary Point x 0 to be a Unique Local Min: H(x 0 ) > 0 Contrawise, if the Hessian is negative definite, then all local excursions of x away from x 0 decrease the value of l(x) and thus Suff. Cond. for Stationary Point x 0 to be a Unique Local Max: H(x 0 ) < 0 If the Hessian H(x 0 ) is full rank and indefinite at a stationary point x 0, then x 0 is a saddle point. If H(x 0 ) 0 then x 0 is a non-unique local minimum. If H(x 0 ) 0 then x 0 is a non-unique local maximum.
12 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 12/1 Application to Linear Gaussian Model Cont. Note that the scalar loss function l(x). = y Ax 2 w = (y Ax) T W(y Ax) = y T Wy 2y T WAx + x T A T WAx has an exact quadratic expansion. Thus the arguments given in the previous slide hold for arbitrarily large (global) excursions away from a stationary point x 0. In particular, if H is positive definite, the stationary point x 0 must be a unique global minimum. Having shown that l = 2xT A T WA 2y T WA we determine the Hessian to be ( ) l T = 2 A T WA 0 H(x) = Therefore stationary points (which, as we have seen, necessarily satisfy the normal equation) are global minima of l(x). Furthermore, if A is one-to-one (has full column rank), then H(x) = 2 A T WA > 0 and there is only one unique stationary point (i.e., weighted least-squares solution) which minimizes l(x). Of course this could not be otherwise, as we know from our previous analysis of the weighted least-squares problem using Hilbert space theory.
13 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 13/1 The Gradient of an Objective Function l(x) Note that dl(x) = dx or, setting l = dl/dt and v = dx/dt, l(v) = v = Ω 1 x Ω xv = [ Ω 1 x ( ) ] T T Ω x v = x l(x), v with the Gradient of l(x) defined by x l(x) Ω 1 x ( ) T = Ω 1 x 1. n with Ω x a local Riemannian metric. Note that l(v) = x l(x),v is a linear functional of the velocity vector v. In the vector space structure we have seen to date Ω x is independent of x, Ω x = Ω, and represents the metric of the (globally) defined metric space containing x. Furthermore, if Ω = I, then the space is a (globally) Cartesian vector space. When considering spaces of smoothly parameterized regular family probability density functions, a natural Riemannian (local, non-cartesian) metric is provided by the Fisher Information Matrix to be discussed later in this course.
14 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 14/1 The Gradient as the Direction of Steepest Ascent What unit velocities, v = 1, in the domain space X result in the fastest rate of change of l(x) as measured by l(v)? Equivalently, What unit velocity directions v in the domain space X result in the fastest rate of change of l(x) as measured by l(v)? From the Cauchy-Schwarz inequality, we have l(v) = x l(x), v x l(x) v = x l(x) or x l(x) l(v) x l(x) Note that v = c x l(x) with c = x l(x) 1 l(v) = x l(x) x l(x) = direction of steepest ascent. v = c x l(x) with c = x l(x) 1 l(v) = x l(x) x l(x) = direction of steepest descent.
15 ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 15/1 The Cartesian or Naive Gradient In a Cartesian vector space the gradient of a cost function l(x) corresponds to taking Ω = I, in which case we have the Cartesian gradient c xl(x) = ( ) T = 1. n Often one naively assumes that the gradient takes this form even if it is not evident that the space is, in fact, Cartesian. In this case one might more accurately refer to the gradient shown as the standard gradient or, even, the naive gradient. Because the Cartesian gradient is the standard form assumed in many applications, it is common to just refer to it as the gradient, even if it is not the the correct, true gradient. (Assuming that we agree that the true gradient must give the direction of steepest descent and therefore depends on the metric Ω x.) This is the terminology adhered to by Amari and his colleagues, who then refer to the true Riemannian metric-dependent gradient as the natural gradient. As mentioned earlier, when considering spaces of smoothly parameterized regular family probability density functions, a natural Riemannian metric is provided by the Fisher Information Matrix. Amari is one of the first researchers to consider parametric estimation from this Information Geometry perspective. He has argued that the use of the natural (true) gradient can significantly improve the performance of statistical learning algorithms.
Generalized Gradient Descent Algorithms
ECE 275AB Lecture 11 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 11 ECE 275A Generalized Gradient Descent Algorithms ECE 275AB Lecture 11 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San
More informationNormed & Inner Product Vector Spaces
Normed & Inner Product Vector Spaces ECE 174 Introduction to Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 27 Normed
More informationReal Vector Derivatives, Gradients, and Nonlinear Least-Squares
Real Vector Derivatives, Gradients, and Nonlinear Least-Squares ECE275A Lecture Notes Kenneth Kreutz-Delgado Electrical and Computer Engineering Jacobs School of Engineering University of California, San
More informationPseudoinverse and Adjoint Operators
ECE 275AB Lecture 5 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 5 ECE 275A Pseudoinverse and Adjoint Operators ECE 275AB Lecture 5 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p.
More informationAM 205: lecture 18. Last time: optimization methods Today: conditions for optimality
AM 205: lecture 18 Last time: optimization methods Today: conditions for optimality Existence of Global Minimum For example: f (x, y) = x 2 + y 2 is coercive on R 2 (global min. at (0, 0)) f (x) = x 3
More informationPseudoinverse & Moore-Penrose Conditions
ECE 275AB Lecture 7 Fall 2008 V1.0 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 7 ECE 275A Pseudoinverse & Moore-Penrose Conditions ECE 275AB Lecture 7 Fall 2008 V1.0 c K. Kreutz-Delgado, UC San Diego
More informationVector Space Concepts
Vector Space Concepts ECE 174 Introduction to Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 25 Vector Space Theory
More informationIntroduction to gradient descent
6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our
More informationSample ECE275A Midterm Exam Questions
Sample ECE275A Midterm Exam Questions The questions given below are actual problems taken from exams given in in the past few years. Solutions to these problems will NOT be provided. These problems and
More informationFunctions of Several Variables
Functions of Several Variables The Unconstrained Minimization Problem where In n dimensions the unconstrained problem is stated as f() x variables. minimize f()x x, is a scalar objective function of vector
More informationCourse Summary Math 211
Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.
More informationGradient Descent. Dr. Xiaowei Huang
Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,
More informationMoore-Penrose Conditions & SVD
ECE 275AB Lecture 8 Fall 2008 V1.0 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 8 ECE 275A Moore-Penrose Conditions & SVD ECE 275AB Lecture 8 Fall 2008 V1.0 c K. Kreutz-Delgado, UC San Diego p. 2/1
More informationDeep Learning. Authors: I. Goodfellow, Y. Bengio, A. Courville. Chapter 4: Numerical Computation. Lecture slides edited by C. Yim. C.
Chapter 4: Numerical Computation Deep Learning Authors: I. Goodfellow, Y. Bengio, A. Courville Lecture slides edited by 1 Chapter 4: Numerical Computation 4.1 Overflow and Underflow 4.2 Poor Conditioning
More informationChapter 3 Numerical Methods
Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2
More information1 Computing with constraints
Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)
More informationOptimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30
Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained
More informationScientific Computing: Optimization
Scientific Computing: Optimization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 March 8th, 2011 A. Donev (Courant Institute) Lecture
More informationOR MSc Maths Revision Course
OR MSc Maths Revision Course Tom Byrne School of Mathematics University of Edinburgh t.m.byrne@sms.ed.ac.uk 15 September 2017 General Information Today JCMB Lecture Theatre A, 09:30-12:30 Mathematics revision
More informationNonlinear equations and optimization
Notes for 2017-03-29 Nonlinear equations and optimization For the next month or so, we will be discussing methods for solving nonlinear systems of equations and multivariate optimization problems. We will
More informationFundamentals of Unconstrained Optimization
dalmau@cimat.mx Centro de Investigación en Matemáticas CIMAT A.C. Mexico Enero 2016 Outline Introduction 1 Introduction 2 3 4 Optimization Problem min f (x) x Ω where f (x) is a real-valued function The
More informationPseudoinverse & Orthogonal Projection Operators
Pseudoinverse & Orthogonal Projection Operators ECE 174 Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 48 The Four
More informationAppendix 3. Derivatives of Vectors and Vector-Valued Functions DERIVATIVES OF VECTORS AND VECTOR-VALUED FUNCTIONS
Appendix 3 Derivatives of Vectors and Vector-Valued Functions Draft Version 1 December 2000, c Dec. 2000, B. Walsh and M. Lynch Please email any comments/corrections to: jbwalsh@u.arizona.edu DERIVATIVES
More informationCE 191: Civil and Environmental Engineering Systems Analysis. LEC 05 : Optimality Conditions
CE 191: Civil and Environmental Engineering Systems Analysis LEC : Optimality Conditions Professor Scott Moura Civil & Environmental Engineering University of California, Berkeley Fall 214 Prof. Moura
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationEC /11. Math for Microeconomics September Course, Part II Lecture Notes. Course Outline
LONDON SCHOOL OF ECONOMICS Professor Leonardo Felli Department of Economics S.478; x7525 EC400 20010/11 Math for Microeconomics September Course, Part II Lecture Notes Course Outline Lecture 1: Tools for
More information10.34 Numerical Methods Applied to Chemical Engineering Fall Quiz #1 Review
10.34 Numerical Methods Applied to Chemical Engineering Fall 2015 Quiz #1 Review Study guide based on notes developed by J.A. Paulson, modified by K. Severson Linear Algebra We ve covered three major topics
More informationOptimality Conditions
Chapter 2 Optimality Conditions 2.1 Global and Local Minima for Unconstrained Problems When a minimization problem does not have any constraints, the problem is to find the minimum of the objective function.
More information1. Method 1: bisection. The bisection methods starts from two points a 0 and b 0 such that
Chapter 4 Nonlinear equations 4.1 Root finding Consider the problem of solving any nonlinear relation g(x) = h(x) in the real variable x. We rephrase this problem as one of finding the zero (root) of a
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationMachine Learning Brett Bernstein. Recitation 1: Gradients and Directional Derivatives
Machine Learning Brett Bernstein Recitation 1: Gradients and Directional Derivatives Intro Question 1 We are given the data set (x 1, y 1 ),, (x n, y n ) where x i R d and y i R We want to fit a linear
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization
E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained
More informationGradient Descent. Sargur Srihari
Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors
More informationMath 302 Outcome Statements Winter 2013
Math 302 Outcome Statements Winter 2013 1 Rectangular Space Coordinates; Vectors in the Three-Dimensional Space (a) Cartesian coordinates of a point (b) sphere (c) symmetry about a point, a line, and a
More informationVectors. January 13, 2013
Vectors January 13, 2013 The simplest tensors are scalars, which are the measurable quantities of a theory, left invariant by symmetry transformations. By far the most common non-scalars are the vectors,
More informationAppendix 5. Derivatives of Vectors and Vector-Valued Functions. Draft Version 24 November 2008
Appendix 5 Derivatives of Vectors and Vector-Valued Functions Draft Version 24 November 2008 Quantitative genetics often deals with vector-valued functions and here we provide a brief review of the calculus
More informationMath 273a: Optimization Basic concepts
Math 273a: Optimization Basic concepts Instructor: Wotao Yin Department of Mathematics, UCLA Spring 2015 slides based on Chong-Zak, 4th Ed. Goals of this lecture The general form of optimization: minimize
More informationLECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION
15-382 COLLECTIVE INTELLIGENCE - S19 LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION TEACHER: GIANNI A. DI CARO WHAT IF WE HAVE ONE SINGLE AGENT PSO leverages the presence of a swarm: the outcome
More information(Extended) Kalman Filter
(Extended) Kalman Filter Brian Hunt 7 June 2013 Goals of Data Assimilation (DA) Estimate the state of a system based on both current and all past observations of the system, using a model for the system
More informationSome definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts
Some definitions Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization M. M. Sussman sussmanm@math.pitt.edu Office Hours: MW 1:45PM-2:45PM, Thack 622 A matrix A is SPD (Symmetric
More informationLecture 3. Optimization Problems and Iterative Algorithms
Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex
More informationFunctions of Several Variables
Jim Lambers MAT 419/519 Summer Session 2011-12 Lecture 2 Notes These notes correspond to Section 1.2 in the text. Functions of Several Variables We now generalize the results from the previous section,
More informationPerformance Surfaces and Optimum Points
CSC 302 1.5 Neural Networks Performance Surfaces and Optimum Points 1 Entrance Performance learning is another important class of learning law. Network parameters are adjusted to optimize the performance
More information10-725/36-725: Convex Optimization Prerequisite Topics
10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the
More informationAPPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.
APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product
More informationSF2822 Applied Nonlinear Optimization. Preparatory question. Lecture 9: Sequential quadratic programming. Anders Forsgren
SF2822 Applied Nonlinear Optimization Lecture 9: Sequential quadratic programming Anders Forsgren SF2822 Applied Nonlinear Optimization, KTH / 24 Lecture 9, 207/208 Preparatory question. Try to solve theory
More informationMathematical Economics. Lecture Notes (in extracts)
Prof. Dr. Frank Werner Faculty of Mathematics Institute of Mathematical Optimization (IMO) http://math.uni-magdeburg.de/ werner/math-ec-new.html Mathematical Economics Lecture Notes (in extracts) Winter
More informationVariational Methods & Optimal Control
Variational Methods & Optimal Control lecture 02 Matthew Roughan Discipline of Applied Mathematics School of Mathematical Sciences University of Adelaide April 14, 2016
More informationNonlinear Optimization
Nonlinear Optimization (Com S 477/577 Notes) Yan-Bin Jia Nov 7, 2017 1 Introduction Given a single function f that depends on one or more independent variable, we want to find the values of those variables
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationIntroduction to Optimization Techniques. Nonlinear Optimization in Function Spaces
Introduction to Optimization Techniques Nonlinear Optimization in Function Spaces X : T : Gateaux and Fréchet Differentials Gateaux and Fréchet Differentials a vector space, Y : a normed space transformation
More informationMATHEMATICAL ECONOMICS: OPTIMIZATION. Contents
MATHEMATICAL ECONOMICS: OPTIMIZATION JOÃO LOPES DIAS Contents 1. Introduction 2 1.1. Preliminaries 2 1.2. Optimal points and values 2 1.3. The optimization problems 3 1.4. Existence of optimal points 4
More informationare continuous). Then we also know that the above function g is continuously differentiable on [a, b]. For that purpose, we will first show that
Derivatives and Integrals Now suppose f : [a, b] [c, d] IR is continuous. It is easy to see that for each fixed x [a, b] the function f x (y) = f(x, y) is continuous on [c, d] (note that f x is sometimes
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationSubject: Optimal Control Assignment-1 (Related to Lecture notes 1-10)
Subject: Optimal Control Assignment- (Related to Lecture notes -). Design a oil mug, shown in fig., to hold as much oil possible. The height and radius of the mug should not be more than 6cm. The mug must
More informationChapter 5 Elements of Calculus
Chapter 5 Elements of Calculus An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Sequences and Limits A sequence of real numbers can be viewed as a set of numbers, which is often also denoted as
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationLagrange Multipliers
Lagrange Multipliers (Com S 477/577 Notes) Yan-Bin Jia Nov 9, 2017 1 Introduction We turn now to the study of minimization with constraints. More specifically, we will tackle the following problem: minimize
More informationNumerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen
Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen
More informationMatrix differential calculus Optimization Geoff Gordon Ryan Tibshirani
Matrix differential calculus 10-725 Optimization Geoff Gordon Ryan Tibshirani Review Matrix differentials: sol n to matrix calculus pain compact way of writing Taylor expansions, or definition: df = a(x;
More information1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:
Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion
More information5 Handling Constraints
5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationCalculus 2502A - Advanced Calculus I Fall : Local minima and maxima
Calculus 50A - Advanced Calculus I Fall 014 14.7: Local minima and maxima Martin Frankland November 17, 014 In these notes, we discuss the problem of finding the local minima and maxima of a function.
More informationN. L. P. NONLINEAR PROGRAMMING (NLP) deals with optimization models with at least one nonlinear function. NLP. Optimization. Models of following form:
0.1 N. L. P. Katta G. Murty, IOE 611 Lecture slides Introductory Lecture NONLINEAR PROGRAMMING (NLP) deals with optimization models with at least one nonlinear function. NLP does not include everything
More informationNonlinear Optimization: What s important?
Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global
More informationLinear and logistic regression
Linear and logistic regression Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Linear and logistic regression 1/22 Outline 1 Linear regression 2 Logistic regression 3 Fisher discriminant analysis
More informationLecture 11: October 2
10-725: Optimization Fall 2012 Lecture 11: October 2 Lecturer: Geoff Gordon/Ryan Tibshirani Scribes: Tongbo Huang, Shoou-I Yu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes
More informationECE580 Partial Solution to Problem Set 3
ECE580 Fall 2015 Solution to Problem Set 3 October 23, 2015 1 ECE580 Partial Solution to Problem Set 3 These problems are from the textbook by Chong and Zak, 4th edition, which is the textbook for the
More informationKalman Filter. Predict: Update: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q
Kalman Filter Kalman Filter Predict: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q Update: K = P k k 1 Hk T (H k P k k 1 Hk T + R) 1 x k k = x k k 1 + K(z k H k x k k 1 ) P k k =(I
More informationUnconstrained Optimization
1 / 36 Unconstrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University February 2, 2015 2 / 36 3 / 36 4 / 36 5 / 36 1. preliminaries 1.1 local approximation
More informationNote: Every graph is a level set (why?). But not every level set is a graph. Graphs must pass the vertical line test. (Level sets may or may not.
Curves in R : Graphs vs Level Sets Graphs (y = f(x)): The graph of f : R R is {(x, y) R y = f(x)} Example: When we say the curve y = x, we really mean: The graph of the function f(x) = x That is, we mean
More informationECE580 Exam 1 October 4, Please do not write on the back of the exam pages. Extra paper is available from the instructor.
ECE580 Exam 1 October 4, 2012 1 Name: Solution Score: /100 You must show ALL of your work for full credit. This exam is closed-book. Calculators may NOT be used. Please leave fractions as fractions, etc.
More informationThe Derivative. Appendix B. B.1 The Derivative of f. Mappings from IR to IR
Appendix B The Derivative B.1 The Derivative of f In this chapter, we give a short summary of the derivative. Specifically, we want to compare/contrast how the derivative appears for functions whose domain
More informationImplicit Functions, Curves and Surfaces
Chapter 11 Implicit Functions, Curves and Surfaces 11.1 Implicit Function Theorem Motivation. In many problems, objects or quantities of interest can only be described indirectly or implicitly. It is then
More informationUC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 5 Fall 2009
UC Berkeley Department of Electrical Engineering and Computer Science EECS 227A Nonlinear and Convex Optimization Solutions 5 Fall 2009 Reading: Boyd and Vandenberghe, Chapter 5 Solution 5.1 Note that
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationLecture 1 October 9, 2013
Probabilistic Graphical Models Fall 2013 Lecture 1 October 9, 2013 Lecturer: Guillaume Obozinski Scribe: Huu Dien Khue Le, Robin Bénesse The web page of the course: http://www.di.ens.fr/~fbach/courses/fall2013/
More informationMechanical Systems II. Method of Lagrange Multipliers
Mechanical Systems II. Method of Lagrange Multipliers Rafael Wisniewski Aalborg University Abstract. So far our approach to classical mechanics was limited to finding a critical point of a certain functional.
More informationECON 5111 Mathematical Economics
Test 1 October 1, 2010 1. Construct a truth table for the following statement: [p (p q)] q. 2. A prime number is a natural number that is divisible by 1 and itself only. Let P be the set of all prime numbers
More informationPreliminary draft only: please check for final version
ARE211, Fall2012 CALCULUS4: THU, OCT 11, 2012 PRINTED: AUGUST 22, 2012 (LEC# 15) Contents 3. Univariate and Multivariate Differentiation (cont) 1 3.6. Taylor s Theorem (cont) 2 3.7. Applying Taylor theory:
More informationUNCONSTRAINED OPTIMIZATION
UNCONSTRAINED OPTIMIZATION 6. MATHEMATICAL BASIS Given a function f : R n R, and x R n such that f(x ) < f(x) for all x R n then x is called a minimizer of f and f(x ) is the minimum(value) of f. We wish
More informationRecitation 1. Gradients and Directional Derivatives. Brett Bernstein. CDS at NYU. January 21, 2018
Gradients and Directional Derivatives Brett Bernstein CDS at NYU January 21, 2018 Brett Bernstein (CDS at NYU) Recitation 1 January 21, 2018 1 / 23 Initial Question Intro Question Question We are given
More informationBasic Math for
Basic Math for 16-720 August 23, 2002 1 Linear Algebra 1.1 Vectors and Matrices First, a reminder of a few basic notations, definitions, and terminology: Unless indicated otherwise, vectors are always
More informationMathematical Economics (ECON 471) Lecture 3 Calculus of Several Variables & Implicit Functions
Mathematical Economics (ECON 471) Lecture 3 Calculus of Several Variables & Implicit Functions Teng Wah Leo 1 Calculus of Several Variables 11 Functions Mapping between Euclidean Spaces Where as in univariate
More informationECE 680 Modern Automatic Control. Gradient and Newton s Methods A Review
ECE 680Modern Automatic Control p. 1/1 ECE 680 Modern Automatic Control Gradient and Newton s Methods A Review Stan Żak October 25, 2011 ECE 680Modern Automatic Control p. 2/1 Review of the Gradient Properties
More informationMath 273a: Optimization Netwon s methods
Math 273a: Optimization Netwon s methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 some material taken from Chong-Zak, 4th Ed. Main features of Newton s method Uses both first derivatives
More information, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are
Quadratic forms We consider the quadratic function f : R 2 R defined by f(x) = 2 xt Ax b T x with x = (x, x 2 ) T, () where A R 2 2 is symmetric and b R 2. We will see that, depending on the eigenvalues
More informationg(x,y) = c. For instance (see Figure 1 on the right), consider the optimization problem maximize subject to
1 of 11 11/29/2010 10:39 AM From Wikipedia, the free encyclopedia In mathematical optimization, the method of Lagrange multipliers (named after Joseph Louis Lagrange) provides a strategy for finding the
More informationInformation geometry for bivariate distribution control
Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic
More informationMATH 4211/6211 Optimization Constrained Optimization
MATH 4211/6211 Optimization Constrained Optimization Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 Constrained optimization
More informationExamination paper for TMA4180 Optimization I
Department of Mathematical Sciences Examination paper for TMA4180 Optimization I Academic contact during examination: Phone: Examination date: 26th May 2016 Examination time (from to): 09:00 13:00 Permitted
More informationVisual SLAM Tutorial: Bundle Adjustment
Visual SLAM Tutorial: Bundle Adjustment Frank Dellaert June 27, 2014 1 Minimizing Re-projection Error in Two Views In a two-view setting, we are interested in finding the most likely camera poses T1 w
More informationLectures 9 and 10: Constrained optimization problems and their optimality conditions
Lectures 9 and 10: Constrained optimization problems and their optimality conditions Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lectures 9 and 10: Constrained
More informationConvex/Schur-Convex (CSC) Log-Priors
1999 6th Joint Symposium or2 Neural Cornputatiort Proceedings Convex/Schur-Convex (CSC) Log-Priors and Sparse Coding K. Kreutz-Delgado, B.D. Rao, and K. Engan* Electrical and Computer Engineering Department
More informationISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints
ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained
More informationRecursive Estimation
Recursive Estimation Raffaello D Andrea Spring 08 Problem Set 3: Extracting Estimates from Probability Distributions Last updated: April 9, 08 Notes: Notation: Unless otherwise noted, x, y, and z denote
More informationCHAPTER 2: QUADRATIC PROGRAMMING
CHAPTER 2: QUADRATIC PROGRAMMING Overview Quadratic programming (QP) problems are characterized by objective functions that are quadratic in the design variables, and linear constraints. In this sense,
More informationDescent methods. min x. f(x)
Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained
More informationAn Inexact Newton Method for Nonlinear Constrained Optimization
An Inexact Newton Method for Nonlinear Constrained Optimization Frank E. Curtis Numerical Analysis Seminar, January 23, 2009 Outline Motivation and background Algorithm development and theoretical results
More information