Recitation 1. Gradients and Directional Derivatives. Brett Bernstein. CDS at NYU. January 21, 2018


Slide 1. Gradients and Directional Derivatives
Brett Bernstein, CDS at NYU, January 21, 2018

Slide 2. Intro Question

Question. We are given the data set $(x_1, y_1), \ldots, (x_n, y_n)$ where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. We want to fit a linear function to this data by performing empirical risk minimization. More precisely, we are using the hypothesis space $\mathcal{F} = \{f(x) = w^T x \mid w \in \mathbb{R}^d\}$ and the loss function $\ell(a, y) = (a - y)^2$. Given an initial guess $w$ for the empirical risk minimizing parameter vector, how could we improve our guess?

(Figure: the data plotted in the $(x, y)$ plane.)

Slide 3. Intro Solution

Solution. The empirical risk is given by

$$\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^n \ell(f(x_i), y_i) = \frac{1}{n} \sum_{i=1}^n (w^T x_i - y_i)^2 = \frac{1}{n} \|Xw - y\|_2^2,$$

where $X \in \mathbb{R}^{n \times d}$ is the matrix whose $i$th row is given by $x_i^T$. We can improve a non-optimal guess $w$ by taking a small step in the direction of the negative gradient.
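
To make the negative-gradient step concrete, here is a minimal NumPy sketch of gradient descent on the empirical risk above. The step size `eta`, the number of steps, and the synthetic data are illustrative assumptions of mine, not part of the slides; the gradient formula $\frac{2}{n} X^T (Xw - y)$ is derived later in the recitation.

```python
import numpy as np

def empirical_risk(w, X, y):
    """Empirical risk (1/n)||Xw - y||_2^2 for the squared loss."""
    residual = X @ w - y
    return residual @ residual / X.shape[0]

def gradient_step(w, X, y, eta=0.1):
    """One step in the direction of the negative gradient."""
    grad = (2.0 / X.shape[0]) * X.T @ (X @ w - y)
    return w - eta * grad

# Illustrative data: y is roughly a linear function of x plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
for _ in range(50):
    w = gradient_step(w, X, y)
print(empirical_risk(w, X, y))  # should approach the noise level
```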

Slide 4. Single Variable Differentiation

Calculus lets us turn non-linear problems into linear algebra. For $f : \mathbb{R} \to \mathbb{R}$ differentiable, the derivative is given by

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$

This can also be written as $f(x + h) = f(x) + h f'(x) + o(h)$ as $h \to 0$, where $o(h)$ denotes a function $g(h)$ with $g(h)/h \to 0$ as $h \to 0$. Points with $f'(x) = 0$ are called critical points.
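
As a quick sanity check on the limit definition, one can approximate $f'(x)$ numerically with a small but nonzero $h$. This sketch is my addition, not from the slides; it uses a symmetric difference quotient, which is typically more accurate than the one-sided version in the definition.

```python
import math

def numeric_derivative(f, x, h=1e-6):
    """Symmetric difference quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Check against a known derivative: d/dx sin(x) = cos(x).
x = 0.7
print(numeric_derivative(math.sin, x))  # approximately cos(0.7)
print(math.cos(x))
```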

Slide 5. 1D Linear Approximation By Derivative

(Figure: plot of $f(t)$ and the tangent line $f(x_0) + (t - x_0) f'(x_0)$ at the point $(x_0, f(x_0))$; the vertical gap $f(t) - (f(x_0) + (t - x_0) f'(x_0))$ is the error of the linear approximation.)

Slide 6. Multivariable Differentiation

Consider now a function $f : \mathbb{R}^n \to \mathbb{R}$ with inputs of the form $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. Unlike the 1-dimensional case, we cannot assign a single number to the slope at a point, since there are many directions in which we can move.

Slide 7. Multiple Possible Directions for $f : \mathbb{R}^2 \to \mathbb{R}$

(Figure only.)

Slide 8. Directional Derivative

Definition. Let $f : \mathbb{R}^n \to \mathbb{R}$. The directional derivative $f'(x; u)$ of $f$ at $x \in \mathbb{R}^n$ in the direction $u \in \mathbb{R}^n$ is given by

$$f'(x; u) = \lim_{h \to 0} \frac{f(x + hu) - f(x)}{h}.$$

By fixing a direction $u$ we turn our multidimensional problem into a 1-dimensional problem. Similar to the 1-d case we have $f(x + hu) = f(x) + h f'(x; u) + o(h)$. We say that $u$ is a descent direction of $f$ at $x$ if $f'(x; u) < 0$.
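
The 1-d reduction suggests an immediate numerical recipe: evaluate $f$ along the line $h \mapsto x + hu$ and difference. A small sketch under that interpretation; the test function and direction are illustrative choices of mine.

```python
import numpy as np

def directional_derivative(f, x, u, h=1e-6):
    """Approximate f'(x; u) via a symmetric difference along x + h*u."""
    return (f(x + h * u) - f(x - h * u)) / (2 * h)

f = lambda x: x[0] ** 2 + 3 * x[1]      # f(x1, x2) = x1^2 + 3*x2
x = np.array([1.0, 2.0])
u = np.array([0.0, -1.0])

print(directional_derivative(f, x, u))  # -3.0: u is a descent direction
```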

Slide 9. Directional Derivative as a Slope of a Slice

(Figure only.)

Slide 10. Partial Derivative

Let $e_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0)$ denote the $i$th standard basis vector (the 1 sits in position $i$, after $i - 1$ zeros). The $i$th partial derivative is defined to be the directional derivative along $e_i$. It can be written many ways:

$$f'(x; e_i) = \frac{\partial}{\partial x_i} f(x) = \partial_{x_i} f(x) = \partial_i f(x).$$

What is the intuitive meaning of $\partial_{x_i} f(x)$? For example, what does a large value of $\partial_{x_3} f(x)$ imply?

Slide 11. Differentiability

We say a function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $x \in \mathbb{R}^n$ if

$$\lim_{v \to 0} \frac{f(x + v) - f(x) - g^T v}{\|v\|_2} = 0$$

for some $g \in \mathbb{R}^n$. If it exists, this $g$ is unique and is called the gradient of $f$ at $x$, with notation $g = \nabla f(x)$. It can be shown that

$$\nabla f(x) = \begin{pmatrix} \partial_{x_1} f(x) \\ \vdots \\ \partial_{x_n} f(x) \end{pmatrix}.$$
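
Since the gradient stacks the partial derivatives, a coordinate-wise finite-difference gradient check falls out directly. This is a standard debugging tool rather than anything from the slides; a minimal sketch:

```python
import numpy as np

def numeric_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x, one coordinate at a time."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e_i = np.zeros_like(x)
        e_i[i] = 1.0
        grad[i] = (f(x + h * e_i) - f(x - h * e_i)) / (2 * h)
    return grad

f = lambda x: x[0] ** 2 + 3.0 * x[1]
x = np.array([1.0, 2.0])
print(numeric_gradient(f, x))  # approximately [2, 3]
```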

Slide 12. Useful Convention

Consider $f : \mathbb{R}^{p+q} \to \mathbb{R}$. Split the input $x \in \mathbb{R}^{p+q}$ into parts $w \in \mathbb{R}^p$ and $z \in \mathbb{R}^q$ so that $x = (w, z)$. Define the partial gradients

$$\nabla_w f(w, z) := \begin{pmatrix} \partial_{w_1} f(w, z) \\ \vdots \\ \partial_{w_p} f(w, z) \end{pmatrix} \quad \text{and} \quad \nabla_z f(w, z) := \begin{pmatrix} \partial_{z_1} f(w, z) \\ \vdots \\ \partial_{z_q} f(w, z) \end{pmatrix}.$$

Slide 13. Tangent Plane

Analogous to the 1-d case we can express differentiability as

$$f(x + v) = f(x) + \nabla f(x)^T v + o(\|v\|_2).$$

The approximation $f(x + v) \approx f(x) + \nabla f(x)^T v$ gives a tangent plane at the point $x$. The tangent plane of $f$ at $x$ is given by

$$P = \{(x + v, \; f(x) + \nabla f(x)^T v) \mid v \in \mathbb{R}^n\} \subseteq \mathbb{R}^{n+1}.$$

Methods like gradient descent approximate a function locally by its tangent plane, and then take a step accordingly.
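
One way to see the $o(\|v\|_2)$ claim numerically is to shrink the perturbation and watch the ratio of the approximation error to $\|v\|_2$ go to zero. The function below is an illustration of mine, with its gradient supplied analytically.

```python
import numpy as np

f = lambda x: np.exp(x[0]) + x[1] ** 2
grad_f = lambda x: np.array([np.exp(x[0]), 2 * x[1]])  # analytic gradient

x = np.array([0.5, -1.0])
v = np.array([1.0, 2.0])

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    err = f(x + t * v) - (f(x) + grad_f(x) @ (t * v))
    print(t, err / np.linalg.norm(t * v))  # ratio shrinks roughly like t
```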

Slide 14. Tangent Plane for $f : \mathbb{R}^2 \to \mathbb{R}$

(Figure only.)

Slide 15. Directional Derivatives from Gradients

If $f$ is differentiable we have

$$f'(x; u) = \nabla f(x)^T u.$$

If $\nabla f(x) \neq 0$ this implies that

$$\operatorname*{arg\,max}_{\|u\|_2 = 1} f'(x; u) = \frac{\nabla f(x)}{\|\nabla f(x)\|_2} \quad \text{and} \quad \operatorname*{arg\,min}_{\|u\|_2 = 1} f'(x; u) = -\frac{\nabla f(x)}{\|\nabla f(x)\|_2}.$$

The gradient points in the direction of steepest ascent. The negative gradient points in the direction of steepest descent.
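
A quick empirical check of the steepest-ascent claim (an illustration I am adding, not part of the slides): among many random unit directions, none should give a larger directional derivative than the normalized gradient.

```python
import numpy as np

f = lambda x: x[0] ** 2 + np.sin(x[1])
grad_f = lambda x: np.array([2 * x[0], np.cos(x[1])])

x = np.array([1.0, 0.3])
g = grad_f(x)
u_star = g / np.linalg.norm(g)  # candidate steepest-ascent direction

rng = np.random.default_rng(0)
dirs = rng.normal(size=(10_000, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # random unit vectors

best_random = (dirs @ g).max()   # f'(x; u) = grad^T u for each direction
print(best_random <= g @ u_star)  # True: the gradient direction wins
```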

Slide 16. Critical Points

Analogous to 1-d, if $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable and $x$ is a local extremum, then we must have $\nabla f(x) = 0$. Points with $\nabla f(x) = 0$ are called critical points. Later we will see that for a convex differentiable function, $x$ is a critical point if and only if it is a global minimizer.

Slide 17. Critical Points of $f : \mathbb{R}^2 \to \mathbb{R}$

(Figure only.)

Slide 18. Computing Gradients

Question. For questions 1 and 2, compute the gradient of the given function.

1. $f : \mathbb{R}^3 \to \mathbb{R}$ is given by $f(x_1, x_2, x_3) = \log(1 + e^{x_1 + 2x_2 + 3x_3})$.
2. $f : \mathbb{R}^n \to \mathbb{R}$ is given by
   $$f(x) = \|Ax - y\|_2^2 = (Ax - y)^T (Ax - y) = x^T A^T A x - 2 y^T A x + y^T y,$$
   for some $A \in \mathbb{R}^{m \times n}$ and $y \in \mathbb{R}^m$.
3. Assume $A$ in the previous question has full column rank. What is the critical point of $f$?

Slide 19. Computing Gradients: $f(x_1, x_2, x_3) = \log(1 + e^{x_1 + 2x_2 + 3x_3})$

Solution 1. We can compute the partial derivatives directly:

$$\partial_{x_1} f = \frac{e^{x_1 + 2x_2 + 3x_3}}{1 + e^{x_1 + 2x_2 + 3x_3}}, \quad \partial_{x_2} f = \frac{2 e^{x_1 + 2x_2 + 3x_3}}{1 + e^{x_1 + 2x_2 + 3x_3}}, \quad \partial_{x_3} f = \frac{3 e^{x_1 + 2x_2 + 3x_3}}{1 + e^{x_1 + 2x_2 + 3x_3}},$$

and obtain

$$\nabla f(x_1, x_2, x_3) = \begin{pmatrix} \frac{e^{x_1 + 2x_2 + 3x_3}}{1 + e^{x_1 + 2x_2 + 3x_3}} \\ \frac{2 e^{x_1 + 2x_2 + 3x_3}}{1 + e^{x_1 + 2x_2 + 3x_3}} \\ \frac{3 e^{x_1 + 2x_2 + 3x_3}}{1 + e^{x_1 + 2x_2 + 3x_3}} \end{pmatrix}.$$

Slide 20. Computing Gradients: $f(x_1, x_2, x_3) = \log(1 + e^{x_1 + 2x_2 + 3x_3})$

Solution 2. Let $w = (1, 2, 3)^T$ and write $f(x) = \log(1 + e^{w^T x})$. Apply a version of the chain rule:

$$\nabla f(x) = \frac{e^{w^T x}}{1 + e^{w^T x}} \, w.$$

Theorem (Chain Rule). If $g : \mathbb{R} \to \mathbb{R}$ and $h : \mathbb{R}^n \to \mathbb{R}$ are differentiable, then $\nabla (g \circ h)(x) = g'(h(x)) \, \nabla h(x)$.
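
As a sanity check on Solution 2, one can compare the closed form against finite differences, reusing the coordinate-wise check from the differentiability slide. A minimal sketch of mine, with an arbitrary test point:

```python
import numpy as np

W = np.array([1.0, 2.0, 3.0])

def f(x):
    return np.log1p(np.exp(W @ x))      # log(1 + e^{w^T x})

def grad_f(x):
    s = 1.0 / (1.0 + np.exp(-(W @ x)))  # sigmoid of w^T x
    return s * W

x = np.array([0.2, -0.1, 0.05])
h = 1e-6
numeric = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                    for e in np.eye(3)])
print(np.allclose(numeric, grad_f(x)))  # True
```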

Slide 21. Computing Gradients: $f(x) = \|Ax - y\|_2^2$

Solution. We could use techniques similar to the previous problem, but instead we show a different method using directional derivatives. For arbitrary $t \in \mathbb{R}$ and $x, v \in \mathbb{R}^n$ we have

$$\begin{aligned}
f(x + tv) &= (x + tv)^T A^T A (x + tv) - 2 y^T A (x + tv) + y^T y \\
&= x^T A^T A x + t^2 v^T A^T A v + 2t \, x^T A^T A v - 2 y^T A x - 2t \, y^T A v + y^T y \\
&= f(x) + t (2 x^T A^T A - 2 y^T A) v + t^2 v^T A^T A v.
\end{aligned}$$

This gives

$$f'(x; v) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t} = (2 x^T A^T A - 2 y^T A) v = \nabla f(x)^T v.$$

Thus $\nabla f(x) = 2(A^T A x - A^T y) = 2 A^T (Ax - y)$. Data science interpretation of $\nabla f(x)$?
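
The closed form says the gradient is $A^T$ applied to twice the residual $Ax - y$. A quick numerical confirmation of the identity $f'(x; v) = \nabla f(x)^T v$, on illustrative random data of my choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
y = rng.normal(size=5)
x = rng.normal(size=3)
v = rng.normal(size=3)

grad = 2 * A.T @ (A @ x - y)             # closed form from the slide

f = lambda x: np.sum((A @ x - y) ** 2)
t = 1e-6
numeric_dd = (f(x + t * v) - f(x - t * v)) / (2 * t)
print(np.isclose(numeric_dd, grad @ v))  # True: f'(x; v) = grad^T v
```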

Slide 22. Critical Points of $f(x) = \|Ax - y\|_2^2$

We need $\nabla f(x) = 2 A^T A x - 2 A^T y = 0$. Since $A$ is assumed to have full column rank, $A^T A$ is invertible. Thus we have

$$x = (A^T A)^{-1} A^T y.$$

As we will see later, this function is strictly convex (the Hessian $\nabla^2 f(x) = 2 A^T A$ is positive definite), so we have found the unique minimizer (the least squares solution).
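
In practice one rarely forms $(A^T A)^{-1}$ explicitly; a least-squares solver is the standard route. A sketch comparing the normal-equations solution with NumPy's solver, on illustrative data of my choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 4))  # tall matrix, full column rank w.h.p.
y = rng.normal(size=50)

x_normal_eq = np.linalg.solve(A.T @ A, A.T @ y)  # (A^T A)^{-1} A^T y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)  # SVD-based solver

print(np.allclose(x_normal_eq, x_lstsq))  # True
```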

Slide 23. Technical Aside: Differentiability

When computing the gradients above we assumed the functions were differentiable. We can use the following theorem to be completely rigorous.

Theorem. Let $f : \mathbb{R}^n \to \mathbb{R}$ and suppose $\partial_{x_i} f : \mathbb{R}^n \to \mathbb{R}$ is continuous for $i = 1, \ldots, n$. Then $f$ is differentiable.
