Gradient Descent etc.


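EE 13: Networked estimation and control (Prof. Khan)

I. DERIVATIVE

Consider $f:\mathbb{R}\to\mathbb{R}$, $x\mapsto f(x)$. The derivative is defined as
$$\frac{d}{dx}f(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}.$$
The chain rule states that
$$\frac{d}{dx}f(g(x))=f'(g(x))\,\frac{d}{dx}g(x).$$
How do we arrive at the chain rule from the definition? Consider $z=f(x,y)$ with $x=g(t)$ and $y=h(t)$. Then, with $\frac{dx}{dt}=\frac{d}{dt}g(t)$ and $\frac{dy}{dt}=\frac{d}{dt}h(t)$,
$$\frac{d}{dt}f(x,y)=\frac{\partial f(x,y)}{\partial x}\frac{dx}{dt}+\frac{\partial f(x,y)}{\partial y}\frac{dy}{dt}.$$

II. DERIVATIVE AND DESCENT

Consider $f:\mathbb{R}\to\mathbb{R}$. The derivative is defined as
$$f'(x)=\frac{d}{dx}f(x)=\lim_{\epsilon\to 0}\frac{f(x+\epsilon)-f(x)}{\epsilon},$$
so that, for small $\epsilon$,
$$f(x+\epsilon)\approx f(x)+\epsilon f'(x).$$
Hence, if $f'(x)>0$, then f decreases to the left. In other words, if $f'(x)\neq 0$ and $\epsilon$ is a small enough positive number,
$$f\big(x-\epsilon\,\mathrm{sign}(f'(x))\big)<f(x).$$
This implies that an algorithm where x is updated as $x\leftarrow x-\epsilon\,\mathrm{sign}(f'(x))$ moves in the direction of decreasing $f(x)$. A plausible such algorithm is
$$x_{k+1}=x_k-\epsilon f'(x_k),$$
which is called gradient descent. It does work for a small enough $\epsilon$, but what is a proper bound?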
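To see why the step size matters, here is a minimal Python sketch of the update $x_{k+1}=x_k-\epsilon f'(x_k)$ on an assumed test function $f(x)=x^2$ (not from the notes). For this f the update becomes $x_{k+1}=(1-2\epsilon)x_k$, so the iterates shrink toward the minimizer 0 only when $0<\epsilon<1$. The function name `gradient_descent_1d` is hypothetical.

```python
def f(x):
    # Assumed test function: f(x) = x^2, minimized at x = 0.
    return x * x


def f_prime(x):
    # Its derivative, f'(x) = 2x.
    return 2.0 * x


def gradient_descent_1d(x0, eps, iters):
    """Run x_{k+1} = x_k - eps * f'(x_k) and return every iterate."""
    xs = [x0]
    for _ in range(iters):
        xs.append(xs[-1] - eps * f_prime(xs[-1]))
    return xs


small = gradient_descent_1d(x0=5.0, eps=0.1, iters=50)  # x shrinks by factor 0.8 per step
large = gradient_descent_1d(x0=5.0, eps=1.1, iters=50)  # x is multiplied by -1.2 per step

print(f(small[-1]))  # close to 0: converges toward the minimizer
print(f(large[-1]))  # very large: the iterates overshoot and diverge
```

With $\epsilon=1.1$ every step overshoots the minimizer by more than it corrects, so $|x_k|$ grows; this is the behavior a step-size bound has to rule out.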

III. CONVEX FUNCTIONS

A convex function $f:\mathbb{R}\to\mathbb{R}$ is such that, for all $x_1,x_2\in\mathbb{R}$ and $t\in[0,1]$,
$$f(tx_1+(1-t)x_2)\le t f(x_1)+(1-t)f(x_2).$$
Let $f'$ be the derivative of f. First, f may not be differentiable everywhere, e.g., $f(x)=|x|$ at $x=0$.

A differentiable function of one variable is convex on an interval if and only if its derivative is monotonically non-decreasing on that interval. Informally, for a differentiable function from (a subset of) the real numbers to the real numbers, "convex" means that the slope never decreases as x increases.

A differentiable function of one variable is convex on an interval if and only if the function lies above all of its tangents:
$$f(x)\ge f(y)+f'(y)(x-y)\quad\text{for all } x \text{ and } y \text{ in the interval.}$$
In particular, if $f'(c)=0$, then c is a global minimum of $f(x)$.

Tangent: Suppose that a curve is given as the graph of a function $y=f(x)$. To find the tangent line at the point $p=(a,f(a))$, consider another nearby point $q=(a+h,f(a+h))$ on the curve. The slope of the secant line passing through p and q is equal to the difference quotient $\frac{f(a+h)-f(a)}{h}$. Hence the secant line can be defined as
$$y-f(a)=\frac{f(a+h)-f(a)}{(a+h)-a}(x-a).$$
As $h\to 0$, we get
$$y-f(a)=f'(a)(x-a),\qquad\text{i.e.,}\qquad y=f(a)+f'(a)(x-a).$$
As an example, consider $f(x)=x^2$. Then the tangent at a point $a\in\mathbb{R}$ is the line
$$y=a^2+2a(x-a)=a^2+2ax-2a^2=2ax-a^2.$$
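The tangent-line characterization of convexity can be checked numerically for the example $f(x)=x^2$. The sketch below uses a hypothetical helper `tangent_line` that builds the tangent $y=2ax-a^2$ and verifies on a grid that $f(x)-(2ax-a^2)=(x-a)^2\ge 0$, i.e., the curve lies above every tangent.

```python
def f(x):
    # Example from the notes: f(x) = x^2.
    return x * x


def tangent_line(a):
    """Tangent to f(x) = x^2 at x = a: y = f(a) + f'(a)(x - a) = 2ax - a^2."""
    return lambda x: 2.0 * a * x - a * a


# Grid of points in [-5, 5] with spacing 0.25 (exact binary fractions).
grid = [i / 4.0 for i in range(-20, 21)]

for a in grid:
    t = tangent_line(a)
    for x in grid:
        gap = f(x) - t(x)
        assert gap >= 0.0                       # f lies above its tangent at a
        assert abs(gap - (x - a) ** 2) < 1e-9   # and the gap is exactly (x - a)^2

print("tangent inequality holds on the grid")
```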

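IV. METHOD OF STEEPEST DESCENT

Convex: A convex function $f:\mathbb{R}^n\to\mathbb{R}$ is such that, for all $x_1,x_2\in\mathbb{R}^n$ and $t\in[0,1]$,
$$f(tx_1+(1-t)x_2)\le t f(x_1)+(1-t)f(x_2).$$
A function $f:\mathbb{R}^n\to\mathbb{R}$ with Lipschitz-continuous gradient is such that, for some $L>0$,
$$\|\nabla f(x)-\nabla f(y)\|_2\le L\|x-y\|_2\quad\forall\, x,y\in\mathbb{R}^n.$$
Such a function is also called L-smooth.

Lemma 1. If f is L-smooth, then
$$f(x)-f(y)-\nabla f(y)^\top(x-y)\le\frac{L}{2}\|x-y\|_2^2.$$

Proof. Consider $g(t)=f(y+t(x-y))$. We have $g(0)=f(y)$, $g(1)=f(x)$, and
$$g'(t)=\nabla f(y+t(x-y))^\top(x-y),\qquad\int_0^1 g'(t)\,dt=g(1)-g(0).$$
Therefore
$$\begin{aligned}
f(x)-f(y)-\nabla f(y)^\top(x-y)&=g(1)-g(0)-\nabla f(y)^\top(x-y)\\
&=\int_0^1\nabla f(y+t(x-y))^\top(x-y)\,dt-\int_0^1\nabla f(y)^\top(x-y)\,dt\\
&=\int_0^1\big(\nabla f(y+t(x-y))-\nabla f(y)\big)^\top(x-y)\,dt\\
&\le\int_0^1\|\nabla f(y+t(x-y))-\nabla f(y)\|_2\,\|x-y\|_2\,dt\qquad\text{[by Cauchy-Schwarz]}\\
&\le\int_0^1 L\|y+t(x-y)-y\|_2\,\|x-y\|_2\,dt\\
&=L\|x-y\|_2^2\int_0^1 t\,dt=\frac{L}{2}\|x-y\|_2^2.
\end{aligned}$$
Finally, we have
$$f(x)-f(y)-\nabla f(y)^\top(x-y)\le\frac{L}{2}\|x-y\|_2^2,$$
and the proof follows.

If f is twice differentiable and strongly convex with constant $m>0$, then $\nabla^2 f-mI$ is positive semidefinite. If f is twice differentiable and L-smooth, then $\nabla^2 f-LI$ is negative semidefinite.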
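As a sanity check of Lemma 1, the Python sketch below assumes the quadratic $f(x)=4x_1^2+x_2^2$, whose gradient $(8x_1,2x_2)$ is Lipschitz with $L=8$ (the largest eigenvalue of the Hessian $\mathrm{diag}(8,2)$). It evaluates the gap $\frac{L}{2}\|x-y\|_2^2-\big(f(x)-f(y)-\nabla f(y)^\top(x-y)\big)$ at random pairs; the lemma predicts it is never negative. The helper names are illustrative.

```python
import random

L = 8.0  # Lipschitz constant of grad f for the assumed f(x) = 4*x1^2 + x2^2


def f(x):
    return 4.0 * x[0] ** 2 + x[1] ** 2


def grad_f(x):
    return [8.0 * x[0], 2.0 * x[1]]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def lemma1_gap(x, y):
    """(L/2)||x - y||^2 - [f(x) - f(y) - grad f(y)^T (x - y)]; Lemma 1 says this is >= 0."""
    d = [xi - yi for xi, yi in zip(x, y)]
    lhs = f(x) - f(y) - dot(grad_f(y), d)
    return 0.5 * L * dot(d, d) - lhs


random.seed(0)
worst = min(
    lemma1_gap([random.uniform(-10, 10) for _ in range(2)],
               [random.uniform(-10, 10) for _ in range(2)])
    for _ in range(10_000)
)
print(worst)  # non-negative up to floating-point rounding
```

For this particular f the gap works out to exactly $3(x_2-y_2)^2$, so the bound is tight along the $x_1$ axis, where the curvature equals L.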

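http://www.stat.cmu.edu/~ryantibs/convexopt-f13/scribes/lec6.pdf

Theorem 1. Let $f:\mathbb{R}^n\to\mathbb{R}$ be convex and differentiable with a Lipschitz-continuous gradient with constant $L>0$. Then, if we run gradient descent for k iterations with a fixed step size $\epsilon\le 1/L$, it will yield a solution $x_k$ such that
$$f(x_k)-f(x^*)\le\frac{\|x_0-x^*\|_2^2}{2\epsilon k}.$$

Proof. Since f has a Lipschitz-continuous gradient, we have from Lemma 1:
$$f(y)\le f(x)+\nabla f(x)^\top(y-x)+\frac{L}{2}\|y-x\|_2^2.$$
Let us plug in the gradient descent update $y=x_{k+1}=x_k-\epsilon\nabla f(x_k)$ (with $x=x_k$) to obtain:
$$\begin{aligned}
f(x_{k+1})&\le f(x_k)+\nabla f(x_k)^\top(x_{k+1}-x_k)+\frac{L}{2}\|x_{k+1}-x_k\|_2^2\\
&=f(x_k)+\nabla f(x_k)^\top\big(-\epsilon\nabla f(x_k)\big)+\frac{L}{2}\|\epsilon\nabla f(x_k)\|_2^2\\
&=f(x_k)-\epsilon\|\nabla f(x_k)\|_2^2+\frac{L\epsilon^2}{2}\|\nabla f(x_k)\|_2^2\\
&=f(x_k)+\left(\frac{L\epsilon}{2}-1\right)\epsilon\|\nabla f(x_k)\|_2^2.
\end{aligned}$$
Since $\epsilon\le 1/L$, we have $\frac{L\epsilon}{2}-1\le-\frac{1}{2}$, and therefore
$$f(x_{k+1})\le f(x_k)-\frac{\epsilon}{2}\|\nabla f(x_k)\|_2^2,\tag{1}$$
which shows that $f(x_k)$ decreases at every k unless $\nabla f(x_k)=0$; because f is convex and differentiable, this happens exactly when $x_k$ is a minimizer $x^*$.

Since f is convex (it lies above all of its tangents), we have
$$f(x^*)\ge f(x)+\nabla f(x)^\top(x^*-x),$$
which leads to
$$f(x_k)\le f(x^*)+\nabla f(x_k)^\top(x_k-x^*).$$
Substituting this into (1) gives
$$\begin{aligned}
f(x_{k+1})-f(x^*)&\le\nabla f(x_k)^\top(x_k-x^*)-\frac{\epsilon}{2}\|\nabla f(x_k)\|_2^2\\
&=\frac{1}{2\epsilon}\Big(2\epsilon\nabla f(x_k)^\top(x_k-x^*)-\epsilon^2\|\nabla f(x_k)\|_2^2\Big)\\
&=\frac{1}{2\epsilon}\Big(2\epsilon\nabla f(x_k)^\top(x_k-x^*)-\epsilon^2\|\nabla f(x_k)\|_2^2-\|x_k-x^*\|_2^2+\|x_k-x^*\|_2^2\Big)\\
&=\frac{1}{2\epsilon}\Big(\|x_k-x^*\|_2^2-\|x_k-\epsilon\nabla f(x_k)-x^*\|_2^2\Big),
\end{aligned}$$
where the third line adds and subtracts $\|x_k-x^*\|_2^2$ and the last line completes the square.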

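Continuing, since $x_{k+1}=x_k-\epsilon\nabla f(x_k)$,
$$f(x_{k+1})-f(x^*)\le\frac{1}{2\epsilon}\Big(\|x_k-x^*\|_2^2-\|x_{k+1}-x^*\|_2^2\Big).$$
Sum both sides over $k=0,\dots,K-1$; the right-hand side telescopes:
$$\begin{aligned}
\sum_{k=0}^{K-1}\big(f(x_{k+1})-f(x^*)\big)&\le\frac{1}{2\epsilon}\sum_{k=0}^{K-1}\Big(\|x_k-x^*\|_2^2-\|x_{k+1}-x^*\|_2^2\Big)\\
&=\frac{1}{2\epsilon}\Big(\|x_0-x^*\|_2^2-\|x_1-x^*\|_2^2+\|x_1-x^*\|_2^2-\|x_2-x^*\|_2^2+\dots+\|x_{K-1}-x^*\|_2^2-\|x_K-x^*\|_2^2\Big)\\
&=\frac{1}{2\epsilon}\Big(\|x_0-x^*\|_2^2-\|x_K-x^*\|_2^2\Big)\\
&\le\frac{1}{2\epsilon}\|x_0-x^*\|_2^2.
\end{aligned}$$
Consider the following, where the inequality uses the fact that, by (1), $f(x_k)$ is non-increasing in k:
$$\begin{aligned}
K\big(f(x_K)-f(x^*)\big)&=f(x_K)+f(x_K)+\dots+f(x_K)-Kf(x^*)\\
&\le f(x_K)+f(x_{K-1})+\dots+f(x_1)-Kf(x^*)\qquad\text{[see (1)]}\\
&=\sum_{k=0}^{K-1}f(x_{k+1})-Kf(x^*)\\
&=\sum_{k=0}^{K-1}\big(f(x_{k+1})-f(x^*)\big).
\end{aligned}$$
Finally, we obtain
$$K\big(f(x_K)-f(x^*)\big)\le\frac{1}{2\epsilon}\|x_0-x^*\|_2^2,$$
and the theorem follows:
$$f(x_K)-f(x^*)\le\frac{\|x_0-x^*\|_2^2}{2\epsilon K}.$$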
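A minimal sketch of Theorem 1 in action, assuming the quadratic $f(x)=4x_1^2+x_2^2$ (so $L=8$, $x^*=0$, $f(x^*)=0$) and an arbitrary starting point. It runs gradient descent with $\epsilon=1/L$ and asserts, at every iteration, both the monotone decrease (1) and the bound $f(x_k)-f(x^*)\le\|x_0-x^*\|_2^2/(2\epsilon k)$.

```python
L = 8.0          # Lipschitz constant of grad f for the assumed quadratic
EPS = 1.0 / L    # fixed step size eps <= 1/L, as required by Theorem 1


def f(x):
    return 4.0 * x[0] ** 2 + x[1] ** 2


def grad_f(x):
    return [8.0 * x[0], 2.0 * x[1]]


def gradient_descent(x0, eps, iters):
    """Iterate x_{k+1} = x_k - eps * grad f(x_k); return all iterates."""
    xs = [list(x0)]
    for _ in range(iters):
        g = grad_f(xs[-1])
        xs.append([xi - eps * gi for xi, gi in zip(xs[-1], g)])
    return xs


x0 = [3.0, -4.0]
x_star, f_star = [0.0, 0.0], 0.0
xs = gradient_descent(x0, EPS, 100)
r0 = sum((a - b) ** 2 for a, b in zip(x0, x_star))  # ||x_0 - x*||^2

for k in range(1, len(xs)):
    assert f(xs[k]) - f_star <= r0 / (2.0 * EPS * k)  # Theorem 1 bound
    assert f(xs[k]) <= f(xs[k - 1])                    # monotone decrease, as in (1)

print(f(xs[-1]), r0 / (2.0 * EPS * 100))
```

Here the guarantee is loose: with $\epsilon=1/8$ the $x_1$ coordinate reaches 0 after one step and $x_2$ shrinks by a factor 0.75 per step, so the error decays geometrically, much faster than the $O(1/k)$ rate that the theorem guarantees for every convex L-smooth f.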

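V. GRADIENT

For $f:\mathbb{R}^n\to\mathbb{R}$, we generalize the derivative with the gradient $\nabla f$, where
$$\nabla f(x)=\begin{bmatrix}\frac{\partial}{\partial x_1}f(x)\\ \vdots\\ \frac{\partial}{\partial x_n}f(x)\end{bmatrix}.$$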
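The gradient can also be approximated one coordinate at a time. The sketch below is a hypothetical `numerical_gradient` helper that uses central differences $\frac{f(x+he_i)-f(x-he_i)}{2h}$; it is compared against the analytic gradient of an assumed test function $f(x)=4x_1^2+x_2^2$, which is $(8x_1,2x_2)$.

```python
def f(x):
    # Assumed test function f(x1, x2) = 4*x1^2 + x2^2, so grad f = (8*x1, 2*x2).
    return 4.0 * x[0] ** 2 + x[1] ** 2


def numerical_gradient(func, x, h=1e-6):
    """Approximate each partial derivative df/dx_i by a central difference."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2.0 * h))
    return grad


print(numerical_gradient(f, [1.0, 2.0]))  # close to the analytic gradient [8.0, 4.0]
```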

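VI. PARTIAL DERIVATIVE

Consider $f:\mathbb{R}^2\to\mathbb{R}$, where $\mathbb{R}^2$ is indexed by x and y. Then, for some $(a,b)\in\mathbb{R}^2$, the partial derivative of f with respect to x at the point $(a,b)$ is defined as
$$\frac{\partial}{\partial x}f(a,b)=\lim_{h\to 0}\frac{f(a+h,b)-f(a,b)}{h}.$$
Let us expand the notation to a vector $\mathbf{a}=(a,b)\in\mathbb{R}^2$:
$$\frac{\partial}{\partial x}f(\mathbf{a})=\lim_{h\to 0}\frac{f\big(\mathbf{a}+h\,(1,0)^\top\big)-f(\mathbf{a})}{h}.$$
Hence, for an arbitrary direction $u\in\mathbb{R}^2$:
$$\frac{\partial}{\partial u}f(\mathbf{a})=\lim_{h\to 0}\frac{f(\mathbf{a}+hu)-f(\mathbf{a})}{h}.$$
Notice, however, that the direction is not scaled here. In other words, replacing u by 2u doubles the limit, so the value depends on the length of u unless we throw in a normalization on the direction (e.g., $\|u\|_2=1$). The directional derivative is denoted as
$$\nabla_u f(\mathbf{a})=\frac{\partial}{\partial u}f(\mathbf{a}).$$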
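The scaling remark can be seen numerically. This sketch assumes the test function $f(x,y)=4x^2+y^2$ and a hypothetical helper `diff_quotient` that evaluates the unnormalized quotient $\frac{f(\mathbf{a}+hu)-f(\mathbf{a})}{h}$ for a small fixed h: the direction $(1,0)$ recovers the partial derivative with respect to x, and doubling u roughly doubles the result.

```python
def f(p):
    # Assumed test function on R^2: f(x, y) = 4*x^2 + y^2.
    return 4.0 * p[0] ** 2 + p[1] ** 2


def diff_quotient(func, a, u, h=1e-6):
    """Forward difference (f(a + h*u) - f(a)) / h, with no normalization of u."""
    shifted = [ai + h * ui for ai, ui in zip(a, u)]
    return (func(shifted) - func(a)) / h


a = [1.0, 2.0]
d_x = diff_quotient(f, a, [1.0, 0.0])   # partial derivative w.r.t. x, about 8
d_u = diff_quotient(f, a, [1.0, 1.0])   # direction u = (1, 1), about 8 + 4 = 12
d_2u = diff_quotient(f, a, [2.0, 2.0])  # direction 2u, about 24: twice d_u
print(d_x, d_u, d_2u)
```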

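VII. DIRECTIONAL DERIVATIVE: FORMAL

Definition 1 (Directional derivative). Consider a function $f:\mathbb{R}^n\to\mathbb{R}$. The directional derivative of f in an arbitrary direction $u\in\mathbb{R}^n$ is given by
$$\nabla_u f(a)=\frac{\partial}{\partial u}f(a)=\lim_{h\to 0}\frac{f(a+hu)-f(a)}{h}=\lim_{h\to 0}\frac{f(a_1+hu_1,a_2+hu_2,\dots,a_n+hu_n)-f(a_1,a_2,\dots,a_n)}{h}.$$

Lemma 2. The directional derivative is given by $\nabla_u f(x)=u^\top\nabla f(x)$.

Proof. Consider $f:\mathbb{R}^2\to\mathbb{R}$ for convenience. Let $\mathbb{R}^2$ be indexed by $(x,y)$. Fix an arbitrary point $(x_0,y_0)\in\mathbb{R}^2$ and an arbitrary direction $(a,b)\in\mathbb{R}^2$. At this point define
$$g(z)=f(x_0+za,\,y_0+zb).$$
As g is a function of a single variable z, we can define
$$g'(z)=\frac{d}{dz}g(z)=\lim_{h\to 0}\frac{g(z+h)-g(z)}{h},$$
leading to
$$g'(0)=\lim_{h\to 0}\frac{g(h)-g(0)}{h}=\lim_{h\to 0}\frac{f(x_0+ha,\,y_0+hb)-f(x_0,y_0)}{h}=\nabla_{(a,b)}f(x_0,y_0)\tag{2}$$
by definition. We thus have $g'(0)=\nabla_{(a,b)}f(x_0,y_0)$. Now let $x=x_0+za$ and $y=y_0+zb$. Then $g(z)=f(x,y)$, and by the chain rule
$$g'(z)=\frac{d}{dz}g(z)=\frac{d}{dz}f(x,y)=\frac{\partial f(x,y)}{\partial x}\frac{dx}{dz}+\frac{\partial f(x,y)}{\partial y}\frac{dy}{dz}=\frac{\partial f(x,y)}{\partial x}\,a+\frac{\partial f(x,y)}{\partial y}\,b.$$
With $z=0$ we get $x=x_0$ and $y=y_0$, and
$$g'(0)=\frac{\partial f(x_0,y_0)}{\partial x}\,a+\frac{\partial f(x_0,y_0)}{\partial y}\,b.\tag{3}$$
Combine (2) and (3) to get
$$\nabla_{(a,b)}f(x_0,y_0)=\frac{\partial f(x_0,y_0)}{\partial x}\,a+\frac{\partial f(x_0,y_0)}{\partial y}\,b.$$
The above generalizes to
$$\nabla_u f(x)=\frac{\partial f(x)}{\partial x_1}u_1+\dots+\frac{\partial f(x)}{\partial x_n}u_n=u^\top\nabla f=u\cdot\nabla f=\nabla f^\top u,$$
which completes the proof.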
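Lemma 2 can be checked by estimating $g'(0)$ for $g(z)=f(x_0+za,y_0+zb)$ with a finite difference and comparing it to $(a,b)^\top\nabla f(x_0,y_0)$. The smooth test function $f(x,y)=\sin(x)e^y$, the point and the direction below are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math


def f(x, y):
    # Assumed smooth test function (not from the notes).
    return math.sin(x) * math.exp(y)


def grad_f(x, y):
    # Analytic gradient: (cos(x) e^y, sin(x) e^y).
    return (math.cos(x) * math.exp(y), math.sin(x) * math.exp(y))


def directional_derivative_limit(x0, y0, a, b, h=1e-6):
    """g'(0) for g(z) = f(x0 + z*a, y0 + z*b), via a central difference in z."""
    return (f(x0 + h * a, y0 + h * b) - f(x0 - h * a, y0 - h * b)) / (2.0 * h)


def directional_derivative_gradient(x0, y0, a, b):
    """Lemma 2: (a, b)^T grad f(x0, y0)."""
    fx, fy = grad_f(x0, y0)
    return a * fx + b * fy


x0, y0, a, b = 0.7, -0.3, 2.0, -1.5
lim_value = directional_derivative_limit(x0, y0, a, b)
grad_value = directional_derivative_gradient(x0, y0, a, b)
print(lim_value, grad_value)  # the two estimates agree closely
```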

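Lemma 3. The maximum value of the directional derivative $\nabla_u f(x)$ over unit directions ($\|u\|_2=1$) is attained in the direction of the gradient.

Proof. The directional derivative is a dot product, $u^\top\nabla f(x)=\|u\|_2\,\|\nabla f(x)\|_2\cos\theta$, which is maximal when the angle θ in the cosine is 0.

Lemma 4. The gradient vector is orthogonal to the level (hyper-)curve $f(x)=c$ at the point x.

Proof. Let $r(t)$ be a differentiable curve lying in the level set, with $r(0)=x$. Then $f(r(t))=c$ for all t; differentiating with the chain rule gives $\nabla f(r(t))^\top r'(t)=0$, so at $t=0$, $\nabla f(x)^\top r'(0)=0$ for every tangent direction $r'(0)$.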
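Lemma 3 can be illustrated by brute force: scan unit directions $u=(\cos t,\sin t)$ and keep the one with the largest $u^\top\nabla f(x)$. The sketch assumes $f(x_1,x_2)=4x_1^2+x_2^2$ at the point $(1,2)$, where $\nabla f=(8,4)$; the maximum should be close to $\|\nabla f\|_2=\sqrt{80}$ and attained near $\nabla f/\|\nabla f\|_2$.

```python
import math


def grad_f(x1, x2):
    # Gradient of the assumed f(x1, x2) = 4*x1^2 + x2^2.
    return (8.0 * x1, 2.0 * x2)


gx, gy = grad_f(1.0, 2.0)  # (8, 4)
norm_g = math.hypot(gx, gy)

# Scan 3600 unit directions (0.1 degree apart) for the largest u^T grad f.
best_val, best_u = -math.inf, None
for i in range(3600):
    t = 2.0 * math.pi * i / 3600
    u = (math.cos(t), math.sin(t))
    val = u[0] * gx + u[1] * gy
    if val > best_val:
        best_val, best_u = val, u

print(best_val, norm_g)                    # max directional derivative is about ||grad f||
print(best_u, (gx / norm_g, gy / norm_g))  # attained near the normalized gradient
```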

VIII. DIRECTIONAL DERIVATIVE

Consider $f:\mathbb{R}^n\to\mathbb{R}$. The scalar notion of derivative does not apply, and the derivative must be specified along a particular direction in $\mathbb{R}^n$, i.e., as the rate of change of f in a direction u. It is defined as
$$\nabla_u f(x)=\lim_{\alpha\to 0}\frac{f(x+\alpha u)-f(x)}{\alpha}.$$
The derivative of a function in an arbitrary direction u, normalized so that $\|u\|_2=\sqrt{u^\top u}=1$, is given by
$$\nabla_u f(x)=u^\top\nabla f(x).$$
Consider a function f with domain $\mathbb{R}^3$, i.e., over three variables $x_1,x_2,x_3$. The three principal directions of this space are thus $e_1,e_2,e_3$. If we choose u to be $e_1$, then
$$\nabla_{e_1}f(x)=\frac{\partial}{\partial x_1}f(x).$$
Because we have already specified a particular direction, the directional derivative is a scalar. We further have
$$\nabla_u f(x)=u^\top\nabla f(x)=\|u\|_2\,\|\nabla f(x)\|_2\cos\theta=\|\nabla f(x)\|_2\cos\theta,$$
where θ is the angle between the gradient vector and the direction u. The directional derivative is thus maximal when $\theta=0$, i.e., when u is along the direction of the gradient; this is where f increases the most. Similarly, the directional derivative is minimal when $\theta=\pi$, i.e., when u is opposite to the direction of the gradient; this is where f decreases the most.

A. Example

Consider $f(x_1,x_2)=4x_1^2+x_2^2$. For any value $c>0$, the level set $f(x_1,x_2)=c$, i.e., $4x_1^2+x_2^2=c$, is an ellipse in $\mathbb{R}^2$. The gradient of this function is
$$\nabla f(x)=\begin{bmatrix}\frac{\partial}{\partial x_1}f(x)\\ \frac{\partial}{\partial x_2}f(x)\end{bmatrix}=\begin{bmatrix}8x_1\\ 2x_2\end{bmatrix}.$$
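For this example, the orthogonality of the gradient to the level curve can be verified directly. Parametrizing the ellipse $4x_1^2+x_2^2=c$ as $x_1=\frac{\sqrt{c}}{2}\cos t$, $x_2=\sqrt{c}\sin t$ gives the tangent vector $r'(t)=\big(-\frac{\sqrt{c}}{2}\sin t,\ \sqrt{c}\cos t\big)$ and $\nabla f=(4\sqrt{c}\cos t,\ 2\sqrt{c}\sin t)$, so $\nabla f\cdot r'(t)=-2c\sin t\cos t+2c\sin t\cos t=0$. The sketch below checks this numerically for an assumed level $c=9$.

```python
import math


def f(x1, x2):
    # Example from the notes: f(x1, x2) = 4*x1^2 + x2^2.
    return 4.0 * x1 ** 2 + x2 ** 2


def grad_f(x1, x2):
    return (8.0 * x1, 2.0 * x2)


c = 9.0  # assumed level value; the level set 4*x1^2 + x2^2 = c is an ellipse
max_abs_dot = 0.0
for i in range(12):
    t = 2.0 * math.pi * i / 12
    # Point on the ellipse and its tangent vector r'(t).
    x1, x2 = math.sqrt(c) / 2.0 * math.cos(t), math.sqrt(c) * math.sin(t)
    tangent = (-math.sqrt(c) / 2.0 * math.sin(t), math.sqrt(c) * math.cos(t))
    assert abs(f(x1, x2) - c) < 1e-9  # the point lies on the level set
    g = grad_f(x1, x2)
    max_abs_dot = max(max_abs_dot, abs(g[0] * tangent[0] + g[1] * tangent[1]))

print(max_abs_dot)  # about 0: the gradient is orthogonal to the ellipse
```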