Recitation 1. Gradients and Directional Derivatives. Brett Bernstein. CDS at NYU. January 21, 2018
1 Gradients and Directional Derivatives

Brett Bernstein, CDS at NYU. January 21, 2018.
2 Intro Question

Question. We are given the data set $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. We want to fit a linear function to this data by performing empirical risk minimization. More precisely, we are using the hypothesis space $\mathcal{F} = \{f(x) = w^T x \mid w \in \mathbb{R}^d\}$ and the loss function $\ell(a, y) = (a - y)^2$. Given an initial guess $w$ for the empirical risk minimizing parameter vector, how could we improve our guess?

(Figure: the data set plotted in the $(x, y)$ plane.)
3 Intro Solution

Solution. The empirical risk is given by
$$\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^n \ell(f(x_i), y_i) = \frac{1}{n} \sum_{i=1}^n (w^T x_i - y_i)^2 = \frac{1}{n} \|Xw - y\|_2^2,$$
where $X \in \mathbb{R}^{n \times d}$ is the matrix whose $i$th row is given by $x_i^T$. We can improve a non-optimal guess $w$ by taking a small step in the direction of the negative gradient.
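As a concrete illustration of this idea (an added sketch, not part of the original slides), here is minimal gradient descent on the empirical risk in NumPy. The synthetic data, step size, and iteration count are all illustrative choices.

```python
import numpy as np

# Gradient descent sketch for R_hat_n(w) = (1/n) ||Xw - y||_2^2.
# Its gradient (derived later in the recitation) is (2/n) X^T (Xw - y).
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

w = np.zeros(d)        # initial guess
step_size = 0.1        # illustrative fixed step size
for _ in range(500):
    grad = (2 / n) * X.T @ (X @ w - y)  # gradient of the empirical risk
    w = w - step_size * grad            # small step along the negative gradient

print(w)  # approaches the least squares solution
```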
4 Single Variable Differentiation

Calculus lets us turn non-linear problems into linear algebra. For $f : \mathbb{R} \to \mathbb{R}$ differentiable, the derivative is given by
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$
This can also be written as $f(x + h) = f(x) + h f'(x) + o(h)$ as $h \to 0$, where $o(h)$ denotes a function $g(h)$ with $g(h)/h \to 0$ as $h \to 0$. Points with $f'(x) = 0$ are called critical points.
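To make the $o(h)$ claim tangible, here is a small numeric sketch (an added illustration; the choice $f(x) = x^3$ at $x = 2$ is arbitrary). The remainder of the linear approximation, divided by $h$, shrinks as $h \to 0$.

```python
# Check f(x + h) = f(x) + h f'(x) + o(h) for f(x) = x**3 at x = 2,
# where f'(2) = 12. The remainder divided by h should vanish as h -> 0.
f = lambda x: x**3
x, fprime = 2.0, 12.0
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    remainder = f(x + h) - f(x) - h * fprime
    print(h, remainder / h)  # tends to 0, confirming the o(h) behavior
```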
5 1D Linear Approximation By Derivative

(Figure: the graph of $f(t)$, its tangent line $f(x_0) + (t - x_0) f'(x_0)$ through the point $(x_0, f(x_0))$, and the approximation error $f(t) - (f(x_0) + (t - x_0) f'(x_0))$.)
6 Multivariable Differentiation

Consider now a function $f : \mathbb{R}^n \to \mathbb{R}$ with inputs of the form $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. Unlike the 1-dimensional case, we cannot assign a single number to the slope at a point, since there are many directions we can move in.
7 Multiple Possible Directions for $f : \mathbb{R}^2 \to \mathbb{R}$

(Figure.)
8 Directional Derivative

Definition. Let $f : \mathbb{R}^n \to \mathbb{R}$. The directional derivative $f'(x; u)$ of $f$ at $x \in \mathbb{R}^n$ in the direction $u \in \mathbb{R}^n$ is given by
$$f'(x; u) = \lim_{h \to 0} \frac{f(x + hu) - f(x)}{h}.$$
By fixing a direction $u$ we turn our multidimensional problem into a 1-dimensional problem. Similar to the 1-d case we have $f(x + hu) = f(x) + h f'(x; u) + o(h)$. We say that $u$ is a descent direction of $f$ at $x$ if $f'(x; u) < 0$.
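Since fixing $u$ reduces everything to one dimension, a directional derivative can be estimated by differencing along the line $t \mapsto f(x + tu)$. The sketch below is an added illustration; the function, point, and direction are arbitrary choices.

```python
import numpy as np

# Numerically estimate the directional derivative f'(x; u) by restricting
# f to the line t -> f(x + t u) and taking a difference quotient in t.
f = lambda x: x[0]**2 + 3 * x[0] * x[1]  # example function on R^2
x = np.array([1.0, 2.0])
u = np.array([1.0, 0.0])                 # direction (here e_1)

h = 1e-6
dd = (f(x + h * u) - f(x)) / h
print(dd)  # roughly 8 = 2*x1 + 3*x2; a negative value would mean a descent direction
```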
9 Directional Derivative as a Slope of a Slice

(Figure.)
10 Partial Derivative

Let $e_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0)$, with the $1$ in the $i$th position, denote the $i$th standard basis vector. The $i$th partial derivative is defined to be the directional derivative along $e_i$. It can be written many ways:
$$f'(x; e_i) = \frac{\partial f}{\partial x_i}(x) = \partial_{x_i} f(x) = \partial_i f(x).$$
What is the intuitive meaning of $\partial_{x_i} f(x)$? For example, what does a large value of $\partial_{x_3} f(x)$ imply?
11 Differentiability

We say a function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $x \in \mathbb{R}^n$ if
$$\lim_{v \to 0} \frac{f(x + v) - f(x) - g^T v}{\|v\|_2} = 0$$
for some $g \in \mathbb{R}^n$. If it exists, this $g$ is unique and is called the gradient of $f$ at $x$, with notation $g = \nabla f(x)$. It can be shown that
$$\nabla f(x) = \begin{pmatrix} \partial_{x_1} f(x) \\ \vdots \\ \partial_{x_n} f(x) \end{pmatrix}.$$
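The identification of the gradient with the vector of partial derivatives suggests a simple finite-difference estimator: difference along each $e_i$ and stack the results. This is an added sketch; the helper name numeric_gradient and the test function are illustrative, not from the slides.

```python
import numpy as np

def numeric_gradient(f, x, h=1e-6):
    """Estimate the gradient by differencing along each standard basis vector."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (f(x + h * e) - f(x)) / h  # i-th partial derivative
    return g

f = lambda x: x[0]**2 + 3 * x[0] * x[1]
x = np.array([1.0, 2.0])
print(numeric_gradient(f, x))  # approx [8., 3.], i.e. (2*x1 + 3*x2, 3*x1)
```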
12 Useful Convention

Consider $f : \mathbb{R}^{p+q} \to \mathbb{R}$. Split the input $x \in \mathbb{R}^{p+q}$ into parts $w \in \mathbb{R}^p$ and $z \in \mathbb{R}^q$ so that $x = (w, z)$. Define the partial gradients
$$\nabla_w f(w, z) := \begin{pmatrix} \partial_{w_1} f(w, z) \\ \vdots \\ \partial_{w_p} f(w, z) \end{pmatrix} \quad \text{and} \quad \nabla_z f(w, z) := \begin{pmatrix} \partial_{z_1} f(w, z) \\ \vdots \\ \partial_{z_q} f(w, z) \end{pmatrix}.$$
13 Tangent Plane

Analogous to the 1-d case, we can express differentiability as
$$f(x + v) = f(x) + \nabla f(x)^T v + o(\|v\|_2).$$
The approximation $f(x + v) \approx f(x) + \nabla f(x)^T v$ gives a tangent plane at the point $x$. The tangent plane of $f$ at $x$ is given by
$$P = \{(x + v, f(x) + \nabla f(x)^T v) \mid v \in \mathbb{R}^n\} \subseteq \mathbb{R}^{n+1}.$$
Methods like gradient descent approximate a function locally by its tangent plane, and then take a step accordingly.
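Here is an added sketch checking the $o(\|v\|_2)$ claim numerically: the gap between $f(x + v)$ and the tangent plane value, divided by $\|v\|_2$, shrinks as $v$ shrinks. The function and points are arbitrary illustrative choices.

```python
import numpy as np

# Tangent plane check: [f(x+v) - (f(x) + grad(x)^T v)] / ||v|| -> 0 as v -> 0.
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
grad = lambda x: np.array([2 * x[0] + 3 * x[1], 3 * x[0]])
x = np.array([1.0, 2.0])

v = np.array([0.3, -0.4])
for scale in [1.0, 0.1, 0.01, 0.001]:
    dv = scale * v
    err = f(x + dv) - (f(x) + grad(x) @ dv)
    print(scale, err / np.linalg.norm(dv))  # tends to 0
```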
14 Tangent Plane for $f : \mathbb{R}^2 \to \mathbb{R}$

(Figure.)
15 Directional Derivatives from Gradients

If $f$ is differentiable we have $f'(x; u) = \nabla f(x)^T u$. If $\nabla f(x) \neq 0$, this implies that
$$\arg\max_{\|u\|_2 = 1} f'(x; u) = \frac{\nabla f(x)}{\|\nabla f(x)\|_2} \quad \text{and} \quad \arg\min_{\|u\|_2 = 1} f'(x; u) = -\frac{\nabla f(x)}{\|\nabla f(x)\|_2}.$$
The gradient points in the direction of steepest ascent. The negative gradient points in the direction of steepest descent.
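An added empirical sanity check of the steepest ascent claim: over random unit directions $u$, the slope $\nabla f(x)^T u$ never exceeds the slope along the normalized gradient, which equals $\|\nabla f(x)\|_2$. The gradient vector used is an arbitrary example.

```python
import numpy as np

# Among unit vectors u, grad^T u is maximized by u = grad / ||grad||.
rng = np.random.default_rng(0)
grad = np.array([2.0, -1.0, 0.5])     # some nonzero gradient

u_star = grad / np.linalg.norm(grad)  # claimed steepest ascent direction
best = grad @ u_star                  # equals ||grad||_2

for _ in range(5):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)            # random unit direction
    print(grad @ u, "<=", best)       # never exceeds ||grad||_2
```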
16 Critical Points

Analogous to 1-d, if $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable and $x$ is a local extremum, then we must have $\nabla f(x) = 0$. Points with $\nabla f(x) = 0$ are called critical points. Later we will see that for a convex differentiable function, $x$ is a critical point if and only if it is a global minimizer.
17 Critical Points of $f : \mathbb{R}^2 \to \mathbb{R}$

(Figure.)
18 Computing Gradients

Question. For questions 1 and 2, compute the gradient of the given function.

1. $f : \mathbb{R}^3 \to \mathbb{R}$ is given by $f(x_1, x_2, x_3) = \log(1 + e^{x_1 + 2x_2 + 3x_3})$.
2. $f : \mathbb{R}^n \to \mathbb{R}$ is given by $f(x) = \|Ax - y\|_2^2 = (Ax - y)^T (Ax - y) = x^T A^T A x - 2 y^T A x + y^T y$, for some $A \in \mathbb{R}^{m \times n}$ and $y \in \mathbb{R}^m$.
3. Assume $A$ in the previous question has full column rank. What is the critical point of $f$?
19 Computing Gradients: $f(x_1, x_2, x_3) = \log(1 + e^{x_1 + 2x_2 + 3x_3})$

Solution 1. We can compute the partial derivatives directly:
$$\partial_{x_1} f = \frac{e^{x_1 + 2x_2 + 3x_3}}{1 + e^{x_1 + 2x_2 + 3x_3}}, \quad \partial_{x_2} f = \frac{2 e^{x_1 + 2x_2 + 3x_3}}{1 + e^{x_1 + 2x_2 + 3x_3}}, \quad \partial_{x_3} f = \frac{3 e^{x_1 + 2x_2 + 3x_3}}{1 + e^{x_1 + 2x_2 + 3x_3}},$$
and obtain
$$\nabla f(x_1, x_2, x_3) = \frac{e^{x_1 + 2x_2 + 3x_3}}{1 + e^{x_1 + 2x_2 + 3x_3}} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.$$
20 Computing Gradients: $f(x_1, x_2, x_3) = \log(1 + e^{x_1 + 2x_2 + 3x_3})$

Solution 2. Let $w = (1, 2, 3)^T$ and write $f(x) = \log(1 + e^{w^T x})$. Apply a version of the chain rule:
$$\nabla f(x) = \frac{e^{w^T x}}{1 + e^{w^T x}} w.$$

Theorem (Chain Rule). If $g : \mathbb{R} \to \mathbb{R}$ and $h : \mathbb{R}^n \to \mathbb{R}$ are differentiable, then $\nabla (g \circ h)(x) = g'(h(x)) \nabla h(x)$.
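An added sketch verifying this chain rule gradient against finite differences; the evaluation point is arbitrary.

```python
import numpy as np

# Check grad f(x) = (e^{w^T x} / (1 + e^{w^T x})) * w for f(x) = log(1 + e^{w^T x}).
w = np.array([1.0, 2.0, 3.0])
f = lambda x: np.log(1 + np.exp(w @ x))
grad = lambda x: (np.exp(w @ x) / (1 + np.exp(w @ x))) * w

x = np.array([0.1, -0.2, 0.05])
h = 1e-6
numeric = np.array([(f(x + h * e) - f(x)) / h for e in np.eye(3)])
print(np.allclose(numeric, grad(x), atol=1e-4))  # True
```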
21 Computing Gradients: $f(x) = \|Ax - y\|_2^2$

Solution. We could use techniques similar to the previous problem, but instead we show a different method using directional derivatives. For arbitrary $t \in \mathbb{R}$ and $x, v \in \mathbb{R}^n$ we have
$$\begin{aligned} f(x + tv) &= (x + tv)^T A^T A (x + tv) - 2 y^T A (x + tv) + y^T y \\ &= x^T A^T A x + t^2 v^T A^T A v + 2t\, x^T A^T A v - 2 y^T A x - 2t\, y^T A v + y^T y \\ &= f(x) + t (2 x^T A^T A - 2 y^T A) v + t^2 v^T A^T A v. \end{aligned}$$
This gives
$$f'(x; v) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t} = (2 x^T A^T A - 2 y^T A) v = \nabla f(x)^T v.$$
Thus $\nabla f(x) = 2(A^T A x - A^T y) = 2 A^T (Ax - y)$. Data science interpretation of $\nabla f(x)$?
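An added sketch checking $\nabla f(x) = 2 A^T (Ax - y)$ through the same directional-derivative lens: for small $t$, $(f(x + tv) - f(x))/t$ should match $\nabla f(x)^T v$. The matrix, vectors, and $t$ are illustrative.

```python
import numpy as np

# Directional derivative check of grad f(x) = 2 A^T (A x - y).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
y = rng.normal(size=5)
f = lambda x: np.sum((A @ x - y) ** 2)

x, v = rng.normal(size=3), rng.normal(size=3)
grad = 2 * A.T @ (A @ x - y)
t = 1e-6
print((f(x + t * v) - f(x)) / t, "vs", grad @ v)  # nearly equal
```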
22 Computing Gradients: Critical Points of $f(x) = \|Ax - y\|_2^2$

We need $\nabla f(x) = 2 A^T A x - 2 A^T y = 0$. Since $A$ is assumed to have full column rank, we see that $A^T A$ is invertible. Thus we have
$$x = (A^T A)^{-1} A^T y.$$
As we will see later, this function is strictly convex (the Hessian $\nabla^2 f(x) = 2 A^T A$ is positive definite). Thus we have found the unique minimizer (the least squares solution).
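An added sketch computing this critical point by solving the normal equations and comparing against NumPy's built-in least squares solver; the random data is illustrative.

```python
import numpy as np

# Solve the normal equations A^T A x = A^T y and compare with np.linalg.lstsq.
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))  # tall random matrix: full column rank almost surely
y = rng.normal(size=10)

x_normal = np.linalg.solve(A.T @ A, A.T @ y)  # (A^T A)^{-1} A^T y
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.allclose(x_normal, x_lstsq))         # True
```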
23 Computing Gradients: Technical Aside on Differentiability

When computing the gradients above we assumed the functions were differentiable. We can use the following theorem to be completely rigorous.

Theorem. Let $f : \mathbb{R}^n \to \mathbb{R}$ and suppose $\partial_{x_i} f : \mathbb{R}^n \to \mathbb{R}$ is continuous for $i = 1, \ldots, n$. Then $f$ is differentiable.