Gradient descents and inner products


Outline:
- Gradient (definition)
- Usual inner product (L2)
- Natural inner products
- Other inner products
- Examples

Gradient (definition)

For an energy E depending on a variable f (a vector or a function), what is the gradient ∇E(f)?

Directional derivative: consider a variation v of f,

  δE(f)(v) := lim_{ε→0} [ E(f + εv) − E(f) ] / ε.

Hopefully δE(f) is linear and continuous with respect to v. The Riesz representation theorem then defines the gradient: the variation ∇E(f) such that, for all v,

  δE(f)(v) = ⟨ ∇E(f) | v ⟩.

Usual inner product: L2 ⟹ L2 gradient.
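As a quick sanity check of this definition, here is a small numerical sketch (not part of the slides): for a discretized f and the usual dot product, the finite-difference directional derivative matches ⟨∇E(f) | v⟩. The energy chosen below is an arbitrary illustrative assumption.

```python
import numpy as np

# Illustration only: for a discretized f, the directional derivative
# dE(f)(v) = lim (E(f + eps*v) - E(f)) / eps matches the L2 pairing
# <grad E(f) | v> when the gradient is taken for the usual dot product.
def E(f):
    return np.sum(f ** 2) + np.sum(np.cos(f))

def grad_L2(f):
    return 2 * f - np.sin(f)

rng = np.random.default_rng(1)
f = rng.normal(size=5)
v = rng.normal(size=5)
eps = 1e-6

dE_numeric = (E(f + eps * v) - E(f)) / eps
dE_riesz = grad_L2(f) @ v          # <grad E(f) | v> with the usual dot product
print(dE_numeric, dE_riesz)        # agree up to O(eps)
```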

Gradient Descent Scheme

Build a minimizing path:
  f(t=0) = f_0
  ∂f/∂t = − ∇_P E(f)

Changing the inner product P changes the minimizing path:
  − ∇_P E(f) = arg min_v { δE(f)(v) + (1/2) ‖v‖_P^2 }

P acts as a prior on the minimizing flow.
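To make the role of P concrete, here is a minimal sketch (illustrative, not from the slides) in finite dimension, where ⟨u | v⟩_P = u^T P v and the minimizer of δE(f)(v) + (1/2)‖v‖_P^2 is v = −P^{-1} ∇E(f). The quadratic energy and the diagonal choice of P are assumptions made up for the example.

```python
import numpy as np

# Sketch: gradient descent where the step direction minimizes
#   dE(f)(v) + 0.5 * <v, v>_P   with   <u, v>_P = u^T P v  (P symmetric positive definite),
# which in finite dimension is v = -P^{-1} grad E(f).
def grad_E(f):
    A = np.diag([100.0, 1.0])          # badly scaled quadratic E(f) = 0.5 f^T A f
    return A @ f

P = np.diag([100.0, 1.0])              # metric adapted to the scaling of the parameters
f = np.array([1.0, 1.0])
for _ in range(20):
    v = -np.linalg.solve(P, grad_E(f)) # descent direction under the P inner product
    f = f + 0.5 * v                    # time step dt = 0.5
print(f)                               # converges fast; with P = Id this step size blows up on the stiff coordinate
```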

Example 1: vectors, lists of parameters

E(α) = E(α_1, α_2, α_3, ..., α_n)

Usual L2 gradient:
  ∇E(α) = Σ_i ∂_{α_i} E(α_1, ..., α_n) e_i
(this supposes the α_i independent)

Should all parameters α_i have the same weight? Consider a different parameterization β = P(α):
  β_1 = 10 α_1,   β_i = α_i for all i ≠ 1,
  F(β) = E(P^{-1}(β)) = E(0.1 β_1, β_2, ..., β_n).

Then, for all i ≠ 1, ∂_{β_i} F(β) = ∂_{α_i} E(α), while ∂_{β_1} F(β) = 0.1 ∂_{α_1} E(α), both evaluated at α = P^{-1}(β).

Gradient ascent with respect to β: δβ_1 = ∂_{β_1} F(β) ⟹ δβ_1 = 0.1 ∂_{α_1} E(α).
Gradient ascent with respect to α: δα_1 = ∂_{α_1} E(α) ⟹ δβ_1 = 10 δα_1 = 10 ∂_{α_1} E(α).

Difference between the two approaches: a factor 100 on the first parameter!
Conclusion: the L2 inner product is bad (not intrinsic).
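The factor of 100 can be checked numerically. The sketch below (with a made-up energy E, not from the slides) performs one gradient step in the α coordinates and one in the β coordinates, then compares the resulting update of the first parameter expressed in the same units.

```python
import numpy as np

# Toy energy; any smooth E would do (hypothetical choice, not from the slides).
def E(alpha):
    return np.sum(alpha ** 2) + alpha[0] * alpha[1]

def grad_E(alpha):
    g = 2 * alpha
    g[0] += alpha[1]
    g[1] += alpha[0]
    return g

alpha = np.array([1.0, 2.0, 3.0])
step = 0.01

# Gradient step directly on alpha (L2 inner product on alpha).
d_alpha = step * grad_E(alpha)

# Same energy in beta coordinates: beta_1 = 10*alpha_1, beta_i = alpha_i otherwise.
beta = alpha.copy(); beta[0] *= 10.0
def grad_F(beta):
    a = beta.copy(); a[0] *= 0.1           # alpha = P^{-1}(beta)
    g = grad_E(a)
    g[0] *= 0.1                            # chain rule: dF/dbeta_1 = 0.1 * dE/dalpha_1
    return g

d_beta = step * grad_F(beta)

# Compare the update of the first coordinate, expressed in beta units.
print("beta_1 update from alpha-step:", 10.0 * d_alpha[0])   # = 10  * dE/dalpha_1 * step
print("beta_1 update from beta-step: ", d_beta[0])           # = 0.1 * dE/dalpha_1 * step
print("ratio:", (10.0 * d_alpha[0]) / d_beta[0])             # factor 100
```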

Example 2: (not specific to) kernels and intrinsic gradients

f = Σ_i α_i k(x_i, ·),   E(f) = Σ_i |f(x_i) − y_i|^2 + ‖f‖_H^2

The L2 gradient with respect to α: not the best idea. Choose a natural, intrinsic inner product instead:

  ⟨δα_a | δα_b⟩_P = ⟨δf_a | δf_b⟩_H = ⟨Df(δα_a) | Df(δα_b)⟩_H = δα_a^T (Df^T Df) δα_b

In the case of kernels this gives:
  ⟨δα_a | δα_b⟩_P = δα_a^T K δα_b

Corresponding gradient:
  ∇_P^α E = (Df^T Df)^{-1} ( P_adm( ∇_H^f E ) )
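For a concrete (and hypothetical) instance, take kernel ridge regression, E(α) = ‖Kα − y‖^2 + λ α^T K α. With the inner product ⟨δα_a | δα_b⟩_P = δα_a^T K δα_b, the corresponding gradient is K^{-1} times the plain gradient in α, which here happens to be available in closed form. The data, kernel and step size below are illustrative assumptions.

```python
import numpy as np

# A minimal sketch (not the slides' code): kernel ridge energy
#   E(alpha) = ||K alpha - y||^2 + lam * alpha^T K alpha,
# descended with the "natural" gradient for <u, v>_P = u^T K v.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=30)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.1 ** 2))  # Gaussian kernel matrix
lam = 1e-2

def grad_L2(alpha):
    # d/dalpha of E: the gradient for the Euclidean inner product on alpha
    return 2 * K @ (K @ alpha - y) + 2 * lam * K @ alpha

def grad_natural(alpha):
    # For <u,v>_P = u^T K v the P-gradient solves K g = grad_L2(alpha);
    # here that solve has the closed form g = 2*(K@alpha - y) + 2*lam*alpha.
    return 2 * (K @ alpha - y) + 2 * lam * alpha

alpha = np.zeros(30)
for _ in range(500):
    alpha -= 0.05 * grad_natural(alpha)   # the L2 gradient on alpha diverges at this step size
print("data misfit:", np.linalg.norm(K @ alpha - y))
```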

Example 3: parameterized shapes

The same as for kernels. Choose any representation for shapes (splines, polygons, level sets...). The intrinsic gradient gives an evolution as close as possible to the real one, and as independent as possible of the representation/parameterization.

Example 4: shapes, contours

Curve f. A deformation field v lives in the tangent space of f: the deformed curve is f + v. Usual tangent space: L2(f), with

  ⟨u | v⟩_{L2(f)} = ∫_f u(x) · v(x) df(x).

Examples of inner products for shapes (or functions)

L2 inner product:
  ⟨u | v⟩_{L2(f)} = ∫_f u(x) · v(x) df(x)

H1 inner product:
  ⟨u | v⟩_{H1} = ⟨u | v⟩_{L2(f)} + ⟨∂_x u | ∂_x v⟩_{L2(f)}
Interesting property of the H1 gradient:
  ∇_{H1} E = arg inf_u { ‖u − ∇_{L2} E‖_{L2}^2 + ‖∂_x u‖_{L2}^2 }

Inner product built from a set S of preferred transformations (e.g. rigid motions). Let Q be the projection onto S and R the projection orthogonal to S (Q + R = Id):
  ⟨u | v⟩_S = ⟨Q(u) | Q(v)⟩_{L2} + α ⟨R(u) | R(v)⟩_{L2}

Example: minimizing the Hausdorff distance between two curves,
  ∂f/∂t = − ∇_f d_H(f, f_2)
(figure: evolution with the usual gradient vs. the rigidified gradient).

Changing one inner product for another amounts to a linear, symmetric, positive definite transformation of the gradient. Gaussian smoothing of the L2 gradient is one such symmetric positive definite transformation.
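The sketch below illustrates the rigidified gradient on a polygonal curve. It is only an illustration under simplifying assumptions (uniform vertex weights instead of the measure df(x), a made-up attraction energy); the projections Q and R are computed from the three infinitesimal rigid motions of the plane.

```python
import numpy as np

# Sketch of the rigidified gradient on a 2D polygon (uniform vertex weights assumed).
# With Q = projection onto infinitesimal rigid motions and R = Id - Q, the gradient for
#   <u|v>_S = <Q(u)|Q(v)>_L2 + a <R(u)|R(v)>_L2
# is  Q(g) + (1/a) R(g), so large a favours rigid motion.

def rigid_basis(pts):
    """Orthonormal basis (columns) of infinitesimal rigid motions of a 2D polygon."""
    n = len(pts)
    c = pts.mean(axis=0)
    tx = np.tile([1.0, 0.0], n)                                             # translation in x
    ty = np.tile([0.0, 1.0], n)                                             # translation in y
    rot = np.column_stack([-(pts[:, 1] - c[1]), pts[:, 0] - c[0]]).ravel()  # rotation about c
    q, _ = np.linalg.qr(np.column_stack([tx, ty, rot]))
    return q

def rigidified_gradient(pts, g_l2, a=10.0):
    """Keep the rigid component of the L2 gradient, damp the non-rigid remainder by 1/a."""
    B = rigid_basis(pts)
    g = g_l2.ravel()
    g_rigid = B @ (B.T @ g)                                                 # Q(g)
    return (g_rigid + (g - g_rigid) / a).reshape(-1, 2)

# Toy usage: a unit square attracted to a scaled and shifted copy of itself
# (hypothetical energy E = sum_i ||p_i - t_i||^2, so the L2 gradient is 2 (p - t)).
pts = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
c = pts.mean(axis=0)
target = 1.5 * (pts - c) + c + np.array([0.5, 0.2])
g_l2 = 2.0 * (pts - target)
print(rigidified_gradient(pts, g_l2, a=10.0))  # mostly the translation; the scaling part is damped
```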

Extension to non-linear criteria

− ∇_P E(f) = arg min_v { δE(f)(v) + (1/2) ‖v‖_P^2 }

More generally, one can take the descent direction

  arg min_v { δE(f)(v) + R_P(v) }

for any suitable criterion R_P. Example: semi-local rigidification.
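Semi-local rigidification itself is beyond this sketch, but the effect of a non-quadratic criterion R_P can be illustrated with a simpler, made-up choice: R(v) = (1/2)‖v‖^2 + λ‖v‖_1 (not the slides' criterion). Its arg min has a closed form, a soft-thresholded negative gradient, so coordinates with a small directional derivative are simply not moved.

```python
import numpy as np

# Illustration of the generalized descent direction
#   argmin_v { dE(f)(v) + R(v) }
# for the non-quadratic stand-in R(v) = 0.5*||v||^2 + lam*||v||_1:
# the per-coordinate minimizer is -sign(g_i) * max(|g_i| - lam, 0).
def descent_direction(g, lam):
    return -np.sign(g) * np.maximum(np.abs(g) - lam, 0.0)

g = np.array([3.0, 0.05, -1.2, 0.01])   # directional-derivative coefficients dE(f)(e_i)
print(descent_direction(g, lam=0.1))    # [-2.9, 0., 1.1, 0.]: small components do not move
```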

Remark on Taylor series and Newton's method

− ∇_P E(f) = arg min_v { δE(f)(v) + (1/2) ‖v‖_P^2 }

Different approximations of E(f + v) ⟹ different gradient descents:

  arg min_v { E(f) }
  arg min_v { E(f) + δE(f)(v) }
  arg min_v { E(f) + δE(f)(v) + (1/2) ‖v‖_P^2 } = − ∇_P E(f)   (1st order + metric choice, ‖v‖_P^2 quadratic)
  arg min_v { E(f) + δE(f)(v) + (1/2) δ^2E(f)(v)(v) }   (quadratic, order 2 ⟹ Newton's method)
  arg min_v { E(f) + δE(f)(v) + R_P(v) }   (1st order + any suitable criterion)
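The sketch below (illustrative assumptions: a 2D quadratic energy, a diagonal metric P) compares these choices on one step: the plain L2 gradient, the metric gradient −P^{-1}∇E, and the Newton direction −H^{-1}∇E, which reaches the minimizer of a quadratic in a single step.

```python
import numpy as np

# Toy quadratic energy E(f) = 0.5 f^T H f - b^T f (H, b, P chosen for illustration):
#   1st order + L2 metric       : v = -grad E(f)
#   1st order + chosen metric P : v = -P^{-1} grad E(f)
#   2nd order (Newton)          : v = -H^{-1} grad E(f), exact in one step on a quadratic.
H = np.array([[10.0, 2.0], [2.0, 1.0]])
b = np.array([1.0, 1.0])
P = np.diag(np.diag(H))                    # a cheap metric: the diagonal of H

def grad(f):
    return H @ f - b

f = np.zeros(2)
v_l2 = -grad(f)
v_metric = -np.linalg.solve(P, grad(f))
v_newton = -np.linalg.solve(H, grad(f))

f_star = np.linalg.solve(H, b)             # the true minimizer
for name, v in [("L2", v_l2), ("metric P", v_metric), ("Newton", v_newton)]:
    print(name, "distance to optimum after one unit step:", np.linalg.norm(f + v - f_star))
```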

30 (End of the parenthesis)
