Gradient descents and inner products
1 Gradient descents and inner products
2 Gradient descent and inner products. Outline: gradient (definition); usual inner product ($L^2$); natural inner products; other inner products; examples.
3 Gradient (definition). Energy $E$ depending on a variable $f$ (a vector or a function); what is the gradient $\nabla E(f)$? Directional derivative: consider a variation $v$ of $f$,
$$\delta E(f)(v) := \lim_{\varepsilon \to 0} \frac{E(f + \varepsilon v) - E(f)}{\varepsilon}.$$
Hopefully $\delta E(f)$ is linear and continuous with respect to $v$. Riesz representation theorem $\Rightarrow$ gradient: the variation $\nabla E(f)$ such that, for all $v$, $\delta E(f)(v) = \langle \nabla E(f) \mid v \rangle$. Usual inner product: $L^2$ $\Rightarrow$ the usual $L^2$ gradient.
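A quick numerical check of this definition may help: the finite-difference directional derivative should match the inner product of the gradient with the variation. This is only a sketch; the discretized energy, its $L^2$ gradient, and the grid are toy choices, not taken from the slides.

```python
import numpy as np

n = 200
h = 1.0 / n                                 # grid step for f discretized on [0, 1]
rng = np.random.default_rng(0)
f = rng.normal(size=n)
v = rng.normal(size=n)                      # an arbitrary variation of f

E = lambda f: h * np.sum(0.5 * f**2 + np.cos(f))   # toy energy E(f) = int (f^2/2 + cos f) dx
grad_E = lambda f: f - np.sin(f)                   # its L^2 gradient

eps = 1e-6
dir_deriv = (E(f + eps * v) - E(f)) / eps          # delta E(f)(v), finite differences
riesz = h * np.sum(grad_E(f) * v)                  # <grad E(f) | v> in the discrete L^2 product
print(dir_deriv, riesz)                            # the two values agree up to O(eps)
```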
4 Gradient Descent Scheme. Build a minimizing path:
$$f_{t=0} = f_0, \qquad \frac{\partial f}{\partial t} = -\nabla_P E(f).$$
5 Gradient Descent Scheme. Build a minimizing path:
$$f_{t=0} = f_0, \qquad \frac{\partial f}{\partial t} = -\nabla_P E(f).$$
Changing the inner product $P$ changes the minimizing path:
$$-\nabla_P E(f) = \arg\min_v \Big\{ \delta E(f)(v) + \tfrac{1}{2}\, \|v\|_P^2 \Big\},$$
so $P$ acts as a prior on the minimizing flow.
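For a finite-dimensional variable, taking $\langle u, v\rangle_P = u^t P v$ with $P$ symmetric positive definite turns the descent direction into $-P^{-1}\nabla E$. The sketch below, with an arbitrary ill-conditioned quadratic energy and an arbitrary choice of $P$ (neither is from the slides), only illustrates how the minimizing path depends on $P$:

```python
import numpy as np

A = np.diag([1.0, 100.0])                   # ill-conditioned quadratic energy E(x) = 0.5 x^T A x
grad_E = lambda x: A @ x
P = np.diag([1.0, 100.0])                   # candidate inner product <u, v>_P = u^T P v

def descend(x0, metric, steps=50, lr=0.01):
    x, path = x0.copy(), [x0.copy()]
    for _ in range(steps):
        g = np.linalg.solve(metric, grad_E(x))   # P-gradient: solve P g = grad E(x)
        x = x - lr * g
        path.append(x.copy())
    return np.array(path)

x0 = np.array([1.0, 1.0])
path_l2 = descend(x0, np.eye(2))            # usual L^2 (Euclidean) gradient flow
path_P = descend(x0, P)                     # flow for the modified inner product
print(path_l2[-1], path_P[-1])              # both head to the minimizer 0, along different paths
```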
6 Example 1: vectors, lists of parameters. $E(\vec\alpha) = E(\alpha_1, \alpha_2, \alpha_3, \dots, \alpha_n)$. Usual $L^2$ gradient:
$$\nabla E(\vec\alpha) = \sum_i \partial_i E(\alpha_1, \dots, \alpha_n)\, \vec e_i$$
(supposes the $\alpha_i$ independent).
7 Example 1: vectors, lists of parameters. $E(\vec\alpha) = E(\alpha_1, \dots, \alpha_n)$, with the usual $L^2$ gradient $\nabla E(\vec\alpha) = \sum_i \partial_i E(\vec\alpha)\, \vec e_i$ (supposes the $\alpha_i$ independent). Should all parameters $\alpha_i$ have the same weight? Consider a different parameterization $\vec\beta = P(\vec\alpha)$: $\beta_1 = 10\,\alpha_1$ and $\beta_i = \alpha_i$ for all $i \neq 1$. Then
$$F(\vec\beta) = E\big(P^{-1}(\vec\beta)\big) = E(0.1\,\beta_1, \beta_2, \dots, \beta_n).$$
8 Example 1: vectors, lists of parameters (continued). With $F(\vec\beta) = E(0.1\,\beta_1, \beta_2, \dots, \beta_n)$ as above:
$$\forall i \neq 1,\ \partial_{\beta_i} F(\vec\beta) = \partial_{\alpha_i} E(\vec\alpha), \qquad \partial_{\beta_1} F(\vec\beta) = 0.1\, \partial_{\alpha_1} E(\vec\alpha), \qquad \text{evaluated at } \vec\alpha = P^{-1}(\vec\beta).$$
9 With $E(\vec\alpha) = E(\alpha_1, \dots, \alpha_n)$, $\beta_1 = 10\,\alpha_1$ and $\partial_{\beta_1} F(\vec\beta) = 0.1\, \partial_{\alpha_1} E(\vec\alpha)\big|_{\vec\alpha = P^{-1}(\vec\beta)}$:
Gradient ascent with respect to $\vec\beta$: $\delta\beta_1 = \partial_{\beta_1} F(\vec\beta)$, i.e. $\delta\beta_1 = 0.1\, \partial_{\alpha_1} E(\vec\alpha)$.
Gradient ascent with respect to $\vec\alpha$: $\delta\alpha_1 = \partial_{\alpha_1} E(\vec\alpha)$, i.e. $\delta\beta_1 = 10\, \delta\alpha_1 = 10\, \partial_{\alpha_1} E(\vec\alpha)$.
Difference between the two approaches: a factor 100 on the first parameter! Conclusion: the $L^2$ inner product is bad (not intrinsic).
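The factor-100 discrepancy is easy to check numerically. In this sketch the energy is an arbitrary toy choice (not from the slides); only the reparameterization $\beta_1 = 10\,\alpha_1$ matters:

```python
import numpy as np

def grad_E(alpha):                 # toy energy E(alpha) = 0.5 ||alpha||^2, so grad E = alpha
    return alpha.copy()

def grad_F(beta):                  # F(beta) = E(0.1*beta_1, beta_2, ...), chain rule on beta_1
    alpha = beta.copy(); alpha[0] *= 0.1
    g = grad_E(alpha)
    g[0] *= 0.1
    return g

alpha = np.array([1.0, 1.0])
beta = alpha.copy(); beta[0] *= 10.0        # the same point, expressed in the beta coordinates

step_alpha = grad_E(alpha)                  # update proposed by the gradient w.r.t. alpha
step_beta = grad_F(beta)                    # update proposed by the gradient w.r.t. beta
# Express both as moves of beta_1: the alpha-step moves beta_1 by 10 * step_alpha[0].
print(10 * step_alpha[0] / step_beta[0])    # -> 100.0: same point, factor 100 on beta_1
```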
10 Example 2: (not specific to) kernels and intrinsic gradients. $f = \sum_i \alpha_i\, k(x_i, \cdot)$ and $E(f) = \sum_i |f(x_i) - y_i|^2 + \|f\|_H^2$. The $L^2$ gradient with respect to $\vec\alpha$: not the best idea. Choose a natural, intrinsic inner product instead:
$$\langle \delta\vec\alpha_a \mid \delta\vec\alpha_b \rangle_P = \langle \delta f_a \mid \delta f_b \rangle_H = \langle Df(\delta\vec\alpha_a) \mid Df(\delta\vec\alpha_b) \rangle_H = \delta\vec\alpha_a^{\,t}\, \big({}^t Df\, Df\big)\, \delta\vec\alpha_b.$$
In the case of kernels this gives $\langle \delta\vec\alpha_a \mid \delta\vec\alpha_b \rangle_P = \delta\vec\alpha_a^{\,t}\, K\, \delta\vec\alpha_b$. Corresponding gradient:
$$\nabla_P^{\vec\alpha} E = \big({}^t Df\, Df\big)^{-1} \Big( P_{\mathrm{adm}}\big(\nabla_H^f E\big) \Big).$$
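A sketch of this intrinsic gradient in the kernel case, under the simplifying assumption that the metric reduces to the kernel Gram matrix $K$, so the intrinsic-gradient step solves $K\,\delta\vec\alpha = \nabla_{\vec\alpha} E$; the data, kernel bandwidth, and step size below are illustrative choices:

```python
import numpy as np

X = np.linspace(-1.0, 1.0, 8)
y = np.sin(3 * X)
K = np.exp(-(X[:, None] - X[None, :])**2 / (2 * 0.3**2))   # Gaussian kernel Gram matrix
lam = 0.1

def grad_alpha(alpha):
    # E(alpha) = ||K alpha - y||^2 + lam * alpha^T K alpha  (RKHS-norm regularizer)
    return 2 * K @ (K @ alpha - y) + 2 * lam * K @ alpha

alpha = np.zeros_like(X)
for _ in range(500):
    g_l2 = grad_alpha(alpha)             # plain gradient w.r.t. the coefficients
    g_nat = np.linalg.solve(K, g_l2)     # intrinsic gradient for the metric <a, b>_P = a^T K b
    alpha -= 0.05 * g_nat                # i.e. descent measured on the function f, not on alpha
print(np.linalg.norm(K @ alpha - y))     # data residual after the intrinsic-gradient descent
```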
11 Example 3: parameterized shapes. The same as for kernels: choose any representation for shapes (splines, polygons, level-sets, ...). The intrinsic gradient is as close as possible to the real evolution and as independent as possible of the representation/parameterization.
12 Example 4: shapes, contours. Curve $f$; deformation field $v$ in the tangent space of $f$, giving the deformed curve $f + v$. Usual tangent space: $L^2(f)$, with
$$\langle u \mid v \rangle_{L^2(f)} = \int_f u(x) \cdot v(x)\, df(x).$$
13 Examples of inner products for shapes (or functions). $L^2$ inner product: $\langle u \mid v \rangle_{L^2(f)} = \int_f u(x) \cdot v(x)\, df(x)$.
14 Examples of inner products for shapes (or functions). $L^2$ inner product as above; $H^1$ inner product: $\langle u \mid v \rangle_{H^1} = \langle u \mid v \rangle_{L^2(f)} + \langle \partial_x u \mid \partial_x v \rangle_{L^2(f)}$. Interesting property of the $H^1$ gradient:
$$\nabla_{H^1} E = \arg\inf_u \Big\{ \| u - \nabla_{L^2} E \|_{L^2}^2 + \| \partial_x u \|_{L^2}^2 \Big\}.$$
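A minimal sketch of the $H^1$ (Sobolev) gradient computed from the $L^2$ gradient on a discretized closed curve, assuming periodic finite differences; the sampled $L^2$ gradient is a random placeholder:

```python
import numpy as np

n = 100
h = 2 * np.pi / n                                     # arclength step on a closed curve (toy)
g_l2 = np.random.default_rng(1).normal(size=n)        # placeholder L^2 gradient along the curve

# Periodic second-difference operator approximating d^2/dx^2
I = np.eye(n)
D2 = (np.roll(I, 1, axis=0) - 2 * I + np.roll(I, -1, axis=0)) / h**2

# H^1 gradient: solve (Id - d^2/dx^2) u = g_l2, the minimizer of
# ||u - g_l2||^2_{L^2} + ||u_x||^2_{L^2}, i.e. a smoothed version of the L^2 gradient
g_h1 = np.linalg.solve(I - D2, g_l2)

print(np.std(np.diff(g_l2)), np.std(np.diff(g_h1)))   # the H^1 gradient is much smoother
```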
15 Examples of inner products for shapes (or functions). $L^2$ and $H^1$ inner products as above; inner product favoring a set $S$ of preferred transformations (e.g. rigid motions). With $Q$ the projection onto $S$ and $R$ the projection orthogonal to $S$ (so $Q + R = \mathrm{Id}$):
$$\langle u \mid v \rangle_S = \langle Q(u) \mid Q(v) \rangle_{L^2} + \alpha\, \langle R(u) \mid R(v) \rangle_{L^2}.$$
16 Examples of inner products for shapes (or functions), continued. Example with the set of preferred transformations (rigid motions): minimizing the Hausdorff distance between two curves,
$$\frac{\partial f}{\partial t} = -\nabla^f d_H(f, f_2).$$
[Figure: comparison of the usual and the rigidified flows.]
17 Examples of inner products for shapes (or functions), continued. Changing one inner product for another amounts to a linear, symmetric, positive definite transformation of the gradient.
18 Examples of inner products for shapes (or functions), continued. In particular, Gaussian smoothing of the $L^2$ gradient is a symmetric positive definite transformation, hence corresponds to a change of inner product.
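A sketch checking that periodic Gaussian smoothing is indeed a symmetric positive definite operation on a discretized gradient; the grid size and smoothing width are arbitrary choices:

```python
import numpy as np

n, sigma = 100, 2.0                            # grid size and smoothing width (arbitrary)
idx = np.arange(n)
d = np.minimum(idx, n - idx)                   # circular distance to index 0
kern = np.exp(-d**2 / (2 * sigma**2))
kern /= kern.sum()                             # periodic Gaussian kernel, normalized

# Circular convolution with this kernel, written as a matrix G acting on the gradient
G = np.array([np.roll(kern, i) for i in range(n)]).T

print(np.allclose(G, G.T))                     # symmetric
print(np.linalg.eigvalsh(G).min() > 0)         # positive definite: a legitimate change of metric

g_l2 = np.random.default_rng(2).normal(size=n) # placeholder L^2 gradient
g_smooth = G @ g_l2                            # smoothed descent direction
```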
19 Extension to non-linear criteria. Start from the characterization
$$-\nabla_P E(f) = \arg\min_v \Big\{ \delta E(f)(v) + \tfrac{1}{2}\, \|v\|_P^2 \Big\}.$$
20 Extension to non-linear criteria. Replace the quadratic penalty by a more general one:
$$-\nabla_P E(f) = \arg\min_v \Big\{ \delta E(f)(v) + R_P(v) \Big\}.$$
Example: semi-local rigidification.
21 Remark on Taylor series and Newton method. Recall
$$-\nabla_P E(f) = \arg\min_v \Big\{ \delta E(f)(v) + \tfrac{1}{2}\, \|v\|_P^2 \Big\}.$$
22 Different approximations of $E(f + v)$ lead to different gradient descents:
23 $\arg\min_v \{ E(f) \}$
24 $\arg\min_v \{ E(f) + \delta E(f)(v) \}$
25 $\arg\min_v \{ E(f) + \delta E(f)(v) + \tfrac{1}{2}\, \|v\|_P^2 \}$
26 $\arg\min_v \{ E(f) + \delta E(f)(v) + \tfrac{1}{2}\, \|v\|_P^2 \} = -\nabla_P E(f)$: first order + metric choice ($\|\cdot\|^2$ quadratic)
27 $\arg\min_v \{ E(f) + \delta E(f)(v) + \tfrac{1}{2}\, \delta^2 E(f)(v, v) \}$
28 $\arg\min_v \{ E(f) + \delta E(f)(v) + \tfrac{1}{2}\, \delta^2 E(f)(v, v) \}$: quadratic, of order 2, i.e. Newton's method
29 $\arg\min_v \{ E(f) + \delta E(f)(v) + R_P(v) \}$: first order + any suitable criterion
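A small sketch contrasting the two quadratic models on a toy smooth energy: the metric-based step uses a fixed $P$ (here the identity), while the order-2 (Newton) step uses the true second variation; the energy and starting point are arbitrary choices, not from the slides:

```python
import numpy as np

# Toy smooth energy on R^2 with its first and second variations
E = lambda x: np.log(1 + x[0]**2) + 2 * x[1]**2
grad = lambda x: np.array([2 * x[0] / (1 + x[0]**2), 4 * x[1]])
hess = lambda x: np.array([[2 * (1 - x[0]**2) / (1 + x[0]**2)**2, 0.0],
                           [0.0, 4.0]])

x = np.array([0.5, 1.0])
P = np.eye(2)                                     # a fixed metric: the plain L^2 choice
step_metric = -np.linalg.solve(P, grad(x))        # argmin_v { dE(x)(v) + 0.5 ||v||_P^2 }
step_newton = -np.linalg.solve(hess(x), grad(x))  # argmin_v of the order-2 Taylor model
print(E(x), E(x + step_metric), E(x + step_newton))
# The Newton step adapts to the local curvature and lowers E;
# the full-length plain-metric step overshoots the stiff direction here.
```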
30 (End of the parenthesis)