Gradient Descent and Implementation: Solving the Euler-Lagrange Equations in Practice

1  Lecture Notes, HCI, 4.1.2011
Chapter 2: Gradient Descent and Implementation - Solving the Euler-Lagrange Equations in Practice
Bastian Goldlücke, Computer Vision Group, Technical University of Munich

2  Overview

1. Gradient descent: basic descent algorithm; descent direction
2. Discretization and implementation: finite difference operators; gradient descent for linear inverse problems
3. Summary

3  Gradient descent: overview

1. Gradient descent: basic descent algorithm; descent direction
2. Discretization and implementation: finite difference operators; gradient descent for linear inverse problems
3. Summary

4  Gradient descent: basic descent algorithm

Suppose you have an energy $E : C^1(\Omega) \to \mathbb{R}$ as required for theorem (3.15), and your goal is to find the minimizer
$$\operatorname*{argmin}_{u \in C^1(\Omega)} E(u).$$
A simple approach you can always try is the gradient descent or steepest descent algorithm, which works as follows:

1. Start at an initial point $u_0 \in V$.
2. Find the direction of steepest descent $\hat h \in V$ which minimizes $\delta E(u; h)$.
3. Determine $t$ such that $E(u + t\hat h)$ is as small as possible. Alternatively, choose just any $t$ such that $E(u + t\hat h) < E(u)$. Then update $u$ to $u + t\hat h$.
4. Repeat from step 2 until convergence (i.e. until $E$ does not decrease anymore).

Key question: what is the steepest descent direction $\hat h$ for a given $u$?
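To make step 3 and the convergence test concrete, here is a minimal sketch of the descent loop for a discretized energy, written in NumPy. The names `energy` and `grad_E` are placeholders for user-supplied functions (they are not part of the slides), and a fixed step size with a simple decrease test stands in for an exact line search.

```python
import numpy as np

def gradient_descent(u0, energy, grad_E, tau=0.1, max_iter=1000, tol=1e-10):
    """Fixed-step gradient descent on a discretized energy.

    energy(u) returns a scalar; grad_E(u) returns the discretized
    left-hand side of the Euler-Lagrange equation (the "gradient").
    """
    u = u0.copy()
    E = energy(u)
    for _ in range(max_iter):
        u_new = u - tau * grad_E(u)      # step against the gradient
        E_new = energy(u_new)
        if E_new >= E - tol:             # stop once E no longer decreases
            break
        u, E = u_new, E_new
    return u
```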

5  Gradient descent: steepest descent direction

Key question: what is the steepest descent direction $\hat h$ for a given $u$? Fortunately, we already know the answer.

Theorem (steepest descent direction). We normalize directions with respect to the $L^2$ norm, i.e. we only consider vectors $h$ in the unit sphere of $L^2$. The direction of steepest descent is then given by
$$\operatorname*{argmin}_{h \in C_c^1(\Omega),\, \|h\|_2 = 1} \delta E(u; h) = -\frac{\varphi_u}{\|\varphi_u\|_2},$$
where $\varphi_u$ is the left-hand side of the Euler-Lagrange equation viewed as a function on $\Omega$,
$$\varphi_u : x \mapsto \partial_a L(u, \nabla u, x) - \operatorname{div}_x\big[\partial_b L(u, \nabla u, x)\big].$$

The proof of this theorem is not too difficult and quite enlightening, especially if you compare it to the situation on $\mathbb{R}^n$ and think it through (exercise).
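For reference, a compressed version of the argument behind the theorem (this sketch is not part of the slides): by integration by parts, and because $h$ has compact support so that the boundary term vanishes, the first variation can be written as
$$\delta E(u; h) = \int_\Omega \varphi_u \, h \, dx = \langle \varphi_u, h \rangle_{L^2}.$$
By the Cauchy-Schwarz inequality, $\langle \varphi_u, h \rangle \ge -\|\varphi_u\|_2$ for all $h$ with $\|h\|_2 = 1$, with equality exactly when $h = -\varphi_u / \|\varphi_u\|_2$. On $\mathbb{R}^n$ the same computation shows that the steepest descent direction is the negative normalized gradient.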

6  Gradient descent: properties

- Gradient descent can easily be implemented for any functional for which you can compute the Euler-Lagrange equation.
- It will nearly always work if you manage to initialize it with a good initial value $u_0$ which is close to the minimum.
- However, you can usually only hope for a local minimum.
- Gradient descent is very slow.

7  Discretization and implementation: overview

1. Gradient descent: basic descent algorithm; descent direction
2. Discretization and implementation: finite difference operators; gradient descent for linear inverse problems
3. Summary

8  Discretization and implementation: finite difference operators

When one wants to solve the Euler-Lagrange PDE, one has to discretize it and work in the discrete setting. We will discuss the discretization only for the case of a 2D image, where $\Omega \subset \mathbb{R}^2$ is a rectangle; it is easy to generalize to higher dimensions, which follow the same pattern.

Thus, in the following, we assume that we are given an image on a regular grid of size $M \times N$. This means that an image $u$ is represented as a matrix $u \in \mathbb{R}^{M \times N}$, and we write the gray value at a pixel $(i,j)$ as $u_{i,j}$. Analogously, a vector field $\xi$ on $\Omega$ is described by two component matrices $\xi^1, \xi^2 \in \mathbb{R}^{M \times N}$.

9  Discretization of the gradient

The main question is how to discretize the derivative operators on functions $u$ and on vector fields $\xi$.

Discretization of the gradient. The gradient acts on functions $u$. It is most easily discretized with forward differences:
$$(\nabla u)_{i,j} = \begin{bmatrix} u_{i+1,j} - u_{i,j} \\ u_{i,j+1} - u_{i,j} \end{bmatrix}.$$
Neumann boundary conditions $\partial u / \partial n = 0$ are used for functions, i.e. if the grid has size $M \times N$, then $(\nabla u)^1_{M,j} = 0$ and $(\nabla u)^2_{i,N} = 0$.

Note that, for example, the total variation regularizer can then be written in discrete form as
$$J(u) = \sum_{i=1}^{M} \sum_{j=1}^{N} \big| (\nabla u)_{i,j} \big|_2.$$
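The forward-difference gradient and the discrete total variation translate directly into a few lines of NumPy. The following sketch uses axis 0 for the index $i$ and axis 1 for $j$; the function names are chosen here for illustration only.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with Neumann boundary conditions.

    u: (M, N) image. Returns the two components (gx, gy) of the
    discrete gradient, with gx[M-1, :] = 0 and gy[:, N-1] = 0.
    """
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]   # (grad u)^1_{i,j} = u_{i+1,j} - u_{i,j}
    gy[:, :-1] = u[:, 1:] - u[:, :-1]   # (grad u)^2_{i,j} = u_{i,j+1} - u_{i,j}
    return gx, gy

def total_variation(u):
    """Discrete total variation J(u) = sum over i,j of |(grad u)_{i,j}|_2."""
    gx, gy = grad(u)
    return np.sum(np.sqrt(gx**2 + gy**2))
```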

10  Discretization of the divergence

Discretization of the divergence. The divergence operator acts on vector fields $\xi$. For a consistent discretization, it must be chosen so that the discrete analogue of the Gauss theorem holds, i.e. so that the discrete divergence is the negative adjoint of the discrete gradient, $\langle \nabla u, \xi \rangle = -\langle u, \operatorname{div} \xi \rangle$. This leads to
$$(\operatorname{div}\xi)_{i,j} = \begin{cases} \xi^1_{i,j} - \xi^1_{i-1,j} & \text{if } 1 < i < M, \\ \xi^1_{i,j} & \text{if } i = 1, \\ -\xi^1_{i-1,j} & \text{if } i = M, \end{cases} \;+\; \begin{cases} \xi^2_{i,j} - \xi^2_{i,j-1} & \text{if } 1 < j < N, \\ \xi^2_{i,j} & \text{if } j = 1, \\ -\xi^2_{i,j-1} & \text{if } j = N. \end{cases}$$
In short, $\operatorname{div}$ is discretized with backward differences using Dirichlet boundary conditions. This makes sense, since test functions $\xi$ vanish on the boundary. You should check for yourself that the desired adjoint property is satisfied when $\operatorname{div}$ is defined this way (exercise).
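A corresponding NumPy sketch of the backward-difference divergence, reusing grad() from the previous sketch, together with a numerical check of the adjoint property (the exercise mentioned above). The function name and the test sizes are illustrative only.

```python
def div(xi1, xi2):
    """Backward-difference divergence; the negative adjoint of grad()."""
    d = np.zeros_like(xi1)
    # x-component: boundary cases i = 1 and i = M handled separately
    d[0, :]    += xi1[0, :]
    d[1:-1, :] += xi1[1:-1, :] - xi1[:-2, :]
    d[-1, :]   -= xi1[-2, :]
    # y-component: analogous in j
    d[:, 0]    += xi2[:, 0]
    d[:, 1:-1] += xi2[:, 1:-1] - xi2[:, :-2]
    d[:, -1]   -= xi2[:, -2]
    return d

# Numerical check of the discrete Gauss theorem <grad u, xi> = -<u, div xi>:
u = np.random.rand(64, 48)
x1, x2 = np.random.rand(64, 48), np.random.rand(64, 48)
gx, gy = grad(u)
assert np.isclose(np.sum(gx * x1 + gy * x2), -np.sum(u * div(x1, x2)))
```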

11  Gradient descent for the linear inverse problem

Having decided on a discretization, the implementation of gradient descent is extremely simple. We show it here for the case of smoothed total variation on linear inverse problems.

- Input: an observed image $f$, related to an unknown original image $u$ by $Au = f$ in an ideal world, but degraded by Gaussian noise of standard deviation $\sigma$.
- Goal: recover an estimate for $u$ by computing a minimizer of
$$E(u) = \int_\Omega |\nabla u|_\epsilon + \frac{1}{2\sigma^2} (Au - f)^2 \, dx.$$
- Reminder: the Euler-Lagrange equation for this functional is
$$-\operatorname{div}\!\left(\frac{\nabla u}{|\nabla u|_\epsilon}\right) + \frac{1}{\sigma^2} A^*(Au - f) = 0.$$
- The model is convex, so any solution is a global minimizer of the energy.

12  Gradient descent for the linear inverse problem: algorithm

Algorithm. Choose a step size $\tau > 0$ and initialize $u_0 = f$. Then iterate the following steps:

- Compute the left-hand side of the Euler-Lagrange equation, using the discrete difference schemes from the previous slides:
$$g_n = -\operatorname{div}\!\left(\frac{\nabla u_n}{|\nabla u_n|_\epsilon}\right) + \frac{1}{\sigma^2} A^*(Au_n - f).$$
- Update the solution: $u_{n+1} = u_n - \tau\, g_n$.

The step size $\tau$ depends on a number of factors and is best determined experimentally. Later, we will see much better algorithms for which we know the maximum step size we can take. It is also possible to do a line search to determine the maximum step size automatically in each iteration.
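Putting the pieces together for the simplest case $A = I$ (pure denoising), the iteration can be sketched as follows. It reuses grad() and div() from the previous sketches; the smoothed norm $|\nabla u|_\epsilon = \sqrt{|\nabla u|^2 + \epsilon^2}$ and the parameter values are assumptions for illustration, not prescribed by the slides.

```python
def denoise_smoothed_tv(f, sigma, eps=1e-3, tau=0.05, n_iter=500):
    """Gradient descent for E(u) = integral of |grad u|_eps + (u - f)^2 / (2 sigma^2), i.e. A = I."""
    u = f.copy()                                   # initialize u_0 = f
    for _ in range(n_iter):
        gx, gy = grad(u)
        norm = np.sqrt(gx**2 + gy**2 + eps**2)     # smoothed norm |grad u|_eps
        g = -div(gx / norm, gy / norm) + (u - f) / sigma**2   # EL left-hand side
        u = u - tau * g                            # descent step
    return u
```

For a general operator $A$, the data term contribution becomes $A^*(Au - f)/\sigma^2$, applied for instance through a sparse matrix or a linear operator with a known adjoint.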

13  Summary

- A local minimum of a differentiable energy can always be found via gradient descent.
- Non-differentiable problems can often be approximated by a smoothed version, which, however, usually does not yield the exact solution to the original problem.
- If the problem is convex, gradient descent converges to the global minimizer.

14  Open questions

- The regularizer of the ROF functional is
$$\int_\Omega |\nabla u|_2 \, dx,$$
which requires $u$ to be differentiable. Yet, we are looking for minimizers in $L^2(\Omega)$. It is necessary to generalize the definition of the regularizer (chapter 4).
- The total variation is not a differentiable functional, so the variational principle is not applicable. We need a theory for convex, but not differentiable, functionals (chapter 2).