Affine covariant Semi-smooth Newton in function space


Anton Schiela
March 14, 2018

These are lecture notes of my talks given at the Winter School "Modern Methods in Nonsmooth Optimization", held from February 26 to March 2, 2018 in Würzburg. They consist of a concise summary of the material presented in the talks and of the slides used there. Some supplementary information is also given.

1 Local Newton Methods

Consider a nonlinear, differentiable mapping $F : Z \to Y$ for which we try to find a root:

  $z_*: \quad F(z_*) = 0.$

One popular way is Newton's method:

  $F'(z_k)\,\delta z_k + F(z_k) = 0, \qquad z_{k+1} = z_k + \delta z_k.$

Newton's method is only locally convergent, so we have to globalize it, by damping or by path-following.

Affine covariance. Let $T : Y \to \hat Y$ be a linear isomorphism. Then

  $F(z) = 0 \;\Leftrightarrow\; TF(z) = 0, \qquad (TF)'(z) = TF'(z),$

and

  $F'(z_k)\,\delta z_k + F(z_k) = 0 \;\Leftrightarrow\; TF'(z_k)\,\delta z_k + TF(z_k) = 0.$

Thus, roots and Newton corrections are not changed by a linear transformation of the image space. Even more: let $T_k : Y \to \hat Y_k$ be a family of linear isomorphisms onto a family of linear spaces; then the same holds true, i.e., we could transform the image space in each Newton step without changing the Newton correction. Thus, questions of convergence of Newton's method are completely independent of quantities that live in the image space (use, e.g., $T = \lambda\,\mathrm{Id}$ as rescalings). In problems that involve differential equations, the norms in the domain space are readily computed, and it is easy to implement problem-adjusted norms (which is often a good idea). In contrast, norms in the image space are usually dual norms, which are computationally inaccessible and can in most cases only be approximated.
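As a concrete illustration of affine covariance, here is a minimal numerical sketch (not part of the original notes; the map $F$ and all numerical choices are illustrative): in $\mathbb{R}^2$, the Newton correction computed for $TF$ with an invertible $T$ coincides with the one computed for $F$.

import numpy as np

def F(z):
    return np.array([z[0]**2 + z[1] - 3.0, z[0] - z[1]**3])

def dF(z):
    return np.array([[2.0*z[0], 1.0],
                     [1.0, -3.0*z[1]**2]])

z0 = np.array([2.0, 2.0])
T = np.array([[3.0, 1.0],                 # an arbitrary invertible transformation
              [0.0, 2.0]])

dz_plain = np.linalg.solve(dF(z0), -F(z0))            # F'(z) dz = -F(z)
dz_trans = np.linalg.solve(T @ dF(z0), -(T @ F(z0)))  # (TF)'(z) dz = -(TF)(z)
print(np.allclose(dz_plain, dz_trans))                # True: identical correction

Consequently, any quantity used to monitor convergence should live in the domain space; this is exactly what the affine covariant theory below does.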

Local convergence

In order to characterize the deviation from linearity, we define an affine covariant theoretical contraction factor

  $\Theta_{z_*}(z) := \frac{\|F'(z)^{-1}\big(F'(z)(z - z_*) - (F(z) - F(z_*))\big)\|_Z}{\|z - z_*\|_Z}$  for $z \neq z_*$.

Obviously (for analysis only),

  $\Theta_{z_*}(z) \le \|F'(z)^{-1}\|_{Y \to Z}\,\frac{\|F'(z)(z - z_*) - (F(z) - F(z_*))\|_Y}{\|z - z_*\|_Z}.$

This looks like an affine covariant version of the usual finite differences used in the definition of Fréchet differentiability of $F$ at $z$. Here, however, we will consider $z_*$ fixed, and the limit $z \to z_*$.

Theorem 1.1. Let $R$ be a linear space, $Z$ a normed space, $D \subset Z$ and $F : D \to R$. Assume there is $z_* \in D$ with $F(z_*) = 0$. For given $z \in D$ assume that there exists an invertible linear mapping $F'(z) : Z \to R$ such that $\Theta_{z_*}(z)$ is well defined. Assume that an inexact Newton step results in

  $z^+ := z - F'(z)^{-1}F(z) + e,$

where the relative error $\gamma(z) := \|e\|/\|z - z_*\|$ is bounded by

  $\gamma(z) + \Theta_{z_*}(z) \le \beta < 1.$   (1)

Then

  $\|z^+ - z_*\| \le \beta\,\|z - z_*\|.$   (2)

Proof. We compute for one inexact Newton step:

  $\|z^+ - z_*\| = \|z - z_* - F'(z)^{-1}F(z) + e\| \le \|F'(z)^{-1}\big[F'(z)(z - z_*) - (F(z) - F(z_*))\big]\| + \|e\|.$

Inserting the definition of $\Theta_{z_*}(z)$ and assumption (1) we obtain

  $\|z^+ - z_*\| \le (\Theta_{z_*}(z) + \gamma(z))\,\|z - z_*\| \le \beta\,\|z - z_*\|.$

$F$ is called (affine covariantly) Newton differentiable or semi-smooth with respect to $F'$ if $\Theta_{z_*}(z) \to 0$ as $z \to z_*$. For the case of exact Newton steps ($\gamma(z) = 0$) we then obtain the well known result of local superlinear convergence. The choice of $F'$ is not unique: it is an algorithmic, but not an analytic quantity. On the basis of this theorem, we call $\Theta_{z_*}(z)$ the theoretical contraction factor.

Close relative: the implicit function theorem. We know $F(z_0) = 0$; do we find a root of $F(z) - p = 0$? The difference: the implicit function theorem is used to show existence of local solutions, and it needs strict differentiability:

  $\Theta^{IF}_{z_0}(z, y) := \frac{\|F'(z_0)^{-1}\big(F'(z_0)(z - y) - (F(z) - F(y))\big)\|_Z}{\|z - y\|_Z}, \qquad \lim_{z, y \to z_0} \Theta^{IF}_{z_0}(z, y) = 0.$

Then the simplified Newton method

  $F'(z_0)\,\delta z_k + (F(z_k) - p) = 0$

converges (for small enough $p$, of course) to a solution of the perturbed equation.
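The following scalar sketch (illustrative, not from the notes) evaluates the theoretical contraction factor for the smooth function $F(z) = z + z^3$ with root $z_* = 0$; here $\Theta_{z_*}(z) = 2z^2/(1 + 3z^2)$, and its linear decay in $|z - z_*|$ reflects the local quadratic convergence of the exact Newton method.

import numpy as np

def F(z):  return z + z**3
def dF(z): return 1.0 + 3.0*z**2

z_star = 0.0
for z in [1.0, 0.5, 0.25, 0.125, 0.0625]:
    remainder = dF(z)*(z - z_star) - (F(z) - F(z_star))
    theta = abs(remainder/dF(z))/abs(z - z_star)     # Theta_{z*}(z)
    print(f"|z - z*| = {abs(z - z_star):.4f}   Theta = {theta:.6f}")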

2 Semi-smoothness of superposition operators

Consider the following nonlinear construction: we interpret a mapping

  $g : \mathbb{R}^d \times \Omega \to \mathbb{R}, \qquad u \mapsto g(u, \omega),$

which maps $d$-vectors to real numbers, as a nonlinear mapping

  $G : L^p(\Omega)^d \to L^s(\Omega)$

between spaces of (vector valued) functions, via the relation

  $G(u)(\omega) := g(u(\omega), \omega).$

Such a mapping is called a superposition operator. We assume that $G$ maps measurable functions to measurable functions. A sufficient condition for that property is that $g$ is a Carathéodory function, i.e., continuous in $u$ for fixed $\omega$ and measurable in $\omega$ for fixed $u$. However, pointwise limits and suprema of Carathéodory functions also yield superposition operators that retain measurability.

The following result is a slight refinement of a classical result of Krasnosel'skii:

Theorem 2.1 (Continuity). Let $\Omega$ be a measurable subset of $\mathbb{R}^d$, and $g : \mathbb{R}^n \times \Omega \to \mathbb{R}^m$ such that the corresponding superposition operator $G$ maps $L^p(\Omega)$ into $L^s(\Omega)$ for $1 \le p, s < \infty$. Let $u_* \in L^p(\Omega)$ be given. If $g$ is continuous with respect to $u$ at $(u_*(\omega), \omega)$ for almost all $\omega \in \Omega$, then $G$ is continuous at $u_*$ in the norm topology.

Proof. To show continuity of $G$ at $u_*$ for $s < \infty$ we consider an arbitrary sequence with $\|u_n - u_*\|_{L^p} \to 0$. By picking a suitable subsequence, we may assume w.l.o.g. that $u_n(\omega) \to u_*(\omega)$ pointwise almost everywhere. Define the function

  $w(u, \omega) := |g(u, \omega) - g(u_*(\omega), \omega)|^s$

and denote by $W$ the corresponding superposition operator. Inserting the sequence $u_n$, we conclude that $W(u_n) \to 0$ pointwise a.e., because $g(u, \omega)$ is continuous in $u$ at $(u_*(\omega), \omega)$ a.e. Next we show $W(u_n) \to 0$ in $L^1(\Omega)$ via Lebesgue's dominated convergence theorem. For this we have to construct a function $\bar w \in L^1(\Omega)$ that dominates the sequence $W(u_n)$. If we assume for simplicity that $|g(u, \omega)| \le M$ is bounded and $|\Omega| < \infty$, then $\bar w := (2M)^s$ is such a dominating function (otherwise, a more involved construction is possible). Hence we obtain $W(u_n) \to 0$ in $L^1(\Omega)$ and thus $G(u_n) \to G(u_*)$ in $L^s(\Omega)$. Because the sequence $u_n$ was arbitrary, we conclude continuity of the operator $G : L^p(\Omega) \to L^s(\Omega)$ at $u_*$.
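A small numerical illustration of Theorem 2.1 (a sketch; the grid, the outer function $g$, and the perturbations are illustrative assumptions): with $g(u) = \mathrm{sign}(u)\,|u|^{p/s}$ the operator $G$ maps $L^p$ into $L^s$, and spike perturbations that converge in $L^p$, but not uniformly, still yield norm convergence of $G$.

import numpy as np

n = 1_000_000
w = (np.arange(n) + 0.5)/n                      # midpoint grid on Omega = (0,1)
p, s = 4.0, 2.0

def Lnorm(f, r):                                 # L^r norm via the midpoint rule
    return np.mean(np.abs(f)**r)**(1.0/r)

def g(u):                                        # G maps L^4 into L^2
    return np.sign(u)*np.abs(u)**(p/s)

u_star = np.sin(2.0*np.pi*w)
for delta in [0.1, 0.01, 0.001]:
    u = u_star + np.where(w < delta, 1.0, 0.0)   # spike of height 1 near omega = 0
    print(f"delta={delta:7.3f}  |u-u*|_Lp = {Lnorm(u - u_star, p):.4f}  "
          f"|G(u)-G(u*)|_Ls = {Lnorm(g(u) - g(u_star), s):.4f}")
# Both columns tend to zero although u never converges uniformly.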

For $p < s = \infty$, there is no continuity (except when $g$ is constant). For $p = s = \infty$ one needs uniform continuity of $g$:

  $\|u - v\| \to 0 \;\Rightarrow\; \|g(u) - g(v)\| \to 0.$

Semi-smoothness of the max function. Consider the max function $m : \mathbb{R} \to \mathbb{R}$, $m(x) = \max(x, 0)$, and define its derivative by

  $m'(x) := \begin{cases} 0 & : x < 0 \\ \text{arbitrary} & : x = 0 \\ 1 & : x > 0. \end{cases}$   (3)

Proposition 2.2. For the max function we have $\lim_{x \to x_*} \Theta_{x_*}(x) = 0$.

Proof. If $x_* \neq 0$, then $m$ is differentiable at $x_*$ with locally constant derivative $m'(x_*)$, so the remainder term vanishes close to $x_*$. If $x_* = 0$, we have two cases:

  $x < 0: \quad m'(x)(x - x_*) - (m(x) - m(x_*)) = 0 - (0 - 0) = 0,$
  $x > 0: \quad m'(x)(x - x_*) - (m(x) - m(x_*)) = (x - 0) - (x - 0) = 0.$

Semi-smoothness of superposition operators. Let $g, g' : \mathbb{R} \to \mathbb{R}$ and $x_* \in L^p$ be given. Define for $x \in \mathbb{R}$

  $\gamma_{x_*}(x, \omega) := \begin{cases} \dfrac{g'(x)(x - x_*(\omega)) - (g(x) - g(x_*(\omega)))}{|x - x_*(\omega)|} & : x \neq x_*(\omega) \\ 0 & : x = x_*(\omega). \end{cases}$

If $\gamma_{x_*}$ is continuous at $x_*$ in the sense of Theorem 2.1 (this is pointwise semi-smoothness, $\Theta_{x_*}(x) \to 0$), then the corresponding superposition operator $\Gamma_{x_*} : L^p \to L^s$ is continuous at $x_*$ with $\Gamma_{x_*}(x_*) = 0$. For $1/s + 1/p = 1/q$ we then conclude by the Hölder inequality:

  $\|G'(x)(x - x_*) - (G(x) - G(x_*))\|_{L^q} = \big\|\Gamma_{x_*}(x)\,|x - x_*|\big\|_{L^q} \le \underbrace{\|\Gamma_{x_*}(x)\|_{L^s}}_{\to 0}\,\|x - x_*\|_{L^p}.$

Hence we obtain semi-smoothness only with a loss of integrability, the norm gap:

  $\frac{\|G'(x)(x - x_*) - (G(x) - G(x_*))\|_{L^q}}{\|x - x_*\|_{L^p}} \to 0 \qquad \text{for } q < p.$

To bridge this norm gap, additional structure is needed, coming, e.g., from a partial differential equation.
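The norm gap can be observed numerically. In the following sketch (the discretization and the perturbations are illustrative assumptions) we take $\Omega = (0,1)$, $x_*(\omega) = \omega$, and spike perturbations $h = -2\omega$ on $(0, \delta)$, concentrated at the kink of $\max(\cdot, 0)$: the remainder quotient tends to zero in $L^q$ for $q < p$, but not for $q = p$.

import numpy as np

n = 2_000_000
w = (np.arange(n) + 0.5)/n                      # midpoint grid on Omega = (0,1)

def Lnorm(f, r):
    return np.mean(np.abs(f)**r)**(1.0/r)

x_star = w.copy()                                # x*(w) = w touches the kink at 0
p, q = 4.0, 2.0

for delta in [0.1, 0.01, 0.001]:
    h = np.where(w < delta, -2.0*w, 0.0)         # spike perturbation at the kink
    x = x_star + h
    dm = (x > 0.0).astype(float)                 # generalized derivative of max
    rem = dm*h - (np.maximum(x, 0.0) - np.maximum(x_star, 0.0))
    print(f"delta={delta:7.3f}  "
          f"Lq-remainder/Lp-step = {Lnorm(rem, q)/Lnorm(h, p):.4f}  "
          f"Lp-remainder/Lp-step = {Lnorm(rem, p)/Lnorm(h, p):.4f}")
# The first quotient decays like delta^(1/q - 1/p); the second stays at 1/2.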

3 Application to problems in function space

Using function space methods for pure superposition operator problems is, of course, a bad idea: these problems can be solved pointwise. However, if there is a coupling, e.g., by a PDE, function space methods should be used.

A semi-linear equation with a semi-smooth nonlinearity. As a simple example, we consider the following problem:

  $F(x)v := \int_\Omega \nabla x \nabla v + \max(0, x)\,v \; d\omega \qquad \forall v \in H^1_0(\Omega).$

Then the remainder term is a superposition operator:

  $[F'(x)(x - x_*) - (F(x) - F(x_*))](\omega) = [M'(x)(x - x_*) - (M(x) - M(x_*))](\omega).$

The differential operator cancels out, but the inverse is smoothing:

  $(F'(x)\delta x)v = \int_\Omega \nabla \delta x \nabla v + M'(x)\,\delta x\,v \; d\omega$

is an invertible mapping $F'(x) : H^1_0(\Omega) \to H^1_0(\Omega)^*$. By the Sobolev embedding $E : H^1_0(\Omega) \to L^p(\Omega)$, $p > 2$, we get

  $E\,F'(x)^{-1}E^* : L^q(\Omega) \to L^p(\Omega), \qquad q < 2 < p.$

We observe a smoothing property of $F'(x)^{-1}$ which helps to bridge the norm gap:

  $\|F'(x)^{-1}[F'(x)(x - x_*) - (F(x) - F(x_*))]\|_{L^p} = \|F'(x)^{-1}[M'(x)(x - x_*) - (M(x) - M(x_*))]\|_{L^p}$
  $\qquad \le \|F'(x)^{-1}\|_{L^q \to L^p}\,\|M'(x)(x - x_*) - (M(x) - M(x_*))\|_{L^q} = \|F'(x)^{-1}\|_{L^q \to L^p}\; o(\|x - x_*\|_{L^p}).$
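A finite-difference sketch of the resulting semi-smooth Newton iteration for this semi-linear problem (the 1D discretization, grid size, and right-hand side are illustrative assumptions, not from the notes):

import numpy as np

n = 200                                          # interior grid points on (0,1)
h = 1.0/(n + 1)
w = np.linspace(h, 1.0 - h, n)
A = (np.diag(2.0*np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))/h**2         # 1D Dirichlet Laplacian
f = 10.0*np.sin(2.0*np.pi*w)                     # sign-changing right-hand side

x = np.zeros(n)
for k in range(20):
    Fx = A @ x + np.maximum(x, 0.0) - f          # residual of -x'' + max(0,x) = f
    dM = (x > 0.0).astype(float)                 # generalized derivative of max
    dx = np.linalg.solve(A + np.diag(dM), -Fx)   # semi-smooth Newton step
    x += dx
    print(f"iter {k}: |dx|_inf = {np.linalg.norm(dx, np.inf):.2e}")
    if np.linalg.norm(dx, np.inf) < 1e-12:
        break
# Once the sign pattern of x is identified, the iteration becomes an exact
# Newton method and terminates after one more step.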

Control constrained optimal control. Consider the following abstract problem:

  $\min\; \tfrac12\|y - y_d\|^2_{L^2} + \tfrac{\alpha}{2}\|u\|^2_{L^2} \quad \text{s.t.} \quad Ay - Bu = 0, \; u \ge 0.$

The corresponding KKT conditions can be written as follows:

  $0 = J'(y) + A^*p$
  $0 = \alpha u - B^*p - \lambda$
  $0 = Ay - Bu$
  $u \ge 0, \quad \lambda \ge 0, \quad \lambda u = 0.$

Elimination of $\lambda$ yields $\alpha u = \max(B^*p, 0)$, which can be inserted into the state equation, yielding the control reduced KKT conditions:

  $F(z) = \begin{pmatrix} J'(y) + A^*p \\ Ay - B\max(\alpha^{-1}B^*p,\, 0) \end{pmatrix} = 0,$

where $F : Z \to Z^*$ ($Y \times P \to Y^* \times P^*$). This system can be solved by semi-smooth Newton methods. Let $w := B^*p$ and $m(w) := \max(\alpha^{-1}w, 0)$, thus

  $m'(w) := \begin{cases} 0 & : w \le 0 \\ \alpha^{-1} & : w > 0. \end{cases}$

Derivative:

  $F'(z) = \begin{pmatrix} J''(y) & A^* \\ A & -B\,m'(w)\,B^* \end{pmatrix}.$

Here $J''(y) = \mathrm{Id}$. As before, we will use the following strategy:

(i) choice of an appropriate framework, here $Z = Y \times P = H^1 \times H^1$;
(ii) continuous invertibility of $F'$ with a uniform estimate;
(iii) analysis of the remainder term as a superposition operator.

Lemma 3.1. Assume that $m$ is a bounded, non-negative multiplication operator on $U$, $A : H^1 \to (H^1)^*$ continuously invertible, and $J''(y)$ bounded and positive semi-definite. Then the system

  $\begin{pmatrix} J''(y) & A^* \\ A & -B\,m\,B^* \end{pmatrix} \begin{pmatrix} \delta y \\ \delta p \end{pmatrix} = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix}$

has a unique solution. Moreover, for $r_1 \in L^2$, the estimate

  $\|\delta y\|_{H^1} + \|\delta p\|_{H^1} \le C\big(\|r_1\|_{(H^1)^*} + \|r_2\|_{(H^1)^*}\big)$

holds, where $C$ does not depend on $m$, but only on $\|m\|_\infty$.

Proof. Consider the optimal control problem

  $\min\; \tfrac12 J''(y)(\delta y, \delta y) - \langle r_1, \delta y\rangle + \tfrac12\|\delta u\|^2_U \quad \text{s.t.} \quad A\delta y + B\sqrt{m}\,\delta u - r_2 = 0,$

where $(r_1, r_2) \in Z^*$. By standard argumentation, this problem has a solution $(\delta y, \delta u) \in H^1 \times U$, which is even unique by strict convexity (because of the $\delta u$-term). Further, by uniform convexity, we obtain

  $\|\delta u\|_U \le c\big(\|r_1\|_{(H^1)^*} + \|r_2\|_{(H^1)^*}\big).$

The KKT conditions read:

  $J''(y)\,\delta y + A^*\delta p = r_1$
  $\delta u + \sqrt{m}\,B^*\delta p = 0$
  $A\delta y + B\sqrt{m}\,\delta u = r_2.$

Eliminating $\delta u$ shows that the control reduced KKT conditions (our system) then also have a unique solution $(\delta y, \delta p)$. The state equation implies

  $\|\delta y\|_{H^1} \le c(\|m\|_\infty)\big(\|\delta u\|_U + \|r_2\|_{(H^1)^*}\big),$

and by the adjoint equation we get

  $\|\delta p\|_{H^1} \le c\big(\|\delta y\|_{L^2} + \|r_1\|_{(H^1)^*}\big).$

Inserting these into each other, we obtain the desired estimate. Remark: with regularity theory, this lemma can be refined.

Theorem 3.2. Let $U = L^2(Q)$ (e.g. $Q = \Omega$, $Q = \Gamma$, ...) and assume that $B^* : H^1 \to L^q$ is continuous for some $q > 2$. Then Newton's method converges locally superlinearly towards the solution of our optimal control problem, where

  $\|z\| := \|y\|_{L^2} + \|B^*p\|_{L^q}.$

Proof. Let $q > 2$ be as in our assumption. We compute

  $F'(z)(z - z_*) - (F(z) - F(z_*)) = \begin{pmatrix} 0 \\ -B\big(m'(w)(w - w_*) - (m(w) - m(w_*))\big) \end{pmatrix} = \begin{pmatrix} 0 \\ -B\,\gamma_{w_*}(w)\,|w - w_*| \end{pmatrix}$

with the pointwise remainder quotient

  $\gamma_{w_*}(w) := \frac{m'(w)(w - w_*) - (m(w) - m(w_*))}{|w - w_*|}.$

We note that $|\gamma_{w_*}(w)| \le \alpha^{-1}$, and thus $\Gamma_{w_*} : L^q \to L^s$ for each $s < \infty$. By the example above (Proposition 2.2), $\lim_{w \to w_*}\gamma_{w_*}(w) = 0$ pointwise. Thus, by Theorem 2.1 we obtain $\|\Gamma_{w_*}(w)\|_{L^s} \to 0$ for $\|w - w_*\|_{L^q} \to 0$. For $1/2 = 1/s + 1/q$ we use the Hölder inequality to get

  $\|F'(z)(z - z_*) - (F(z) - F(z_*))\|_{(H^1)^*} \le \|B\|_{L^2 \to (H^1)^*}\,\|\Gamma_{w_*}(w)\|_{L^s}\,\|w - w_*\|_{L^q}.$

Now let $\delta z = F'(z)^{-1}\big(F'(z)(z - z_*) - (F(z) - F(z_*))\big)$. By Lemma 3.1,

  $\|\delta z\| = \|\delta y\|_{L^2} + \|B^*\delta p\|_{L^q} \le c\big(\|\delta y\|_{H^1} + \|B^*\|_{H^1 \to L^q}\|\delta p\|_{H^1}\big)$
  $\qquad \le C\,\|B\|_{L^2 \to (H^1)^*}\,\|\Gamma_{w_*}(w)\|_{L^s}\,\|w - w_*\|_{L^q} \le \tilde C\,\|\Gamma_{w_*}(w)\|_{L^s}\,\|B^*(p - p_*)\|_{L^q},$

thus $\Theta_{z_*}(z) \to 0$ if $B^*(p - p_*) \to 0$ in $L^q$. Hence, Newton's method converges locally superlinearly.

Related approaches. Semi-smooth Newton methods were applied to optimal control problems in various other formulations [2, 6, 3]. Alternatively, one can eliminate the state and the adjoint state to end up with a problem in $u$ that contains solution operators $S = A^{-1}B$ and their adjoints, together with the max superposition operator. Here the smoothing property of $S$ helps to bridge the norm gap. Also, a complementarity function approach can be used to tackle the full KKT system. Then a smoothing step has to be employed to bridge the norm gap [6]; see also the pre-semi-smooth paper [8]. The approach presented here follows [5], where convergence rates are also discussed and additional details can be found. Comprehensive sources on semi-smooth Newton methods in function space are the textbooks [7, 4].
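The following sketch discretizes the control reduced system with finite differences and applies the semi-smooth Newton method (all concrete choices, $A$ the 1D Dirichlet Laplacian, $B = \mathrm{Id}$, $\alpha$, $y_d$, and the grid, are illustrative assumptions, not from the notes); the linear system solved in each step is exactly of the form analyzed in Lemma 3.1.

import numpy as np

n, alpha = 200, 1e-2
h = 1.0/(n + 1)
om = np.linspace(h, 1.0 - h, n)
A = (np.diag(2.0*np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))/h**2         # state operator A
I = np.eye(n)
y_d = np.sin(2.0*np.pi*om)                       # desired state, makes u >= 0 active

y, p = np.zeros(n), np.zeros(n)
for k in range(30):
    w = p                                        # w = B^T p with B = Id
    m = np.maximum(w/alpha, 0.0)                 # control: u = max(alpha^{-1} w, 0)
    dm = (w > 0.0)/alpha                         # generalized derivative m'(w)
    F1 = (y - y_d) + A.T @ p                     # adjoint residual (J''(y) = Id)
    F2 = A @ y - m                               # reduced state residual
    K = np.block([[I, A.T], [A, -np.diag(dm)]])  # semi-smooth Newton matrix
    dz = np.linalg.solve(K, -np.concatenate([F1, F2]))
    y += dz[:n]; p += dz[n:]
    if np.linalg.norm(dz, np.inf) < 1e-10:
        break
u = np.maximum(p/alpha, 0.0)                     # recover the optimal control
print(f"converged after {k} steps, active set size = {int((u > 0).sum())}")

In agreement with Theorem 3.2, the active set typically settles after a few steps, after which the iteration is an exact Newton method.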

4 Detection of local convergence

A priori theory yields confidence in algorithms, but how do we implement them? The main issues are the detection of local convergence and globalization. We need computational quantities that are close to the theoretical quantities. The ideas presented here are based on the concept of affine covariant Newton methods [1].

To assess local convergence of Newton's method we may use a one-parameter model for $\Theta$:

  $[\Theta_{z_*}](y) := \frac{[\omega]}{2}\,\|y - z_*\|_Z,$

where $[\omega]$ has to be estimated. Idea: estimate $\Theta_{z_*}(z)$ by $\Theta_{z^+}(z)$:

  $\Theta_{z^+}(z) := \frac{\|F'(z)^{-1}\big(F'(z)(z^+ - z) - (F(z^+) - F(z))\big)\|_Z}{\|z^+ - z\|_Z}.$

If $\Theta_{z^+}(z) \ll 1$: fast convergence; if $\Theta_{z^+}(z) > 1$: divergence. Computation:

  $\Theta_{z^+}(z) = \frac{\|F'(z)^{-1}F(z^+)\|_Z}{\|\delta z\|_Z} = \frac{\|\overline{\delta z}\|_Z}{\|\delta z\|_Z},$

where $\overline{\delta z} = -F'(z)^{-1}F(z^+)$ is a simplified Newton correction. This also yields an estimate for $[\omega]$:

  $[\omega] := \frac{2\,\Theta_{z^+}(z)}{\|z - z^+\|_Z} = \frac{2\,\|\overline{\delta z}\|_Z}{\|\delta z\|_Z^2}.$

Under the assumption that $\Theta_{z_*}(z) \le \frac{\omega}{2}\|z - z_*\|_Z$ we obtain an estimate for the radius of contraction:

  $r_{\bar\Theta} := \frac{2\bar\Theta}{\omega}: \qquad \|z - z_*\|_Z \le r_{\bar\Theta} \;\Rightarrow\; \Theta_{z_*}(z) \le \bar\Theta.$

Thus, we get a computational estimate $[r_{\bar\Theta}] := 2\bar\Theta/[\omega]$. Moreover, we have the error bound

  $\|z^+ - z_*\| = \Theta_{z_*}(z)\,\|z - z_*\| \le \Theta_{z_*}(z)\big(\|z - z^+\| + \|z^+ - z_*\|\big),$

thus

  $\|z^+ - z_*\| \le \frac{\Theta_{z_*}(z)}{1 - \Theta_{z_*}(z)}\,\|z - z^+\|,$

which can be estimated again via $[\Theta_{z_*}(z)] = \Theta_{z^+}(z)$.
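These estimates cost one extra evaluation of $F$ and one back-substitution with the already assembled $F'(z)$. A scalar sketch (the test function and starting point are illustrative) of the computable quantities $[\Theta] = \|\overline{\delta z}\|/\|\delta z\|$, $[\omega]$, and $[r_{\bar\Theta}]$:

import numpy as np

def F(z):  return z + z**3
def dF(z): return 1.0 + 3.0*z**2

z, theta_bar = 1.0, 0.5                   # iterate and desired contraction
for k in range(6):
    dz = -F(z)/dF(z)                      # Newton correction
    zp = z + dz                           # trial iterate z^+
    dzb = -F(zp)/dF(z)                    # simplified Newton correction
    theta = abs(dzb)/abs(dz)              # [Theta], estimates Theta_{z*}(z)
    omega = 2.0*theta/abs(dz)             # [omega]
    print(f"k={k}: [Theta]={theta:.2e}  [omega]={omega:.2e}  "
          f"[r_Theta]={2.0*theta_bar/omega:.2e}")
    z = zp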

Difficulties with semi-smooth problems.

Proposition 4.1. Assume that $F'(\bar z)^{-1}F$ is not Fréchet differentiable at $\bar z$. Then there exist $\Theta_0 > 0$ and a sequence $z_k \to \bar z$ such that for the radii of contraction $r^k_{\Theta_0}$ of the perturbed problems $F(z) - F(z_k) = 0$ we have $\lim_{k \to \infty} r^k_{\Theta_0} = 0$.

Proof. If $F'(\bar z)^{-1}F$ is not Fréchet differentiable at $\bar z$, then for each choice of $F'(\bar z)$ there is a constant $c > 0$ and a sequence $z_k \to \bar z$ such that

  $\|F'(\bar z)^{-1}\big(F'(\bar z)(z_k - \bar z) - (F(z_k) - F(\bar z))\big)\| \ge c\,\|z_k - \bar z\|.$   (4)

This inequality, reinterpreted in terms of the Newton contraction of the perturbed problem $F(z) - F(z_k) = 0$ at its root $z_k$, directly yields the result.

Thus, if Fréchet differentiability fails, the radius of convergence of Newton methods for semi-smooth problems may break down under perturbations. This implies difficulties in many analytic and algorithmic concepts that rely on stability of Newton's method: implicit function theorems, semi-smooth Newton path-following, and the detection of local convergence.

Proposition 4.2. Assume that the following strong semi-smoothness assumption holds:

  $\lim_{z, y \to z_*}\frac{\|F'(y)^{-1}\big[F'(y)(z - z_*) - (F(z) - F(z_*))\big]\|}{\|z - z_*\|} = 0.$   (5)

Then

  $\lim_{z \to z_*}\frac{[\Theta_{z_*}](z)}{\Theta_{z_*}(z)} = 1.$   (6)

Proof. For a comparison of $\Theta_{z_*}(z) = \|z^+ - z_*\|/\|z - z_*\|$ and $[\Theta_{z_*}](z) = \Theta_{z^+}(z) = \|\overline{\delta z}\|/\|\delta z\|$ we introduce the auxiliary term

  $\tilde\Theta(z) := \frac{\|\overline{\delta z} - (z_* - z^+)\|}{\|z^+ - z_*\|}$

and use the inverse triangle inequality $\big|\,\|a\| - \|b\|\,\big| \le \|a \pm b\|$ to compute

  $\|z - z_*\| - \|z^+ - z_*\| \le \|\delta z\| \le \|z - z_*\| + \|z^+ - z_*\|,$
  $\|z^+ - z_*\| - \|\overline{\delta z} - (z_* - z^+)\| \le \|\overline{\delta z}\| \le \|z^+ - z_*\| + \|\overline{\delta z} - (z_* - z^+)\|,$

which yields

  $\|z - z_*\|\,(1 - \Theta_{z_*}(z)) \le \|\delta z\| \le \|z - z_*\|\,(1 + \Theta_{z_*}(z)),$
  $\|z^+ - z_*\|\,(1 - \tilde\Theta(z)) \le \|\overline{\delta z}\| \le \|z^+ - z_*\|\,(1 + \tilde\Theta(z)).$

Combination of these inequalities yields the estimate

  $\frac{1 - \tilde\Theta(z)}{1 + \Theta_{z_*}(z)} \le \frac{\|\overline{\delta z}\|\,\|z - z_*\|}{\|\delta z\|\,\|z^+ - z_*\|} = \frac{[\Theta_{z_*}](z)}{\Theta_{z_*}(z)} \le \frac{1 + \tilde\Theta(z)}{1 - \Theta_{z_*}(z)}.$

If $\lim_{z \to z_*}\Theta_{z_*}(z) = 0$ and $\lim_{z \to z_*}\tilde\Theta(z) = 0$, then (6) holds. The first assumption holds if $F$ is semi-smooth at $z_*$. For the second, we compute

  $\overline{\delta z} - (z_* - z^+) = F'(z)^{-1}\big[F'(z)(z^+ - z_*) - (F(z^+) - F(z_*))\big],$

so $\lim_{z \to z_*}\tilde\Theta(z) = 0$ is implied by (5), applied with $y = z$ at the point $z^+$ (note that $z^+ \to z_*$ as $z \to z_*$).

Such a strong semi-smoothness assumption holds, e.g., if $y_* \neq \varphi$ almost everywhere in $\Omega$ (for a pointwise bound $\varphi$). This is a kind of strict complementarity assumption.

5 Globalization by damping

For bad initial guesses, Newton's method may diverge. Idea: for an iterate $z_0$ solve the problem: find $z(\lambda) \in Z$ such that

  $F(z(\lambda)) - (1 - \lambda)F(z_0) = 0 \qquad \text{for some } 0 < \lambda \le 1.$

This defines the Newton path $z(\lambda)$, with $z(0) = z_0$, and $z(1) = z_*$ (if it exists) is a zero of $F$. The Newton step for this equation at $z_0$ is

  $0 = F'(z_0)\,\delta z_\lambda + F(z_0) - (1 - \lambda)F(z_0) = F'(z_0)\,\delta z_\lambda + \lambda F(z_0),$

also known as the damped Newton correction:

  $\delta z_\lambda = \lambda\,\delta z = -\lambda\,F'(z_0)^{-1}F(z_0).$

Choose $\lambda$ such that Newton's method for $z(\lambda)$ converges locally, i.e., $\Theta_{z(\lambda)}(z_0) < 1$. The simplified Newton step for that problem reads, setting the trial iterate $z_t := z_0 + \lambda\,\delta z$,

  $F'(z_0)\,\overline{\delta z}_\lambda + F(z_t) - (1 - \lambda)F(z_0) = 0,$

i.e., $\overline{\delta z}_\lambda = \overline{\delta z} - (1 - \lambda)\,\delta z$ with $\overline{\delta z} := -F'(z_0)^{-1}F(z_t)$. Try to control $\lambda$ such that $\Theta \approx 1/2$, but accept if $\Theta < 1$; if $\Theta \ge 1$, then $\lambda$ has to be reduced. Estimation of $\Theta$:

  $\Theta_{z(\lambda)}(z_0) \approx \Theta_{z_t}(z_0) = \frac{\|F'(z_0)^{-1}\big(F'(z_0)\,\lambda\delta z - (F(z_t) - F(z_0))\big)\|}{\|\lambda\,\delta z\|} = \frac{\|\overline{\delta z}_\lambda\|}{\|\lambda\,\delta z\|}.$

Model for $\Theta$: $[\Theta_{z_0}](z) := \frac{[\omega]}{2}\|z - z_0\|$. If $[\Theta]$ is too large, choose $\lambda^+$ such that

  $\frac12 = [\Theta_{z_0}](z_0 + \lambda^+\delta z) = \frac{[\omega]\,\lambda^+\|\delta z\|_Z}{2}, \qquad \text{i.e.,}\quad \lambda^+ = \frac{1}{[\omega]\,\|\delta z\|},$

where

  $[\omega] = \frac{2\,\Theta_{z_t}(z_0)}{\|z_t - z_0\|} = \frac{2\,\Theta_{z_t}(z_0)}{\|\lambda\,\delta z\|} = \frac{2\,\|\overline{\delta z}_\lambda\|}{\|\lambda\,\delta z\|^2}.$

Algorithm 5.1 (Damped Newton).
Initial guesses: $z$, $[\omega]$ (or $\lambda$). Parameters: $\lambda_{fail}$, $TOL$.

  repeat
      solve $F'(z)\,\delta z + F(z) = 0$
      repeat
          compute $\lambda = \min(1,\, 1/(\|\delta z\|\,[\omega]))$
          if $\lambda < \lambda_{fail}$: terminate: Newton's method failed
          solve $F'(z)\,\overline{\delta z} + F(z + \lambda\,\delta z) = 0$
          compute $[\omega] = 2\,\|\overline{\delta z} - (1 - \lambda)\,\delta z\|\,/\,\|\lambda\,\delta z\|^2$
          compute $\Theta(z, z + \lambda\,\delta z) = [\omega]\,\|\lambda\,\delta z\|/2$
      until $\Theta(z, z + \lambda\,\delta z) < 1$
      $z = z + \lambda\,\delta z$
      if $\lambda = 1$ and $\Theta(z, z + \delta z) \le 1/4$ and $\|\delta z\| \le TOL$:
          terminate: desired accuracy reached, $z_{out} = z + \delta z + \overline{\delta z}$
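A runnable sketch of Algorithm 5.1 (the parameter defaults, the termination details, and the test problem, the classical $\arctan$ example, are illustrative assumptions):

import numpy as np

def damped_newton(F, dF, z, omega=1.0, lam_fail=1e-10, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        dz = np.linalg.solve(dF(z), -F(z))                # Newton correction
        while True:                                       # step size loop
            lam = min(1.0, 1.0/(np.linalg.norm(dz)*omega))
            if lam < lam_fail:
                raise RuntimeError("Newton's method failed")
            dzb = np.linalg.solve(dF(z), -F(z + lam*dz))  # simplified correction
            omega = (2.0*np.linalg.norm(dzb - (1.0 - lam)*dz)
                     / np.linalg.norm(lam*dz)**2)         # a posteriori [omega]
            theta = omega*np.linalg.norm(lam*dz)/2.0      # contraction estimate
            if theta < 1.0:
                break                                     # accept the step
        z = z + lam*dz
        if lam == 1.0 and theta <= 0.25 and np.linalg.norm(dz) <= tol:
            return z + dzb                                # z_out = z + dz + dzb
    raise RuntimeError("no convergence within max_iter")

# Full Newton steps for arctan diverge from z = 10; the damped method converges.
F  = lambda z: np.arctan(z)
dF = lambda z: np.diag(1.0/(1.0 + z**2))
print(damped_newton(F, dF, np.array([10.0])))

Note that a rejected trial step is not wasted: the updated $[\omega]$ immediately yields the next, smaller step size, and $\lambda$ at least halves in each pass of the inner loop, so the loop terminates.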

References

[1] P. Deuflhard. Newton Methods for Nonlinear Problems. Affine Invariance and Adaptive Algorithms, volume 35 of Springer Series in Computational Mathematics. Springer, 2nd edition, 2006.

[2] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth Newton method. SIAM J. Optim., 13:865–888, 2003.

[3] M. Hintermüller and M. Ulbrich. A mesh-independence result for semismooth Newton methods. Math. Programming, 101:151–184, 2004.

[4] K. Ito and K. Kunisch. Lagrange Multiplier Approach to Variational Problems and Applications. Society for Industrial and Applied Mathematics, 2008.

[5] A. Schiela. A simplified approach to semismooth Newton methods in function space. SIAM J. Optim., 19(3):1417–1432, 2008.

[6] M. Ulbrich. Semismooth Newton methods for operator equations in function spaces. SIAM J. Optim., 13:805–842, 2003.

[7] M. Ulbrich. Semismooth Newton Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces. Society for Industrial and Applied Mathematics, 2011.

[8] M. Ulbrich and S. Ulbrich. Superlinear convergence of affine-scaling interior-point Newton methods for infinite-dimensional nonlinear problems with pointwise bounds. SIAM J. Control Optim., 38(6):1938–1984, 2000.

Techniques for the Analysis of Semi-Smooth Newton Methods
Anton Schiela
Winter School, Würzburg, March 2, 2018

Newton's Method in Function Space. Ingredients: nonlinear functional analysis (nonlinear equations, homotopy) and Newton's method (optimization); affine invariance [Deuflhard 2004]; semi-smoothness [Kunisch, Hintermüller 2002], [Ulbrich 2002].

Newton's Method. Solve the operator equation $F(z) = 0$. Sufficient for local superlinear convergence: $\Theta_{z_*}(z) \to 0$ for $z \to z_*$.

Affine invariant form of semi-smoothness. Remarks: $F'(z)$ is not uniquely determined; it may, but need not, be a classical derivative; the property follows from the well known semi-smoothness and regularity assumptions; a computable estimate is available.

The Dirichlet function is semi-smooth... An example from Analysis I: comparing the definition of the derivative with semi-smoothness at a point shows that, with a suitable choice of the generalized derivative, even the Dirichlet function is semi-smooth there. Semi-smoothness is not an analytic, but an algorithmic concept.

An important special case. Consider $F$ composed of a linear differential operator and a superposition operator, as, e.g., in the semi-linear equation of Section 3 of the notes.

Example: control constraints

Semi-smoothness of superposition operators. Via the Hölder inequality: if the remainder quotient $\gamma_{x_*}$ is continuous at $x_*$, then the superposition operator is semi-smooth (cf. Section 2 of the notes).

In general: semi-smoothness of the outer function, the Hölder inequality, and a local continuity result together yield semi-smoothness of the superposition operator. Remarks: complete analogy to Fréchet differentiability; no global Lipschitz condition is necessary (growth conditions instead); the necessity of $q < p$ explains the well known norm gap; $q = p$ is not compatible with semi-smoothness.

Closing the norm gap (well known): a smoothing property of the linearization bridges the gap and yields local superlinear convergence of Newton's method.

Order of semi-smoothness. Same idea, via the Hölder inequality: if the remainder quotient is bounded near $x_*$, then an order of semi-smoothness can be quantified.

Strict complementarity


Distribution function: with strict complementarity we have a corresponding estimate.

Orders of convergence. A lemma and its conclusion yield, for the max function, local superlinear convergence of Newton's method of a quantifiable order.


What about globalization? Assume that $F$ is not Fréchet differentiable at $\bar z$, and consider the set of good semi-smooth approximations. Then, if $k$ is sufficiently large, the radius of Newton contraction towards $z_k$ can be arbitrarily small. Problem: non-differentiable points are ubiquitous in semi-smooth problems. Consequence: homotopy methods are difficult to analyse.

Using a merit function. Classical merit function on a Hilbert space: $f(z) = \tfrac12\|F(z)\|^2$. Global convergence theory demands uniform continuity of $F'$; the problem with semi-smoothness: $F'$ is not continuous.
Remedy 1 (M. Ulbrich): use a semi-smooth operator whose squared norm is still smooth; candidate: the Fischer-Burmeister function $\varphi(a, b) = \sqrt{a^2 + b^2} - a - b$.
Remedy 2 (Kunisch et al.): use techniques from non-smooth optimization.
Remedy 3 (Gräser): for control constrained optimal control, reformulation in the adjoint state.
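A sketch of the Fischer-Burmeister reformulation for a simple componentwise complementarity problem, the projection $u_* = \max(g, 0)$ arising from $\min \tfrac12\|u - g\|^2$ s.t. $u \ge 0$ (the data $g$ and the treatment of the kink are illustrative assumptions, not from the slides):

import numpy as np

def fb(a, b):                                  # Fischer-Burmeister function:
    return np.hypot(a, b) - a - b              # fb = 0 iff a >= 0, b >= 0, ab = 0

g = np.linspace(-1.0, 1.0, 11)                 # data; the solution is max(g, 0)
u = np.ones_like(g)
for k in range(30):
    a, b = u, u - g                            # complementarity pair (u, lambda)
    phi = fb(a, b)
    if np.abs(phi).max() < 1e-12:
        break
    r = np.maximum(np.hypot(a, b), 1e-300)     # guard the division at the kink
    # element of the generalized Jacobian of fb(u, u - g) with respect to u;
    # at (0,0) we pick the direction (1,1)/sqrt(2)
    d = np.where(np.hypot(a, b) > 0.0, (a + b)/r - 2.0, np.sqrt(2.0) - 2.0)
    u = u - phi/d                              # semi-smooth Newton step
print(np.allclose(u, np.maximum(g, 0.0)))      # True

The square $\varphi^2$ is continuously differentiable, which is what makes $\varphi$ attractive as a building block for merit functions.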

Objective with semi-smooth derivative. Unconstrained minimization problem: minimize $f$, where the derivative $f'$ is semi-smooth and Lipschitz continuous. Globalization with classical techniques (line search, trust region) is easy if you use $f$ as merit function and not $\|f'(z)\|^2$. Applications: penalized obstacle problems and state constrained problems. What about the transition to fast local convergence?

Second order semi-smoothness. An appropriate condition for the transition to fast local convergence: we have a quadratic approximation that may depend on the base point; in semi-smooth Newton we have one, of course.

Theorem: Define the local quadratic model. Under second order semi-smoothness we have, for full Newton steps, that the model agrees with the objective up to higher order terms; thus, full Newton steps become acceptable, eventually.

Theorem: Under second order semi-smoothness we have the analogous statement for damped Newton steps; thus, damped Newton steps become acceptable, eventually, if the damping factor tends to one.

Examples: the squared max function; the classical merit function.

Conclusions.
Local convergence theory: Hölder inequality & continuity result => continuous Fréchet differentiability; Hölder inequality & local continuity result => semi-smoothness; issues of stability of the Newton contraction.
Global convergence: difficulty with the classical merit function; optimization with semi-smooth derivatives; transition to local convergence via second order semi-smoothness.