ADAPTIVE MULTILEVEL INEXACT SQP METHODS FOR PDE CONSTRAINED OPTIMIZATION


J. CARSTEN ZIEMS AND STEFAN ULBRICH

Abstract. We present a class of inexact adaptive multilevel trust-region SQP methods for the efficient solution of optimization problems governed by nonlinear partial differential equations. The algorithm starts with a coarse discretization of the underlying optimization problem and provides during the optimization process (1) implementable criteria for an adaptive refinement strategy of the current discretization based on local error estimators and (2) implementable accuracy requirements for iterative solvers of the linearized PDE and adjoint PDE on the current grid. We prove global convergence to a stationary point of the infinite-dimensional problem. Moreover, we illustrate how the adaptive refinement strategy of the algorithm can be implemented by using existing reliable a posteriori error estimators for the state and the adjoint equation. Numerical results are presented.

Key words. Optimal control, adaptive mesh adaptation, PDE constraints, finite elements, a posteriori error estimator, trust-region methods, inexact linear system solvers

1. Introduction. In this paper we introduce and analyze a class of adaptive multilevel inexact sequential quadratic programming (SQP) methods for the solution of nonlinear PDE-constrained optimization problems. Nowadays, adaptive discretization techniques for partial differential equations based on a posteriori error estimators are well established to obtain accurate solutions with considerably fewer grid points than in the case of uniform meshes. In the context of optimization, adaptive mesh refinement offers the potential to perform most of the optimization iterations on coarse meshes and to approach the infinite-dimensional problem during optimization in an efficient way.

We consider PDE-constrained optimization problems of the form

\[ \min_{y \in Y,\, u \in U} f(y,u) \quad \text{subject to} \quad C(y,u) = 0, \tag{1.1} \]

where $U$ is the control space, $Y$ is the state space, and $f : Y \times U \to \mathbb{R}$ is the objective function. The state equation $C : Y \times U \to V^*$, $C(y,u) = 0$, comprises a (system of) partial differential equation(s) with appropriate initial and/or boundary conditions in a variational formulation with $V$ as the set of test functions. Here $V^*$ denotes as usual the dual space of $V$. It would be possible to include constraints on the control $u$ in our approach without significant changes. We leave this issue to a forthcoming paper.

We assume that $Y$ and $U$ are Hilbert spaces and that $V$ is a reflexive Banach space. Moreover, let $f$ and $C$ be twice continuously Fréchet differentiable. Often, the PDE constraint is given by a variational formulation of the form

\[ a(y; v) = b(u; v) \quad \forall v \in V. \]

In this case, $C(y,u)$ is given by

\[ C : Y \times U \to V^*, \qquad C(y,u) = a(y; \cdot) - b(u; \cdot) \in V^*. \]

Footnotes. J. Carsten Ziems: TU Darmstadt, Department of Mathematics, Nonlinear Optimization, Schlossgartenstr. 7, D Darmstadt, Germany (ziems@mathematik.tu-darmstadt.de). Stefan Ulbrich: TU Darmstadt, Department of Mathematics, Nonlinear Optimization, Schlossgartenstr. 7, D Darmstadt, Germany (ulbrich@mathematik.tu-darmstadt.de). The author was supported by the DFG priority program 1253 and in parts by SFB 666.

The proposed multilevel SQP algorithm for (1.1) generates a hierarchy of finite-dimensional approximations

\[ \min_{y_h \in Y_h,\, u_h \in U_h} f(y_h,u_h) \quad \text{subject to} \quad C^h(y_h,u_h) = 0, \tag{1.2} \]

which result from conformal discretizations, e.g. by the finite element method, of (1.1) on adaptively refined meshes. Our assumptions on the conformal discretization will be made precise in section 2.

In this paper we develop an implementable adaptive refinement strategy based on error estimators and combine it with an efficient inexact composite-step trust-region SQP method inspired by [18, 28]. The resulting adaptive multilevel SQP method generates a hierarchy of adaptive discretizations (1.2), controls the inexactness of iterative solvers on the current grid, and, if necessary, refines the grid adaptively in an appropriate way based on local error estimators, e.g. [1, 8, 9, 11, 30, 32], to ensure convergence to the solution of the original problem (1.1). We will prove global convergence under standard assumptions to a first-order optimality point of the infinite-dimensional problem (1.1). The major advantages of the multilevel approach are that most optimization iterations are carried out on coarse meshes while the accuracy of the optimization result is controlled, since the mesh adaptation is tailored to the needs of the optimization method. This offers the possibility to obtain optimization results of high accuracy at the cost of a few simulation runs.

In recent years, multilevel techniques in optimization have received considerable attention [6, 7, 14, 15, 16, 23, 29]. These approaches focus on the efficient use of a hierarchy of discretizations to solve an optimization problem on the finest grid. [6, 7] consider multigrid solvers for the optimality system of PDE-constrained problems without globalization; [7] studies such methods with control constraints and [6] with state constraints. [15, 16, 23, 29] apply multigrid ideas in a recursive fashion to optimization problems; the coupling with adaptive mesh refinement is not considered. The rigorous combination of adaptive error control techniques and modern globally convergent optimization techniques, which is the topic of this paper, has to the best of our knowledge not been considered so far.

On the other hand, a posteriori error estimators in the context of PDE-constrained optimization are an active research area [2, 3, 4, 20, 19, 24]. The rigorous embedding of error estimators in multilevel optimization methods has to the best of our knowledge not been considered so far. Truncated Newton methods in the presence of inexact function and gradient evaluations were studied in [22], but the combination with error estimators was not considered. [26] proposes a general algorithmic framework based on consistent approximations for optimal control problems which deals with approximate function and gradient evaluations in steepest descent algorithms. The accuracy control mechanism requires an error estimator for the function and gradient value depending on a scalar mesh parameter and is very different from the approach in this paper.

The purpose of this paper is to provide a rigorous framework for the combination of efficient and robust inexact SQP methods with appropriate a posteriori error estimators. For the solution of the auxiliary trust-region problems, our method offers the possibility to use any kind of iterative solver, in particular the multilevel solvers mentioned above.

The paper is organized as follows. In section 2 we describe the optimality conditions and introduce our notation for the discretized problems. In section 3 we start with basic notation and requirements for inexact SQP methods followed by

a description of the basic components of our multilevel composite-step trust-region SQP algorithm before we state the refinement criteria and the algorithm itself. The convergence analysis can be found in section 4. In section 5 we show how the inexactness in linear equation solves and how the decrease conditions can be satisfied in an implementation. Averaging and residual-based error estimators for a general semilinear elliptic PDE with inexact states can be found in section 6. Numerical results are presented in section 7.

We will often use the following notation: $X := Y \times U$, $x = (y,u) \in X$.

2. Optimality conditions and discretization.

2.1. Optimality conditions. Let $G_w$ denote the Fréchet derivative of an operator $G$ with respect to a variable $w$; e.g., $C_y$ denotes the Fréchet derivative of the PDE-constraint operator $C$ with respect to the state $y$. Throughout the paper we assume that $C_y(y,u) \in \mathcal{L}(Y,V^*)$ has a bounded inverse. Let

\[ l : Y \times U \times V \to \mathbb{R}, \qquad l(y,u,\lambda) = f(y,u) + \langle \lambda, C(y,u) \rangle_{V,V^*} \tag{2.1} \]

denote the Lagrangian function, where $\langle \lambda, C(y,u) \rangle_{V,V^*}$ denotes the dual pairing. Note that $V^{**} = V$ and thus $\lambda \in V^{**} = V$.

Let $(\bar y, \bar u)$ be an optimal solution of problem (1.1). Then the following first-order necessary optimality conditions hold: there exists an adjoint state (Lagrange multiplier) $\bar\lambda \in V$ such that

\[ \begin{aligned} l_y(\bar y,\bar u,\bar\lambda) &= f_y(\bar y,\bar u) + C_y(\bar y,\bar u)^* \bar\lambda = 0, \\ l_u(\bar y,\bar u,\bar\lambda) &= f_u(\bar y,\bar u) + C_u(\bar y,\bar u)^* \bar\lambda = 0, \\ C(\bar y,\bar u) &= 0. \end{aligned} \tag{2.2} \]

Thus, the adjoint state $\bar\lambda$ is uniquely determined by $\bar\lambda = -C_y(\bar y,\bar u)^{-*} f_y(\bar y,\bar u)$, since $C_y(\bar y,\bar u)$ has a bounded inverse.

2.2. Discretized problem. For simplicity we assume that problem (1.1) is approximated by a conformal finite element discretization. More precisely, let $Y_h \subset Y$, $V_h \subset V$ be finite element subspaces on a triangulation $\mathcal{T}_h$ of the computational domain $\Omega$ consisting of closed cells $T$. The mesh parameter $h$ is defined as a cell-wise constant function by setting $h|_T = h_T$, where $h_T$ is the diameter of $T$. The mesh $\mathcal{T}_h$ is assumed to be shape regular. Moreover, we introduce a finite-dimensional subspace $U_h \subset U$ of the control space. Depending on the concrete situation there are different possibilities to choose the space $U_h$. It is reasonable to set $U_h = U$ if $U$ is finite dimensional. We set $X_h := Y_h \times U_h$.

We assume that the discretized PDE constraint $C^h : Y_h \times U_h \to V_h^*$ is given by the conformal finite element discretization

\[ \langle C^h(y_h,u_h), v_h \rangle_{V_h^*,V_h} := \langle C(y_h,u_h), v_h \rangle_{V^*,V} \quad \forall v_h \in V_h. \tag{2.3} \]

The discretized optimization problem is then given by

\[ \min_{y_h \in Y_h,\, u_h \in U_h} f(y_h,u_h) \quad \text{subject to} \quad C^h(y_h,u_h) = 0, \tag{1.2} \]

and the Lagrangian function of the discretized problem by

\[ l^h : Y_h \times U_h \times V_h \to \mathbb{R}, \qquad l^h(y_h,u_h,\lambda_h) = f(y_h,u_h) + \langle \lambda_h, C^h(y_h,u_h) \rangle_{V_h,V_h^*} = l(y_h,u_h,\lambda_h), \]

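where the last identity follows from (2.3). Similar to (2.2), the optimality conditions at a local solution $(\bar y_h, \bar u_h)$ of the discretized problem (1.2) read, with an appropriate Lagrange multiplier $\bar\lambda_h \in V_h$,

\[ \begin{aligned} \langle l_y(\bar y_h,\bar u_h,\bar\lambda_h), w_y \rangle_{Y^*,Y} &= 0 \quad \forall w_y \in Y_h, \\ \langle l_u(\bar y_h,\bar u_h,\bar\lambda_h), w_u \rangle_{U^*,U} &= 0 \quad \forall w_u \in U_h, \\ \langle C(\bar y_h,\bar u_h), w_\lambda \rangle_{V^*,V} &= 0 \quad \forall w_\lambda \in V_h. \end{aligned} \tag{2.4} \]

For given $(x_h,\lambda_h) \in X_h \times V_h$ the residuals in the original optimality system (2.2) are given by

\[ \begin{aligned} \|l_y(x_h,\lambda_h)\|_{Y^*} &= \sup_{w_y \in Y,\ \|w_y\|_Y = 1} \langle l_y(x_h,\lambda_h), w_y \rangle_{Y^*,Y}, \\ \|l_u(x_h,\lambda_h)\|_{U^*} &= \sup_{w_u \in U,\ \|w_u\|_U = 1} \langle l_u(x_h,\lambda_h), w_u \rangle_{U^*,U}, \\ \|C(x_h)\|_{V^*} &= \sup_{v_\lambda \in V,\ \|v_\lambda\|_V = 1} \langle C(x_h), v_\lambda \rangle_{V^*,V}, \end{aligned} \]

and the residuals of the discrete optimality system (2.4) by

\[ \begin{aligned} \|l_y(x_h,\lambda_h)\|_{Y_h^*} &= \sup_{w_y^h \in Y_h,\ \|w_y^h\|_Y = 1} \langle l_y(x_h,\lambda_h), w_y^h \rangle_{Y^*,Y}, \\ \|l_u(x_h,\lambda_h)\|_{U_h^*} &= \sup_{w_u^h \in U_h,\ \|w_u^h\|_U = 1} \langle l_u(x_h,\lambda_h), w_u^h \rangle_{U^*,U}, \\ \|C(x_h)\|_{V_h^*} &= \sup_{v_\lambda^h \in V_h,\ \|v_\lambda^h\|_V = 1} \langle C(x_h), v_\lambda^h \rangle_{V^*,V}. \end{aligned} \]

Note that the inequality $\|C(x_h)\|_{V_h^*} \le \|C(x_h)\|_{V^*}$ always holds, because the supremum on the left is taken over a subset. We assume that we are able to calculate norms in $V_h^*$. By refining the meshes we can generate a hierarchy of approximations. Derivatives of functions from the discrete problem will also be denoted by subscripted variables, since by inserting the discrete values they can be defined via dual pairings in infinite dimensions, which can be calculated as a vector-vector product, e.g.

\[ \langle (C_y^h(x_h))^* \lambda_h, w_y^h \rangle_{Y_h^*,Y_h} = \langle (C_y(x_h))^* \lambda_h, w_y^h \rangle_{Y^*,Y}. \]

Example 2.1. Consider the problem (Problem 7.1 in section 7.1)

\[ \min_{y \in H_0^1(\Omega),\, u \in L^2(\Omega)} f(y,u) := \frac{1}{2} \|y - y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2} \|u\|_{L^2(\Omega)}^2 \quad \text{s.t.} \quad -\Delta y + y^3 = u \ \text{in } \Omega, \quad y = 0 \ \text{on } \partial\Omega, \]

where $\Omega \subset \mathbb{R}^2$ is a polygonal domain, $y_d \in H_0^1(\Omega)$ and $\alpha > 0$. Here, the state equation has to be understood in the weak sense; more precisely, the state equation is given by the variational equation

\[ \int_\Omega (\nabla y \cdot \nabla v + y^3 v - u v)\, dx = 0 \quad \forall v \in H_0^1(\Omega). \]

Therefore, we set $Y = V := H_0^1(\Omega)$, $U := L^2(\Omega)$, $V^* = Y^*$ and define

\[ C : (y,u) \in Y \times U \mapsto C(y,u) \in V^* = Y^*, \qquad \langle C(y,u), v \rangle_{V^*,V} := a(y; v) - b(u; v), \quad v \in V, \]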

with

\[ a(y; v) = \int_\Omega (\nabla y \cdot \nabla v + y^3 v)\, dx, \qquad b(u; v) = \int_\Omega u v\, dx. \]

The Lagrangian function is thus given by

\[ l(y,u,\lambda) = f(y,u) + \langle C(y,u), \lambda \rangle_{V^*,V} = f(y,u) + a(y; \lambda) - b(u; \lambda). \]

Now let $Y_h = V_h \subset Y$ and $U_h \subset U$ be finite-dimensional subspaces. Then the conformal discretization is given by

\[ \min_{y_h \in Y_h,\, u_h \in U_h} f(y_h,u_h) := \frac{1}{2} \|y_h - y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2} \|u_h\|_{L^2(\Omega)}^2 \quad \text{s.t.} \quad C^h(y_h,u_h) = 0, \]

where

\[ C^h : (y_h,u_h) \in Y_h \times U_h \mapsto C^h(y_h,u_h) \in V_h^* = Y_h^*, \]
\[ \langle C^h(y_h,u_h), v_h \rangle_{V_h^*,V_h} := a(y_h; v_h) - b(u_h; v_h) = \langle C(y_h,u_h), v_h \rangle_{V^*,V}, \quad v_h \in V_h = Y_h. \]

The discrete Lagrangian function $l^h$ is just the restriction of $l$:

\[ l^h(y_h,u_h,\lambda_h) = f(y_h,u_h) + a(y_h; \lambda_h) - b(u_h; \lambda_h) = l(y_h,u_h,\lambda_h). \]

3. A multilevel trust-region SQP algorithm.

3.1. Main components of our multilevel trust-region SQP algorithm. In this section we give a brief introduction to trust-region SQP methods and introduce the main components of our multilevel trust-region SQP algorithm. For further information on trust-region techniques we refer to [12] and for inexact trust-region techniques to [18].

In a classical local SQP method one minimizes a quadratic approximation of the Lagrangian function $l^h$ at the current iterate $(x_k,\lambda_k)$ subject to the linearized constraint. That is, one computes the next iterate $x_{k+1}$ as $x_{k+1} = x_k + s_k$, where $s_k$ solves the SQP problem at $(x_k,\lambda_k)$,

\[ \min_{s \in X_h} q_k^h(s) := l_k^h + \langle (l_x^h)_k, s \rangle_{X_h^*,X_h} + \frac{1}{2} \langle s, H_k s \rangle_{X_h,X_h^*} \quad \text{subject to} \quad C_k^h + (C_x^h)_k s = 0, \]

with $x_k = (y_k,u_k)$, $l_k^h = l^h(x_k,\lambda_k)$, $(l_x^h)_k = l_x^h(x_k,\lambda_k)$, $H_k \approx l_{xx}^h(x_k,\lambda_k)$, $C_k^h = C^h(x_k)$, $(C_x^h)_k = C_x^h(x_k)$, and accordingly for all other abbreviations throughout the paper. Note that by the conformity of the discretization, see in particular (2.3), $C_k^h + (C_x^h)_k s = 0$ is equivalent to $\|C_k + (C_x)_k s\|_{V_h^*} = 0$, and $q_k^h(s)$ can also be written in terms of $l$; more precisely,

\[ q_k^h(s) = l_k + \langle (l_x)_k, s \rangle_{X^*,X} + \frac{1}{2} \langle s, H_k s \rangle_{X,X^*}. \]

Since it is helpful to view our algorithm as a method for (1.1) that works with a hierarchy of adaptive discretizations, we will sometimes prefer to use $l$ instead of $l^h$, since $l^h$ is only the restriction of $l$ to the current subspaces.
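To make the local SQP subproblem above concrete, here is a minimal finite-dimensional sketch. It is not the paper's implementation: it assumes that $H_k$ and the constraint Jacobian are small dense NumPy arrays, ignores any trust-region constraint, and solves the equality-constrained quadratic program through its KKT system. The function name `sqp_step` and the toy data are hypothetical.

```python
import numpy as np

def sqp_step(grad_l, H, c, J):
    """Solve  min_s  grad_l^T s + 0.5 s^T H s   subject to  c + J s = 0.

    grad_l : gradient of the Lagrangian at (x_k, lambda_k), shape (n,)
    H      : symmetric Hessian approximation H_k, shape (n, n)
    c      : constraint value C(x_k), shape (m,)
    J      : constraint Jacobian C_x(x_k), shape (m, n)
    Returns the step s and the multiplier mu of the linearized constraint.
    """
    n, m = H.shape[0], J.shape[0]
    # Stationarity: H s + J^T mu = -grad_l; linearized feasibility: J s = -c.
    kkt = np.block([[H, J.T], [J, np.zeros((m, m))]])
    rhs = -np.concatenate([grad_l, c])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n:]

# Toy data (hypothetical): three unknowns, one linear constraint.
H = np.diag([2.0, 1.0, 3.0])
grad_l = np.array([1.0, -2.0, 0.5])
J = np.array([[1.0, 1.0, 1.0]])
c = np.array([0.5])

s, mu = sqp_step(grad_l, H, c, J)
print("step:", s, "multiplier of the linearized constraint:", mu)
```

Because the quadratic model is built from the gradient of the Lagrangian, the returned `mu` acts as a correction to the current multiplier rather than as a new multiplier. In a PDE setting the dense solve would be replaced by iterative solvers.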

One way to globalize a local SQP method is to use trust-region techniques. The idea is to trust the quadratic approximation of the Lagrangian function and the linearized constraint only in a trust region, which is adjusted during the algorithm to the quality of the approximations. Since the local SQP problem may become infeasible when an additional trust-region constraint $\|s\| \le \Delta_k$ with trust-region radius $\Delta_k > 0$ is added, one uses a step decomposition as suggested for example by Byrd, Omojokun [10, 25] and Dennis, El-Alem, Maciel [13]. Here the step $s_k$ is split into a sum of two steps, the quasi-normal step $s_k^n = (s_{y,k}^n, 0)$ to improve feasibility and the tangential step $s_k^t = (s_{y,k}^t, s_{u,k})$ to improve optimality.

3.1.1. Quasi-normal step towards feasibility. First, we compute a quasi-normal step $s_k^n$, which is responsible for moving towards feasibility. Since we assume that $C_y^h(x_k)$ is invertible, we perform the quasi-normal step only in the state variables. The $y$-component of $s_k^n$ is an approximate solution of

\[ \min_{s_y^n \in Y_h} \|(C_y^h)_k s_y^n + C_k^h\|_{V_h^*} \quad \text{s.t.} \quad \|s_y^n\|_Y \le \Delta_k, \tag{3.1} \]

and the $u$-component is given by $s_{u,k}^n = 0$. Subproblem (3.1) is not solved exactly. A rather coarse solution is sufficient to ensure basic global convergence. The quasi-normal component is required to satisfy a fraction of Cauchy decrease condition

\[ \|C_k^h\|_{V_h^*}^2 - \|(C_y^h)_k s_{y,k}^n + C_k^h\|_{V_h^*}^2 \ge \kappa_1 \|C_k^h\|_{V_h^*} \min\{\kappa_2 \|C_k^h\|_{V_h^*}, \Delta_k\} \tag{3.2} \]

for all $k \in \mathbb{N}$, where $\kappa_1, \kappa_2 \in (0,1)$ are fixed constants independent of $k$ and the grid. It is well known that, for example, the Steihaug-CG method or a truncated Newton step, which is scaled back into the trust region if necessary, satisfies (3.2).

Remark 3.1. Usually, there already exists an efficient iterative solver for the linearized state equation $(C_y^h)_k s_y^n + C_k^h = 0$. Then $s_k^n$ can be computed as an inexact solution, which is scaled back into the trust region. See section 5.2 for more details.

3.1.2. Tangential step towards optimality. In a second step, the trust-region SQP algorithm computes a tangential step $s_k^t$, which is responsible for moving towards optimality but has to maintain linearized feasibility, i.e., it has to be in the nullspace of the linearized constraints. Let $q_k^h$ be the quadratic approximation of the Lagrangian function at $(x_k,\lambda_k)$,

\[ q_k^h(s) := l_k^h + \langle (l_x^h)_k, s \rangle_{X_h^*,X_h} + \frac{1}{2} \langle s, H_k s \rangle_{X_h,X_h^*}, \tag{3.3} \]

where $H_k$ is a symmetric approximation to the Hessian of the Lagrangian function at $(x_k,\lambda_k)$. We will assume that the sequence of approximate Hessians is bounded. The tangential step is then an approximate solution of

\[ \min_{s^t \in X_h} q_k^h(s_k^n + s^t) \quad \text{s.t.} \quad (C_y^h)_k s_y^t + (C_u^h)_k s_u = 0, \quad \|s_u\|_U \le \Delta_k. \tag{3.4} \]

Note that the tangential equation in the constraint is a variational equation for test functions from the finite element space $V_h$. Consequently, the residual in the tangential equation must be orthogonal to a basis of $V_h$. We can reduce $q_k^h(s_k^n + s^t)$ to the control component $s_u$ of the tangential step $s^t$ by solving the tangential equation

\[ (C_y^h)_k s_y^t + (C_u^h)_k s_u = 0, \tag{3.5} \]

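i.e., $s_y^t = -((C_y^h)_k)^{-1} (C_u^h)_k s_u$. Defining $W_k$ by

\[ W_k = \begin{pmatrix} -((C_y^h)_k)^{-1} (C_u^h)_k \\ I_{U_h} \end{pmatrix} \in \mathcal{L}(U_h, Y_h \times U_h), \]

we obtain $s^t = W_k s_u$ and arrive at the reduced quadratic approximation of the Lagrangian

\[ \hat q_k(s_u) := q_k^h(s_k^n) + \langle W_k^* (H_k s_k^n + (l_x^h)_k), s_u \rangle + \frac{1}{2} \langle s_u, W_k^* H_k W_k s_u \rangle. \tag{3.6} \]

Thus, we can write the tangential problem entirely in $s_u$,

\[ \min_{s_u \in U_h} \hat q_k(s_u) \quad \text{s.t.} \quad \|s_u\|_U \le \Delta_k, \tag{3.7} \]

but afterwards have to compute $s_y^t$ by using (3.5).

Now we allow inexactness in the derivatives and the solutions of linear systems. Instead of computing the reduced gradient by $W_k^* (H_k s_k^n + (l_x^h)_k)$, we solve the adjoint equation after the quasi-normal step on the current grid,

\[ l_y^h(x_k, \lambda_k + \Delta\lambda_k) + (H_k s_k^n)_{(y)} = 0 \quad \text{in } Y_h^*, \tag{3.8} \]

in the variable $\Delta\lambda_k$ sufficiently well (the index $(y)$ denotes the $y$-component) such that the following accuracy condition is satisfied:

\[ \|l_y^h(x_k, \lambda_k + \Delta\lambda_k) + (H_k s_k^n)_{(y)}\|_{Y_h^*} \le \kappa_\lambda \min\{\|l_u^h(x_k, \lambda_k + \Delta\lambda_k) + (H_k s_k^n)_{(u)}\|_{U_h^*}, \Delta_k\}, \tag{3.9} \]

where $\Delta_k$ denotes the trust-region radius and $\kappa_\lambda > 0$. A similar criterion was proposed in [18]. We define the inexact reduced gradient $\hat g_k$ as an approximation to the reduced gradient of $\hat q_k$ by

\[ \hat g_k := l_u^h(x_k, \lambda_k + \Delta\lambda_k) + (H_k s_k^n)_{(u)}. \tag{3.10} \]

Any suitable iterative solver can be applied to the adjoint equation (3.8) until the stopping criterion (3.9) is satisfied. It is then easy to show that there exists $\xi_1 > 0$ such that

\[ \|\hat g_k - W_k^* (H_k s_k^n + l_x^h(x_k,\lambda_k))\|_{U_h^*} \le \xi_1 \min\{\|\hat g_k\|_{U_h^*}, \Delta_k\}. \tag{3.11} \]

Moreover, let $\hat H_k$ be an approximation to the reduced Hessian $W_k^* H_k W_k$ satisfying

\[ \langle s_{u,k}, \hat H_k s_{u,k} \rangle \le \xi_2 \|s_{u,k}\|_U^2 \tag{3.12} \]

for all steps $s_{u,k} \in U_h$ computed by the algorithm and some fixed $\xi_2 > 0$. Then we define our approximate reduced quadratic approximation of the Lagrangian as

\[ \hat m_k(s_u) := q_k^h(s_k^n) + \langle \hat g_k, s_u \rangle + \frac{1}{2} \langle s_u, \hat H_k s_u \rangle, \]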

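where $s_k^n$ denotes the quasi-normal step. We compute $s_{u,k}$ as an approximate solution of the inexact reduced tangential problem

\[ \min_{s_u \in U_h} \hat m_k(s_u) \quad \text{s.t.} \quad \|s_u\|_U \le \Delta_k. \tag{3.13} \]

The approximate solution of (3.13) must provide a fraction of the Cauchy decrease in the approximate model $\hat m_k$, i.e.,

\[ \hat m_k(0) - \hat m_k(s_{u,k}) \ge \kappa_4 \|\hat g_k\|_{U_h^*} \min\{\kappa_5 \|\hat g_k\|_{U_h^*}, \kappa_6 \Delta_k\} \quad \forall k \in \mathbb{N}, \tag{3.14} \]

where $\kappa_4, \kappa_5, \kappa_6$ are positive constants independent of $k$ and the grid. The $y$-component of the tangential step is then given by

\[ s_{y,k}^t = -((C_y^h)_k)^{-1} (C_u^h)_k s_{u,k}. \tag{3.15} \]

Since we allow linear system solutions to be inexact, solving this equation approximately creates the residual

\[ r_k^t := (C_y^h)_k s_{y,k}^t + (C_u^h)_k s_{u,k}. \tag{3.16} \]

Accuracy conditions on the residual in this tangential equation are presented in the next section.

3.1.3. Derivation of the predicted decrease. To decide about the acceptance of the step we use the augmented Lagrangian merit function

\[ L^h(x,\lambda;\rho) := f(x) + \langle \lambda, C^h(x) \rangle_{V_h,V_h^*} + \rho \|C^h(x)\|_{V_h^*}^2. \]

The decision about the acceptance of the step and the update of the trust-region radius is then based on the ratio of the actual reduction

\[ \mathrm{ared}_k(s_k;\rho_k) := L^h(x_k,\lambda_k;\rho_k) - L^h(x_k + s_k,\lambda_{k+1};\rho_k) \]

and the predicted reduction based on the quadratic models in the quasi-normal and tangential step,

\[ \mathrm{pred}_k(s_k;\rho_k) = L^h(x_k,\lambda_k;\rho_k) - \Big( q_k^h(s_k) + \langle \Delta\lambda_k, (C_x^h)_k s_k + C_k^h \rangle_{V_h,V_h^*} + \rho_k \|(C_x^h)_k s_k + C_k^h\|_{V_h^*}^2 \Big), \]

where $\Delta\lambda_k = \lambda_{k+1} - \lambda_k$. Since we solve the linear system (3.15) for the $y$-component of the tangential step inexactly with residual $r_k^t = (C_y^h)_k s_{y,k}^t + (C_u^h)_k s_{u,k}$, we have $(C_x^h)_k s_k = (C_y^h)_k s_{y,k}^n + r_k^t$ for $s_k = s_k^n + s_k^t$, $s_k^t = (s_{y,k}^t, s_{u,k})$, and obtain

\[ \mathrm{pred}_k(s_k;\rho_k) = L^h(x_k,\lambda_k;\rho_k) - q_k^h(s_k) - \langle \Delta\lambda_k, C_k^h + (C_y^h)_k s_{y,k}^n \rangle_{V_h,V_h^*} - \langle \Delta\lambda_k, r_k^t \rangle_{V_h,V_h^*} - \rho_k \|C_k^h + (C_y^h)_k s_{y,k}^n + r_k^t\|_{V_h^*}^2. \]

Since $V^*$ is not necessarily a Hilbert space, we use the triangle inequality in the last summand, which reduces the predicted reduction:

\[ \begin{aligned} \mathrm{pred}_k(s_k;\rho_k) &\ge L^h(x_k,\lambda_k;\rho_k) - q_k^h(s_k) - \langle \Delta\lambda_k, C_k^h + (C_y^h)_k s_{y,k}^n \rangle_{V_h,V_h^*} - \langle \Delta\lambda_k, r_k^t \rangle_{V_h,V_h^*} \\ &\quad - \rho_k \big( \|C_k^h + (C_y^h)_k s_{y,k}^n\|_{V_h^*} + \|r_k^t\|_{V_h^*} \big)^2 \\ &= L^h(x_k,\lambda_k;\rho_k) - q_k^h(s_k) - \langle \Delta\lambda_k, C_k^h + (C_y^h)_k s_{y,k}^n \rangle_{V_h,V_h^*} - \langle \Delta\lambda_k, r_k^t \rangle_{V_h,V_h^*} \\ &\quad - \rho_k \big( \|C_k^h + (C_y^h)_k s_{y,k}^n\|_{V_h^*}^2 + 2 \|C_k^h + (C_y^h)_k s_{y,k}^n\|_{V_h^*} \|r_k^t\|_{V_h^*} + \|r_k^t\|_{V_h^*}^2 \big). \end{aligned} \]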
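The effect of the triangle inequality on the penalty term of the predicted reduction can be checked on a small Euclidean example. The sketch below is an illustration under assumptions, not part of the method: plain Python lists stand in for elements of $V_h^*$, the Euclidean norm stands in for the dual norm, and all names and numbers are hypothetical. It compares the penalty term $\rho\|c + r\|^2$ with its bound $\rho(\|c\| + \|r\|)^2$, where `c_lin` plays the role of $C_k^h + (C_y^h)_k s_{y,k}^n$ and `r` the role of $r_k^t$.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))

def penalty_term_exact(c_lin, r, rho):
    """rho * ||c_lin + r||^2, available when the norm comes from an inner product."""
    return rho * norm([a + b for a, b in zip(c_lin, r)]) ** 2

def penalty_term_bound(c_lin, r, rho):
    """Upper bound rho * (||c_lin|| + ||r||)^2 obtained from the triangle inequality."""
    return rho * (norm(c_lin) + norm(r)) ** 2

# Hypothetical data for the linearized constraint value and the tangential residual.
c_lin = [0.3, -0.1, 0.2]
r = [0.01, 0.02, -0.015]
rho = 10.0

exact = penalty_term_exact(c_lin, r, rho)
bound = penalty_term_bound(c_lin, r, rho)
gap = bound - exact  # amount by which the predicted reduction is lowered
print("exact:", exact, "bound:", bound, "gap:", gap)
```

The gap is never negative and vanishes when the residual is zero, which is why working with the bound only tightens the accuracy requirement on the residual.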

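Certainly, the right-hand side of this estimate for $\mathrm{pred}_k$ is not the same model of the actual reduction as before (only if $r_k^t = 0$). But since we reduced $\mathrm{pred}_k(s_k;\rho_k)$, this will only lead to a stronger requirement on the residual $r_k^t$. Note that

\[ L^h(x_k,\lambda_k;\rho_k) = q_k^h(0) + \hat q_k(0) - q_k^h(s_k^n) + \rho_k \|C_k^h\|_{V_h^*}^2. \]

Now, the quadratic model $q_k^h(s_k)$ of the Lagrangian is replaced by the approximate reduced quadratic model $\hat m_k(s_{u,k})$ and we define

\[ \begin{aligned} \mathrm{pred}_k(s_k^n, s_{u,k};\rho_k) := {} & \hat m_k(0) - \hat m_k(s_{u,k}) + q_k^h(0) - q_k^h(s_k^n) - \langle \Delta\lambda_k, C_k^h + (C_y^h)_k s_{y,k}^n \rangle_{V_h,V_h^*} \\ & + \rho_k \big( \|C_k^h\|_{V_h^*}^2 - \|C_k^h + (C_y^h)_k s_{y,k}^n\|_{V_h^*}^2 \big), \end{aligned} \tag{3.17} \]

and

\[ \mathrm{rpred}_k(r_k^t;\rho_k) := -\langle \Delta\lambda_k, r_k^t \rangle_{V_h,V_h^*} - \rho_k \|r_k^t\|_{V_h^*}^2 - 2 \rho_k \|r_k^t\|_{V_h^*} \|(C_y^h)_k s_{y,k}^n + C_k^h\|_{V_h^*}. \]

We now view $\mathrm{pred}_k(s_k^n, s_{u,k};\rho_k) + \mathrm{rpred}_k(r_k^t;\rho_k)$ as the (approximate) quadratic model of the actual reduction in the augmented Lagrangian.

Remark 3.2. If $V^*$ is a Hilbert space, then we obtain

\[ \begin{aligned} \|C_k^h + (C_x^h)_k s_k\|_{V^*}^2 &= \big( C_k^h + (C_y^h)_k s_{y,k}^n + r_k^t,\ C_k^h + (C_y^h)_k s_{y,k}^n + r_k^t \big)_{V^*} \\ &= \|C_k^h + (C_y^h)_k s_{y,k}^n\|_{V^*}^2 + \|r_k^t\|_{V^*}^2 + 2 \big( r_k^t,\ C_k^h + (C_y^h)_k s_{y,k}^n \big)_{V^*} \end{aligned} \]

and we can define $\mathrm{rpred}_k(r_k^t;\rho_k)$ more exactly as

\[ \mathrm{rpred}_k(r_k^t;\rho_k) := -\langle \Delta\lambda_k, r_k^t \rangle_{V,V^*} - \rho_k \Big( \|r_k^t\|_{V^*}^2 + 2 \big( r_k^t,\ C_k^h + (C_y^h)_k s_{y,k}^n \big)_{V^*} \Big), \]

which is larger than the $\mathrm{rpred}_k(r_k^t;\rho_k)$ defined above.

Nevertheless, step evaluations are performed based on $\mathrm{pred}_k(s_k^n, s_{u,k};\rho_k)$ only: if

\[ \frac{\mathrm{ared}_k(s_k;\rho_k)}{\mathrm{pred}_k(s_k^n, s_{u,k};\rho_k)} \ge \eta_1, \]

where $\eta_1 \in (0,1)$ is a given constant, then $s_k$ is accepted; otherwise $s_k$ is rejected and the trust region is reduced. As in [18], the conditions

\[ |\mathrm{rpred}_k(r_k^t;\rho_k)| \le \eta_0\, \mathrm{pred}_k(s_k^n, s_{u,k};\rho_k), \tag{3.18} \]

where $\eta_0 \in (0, 1 - \eta_1)$ is a given constant, and

\[ \|r_k^t\|_{V_h^*} \le \xi_3 \Delta_k^{1+p}, \tag{3.19} \]

for some constant $\xi_3 > 0$ independent of $k$ and given $p \in (0,1]$, ensure that the inexactness in the tangential step $s_{y,k}^t$ does not dominate the quadratic model. Inequality (3.18) is implied by

\[ \|r_k^t\|_{V_h^*} \le -\sigma_k + \sqrt{\sigma_k^2 + \eta_0\, \mathrm{pred}_k(s_k^n, s_{u,k};\rho_k)/\rho_k}, \tag{3.20} \]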

where $\sigma_k = \|(C_y^h)_k s_{y,k}^n + C_k^h\|_{V_h^*} + \|\Delta\lambda_k\|_V / (2\rho_k)$.

Remark 3.3. Only the size of $\mathrm{rpred}_k(r_k^t;\rho_k)$ is of interest, as seen in the estimates (3.18) and (3.19), and this size depends on the residual accuracy of an inexact solution of the tangential equation (3.15). Therefore the difference in the definitions of $\mathrm{rpred}_k(r_k^t;\rho_k)$, whether $V^*$ is a Hilbert space or not, is of no importance. Moreover, the acceptance of a trial step depends on the ratio $\mathrm{ared}_k/\mathrm{pred}_k$ and, thus, $\mathrm{rpred}_k(r_k^t;\rho_k)$ is of no importance for that decision.

3.1.4. Update of the penalty parameter. We choose the penalty parameter $\rho_k$ so large that for a given $\kappa \in (0,1)$ the inequality

\[ \mathrm{pred}_k(s_k^n, s_{u,k};\rho_k) \ge \kappa \big( \hat m_k(0) - \hat m_k(s_{u,k}) \big) + \frac{\rho_k}{2} \big( \|C_k^h\|_{V_h^*}^2 - \|(C_y^h)_k s_{y,k}^n + C_k^h\|_{V_h^*}^2 \big) \tag{3.21} \]

holds. Let $0 < \nu \le 1$ and $\kappa \in (0,1)$. If (3.21) is satisfied with $\rho_k = \rho_{k-1}$, then we set $\rho_k := \rho_{k-1}$. Otherwise, we choose the smallest $\rho_k \ge (1+\nu)\rho_{k-1}$ that satisfies (3.21).

3.1.5. Update of the trust-region radius. Let $0 < \alpha_0 \le \alpha_1 < \alpha_2$, let $0 < \eta_1 < \eta_2 < 1$ and let $0 < \Delta_{\min} \le \Delta_{\max}$. We choose the trust-region radius as follows:

\[ \Delta_{k+1} \in \begin{cases} [\alpha_0 \Delta_k,\ \alpha_1 \Delta_k], & \text{if } \mathrm{ared}_k/\mathrm{pred}_k < \eta_1, \\ [\max\{\Delta_{\min}, \alpha_1 \Delta_k\},\ \max\{\Delta_{\min}, \Delta_k\}], & \text{if } \mathrm{ared}_k/\mathrm{pred}_k \in [\eta_1, \eta_2), \\ [\max\{\Delta_{\min}, \Delta_k\},\ \min\{\max\{\Delta_{\min}, \alpha_2 \Delta_k\}, \Delta_{\max}\}], & \text{if } \mathrm{ared}_k/\mathrm{pred}_k \ge \eta_2. \end{cases} \]

3.1.6. Refinement of the grids. The main idea for refinement is to control the infinite-dimensional norms of the residuals in the infinite-dimensional optimality system by using the corresponding finite-dimensional norms and the (discrete) norm of the reduced gradient and the constraint. Thus, if the norm of the reduced gradient or the constraint is large enough compared to the infinite-dimensional counterparts, the current discretization will be good enough to compute sufficient descent. On the other hand, if the discrete norm of the reduced gradient and/or the constraint on the current grid are small compared to the continuous norms, one has to ensure by mesh refinement that the infinite-dimensional problem and, in particular, the infinite-dimensional reduced gradient are well represented in the current discretization such that reasonable steps can be computed.

Observe that the inexact reduced gradient $\hat g_k$ depends on the (inexact) state $y_k$ and the (inexact) adjoint $\lambda_k + \Delta\lambda_k$. Therefore, the residual norms of the infinite-dimensional state and adjoint equation must be controlled. Since these residual norms cannot be computed directly, we will use reliable error estimators instead. We will give brief motivations for the different refinement criteria before we state implementable versions using error estimators. Note that for Galerkin discretizations $V_h$ is the test function space corresponding to the discrete state space $Y_h$, and therefore a refinement of $Y_h$ implies a refinement of $V_h$ and vice versa.

Error control for the discrete state equation. To control the accuracy of the discrete state equation during optimization we refine the $Y_h$ and $V_h$ grid adaptively if necessary. As suggested above, we require the following convergence condition on the constraint:

\[ \|C(x_k)\|_{V^*} \le c_1 \|C_k^h\|_{V_h^*} + c_2 \|\hat g_k\|_{U_h^*} \quad \forall k \in \mathbb{N},\ x_k \in X_h, \tag{3.22} \]

with fixed arbitrary constants $c_1 > 1$ and $c_2 > 0$.

Remark 3.4. Note that this convergence condition for the constraint can only be applied after the computation of the approximate reduced gradient (3.10) and, thus, after the computation of the quasi-normal step. Since the discretized norms in $Y_h$ and $V_h^*$ change due to refinement, condition (3.2) needs to be checked for the prolongated $s_k^n$ after a refinement of the grids. Moreover, the dimension of $V_h$ affects the computation of the adjoint state and, thus, also the approximate reduced gradient. Consequently, condition (3.11) has to be reviewed. Hence, if the prolongated $s_k^n$ does not meet (3.2), then $s_k^n$ and $\hat g_k$ are recomputed on the refined grids, since the computation of $\hat g_k$ also depends on $s_k^n$. Otherwise, if the prolongated $s_k^n$ meets (3.2), then $\hat g_k$ only needs to be recomputed if (3.11) does not hold for the prolongated $\hat g_k$.

After the computation of a successful step on the current grid we need to verify that the next iterate is also well represented on the current grid. That is, the difference of the discrete norm and the infinite-dimensional norm of the constraint in the next iterate may not become much larger. Otherwise we may have no decrease in the infinite-dimensional augmented Lagrangian function while having decrease in the discrete augmented Lagrangian function $L^h$. In the convergence proofs we will see that it is enough to require that the descent in the tangential step dominates a worsening in the infinite-dimensional norm of the constraint:

\[ \mathrm{ared}_k(s_k;\rho_k) \ge (1+\delta)\rho_k \Big( \big( \|C(x_k + s_k)\|_{V^*}^2 - \|C^h(x_k + s_k)\|_{V_h^*}^2 \big) - \big( \|C(x_k)\|_{V^*}^2 - \|C^h(x_k)\|_{V_h^*}^2 \big) \Big) \tag{3.23} \]

with $0 < \delta \ll 1$. If criterion (3.23) is not satisfied, the $Y_h$ and $V_h$ grid need to be refined properly such that the next iterate can be represented well. Thus, we check after a successful step if the current discretization was suitable to compute sufficient descent. Hence, this criterion guarantees suitable (adaptive) refinements. Note that the two norm differences on the right-hand side of (3.23) are nonnegative, since $\|C(x)\|_{V_h^*} \le \|C(x)\|_{V^*}$. Moreover, if the grid is even better suited for the next iterate than for the current iterate, then the right-hand side of (3.23) is negative.

Generally, if one refines reasonably, criterion (3.23) is always satisfied and, therefore, does not need to be implemented. However, in the case where the grids are refined infinitely many times and the maximal mesh size tends to zero (if the algorithm stays on one grid after some refinements, convergence follows from finite-dimensional theory), condition (3.23) can be given in the following way. Assuming that

\[ \begin{aligned} \alpha(x_k) &:= \|C(x_k)\|_{V^*}^2 - \|C^h(x_k)\|_{V_h^*}^2 \to 0, \\ \alpha(x_k + s_k) &:= \|C(x_k + s_k)\|_{V^*}^2 - \|C^h(x_k + s_k)\|_{V_h^*}^2 \to 0 \end{aligned} \tag{3.24} \]

for $h = h(k) \to 0$ as $k \to \infty$, condition (3.23) can be formulated in a weaker version, which is easier to implement. If the last term on the right-hand side in (3.23) can be estimated by an estimator $\beta(x_k,s_k) > 0$ such that

\[ \alpha(x_k + s_k) - \alpha(x_k) = K(k,h)\, \beta(x_k,s_k) \tag{3.25} \]

with unknown constants satisfying $1/K \le K(k,h) \le K$ for all $k \in \mathbb{N}$ and some fixed $K > 0$, then it suffices to verify the following criterion:

\[ \mathrm{ared}_k(s_k;\rho_k) \ge \xi \rho_k\, \beta(x_k,s_k)^\omega \tag{3.26} \]

for fixed $\omega \in (0,1)$ and $\xi > 0$. In fact, assumption (3.24) together with (3.25) and the uniform boundedness of $K(k,h)$ from below and above guarantees that $\beta(x_k,s_k) \le ((1+\delta)K/\xi)^{-1/(1-\omega)}$ for $k$ large enough, which implies

\[ (1+\delta)\big( \alpha(x_k + s_k) - \alpha(x_k) \big) = (1+\delta) K(k,h)\, \beta(x_k,s_k) \le (1+\delta) K \beta(x_k,s_k) \le \xi \beta(x_k,s_k)^\omega \]

and consequently (3.23). This way one does not need to know the constants $K(k,h)$. If the algorithm does not terminate after finitely many iterations and if the problem is well conditioned in such a way that (3.24) holds, then after finitely many iterations and refinements (3.26) implies (3.23), which suffices for the convergence proof.

An alternative criterion to (3.23) is the condition

\[ \sum_{k=0}^{\infty} \Big( \|C(x_{k+1})\|_{V_{h(k+1)}^*} - \|C(x_{k+1})\|_{V_{h(k)}^*} \Big) < \infty, \tag{3.27} \]

which originates from the jumps in the differences of the norms of the constraint due to refinement of the meshes; these jumps shall be summable. Nevertheless, the convergence proof is given for criterion (3.23). A convergence proof using condition (3.27) instead of (3.23) in the algorithm is very similar. Only a few details in the proof of Theorem 4.14 need to be adapted.

Error control for the reduced gradient and the discrete adjoint equation. To control the quality of the discrete adjoint equation and the discrete reduced gradient during the optimization iteration we have to refine the $U_h$ grid and the $Y_h$ and $V_h$ grid, respectively, if necessary.

To control the error in the first optimality condition $D_y l(x,\lambda) = 0$, i.e., the adjoint equation, we apply a similar criterion as for the state equation constraint. We use $\lambda_{k+1} = \lambda_k + \Delta\lambda_k$ as inexact solution of the discrete adjoint equation $l_y^h(x_k,\lambda_{k+1}) = 0$ with $\Delta\lambda_k$ from (3.8) and (3.9). Note that using the computation rule of $\Delta\lambda_k$ and Lemma 4.1 together with our boundedness assumptions it is easy to show that

\[ \|l_y^h(x_k,\lambda_{k+1})\|_{Y_h^*} \le \xi_3 \|\hat g_k\|_{U_h^*} + \xi_4 \|C_k^h\|_{V_h^*} \tag{3.28} \]

for some $\xi_3, \xi_4 > 0$. This justifies the choice of $\lambda_{k+1}$ as inexact discrete adjoint state, since $\|\hat g_k\|_{U_h^*}$ and $\|C_k^h\|_{V_h^*}$ tend to zero during the optimization. Thus, we require the following convergence condition on the adjoint equation:

\[ \|l_y(x_k,\lambda_{k+1})\|_{Y^*} \le c_1 \|l_y^h(x_k,\lambda_{k+1})\|_{Y_h^*} + c_2 \big( \|\hat g_k\|_{U_h^*} + \|C_k^h\|_{V_h^*} \big) \tag{3.29} \]

with fixed arbitrary constants $c_1 > 1$ and $c_2 > 0$.

For given $Y_h$ and $V_h$ it is often easily possible to choose $U_h \subset U$ in such a way that

\[ \|l_u(x_k,\lambda_k)\|_{U^*} = \|l_u^h(x_k,\lambda_k)\|_{U_h^*}. \tag{3.30} \]

In this case the refinement of $V_h$ implies the refinement of $U_h$ and there is no additional criterion necessary for refining the control space.

Example 3.5. Consider again the problem (Problem 7.1 in section 7.1)

\[ \min_{y \in H_0^1(\Omega),\, u \in L^2(\Omega)} f(y,u) := \frac{1}{2} \|y - y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2} \|u\|_{L^2(\Omega)}^2 \quad \text{s.t.} \quad -\Delta y + y^3 = u \ \text{in } \Omega, \quad y = 0 \ \text{on } \partial\Omega, \]

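where $\Omega \subset \mathbb{R}^2$ is a polygonal domain and $\alpha > 0$. Then

\[ \langle l_u(x_k,\lambda_k), w_u \rangle_{U^*,U} = (\alpha u_k, w_u)_{L^2(\Omega)} - (\lambda_k, w_u)_{L^2(\Omega)} \quad \forall w_u \in U. \]

Therefore, if we choose $U_h = V_h \subset V \subset U$, then $\alpha u_k - \lambda_k \in U_h \subset U$ is the Riesz representation of $l_u(x_k,\lambda_k)$ in $U_h^*$ as well as in $U^*$ and therefore

\[ \|l_u(x_k,\lambda_k)\|_{U_h^*} = \|\alpha u_k - \lambda_k\|_{L^2(\Omega)} = \|l_u(x_k,\lambda_k)\|_{U^*}. \]

On the other hand, if (3.30) does not hold, then we require that the discretization of the control space meets the following accuracy condition:

\[ \|l_u(x_k,\lambda_{k+1})\|_{U^*} \le c_1 \|l_u^h(x_k,\lambda_{k+1})\|_{U_h^*} + c_2 \big( \|\hat g_k\|_{U_h^*} + \|C_k^h\|_{V_h^*} \big), \tag{3.31} \]

with fixed arbitrary constants $c_1 > 1$ and $c_2 > 0$. Note that using Lemma 4.1 together with our boundedness assumptions it is easy to show that

\[ \|l_u^h(x_k,\lambda_{k+1})\|_{U_h^*} \le \xi_5 \|\hat g_k\|_{U_h^*} + \xi_6 \|C_k^h\|_{V_h^*} \tag{3.32} \]

for some $\xi_5, \xi_6 > 0$.

Remark 3.6. Note that after a refinement of the $Y_h$ and $V_h$ grid for the adjoint the discretized norms in $Y_h$ and $V_h^*$ change. Thus, condition (3.2) is not necessarily satisfied for the prolongated $s_k^n$ that was computed on a coarser grid. Hence, the quasi-normal step possibly needs to be recomputed. In any case, the inexact reduced gradient $\hat g_k$ is recomputed.

Implementation of the refinement criteria with error estimators. As derived above, we need to implement the following refinement criteria with fixed arbitrary constants $c_i > 1$, $k_i > 0$, $i = 1, 2, 3$:

\[ \begin{aligned} \|C(x_k)\|_{V^*} &\le c_1 \|C_k^h\|_{V_h^*} + k_1 \|\hat g_k\|_{U_h^*}, \\ \|l_y(x_k,\lambda_{k+1})\|_{Y^*} &\le c_2 \|l_y^h(x_k,\lambda_{k+1})\|_{Y_h^*} + k_2 \big( \|C_k^h\|_{V_h^*} + \|\hat g_k\|_{U_h^*} \big), \\ \|l_u(x_k,\lambda_{k+1})\|_{U^*} &\le c_3 \|l_u^h(x_k,\lambda_{k+1})\|_{U_h^*} + k_3 \big( \|C_k^h\|_{V_h^*} + \|\hat g_k\|_{U_h^*} \big). \end{aligned} \]

In general, infinite-dimensional norms cannot be computed. Therefore, we assume that we have reliable error estimators $\eta_{C,h}$, $\eta_{l_y,h}$, $\eta_{l_u,h}$ with

\[ \begin{aligned} \|C(y_h,u_h)\|_{V^*} &\le C_1\, \eta_{C,h}(y_h,u_h) + C_2 \|C^h(y_h,u_h)\|_{V_h^*}, && (3.33\text{a}) \\ \|l_y(y_h,u_h,\lambda_h)\|_{Y^*} &\le C_3\, \eta_{l_y,h}(y_h,u_h,\lambda_h) + C_4 \|l_y^h(y_h,u_h,\lambda_h)\|_{Y_h^*}, && (3.33\text{b}) \\ \|l_u(y_h,u_h,\lambda_h)\|_{U^*} &\le C_5\, \eta_{l_u,h}(y_h,u_h,\lambda_h) + C_6 \|l_u^h(y_h,u_h,\lambda_h)\|_{U_h^*}, && (3.33\text{c}) \end{aligned} \]

with unknown, bounded constants $C_i > 0$, $i = 1, \dots, 6$, in such a way that $\eta_h \to 0$ as $h \to 0$ for fixed $(y,u,\lambda)$. Such error estimators can be developed using the same techniques as for well-known error estimators in the presence of exact discrete states. Examples for suitable residual-based and averaging error estimators as well as their derivation will be shown in section 6.

Remark 3.7. For all error estimators in section 6 one can show that $\eta_h \to 0$ as $h \to 0$, i.e., if the maximal mesh size tends to zero. Therefore, the convergence conditions (3.34) can always be satisfied by sufficient refinement.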

Now we insert the error estimator inequalities (3.33) into the criteria given above. Moreover, an algorithm will terminate for a given stopping tolerance $\varepsilon_{tol} > 0$. Since the norms of the reduced gradient and the constraint may become much smaller than the prescribed stopping tolerance in one (last) iteration, we also include $\varepsilon_{tol}$ in the refinement formulas. Thus, we obtain the following implementable sufficient refinement criteria: Check for arbitrary fixed constants $c_i > 0$, $i = 1, \dots, 9$, if

\[ \eta_{C,h}(x_k) \le \max\bigl\{ c_1 \|C_h(x_k)\|_{V_h^*} + c_2 \|\hat g_k\|_{U_h},\; c_3\, \varepsilon_{tol} \bigr\}, \tag{3.34a} \]
\[ \eta_{l_y,h}(x_k, \lambda_{k+1}) \le \max\bigl\{ c_4 \|l_y(x_k, \lambda_{k+1})\|_{Y_h^*} + c_5 \bigl( \|C_h(x_k)\|_{V_h^*} + \|\hat g_k\|_{U_h} \bigr),\; c_6\, \varepsilon_{tol} \bigr\}, \tag{3.34b} \]
\[ \eta_{l_u,h}(x_k, \lambda_{k+1}) \le \max\bigl\{ c_7 \|l_u(x_k, \lambda_{k+1})\|_{U_h^*} + c_8 \bigl( \|C_h(x_k)\|_{V_h^*} + \|\hat g_k\|_{U_h} \bigr),\; c_9\, \varepsilon_{tol} \bigr\}. \tag{3.34c} \]

Otherwise refine the grids for $Y_h$, $V_h$, $U_h$, prolongate the functions and recompute the affected data.

Remark 3.8. With the choice of $c_3$, $c_6$, $c_9$ a different quality for the state and the adjoint state than for the norms of the reduced gradient and the constraint can be achieved in the stopping criterion of the algorithm. This is in particular of interest when (depending on the PDE or the domain) an approximate size of the error estimators on fine meshes (larger than $\varepsilon_{tol}$) is known. Note that $c_1$ and $c_2$ directly affect how soon meshes are refined.

Criterion (3.23) can be implemented in the form of (3.26) in the following way. We assume that we have an error estimator as in (3.33a),

\[ \|C(y_h, u_h)\|_{V^*} \le C_1\, \eta_{C,h} + C_2\, \|C_h(y_h, u_h)\|_{V_h^*} \]

with $C_2 = C_2(h) \to 1$ as $h \to 0$, where $\eta_{C,h}$ is an efficient and reliable error estimator in the presence of exact discrete states. Squaring the right-hand side with $C_2 = 1$ and subtracting $\|C_h(y_h,u_h)\|_{V_h^*}^2$ leaves $C_1 \eta_{C,h} \bigl( C_1 \eta_{C,h} + 2 \|C_h(y_h, u_h)\|_{V_h^*} \bigr)$, which may therefore be seen as a good numerical approximation of $\|C(x_h)\|_{V^*}^2 - \|C_h(x_h)\|_{V_h^*}^2$ for some bounded constant $C_1$. Therefore, we can consider

\[ \|C(x_k)\|_{V^*}^2 - \|C_h(x_k)\|_{V_h^*}^2 = K(h,k)\, \eta_{C,h}(x_k) \bigl( C_1 \eta_{C,h}(x_k) + 2 \|C_h(x_k)\|_{V_h^*} \bigr), \]
\[ \|C(x_{k+1})\|_{V^*}^2 - \|C_h(x_{k+1})\|_{V_h^*}^2 = K(h,k+1)\, \eta_{C,h}(x_{k+1}) \bigl( C_1 \eta_{C,h}(x_{k+1}) + 2 \|C_h(x_{k+1})\|_{V_h^*} \bigr) \]

as residual estimator in the norm differences of the constraint with bounded constants $1/K \le K(h,k) \le K$ for some $K > 0$. Then $\beta(x_k, s_k)$ in (3.25) can be computed as

\[ \beta(x_k, s_k) = \eta_{C,h}(x_{k+1}) \bigl( C_1 \eta_{C,h}(x_{k+1}) + 2 \|C_h(x_{k+1})\|_{V_h^*} \bigr) - \eta_{C,h}(x_k) \bigl( C_1 \eta_{C,h}(x_k) + 2 \|C_h(x_k)\|_{V_h^*} \bigr) \tag{3.35} \]

with an appropriate choice of $C_1$. Thus, condition (3.26) is implementable in a heuristic version. Note that using a residual based or averaging error estimator $\eta_{C,h}$, condition (3.26) with $\beta(x_k, s_k)$ as in (3.35) still contains the important geometrical meaning that the current grid must be good enough to compute and represent the next iterate.
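The refinement test (3.34a)-(3.34c) amounts to three scalar comparisons once the estimator values and discrete norms are available. The following is a minimal sketch, not the authors' implementation: the function name `refinement_flags`, the dictionary keys, and the grouping of the constants $c_5$ and $c_8$ with the sum of constraint and gradient norms are assumptions made for illustration.

```python
def refinement_flags(eta, norms, c, eps_tol):
    """Report which of the criteria (3.34a)-(3.34c) are violated.

    eta     -- estimator values with (hypothetical) keys 'C', 'ly', 'lu'
    norms   -- discrete norms with (hypothetical) keys 'C', 'g', 'ly', 'lu'
    c       -- the nine positive constants c_1, ..., c_9
    eps_tol -- stopping tolerance
    A value True means the corresponding grid should be refined.
    """
    c1, c2, c3, c4, c5, c6, c7, c8, c9 = c
    # Sum of constraint norm and reduced-gradient norm, used in (3.34b), (3.34c).
    base = norms["C"] + norms["g"]
    bound_state = max(c1 * norms["C"] + c2 * norms["g"], c3 * eps_tol)
    bound_adjoint = max(c4 * norms["ly"] + c5 * base, c6 * eps_tol)
    bound_control = max(c7 * norms["lu"] + c8 * base, c9 * eps_tol)
    return {
        "state": eta["C"] > bound_state,        # (3.34a) violated
        "adjoint": eta["ly"] > bound_adjoint,   # (3.34b) violated
        "control": eta["lu"] > bound_control,   # (3.34c) violated
    }


if __name__ == "__main__":
    flags = refinement_flags(
        eta={"C": 0.5, "ly": 0.01, "lu": 0.01},
        norms={"C": 0.1, "g": 0.1, "ly": 0.1, "lu": 0.1},
        c=[1.0] * 9,
        eps_tol=1e-6,
    )
    print(flags)
```

Including $\varepsilon_{tol}$ inside the maximum keeps the test from demanding further refinement once both norms have dropped far below the stopping tolerance.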

Local refinement strategy. The local refinement strategy is based on elementwise contributions to the error estimators

\[ \eta_{C,h}(\cdot) = \Bigl( \sum_{T \in \mathcal T} \eta_{C,h,T}^2(\cdot) \Bigr)^{1/2}, \qquad \eta_{l_y,h}(\cdot) = \Bigl( \sum_{T \in \mathcal T} \eta_{l_y,h,T}^2(\cdot) \Bigr)^{1/2}. \]

Examples for suitable error estimators will be discussed in section 6. There exist many local refinement strategies to select elements for refinement. Typical examples are refining the $p\%$ elements with largest local errors $\eta_{C,h,T}(\cdot)$ or $\eta_{l_y,h,T}(\cdot)$, respectively, or refining where the local contribution to the error estimator is larger than $p\%$ of the largest local error.

3.2. Multilevel trust region composite step SQP algorithm. In this section we state the common assumptions which are necessary for the convergence theory and our multilevel algorithm.

3.2.1. Assumptions. Our convergence theory requires the set of assumptions given below. For all iterations $k$ we assume that $x_k, x_k + s_k \in D$, where $D$ is an open, convex subset of $X$.

A1. The functionals $f$, $C$ are twice continuously Fréchet differentiable in $D$.
A2. The partial Jacobians $(C_h)_y(x)$ and $C_y(x)$ have an inverse for all $x \in D$.
A3. The functionals and operators $f$, $f_x$, $f_{xx}$, $C$, $C_x$, $C_{xx}$ are bounded in $D$. The operators $(C_h)_y(x)^{-1}$ are uniformly bounded in $D$.
A4. The sequences $\{H_k\}$, $\{W_k\}$ and $\{\lambda_k\}$ are bounded.

We will use the notation $B_A$ in the convergence theory as a bound for the norm $\|A\|$ of any quantity $A$ that is bounded by the assumptions.

3.2.2. Multilevel trust region composite step SQP algorithm. We are now in the position to state the complete algorithm.

Algorithm 3.9 (Multilevel trust region composite step SQP algorithm).

S0 Initialization: Choose $\kappa \in (0,1)$, $0 < \nu \le 1$, $p \in (0,1]$, $\rho_{-1} \ge 1$, $\varepsilon_{tol} > 0$, $0 < \alpha_0 \le \alpha_1 < 1 < \alpha_2$, $0 < \eta_1 < \eta_2 < 1$, $0 < \Delta_{\min} \le \Delta_{\max}$, $0 < \eta_0 < 1 - \eta_1$, $c_1 \ge 1$, $c_2, c_3 > 0$, a starting grid denoted by index $h$, $x_0 \in X_h$, $\lambda_0 \in V_h$ and $\Delta_0 \in [\Delta_{\min}, \Delta_{\max}]$.

For $k = 0, 1, 2, \dots$

S1 Compute a quasi-normal step $s_k^n$ as inexact solution of (3.1) satisfying (3.2).

S2 Compute an inexact adjoint state $\lambda_{k+1} := \lambda_k + \Delta\lambda_k$ by (3.8) satisfying (3.9) and the inexact reduced gradient $\hat g_k$ by (3.10).

S3 If the refinement conditions (3.34b) and (3.34c) for the adjoint equation and the control gradient hold, then go to S4. Otherwise refine the $U$ grid and the $Y$ and $V$ grid (adaptively) and, if (3.2) is satisfied for the prolongated $s_k^n$, then go to S2, otherwise go to S1.

S4 If the refinement condition (3.34a) for the state equation holds, then go to S5. Otherwise refine the $Y$ and $V$ grid (adaptively) until (3.34a) is satisfied. If (3.2) and (3.11) hold for the prolongated $s_k^n$ and $\hat g_k$, then go to S5. If (3.11) is not satisfied for the prolongated $\hat g_k$, then go to S2. But if (3.2) is also not satisfied for the prolongated $s_k^n$, then go to S1.

S5 If $\|C_k\|_{V_h^*} \le \varepsilon_{tol}$ and $\|\hat g_k\|_{U_h} \le \varepsilon_{tol}$, then stop and return $x_k = (y_k, u_k)$ as an approximate solution for problem (1.1).

S6 Compute $s_{u,k}$ as inexact solution of (3.13) satisfying (3.14).

S7 Update the penalty parameter according to subsection 3.1.4.

S8 Compute $s^t_{y,k}$ such that the residual $r_k^t$ satisfies (3.18) and (3.19).

S9 Compute $\mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k)$ using (3.17). Update the trust region radius according to subsection 3.1.5. If $\mathrm{ared}_h(s_k; \rho_k) / \mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k) < \eta_1$, then reject $s_k$ and go back to S1 with $x_k$ and $\lambda_k$, else go to S10.

S10 If (3.23) is satisfied, then accept $s_k$ and go to S1 with $x_{k+1} = x_k + s_k$ and $\lambda_{k+1}$. Otherwise reject $s_k$, refine the $Y$ and $V$ grid properly and go back to S1 with $x_k$ and $\lambda_k$.

Remark. 1. For the convergence theory we need the Lagrange multipliers to be bounded. As stated in the algorithm above we use the adjoint states $\lambda_k$ as Lagrange multipliers in the Lagrangian function $l$. If the sequence of adjoint states is not bounded, one can distinguish between adjoint states and (different) bounded Lagrange multipliers.
2. Generally, if one refines reasonably, criterion (3.23) is always satisfied and, therefore, S10 does not need to be implemented.

4. Convergence Analysis. Let assumptions A1-A4 hold throughout the section.

4.1. Auxiliary estimates. We start with several technical lemmas.

Lemma 4.1. There exists $\kappa_3 > 0$ such that for all steps $s_k^n$ generated by the algorithm the inequality $\|s_k^n\|_X \le \kappa_3 \|C_k\|_{V_h^*}$ holds.

Proof. This is an immediate consequence of $\|(C_y)_k s^n_{y,k} + C_k\|_{V_h^*} \le \|C_k\|_{V_h^*}$ and the boundedness of $(C_y)_k^{-1}$.

Lemma 4.2. There exists $B > 0$ such that for all steps $s_k$ generated by the algorithm the inequality $\|s_k\|_X \le B \Delta_k$ holds.

Proof. Using $\|s_k^n\|_Y \le \Delta_k$, $\|s_{u,k}\|_U \le \Delta_k$, and $\Delta_k \le \Delta_{\max}$ together with the definition (3.16) of $r_k^t$, (3.19) and the boundedness of $(C_y)_k^{-1} (C_u)_k$, we obtain the desired result.

Lemma 4.3. There exists $c > 0$ independent of the grid such that

\[ \bigl| -l(x_{k+1}, \lambda_k) + q_k(s_k) \bigr| \le c\, \Delta_k^2. \]

Proof. By the definition (3.3) of $q_k$, a Taylor expansion of $l(x_{k+1}, \lambda_k)$ and Lemma 4.2 yield the desired result.

Lemma 4.4. There exists $c > 0$ independent of the grid such that

\[ \bigl| -q_k(s_k) + \hat q_k(s_{u,k}) \bigr| \le c\, \Delta_k^{1+p} \]

for $q_k$ and $\hat q_k$ in (3.3), (3.6).

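Proof. Recall that, by the definition (3.16) of $r_k^t$,

\[ s_k^t = \begin{pmatrix} (C_y)_k^{-1} r_k^t \\ 0 \end{pmatrix} + W_k s_{u,k}. \]

Using the definitions of $q_k$ and $\hat q_k$ in (3.3), (3.6) along with the above equality, we find that

\[ \begin{aligned} \bigl| -q_k(s_k) + \hat q_k(s_{u,k}) \bigr| &= \bigl| \langle H_k s_k^n + l_x(x_k, \lambda_k), W_k s_{u,k} \rangle - \langle l_x(x_k, \lambda_k), s_k^t \rangle - \langle s_k^n, H_k s_k^t \rangle - \tfrac12 \langle s_k^t, H_k s_k^t \rangle + \tfrac12 \langle W_k s_{u,k}, H_k W_k s_{u,k} \rangle \bigr| \\ &= \bigl| \langle H_k s_k^n + l_x(x_k, \lambda_k), W_k s_{u,k} - s_k^t \rangle - \tfrac12 \langle s_k^t, H_k s_k^t \rangle + \tfrac12 \langle W_k s_{u,k}, H_k W_k s_{u,k} \rangle \bigr| \\ &\le \bigl( B_H \|s_k^n\|_X + B_{l'} \bigr) \|W_k s_{u,k} - s_k^t\|_X + \tfrac12 B_H \|s_k^t\|_X^2 + \tfrac12 B_H B_W^2 \|s_{u,k}\|_U^2. \end{aligned} \]

Note that

\[ \|s_k^t\|_X \le \|s_k^t - W_k s_{u,k}\|_X + \|W_k s_{u,k}\|_X \le \|(C_y)_k^{-1}\|_{L(V^*,Y)} \|r_k^t\|_{V^*} + B_W \|s_{u,k}\|_U \le B_{C_y^{-1}}\, \xi_3\, \Delta_k^{1+p} + B_W \|s_{u,k}\|_U, \]

where we have used (3.19). Hence, we obtain by using $\Delta_k \le \Delta_{\max}$

\[ \begin{aligned} \bigl| -q_k(s_k) + \hat q_k(s_{u,k}) \bigr| &\le \bigl( B_H \|s_k^n\|_X + B_{l'} \bigr) \|(C_y)_k^{-1} r_k^t\|_X + \tfrac12 B_H \|s_k^t\|_X^2 + \tfrac12 B_H B_W^2 \|s_{u,k}\|_U^2 \\ &\le \bigl( B_H \|s_k^n\|_X + B_{l'} \bigr) B_{C_y^{-1}}\, \xi_3\, \Delta_k^{1+p} + \tfrac12 B_H \bigl( B_{C_y^{-1}}\, \xi_3\, \Delta_k^{1+p} + B_W \|s_{u,k}\|_U \bigr)^2 + \tfrac12 B_H B_W^2 \|s_{u,k}\|_U^2 \\ &\le C\, \Delta_k^{1+p} \end{aligned} \]

with some constant $C$, and the proof is complete.

Lemma 4.5. There exists $c > 0$ independent of the grid such that

\[ \bigl| \langle \hat g_k - W_k^* \nabla q_k(s_k^n), s_{u,k} \rangle + \tfrac12 \langle s_{u,k}, \hat H_k s_{u,k} \rangle - \tfrac12 \langle s_{u,k}, W_k^* H_k W_k s_{u,k} \rangle \bigr| \le c\, \Delta_k^{1+p}. \]

Proof. This is an immediate consequence of (3.11), (3.12), $\Delta_k \le \Delta_{\max}$ and the assumptions on the boundedness.

Lemma 4.6. There exists $c > 0$ independent of the grid such that

\[ \bigl| \langle \Delta\lambda_k, -C_{k+1} + (C_x)_k s_k + C_k \rangle \bigr| \le c\, \Delta_k^2. \]

Proof. A Taylor expansion for the constraint together with the boundedness of the Lagrange multipliers yields the desired result.

Remark 4.7. For any norm $\|\cdot\|$ on a vector space $Z$ and $a, b, c \in Z$ the following inequality holds:

\[ \bigl| \|a\|^2 - (\|b\| + \|c\|)^2 \bigr| = \bigl| \|a\| - \|b\| - \|c\| \bigr| \bigl( \|a\| + \|b\| + \|c\| \bigr) \le \bigl( \|a - b\| + \|c\| \bigr) \bigl( \|a\| + \|b\| + \|c\| \bigr). \]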
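The inequality of Remark 4.7 follows in two short steps. The derivation below spells them out; it uses only the factorization of a difference of squares and the (reverse) triangle inequality, and the symbols $a$, $b$, $c$ are those of the remark.

```latex
\begin{align*}
\bigl|\,\|a\|^2 - (\|b\| + \|c\|)^2\,\bigr|
  &= \bigl|\,\|a\| - \|b\| - \|c\|\,\bigr|\,
     \bigl(\|a\| + \|b\| + \|c\|\bigr), \\
\bigl|\,\|a\| - \|b\| - \|c\|\,\bigr|
  &\le \bigl|\,\|a\| - \|b\|\,\bigr| + \|c\|
   \le \|a - b\| + \|c\| .
\end{align*}
```

Multiplying the second line by the nonnegative factor $\|a\| + \|b\| + \|c\|$ gives the stated bound.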

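Lemma 4.8. There exist $c_1, c_2 > 0$ independent of the grid such that

\[ \rho_k \bigl| \|C_{k+1}\|_{V_h^*}^2 - \bigl( \|C_k + (C_y)_k s^n_{y,k}\|_{V_h^*} + \|r_k^t\|_{V_h^*} \bigr)^2 \bigr| \le \rho_k c_1 \Delta_k^{1+p} \|C_k\|_{V_h^*} + \rho_k c_2 \Delta_k^{2+p}. \]

Proof. In view of Remark 4.7 we estimate as follows:

\[ \bigl| \|C_{k+1}\|_{V_h^*}^2 - \bigl( \|C_k + (C_y)_k s^n_{y,k}\|_{V_h^*} + \|r_k^t\|_{V_h^*} \bigr)^2 \bigr| \le \bigl[ \|C_{k+1} - C_k - (C_y)_k s^n_{y,k}\|_{V_h^*} + \|r_k^t\|_{V_h^*} \bigr] \cdot \bigl[ \|C_{k+1}\|_{V_h^*} + \|C_k + (C_y)_k s^n_{y,k}\|_{V_h^*} + \|r_k^t\|_{V_h^*} \bigr] =: [A] \cdot [B]. \]

First, we estimate $[A]$ by using Taylor expansion, $t \in [0,1]$, and (3.19). This yields with $\Delta_k \le \Delta_{\max}$

\[ \begin{aligned} [A] &= \bigl\| C_k + (C_x)_k s_k + \tfrac12 C_{xx}(x_k + t s_k)[s_k, s_k] - C_k - (C_y)_k s^n_{y,k} \bigr\|_{V_h^*} + \|r_k^t\|_{V_h^*} \\ &\le \|(C_x)_k s_k - (C_y)_k s^n_{y,k}\|_{V_h^*} + B_{D^2 C} \|s_k\|_X^2 + \|r_k^t\|_{V_h^*} \\ &= \|(C_x)_k s_k^t\|_{V_h^*} + B_{D^2 C} \|s_k\|_X^2 + \|r_k^t\|_{V_h^*} \\ &= \|r_k^t\|_{V_h^*} + B_{D^2 C} \|s_k\|_X^2 + \|r_k^t\|_{V_h^*} \le C\, \Delta_k^{1+p} \end{aligned} \]

for some $C > 0$. Now, we estimate $[B]$. We have

\[ \|C_k + (C_y)_k s^n_{y,k}\|_{V_h^*} \le \|C_k\|_{V_h^*} + B_{C_y} \|s^n_{y,k}\|_Y \le \|C_k\|_{V_h^*} + c\, \Delta_k \]

for some $c > 0$ and, by using Lemma 4.2,

\[ \|C_{k+1}\|_{V_h^*} = \|C_k + C_x(x_k + \tau s_k) s_k\|_{V_h^*} \le \|C_k\|_{V_h^*} + B_{C_x} \|s_k\|_X \le \|C_k\|_{V_h^*} + B_{C_x} B\, \Delta_k \]

for some $\tau \in [0,1]$. Thus, we obtain $[B] \le 2 \|C_k\|_{V_h^*} + c\, \Delta_k$ for some $c > 0$. The estimates on $[A]$ and $[B]$ together imply

\[ [A] \cdot [B] \le c_1 \Delta_k^{1+p} \|C_k\|_{V_h^*} + c_2 \Delta_k^{2+p} \]

for some $c_1, c_2 > 0$, which yields the desired result.

Lemma 4.9. There exist $K_0, K_1, K_2 > 0$ independent of the grid such that

\[ \bigl| \mathrm{ared}_h(s_k; \rho_k) - \mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k) - \mathrm{rpred}_h(r_k^t; \rho_k) \bigr| \le K_0 \Delta_k^{1+p} + \rho_k K_1 \Delta_k^{1+p} \|C_k\|_{V_h^*} + \rho_k K_2 \Delta_k^{2+p}. \tag{4.1} \]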

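Proof. Using the definitions of $\mathrm{ared}_h$, $\mathrm{pred}_h$, $\mathrm{rpred}_h$, $q_k$, $\hat q_k$ and $\hat m_k$ and some simple transformations we obtain

\[ \begin{aligned} \mathrm{ared}_h(s_k; \rho_k) &- \mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k) - \mathrm{rpred}_h(r_k^t; \rho_k) \\ &= \bigl[ -l(x_{k+1}, \lambda_k) + q_k(s_k) \bigr] + \bigl[ -q_k(s_k) + \hat q_k(s_{u,k}) \bigr] \\ &\quad + \langle \hat g_k - W_k^* \nabla q_k(s_k^n), s_{u,k} \rangle + \tfrac12 \langle s_{u,k}, \hat H_k s_{u,k} \rangle - \tfrac12 \langle s_{u,k}, W_k^* H_k W_k s_{u,k} \rangle \\ &\quad + \langle \Delta\lambda_k, -C_{k+1} + (C_x)_k s_k + C_k \rangle \\ &\quad - \rho_k \Bigl( \|C_{k+1}\|_{V_h^*}^2 - \bigl( \|C_k + (C_y)_k s^n_{y,k}\|_{V_h^*} + \|r_k^t\|_{V_h^*} \bigr)^2 \Bigr). \end{aligned} \tag{4.2} \]

The asserted estimate follows now from the triangle inequality together with Lemmas 4.3, 4.4, 4.5, 4.6, 4.8 and $\Delta_k \le \Delta_{\max}$.

4.2. Acceptance of steps. We show now that there will always be a successful step on a fixed grid after finitely many iterations. Together with Remark 3.7, which states that the refinement conditions (3.34) can always be satisfied by sufficient refinement, this shows that the algorithm is well defined. We start with an auxiliary lemma.

Lemma 4.10. Let $\Delta_k \le \min\bigl\{ \delta, (\delta \|C_k\|_{V_h^*})^{2/(2+p)} \bigr\}$ with $0 < \delta < \min\{B_C^{-1}, 1\}$. Then the following inequalities hold:
(i) $\|C_k\|_{V_h^*} \Delta_k^{1+p} \le \delta^p \|C_k\|_{V_h^*} \min\{\Delta_k, \|C_k\|_{V_h^*}\}$,
(ii) $\Delta_k^{2+p} \le \delta \|C_k\|_{V_h^*} \min\{\Delta_k, \|C_k\|_{V_h^*}\}$.

Proof. These estimates follow quite directly from the assumptions.

Lemma 4.11. Let $\varepsilon > 0$. Then there exists a constant $\delta > 0$ which depends on $\varepsilon$ but not on $\|C_k\|_{V_h^*}$ such that if

\[ \Delta_k \le \min\Bigl\{ \delta, \max\Bigl\{ (\delta \|C_k\|_{V_h^*})^{2/(2+p)}, \Bigl( \frac{\delta}{\rho_k} \Bigr)^{1/p} \Bigr\} \Bigr\} \quad \text{for } \|C_k\|_{V_h^*} + \|\hat g_k\|_{U_h} \ge \varepsilon, \]

then

\[ \Bigl| \frac{\mathrm{ared}_h(s_k; \rho_k)}{\mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k)} - 1 \Bigr| \le 1 - \eta_1; \]

in particular, the step $s_k$ will be accepted and $\Delta_{k+1} \ge \Delta_k$.

Proof. Using the triangle inequality and (3.18) we see that

\[ \Bigl| \frac{\mathrm{ared}_h(s_k; \rho_k)}{\mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k)} - 1 \Bigr| \le \frac{\bigl| \mathrm{ared}_h(s_k; \rho_k) - \mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k) - \mathrm{rpred}_h(r_k^t; \rho_k) \bigr|}{\mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k)} + \eta_0. \]

By the choice of the penalty parameter and by the decrease conditions (3.14) and (3.2), we obtain

\[ \mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k) \ge \kappa \kappa_4 \|\hat g_k\|_{U_h} \min\{ \kappa_5 \|\hat g_k\|_{U_h}, \kappa_6 \Delta_k \} + \frac{\rho_k}{2} \kappa_1 \|C_k\|_{V_h^*} \min\{ \kappa_2 \|C_k\|_{V_h^*}, \Delta_k \}. \]

Then there exists $\tilde K > 0$ (depending on $\rho_0$) such that

\[ \mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k) \ge \tilde K \varepsilon \min\{\varepsilon, \Delta_k\} + \tilde K \rho_k \|C_k\|_{V_h^*} \min\{ \|C_k\|_{V_h^*}, \Delta_k \}. \]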

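For the right hand side of inequality (4.1) from Lemma 4.9 we obtain

\[ K_0 \Delta_k^{1+p} + \rho_k K_1 \Delta_k^{1+p} \|C_k\|_{V_h^*} + \rho_k K_2 \Delta_k^{2+p} \le \Delta_k^{1+p} \rho_k c \]

for some $c \ge 1$. Now choose $\delta_1 < \min\{ (1 - \eta_1 - \eta_0) \tilde K \varepsilon, \varepsilon \}$ and let $\Delta_k \le \min\bigl\{ \bigl( \tfrac{\delta_1}{\rho_k c} \bigr)^{1/p}, \delta_1 \bigr\}$. Then we obtain by using Lemma 4.9 and the previous inequalities

\[ \Bigl| \frac{\mathrm{ared}_h(s_k; \rho_k)}{\mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k)} - 1 \Bigr| \le \frac{\Delta_k^{1+p} \rho_k c}{\tilde K \varepsilon \Delta_k} + \eta_0 \le 1 - \eta_1. \]

Thus, the above chosen $\Delta_k$ guarantees a successful step. Now we consider the second part of the maximum in the lemma. Choose

\[ \delta_2 < \min\Bigl\{ \Bigl( \frac{1 - \eta_1 - \eta_0}{\hat K} \Bigr)^{1/p}, B_C^{-1}, 1 \Bigr\} \quad \text{with} \quad \hat K = \frac{\max\{K_0, 2K_1, 2K_2\}}{\min\{\tilde K, \tilde K \varepsilon\}}, \]

and let $B_C$ be the bound on the norm of the constraint. Let $\Delta_k \le \min\bigl\{ (\delta_2 \|C_k\|_{V_h^*})^{2/(2+p)}, \delta_2 \bigr\}$. Then we obtain by using Lemma 4.10 with $\delta = \delta_2$

\[ \begin{aligned} \Bigl| \frac{\mathrm{ared}_h(s_k; \rho_k)}{\mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k)} - 1 \Bigr| &\le \eta_0 + \frac{K_0 \Delta_k^{1+p} + \rho_k K_1 \Delta_k^{1+p} \|C_k\|_{V_h^*} + \rho_k K_2 \Delta_k^{2+p}}{\tilde K \varepsilon \Delta_k + \tilde K \rho_k \|C_k\|_{V_h^*} \min\{\Delta_k, \|C_k\|_{V_h^*}\}} \\ &\le \eta_0 + \frac{K_0 \delta_2^p \Delta_k + \rho_k \max\{K_1, K_2\} (\delta_2 + \delta_2^p) \|C_k\|_{V_h^*} \min\{\Delta_k, \|C_k\|_{V_h^*}\}}{\tilde K \varepsilon \Delta_k + \tilde K \rho_k \|C_k\|_{V_h^*} \min\{\Delta_k, \|C_k\|_{V_h^*}\}} \\ &\le \eta_0 + \delta_2^p \hat K\, \frac{\Delta_k + \rho_k \|C_k\|_{V_h^*} \min\{\Delta_k, \|C_k\|_{V_h^*}\}}{\Delta_k + \rho_k \|C_k\|_{V_h^*} \min\{\Delta_k, \|C_k\|_{V_h^*}\}} \\ &\le \eta_0 + \hat K\, \frac{1 - \eta_1 - \eta_0}{\hat K} = 1 - \eta_1. \end{aligned} \]

Thus, the step will be accepted. Now, we define $\delta := \min\{\delta_2, \delta_1 / c\}$ and the proof is complete.

4.3. Penalty parameter. We study next the behaviour of the penalty parameter.

Lemma 4.12. Under the problem assumptions, there exists a constant $K > 0$ independent of the iterates such that

\[ \bigl| q_k(0) - q_k(s_k^n) - \langle \Delta\lambda_k, (C_y)_k s^n_{y,k} + C_k \rangle \bigr| \le K \|C_k\|_{V_h^*}. \]

Proof. This result follows similarly as in [13, Lem. 7.3].

Lemma 4.13. Let $\varepsilon > 0$ and assume that $\|C_k\|_{V_h^*} + \|\hat g_k\|_{U_h} \ge \varepsilon$ for all $k \in \mathbb N$. Then there exist $\bar\rho > 0$ and $K \in \mathbb N$ such that $\rho_k = \bar\rho$ for all $k \ge K$.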

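Proof. Otherwise, we obtain $\rho_k \to \infty$. Set $M := \{ k \in \mathbb N : \rho_k > \rho_{k-1} \}$ and consider $k \in M$. Then (3.21) is not valid. This implies

\[ q_k(0) - q_k(s_k^n) - \langle \Delta\lambda_k, (C_y)_k s^n_{y,k} + C_k \rangle - (\kappa - 1) \bigl( \hat m_k(0) - \hat m_k(s_{u,k}) \bigr) + \frac{\rho_{k-1}}{2} \bigl( \|C_k\|_{V_h^*}^2 - \|(C_y)_k s^n_{y,k} + C_k\|_{V_h^*}^2 \bigr) < 0. \tag{4.3} \]

By Lemma 4.12, the first three terms on the left hand side of the above inequality are bounded in absolute value by $K \|C_k\|_{V_h^*}$. Thus, (4.3) and (3.2) imply

\[ K \|C_k\|_{V_h^*} \ge \frac{\rho_{k-1}}{2} \kappa_1 \|C_k\|_{V_h^*} \min\{ \kappa_2 \|C_k\|_{V_h^*}, \Delta_k \}. \]

If $\|C_k\|_{V_h^*} = 0$, then $\min\{ \|C_k\|_{V_h^*}, \Delta_k \} = 0$. Otherwise the previous inequality yields a constant $C_\rho > 0$ such that $C_\rho\, \rho_{k-1}^{-1} \ge \min\{ \|C_k\|_{V_h^*}, \Delta_k \}$. Since $\rho_k \to \infty$, this shows $\min\{ \|C_k\|_{V_h^*}, \Delta_k \} \to 0$ for $M \ni k \to \infty$. On the other hand, by Lemma 4.11 and the update rule for the trust region radius, we obtain

\[ \Delta_k \ge \alpha_0 \min\Bigl\{ \delta, \max\Bigl\{ (\delta \|C_k\|_{V_h^*})^{2/(2+p)}, \Bigl( \frac{\delta}{\rho_k} \Bigr)^{1/p} \Bigr\} \Bigr\}. \tag{4.4} \]

This yields $\{ \|C_k\|_{V_h^*} \}_M \to 0$. Consequently, for all $k \in M$ large enough, we get $\|\hat g_k\|_{U_h} \ge \varepsilon / 2$. If (3.21) does not hold, then by (4.3) and (3.14) we obtain

\[ K \|C_k\|_{V_h^*} \ge (1 - \kappa) \kappa_4 \|\hat g_k\|_{U_h} \min\{ \kappa_5 \|\hat g_k\|_{U_h}, \kappa_4 \Delta_k^p \}. \]

Consequently, there exists $c > 0$ such that

\[ c\, \|C_k\|_{V_h^*} \ge \frac{\varepsilon}{2} \min\Bigl\{ \frac{\varepsilon}{2}, \Delta_k^p \Bigr\}. \]

Since $\{ \|C_k\|_{V_h^*} \}_M \to 0$, this requires

\[ \Delta_k^p \le \frac{2c}{\varepsilon} \|C_k\|_{V_h^*} \to 0 \quad (M \ni k \to \infty). \]

Hence, by (4.4), we obtain

\[ \Bigl( \frac{2c}{\varepsilon} \Bigr)^{1/p} \|C_k\|_{V_h^*}^{1/p} \ge \Delta_k \ge \alpha_0 \min\Bigl\{ \delta, \max\Bigl\{ (\delta \|C_k\|_{V_h^*})^{2/(2+p)}, \Bigl( \frac{\delta}{\rho_k} \Bigr)^{1/p} \Bigr\} \Bigr\}. \]

If $\|C_k\|_{V_h^*} = 0$, this leads to the contradiction $0 \ge \min\{ \delta, (\delta / \rho_k)^{1/p} \} > 0$. Thus, $\|C_k\|_{V_h^*} > 0$ holds, implying

\[ \Bigl( \frac{2c}{\varepsilon} \Bigr)^{1/p} \|C_k\|_{V_h^*}^{1/p} \ge \alpha_0 \min\bigl\{ \delta, \delta^{2/(2+p)} \|C_k\|_{V_h^*}^{2/(2+p)} \bigr\}, \]

which contradicts $\{ \|C_k\|_{V_h^*} \}_M \to 0$. Consequently, the sequence of penalty parameters $\{\rho_k\}$ is bounded. Moreover, the update rule for the penalty parameter implies that there exist $\bar\rho > 0$ and $K \in \mathbb N$ such that $\rho_k = \bar\rho$ for all $k \ge K$.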

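4.4. Global convergence result. We show now global convergence to a stationary point of the infinite dimensional problem (1.1) if $\varepsilon_{tol} = 0$, or finite termination if $\varepsilon_{tol} > 0$, respectively. We start with the following result.

Theorem 4.14. Let the assumptions A1, A2, A3 and A4 hold. If $\varepsilon_{tol} = 0$, then the algorithm terminates finitely or the sequence of iterates generated by Algorithm 3.9 satisfies

\[ \liminf_{k \to \infty} \bigl( \|C_k\|_{V_h^*} + \|\hat g_k\|_{U_h} \bigr) = 0. \]

For $\varepsilon_{tol} > 0$ the algorithm terminates finitely with $\|C_k\|_{V_h^*} \le \varepsilon_{tol}$ and $\|\hat g_k\|_{U_h} \le \varepsilon_{tol}$.

Proof. Suppose not. Then the algorithm runs infinitely and there exists $\varepsilon > 0$ such that $\|C_k\|_{V_h^*} + \|\hat g_k\|_{U_h} \ge \varepsilon$ for all $k \in \mathbb N$. Then, by Lemma 4.13, $\rho_k$ equals $\bar\rho$ for all $k \ge K$ for some $K \in \mathbb N$. Let $S$ be the set of indices of accepted steps. By Lemma 4.11, there exists $\delta > 0$ such that for all accepted steps, $k \in S$, we obtain

\[ \Delta_k \ge \alpha_0 \min\Bigl\{ \delta, \max\Bigl\{ (\delta \|C_k\|_{V_h^*})^{2/(2+p)}, \Bigl( \frac{\delta}{\bar\rho} \Bigr)^{1/p} \Bigr\} \Bigr\} \ge \alpha_0 \min\Bigl\{ \delta, \Bigl( \frac{\delta}{\bar\rho} \Bigr)^{1/p} \Bigr\} =: \bar\Delta. \tag{4.5} \]

Moreover, for all $k \in S$ with $k \ge K$ we get by the decrease conditions (3.14) and (3.2)

\[ \mathrm{ared}_h(s_k; \bar\rho) \ge \eta_1\, \mathrm{pred}_h(s_k^n, s_{u,k}; \bar\rho) \ge \eta_1 \kappa \kappa_4 \|\hat g_k\|_{U_h} \min\{ \kappa_5 \|\hat g_k\|_{U_h}, \kappa_6 \Delta_k \} + \eta_1 \frac{\bar\rho}{2} \kappa_1 \|C_k\|_{V_h^*} \min\{ \kappa_2 \|C_k\|_{V_h^*}, \Delta_k \}. \tag{4.6} \]

Let us define the infinite dimensional augmented Lagrangian function $L$ and the infinite dimensional actual reduction $\mathrm{ared}$ by

\[ L(x, \lambda; \rho) := l(x, \lambda) + \rho \|C(x)\|_{V^*}^2, \qquad \mathrm{ared}(s_k; \rho_k) := L(x_k, \lambda_k; \rho_k) - L(x_k + s_k, \lambda_{k+1}; \rho_k). \]

The condition (3.23) for reasonable refinement yields

\[ \mathrm{ared}_h(s_k; \rho_k) = \frac{\delta}{1+\delta}\, \mathrm{ared}_h(s_k; \rho_k) + \frac{1}{1+\delta}\, \mathrm{ared}_h(s_k; \rho_k) \ge \frac{\delta}{1+\delta}\, \mathrm{ared}_h(s_k; \rho_k) + \rho_k \Bigl( \bigl( \|C(x_{k+1})\|_{V^*}^2 - \|C_h(x_{k+1})\|_{V_h^*}^2 \bigr) - \bigl( \|C(x_k)\|_{V^*}^2 - \|C_h(x_k)\|_{V_h^*}^2 \bigr) \Bigr). \]

Hence, using this inequality we obtain

\[ \mathrm{ared}(s_k; \rho_k) \ge \frac{\delta}{1+\delta}\, \mathrm{ared}_h(s_k; \rho_k) \ge \frac{\delta}{1+\delta}\, \eta_1\, \mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k), \tag{4.7} \]

since we assume conforming discretizations and, thus, $l_h(x_k, \lambda_k) = l(x_k, \lambda_k)$ holds. Now, by assumption, $L$ is bounded from below. Summation of the infinite dimensional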

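actual reduction in the successive steps $k \in S$ gives

\[ \sum_{k \in S} \mathrm{ared}(s_k; \rho_k) = \sum_{k \in S} \bigl( L(x_k, \lambda_k; \rho_k) - L(x_{k+1}, \lambda_{k+1}; \rho_k) \bigr) = C + \sum_{k \in S,\, k \ge K} \bigl( L(x_k, \lambda_k; \bar\rho) - L(x_{k+1}, \lambda_{k+1}; \bar\rho) \bigr) = C + L(x_K, \lambda_K; \bar\rho) - \lim_{k \to \infty} L(x_{k+1}, \lambda_{k+1}; \bar\rho) < \infty. \]

Hence, by the summability, we obtain $\mathrm{ared}(s_k; \rho_k) \to 0$ as $S \ni k \to \infty$, which implies, by (4.5), (4.6), and (4.7), that $\|C_k\|_{V_h^*} + \|\hat g_k\|_{U_h} \to 0$. This contradicts our assumption from the beginning of the proof.

The following theorem states that there exists a subsequence of the iterates that satisfies the first order necessary optimality conditions (cf. (2.2)) of the given problem (1.1) in the limit if $\varepsilon_{tol} = 0$.

Theorem 4.15. Let the assumptions A1, A2, A3 and A4 hold. Then for $\varepsilon_{tol} > 0$ the algorithm terminates finitely, and for $\varepsilon_{tol} = 0$ the algorithm terminates finitely with a stationary point of problem (1.1) or the sequence of iterates $(x_k, \lambda_{k+1})$ generated by Algorithm 3.9 satisfies

\[ \liminf_{k \to \infty} \bigl( \|l_y(x_k, \lambda_{k+1})\|_{Y^*} + \|l_u(x_k, \lambda_{k+1})\|_{U^*} + \|C(x_k)\|_{V^*} \bigr) = 0. \]

Proof. Using the convergence conditions (3.34) with $\varepsilon_{tol} = 0$ together with (3.28) for the adjoint equation and (3.32) for the $u$-gradient of the Lagrangian, this is an immediate result of Theorem 4.14.

5. Implementation.

5.1. Computation of norms.

Norms in the control space. Let $M_h = M_{U_h}$ denote the matrix (mass matrix if $U = L^2$) $M_h = \bigl( (\psi_i, \psi_j)_U \bigr)_{i,j}$ for the basis $(\psi_i)$ of $U_h$, where $(\cdot, \cdot)_U$ is the inner product in the Hilbert space $U$. Then functions $u_h \in U_h$ have the representation $u_h = \sum_i \mathbf u_i \psi_i =: \Psi_h \mathbf u$ and we may identify $U_h$ with the Hilbert space $(\mathbb R^l, (\cdot, \cdot)_{M_h})$, where the scalar product is given by $(\mathbf u, \mathbf v)_{M_h} = \mathbf u^T M_h \mathbf v$ and $\|\mathbf u\|_{M_h} = \sqrt{\mathbf u^T M_h \mathbf u}$ is the induced norm. Then for any $u_h = \Psi_h \mathbf u \in U_h$

\[ \|u_h\|_U = \|u_h\|_{U_h} = \|\mathbf u\|_{M_h}. \]

Furthermore, by the Riesz representation theorem the dual space $U_h^*$ can be identified with $U_h$, i.e., all functionals $u_h^* \in U_h^* = U_h$ are given by

\[ \langle u_h^*, u_h \rangle_{U_h^*, U_h} = (u_h^*, u_h)_{U_h} = (u_h^*, u_h)_U = (\mathbf u^*, \mathbf u)_{M_h}, \qquad \|u_h^*\|_{U_h^*} = \|u_h^*\|_{U_h} = \|\mathbf u^*\|_{M_h}, \]

where $u_h^* = \Psi_h \mathbf u^*$. If, moreover, $u_h^* \in U_h^*$ is given by $u_h = \Psi_h \mathbf u \mapsto \mathbf v^T \mathbf u$ (which is, e.g., the case for the Euclidean representation of the reduced gradient), then $u_h^*$ has the representation $u_h^* = \Psi_h (M_h^{-T} \mathbf v) \in U_h$, since

\[ \langle u_h^*, u_h \rangle_{U_h^*, U_h} = \mathbf v^T \mathbf u = (M_h^{-T} \mathbf v, \mathbf u)_{M_h}, \]

and one has

\[ \|u_h^*\|_{U_h^*} = \|\mathbf u^*\|_{M_h} = \|M_h^{-T} \mathbf v\|_{M_h} = \|\mathbf v\|_{M_h^{-T}}. \]

In this way one can in particular compute a discrete representation and discrete norm of gradients in $U_h^* = U_h$ that are appropriate in the function space setting.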
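The norm identities above can be checked numerically on a small example. The following is a minimal sketch in pure Python, assuming a one-dimensional piecewise linear discretization of $U = L^2(0,1)$ on two equal elements; the helper names `solve`, `primal_norm`, `dual_norm` and `riesz_representative` are hypothetical and introduced only for illustration.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def primal_norm(M, u):
    """||u||_M = sqrt(u^T M u), the discrete norm of u_h = Psi u."""
    return math.sqrt(dot(u, matvec(M, u)))


def riesz_representative(M, v):
    """Coefficients M^{-1} v of the functional u -> v^T u (M is symmetric)."""
    return solve(M, v)


def dual_norm(M, v):
    """||v||_{M^{-1}} = sqrt(v^T M^{-1} v), the correctly scaled dual norm."""
    return math.sqrt(dot(v, solve(M, v)))


# P1 mass matrix on [0, 1] with two elements of length h = 0.5 (assumed setup).
h = 0.5
M = [[h / 3, h / 6, 0.0],
     [h / 6, 2 * h / 3, h / 6],
     [0.0, h / 6, h / 3]]

u = [1.0, 1.0, 1.0]   # coefficients of the constant function 1
v = matvec(M, u)      # Euclidean representation of w -> (1, w)_{L^2}

if __name__ == "__main__":
    print(primal_norm(M, u))            # L^2 norm of the constant function 1
    print(dual_norm(M, v))              # dual norm of the functional
    print(math.sqrt(dot(v, v)))         # Euclidean norm, mesh dependent
    print(riesz_representative(M, v))   # recovers the coefficients of u
```

In this example the $M_h$-norm of `u` and the $M_h^{-1}$-norm of `v` agree, whereas the Euclidean norm of `v` changes with the mesh size, which is why the mass-matrix scaling matters for stopping and refinement tests.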

Norms in state space and test function space. If $Y$ and $V$ are Hilbert spaces, then the discrete norms $\|\cdot\|_{Y_h}$, $\|\cdot\|_{Y_h^*}$, $\|\cdot\|_{V_h}$, $\|\cdot\|_{V_h^*}$ can be computed analogously as for the control space, where now $M_{Y_h} = \bigl( (\psi_i, \psi_j)_Y \bigr)_{i,j}$ for the basis $(\psi_i)$ of $Y_h$ and $M_{V_h} = \bigl( (\psi_i, \psi_j)_V \bigr)_{i,j}$ for the basis $(\psi_i)$ of $V_h$ have to be used instead of $M_{U_h}$. In particular, the discrete norm of the residual in the constraint $\|C_h\|_{V_h^*}$ can be computed. In the case $Y = H_0^1(\Omega)$ the matrix $M_{Y_h}$ is the sum of the usual stiffness and mass matrices. Often a spectrally equivalent (independent of the mesh size) matrix $\tilde M_{Y_h}$ is used to reduce the costs for computing $\|\cdot\|_{Y_h}$, $\|\cdot\|_{Y_h^*}$.

5.2. Computation of the quasi-normal component. The quasi-normal component $s_k^n$ is an approximate solution of the trust region subproblem (3.1) and it is required to satisfy (3.2). One method to guarantee (3.2) is to use scaled approximate solutions, which may be produced by the following simple procedure. Apply an appropriate iterative solver for the linearized state equation $(C_y)_k z^n = -C_k$ until, with a fixed $\nu \in (0,1)$, the stopping criterion

\[ \|(C_y)_k z^n + C_k\|_{V_h^*} \le \nu \|C_k\|_{V_h^*} \]

holds. Then scale this step back into the trust region, i.e., set

\[ s_k^n = \begin{pmatrix} t z^n \\ 0 \end{pmatrix}, \quad \text{where} \quad t = \begin{cases} 1 & \text{if } \|z^n\|_Y \le \Delta_k, \\ \Delta_k / \|z^n\|_Y & \text{otherwise.} \end{cases} \]

The step $s_k^n$ satisfies (3.2) (see Lemma 6.3.3 in [31]).

5.3. Computation of the tangential component.

5.3.1. Computation of the u component of the tangential step. The $u$ component $s_{u,k}$ of the tangential step $s_k^t$ is an approximate solution of the trust region subproblem (3.13) that is required to satisfy the fraction of Cauchy decrease condition (3.14). As in section 5.1 denote by $\Psi_h$ the basis of $U_h$ and by $M_h = M_{U_h}$ the corresponding mass matrix. Since $U_h$ is a Hilbert space, we may use the identification $U_h^* = U_h$. Then $\hat g_k \in U_h^* = U_h$ is given by $(\hat g_k, \cdot)_U$ and $-\hat g_k$ is the steepest descent direction of $\hat m_k$ in $U_h$ at $s_u = 0$. It is well known that the decrease condition (3.14) can be ensured as long as $s_{u,k}$ provides at least a fixed fraction of the decrease provided by the Cauchy point

\[ s^c_{u,k} := \operatorname{argmin} \{ \hat m_k(s_u) : s_u = -t \hat g_k,\ t \ge 0 \text{ and } \|s_u\|_U \le \Delta_k \}. \]

As described in section 5.1, $U_h$ can be identified via the coordinate representation $U_h \ni u_h = \Psi_h \mathbf u$ with the Hilbert space $(\mathbb R^l, (\cdot, \cdot)_{M_h})$. In practice, $\hat m_k$ is given by its coordinate representation $\mathbf m_k(\mathbf s_u) := \hat m_k(\Psi_h \mathbf s_u)$. Then the correctly scaled steepest descent direction is given by $-M_h^{-T} \nabla_{\mathbf s_u} \mathbf m_k(0)$ (not by the Euclidean gradient representation $-\nabla_{\mathbf s_u} \mathbf m_k(0)$). An approximate solution of (3.13) that satisfies (3.14) can be computed, e.g., by using the conjugate gradient (cg) method applied to $\hat m_k(s_u)$ in the space $U_h$ with scalar product $(\cdot, \cdot)_U$, or equivalently to $\mathbf m_k(\mathbf s_u)$ in the space $(\mathbb R^l, (\cdot, \cdot)_{M_h})$. Here the cg method with starting point $s_u = 0$ is applied to the minimization of $\hat m_k$. The cg method is stopped if an approximate minimum of the quadratic model $\hat m_k$ is reached, if negative curvature is detected, or if the iterates leave the trust-region bound. The

first iterate in the Steihaug cg method is the Cauchy step. Note that it is essential to apply the cg method with the scalar product $(\cdot, \cdot)_U$ in order to work with the correct scaling and discrete norms. If $\hat H_k$ can be applied exactly (which is usually not realistic if the exact reduced Hessian $\hat H_k = W_k^* H_k W_k$ is used), then the cg method ensures that $\hat m_k$ decreases monotonically, and (3.14) remains satisfied for all Steihaug cg iterates. If $\hat H_k$ is applied inexactly, then one has to compare the function values of $\hat m_k$ at the first Steihaug cg iterate and at the final Steihaug cg iterate. Another possibility to compute steps is the application of suitable Krylov solvers to the KKT system of the tangential problem (3.4). In that case the accuracy conditions (3.18) and (3.19) for the $y$-component of the tangential step can be integrated in the solver. If the exact Hessian $H_k$ is available, this method in combination with preconditioners as suggested in [5] usually leads to very good steps after a few iterations on the linear system.

5.3.2. Computation of the y component of the tangential step. We have already shown that (3.18) and (3.19) are satisfied if $s^t_{y,k}$ satisfies

\[ (C_y)_k s^t_{y,k} = -(C_u)_k s_{u,k} + r_k^t \]

with residual

\[ \|r_k^t\|_{V_h^*} \le \min\Bigl\{ \xi_3 \Delta_k^{1+p},\; -\sigma + \sqrt{\sigma^2 + \eta_0\, \mathrm{pred}_h(s_k^n, s_{u,k}; \rho_k) / \rho_k} \Bigr\}, \]

where $\sigma = \|(C_y)_k s^n_{y,k} + C_k\|_{V_h^*} + \|\Delta\lambda_k\|_V / (2 \rho_k)$. Note that all the quantities on the right hand side of the above inequality are known by the time $s^t_{y,k}$ needs to be computed. Any iterative solver for the linearized state equation can be applied until the stopping criterion is satisfied.

6. A Posteriori Error Estimators for Inexact States and Adjoints. In this section we show for a general semilinear elliptic PDE how the required estimates (3.33) of the infinite dimensional residual norm in the weak formulation of the PDE and adjoint PDE can be implemented by using well known a posteriori error estimators. We consider the following problem

\[ -\Delta y + s(y) = f \ \text{in } \Omega, \qquad \frac{\partial y}{\partial \nu} = g \ \text{on } \Gamma_N, \qquad y = 0 \ \text{on } \Gamma_D, \tag{6.1} \]

where $\Omega \subset \mathbb R^2$ is an open polygonal domain with boundary $\partial\Omega$ whose boundary edges are partitioned into a Neumann part $\Gamma_N$ and a disjoint Dirichlet part $\Gamma_D$, $\partial\Omega = \Gamma_N \cup \Gamma_D$, $s(y)$ denotes a (nonlinear) operator $s : Y \to L^2(\Omega)$, $f \in L^2(\Omega)$, $g \in L^2(\Gamma_N)$, and $\partial y / \partial \nu$ denotes the normal derivative of $y$ with the outer unit normal vector field $\nu$ of $\Omega$. Typical examples for the control action are distributed control, i.e., $u = f$, and Neumann boundary control, i.e., $u = g$. We use the notation $C(y) = 0$ for the weak formulation of the PDE

\[ \langle C(y), v \rangle_{V^*, V} = (\nabla y, \nabla v)_{L^2(\Omega)} + (s(y), v)_{L^2(\Omega)} - (f, v)_{L^2(\Omega)} - (g, v)_{L^2(\Gamma_N)}. \]

We set $Y = V = H_D^1(\Omega) := \{ y \in H^1(\Omega) : y|_{\Gamma_D} = 0 \}$ and assume that the given PDE has a unique solution. See [21, 27] for sufficient assumptions on $s(y)$. For example

in the case $s(y) = y^3$, as occurring in the following examples, the theory of maximal monotone operators guarantees a unique solution operator $(f, g) \in L^2(\Omega) \times L^2(\Gamma_N) \mapsto y \in Y$ for this PDE that is locally bounded, see for example [21, 27]. We discretize the problem by using a finite element method on a regular triangulation $\mathcal T$ of $\Omega$ consisting of closed triangles $T$ and choose the standard finite element space

\[ Y_h = V_h := \{ y_h \in C(\bar\Omega) : y_h|_T \in P_k(T),\ T \in \mathcal T \}, \]

where $P_k(T)$ denotes the space of polynomials of degree at most $k$. Then the discretized constraint is given by

\[ \langle C_h(y_h), v_h \rangle_{V_h^*, V_h} = (\nabla y_h, \nabla v_h)_{L^2(\Omega)} + (s(y_h), v_h)_{L^2(\Omega)} - (f, v_h)_{L^2(\Omega)} - (g, v_h)_{L^2(\Gamma_N)}. \]

Now let $y_h \in Y_h$ be a possibly inexact solution of the finite element discretization. We want to estimate the residual $\|C(y_h)\|_{V^*}$. The desired estimate (3.33a) is then

\[ \|C(y_h)\|_{V^*} \le C_1\, \eta(y_h) + C_2\, \|C_h(y_h)\|_{V_h^*} \]

for some bounded constants $C_1, C_2 > 0$. As we consider this general semilinear case, the results can be applied not only to the state equation but also to the corresponding estimate (3.33b) for the adjoint equation. We consider both averaging and residual based error estimation techniques. As we will see, these well known a posteriori error estimators can be used in our context.

Triangulation and Notation. We will use the following notation for the triangulation. Let (as already introduced) $\mathcal T$ denote a triangulation of the computational domain $\Omega \subset \mathbb R^2$ consisting of closed triangles $T$. Let $\mathcal N$ denote the set of nodes (i.e., the vertices of elements of the triangulation $\mathcal T$) and let $\mathcal E$ denote the edges in $\mathcal T$. Let $\mathcal E_\Omega = \mathcal E \setminus \{ E \in \mathcal E,\ E \subset \partial\Omega \}$ denote the inner edges in $\Omega$. We assume that the edges can be partitioned into the Neumann edges $\mathcal E_N = \{ E \in \mathcal E,\ E \subset \Gamma_N \}$ and the Dirichlet edges $\mathcal E_D$. For any node $z \in \mathcal N$ we define the patch around $z$ as $\omega_z := \operatorname{int}\bigl( \bigcup \{ T \in \mathcal T : z \in T \} \bigr)$. Moreover, let $\omega_T$ denote the patch around a triangle $T \in \mathcal T$ and let $\omega_E$ be the union of those two triangles that share the edge $E \in \mathcal E$. Let $h_{\mathcal T}$ and $h_{\mathcal E}$ be $\mathcal T$- and $\mathcal E$-piecewise constants on $\bar\Omega$ defined by $h_{\mathcal T}|_T := h_T := \operatorname{diam}(T)$ and $h_{\mathcal E}|_E := h_E := \operatorname{diam}(E)$ for $T \in \mathcal T$ and $E \in \mathcal E$, respectively. Finally, let $h_z = \operatorname{diam}(\omega_z)$ for $z \in \mathcal N$ and denote by $\mathcal K := \mathcal N \setminus \Gamma_D$ the set of free nodes.

6.1. Averaging Error Estimators for Inexact States. Averaging techniques, also called (gradient) recovery estimators, estimate the energy error $\|\nabla y - \nabla y_h\|_{L^2(\Omega)}$ by $\|q_h - \nabla y_h\|_{L^2(\Omega)}$, where $q_h$ is generated by postprocessing $p_h := \nabla y_h$ such that it is a higher order approximation of $\nabla y$ than $p_h$. In global averaging techniques the procedure consists in approximating the piecewise smooth discontinuous function $p_h = \nabla y_h$ by some globally continuous function $q_h = A p_h$, which is piecewise a polynomial of higher degree. A well known example is the ZZ-estimator of Zienkiewicz and Zhu [32] that will be discussed below. In local averaging techniques $p_h = \nabla y_h$ is locally approximated on patches $\omega$ by polynomials of higher order.

6.1.1. Averaging Error Estimator for Linear Finite Elements. Consider the linear finite element space

\[ Y_h = V_h = Q_h := \{ y_h \in C(\bar\Omega) : y_h|_T \in P_1(T),\ T \in \mathcal T,\ y_h|_{\Gamma_D} = 0 \}. \]
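As a small illustration of the gradient recovery idea described in section 6.1, the following sketch computes an averaging estimator in one space dimension: the piecewise constant derivative of a piecewise linear function is replaced by patchwise averages at the nodes, interpolated linearly, and the $L^2$ distance between the two is evaluated element by element. This is a simplified one-dimensional analogue under stated assumptions, not the two-dimensional estimator of the paper; the function name `averaging_estimator_1d` is hypothetical and boundary conditions are ignored.

```python
import math


def averaging_estimator_1d(nodes, values):
    """1D analogue of an averaging (gradient recovery) estimator.

    nodes  -- increasing mesh nodes x_0 < ... < x_n
    values -- nodal values of a continuous piecewise linear function y_h
    Returns (eta, local), where local[e] is the contribution of element e
    and eta = sqrt(sum of local[e]**2).
    """
    n_el = len(nodes) - 1
    h = [nodes[i + 1] - nodes[i] for i in range(n_el)]
    # Piecewise constant derivative p_h on each element.
    grad = [(values[i + 1] - values[i]) / h[i] for i in range(n_el)]
    # Patch average at each node: integral of p_h over the patch / patch size.
    avg = []
    for z in range(len(nodes)):
        patch = [e for e in (z - 1, z) if 0 <= e < n_el]
        avg.append(sum(grad[e] * h[e] for e in patch) / sum(h[e] for e in patch))
    # On element e the difference p_h - A p_h is linear, going from d0 to d1;
    # its squared L^2 norm is h * (d0^2 + d0*d1 + d1^2) / 3.
    local_sq = []
    for e in range(n_el):
        d0 = grad[e] - avg[e]
        d1 = grad[e] - avg[e + 1]
        local_sq.append(h[e] * (d0 * d0 + d0 * d1 + d1 * d1) / 3.0)
    eta = math.sqrt(sum(local_sq))
    return eta, [math.sqrt(v) for v in local_sq]


if __name__ == "__main__":
    # A globally linear function has a constant derivative: estimator is zero.
    print(averaging_estimator_1d([0.0, 0.5, 1.0], [0.0, 1.0, 2.0]))
    # A kink at the middle node produces equal contributions on both elements.
    print(averaging_estimator_1d([0.0, 1.0, 2.0], [0.0, 0.0, 1.0]))
```

The elementwise values returned as `local` are the kind of local contributions on which a marking strategy for adaptive refinement can be based.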

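Let $y_h \in Y_h$ and let $p_h = \nabla y_h$ be its piecewise constant gradient. Define the average $A_z p_h := \int_{\omega_z} p_h \, dx / |\omega_z| \in \mathbb R^2$ of $p_h$ on $\omega_z$. With the nodal basis function $\varphi_z$ (defined as $\varphi_z$ continuous, piecewise linear, $\varphi_z(z) = 1$ and $\varphi_z(x) = 0$ for all $x \in \mathcal N \setminus \{z\}$) define

\[ A p_h := \sum_{z \in \mathcal N} A_z(p_h)\, \varphi_z \in Q_h \times Q_h. \tag{6.2} \]

Then the averaging estimator is defined by

\[ \eta_A(y_h) := \|\nabla y_h - A \nabla y_h\|_{L^2(\Omega)^2}. \tag{6.3} \]

Notice that there is a minimal version

\[ \eta_M(y_h) := \min_{q_h \in Q_h^2} \|\nabla y_h - q_h\|_{L^2(\Omega)^2} \le \eta_A(y_h). \]

Remark 6.1. 1. $\eta_A$ is a reliable and efficient error estimator for the energy norm of the difference of the smooth solution $y$ of the weak formulation of the Poisson problem and its first order finite element approximation $y_h$, $\|\nabla y - \nabla y_h\|_{L^2(\Omega)^2}$ (cf. [9, 55]).
2. Following Brenner and Carstensen in [9], $\eta_A$ and $\eta_M$ are very close and accurate estimators in many numerical examples.
3. Note that the averaging estimator $\eta_A$ is locally computable. Indeed, we see that

\[ \eta_A(p_h)^2 = \Bigl\| p_h - \sum_{z \in \mathcal N} A_z(p_h) \varphi_z \Bigr\|_{L^2(\Omega)}^2 = \sum_{T \in \mathcal T} \int_T \Bigl[ p_h|_T - \sum_{z \in \mathcal N} A_z(p_h) \varphi_z|_T \Bigr]^2 dx. \]

Let $y_h \in Y_h$ be an inexact solution of the finite element discretization of the PDE on the given mesh $\mathcal T$. To evaluate the residual in the variational formulation

\[ C(y_h) = a(y_h, \cdot) - (g, \cdot)_{L^2(\Gamma_N)} + (s(y_h), \cdot)_{L^2(\Omega)} - (f, \cdot)_{L^2(\Omega)} \in V^* = Y^*, \tag{6.4} \]

where $a(v, w) = (\nabla v, \nabla w)_{L^2(\Omega)}$, we consider

\[ \langle C(y_h), v \rangle_{V^*, V} = (\nabla y_h, \nabla v)_{L^2(\Omega)} - (g, v)_{L^2(\Gamma_N)} + (s(y_h), v)_{L^2(\Omega)} - (f, v)_{L^2(\Omega)} \]

with $v \in V$, $\|v\|_V = 1$. Then taking the supremum over all such $v$ we obtain the norm $\|C(y_h)\|_{V^*}$. Let $\Pi$ denote the $L^2$ projection onto the first order finite element space $V_h$ on $\Omega$ and set $v_h := \Pi v$ for $v \in V$. By linearity we have

\[ \langle C(y_h), v \rangle_{V^*, V} = \langle C(y_h), v - v_h \rangle_{V^*, V} + \langle C(y_h), v_h \rangle_{V^*, V}. \tag{6.5} \]

It remains to derive upper bounds for the last two summands to estimate the norm as desired. We begin with the estimation of the first summand of the right hand side of equation (6.5). Here, we first consider the last two summands of (6.4) tested with $v - v_h$. Since $v - v_h$ is $L^2$-orthogonal onto $\Pi f$ we obtain

\[ \int_\Omega f (v - v_h) \, dx = \int_\Omega (f - \Pi f)(v - v_h) \, dx \le \|h_{\mathcal T}^{-1} (v - v_h)\|_{L^2(\Omega)} \|h_{\mathcal T} (f - \Pi f)\|_{L^2(\Omega)}. \]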

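Notice that $\|h_{\mathcal T}(f - \Pi f)\|_{L^2(\Omega)} = \mathrm{hot}$ is of higher order. We use hot to denote higher order terms. These are generically much smaller than an estimator $\eta$, but this depends on the smoothness of the given data. In general, hot may be neglected, but in the case of high oscillations they may even dominate $\eta$. The first-order approximation property of the $L^2$ projection, $\|h_{\mathcal T}^{-1}(v - v_h)\|_{L^2(\Omega)} \le C_{approx} \|\nabla v\|_{L^2(\Omega)}$ (cf. [9, 55]), yields

\[ \int_\Omega f (v - v_h) \, dx \le C_{approx} \|\nabla v\|_{L^2(\Omega)} \|h_{\mathcal T}(f - \Pi f)\|_{L^2(\Omega)} = \mathrm{hot}(f), \]

since $\|\nabla v\|_{L^2(\Omega)} \le \|v\|_V = 1$. Similarly we obtain

\[ \int_\Omega s(y_h)(v - v_h) \, dx \le C_{approx} \|\nabla v\|_{L^2(\Omega)} \|h_{\mathcal T}(s(y_h) - \Pi s(y_h))\|_{L^2(\Omega)} = \mathrm{hot}(s(y_h)). \]

Now we proceed to the first two summands in (6.4) tested with $v - v_h$. We follow the analysis in [9]. Set $p_h := \nabla y_h$ and let $q_h$ be arbitrary in $Q_h^n$. Then there holds

\[ (\nabla y_h, \nabla(v - v_h))_{L^2(\Omega)} - (g, v - v_h)_{L^2(\Gamma_N)} = \int_\Omega (p_h - q_h) \cdot \nabla(v - v_h) \, dx + \int_\Omega q_h \cdot \nabla(v - v_h) \, dx - \int_{\Gamma_N} g (v - v_h) \, ds(x). \]

The $H^1$ stability of the projection $\Pi$ yields

\[ \|\nabla(v - \Pi v)\|_{L^2(\Omega)} \le C_{stab} \|\nabla v\|_{L^2(\Omega)} \tag{6.6} \]

for some $C_{stab} > 0$. Thus, using the Cauchy-Schwarz inequality and (elementwise) integration by parts together with Gauss' theorem we obtain

\[ \begin{aligned} (\nabla y_h, \nabla(v - v_h))_{L^2(\Omega)} - (g, v - v_h)_{L^2(\Gamma_N)} &\le C_{stab} \|\nabla v\|_{L^2(\Omega)} \|p_h - q_h\|_{L^2(\Omega)} - \int_\Omega (v - v_h) \operatorname{div}_{\mathcal T} q_h \, dx + \int_{\Gamma_N} (v - v_h)(q_h \cdot n - g) \, ds(x) \\ &\le C_{stab} \|p_h - q_h\|_{L^2(\Omega)} + \|h_{\mathcal T}^{-1}(v - \Pi v)\|_{L^2(\Omega)} \|h_{\mathcal T} \operatorname{div}_{\mathcal T} q_h\|_{L^2(\Omega)} \\ &\quad + \|h_{\mathcal E}^{-1/2}(v - v_h)\|_{L^2(\Gamma_N)} \|h_{\mathcal E}^{1/2}(q_h \cdot n - g)\|_{L^2(\Gamma_N)}. \end{aligned} \]

Note that $\operatorname{div}(p_h)|_T = 0$ for all $T \in \mathcal T$, since $y_h$ is piecewise linear. Therefore, the inverse inequality

\[ \|h_T \operatorname{div} q_h\|_{L^2(T)} = \|h_T \operatorname{div}(q_h - p_h)\|_{L^2(T)} \le C_{inv} \|q_h - p_h\|_{L^2(T)}, \quad T \in \mathcal T, \]

yields $\|h_{\mathcal T} \operatorname{div}_{\mathcal T} q_h\|_{L^2(\Omega)} \le C_{inv} \|q_h - p_h\|_{L^2(\Omega)}$. Together with the approximation property of $\Pi$ on the edges

\[ \|h_{\mathcal E}^{-1/2}(v - v_h)\|_{L^2(\Gamma_N)} \le C_{approx} \|\nabla v\|_{L^2(\Omega)}, \]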

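(cf. [9, 55]), we obtain

\[ (\nabla y_h, \nabla(v - v_h))_{L^2(\Omega)} - (g, v - v_h)_{L^2(\Gamma_N)} \le (C_{stab} + C_{approx} C_{inv}) \|p_h - q_h\|_{L^2(\Omega)} + C_{approx} \|h_{\mathcal E}^{1/2}(q_h \cdot n - g)\|_{L^2(\Gamma_N)}. \]

Hence, the estimates of the summands in (6.4) yield for $v \in V$ with $\|v\|_V = 1$

\[ \langle C(y_h), v - v_h \rangle \le C_{est} \min_{q_h \in Q_h^n} \bigl\{ \|p_h - q_h\|_{L^2(\Omega)} + \|h_{\mathcal E}^{1/2}(q_h \cdot n - g)\|_{L^2(\Gamma_N)} \bigr\} + \mathrm{hot}(f) + \mathrm{hot}(s(y_h)) =: C_{est}\, \eta(y_h, g) + \mathrm{hot}(f) + \mathrm{hot}(s(y_h)), \tag{6.7} \]

where $\eta(y_h, g)$ denotes the desired estimator. Note that the higher order terms may be integrated in the estimator. It remains to estimate the second summand in (6.5). The $H^1$ stability and the first order approximation property of the $L^2$ projection $\Pi$ give for $v \in V$ with $\|v\|_V = 1$ and $v_h := \Pi v$

\[ \begin{aligned} \|v_h\|_V = \|v_h - v + v\|_V &\le \|v_h - v\|_V + \|v\|_V = \bigl( \|v_h - v\|_{L^2(\Omega)}^2 + \|\nabla(v_h - v)\|_{L^2(\Omega)}^2 \bigr)^{1/2} + \|v\|_V \\ &\le \bigl( h_{\mathcal T}^2 C_{approx}^2 \|\nabla v\|_{L^2(\Omega)}^2 + C_{stab}^2 \|\nabla v\|_{L^2(\Omega)}^2 \bigr)^{1/2} + \|v\|_V \\ &\le \bigl( (h_{\mathcal T}^2 C_{approx}^2 + C_{stab}^2)^{1/2} + 1 \bigr) \|v\|_V \le C_{proj} \|v\|_V \end{aligned} \tag{6.8} \]

with some $C_{proj} > 0$, since $\|\nabla v\|_{L^2(\Omega)} \le \|v\|_V$ and since $h_{\mathcal T}$ is bounded. Hence, for $\|v\|_V = 1$ we obtain $\|v_h\|_V \le C_{proj}$ and thus, by using the definition of $C_h$,

\[ \langle C(y_h), v_h \rangle_{V^*, V} = \langle C_h(y_h), v_h \rangle_{V_h^*, V_h} \le C_{proj} \|C_h(y_h)\|_{V_h^*}. \tag{6.9} \]

Consequently, the estimation of the summands in (6.5) yields

\[ \|C(y_h)\|_{V^*} \le C_{est}\, \eta(y_h, g) + C_{proj} \|C_h(y_h)\|_{V_h^*} + \mathrm{hot}(f) + \mathrm{hot}(s(y_h)) \tag{6.10} \]

with $\eta(y_h, g)$ from (6.7). The averaging estimator may then be calculated by (6.3) having regard to (6.7). If the higher order terms are not neglected, they may be integrated in the estimator calculation.

6.1.2. Averaging Error Estimator for Higher Order Finite Elements. For simplicity the error estimator is developed only for 2-dimensional spaces. Nevertheless the same theory is valid in three space dimensions. Only a few constants in some proofs will change due to larger overlaps of patches in 3D. Let $d$ be the (local) polynomial degree of the finite elements and let $P_k(G)$ denote algebraic polynomials on the domain $G \subset \mathbb R^2$ of degree at most $k$. Let $S^d = P_d(\mathcal T) \cap C(\bar\Omega)$ be the finite element space of continuous functions on $\Omega$ that are $\mathcal T$-elementwise polynomials of degree at most $d \in \mathbb N$. For generality different polynomial degrees are allowed. As in the linear finite element case, $\{\varphi_z\}_{z \in \mathcal N}$ shall denote the continuous $\mathcal T$-elementwise linear nodal basis functions. We follow the analysis in [1] where the authors define a projection operator $J$ on local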

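polynomial spaces as follows. For each fixed node $z \in \mathcal N \setminus \mathcal K$ they choose a neighboring free node $\zeta \in \mathcal K$ and thereby define a relation $R$ on $\mathcal N$, where $z R z$ if $z \in \mathcal K$. Then, they define

\[ \psi_z := \sum_{\zeta \in \mathcal N,\ \zeta R z} \varphi_\zeta \quad \text{and} \quad \Omega_z := \operatorname{int}(\operatorname{supp} \psi_z). \]

They require that for each $z \in \mathcal K$, $\Omega_z$ is connected and $\varphi_z \ne \psi_z$ implies that $(\partial\Omega_z) \cap \Gamma_D$ has a positive surface measure. Then $(\{\zeta \in \mathcal N : \zeta R z\} : z \in \mathcal K)$ is a partition of $\mathcal N$ and $(\psi_z : z \in \mathcal K)$ is a partition of unity. For each $z \in \mathcal K$ they define the degree (minimal degree allowed on $\Omega_z$ minus one)

\[ d(z) := \max\{ k \in \mathbb N_0 : P_k(\Omega_z) \varphi_z \subset S \}, \]

where $P_k(\Omega_z)$ denotes the set of all polynomials on $\mathbb R^2$ of total degree at most $k$ restricted to $\Omega_z$. The set $S \subset H^1(\Omega)$ is some finite element space consisting of functions that are $\mathcal T$-elementwise polynomials and globally continuous. Moreover, one requires that $S_D^1(\mathcal T) := \{ y \in S^1(\mathcal T) : y|_{\Gamma_D} = 0 \} \subset S$, which implies that $d(z)$ is well defined and larger than or equal to zero. For $g \in L^1(\Omega)$, $z \in \mathcal K$, the authors of [1] define $g_z \in P_{d(z)}(\Omega_z)$ by

\[ \int_{\Omega_z} (g_z \varphi_z - g \psi_z)\, q_z \, dx = 0 \quad \text{for all } q_z \in P_{d(z)}(\Omega_z), \]

and then they define

\[ J g := \sum_{z \in \mathcal K} g_z \varphi_z \in S \cap H_D^1(\Omega). \tag{6.11} \]

According to [1, Rem. 2.2], $J g$ is well defined. In the following we state a few results from [1] which are necessary to develop an estimator for the residual of the given PDE (6.1) in the presence of inexact states.

Proposition 6.2. There exist $(h_{\mathcal T}, h_{\mathcal E})$-independent constants $C_{stab} > 0$, $C_{approx} > 0$ and $C > 0$ such that for all $g \in H_D^1(\Omega)$ and $f \in L^2(\Omega)$
1) the stability of $J$,

\[ \|\nabla(g - J g)\|_{L^2(\Omega)} \le C_{stab} \|\nabla g\|_{L^2(\Omega)}, \]

2) the approximation properties of $J$,

\[ \|h_{\mathcal T}^{-1}(g - J g)\|_{L^2(\Omega)} \le C_{approx} \|\nabla g\|_{L^2(\Omega)}, \qquad \|h_{\mathcal E}^{-1/2}(g - J g)\|_{L^2(\Gamma_N)} \le C_{approx} \|\nabla g\|_{L^2(\Omega)}, \]

3) and the enhanced stability of $J$,

\[ \int_\Omega f (g - J g) \, dx \le C \|\nabla g\|_{L^2(\Omega)} \Bigl( \sum_{z \in \mathcal K} h_z^2 \min_{f_z \in P_{d(z)}(\Omega_z)} \|f - f_z\|_{L^2(\Omega_z)}^2 \Bigr)^{1/2}, \]

hold. The constants depend only on $\Omega$, $\Gamma_D$, $\Gamma_N$, the degrees $d(z)$, $z \in \mathcal K$, and the shapes of the elements $T \in \mathcal T$ and the patches $\Omega_z$, $z \in \mathcal K$. For a proof see [1, Thm. 2.1].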

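Lemma 6.3. Suppose $S = \{ v \in C(\bar\Omega) : \forall T \in \mathcal T,\ v|_T \in P_{d(T)}(T) \}$ for positive integers $d(T)$, $T \in \mathcal T$, and let $d_E$, $E \in \mathcal E$, be nonnegative integers. Then there exists a constant $C > 0$ such that for all $u \in \nabla S$ and each $z \in \mathcal K$ we have

\[ \min_{q_z \in P_{d(z)+1}(\Omega_z)^2} \|u - q_z\|_{L^2(\Omega_z)}^2 \le C \sum_{E \in \mathcal E_{\Omega_z}} \min_{q_E \in P_{d_E}(\omega_E)^2} \|u - q_E\|_{L^2(\omega_E)}^2, \]

where $\mathcal E_{\Omega_z}$ is the set of edges $E \in \mathcal E_\Omega$ with $E \cap \Omega_z \ne \emptyset$ and $\omega_E = \bigcup \{ T \in \mathcal T : E \subset T \}$, $E \in \mathcal E$, is the union of those triangles (tetrahedra) that share the edge (face) $E$. The constant $C$ depends on the degrees $d(z)$ and $d_E$ as well as on the shapes of the elements and patches, but not on their diameters. For a proof see [1, Lem. 3.1].

Lemma 6.4. Let $k$ and $d_E$, $E \in \mathcal E$, be nonnegative integers, let $p_h$ be a (possibly discontinuous) piecewise polynomial on $\mathcal T$ with local degrees on $T \in \mathcal T$ at most $k$, and let $g$ be a piecewise polynomial on the edges on $\Gamma_N$ with local degrees at most $k+1$. Then,

\[ \min_{q \in S^{k+1}(\mathcal T)^2} \Bigl( \|p_h - q\|_{L^2(\Omega)}^2 + \|h_{\mathcal E}^{1/2}(g - q \cdot \nu)\|_{L^2(\Gamma_N)}^2 \Bigr) \le C \sum_{E \in \mathcal E_\Omega \cup \mathcal E_N} \min_{q_E \in P_{d_E}(\omega_E)^2} \Bigl( \|p_h - q_E\|_{L^2(\omega_E)}^2 + h_E \|g - q_E \cdot \nu\|_{L^2(E \cap \Gamma_N)}^2 \Bigr) \]

with a constant $C > 0$ that depends on the degrees $k$ and $d_E$ as well as on the shapes of the elements and patches but not on their diameters. For a proof see [1, Lem. 3.2].

The averaging error estimator is then defined by

\[ \eta_E(y_h, g) = \Bigl( \sum_{E \in \mathcal E_\Omega \cup \mathcal E_N}\ \min_{\substack{q_E \in P_{d_E}(\omega_E)^2 \\ q_E \cdot \nu = g_h \text{ on } E \cap \Gamma_N}} \|p_h - q_E\|_{L^2(\omega_E)}^2 \Bigr)^{1/2} + \|h_{\mathcal E}^{1/2}(g - g_h)\|_{L^2(\Gamma_N)}, \tag{6.12} \]

where $g_h$ is a piecewise polynomial on $\mathcal E_N$ with local degrees at most $d_E$. The polynomial degree $d_E$ on $\omega_E$ is chosen according to the elementwise degrees of $y_h$ on $\omega_E$. If problem (6.1) is a boundary control problem, then $g$ equals the control, which is usually given in the finite element space that arises from the restriction of $Y_h$ onto the Neumann boundary. Then one chooses $g_h = g$. Otherwise it is reasonable to choose $g_h$ as a suitable projection of $g$ onto the restriction of the finite element space $S_D^d$ to the Neumann boundary. According to [1], for $g \in L^2(\Gamma_N)$ with $g|_E \in H^{d_E}(E)$ for all $E \in \mathcal E_N$ there exists $g_h \in L^\infty(\Gamma_N)$ with $g_h|_E \in P_{d_E}(E)$ for all $E \in \mathcal E_N$ such that the last summand is of higher order, i.e.,

\[ \|h_{\mathcal E}^{1/2}(g - g_h)\|_{L^2(\Gamma_N)} \le C \|h_{\mathcal E}^{d_E + 1/2}\, \partial^{d_E} g / \partial s^{d_E}\|_{L^2(\Gamma_N)} \]

for some $C > 0$. Now we have the tools to estimate $\|C(y_h)\|_{V^*}$. Let $Y_h = V_h = S_D^d$ for some $d \in \mathbb N$ and let $y_h \in Y_h$. Let $v \in V$ with $\|v\|_V = 1$ and let $J$ be the projection onto $Y_h$ from (6.11). Then we have

\[ \langle C(y_h), v \rangle = \langle C(y_h), v - J v \rangle + \langle C(y_h), J v \rangle. \tag{6.13} \]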

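We begin with the estimation of the first summand. Let $Q$ denote the space of gradients in $L^2(\Omega)^2$ that are continuous and $\mathcal{T}$-elementwise polynomials of degree at most $d$, $Q = (S^d)^2$. Set $p_h = \nabla y_h$ and let $q \in Q$. Then we obtain
\[
\langle C(y_h), v - Jv \rangle = a(y_h, v - Jv) - (g, v - Jv)_{L^2(\Gamma_N)} + (s(y_h), v - Jv)_{L^2(\Omega)} - (f, v - Jv)_{L^2(\Omega)}.
\tag{6.14}
\]
For the third and fourth summand in (6.14) we use the enhanced stability of $J$ from Lemma 6.2 and get
\[
\int_\Omega s(y_h)(v - Jv)\,dx \le C\, \|\nabla v\|_{L^2(\Omega)} \Big( \sum_{z \in \mathcal{N}} h_z^2 \min_{f_z \in P_{d-1}(\omega_z)} \|s(y_h) - f_z\|^2_{L^2(\omega_z)} \Big)^{1/2} =: \mathrm{hot}(s(y_h))
\]
and, similarly,
\[
\int_\Omega f (v - Jv)\,dx \le \mathrm{hot}(f).
\]
We go on with the first two summands from equation (6.14) and see, using the Cauchy-Schwarz inequality, integration by parts, Gauss' theorem, the stability property of $J$, Minkowski's theorem and the triangle inequality,
\[
\begin{aligned}
&\int_\Omega \nabla y_h \cdot \nabla(v - Jv)\,dx - \int_{\Gamma_N} g (v - Jv)\,ds(x) \\
&\quad = \int_\Omega (p_h - q)\cdot\nabla(v - Jv)\,dx + \int_\Omega q\cdot\nabla(v - Jv)\,dx - \int_{\Gamma_N} g (v - Jv)\,ds(x) \\
&\quad \le \|p_h - q\|_{L^2(\Omega)} \|\nabla(v - Jv)\|_{L^2(\Omega)} - \int_\Omega (v - Jv)\,\mathrm{div}_{\mathcal{T}}\, q\,dx + \int_{\partial\Omega} (v - Jv)\, q\cdot\nu\,ds(x) - \int_{\Gamma_N} g (v - Jv)\,ds(x) \\
&\quad \le C_{stab} \|p_h - q\|_{L^2(\Omega)} + C\, \|\nabla v\|_{L^2(\Omega)} \Big( \sum_{z \in \mathcal{N}} \min_{f_z \in P_{d-1}(\omega_z)} \|h_z (\mathrm{div}_{\mathcal{T}}\, q - f_z)\|^2_{L^2(\omega_z)} \Big)^{1/2} + \int_{\Gamma_N} (q\cdot\nu - g)(v - Jv)\,ds(x) \\
&\quad \le C_{stab} \|p_h - q\|_{L^2(\Omega)} + C \Big( \sum_{z \in \mathcal{N}} \min_{q_z \in P_d(\omega_z)^2} \|h_z\, \mathrm{div}_{\mathcal{T}}(q - q_z)\|^2_{L^2(\omega_z)} \Big)^{1/2} + \|h_{\mathcal{E}}^{1/2}(q\cdot\nu - g)\|_{L^2(\Gamma_N)}\, \|h_{\mathcal{E}}^{-1/2}(v - Jv)\|_{L^2(\Gamma_N)} \\
&\quad \le C_{stab} \|p_h - q\|_{L^2(\Omega)} + C \Big( \sum_{z \in \mathcal{N}} \|h_z\, \mathrm{div}_{\mathcal{T}}(q - p_h)\|^2_{L^2(\omega_z)} \Big)^{1/2} + C \Big( \sum_{z \in \mathcal{N}} \min_{q_z \in P_d(\omega_z)^2} \|h_z\, \mathrm{div}_{\mathcal{T}}(p_h - q_z)\|^2_{L^2(\omega_z)} \Big)^{1/2} + C_{approx} \|h_{\mathcal{E}}^{1/2}(q\cdot\nu - g)\|_{L^2(\Gamma_N)} \\
&\quad \le C_{stab} \|p_h - q\|_{L^2(\Omega)} + 3KC\, \|h_{\mathcal{T}}\, \mathrm{div}_{\mathcal{T}}(q - p_h)\|_{L^2(\Omega)} + C \Big( \sum_{z \in \mathcal{N}} \min_{q_z \in P_d(\omega_z)^2} \|h_z\, \mathrm{div}_{\mathcal{T}}(p_h - q_z)\|^2_{L^2(\omega_z)} \Big)^{1/2} + C_{approx} \|h_{\mathcal{E}}^{1/2}(q\cdot\nu - g)\|_{L^2(\Gamma_N)},
\end{aligned}
\]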

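since $h_z \le K h_T$, $z \in T \in \mathcal{T}$, for some $K > 0$ and $\sum_{z \in \mathcal{N}} \mathrm{vol}(\omega_z) \le 3\,\mathrm{vol}(\Omega)$. A $\mathcal{T}$-elementwise inverse estimate shows
\[
\|h_T\, \mathrm{div}_{\mathcal{T}}(p_h - q)\|_{L^2(T)} \le C_{inv} \|p_h - q\|_{L^2(T)} \quad \forall\, T \in \mathcal{T}
\]
and
\[
\|h_z\, \mathrm{div}_{\mathcal{T}}(p_h - q_z)\|_{L^2(\omega_z)} \le C_{inv} \|p_h - q_z\|_{L^2(\omega_z)} \quad \forall\, z \in \mathcal{N},
\]
for some $C_{inv} > 0$. Hence, we obtain
\[
a(y_h, v - Jv) - (g, v - Jv)_{L^2(\Gamma_N)} \le C_{stab} \|p_h - q\|_{L^2(\Omega)} + 3KCC_{inv} \|p_h - q\|_{L^2(\Omega)} + CC_{inv} \Big( \sum_{z \in \mathcal{N}} \min_{q_z \in P_d(\omega_z)^2} \|p_h - q_z\|^2_{L^2(\omega_z)} \Big)^{1/2} + C_{approx} \|h_{\mathcal{E}}^{1/2}(q\cdot\nu - g)\|_{L^2(\Gamma_N)}.
\]
All this together gives, using Lemma 6.3 and Lemma 6.4 and minimizing over the arbitrarily chosen $q \in Q$,
\[
\begin{aligned}
\langle C(y_h), v - Jv \rangle
&\le (C_{stab} + 3KCC_{inv}) \min_{q \in Q} \|\nabla y_h - q\|_{L^2(\Omega)} + CC_{inv} \Big( \sum_{z \in \mathcal{N}} \min_{q_z \in P_d(\omega_z)^2} \|\nabla y_h - q_z\|^2_{L^2(\omega_z)} \Big)^{1/2} \\
&\quad + C_{approx} \Big( \|h_{\mathcal{E}}^{1/2}(q\cdot\nu - g_h)\|_{L^2(\Gamma_N)} + \|h_{\mathcal{E}}^{1/2}(g_h - g)\|_{L^2(\Gamma_N)} \Big) + \mathrm{hot}(s(y_h)) + \mathrm{hot}(u_h) \\
&\le C_{est} \Bigg( \sum_{E \in \mathcal{E}_\Omega \cup \mathcal{E}_N} \min_{\substack{q_E \in P_{d_E}(\omega_E)^2 \\ q_E\cdot\nu = g_h \text{ on } E \cap \Gamma_N}} \|\nabla y_h - q_E\|^2_{L^2(\omega_E)} \Bigg)^{1/2} + C_{approx} \|h_{\mathcal{E}}^{1/2}(g_h - g)\|_{L^2(\Gamma_N)} + \mathrm{hot}(s(y_h)) + \mathrm{hot}(u_h) \\
&\le C_{est}\, \eta_E(y_h, g) + \mathrm{hot}(s(y_h)) + \mathrm{hot}(u_h).
\end{aligned}
\]
Now we come to the second summand of (6.13). Using Proposition 6.2, we obtain as in (6.8) in the linear finite element case, with the projection operator $J$ instead of $\Pi$,
\[
\|Jv\|_V \le \Big( 1 + \sqrt{h_{\mathcal{T}}^2 C_{approx}^2 + C_{stab}^2} \Big) \|v\|_V \le C_{proj} \|v\|_V
\]
for some $C_{proj} > 0$. Thus, we get
\[
\langle C(y_h, u_h), Jv \rangle \le C_{proj} \sup_{v_h \in V_h,\ \|v_h\|_V = 1} \langle C(y_h, u_h), v_h \rangle \le C_{proj} \|C_h(y_h, u_h)\|_{V_h^*}.
\]
Hence, the estimates of the first and second summand in (6.13) yield
\[
\|C(y_h)\|_{V^*} \le C_{est}\, \eta_E(y_h, g) + C_{proj} \|C_h(y_h)\|_{V_h^*} + \mathrm{hot}(s(y_h)) + \mathrm{hot}(u_h).
\tag{6.15}
\]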
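To make the averaging idea behind (6.12) concrete, the following Python sketch treats a one-dimensional analogue. It is only an illustration under simplifying assumptions, not the estimator of [1]: the local minimization over $q_E$ is replaced by a ZZ-type nodal averaging of the piecewise constant derivative of a piecewise linear function, and boundary terms and higher order terms are ignored. Since the averaged derivative is just one admissible continuous candidate, the computed value is an upper bound for the corresponding minimum over continuous piecewise linear functions. The function names `averaging_indicators` and `global_estimate` are hypothetical.

```python
import numpy as np


def averaging_indicators(x, y):
    """Elementwise averaging indicators for a 1D piecewise linear function.

    x: strictly increasing mesh nodes, y: nodal values (illustrative inputs).
    Returns eta_T = ||p - G||_{L2(T)} per element, where p is the piecewise
    constant derivative and G its nodally averaged, continuous recovery.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    h = np.diff(x)                      # element lengths h_T
    p = np.diff(y) / h                  # derivative on each element

    # Recovered derivative at the nodes: length-weighted mean of the two
    # neighbouring element values; one-sided value at the boundary nodes.
    G = np.empty_like(x)
    G[0], G[-1] = p[0], p[-1]
    G[1:-1] = (h[:-1] * p[:-1] + h[1:] * p[1:]) / (h[:-1] + h[1:])

    # p - G is linear on each element with endpoint values ea and eb, so its
    # squared L2 norm equals h/3 * (ea^2 + ea*eb + eb^2) exactly.
    ea = p - G[:-1]
    eb = p - G[1:]
    return np.sqrt(h / 3.0 * (ea**2 + ea * eb + eb**2))


def global_estimate(eta_T):
    """Combine element indicators: eta = (sum_T eta_T^2)^(1/2)."""
    return float(np.sqrt(np.sum(np.asarray(eta_T) ** 2)))


if __name__ == "__main__":
    # Interpolant of y(x) = x^2 on uniform meshes of [0, 1].
    for n in (10, 20, 40):
        x = np.linspace(0.0, 1.0, n + 1)
        print(n, global_estimate(averaging_indicators(x, x**2)))
```

For the interpolant of $x^2$ on a uniform mesh with mesh size $h$ every element contributes $h^3/3$ to the squared estimator, so the global value is $h/\sqrt{3}$ and halves with each uniform refinement. In an adaptive loop the elementwise values would be used to select the elements to refine.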

If the higher order terms are not neglected, they may be estimated with the following lemma and integrated into the estimator calculation.

Lemma 6.5. For all $z \in \mathcal{N}$ there exists an $h_z$-independent constant $C > 0$ such that, if $f|_{\omega_z} \in H^d(\omega_z)$, $d \ge 1$, we have
\[
\min_{f_z \in P_{d-1}(\omega_z)} \|f - f_z\|_{L^2(\omega_z)} \le C\, h_z^d\, \|D^d f\|_{L^2(\omega_z)}
\]
($D^d f = (\partial^\alpha f)_{|\alpha| = d}$ denotes the vector of all partial derivatives of order $d$). For a proof see [1, Lem. 4.1].

6.2 Residual Based Error Estimators for Inexact States

Using the same idea as in the previous section on averaging estimates, we obtain the required estimate for the residual in the given semilinear elliptic PDE (6.1). Denote by $I : L^2(\Omega) \to S^1_D(\mathcal{T})$ the interpolation operator of Clément (cf. [30]).

Remark 6.6. Recall the following properties of the quasi-interpolation operator $I$: Let $v \in H^1(\Omega)$, let $\omega_T$ denote the patch around a triangle $T \in \mathcal{T}$ and let $\omega_E$ denote the patch around an edge $E \in \mathcal{E}$. Then there exists $C > 0$ such that
1. $\|v - Iv\|_{L^2(T)} \le C\, h_T\, \|\nabla v\|_{L^2(\omega_T)}$,
2. $\|v - Iv\|_{L^2(E)} \le C\, h_E^{1/2}\, \|\nabla v\|_{L^2(\omega_E)}$,
3. $\|\nabla(v - Iv)\|_{L^2(T)} \le C\, \|\nabla v\|_{L^2(\omega_T)}$.
Using the finite overlap of the patches, property 3 yields $\|\nabla(v - Iv)\|_{L^2(\Omega)} \le C\, \|\nabla v\|_{L^2(\Omega)}$ for some $C > 0$. For a proof see [30].

Now we follow the analysis in [17] and use the same arguments. Recall that we do not have Galerkin orthogonality since we do not assume that $y_h$ is an exact solution of $C_h(y_h) = 0$. To be able to use the same techniques, we again divide the residual into two parts,
\[
\langle C(y_h), v \rangle_{V^*,V} = \langle C(y_h), v - Iv \rangle_{V^*,V} + \langle C(y_h), Iv \rangle_{V^*,V}
\tag{6.16}
\]
for $v \in V$, and then take the supremum over all test functions $v \in V$ with norm 1. We begin with the estimation of the first summand. Observe that
\[
\langle C(y), v - Iv \rangle_{V^*,V} = \sum_{T \in \mathcal{T}} (r_T, v - Iv)_{L^2(T)} + \sum_{E \in \mathcal{E}} (r_E, v - Iv)_{L^2(E)}
\]
with the elementwise residuals $r_T(y) = (-\Delta y + s(y) - f)|_T$ of the PDE and the jumps $r_E(y) = [\partial_{\nu_E} y]_E$ of the (discontinuous) normal derivative of $y$ across the edges $E$. Hence, using the standard arguments, we obtain for $v \in V$ with $\|v\|_V = 1$
\[
\langle C(y), v - Iv \rangle_{V^*,V} \le C_{est}\, \eta(y),
\]
where
\[
\eta^2(y) = \sum_{T \in \mathcal{T}} \eta_T^2(y) \quad \text{and} \quad \eta_T^2(y) = h_T^2\, \|r_T(y)\|^2_{L^2(T)} + \sum_{E \subset \partial T} h_E\, \|r_E(y)\|^2_{L^2(E)}.
\]
It remains to estimate the second summand in (6.16). Using the properties of the Clément interpolation operator from Remark 6.6 we see that
\[
\begin{aligned}
\|Iv\|_V &\le \|Iv - v\|_V + \|v\|_V
= \big( \|Iv - v\|^2_{L^2(\Omega)} + \|\nabla(Iv - v)\|^2_{L^2(\Omega)} \big)^{1/2} + \|v\|_V \\
&\le \big( C^2 h^2 \|\nabla v\|^2_{L^2(\Omega)} + C^2 \|\nabla v\|^2_{L^2(\Omega)} \big)^{1/2} + \|v\|_V
\le C_{proj} \|v\|_V = C_{proj}
\end{aligned}
\]

for $v \in V$ with $\|v\|_V = 1$. Consequently, we obtain
\[
\|C(y)\|_{V^*} \le C_{est}\, \eta(y) + C_{proj}\, \|C_h(y)\|_{V_h^*}.
\]

7 Numerical Results

7.1 A Distributed Optimal Control Problem

Problem 7.1. We consider the following problem
\[
\min_{y \in H_0^1(\Omega),\ u \in L^2(\Omega)} f(y, u) := \frac{1}{2} \|y - y_d\|^2_{L^2(\Omega)} + \frac{\alpha}{2} \|u\|^2_{L^2(\Omega)}
\quad \text{s.t.} \quad -\Delta y + y^3 = u \ \text{in } \Omega, \quad y = 0 \ \text{on } \partial\Omega,
\]
where $\Omega \subset \mathbb{R}^2$ is a polygonal domain, $y_d \in H_0^1(\Omega)$ and $\alpha > 0$.

The cost function and the constraint operator in the weak formulation are twice continuously Fréchet differentiable. These functions and their derivatives are bounded on a bounded subset $D \subset Y \times U$. Moreover, the theory of maximal monotone operators guarantees that there exists a unique solution operator for this PDE that is uniformly bounded. Hence, the required assumptions for Algorithm 3.9 are satisfied. It is furthermore well known that this optimization problem has a solution.

7.1.1 Estimators for the convergence conditions

We are in the situation to use error estimators as in Section 6 for the infinite dimensional norm of the residual in the PDE constraint. The Lagrangian function is given by
\[
l(y, u, \lambda) = \frac{1}{2} \|y - y_d\|^2_{L^2(\Omega)} + \frac{\alpha}{2} \|u\|^2_{L^2(\Omega)} + \langle \lambda, C(y, u) \rangle_{H_0^1(\Omega), H^{-1}(\Omega)}.
\]
Thus, the $u$-gradient of the Lagrangian reads
\[
l_u(y, u, \lambda) = (\alpha u - \lambda, \cdot)_{L^2(\Omega)}.
\]
Hence, the norm of the $u$-gradient of the Lagrangian is easy to evaluate by the Riesz representation theorem:
\[
\|l_u(y, u, \lambda)\|_{L^2(\Omega)} = \|\alpha u - \lambda\|_{L^2(\Omega)}.
\]
Thus, the convergence condition (3.31) on the $u$-gradient of the Lagrangian is always satisfied, since we calculate exact $L^2$ norms for a given discrete control $u_h$ and discrete adjoint state $\lambda_h$. The $y$-gradient of the Lagrangian $l(y, u, \lambda)$ is given by
\[
l_y(y, u, \lambda) = (y - y_d, \cdot)_{L^2(\Omega)} + a(\lambda, \cdot) + (3 \lambda y^2, \cdot)_{L^2(\Omega)}.
\]
Again, the residual in the adjoint equation can be estimated with the techniques from Section 6.

7.1.2 Numerical Results for the Distributed Optimal Control Problem

For the test problem 7.1 we used the following configuration: $\Omega$ = L-shaped domain, $\alpha = 10^{-4}$, $y_d = 1$. We used preconditioned Krylov solvers as iterative solvers with incomplete Cholesky factorizations in the preconditioner on the KKT system of the tangential step and
in the quasi-normal step. We calculated both with linear finite elements and the averaging ZZ-estimator and with quadratic finite elements and the averaging estimator proposed by Bartels and Carstensen. We show the results for quadratic finite elements and the averaging estimator of Bartels and Carstensen. In Figure 7.1 we see a table of error estimators with the iteration number in the first column, the error estimator for the constraint in the second column, the error estimator for the adjoint state in the third column, the norm of the inexact reduced gradient in the fourth column, the degrees of freedom in the fifth column and the degrees of freedom one would need to achieve the same accuracy on uniformly refined meshes in the sixth column.

It    Constraint est.   Adjoint est.   ||ĝ||_U    DOF     Uniform DOF
 1    3.2e-1            1.8e-2         5.6e-…     …       …
 2    …e-1              8.7e-3         5.1e-…     …       …
 3    …e-1              9.6e-3         3.4e-…     …       …
 4    …e-1              3.0e-3         1.6e-2     927     2945
 5    1.5e-1            2.2e-3         2.1e-3     927     2945
 6    1.2e-1            1.8e-3         1.7e-3     1399    2945
 7    6.7e-2            1.1e-3         1.2e-3     2459    12033
 8    6.5e-2            1.0e-3         1.4e-4     2459    12033
 9    4.8e-2            7.2e-4         2.5e-3     3834    48641
10    2.7e-2            4.7e-4         2.4e-3     7848    …
11    …e-2              2.4e-4         1.9e-…     …       …
12    …e-2              2.8e-4         1.1e-…     …       …
13    …e-2              2.4e-4         2.4e-…     …       …

Fig. 7.1. Table of error estimators.

The same accuracy on uniform meshes would here require more than 20 times the degrees of freedom on our adaptively refined meshes. In Figure 7.2 we see the last grid produced by the multilevel SQP algorithm as well as the optimal control and the optimal state.

Fig. 7.2. Last (9th) grid, optimal control, optimal state.

7.2 A Boundary Control Problem

Problem 7.2. We consider the following problem taken from [2]:
\[
\min_{y \in H^1(\Omega),\ u \in L^2(\Gamma_C)} f(y, u) := \frac{1}{2} \|y - y_d\|^2_{L^2(\Gamma_O)} + \frac{\alpha}{2} \|u\|^2_{L^2(\Gamma_C)}
\quad \text{s.t.} \quad -\Delta y + y^3 - y = 1 \ \text{in } \Omega, \quad \partial_n y = 0 \ \text{on } \partial\Omega \setminus \Gamma_C, \quad \partial_n y = u \ \text{on } \Gamma_C,
\]
where $\alpha = 10^{-4}$, $y_d = 1$, $\partial_n$ denotes the normal derivative, $\Omega = T$ is a T-shaped domain, $\Gamma_C$ is the bottom boundary of $T$ and $\Gamma_O$ is the upper boundary of $T$, i.e.,
\[
T = \mathrm{int}\big( [1/4, 3/4] \times [0, 1/2] \,\cup\, [0, 1] \times [1/2, 3/4] \big), \quad \Gamma_O = [0, 1] \times \{3/4\}, \quad \Gamma_C = [1/4, 3/4] \times \{0\}.
\]
Again we used preconditioned Krylov solvers as iterative solvers with incomplete Cholesky


factorizations in the preconditioner on the KKT system of the tangential step and in the quasi-normal step. We calculated with quadratic finite elements and the averaging estimator from Bartels and Carstensen. In Figure 7.3 we see a table of error estimators with the iteration number in the first column, the error estimator for the constraint in the second column, the error estimator for the adjoint state in the third column, the norm of the inexact reduced gradient in the fourth column and the degrees of freedom in the fifth column.

It    Constraint est.   Adjoint est.   ||ĝ||_U    DOF
 1    2.7e-3            2.4e-3         3.4e-…     …
 2    …e-3              8.4e-5         2.1e-…     …
 3    …e-3              7.9e-4         6.0e-…     …
 4    …e-3              3.9e-4         3.4e-…     …
 5    …e-3              8.4e-5         6.1e-…     …
 6    …e-3              6.9e-5         8.5e-…     …
 7    …e-3              6.3e-5         3.6e-…     …
 8    …e-3              5.9e-5         3.6e-…     …
 9    …e-3              5.7e-5         8.2e-…     …
10    …e-4              5.1e-5         3.4e-…     …
11    …e-4              5.1e-5         5.8e-…     …
12    …e-4              4.1e-5         1.9e-…     …
13    …e-4              4.1e-5         1.8e-…     …
14    …e-4              3.8e-5         1.8e-…     …

Fig. 7.3. Table of error estimators.

In Figure 7.4 we see the last grid produced by the multilevel SQP algorithm as well as the optimal control and the optimal state.

Fig. 7.4. Last (8th) grid, optimal control, optimal state.

REFERENCES

[1] S. Bartels and C. Carstensen, Each averaging technique yields reliable a posteriori error control in FEM on unstructured grids. II. Higher order FEM, Math. Comp. (electronic).
[2] R. Becker, H. Kapp, and R. Rannacher, Adaptive finite element methods for optimal control of partial differential equations: basic concept, SIAM J. Control Optim. (electronic).
