Master Thesis. POD-Based Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations with Basis Update


Master Thesis

POD-Based Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations with Basis Update

submitted by Eugen Makarov at the Department of Mathematics and Statistics

Konstanz, .6.18

Supervisor and 1st Reviewer: Prof. Dr. Stefan Volkwein, University of Konstanz

Konstanzer Online-Publikations-System (KOPS) URL:


Declaration of Independent Work

I hereby declare that I have written the present master thesis on the topic "POD-Based Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations with Basis Update" independently and have used no aids other than those stated. All passages taken from other works, either verbatim or in substance, are marked in each individual case as borrowings by citing the source, including the secondary literature used. The thesis has not been submitted to any other examination authority and has not yet been published.

Konstanz, den . Juni 18

Eugen Makarov


Contents

1 Introduction
2 Basic Concepts and Notations
  2.1 Order on $\mathbb{R}^2$ and Useful Inequalities
  2.2 Function Spaces
  2.3 First Order Abstract Evolution Problem
3 Bicriterial Optimization Problems
  3.1 Problem Formulation
  3.2 Notion of Optimality
  3.3 Euclidean Reference Point Method
    3.3.1 Problem Formulation and Analytical Results
    3.3.2 Optimality Conditions
    3.3.3 A-Posteriori Error Estimates
    3.3.4 Optimization Algorithm
4 Proper Orthogonal Decomposition for Abstract Evolution Problem
  4.1 Continuous Version of the POD Method
  4.2 Discrete Version of the POD Method
5 Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations
  5.1 Problem Formulation
  5.2 Well-Posedness of the State Equation
  5.3 Reduced Optimal Control Problem
  5.4 Adjoint Equation
  5.5 Reduced-Order Modelling Using POD
    5.5.1 POD Approximation of the State Equation
    5.5.2 POD Approximation of the Objective Function
    5.5.3 POD Approximation of the Adjoint Equation
  5.6 A-Priori Convergence Analysis
  5.7 A-Posteriori Analysis
  5.8 POD-Based Optimization Algorithms
6 Numerical Results
  6.1 Implementational Aspects
  6.2 Example I
    6.2.1 Solving the Full Problem
    6.2.2 Solving the POD Approximated Problem
  6.3 Example II
    6.3.1 Solving the Full Problem
    6.3.2 Solving the POD Approximated Problem
7 Conclusion
Bibliography

Chapter 1 Introduction

Optimal control problems governed by partial differential equations (PDEs) occur in various fields such as engineering, astronautics, medicine and economics, see for instance [1] and []. Mathematically, a PDE describes a certain process and couples a control variable $u$ from an admissible set $U_{ad}$ with the state variable $y$, which is given by the solution of the PDE. The aim of an optimal control problem then consists in minimizing a cost function, also called objective function, depending on the state $y$ and the control $u$. However, in real applications, optimization problems are often described by two or even more objective functions which are usually in conflict with each other. This leads to multiobjective or multicriterial optimization. One prominent example of such optimization problems is the energy-efficient heating, ventilation and air-conditioning (HVAC) process of a building, in which the objective of reaching maximal comfort, i.e. the desired temperature distribution, conflicts with minimal energy consumption, see [3]. Therefore, the main issue is to find a solution which is a good compromise. Unfortunately, there is usually no single minimizer of all objectives. Thus, the notion of Pareto optimality has been developed. The idea behind it is to present the set of all possible compromises to the decision maker, who can then choose one optimal solution based on their insights. Consequently, in contrast to scalar-valued optimization problems, the computation of the Pareto set, or at least an approximation of it, is required. For this purpose, several solution methods based on this concept have been developed. The most popular approach consists in transforming the multiobjective optimization problem into a series of scalar-valued problems and solving them using well-known techniques from scalar optimization.
Popular scalarizations are, for instance, the weighted sum method and reference point methods, see [4], [5]. In this thesis we consider a bicriterial optimal control problem governed by the heat equation with a convection term and bilateral control constraints, which arises in HVAC operation of buildings. In contrast to the problem presented in [1], we allow the convection term to be time-dependent and investigate its influence on the optimal control problem. For this purpose, we apply the Euclidean reference point method, a special case of the reference point method, in order to transform the bicriterial optimal control problem into a series of scalar-valued optimal control problems. In order to keep the computational effort feasible, we apply the proper orthogonal decomposition (POD) method, a well-known model-order reduction technique, see [6]. In the context of this thesis, we derive new a-priori estimates for the approximation error in the objective and control space. In our numerical experiments we analyse the results and compare them with the results for the time-independent convection term. Furthermore, new strategies for efficiently updating the POD basis in the optimization process are proposed and tested numerically. Now, we give an outline of the thesis with a brief description of the chapters.

Chapter 2. First of all, important concepts and results from functional analysis, which will be required in the course of this thesis, are introduced.

Chapter 3. In this chapter the theory of bicriterial optimization problems, which is a special case of multiobjective optimization, is presented. First of all, an optimality concept, the so-called Pareto optimality, is introduced in order to obtain solution approaches. In the further course of the chapter, the Euclidean reference point method is considered, which allows us to compute the set of optimal solutions. Finally, a numerical approach for approximating the solution set is presented.

Chapter 4. This chapter provides an introduction to the proper orthogonal decomposition (POD) method, a widely used model order reduction technique, for the parameter-dependent abstract evolution problem. First, the continuous version of the POD approach is introduced. Afterwards, the numerically feasible discrete version of the POD method is considered. Eventually, the numerical computation of a POD basis is discussed.

Chapter 5. In this chapter a bicriterial optimal control problem governed by a heat equation with a time-dependent convection term and bilateral control constraints is considered. In the first part of the chapter it is shown that the present problem fits into the framework of the bicriterial optimization treated in Chapter 3, and the Euclidean reference point method is applied. In the second part, as this method transforms the bicriterial optimal control problem into a series of scalar optimization problems, the method of proper orthogonal decomposition (POD), presented in Chapter 4, is applied as an approach for model-order reduction.

Chapter 6. Next, the numerical experiments for the bicriterial optimal control problem governed by the convection-diffusion equation with a time-dependent convection term and bilateral control constraints are presented and analysed.
In the first part of the chapter, the spatial domain $\Omega$ is assumed to be perfectly insulated, which yields a homogeneous Neumann boundary condition in the PDE constraint. In order to investigate this problem numerically, the results obtained by solving both the full and the POD approximated problem are analysed. For this purpose, Algorithms 1, 2 and 3 are used. Afterwards, new strategies for efficiently updating the POD basis in the optimization process are proposed and tested numerically. In the last part of the chapter, results for the optimal control problem with an inhomogeneous Robin boundary condition are analysed and compared with the previous results.

Chapter 7. Finally, we draw a conclusion from the obtained results.

Chapter 2 Basic Concepts and Notations

2.1 Order on $\mathbb{R}^2$ and Useful Inequalities

Definition 2.1. Let $\bar{\mathbb{R}} := \mathbb{R} \cup \{-\infty, \infty\}$.

(i) For $x, y \in \bar{\mathbb{R}}^2$ we write

$$x \le y \;:\Leftrightarrow\; x_1 \le y_1 \text{ and } x_2 \le y_2, \qquad x < y \;:\Leftrightarrow\; x_1 < y_1 \text{ and } x_2 < y_2.$$

Note that the set $(\bar{\mathbb{R}}^2, \le)$ is partially but not totally ordered. Furthermore, in contrast to the scalar case, the relation $x \le y$ does not imply ($x < y$ or $x = y$). Therefore, we additionally define

$$x \lneq y \;:\Leftrightarrow\; x \le y \text{ and } x \ne y.$$

(ii) For $x \in \bar{\mathbb{R}}^2$ we define the sets

$$\mathbb{R}^2_{\le x} := \{y \in \mathbb{R}^2 \mid y \le x\} \subset \mathbb{R}^2, \qquad \mathbb{R}^2_{< x} := \{y \in \mathbb{R}^2 \mid y < x\} \subset \mathbb{R}^2.$$

Analogously, we define the sets $\mathbb{R}^2_{\ge x}$ and $\mathbb{R}^2_{> x}$. For convenience, we additionally write $\mathbb{R}^2_{\le} := \mathbb{R}^2_{\le 0}$, $\mathbb{R}^2_{<} := \mathbb{R}^2_{< 0}$, and in the same manner we define $\mathbb{R}^2_{\ge}$ and $\mathbb{R}^2_{>}$.

(iii) Lastly, for $x, y \in \bar{\mathbb{R}}^2$ with $x \le y$ we define

$$(x, y) := \{z \in \bar{\mathbb{R}}^2 \mid x < z < y\}, \qquad [x, y] := \{z \in \bar{\mathbb{R}}^2 \mid x \le z \le y\}.$$

Note that $(x, y) = (x_1, y_1) \times (x_2, y_2)$ and that $[x, y]$ has an analogous representation.

Definition 2.2. Let $g : \mathbb{R}^2 \to \mathbb{R}$ be a function. We call $g$

(i) monotonically increasing if for all $x, y \in \mathbb{R}^2$: $x \le y \Rightarrow g(x) \le g(y)$,

(ii) strictly monotonically increasing if it is monotonically increasing and for all $x, y \in \mathbb{R}^2$: $x < y \Rightarrow g(x) < g(y)$.
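The componentwise order on $\mathbb{R}^2$ introduced above can be illustrated by a small numerical sketch; the helper names `leq`, `lt` and `lneq` are our own, purely illustrative choices:

```python
import numpy as np

def leq(x, y):
    """Componentwise order on R^2: x <= y iff x1 <= y1 and x2 <= y2."""
    return bool(np.all(np.asarray(x) <= np.asarray(y)))

def lt(x, y):
    """Strict componentwise order: x < y iff x1 < y1 and x2 < y2."""
    return bool(np.all(np.asarray(x) < np.asarray(y)))

def lneq(x, y):
    """x is less than or equal to y but not equal to it."""
    return leq(x, y) and not np.array_equal(np.asarray(x), np.asarray(y))

# The order is partial, not total: (1, 2) and (2, 1) are incomparable.
assert not leq((1, 2), (2, 1)) and not leq((2, 1), (1, 2))
# x <= y does not imply (x < y or x = y): compare (1, 2) and (1, 3).
assert leq((1, 2), (1, 3)) and not lt((1, 2), (1, 3))
```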

We cite from [6, p. 9] the following results.

Lemma 2.3 (Gronwall's inequality). For $T > 0$ let $v : [0, T] \to \mathbb{R}$ be a non-negative, differentiable function satisfying

$$v'(t) \le \varphi(t)\, v(t) + \chi(t) \quad \text{for all } t \in [0, T],$$

where $\varphi$ and $\chi$ are real-valued, non-negative, integrable functions on $[0, T]$. Then it holds

$$v(t) \le \exp\Big(\int_0^t \varphi(s)\,ds\Big)\Big(v(0) + \int_0^t \chi(s)\,ds\Big) \quad \text{for all } t \in [0, T].$$

In particular, if $v' \le \varphi v$ in $[0, T]$ and $v(0) = 0$ hold, then $v = 0$ in $[0, T]$.

Lemma 2.4 (Young's inequality). For every $a, b \in \mathbb{R}$ and every $\varepsilon > 0$ it holds

$$ab \le \varepsilon a^2 + \frac{b^2}{4\varepsilon}.$$

2.2 Function Spaces

In this section we define function spaces which will be used in this master thesis. For $n \in \mathbb{N}$ let $\Omega \subset \mathbb{R}^n$ be a non-empty, open and bounded set. Furthermore, let

$$L(\Omega) := \{\varphi : \Omega \to \mathbb{R} \mid \varphi \text{ is Lebesgue measurable}\}$$

be the space of Lebesgue measurable functions and denote by $\int_\Omega \varphi(x)\,dx$ the Lebesgue integral of $\varphi \in L(\Omega)$.

Definition 2.5. For $p \in \mathbb{N} \cup \{\infty\}$ we denote by $L^p(\Omega)$ the Lebesgue space defined as

$$L^p(\Omega) := \{\varphi \in L(\Omega) \mid \|\varphi\|_{L^p(\Omega)} < \infty\},$$

where the associated Lebesgue norm is given by

$$\|\varphi\|_{L^p(\Omega)} := \Big(\int_\Omega |\varphi(x)|^p\,dx\Big)^{1/p} \quad \text{for } 1 \le p < \infty$$

and $\|\varphi\|_{L^\infty(\Omega)} := \operatorname{ess\,sup}\{|\varphi(x)| : x \in \Omega\}$ for $p = \infty$.

Remark 2.6. It is well-known that the Lebesgue space $L^p(\Omega)$ is a Banach space for any $p \in \mathbb{N} \cup \{\infty\}$. Moreover, $L^2(\Omega)$ is even a Hilbert space (see for instance [7]).

Definition 2.7. For $p \in \mathbb{N} \cup \{\infty\}$ we denote by $W^{1,p}(\Omega)$ the Sobolev space defined as

$$W^{1,p}(\Omega) := \{\varphi \in L^p(\Omega) \mid \|\varphi\|_{W^{1,p}(\Omega)} < \infty\},$$

where the associated Sobolev norm is given by

$$\|\varphi\|_{W^{1,p}(\Omega)} := \Big(\|\varphi\|_{L^p(\Omega)}^p + \sum_{i=1}^n \Big\|\frac{\partial \varphi}{\partial x_i}\Big\|_{L^p(\Omega)}^p\Big)^{1/p} \quad \text{for } 1 \le p < \infty$$

and

$$\|\varphi\|_{W^{1,\infty}(\Omega)} := \max_{1 \le i \le n}\Big(\|\varphi\|_{L^\infty(\Omega)},\ \Big\|\frac{\partial \varphi}{\partial x_i}\Big\|_{L^\infty(\Omega)}\Big) \quad \text{for } p = \infty.$$

For convenience, we additionally define $H^1(\Omega) := W^{1,2}(\Omega)$.

Definition 2.8. Let $(X, \|\cdot\|_X)$ be a Banach space and $T > 0$. Then for any $p \in \mathbb{N} \cup \{\infty\}$ we denote by $L^p(0, T; X)$ the so-called Bochner space, which is defined as follows:

$$L^p(0, T; X) := \{\varphi : [0, T] \to X \mid \varphi \text{ is Bochner measurable and } \|\varphi\|_{L^p(0,T;X)} < \infty\},$$

where the associated Bochner norm is given by

$$\|\varphi\|_{L^p(0,T;X)} := \Big(\int_0^T \|\varphi(t)\|_X^p\,dt\Big)^{1/p} \quad \text{for } 1 \le p < \infty$$

and $\|\varphi\|_{L^\infty(0,T;X)} := \operatorname{ess\,sup}_{t \in [0,T]} \|\varphi(t)\|_X$ for $p = \infty$.

2.3 First Order Abstract Evolution Problem

Let $(V, \langle\cdot,\cdot\rangle_V)$ and $(H, \langle\cdot,\cdot\rangle_H)$ be two real, separable Hilbert spaces. Moreover, suppose that $V$ is dense in $H$ with compact embedding. Consequently, there is a constant $C_V > 0$ such that

$$\|\varphi\|_H \le C_V \|\varphi\|_V \quad \text{for all } \varphi \in V. \tag{2.1}$$

By identifying $H$ with its dual space $H'$ (using the Riesz theorem) we have $V \hookrightarrow H \cong H' \hookrightarrow V'$, where each space is dense in the following one. Furthermore, let $T > 0$ be the final time.

Definition and Remark 2.9. The function space

$$W(0, T) := \{\varphi \in L^2(0, T; V) \mid \varphi_t \in L^2(0, T; V')\}$$

equipped with the inner product

$$\langle \varphi, \psi \rangle_{W(0,T)} := \int_0^T \langle \varphi(t), \psi(t) \rangle_V + \langle \varphi_t(t), \psi_t(t) \rangle_{V'}\,dt \quad \text{for all } \varphi, \psi \in W(0, T)$$

is a Hilbert space (see [, pp. ]).

Theorem 2.10. For any $\varphi \in W(0, T)$ it holds $\varphi \in C([0, T]; H)$ and the embedding $W(0, T) \hookrightarrow C([0, T]; H)$ is continuous, i.e. there is a constant $C_W > 0$ satisfying

$$\|\varphi\|_{C([0,T];H)} \le C_W \|\varphi\|_{W(0,T)} \quad \text{for all } \varphi \in W(0, T).$$

Proof. A proof of this statement can be found in [8, p. 473].

Theorem 2.11 (Trace Theorem). Suppose $\Omega \subset \mathbb{R}^n$ is bounded with $C^1$ boundary $\partial\Omega$. Then there exists a bounded linear operator

$$T : H^1(\Omega) \to L^2(\partial\Omega)$$

such that

(i) $Tu = u|_{\partial\Omega}$ if $u \in H^1(\Omega) \cap C(\bar\Omega)$,

(ii) $\|Tu\|_{L^2(\partial\Omega)} \le C \|u\|_{H^1(\Omega)}$ for each $u \in H^1(\Omega)$ with a constant $C$ depending only on $\Omega$.

Proof. A proof can be found in [9, pp. ].

The aim of this section is to introduce a solution concept for the abstract evolution problem

$$\begin{aligned} \langle y_t(t), \varphi \rangle_{V',V} + a(t; y(t), \varphi) &= \langle f(t), \varphi \rangle_{V',V} && \text{for all } \varphi \in V \text{ a.e.}, \\ \langle y(0), \varphi \rangle_H &= \langle y_0, \varphi \rangle_H && \text{for all } \varphi \in H, \end{aligned} \tag{AEP}$$

where $a(t; \cdot, \cdot) : V \times V \to \mathbb{R}$ is a bilinear form almost everywhere (a.e.) in $t \in (0, T)$, $f : (0, T) \to V'$ is a function, $y_0 \in H$, and $\langle \cdot, \cdot \rangle_{V',V}$ stands for the dual pairing between $V$ and its dual space $V'$.

Remark 2.12. (i) Note that for $y \in W(0, T)$ it holds

$$\langle y_t(t), \varphi \rangle_{V',V} = \frac{d}{dt}\, \langle y(t), \varphi \rangle_H \quad \text{for all } \varphi \in V,$$

see [8, p. 57].

(ii) The initial condition is meaningful in $H$ due to the continuous embedding $W(0, T) \hookrightarrow C([0, T]; H)$.

In order to ensure the well-posedness of the equation (AEP) we state the following assumption.

Assumption 1. Suppose that the bilinear form $a(t; \cdot, \cdot) : V \times V \to \mathbb{R}$ satisfies the following conditions:

1. the mapping $t \mapsto a(t; \cdot, \cdot)$ is measurable,

2. there exists a constant $C_a > 0$, which is independent of $t$, such that

$$|a(t; \varphi, \psi)| \le C_a \|\varphi\|_V \|\psi\|_V \quad \text{for all } \varphi, \psi \in V \text{ a.e.},$$

3. there are constants $\gamma > 0$ and $\eta \ge 0$, which are independent of $t$, such that the coercivity condition

$$a(t; \varphi, \varphi) \ge \gamma \|\varphi\|_V^2 - \eta \|\varphi\|_H^2 \quad \text{for all } \varphi \in V \text{ a.e.}$$

holds.

Remark 2.13. It follows that for f.a.a. $t \in (0, T)$ the bilinear form $a(t; \cdot, \cdot)$ defines a linear operator

$$A(t) : V \to V', \qquad \langle A(t)\varphi, \psi \rangle_{V',V} := a(t; \varphi, \psi),$$

such that $\|A(t)\|_{L(V,V')} \le C_a$ f.a.a. $t \in (0, T)$.

The following theorem guarantees the existence of a unique solution to (AEP) which depends continuously on the data.

Theorem 2.14. Let $a(t; \cdot, \cdot)$ be a bilinear form satisfying Assumption 1. Then for every $f \in L^2(0, T; V')$ and $y_0 \in H$ there is a unique solution $y \in W(0, T)$ satisfying the equation (AEP) and

$$\|y\|_{W(0,T)} \le C \big( \|f\|_{L^2(0,T;V')} + \|y_0\|_H \big)$$

for a constant $C > 0$ which is independent of $f$ and $y_0$.

Proof. The proof of the existence of a unique solution can be found in [8, pp. 51-5]. The continuous dependence on the data follows from standard variational techniques and energy estimates.
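To complement the well-posedness result above with a numerical illustration (this is our own toy example, not part of the thesis): after a spatial discretization, (AEP) reduces to a linear system of ODEs $M\dot y(t) + A(t)y(t) = f(t)$, which can be stepped, for instance, by the unconditionally stable implicit Euler method. Here we use a finite-difference Laplacian on the unit interval with homogeneous Dirichlet conditions; all parameter values are illustrative choices:

```python
import numpy as np

n, T, nt = 50, 1.0, 100          # spatial points, final time, time steps
dt = T / nt
h = 1.0 / (n + 1)
# 1D Laplacian (stiffness matrix A) with homogeneous Dirichlet conditions.
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
M = np.eye(n)                    # mass matrix (identity for finite differences)
y = np.sin(np.pi * h * np.arange(1, n + 1))   # initial condition y0
for k in range(nt):
    # implicit Euler: (M + dt*A) y_{k+1} = M y_k + dt f(t_{k+1}); here f = 0
    y = np.linalg.solve(M + dt * A, M @ y)
# the lowest mode decays roughly like exp(-pi^2 t), so the solution is small at T = 1
assert np.max(np.abs(y)) < 0.01
```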


Chapter 3 Bicriterial Optimization Problems

In this chapter the theory of bicriterial optimization problems, which is a special case of multiobjective optimization, is established. First of all, an optimality concept, the so-called Pareto optimality, is introduced in order to obtain solution approaches. In the further course of the chapter the Euclidean reference point method is considered, which allows us to compute the set of optimal solutions. Finally, a numerical approach for approximating the solution set is presented. The whole chapter is based on results which were derived in [1, pp. ]. Therefore, in this master thesis we only cite the appropriate theoretical results.

3.1 Problem Formulation

Let $(U, \langle\cdot,\cdot\rangle_U)$ be a real Hilbert space and let $U_{ad} \subset U$ be non-empty, convex and closed. In the following we consider the optimization problem

$$\min_{u \in U_{ad}} f(u) := \min_{u \in U_{ad}} \begin{pmatrix} f_1(u) \\ f_2(u) \end{pmatrix}, \tag{BOP}$$

where $f_1, f_2 : U_{ad} \to \mathbb{R}$ are real-valued functions.

Definition 3.1. In this context we call

(1) the minimization problem (BOP) bicriterial optimization problem and the associated function $f : U_{ad} \to \mathbb{R}^2$ bicriterial objective function,

(2) the set $U_{ad} \subset U$ of all admissible solutions to (BOP) admissible set and a vector $u \in U_{ad}$ admissible,

(3) the set $Y := f(U_{ad}) \subset \mathbb{R}^2$ objective admissible region and a vector $y \in Y$ objective vector.

3.2 Notion of Optimality

The aim of this section is to introduce an appropriate solution concept for the bicriterial optimization problem (BOP). However, unlike the scalar case, defining a minimizer $\bar u \in U_{ad}$ just by demanding $f(\bar u) \le f(u)$ for all $u \in U_{ad}$ is not reasonable. This is due to the fact that the objective functions $f_1$ and $f_2$ are usually in conflict with each other, i.e. there is no single solution that simultaneously optimizes both objectives. Thus, we need a generalized concept of an optimal solution. Actually, there are several minimality concepts for (BOP). In this thesis we use the notion of the so-called Pareto optimality.

Definition 3.2. The admissible vector $\bar u \in U_{ad}$ is called Pareto (optimal) point for (BOP) if there is no other point $u \in U_{ad} \setminus \{\bar u\}$ such that $f(u) \lneq f(\bar u)$. Furthermore, we denote by $P_s$ the so-called Pareto set and by $P_f$ the Pareto front, which are defined by

$$P_s := \{u \in U_{ad} \mid u \text{ is Pareto optimal}\} \subset U \quad \text{and} \quad P_f := f(P_s) \subset \mathbb{R}^2,$$

respectively.

Remark 3.3. (i) Expressed in words, a solution $\bar u \in U_{ad}$ to (BOP) is called Pareto optimal if none of the objective functions $f_1, f_2$ can be improved in value without degrading the other objective value.

(ii) Typically there are many different Pareto optimal solutions. Thus, solving the optimization problem (BOP) consists in computing the set of optimal solutions, i.e. the Pareto set, or at least an approximation of it.

(iii) The idea behind the Pareto optimality concept is to present a set of optimal solutions to the decision maker, who can then make a trade-off according to their preferences in order to choose one solution.

(iv) The notion of Pareto optimality can also be defined for an arbitrary set $X \subset \mathbb{R}^n$ in exactly the same way as in Definition 3.2.

Now, we define the lower and the upper bound of the Pareto front, respectively.

Definition 3.4. (1) The point $y^{id} \in \bar{\mathbb{R}}^2$ given by

$$y^{id}_i := \inf_{u \in U_{ad}} f_i(u) \quad \text{for } i = 1, 2$$

is called ideal point of the bicriterial optimization problem (BOP).

(2) The point $y^{nad} \in \bar{\mathbb{R}}^2$ given by

$$y^{nad}_i := \sup_{u \in P_s} f_i(u) \quad \text{for } i = 1, 2$$

is called nadir point of the bicriterial optimization problem (BOP).

Remark 3.5. The ideal point $y^{id}$ is the infimum of the Pareto front in the sense that $y^{id} \le y$ holds for all $y \in P_f$ and there is no $\tilde y \gneq y^{id}$ with $\tilde y \le y$ for all $y \in P_f$. In an analogous way, the nadir point $y^{nad}$ is the supremum of the Pareto front. Thus, we can conclude $P_f \subset [y^{id}, y^{nad}]$.
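On a finite set of objective vectors, Pareto optimality reduces to a simple dominance test, which the following sketch implements (the function name `pareto_filter` and the sample data are our own illustrative choices):

```python
import numpy as np

def pareto_filter(F):
    """Return indices of the Pareto-optimal rows of F (rows = objective vectors).

    A row is Pareto optimal if no other row is componentwise <= and strictly
    smaller in at least one component (the dominance relation of Definition 3.2
    applied to a finite set)."""
    F = np.asarray(F, dtype=float)
    keep = []
    for i, fi in enumerate(F):
        dominated = any(np.all(fj <= fi) and np.any(fj < fi)
                        for j, fj in enumerate(F) if j != i)
        if not dominated:
            keep.append(i)
    return keep

# Three candidates: the first two are incomparable trade-offs, the third is dominated.
F = [[1.0, 3.0], [2.0, 1.0], [2.0, 3.0]]
assert pareto_filter(F) == [0, 1]
```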
3.3 Euclidean Reference Point Method

Having introduced a solution concept in the previous section, we learned that the theoretical aim of solving a bicriterial optimization problem (BOP) consists in computing both the Pareto set and the Pareto front, or at least an approximation of them. Actually, in the specialised literature there is a vast range of methods that can be used to obtain Pareto optimal solutions. An overview of those methods can be found in [4].

The usual approach to tackle a bicriterial optimization problem is scalarization, in which the bicriterial objective function is transformed into a scalar function and then minimized by well-known techniques from scalar optimization. The basic idea is that by performing different scalarizations, both the Pareto set and the Pareto front can be approximated. In this thesis we introduce one particular scalarization method, the so-called Euclidean reference point method, which was investigated in [11, pp. ] and [1, pp. 6-34].

3.3.1 Problem Formulation and Analytical Results

Definition 3.6. Given a reference point $z \in P_f + \mathbb{R}^2_{<} := \{y + x \mid y \in P_f \text{ and } x \in \mathbb{R}^2_{<}\}$, we define the Euclidean distance function by

$$F_z : U \to \mathbb{R}, \qquad F_z(u) := \tfrac12 (f_1(u) - z_1)^2 + \tfrac12 (f_2(u) - z_2)^2.$$

Then the Euclidean reference point problem is given by the scalar optimization problem

$$\min F_z(u) \quad \text{s.t.} \quad u \in U_{ad}. \tag{ERPP}$$

Remark 3.7. (i) The mapping $F_z$ measures the squared Euclidean distance between $f(u)$ and a reference point $z$ for a given $u \in U$. Hence, minimizing $F_z$ is equivalent to searching for an objective vector $y \in Y$ that is nearest to the reference point $z$ in terms of the Euclidean distance.

(ii) The factor $\tfrac12$ is introduced in order not to have to deal with additional constant factors when considering the derivatives.

(iii) The condition $z \in P_f + \mathbb{R}^2_{<}$ means that the reference point $z$ lies below the Pareto front $P_f$.

Notation 3.8. In order to emphasize the dependency on a reference point $z \in \mathbb{R}^2$, we denote the associated Euclidean reference point problem by (ERPP)$_z$.

Assumption 2. Assume that the objective functions $f_1, f_2$ are strictly convex, continuous and bounded from below. If $U_{ad}$ is unbounded, suppose additionally that $\lim_{\|u\|_U \to \infty} f_i(u) = \infty$ for all $i \in \{1, 2\}$.

Theorem 3.9. Let $z \in P_f + \mathbb{R}^2_{<}$ be a reference point and let Assumption 2 be satisfied. Then the Euclidean reference point problem (ERPP)$_z$ has a unique solution $\bar u_z \in U_{ad}$, which is Pareto optimal.
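For a toy problem with quadratic objectives on $U = \mathbb{R}^2$, the problem (ERPP)$_z$ can be solved, for instance, by a projected gradient method. The data $a$, $b$, $z$, the box $U_{ad} = [0,1]^2$ and the solver parameters below are our own illustrative choices, not taken from the thesis:

```python
import numpy as np

# f1(u) = ||u - a||^2 / 2 and f2(u) = ||u - b||^2 / 2 are strictly convex,
# so Assumption 2 holds; U_ad = [0, 1]^2 is the admissible box.
a, b = np.array([0.0, 0.0]), np.array([1.0, 1.0])

def f(u):
    return np.array([0.5 * np.sum((u - a)**2), 0.5 * np.sum((u - b)**2)])

def grad_Fz(u, z):
    # gradient of the Euclidean distance function:
    # grad F_z(u) = sum_i (f_i(u) - z_i) * grad f_i(u)
    fi = f(u)
    return (fi[0] - z[0]) * (u - a) + (fi[1] - z[1]) * (u - b)

def solve_erpp(z, u0, step=0.1, iters=2000):
    """Projected gradient descent for (ERPP)_z on the box U_ad = [0, 1]^2."""
    u = np.asarray(u0, dtype=float)
    for _ in range(iters):
        u = np.clip(u - step * grad_Fz(u, z), 0.0, 1.0)  # project onto U_ad
    return u

# A reference point below the Pareto front picks out one Pareto optimal control;
# by symmetry of this toy problem the solution is the midpoint (0.5, 0.5).
u_bar = solve_erpp(z=np.array([-0.1, -0.1]), u0=[0.9, 0.2])
```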
The next theorem shows that all Pareto optimal points can be computed by solving a Euclidean reference point problem.

Theorem 3.10. Let Assumption 2 be satisfied. Then for every $\bar y = f(\bar u) \in P_f$ there is $z \in P_f + \mathbb{R}^2_{<}$ such that $\bar u$ is the unique solution to (ERPP)$_z$.

The following results show that the unique solution to (ERPP)$_z$ and its image depend continuously on the reference point.

Assumption 3. Assume the objective functions $f_1$ and $f_2$ are twice continuously differentiable with positive definite second derivatives $\nabla^2 f_1$ and $\nabla^2 f_2$. Moreover, assume that there is $i_p \in \{1, 2\}$ such that $\nabla^2 f_{i_p}$ is uniformly positive definite with coercivity constant $C_{i_p} > 0$.

Lemma 3.11. Let Assumptions 2 and 3 be satisfied and define the set $Z := \{z \in P_f + \mathbb{R}^2_{<} \mid z_{i_p} < y^{id}_{i_p} - \kappa\}$ for an arbitrary $\kappa > 0$. Then the mapping $Z \to U_{ad}$ defined by $z \mapsto \bar u_z$, where $\bar u_z$ is the unique solution to (ERPP)$_z$, is Lipschitz continuous with Lipschitz constant $\big(\tfrac{1}{\kappa\, C_{i_p}}\big)^{1/2}$.

Theorem 3.12. Let Assumption 2 be satisfied. Then the mapping $P_f + \mathbb{R}^2_{<} \to P_f$ defined by $z \mapsto f(\bar u_z)$, where $\bar u_z$ is the unique solution to (ERPP)$_z$, is Lipschitz continuous with Lipschitz constant 1.

Lemma 3.13. Let Assumption 2 be satisfied and let $z \in P_f + \mathbb{R}^2_{<}$ be arbitrary. Furthermore, denote by $\bar u \in U_{ad}$ the unique minimizer of (ERPP)$_z$. Then $\bar u$ is also the unique minimizer of (ERPP)$_{\tilde z}$ for each $\tilde z = f(\bar u) + \lambda(z - f(\bar u))$ with $\lambda \ge 0$, i.e. for all reference points lying on the ray starting in $f(\bar u)$ and going through $z$.

Lemma 3.14. Let Assumption 2 be satisfied. Furthermore, let the reference points $z^1, z^2 \in P_f + \mathbb{R}^2_{<}$ be arbitrary and assume that $\bar u^1, \bar u^2 \in U_{ad}$ are the unique minimizers of $F_{z^1}$ and $F_{z^2}$, respectively. Then it holds

$$\|f(\bar u^1) - f(\bar u^2)\|_{\mathbb{R}^2} \le \operatorname{dist}(z^2, R),$$

where $R \subset \mathbb{R}^2$ denotes the ray starting in $f(\bar u^1)$ and going through $z^1$.

3.3.2 Optimality Conditions

More information and detailed proofs of the following results can be found in [1, pp. 6-34].

Lemma 3.15. Let Assumption 2 be satisfied and let $z \in P_f + \mathbb{R}^2_{<}$ be arbitrary. Furthermore, let $\bar u \in U_{ad}$ be the unique solution to (ERPP)$_z$. Then it holds $z \le f(\bar u)$.

Theorem 3.16. Let Assumption 2 be satisfied and let $z \in P_f + \mathbb{R}^2_{<}$ be arbitrary. Furthermore, assume that the functions $f_1$ and $f_2$ are differentiable. Then the necessary first-order condition for the unique solution $\bar u$ to the minimization problem (ERPP)$_z$ is given by

$$\langle \nabla F_z(\bar u), u - \bar u \rangle_U \ge 0 \quad \text{for all } u \in U_{ad}, \tag{3.1}$$

where

$$\nabla F_z(u) := \sum_{i=1}^{2} (f_i(u) - z_i)\, \nabla f_i(u) \quad \text{for all } u \in U_{ad}.$$

Moreover, if additionally $f(\bar u) \ge z$ holds, then the condition (3.1) is even sufficient.

Theorem 3.17. Let $z \in P_f + \mathbb{R}^2_{<}$ be a reference point and let Assumption 2 be satisfied. Then the necessary and sufficient first-order condition for the image $f(\bar u)$ of the unique solution $\bar u$ to the minimization problem (ERPP)$_z$ is given by

$$\langle f(\bar u) - z,\ y - f(\bar u) \rangle_{\mathbb{R}^2} \ge 0 \quad \text{for all } y \in Y + \mathbb{R}^2_{\ge}.$$

Theorem 3.18. Let Assumptions 2 and 3 be satisfied and let $z \in P_f + \mathbb{R}^2_{<}$ be arbitrary. Then for all $u \in U_{ad}$ with $f(u) \ge z$ and all $v \in U$ it holds

$$\langle \nabla^2 F_z(u) v, v \rangle_U \ge \big(f_{i_p}(u) - z_{i_p}\big)\, C_{i_p}\, \|v\|_U^2.$$

Thus, if $\bar u \in U_{ad}$ satisfies (3.1), $f(\bar u) \ge z$ and $f_{i_p}(\bar u) > z_{i_p}$, then $\bar u$ is a strict local minimizer of (ERPP)$_z$.

3.3.3 A-Posteriori Error Estimates

In this section a-posteriori estimates are presented which can be used to measure the distance between the solution $\bar u$ to (ERPP)$_z$ and an arbitrary admissible vector $u_p \in U_{ad}$. The detailed proofs of the following results can be found in [1, pp. ].

Theorem 3.19. Let Assumptions 2 and 3 be satisfied. Furthermore, let $z \in Z := \{z \in \mathbb{R}^2 \mid z \le y^{id},\ z_{i_p} \le y^{id}_{i_p} - \kappa\}$ for an arbitrary $\kappa > 0$ and denote by $\bar u \in U_{ad}$ the associated unique solution to (ERPP)$_z$. Then for an arbitrary $u_p \in U_{ad}$ it holds

$$\|\bar u - u_p\|_U \le \big(C_{i_p}\,\kappa\big)^{-1} \|\xi\|_U \quad \text{and} \quad \|f(\bar u) - f(u_p)\|_{\mathbb{R}^2} \le \big(C_{i_p}\,\kappa\big)^{-1} \|\xi\|_U,$$

where $\xi \in U$ is given such that the inequality

$$\langle \nabla F_z(u_p) + \xi,\ u - u_p \rangle_U \ge 0 \quad \text{for all } u \in U_{ad}$$

is satisfied.

By making additional assumptions on the functions $f_1$ and $f_2$ it is possible to improve the results of the previous theorem.

Theorem 3.20. Let Assumptions 2 and 3 be satisfied and assume that the functions $f_1, f_2$ are quadratic, i.e. it holds

$$f_i(u + h) = f_i(u) + \langle \nabla f_i(u), h \rangle_U + \tfrac12 \langle \nabla^2 f_i(u) h, h \rangle_U$$

for all $u \in U_{ad}$ and $h \in U$ such that $u + h \in U_{ad}$, and $i \in \{1, 2\}$. Furthermore, let $z \in P_f + \mathbb{R}^2_{<}$ and denote by $\bar u \in U_{ad}$ the associated unique solution to (ERPP)$_z$. Then for an arbitrary $u_p \in U_{ad}$ satisfying $f(u_p) \ge z$ and $f_{i_p}(u_p) > z_{i_p}$ we have

$$\|\bar u - u_p\|_U \le \Big[ C_{i_p} \Big( \frac{f_{i_p}(\bar u) + f_{i_p}(u_p)}{2} - z_{i_p} \Big) \Big]^{-1} \|\xi\|_U$$

and

$$\|f(\bar u) - f(u_p)\|_{\mathbb{R}^2} \le \Big[ C_{i_p} \Big( \frac{f_{i_p}(\bar u) + f_{i_p}(u_p)}{2} - z_{i_p} \Big) \Big]^{-1} \|\xi\|_U,$$

where $\xi \in U$ is chosen such that

$$\langle \nabla F_z(u_p) + \xi,\ u - u_p \rangle_U \ge 0 \quad \text{for all } u \in U_{ad}$$

is fulfilled.
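For bilateral (box) control constraints, the setting we assume in this sketch, a minimal-norm perturbation $\xi$ satisfying the variational inequality $\langle \nabla F_z(u_p) + \xi, u - u_p \rangle_U \ge 0$ for all $u \in U_{ad}$ can be computed componentwise; its norm $\|\xi\|_U$ then enters the bounds of the a-posteriori theorems above. The function name and tolerances are our own choices:

```python
import numpy as np

def perturbation(grad, u, ua, ub, tol=1e-12):
    """Minimal-norm xi with <grad + xi, v - u> >= 0 for all ua <= v <= ub.

    Componentwise: at the lower bound v - u >= 0, so we need grad_i + xi_i >= 0;
    at the upper bound we need grad_i + xi_i <= 0; in the interior grad_i + xi_i = 0.
    """
    xi = -np.asarray(grad, dtype=float).copy()   # interior: xi = -grad
    at_lower = u <= ua + tol
    at_upper = u >= ub - tol
    xi[at_lower] = -np.minimum(grad[at_lower], 0.0)  # only negative gradients need fixing
    xi[at_upper] = -np.maximum(grad[at_upper], 0.0)  # only positive gradients need fixing
    return xi

# Example: at the lower bound with a positive gradient the VI already holds, xi = 0.
grad = np.array([0.3, -0.2, 0.1])
u, ua, ub = np.array([0.0, 0.5, 1.0]), np.zeros(3), np.ones(3)
xi = perturbation(grad, u, ua, ub)
assert np.allclose(xi, [0.0, 0.2, -0.1])
```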

3.3.4 Optimization Algorithm

As already mentioned, we understand the task of numerically solving the bicriterial optimization problem (BOP) as computing an approximation of the Pareto set $P_s$ and the Pareto front $P_f$, i.e. as computing discrete substitutes

$$\tilde P_s := \{\bar u^0, \bar u^1, \dots, \bar u^{N+1}\} \subset P_s \quad \text{and} \quad \tilde P_f := f(\tilde P_s).$$

Furthermore, we learned that all those Pareto optimal points can be found by solving the Euclidean reference point problem (ERPP)$_z$ for different reference points $z \in P_f + \mathbb{R}^2_{<}$. However, it is still not clear how the reference points have to be chosen in order to approximate the Pareto front in an equidistant manner. In this section we address this issue and present an algorithm which uses the Euclidean reference point method. More information about it can be found in [1, pp. ] or [1, p. 3].

The computation of the discrete Pareto set $\tilde P_s$ and Pareto front $\tilde P_f$ is of a recursive nature. The first optimal control $\bar u^0$ can be obtained by minimizing the first objective $f_1$, i.e. by solving

$$\min f_1(u) \quad \text{s.t.} \quad u \in U_{ad}. \tag{3.2}$$

Analogously, the last optimal control $\bar u^{N+1}$ is obtained by solving

$$\min f_2(u) \quad \text{s.t.} \quad u \in U_{ad}. \tag{3.3}$$

Since $\bar u^0$ and $\bar u^{N+1}$ are Pareto optimal points (see [1, p. ]), Definition 3.2 implies that decreasing the component $f_2$ further than $f_2(\bar u^0)$ by varying the admissible control can only be achieved by increasing the component $f_1$ from $f_1(\bar u^0)$. Viewing the objective space as a two-dimensional plane, this means that the Pareto front $P_f$ runs from $f(\bar u^0)$ to $f(\bar u^{N+1})$. Consequently, the basic idea of the optimization algorithm consists in generating a series of reference points $z^1, \dots, z^N$ such that, by iteratively solving the associated Euclidean reference point problems and then computing $f(\bar u^n)$, we move along $P_f$ in an equidistant manner. This is realized by iteratively defining a reference point $z^{n+1} \in P_f + \mathbb{R}^2_{<}$ which captures the local geometry of the Pareto front $P_f$.
For this purpose, we observe the following properties of the Euclidean reference point problems: From Lemma 3.13 we know that $\bar u^n$ is the unique solution to (ERPP)$_{\tilde z}$ for each reference point $\tilde z = f(\bar u^n) + \lambda(z^n - f(\bar u^n))$ with $\lambda \ge 0$. This means that the vector $\varphi^\perp := z^n - f(\bar u^n) \in \mathbb{R}^2$ lies perpendicular to $P_f$ at $f(\bar u^n)$, and the vector $\varphi^\tau := (-\varphi^\perp_2, \varphi^\perp_1) \in \mathbb{R}^2$ lies tangential to $P_f$ at the same point and points to the lower right. Consequently, by scaling $\varphi^\tau$ accordingly we can determine how far along $P_f$ the next reference point will be chosen. Analogously, scaling $\varphi^\perp$ allows us to determine how close to $P_f$ the next reference point will be located. Thus, given a reference point $z^n$ and the associated Pareto optimum $\bar u^n$, the next reference point $z^{n+1}$ can be recursively defined by

$$z^{n+1} := f(\bar u^n) + h_\tau \frac{\varphi^\tau}{\|\varphi^\tau\|_{\mathbb{R}^2}} + h_\perp \frac{\varphi^\perp}{\|\varphi^\perp\|_{\mathbb{R}^2}} \quad \text{for } n = 0, \dots, N \tag{3.4}$$

for suitable parameters $h_\tau, h_\perp > 0$. In order to compute the first reference point $z^1$, we use the fact that $P_f$ is vertical in the neighbourhood of $f(\bar u^0)$. Therefore, we can set $\varphi^\tau := (0, -1)$ and

$\varphi^\perp := (-1, 0)$, i.e.

$$z^1 := f(\bar u^0) - \begin{pmatrix} h_\perp \\ h_\tau \end{pmatrix}. \tag{3.5}$$

Note that from Lemma 3.14 it follows

$$\|f(\bar u^{n+1}) - f(\bar u^n)\|_{\mathbb{R}^2} \le h_\tau.$$

Hence, $h_\tau$ should be chosen according to our desired approximation fineness. Furthermore, we do not know a-priori how many discrete Pareto optimal points we will obtain by using this approach and thus how large $N$ will be. However, we can stop the iteration as soon as the new reference point $z^{n+1}$ fulfils $z^{n+1}_1 \ge f_1(\bar u^{\text{end}})$, where $\bar u^{\text{end}}$ is the unique solution to (3.3). The whole procedure is summarized in Algorithm 1.

Algorithm 1 Algorithm to compute the Pareto front
Require: Maximum number $N_{\max} \in \mathbb{N}$ of Pareto points, recursive parameters $h_\tau, h_\perp > 0$;
1: Solve (3.2) and (3.3) in order to obtain $\bar u^0$ and $\bar u^{\text{end}}$;
2: Set $\tilde P_s \leftarrow \{\bar u^0\}$ and $\tilde P_f \leftarrow \{f(\bar u^0)\}$;
3: Set $n \leftarrow 0$ and compute $z^1$ by (3.5);
4: while $n < N_{\max}$ and $z^{n+1}_1 < f_1(\bar u^{\text{end}})$ do
5:   $n \leftarrow n + 1$;
6:   Solve (ERPP)$_{z^n}$ with starting point $\bar u^{n-1}$;
7:   Set $\tilde P_s \leftarrow \tilde P_s \cup \{\bar u^n\}$ and $\tilde P_f \leftarrow \tilde P_f \cup \{f(\bar u^n)\}$;
8:   Compute $z^{n+1}$ by (3.4);
9: Return $\tilde P_s \leftarrow \tilde P_s \cup \{\bar u^{\text{end}}\}$ and $\tilde P_f \leftarrow \tilde P_f \cup \{f(\bar u^{\text{end}})\}$

Remark 3.21. Note that Algorithm 1 describes an approach for traversing the Pareto front $P_f$ from top to bottom. However, it is also possible to start the algorithm with $\bar u^{\text{end}}$ instead of $\bar u^0$. In this case we move along the Pareto front $P_f$ from bottom to top. For this purpose, the following modifications have to be made:

(i) A new reference point $z^{n+1}$ is defined by

$$z^{n+1} := f(\bar u^n) - h_\tau \frac{\varphi^\tau}{\|\varphi^\tau\|_{\mathbb{R}^2}} + h_\perp \frac{\varphi^\perp}{\|\varphi^\perp\|_{\mathbb{R}^2}} \quad \text{for } n = 0, \dots, N$$

for suitable parameters $h_\tau, h_\perp > 0$, whereby $z^1$ is given by

$$z^1 := f(\bar u^{\text{end}}) - \begin{pmatrix} h_\tau \\ h_\perp \end{pmatrix}.$$

(ii) The termination condition is then given by $z^{n+1}_2 \ge f_2(\bar u^0)$.
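A single step of the reference point recursion (3.4) can be sketched as follows; the orientation convention for the tangential vector (rotate $\varphi^\perp$ by 90 degrees and point it to the lower right) is our reading of the construction above, and the sample data are illustrative:

```python
import numpy as np

def next_reference_point(f_un, z_n, h_t, h_p):
    """One step of the recursion (3.4): move along the front by h_t in the
    tangential direction and keep a distance governed by h_p below it."""
    phi_p = z_n - f_un                        # perpendicular to P_f at f(u^n)
    phi_t = np.array([-phi_p[1], phi_p[0]])   # rotate by 90 degrees: tangential
    if phi_t[0] < 0:                          # orient toward the lower right
        phi_t = -phi_t
    return (f_un + h_t * phi_t / np.linalg.norm(phi_t)
                 + h_p * phi_p / np.linalg.norm(phi_p))

# f(u^n) = (1, 1) with the current reference point straight below it:
# the next reference point moves right by h_t and sits h_p below the front.
z_next = next_reference_point(np.array([1.0, 1.0]), np.array([1.0, 0.5]),
                              h_t=0.1, h_p=0.2)
assert np.allclose(z_next, [1.1, 0.8])
```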


Chapter 4 Proper Orthogonal Decomposition for Abstract Evolution Problem

The goal of this chapter is to introduce the proper orthogonal decomposition (POD) method, which is a widely used model order reduction technique, for the parameter-dependent abstract evolution problem

$$\begin{aligned} \langle y_t(t), \varphi \rangle_{V',V} + a(t; y(t), \varphi) &= \langle f(t, u), \varphi \rangle_{V',V} && \text{for all } \varphi \in V \text{ a.e.}, \\ \langle y(0), \varphi \rangle_H &= \langle y_0, \varphi \rangle_H && \text{for all } \varphi \in H, \end{aligned} \tag{AEP}$$

which was considered in Section 2.3. In the course of this, $u$ is supposed to be an element of the vector space $U$. Furthermore, in order to ensure the well-posedness of this equation, it is additionally assumed that the right-hand side $f : (0, T) \times U \to V'$ fulfils $f(\cdot, u) \in L^2(0, T; V')$ for all $u \in U$.

The general approach of model order reduction is based on constructing a low-dimensional subspace $V^l \subset V$ for which the solutions to the associated reduced-order evolution equation

$$\begin{aligned} \langle y_t(t), \varphi \rangle_{(V^l)',V^l} + a(t; y(t), \varphi) &= \langle f(t, u), \varphi \rangle_{(V^l)',V^l} && \text{for all } \varphi \in V^l \text{ a.e.}, \\ \langle y(0), \varphi \rangle_H &= \langle y_0, \varphi \rangle_H && \text{for all } \varphi \in H, \end{aligned}$$

are still good approximations to the solutions of (AEP). In this work a well-known and widely used model reduction technique called proper orthogonal decomposition is presented. The basic idea of this method consists in creating a subspace $V^l$ by computing a POD basis which captures the main characteristics of the expected solutions to (AEP). This is in contrast to, for example, the finite element method, where the basis elements of the associated subspace $V^m_{\mathrm{FE}}$ are not related to the physical properties of the approximated system. Thus, the dimension of the subspace $V^l$ is usually much smaller than the dimension of $V^m_{\mathrm{FE}}$. Consequently, having the numerical treatment of this problem in mind, the computational effort is significantly lower. However, in order to compute the POD space $V^l$, some information about the dynamics of solutions to the parameter-dependent evolution equation is needed. This is usually provided by using its solutions for different inputs $u \in U$.
Thus, the use of the POD method is only reasonable if the underlying equation has to be solved repeatedly for different values of $u$. In numerical applications this situation appears especially in the context of optimal control problems, where the equation (AEP) is often used as a PDE constraint. For a theoretical introduction we consider the continuous version of the POD method in the first part of this chapter. In this case the provided data are functions mapping from $[0, T]$ to $V$. In the second part, we turn to the numerical implementation, for which only discrete data will be available. The whole chapter is strongly based on [6, Section ].

4.1 Continuous Version of the POD Method Let (V, ⟨·,·⟩_V) and (H, ⟨·,·⟩_H) be two real, separable Hilbert spaces such that V is dense in H with compact embedding. Furthermore, for fixed p ∈ ℕ let the trajectories y¹, …, y^p ∈ L²(0, T; V) be given and introduce the linear subspace

𝒱 := span{ y^k(t) | 1 ≤ k ≤ p, t ∈ [0, T] } ⊂ V

with dimension d := dim(𝒱) ∈ ℕ ∪ {∞}.

Definition 4.1. In the context of POD we call the given trajectories y¹, …, y^p ∈ L²(0, T; V) snapshots and the corresponding set 𝒱 of dimension d the snapshot subspace.

Remark 4.2. (i) The general idea of POD consists in finding a low-dimensional subspace V^l ⊂ V, so that the difference between y and its projection onto V^l is as small as possible for all y ∈ 𝒱. (ii) The snapshots y¹, …, y^p ∈ L²(0, T; V) are usually solutions to (AEP) for different parameters u ∈ U, or their derivatives. Thus, the associated snapshot space 𝒱 contains information about the dynamics of the investigated system. (iii) The inputs u which are used for generating the snapshots have to be chosen appropriately in order to obtain a good approximation.

Now we define the notion of a POD basis and a POD space, respectively.

Definition 4.3. Let the snapshots y¹, …, y^p ∈ L²(0, T; V) be given. A POD basis {ψ̄_i}_{i=1}^l of rank l, for any l ∈ {1, …, d}, is a solution to the optimization problem

min_{ψ_1,…,ψ_l ∈ V} Σ_{k=1}^p ∫₀ᵀ ‖ y^k(t) − Σ_{i=1}^l ⟨y^k(t), ψ_i⟩_V ψ_i ‖²_V dt
subject to (s.t.) ⟨ψ_i, ψ_j⟩_V = δ_ij, 1 ≤ i, j ≤ l, (POD^l)

where the symbol δ_ij denotes the Kronecker delta satisfying δ_ii = 1 and δ_ij = 0 for i ≠ j. The associated linear space V^l := span{ψ̄_1, …, ψ̄_l} is called the POD space of rank l.

Remark 4.4. Expressed verbally, for a given rank l ≤ d the POD basis {ψ̄_i}_{i=1}^l minimizes the sum of the mean squared errors between the snapshots y^k and the corresponding l-th partial Fourier sums Σ_{i=1}^l ⟨y^k(·), ψ̄_i⟩_V ψ̄_i, which can be seen as projections onto the subspace V^l.
A solution to the optimization problem (POD^l) can be characterized by an eigenvalue problem for the linear integral operator

R : V → V,  Rψ := Σ_{k=1}^p ∫₀ᵀ ⟨y^k(t), ψ⟩_V y^k(t) dt. (4.1)

Theorem 4.5. Let y¹, …, y^p ∈ L²(0, T; V) be given snapshots. Then the linear integral operator R is compact, non-negative and self-adjoint.

Proof. A proof is given in [6]. In the next theorem it is shown how a solution of the optimization problem (POD^l) can be found by using the operator R. A proof can be found in [6].

Theorem 4.6. Let y¹, …, y^p ∈ L²(0, T; V) be given snapshots. Furthermore, let R be the linear integral operator defined by (4.1). Then there are non-negative eigenvalues {λ̄_i}_{i∈ℕ} and associated orthonormal eigenfunctions {ψ̄_i}_{i∈ℕ} satisfying

R ψ̄_i = λ̄_i ψ̄_i,  λ̄_1 ≥ … ≥ λ̄_d > λ̄_{d+1} = … = 0. (4.2)

In particular, for every l ∈ {1, …, d} the first l eigenfunctions {ψ̄_i}_{i=1}^l solve the optimization problem (POD^l). Moreover, it holds

Σ_{k=1}^p ∫₀ᵀ ‖ y^k(t) − Σ_{i=1}^l ⟨y^k(t), ψ̄_i⟩_V ψ̄_i ‖²_V dt = Σ_{i=l+1}^d λ̄_i. (4.3)

Remark 4.7. (i) Note that the system {ψ̄_i}_{i∈ℕ} is an orthonormal basis of the separable Hilbert space V, so that {ψ̄_1, …, ψ̄_l} is an orthonormal system spanning V^l := span{ψ̄_1, …, ψ̄_l} for all l ∈ ℕ. (ii) Equation (4.3) says that the approximation quality of the POD method depends on the decay rate of the eigenvalues {λ̄_i}_{i∈ℕ} and on the choice of l. (iii) In applications the choice of l is often based on heuristic considerations combined with observing the ratio of modelled to total energy contained in the snapshots y¹, …, y^p, which is expressed by

ε(l) = Σ_{i=1}^l λ̄_i / Σ_{i=1}^d λ̄_i ∈ [0, 1]. (4.4)

4.2 Discrete Version of the POD Method In numerical applications the whole trajectories {y¹(t), …, y^p(t) | t ∈ [0, T]} are usually not available, but only approximations of them. Hence, a discrete version of the POD method is needed. Let y¹, …, y^p : {t_1, …, t_n} → ℝ^m be given finite element solutions to the abstract evolution equation (AEP) for different inputs u ∈ U, or their discrete derivatives. Here m is the number of degrees of freedom of the discrete solutions. As in the case of the continuous POD method, we want to find a low-dimensional subspace V^l of the solution space

V^n := span{ y_j^k | 1 ≤ k ≤ p, 1 ≤ j ≤ n } (4.5)

such that V^l approximates V^n as well as possible.
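Both in the continuous setting of (4.4) and in the discrete setting of this section, the truncation rank l is in practice chosen from the eigenvalue decay. The following NumPy sketch (all names hypothetical; the eigenvalues are assumed to be already computed, and the decay used here is synthetic) picks the smallest rank whose energy ratio ε(l) reaches a prescribed threshold:

```python
import numpy as np

def pod_rank(eigvals, threshold=0.999):
    """Smallest rank l whose modelled-to-total energy ratio
    eps(l) = sum_{i<=l} lambda_i / sum_{i<=d} lambda_i reaches the threshold."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # lambda_1 >= lambda_2 >= ...
    eps = np.cumsum(lam) / np.sum(lam)                     # eps(l) for l = 1, ..., d
    return int(np.searchsorted(eps, threshold) + 1), eps

# rapidly decaying synthetic spectrum, as typical for diffusion-dominated problems
lam = 10.0 ** (-np.arange(8))                              # 1, 1e-1, ..., 1e-7
l, eps = pod_rank(lam, threshold=0.999)
print(l, eps[:4])
```

With this spectrum ε(1) = 0.9 and ε(2) = 0.99, so the first rank reaching 99.9 % of the energy is l = 3.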
The general approach will be analogous to the continuous case.

Definition 4.8. In the context of discrete POD, we call the vectors y_1^k, …, y_n^k ∈ ℝ^m for 1 ≤ k ≤ p discrete snapshots and the set V^n, defined by (4.5), the discrete snapshot space. Furthermore, let d^n ≤ min(pn, m) be the dimension of the discrete snapshot space V^n.

Now we can formulate the notion of a discrete POD basis and a discrete POD space, respectively.

Definition 4.9. For 1 ≤ k ≤ p let y_1^k, …, y_n^k ∈ ℝ^m be discrete snapshots. Furthermore, let α_1^n, …, α_n^n > 0 be positive weighting parameters and W ∈ ℝ^{m×m} a symmetric and positive definite matrix such that ⟨·,·⟩_W := ⟨·, W·⟩_{ℝ^m} defines an inner product on ℝ^m. A discrete POD basis {ψ̄_i}_{i=1}^l of rank l, for any l ∈ {1, …, d^n}, is a solution to the optimization problem

min_{ψ_1,…,ψ_l ∈ ℝ^m} Σ_{k=1}^p Σ_{j=1}^n α_j^n ‖ y_j^k − Σ_{i=1}^l ⟨y_j^k, ψ_i⟩_W ψ_i ‖²_W
s.t. ⟨ψ_i, ψ_j⟩_W = δ_ij, 1 ≤ i, j ≤ l. (POD_n^l)

The associated linear space V^l := span{ψ̄_1, …, ψ̄_l} is called the discrete POD space of rank l.

Remark 4.10. (i) As in the continuous case, the discrete POD basis {ψ̄_i}_{i=1}^l is chosen such that the sum of the weighted mean squared errors between the snapshots y_1^k, …, y_n^k ∈ ℝ^m for 1 ≤ k ≤ p and their orthogonal projections onto the space span{ψ̄_1, …, ψ̄_l} is minimized. (ii) The positive weights α_1^n, …, α_n^n are supposed to model the numerical integration and thus should be chosen appropriately. A popular choice in applications are the trapezoidal weights

α_1^n = T/(2(n−1)),  α_j^n = T/(n−1) for 2 ≤ j ≤ n−1,  α_n^n = T/(2(n−1)).

(iii) The weighted inner product ⟨·,·⟩_W is used for modelling the inner product on the finite element space.

Analogous to the continuous case, the solution to the minimization problem (POD_n^l) coincides with the solution of an eigenvalue problem.

Definition and Theorem 4.11. Let y_1^k, …, y_n^k ∈ ℝ^m be discrete snapshots for 1 ≤ k ≤ p and define the linear operator

R^n : ℝ^m → V^n,  R^n ψ := Σ_{k=1}^p Σ_{j=1}^n α_j^n ⟨y_j^k, ψ⟩_W y_j^k. (4.6)

Then R^n is compact, non-negative and self-adjoint.

Proof. A proof can be found in [6].

Theorem 4.12. For 1 ≤ k ≤ p let y_1^k, …, y_n^k ∈ ℝ^m be discrete snapshots. Furthermore, let R^n be the linear operator defined by (4.6).
Then there are non-negative eigenvalues {λ̄_i^n}_{i=1}^m and associated orthonormal eigenvectors {ψ̄_i^n}_{i=1}^m satisfying

R^n ψ̄_i^n = λ̄_i^n ψ̄_i^n,  λ̄_1^n ≥ … ≥ λ̄_{d^n}^n > λ̄_{d^n+1}^n = … = 0. (4.7)

In particular, for every l ∈ {1, …, d^n} the first l eigenvectors {ψ̄_i^n}_{i=1}^l solve the optimization problem (POD_n^l). Moreover, it holds

Σ_{k=1}^p Σ_{j=1}^n α_j^n ‖ y_j^k − Σ_{i=1}^l ⟨y_j^k, ψ̄_i^n⟩_W ψ̄_i^n ‖²_W = Σ_{i=l+1}^{d^n} λ̄_i^n.
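The finite-dimensional operator R^n is straightforward to assemble and inspect numerically. The following NumPy sketch (random data, p = 2 trajectories, all names hypothetical) applies definition (4.6) directly and lets one confirm that the resulting spectrum is real and non-negative, as Theorem 4.12 asserts:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 6, 4, 2                      # spatial dof, time instances, trajectories
ys = rng.standard_normal((p, n, m))    # discrete snapshots y_j^k
alpha = np.full(n, 1.0 / n)            # positive quadrature weights alpha_j^n
A = rng.standard_normal((m, m))
W = A @ A.T + m * np.eye(m)            # symmetric positive definite weight matrix

def Rn(psi):
    """Action of R^n from (4.6): sum_k sum_j alpha_j^n <y_j^k, psi>_W y_j^k."""
    return sum(alpha[j] * (ys[k, j] @ W @ psi) * ys[k, j]
               for k in range(p) for j in range(n))

# assemble R^n column by column and look at its eigenvalues
Rn_mat = np.column_stack([Rn(e) for e in np.eye(m)])
lam = np.linalg.eigvals(Rn_mat)
print(np.sort(lam.real)[::-1])
```

Although Rn_mat itself is not symmetric, it is similar to a symmetric positive semidefinite matrix, so the computed eigenvalues are real and non-negative up to round-off.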

In order to solve the eigenvalue problem (4.7) in practice, we state the next theorem. Thereby we limit ourselves to the case p = 2; the extension to p > 2 is straightforward.

Definition and Theorem 4.13. For 1 ≤ k ≤ 2 let y_1^k, …, y_n^k ∈ ℝ^m be discrete snapshots. Furthermore, define the matrix Y := [y_1^1 … y_n^1  y_1^2 … y_n^2] ∈ ℝ^{m×2n} with rank d^n and the diagonal matrix D̃ := diag(α_1^n, …, α_n^n) ∈ ℝ^{n×n}. Then the matrix representation of the linear operator R^n, defined by (4.6), is given as follows:

R^n = Y D Yᵀ W,  where D := ( D̃ 0 ; 0 D̃ ) ∈ ℝ^{2n×2n}.

Thus, (4.7) corresponds to the eigenvalue problem

Y D Yᵀ W ψ̄_i^n = λ̄_i^n ψ̄_i^n,  λ̄_1^n ≥ … ≥ λ̄_{d^n}^n > λ̄_{d^n+1}^n = … = λ̄_m^n = 0. (4.8)

Moreover, by defining Ŷ := W^{1/2} Y D^{1/2} ∈ ℝ^{m×2n} and ψ_i^n := W^{1/2} ψ̄_i^n, the problem (4.8) is equivalent to the symmetric m × m eigenvalue problem

Ŷ Ŷᵀ ψ_i^n = λ̄_i^n ψ_i^n,  λ̄_1^n ≥ … ≥ λ̄_{d^n}^n > λ̄_{d^n+1}^n = … = 0,
subject to ⟨ψ_i^n, ψ_j^n⟩_{ℝ^m} = δ_ij, 1 ≤ i, j ≤ m. (4.9)

Proof. For any ψ ∈ ℝ^m and i ∈ {1, …, m} we derive

(Y D Yᵀ W ψ)_i = Σ_{j=1}^{2n} Y_{ij} (D Yᵀ W ψ)_j = Σ_{j=1}^{2n} D_{jj} Y_{ij} (Yᵀ W ψ)_j
= Σ_{j=1}^n α_j^n (y_j^1)_i Σ_{k=1}^m Σ_{ν=1}^m (y_j^1)_k W_{kν} ψ_ν + Σ_{j=1}^n α_j^n (y_j^2)_i Σ_{k=1}^m Σ_{ν=1}^m (y_j^2)_k W_{kν} ψ_ν
= Σ_{j=1}^n ( α_j^n ⟨y_j^1, ψ⟩_W (y_j^1)_i + α_j^n ⟨y_j^2, ψ⟩_W (y_j^2)_i ) = (R^n ψ)_i.

Thus, it holds Y D Yᵀ W ψ = R^n ψ for all ψ ∈ ℝ^m. The second proposition follows by multiplying (4.8) with W^{1/2} from the left and defining ψ_i^n := W^{1/2} ψ̄_i^n. Moreover, using Wᵀ = W yields ⟨ψ_i^n, ψ_j^n⟩_{ℝ^m} = ⟨ψ̄_i^n, ψ̄_j^n⟩_W for 1 ≤ i, j ≤ m.

Remark 4.14. According to [6] there are three numerical methods to determine the POD basis {ψ̄_i^n}_{i=1}^l of rank l:

(1) Solve the symmetric m × m eigenvalue problem (4.9) for the l largest eigenvalues λ̄_1^n ≥ … ≥ λ̄_l^n > 0 and set ψ̄_i^n = W^{−1/2} ψ_i^n for 1 ≤ i ≤ l. However, this method is only recommended for m ≪ n, since we have to compute Ŷ Ŷᵀ = W^{1/2} Y D Yᵀ W^{1/2} and thus the matrix W^{1/2} ∈ ℝ^{m×m}, which is numerically expensive to compute.

(2) Method of snapshots: Solve the symmetric 2n × 2n eigenvalue problem

Ŷᵀ Ŷ φ_i^n = λ̄_i^n φ_i^n  s.t.  ⟨φ_i^n, φ_j^n⟩_{ℝ^{2n}} = δ_ij, 1 ≤ i, j ≤ 2n,

for the l largest eigenvalues λ̄_1^n ≥ … ≥ λ̄_l^n > 0 and set

ψ̄_i^n = W^{−1/2} ψ_i^n = (1/√λ̄_i^n) W^{−1/2} Ŷ φ_i^n = (1/√λ̄_i^n) Y D^{1/2} φ_i^n

for 1 ≤ i ≤ l. This method is suitable for n ≪ m, since we have Ŷᵀ Ŷ = D^{1/2} Yᵀ W Y D^{1/2}. Thus, the expensive computation of W^{1/2} is not required, whereas the matrix D^{1/2} is numerically cheap to compute. However, this approach can be numerically unstable, since forming the product Ŷᵀ Ŷ squares the condition number compared to Ŷ.

(3) Singular value decomposition (SVD): Compute the SVD of the matrix Ŷ, i.e. determine orthonormal vectors {ψ_i^n}_{i=1}^l ⊂ ℝ^m and {φ_i^n}_{i=1}^l ⊂ ℝ^{2n} associated to the l largest singular values σ_1^n ≥ … ≥ σ_l^n > 0 such that

Ŷ φ_i^n = σ_i^n ψ_i^n  and  Ŷᵀ ψ_i^n = σ_i^n φ_i^n, 1 ≤ i ≤ l.

Then the POD basis {ψ̄_i^n}_{i=1}^l and the corresponding eigenvalues {λ̄_i^n}_{i=1}^l are given by

ψ̄_i^n = W^{−1/2} ψ_i^n  and  λ̄_i^n = (σ_i^n)², 1 ≤ i ≤ l.

This method is recommended if the computations are not too costly, since it is numerically stable. More information about the SVD can be found, for instance, in [13].
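Method (3) can be sketched in a few lines of NumPy. This is an illustrative implementation under the stated setup (W symmetric positive definite, one weight α_j^n per snapshot column), not the thesis code; all names are hypothetical, and W^{1/2} is formed via an eigendecomposition purely for demonstration:

```python
import numpy as np

def pod_basis_svd(Y, alpha, W, l):
    """POD basis of rank l via the SVD of Y_hat = W^{1/2} Y D^{1/2} (method (3)).
    Columns of Y are all snapshot vectors, alpha the quadrature weight per column."""
    w_vals, w_vecs = np.linalg.eigh(W)
    W_half = w_vecs @ np.diag(np.sqrt(w_vals)) @ w_vecs.T        # symmetric W^{1/2}
    W_half_inv = w_vecs @ np.diag(1.0 / np.sqrt(w_vals)) @ w_vecs.T
    Y_hat = W_half @ (Y * np.sqrt(alpha))        # scale column j by sqrt(alpha_j)
    U, sigma, _ = np.linalg.svd(Y_hat, full_matrices=False)
    Psi = W_half_inv @ U[:, :l]                  # psi_bar_i = W^{-1/2} psi_i
    lam = sigma[:l] ** 2                         # lambda_i = sigma_i^2
    return Psi, lam

rng = np.random.default_rng(1)
m, N, l = 8, 5, 3
Y = rng.standard_normal((m, N))
alpha = np.full(N, 0.2)
A = rng.standard_normal((m, m))
W = A @ A.T + m * np.eye(m)
Psi, lam = pod_basis_svd(Y, alpha, W, l)
```

The returned basis is W-orthonormal, i.e. ⟨ψ̄_i, ψ̄_j⟩_W = δ_ij, which is exactly the constraint in (POD_n^l).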

Chapter 5 Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations In this chapter a bicriterial optimal control problem governed by a heat equation with a time-dependent convection term and bilateral control constraints is considered. In the first part of the chapter it is shown that the present problem fits into the framework of the bicriterial optimization treated in Chapter 3, and the Euclidean reference point method is applied. In the second part, as this method transforms the bicriterial optimal control problem into a series of scalar optimization problems, the method of proper orthogonal decomposition (POD) presented in Chapter 4 is applied as an approach for model-order reduction.

5.1 Problem Formulation We consider a bicriterial optimal control problem specified by

min J(u, y) = ( ½ ‖y − y_Q‖²_{L²(0,T;L²(Ω))}, ½ ‖u‖²_{L²(0,T;ℝ^m)} ) (BOC)

subject to (s.t.) the state equation

y_t(t, x) − κ Δy(t, x) + β(t, x) · ∇y(t, x) = Σ_{i=1}^m u_i(t) χ_i(x) for (t, x) ∈ (0, T) × Ω,
∂y/∂η(t, x) + α_j y(t, x) = α_j y_a(t) for (t, x) ∈ (0, T) × Γ_j and all j ∈ {1, …, s},
y(0, x) = y_0(x) for x ∈ Ω, (SE)

and the bilateral control constraints

u_a(t) ≤ u(t) ≤ u_b(t) for almost all (f.a.a.) t ∈ [0, T], (BC)

which is supposed to model an energy-efficient heating, ventilation and air-conditioning process in a room described by the set Ω ⊂ ℝ^n. In the course of this optimization problem, the control u models the heating process and the associated state y is the temperature distribution in the room, given as the solution to the modified heat equation (SE). Thus, the aim of this optimal control problem consists in finding a heating strategy such that the corresponding temperature distribution is close to the desired temperature y_Q while the heating costs, modelled by the second objective, are supposed to be low. The heating process itself is described by the right-hand side of the state equation (SE), whereby there are m heaters whose locations are given by the indicator functions χ_1, …, χ_m.
The associated heating strategies u 1,..., u m are supposed to be space-independent. Furthermore, we use the bilateral constraints (BC) in order to bound the heating costs.

The boundary condition of the underlying PDE is of Robin type and is supposed to model the heat exchange with the outside world. Therefore, in order to model windows as well as inner and outer walls simultaneously, the boundary Γ := ∂Ω of the room is split into several disjoint parts Γ_1, …, Γ_s. The isolation quality of each boundary part j ∈ {1, …, s} is described by a constant isolation coefficient α_j. For example, α_j = 0 means perfect isolation, i.e. there is no heat exchange with the outside world. The initial condition y_0 describes the initial temperature distribution in the room. In contrast to the bicriterial optimal control problems investigated in [1], in the present master thesis we focus on problems which are governed by the heat equation with a time-dependent convection term β. This models the air flow inside a room caused, for example, by an open window or an air conditioner. Hence, by allowing β to be time-dependent we are able to build more realistic heating scenarios. For the mathematical investigations we make the following assumptions, which are supposed to hold for the rest of this chapter:

Assumption 4. The spatial domain Ω ⊂ ℝ^n is bounded with a C¹-boundary Γ = ∂Ω, which is divided into several disjoint parts Γ_1, …, Γ_s. We assume n ∈ ℕ with n ≤ 3. The characteristic sets of the characteristic functions χ_1, …, χ_m are supposed to be disjoint and measurable with non-zero measure, so that χ_i ∈ L²(Ω) holds for all i ∈ {1, …, m}. The diffusion parameter κ is constant and positive. The convection term β is time-dependent and satisfies β ∈ 𝔅 := L^∞(0, T; L^∞(Ω; ℝ^n)). The isolation coefficients satisfy α_j ≥ 0 for all j ∈ {1, …, s}. The initial temperature is supposed to fulfil y_0 ∈ L²(Ω). The outer temperature y_a ∈ L²(0, T) is space-independent. For each control u we assume that u ∈ U := L²(0, T; ℝ^m). Note that the control space U is a real Hilbert space.
The bounds in the bilateral box constraints fulfil u_a, u_b ∈ L²(0, T; ℝ^m).

5.2 Well-Posedness of the State Equation The goal of this section is to show that under Assumption 4 the heat equation with a time-dependent convection term and Robin boundary condition, given by

y_t(t, x) − κ Δy(t, x) + β(t, x) · ∇y(t, x) = Σ_{i=1}^m u_i(t) χ_i(x) for (t, x) ∈ (0, T) × Ω, (5.1a)
∂y/∂η(t, x) + α_j y(t, x) = α_j y_a(t) for (t, x) ∈ (0, T) × Γ_j, (5.1b)
y(0, x) = y_0(x) for x ∈ Ω, (5.1c)

is well-posed, i.e. the following properties are fulfilled:

1. the equation (5.1) possesses a solution, 2. the solution is unique, 3. the unique solution depends continuously on the data. In order to prove that, we are going to use Theorem 2.14. But first we have to transform (5.1) into an abstract evolution problem (AEP) and to define the associated function spaces H, V and W(0, T). For this purpose we need the weak formulation of the present heat equation.

Definition and Remark 5.1. Let u ∈ U and φ ∈ C^∞(Ω̄) be arbitrary. Multiplying the equation (5.1a) with the test function φ and integrating over Ω yields

∫_Ω y_t(t, x) φ(x) dx − κ ∫_Ω Δy(t, x) φ(x) dx + ∫_Ω β(t, x) · ∇y(t, x) φ(x) dx = ∫_Ω Σ_{i=1}^m u_i(t) χ_i(x) φ(x) dx. (5.2)

Using the Gauss theorem and plugging in the Robin boundary condition (5.1b), the second summand in (5.2) can be replaced by

−∫_Ω Δy(t, x) φ(x) dx = ∫_Ω ∇y(t, x) · ∇φ(x) dx − ∫_Γ φ(x) (∂y/∂η)(t, x) ds(x)
= ∫_Ω ∇y(t, x) · ∇φ(x) dx − Σ_{i=1}^s α_i ∫_{Γ_i} φ(x) (y_a(t) − y(t, x)) ds(x). (5.3)

Then inserting (5.3) into (5.2) yields

∫_Ω y_t(t, x) φ(x) dx + κ ∫_Ω ∇y(t, x) · ∇φ(x) dx + ∫_Ω β(t, x) · ∇y(t, x) φ(x) dx + κ Σ_{i=1}^s α_i ∫_{Γ_i} φ(x) y(t, x) ds(x)
= Σ_{i=1}^m ∫_Ω u_i(t) χ_i(x) φ(x) dx + κ Σ_{i=1}^s α_i ∫_{Γ_i} φ(x) y_a(t) ds(x)

for an arbitrary φ ∈ C^∞(Ω̄) and thus for all φ ∈ H¹(Ω), as C^∞(Ω̄) is a dense subset of H¹(Ω). Hence, by defining

V := H¹(Ω),  H := L²(Ω),  W(0, T) := { φ ∈ L²(0, T; V) | φ_t ∈ L²(0, T; V′) }

and introducing the mappings a : (0, T) × V × V → ℝ and f : (0, T) → V′ defined by

a(t; φ, ψ) := κ ∫_Ω ∇φ(x) · ∇ψ(x) dx + ∫_Ω β(t, x) · ∇φ(x) ψ(x) dx + κ Σ_{i=1}^s α_i ∫_{Γ_i} φ(x) ψ(x) ds(x)

and

⟨f(t), φ⟩_{V′,V} := Σ_{i=1}^m ∫_Ω u_i(t) χ_i(x) φ(x) dx + κ Σ_{i=1}^s α_i ∫_{Γ_i} φ(x) y_a(t) ds(x),

we obtain the following abstract representation of (5.1):

⟨y_t(t), φ⟩_{V′,V} + a(t; y(t), φ) = ⟨f(t), φ⟩_{V′,V} for all φ ∈ V a.e.,
⟨y(0), φ⟩_H = ⟨y_0, φ⟩_H for all φ ∈ H, (WF)

for an unknown function y ∈ W(0, T). We call this representation the weak formulation or variational equation of (5.1).

Remark 5.2. It is well known that H¹(Ω) and L²(Ω) are real, separable Hilbert spaces. In particular, the Sobolev space H¹(Ω) is dense in L²(Ω) with compact embedding. As in Chapter 2 we denote the embedding constant by C_V. Furthermore, by identifying L²(Ω) with its dual space, it holds V ↪ H ≅ H′ ↪ V′. For more information see, for instance, [7].

Having introduced the abstract representation of the heat equation (5.1), we need to prove some properties of the mappings a and f in order to be able to apply Theorem 2.14.

Lemma 5.3. The mapping a(t; ·, ·) : V × V → ℝ is a bilinear form f.a.a. t ∈ (0, T). Moreover, it satisfies Assumption 1, i.e. (i) the mapping t ↦ a(t; φ, ψ) is measurable for all φ, ψ ∈ V, (ii) there is a constant C_a > 0, which is independent of t, such that

|a(t; φ, ψ)| ≤ C_a ‖φ‖_V ‖ψ‖_V for all φ, ψ ∈ V a.e.,

(iii) there are constants γ > 0 and η ≥ 0, which are independent of t, such that the coercivity condition

a(t; φ, φ) ≥ γ ‖φ‖²_V − η ‖φ‖²_H for all φ ∈ V a.e.

holds.

Proof. The bilinearity follows immediately from the definition. (i) The mapping t ↦ ∫_Ω β(t, x) · ∇φ(x) ψ(x) dx is measurable for all φ, ψ ∈ V as a composition of measurable functions. Hence, the mapping t ↦ a(t; φ, ψ) is measurable as well. (ii) Let φ, ψ ∈ V be arbitrary. By using Theorem 2.11 and the Cauchy-Schwarz inequality, we obtain for

almost all t ∈ (0, T)

|a(t; φ, ψ)| ≤ κ ‖∇φ‖_{L²(Ω;ℝ^n)} ‖∇ψ‖_{L²(Ω;ℝ^n)} + ‖β(t, ·)‖_{L^∞(Ω;ℝ^n)} ‖∇φ‖_{L²(Ω;ℝ^n)} ‖ψ‖_{L²(Ω)} + κ Σ_{i=1}^s α_i ‖φ‖_{L²(∂Ω)} ‖ψ‖_{L²(∂Ω)}
≤ κ ‖φ‖_V ‖ψ‖_V + ‖β(t, ·)‖_{L^∞(Ω;ℝ^n)} ‖φ‖_V ‖ψ‖_V + κ C² Σ_{i=1}^s α_i ‖φ‖_V ‖ψ‖_V
≤ ( κ + ‖β‖_𝔅 + κ C² Σ_{i=1}^s α_i ) ‖φ‖_V ‖ψ‖_V = C_a ‖φ‖_V ‖ψ‖_V,

where C is the constant from Theorem 2.11 and C_a is defined by C_a := κ + ‖β‖_𝔅 + κ C² Σ_{i=1}^s α_i.

(iii) Let φ ∈ V be arbitrary. By defining C_β := ‖β‖_𝔅 we infer f.a.a. t ∈ (0, T) that

a(t; φ, φ) = κ ‖∇φ‖²_{L²(Ω;ℝ^n)} + κ Σ_{i=1}^s α_i ∫_{Γ_i} |φ(x)|² ds(x) + ∫_Ω β(t, x) · ∇φ(x) φ(x) dx
≥ κ ‖∇φ‖²_{L²(Ω;ℝ^n)} − ‖β(t, ·)‖_{L^∞(Ω;ℝ^n)} ‖∇φ‖_{L²(Ω;ℝ^n)} ‖φ‖_{L²(Ω)}
≥ κ ‖∇φ‖²_{L²(Ω;ℝ^n)} − C_β ‖∇φ‖_{L²(Ω;ℝ^n)} ‖φ‖_{L²(Ω)}
≥ κ ‖∇φ‖²_{L²(Ω;ℝ^n)} − ( (κ/2) ‖∇φ‖²_{L²(Ω;ℝ^n)} + (C_β²/(2κ)) ‖φ‖²_{L²(Ω)} )
= (κ/2) ‖∇φ‖²_{L²(Ω;ℝ^n)} − (C_β²/(2κ)) ‖φ‖²_{L²(Ω)}
= (κ/2) ‖φ‖²_V − ( C_β²/(2κ) + κ/2 ) ‖φ‖²_H,

where Young's inequality was used in the fourth step. Thus, the claim follows for γ := κ/2 > 0 and η := C_β²/(2κ) + κ/2 ≥ 0.

Lemma 5.4. The mapping f : (0, T) → V′ is well-defined and it holds f ∈ L²(0, T; V′) for all u ∈ U.

Proof. A detailed proof of this lemma can be found in [1].

Now we can conclude the well-posedness of the weak formulation (WF).

Theorem 5.5. Let Assumption 4 be satisfied. Then there is a unique solution y ∈ W(0, T) to the weak formulation (WF). Moreover, there is a constant C > 0 such that the a-priori estimate

‖y‖_{W(0,T)} ≤ C ( ‖f‖_{L²(0,T;V′)} + ‖y_0‖_H )

holds.

Proof. This result follows immediately by applying Theorem 2.14.

Definition 5.6. Using the last theorem, we define the linear continuous operator

𝒯 : L²(0, T; V′) × H → W(0, T),  𝒯(f, y_0) := y,

which maps a right-hand side f and an initial value y_0 to the unique solution y of (WF).

In the following we want to analyse how the solution to (WF) depends on the control u ∈ U. For this purpose, we split the right-hand side f into one part which is independent of the control u and another part depending on u.

Definition and Remark 5.7. We split the right-hand side f of the weak formulation (WF) into f(t) = ℱ(t) + (Bu)(t), where the u-independent and u-dependent parts are defined by

ℱ : (0, T) → V′,  ⟨ℱ(t), φ⟩_{V′,V} := κ Σ_{i=1}^s α_i ∫_{Γ_i} φ(x) y_a(t) ds(x)

and

B : U → L²(0, T; V′),  ⟨(Bu)(t), φ⟩_{V′,V} := Σ_{i=1}^m ∫_Ω u_i(t) χ_i(x) φ(x) dx,

respectively.

Lemma 5.8. The operator B depends linearly on the control u ∈ U and fulfils the estimate

‖Bu‖_{L²(0,T;V′)} ≤ ‖χ‖_{L²(Ω;ℝ^m)} ‖u‖_U. (5.4)

Proof. The linear dependency on u follows directly from the definition of B. In order to show the second claim, we observe that for an arbitrary φ ∈ V and almost all t ∈ (0, T) the following estimate holds:

|⟨(Bu)(t), φ⟩_{V′,V}| = | Σ_{i=1}^m ∫_Ω u_i(t) χ_i(x) φ(x) dx | ≤ Σ_{i=1}^m |u_i(t)| ∫_Ω |χ_i(x) φ(x)| dx
≤ Σ_{i=1}^m |u_i(t)| ‖χ_i‖_{L²(Ω)} ‖φ‖_{L²(Ω)} ≤ Σ_{i=1}^m |u_i(t)| ‖χ_i‖_{L²(Ω)} ‖φ‖_V = C(t) ‖φ‖_V,

where C(t) := Σ_{i=1}^m |u_i(t)| ‖χ_i‖_{L²(Ω)}. Hence, we have ‖(Bu)(t)‖_{V′} ≤ C(t) f.a.a. t ∈ (0, T). Furthermore, using this estimate and the Cauchy-Schwarz inequality in ℝ^m, we obtain

‖Bu‖²_{L²(0,T;V′)} = ∫₀ᵀ ‖(Bu)(t)‖²_{V′} dt ≤ ∫₀ᵀ C(t)² dt = ∫₀ᵀ ( Σ_{i=1}^m |u_i(t)| ‖χ_i‖_{L²(Ω)} )² dt
≤ ‖χ‖²_{L²(Ω;ℝ^m)} ∫₀ᵀ ‖u(t)‖²_{ℝ^m} dt = ‖χ‖²_{L²(Ω;ℝ^m)} ‖u‖²_U < ∞.

Thus, we conclude ‖Bu‖_{L²(0,T;V′)} ≤ ‖χ‖_{L²(Ω;ℝ^m)} ‖u‖_U.

Definition and Remark 5.9. Let y ∈ W(0, T) be the unique solution to (WF) for an arbitrary u ∈ U. We define the u-independent part of the solution by ŷ := 𝒯(ℱ, y_0) and introduce the linear operator

𝒮 : U → W(0, T) ↪ L²(0, T; H),  𝒮u := 𝒯(Bu, 0).

Then by linearity the solution can be expressed as y = ŷ + 𝒮u.

Remark 5.10. Note that using Theorem 2.1 we can conclude W(0, T) ↪ C([0, T]; H) ↪ L²(0, T; H), as C([0, T]; H) ⊂ L²(0, T; H) holds.

Lemma 5.11. The solution operator 𝒮 is injective and continuous with

‖𝒮‖_{L(U;L²(0,T;H))} ≤ C ‖χ‖_{L²(Ω;ℝ^m)},

where C > 0 is the constant from Theorem 5.5.

Proof. The continuity of 𝒮 follows immediately from estimate (5.4) and Theorem 5.5. The proof of the injectivity can be found in [1].

5.3 Reduced Optimal Control Problem Using the linear continuous operator 𝒮 defined in the last section, we are able to transform the bicriterial optimal control problem (BOC) into an equivalent problem which depends only on the control u. Moreover, we will show that the resulting problem fits into the framework of the bicriterial optimization problems introduced in Chapter 3.

Definition 5.12. Let U_ad := { u ∈ U | u_a(t) ≤ u(t) ≤ u_b(t) f.a.a. t ∈ [0, T] } be the set of all admissible controls. Defining the reduced objective function

Ĵ : U → ℝ²,  Ĵ(u) = ( Ĵ_1(u), Ĵ_2(u) ) := ( ½ ‖ŷ + 𝒮u − y_Q‖²_{L²(0,T;H)}, ½ ‖u‖²_U ),

the reduced bicriterial optimal control problem reads

min Ĵ(u) s.t. u ∈ U_ad. (RBOC)

It is clear that the reduced optimal control problem (RBOC) has the form of a general bicriterial optimization problem. Thus, in order to apply the results shown in Chapter 3, we only need to verify the main assumptions on the objective functions Ĵ_1, Ĵ_2 and the admissible set U_ad. Indeed, this work has already been done in [1]. Hence, in the following we only cite those results.

Lemma 5.13. The admissible set U_ad is non-empty, closed and convex.

Lemma 5.14. The reduced objective functions Ĵ_1 and Ĵ_2 are strictly convex, continuous, bounded from below and quadratic.

Lemma 5.15. The reduced objective functions Ĵ_1 and Ĵ_2 are twice continuously Fréchet differentiable with the gradients and second derivatives

∇Ĵ_1(u) = 𝒮*(ŷ + 𝒮u − y_Q),  ∇Ĵ_2(u) = u,
∇²Ĵ_1(u) = 𝒮*𝒮,  ∇²Ĵ_2(u) = id_U,

where 𝒮* : L²(0, T; H) → U is the Hilbert space adjoint of 𝒮.

Corollary 5.16. The second derivatives ∇²Ĵ_1 and ∇²Ĵ_2 are positive definite. Moreover, ∇²Ĵ_2 is even uniformly positive definite with coercivity constant C = 1.

Having shown that the reduced problem (RBOC) fits into the framework of the bicriterial optimization, we are now able to apply the theoretical results shown in Chapter 3. An associated Euclidean reference point problem for a suitable reference point z is then defined as follows:

Definition 5.17. Given a reference point z ∈ ℝ², we define the Euclidean distance function

F_z : U → ℝ,  F_z(u) := ½ ( Ĵ_1(u) − z_1 )² + ½ ( Ĵ_2(u) − z_2 )².

Then the associated Euclidean reference point problem is given by

min F_z(u) s.t. u ∈ U_ad.

As in Chapter 3 we refer to this problem by (ERPP)_z.

5.4 Adjoint Equation Most approaches for solving scalar optimization problems utilize evaluations of the gradient and the second derivative of the objective function. Thus, having in mind the numerical treatment of (RBOC), it is crucial to compute ∇Ĵ_i(u) ∈ U and ∇²Ĵ_i(u)h ∈ U efficiently for given u ∈ U_ad, h ∈ U and all i ∈ {1, 2}.
However, the current representations of the gradient and of the second derivative of Ĵ_1 contain the Hilbert space adjoint 𝒮*, for which it is not clear how it can be evaluated. Hence, an alternative representation of the derivatives is needed. For this purpose, we introduce the so-called adjoint equation.
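Note that once the derivatives of Ĵ_1 and Ĵ_2 are available, those of F_z follow by the chain rule: ∇F_z(u) = (Ĵ_1(u) − z_1) ∇Ĵ_1(u) + (Ĵ_2(u) − z_2) ∇Ĵ_2(u). This structure can be illustrated with finite-dimensional quadratic stand-ins for the reduced objectives (all names below are hypothetical; a dense matrix S plays the role of the solution operator):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 5
S = rng.standard_normal((k, k))           # stand-in for the solution operator
c = rng.standard_normal(k)                # stand-in for y_hat - y_Q

def J1(u): return 0.5 * np.dot(S @ u + c, S @ u + c)   # ~ 0.5*||y_hat + Su - y_Q||^2
def J2(u): return 0.5 * np.dot(u, u)                    # ~ 0.5*||u||^2
def grad_J1(u): return S.T @ (S @ u + c)                # S*(S u + c)

def F_z(u, z):
    return 0.5 * (J1(u) - z[0]) ** 2 + 0.5 * (J2(u) - z[1]) ** 2

def grad_F_z(u, z):                       # chain rule for the distance function
    return (J1(u) - z[0]) * grad_J1(u) + (J2(u) - z[1]) * u

# sanity check against a central finite difference in a random direction
u, h, z, eps = rng.standard_normal(k), rng.standard_normal(k), (0.1, 0.2), 1e-6
fd = (F_z(u + eps * h, z) - F_z(u - eps * h, z)) / (2 * eps)
print(fd, np.dot(grad_F_z(u, z), h))
```

The adjoint machinery introduced below then supplies the infinite-dimensional counterpart of grad_J1 without ever forming 𝒮* explicitly.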

Definition 5.18. Given a control u ∈ U_ad, we call the end value problem

−⟨p_t(t), φ⟩_{V′,V} + a(t; φ, p(t)) = ⟨y_Q(t) − ŷ(t) − (𝒮u)(t), φ⟩_{V′,V} for all φ ∈ V a.e.,
⟨p(T), φ⟩_H = 0 for all φ ∈ H, (AE)

the adjoint or dual equation of (WF).

Theorem 5.19. Let u ∈ U be arbitrary. Then there is a unique solution p ∈ W(0, T) of the adjoint equation (AE) satisfying

‖p‖_{W(0,T)} ≤ C ‖y_Q − ŷ − 𝒮u‖_{L²(0,T;V′)}

for a constant C > 0.

Proof. In order to apply Theorem 2.14, we first have to transform (AE) into an initial value problem. Thus, defining the function p̃(t) := p(T − t) and the bilinear form ã(t; φ, ψ) := a(T − t; ψ, φ) f.a.a. t ∈ (0, T) and all φ, ψ ∈ V, we obtain the initial value problem

⟨p̃_t(t), φ⟩_{V′,V} + ã(t; p̃(t), φ) = ⟨y_Q(T − t) − ŷ(T − t) − (𝒮u)(T − t), φ⟩_{V′,V} for all φ ∈ V a.e.,
⟨p̃(0), φ⟩_H = 0 for all φ ∈ H. (5.5)

Note that the mapping a(t; ·, ·) is a bilinear form f.a.a. t ∈ (0, T) which fulfils Assumption 1. Hence, the bilinear form ã(t; ·, ·) satisfies those properties f.a.a. t ∈ (0, T) as well. Furthermore, under Assumption 4 we have

y_Q(T − ·) − ŷ(T − ·) − (𝒮u)(T − ·) ∈ L²(0, T; H)

and thus, using the continuous embedding H ↪ V′,

y_Q(T − ·) − ŷ(T − ·) − (𝒮u)(T − ·) ∈ L²(0, T; V′).

Applying Theorem 2.14 to (5.5), we conclude that there is a unique solution p̃ ∈ W(0, T) satisfying ‖p̃‖_{W(0,T)} ≤ C ‖y_Q − ŷ − 𝒮u‖_{L²(0,T;V′)} for a constant C > 0. Hence, there is a unique solution p ∈ W(0, T) to the adjoint equation (AE) which fulfils the above inequality.

Definition and Remark 5.20. Having the last theorem in mind, we define the linear and continuous operator

𝒜 : L²(0, T; V′) → W(0, T) ↪ L²(0, T; H),

which maps an arbitrary right-hand side to the unique solution of (AE). Analogous to Definition and Remark 5.7, we split the solution to (AE) into one part which is independent of the input variable u ∈ U_ad and another part depending on u. For this purpose, we define the linear operator

A : U → W(0, T),  Au := 𝒜(−𝒮u),

and the u-independent part of the solution p̂ := 𝒜(y_Q − ŷ). Then for all u ∈ U it holds by linearity 𝒜(y_Q − ŷ − 𝒮u) = p̂ + Au.
Note that A is continuous as the composition of two continuous functions.

The next lemma shows that by using the adjoint equation we obtain representations of the derivatives of Ĵ_1 which are numerically evaluable.

Lemma 5.21. Assume that p = p̂ + Au is the unique solution to the adjoint equation (AE) for a given u ∈ U. Then the gradient and the second derivative of the objective function Ĵ_1 are given by

∇Ĵ_1(u) = −B*(p̂ + Au)  and  ∇²Ĵ_1(u) = −B*A,

respectively.

Proof. A detailed proof for a similar problem with a time-independent bilinear form a(·,·) can be found in [1]. The extension to the time-dependent case is straightforward.

The next result shows how to evaluate the adjoint operator B*.

Lemma 5.22. Let B be the operator defined in Definition and Remark 5.7. Then the adjoint operator B* is given by

B* : L²(0, T; V) → U,  (B*v)(t) = ( ∫_Ω χ_1(x) v(t, x) dx, …, ∫_Ω χ_m(x) v(t, x) dx )ᵀ.

Proof. The proof of this lemma can be found in [1].

5.5 Reduced-Order Modelling Using POD In the first part of this chapter we showed that the bicriterial optimal control problem (BOC), governed by the linear heat equation with convection term (SE) and the bilateral constraints (BC), can be transformed into the problem (RBOC), which fits into the framework of the bicriterial optimization discussed in Chapter 3. However, when applying Algorithm 1 to (RBOC), many scalar optimization problems (ERPP)_z have to be solved for different reference points z ∈ ℝ². Each solve of (ERPP)_z requires numerous evaluations of ∇F_z and ∇²F_z and thus multiple solves of both the state and the adjoint equation. Unfortunately, these computations are often too costly when using a standard finite element method. Thus, in order to reduce the computational effort, we apply the POD method introduced in Chapter 4 to both equations.
By doing so, we will not only obtain reduced-order equations, but also a POD approximation Ĵ^l of the objective function depending on the POD space V^l.

5.5.1 POD Approximation of the State Equation To begin with, we introduce the reduced-order model for the u-dependent part of the weak formulation (WF). For this purpose, assume that a POD space V^l ⊂ V of rank l is given and that the u-independent part ŷ of the solution to (WF) has already been computed.

Definition 5.23. Let u ∈ U be arbitrary. We call the initial value problem

⟨y_t^l(t), φ⟩_{(V^l)′,V^l} + a(t; y^l(t), φ) = ⟨Bu(t), φ⟩_{(V^l)′,V^l} for all φ ∈ V^l a.e.,
⟨y^l(0), φ⟩_H = 0 for all φ ∈ V^l, (WF^l)

a low-dimensional or reduced-order model of the u-dependent part of (WF).

Theorem 5.24. Let Assumption 4 be satisfied. Then there is a unique solution y^l ∈ H¹(0, T; V^l) to the reduced-order model (WF^l) satisfying the a-priori estimate

‖y^l‖_{H¹(0,T;V^l)} ≤ C ‖Bu‖_{L²(0,T;V′)}, (5.6)

where the constant C > 0 is independent of l.

Proof. The proof of a similar version of this theorem can be found in [9].

Remark 5.25. (i) As y^l(t) ∈ V^l holds for almost all t ∈ (0, T), we can make a Galerkin ansatz of the form

y^l(t) = Σ_{i=1}^l y_i^l(t) ψ̄_i ∈ V^l f.a.a. t ∈ (0, T), (5.7)

where y_i^l : (0, T) → ℝ are coefficient functions and {ψ̄_i}_{i=1}^l is the POD basis of rank l, i.e. V^l = span{ψ̄_1, …, ψ̄_l}. Then inserting (5.7) into (WF^l) and choosing φ := ψ̄_i yields that the vector-valued coefficient function y^l defined by y^l : (0, T) → ℝ^l, y^l := (y_1^l, …, y_l^l)ᵀ solves the linear system of ordinary differential equations

M^l ẏ^l(t) + A^l(t) y^l(t) = b^l(t) f.a.a. t ∈ (0, T),
M^l y^l(0) = 0, (5.8)

where

M^l := ( ⟨ψ̄_i, ψ̄_j⟩_{(V^l)′,V^l} ) ∈ ℝ^{l×l},  A^l(t) := ( a(t; ψ̄_i, ψ̄_j) ) ∈ ℝ^{l×l},  b^l(t) := ( ⟨Bu(t), ψ̄_i⟩_{(V^l)′,V^l} ) ∈ ℝ^l.

Thus, solving (WF^l) is equivalent to computing the coefficient vector y^l ∈ H¹(0, T; ℝ^l). Notice that the existence of a unique y^l ∈ H¹(0, T; ℝ^l) follows by standard arguments. (ii) The ordinary differential equation (5.8) can be solved numerically by using standard techniques for time discretization such as the implicit Euler method or the Crank-Nicolson method. (iii) Note that for an arbitrary φ ∈ V and y ∈ H¹(0, T; ℝ) it holds yφ ∈ H¹(0, T; V).
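Point (ii) of the remark above can be sketched in a few lines. A minimal implicit Euler time stepper for a system of the form (5.8), assuming the mass matrix, the time-dependent stiffness matrix A^l(t) and the load b^l(t) are provided by the caller (all names hypothetical):

```python
import numpy as np

def solve_rom_implicit_euler(M, A_of_t, b_of_t, T, n_steps, y0):
    """Implicit Euler for M y'(t) + A(t) y(t) = b(t), y(0) = y0:
    at each step solve (M + dt*A(t_k)) y_k = M y_{k-1} + dt*b(t_k)."""
    dt = T / n_steps
    y = np.array(y0, dtype=float)
    traj = [y.copy()]
    for k in range(1, n_steps + 1):
        t = k * dt
        y = np.linalg.solve(M + dt * A_of_t(t), M @ y + dt * b_of_t(t))
        traj.append(y.copy())
    return np.array(traj)

# smoke test with l = 1, M = 1, A = 1, b = 0, i.e. y' = -y with y(0) = 1
traj = solve_rom_implicit_euler(np.eye(1), lambda t: np.eye(1),
                                lambda t: np.zeros(1), T=1.0, n_steps=1000, y0=[1.0])
print(traj[-1, 0])
```

Since the reduced dimension l is small, each step costs only an l × l solve, which is the source of the speed-up over the full finite element system.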

Definition and Remark 5.26. (i) Analogous to Definition and Remark 5.9 we define the linear solution operator

𝒮^l : U → H¹(0, T; V^l) ↪ L²(0, T; H),  𝒮^l u := y^l,

which maps an arbitrary u ∈ U to the unique solution y^l of (WF^l). (ii) By linearity we infer that ỹ^l = ŷ + 𝒮^l u solves the equation

⟨ỹ_t^l(t), φ⟩_{(V^l)′,V^l} + a(t; ỹ^l(t), φ) = ⟨ℱ(t) + Bu(t), φ⟩_{(V^l)′,V^l} for all φ ∈ V^l a.e.,
⟨ỹ^l(0), φ⟩_H = ⟨y_0, φ⟩_H for all φ ∈ V^l.

(iii) Note that the POD approximation ỹ^l(t) usually does not belong to V^l, but to the affine space ŷ + V^l, provided ŷ ≠ 0. The benefit of this approach is the fact that we avoid an approximation error in the u-independent part of the solution. More information can be found, for instance, in [6].

Lemma 5.27. The linear operator 𝒮^l is continuous and fulfils the estimate

‖𝒮^l‖_{L(U,L²(0,T;H))} ≤ C ‖χ‖_{L²(Ω;ℝ^m)},

where C is the constant from (5.6). In particular, C does not depend on l.

Proof. The claim follows immediately from the estimates (5.6) and (5.4).

Unlike in Section 5.2, we cannot expect the injectivity of the operator 𝒮^l for an arbitrary POD space. Thus, a further assumption is needed.

Assumption 5. Let the POD space V^l be chosen such that B, seen as an operator B ∈ L(U; L²(0, T; (V^l)′)), is injective.

Lemma 5.28. Let Assumption 5 be satisfied. Then the operator 𝒮^l is injective.

Proof. Let u ∈ U be such that y^l := 𝒮^l u = 0 holds. We need to show that u = 0. Using the reduced-order model (WF^l), we obtain

0 = ⟨y_t^l(t), φ⟩_{(V^l)′,V^l} + a(t; y^l(t), φ) = ⟨Bu(t), φ⟩_{(V^l)′,V^l} for all φ ∈ V^l a.e.,

and thus Bu = 0 in L²(0, T; (V^l)′). Hence, the claim follows as we assumed that B is injective.

For the rest of this chapter we assume that Assumption 5 is satisfied. In the following we want to investigate the convergence of the error 𝒮^l u − 𝒮u for an arbitrary u ∈ U. The proof of the next theorem can be found in [1].

Theorem 5.29. Let u ∈ U be arbitrary. Furthermore, define y := 𝒮u and y^l := 𝒮^l u.
(i) If the POD space V^l is computed by using the snapshot y^1 = y, then y and y^l satisfy the a-priori error estimate

  ‖y^l − y‖²_{L²(0,T;V)} ≤ C ( Σ_{i=l+1}^∞ λ_i + ‖y_t − P^l y_t‖²_{L²(0,T;V)} ),

where {λ_i}_{i∈ℕ} are the eigenvalues of the operator R given by (4.).

41 5.5 Reduced-Order Modelling Using POD 35 (ii) If y H 1 (, T ; V ) and the POD space V l is computed by using the snapshots y 1 = y and y = y t, then the estimate y l y C L (,T ;V ) i=l+1 holds, where { λ i } i N are the eigenvalues of the operator R given by (4.). (iii) Let (V l ) l N be POD spaces that are generated by an arbitrary family of snapshots. Furthermore, assume that y H 1 (, T ; V ) and define y l := S l u for all l N. Then it holds lim y l y =. L (,T ;V ) l Remark 5.3. (i) Note that the a-priori estimates above depend on the arbitrarily chosen, but fixed control u U which is also used for computing the POD space. (ii) As V H holds, all statements are also valid if we consider the error in the L (, T ; H) norm instead of the L (, T ; V ) norm. It is possible to derive an a-priori error bound without including time derivatives into the snapshot space. However, first we need the following result which was proved in [14, pp ]. Lemma Let the main assumptions of Chapter 4 be satisfied. Furthermore, let (V l ) l N be POD spaces computed using the snapshots y 1,..., y L (, T ; V ). Then it holds k=1 T y k (t) Q l y k (t) dt = V i=l+1 λ i λ i ψi Q l ψi where { ψ i } i N denotes the associated POD basis and { λ i } i N are the corresponding eigenvalues given by (4.). Moreover, Q l is the orthogonal projection given as follows: Q l : H V l, v Q l v := inf v w H w V l H. In addition, if Q l L(V ) is bounded independently of l, then for each k = 1,..., it holds lim y k Q l y k L =. (,T ;V ) l V, Remark 5.3. Let v H be arbitrary. computed using the expression Then the orthogonal projection Q l v V l can be where Q l v := l d i ψi s.t. M l d = b l, i=1 d R l, M l := ( ψ i, ψ j H ) R l l, b l := ( v, ψ i H ) R l. More information about it can be found in [14, p. 7].
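The linear system from the remark above can be sketched in a discrete setting where the H-inner product is represented by a Gram matrix; the names Psi and W are assumptions of this illustration, not fixed by the thesis.

```python
import numpy as np

def h_projection(Psi, W, v):
    """Orthogonal H-projection Q^l v of v onto the span of the columns of Psi.

    Psi: (n, l) matrix of POD basis vectors, W: (n, n) SPD Gram matrix of the
    discrete H-inner product <a, b>_H = a^T W b, v: (n,) vector.
    Solves M^l d = b^l with M^l_ij = <psi_j, psi_i>_H, b^l_i = <v, psi_i>_H."""
    Ml = Psi.T @ W @ Psi          # (l, l) POD mass matrix
    bl = Psi.T @ (W @ v)          # (l,) right-hand side
    d = np.linalg.solve(Ml, bl)   # expansion coefficients
    return Psi @ d
```

If the POD basis is W-orthonormal, M^l is the identity and the solve degenerates to a matrix-vector product.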

42 36 5 Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations Now, utilizing Lemma 5.31 we are able to derive an a-priori error bound without including time derivatives into the snapshot space. Theorem Let u U be arbitrary and let (V l ) l N be POD spaces computed using the snapshot y 1 = Su. Furthermore, define y l := S l u and y := Su. Then there is a constant C = C(C a, γ, η, T ) such that the a-priori error estimate y l y C L(,T ;V ) i=l+1 λ i ψi Q l ψi V (5.9) holds, where { ψ i } i N denotes the associated POD basis and { λ i } i N are the corresponding eigenvalues given by (4.). Moreover, Q l is the orthogonal projection defined in Lemma In addition, if Q l L(V ) is bounded independently of l, then it holds lim y l y =. L (,T ;V ) l Proof. For almost all t (, T ) we split the error as y l (t) y(t) = y l (t) Q l (y(t)) + Q l (y(t)) y(t) =: θ l (t) + ρ l (t), where θ l (t) := y l (t) Q l (y(t)) V l and ρ l (t) := Q l (y(t)) y(t) V. Using Lemma 5.31 with the snapshot y 1 = Su, we conclude that ρ l y = Q l (y) = L (,T ;V ) L (,T ;V ) i=l+1 λ i ψi Q l ψi Therefore, we only have to show that there is a constant C > such that θ l L (,T ;V ) C i=l+1 λ i ψi Q l ψi V, V. since the estimate y l y θ l ρ + l L (,T ;V ) L (,T ;V ) L (,T ;V ) holds. But first we need to find a uniform bound for θ l (t). For this purpose, we use H y l = Q l (y) + θ l in the reduced order model (WF l ) as well as Remark (.1) and obtain d dt θl (t), ϕ H + a(t; θ l (t), ϕ) = Bu(t), ϕ (V l ),V l d dt Ql (y(t)), ϕ H a(t; Q l (y(t)), ϕ) (5.1) for all ϕ V l and f.a.a. t (, T ). Note that the projection Q l : H V l H is self adjoint (see [14, p. 3]), thus we have the formula d dt Ql (y(t)), ϕ H = d dt y(t), Ql (ϕ) H = d dt y(t), ϕ H for all ϕ V l a.e.

43 5.5 Reduced-Order Modelling Using POD 37 Then inserting the u-dependent part of (WF) into (5.1) yields d dt θl (t), ϕ H + a(t; θ l (t), ϕ) = a(t; ρ l (t), ϕ) for all ϕ V l a.e. (5.11) Choosing ϕ = θ l (t) V l in (5.11) as well as using the coercivity of the operator a(t;, ) and Young s inequality.4 with ε >, we obtain 1 d θ l (t) θ dt + γ l (t) θ η l (t) a(t; ρ l (t), θ l (t)) H V H ρ C l θ a (t) l (t) (5.1) V V C a ρ l (t) ε + ε θ l (t) V V and thus by choosing ε = γ d θ l (t) dt C a ρ l (t) θ H γ + η l (t). V H Now, Gronwall s lemma.3 yields ( θ l (t) t ( θ exp η ds) l () + C a H H γ T exp (ηt ) C a ρ l (s) γ ds V exp (ηt ) C a y Q l (y) γ L (,T ;V ) t ) ρ l (s) ds V for almost all t (, T ). Using (5.1) again with ε = γ, we get the inequality d θ l (t) θ dt + γ l (t) θ η l (t) C a ρ l (t) H V H γ V and hence θ l (t) 1 V γ ( d dt θ l (t) θ + η l (t) + C a ρ l (t) H H γ Then integrating over (, T ) and using (5.13) yield θ l 1 ( T d θ l (s) T L (,T ;V ) γ ds ds + η θ l (s) ds + C a H H γ = 1 ( θ l (T ) θ γ + l () T + η θ l (s) ds + C a H H H γ η T θ l (s) γ ds + C a ρ l H γ L (,T ;V ) exp (ηt ) C aηt ρ l γ + C a ρ l L (,T ;V ) γ. L (,T ;V ) V T ). ) ρ l (s) ds V ρ l L (,T ;V ) ) (5.13) Hence, there is a constant C = C(C a, γ, η, T ) such that the inequality (5.9) holds. The second assertion follows immediately from Lemma 5.31 and Theorem 4.6.
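The right-hand side of the estimate (5.9) can be evaluated numerically, for example to guide the choice of the rank l. The following sketch works in a discrete setting with Gram matrices Wh and Wv for the H- and V-inner products; these names and the array layout are assumptions of this illustration.

```python
import numpy as np

def tail_error_bound(lams, Psi, Wh, Wv, l):
    """Evaluate sum over i > l of lam_i * ||psi_i - Q^l psi_i||_V^2.

    lams: eigenvalues in descending order; Psi: (n, d) POD basis vectors
    (columns); Wh / Wv: Gram matrices of the discrete H- and V-inner
    products; l: rank of the reduced space spanned by the first l columns."""
    head = Psi[:, :l]
    Ml = head.T @ Wh @ head                       # reduced H-mass matrix
    total = 0.0
    for i in range(l, len(lams)):
        psi = Psi[:, i]
        d = np.linalg.solve(Ml, head.T @ (Wh @ psi))  # H-projection onto V^l
        r = psi - head @ d
        total += lams[i] * (r @ Wv @ r)               # V-norm of the residual
    return total
```

In the idealized case of an H-orthonormal basis the projection of the tail modes vanishes and the bound reduces to the weighted tail sum of the eigenvalues.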

44 38 5 Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations 5.5. POD Approximation of the Objective Function Having introduced the solution operator S l of the reduced-order model (WF l ) in the last section, we are able to define a POD approximated objective function and the corresponding Euclidean reference point problem. Definition Given a POD space V l V of rank l define the POD approximated objective function ( ) ( Ĵ l : U R, Ĵ l Ĵ l (u) := 1 (u) 1 ) ŷ := + S l u y Q L (,T ;H). Ĵ (u) 1 u U Then the low-dimensional or reduced-order bicriterial optimal control problem reads where the admissible set U ad is given by min Ĵ l (u) s.t. u U ad, (RBOC l ) U ad := {u U u a (t) u(t) u b (t) f.a.a. t [, T ]} U. Furthermore, we define by Y l := Ĵ l (U ad ) R the admissible region of the objective Ĵ l. Notation In the following we refer to the original bicriterial optimal control problem (RBOC) and its approximation (RBOC l ) by calling it the full problem and the POD approximated problem, respectively. Lemma Let Assumption 5 be satisfied. Then the functions Ĵ 1 l, Ĵ are quadratic and fulfil Assumptions and 3. Moreover, the derivatives of Ĵ 1 l are given by Ĵ l 1(u) = (S l ) (ŷ + S l u y Q ) and Ĵ l 1(u) = (S l ) S l. Proof. The assertions of this lemma were proved in [1, p. 6]. The following result shows that under certain assumption the POD approximated objective function Ĵ 1 l converges to Ĵ1. Theorem Let (V l ) l N be POD spaces generated by an arbitrary family of snapshots. (i) If u U ad such that Su H 1 (, T ; V ) holds, then Ĵ l 1 (u) Ĵ1(u) as l. (ii) If (u l ) l N U ad is a sequence such that u l u (l ) for u U ad and Su H 1 (, T ; V ) hold, then Ĵ 1 l(ul ) Ĵ1(u l ) and Ĵ 1 l(ul ) Ĵ1(u) as l. Proof. The proof of this theorem was carried out in [1, pp. 61-6].
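The POD objective from Definition 5.34 can be evaluated on a time grid, for instance with the trapezoidal rule for the L²(0,T;·) norms. Array names and layout below are assumptions of this illustration.

```python
import numpy as np

def _trapz(f, t):
    # composite trapezoidal rule on a (possibly non-uniform) grid
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

def pod_objective(Y, Yq, U, t, Wh):
    """Evaluate (J1^l(u), J2(u)) on a time grid.

    Y: (nt, n) trajectory of yhat + S^l u, Yq: (nt, n) desired state y_Q,
    U: (nt, m) control trajectory, t: (nt,) time grid, Wh: (n, n) Gram
    matrix of the discrete H-inner product."""
    E = Y - Yq
    eh = np.einsum('ki,ij,kj->k', E, Wh, E)           # ||e(t_k)||_H^2
    j1 = 0.5 * _trapz(eh, t)
    j2 = 0.5 * _trapz(np.sum(U * U, axis=1), t)
    return j1, j2
```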

45 5.5 Reduced-Order Modelling Using POD 39 Definition Given a reference point z R define the POD approximated distance function Fz l : U R, Fz l (u) = 1 ) l 1 ) (Ĵ 1 (u) z 1 + (Ĵ (u) z. Then the low-dimensional or reduced-order Euclidean reference point problem for (RBOC l ) is given by min F l z (u) s.t. u U ad. Analogous to Definition 5.17 we refer to this optimization problem by (ERPP) l z POD Approximation of the Adjoint Equation In this section we introduce the reduced-order model for the adjoint equation (AE) which will be used to obtain a numerically evaluable representation of the gradient of the objective function Ĵ l 1. For this purpose, assume that a POD space V l V of rank l is given and the u-independent part ˆp of the solution to (AE) is already computed. Definition Given an arbitrary u U we call the end value problem p l t(t), ϕ (V l ),V l + a(t; ϕ, pl (t)) = S l u(t), ϕ (V l ),V l for all ϕ V l a.e. (AE l ) p l (T ), ϕ H =, ϕ H for all ϕ V l a low-dimensional or reduced-order model for the u-dependent part of (AE). Theorem 5.4. Let Assumption 4 be satisfied. Then there is a unique solution p l H 1 (, T ; V l ) to the reduced-order model (AE l ) satisfying the a-priori estimate p l H S C l u, 1 (,T ;V ) L (,T ;(V l ) ) where the constant C > is independent of l. Proof. The claim follows by utilizing the same technique as in the proofs of Theorems 5.19 and 5.4. Remark Analogous to Remark 5.5 we make the Galerkin ansatz of the form p l (t) = l p l i(t) ψ i V l f.a.a. t (, T ), (5.14) i=1 where p l i : (, T ) R are coefficient functions and { ψ i } l i=1 is the POD basis of rank l. Then inserting (5.14) in (AE l ) and choosing ϕ = ψ i yield that the vector valued coefficient function p l defined by p l : (, T ) R l, p l := (p l 1,..., p l l ) solves the linear system of ordinary differential equations M l ṗ l (t) + (A l ) (t)p l (t) = b l (t) f.a.a t (, T ) M l p l (T ) =,

46 4 5 Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations where M l := ( ψ i, ψ ) j (V l ),V l R l l A l (t) := ( a(t; ψ i, ψ j ) ) R l l b l (t) := ( Su(t), ψ ) i (V l ),V l R l. Thus, solving (AE l ) is equivalent to computing the coefficient vector p l H 1 (, T ; R l ). The existence of a unique solution p l H 1 (, T ; R l ) follows by standard arguments. Definition and Remark 5.4. Analogous to Definition and Remark 5. we introduce the linear solution operator A l : U H 1 (, T ; V l ) L (, T ; H), A l (u) := p l, where p l is the unique solution to (AE l ). Note that by linearity p l := ˆp+A l u fulfils the equation p l t(t), ϕ (V l ),V l + a(t; ϕ, pl (t)) = y Q ŷ S l u(t), ϕ (V l ),V l p l (T ), ϕ H =, ϕ H for all ϕ V l a.e for all ϕ V l for all u U. As for the state equation, p l usually does not belong to V l, but to the affine space ˆp + V l. Lemma The solution operator A l is continuous and fulfils the estimate A l L(U,L C χ L (,T ;H)) (Ω;R m ), where the constant C > is independent of l. Proof. The claim follows immediately from Theorem 5.4 and Lemma 5.4. Analogous to Lemma 5.1 the derivatives of Ĵ 1 l reduced-order adjoint equation. can be expressed by utilizing the solution to the Lemma For a given u U let p l = A l u be the unique solution to (AE l ). Furthermore, let ˆp be the u-independent part of the solution to (AE). Then the gradient and the second derivative of the objective function Ĵ 1 l are given by Ĵ l 1(u) = B (ˆp + A l u) and respectively. Ĵ l 1(u) = B A l, Now, we want to investigate the convergence of the error A l u Au for an arbitrary u U. The proof of the following theorem can be found in [1, p. 63]. Theorem Let u U be arbitrary. Furthermore, define y := Su, p := Au, y l := S l u and p l := A l u.

47 5.5 Reduced-Order Modelling Using POD 41 (i) If y H 1 (, T ; V ) and the POD space V l is computed by using the snapshots y 1 = Sy and y = y t, then p l and p satisfy the a-priori error estimate ( p p l p C L 1 P l p ) + λ i, (,T ;V ) W (,T ) i=l+1 where { λ i } i N are the eigenvalues of the operator R given by (4.). (ii) If y, p H 1 (, T ; V ) holds and the POD space V l is computed by using the snapshots y 1 = y, y = y t, y 3 = p and y 4 = p t, then the estimate p l p C L (,T ;V ) i=l+1 holds, where { λ i } i N are the eigenvalues of the operator R given by (4.). (iii) Let (V l ) l N be POD spaces that are generated by an arbitrary family of snapshots. Furthermore, assume that y, p H 1 (, T ; V ) and define p l := A l u for all l N. Then it holds lim p l p =. L (,T ;V ) l Remark All statements are also valid if we consider the error in the L (, T ; H) norm instead of the L (, T ; V ) norm, as V H holds. The following theorem extends the convergence results from Theorem 5.37 to the gradient of Ĵ l 1. The proof can be found in [1, p. 64]. Theorem Let (V l ) l N be POD spaces generated by an arbitrary family of snapshots. (i) If u U ad such that Su, Au H 1 (, T ; V ) holds, then Ĵ l 1 (u) Ĵ1(u) as l. (ii) If (u l ) l N U ad is a sequence such that u l u (l ) for u U ad and Su, Au H 1 (, T ; V ) holds, then Ĵ 1 l(ul ) Ĵ1(u l ) and Ĵ 1 l(ul ) Ĵ1(u) as l. U Analogous to Theorem 5.33 it is possible to derive an a-priori error bound without including time derivatives into the snapshot space. Theorem Let u U be arbitrary and let (V l ) l N be POD spaces computed using the snapshots y 1 = Su and y = Au. Furthermore, define p l := A l u and p := Au. Then there is a constant C = C(C a, γ, η, T, C V ) such that the a-priori error estimate p l p C L(,T ;V ) i=l+1 λ i λ i ψi Q l ψi V (5.15) holds, where { ψ i } i N denotes the associated POD basis and { λ i } i N are the corresponding eigenvalues given by (4.). 
Moreover, Q l is the orthogonal projection defined in Lemma In addition, if Q l L(V ) is bounded independently of l, then it holds lim p l p =. L (,T ;V ) l
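Analogous to the state equation, the reduced adjoint system from Remark 5.41 is an end value problem and can be integrated backwards in time. The following is a minimal implicit Euler sketch; the sign convention −M p'(t) + A(t)ᵀ p(t) = b(t) with p(T) = 0 is an assumption made for this illustration.

```python
import numpy as np

def solve_adjoint_rom(M, A, b, T, nt):
    """Implicit Euler, backwards in time, for the reduced adjoint system
    -M p'(t) + A(t)^T p(t) = b(t),  p(T) = 0   (sign convention assumed).

    M: (l, l), A(t) -> (l, l), b(t) -> (l,). Returns times and trajectory."""
    dt = T / nt
    t = np.linspace(0.0, T, nt + 1)
    p = np.zeros((nt + 1, M.shape[0]))   # p[nt] = p(T) = 0
    for k in range(nt - 1, -1, -1):      # step from t_{k+1} down to t_k
        lhs = M / dt + A(t[k]).T
        rhs = (M / dt) @ p[k + 1] + b(t[k])
        p[k] = np.linalg.solve(lhs, rhs)
    return t, p
```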

48 4 5 Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations Proof. Using the same techniques as in the proof of Theorem 5.33, we split the error for almost all t (, T ) as p l (t) p(t) = p l (t) Q l (p(t)) + Q l (p(t)) p(t) =: θ l (t) + ρ l (t), where θ l (t) := p l (t) Q l (p(t)) V l and ρ l (t) := Q l (p(t)) p(t) V. Using Lemma 5.31 with the snapshots y 1 = Su and y = Au, we conclude y l y ρ + l = λ i ψi Q l ψi L (,T ;V ) L (,T ;V ) i=l+1 where y l = S l u and y = Su. Therefore, we only have to show that there is a constant C > such that θ l C λ i ψi Q l ψi, L (,T ;V ) V since the estimate i=l+1 p l p θ l ρ + l L (,T ;V ) L (,T ;V ) L (,T ;V ) holds. As in the proof of Theorem 5.33, we first need to find a uniform bound for θ l (t) H. For this purpose, we use p l = Q l (p) + θ l in the reduced order model (AE l ) as well as Remark (.1) and obtain d dt θl (t), ϕ H +a(t; ϕ, θ l (t)) = S l u(t), ϕ (V l ),V l+ d dt Ql (p(t)), ϕ H a(t; ϕ, Q l (p(t))) (5.16) for all ϕ V l and f.a.a. t (, T ). As the orthogonal projection Q l : H H is self adjoint (see [14, p. 3]), we have again the formula d dt Ql (p(t)), ϕ H = d dt p(t), Ql (ϕ) H = d dt p(t), ϕ H Then inserting the u-dependent part of (AE) into (5.16) yields V, for all ϕ V l a.e. and thus d dt θl (t), ϕ H + a(t; ϕ, θ l (t)) = S l u(t), ϕ (V l ),V l + Su(t), ϕ (V l ),V l + a(t; ϕ, p(t)) a(t; ϕ, Q l (p(t))) d dt θl (t), ϕ H + a(t; ϕ, θ l (t)) = y(t) y l (t), ϕ (V l ),V l + a(t; ϕ, p(t) Ql (p(t))). Choosing ϕ = θ l (t) V l as well as using the coercivity and the boundedness of the operator a(t;, ) as well as Young s inequality.4 with ε, ε >, we obtain the inequality 1 d θ l (t) θ dt + γ l (t) θ η l (t) y(t) y l θ (t) l θ (t) + C l ρ a (t) l (t) H V H H H V V 1 y(t) y l (t) ε + ε θ l (t) H H + ε θ l (t) + C a ρ l (t) V ε. V (5.17)

49 5.5 Reduced-Order Modelling Using POD 43 Then setting ε = 1 and ε = γ in (5.17) yields d θ l (t) y(t) dt y l (t) + C a ρ l (t) θ H H γ + (1 + η) l (t). V H Integrating the above inequality over (T t, T ) f.a.a. t (, T ), we obtain θ l (T t) θ l (T ) T y(s) y l (s) + C a ρ l (s) H H T t H γ T + (1 + η) θ l (s) ds H and thus by substituting r := T s θ l (T t) θ l (T ) t + y(t r) y l (T r) + C a ρ l (T r) H H H γ t + (1 + η) θ l (T r) dr. H T t Now, we are able to apply Gronwall s lemma to v(t) := θ l (T t) and find f.a.a. t (, T ) H ( θ l (T t) t ( θ exp (1 + η) ds) l (T ) T + y(s) y l (s) ) + C a ρ l (s) H V T t H γ ds V ( T exp ((1 + η)t) CV y(s) y l (s) T ) ds + C a ρ l (s) V γ ds V ( exp ((1 + η)t ) CV y y l ) + C a ρ l, L (,T ;V ) γ L (,T ;V ) (5.18) where C V is the embedding constant from (.1). Using (5.17) again with ε = 1 and ε = γ yield θ l (t) 1 ( d θ l (t) θ V γ dt + (1 + η) l (t) y(t) + y l (t) ) + C a ρ l (t) H H H γ V and thus by integrating over (, T ) and using (5.18) θ l 1 θ l (T ) L (,T ;V ) γ 1 θ l () H γ y y l + C V γ CC V T + C V γ y y l H L (,T ;V ) + C a (1 + η) + γ ρ l γ L (,T ;V ) + C C at γ y y l L (,T ;V ) + C a T L (,T ;V ) ρ l V ds θ l (s) ds H L (,T ;V ) ρ l γ, L (,T ;V ) exp ((1+η)T )(1+η) where C := γ. Hence, there is a constant C = C(C a, γ, η, T, C V ) such that the inequality (5.15) holds. The second assertion follows immediately from Lemma 5.31 and Theorem 4.6. Remark Note that results from Theorem 5.33 are also valid if we use the snapshots y 1 = Su and y = Au for u U in order to compute the POD spaces. V dr
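The POD bases appearing throughout can be computed by the method of snapshots with respect to a weighted inner product. The following is a minimal sketch; quadrature weights for the time integral are omitted and all names are placeholders.

```python
import numpy as np

def pod_basis(Y, W, l):
    """Method of snapshots: POD basis of rank l w.r.t. <a, b>_W = a^T W b.

    Y: (n, k) snapshot matrix (columns are snapshots, e.g. state and adjoint
    trajectories stacked side by side), W: (n, n) SPD Gram matrix.
    Assumes the leading l eigenvalues are positive. Returns Psi (n, l) with
    Psi^T W Psi = I and all eigenvalues in descending order."""
    K = Y.T @ W @ Y                       # (k, k) correlation matrix
    lams, V = np.linalg.eigh(K)           # eigh returns ascending order
    lams, V = lams[::-1], V[:, ::-1]      # reorder to descending
    Psi = np.empty((Y.shape[0], l))
    for i in range(l):
        Psi[:, i] = (Y @ V[:, i]) / np.sqrt(lams[i])
    return Psi, lams
```

Working with the k-by-k correlation matrix is advantageous here because the number of snapshots is much smaller than the number of finite element unknowns.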

50 44 5 Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations A-Priori Convergence Analysis In this section we turn towards the main result of the POD analysis, namely we show that for a suitable reference point z the solution of the POD approximated Euclidean reference point problem converges to the one of the full ERPP for an increasing dimension of the POD space. In [1] this has been already shown for an optimal control problem governed by the heat equation with a time-independent convection term. As all solution operators in this thesis have the same properties, the proofs are exactly the same. Hence, for detailed proofs of the following results we refer to [1, pp ]. We assume that the POD spaces (V l ) l N are computed by an arbitrary family of snapshots. Lemma 5.5. Let z l N ( PY l + R ) be a reference point and denote by ū l the minimizer of F l z for all l N. Then the sequence (ū l ) l N U ad is bounded. Theorem Let z ( l N PY l + R ) be a reference point. Furthermore, denote by ū l and ū the minimizer of Fz l for all l N and the minimizer of F z, respectively. If Sū, Aū H 1 (, T ; V ) and Ĵ(ū) > z are satisfied, then it holds lim ū l ū =. U l Theorem 5.5. Let z l N ( PY l + R ) be a reference point. Furthermore, denote by ū l and ū the minimizer of F l z for all l N and the minimizer of F z, respectively. If Sū, Aū H 1 (, T ; V ), then the following assertions hold: (i) Ĵ1(ū l ) Ĵ1(ū) and Ĵ l 1 (ūl ) Ĵ1(ū) as l. (ii) Ĵ1(ū l ) Ĵ1(ū) and Ĵ l 1 (ūl ) Ĵ1(ū) as l. (iii) F z (ū l ) F z (ū) and F l z (ū l ) F z (ū) as l. Analogous to Theorems 5.33 and 5.48 it is possible to derive a similar a-priori error estimate for the control space. Theorem Let z l N ( PY l + R ) be a reference point. Furthermore, denote by ū l and ū the minimizer of F l z for all l N and the minimizer of F z, respectively. 
If the POD spaces (V^l)_{l∈ℕ} are computed using the snapshots y^1 = Sū and y^2 = Aū, and if Ĵ_1^l(ū) ≥ z_1 and Ĵ_2(ū^l) > z_2 for all l ∈ ℕ hold, then there is a constant C = C(C_a, γ, η, T, C_V, z) > 0 such that the a-priori error estimate

  ‖ū^l − ū‖²_U ≤ C Σ_{i=l+1}^∞ λ_i ‖ψ_i − Q^l ψ_i‖²_V   (5.19)

holds, where {ψ_i}_{i∈ℕ} denotes the associated POD basis and {λ_i}_{i∈ℕ} are the corresponding eigenvalues given by (4.). Moreover, Q^l is the orthogonal projection defined in Lemma 5.31. In addition, if ‖Q^l‖_{L(V)} is bounded independently of l, then it holds

  lim_{l→∞} ‖ū^l − ū‖_U = 0.

51 5.5 Reduced-Order Modelling Using POD 45 Proof. We start with the estimate ) (Ĵ l 1 (ū l ) + Ĵ1(ū) S z l 1 (ū l (Ĵ ū) + (ū) + Ĵ(ū l ) L (,T ;H) ) ) (Ĵ1 (ū) z 1 (Ĵ1 (ū) Ĵ 1(ū) l + Ĵ1(ū) Ĵ 1(ū), l ū l ū U = ) ) l + (Ĵ 1 (ū l l ) z 1 (Ĵ 1 (ū) Ĵ1(ū) ) (Ĵ1 (ū) z 1 ) ū z l ū ) Ĵ1(ū) Ĵ 1(ū), l ū l ū U + (Ĵ1 (ū) Ĵ 1(ū )) l l (Ĵ1 (ū) Ĵ 1(ū) l (Ĵ1 (ū) z 1 ) Ĵ1(ū) Ĵ l 1(ū), ū l ū U = ) + (Ĵ1 (ū) Ĵ 1(ū) l + Ĵ 1(ū) l Ĵ 1(ū )) l l (Ĵ1 (ū) Ĵ 1(ū) l ) = (Ĵ1 (ū) z 1 Ĵ1(ū) Ĵ 1(ū), l ū l ū U + (Ĵ1 (ū) Ĵ 1(ū) l ) l + (Ĵ 1 (ū) Ĵ 1(ū )) l l (Ĵ1 (ū) Ĵ 1(ū) l which was derived in the proof of Theorem 5.41 in [1]. Using Cauchy-Schwarz inequality as well as Lemmata 5.1 and 5.44, we obtain the following estimate for the first summand: ) (Ĵ1 (ū) z 1 Ĵ1(ū) Ĵ 1(ū), l ū l ū U ) (Ĵ1 (ū) z 1 Ĵ 1 (ū) Ĵ 1(ū) l ū l ū ) U U B = (Ĵ1 (ū) z 1 (ˆp + Aū) + B (ˆp + A l ū ū) l ū U U ) B = (Ĵ1 (ū) z 1 (A l ū ū Aū) l ū ) U U (Ĵ1 (ū) z 1 B L(L (,T ;V );U) A l ū Aū Furthermore, using Young s inequality with ε > yields (Ĵ1 (ū) z 1 ) C 1 ε ) L (,T ;H) ū l ū. U Ĵ1(ū) Ĵ 1(ū), l ū l ū U A l ū Aū + ε ū l ū (5.) L (,T ;H), U ) where C 1 := (Ĵ1 (ū) z 1 B L(L (,T ;V );U). Note that according to Lemma 5. it holds B L(L (,T ;V );U) <. In addition, using the fact that the objective functions are quadratic as well as Lemma 5.36 and Cauchy-Schwarz inequality, we obtain the following equality for the last summand of the initial estimate: ) l (Ĵ 1 (ū) Ĵ 1(ū l l ) (Ĵ1 (ū) 1(ū)) Ĵ l ( = Ĵ 1(ū l l ), ū ū l U + 1 ) ) Ĵ1(ū l l )(ū ū l ), ū ū l U (Ĵ1 (ū) Ĵ 1(ū) l ) = (Ĵ1 (ū) Ĵ 1(ū) l Ĵ 1(ū l l ), ū ū l U + 1 (Ĵ1 (ū) 1(ū)) Ĵ l S l (ū ū l ) U L (,T ;H)

52 46 5 Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations Moreover, using Cauchy-Schwarz s inequality as well as Young s inequality with ε >, we infer ) l (Ĵ 1 (ū) Ĵ 1(ū l l ) (Ĵ1 (ū) 1(ū)) Ĵ l Ĵ1(ū) Ĵ 1(ū) l Ĵ 1(ū l l ū ) ū l + 1 (Ĵ1 (ū) U U 1(ū)) Ĵ l S l (ū ū l ) L (,T ;H) 1 ) (Ĵ1 (ū) ε Ĵ 1(ū) l ε + Ĵ l 1(ū l ) ū ū l + 1 ) (Ĵ1 (ū) U U Ĵ 1(ū) l S l (ū ū ) l, L (,T ;H) (5.1) whereby Ĵ 1 l (ūl ) U < for all l N holds (see Theorem 5.5). Now, inserting (5.) and (5.1) in the initial estimate as well as rearranging yield ) (Ĵ l 1 (ū l ) + Ĵ 1 l(ū) S z l 1 (ū l ū) L (,T ;H) (Ĵ (ū) + + Ĵ(ū l ) z ε ε ū Ĵ l 1(ū l ) U) l ū C ( 1 A l ū Aū ε ) (Ĵ1 (ū) L (,T ;H) ε 1(ū)) Ĵ l. Note that as Ĵ l 1 (ūl )+Ĵ l 1 (ū) z 1 holds, the first term can be ignored. Furthermore, choosing ε and ε small enough as well as using Ĵ(ū) + Ĵ(ū l ) z >, there is a constant C such that ū l ū 1 ( C1 ( A l ū Aū U C ε ) ) ) (Ĵ1 (ū) L (,T ;H) ε Ĵ 1(ū) l holds. Now, we only have to find an estimate for the last term in order to apply Theorems 5.48 and For this purpose, we use the third binomial formula as well as Lemmata 5.11 and 5.7 and obtain Ĵ1(ū) Ĵ 1(ū) l = 1 ŷ + Sū y Q ŷ L (,T ;H) + Slū y Q L (,T ;H) 1 ( ) ŷ ŷ + Sū y Q L (,T ;H) + + Slū L y Q (,T ;H) ŷ ŷ + Sū y Q L (,T ;H) + Slū L y Q (,T ;H) ( ) C 3 χ L (Ω;R m ) ū U + ŷ y Q L (,T ;H) ŷ ŷ + Sū y Q L (,T ;H) + Slū L y Q (,T ;H) with a constant C 3 >. Since it holds ( ) C 3 χ L (Ω;R m ) ū U + ŷ y Q L (,T ;H) <, U

53 5.5 Reduced-Order Modelling Using POD 47 we only have to find a bound for the second factor. Thus, we apply the reverse triangle inequality and conclude ŷ ŷ + Sū y Q L (,T ;H) + Slū L y Q Sū (,T ;H) Slū. L (,T ;H) Now, the claim follows immediately by applying Theorems 5.48 and A-Posteriori Analysis In this section we present an a-posteriori error estimate for ū l ū U which can be used as a measure for the quality of the solutions to (ERPP) l z. Corollary Let z P Y + R be an arbitrary reference point. Furthermore, let ū U ad be the minimizer of F z and u p U ad be arbitrary. If Ĵ(u p) z and Ĵ(u p ) > z hold, then we have the following a-posteriori estimates: and where ξ U is given such that is fulfilled. u p ū U Ĵ(u p) Ĵ(ū) R 1 ) 1 (Ĵ (u p ) z ξ U (Ĵ (u p ) z ) 1 ξ U, F z (u p ) + ξ, u u p U for all u U ad (5.) Proof. The claim follows immediately from Theorem 3. and Corollary The following result shows how ξ U can be obtained. The underlying idea of the construction of ξ is for instance explained in [15]. Lemma Let u p U ad be arbitrary. If we define ξ = ξ(u p ) U by min(, F z (u p ) i (t)) a.e. in {t [, T ] (u p ) i (t) = (u a ) i (t)}, ξ i (t) := max(, F z (u p ) i (t)) a.e. in {t [, T ] (u p ) i (t) = (u b ) i (t)}, F z (u p ) i (t) otherwise (5.3) for i = 1,..., m, then (5.) holds. Applying Corollary 5.54 to the solutions ū l of (ERPP) l z, i.e. choosing ū p = ū l, it is possible to show that under certain assumptions the a-posteriori error estimate for ū l ū U converges to for an increasing rank of the POD basis. Thus, it is reasonable to use this estimate as a measure for the quality of the solutions ū l. The proof of the following result was carried out in [1, pp ].

54 48 5 Bicriterial Optimal Control of Time-Dependent Convection-Diffusion Equations Theorem Let z ( l N PY l + R ) be a reference point. Furthermore, denote by ū l and ū the minimizer of Fz l for all l N and the minimizer of F z, respectively. If Sū, Aū H 1 (, T ; V ), then it holds lim ξ l U =, l where ξ l := ξ(ū l ) is given by (5.3). In particular, if additionally Ĵ(ū) > z as well as Ĵ(ūl ) z and Ĵ(ū l ) > z for all l N hold, then we have ū l ū U ) 1 (Ĵ (ū l ) z ξ l (l ) U and Ĵ(ūl ) Ĵ(ū) R 1 (Ĵ (ū l ) z ) 1 ξ l U (l ) POD-Based Optimization Algorithms Having applied the concept of model order reduction to the reduced bicriterial optimal control problem (RBOC) in the previous sections, we introduce two possible POD-based variations of Algorithm 1 for computing both the Pareto set P s and the Pareto front P f which will be used in our numerical experiments. The basic optimization routine proceeds as follows: First of all, compute the first Pareto optimal ū by minimizing Ĵ1, i.e. by solving (3.). Analogously, obtain the last Pareto optimal ū end by solving (3.3). Compute the full-order state y and the full-order dual p associated with ū and use these to build up a POD basis { ψ i } lmax i=1 of a certain rank l max. Then fix l < l max and continue by solving reduced-order Euclidean reference point problems using the POD basis { ψ i } l i=1 of rank l. We summarize this procedure in Algorithm. Algorithm POD-based Algorithm to compute the Pareto front Require: Maximum number N max N of Pareto points, recursive parameters h, h > ; 1: Solve (3.) 
and (3.3) in order to obtain ū and ū_end;
2: Set P_s ← {ū} and P_f ← {f(ū)};
3: Get POD basis {ψ_i}_{i=1}^l from (POD^l) using the state y and the dual p associated with ū;
4: Set n ← 0 and compute z^1 using (3.5);
5: while n < N_max − 1 and z_1^{n+1} < f_1(ū_end) do
6:   Set n ← n + 1;
7:   Solve (ERPP)^l_{z^n} with starting point ū^{n−1};
8:   Set P_s ← P_s ∪ {ū^n} and P_f ← P_f ∪ {f(ū^n)};
9:   Compute z^{n+1} using (3.4);
10: Return P_s ← P_s ∪ {ū_end} and P_f ← P_f ∪ {f(ū_end)}

Algorithm 2 is a straightforward adaptation of Algorithm 1 which uses a predefined and fixed number of POD basis functions. Consequently, in the numerical applications the value of l has to be chosen in advance depending on the desired approximation quality. However, since l is

55 5.5 Reduced-Order Modelling Using POD 49 supposed to be as small as possible in order to keep the computational effort feasible, it is not easy to find an appropriate value for it. One possible option to manage this issue is to use the condition (4.4), which is a ratio of modelled to total energy contained in the snapshots. However, this is a heuristic consideration and the provided results are mostly inaccurate. Therefore, it is reasonable to use a-posteriori error estimates in Algorithm in order to ensure the desired quality of solutions to the arising reduced-order optimization problems. In our numerical investigations we will mainly use the adaptive POD basis extension algorithm, which was developed and tested in [1, pp ]. The main idea of this algorithm consists in estimating the errors between the solutions of (ERPP) l z n and (ERPP) zn for each reference point z n by computing the recursive a-posteriori estimates ) 1 (Ĵ µ U (ū 1, z 1 (ū 1 ) z 1 ) := ξ(ū 1 ) U ) (Ĵ (ū 1 ) z 1 1 ξ(ū 1 ) U µ Y (ū 1, z 1 ) := 1 for n = 1 and ( ) 1 µ U (ū n, z n µ J (ū n 1, z n 1 ) + ) := Ĵ(ū n ) z n ξ(ū n ) U ( ) µ Y (ū n, z n ) := 1 µ J (ū n 1, z n 1 ) + Ĵ(ū n 1 ) z n ξ(ū n ) U for n =,..., N max, where µ J (ū n 1, z n 1 ) is given by ) h (Ĵ (ū n 1 ) µ Y (ū n 1, z n 1 ) z µ J (ū n 1, z n 1 n 1 ) := Ĵ(ūn 1 ) z n 1 R + µ Y (ū n 1, z n 1 ) Ĵ(ū n ) z n. Note that the a-posteriori estimates µ U and µ Y are measures for the error in the control and the objective space, respectively. Furthermore, remark that in course of this ū n denotes the solution to (ERPP) l z n and not to (ERPP) zn. The complete optimization procedure is illustrated in Algorithm 3. Unlike in Algorithm, we predefine the initial number of POD basis functions l init in the beginning and increase this number whenever the a-posteriori estimator µ U exceeds an upper bound µ of the acceptable error. 
In this way it can be ensured that the approximation error in the control space is below the predefined threshold µ for each Pareto point. Remark (i) The a-posteriori estimates µ U and µ Y are implications of Theorem 3. and Corollary The idea consists in using the information about previous Pareto and reference points in order to derive a lower bound for the expression Ĵ(ū) z, which does not depend on the unknown solution ū. The reason for using these estimates instead of the a-posteriori estimates derived in Corollary 5.54 the better efficiency. More information about it can be found in [1, pp ].

(ii) Analogous to Remark 3.1, it is also possible to start Algorithms 2 and 3 with ū_end and to traverse the Pareto front P_f from the bottom to the top. In this case, the first non-zero optimal control is used for computing the POD basis.

Algorithm 3 Adaptive POD basis extension algorithm
Require: Maximum number N_max ∈ ℕ of Pareto points, recursive parameters h, h > 0, initial number of POD basis functions l_init, threshold µ;
1: Solve (3.) and (3.3) in order to obtain ū and ū_end;
2: Set P_s ← {ū} and P_f ← {f(ū)};
3: Get POD basis {ψ_i}_{i=1}^d from (POD^l) using the state y and the dual p associated with ū;
4: Set l ← l_init;
5: Set n ← 0 and compute z^1 using (3.5);
6: while n < N_max − 1 and z_1^{n+1} < f_1(ū_end) do
7:   Set n ← n + 1 and check ← 0;
8:   while check = 0 do
9:     Solve (ERPP)^l_{z^n} using the POD basis {ψ_i}_{i=1}^l of rank l and starting point ū^{n−1};
10:    Compute the a-posteriori estimate µ_U(ū^n, z^n);
11:    if µ_U(ū^n, z^n) < µ then
12:      Set check ← 1;
13:    else
14:      Set l ← l + 1;
15:  Set P_s ← P_s ∪ {ū^n} and P_f ← P_f ∪ {f(ū^n)};
16:  Compute z^{n+1} using (3.4);
17: Return P_s ← P_s ∪ {ū_end} and P_f ← P_f ∪ {f(ū_end)}
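The control flow of Algorithm 3 can be condensed into a small driver. All callbacks below are placeholders for the operations named in the listing (solving (ERPP)^l_{z^n}, evaluating µ_U, computing the next reference point) and not the thesis's implementation.

```python
def adaptive_pareto_front(solve_erpp, post_error, next_ref, z1, u0,
                          l_init, l_max, mu_bar, n_max):
    """Skeleton of the adaptive loop of Algorithm 3 (callbacks are placeholders).

    solve_erpp(z, u_start, l): approximate minimizer of F_z^l at rank l,
    post_error(u, z, l): a-posteriori estimate mu_U(u, z) at rank l,
    next_ref(u, z): next reference point, z1: first reference point,
    u0: first Pareto optimal control. The basis rank l grows whenever the
    estimate exceeds the threshold mu_bar (capped at l_max)."""
    pareto, l, z, u = [u0], l_init, z1, u0
    for _ in range(n_max - 1):
        while True:
            u = solve_erpp(z, u, l)
            if post_error(u, z, l) < mu_bar or l >= l_max:
                break
            l += 1                      # extend the POD basis by one function
        pareto.append(u)
        z = next_ref(u, z)
    return pareto, l
```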

57 Chapter 6 Numerical Results In the present chapter numerical experiments for the bicriterial optimal control problem governed by the convection-diffusion equation with a time-dependent convection term and bilateral control constraints, which was introduced in the previous chapter, are presented. In the first part of the chapter we assume the spatial domain Ω to be perfectly isolated, which yields a homogeneous Neumann boundary condition in the PDE-constraint. In order to investigate this problem numerically, we analyse the result that we get by solving both the full and the POD approximated problem. For this purpose, Algorithms 1, and 3 are used. Afterwards, new strategies for efficiently updating the POD basis in the optimization process are proposed and tested numerically. In the last part of the chapter results for the optimal control problem using the inhomogeneous Robin boundary condition are analysed and compared with the previous results. 6.1 Implementational Aspects In this section we provide essential details about the numerical implementation. All algorithms and optimization routines are implemented in MATLAB R17a and are based on codes, which were used in [1]. Thereby, all test runs are performed on a MAC (Middle 1) with a Intel.5 GHz Intel Core i5 Duo processor, 4 GB RAM and SSD. Details for the Discretization We use linear finite elements in order to discretize the state and adjoint equations which arise in the course of optimizing the full problem. The mesh on Ω is generated using the MATLAB PDE toolbox function initmesh with a maximum edge length h =.5. This results in a triangulation with N x = 71 and N x = 714 nodes in case of Example I and Example II, respectively. In order to assemble the mass matrix M and the stiffness matrix A, we use MATLAB PDE toolbox function assema. 
For the discretization in time we choose the final time T = 1 and N_t equidistantly distributed time instances given by t_i := iΔt for 1 ≤ i ≤ N_t with step size Δt := T/N_t. The time integration of the PDEs is performed by using the Crank-Nicolson method.

Solving the Scalar Optimization Problems

For the numerical solution of the arising scalar optimization problems (ERPP)_z (or (ERPP)^l_z in the case of model-order reduction by POD) for different reference points z ∈ P_f + ℝ² we

use the projected Newton-CG method, see for instance [16] and [17], which utilizes evaluations of the gradient ∇F_z and the second derivative ∇²F_z. These are obtained by using the adjoint operator B* and the adjoint equation (AE) as illustrated in Section 5.4. According to Lemma 5., the operator B* can be computed by using numerical integration. As the stopping condition for the optimization routine we utilize the a-posteriori estimates from Corollary 5.54, i.e. we stop if both conditions

  ‖u_n − ū‖_U ≤ (Ĵ_2(u_n) − z_2)⁻¹ ‖ξ(u_n)‖_U ≤ ε_U

and

  ‖Ĵ(u_n) − Ĵ(ū)‖_{ℝ²} ≤ (1 + (Ĵ_2(u_n) − z_2)⁻¹) ‖ξ(u_n)‖_U ≤ ε_Y

are fulfilled, where ū denotes the unknown optimal solution to (ERPP)_z, u_n is the current iterate and ξ(u_n) is given by (see Lemma 5.55)

  ξ_i(u_n)(t) := min(0, ∇F_z(u_n)_i(t))   a.e. in {t ∈ [0, T] | (u_n)_i(t) = (u_a)_i(t)},
  ξ_i(u_n)(t) := max(0, ∇F_z(u_n)_i(t))   a.e. in {t ∈ [0, T] | (u_n)_i(t) = (u_b)_i(t)},
  ξ_i(u_n)(t) := ∇F_z(u_n)_i(t)           otherwise.

Note that in the case of solving the POD approximated problem (ERPP)^l_z, the gradient ∇F_z has to be replaced by ∇F^l_z and ū is the unknown optimal solution to (ERPP)^l_z. In both cases we do not need to solve an additional state or adjoint equation in order to compute ξ(u_n), since the gradient and the function values at the current iterate are already available. In our numerical experiments we use the thresholds ε_U := 10⁻⁴ and ε_Y := 10⁻⁵ for the errors in the control and the objective space, respectively.

Computing the Pareto Set and the Pareto Front

In order to compute an approximation of the Pareto set P_s and the Pareto front P_f of the bicriterial optimal control problem, we use Algorithms 1, 2 and 3 depending on the context. However, here we deviate from those algorithms by solving the weighted optimization problem

  min ( Ĵ_1(u) + ε Ĵ_2(u) )   s.t. u ∈ U_ad   (6.1)

with a small weight ε > 0 in order to obtain the first Pareto optimal point ū. This is due to the fact that Ĵ_1 is only positive definite, but not strictly positive definite.
This makes the numerical optimization of Ĵ₁ difficult to handle. Therefore, we regularize the problem by putting a small weight on the objective function Ĵ₂. On the other hand, the minimizer of the objective function Ĵ₂ can be easily obtained, as u = 0 fulfils the box constraints. In the case of model-order reduction, we use the snapshots y₁ := Sū⁰ and y₂ := Aū⁰ in order to generate the POD basis. Thereby, the computation is performed by utilizing the method of snapshots as illustrated in the corresponding remark in Chapter 4. Furthermore, the V-norm on the finite element space is modelled by utilizing the weighted inner product ⟨·, ·⟩_W (see Definition 4.9) with symmetric

and positive definite matrix W = M + A. Here, the matrices M and A denote the finite element mass matrix and the finite element stiffness matrix, which are defined as follows:

M_ij := ⟨ψ_j, ψ_i⟩_{L²(Ω)},   A_ij := ⟨∇ψ_j, ∇ψ_i⟩_{L²(Ω)}.

For generating the reference points we use h = 0.5 in Example I and h = 1.5 in Example II. The parameter h will be varied in order to investigate its influence on the behaviour of the solutions.

6.2 Example I

To begin with, we investigate numerically the bicriterial optimal control problem (BOC) which was considered in the previous chapter. For this purpose, we assume perfect isolation of the spatial domain Ω by setting α = 0 on the whole boundary Γ = ∂Ω. This yields a homogeneous Neumann boundary condition in the PDE constraint, i.e. we consider the optimization problem

min J(u, y) = ( ½ ‖y − y_Q‖²_{L²(0,1;L²(Ω))} , ½ ‖u‖²_{L²(0,1;R^m)} )

subject to

y_t(t, x) − κ Δy(t, x) + c_β β(t, x) · ∇y(t, x) = Σ_{i=1}^m u_i(t) χ_i(x)   for (t, x) ∈ (0, 1) × Ω,
∂y/∂η (t, x) = 0   for (t, x) ∈ (0, 1) × Γ,    (BOC1)
y(0, x) = y₀(x)   for x ∈ Ω,

and u_a(t) ≤ u(t) ≤ u_b(t) for almost all t ∈ [0, 1]. Besides that, we use the following parameter and function values in this example: The room Ω is given by the two-dimensional unit square, i.e. Ω = (0, 1)². The diffusion parameter is given by κ = 0.5. As the convection term β(t, x) for (t, x) ∈ (0, 1) × Ω we use a non-stationary solution of a Navier-Stokes equation. This describes a stream which goes from the upper left to the lower right corner of the room, see Figure 6.1. In addition, there is a vortex moving through the room in the same manner, though the motion of the vortex ends in the lower half of the room approximately at the time instance t = 0.1. The parameter c_β is a constant to control the strength of the convection. We impose a floor heating in the whole room with 4 uniformly distributed heaters in the domains Ω₁ = (0, 0.5)², Ω₂ = (0, 0.5) × (0.5, 1), Ω₃ = (0.5, 1) × (0, 0.5) and Ω₄ = (0.5, 1)².
The bilateral control constraints are u_a ≡ 0 and u_b ≡ 3. This yields U_ad := {u ∈ U : 0 ≤ u(t) ≤ 3 f.a.a. t ∈ [0, 1]}.
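Given these bounds, the active-set structure of the function ξ(u) used in the stopping test can be sketched compactly. The following NumPy sketch is illustrative only (function name, array layout with one row per heater and one column per time instance, and the tolerance are assumptions, not the thesis implementation):

```python
import numpy as np

def reduced_gradient(grad, u, ua, ub, tol=1e-12):
    """Reduced gradient xi(u) for box constraints ua <= u <= ub.

    Where u sits on the lower bound only the negative part of the
    gradient counts, on the upper bound only the positive part;
    elsewhere xi coincides with the full gradient."""
    xi = grad.copy()
    lower = np.abs(u - ua) <= tol   # active set {t : u_i(t) = (u_a)_i(t)}
    upper = np.abs(u - ub) <= tol   # active set {t : u_i(t) = (u_b)_i(t)}
    xi[lower] = np.minimum(0.0, grad[lower])
    xi[upper] = np.maximum(0.0, grad[upper])
    return xi
```

A control is first-order optimal for the box-constrained problem exactly when ξ(u) vanishes, which is why its norm can drive the stopping condition.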

Figure 6.1: Time-dependent convection β(t, x) at four time instances.

As an initial condition we suppose a constant temperature of 16 degrees Celsius in the whole room, i.e. y₀(x) = 16 for all x ∈ Ω. Finally, we choose y_Q(t, x) = 16 + 2t for all (t, x) ∈ (0, 1) × Ω, i.e. the desired temperature is supposed to increase uniformly from 16 degrees Celsius at the starting time to 18 degrees Celsius at the final time T = 1. In the following we will use the notations introduced in the previous chapters.

6.2.1 Solving the Full Problem

In this section we analyse the results for the bicriterial optimal control problem (BOC1) that we get by using the finite element method. By running Algorithm 1, we obtain a family (ū^n_Full, P^n_Full, z^n_Full)_{n=1,…,N_Full}

of solutions to the corresponding Euclidean reference point problems, where (ū^n_Full)_n denote the optimal controls, (P^n_Full)_n := (Ĵ(ū^n_Full))_n are the associated Pareto optimal points in the objective space and (z^n_Full)_n are the utilized reference points. Accordingly, N_Full is the number of the inner Pareto optimal points. Furthermore, recall that (ū⁰_Full, P⁰_Full) is obtained by solving the weighted optimization problem (6.1), whereas (ū^{N_Full+1}_Full, P^{N_Full+1}_Full) is gained by minimization of Ĵ₂. For convenience, we additionally define ū^n := ū^n_Full, P^n := P^n_Full and z^n := z^n_Full for all n, as well as N := N_Full.

Figure 6.2: Pareto front in the time-dependent (left) and the time-independent case (right).

In our first experiment we run Algorithm 1 for c_β = 1 and compare these results with the ones that we get by using the time-independent convection term β(x) in order to investigate the influence of the time-dependency on the solutions of (BOC1). Thereby, the time-independent convection β(x) is obtained from the original convection β by taking the average over time. For the computation of the reference points the parameter h = 0.1 is used. In the left plot of Figure 6.2 we can see the Pareto front P_f in the time-dependent case, namely for the convection term β(t, x). First of all, we observe that P_f is smoothly approximated by 50 Pareto optimal points, i.e. N = 50. Hereby, P_f ranges from P⁰ = (0.199, 4.1) to P⁵¹ = (0.6667, 0). Thus, the desired temperature can be achieved quite closely in the upper part of the Pareto front. Note that the value 0.199 is only an approximation of min_{u ∈ U_ad} Ĵ₁(u), as ū⁰ is computed by solving the weighted optimization problem (6.1). However, this is a satisfactory solution for our purposes. Furthermore, it holds y_id = (0.199, 0) and y_nad = (0.6667, 4.1).
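The ideal and nadir points quoted here are simply the componentwise best and worst objective values over the computed Pareto points; a minimal sketch (function name and array layout are illustrative assumptions):

```python
import numpy as np

def ideal_nadir(pareto_points):
    """Componentwise best (ideal) and worst (nadir) objective values
    over a discrete Pareto front given as an (N, 2) array of points
    (J1, J2) in the objective space."""
    P = np.asarray(pareto_points, dtype=float)
    y_id = P.min(axis=0)   # best value of each objective separately
    y_nad = P.max(axis=0)  # worst value of each objective on the front
    return y_id, y_nad
```

For the two front endpoints above, (0.199, 4.1) and (0.6667, 0), this returns y_id = (0.199, 0) and y_nad = (0.6667, 4.1).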
In a next step we want to investigate the heating strategies for different controls. For this purpose, we choose the three optimal controls ū⁰, ū²⁵ and ū⁵⁰, which are situated at the top, the middle and the bottom of the Pareto front, respectively. The influence of the time-dependent convection term on the controls can be seen in the upper half of Figure 6.3. We observe that all three controls have almost the same qualitative behaviour, which only differs in scale. By looking especially at the optimal control ū⁰, we see that all four heaters adapt to the air flow, which goes from the upper left to

Figure 6.3: Optimal controls ū⁰ (left), ū²⁵ (middle) and ū⁵⁰ (right) in the time-dependent (top) and the time-independent case (bottom).

the lower right corner of the room, by using different heating strategies. In particular, none of them is active on the upper bound of the constraints. Furthermore, we note that heaters one and two have their maxima at the beginning of the time interval and decrease monotonically to 0 at T = 1. Thereby, heaters three and four show an analogous behaviour; however, the decrease is not monotone in the first half of the time interval. This can be explained by the fact that not heating at the beginning would result in a temperature deviation which could be corrected only by a disproportionately high heating input. In addition, one can observe a wavy behaviour of the optimal controls at the beginning. This is due to the temporal changes in the dynamics of the system caused by a vortex moving over time from the upper left corner to the middle of the room. Moreover, as the warm air is transported from the second region into the others, mainly into regions three and four, heater two has to heat the most. Consequently, heaters three and four need to heat the least. In the left plot of Figure 6.4 we can see the effect of the optimal heating strategies on the deviation from the desired temperature y_Q, i.e. the graph of the mapping t ↦ ‖ŷ(t,·) + (Sū)(t,·) − y_Q(t,·)‖_{L²(Ω)} for ū ∈ {ū⁰, ū²⁵, ū⁵⁰}. We observe that for ū⁰ the desired temperature can be reached quite accurately almost in the whole time interval.
Only at the end of the time period we can see a deviation which is monotonically increasing, since the heating input decreases to 0 over time. Furthermore, by looking at the associated temperature distribution at the end of the time interval in the left plot of Figure 6.5, we can see that the temperature is not homogeneous.

Figure 6.4: Deviation from the desired temperature y_Q in the time-dependent (left) and the time-independent case (right).

It has its maximum of 17.5 degrees, which is attained almost in the whole upper half of the room, and decreases gradually towards the lower half. Consequently, the desired temperature of 18 degrees is not reached in any of the regions. As the control ū⁰ is not active on the upper bound, we conclude that only the small weight on the heating costs, i.e. on the objective Ĵ₂, prevents ū⁰ from heating more. This behaviour of the controls also results in a higher deviation from the desired temperature at the end of the time interval and is due to the air flow, which transports the warm air from the second region mainly into regions three and four. It is clear that the deviation for ū²⁵ and ū⁵⁰ is bigger than for ū⁰, as the heating input is significantly lower. Now we turn to analysing the results that we get for the time-independent convection β(x). Exactly as in the last case, we observe in the right plot of Figure 6.2 a smooth approximation of the Pareto front by 50 Pareto optimal points. Hereby, it ranges from the Pareto point P⁰ = (0.199, 4.95), which is computed by (6.1), to the Pareto point P⁵¹ = (0.6667, 0), i.e. it holds y_id = (0.199, 0) and y_nad = (0.6667, 4.95). Consequently, the desired temperature distribution can be reached as well as in the previous case by using almost the same heating costs. Note that the last Pareto optimal points are the same in both cases. This is due to the fact that not controlling just leaves the temperature constant, even though there is an air flow in the room. Looking at the optimal controls ū⁰, ū²⁵ and ū⁵⁰ in the lower half of Figure 6.3, we can see a remarkable similarity to the last case. In particular, all three optimal controls describe almost the same heating strategies as in the time-dependent case. However, there is no wavy behaviour at the beginning anymore.
This is an expected result, as there are no temporal changes in the air flow caused by the moving vortex, i.e. the strength of the air stream is constant over time at each point of the room. In the right plots of Figures 6.4 and 6.5 we can additionally see the deviation from the desired temperature over time and the temperature distribution at the final time T = 1, respectively. Again, it is remarkable that there are almost no differences to the time-dependent case. Consequently, we can conclude that in both cases the temperature distributions have to be very similar at each time instance. This can be explained by analysing the structure of the time-dependent

convection β(t, x) in the present example. On the one hand, the air flow is mainly dominated by a stream which goes from the upper left to the lower right corner, whereby its direction and strength remain approximately constant almost over the whole time interval. On the other hand, the most noticeable temporal changes occur only at the beginning of the time interval, until t = 0.1, and seem not to have a big impact on the air flow as a whole. Thus, having taken the average over time in order to construct β(x), we still captured the main information about the air flow in the room.

Figure 6.5: Temperature distribution at T = 1 associated with ū⁰ in the time-dependent (left) and the time-independent case (right).

Finally, we want to compare the computation times in both cases. Using the time-independent convection β(x), the computation of the whole Pareto front took about 6.49 seconds, whereas in the time-dependent case about 10% more computation time was needed. As the latter case adds dynamics to the optimal control problems which are more difficult to handle numerically, we had expected a more significant increase of the computation time. In addition, Figure 6.6 shows the computation time for solving each Euclidean reference point problem. First of all, we notice that, as expected, the computation time in the time-dependent case is higher for nearly every Pareto point. In particular, both graphs show almost the same qualitative behaviour. As the number of used Newton-CG iterations at each Pareto point is identical in both cases with the exception of P¹⁶, we assume that this is only due to the higher computational effort of the time integration via the Crank-Nicolson method caused by the time-dependency of the convection term.
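The extra cost is plausible from the structure of a Crank-Nicolson step: with a time-dependent convection term the system matrix changes in every step and must be reassembled and refactorized. The following is a minimal dense NumPy sketch for a semi-discrete system M y'(t) = −A(t) y(t) + f(t); the names and the dense solver are illustrative, not the thesis' finite element implementation:

```python
import numpy as np

def crank_nicolson(M, A, f, y0, T=1.0, nt=100):
    """Crank-Nicolson for M y'(t) = -A(t) y(t) + f(t).

    A(t) may be time-dependent (e.g. convection c_beta * beta(t, x)),
    so the left-hand matrix is rebuilt and refactorized every step."""
    dt = T / nt
    y = np.array(y0, dtype=float)
    for k in range(nt):
        t0, t1 = k * dt, (k + 1) * dt
        lhs = M + 0.5 * dt * A(t1)
        rhs = (M - 0.5 * dt * A(t0)) @ y + 0.5 * dt * (f(t0) + f(t1))
        y = np.linalg.solve(lhs, rhs)  # new factorization in each step
    return y
```

For a time-independent A the left-hand matrix could be factorized once and reused, which is exactly the saving lost in the time-dependent case.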
6.2.2 Solving the POD Approximated Problem

In this section we apply the POD method to the bicriterial optimal control problem (BOC1) as shown in Section 5.5 in order to reduce the computational effort. The focus of our first experiments will be on investigating the influence of the time-dependent convection term on the quality of the approximation when using a fixed number of POD basis functions. Furthermore, we will run the adaptive basis extension algorithm (see Algorithm 3) for both the time-dependent and the time-independent convection term and compare the results. Additionally, we test the

efficiency of the a-posteriori estimates µ_U and µ_Y which are used in this algorithm. These insights will be used in order to propose new POD basis update strategies, which will be tested numerically in the last part of this section.

Figure 6.6: Computation times for the inner Pareto optimal points.

Influence of the Time-dependent Convection on the Quality of the Approximation

First of all, we investigate the influence of the time-dependent convection term β(t, x) on the quality of the POD approximation by using a fixed number of POD basis functions, i.e. by using Algorithm 2. For this purpose, we solve both the full and the POD approximated problem for c_β ∈ {1, 1.5, 2} independently of each other and compare these results with the ones that we get by using the time-independent convection term β(x) (with the constant c_β = 1) from the previous section. In the course of this, we obtain two families of solutions (ū^n_POD, P^n_POD, z^n_POD)_{n=1,…,N_POD} and (ū^n_Full, P^n_Full, z^n_Full)_{n=1,…,N_Full}, where in all runs N_POD = N_Full =: N holds. Therefore, we are able to consider the errors

‖ū^n_POD − ū^n_Full‖_U,   ‖P^n_POD − P^n_Full‖_{R²},   ‖z^n_POD − z^n_Full‖_{R²}

for all n = 1, …, N. The focus of our experiments in this section will be on investigating these errors. Note that we always have ū⁰_POD = ū⁰_Full and ū^{N+1}_POD = ū^{N+1}_Full; consequently, the associated errors are zero. In Figure 6.7 we can see the errors for fixed l = 5 and h = 0.1. Since we obtain a different number of Pareto optimal points by varying the parameter c_β, we use the relative location on the Pareto front on the x-axis, i.e. it holds P⁰ = 0 and P^{N+1} = 1, in order to show all results in the same plot. First of all, we observe that, in comparison to the time-independent case, using the time-dependent convection term increases the error in the control space by a factor of about 10.
On the other hand, increasing the strength of the time-dependent convection only has a small influence on this error. In all four cases the error decreases drastically at the end of the Pareto front.
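The size of these POD approximation errors is governed by the truncated eigenvalues of the snapshot correlation operator (cf. Theorem 4.1). A self-contained sketch of computing a rank-l POD basis and its truncation error from a snapshot matrix via the SVD follows; it assumes the plain Euclidean inner product rather than the weighted inner product ⟨·,·⟩_W of Definition 4.9, and the function name is illustrative:

```python
import numpy as np

def pod_basis(snapshots, l):
    """Rank-l POD basis of a snapshot matrix Y (columns = snapshots).

    Returns the first l left singular vectors and the truncation error
    sum_{i>l} lambda_i, where lambda_i = sigma_i^2 are the POD
    eigenvalues (singular values come out sorted in decreasing order)."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    lam = s**2                       # POD eigenvalues
    return U[:, :l], lam[l:].sum()   # basis, truncated energy
```

A slowly decaying eigenvalue tail, as observed here for the time-dependent convection, directly translates into a larger truncation error for a fixed l.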

Figure 6.7: Errors between the solutions of the POD approximated problem with fixed l = 5 and the full problem: (a) error in the control space, (b) error in the objective space, (c) error of the reference points.

Unfortunately, the results for the objective space and the reference points are not as clear as for the control space. By looking especially at the errors in the objective space, we first observe that, analogous to the numerical results in [1, pp. 81-82], all errors are lower by a factor of about 10³ in the first half of the Pareto front than the corresponding errors in the control space. However, as the errors in the control space are monotonically decreasing, the difference gets much smaller at the end. Furthermore, we can additionally see that using the time-dependent convection also increases the error in comparison to the time-independent case in large parts of the Pareto front. Here, we observe an increase in the error by a factor of about 10 at the beginning, whereas in the middle of the Pareto front this factor decreases to 4. Thereby, the error at the end of the Pareto front is even smaller than in the time-independent case. Moreover, unlike the errors in the control space, increasing the strength of the time-dependent convection to 1.5 leads to a small increase in the error in comparison to the case c_β = 1. However, at the end of the Pareto front the difference gets significantly bigger. The error for c_β = 2 shows a very similar behaviour, though in this case the error is higher by a factor of about 10 almost on the whole Pareto front than the one for c_β = 1. In contrast to the errors in the control and the objective space, the error in the reference points in the time-dependent case is actually below the one in the time-independent case in large parts of the Pareto front.
Only at the beginning it is larger by a factor of 10. As the reference points are computed by using the values in the objective space, we assume that this is due to the behaviour of the errors in the objective space. Moreover, we can additionally observe that increasing the strength of the convection has a much stronger impact on the error in comparison to the previous cases. Here, increasing the parameter c_β by 0.5 leads each time to an increase in the error by a factor of about 10 almost on the whole Pareto front. These phenomena can be explained by the fact that using the time-dependent convection adds more complicated dynamics to the state and the adjoint equation than the time-independent one, whereas increasing the strength of the convection only intensifies these dynamics. A mathematical explanation can be obtained by looking at the eigenvalues {λᵢⁿ}ᵢ₌₁^m of the operator Rⁿ, which is given by (4.6). From Theorem 4.1 we know that the sum Σ_{i=l+1}^d λᵢⁿ measures the error of the POD approximation. Indeed, it turns out (see Figure 6.8) that the eigenvalues decrease significantly faster for the time-independent convection than for the time-dependent convection for all values of c_β. Furthermore, as expected, increasing the parameter c_β slows

down the decrease of the eigenvalues. This confirms our argumentation from above.

Figure 6.8: Eigenvalues for different values of c_β and for the time-independent convection.

In the next step, we increase the number of POD basis functions in order to see how the approximation errors behave. For this purpose, we fix c_β = 1 and run the algorithm for l ∈ {5, 10, 15}. The resulting errors can be observed in Figure 6.9. As expected, all three errors decrease in the main for an increasing number of POD basis functions. However, there are some points at the end of the Pareto front for which the error in the objective space is actually smaller for l = 5 than for l ∈ {10, 15}. So far, we were not able to find an explanation for this phenomenon. Furthermore, there are also some points for which the error of the reference points is lower for l = 5 than for l = 10. Analogous to the last case, we assume that this is due to the behaviour of the error in the objective space and not due to a better approximation.

Figure 6.9: Errors between the solutions of the POD approximated and the full problem for different values of l: (a) error in the control space, (b) error in the objective space, (c) error of the reference points.

Finally, we fix c_β = 1, l = 5 and run the algorithm for h ∈ {0.05, 0.1, 0.2} in order to investigate the behaviour of the errors when controlling the fineness of the approximation of the Pareto front. The results are presented in Figure 6.10. First of all, it can be seen that the errors in the control space are almost identical, i.e. varying the fineness parameter h does not have any

effects on this error. However, there are noticeable deviations in the courses of the errors in the objective space and the errors of the reference points, especially for h = 0.2.

Figure 6.10: Errors between the solutions of the POD approximated and the full problem for different values of h: (a) error in the control space, (b) error in the objective space, (c) error of the reference points.

Results Using the Adaptive Basis Extension Algorithm

Now we turn to analysing the results that we get by using Algorithm 3, which was introduced in Chapter 5 and utilizes an adaptive POD basis extension strategy. In comparison to Algorithm 2, which uses a fixed number of POD basis functions, this method utilizes recursive a-posteriori estimates µ_U and µ_Y for the errors in the control and the objective space in order to assess the quality of the solutions to the arising reduced-order optimization problems. Therefore, whenever the a-posteriori estimate µ_U exceeds an upper bound µ of the acceptable error, the number of currently used POD basis functions is enlarged and the considered reference point problem is recomputed. In this manner it is ensured that the approximation error in the control space stays below the predefined threshold µ at each Pareto optimal point. These properties make Algorithm 3 very attractive for numerical applications. In [1] Algorithm 3 was successfully applied to a very similar bicriterial optimal control problem with a time-independent convection term. Furthermore, it was shown that the a-posteriori estimates µ_U and µ_Y have a good efficiency in this case, i.e. they approximate the real error quite accurately. The focus of our experiments will be on investigating the influence of a time-dependent convection term on the number of used POD basis functions in Algorithm 3.
Moreover, the efficiency of the a-posteriori estimates µ_U and µ_Y in the time-dependent case will be investigated. To begin with, we run Algorithm 3 for c_β = 1 and compare the results with the ones we get for the time-independent convection term β(x) defined in Section 6.2.1. For this purpose, we set the upper bound of the acceptable error in the control space to a fixed µ > 0 and the initial number of POD basis functions to l_init = 6. Furthermore, we use the parameter h = 0.1 in order to compute the reference points. Note that it is not reasonable to choose µ ≤ ε_U, where ε_U denotes the threshold for the distance between the current suboptimal control and the unknown optimal control in the optimization routine of (ERPP)^l_z. By setting µ ≤ ε_U we would just ensure that the computed optimal control is closer than ε_U to the solution of (ERPP)^l_z, while expecting at the same time that this suboptimal control is closer than µ to the solution

of (ERPP)_z.

Figure 6.11: Number of used POD basis functions in Algorithm 3 using l_init = 6 (left); associated computation times (right).

First of all, we observe in the left plot of Figure 6.11 that 24 POD basis functions are needed to compute the whole Pareto front in the desired approximation quality. Thereby, all 18 basis extensions are conducted at the Pareto optimal point P¹. In contrast, only 11 and thus about half as many POD basis functions are required in the time-independent case, whereby all 5 basis extensions are performed at P¹ as well. These are expectable results, as in the experiments for fixed l = 5 we have already seen that using a time-dependent convection term increases the error in the control space by a factor of about 10 compared to the time-independent case. Again, this is due to the fact that the time-dependency of the convection term adds more complex dynamics to the system. Furthermore, by observing the first plot of Figure 6.9 we found that the approximation error in the time-dependent case has its maximum at P¹ and decreases drastically at the end of the Pareto front. These facts explain why all basis extensions are conducted only at P¹. The same reasoning can be applied to the time-independent case, see the first plot of Figure 6.7.

Table 6.1: Results for Algorithm 3 using l_init = 6 (computation time, number of basis extensions, maximal value of l) for the time-dependent and the time-independent convection.

As a consequence, the computation time in the time-dependent case is higher by about 40% than the one in the time-independent case. An interesting aspect is, however, the fact that the increase in the computation time is mainly due to the higher number of conducted POD basis extensions (see right plot of Figure 6.11) and not due to the significantly higher number of used POD basis functions.
In particular, the computational effort for a single subsequent Pareto point in the time-dependent case is only 8% higher, although in this case linear systems of much higher

dimension have to be solved many times in the optimization process on the one hand, and on the other hand the arising scalar optimization problems are more difficult to handle numerically due to the time-dependency. We assume that this is due to the implementational aspects of the MATLAB internal solver, which still handles the arising linear systems of dimension 24 quite efficiently. Note that here and in the following experiments we only consider the computation time for solving the Euclidean reference point problems and not the entire running time of Algorithm 3. The reason for this is the fact that the outer Pareto optimal points are always computed by the finite element method.

Table 6.2: Results for Algorithm 3 using l_init = 6 for varying c_β (computation time, number of basis extensions, maximal value of l, average computation time per point).

In our next experiment we investigate the influence of the strength of the convection term on the number of used POD basis functions in Algorithm 3. For this purpose, we additionally run the algorithm for c_β ∈ {1.5, 2} by using the same settings as in the previous test. The results are summarized in Table 6.2. First of all, we observe that, exactly as for c_β = 1, all POD basis extensions are conducted at the Pareto point P¹. Thereby, maximally 24 and 25 POD basis functions are used for c_β = 1.5 and c_β = 2, respectively. Consequently, increasing the strength of the convection term and thus increasing the dynamics in the optimal control problem has almost no effect on the number of used POD basis functions in the present example. Note that these results correspond with the ones that we got for fixed l = 5 in the last section, as in Figure 6.7 we have already seen that the errors in the control space are almost identical for different values of c_β. Furthermore, the eigenvalues {λᵢⁿ}ᵢ₌₁^m of the operator Rⁿ have almost the same course for c_β ∈ {1, 1.5, 2}, see Figure 6.8.
Therefore, in each case almost the same number of POD basis functions is sufficient in order to achieve the desired approximation quality, see Theorem 4.1. Another interesting aspect worth considering is the fact that increasing the strength of the convection term slightly increases the computation time. However, in each case the Pareto front is approximated by a different number of points. In particular, in the case c_β = 1.5 the Pareto front is approximated by 48 points, whereas in the case c_β = 2 only by 47. Therefore, it is reasonable to consider the average computation time per Pareto point instead of the overall calculation time. By doing so, we can observe a monotonous increase of the computation time as well. Thereby, the computation time for c_β = 1.5 and c_β = 2 is 7% and 10% higher, respectively, than the one for c_β = 1. This is due to the increased dynamics of the optimal control problem, which are more difficult to handle numerically, i.e. more Newton-CG iterations are needed for convergence. In the second part of this section we analyse the quality of the a-posteriori estimates µ_U and µ_Y, which are utilized in Algorithm 3, in the time-dependent case. For this purpose, we will again use Algorithm 2 for different parameters. Thereby, in order to assess the quality of the investigated a-posteriori estimates, we introduce the following notion of efficiency.

Definition 6.1. Let a, b ∈ R and let b be an overestimate for a, i.e. it holds a ≤ b. Then we

define the so-called estimate efficiency η by

η := b/a ∈ [1, ∞),

where η = 1 means ideal efficiency. Of course, in order to measure the efficiency of the a-posteriori estimate in the control space, we have to set a := ‖ū^n_Full − ū^n_POD‖_U and b := µ_U(ū^n_POD, z^n). Analogously, for the efficiency of the a-posteriori estimate in the objective space we set a := ‖Ĵ(ū^n_Full) − Ĵ(ū^n_POD)‖_{R²} and b := µ_Y(ū^n_POD, z^n). However, note that, in comparison to the experiments in the last section, here the full and the POD approximated problem have to be solved by using the same reference points (z^n)_{n=1,…,N}. Otherwise, as the reference points of both solutions usually do not coincide, there is an additional error term depending on ‖z^n_Full − z^n_POD‖_{R²} which falsifies the efficiency of the a-posteriori estimates (see for example Corollary 5.54). Therefore, in this experiment we first solve the full problem and take the reference points (z^n_Full)_{n=1,…,N_Full} as reference points for the POD approximated problem, i.e. we set N := N_Full and z^n := z^n_Full for all n = 1, …, N.

Figure 6.12: Efficiencies of the a-posteriori estimates used in Algorithm 3: (a)-(c) control space and (d)-(f) objective space, for varying c_β, l and h.

In our first test we compare the efficiency for varying convection constants c_β ∈ {1, 1.5, 2} and the time-independent convection β(x). For this purpose, we run Algorithm 2 each time using the parameters l = 10 and h = 0.1. The corresponding efficiencies in the control and the objective space are presented in the first and the fourth plot of Figure 6.12.
First of all, we observe that in all cases the efficiency in the control space is remarkably good, namely it is very close to 1 in

large parts of the Pareto front. However, there are some points at the end of the Pareto front at which the efficiency is significantly worse. Especially in the time-independent case the efficiency is particularly bad in this region. Though, as the error in the control space is monotonically decreasing at the end of the Pareto front (see the first plot of Figure 6.7), the number of POD basis functions which is needed in order to achieve a certain POD approximation quality is also decreasing. Thus, the bad efficiency at the end of the Pareto front does not negatively affect the number of used POD basis functions in Algorithm 3. By comparing both figures, it can be seen that the efficiency in the objective space is worse than the one in the control space by a factor of about 2 almost on the whole Pareto front. Thereby, the problem with the time-independent convection has the best efficiency, with the exception of some Pareto points. Furthermore, we can additionally see that in the time-dependent case the efficiency gets worse when increasing the strength of the convection. In the second test we fix c_β = 1, h = 0.1 and run Algorithm 2 for l ∈ {10, 15, 20}. By looking at the results, which are presented in the second and the fifth plot of Figure 6.12, we can see a very similar picture as in the last test. Analogously, the efficiency in the control space is very close to 1 in large parts of the Pareto front, whereby it gets significantly worse at the end. Though, here we can see a clear pattern, namely the efficiency gets worse when increasing the value of l. The same behaviour can be observed in the efficiencies of the estimates in the objective space on the whole Pareto front. Furthermore, by comparing the efficiencies in the control and the objective space, it can be seen again that the efficiency in the latter case is worse by a factor of about 2 in the main part of the Pareto front.
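In code, the estimate efficiency of Definition 6.1 is a simple componentwise ratio along the Pareto front; a small sketch (function name is illustrative), which deliberately does not clip values below 1, since such values signal an inaccurately computed reference solution rather than a failing estimate:

```python
import numpy as np

def estimate_efficiency(true_error, estimate):
    """Efficiency eta = b/a of an a-posteriori estimate b for a true
    error a, evaluated pointwise along the Pareto front; eta = 1 is
    ideal, eta < 1 hints at an inaccurate reference ('true') error."""
    a = np.asarray(true_error, dtype=float)
    b = np.asarray(estimate, dtype=float)
    return b / a
```

Plotting this ratio over the relative location on the Pareto front reproduces the kind of curves shown in Figure 6.12.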
Lastly, we set c_β = 1, l = 10 and compare the results for h ∈ {0.05, 0.1, 0.2} by observing the third and the sixth plot of Figure 6.12. First of all, we notice that the efficiency in both the control and the objective space is almost identical for h ∈ {0.05, 0.1}, with the exception of some points. However, there is a big jump in the efficiency in the control space for h = 0.05 at P_1. So far, we have not been able to find an explanation for it. By observing the results for h = 0.2 in particular, it can additionally be seen that the efficiency in the control space is slightly worse for h = 0.2 in large parts of the Pareto front than for h ∈ {0.05, 0.1}; in the objective space, however, the reverse behaviour can be seen. Note that all three parameter settings have in common a zigzag behaviour of the efficiencies at the end of the Pareto front. This is presumably due to the fact that the real errors ‖ū^n_Full − ū^n_POD‖_U and |Ĵ(ū^n_Full) − Ĵ(ū^n_POD)| decrease very strongly at the end of the Pareto front. This decrease, however, cannot be completely captured by the a-posteriori estimates µ_U and µ_Y (see [1, p. 86]). Furthermore, there are some points at the end of the Pareto front at which the efficiency of the estimates in the objective space is actually below 1. This is only due to the inaccuracy of the computation of the optimal controls of the full problem. As a conclusion, we can say that the efficiency of the a-posteriori estimates µ_U and µ_Y is remarkably good in all parameter settings.

Basis Update Strategy

In the previous sections we have seen that using the time-dependent convection term instead of the time-independent one significantly increases the error in both the control and the objective space. In particular, by applying the adaptive basis extension algorithm, we additionally found out that 24 POD basis functions are needed in the time-dependent case in order to compute
the whole Pareto front in the desired approximation quality, whereas in the time-independent case only 11 basis extensions are sufficient in order to achieve the same quality. As already mentioned, this is due to the fact that the time-dependency of the convection term adds more complex dynamics to the optimal control problem, which are hard to capture by using only one fixed POD basis. As a consequence, the computational effort is significantly increased in comparison to the time-independent case, as a lot of avoidable POD basis extensions are done. Furthermore, it has been shown that the a-posteriori estimate µ_U for the error in the control space, which is used in Algorithm 3 in order to estimate the quality of the solutions to the arising reduced-order optimization problems, has a remarkably good efficiency in the time-dependent case. These insights motivate a POD basis update strategy which is based on the adaptive basis extension algorithm (see Algorithm 3) and is supposed to decrease the computational effort. The optimization routine for computing the n-th Pareto optimal point is shown in Algorithm 4.

Algorithm 4 POD basis update algorithm
Require: threshold µ > 0, 0 < l_min < l_max, l ∈ [l_min, l_max];
 1: Set check ← 0;
 2: while check = 0 do
 3:   Solve (ERPP)^l_{z_n} with starting point ū^{n−1};
 4:   Compute the a-posteriori estimate µ_U(ū_l, z_n);
 5:   if µ_U(ū_l, z_n) < µ then
 6:     Set check ← 1;
 7:   else
 8:     if l < l_max − 1 then
 9:       Set l ← l + 2;
10:     else if l = l_max − 1 then
11:       Set l ← l + 1;
12:     else
13:       Solve (ERPP)_{z_n} with starting point ū_l;
14:       Compute a new POD basis by using the state y and the dual p associated with ū;
15:       Choose l ∈ [l_min, l_max] and set check ← 1.

The basic idea of this algorithm consists in controlling the number of used POD basis functions by choosing an upper bound l_max at the beginning of the optimization process, i.e. the POD basis is only extended as long as l_max is not reached.
Then, if the upper bound l_max is exceeded, the current full Euclidean reference point problem (ERPP)_{z_n} is solved. Afterwards, the corresponding solution ū is used for the computation of the new POD basis, which will be used for the computation of the next Pareto point, and an initial number l ∈ [l_min, l_max] of POD basis functions is chosen. Using this strategy, we hope to improve the performance of the basis extension algorithm by preventing the number of used POD basis functions from getting too high. Note that increasing the value of l increases the computational effort, as many linear systems of higher dimension have to be solved in the course of the optimization. Furthermore, after a basis update we expect to obtain a new POD basis which captures the most characteristic dynamics of the reference point problems in the current region of the Pareto front, i.e. we expect that fewer POD basis functions will be required for the computation of the following Pareto points.
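To make the control flow of Algorithm 4 concrete, the following is a minimal sketch in Python. All callables (solve_reduced, solve_full, error_estimate, compute_pod_basis, choose_l) are hypothetical placeholders standing in for the reduced problem (ERPP)^l_{z_n}, the full problem (ERPP)_{z_n}, the estimate µ_U and the basis routines; this is not the thesis implementation.

```python
def pareto_step(u_prev, z_n, basis, l, l_min, l_max, mu,
                solve_reduced, solve_full, error_estimate,
                compute_pod_basis, choose_l):
    """Compute one Pareto point, extending the POD basis by 2 (or 1,
    near l_max) until the a-posteriori error estimate is acceptable;
    on exceeding l_max, solve the full problem and update the basis."""
    while True:
        u_l = solve_reduced(z_n, u_prev, basis, l)   # (ERPP)^l_{z_n}
        if error_estimate(u_l, z_n) < mu:            # a-posteriori mu_U
            return u_l, basis, l
        if l < l_max - 1:
            l += 2                                   # extend basis by 2
        elif l == l_max - 1:
            l += 1                                   # land exactly on l_max
        else:
            # l_max exceeded: solve the full problem and update the basis,
            # which is then used for the following Pareto points
            u_bar = solve_full(z_n, u_l)
            basis = compute_pod_basis(u_bar)         # snapshots y and p
            l = choose_l(basis, l_min, l_max)
            return u_bar, basis, l
```

The +2/+1 branching mirrors steps 8–11 of Algorithm 4: the basis grows in steps of two but never past l_max.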

Another difference to the basis extension algorithm is the enlargement of l if the a-posteriori estimate µ_U(ū_l, z_n) of the suboptimal solution ū_l to (ERPP)^l_{z_n} exceeds the acceptable error µ and the upper bound l_max is not reached yet. Here, the number of currently used POD basis functions is increased by 2 instead of 1. In this way, we intend to reduce the number of POD basis extensions. Note that after each basis extension both the POD approximated problem (ERPP)^l_{z_n} and the a-posteriori estimate µ_U have to be recomputed. Thereby, full solutions to the state and adjoint equation, which are associated with the suboptimal control ū_l, are required in order to compute µ_U. These facts make basis extensions quite expensive with regard to the computational effort. For the same reason it is not reasonable to choose l too small after a basis update. Therefore, we predefine the lower bound l_min.

Finally, we turn to the question how an appropriate value of l ∈ [l_min, l_max] can be chosen after a POD basis computation. Of course, the straightforward solution is simply to set l = l_min after each basis update. However, this strategy is obviously inefficient with regard to the number of basis extensions, since we do not know in advance how many POD basis functions will be needed for the computation of the next Pareto point. Consequently, if l is chosen too low, a lot of avoidable basis extensions will be done. On the other hand, we want to keep l as low as possible in order to minimize the computational effort. To tackle this problem, we propose an adaptive strategy for determining l after each POD basis computation which is motivated by the a-priori estimate (5.19). Namely, we choose the smallest l ∈ [l_min, l_max] such that

    ∑_{i=l+1}^{l_max} λ̃_i ‖ψ̃_i − Q^l ψ̃_i‖²_V ≤ ε    (6.2)
holds for an ε < µ, where Q^l denotes the orthogonal projection onto the subspace V^l defined in Lemma 5.31, {ψ̃_i}_{i∈ℕ} is the POD basis and {λ̃_i}_{i∈ℕ} are the associated eigenvalues given by (4.2). Thereby, the orthogonal projection Q^l can be easily computed for each l ∈ [l_min, l_max] by solving a linear equation system, as illustrated in Remark 5.32. In this way, we obtain an estimate of the number of POD basis functions which would have been necessary in order to compute the current Pareto optimal point when using the new POD basis. However, in order to obtain an accurate estimate, ε has to be chosen appropriately. Unfortunately, we do not have any information about the size of the constant C in the estimate (5.19). Thus, we have to determine an appropriate value of ε experimentally. Note that, due to the square-root dependency of the error ‖ū_l − ū‖_U in the a-priori estimate, it is reasonable to choose ε ≈ µ².

Table 6.3: Test 1: Results for Algorithm 4. Columns: Comp. Time, #Basis extensions, #Basis updates. Rows: l = 1 fixed; l ∈ [6, 20] adaptive using ε = 4·10⁻⁶; l ∈ [6, 20] adaptive using ε = 1·10⁻⁶; l ∈ [6, 20] adaptive using the smallest tested ε.

In the next step we turn to analysing the numerical results that we get by using the proposed basis update strategy. For this purpose, we choose the parameters c_β = 1, h = 0.1, l_min = 6, l_max = 20 and set the upper bound of the acceptable error to µ = 4·10⁻⁴. In our first test we
run the optimization algorithm for three decreasing values of ε, namely 4·10⁻⁶, 1·10⁻⁶ and a smaller third value, and compare these results with the ones we get by using a fixed number l = 1 of POD basis functions after each basis computation. The results are presented in Table 6.3 and Figure 6.13. First of all, we observe that only one basis update, at the Pareto optimal point P_1, is necessary in order to compute the whole Pareto front. Thereby, 20 POD basis functions are used afterwards to compute the following optimal points. For comparison, in the adaptive basis extension algorithm described in Algorithm 3, a maximum of 24 basis functions was used, whereby all basis extensions were conducted at P_1. Thus, by improving the POD basis, we have been able to reduce the number of used POD basis functions by almost 17% in comparison to the basis extension algorithm.

Figure 6.13: Test 1: Number of used POD basis functions in Algorithm 4. Panels: (a) l = 1 fixed; (b)–(d) l ∈ [6, 20] adaptive for the three decreasing values of ε.

The instances at which the POD basis has been extended in Algorithm 4 are also shown in Figure 6.13. As assumed in the first part of this section, it can be seen that the condition (6.2) for determining the initial value of l after a POD basis computation estimates quite well how many

POD basis functions would have been necessary in order to compute the current Pareto point. Thereby, the estimation inaccuracy, and thus the number of used basis extensions, decreases for a decreasing value of ε, until the best result with regard to computational time is reached for the smallest tested ε. In this case no basis extensions are needed at all, as l is chosen exactly as necessary both at the beginning of the optimization and after the basis update. In the other two cases, 4 and 2 basis extensions at the points P_1 and P_2 are conducted, as 17 and 19 POD basis functions are not enough even after a basis update. Unsurprisingly, using a fixed number l = 1 of POD basis functions after each basis computation, Algorithm 4 yields the worst result. Here, considerably more basis extensions are needed, since l is chosen much too small. Consequently, the computation time for the Pareto front is 18% higher than when using the condition (6.2) with the smallest tested ε. Therefore, in the following, we will use only the latter configuration in order to estimate the values of l after each POD basis computation. Note that here we consider the computation times only for the inner Pareto front, as the first and the last Pareto optimal point are always computed by the finite element method.

Table 6.4: Test 2: Results for Algorithm 4 using the smallest tested ε. Columns: Comp. Time, #Basis extensions, #Basis updates, Max. value of l. Rows: l ∈ [6, l_max] adaptive for l_max ∈ {18, 19, 20, 24, 28, 32, 36}.

Now, we run Algorithm 4 for different values of l_max in order to investigate its influence on the performance of the algorithm. For this purpose, we use the same settings as above, whereby we choose the smallest tested ε for the adaptive choice of l ∈ [6, l_max] after each basis computation.
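The selection rule (6.2) used here simply picks the smallest l whose weighted projection-error tail drops below ε. A minimal sketch, assuming the squared projection errors ‖ψ̃_i − Q^l ψ̃_i‖²_V are available through a callable (the name proj_err_sq is a hypothetical placeholder, not from the thesis):

```python
def choose_l(eigvals, proj_err_sq, l_min, l_max, eps):
    """Smallest l in [l_min, l_max] such that
        sum_{i=l+1}^{l_max} lambda_i * ||psi_i - Q^l psi_i||_V^2 <= eps.
    eigvals is 0-based (eigvals[0] = lambda_1); proj_err_sq(l, i)
    returns the squared projection error of the (i+1)-th POD mode."""
    for l in range(l_min, l_max + 1):
        tail = sum(eigvals[i] * proj_err_sq(l, i) for i in range(l, l_max))
        if tail <= eps:
            return l
    return l_max
```

With unit projection errors the criterion reduces to the plain eigenvalue tail, which makes the behaviour easy to check by hand.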
The results for l_max ∈ {18, 19, 20, 24, 28, 32, 36} are presented in Table 6.4 as well as in Figures 6.14 and 6.15. First of all, by looking at Table 6.4 it can be seen clearly that decreasing the upper bound l_max of POD basis functions below a certain threshold drastically increases the number of conducted POD basis updates. In particular, decreasing l_max from 20 POD basis functions, for which only one basis update is needed, to 19 and 18 increases the number of POD basis updates to 6 and 11, respectively. Thereby, both cases have in common that all basis updates are conducted at the neighbouring points at the beginning of the Pareto front. This is plausible, as the error in this region of the Pareto front is the highest, see Figure 6.7. Surprisingly, however, even after a POD basis update, 18 and 19 POD basis functions are not enough at the beginning. We assume that this is due to the fact that adjacent optimal controls are used in order to compute a new POD basis. Consequently, the associated solutions of the state and the adjoint equation, which are used as snapshots, have very similar dynamics, so that we obtain only a slight improvement of the POD basis. Mathematically, this can be explained again by looking at the eigenvalues {λ̃_i^n}_{i=1}^m of the operator R^n after each basis computation, as ∑_{i=l+1}^d λ̃_i^n is a measure for the quality of the POD approximation, see Theorem 4.1. The leading eigenvalues for l_max = 19 are presented in the right plot of Figure 6.15. Here, it can be seen
that even after a POD basis update all eigenvalues have almost identical behaviour and decay rates, which confirms our argumentation from above. It is clear that for l_max ≥ 24 no POD basis updates are needed, as 24 POD basis functions were required in the basis extension algorithm in order to compute the whole Pareto front.

Figure 6.14: Test 2: Number of used POD basis functions in Algorithm 4 using the smallest tested ε. Panels: (a) l ∈ [6, 18]; (b) l ∈ [6, 19]; (c) l ∈ [6, 24]; (d) l ∈ [6, 32], each adaptive.

Unsurprisingly, by comparing the computation times for different values of l_max, it can be seen in Table 6.4 that the performance of Algorithm 4 depends strongly on the number of used POD basis updates and thus on the value of l_max, as explained above. In particular, decreasing l_max from 20 POD basis functions, for which only one POD basis update is needed, to 19 and 18 increases the computation time by 43% and 85%, respectively. This is obviously due to the disproportionate increase in the number of POD basis updates, which are very expensive with respect to the computational effort, see the left plot of Figure 6.15. The reason for this is the fact that each POD basis update in Algorithm 4 requires the optimization of the current full Euclidean reference point problem, i.e. both the state and the adjoint equation have to be solved repeatedly
for different inputs u ∈ U_ad by using the finite element method. On the other hand, increasing the value of l_max to 24 decreases the computation time by about 8%, although here 4 additional POD basis functions are used at each Pareto point. This is due to the fact that in the present example only 24 POD basis functions are sufficient in order to compute the whole POD Pareto front without using a POD basis update. Furthermore, we have already seen in the right plot of Figure 6.11 that the arising linear systems of these dimensions can be handled very efficiently by the MATLAB internal solver. Therefore, the strategy of updating the POD basis does not pay off with regard to computation time in this case.

Figure 6.15: Test 2: Results for Algorithm 4 using the smallest tested ε. Panels: (a) computation time for varying l_max; (b) initial eigenvalues and eigenvalues after each of the six basis updates for l ∈ [6, 19] adaptive.

Another interesting aspect is the fact that increasing the value of l_max from 24 to 28 and beyond slightly increases the number of used POD basis functions and thus the computation time. This is due to the overestimation of the initial value of l ∈ [l_min, l_max] after a POD basis computation in these cases. The reason for this is the fact that increasing l_max extends the sum of the weighted eigenvalues in condition (6.2). Thus, a higher value of l is needed in order to fulfil this condition for the same ε. However, there is obviously an upper bound for the values of l, which is reached in this case and is determined by the decay rate of the eigenvalues. Besides this, the estimates of l are very accurate for l_max ≥ 24, such that no basis extensions are needed at all.
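As an aside, the quality measure ∑_{i>l} λ_i used in this argument can be read off directly from the singular values of the snapshot matrix. A minimal sketch with synthetic data, assuming the Euclidean inner product (the thesis works with the V-inner product and finite element weights instead):

```python
import numpy as np

def pod_eigenvalues(Y):
    """POD eigenvalues of a snapshot matrix Y (columns = snapshots),
    i.e. the squared singular values of Y (eigenvalues of Y Y^T),
    returned in decreasing order."""
    s = np.linalg.svd(Y, compute_uv=False)
    return s ** 2

def tail_energy(lams, l):
    """sum_{i > l} lambda_i: the POD approximation quality measure."""
    return float(np.sum(lams[l:]))
```

For a diagonal snapshot matrix diag(3, 2, 1) the eigenvalues are 9, 4, 1, so the tail after l = 1 basis functions is 5: the slower this tail decays, the more basis functions are required for the same approximation quality.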
Finally, we increase the strength of the convection term to c_β = 2 and run the basis update algorithm for l_max ∈ {21, 22, 23, 25, 28, 32, 36} in order to investigate the influence of the strength of the convection term on its performance. For this purpose, we use the same settings as in the previous experiment. First of all, it can be seen in Table 6.5 and Figure 6.16 that the results show exactly the same pattern as the ones we got for c_β = 1. However, as expected, increasing the strength of the convection term heavily increases the number of POD basis updates for the same values of l_max and thus the computation time. By looking especially at the case l_max = 21, we observe that now 15 POD basis updates are needed instead of one. Thereby, exactly as in the previous case, all basis updates are conducted at the neighbouring points at the beginning of the Pareto front, which is due to the course of the error in the control space,
see Figure 6.7.

Table 6.5: Test 3: Results for Algorithm 4 using the smallest tested ε and c_β = 2. Columns: Comp. Time, #Basis extensions, #Basis updates, Max. value of l. Rows: l ∈ [6, l_max] adaptive for l_max ∈ {21, 22, 23, 25, 28, 32, 36}.

Moreover, the number of POD basis updates remains on a very high level for l_max ∈ {21, 22} and decreases abruptly to one for l_max = 23. This can be explained again by the fact that, due to the more complicated dynamics of the optimal control problem, the eigenvalues {λ̃_i^n}_{i=1}^m of the operator R^n, see Figure 6.8, decrease more slowly than for c_β = 1. Consequently, more POD basis functions are needed in order to achieve the same approximation quality. Note that in this case 25 POD basis functions are sufficient to compute the whole Pareto front without using a basis update. Thus, no POD basis updates are needed for l_max ≥ 25.

Figure 6.16: Test 3: Number of used POD basis functions in Algorithm 4 using the smallest tested ε and c_β = 2. Panels: (a) l ∈ [6, 21] adaptive; (b) l ∈ [6, 23] adaptive.

By looking at the computation times in Table 6.5, it can be seen clearly that, exactly as in the last test, the strategy of updating the POD basis does not pay off in this case either, as the best result is obtained for l_max = 25, for which no POD basis updates are needed. This is due to the same reasons as described in the previous investigation. Furthermore, compared to the case c_β = 1, we observe a significantly stronger overestimation of l after a POD basis computation for large values of l_max, which is presumably due to the slower decrease of the eigenvalues, see Figure 6.8. However, this circumstance has only a small effect on the
computation time. Additionally, we can say that, on the whole, the estimation of the initial value of l after a POD basis computation works quite well for the chosen ε, such that no POD basis extensions are needed at all.

Basis Update Strategy without Solving the Full Problem

In the last section we proposed a strategy for updating the POD basis in the optimization process which is supposed to reduce the computational effort. The basic idea of this method is to solve the full scalar optimization problem if the upper bound l_max for the number of currently used POD basis functions is exceeded, and to utilize its solution in order to update the POD basis. Afterwards, the new basis is used for the computation of the following Pareto optimal points. However, in the last section we have seen that solving a full optimization problem by the finite element method is very expensive with regard to the computational effort. In order to tackle this problem, another straightforward POD basis update strategy, which does not require the optimization of the full problem, is considered in this section. Algorithm 5 shows the optimization routine for computing the n-th Pareto optimal point.

Algorithm 5 Alternative POD basis update algorithm
Require: threshold µ > 0, 0 < l_min < l_max, l ∈ [l_min, l_max];
 1: Set check ← 0;
 2: while check = 0 do
 3:   Solve (ERPP)^l_{z_n} with starting point ū ← ū^{n−1};
 4:   Compute the a-posteriori estimate µ_U(ū_l, z_n);
 5:   if µ_U(ū_l, z_n) < µ then
 6:     Set check ← 1;
 7:   else
 8:     if l < l_max − 1 then
 9:       Set l ← l + 2;
10:     else if l = l_max − 1 then
11:       Set l ← l + 1;
12:     else
13:       Compute a POD basis by using the full state y and the dual p associated with ū_l;
14:       Set ū ← ū_l and choose l ∈ [l_min, l_max].
As already mentioned, in comparison to Algorithm 4 we do not solve the current full Euclidean reference point problem if the upper bound l_max is exceeded, but only update the POD basis by using the suboptimal solution ū_l and then use the new basis in order to recompute the current Euclidean reference point problem. The idea behind this approach is the assumption that the suboptimal solution ū_l is close to the unknown optimal solution of (ERPP)_{z_n}, so that we obtain a snapshot space which contains the most characteristic dynamics of the present problem. Note that the required full-order solutions of the state and adjoint equation are utilized for the computation of the a-posteriori estimate µ_U(ū_l, z_n), which estimates the approximation quality of the suboptimal solution ū_l, and are thus already available. As for Algorithm 4, the condition (6.2) is used in order to determine an appropriate value of l ∈ [l_min, l_max] after each POD basis computation.
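The difference to Algorithm 4 can be sketched as follows. As before, the callables are hypothetical placeholders, and a safety cap max_updates (not part of the thesis pseudocode) is added so that the non-convergence observed below surfaces as an error rather than an infinite loop:

```python
def pareto_step_alt(u_prev, z_n, basis, l, l_min, l_max, mu,
                    solve_reduced, error_estimate,
                    compute_pod_basis, choose_l, max_updates=10):
    """Algorithm 5 sketch: instead of solving the full problem, update
    the POD basis from the suboptimal control u_l itself and re-solve
    the same reduced problem with the new basis."""
    u = u_prev
    updates = 0
    while True:
        u_l = solve_reduced(z_n, u, basis, l)
        if error_estimate(u_l, z_n) < mu:
            return u_l, basis, l
        if l < l_max - 1:
            l += 2
        elif l == l_max - 1:
            l += 1
        else:
            # no full solve: snapshots come from the suboptimal u_l,
            # whose state/adjoint solves are already available from mu_U
            basis = compute_pod_basis(u_l)
            u = u_l
            l = choose_l(basis, l_min, l_max)
            updates += 1
            if updates > max_updates:   # safety cap, not in the thesis
                raise RuntimeError("POD basis updates do not converge")
```

Because the same reference point problem is re-solved after every update, the loop terminates only if the updated basis actually improves the estimate, which is exactly what fails in the experiment below.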

In order to test this POD basis update strategy, we set c_β = 1, h = 0.1, l_min = 6, µ = 4·10⁻⁴ and ε as before, and run the algorithm for l_max ∈ {20, 21, 22, 23}. Unfortunately, the optimization algorithm does not converge even for l_max = 23, as it gets stuck in an infinite loop at the Pareto point P_1. This can be explained by the fact that, due to the stronger dynamics at the beginning of the Pareto front, 23 POD basis functions are not enough even after the first POD basis update. Thus, a further basis update is required at this point. However, as optimal controls which are very close to each other are utilized for the POD basis computation, the basis improvements are not sufficient to achieve the desired approximation quality. Therefore, the POD basis is updated again and again. These results show impressively that, unfortunately, at least in some cases we cannot avoid the computation of full scalar problems in order to use POD basis updates successfully.

Choice of the Upper Bound l_max

In the previous investigations we have seen that the performance of Algorithm 4 depends on the choice of the upper bound l_max for the number of currently used POD basis functions. On the one hand, we observed that choosing l_max too small leads to a high demand for POD basis updates, which are very expensive with regard to the computational effort. On the other hand, using large values of l_max results in a strong overestimation of the initial values of l ∈ [l_min, l_max] after a POD basis computation in some cases, and thus in unnecessarily high computational effort. Moreover, the optimal value of l_max depends on the optimal control problem as well as on the strength of the convection term, and is not known in advance. To tackle this problem, we propose to choose l_max adaptively by observing the convergence rates in the control space, analogously to the computation of l after a POD basis update.
Namely, we choose the smallest l_max ∈ [24, 32] at the beginning of the optimization process such that

    ∑_{i=l_max+1}^{32} λ̃_i ‖ψ̃_i − Q^{l_max} ψ̃_i‖²_V ≤ δ    (6.3)

holds for a δ > 0, where Q^{l_max} denotes the orthogonal projection onto the subspace V^{l_max} defined in Lemma 5.31, {ψ̃_i}_{i∈ℕ} is the POD basis and {λ̃_i}_{i∈ℕ} are the associated eigenvalues given by (4.2). Note that it is not reasonable to choose l_max < 24, as POD basis updates in Algorithm 4 are very costly and thus should be used only if the number of currently used POD basis functions gets quite large, see Tables 6.4 and 6.5. Thereby, the upper bound of 32 POD basis functions is set in order to keep the computational effort as low as possible. Analogously to the computation of l ∈ [l_min, l_max] after each POD basis computation, the value of δ has to be chosen appropriately in order to obtain good estimates of l_max. Recall that in our experiments the best results with regard to the computation time were achieved for l_max = 24 and l_max = 25 in the cases c_β = 1 and c_β = 2, respectively. Therefore, we propose to use δ = 10⁻⁶ in condition (6.3), as it provides the estimates which are closest to these optimal values of l_max, see Table 6.6.

Comparison of the Computation Times

In the next step we want to compare the computation times for the different algorithms presented in the previous sections, in order to see how much time we have been able to save by using POD
approximation.

Table 6.6: Estimates of l_max for different values of δ and c_β. Columns: c_β = 1, c_β = 1.5, c_β = 2. Rows: max. value of l in Algorithm 4; estimated l_max for three decreasing values of δ.

An overview of the results for c_β = 1 can be seen in Table 6.7. First of all, we observe that solving the POD approximated problem by using the basis extension algorithm (see Algorithm 3) is faster than solving the full problem only by a factor of about 5. This can be explained by the fact that the associated Euclidean reference point problems are well conditioned, i.e. only 2 to 4 Newton-CG iterations per point are needed, even at the beginning of the Pareto front. On the other hand, 18 POD basis extensions are needed in the basis extension algorithm, which is costly. Note that each basis extension requires the recomputation of the same Pareto point and of the a-posteriori estimate µ_U, which is used to decide whether more basis functions are needed or not. Moreover, to compute the a-posteriori estimate, both the state and the adjoint equation have to be solved by the finite element method. Of course, by using a higher value of l_init we would get better results, as some POD basis extensions would be avoided.

Table 6.7: Computation times for different methods and c_β = 1. Columns: Comp. Time, #Basis extensions, #Basis updates. Rows: full system; basis extension with fixed l_init; Algorithm 4 with adaptive l ∈ [6, 20]; Algorithm 4 with adaptive l ∈ [6, 24]; Algorithm 4 with adaptive l_max.

By looking at the results for the POD basis update algorithm, it can be seen that, as expected, using the condition (6.3) for determining l_max in Algorithm 4 provides very good results. In particular, the computation time is as good as for the fixed l_max = 24, for which the best result with regard to the computation time is achieved in this case.
Comparing these results with the ones for the basis extension algorithm, we observe that Algorithm 4 with adaptive l_max requires 25% less computation time, although here no POD basis updates are used at all. This is obviously due to the fact that in Algorithm 4 no POD basis extensions at the Pareto point P_1 are required any more, as the initial value of l is chosen exactly as needed after the POD basis computation. Furthermore, the value of l_max is adaptively set to 24 POD basis functions, so that no overestimation of l occurs. Unfortunately, the proposed strategy of updating the POD basis does not come into effect in this example, as the best result with regard to the computation time is obtained without using it. As already mentioned, this is due to the fact that POD basis updates in Algorithm 4 are very costly, as the optimization of full scalar problems is required. Therefore, it is reasonable
to conduct a POD basis update only if the number of currently used basis elements gets quite large. However, in this case only 24 POD basis functions are needed in order to compute the whole Pareto front in the desired approximation quality. By looking at the result for l_max = 20, though, the effect of using a POD basis update can be observed clearly. Here, we have been able to reduce the number of needed POD basis functions after the basis update by 17% and to avoid all POD basis extensions. Therefore, the computation time has been decreased by 18% in comparison to the basis extension algorithm. This is still a good result and shows the potential of the POD basis update strategy, especially for more challenging optimization problems.

Table 6.8: Computation times for different methods and c_β = 2. Columns: Comp. Time, #Basis extensions, #Basis updates. Rows: full system; basis extension with fixed l_init; Algorithm 4 with adaptive l ∈ [6, 23]; Algorithm 4 with adaptive l ∈ [6, 25]; Algorithm 4 with adaptive l_max.

In Table 6.8 we can additionally observe the computation times for the convection constant c_β = 2. All in all, the results are very similar to the ones we got for c_β = 1. As in the last case, solving the POD approximated problem by using the basis extension algorithm is faster than solving the full problem by a factor of about 5, although the dynamics of the problem are significantly stronger. Looking at the results for Algorithm 4, it can be seen again that using the adaptive l_max provides very good results in this case as well. However, the computation time is slightly higher than in the optimal case, which is obtained for the fixed l_max = 25. This is due to the slight overestimation of l after the POD basis computation, as l_max is adaptively set to 26. Comparing this result with the one for the basis extension algorithm, we observe a reduction of the computation time by almost 29%, which is an even better result than in the last case.
By looking at the result for l_max = 23, we can observe the effect of using a POD basis update in this case as well. Here, we have been able to save only 2 POD basis functions per Pareto point after the POD basis update. Note that decreasing l_max to 22 led to a massive increase in the number of POD basis updates and thus to significantly higher computational effort, see Table 6.5. Nevertheless, the computation time has been reduced by almost 20% in comparison to the basis extension algorithm. However, this is mainly due to the efficient determination of l after a POD basis computation, so that no basis extensions are needed. Note that in this case the inner Pareto front consists of 47 points instead of 50. Therefore, the computation times for the convection constant c_β = 2 are lower than for c_β = 1. As a conclusion, we can say that Algorithm 4 using the adaptive strategy for determining l_max works efficiently. Thus, in comparison to the basis extension algorithm, we have been able to reduce the computation time by 25% and 29% for c_β = 1 and c_β = 2, respectively.

Results for Traversing the Pareto Front from Bottom to Top

Lastly, we want to test the proposed POD basis update algorithms by traversing the Pareto front from bottom to top, as described in Remark 3.1. This approach appears more natural in the sense that we start with the part of the Pareto front in which the associated Euclidean reference point problems are well conditioned, i.e. the coercivity constant C is the largest, so that few Newton-CG iterations are needed, and only small dynamics are expected due to the low control input. Therefore, few POD basis functions are needed at the beginning, and the POD basis can be extended gradually in the course of the optimization process as the dynamics get stronger. However, the disadvantage of this approach is the fact that the initial POD basis is computed by using a snapshot space which contains only the low dynamics of the system. Consequently, we expect that more basis updates will be needed to compute the whole Pareto front for the same values of l_max as in the previous experiments. Furthermore, as ū⁰ = 0 holds, we also have to compute ū¹ by the finite element method in order to obtain a reasonable snapshot space for the computation of the initial POD basis. Of course, this fact affects the computation time negatively.

Table 6.9: Results for Algorithm 4 when traversing the Pareto front from bottom to top. Columns: Comp. Time, #Basis extensions, #Basis updates. Rows: basis extension from top to bottom; l ∈ [6, l_max] adaptive for l_max ∈ {20, 21, 22, 23, 24, 25, 26}.

To start with, we choose the parameters c_β = 1, h = 0.1, l_min = 6, ε = 4·10⁻⁷, fix the upper bound µ of the acceptable error in the control space and run Algorithm 4 for l_max ∈ {20, ..., 26}.
The corresponding results are presented in Table 6.9 and Figure 6.17. First of all, we observe that only 6 POD basis functions are sufficient in order to solve the first ten Euclidean reference point problems, after which the POD basis is gradually extended. As already mentioned in the introduction of this section, this is due to the dynamics getting stronger while traversing the Pareto front. Note that in some cases increasing l_max also increases the number of used POD basis extensions and thus the computation time, even though the same number of POD basis updates is used. On the one hand, this can be explained by the fact that a higher value of l_max simply allows more basis extensions. On the other hand, using a smaller value of l_max forces the algorithm to conduct the first basis update earlier. Consequently, the instances at which POD basis updates are performed usually vary for different values of l_max. The results for l_max = and l_max = 4, which can be investigated in detail in Figure 6.17, confirm this reasoning. Note that in this approach as well, the number of POD basis functions needed after a POD basis update is always chosen very efficiently, so that no POD basis extensions are needed afterwards.
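The extension strategy described above can be sketched as a simple loop; here `solve_rom` (the POD-reduced solve with l basis functions) and `apost_error` (the a-posteriori error estimate in the control space) are hypothetical stand-ins, so this is a structural sketch rather than the thesis implementation:

```python
def solve_with_extension(solve_rom, apost_error, l_min, l_max, mu):
    """Solve one Euclidean reference point problem with gradual basis extension.

    The reduced problem is re-solved with one additional POD basis function
    until the a-posteriori error estimate drops below mu; if l_max is reached
    first, the caller is expected to perform a full POD basis update instead.
    """
    l = l_min
    u = solve_rom(l)
    while apost_error(u) > mu:
        if l >= l_max:
            return None, l   # signal: basis update required
        l += 1               # extend the POD basis by one function
        u = solve_rom(l)
    return u, l
```

In the bottom-to-top direction, each Pareto point can reuse the final l of the previous one as its starting size, which matches the gradual growth of the basis over the Pareto points described above.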

6. Example I

Figure 6.17: Number of used POD basis functions in Algorithm 4 when traversing the Pareto front from bottom to top (panels (a)-(d): the adaptive strategy for different values of l_max, plotted over the number of the Pareto point).

In Table 6.9 we can additionally see that, analogously to the results for the computation direction top to bottom, decreasing the upper bound l_max leads to a higher number of used POD basis updates and thus to a higher computational effort. However, looking especially at the case l_max = , we observe that now three basis updates, instead of one, are needed in order to compute the whole Pareto front. Furthermore, all POD basis updates in this case are conducted sequentially in the second part of the Pareto front, at points which are quite far apart from each other, and not at consecutive Pareto optimal points at the beginning of the Pareto front. An analogous behaviour can be observed for every other value of l_max as well. This can be explained by the same argumentation as above. Mathematically, it can be seen by observing the eigenvalues $\{\lambda_i^n\}_{i=1}^m$ of the operator $R^n$, as $\sum_{i=l+1}^{d} \lambda_i^n$ is a measure for the quality of the POD approximation, see Theorem 4.1. In Figure 6.18 we can see clearly that the eigenvalues show almost the same behaviour after each POD basis update. However, the initial eigenvalues, which are obtained

by using the snapshot space containing the weakest dynamics of the system, are smaller by a factor of about 10³. For the same reason, it is not reasonable to use the condition (6.3) with ε = 10⁻⁶ for adaptively determining l_max in Algorithm 4, as we would get estimates for it that are too small.

Figure 6.18: Eigenvalues after each POD basis computation in Algorithm 4 when traversing the Pareto front from bottom to top (initial eigenvalues and the eigenvalues after each basis update, plotted over the number of the eigenvalue).

By observing the computation times it can be seen that, exactly as for the computation direction top to bottom, decreasing the value of l_max increases the computation time, which is due to the increasing demand for POD basis updates. However, the increase is not monotonic, since the number of POD basis extensions can vary even though the same number of POD basis updates is used. Furthermore, exactly as in the previous investigations, the strategy of updating the POD basis does not pay off in this case, as the best result with regard to the computation time is reached for l_max = 6, for which no POD basis updates are needed. This is due to the same reasons. However, note that in the other approach only 4 POD basis functions were sufficient in order to obtain the best result. Comparing these results with the ones that we got for the basis extension algorithm using the other computation direction, we can see that even in the best case we are able to decrease the computation time by only 1%. However, using Algorithm 4 with adaptive l_max and the computation direction top to bottom, we have been able to reduce the computation time by 5%. The reason for the poor results of the current approach is obviously the high demand for POD basis extensions and POD basis updates for the same values of l_max. Therefore, the current approach is not attractive for numerical applications.
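The role of the eigenvalue tail as a quality measure suggests the standard truncation rule: choose the smallest l whose neglected energy $\sum_{i>l} \lambda_i^n$ falls below a tolerance. A minimal snapshot-matrix sketch via the SVD (the snapshot matrix `Y` and the tolerance `eps` are illustrative assumptions, not the thesis code):

```python
import numpy as np

def pod_basis(Y, eps=1e-7):
    """POD basis of the snapshot matrix Y (one snapshot per column).

    The eigenvalues lambda_i of the correlation operator are the squared
    singular values of Y; l is chosen as the smallest rank whose neglected
    energy sum_{i>l} lambda_i lies below eps times the total energy.
    """
    U, s, _ = np.linalg.svd(Y, full_matrices=False)
    lam = s ** 2
    # tail[l] = sum_{i>l} lambda_i (neglected energy when keeping l functions)
    tail = np.concatenate([np.cumsum(lam[::-1])[::-1], [0.0]])
    l = 1 + int(np.argmax(tail[1:] < eps * tail[0]))
    return U[:, :l], lam, l
```

A relative tolerance such as ε = 10⁻⁶ then translates directly into a basis size; as observed above, this criterion becomes unreliable when the initial snapshots underestimate the dynamics of the system.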
Finally, we want to test Algorithm 5, which does not require the optimization of full scalar problems, in order to reduce the computational effort. As we start with the part of the Pareto front in which the dynamics are low due to the little control input, we expect better results than for the computation direction top to bottom. For this purpose, we use the same settings as in the last experiment and run the algorithm for different values of l_max. In Table 6.1 and Figure 6.19

Table 6.1: Results for Algorithm 5 when traversing the Pareto front from bottom to top (columns: computation time, number of basis extensions and number of basis updates; rows: the basis extension algorithm from top to bottom, and the adaptive strategy for l ∈ [6, l_max] for the tested values of l_max).

the results for l_max ∈ {19, ..., 6} are summarized. First of all, we notice that the optimization routine terminates successfully for all l_max ∈ {, ..., 6}. Only for l_max = 19 does the algorithm get stuck in an infinite loop, which is due to the same problem as for the computation direction top to bottom. Furthermore, exactly as in the previous test, we observe that the POD basis is gradually extended in the first half of the Pareto front from 6 POD basis functions to the upper bound l_max. Of course, this is an expected result, since Algorithms 4 and 5 utilize the same POD basis extension strategy. Besides that, in Figure 6.19 it can additionally be seen that using the condition (6.) with ε = in Algorithm 5 also provides accurate estimates for the number of needed POD basis functions after a POD basis update. An interesting aspect, however, is the number of used POD basis updates in Algorithm 5. Whereas for l_max = and l_max = 5 the number of used POD basis updates coincides in both algorithms, there are considerable differences in all other cases. Looking first at the results for l_max = and l_max = 1, it can be seen that now only two POD basis updates, instead of three in Algorithm 4, are sufficient. The instances at which the first two basis updates are conducted coincide for both algorithms. Therefore, the second basis update in Algorithm 5 presumably provides a POD basis which contains stronger dynamics than the one provided in Algorithm 4.
We suspect that this is a coincidence, as in the first case we utilize the suboptimal solution ū^l of the current POD approximated Euclidean reference point problem in order to recompute the POD basis, and not the solution of the full problem. On the other hand, for l_max = 3 and l_max = 4 the number of used POD basis updates is considerably higher, although in both cases there are only two reference point problems at which the POD basis has been updated. This is due to the fact that and POD basis updates are conducted at the last instance for l_max = 3 and l_max = 4, respectively. This can be explained by the same reasons as for the computation direction top to bottom, where the algorithm got stuck in an infinite loop. Comparing the computation times, we first notice that by using Algorithm 5 instead of Algorithm 4 we are able to decrease the computation time by up to 15% for the same values of l_max, which is due to the fact that here no full problems have to be solved. However, compared to the basis extension algorithm using the computation direction top to bottom, we have a decrease in the computation time of only % in the best case. Moreover, for some values of l_max Algorithm 5 does not converge properly, as it gets stuck in an infinite loop. Therefore, this algorithm is not

recommended for numerical applications either.

Figure 6.19: Number of used POD basis functions in Algorithm 5 when traversing the Pareto front from bottom to top (panels (a)-(d): the adaptive strategy for different values of l_max, plotted over the number of the Pareto point).

6.3 Example II

In this section we investigate numerically the bicriterial optimal control problem

$$\min\; J(u, y) = \Big( \tfrac{1}{2}\, \| y - y_Q \|_{L^2(0,1;L^2(\Omega))}^2,\; \tfrac{1}{2}\, \| u \|_{L^2(0,1;\mathbb{R}^m)}^2 \Big) \qquad \text{(BOC)}$$

subject to

$$\begin{aligned} y_t(t, x) - \kappa \Delta y(t, x) + \beta(t, x) \cdot \nabla y(t, x) &= \textstyle\sum_{i=1}^{m} u_i(t)\, \chi_i(x) && \text{for } (t, x) \in (0, 1) \times \Omega, \\ \tfrac{\partial y}{\partial \eta}(t, x) + \alpha_j\, y(t, x) &= \alpha_j\, y_a(t) && \text{for } (t, x) \in (0, 1) \times \Gamma_j, \\ y(0, x) &= y_0(x) && \text{for } x \in \Omega \end{aligned}$$

and $u_a(t) \leq u(t) \leq u_b(t)$ for almost all $t \in [0, 1]$,

Figure 6.: Time-dependent advection term β(t, x) at time instance t = .5 (left); temperature in Konstanz on 6 October 17 (right).

where the following parameter and function values are used: The room Ω is given by the two-dimensional unit square, i.e. Ω = (0, 1)². The diffusion parameter is given by κ = .5. As the convection term β(t, x) for all (t, x) ∈ (0, 1) × Ω we use the same convection as in Example I. Namely, we use a non-stationary solution of a Navier-Stokes equation. This describes a stream which goes from the upper left to the lower right corner of the room, see Figure 6.1. In addition, there is a vortex moving through the room in the same manner. However, the motion of the vortex ends in the lower first half of the room at approximately the time instance t = .1. Moreover, it holds $\| \beta \|_{L^2(0,T;L^2(\Omega;\mathbb{R}^n))} = $ . We impose a floor heating in the whole room with 4 uniformly distributed heaters in the domains Ω₁ = (0, .5)², Ω₂ = (0, .5) × (.5, 1), Ω₃ = (.5, 1) × (0, .5) and Ω₄ = (.5, 1)². The boundary of the room Γ := ∂Ω is divided into the disjoint parts Γ₁ := {0} × (.8, 1), Γ₂ := {1} × (0, .2) and Γ₃ := Γ \ (Γ₁ ∪ Γ₂). For these boundary segments the isolation constants α₁ = α₂ = 1 and α₃ = .1 are set, i.e. we allow heat exchange with the outside world, see the left plot of Figure 6.. This yields an inhomogeneous Robin boundary condition in the PDE-constraint.
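The heater shape functions χ_i on the right-hand side of the state equation can be sketched as indicator functions of the four quadrant domains. The layout below follows the reconstruction of Ω₁, ..., Ω₄ given above; the names `HEATERS`, `chi` and `source` are illustrative, not part of the thesis code:

```python
# Indicator functions chi_i of the heater domains Omega_1, ..., Omega_4
# (the four quadrants of the unit square), and the resulting source term
# f(t, x1, x2) = sum_i u_i(t) * chi_i(x1, x2) of the state equation.
HEATERS = {
    1: (0.0, 0.5, 0.0, 0.5),  # Omega_1 = (0, .5)^2
    2: (0.0, 0.5, 0.5, 1.0),  # Omega_2 = (0, .5) x (.5, 1)
    3: (0.5, 1.0, 0.0, 0.5),  # Omega_3 = (.5, 1) x (0, .5)
    4: (0.5, 1.0, 0.5, 1.0),  # Omega_4 = (.5, 1)^2
}

def chi(i, x1, x2):
    """Indicator of the i-th heater domain at the point (x1, x2)."""
    a, b, c, d = HEATERS[i]
    return 1.0 if (a < x1 < b and c < x2 < d) else 0.0

def source(u_t, x1, x2):
    """Control-dependent source term for a control value u_t = (u_1, ..., u_4)."""
    return sum(u_t[i - 1] * chi(i, x1, x2) for i in HEATERS)
```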

The outer temperature y_a(t) for t ∈ [0, 1] is location-independent and is modelled by using linearly interpolated temperature measurements in Konstanz on 6 October 17 between the hours of 16: and :, see the right plot of Figure 6.. All data comes from the archive of the website kachelmannwetter.com. As we expect a higher energy consumption, the bilateral control constraints are set to u_a = 0 and u_b = 4.5. This yields U_ad := {u ∈ U | 0 ≤ u(t) ≤ 4.5 f.a.a. t ∈ [0, 1]}. As an initial condition we suppose a constant temperature of 16 °C in the whole room, i.e. y_0(x) = 16 for all x ∈ Ω. Finally, we choose y_Q(t, x) = 16 + 2t for all (t, x) ∈ (0, 1) × Ω, i.e. the desired temperature is supposed to increase uniformly from 16 °C at the starting time to 18 °C at the final time T = 1. The present situation can be interpreted as heating up the two-dimensional room with open windows situated in the upper left and the lower right corner of the room. In the following we will use the same notations as in the previous sections.

Solving the Full Problem

In our first experiment we run Algorithm 1 in order to analyse the results obtained by the finite element method and compare them with the ones obtained in Example I using the homogeneous boundary condition in the PDE-constraint. In all following calculations we will use the parameters h = .3 and h = 1.5 for generating reference points.

Figure 6.1: Pareto fronts in Example I and II ((a) Example II, (b) Example I; each plot shows the Pareto front and the reference points).

In the left plot of Figure 6.1 the corresponding Pareto front P_f can be seen. First of all, we observe that P_f is smoothly approximated by 54 Pareto optimal points, i.e. N_Full = 5 holds.
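The linear interpolation of the hourly measurements can be sketched with `numpy.interp`; the temperature values below are illustrative placeholders (not the kachelmannwetter.com data), and the 16:00-22:00 measurement window is an assumption for the sketch:

```python
import numpy as np

# Hourly measurement times and (placeholder) temperatures in degree Celsius.
hours = np.array([16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0])
temps = np.array([14.0, 13.2, 12.1, 11.0, 10.2, 9.6, 9.1])

def y_a(t):
    """Piecewise linear outer temperature at scaled time t in [0, 1],
    where t = 0 corresponds to 16:00 and t = 1 to 22:00."""
    return float(np.interp(16.0 + 6.0 * t, hours, temps))
```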

Thereby, P_f ranges from Pareto point P₀ = (.16, 13.5) to P₅₃ = (1.53, ). For comparison, the Pareto front in Example I runs from P₀ = (.199, 4.1) to P₅₁ = (.6667, ), see the right plot of Figure 6.1. Thus, in the upper part of the Pareto front the desired temperature is achieved significantly worse than in the previous example, although the heating costs Ĵ are about three times higher in this case. On the one hand, this can be explained by the fact that the boundary segments Γ₁ and Γ₂ do not have any isolation, which results in a continuous heat loss and thus in a significantly higher energy consumption. On the other hand, in Figure 6. it can additionally be seen that the second heater in the optimal control ū is active on the upper bound of the constraints in the middle of the heating process. Consequently, the second heater is prevented from heating more. Note that the value of .16 is only an approximation for min_{u ∈ U_ad} Ĵ₁(u), as there is still a weight of . on the heating costs Ĵ in the computation of the first Pareto optimal point P₀.

Figure 6.: Optimal controls in Example I and II ((a) ū, Example II; (b) ū 6, Example II; (c) ū 5, Example II; (d) ū, Example I; each panel shows the controls of the four heater regions over time).

Next, we want to investigate the influence of the heat exchange with the outer world on the

optimal controls. For this purpose, we choose three optimal controls which are situated at the top, the middle and the bottom of the Pareto front. First of all, it can be seen in Figure 6. that, analogously to Example I, all three optimal controls have the same qualitative behaviour, which only differs in scale. In particular, each heater of a control adapts to the air flow, which goes from the upper left to the lower right corner of the room, by using a different heating strategy. However, looking especially at ū, we observe that, in contrast to the last case, the heaters do not decrease monotonically, but increase gradually in the first half of the time interval until the maximum is reached and decrease rapidly to 0 afterwards. This is obviously due to the increasing heat loss caused by the sinking outer temperature in the last half of the time interval. Furthermore, the air stream coming through the boundary segment Γ₁ into the room is getting colder over time, see the right plot of Figure 6.3. Consequently, a higher heating input is required in order to reach the desired temperature. Comparing additionally the heating strategies associated with ū, it can be seen that heater two has to heat the most. This is an expected result, since the cold air streaming into the room has to be heated up. Consequently, as the heated air is transported from the second region into the others, mainly into regions three and four, heaters three and four need to heat the least. Furthermore, exactly as in Example I, all heaters show a wavy behaviour at the beginning. Again, this is due to the temporal changes in the dynamics of the system caused by a vortex moving over time from the upper left corner to the middle of the room.

Figure 6.3: Deviation from the desired temperature y_Q (left); temperature distribution at T = 1 associated with ū (right).
The effect of the optimal heating strategies on the deviation from the desired temperature y_Q can be seen in the left plot of Figure 6.3, where the graph of the mapping $t \mapsto \| \hat{y}(t, \cdot) + (S\bar{u})(t, \cdot) - y_Q(t) \|_{L^2(\Omega)}$ is presented for ū ∈ {ū, ū 6, ū 5}. We observe that in the beginning the temperature distribution approximates the desired temperature quite accurately for ū and ū 6. Only in the second half of the time interval can we see a deviation, which increases exponentially, as the heating input decreases rapidly to 0 for all optimal controls. However, in contrast to Example I,


More information

Weak Formulation of Elliptic BVP s

Weak Formulation of Elliptic BVP s Weak Formulation of Elliptic BVP s There are a large number of problems of physical interest that can be formulated in the abstract setting in which the Lax-Milgram lemma is applied to an equation expressed

More information

Lecture Notes on PDEs

Lecture Notes on PDEs Lecture Notes on PDEs Alberto Bressan February 26, 2012 1 Elliptic equations Let IR n be a bounded open set Given measurable functions a ij, b i, c : IR, consider the linear, second order differential

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction Functional analysis can be seen as a natural extension of the real analysis to more general spaces. As an example we can think at the Heine - Borel theorem (closed and bounded is

More information

Ordinary Differential Equations II

Ordinary Differential Equations II Ordinary Differential Equations II February 9 217 Linearization of an autonomous system We consider the system (1) x = f(x) near a fixed point x. As usual f C 1. Without loss of generality we assume x

More information

Overview of normed linear spaces

Overview of normed linear spaces 20 Chapter 2 Overview of normed linear spaces Starting from this chapter, we begin examining linear spaces with at least one extra structure (topology or geometry). We assume linearity; this is a natural

More information

Friedrich symmetric systems

Friedrich symmetric systems viii CHAPTER 8 Friedrich symmetric systems In this chapter, we describe a theory due to Friedrich [13] for positive symmetric systems, which gives the existence and uniqueness of weak solutions of boundary

More information

Chapter 2 Finite Element Spaces for Linear Saddle Point Problems

Chapter 2 Finite Element Spaces for Linear Saddle Point Problems Chapter 2 Finite Element Spaces for Linear Saddle Point Problems Remark 2.1. Motivation. This chapter deals with the first difficulty inherent to the incompressible Navier Stokes equations, see Remark

More information

RIESZ BASES AND UNCONDITIONAL BASES

RIESZ BASES AND UNCONDITIONAL BASES In this paper we give a brief introduction to adjoint operators on Hilbert spaces and a characterization of the dual space of a Hilbert space. We then introduce the notion of a Riesz basis and give some

More information

The heat equation in time dependent domains with Neumann boundary conditions

The heat equation in time dependent domains with Neumann boundary conditions The heat equation in time dependent domains with Neumann boundary conditions Chris Burdzy Zhen-Qing Chen John Sylvester Abstract We study the heat equation in domains in R n with insulated fast moving

More information

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32 Functional Analysis Martin Brokate Contents 1 Normed Spaces 2 2 Hilbert Spaces 2 3 The Principle of Uniform Boundedness 32 4 Extension, Reflexivity, Separation 37 5 Compact subsets of C and L p 46 6 Weak

More information

Scientific Computing WS 2017/2018. Lecture 18. Jürgen Fuhrmann Lecture 18 Slide 1

Scientific Computing WS 2017/2018. Lecture 18. Jürgen Fuhrmann Lecture 18 Slide 1 Scientific Computing WS 2017/2018 Lecture 18 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de Lecture 18 Slide 1 Lecture 18 Slide 2 Weak formulation of homogeneous Dirichlet problem Search u H0 1 (Ω) (here,

More information

FUNCTIONAL ANALYSIS HAHN-BANACH THEOREM. F (m 2 ) + α m 2 + x 0

FUNCTIONAL ANALYSIS HAHN-BANACH THEOREM. F (m 2 ) + α m 2 + x 0 FUNCTIONAL ANALYSIS HAHN-BANACH THEOREM If M is a linear subspace of a normal linear space X and if F is a bounded linear functional on M then F can be extended to M + [x 0 ] without changing its norm.

More information

arxiv: v1 [math.oc] 22 Sep 2016

arxiv: v1 [math.oc] 22 Sep 2016 EUIVALENCE BETWEEN MINIMAL TIME AND MINIMAL NORM CONTROL PROBLEMS FOR THE HEAT EUATION SHULIN IN AND GENGSHENG WANG arxiv:1609.06860v1 [math.oc] 22 Sep 2016 Abstract. This paper presents the equivalence

More information

ON WEAKLY NONLINEAR BACKWARD PARABOLIC PROBLEM

ON WEAKLY NONLINEAR BACKWARD PARABOLIC PROBLEM ON WEAKLY NONLINEAR BACKWARD PARABOLIC PROBLEM OLEG ZUBELEVICH DEPARTMENT OF MATHEMATICS THE BUDGET AND TREASURY ACADEMY OF THE MINISTRY OF FINANCE OF THE RUSSIAN FEDERATION 7, ZLATOUSTINSKY MALIY PER.,

More information

Finite Elements. Colin Cotter. February 22, Colin Cotter FEM

Finite Elements. Colin Cotter. February 22, Colin Cotter FEM Finite Elements February 22, 2019 In the previous sections, we introduced the concept of finite element spaces, which contain certain functions defined on a domain. Finite element spaces are examples of

More information

1 Math 241A-B Homework Problem List for F2015 and W2016

1 Math 241A-B Homework Problem List for F2015 and W2016 1 Math 241A-B Homework Problem List for F2015 W2016 1.1 Homework 1. Due Wednesday, October 7, 2015 Notation 1.1 Let U be any set, g be a positive function on U, Y be a normed space. For any f : U Y let

More information

JUHA KINNUNEN. Harmonic Analysis

JUHA KINNUNEN. Harmonic Analysis JUHA KINNUNEN Harmonic Analysis Department of Mathematics and Systems Analysis, Aalto University 27 Contents Calderón-Zygmund decomposition. Dyadic subcubes of a cube.........................2 Dyadic cubes

More information

Numerical approximation for optimal control problems via MPC and HJB. Giulia Fabrini

Numerical approximation for optimal control problems via MPC and HJB. Giulia Fabrini Numerical approximation for optimal control problems via MPC and HJB Giulia Fabrini Konstanz Women In Mathematics 15 May, 2018 G. Fabrini (University of Konstanz) Numerical approximation for OCP 1 / 33

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

Lectures on. Sobolev Spaces. S. Kesavan The Institute of Mathematical Sciences, Chennai.

Lectures on. Sobolev Spaces. S. Kesavan The Institute of Mathematical Sciences, Chennai. Lectures on Sobolev Spaces S. Kesavan The Institute of Mathematical Sciences, Chennai. e-mail: kesh@imsc.res.in 2 1 Distributions In this section we will, very briefly, recall concepts from the theory

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Hilbert Spaces. Contents

Hilbert Spaces. Contents Hilbert Spaces Contents 1 Introducing Hilbert Spaces 1 1.1 Basic definitions........................... 1 1.2 Results about norms and inner products.............. 3 1.3 Banach and Hilbert spaces......................

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

i=1 α i. Given an m-times continuously

i=1 α i. Given an m-times continuously 1 Fundamentals 1.1 Classification and characteristics Let Ω R d, d N, d 2, be an open set and α = (α 1,, α d ) T N d 0, N 0 := N {0}, a multiindex with α := d i=1 α i. Given an m-times continuously differentiable

More information

Problem Set 6: Solutions Math 201A: Fall a n x n,

Problem Set 6: Solutions Math 201A: Fall a n x n, Problem Set 6: Solutions Math 201A: Fall 2016 Problem 1. Is (x n ) n=0 a Schauder basis of C([0, 1])? No. If f(x) = a n x n, n=0 where the series converges uniformly on [0, 1], then f has a power series

More information

MEASURE VALUED DIRECTIONAL SPARSITY FOR PARABOLIC OPTIMAL CONTROL PROBLEMS

MEASURE VALUED DIRECTIONAL SPARSITY FOR PARABOLIC OPTIMAL CONTROL PROBLEMS MEASURE VALUED DIRECTIONAL SPARSITY FOR PARABOLIC OPTIMAL CONTROL PROBLEMS KARL KUNISCH, KONSTANTIN PIEPER, AND BORIS VEXLER Abstract. A directional sparsity framework allowing for measure valued controls

More information

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan Wir müssen wissen, wir werden wissen. David Hilbert We now continue to study a special class of Banach spaces,

More information

Adaptive methods for control problems with finite-dimensional control space

Adaptive methods for control problems with finite-dimensional control space Adaptive methods for control problems with finite-dimensional control space Saheed Akindeinde and Daniel Wachsmuth Johann Radon Institute for Computational and Applied Mathematics (RICAM) Austrian Academy

More information

LECTURE # 0 BASIC NOTATIONS AND CONCEPTS IN THE THEORY OF PARTIAL DIFFERENTIAL EQUATIONS (PDES)

LECTURE # 0 BASIC NOTATIONS AND CONCEPTS IN THE THEORY OF PARTIAL DIFFERENTIAL EQUATIONS (PDES) LECTURE # 0 BASIC NOTATIONS AND CONCEPTS IN THE THEORY OF PARTIAL DIFFERENTIAL EQUATIONS (PDES) RAYTCHO LAZAROV 1 Notations and Basic Functional Spaces Scalar function in R d, d 1 will be denoted by u,

More information

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing. 5 Measure theory II 1. Charges (signed measures). Let (Ω, A) be a σ -algebra. A map φ: A R is called a charge, (or signed measure or σ -additive set function) if φ = φ(a j ) (5.1) A j for any disjoint

More information

PDE Constrained Optimization selected Proofs

PDE Constrained Optimization selected Proofs PDE Constrained Optimization selected Proofs Jeff Snider jeff@snider.com George Mason University Department of Mathematical Sciences April, 214 Outline 1 Prelim 2 Thms 3.9 3.11 3 Thm 3.12 4 Thm 3.13 5

More information

FINITE ENERGY SOLUTIONS OF MIXED 3D DIV-CURL SYSTEMS

FINITE ENERGY SOLUTIONS OF MIXED 3D DIV-CURL SYSTEMS FINITE ENERGY SOLUTIONS OF MIXED 3D DIV-CURL SYSTEMS GILES AUCHMUTY AND JAMES C. ALEXANDER Abstract. This paper describes the existence and representation of certain finite energy (L 2 -) solutions of

More information

Suboptimal Open-loop Control Using POD. Stefan Volkwein

Suboptimal Open-loop Control Using POD. Stefan Volkwein Institute for Mathematics and Scientific Computing University of Graz, Austria PhD program in Mathematics for Technology Catania, May 22, 2007 Motivation Optimal control of evolution problems: min J(y,

More information

Université de Metz. Master 2 Recherche de Mathématiques 2ème semestre. par Ralph Chill Laboratoire de Mathématiques et Applications de Metz

Université de Metz. Master 2 Recherche de Mathématiques 2ème semestre. par Ralph Chill Laboratoire de Mathématiques et Applications de Metz Université de Metz Master 2 Recherche de Mathématiques 2ème semestre Systèmes gradients par Ralph Chill Laboratoire de Mathématiques et Applications de Metz Année 26/7 1 Contents Chapter 1. Introduction

More information

Normed & Inner Product Vector Spaces

Normed & Inner Product Vector Spaces Normed & Inner Product Vector Spaces ECE 174 Introduction to Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 27 Normed

More information

Non-linear wave equations. Hans Ringström. Department of Mathematics, KTH, Stockholm, Sweden

Non-linear wave equations. Hans Ringström. Department of Mathematics, KTH, Stockholm, Sweden Non-linear wave equations Hans Ringström Department of Mathematics, KTH, 144 Stockholm, Sweden Contents Chapter 1. Introduction 5 Chapter 2. Local existence and uniqueness for ODE:s 9 1. Background material

More information

TD 1: Hilbert Spaces and Applications

TD 1: Hilbert Spaces and Applications Université Paris-Dauphine Functional Analysis and PDEs Master MMD-MA 2017/2018 Generalities TD 1: Hilbert Spaces and Applications Exercise 1 (Generalized Parallelogram law). Let (H,, ) be a Hilbert space.

More information

OPTIMALITY CONDITIONS AND POD A-POSTERIORI ERROR ESTIMATES FOR A SEMILINEAR PARABOLIC OPTIMAL CONTROL

OPTIMALITY CONDITIONS AND POD A-POSTERIORI ERROR ESTIMATES FOR A SEMILINEAR PARABOLIC OPTIMAL CONTROL OPTIMALITY CONDITIONS AND POD A-POSTERIORI ERROR ESTIMATES FOR A SEMILINEAR PARABOLIC OPTIMAL CONTROL O. LASS, S. TRENZ, AND S. VOLKWEIN Abstract. In the present paper the authors consider an optimal control

More information

SHARP BOUNDARY TRACE INEQUALITIES. 1. Introduction

SHARP BOUNDARY TRACE INEQUALITIES. 1. Introduction SHARP BOUNDARY TRACE INEQUALITIES GILES AUCHMUTY Abstract. This paper describes sharp inequalities for the trace of Sobolev functions on the boundary of a bounded region R N. The inequalities bound (semi-)norms

More information

ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS

ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS J. OPERATOR THEORY 44(2000), 243 254 c Copyright by Theta, 2000 ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS DOUGLAS BRIDGES, FRED RICHMAN and PETER SCHUSTER Communicated by William B. Arveson Abstract.

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

Math The Laplacian. 1 Green s Identities, Fundamental Solution

Math The Laplacian. 1 Green s Identities, Fundamental Solution Math. 209 The Laplacian Green s Identities, Fundamental Solution Let be a bounded open set in R n, n 2, with smooth boundary. The fact that the boundary is smooth means that at each point x the external

More information

CONVERGENCE THEORY. G. ALLAIRE CMAP, Ecole Polytechnique. 1. Maximum principle. 2. Oscillating test function. 3. Two-scale convergence

CONVERGENCE THEORY. G. ALLAIRE CMAP, Ecole Polytechnique. 1. Maximum principle. 2. Oscillating test function. 3. Two-scale convergence 1 CONVERGENCE THEOR G. ALLAIRE CMAP, Ecole Polytechnique 1. Maximum principle 2. Oscillating test function 3. Two-scale convergence 4. Application to homogenization 5. General theory H-convergence) 6.

More information

MET Workshop: Exercises

MET Workshop: Exercises MET Workshop: Exercises Alex Blumenthal and Anthony Quas May 7, 206 Notation. R d is endowed with the standard inner product (, ) and Euclidean norm. M d d (R) denotes the space of n n real matrices. When

More information

Hilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality

Hilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality (October 29, 2016) Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/fun/notes 2016-17/03 hsp.pdf] Hilbert spaces are

More information

Partial Differential Equations

Partial Differential Equations Part II Partial Differential Equations Year 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2015 Paper 4, Section II 29E Partial Differential Equations 72 (a) Show that the Cauchy problem for u(x,

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Functional Analysis Review

Functional Analysis Review Outline 9.520: Statistical Learning Theory and Applications February 8, 2010 Outline 1 2 3 4 Vector Space Outline A vector space is a set V with binary operations +: V V V and : R V V such that for all

More information

Functional Analysis I

Functional Analysis I Functional Analysis I Course Notes by Stefan Richter Transcribed and Annotated by Gregory Zitelli Polar Decomposition Definition. An operator W B(H) is called a partial isometry if W x = X for all x (ker

More information

Your first day at work MATH 806 (Fall 2015)

Your first day at work MATH 806 (Fall 2015) Your first day at work MATH 806 (Fall 2015) 1. Let X be a set (with no particular algebraic structure). A function d : X X R is called a metric on X (and then X is called a metric space) when d satisfies

More information