
OPTIMAL SENSOR PLACEMENT FOR JOINT PARAMETER AND STATE ESTIMATION PROBLEMS IN LARGE-SCALE DYNAMICAL SYSTEMS WITH APPLICATIONS TO THERMO-MECHANICS

Roland Herzog*, Ilka Riedel*, Dariusz Uciński**

February 7, 2017

We consider large-scale dynamical systems in which both the initial state and some parameters are unknown. These unknown quantities must be estimated from partial state observations over a time window. A data assimilation framework is applied for this purpose. Specifically, we focus on large-scale linear systems with multiplicative parameter-state coupling as they arise in the discretization of parametric linear time-dependent partial differential equations. Another feature of our work is the presence of a quantity of interest different from the unknown parameters, which is to be estimated based on the available data. In this setting, we develop a simplicial decomposition algorithm for an optimal sensor placement and set forth formulae for the efficient evaluation of all required quantities. As a guiding example, we consider a thermo-mechanical PDE system with the temperature constituting the system state and the induced displacement at a certain reference point as the quantity of interest.

*Technische Universität Chemnitz, Faculty of Mathematics, Professorship Numerical Mathematics (Partial Differential Equations), Chemnitz, Germany, roland.herzog@mathematik.tu-chemnitz.de, ilka.riedel@mathematik.tu-chemnitz.de
**University of Zielona Góra, Institute of Control and Computation Engineering, ul. Podgórna 50, Zielona Góra, Poland, d.ucinski@issi.uz.zgora.pl

1. INTRODUCTION

In this paper, we consider joint parameter and state estimation problems for large-scale dynamical systems of the form

  E ẋ(t) = A(p) x(t) + f(t),  t ∈ [0, t_f],   x(0) = x_0 ∈ R^n.   (1.1)

Here x(t) ∈ R^n is the state vector, E ∈ R^{n×n} is a non-singular matrix, f(t) ∈ R^n signifies a known forcing input, p ∈ R^q stands for a set of system parameters, and A(p) ∈ R^{n×n} is a matrix representing the parameter-dependent dynamics. The purpose of our estimation procedure is to infer an estimate x̂_0 of the unknown initial state x_0 as well as an estimate p̂ of the unknown parameters p from partial measurements

  y_j = C_y x(t_j) + η_j ∈ R^m,  j = 1, …, N,   (1.2)

of the state trajectory evaluated at sampling instants t_1, …, t_N which are fixed in a given time horizon [0, t_f]. Here η_j ∈ R^m denotes measurement noise, which accounts for factors such as measurement errors and inadequacies of the mathematical model (1.1). We adopt a Bayesian setting, which means that some prior information about x_0 and p is available while estimating them. Once determined, the estimates x̂_0 and p̂ are supposed to be plugged into the model (1.1) so as to produce an estimate x̂(t_f) of the terminal state x(t_f) and then finally yield the estimate ẑ = C_z x̂(t_f) of a quantity of interest (QOI) z, which depends linearly on the terminal state x(t_f),

  z = C_z x(t_f) ∈ R^r,   (1.3)

where C_z ∈ R^{r×n} is given.

In practice, the measurements of the observable quantity y are subject to measurement error. Logically, the output noise propagates into the estimate of (x_0, p), thereby influencing the estimate of the QOI z. The amount of perturbation in ẑ depends on the matrix C_y, which encodes which parts of the state trajectory are being observed. It is the purpose of this paper to optimize the measurement matrix C_y in order to minimize the influence of the measurement error on the estimate of the QOI, in a sense to be made precise below.

We envision that the state vector x(t) ∈ R^n is high-dimensional and represents a distributed quantity, as for instance in the discretization of time-dependent partial differential equations. It is assumed that the measurement matrix C_y consists of m distinct rows of the n×n identity matrix. In this setting, the optimization of C_y can be understood as choosing optimal sensor locations. We point out that we consider the sensors to be static here.

NOTATION. Throughout the paper, R_+ and R_{++} stand for the sets of nonnegative and positive real numbers, respectively. We adopt the convention that all vectors have

column form. The set of real m×n matrices is denoted by R^{m×n}. We use S^m to denote the set of symmetric m×m matrices, S^m_+ to denote the set of symmetric nonnegative definite m×m matrices, and S^m_{++} to denote the set of symmetric positive definite m×m matrices. The symbol id_n denotes the n×n identity matrix. The symbol 1_n denotes a vector whose components are all equal to one. Given two vectors x and y of dimension n, x ∘ y is the n-vector whose i-th component is x_i y_i (componentwise multiplication). Finally, the symbol conv({q_1, …, q_l}) denotes the convex hull of a set of vectors q_i, i = 1, …, l.

MOTIVATION: THERMO-MECHANICAL PDE SYSTEM

As a motivation to consider sensor placement problems for systems of type (1.1), we mention an application described by a thermo-mechanical PDE system. More details are given in Section 5. Suppose that the temperature T of a machine tool constitutes the state of the system and is governed by the heat equation endowed with boundary conditions describing the heat flux,

  ρ c_p Ṫ − div(λ ∇T) = 0,
  λ ∂_n T + α(x) (T − T_ref) = r(x, t).

The heat transfer coefficient α(x) depends on the spatial position x and subsumes various physical phenomena, such as convective and radiative heat transfer. Its true value is therefore unknown and must be estimated from a time series of temperature measurements. A second unknown is the initial temperature state T_0(x), which arises from previous operation of the machine and is impossible to measure directly. The right-hand side r(x, t) represents heat sources acting on the machine tool. A table describing these correspondences with the model (1.1) is provided as Table 5.2.

It is not our primary goal to estimate the temperature distribution of the machine at time t_f, but rather to estimate the QOI, that is, the displacement of a certain relevant point of the machine structure induced by that temperature. Notice that thermally induced displacements can be the source of dominating positioning errors in machine tools. It is the precision of the estimation of these displacements that we are concerned with. To increase this precision, we wish to find optimal locations of temperature (state) sensors on the machine's surface.

RELATED WORK AND STRUCTURE OF THE PAPER

Let us put our paper into perspective. In the absence of unknown parameters p in (1.1), the estimation of the terminal state x(t_f) in a dynamical model such as (1.1) from previous measurements of the state is known as a data assimilation problem, see, e.g., (Freitag and Potthast, 2013; Law et al., 2015; Cacuci et al., 2014). Notice that unknown

parameters could be easily incorporated by declaring them as artificial state variables satisfying ṗ(t) = 0. We do not follow this approach but prefer to keep p and x separate. Such joint parameter and state estimation problems were considered, for instance, in Kühl et al. (2011); Küpper et al. (2009).

A key design problem in state and/or parameter estimation of distributed parameter systems (DPSs) consists in properly deploying the available measurement sensors. Logically, they should be placed at sites which provide the most valuable information about the estimated quantities. As it is desirable to determine the best sensor positions before the actual data collection, the issue that must primarily be addressed is the appropriate choice of the optimality criterion. As for state estimation, various criteria quantifying observability were employed in deterministic scenarios (El Jai and Pritchard, 1988), whereas in stochastic settings the research focused on minimizing criteria which aggregate the covariance matrix of the estimation error; see (Kubrusly and Malebranche, 1985) for the state of the art in the mid-1980s. Since the Kalman filter, which was the main tool to produce state estimates, was hard to implement in realistic settings due to its prohibitive computational and memory requirements, this line of research was abandoned for nearly two decades; interest in it has since revived in the framework of variational data assimilation (Cacuci et al., 2014) and spatial statistics (Cressie and Wikle, 2011). In turn, sensor location for parameter estimation usually follows the traditional approach of statistical experimental design (Atkinson et al., 2007; Pázman, 1986; Pronzato and Pázman, 2013; Pukelsheim, 2006) and is based on various scalar measures of performance defined on the Fisher information matrix (FIM) associated with the estimated parameters. The inverse of the FIM constitutes the Cramér-Rao bound on the covariance matrix of the estimates. The approach dates back to the work of Uspenskii and Fedorov (1975), whose ideas were then extended by Rafajłowicz (1981, 1986). A comprehensive overview of this currently very active research area is contained in the monograph (Uciński, 2005). Over the past decade, contributions about sensor location have continued to grow. Results regarding various types of PDEs have been reported, e.g., for reaction-diffusion or convection-diffusion problems (Alonso et al., 2004a; Armaou and Demetriou, 2006; Alonso et al., 2004b; García et al., 2007), as well as for models in fluid dynamics (Mokhasi and Rempfer, 2004; Cohen et al., 2006; Willcox, 2006; Yildirim et al., 2009). By the same token, the problem has been considered in numerous applications, e.g., in environmental and water resource systems (Sun and Sun, 2015), for mechanical deformation problems (Yi et al., 2011; Meo and Zumpano, 2005), as well as in sensor networks (Song et al., 2009).

A great difficulty in the estimation of DPSs arises due to the infinite-dimensional nature of the parameter space. Some theoretical problems, such as the existence of a least-squares estimator, the continuous dependence of the estimator on the data, and the convergence of approximations, require compactness of the parameter space. If these aspects are not properly addressed, the estimation process may be ill-posed in the sense that noise in the data may give rise to significant errors in the estimate. Therefore, techniques

known as regularization methods have been developed to deal with this ill-posedness, e.g., Tikhonov regularization (Vogel, 2002). They, however, hardly ever consider the statistical aspects of the estimation problem. Alternatively, a Bayesian framework can be employed, which quite naturally makes it possible to take account of prior statistical information about the unknown parameters and/or states. Bayesian methods, unlike asymptotic methods of classical statistics, turn out to be well suited, theoretically and computationally, to infinite-dimensional parameter spaces and can handle the above-mentioned theoretical problems well (Fitzpatrick, 1991). Unfortunately, sensor location for Bayesian inference in DPSs (or, in general, for estimation combined with regularization) has not been sufficiently considered yet. Recent research, however, points to some breakthroughs in this area, especially in the context of variational data assimilation. Gejadze and Shutyaev (2012) approached the problem of efficiently evaluating the gradient of the A-optimality criterion with respect to the spatial coordinates of the sensors for estimating the initial condition of a one-dimensional Burgers equation with a nonlinear viscous term. To this end, they used a limited-memory approximation of the inverse Hessian of the data assimilation cost function (up to a multiplier, the Hessian is equal to the FIM associated with the coefficients of a finite-dimensional parametrization of the initial state). The cost of the attendant computations is substantially reduced by extensive use of adjoint equations. In turn, the selection of an optimal subset of candidate sensor locations has been studied by Alexanderian et al. (2014) for the estimation of the initial state of a three-dimensional advection-diffusion equation. The optimality criterion was the trace of the posterior covariance, implemented in practice through a randomized trace estimator. Substantial computational savings result from using a randomized SVD to obtain a low-rank surrogate for the prior-preconditioned parameter-to-observable map. Efficiency is additionally increased by specifying the covariance operator of the Gaussian prior as the inverse of an elliptic differential operator, which can be evaluated using fast solvers for elliptic PDEs. A successful attempt to generalize this approach to a parameter estimation problem (i.e., a nonlinear inverse problem) for inferring a coefficient field in a two-dimensional elliptic problem has been made in (Alexanderian et al., 2016). This inspired the formulation we use in our paper.

In an earlier paper (Herzog and Riedel, 2015), we focused on sensor placement problems for thermo-mechanical systems, but in the absence of a dynamical system (1.1). To be precise, the temperature field was estimated directly from instantaneous measurements and in a reduced-order temperature space. This is not possible here, since the heat transfer coefficient α is considered unknown. Notice that an estimation of α is only possible in a time-dependent model.

The particular features of the problem at hand and the novelties of the present paper compared with previous work on sensor placement are the following. The presence of the QOI prevents us from directly using the Fisher information matrix (FIM) of the (x_0, p)-estimation problem to formulate the objective for the optimal sensor placement problem. Instead, we must use the (approximate) covariance matrix of the QOI estimator, which involves the solution map of a linearized state system.

Since we assume the dimension of the QOI to be much lower than the state dimension (r ≪ n), we employ an adjoint technique to evaluate that covariance matrix efficiently. In order to solve the sensor placement problem, we employ a simplicial decomposition algorithm, which was analyzed in Patriksson (1999) and Bertsekas (2015). To solve the main subproblem we make use of the classical multiplicative algorithm, which goes back to Silvey et al. (1978) but needs to be adapted to the objective at hand. We refer the reader to Torsney (2009); Yu (2010) for a historical overview. Basically, while solving the relaxed convex sensor selection problem (Problem 3.2) we could adapt the approach outlined by Joshi and Boyd (2009), which advocates an interior-point method. As will be shown, however, the implementation of simplicial decomposition is strikingly easy, the algorithm usually runs very fast, and most often the solutions produced by it are rather sparse (i.e., the number of nonzero weights is low). Sparsity may be quite an acute problem as far as relaxed solutions are concerned and usually requires augmenting the criterion by sparsifying penalty functions (Chepuri and Leus, 2015; Alexanderian et al., 2014; Haber et al., 2010, 2008). The linear programming subproblem built into simplicial decomposition seems to successfully retain a moderate number of nonzero weights.

Due to the multiplicative coupling of the parameters p and the state vector x(t) in (1.1), the covariance of the QOI is going to depend not only on the measurement matrix C_y but also on the unknown parameters p themselves (but not on the unknown initial state x_0). Often, this feature is addressed in sensor placement or similar experimental design problems by embedding the latter in a robust formulation, where the unknown parameter is confined to an uncertainty set. This significantly adds to the level of complexity of the problem; see, e.g., (Uciński, 2005; Pronzato and Pázman, 2013) or (Körkel et al., 2004; Diehl et al., 2006; Bock et al., 2007). In this paper, we focus on the sensor placement problem for systems of type (1.1) in the presence of a QOI and therefore content ourselves with a given set-point (nominal value) p_0 in the parameter space.

In Section 2, we formulate the data assimilation problem, which is used to jointly estimate the unknown initial state x_0 and the parameters p. The sensor placement problem is addressed in Section 3, and we propose a simplicial decomposition algorithm for its solution in Section 4. Subsequently, we elaborate on a specific thermo-mechanical system modeling a machine tool, where the temperature constitutes the system state x(t) and the thermo-mechanically induced displacement at a certain reference point (the tool center point, or TCP) serves as the quantity of interest z. We seek optimal locations of temperature sensors on the surface of the machine in order to obtain an accurate estimate of the TCP displacement. The details are given in Section 5 and illustrated with numerical results in Section 6.

2. DATA ASSIMILATION PROBLEM

We consider the dynamical system (1.1) with state x(t) ∈ R^n, unknown initial state x_0 ∈ R^n and unknown parameter vector p ∈ R^q. We assume that measurements (1.2) of

certain parts of the state trajectory are taken at given measurement times t_j, j = 1, …, N, during the time interval [0, t_f] under consideration. The measurements y_j are subject to measurement errors η_j, j = 1, …, N, which we assume to be i.i.d. random variables with normal distribution N(0, V_y), where V_y = σ² id_m. This means that the components of each η_j are independent zero-mean random variables with the same variance σ², or equivalently, that the measurements from different sensors are independent of one another and that their accuracy is the same.

The unknowns in the model (1.1) are x_0 and p. However, our prior (background) information consists of their prior estimates x_0^bg and p^bg, which are supposed to be realizations of Gaussian random vectors with means x̄_0 ∈ R^n and p̄ ∈ R^q and covariance matrices V_{x_0} ∈ R^{n×n} and V_p ∈ R^{q×q}, respectively, i.e.,

  x_0^bg ~ N(x̄_0, V_{x_0})  and  p^bg ~ N(p̄, V_p).

Here x̄_0 and p̄ are unknown and interpreted as the true initial state and the true parameter, respectively. In turn, as for V_{x_0} and V_p, we assume that they are known and positive definite, and hence invertible. As is usually the case in data assimilation problems, the number of unknowns (n + q) exceeds the number of measurements (N·m). Consequently, regularization terms are needed, expressing the above-mentioned prior information about the unknowns. We thus state our data assimilation problem as follows, cf. Cacuci et al. (2014):

  min_{x_0 ∈ R^n, p ∈ R^q} J_DA(x_0, p) = ½ ‖x_0 − x_0^bg‖²_{V_{x_0}^{-1}} + ½ ‖p − p^bg‖²_{V_p^{-1}} + ½ Σ_{j=1}^N ‖y_j − C_y x(t_j; x_0, p)‖²_{V_y^{-1}},   (2.1)

where x(t_j; x_0, p) is the solution to (1.1) at sampling time t_j for given x_0 and p. In order to solve the nonlinear least-squares problem (2.1), one can employ a standard derivative-based method such as the Gauss-Newton or Levenberg-Marquardt algorithms; see for instance (Nocedal and Wright, 2006, Section 10.3). In order to formulate the Jacobian of the model output w.r.t. the unknowns (x_0, p), we introduce the sensitivities

  X_0(t) = ∂x(t; x_0, p)/∂x_0 ∈ R^{n×n}  and  X_p(t) = ∂x(t; x_0, p)/∂p ∈ R^{n×q}

of the state x(t; x_0, p) with respect to the initial state x_0 and the parameters p. By the implicit function theorem, it follows from (1.1) that X_0 is given by the linear system

  E Ẋ_0(t) = A(p) X_0(t),  t ∈ [0, t_f],   X_0(0) = id_n,   (2.2)

and X_p satisfies

  E Ẋ_p(t) = A′(p) x(t) + A(p) X_p(t),  t ∈ [0, t_f],   X_p(0) = 0 ∈ R^{n×q}.   (2.3)
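For illustration, the following minimal sketch (not taken from the paper; all names such as A_of_p and dA_dp are assumptions) integrates (1.1) together with the sensitivity systems (2.2)-(2.3) by the implicit Euler method, the scheme also used for the application in Section 6. Here dA_dp supplies the partial derivatives ∂A/∂p_k that make up the term A′(p) x(t) in (2.3), made precise in (2.4) below:

    import numpy as np

    # A minimal sketch (not from the paper): implicit Euler integration of the
    # state equation (1.1) and the sensitivity systems (2.2)-(2.3).
    # Illustrative names: A_of_p(p) returns A(p), dA_dp(p) returns the list
    # [dA/dp_1, ..., dA/dp_q], and f(t) returns the forcing term.
    def simulate_with_sensitivities(E, A_of_p, dA_dp, f, x0, p, t_grid):
        n, q = x0.size, p.size
        A = A_of_p(p)
        x, X0, Xp = x0.copy(), np.eye(n), np.zeros((n, q))
        xs, X0s, Xps = [x.copy()], [X0.copy()], [Xp.copy()]
        for k in range(1, len(t_grid)):
            dt = t_grid[k] - t_grid[k - 1]
            B = E - dt * A                    # implicit Euler system matrix
            x = np.linalg.solve(B, E @ x + dt * f(t_grid[k]))
            X0 = np.linalg.solve(B, E @ X0)   # discretization of (2.2)
            # discretization of (2.3): A'(p) x(t) assembled column by column
            rhs = E @ Xp + dt * np.column_stack([Ak @ x for Ak in dA_dp(p)])
            Xp = np.linalg.solve(B, rhs)
            xs.append(x.copy()); X0s.append(X0.copy()); Xps.append(Xp.copy())
        return np.array(xs), np.array(X0s), np.array(Xps)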

Note that, for simplicity of notation, we let A′(p) x(t) stand for the Jacobian matrix of the mapping p ↦ A(p) x(t) with respect to p while holding x(t) constant. Using the chain rule (Magnus and Neudecker, 1999, Thm. 12, p. 108), we easily deduce that

  A′(p) x(t) = ∂(A(p) x)/∂p |_{x=x(t)} = (x(t)^⊤ ⊗ id_n) ∂vec A(p)/∂p = [∂A(p)/∂p_1 ⋯ ∂A(p)/∂p_q] (id_q ⊗ x(t)),   (2.4)

where vec is the column-stacking operator and ⊗ signifies the Kronecker product. Due to the linearity of the output equation (1.2), the sensitivity of the model output to changes in (x_0, p) is given by

  ∂(C_y x(t_j; x_0, p))/∂(x_0, p) = C_y [X_0(t_j)  X_p(t_j)] ∈ R^{m×(n+q)},  j = 1, …, N.   (2.5)

The data assimilation problem (2.1) can be written as a weighted least-squares problem of the form

  min_{x_0 ∈ R^n, p ∈ R^q} ½ r(x_0, p)^⊤ H r(x_0, p)   (2.6)

with the residual vector

  r(x_0, p) = [x_0 − x_0^bg;  p − p^bg;  y_1 − C_y x(t_1; x_0, p);  …;  y_N − C_y x(t_N; x_0, p)]

and the symmetric nonnegative definite weight matrix

  H = diag(V_{x_0}^{-1}, V_p^{-1}, V_y^{-1}, …, V_y^{-1}) ∈ R^{(n+q+Nm)×(n+q+Nm)},   (2.7)

where the block V_y^{-1} is repeated N times. The Jacobian of the residual can be computed from the sensitivities defined above in the following way:

  J(x_0, p) = ∂r(x_0, p)/∂(x_0, p) =
    [ id_n            0
      0               id_q
      −C_y X_0(t_1)   −C_y X_p(t_1)
      ⋮               ⋮
      −C_y X_0(t_N)   −C_y X_p(t_N) ].   (2.8)
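As an illustration of how (2.6)-(2.8) are used, here is a hedged sketch of a single Gauss-Newton step; the callables residual and jacobian, which are assumed to be available, evaluate (2.7) and (2.8):

    import numpy as np

    # Sketch of one Gauss-Newton step for the weighted least-squares problem (2.6).
    # residual(x0, p) returns the stacked vector from (2.7); jacobian(x0, p)
    # returns the matrix (2.8); H is the block-diagonal weight matrix from (2.7).
    def gauss_newton_step(x0, p, residual, jacobian, H):
        r = residual(x0, p)
        J = jacobian(x0, p)
        g = J.T @ (H @ r)              # gradient of the objective in (2.6)
        G = J.T @ H @ J                # Gauss-Newton approximation of the Hessian
        d = np.linalg.solve(G, -g)     # in practice, add damping (Levenberg-Marquardt)
        n = x0.size
        return x0 + d[:n], p + d[n:]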

Notice that for a large state dimension n, the sensitivity trajectory X_0 : [0, t_f] → R^{n×n} will be of a formidable size. Also, since the number of model outputs and measurements, N·m, is typically smaller than the number of unknowns n + q, it is more economical to evaluate the Jacobian using the adjoint technique. We will now show that one single adjoint variable S : [0, t_f] → R^{n×m} is enough to attain this objective. To this end, consider one typical output estimate ŷ_j = C_y x(t_j; x_0, p) of the actual output y_j. Adjoin (1.1) to this estimate with an arbitrary time-varying Lagrange multiplier matrix S_j(t) ∈ R^{n×m} as follows:

  ŷ_j = C_y x(t_j; x_0, p) − ∫_0^{t_j} S_j(t)^⊤ [E ẋ(t; x_0, p) − A(p) x(t; x_0, p) − f(t)] dt,   (2.9)

where the bracketed term vanishes along solutions of (1.1). Let us integrate the S_j(t)^⊤ E ẋ(t) term in (2.9) by parts, yielding

  ŷ_j = [C_y − S_j(t_j)^⊤ E] x(t_j) + S_j(0)^⊤ E x(0) + ∫_0^{t_j} [Ṡ_j(t)^⊤ E + S_j(t)^⊤ A(p)] x(t) dt + ∫_0^{t_j} S_j(t)^⊤ f(t) dt.   (2.10)

Differentiating both sides of (2.10) with respect to x_0, we thus get

  C_y X_0(t_j) = ∂ŷ_j/∂x_0 = [C_y − S_j(t_j)^⊤ E] X_0(t_j) + S_j(0)^⊤ E X_0(0) + ∫_0^{t_j} [Ṡ_j(t)^⊤ E + S_j(t)^⊤ A(p)] X_0(t) dt.   (2.11)

To avoid having to determine the function X_0(t), we choose the multiplier function S_j(t) so that the coefficients of X_0(t) and X_0(t_j) vanish, i.e., we specify it as the solution to the following backwards-in-time adjoint differential equation:

  E^⊤ Ṡ_j(t) = −A(p)^⊤ S_j(t),  t ∈ [0, t_j],   E^⊤ S_j(t_j) = C_y^⊤.   (2.12)

Equation (2.11) then becomes

  C_y X_0(t_j) = S_j(0)^⊤ E X_0(0) = S_j(0)^⊤ E.   (2.13)

In turn, differentiating both sides of (2.10) with respect to p, we get

  C_y X_p(t_j) = ∂ŷ_j/∂p = [C_y − S_j(t_j)^⊤ E] X_p(t_j) + S_j(0)^⊤ E X_p(0) + ∫_0^{t_j} [Ṡ_j(t)^⊤ E + S_j(t)^⊤ A(p)] X_p(t) dt + ∫_0^{t_j} S_j(t)^⊤ (A′(p) x(t)) dt.   (2.14)

But on account of (2.12) and the initial condition for (2.3), this simplifies to

  C_y X_p(t_j) = ∫_0^{t_j} S_j(t)^⊤ (A′(p) x(t)) dt.   (2.15)

Consequently, the block row of the Jacobian (2.8) associated with the output at time t_j can be expressed as

  [C_y X_0(t_j)  C_y X_p(t_j)] = [S_j(0)^⊤ E   ∫_0^{t_j} S_j(t)^⊤ (A′(p) x(t)) dt].

It is now important to observe that (2.12) is an autonomous system. Therefore, S_j(t) = S_k(t − t_j + t_k) holds whenever both are defined. We conclude that in place of N different systems of type (2.12) it is enough to consider a single adjoint system for the adjoint state S : [0, t_f] → R^{n×m},

  E^⊤ Ṡ(t) = −A(p)^⊤ S(t),  t ∈ [0, t_f],   E^⊤ S(t_f) = C_y^⊤.   (2.16)

Since S_j(t) = S(t − t_j + t_f) holds, each block row of the Jacobian can be evaluated according to

  [C_y X_0(t_j)  C_y X_p(t_j)] = [S(t_f − t_j)^⊤ E   ∫_0^{t_j} S(t − t_j + t_f)^⊤ (A′(p) x(t)) dt].   (2.17)

We provide in Table 3.1 an overview of the quantities required during the solution of the data assimilation problem (2.1) by gradient-based methods.

3. SENSOR PLACEMENT PROBLEM

3.1. COVARIANCE OF THE QOI ESTIMATOR

Having solved the data assimilation problem (2.1), we obtain estimates x̂_0 and p̂ of the sought true values x̄_0 and p̄, respectively. In the sequel, we shall concatenate x_0 and p so as to have only one vector of unknown true parameters θ̄ = (x̄_0, p̄) and its estimate θ̂ = (x̂_0, p̂). As was mentioned in the introduction, our main concern is not to estimate the unknown initial state x_0 or the parameter vector p directly, but rather to estimate a quantity of interest z depending on the terminal state x(t_f) at time t_f,

  z = C_z x(t_f; θ̄) ∈ R^r   (3.1)

through

  ẑ = C_z x(t_f; θ̂) ∈ R^r   (3.2)

with r small compared with the dimension n of the state variable. To be able to assess the quality of the estimator (3.2), we investigate the expected dispersion of the estimates produced by it, which is quantified by the covariance matrix Cov(ẑ). Clearly, the QOI z depends on the unknowns (x_0, p) in an indirect way, and its dependence on p is nonlinear. Therefore, obtaining an expression for the covariance matrix of the estimator ẑ is a real challenge. That is why we follow here a standard approach in the literature, cf. Mehra (1974), and resort to the covariance of a linearized estimator, which is obtained by linearizing the parameter-to-QOI map. This approach is backed up by asymptotic considerations; see for instance (Pronzato and Pázman, 2013, Chapter 3).

From now on, let θ_0 = (x_0^0, p^0) denote a given set-point in the parameter space (we may set θ_0 = θ^bg = (x_0^bg, p^bg)), where (3.1) is linearized. An application of the chain rule to (3.1) and (2.2)-(2.3) shows that this linearization is given by the matrix

  Q = ∂z/∂θ |_{θ=θ_0} = C_z X(t_f; θ_0) ∈ R^{r×d},   (3.3)

where here and subsequently for abbreviation we write d = n + q and X(t; θ) = [X_0(t; θ)  X_p(t; θ)]. Consequently, the covariance of the linearized QOI estimator is related via

  Cov(ẑ) = Q Cov(θ̂) Q^⊤   (3.4)

to the covariance Cov(θ̂) of the parameter estimator θ̂. Throughout the paper we assume that the matrix Q has full row rank:

  rank Q = r.   (3.5)

In order to form the matrix Q, we exploit the similarity of (3.3) and (2.5) and follow an adjoint approach. To be precise, we solve the additional adjoint system for S_Q : [0, t_f] → R^{n×r},

  E^⊤ Ṡ_Q(t) = −A(p_0)^⊤ S_Q(t),  t ∈ [0, t_f],   E^⊤ S_Q(t_f) = C_z^⊤,   (3.6)

and evaluate

  Q = [C_z X_0(t_f; θ_0)  C_z X_p(t_f; θ_0)] = [S_Q(0)^⊤ E   ∫_0^{t_f} S_Q(t)^⊤ (A′(p_0) x(t)) dt].   (3.7)

The problem of characterizing and evaluating Cov(θ̂) has been extensively investigated by researchers concerned with variational data assimilation. Gejadze et al. outlined an approach to obtain approximations to the covariance matrices for the data assimilation

problem in which either the initial state x_0 or the parameter vector p are unknowns, see Gejadze et al. (2008) and Gejadze et al. (2010), respectively, as well as Gejadze et al. (2013); Gejadze and Shutyaev (2012). It is rather straightforward to combine these results in our problem of joint estimation of x_0 and p, thereby obtaining

  Cov(θ̂) ≈ (V_θ^{-1} + Σ_{j=1}^N X(t_j)^⊤ C_y^⊤ V_y^{-1} C_y X(t_j))^{-1},   (3.8)

where V_θ = diag(V_{x_0}, V_p) and X(t_j) = X(t_j; θ̄). The dependence of the right-hand side on the true vector θ̄ is not surprising, as this is the rule whenever estimates of the covariance matrices of various estimators are constructed in settings where the outputs depend nonlinearly on the estimated parameters. Clearly, we do not know θ̄ and, in practice, we approximate it by a preliminary estimate θ_0 (e.g., a logical choice is θ_0 = θ^bg).

Quantity               | defined in | evaluate using | requires
r(x_0, p) (residual)   | (2.7)      | (2.7)          | solution x of (1.1)
J(x_0, p) (Jacobian)   | (2.8)      | (2.17)         | solution S of (2.16)
Q (Jacobian of QOI)    | (3.3)      | (3.7)          | solution S_Q of (3.6)

Table 3.1: Overview of quantities for the solution of the data assimilation and the sensor placement problem, and how to evaluate them efficiently.

3.2. THE CRITERION TO BE OPTIMIZED

Our optimal design problem consists in determining an m-element subset selected out of a total of n state variables which would yield the lowest variability in the estimates of the QOI, as measured by the covariance matrix (3.4). In order to express this formally, we define as decision variable the n-dimensional vector w whose component w_i is one if x_i is supposed to be measured and zero if x_i is not going to be measured. In consequence, the observation matrix takes the form

  C_y(w) = D(diag(w)),   (3.9)

where D stands for the operation of forming a submatrix of its matrix argument by deleting all zero rows. Since we assume that the measurements of the observed state components are independent of one another and taken by equally accurate sensors, i.e., V_y = σ² id_m for some known variance σ², it follows that

  Cov(θ̂) ≈ I(w)^{-1},   (3.10)

where

  I(w) = V_θ^{-1} + (1/σ²) Σ_{j=1}^N X(t_j)^⊤ diag(w) X(t_j) = V_θ^{-1} + Σ_{i=1}^n w_i Υ_i,   (3.11)

  Υ_i = (1/σ²) Σ_{j=1}^N row_i(X(t_j))^⊤ row_i(X(t_j)),  i = 1, …, n.   (3.12)

Here row_i signifies the i-th row of its matrix argument. We call I(w) the Bayesian information matrix for θ, cf. Chepuri and Leus (2015). Observe that the positive definiteness of V_{x_0} and V_p implies that of V_θ, and this, in turn, forces I(w) to be positive definite (since the term Σ_{i=1}^n w_i Υ_i is nonnegative definite). Consequently, there is no problem with the inversion of I(w).

For the intended search for an optimal w, we have to introduce an appropriate optimality criterion. As nonnegative definite matrices can only be partially ordered, instead of directly comparing the covariance matrices for different choices of the output matrix, a scalar performance index Ψ defined on Cov(θ̂) can be used here. Thus, our sensor selection problem can ultimately be expressed as the following optimization problem:

Problem 3.1 (Sensor Selection Problem). Find a vector w_bin ∈ R^n to minimize

  J(w) = Ψ(Q I(w)^{-1} Q^⊤)   (3.13)

subject to the constraints

  1_n^⊤ w = m,   (3.14)
  w_i ∈ {0, 1},  i = 1, …, n.   (3.15)

In the role of Ψ, various alphabetical optimality criteria commonly used in experimental design can be considered. Specifically, three possible criteria follow:

(i) D_Q-optimality (or generalized D-optimality), which corresponds to Ψ = log det,

  J(w) = log det(Q I(w)^{-1} Q^⊤),   (3.16)

(ii) A_Q-optimality (or generalized A-optimality), which corresponds to Ψ = trace,

  J(w) = trace(Q I(w)^{-1} Q^⊤),   (3.17)

(iii) E_Q-optimality (or generalized E-optimality), which corresponds to Ψ = λ_max,

  J(w) = λ_max(Q I(w)^{-1} Q^⊤),   (3.18)

where λ_max denotes the maximal eigenvalue of its matrix argument. See (Atkinson et al., 2007, p. 137) or (Silvey, 1980, p. 10) for a justification of this terminology and notation. Different optimality criteria may produce different solutions to Problem 3.1, but this results from their slightly different interpretations in terms of the uncertainty ellipsoid for the estimates ẑ. Roughly speaking, a D_Q-optimum design minimizes its volume, an A_Q-optimum design suppresses the mean squared length of its axes, and an E_Q-optimum design minimizes the length of its largest axis. In what follows, our attention will be focused on the D_Q-optimality criterion (3.16). Note that the assumption (3.5) implies

  rank(Q I(w)^{-1} Q^⊤) = rank Q = r,   (3.19)

see, e.g., the Range Inclusion Lemma in (Pukelsheim, 2006, p. 17), which clearly demonstrates that Q I(w)^{-1} Q^⊤ is always nonsingular.

3.3. RELAXED SENSOR SELECTION PROBLEM

Owing to the combinatorial nature of Problem 3.1, which may make its solution intractable even for small-scale problems, we relax it by replacing the non-convex Boolean constraints w_i ∈ {0, 1} with the convex box constraints w_i ∈ [0, 1]. Thus we get the following convex relaxed sensor selection problem:

Problem 3.2 (Relaxed Sensor Selection Problem). Find a vector w ∈ R^n to minimize

  J(w) = Ψ(Q I(w)^{-1} Q^⊤) = Ψ(Q (V_θ^{-1} + Σ_{i=1}^n w_i Υ_i)^{-1} Q^⊤)   (3.20)

subject to the constraints

  1_n^⊤ w = m,   (3.21)
  0 ≤ w_i ≤ 1,  i = 1, …, n.   (3.22)

It goes without saying that the above relaxed problem is not equivalent to the original one, as some components of the computed optimal solution w* may be fractional rather than binary. It is, however, by no means useless, as J(w*) constitutes a lower bound for J(w*_bin), the optimal value of Problem 3.1. What is more, by rounding the m largest components of w* up to one and the remaining components down to zero, we can produce a suboptimal solution for Problem 3.1. This option is typical for sensor selection problems, see, e.g., Joshi and Boyd (2009). What is more, solutions to Problem 3.2 can be embedded into a general branch-and-bound scheme to yield a solution w*_bin, see (Uciński and Patan, 2007) for details.

Problem 3.2 possesses a number of notable features which, in theory, should make its solution straightforward. First of all, note that the performance index J(w) is convex over the convex feasible set W defined by the constraints (3.21) and (3.22), being the

intersection of a hyperplane and a hyperbox. The convexity results from the fact that, under the assumption (3.5), the mapping Φ : M ↦ log det(Q M^{-1} Q^⊤) is convex on the set of positive definite d×d matrices (Marshall et al., 2011, Theorem 16.F.4, p. 688). What is more, J is differentiable with

  ∇J(w) = −φ(w),  φ(w) := [φ_1(w), …, φ_n(w)]^⊤,   (3.23)

where

  φ_i(w) = −trace(Φ′(I(w)) Υ_i) ∈ R,   (3.24)

and where Φ′(X) = dΦ(X)/dX signifies the matrix derivative of Φ at a matrix argument X ∈ R^{d×d}, i.e., the d×d matrix whose (i, j) entry is ∂Φ(X)/∂X_{(j,i)}, cf. (Bernstein, 2005, p. 410). As I ∈ R^{d×d} is positive definite, we have

  Φ′(I) = d/dI log det(Q I^{-1} Q^⊤) = −I^{-1} Q^⊤ (Q I^{-1} Q^⊤)^{-1} Q I^{-1},   (3.25)

cf. (Bernstein, 2005, p. 411). Substituting this into (3.24) and using the cyclic commutativity of the trace of a product of matrices, we get

  φ_i(w) = trace((Q I(w)^{-1} Q^⊤)^{-1} Q I(w)^{-1} Υ_i I(w)^{-1} Q^⊤),  i = 1, …, n.   (3.26)

As the feasible set W is a rather nice convex set, numerous computational methods can potentially be employed for solving Problem 3.2, e.g., the conditional gradient method or a gradient projection method. Unfortunately, if the number of support points n is large, which is a rather common situation in applications, then these algorithms require additional implementation effort in order to avoid unsatisfactory computational times. On the other hand, an extremely simple multiplicative algorithm (Silvey et al., 1978; Yu, 2010) is available to maximize the D_Q-optimality criterion over the canonical simplex. Its idea is reminiscent of the EM algorithm used for maximum likelihood estimation, and a decisive advantage is its ease of implementation. In what follows, it will be shown how this multiplicative algorithm can be built into a very simple and efficient computational scheme which takes account of the additional upper-bound constraint in (3.22). The principal tool in its construction will be simplicial decomposition.

4. SIMPLICIAL DECOMPOSITION FOR PROBLEM 3.2

4.1. ALGORITHM MODEL

Simplicial decomposition (SD) proved extremely useful for large-scale pseudoconvex programming problems encountered, e.g., in traffic assignment or other network flow

problems (Patriksson, 1999). In its basic form, it proceeds by alternately solving linear and nonlinear programming subproblems, called the column generation problem (CGP) and the restricted master problem (RMP), respectively. In the RMP, the original problem is relaxed by replacing the original constraint set W with an inner approximation, namely the convex hull of a finite set of feasible solutions. In the CGP, this inner approximation is improved by incorporating a point in the original constraint set that lies furthest along the descent direction computed at the solution of the RMP. This basic strategy has been discussed and extended in numerous references (Bertsekas, 2015; Patriksson, 1999). A marked characteristic of the SD method is that the sequence of solutions to the RMP tends to a solution of the original problem in such a way that the objective function strictly monotonically approaches its optimal value. The SD algorithm may be viewed as a form of modular nonlinear programming, provided that one has an effective computer code for solving the RMP, as well as access to a code which can take advantage of the linearity of the CGP. One of the aims of this paper is to show that this is the case within the framework of Problem 3.2. What is more, since we deal with minimization of the convex function J over a bounded polyhedral set W, this will automatically imply the convergence of the resulting SD scheme in a finite number of RMP steps (Bertsekas, 2015). Tailoring the SD scheme to our needs, we obtain Algorithm 1. In the sequel, its consecutive steps will be discussed in turn.

4.2. CHARACTERIZATION OF THE OPTIMAL DESIGN AND TERMINATION OF ALGORITHM 1

In the original SD setting, the criterion for terminating the iterations is checked only after solving the column generation problem. The computation is then stopped if the current point w^(k) satisfies the condition of nondecrease, to first order, of the performance measure over the whole constraint set, i.e.,

  min_{w∈W} ∇J(w^(k))^⊤ (w − w^(k)) ≥ 0.   (4.8)

The condition (4.4) is less costly in terms of the number of floating-point operations. It results from the following characterization of w* with the property that J(w*) = min_{w∈W} J(w).

Theorem 4.1. A vector w* constitutes a global minimum of J over W if and only if there exists a number λ such that

  φ_i(w*) ≥ λ if w*_i = 1,
  φ_i(w*) = λ if 0 < w*_i < 1,   (4.9)
  φ_i(w*) ≤ λ if w*_i = 0,

for i = 1, …, n.

Algorithm 1 Algorithm model for solving Problem 3.2 via simplicial decomposition.

Step 0: (Initialization) Guess an initial solution w^(0) ∈ W such that I(w^(0)) is nonsingular. Set I = {1, …, n}, G^(0) = {w^(0)} and k = 0.

Step 1: (Termination check) Set

  I_ub^(k) = { i ∈ I : w_i^(k) = 1 },   (4.1)
  I_im^(k) = { i ∈ I : 0 < w_i^(k) < 1 },   (4.2)
  I_lb^(k) = { i ∈ I : w_i^(k) = 0 }.   (4.3)

If

  φ_i(w^(k)) ≥ λ for i ∈ I_ub^(k),   φ_i(w^(k)) = λ for i ∈ I_im^(k),   φ_i(w^(k)) ≤ λ for i ∈ I_lb^(k)   (4.4)

for some positive λ, then STOP; w^(k) is optimal.

Step 2: (Solution of the column generation subproblem, CGP) Compute

  g^(k+1) = arg max_{w∈W} φ(w^(k))^⊤ w.   (4.5)

If g^(k+1) ∈ conv(G^(k)), then STOP. Otherwise set

  G^(k+1) = G^(k) ∪ {g^(k+1)}.   (4.6)

Step 3: (Solution of the restricted master subproblem, RMP) Find

  w^(k+1) = arg min_{w ∈ conv(G^(k+1))} Ψ(Q I(w)^{-1} Q^⊤)   (4.7)

and purge G^(k+1) of all extreme points with zero weights in the resulting expression of w^(k+1) as a convex combination of elements of G^(k+1). Increment k by one and go back to Step 1.
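For orientation, a condensed sketch of Algorithm 1 follows. It is not the authors' implementation; phi (evaluating the vector (3.26)) and solve_rmp (Step 3, e.g., realized by the multiplicative Algorithm 2 presented later) are assumed to be given, and solve_rmp is assumed to return both the convex weights over the current columns and the combined point:

    import numpy as np

    # Condensed, illustrative sketch of the simplicial decomposition loop.
    def simplicial_decomposition(w0, m, phi, solve_rmp, max_iter=40):
        w, G = w0.copy(), [w0.copy()]
        for _ in range(max_iter):
            c = phi(w)
            g = np.zeros_like(w)
            g[np.argsort(c)[-m:]] = 1.0   # CGP: unit weights on the m largest c_i
            if any(np.array_equal(g, h) for h in G):
                break                     # repeated column: no further progress
            G.append(g)
            v, w = solve_rmp(G)           # RMP over conv(G)
            G = [h for h, vj in zip(G, v) if vj > 1e-8]  # purge zero-weight columns
        return w

The closed-form treatment of the CGP in the third line of the loop body is justified by Theorem 4.2 below.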

The proof of this result proceeds in much the same way as that of Proposition 1 in (Uciński and Patan, 2007).

4.3. SOLUTION OF THE COLUMN GENERATION SUBPROBLEM

In Step 2 of Algorithm 1 we deal with the linear programming problem

  maximize c^⊤ w subject to w ∈ W,   (4.10)

where c = φ(w^(k)), in which the feasible region is defined by the 2n bound constraints (3.22) and one equality constraint (3.21). Making use of this special form of the constraints, we can develop an algorithm to solve this problem which is almost as simple as a closed-form solution. The key idea is to make use of the following assertion, which can be demonstrated in much the same way as Theorem 4.1.

Theorem 4.2. A vector g ∈ W constitutes a global solution to the problem (4.10) if and only if there exists a scalar ρ such that

  c_i ≥ ρ if g_i = 1,
  c_i = ρ if 0 < g_i < 1,   (4.11)
  c_i ≤ ρ if g_i = 0,

for i = 1, …, n.

We thus see that, in order to solve (4.10), it is sufficient to pick the m largest components c_i of c and set the corresponding weights g_i to one, and the remaining weights to zero.

4.4. SOLUTION OF THE RESTRICTED MASTER SUBPROBLEM

Suppose that in the (k+1)-th iteration of Algorithm 1, we have

  G^(k+1) = { g^1, …, g^l },   (4.12)

possibly with l < k + 1 owing to the built-in mechanism of deleting points in G^(j), 1 ≤ j ≤ k, which did not contribute to the convex combinations yielding the corresponding iterates w^(j). Step 3 of Algorithm 1 involves minimization of the design criterion (3.20) over

  conv(G^(k+1)) = { Σ_{j=1}^l v_j g^j : Σ_{j=1}^l v_j = 1, v_j ≥ 0, j = 1, …, l }.   (4.13)

From the representation of any w ∈ conv(G^(k+1)) as

  w = Σ_{j=1}^l v_j g^j,   (4.14)

or, in componentwise form,

  w_i = Σ_{j=1}^l v_j g_i^j,  i = 1, …, n,   (4.15)

with g_i^j being the i-th component of g^j, it follows that

  I(w) = V_θ^{-1} + Σ_{i=1}^n w_i Υ_i = Σ_{j=1}^l v_j (V_θ^{-1} + Σ_{i=1}^n g_i^j Υ_i) = Σ_{j=1}^l v_j I(g^j).   (4.16)

19 or, in component-wise form, w i = g j i being the i-th component of gj, it follows that I(w) = V 1 n θ + i=1 w i Υ i = l v j g j i, i = 1,..., n, (4.15) j=1 l ( v j V 1 j=1 n θ + i=1 ) g j i Υ i = l v j I(g j ). (4.16) j=1 From this, we see that the RMP can equivalently be formulated as the following problem: Problem 4.3. Find the sequence of weights v R l to minimize subject to the constraints P(w) = log det ( Q H(v) 1 Q ) (4.17) 1 l v = 1, (4.18) v j 0, j = 1,..., l (4.19) where H(v) = l v j H j, H j = I(g j ), j = 1,..., l. (4.20) j=1 Basically, since the constraints (4.18) and (4.19) define the probability simplex in R l, i.e., a very nice convex feasible domain, it is intuitively appealing to determine optimal weights using a numerical algorithm specialized for solving convex optimization problems. Note, however, that this formulation has already captured close attention in optimum experimental design theory, where various characterizations of optimal solutions and efficient computational schemes have been proposed (Atkinson et al., 2007). In particular, in the case of the D Q -optimality criterion studied here, we can employ the General Equivalence Theorem of (Uciński, 2005, Theorem 3.2, p. 48) to get the following conditions for global optimality: Theorem 4.4. A vector v constitutes a global solution to Problem 4.3 if and only if ψ j (v ) { = r if v j > 0, r if v j = 0 (4.21) for each j = 1,..., l, where ψ j (v) = trace (( Q H(v) 1 Q ) 1 Q H(v) 1 H j H(v) 1 Q ), j = 1,..., l. (4.22) 19

A very simple multiplicative algorithm (Yu, 2010) can be adapted to the above RMP. It is summarized in Algorithm 2. Although only its monotonicity can be proven for the D_Q-optimality criterion, and not global convergence, cf. (Yu, 2010), in practice it behaves flawlessly. As an alternative, an interior-point method has recently been proposed by Lu and Pong (2013), for which global convergence is guaranteed, but at the cost of a much more complicated implementation.

Algorithm 2 Algorithm model for the restricted master problem.

Step 0: (Initialization) Select a weight vector v^(0) with positive components which sum up to one, e.g., set v^(0) = (1/l) 1_l. Set κ = 0.

Step 1: (Termination check) If

  (1/r) ψ(v^(κ)) ≤ 1_l   (4.23)

holds componentwise, then STOP.

Step 2: (Multiplicative update) Evaluate

  v^(κ+1) = (1/r) ψ(v^(κ)) ∘ v^(κ).   (4.24)

Increment κ by one and go to Step 1.

5. APPLICATION TO A THERMO-MECHANICAL SYSTEM

In this section, we describe in more detail the application of the sensor placement procedure to a certain thermo-mechanical system. To be more precise, we consider the temperature evolution T(x, t) of the machine tool column depicted in Figure 5.1. We denote the solid body of the machine column by Ω and its surface by Γ. The temperature evolution is governed by the linear heat equation

  ρ c_p Ṫ − div(λ ∇T) = 0 in Ω × (0, t_f),
  λ ∂_n T + α(x) (T − T_ref) = r(x, t) on Γ × (0, t_f),   (5.1)
  T(x, 0) = T_0(x) in Ω.

The boundary conditions represent a simplified model for the heat transfer occurring at the different parts of the machine's surface. Since the underlying heat transfer mechanism includes both convective and radiative phenomena, the value of the effective coefficient α(x) is considered unknown and also dependent on the spatial position x. We

make here the following ansatz:

  α(x) := Σ_{k=1}^q α_k χ_k(x),   (5.2)

where each χ_k is an indicator function with values in {0, 1} which selects a certain portion of the machine's surface Γ. Here the surface of the machine is divided into five parts. The value of α is fixed to zero on those two areas where the two heat sources act, which are expressed through the right-hand side r(x, t). The heat sources are assumed to be known and are described in Section 6, where numerical results are presented. They originate, on the one hand, from an electrical drive mounted on top of the machine column and, on the other hand, from the spindle driving the horizontal movement of the column, see Figure 5.1(c). On the remaining q = 4 parts of the surface, the heat transfer coefficients α_1, …, α_4 need to be estimated, but some background information α^bg is available. We have 12 W K⁻¹ m⁻² on the vertical surfaces, 10 W K⁻¹ m⁻² and 8 W K⁻¹ m⁻² on the horizontal planes with the outer normal facing upwards and downwards, respectively, and 5 W K⁻¹ m⁻² on all enclosed surfaces, including the inner surfaces of the cavities; see Figure 5.1(c). At those surface parts where the electrical drives are mounted, the heat transfer coefficient α(x) is zero. All symbols occurring in (5.1) are summarized in Table 5.1.

Figure 5.1: Auerbach ACW 630 machine column. (a) Photograph of the machine column. (b) CAD model with mounting points determining the TCP location. (c) Background values of α^bg.

We now switch to a spatial finite element model of (5.1) with respect to a basis {φ_i}, i = 1, …, n. In our computations, we are using the standard nodal basis composed of piecewise linear, continuous elements on a tetrahedral grid of the geometry depicted in Figure 5.1(b). In a slight abuse of notation, we denote the coefficient vector representing the temperature field T also by T. By converting (5.1) to its weak formulation and restricting it to

Symbol | Meaning                              | Value      | Units
T      | temperature                          |            | K
r      | thermal surface load                 |            | W m⁻²
ρ      | density                              |            | kg m⁻³
c_p    | specific heat at constant pressure   | 500        | J kg⁻¹ K⁻¹
λ      | thermal conductivity                 | 46.8       | W K⁻¹ m⁻¹
T_ref  | ambient temperature                  | 20         | °C
α^bg   | background information on α          | 0 to 12    | W K⁻¹ m⁻²
α      | heat transfer coefficient            | unknown    | W K⁻¹ m⁻²
T_0    | initial temperature                  | unknown    | K

Table 5.1: Table of symbols associated with the thermal model.

the finite element space, we arrive at the following semi-discretized version of (5.1):

  M Ṫ(t) + K T(t) + Σ_{k=1}^q α_k M_k (T(t) − T_ref) = r(t),  t ∈ [0, t_f],   T(0) = T_0.   (5.3)

Here M and M_k denote mass and boundary mass matrices, respectively, and K is the stiffness matrix:

  M = (ρ c_p ∫_Ω φ_i φ_j dx)_{i,j},   M_k = (∫_Γ φ_i φ_j χ_k ds)_{i,j},   K = (λ ∫_Ω ∇φ_i · ∇φ_j dx)_{i,j}

with indices i, j = 1, …, n. In a slight abuse of notation, T_ref also denotes a coefficient vector in R^n with identical entries. The right-hand side vector r(t) represents the load vector generated by the given boundary heat sources:

  r(t) = (∫_Γ r(x, t) φ_j ds)_j.

Finally, we recall that the coefficient vector T_0 representing the initial temperature distribution

  T_0(x) = Σ_{j=1}^n T_{0,j} φ_j(x)

is unknown. It is clear that the finite element model (5.3) is of the form (1.1) when the identifications given in Table 5.2 are made.

Our model output y(t) = C_y T(t), which is adjusted to the temperature measurements during the data assimilation process, is described by the measurement matrix C_y. In the present setting, we wish to use as potential measurement locations all finite element

mesh nodes which are located on the surface of the machine column. Therefore, C_y is composed of all rows of the n×n identity matrix corresponding to the surface degrees of freedom.

The specific form of the adjoint system (2.16) reads

  M Ṡ(t) = K S(t) + Σ_{k=1}^q α_k M_k S(t),  t ∈ [0, t_f],   M S(t_f) = C_y^⊤.   (5.4)

Notice that the symmetry of M and K has been used. The block rows of the Jacobian according to (2.17) are

  [S(t_f − t_j)^⊤ M   −∫_0^{t_j} S(t − t_j + t_f)^⊤ [M_1 T(t) ⋯ M_4 T(t)] dt].   (5.5)

Symbol in (1.1) | Symbol in (5.3)                     | Remark
x               | T                                   |
p               | α                                   | unknown
x_0             | T_0                                 | unknown
E               | M                                   |
A(p)            | −K − Σ_{k=1}^q α_k M_k              |
f(t)            | r(t) + Σ_{k=1}^q α_k M_k T_ref      |

Table 5.2: Correspondence of symbols in the general dynamical system (1.1) and the finite element model of the heat equation (5.3).

We recall that our emphasis is not on the estimation of the temperature distribution of the machine, but rather on the estimation of the QOI, i.e., the temperature-induced displacement of a certain reference point of the machine structure at time t_f. The overall displacement field is governed by a quasi-static linear elasticity model, since the time scale of the heat equation is unable to generate wave motion in the machine structure. The linear elasticity model is based on the balance of forces,

  div σ(ε(u), T(t_f)) = 0 in Ω.   (5.6)

We employ an additive split of the stress tensor σ into its mechanically and thermally induced parts. (An alternative, equivalent approach would apply such a split to the strains.) Together with the usual homogeneous and isotropic stress-strain relation, we obtain the following constitutive law; see (Boley and Weiner, 1960, Section 1.12), (Eslami et al., 2013, Section 2.8):

  σ(ε(u), T(t_f)) = σ_el(ε(u)) + σ_th(T(t_f)),
  σ_el(ε(u)) = E/(1 + ν) ε(u) + E ν/((1 + ν)(1 − 2ν)) trace(ε(u)) id_3,   (5.7)
  σ_th(T(t_f)) = −E/(1 − 2ν) β (T(t_f) − T_ref) id_3.

Herein, ε denotes the linearized strain tensor

  ε(u) = ½ (∇u + ∇u^⊤).

The elasticity modulus E and Poisson ratio ν of the cast iron machine column are given. For convenience, all quantities relevant for the displacement model are summarized in Table 5.3.

Symbol | Meaning                                      | Units
u      | displacement                                 | m
σ      | stress                                       | N m⁻²
ε      | strain                                       | 1
ν      | Poisson's ratio                              |
E      | modulus of elasticity                        | N m⁻²
β      | thermal volumetric expansion coefficient     | K⁻¹
L      | length of the main spindle                   | m
l      | auxiliary quantity, see Appendix A           | m
σ      | standard deviation of temperature sensors    | K

Table 5.3: Table of symbols associated with the displacement model.

We continue with a specification of the mechanical boundary conditions for the elasticity equation (5.6)-(5.7). The machine column is free to move in the X-direction on the rail by which it connects to the machine bed, see Figure 5.1(a). Movements in Y- and Z-directions are prohibited. Moreover, the machine column is connected by a spindle nut to the spindle in the machine bed which drives the horizontal movement during operation. This leads to the following mixture of essential and natural boundary conditions for (5.6)-(5.7):

  u_2 = 0, u_3 = 0, [σ n]_1 = 0 on Γ_rail,
  u = 0 on Γ_nut,   (5.8)
  σ n = 0 on Γ \ (Γ_nut ∪ Γ_rail).

The third boundary condition expresses the absence of boundary loads on the remainder of the surface.

We discretize (5.6)-(5.8) by standard nodal (vector-valued) linear finite elements on the same mesh employed for the discretization of the heat equation (5.1). This leads to a stationary, discrete problem of the following form:

  K u + F (T(t_f) − T_ref) = 0,   (5.9)

where K now denotes the elasticity stiffness matrix and F is a matrix associated with the thermally induced stress. Clearly, the solution map T(t_f) ↦ u taking the terminal temperature to the induced displacement is affine.

Our quantity of interest z = u(x_TCP) ∈ R^r with r = 3 is the displacement at a certain reference point x_TCP, the tool center point. As TCP, we use the tip of the main spindle (holding the tool) seen in the left of Figure 5.1(a). We consider the main spindle assembly as a rigid body which is thermally insulated from the machine column. Consequently, the TCP displacement is determined by the displacements at the four mounting points x_1, …, x_4 of the sledge holding the main spindle, see Figure 5.1(b). The dependence u(x_TCP) = N(u(x_1), …, u(x_4)) is nonlinear, and we refer the reader to (Herzog and Riedel, 2015, Section 3.2) for more details. Here we are only interested in the linearization C_z of the map T(t_f) ↦ u ↦ u(x_TCP) described by (5.9), taken at the constant reference temperature T_ref. By the chain rule, it is evident that

  C_z = −N′(0) K^{-1} F ∈ R^{3×n}

holds. The specific form of N′(0) is given in Appendix A. Clearly, it is advantageous to evaluate C_z in an adjoint fashion according to C_z^⊤ = −F^⊤ K^{-1} N′(0)^⊤. This amounts to the solution of only r = 3 adjoint elasticity equations with point sources acting in x_1, …, x_4. With the matrix C_z available, the output matrix Q can be evaluated by solving the adjoint system (3.6) and applying (3.7). For the forward system (5.3) under consideration, this amounts to solving

  M Ṡ_Q(t) = K S_Q(t) + Σ_{k=1}^q α_k M_k S_Q(t),  t ∈ [0, t_f],   M S_Q(t_f) = C_z^⊤   (5.10)

for S_Q : [0, t_f] → R^{n×r} and subsequently evaluating

  Q = [S_Q(0)^⊤ M   −∫_0^{t_f} S_Q(t)^⊤ [M_1 T(t) ⋯ M_4 T(t)] dt].

The symmetry of M and K has been used in these formulas.
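To make the adjoint evaluation of C_z concrete, here is a hedged sketch (illustrative only; K_el denotes the sparse elasticity stiffness matrix, F the dense thermal coupling matrix, and Nprime0 a dense 3×n_u array holding N′(0) — all names are assumptions):

    import numpy as np
    import scipy.sparse.linalg as spla

    # Sketch: evaluate C_z^T = -F^T K_el^{-1} N'(0)^T by r = 3 elasticity
    # solves, one per row of N'(0) (point sources at the mounting points).
    def build_Cz(K_el, F, Nprime0):
        # K_el is symmetric, so K_el^T = K_el and no transposed solves are needed.
        Y = np.column_stack([spla.spsolve(K_el.tocsc(), rhs) for rhs in Nprime0])
        return -(Y.T @ F)        # C_z = -N'(0) K_el^{-1} F, of shape (3, n)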

6. NUMERICAL RESULTS

In this section we present some numerical results. We focus on the sensor placement problem and its solution by the simplicial decomposition method described in Algorithm 1. The algorithm is applied to the thermo-mechanical system described in Section 5. We therefore assume that the set-point θ_0 = (T_0^0, α^0) is given and no data assimilation problem needs to be solved.

6.1. DESCRIPTION OF PROBLEM DATA

We fix the set-point of the initial temperature state equal to the ambient temperature, i.e., T_0^0(x) ≡ T_ref. The set-point of the heat transfer parameter α^0(x) varies over different parts of the boundary, and it is zero where the heat sources are applied, see (5.2) and Figure 5.1(c). We have chosen typical values for the heat transfer coefficient:

  α^0(x) = 12 W K⁻¹ m⁻²  if x ∈ Γ_vert (vertical surfaces),
           10 W K⁻¹ m⁻²  if x ∈ Γ_up (horizontal surfaces facing up),
            8 W K⁻¹ m⁻²  if x ∈ Γ_down (horizontal surfaces facing down),
            5 W K⁻¹ m⁻²  if x ∈ Γ_inner (enclosed surfaces),
            0 W K⁻¹ m⁻²  if x ∈ Γ_r1 ∪ Γ_r2 (surfaces with heat sources).

The inverse covariance matrices for the initial state and for the parameter were chosen as V_{x_0}^{-1} = M (finite element mass matrix) and V_p^{-1} = id_4.

The machine column experiences the influence of two heat sources, see Figure 5.1(c). One originates from an electrical drive mounted on the top of the machine column (Γ_r1) and the other one from the spindle driving the horizontal movement of the column (Γ_r2). The heat sources are described by

  r(x, t) = 6700 W m⁻²  if x ∈ Γ_r1 and 0 s ≤ t ≤ 2400 s,
            2700 W m⁻²  if x ∈ Γ_r2 and 0 s ≤ t ≤ 4800 s,
            6700 W m⁻²  if x ∈ Γ_r1 and 4800 s < t ≤ 7200 s,
            0 else.

All calculations are done in the time interval [0 s, 7200 s]. The standard deviation σ of the measurements was assumed to be known and constant.

6.2. DISCRETIZATION

As described in Section 5, we used a finite element model with a standard nodal basis of piecewise linear, continuous elements for the temperature T as well as for the displacement u on a tetrahedral grid of the geometry depicted in Figure 5.1(b). The size of the mesh can be seen in Table 6.1. All finite element nodes on the boundary are potential

sensor positions in the sensor placement problem.

Table 6.1: Size of the finite element mesh: the number of mesh nodes, the number of mesh cells, and the number n of nodes on the boundary (potential sensor locations).

In order to compute the required quantities for the sensor placement problem, particularly the Jacobian J and the output matrix Q, we need to solve the time-dependent forward system (5.1), the adjoint system (5.4) for the sensitivities, and the adjoint system (5.10) for the matrix Q. For the forward system we employed the implicit Euler method with time step length Δt = 360 s. The adjoint systems were discretized with the consistent adjoint time stepping scheme. The measurements y(t_j) = C_y T(t_j) were taken at the time instances t_j = j Δt, j = 1, …, N = 20, which occur during integration.

6.3. EFFICIENT IMPLEMENTATION

Notice that the sensitivities X(t_j) = [X_0(t_j), X_p(t_j)] ∈ R^{n×(n+q)}, j = 1, …, N, as well as the matrices Υ_i ∈ R^{(n+q)×(n+q)}, i = 1, …, n, are dense and would therefore require a large amount of memory to store. Moreover, the assembly of the matrix I(w) appearing in the evaluation of φ_i during the CGP step, see (4.5) and (3.26), and during the RMP step (4.7) of Algorithm 1 would be computationally rather expensive. Here we take advantage of the fact that only the product h = I(w)^{-1} Q^⊤ is needed to compute all required quantities. Instead of forming I(w), we therefore solve I(w) h = Q^⊤ by means of r = 3 calls to a preconditioned conjugate gradient method based on matrix-vector products with I(w), which are much more economical to implement. As preconditioner we use the inverse background covariance matrix V_θ^{-1} = diag(V_{x_0}^{-1}, V_p^{-1}) = diag(M, id_4). Similar considerations apply to the evaluation of ψ_j in (4.22). In addition, the computation of φ(w) and ψ(v), see (3.26) and (4.22), as well as the matrix-vector products with the FIM I(w) are executed in parallel with N = 20 threads, where each thread j only uses the sensitivity information X(t_j) for time step j.

6.4. RESULTS AND PERFORMANCE

For practical purposes, the termination criteria for the simplicial decomposition problem (4.4) and for the restricted master problem (4.23) are implemented only up to certain tolerances. In (4.4), a weight w_i is considered zero (one) if it is below 0.05 (above

0.95), and hence i is taken to belong to the set I_lb (I_ub). After solving the RMP, a column g^j is purged if the corresponding weight v_j is below 0.05. All tolerance values as well as maximal iteration numbers for solving both problems can be found in Table 6.2.

Parameter                                  | Value
maximal number of iterations for SDP       | 40
zero weight in termination check for SDP   | 0.05
unit weight in termination check for SDP   | 0.95
tolerance in termination check for SDP     | 0.01
tolerance for purging columns in RMP       | 0.05
maximal number of iterations for RMP       | 30
tolerance in termination check for RMP     | 0.01

Table 6.2: Parameters used in Algorithm 1.

The sensor placement problem for the thermo-mechanical system described in Section 5 was solved for the setting described in Section 6.1 with a desired number of m = 10 sensors. Algorithm 1 stopped after 6 iterations, because the column generated in the CGP was already contained in the previous column set G. In this case, the RMP to be solved would be the same as in the step before and no further progress could be achieved. The computation took about 2.5 h; further details about the performance of the algorithm are listed in Table 6.3.

Quantity                                       | Value
time for computation of sensitivities (5.5)    | 15 min
time for computation of φ(w)                   | 150 s
average number of RMP steps per SDP step       | 18
time for RMP step                              | 70 s
number of SDP steps                            | 6
average time for SDP step                      | 1360 s
overall time                                   | 2.5 h

Table 6.3: Computation times for the application of Algorithm 1.

Figure 6.1 shows the evolution of the distribution of measurement weights for each SDP step over all possible sensor locations, which were all boundary nodes of the FE mesh. The final solution is achieved practically after 4 iterations, which is also reflected in the objective values Ψ(Q I(w)^{-1} Q^⊤) in Figure 6.2(b). The optimal sensors are all placed in the vicinity of the two heat sources, see Figure 6.2(a), since the D_Q-criterion targets the volume of the confidence ellipsoid of the QOI (the TCP displacement).

Figure 6.1: Evolution of the measurement weights w^(k) during SDP iterations.

Figure 6.2: Optimal sensors and objective values. (a) Optimal sensor positions (m = 10). (b) Objective values Ψ(Qᵀ I(w^(k))⁻¹ Q) over the iteration number.
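As a footnote on the stopping rules of Section 6.3 (the sketch referenced there): the thresholding of weights into I_lb and I_ub and the purging of RMP columns reduce to a few lines of array bookkeeping. The NumPy rendering below is a minimal sketch with illustrative names, not the paper's code:

```python
import numpy as np

def classify_weights(w, zero_tol=0.05, one_tol=0.95):
    # Indices with w_i < zero_tol are treated as zero (set I_lb),
    # those with w_i > one_tol as one (set I_ub); the rest remain free.
    I_lb = np.flatnonzero(w < zero_tol)
    I_ub = np.flatnonzero(w > one_tol)
    free = np.flatnonzero((w >= zero_tol) & (w <= one_tol))
    return I_lb, I_ub, free

def purge_columns(G, v, purge_tol=0.05):
    # Drop every column g_j of G whose RMP coefficient v_j fell below purge_tol.
    keep = v >= purge_tol
    return G[:, keep], v[keep]

w = np.array([0.01, 0.50, 0.97, 0.04])
I_lb, I_ub, free = classify_weights(w)   # I_lb = [0, 3], I_ub = [2], free = [1]
```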
