
OPTIMAL SENSOR PLACEMENT FOR JOINT PARAMETER AND STATE ESTIMATION PROBLEMS IN LARGE-SCALE DYNAMICAL SYSTEMS WITH APPLICATIONS TO THERMO-MECHANICS

Roland Herzog*, Ilka Riedel*, Dariusz Uciński**

February 7, 2017

We consider large-scale dynamical systems in which both the initial state and some parameters are unknown. These unknown quantities must be estimated from partial state observations over a time window. A data assimilation framework is applied for this purpose. Specifically, we focus on large-scale linear systems with multiplicative parameter-state coupling as they arise in the discretization of parametric linear time-dependent partial differential equations. Another feature of our work is the presence of a quantity of interest different from the unknown parameters, which is to be estimated based on the available data. In this setting, we develop a simplicial decomposition algorithm for an optimal sensor placement and set forth formulae for the efficient evaluation of all required quantities. As a guiding example, we consider a thermo-mechanical PDE system with the temperature constituting the system state and the induced displacement at a certain reference point as the quantity of interest.

*Technische Universität Chemnitz, Faculty of Mathematics, Professorship Numerical Mathematics (Partial Differential Equations), Chemnitz, Germany, roland.herzog@mathematik.tu-chemnitz.de, ilka.riedel@mathematik.tu-chemnitz.de
**University of Zielona Góra, Institute of Control and Computation Engineering, ul. Podgórna 50, Zielona Góra, Poland, d.ucinski@issi.uz.zgora.pl

1. INTRODUCTION

In this paper, we consider joint parameter and state estimation problems for large-scale dynamical systems of the form

  E ẋ(t) = A(p) x(t) + f(t),  t ∈ [0, t_f],   x(0) = x_0 ∈ R^n.   (1.1)

Here x(t) ∈ R^n is the state vector, E ∈ R^{n×n} is a non-singular matrix, f(t) ∈ R^n signifies a known forcing input, p ∈ R^q stands for a set of system parameters, and A(p) ∈ R^{n×n} is a matrix representing the parameter-dependent dynamics. The purpose of our estimation procedure is to infer an estimate x̂_0 of the unknown initial state x_0 as well as an estimate p̂ of the unknown parameters p from partial measurements

  y_j = C_y x(t_j) + η_j ∈ R^m,  j = 1, …, N,   (1.2)

of the state trajectory evaluated at sampling instants t_1, …, t_N which are fixed in a given time horizon [0, t_f]. Here η_j ∈ R^m denotes measurement noise, which accounts for factors such as measurement errors and inadequacies of the mathematical model (1.1). We adopt a Bayesian setting, which means that some prior information about x_0 and p is available while estimating them. Once determined, the estimates x̂_0 and p̂ are supposed to be plugged into the model (1.1) so as to produce an estimate x̂(t_f) of the terminal state x(t_f) and then finally yield the estimate ẑ = C_z x̂(t_f) of a quantity of interest (QOI) z, which depends linearly on the terminal state x(t_f),

  z = C_z x(t_f) ∈ R^r,   (1.3)

where C_z ∈ R^{r×n} is given.

In practice, the measurements of the observable quantity y are subject to measurement error. Logically, the output noise propagates into the estimate of (x_0, p), thereby influencing the estimate of the QOI z. The amount of perturbation in ẑ depends on the matrix C_y, which encodes which parts of the state trajectory are being observed. It is the purpose of this paper to optimize the measurement matrix C_y in order to minimize the influence of the measurement error on the estimate of the QOI, in a sense to be made precise below.

We envision that the state vector x(t) ∈ R^n is high-dimensional and represents a distributed quantity, as for instance in the discretization of time-dependent partial differential equations. It is assumed that the measurement matrix C_y consists of m distinct rows of the n×n identity matrix. In this setting, the optimization of C_y can be understood as choosing optimal sensor locations. We point out that we consider the sensors to be static here.

NOTATION. Throughout the paper, R_+ and R_{++} stand for the sets of nonnegative and positive real numbers, respectively. We adopt the convention that all vectors have

column form. The set of real m×n matrices is denoted by R^{m×n}. We use S^m to denote the set of symmetric m×m matrices, S^m_+ to denote the set of symmetric nonnegative definite m×m matrices, and S^m_{++} to denote the set of symmetric positive definite m×m matrices. The symbol id_n denotes the n×n identity matrix. The symbol 1_n denotes a vector whose components are all equal to one. Given two vectors x and y of dimension n, x ∘ y is the n-vector whose i-th component is x_i y_i (componentwise multiplication). Finally, the symbol conv({q_1, …, q_l}) denotes the convex hull of a set of vectors q_i, i = 1, …, l.

MOTIVATION: THERMO-MECHANICAL PDE SYSTEM

As a motivation to consider sensor placement problems for systems of type (1.1), we mention an application described by a thermo-mechanical PDE system. More details are given in Section 5. Suppose that the temperature T of a machine tool constitutes the state of the system and is governed by the heat equation endowed with boundary conditions describing the heat flux,

  ρ c_p Ṫ − div(λ ∇T) = 0,
  λ ∂_n T + α(x) (T − T_ref) = r(x, t).

The heat transfer coefficient α(x) depends on the spatial position x and subsumes various physical phenomena, such as convective and radiative heat transfer. Its true value is therefore unknown and must be estimated from a time series of temperature measurements. A second unknown is the initial temperature state T_0(x), which arises from previous operation of the machine and is impossible to measure directly. The right-hand side r(x, t) represents heat sources acting on the machine tool. A table describing these correspondences with the model (1.1) is provided as Table 5.2.

It is not our primary goal to estimate the temperature distribution of the machine at time t_f, but rather to estimate the QOI, that is, the displacement of a certain relevant point of the machine structure induced by that temperature. Notice that thermally induced displacements can be the source of dominating positioning errors in machine tools. It is the precision of the estimation of these displacements that we are concerned with. To increase this precision, we wish to find optimal locations of temperature (state) sensors on the machine's surface.

RELATED WORK AND STRUCTURE OF THE PAPER

Let us put our paper into perspective. In the absence of unknown parameters p in (1.1), the estimation of the terminal state x(t_f) in a dynamical model such as (1.1) from previous measurements of the state is known as a data assimilation problem, see, e.g., (Freitag and Potthast, 2013; Law et al., 2015; Cacuci et al., 2014). Notice that unknown

parameters could be easily incorporated by declaring them as artificial state variables satisfying ṗ(t) = 0. We do not follow this approach but prefer to keep p and x separate. Such joint parameter and state estimation problems were considered, for instance, in Kühl et al. (2011); Küpper et al. (2009).

A key design problem in state and/or parameter estimation of distributed parameter systems (DPSs) consists in properly deploying the available measurement sensors. Logically, they should be placed at sites which provide the most valuable information about the estimated quantities. As it is desirable to determine the best sensor positions before the actual data collection, the issue that must primarily be addressed is the appropriate choice of the optimality criterion. As for state estimation, various criteria quantifying observability were employed in deterministic scenarios (El Jai and Pritchard, 1988), whereas in stochastic settings the research focused on minimizing criteria which aggregate the covariance matrix of the estimation error; see (Kubrusly and Malebranche, 1985) for the state of the art in the mid-1980s. Since the Kalman filter, which was the main tool to produce state estimates, was hard to implement in realistic settings due to its prohibitive computational and memory requirements, this line of research was abandoned for nearly two decades; interest in it has since revived in the framework of variational data assimilation (Cacuci et al., 2014) and spatial statistics (Cressie and Wikle, 2011). In turn, sensor location for parameter estimation usually follows the traditional approach of statistical experimental design (Atkinson et al., 2007; Pázman, 1986; Pronzato and Pázman, 2013; Pukelsheim, 2006) and is based on various scalar measures of performance defined on the Fisher information matrix (FIM) associated with the estimated parameters. The inverse of the FIM constitutes the Cramér-Rao bound on the covariance matrix of the estimates. The approach dates back to the work of Uspenskii and Fedorov (1975), whose ideas were then extended by Rafajłowicz (1981, 1986). A comprehensive overview of this currently very active research area is contained in the monograph (Uciński, 2005). Over the past decade, contributions about sensor location have continued to grow. Results regarding various types of PDEs have been reported, e.g., for reaction-diffusion or convection-diffusion problems (Alonso et al., 2004a; Armaou and Demetriou, 2006; Alonso et al., 2004b; García et al., 2007), as well as for models in fluid dynamics (Mokhasi and Rempfer, 2004; Cohen et al., 2006; Willcox, 2006; Yildirim et al., 2009). By the same token, the problem has been considered in numerous applications, e.g., in environmental and water resource systems (Sun and Sun, 2015), for mechanical deformation problems (Yi et al., 2011; Meo and Zumpano, 2005), as well as in sensor networks (Song et al., 2009).

A great difficulty in the estimation of DPSs arises due to the infinite-dimensional nature of the parameter space. Some theoretical problems, such as the existence of a least-squares estimator, the continuous dependence of the estimator on the data, and the convergence of approximations, require compactness of the parameter space. If these aspects are not properly addressed, the estimation process may be ill-posed in the sense that noise in the data may give rise to significant errors in the estimate. Therefore, techniques

known as regularization methods have been developed to deal with this ill-posedness, e.g., Tikhonov regularization (Vogel, 2002). They, however, hardly ever consider the statistical aspects of the estimation problem. Alternatively, a Bayesian framework can be employed, which quite naturally makes it possible to take account of prior statistical information about the unknown parameters and/or states. Bayesian methods, unlike asymptotic methods of classical statistics, turn out to be well suited, theoretically and computationally, to infinite-dimensional parameter spaces and can handle the above-mentioned theoretical problems well (Fitzpatrick, 1991). Unfortunately, sensor location for Bayesian inference in DPSs (or, in general, for estimation combined with regularization) has not been sufficiently considered yet. Recent research, however, points to some breakthroughs in this area, especially in the context of variational data assimilation. Gejadze and Shutyaev (2012) approached the problem of efficiently evaluating the gradient of the A-optimality criterion with respect to the spatial coordinates of the sensors for estimating the initial condition of a one-dimensional Burgers equation with a nonlinear viscous term. To this end, they used a limited-memory approximation of the inverse Hessian of the data assimilation cost function (up to a multiplier, the Hessian is equal to the FIM associated with the coefficients of a finite-dimensional parametrization of the initial state). The cost of the attendant computations is substantially reduced by extensive use of adjoint equations. In turn, the selection of an optimal subset of candidate sensor locations has been studied by Alexanderian et al. (2014) for the estimation of the initial state of a three-dimensional advection-diffusion equation. The optimality criterion was the trace of the posterior covariance, implemented in practice through a randomized trace estimator. Substantial computational savings result from using a randomized SVD to obtain a low-rank surrogate for the prior-preconditioned parameter-to-observable map. Efficiency is additionally increased by specifying the covariance operator of the Gaussian prior as the inverse of an elliptic differential operator, which can be evaluated using fast solvers for elliptic PDEs. A successful attempt to generalize this approach to a parameter estimation problem (i.e., a nonlinear inverse problem) for inferring a coefficient field in a two-dimensional elliptic problem has been made in (Alexanderian et al., 2016). This inspired the formulation we use in our paper.

In an earlier paper (Herzog and Riedel, 2015), we focused on sensor placement problems for thermo-mechanical systems, but in the absence of a dynamical system (1.1). To be precise, the temperature field was estimated directly from instantaneous measurements and in a reduced-order temperature space. This is not possible here, since the heat transfer coefficient α is considered unknown. Notice that an estimation of α is only possible in a time-dependent model.

The particular features of the problem at hand and the novelties of the present paper compared with previous work on sensor placement are the following. The presence of the QOI prevents us from directly using the Fisher information matrix (FIM) of the (x_0, p)-estimation problem to formulate the objective for the optimal sensor placement problem. Instead, we must use the (approximate) covariance matrix of the QOI estimator, which involves the solution map of a linearized state system.

Since we assume the dimension of the QOI to be much lower than the state dimension (r ≪ n), we employ an adjoint technique to evaluate that covariance matrix efficiently. In order to solve the sensor placement problem, we employ a simplicial decomposition algorithm, which was analyzed in Patriksson (1999) and Bertsekas (2015). To solve the main subproblem we make use of the classical multiplicative algorithm, which goes back to Silvey et al. (1978) but needs to be adapted to the objective at hand. We refer the reader to Torsney (2009); Yu (2010) for a historical overview. Basically, while solving the relaxed convex sensor selection problem (Problem 3.2) we could adapt the approach outlined by Joshi and Boyd (2009), which advocates an interior-point method. As will be shown, however, the implementation of simplicial decomposition is strikingly easy, the algorithm usually runs very fast, and most often the solutions produced by it are rather sparse (i.e., the number of nonzero weights is low). Sparsity may be quite an acute problem as far as relaxed solutions are concerned and usually requires augmenting the criterion by sparsifying penalty functions (Chepuri and Leus, 2015; Alexanderian et al., 2014; Haber et al., 2010, 2008). The linear programming subproblem built into simplicial decomposition seems to successfully retain a moderate number of nonzero weights.

Due to the multiplicative coupling of the parameters p and the state vector x(t) in (1.1), the covariance of the QOI is going to depend not only on the measurement matrix C_y but also on the unknown parameters p themselves (but not on the unknown initial state x_0). Often, this feature is addressed in sensor placement or similar experimental design problems by embedding the latter in a robust formulation, where the unknown parameter is confined to an uncertainty set. This significantly adds to the level of complexity of the problem; see, e.g., (Uciński, 2005; Pronzato and Pázman, 2013) or (Körkel et al., 2004; Diehl et al., 2006; Bock et al., 2007). In this paper, we focus on the sensor placement problem for systems of type (1.1) in the presence of a QOI and therefore content ourselves with a given set-point (nominal value) p_0 in the parameter space.

In Section 2, we formulate the data assimilation problem, which is used to jointly estimate the unknown initial state x_0 and the parameters p. The sensor placement problem is addressed in Section 3, and we propose a simplicial decomposition algorithm for its solution in Section 4. Subsequently, we elaborate on a specific thermo-mechanical system modeling a machine tool, where the temperature constitutes the system state x(t) and the thermo-mechanically induced displacement at a certain reference point (the tool center point, or TCP) serves as the quantity of interest z. We seek optimal locations of temperature sensors on the surface of the machine in order to obtain an accurate estimate of the TCP displacement. The details are given in Section 5 and illustrated with numerical results in Section 6.

2. DATA ASSIMILATION PROBLEM

We consider the dynamical system (1.1) with state x(t) ∈ R^n, unknown initial state x_0 ∈ R^n and unknown parameter vector p ∈ R^q. We assume that measurements (1.2) of

certain parts of the state trajectory are taken at given measurement times t_j, j = 1, …, N, during the time interval [0, t_f] under consideration. The measurements y_j are subject to measurement errors η_j, j = 1, …, N, which we assume to be i.i.d. random variables with normal distribution N(0, V_y), where V_y = σ² id_m. This means that the components of each η_j are independent zero-mean random variables with the same variance σ², or equivalently, that the measurements from different sensors are independent of one another and that their accuracy is the same.

The unknowns in the model (1.1) are x_0 and p. However, our prior (background) information consists of their prior estimates x_0^bg and p^bg, which are supposed to be realizations of Gaussian random vectors with means x̄_0 ∈ R^n and p̄ ∈ R^q and covariance matrices V_{x_0} ∈ R^{n×n} and V_p ∈ R^{q×q}, respectively, i.e.,

  x_0^bg ~ N(x̄_0, V_{x_0})  and  p^bg ~ N(p̄, V_p).

Here x̄_0 and p̄ are unknown and interpreted as the true initial state and the true parameter, respectively. In turn, as for V_{x_0} and V_p, we assume that they are known and positive definite, and hence invertible. As is usually the case in data assimilation problems, the number of unknowns (n + q) exceeds the number of measurements (N·m). Consequently, regularization terms are needed, expressing the above-mentioned prior information about the unknowns. We thus state our data assimilation problem as follows, cf. Cacuci et al. (2014):

  min_{x_0 ∈ R^n, p ∈ R^q} J_DA(x_0, p) = ½ ‖x_0 − x_0^bg‖²_{V_{x_0}^{-1}} + ½ ‖p − p^bg‖²_{V_p^{-1}} + ½ Σ_{j=1}^N ‖y_j − C_y x(t_j; x_0, p)‖²_{V_y^{-1}},   (2.1)

where x(t_j; x_0, p) is the solution to (1.1) at sampling time t_j for given x_0 and p. In order to solve the nonlinear least-squares problem (2.1), one can employ a standard derivative-based method such as the Gauss-Newton or Levenberg-Marquardt algorithms; see for instance (Nocedal and Wright, 2006, Section 10.3). In order to formulate the Jacobian of the model output w.r.t. the unknowns (x_0, p), we introduce the sensitivities

  X_0(t) = ∂x(t; x_0, p)/∂x_0 ∈ R^{n×n}  and  X_p(t) = ∂x(t; x_0, p)/∂p ∈ R^{n×q}

of the state x(t; x_0, p) with respect to the initial state x_0 and the parameters p. By the implicit function theorem, it follows from (1.1) that X_0 is given by the linear system

  E Ẋ_0(t) = A(p) X_0(t),  t ∈ [0, t_f],   X_0(0) = id_n,   (2.2)

and X_p satisfies

  E Ẋ_p(t) = A′(p) x(t) + A(p) X_p(t),  t ∈ [0, t_f],   X_p(0) = 0 ∈ R^{n×q}.   (2.3)
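For illustration, the following minimal sketch (not taken from the paper; all names such as A_of_p and dA_dp are assumptions) integrates (1.1) together with the sensitivity systems (2.2)-(2.3) by the implicit Euler method, the scheme also used for the application in Section 6. Here dA_dp supplies the partial derivatives ∂A/∂p_k that make up the term A′(p) x(t) in (2.3), made precise in (2.4) below:

    import numpy as np

    # A minimal sketch (not from the paper): implicit Euler integration of the
    # state equation (1.1) and the sensitivity systems (2.2)-(2.3).
    # Illustrative names: A_of_p(p) returns A(p), dA_dp(p) returns the list
    # [dA/dp_1, ..., dA/dp_q], and f(t) returns the forcing term.
    def simulate_with_sensitivities(E, A_of_p, dA_dp, f, x0, p, t_grid):
        n, q = x0.size, p.size
        A = A_of_p(p)
        x, X0, Xp = x0.copy(), np.eye(n), np.zeros((n, q))
        xs, X0s, Xps = [x.copy()], [X0.copy()], [Xp.copy()]
        for k in range(1, len(t_grid)):
            dt = t_grid[k] - t_grid[k - 1]
            B = E - dt * A                    # implicit Euler system matrix
            x = np.linalg.solve(B, E @ x + dt * f(t_grid[k]))
            X0 = np.linalg.solve(B, E @ X0)   # discretization of (2.2)
            # discretization of (2.3): A'(p) x(t) assembled column by column
            rhs = E @ Xp + dt * np.column_stack([Ak @ x for Ak in dA_dp(p)])
            Xp = np.linalg.solve(B, rhs)
            xs.append(x.copy()); X0s.append(X0.copy()); Xps.append(Xp.copy())
        return np.array(xs), np.array(X0s), np.array(Xps)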

Note that, for simplicity of notation, we let A′(p) x(t) stand for the Jacobian matrix of the mapping p ↦ A(p) x(t) with respect to p while holding x(t) constant. Using the chain rule (Magnus and Neudecker, 1999, Thm. 12, p. 108), we easily deduce that

  A′(p) x(t) = ∂(A(p) x)/∂p |_{x=x(t)} = (x(t)^⊤ ⊗ id_n) ∂vec A(p)/∂p = [∂A(p)/∂p_1 ⋯ ∂A(p)/∂p_q] (id_q ⊗ x(t)),   (2.4)

where vec is the column-stacking operator and ⊗ signifies the Kronecker product. Due to the linearity of the output equation (1.2), the sensitivity of the model output to changes in (x_0, p) is given by

  ∂(C_y x(t_j; x_0, p))/∂(x_0, p) = C_y [X_0(t_j)  X_p(t_j)] ∈ R^{m×(n+q)},  j = 1, …, N.   (2.5)

The data assimilation problem (2.1) can be written as a weighted least-squares problem of the form

  min_{x_0 ∈ R^n, p ∈ R^q} ½ r(x_0, p)^⊤ H r(x_0, p)   (2.6)

with the residual vector

  r(x_0, p) = [x_0 − x_0^bg;  p − p^bg;  y_1 − C_y x(t_1; x_0, p);  …;  y_N − C_y x(t_N; x_0, p)]

and the symmetric nonnegative definite weight matrix

  H = diag(V_{x_0}^{-1}, V_p^{-1}, V_y^{-1}, …, V_y^{-1}) ∈ R^{(n+q+Nm)×(n+q+Nm)},   (2.7)

where the block V_y^{-1} is repeated N times. The Jacobian of the residual can be computed from the sensitivities defined above in the following way:

  J(x_0, p) = ∂r(x_0, p)/∂(x_0, p) =
    [ id_n            0
      0               id_q
      −C_y X_0(t_1)   −C_y X_p(t_1)
      ⋮               ⋮
      −C_y X_0(t_N)   −C_y X_p(t_N) ].   (2.8)
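As an illustration of how (2.6)-(2.8) are used, here is a hedged sketch of a single Gauss-Newton step; the callables residual and jacobian, which are assumed to be available, evaluate (2.7) and (2.8):

    import numpy as np

    # Sketch of one Gauss-Newton step for the weighted least-squares problem (2.6).
    # residual(x0, p) returns the stacked vector from (2.7); jacobian(x0, p)
    # returns the matrix (2.8); H is the block-diagonal weight matrix from (2.7).
    def gauss_newton_step(x0, p, residual, jacobian, H):
        r = residual(x0, p)
        J = jacobian(x0, p)
        g = J.T @ (H @ r)              # gradient of the objective in (2.6)
        G = J.T @ H @ J                # Gauss-Newton approximation of the Hessian
        d = np.linalg.solve(G, -g)     # in practice, add damping (Levenberg-Marquardt)
        n = x0.size
        return x0 + d[:n], p + d[n:]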

Notice that for a large state dimension n, the sensitivity trajectory X_0 : [0, t_f] → R^{n×n} will be of a formidable size. Also, since the number of model outputs and measurements, N·m, is typically smaller than the number of unknowns n + q, it is more economical to evaluate the Jacobian using the adjoint technique. We will now show that one single adjoint variable S : [0, t_f] → R^{n×m} is enough to attain this objective. To this end, consider one typical output estimate ŷ_j = C_y x(t_j; x_0, p) of the actual output y_j. Adjoin (1.1) to this estimate with an arbitrary time-varying Lagrange multiplier matrix S_j(t) ∈ R^{n×m} as follows:

  ŷ_j = C_y x(t_j; x_0, p) − ∫_0^{t_j} S_j(t)^⊤ [E ẋ(t; x_0, p) − A(p) x(t; x_0, p) − f(t)] dt,   (2.9)

where the bracketed term vanishes along solutions of (1.1). Let us integrate the S_j(t)^⊤ E ẋ(t) term in (2.9) by parts, yielding

  ŷ_j = [C_y − S_j(t_j)^⊤ E] x(t_j) + S_j(0)^⊤ E x(0) + ∫_0^{t_j} [Ṡ_j(t)^⊤ E + S_j(t)^⊤ A(p)] x(t) dt + ∫_0^{t_j} S_j(t)^⊤ f(t) dt.   (2.10)

Differentiating both sides of (2.10) with respect to x_0, we thus get

  C_y X_0(t_j) = ∂ŷ_j/∂x_0 = [C_y − S_j(t_j)^⊤ E] X_0(t_j) + S_j(0)^⊤ E X_0(0) + ∫_0^{t_j} [Ṡ_j(t)^⊤ E + S_j(t)^⊤ A(p)] X_0(t) dt.   (2.11)

To avoid having to determine the function X_0(t), we choose the multiplier function S_j(t) so that the coefficients of X_0(t) and X_0(t_j) vanish, i.e., we specify it as the solution to the following backwards-in-time adjoint differential equation:

  E^⊤ Ṡ_j(t) = −A(p)^⊤ S_j(t),  t ∈ [0, t_j],   E^⊤ S_j(t_j) = C_y^⊤.   (2.12)

Equation (2.11) then becomes

  C_y X_0(t_j) = S_j(0)^⊤ E X_0(0) = S_j(0)^⊤ E.   (2.13)

In turn, differentiating both sides of (2.10) with respect to p, we get

  C_y X_p(t_j) = ∂ŷ_j/∂p = [C_y − S_j(t_j)^⊤ E] X_p(t_j) + S_j(0)^⊤ E X_p(0) + ∫_0^{t_j} [Ṡ_j(t)^⊤ E + S_j(t)^⊤ A(p)] X_p(t) dt + ∫_0^{t_j} S_j(t)^⊤ (A′(p) x(t)) dt.   (2.14)

But on account of (2.12) and the initial condition for (2.3), this simplifies to

  C_y X_p(t_j) = ∫_0^{t_j} S_j(t)^⊤ (A′(p) x(t)) dt.   (2.15)

Consequently, the block row of the Jacobian (2.8) associated with the output at time t_j can be expressed as

  [C_y X_0(t_j)  C_y X_p(t_j)] = [S_j(0)^⊤ E   ∫_0^{t_j} S_j(t)^⊤ (A′(p) x(t)) dt].

It is now important to observe that (2.12) is an autonomous system. Therefore, S_j(t) = S_k(t − t_j + t_k) holds whenever both are defined. We conclude that in place of N different systems of type (2.12) it is enough to consider a single adjoint system for the adjoint state S : [0, t_f] → R^{n×m},

  E^⊤ Ṡ(t) = −A(p)^⊤ S(t),  t ∈ [0, t_f],   E^⊤ S(t_f) = C_y^⊤.   (2.16)

Since S_j(t) = S(t − t_j + t_f) holds, each block row of the Jacobian can be evaluated according to

  [C_y X_0(t_j)  C_y X_p(t_j)] = [S(t_f − t_j)^⊤ E   ∫_0^{t_j} S(t − t_j + t_f)^⊤ (A′(p) x(t)) dt].   (2.17)

We provide in Table 3.1 an overview of the quantities required during the solution of the data assimilation problem (2.1) by gradient-based methods.

3. SENSOR PLACEMENT PROBLEM

3.1. COVARIANCE OF THE QOI ESTIMATOR

Having solved the data assimilation problem (2.1), we obtain estimates x̂_0 and p̂ of the sought true values x̄_0 and p̄, respectively. In the sequel, we shall concatenate x_0 and p so as to have only one vector of unknown true parameters θ̄ = (x̄_0, p̄) and its estimate θ̂ = (x̂_0, p̂). As was mentioned in the introduction, our main concern is not to estimate the unknown initial state x_0 or the parameter vector p directly, but rather to estimate a quantity of interest z depending on the terminal state x(t_f) at time t_f,

  z = C_z x(t_f; θ̄) ∈ R^r   (3.1)

through

  ẑ = C_z x(t_f; θ̂) ∈ R^r   (3.2)

with r small compared with the dimension n of the state variable. To be able to assess the quality of the estimator (3.2), we investigate the expected dispersion of the estimates produced by it, which is quantified by the covariance matrix Cov(ẑ). Clearly, the QOI z depends on the unknowns (x_0, p) in an indirect way, and its dependence on p is nonlinear. Therefore, obtaining an expression for the covariance matrix of the estimator ẑ is a real challenge. That is why we follow here a standard approach in the literature, cf. Mehra (1974), and resort to the covariance of a linearized estimator, which is obtained by linearizing the parameter-to-QOI map. This approach is backed up by asymptotic considerations; see for instance (Pronzato and Pázman, 2013, Chapter 3).

From now on, let θ_0 = (x_0^0, p^0) denote a given set-point in the parameter space (we may set θ_0 = θ^bg = (x_0^bg, p^bg)), where (3.1) is linearized. An application of the chain rule to (3.1) and (2.2)-(2.3) shows that this linearization is given by the matrix

  Q = ∂z/∂θ |_{θ=θ_0} = C_z X(t_f; θ_0) ∈ R^{r×d},   (3.3)

where here and subsequently for abbreviation we write d = n + q and X(t; θ) = [X_0(t; θ)  X_p(t; θ)]. Consequently, the covariance of the linearized QOI estimator is related via

  Cov(ẑ) = Q Cov(θ̂) Q^⊤   (3.4)

to the covariance Cov(θ̂) of the parameter estimator θ̂. Throughout the paper we assume that the matrix Q has full row rank:

  rank Q = r.   (3.5)

In order to form the matrix Q, we exploit the similarity of (3.3) and (2.5) and follow an adjoint approach. To be precise, we solve the additional adjoint system for S_Q : [0, t_f] → R^{n×r},

  E^⊤ Ṡ_Q(t) = −A(p_0)^⊤ S_Q(t),  t ∈ [0, t_f],   E^⊤ S_Q(t_f) = C_z^⊤,   (3.6)

and evaluate

  Q = [C_z X_0(t_f; θ_0)  C_z X_p(t_f; θ_0)] = [S_Q(0)^⊤ E   ∫_0^{t_f} S_Q(t)^⊤ (A′(p_0) x(t)) dt].   (3.7)

The problem of characterizing and evaluating Cov(θ̂) has been extensively investigated by researchers concerned with variational data assimilation. Gejadze et al. outlined an approach to obtain approximations to the covariance matrices for the data assimilation

problem in which either the initial state x_0 or the parameter vector p are unknowns, see Gejadze et al. (2008) and Gejadze et al. (2010), respectively, as well as Gejadze et al. (2013); Gejadze and Shutyaev (2012). It is rather straightforward to combine these results in our problem of joint estimation of x_0 and p, thereby obtaining

  Cov(θ̂) ≈ (V_θ^{-1} + Σ_{j=1}^N X(t_j)^⊤ C_y^⊤ V_y^{-1} C_y X(t_j))^{-1},   (3.8)

where V_θ = diag(V_{x_0}, V_p) and X(t_j) = X(t_j; θ̄). The dependence of the right-hand side on the true vector θ̄ is not surprising, as this is the rule whenever estimates of the covariance matrices of various estimators are constructed in settings where the outputs depend nonlinearly on the estimated parameters. Clearly, we do not know θ̄ and, in practice, we approximate it by a preliminary estimate θ_0 (e.g., a logical choice is θ_0 = θ^bg).

Quantity               | defined in | evaluate using | requires
r(x_0, p) (residual)   | (2.7)      | (2.7)          | solution x of (1.1)
J(x_0, p) (Jacobian)   | (2.8)      | (2.17)         | solution S of (2.16)
Q (Jacobian of QOI)    | (3.3)      | (3.7)          | solution S_Q of (3.6)

Table 3.1: Overview of quantities for the solution of the data assimilation and the sensor placement problem, and how to evaluate them efficiently.

3.2. THE CRITERION TO BE OPTIMIZED

Our optimal design problem consists in determining an m-element subset selected out of a total of n state variables which would yield the lowest variability in the estimates of the QOI, as measured by the covariance matrix (3.4). In order to express this formally, we define as decision variable the n-dimensional vector w whose component w_i is one if x_i is supposed to be measured and zero if x_i is not going to be measured. In consequence, the observation matrix takes the form

  C_y(w) = D(diag(w)),   (3.9)

where D stands for the operation of forming a submatrix of its matrix argument by deleting all zero rows. Since we assume that the measurements of the observed state components are independent of one another and taken by equally accurate sensors, i.e., V_y = σ² id_m for some known variance σ², it follows that

  Cov(θ̂) ≈ I(w)^{-1},   (3.10)

where

  I(w) = V_θ^{-1} + (1/σ²) Σ_{j=1}^N X(t_j)^⊤ diag(w) X(t_j) = V_θ^{-1} + Σ_{i=1}^n w_i Υ_i,   (3.11)

  Υ_i = (1/σ²) Σ_{j=1}^N row_i(X(t_j))^⊤ row_i(X(t_j)),  i = 1, …, n.   (3.12)

Here row_i signifies the i-th row of its matrix argument. We call I(w) the Bayesian information matrix for θ, cf. Chepuri and Leus (2015). Observe that the positive definiteness of V_{x_0} and V_p implies that of V_θ, and this, in turn, forces I(w) to be positive definite (since the term Σ_{i=1}^n w_i Υ_i is nonnegative definite). Consequently, there is no problem with the inversion of I(w).

For the intended search for an optimal w, we have to introduce an appropriate optimality criterion. As nonnegative definite matrices can only be partially ordered, instead of directly comparing the covariance matrices for different choices of the output matrix, a scalar performance index Ψ defined on Cov(θ̂) can be used here. Thus, our sensor selection problem can ultimately be expressed as the following optimization problem:

Problem 3.1 (Sensor Selection Problem). Find a vector w_bin ∈ R^n to minimize

  J(w) = Ψ(Q I(w)^{-1} Q^⊤)   (3.13)

subject to the constraints

  1_n^⊤ w = m,   (3.14)
  w_i ∈ {0, 1},  i = 1, …, n.   (3.15)

In the role of Ψ, various alphabetical optimality criteria commonly used in experimental design can be considered. Specifically, three possible criteria follow:

(i) D_Q-optimality (or generalized D-optimality), which corresponds to Ψ = log det,

  J(w) = log det(Q I(w)^{-1} Q^⊤),   (3.16)

(ii) A_Q-optimality (or generalized A-optimality), which corresponds to Ψ = trace,

  J(w) = trace(Q I(w)^{-1} Q^⊤),   (3.17)

(iii) E_Q-optimality (or generalized E-optimality), which corresponds to Ψ = λ_max,

  J(w) = λ_max(Q I(w)^{-1} Q^⊤),   (3.18)

where λ_max denotes the maximal eigenvalue of its matrix argument. See (Atkinson et al., 2007, p. 137) or (Silvey, 1980, p. 10) for a justification of this terminology and notation. Different optimality criteria may produce different solutions to Problem 3.1, but this results from their slightly different interpretations in terms of the uncertainty ellipsoid for the estimates ẑ. Roughly speaking, a D_Q-optimum design minimizes its volume, an A_Q-optimum design suppresses the mean squared length of its axes, and an E_Q-optimum design minimizes the length of its largest axis. In what follows, our attention will be focused on the D_Q-optimality criterion (3.16). Note that the assumption (3.5) implies

  rank(Q I(w)^{-1} Q^⊤) = rank Q = r,   (3.19)

see, e.g., the Range Inclusion Lemma in (Pukelsheim, 2006, p. 17), which clearly demonstrates that Q I(w)^{-1} Q^⊤ is always nonsingular.

3.3. RELAXED SENSOR SELECTION PROBLEM

Owing to the combinatorial nature of Problem 3.1, which may make its solution intractable even for small-scale problems, we relax it by replacing the non-convex Boolean constraints w_i ∈ {0, 1} with the convex box constraints w_i ∈ [0, 1]. Thus we get the following convex relaxed sensor selection problem:

Problem 3.2 (Relaxed Sensor Selection Problem). Find a vector w ∈ R^n to minimize

  J(w) = Ψ(Q I(w)^{-1} Q^⊤) = Ψ(Q (V_θ^{-1} + Σ_{i=1}^n w_i Υ_i)^{-1} Q^⊤)   (3.20)

subject to the constraints

  1_n^⊤ w = m,   (3.21)
  0 ≤ w_i ≤ 1,  i = 1, …, n.   (3.22)

It goes without saying that the above relaxed problem is not equivalent to the original one, as some components of the computed optimal solution w* may be fractional rather than binary. It is, however, by no means useless, as J(w*) constitutes a lower bound for J(w*_bin), the optimal value of Problem 3.1. What is more, by rounding the m largest components of w* up to one and the remaining components down to zero, we can produce a suboptimal solution for Problem 3.1. This option is typical for sensor selection problems, see, e.g., Joshi and Boyd (2009). What is more, solutions to Problem 3.2 can be embedded into a general branch-and-bound scheme to yield a solution w*_bin, see (Uciński and Patan, 2007) for details.

Problem 3.2 possesses a number of notable features which, in theory, should make its solution straightforward. First of all, note that the performance index J(w) is convex over the convex feasible set W defined by the constraints (3.21) and (3.22), being the

intersection of a hyperplane and a hyperbox. The convexity results from the fact that, under the assumption (3.5), the mapping Φ : M ↦ log det(Q M^{-1} Q^⊤) is convex on the set of positive definite d×d matrices (Marshall et al., 2011, Theorem 16.F.4, p. 688). What is more, J is differentiable with

  ∇J(w) = −φ(w),  φ(w) := [φ_1(w), …, φ_n(w)]^⊤,   (3.23)

where

  φ_i(w) = −trace(Φ′(I(w)) Υ_i) ∈ R,   (3.24)

and where Φ′(X) = dΦ(X)/dX signifies the matrix derivative of Φ at a matrix argument X ∈ R^{d×d}, i.e., the d×d matrix whose (i, j) entry is ∂Φ(X)/∂X_{(j,i)}, cf. (Bernstein, 2005, p. 410). As I ∈ R^{d×d} is positive definite, we have

  Φ′(I) = d/dI log det(Q I^{-1} Q^⊤) = −I^{-1} Q^⊤ (Q I^{-1} Q^⊤)^{-1} Q I^{-1},   (3.25)

cf. (Bernstein, 2005, p. 411). Substituting this into (3.24) and using the cyclic commutativity of the trace of a product of matrices, we get

  φ_i(w) = trace((Q I(w)^{-1} Q^⊤)^{-1} Q I(w)^{-1} Υ_i I(w)^{-1} Q^⊤),  i = 1, …, n.   (3.26)

As the feasible set W is a rather nice convex set, numerous computational methods can potentially be employed for solving Problem 3.2, e.g., the conditional gradient method or a gradient projection method. Unfortunately, if the number of support points n is large, which is a rather common situation in applications, then these algorithms require additional implementation effort in order to avoid unsatisfactory computational times. On the other hand, an extremely simple multiplicative algorithm (Silvey et al., 1978; Yu, 2010) is available to maximize the D_Q-optimality criterion over the canonical simplex. Its idea is reminiscent of the EM algorithm used for maximum likelihood estimation, and a decisive advantage is its ease of implementation. In what follows, it will be shown how this multiplicative algorithm can be built into a very simple and efficient computational scheme which takes account of the additional upper-bound constraint in (3.22). The principal tool in its construction will be simplicial decomposition.

4. SIMPLICIAL DECOMPOSITION FOR PROBLEM 3.2

4.1. ALGORITHM MODEL

Simplicial decomposition (SD) proved extremely useful for large-scale pseudoconvex programming problems encountered, e.g., in traffic assignment or other network flow

problems (Patriksson, 1999). In its basic form, it proceeds by alternately solving linear and nonlinear programming subproblems, called the column generation problem (CGP) and the restricted master problem (RMP), respectively. In the RMP, the original problem is relaxed by replacing the original constraint set W with an inner approximation, namely the convex hull of a finite set of feasible solutions. In the CGP, this inner approximation is improved by incorporating a point in the original constraint set that lies furthest along the descent direction computed at the solution of the RMP. This basic strategy has been discussed and extended in numerous references (Bertsekas, 2015; Patriksson, 1999). A marked characteristic of the SD method is that the sequence of solutions to the RMP tends to a solution of the original problem in such a way that the objective function strictly monotonically approaches its optimal value. The SD algorithm may be viewed as a form of modular nonlinear programming, provided that one has an effective computer code for solving the RMP, as well as access to a code which can take advantage of the linearity of the CGP. One of the aims of this paper is to show that this is the case within the framework of Problem 3.2. What is more, since we deal with minimization of the convex function J over a bounded polyhedral set W, this will automatically imply the convergence of the resulting SD scheme in a finite number of RMP steps (Bertsekas, 2015). Tailoring the SD scheme to our needs, we obtain Algorithm 1. In the sequel, its consecutive steps will be discussed in turn.

4.2. CHARACTERIZATION OF THE OPTIMAL DESIGN AND TERMINATION OF ALGORITHM 1

In the original SD setting, the criterion for terminating the iterations is checked only after solving the column generation problem. The computation is then stopped if the current point w^(k) satisfies the condition of nondecrease, to first order, of the performance measure over the whole constraint set, i.e.,

  min_{w∈W} ∇J(w^(k))^⊤ (w − w^(k)) ≥ 0.   (4.8)

The condition (4.4) is less costly in terms of the number of floating-point operations. It results from the following characterization of w* with the property that J(w*) = min_{w∈W} J(w).

Theorem 4.1. A vector w* constitutes a global minimum of J over W if and only if there exists a number λ such that

  φ_i(w*) ≥ λ if w*_i = 1,
  φ_i(w*) = λ if 0 < w*_i < 1,   (4.9)
  φ_i(w*) ≤ λ if w*_i = 0,

for i = 1, …, n.

Algorithm 1 Algorithm model for solving Problem 3.2 via simplicial decomposition.

Step 0: (Initialization) Guess an initial solution w^(0) ∈ W such that I(w^(0)) is nonsingular. Set I = {1, …, n}, G^(0) = {w^(0)} and k = 0.

Step 1: (Termination check) Set

  I_ub^(k) = { i ∈ I : w_i^(k) = 1 },   (4.1)
  I_im^(k) = { i ∈ I : 0 < w_i^(k) < 1 },   (4.2)
  I_lb^(k) = { i ∈ I : w_i^(k) = 0 }.   (4.3)

If

  φ_i(w^(k)) ≥ λ for i ∈ I_ub^(k),   φ_i(w^(k)) = λ for i ∈ I_im^(k),   φ_i(w^(k)) ≤ λ for i ∈ I_lb^(k)   (4.4)

for some positive λ, then STOP; w^(k) is optimal.

Step 2: (Solution of the column generation subproblem, CGP) Compute

  g^(k+1) = arg max_{w∈W} φ(w^(k))^⊤ w.   (4.5)

If g^(k+1) ∈ conv(G^(k)), then STOP. Otherwise set

  G^(k+1) = G^(k) ∪ {g^(k+1)}.   (4.6)

Step 3: (Solution of the restricted master subproblem, RMP) Find

  w^(k+1) = arg min_{w ∈ conv(G^(k+1))} Ψ(Q I(w)^{-1} Q^⊤)   (4.7)

and purge G^(k+1) of all extreme points with zero weights in the resulting expression of w^(k+1) as a convex combination of elements of G^(k+1). Increment k by one and go back to Step 1.
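For orientation, a condensed sketch of Algorithm 1 follows. It is not the authors' implementation; phi (evaluating the vector (3.26)) and solve_rmp (Step 3, e.g., realized by the multiplicative Algorithm 2 presented later) are assumed to be given, and solve_rmp is assumed to return both the convex weights over the current columns and the combined point:

    import numpy as np

    # Condensed, illustrative sketch of the simplicial decomposition loop.
    def simplicial_decomposition(w0, m, phi, solve_rmp, max_iter=40):
        w, G = w0.copy(), [w0.copy()]
        for _ in range(max_iter):
            c = phi(w)
            g = np.zeros_like(w)
            g[np.argsort(c)[-m:]] = 1.0   # CGP: unit weights on the m largest c_i
            if any(np.array_equal(g, h) for h in G):
                break                     # repeated column: no further progress
            G.append(g)
            v, w = solve_rmp(G)           # RMP over conv(G)
            G = [h for h, vj in zip(G, v) if vj > 1e-8]  # purge zero-weight columns
        return w

The closed-form treatment of the CGP in the third line of the loop body is justified by Theorem 4.2 below.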

The proof of this result proceeds in much the same way as that of Proposition 1 in (Uciński and Patan, 2007).

4.3. SOLUTION OF THE COLUMN GENERATION SUBPROBLEM

In Step 2 of Algorithm 1 we deal with the linear programming problem

  maximize c^⊤ w subject to w ∈ W,   (4.10)

where c = φ(w^(k)), in which the feasible region is defined by the 2n bound constraints (3.22) and one equality constraint (3.21). Making use of this special form of the constraints, we can develop an algorithm to solve this problem which is almost as simple as a closed-form solution. The key idea is to make use of the following assertion, which can be demonstrated in much the same way as Theorem 4.1.

Theorem 4.2. A vector g ∈ W constitutes a global solution to the problem (4.10) if and only if there exists a scalar ρ such that

  c_i ≥ ρ if g_i = 1,
  c_i = ρ if 0 < g_i < 1,   (4.11)
  c_i ≤ ρ if g_i = 0,

for i = 1, …, n.

We thus see that, in order to solve (4.10), it is sufficient to pick the m largest components c_i of c and set the corresponding weights g_i to one, and the remaining weights to zero.

4.4. SOLUTION OF THE RESTRICTED MASTER SUBPROBLEM

Suppose that in the (k+1)-th iteration of Algorithm 1, we have

  G^(k+1) = { g^1, …, g^l },   (4.12)

possibly with l < k + 1 owing to the built-in mechanism of deleting points in G^(j), 1 ≤ j ≤ k, which did not contribute to the convex combinations yielding the corresponding iterates w^(j). Step 3 of Algorithm 1 involves minimization of the design criterion (3.20) over

  conv(G^(k+1)) = { Σ_{j=1}^l v_j g^j : Σ_{j=1}^l v_j = 1, v_j ≥ 0, j = 1, …, l }.   (4.13)

From the representation of any w ∈ conv(G^(k+1)) as

  w = Σ_{j=1}^l v_j g^j,   (4.14)

or, in componentwise form,

  w_i = Σ_{j=1}^l v_j g_i^j,  i = 1, …, n,   (4.15)

with g_i^j being the i-th component of g^j, it follows that

  I(w) = V_θ^{-1} + Σ_{i=1}^n w_i Υ_i = Σ_{j=1}^l v_j (V_θ^{-1} + Σ_{i=1}^n g_i^j Υ_i) = Σ_{j=1}^l v_j I(g^j).   (4.16)

19 or, in component-wise form, w i = g j i being the i-th component of gj, it follows that I(w) = V 1 n θ + i=1 w i Υ i = l v j g j i, i = 1,..., n, (4.15) j=1 l ( v j V 1 j=1 n θ + i=1 ) g j i Υ i = l v j I(g j ). (4.16) j=1 From this, we see that the RMP can equivalently be formulated as the following problem: Problem 4.3. Find the sequence of weights v R l to minimize subject to the constraints P(w) = log det ( Q H(v) 1 Q ) (4.17) 1 l v = 1, (4.18) v j 0, j = 1,..., l (4.19) where H(v) = l v j H j, H j = I(g j ), j = 1,..., l. (4.20) j=1 Basically, since the constraints (4.18) and (4.19) define the probability simplex in R l, i.e., a very nice convex feasible domain, it is intuitively appealing to determine optimal weights using a numerical algorithm specialized for solving convex optimization problems. Note, however, that this formulation has already captured close attention in optimum experimental design theory, where various characterizations of optimal solutions and efficient computational schemes have been proposed (Atkinson et al., 2007). In particular, in the case of the D Q -optimality criterion studied here, we can employ the General Equivalence Theorem of (Uciński, 2005, Theorem 3.2, p. 48) to get the following conditions for global optimality: Theorem 4.4. A vector v constitutes a global solution to Problem 4.3 if and only if ψ j (v ) { = r if v j > 0, r if v j = 0 (4.21) for each j = 1,..., l, where ψ j (v) = trace (( Q H(v) 1 Q ) 1 Q H(v) 1 H j H(v) 1 Q ), j = 1,..., l. (4.22) 19

A very simple multiplicative algorithm (Yu, 2010) can be adapted to the above RMP. It is summarized in Algorithm 2. Although only its monotonicity can be proven for the D_Q-optimality criterion, and not global convergence, cf. (Yu, 2010), in practice it behaves flawlessly. As an alternative, an interior-point method has recently been proposed by Lu and Pong (2013), for which global convergence is guaranteed, but at the cost of a much more complicated implementation.

Algorithm 2 Algorithm model for the restricted master problem.

Step 0: (Initialization) Select a weight vector v^(0) with positive components which sum up to one, e.g., set v^(0) = (1/l) 1_l. Set κ = 0.

Step 1: (Termination check) If

  (1/r) ψ(v^(κ)) ≤ 1_l   (4.23)

holds componentwise, then STOP.

Step 2: (Multiplicative update) Evaluate

  v^(κ+1) = (1/r) ψ(v^(κ)) ∘ v^(κ).   (4.24)

Increment κ by one and go to Step 1.

5. APPLICATION TO A THERMO-MECHANICAL SYSTEM

In this section, we describe in more detail the application of the sensor placement procedure to a certain thermo-mechanical system. To be more precise, we consider the temperature evolution T(x, t) of the machine tool column depicted in Figure 5.1. We denote the solid body of the machine column by Ω and its surface by Γ. The temperature evolution is governed by the linear heat equation

  ρ c_p Ṫ − div(λ ∇T) = 0 in Ω × (0, t_f),
  λ ∂_n T + α(x) (T − T_ref) = r(x, t) on Γ × (0, t_f),   (5.1)
  T(x, 0) = T_0(x) in Ω.

The boundary conditions represent a simplified model for the heat transfer occurring at the different parts of the machine's surface. Since the underlying heat transfer mechanism includes both convective and radiative phenomena, the value of the effective coefficient α(x) is considered unknown and also dependent on the spatial position x. We

make here the following ansatz:

  α(x) := Σ_{k=1}^q α_k χ_k(x),   (5.2)

where each χ_k is an indicator function with values in {0, 1} which selects a certain portion of the machine's surface Γ. Here the surface of the machine is divided into five parts. The value of α is fixed to zero on those two areas where the two heat sources act, which are expressed through the right-hand side r(x, t). The heat sources are assumed to be known and are described in Section 6, where numerical results are presented. They originate, on the one hand, from an electrical drive mounted on top of the machine column and, on the other hand, from the spindle driving the horizontal movement of the column, see Figure 5.1(c). On the remaining q = 4 parts of the surface, the heat transfer coefficients α_1, …, α_4 need to be estimated, but some background information α^bg is available. We have 12 W K⁻¹ m⁻² on the vertical surfaces, 10 W K⁻¹ m⁻² and 8 W K⁻¹ m⁻² on the horizontal planes with the outer normal facing upwards and downwards, respectively, and 5 W K⁻¹ m⁻² on all enclosed surfaces, including the inner surfaces of the cavities; see Figure 5.1(c). At those surface parts where the electrical drives are mounted, the heat transfer coefficient α(x) is zero. All symbols occurring in (5.1) are summarized in Table 5.1.

Figure 5.1: Auerbach ACW 630 machine column. (a) Photograph of the machine column. (b) CAD model with mounting points determining the TCP location. (c) Background values of α^bg.

We now switch to a spatial finite element model of (5.1) with respect to a basis {φ_i}, i = 1, …, n. In our computations, we are using the standard nodal basis composed of piecewise linear, continuous elements on a tetrahedral grid of the geometry depicted in Figure 5.1(b). In a slight abuse of notation, we denote the coefficient vector representing the temperature field T also by T. By converting (5.1) to its weak formulation and restricting it to

Symbol | Meaning                              | Value      | Units
T      | temperature                          |            | K
r      | thermal surface load                 |            | W m⁻²
ρ      | density                              |            | kg m⁻³
c_p    | specific heat at constant pressure   | 500        | J kg⁻¹ K⁻¹
λ      | thermal conductivity                 | 46.8       | W K⁻¹ m⁻¹
T_ref  | ambient temperature                  | 20         | °C
α^bg   | background information on α          | 0 to 12    | W K⁻¹ m⁻²
α      | heat transfer coefficient            | unknown    | W K⁻¹ m⁻²
T_0    | initial temperature                  | unknown    | K

Table 5.1: Table of symbols associated with the thermal model.

the finite element space, we arrive at the following semi-discretized version of (5.1):

  M Ṫ(t) + K T(t) + Σ_{k=1}^q α_k M_k (T(t) − T_ref) = r(t),  t ∈ [0, t_f],   T(0) = T_0.   (5.3)

Here M and M_k denote mass and boundary mass matrices, respectively, and K is the stiffness matrix:

  M = (ρ c_p ∫_Ω φ_i φ_j dx)_{i,j},   M_k = (∫_Γ φ_i φ_j χ_k ds)_{i,j},   K = (λ ∫_Ω ∇φ_i · ∇φ_j dx)_{i,j}

with indices i, j = 1, …, n. In a slight abuse of notation, T_ref also denotes a coefficient vector in R^n with identical entries. The right-hand side vector r(t) represents the load vector generated by the given boundary heat sources:

  r(t) = (∫_Γ r(x, t) φ_j ds)_j.

Finally, we recall that the coefficient vector T_0 representing the initial temperature distribution

  T_0(x) = Σ_{j=1}^n T_{0,j} φ_j(x)

is unknown. It is clear that the finite element model (5.3) is of the form (1.1) when the identifications given in Table 5.2 are made.

Our model output y(t) = C_y T(t), which is adjusted to the temperature measurements during the data assimilation process, is described by the measurement matrix C_y. In the present setting, we wish to use as potential measurement locations all finite element

mesh nodes which are located on the surface of the machine column. Therefore, C_y is composed of all rows of the n×n identity matrix corresponding to the surface degrees of freedom.

The specific form of the adjoint system (2.16) reads

  M Ṡ(t) = K S(t) + Σ_{k=1}^q α_k M_k S(t),  t ∈ [0, t_f],   M S(t_f) = C_y^⊤.   (5.4)

Notice that the symmetry of M and K has been used. The block rows of the Jacobian according to (2.17) are

  [S(t_f − t_j)^⊤ M   −∫_0^{t_j} S(t − t_j + t_f)^⊤ [M_1 T(t) ⋯ M_4 T(t)] dt].   (5.5)

Symbol in (1.1) | Symbol in (5.3)                     | Remark
x               | T                                   |
p               | α                                   | unknown
x_0             | T_0                                 | unknown
E               | M                                   |
A(p)            | −K − Σ_{k=1}^q α_k M_k              |
f(t)            | r(t) + Σ_{k=1}^q α_k M_k T_ref      |

Table 5.2: Correspondence of symbols in the general dynamical system (1.1) and the finite element model of the heat equation (5.3).

We recall that our emphasis is not on the estimation of the temperature distribution of the machine, but rather on the estimation of the QOI, i.e., the temperature-induced displacement of a certain reference point of the machine structure at time t_f. The overall displacement field is governed by a quasi-static linear elasticity model, since the time scale of the heat equation is unable to generate wave motion in the machine structure. The linear elasticity model is based on the balance of forces,

  div σ(ε(u), T(t_f)) = 0 in Ω.   (5.6)

We employ an additive split of the stress tensor σ into its mechanically and thermally induced parts. (An alternative, equivalent approach would apply such a split to the strains.) Together with the usual homogeneous and isotropic stress-strain relation, we obtain the following constitutive law; see (Boley and Weiner, 1960, Section 1.12), (Eslami et al., 2013, Section 2.8):

  σ(ε(u), T(t_f)) = σ_el(ε(u)) + σ_th(T(t_f)),
  σ_el(ε(u)) = E/(1 + ν) ε(u) + E ν/((1 + ν)(1 − 2ν)) trace(ε(u)) id_3,   (5.7)
  σ_th(T(t_f)) = −E/(1 − 2ν) β (T(t_f) − T_ref) id_3.

Herein, ε denotes the linearized strain tensor

  ε(u) = ½ (∇u + ∇u^⊤).

The elasticity modulus E and Poisson ratio ν of the cast iron machine column are given. For convenience, all quantities relevant for the displacement model are summarized in Table 5.3.

Symbol | Meaning                                      | Units
u      | displacement                                 | m
σ      | stress                                       | N m⁻²
ε      | strain                                       | 1
ν      | Poisson's ratio                              |
E      | modulus of elasticity                        | N m⁻²
β      | thermal volumetric expansion coefficient     | K⁻¹
L      | length of the main spindle                   | m
l      | auxiliary quantity, see Appendix A           | m
σ      | standard deviation of temperature sensors    | K

Table 5.3: Table of symbols associated with the displacement model.

We continue with a specification of the mechanical boundary conditions for the elasticity equation (5.6)-(5.7). The machine column is free to move in the X-direction on the rail by which it connects to the machine bed, see Figure 5.1(a). Movements in Y- and Z-directions are prohibited. Moreover, the machine column is connected by a spindle nut to the spindle in the machine bed which drives the horizontal movement during operation. This leads to the following mixture of essential and natural boundary conditions for (5.6)-(5.7):

  u_2 = 0, u_3 = 0, [σ n]_1 = 0 on Γ_rail,
  u = 0 on Γ_nut,   (5.8)
  σ n = 0 on Γ \ (Γ_nut ∪ Γ_rail).

The third boundary condition expresses the absence of boundary loads on the remainder of the surface.

We discretize (5.6)-(5.8) by standard nodal (vector-valued) linear finite elements on the same mesh employed for the discretization of the heat equation (5.1). This leads to a stationary, discrete problem of the following form:

  K u + F (T(t_f) − T_ref) = 0,   (5.9)

where K now denotes the elasticity stiffness matrix and F is a matrix associated with the thermally induced stress. Clearly, the solution map T(t_f) ↦ u taking the terminal temperature to the induced displacement is affine.

Our quantity of interest z = u(x_TCP) ∈ R^r with r = 3 is the displacement at a certain reference point x_TCP, the tool center point. As TCP, we use the tip of the main spindle (holding the tool) seen in the left of Figure 5.1(a). We consider the main spindle assembly as a rigid body which is thermally insulated from the machine column. Consequently, the TCP displacement is determined by the displacements at the four mounting points x_1, …, x_4 of the sledge holding the main spindle, see Figure 5.1(b). The dependence u(x_TCP) = N(u(x_1), …, u(x_4)) is nonlinear, and we refer the reader to (Herzog and Riedel, 2015, Section 3.2) for more details. Here we are only interested in the linearization C_z of the map T(t_f) ↦ u ↦ u(x_TCP) described by (5.9), taken at the constant reference temperature T_ref. By the chain rule, it is evident that

  C_z = −N′(0) K^{-1} F ∈ R^{3×n}

holds. The specific form of N′(0) is given in Appendix A. Clearly, it is advantageous to evaluate C_z in an adjoint fashion according to C_z^⊤ = −F^⊤ K^{-1} N′(0)^⊤. This amounts to the solution of only r = 3 adjoint elasticity equations with point sources acting in x_1, …, x_4. With the matrix C_z available, the output matrix Q can be evaluated by solving the adjoint system (3.6) and applying (3.7). For the forward system (5.3) under consideration, this amounts to solving

  M Ṡ_Q(t) = K S_Q(t) + Σ_{k=1}^q α_k M_k S_Q(t),  t ∈ [0, t_f],   M S_Q(t_f) = C_z^⊤   (5.10)

for S_Q : [0, t_f] → R^{n×r} and subsequently evaluating

  Q = [S_Q(0)^⊤ M   −∫_0^{t_f} S_Q(t)^⊤ [M_1 T(t) ⋯ M_4 T(t)] dt].

The symmetry of M and K has been used in these formulas.
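To make the adjoint evaluation of C_z concrete, here is a hedged sketch (illustrative only; K_el denotes the sparse elasticity stiffness matrix, F the dense thermal coupling matrix, and Nprime0 a dense 3×n_u array holding N′(0) — all names are assumptions):

    import numpy as np
    import scipy.sparse.linalg as spla

    # Sketch: evaluate C_z^T = -F^T K_el^{-1} N'(0)^T by r = 3 elasticity
    # solves, one per row of N'(0) (point sources at the mounting points).
    def build_Cz(K_el, F, Nprime0):
        # K_el is symmetric, so K_el^T = K_el and no transposed solves are needed.
        Y = np.column_stack([spla.spsolve(K_el.tocsc(), rhs) for rhs in Nprime0])
        return -(Y.T @ F)        # C_z = -N'(0) K_el^{-1} F, of shape (3, n)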

6. NUMERICAL RESULTS

In this section we present some numerical results. We focus on the sensor placement problem and its solution by the simplicial decomposition method described in Algorithm 1. The algorithm is applied to the thermo-mechanical system described in Section 5. We therefore assume that the set-point θ_0 = (T_0^0, α^0) is given and no data assimilation problem needs to be solved.

6.1. DESCRIPTION OF PROBLEM DATA

We fix the set-point of the initial temperature state equal to the ambient temperature, i.e., T_0^0(x) ≡ T_ref. The set-point of the heat transfer parameter α^0(x) varies over different parts of the boundary, and it is zero where the heat sources are applied, see (5.2) and Figure 5.1(c). We have chosen typical values for the heat transfer coefficient:

  α^0(x) = 12 W K⁻¹ m⁻²  if x ∈ Γ_vert (vertical surfaces),
           10 W K⁻¹ m⁻²  if x ∈ Γ_up (horizontal surfaces facing up),
            8 W K⁻¹ m⁻²  if x ∈ Γ_down (horizontal surfaces facing down),
            5 W K⁻¹ m⁻²  if x ∈ Γ_inner (enclosed surfaces),
            0 W K⁻¹ m⁻²  if x ∈ Γ_r1 ∪ Γ_r2 (surfaces with heat sources).

The inverse covariance matrices for the initial state and for the parameter were chosen as V_{x_0}^{-1} = M (finite element mass matrix) and V_p^{-1} = id_4.

The machine column experiences the influence of two heat sources, see Figure 5.1(c). One originates from an electrical drive mounted on the top of the machine column (Γ_r1) and the other one from the spindle driving the horizontal movement of the column (Γ_r2). The heat sources are described by

  r(x, t) = 6700 W m⁻²  if x ∈ Γ_r1 and 0 s ≤ t ≤ 2400 s,
            2700 W m⁻²  if x ∈ Γ_r2 and 0 s ≤ t ≤ 4800 s,
            6700 W m⁻²  if x ∈ Γ_r1 and 4800 s < t ≤ 7200 s,
            0 else.

All calculations are done in the time interval [0 s, 7200 s]. The standard deviation σ of the measurements was assumed to be known and constant.

6.2. DISCRETIZATION

As described in Section 5, we used a finite element model with a standard nodal basis of piecewise linear, continuous elements for the temperature T as well as for the displacement u on a tetrahedral grid of the geometry depicted in Figure 5.1(b). The size of the mesh can be seen in Table 6.1. All finite element nodes on the boundary are potential

sensor positions in the sensor placement problem.

Table 6.1: Size of the finite element mesh: the number of mesh nodes, the number of mesh cells, and the number n of nodes on the boundary (potential sensor locations).

In order to compute the required quantities for the sensor placement problem, particularly the Jacobian J and the output matrix Q, we need to solve the time-dependent forward system (5.1), the adjoint system (5.4) for the sensitivities, and the adjoint system (5.10) for the matrix Q. For the forward system we employed the implicit Euler method with time step length Δt = 360 s. The adjoint systems were discretized with the consistent adjoint time stepping scheme. The measurements y(t_j) = C_y T(t_j) were taken at the time instances t_j = j Δt, j = 1, …, N = 20, which occur during integration.

6.3. EFFICIENT IMPLEMENTATION

Notice that the sensitivities X(t_j) = [X_0(t_j), X_p(t_j)] ∈ R^{n×(n+q)}, j = 1, …, N, as well as the matrices Υ_i ∈ R^{(n+q)×(n+q)}, i = 1, …, n, are dense and would therefore require a large amount of memory to store. Moreover, the assembly of the matrix I(w) appearing in the evaluation of φ_i during the CGP step, see (4.5) and (3.26), and during the RMP step (4.7) of Algorithm 1 would be computationally rather expensive. Here we take advantage of the fact that only the product h = I(w)^{-1} Q^⊤ is needed to compute all required quantities. Instead of forming I(w), we therefore solve I(w) h = Q^⊤ by means of r = 3 calls to a preconditioned conjugate gradient method based on matrix-vector products with I(w), which are much more economical to implement. As preconditioner we use the inverse background covariance matrix V_θ^{-1} = diag(V_{x_0}^{-1}, V_p^{-1}) = diag(M, id_4). Similar considerations apply to the evaluation of ψ_j in (4.22). In addition, the computation of φ(w) and ψ(v), see (3.26) and (4.22), as well as the matrix-vector products with the FIM I(w) are executed in parallel with N = 20 threads, where each thread j only uses the sensitivity information X(t_j) for time step j.

6.4. RESULTS AND PERFORMANCE

For practical purposes, the termination criteria for the simplicial decomposition problem (4.4) and for the restricted master problem (4.23) are implemented only up to certain tolerances. In (4.4), a weight w_i is considered zero (one) if it is below 0.05 (above

0.95), and hence i is taken to belong to the set I_lb (I_ub). After solving the RMP, a column g^j is purged if the corresponding weight v_j is below 0.05. All tolerance values as well as maximal iteration numbers for solving both problems can be found in Table 6.2.

Parameter                                  | Value
maximal number of iterations for SDP       | 40
zero weight in termination check for SDP   | 0.05
unit weight in termination check for SDP   | 0.95
tolerance in termination check for SDP     | 0.01
tolerance for purging columns in RMP       | 0.05
maximal number of iterations for RMP       | 30
tolerance in termination check for RMP     | 0.01

Table 6.2: Parameters used in Algorithm 1.

The sensor placement problem for the thermo-mechanical system described in Section 5 was solved for the setting described in Section 6.1 with a desired number of m = 10 sensors. Algorithm 1 stopped after 6 iterations, because the column generated in the CGP was already contained in the previous column set G. In this case, the RMP to be solved would be the same as in the step before and no further progress could be achieved. The computation took about 2.5 h; further details about the performance of the algorithm are listed in Table 6.3.

Quantity                                       | Value
time for computation of sensitivities (5.5)    | 15 min
time for computation of φ(w)                   | 150 s
average number of RMP steps per SDP step       | 18
time for RMP step                              | 70 s
number of SDP steps                            | 6
average time for SDP step                      | 1360 s
overall time                                   | 2.5 h

Table 6.3: Computation times for the application of Algorithm 1.

Figure 6.1 shows the evolution of the distribution of measurement weights for each SDP step over all possible sensor locations, which were all boundary nodes of the FE mesh. The final solution is achieved practically after 4 iterations, which is also reflected in the objective values Ψ(Q I(w)^{-1} Q^⊤) in Figure 6.2(b). The optimal sensors are all placed in the vicinity of the two heat sources, see Figure 6.2(a), since the D_Q-criterion targets the volume of the confidence ellipsoid of the QOI (the TCP displacement).

Figure 6.1: Evolution of the measurement weights w^(k) during SDP iterations.

Figure 6.2: Optimal sensors and objective values. (a) Optimal sensor positions (m = 10). (b) Objective values Ψ(Qᵀ I(w^(k))⁻¹ Q) over the iteration number.
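As a footnote on the stopping rules of Section 6.3 (the sketch referenced there): the thresholding of weights into I_lb and I_ub and the purging of RMP columns reduce to a few lines of array bookkeeping. The NumPy rendering below is a minimal sketch with illustrative names, not the paper's code:

```python
import numpy as np

def classify_weights(w, zero_tol=0.05, one_tol=0.95):
    # Indices with w_i < zero_tol are treated as zero (set I_lb),
    # those with w_i > one_tol as one (set I_ub); the rest remain free.
    I_lb = np.flatnonzero(w < zero_tol)
    I_ub = np.flatnonzero(w > one_tol)
    free = np.flatnonzero((w >= zero_tol) & (w <= one_tol))
    return I_lb, I_ub, free

def purge_columns(G, v, purge_tol=0.05):
    # Drop every column g_j of G whose RMP coefficient v_j fell below purge_tol.
    keep = v >= purge_tol
    return G[:, keep], v[keep]

w = np.array([0.01, 0.50, 0.97, 0.04])
I_lb, I_ub, free = classify_weights(w)   # I_lb = [0, 3], I_ub = [2], free = [1]
```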
