Estimating Disturbance Covariances From Data For Improved Control Performance


Estimating Disturbance Covariances From Data For Improved Control Performance

by

Brian J. Odelson

A dissertation submitted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY (Chemical Engineering)

at the

UNIVERSITY OF WISCONSIN-MADISON

2003

© Copyright by Brian J. Odelson 2003
All Rights Reserved

To my parents for letting me take things apart, to my grandfather for showing me how to put them back together, to Karen for sharing the highs and lows of this journey, and to Abby for making it all worthwhile


Acknowledgments

It has been my pleasure to work with such a talented group of people in my time here in Madison. Foremost is my advisor, Prof. James B. Rawlings. I continue to be amazed at his knowledge of the control field, especially his ability to distill complex concepts down into very simple ideas. I am grateful for the latitude he gave me to pursue my ideas, most of which didn't pan out. I also thank him for helping to make me a better technical writer.

I would like to thank those professors who took the time to serve on my thesis committee: Professors Ray, Graham, and Swaney from the department and Professor Bates from Statistics.

The members of the Rawlings group have made my time here especially enjoyable. First there were the senior members of the group: Scott, Rahul, and Chris. I always enjoyed playing darts and drinking beer with you guys; I learned a lot about being in the group. Jenny was always a pleasure to talk to, even though I didn't know her well. Matt's knowledge of optimization and the regulator is impressive. I'll always remember the hours of entertainment you provided (especially when the carpets are wet).

I've appreciated Eric's easy-going attitude in the office, and marveled at the way he senses an idle CPU. I always enjoyed working with Aswin from time to time. In the short time that I have known the newbies of the group, I can see that the group will be in good hands. Best of luck to Paul, Murali, and Ethan.

There have been many visitors to the Rawlings group over the years, and I have enjoyed working with them. It was great to get to know Gabriele during his time here. His knowledge of disturbance modeling is quite impressive, and a few of the examples in this dissertation were inspired by his thesis. Alex ran the lab for me during his time here, and helped me out tremendously. I am indebted to John Eaton for teaching me so much about Linux and system administration, and for answering my frequent (sometimes stupid) questions. Of course, the group would not have run so well without the glue. Mary has been a good friend, and made it much more pleasant to come to the office.

In the past five years, I have made some lifelong friends. I have missed my frequent conversations with Daniel, and of course, lighting up the beach from time to time. I am thankful that I met Craig and spent so much time with him and Heidi. I am also grateful to the group (Craig, Jeff, Luis, Maren, Matt, Katie, and Pierce) for welcoming me as one of their own.

Part of this dissertation included a collaboration with the Eastman Chemical Company. I thank Dr. Jim Downs for working with us, and for his hospitality in Kingsport. I also would like to thank Dr. Steve Miller for all of his help in acquiring the data, and obtaining the permissions we needed.

I have to thank my parents and sister for their unfailing love, and for always being there. We cannot say how much we have appreciated having support from Grandma, Nan, and Vince. Finally, I thank my wife, Karen, for putting up with the long hours, and sharing both the highest highs and the lowest lows that have been the past four years.

Brian J. Odelson
University of Wisconsin-Madison
August 2003


Abstract

Estimating Disturbance Covariances From Data For Improved Control Performance

Brian J. Odelson

Under the supervision of Professor James B. Rawlings

The objective of this project is to address a shortcoming of model predictive controllers in the industrial setting. When using a state estimator, information about the disturbances in the system is necessary. Specifically, the covariances of the disturbances are required, and are often set arbitrarily in practice. Furthermore, model predictive controllers use integrated white noise disturbances to ensure offset-free control. Choosing the covariance of this white noise disturbance is complicated by the fact that an integrated white noise disturbance is generally not present in the plant.

We have developed techniques that recover the covariances of the disturbances in the plant, including the integrated white noise disturbance. The autocovariance least-squares (ALS) methods use closed-loop data from the plant to recover these parameters. We have compared these methods to those available in the literature. Our approach allows us to determine a priori when unique estimates of the covariances can be found from data. Previously, these conditions were not available in the literature. We demonstrate that these estimates converge, in the limit of large data, to the correct answer for the nominal system.

From the results of this dissertation, we conclude that using the ALS methods to account for purely stochastic disturbances yields a 15-30% increase in closed-loop control performance. However, the control performance can be 3-5 times better in cases where significant model mismatch is present, which is the usual case in an industrial setting. This mismatch can take the form of a poor input/output model, or a poor choice of disturbance model.

A number of examples are presented in this dissertation to demonstrate the benefits of using the ALS methods. Several simulation examples are provided to demonstrate the theoretical control performance benefits of using an ALS estimator. The methods are also applied to data from an industrial reactor. In this application, it is shown that the prediction errors of the state estimator can be reduced by almost two orders of magnitude. The closed-loop control benefit is simulated and estimated to be nearly three times better than the control performance of the plant when the data were collected. Finally, the methods are applied to the control of a laboratory reactor. In this example, the control benefits can be exactly quantified using real closed-loop process data. The control cost is three to five times better than the state estimator tuning currently being practiced.

The methods we have developed are suitable for practical implementation. They are able to remove a major source of uncertainty in practical model predictive control implementations, can have large benefits in terms of closed-loop control performance, and require no additional capital expenditure.


Contents

Acknowledgments
Abstract
List of Tables
List of Figures

Chapter 1 Introduction
  1.1 MPC Monitoring
  1.2 Dissertation Overview
  1.3 Objectives

Chapter 2 Model Predictive Control Problems and Motivations
  2.1 Problem Setup and Notation
    Regulator
    State Estimation
    Disturbance Model
    Target Tracking
  2.2 Effects of Incorrect Covariances
  2.3 Estimator Tuning Effects
  2.4 Evaluating Probability Distributions

Chapter 3 Review and Critical Analysis of Previous Work
  Bayesian Methods
  Maximum Likelihood Methods
  Covariance Matching Methods
  Time Series Method
  Correlation Methods
  Subspace Identification
  Critical Analysis of Covariance Matching
    Filtered Estimates
    Smoothed Estimates
  Critical Analysis of Instrumental Variables Subspace Identification (IV-4SID)
    Consistency of IV-4SID

Chapter 4 Correlation Methods - Old and New
  Stochastically Driven Outputs
  Autocovariance Matrix
  Full Triple Matrix Method
    Removing the Initial Condition
    Solving the Least-squares Problem
    Connections to the Literature
  Augmented Observability Matrix
    Window Size
    Eigenvalues of the Plant
    Sliding Window
  Extraction of the Q_w Matrix
  Derivation of State-Based Full Matrix Least-Squares Problem

Chapter 5 Innovations, Noise Shaping, and Optimal Filtering
  Innovations Form
    Cross Covariance Terms
    Uniqueness Conditions
    Connections to the Literature
  Conditions for Computing Q_w
    Uniqueness Conditions
  Derivation of Full Matrix Innovations-Based Least-Squares Problem
  Discussion of the Autocovariance Estimates
    Different Ways to Process the Data
  5.7 Noise Shaping Methods
    Multiple Observers
    Optimal Filtering
    Iterative Procedure
  Minimizing the Estimate Error
    Optimization
    Whitening the Innovations
  Design Procedure

Chapter 6 Closing the Loop
  Output-based Solution
  Innovations-based Solution
  Fixed Disturbance Model
  Updated Disturbance Models
  Time-delay Systems
  Illustrative Examples
  Discussion of Model Mismatch and Regulator Performance

Chapter 7 Application to Industrial Data
  Introduction
  Data Set
  Consistency
  Regulator Payoff
  Input Disturbance Model
  Time-delay Parameter
  Computational Summary
  Data Set
  Data Set

Chapter 8 Application to Laboratory Data
  Apparatus and Reaction Model
  Quantifying the Regulator Benefits
  Motivation
  Sampling Issues
  Steady-state Behavior
  Servo Control
  Output Disturbance Rejection
  Input Disturbance Rejection
  Input/Output Model Mismatch
  PID Control

Chapter 9 Conclusions and Future Work
  9.1 Output-based Methods (Chapter 4)
  Innovations-based Methods (Chapter 5)
  Disturbance Models (Chapter 6)
  Control System Payoff (Chapters 6-8)
  Future Work

Notation

Appendix A Solving Least-squares Problems
  A.1 Solving Matrix-matrix Least-squares Problems Using the Vec Operator
  A.2 Summary of Kronecker Definitions
  A.3 Least-squares Related Proofs
    A.3.1 Proof of Lemma
    A.3.2 Proof of Lemma
    A.3.3 Proof of Lemma

Appendix B Multimedia
  B.1 MPC Simulation
  B.2 Effects of Model Mismatch

Bibliography

Vita

List of Tables

2.1 Summary of secondary effects
Summary of changes in sensor noise characteristics
Details of sensor noise covariance estimates over time
Convergence properties of autocovariance estimates
Condition numbers of the time-delay example
Table of parameters for nonisothermal CSTR example
Map of results
Computational summary - Eastman set
Summary of new filter applied to set
Summary of new filter applied to set
Numerical values for parameters
Objective function costs for regulatory control in the motivating example
Objective function costs for servo control in the motivating example
Conditions used for sampling experiment
8.5 Objective function costs for steady-state regulatory control
Objective function costs for servo control
Objective function costs for output disturbance rejection
Objective function costs for rejecting input disturbance
Efficiency factors for input disturbance tuning on setpoint control
Replicates of input disturbance experiments
Objective function costs for model mismatch
Replicates of model mismatch experiments
Laboratory PID parameters

List of Figures

1.1 Traditional model predictive control
Proposed MPC monitoring layer
Graphical representation of the control problem
Input/Output with incorrect estimator tuning and a step increase in sensor noise covariance (R_v)
Input after using the regulator to compensate for noise
Comparison of tracking performance - wrong control knob versus covariance estimator
Evaluating 3-D probability distributions
Comparison of state estimate error probability distributions from Example
Outputs of the second motivational example
Inputs of the second motivational example
Filtered sensor noise frequency distribution using an incorrect R_v
Filtered state noise frequency distribution using an incorrect R_v
3.3 Filtered sensor noise frequency distribution using correct covariances
Filtered state noise frequency distribution using correct covariances
Filtered state estimates
Smoothed state estimates
Smoothed sensor noise frequency distribution using an incorrect sensor noise covariance
Smoothed state noise frequency distribution using an incorrect sensor noise covariance
Smoothed sensor noise frequency distribution using correct covariances
Smoothed state noise frequency distribution using correct covariances
Estimates of sensor noise covariance (R_v) from subspace
Estimation of R_v and SC^T from open-loop output data
Estimate of R_v and SC^T with increasing window size
Estimate of R_v(1,1) with increasing points, well conditioned model
Estimate of R_v(1,1) with increasing points, badly conditioned model
Condition number of the augmented observability matrix, full rank measurement
Condition number of the augmented observability matrix, limited measurement
Estimation of sensor noise probability distribution with incorrect a priori estimate
4.8 Estimation of dynamic shift in sensor noise probability distribution
Estimating Q_w with limited measurements
5.1 Ensemble average of the autocovariance
Time average estimate of the autocovariance
Objective function values versus elements of G Q_w
Objective function values versus elements of G Q_w
Improving convergence with a double filter
Improving convergence with a different double filter
Probability distribution of (x_k - x̂_{k|k-1}), Kalman gain unknown
Probability distribution of (x_k - x̂_{k|k-1}) using Kalman gain from covariance estimator
Autocorrelation function of white noise
Evolution of the innovations autocorrelation while updating the Kalman gain
Evolution of the estimator eigenvalues while updating the Kalman gain
ALS design procedure
Shell control problem - constrained outputs
Shell control problem - prediction error, control actions
Shell control problem - covariance matrix element-by-element
Examples of integrated white noise disturbances
Time-delay example
6.6 CSTR example - estimation of state noise probability distribution
CSTR example - estimation of slow-drift disturbance probability distribution
CSTR example - comparison of composition outputs
CSTR example - comparison of cooling jacket temperature inputs
CSTR example - comparison of temperature outputs
CSTR example - comparison of flowrate input
CSTR example - comparison of regulator costs
High purity distillation column
Column example - target tracking comparison
Column example - input comparison
Column example - comparison of regulator costs
Column example - outputs while rejecting a disturbance
Column example - inputs while rejecting a disturbance
Column example - outputs using an input disturbance model
Column example - inputs using an input disturbance model
Column example - effects of model mismatch on regulatory control cost
Column example - effects of model mismatch on servo control cost
Eastman gas phase reactor control
Normalized process data - set
Prediction error, training set
7.4 Validation set
Validation set
Validation set
Division of data for consistency discussion
Eigenvalues of the ALS estimator replicates
Prediction errors containing an unmodeled disturbance
Disturbance estimates for the temperature excursion
Comparison of composition behavior, early set
Comparison of temperature behavior, early set
Comparison of normalized input behavior, early set
Comparison of composition behavior, late set
Comparison of temperature behavior, late set
Comparison of normalized input behavior, late set
Comparison of regulator objective function costs
Before and after prediction errors, input disturbance model
Regulator efficiency factors, input versus output models
Normalized process data - set
Normalized process data - set
Schematic of reactor
Laboratory setup
8.3 Frequency distribution of the tracking error and control action in the reactor using initial tuning
Frequency distribution of the tracking error and control action in the reactor using an increased control penalty
Frequency distribution of the tracking error and control action in the reactor using the ALS estimator
Setpoint change with original tuning
Setpoint change with increased control penalty
Setpoint change using updated state estimator
Composition output with dynamic shift in R_v
Estimating the dynamic shift in the R_v matrix
Input frequency distributions for steady-state regulatory control
Output of a setpoint change
Input of a setpoint change
Output while rejecting a deterministic output disturbance
Input while rejecting a deterministic output disturbance
Output while rejecting a deterministic input disturbance
Input frequency distribution while rejecting a deterministic input disturbance
Output of a setpoint change with model mismatch
Input of a setpoint change with model mismatch
PID servo control
PID output disturbance rejection
PID input disturbance rejection


Chapter 1

Introduction

The best way to have a good idea is to have lots of ideas.
Linus Pauling

Model predictive control (MPC) has become a popular choice for solving difficult control problems. Higher performance, however, comes at the cost of greater required knowledge about the process being controlled. Often, this information cannot be obtained easily, is estimated incorrectly, or changes over time. It has been shown that incorrect, insufficient, or missing knowledge can lead to poor control performance. All too often, the parameters that the controller requires are not estimated correctly, and industrial practitioners may settle for parameters that give acceptable closed-loop performance, instead of using parameters specified by the data. If a practitioner decides to spend the time and capital to implement an advanced control strategy, then intuitively they may wish to use some form of automated monitoring to ensure that the full benefits of the high-end control solution are realized.

1.1 MPC Monitoring

When an MPC system is commissioned, certain assumptions are made about the process. Model predictive controllers perform better when the model and assumed parameters are closer to the plant. We propose to implement a monitoring system that uses the normal input-output data of the system to either estimate these parameters, or at least detect when they need to be re-identified. Typical MPC is shown in Figure 1.1, and is a simplified representation of actual implementations.

[Figure 1.1: Traditional model predictive control]

Expert knowledge is often required to properly commission and maintain the regulator, target calculator, and state estimator. A more complete representation of MPC is shown as the grey box in Figure 1.2. In the regulator, the user must specify the system model, the objective function penalties, and the constraints a priori. In the state estimator, the user must specify the system model, the covariances of the disturbances, and a disturbance model. We propose to use normal input/output data from the plant to recover some of this information on-line.

[Figure 1.2: Proposed MPC monitoring layer]

On the regulator side, the model validation step determines when the system model (the input to output connection) is no longer valid. The model validation procedure might create a flag when it is time to compute a new model using one of the many available system identification procedures. On the estimator side, since the disturbances come from streams entering the system, their covariances are, in principle, measurable quantities. The covariance estimator shown in Figure 1.2 uses the normal operating data to compute the covariances of the Gaussian disturbances, as well as information about the integrating disturbance model being used.

This dissertation deals mainly with this expert knowledge of the state estimator, and how ordinary process data may be used to remove some of the information burden from the user. Since the state estimation works of Kalman and Bucy [72, 73], much research has been devoted to the theoretical properties as well as the practical implementation aspects of the Kalman filter. This dissertation focuses on the implementation aspects of the state estimator, specifically in dealing with the disturbances that are almost certainly present in practical applications.

Performance monitoring is not useful if it cannot be run on a closed-loop industrial process. As the methods are developed in this dissertation, there are several restrictions to keep in mind. First, we use only the input and output data that normally come from the plant (i.e. without requiring additional sensors or actuators). Second, the method must be robust, in that reasonable model mismatch does not cause the monitoring layer to fail. Finally, the method cannot rely on input excitation. While the methods must account for the control actions, they must not depend on deliberate manipulation of the inputs (e.g. a pseudo-random binary sequence) to be useful.

1.2 Dissertation Overview

In this project, we outline a methodology for providing the required disturbance parameters in the state estimator from process data. In addition to developing a method to recover the information from data, there are several other issues to address. There must be a way to determine accurately when the methods are applicable, based on the available information; the method is not suitable for practical implementation if a plant test is required to determine whether the problem can be solved. There must be sufficient motivation for using this high-end control solution on industrial processes: if there is only a small increase in control performance, then the return on investment would not be high enough to warrant implementation. Finally, the methods must be robust enough to accommodate a variety of disturbances that might affect the plant, including a poor model.

The remainder of this dissertation is organized as follows. Chapter 2 lays out the mathematical formulation of the MPC controller. We also provide some motivational examples that are solved later in the dissertation. Chapter 3 reviews the literature for adaptive filtering, subspace identification, and general Kalman filtering theory. Also in this chapter, a critical analysis of some of the popular techniques in the literature is provided, with examples to demonstrate why existing techniques are not suitable for this project. Chapter 4 begins to construct a method to solve the problems of interest, beginning with the available techniques for open-loop systems. Chapter 5 further develops this method to make it suitable for use with closed-loop data. A hierarchy of tools, based on the level of available information, is outlined. Chapter 6 applies these methods to closed-loop systems, and adapts the methods to include an integrated disturbance model. The integrated disturbance is required in MPC to ensure offset-free control, yet has not been treated in the literature. Chapter 7 contains the results of an industrial collaboration. From actual plant data, we make theoretical predictions about the potential benefits of these methods in an industrial setting. Chapter 8 contains the results of a laboratory MPC implementation. In this setup, we are able to demonstrate closed-loop performance on an actual process. Finally, the dissertation concludes in Chapter 9 with a summary of the results and an outline of future work.

1.3 Objectives

We define the following objectives for this dissertation:

1. Develop methods for using process data to monitor the estimator
2. Apply these methods to a closed-loop industrial-type MPC problem
3. Evaluate the effects of model mismatch
4. Diagnose and correct an improperly designed disturbance model
5. Evaluate control performance on a laboratory reactor

Chapter 2

Model Predictive Control Problems and Motivations

Research is what I'm doing when I don't know what I'm doing.
Wernher von Braun

In the past two decades, model predictive control has become the dominant strategy in industry for solving tough control problems. What originally began as IDCOM in 1978 [133] was followed by dynamic matrix control (DMC) [35], then quadratic DMC (QDMC) [44]. Optimization-based model predictive control with constraints has been the subject of much research, eventually evolving into nonlinear MPC [154]. In this chapter, we describe the model predictive control framework that is used in this dissertation. Two motivational examples are then provided in this framework to demonstrate the potential impact of this project. These motivating examples are then solved in subsequent chapters.

2.1 Problem Setup and Notation

Throughout this dissertation, we assume that data are produced by the following linear plant:

$$x_{k+1} = Ax_k + Bu_k + Gw_k \tag{2.1a}$$
$$y_k = Cx_k + Du_k + v_k \tag{2.1b}$$

in which $\{w_k\}_{k=0}^{N_d}$ and $\{v_k\}_{k=0}^{N_d}$ are uncorrelated, zero-mean, Gaussian noise sequences with covariances $Q_w$ and $R_v$, respectively. The means and covariances of the noise sequences are summarized with the following notation:

$$w_k \sim N(0, Q_w) \qquad v_k \sim N(0, R_v)$$

The dimensions of the system matrices are

$$A \in \mathbb{R}^{n \times n} \quad B \in \mathbb{R}^{n \times m} \quad G \in \mathbb{R}^{n \times g} \quad C \in \mathbb{R}^{p \times n} \quad D \in \mathbb{R}^{p \times m}$$

and the corresponding vectors have appropriate dimensions. In this dissertation, $D$ is assumed to be zero unless explicitly stated.

Regulator

The control objective is to minimize the quadratic cost of the deviations of the outputs ($y_k$'s) from a constant setpoint, as shown in Figure 2.1. The deviations of the inputs ($u_k$'s) from their steady-state values ($u_k^s$) are also penalized.

[Figure 2.1: Graphical representation of the control problem]

The quadratic controller solves the following optimization problem for the optimal input trajectory over the infinite horizon:

$$\min_{\{u_k\}} \Phi = \sum_{j=k}^{\infty} \|y_j - r_j\|^2_Q + \|u_j - u^s_k\|^2_R \tag{2.2a}$$

subject to

$$x_{k+1} = Ax_k + Bu_k \tag{2.2b}$$
$$y_k = Cx_k + Du_k \tag{2.2c}$$
$$u_{\min} \le u_k \le u_{\max} \tag{2.2d}$$
$$y_{\min} \le y_k \le y_{\max} \tag{2.2e}$$

In the absence of constraints, Equation 2.2 is the linear quadratic regulation (LQR) problem. The LQR problem can be solved recursively, defining the control action $u_k = Kx_k$. The optimal gain is computed from

$$K_k = -(B^T \Pi_k B + R)^{-1} B^T \Pi_k A \tag{2.3}$$

in which $\Pi$ is computed from the Riccati equation

$$\Pi_{k-1} = Q + A^T \Pi_k A - A^T \Pi_k B (R + B^T \Pi_k B)^{-1} B^T \Pi_k A \tag{2.4}$$

Throughout this dissertation, we assume that the Riccati solution converges to a steady-state value, $\Pi_{k-1} = \Pi_k \to \Pi$. Subsequently, the gain also converges, $K_k \to K$.

The optimization can also penalize the differential control action, $\Delta u_k = u_k - u_{k-1}$, instead of the absolute control action $u_k$. The state-space system is then augmented:

$$\begin{bmatrix} x_{k+1} \\ u_k \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & I \end{bmatrix} \begin{bmatrix} x_k \\ u_{k-1} \end{bmatrix} + \begin{bmatrix} B \\ I \end{bmatrix} \Delta u_k \tag{2.5a}$$
$$y_k = \begin{bmatrix} C & 0 \end{bmatrix} \begin{bmatrix} x_k \\ u_{k-1} \end{bmatrix} + v_k \tag{2.5b}$$

This augmented system can be used in Equation 2.4 to solve for an augmented gain, and the corresponding control law.

State Estimation

The state estimation problem is dual to the regulator problem and is based on the following probability distribution for the state [27], given measurements up to the previous time step:

$$p(x_k \mid y_0, \ldots, y_{k-1}) \sim N(\hat{x}_{k|k-1}, P_{k|k-1}) \tag{2.6}$$

The state estimate error covariances are defined as

$$P_{k|k-1} \equiv E[(x_k - \hat{x}_{k|k-1})(x_k - \hat{x}_{k|k-1})^T] \tag{2.7a}$$
$$P_{k|k} \equiv E[(x_k - \hat{x}_{k|k})(x_k - \hat{x}_{k|k})^T] \tag{2.7b}$$

The joint density for the state and current measurement is then

$$p(x_k, y_k \mid y_0, \ldots, y_{k-1}) \sim N\left( \begin{bmatrix} \hat{x}_{k|k-1} \\ C\hat{x}_{k|k-1} \end{bmatrix}, \begin{bmatrix} P_{k|k-1} & P_{k|k-1}C^T \\ CP_{k|k-1} & CP_{k|k-1}C^T + R_v \end{bmatrix} \right) \tag{2.8}$$

The conditional density of the state can then be expressed as

$$p(x_k \mid y_0, \ldots, y_k) \sim N(\hat{x}_{k|k}, P_{k|k}) \tag{2.9}$$

in which

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + P_{k|k-1}C^T(CP_{k|k-1}C^T + R_v)^{-1}(y_k - C\hat{x}_{k|k-1}) \tag{2.10}$$

and

$$P_{k|k} = P_{k|k-1} - P_{k|k-1}C^T(CP_{k|k-1}C^T + R_v)^{-1}CP_{k|k-1} \tag{2.11}$$

The final step is to propagate the estimates and the estimate error covariance to the next time step:

$$\hat{x}_{k+1|k} = A\hat{x}_{k|k} + Bu_k \qquad P_{k+1|k} = AP_{k|k}A^T + GQ_wG^T \tag{2.12}$$

which leads to the standard recursive Kalman filter

$$\hat{x}_{k+1|k} = A\hat{x}_{k|k} + Bu_k \tag{2.13a}$$
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + L_k[y_k - C\hat{x}_{k|k-1}] \tag{2.13b}$$

where the estimate error covariances can also be computed in a recursive fashion

$$P_{k+1|k} = AP_{k|k}A^T + GQ_wG^T \tag{2.14a}$$
$$P_{k|k} = P_{k|k-1} - P_{k|k-1}C^T[CP_{k|k-1}C^T + R_v]^{-1}CP_{k|k-1} \tag{2.14b}$$

The Kalman gain $L_k$ is defined as

$$L_k = P_{k|k-1}C^T[CP_{k|k-1}C^T + R_v]^{-1} \tag{2.15}$$

We assume that the estimate error covariance has also achieved steady state, and the notation $[P_{k|k-1}]_{k \to \infty}$ is reduced to $P$ for brevity. We further assume that the Kalman gain has assumed its steady-state value, $L$:

$$L = PC^T[CPC^T + R_v]^{-1} \tag{2.16}$$

The steady-state Riccati equation is

$$P = APA^T + GQ_wG^T - APC^T[CPC^T + R_v]^{-1}CPA^T \tag{2.17}$$

Disturbance Model

A disturbance model is also required to achieve offset-free control. The disturbance can be modeled at the input or at the output depending on the situation. The general form of the disturbance model is

$$x_{k+1} = Ax_k + Bu_k + B_d d_k + Gw_k \tag{2.18a}$$
$$d_{k+1} = d_k + \xi_k \tag{2.18b}$$
$$y_k = Cx_k + C_d d_k + v_k \tag{2.18c}$$

where the additional noise is an uncorrelated Gaussian sequence distributed as

$$\xi_k \sim N(0, Q_\xi) \tag{2.19}$$

The disturbance model can be written as the following augmented state-space system:

$$\begin{bmatrix} x \\ d \end{bmatrix}_{k+1} = \begin{bmatrix} A & B_d \\ 0 & I \end{bmatrix} \begin{bmatrix} x \\ d \end{bmatrix}_k + \begin{bmatrix} B \\ 0 \end{bmatrix} u_k + \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} w_k \\ \xi_k \end{bmatrix} \tag{2.20a}$$
$$y_k = \begin{bmatrix} C & C_d \end{bmatrix} \begin{bmatrix} x \\ d \end{bmatrix}_k + v_k \tag{2.20b}$$

This augmented system can be used in Equations 2.16 and 2.17 to find the gain for the states and disturbances. The state and disturbance estimates are then computed as

$$\hat{x}_{k+1|k} = A\hat{x}_{k|k} + Bu_k + B_d\hat{d}_{k|k} \tag{2.21a}$$
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + L_x[y_k - C\hat{x}_{k|k-1} - C_d\hat{d}_{k|k-1}] \tag{2.21b}$$
$$\hat{d}_{k+1|k} = \hat{d}_{k|k} \tag{2.21c}$$
$$\hat{d}_{k|k} = \hat{d}_{k|k-1} + L_d[y_k - C\hat{x}_{k|k-1} - C_d\hat{d}_{k|k-1}] \tag{2.21d}$$
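The construction above can be sketched numerically: assemble the augmented system of Equation 2.20, iterate the Riccati equation (2.17) to steady state, and split the resulting gain (2.16) into the state and disturbance portions $L_x$ and $L_d$ used in Equation 2.21. The following is a minimal illustration assuming NumPy; the two-state example matrices are invented for illustration and are not values from this dissertation.

```python
import numpy as np

def steady_state_gain(A, C, G, Qw, Rv, iters=2000):
    """Iterate the Riccati equation (2.17) to steady state and
    return the Kalman gain of (2.16) along with the covariance P."""
    P = np.eye(A.shape[0])
    for _ in range(iters):
        # P <- A (P - P C'(C P C' + Rv)^-1 C P) A' + G Qw G'
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + Rv)
        P = A @ (P - K @ C @ P) @ A.T + G @ Qw @ G.T
    return P @ C.T @ np.linalg.inv(C @ P @ C.T + Rv), P

# Hypothetical two-state, one-output system with a pure output
# disturbance model (Bd = 0, Cd = I); values are illustrative only.
A  = np.array([[0.9, 0.1], [0.0, 0.8]])
C  = np.array([[1.0, 0.0]])
Bd = np.zeros((2, 1))
Cd = np.eye(1)
G  = np.eye(2)
Qw, Rv, Qxi = 0.1 * np.eye(2), 0.05 * np.eye(1), 0.01 * np.eye(1)

# Augmented system of Equation (2.20)
Aa = np.block([[A, Bd], [np.zeros((1, 2)), np.eye(1)]])
Ca = np.hstack([C, Cd])
Ga = np.block([[G, np.zeros((2, 1))], [np.zeros((1, 2)), np.eye(1)]])
Qa = np.block([[Qw, np.zeros((2, 1))], [np.zeros((1, 2)), Qxi]])

L, P = steady_state_gain(Aa, Ca, Ga, Qa, Rv)
Lx, Ld = L[:2], L[2:]   # gains for states and disturbance, Eq. (2.21)
```

Because the integrated disturbance state is observable through the output here, the iteration converges to the stabilizing solution of the steady-state Riccati equation.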

42 14 comparison purposes. Other choices of (B d, C d ) are beyond the scope of this dissertation; the interested reader is referred to [123] Target Tracking The target tracking problem is an optimization to track non-zero setpoints subject to the constraints min x s k,us k,η Φ = η T Q s η + (u s k ū)t R s (u s k ū) + qt s η (2.22) (I A)x s k Bus k = B d d k k (2.23a) Cx s k + η r k d k k Cx s k η r k d k k η (2.23b) (2.23c) (2.23d) If using LQR, then the target tracking problem can be solved in closed-form. Solving the following equality gives the steady-state targets for the regulator. I A B C x s k u s k B = d dk k r k C d dk k (2.24) The corresponding control law to drive the outputs to setpoint is u k = K( x k k x s k ) + us k (2.25) or if incremental control penalties are being used u k = K x k k x s k + us k (2.26) u k 1 u s k

43 15 Generally, estimated quantities in this dissertation are denoted with a hat. Specifically, this dissertation explores the estimation of the covariance matrices, and denotes the estimates as Q w and R v. All other MPC conventions not explicitly stated follow the works of Rawlings [129], Muske and Rawlings [131], and Rao and Rawlings [128]. 2.2 Effects of Incorrect Covariances The Kalman gain in Equation 2.15 yields optimal state estimates only if the noise covariances are known correctly. If the state estimates are suboptimal, then the subsequent control is suboptimal. In this dissertation, we also consider the effects the covariances of the disturbances can have on industrial operations. We illustrate one of these effects with an example. Example 2.1 Motivating Industrial Type Example We illustrate this motivation with the simple example of the two state, one output system, the state-space realization of the following transfer function. g (s) = 2 (s + 1) (s + 2) (2.27) which is A = B = (2.28) [ ] G = I C = 1 (2.29)

The noise covariances are chosen as

\[ Q_w = .9\, I_2 \qquad \hat{Q}_w = .9\, I_2 \tag{2.30} \]

\[ R_v = .06 \;\; (k < 25) \qquad R_v = .6 \;\; (k \ge 25) \tag{2.31} \]

and the regulator penalties are set as

\[ Q = 1 \qquad R = 0 \qquad S = 1 \tag{2.32} \]

The input and output of the plant are shown in Figure 2.2. In this case, the tracking is good, but the control action is aggressive, especially after the sensor noise increase at k = 25. An operator monitoring this process might notice the aggressive control action and increase the penalty on the inputs in the regulator.

\[ Q = 1 \qquad R = 0 \qquad S = 5 \tag{2.33} \]

As illustrated in Figure 2.3, the input is not quite as aggressive. However, this change in the input penalty causes a decrease in tracking performance. Figure 2.4 compares the tracking performance of the new regulator tuning against a covariance estimator that detects the shift and updates the estimator. This example illustrates how a monitoring layer can be beneficial in detecting problems in the estimator. The intuitive solution in this example (using the regulator to slow the input) is ultimately detrimental to the process.

Figure 2.2: Input/output with incorrect estimator tuning and a step increase in sensor noise covariance (R_v)

Figure 2.3: Input after using the regulator to compensate for noise

2.3 Estimator Tuning Effects

The secondary effects of incorrect covariances are summarized below. We have already illustrated the case in which the sensor noise covariance (R_v) increases. The secondary effect of this change is a decrease in tracking performance, caused by the operator reaching for the wrong control knob. Conversely, a decrease in the covariance might not be noticed by the operators, sacrificing the tracking performance that could be realized by increasing the confidence in the sensor readings. The cases with the state noise covariance (Q_w) are analogous to those for the sensor noise covariance (R_v).

Figure 2.4: Comparison of tracking performance: wrong control knob versus covariance estimator

2.4 Evaluating Probability Distributions

Since the goal of this dissertation is to estimate covariances, we define how to judge the quality of those estimates.

1. Covariance is a scalar

The probability density of the random variable v_k (say) is given by

\[ p_{v_k}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - m)^2}{2\sigma^2} \right) \tag{2.34} \]

We assume that the mean (m) is zero, and we are interested in estimating the covariance (variance), \(\sigma^2\), of the noise sequence. The estimates of the variance can be plotted with time on the x-axis and variance on the y-axis.
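For the scalar case, such a variance-versus-time plot is easy to produce. A minimal numpy sketch (all values illustrative): accumulate the sample variance of a zero-mean noise sequence so it can be plotted against time.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2_true = 0.6                      # true variance (illustrative assumption)
v = rng.normal(0.0, np.sqrt(sigma2_true), size=5000)

# Running (cumulative) sample variance of a zero-mean sequence:
#   sigma2_hat[k] = (1 / (k + 1)) * sum_{i <= k} v_i^2
sigma2_hat = np.cumsum(v**2) / np.arange(1, v.size + 1)
```

Plotting `sigma2_hat` against the sample index gives exactly the time-on-the-x-axis, variance-on-the-y-axis presentation described above.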

Table 2.1: Summary of secondary effects

Case | Interpretation | Effect | Operator Response | Net Effect
R_v increases or Q_w decreases | Sensor quality deteriorates | Excessive control action | Increase S penalty | Slow tracking
R_v decreases or Q_w increases | More reliable sensor | Slow tracking | Probably none | Slow tracking

2. Covariance is a 2 × 2 matrix

A two-dimensional covariance represents a three-dimensional probability distribution. The probability distribution function for the sensor noise is

\[ p_{v_k}(x) = \frac{1}{(2\pi)^{n/2} |R_v|^{1/2}} \exp\left[ -\frac{1}{2} (x - m)^T R_v^{-1} (x - m) \right] \tag{2.35} \]

To represent these covariances, we draw planes of constant probability (x^T R_v^{-1} x = c) through the distribution and evaluate the resulting ellipses, as shown in Figure 2.5. The eigenvectors of the matrix give the directions of the major and minor axes of the ellipse, and the lengths of those axes are set by the corresponding eigenvalues.

The overall performance of the new estimator can be judged in this way as well. If we define the state estimate error as (x_k − \(\hat{x}_{k|k-1}\)), then the two-dimensional representations of the estimate error distributions for the previous example are illustrated in Figure 2.6. A smaller ellipse means that the state estimates more closely approximate the true states of the system.
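Drawing such an ellipse is a small eigenvalue computation. A sketch for a 2 × 2 covariance (values illustrative): the eigenvectors of R_v give the axis directions, and for the contour x^T R_v^{-1} x = c the semi-axis lengths are the square roots of c times the eigenvalues.

```python
import numpy as np

Rv = np.array([[2.0, 0.6], [0.6, 1.0]])   # illustrative 2x2 covariance
c = 1.0                                    # level of the contour x^T Rv^{-1} x = c

# Eigen-decomposition of the covariance: eigenvectors give the ellipse axes,
# and the semi-axis lengths are sqrt(c * eigenvalue).
lam, V = np.linalg.eigh(Rv)
semi_axes = np.sqrt(c * lam)

# Parameterize the ellipse boundary (2 x 200 array of points)
t = np.linspace(0.0, 2.0 * np.pi, 200)
pts = V @ (semi_axes[:, None] * np.vstack([np.cos(t), np.sin(t)]))
```

Every column of `pts` lies on the constant-probability contour, so the array can be passed directly to a plotting routine.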

3. Covariance has dimension greater than 2 × 2

In this case, more rudimentary methods of presenting the data must be employed. We use either element-by-element plots or some norm of the matrix difference.

Figure 2.5: Evaluating 3-D probability distributions by planes of constant probability, x^T P x = c

Figure 2.6: Comparison of state estimate error probability distributions from Example 2.1 (suboptimal estimator versus adaptive filter)

Example 2.2 Instability

Consider a state-space model (2.36) obtained by discretization with sample time Δ, one minute in this example; the entries of A are of the form exp(−Δ/τ) and the entries of B of the form 1 − exp(−Δ/τ). We assume there is a slow-drift output disturbance (see Chapter 6) in the data, so that the plant is given by

\[ x_{k+1} = A x_k + B u_k + G w_k \tag{2.37a} \]
\[ d_{k+1} = d_k + \xi_k \tag{2.37b} \]
\[ y_k = C x_k + d_k + v_k \tag{2.37c} \]

In this example, we add mismatch in the state-space input matrix as follows

\[ \tilde{B} = \begin{bmatrix} 1.2 & 0 \\ 0 & .8 \end{bmatrix} B \tag{2.38} \]

We assume that everything about the process (except the mismatch) is known, including the statistics of the disturbances actually entering the process

\[ \hat{Q}_w = Q_w \qquad \hat{Q}_\xi = Q_\xi \qquad \hat{R}_v = R_v \tag{2.39} \]

The input/output performance of this system is shown in Figures 2.7 and 2.8. This example illustrates that, in the presence of model mismatch, it is easy to destabilize the entire control system with a bad choice of disturbance parameters in the estimator. More details about the regulator and estimator are located in Section 6.5, along with the solution to this motivating example.
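The disturbance model of Equation (2.37) is easy to simulate. A minimal sketch (with u_k = 0 and illustrative matrices, not those of Equation 2.36): the integrating disturbance d_k is a random walk whose increments have variance Q_ξ.

```python
import numpy as np

rng = np.random.default_rng(6)
# Illustrative 2-state plant with an integrating (random-walk) output
# disturbance, as in Equation (2.37); all matrices here are assumptions.
A = np.array([[0.9, 0.0], [0.0, 0.8]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 1.0]])
Qw, Qxi, Rv = 0.01 * np.eye(2), 0.05, 0.1

N = 2000
x = np.zeros(2)
d = 0.0
y = np.empty(N)
dpath = np.empty(N)
for k in range(N):
    u = 0.0                                     # open loop, zero input
    x = A @ x + B.flatten() * u + rng.multivariate_normal(np.zeros(2), Qw)
    d = d + rng.normal(0.0, np.sqrt(Qxi))       # d_{k+1} = d_k + xi_k
    y[k] = (C @ x).item() + d + rng.normal(0.0, np.sqrt(Rv))
    dpath[k] = d
```

Because d_k integrates its noise, the output drifts without bound in open loop, which is exactly why the augmented estimator of Section 2.1 (and a sensible choice of Q_ξ) matters here.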

Figure 2.7: Outputs of the second motivational example

Figure 2.8: Inputs of the second motivational example


Chapter 3

Review and Critical Analysis of Previous Work

"I have not failed. I've just found 10,000 ways that don't work." - Thomas Alva Edison

The works of Kalman in 1960 [72] and Kalman and Bucy in 1961 [73] greatly advanced the field of state estimation. Since that time, it has become an important aspect of control theory and the subject of an enormous body of research. The Kalman filter is the optimal solution (in the mean square error sense) to the filtering problem, assuming sufficient information is known about the process [25, 45], including the system model and the covariances of the state noise and the sensor noise. Significant research has been conducted to determine the effects of inaccurate estimates of the model and noise covariances [8, 134]. Fitzgerald and Price showed that the state estimates could diverge when the filter is constructed improperly [41, 125].

Sangsuk-Iam and Bullock [136, 137] studied the linear time-varying case with incorrect covariances. Willems and Callier [171] studied the same filter divergence in the linear time-invariant case using incorrect covariances. There is significant motivation (see Chapter 2) to develop methods to determine the covariance matrices from the usual plant data. Traditional adaptive filtering methods generally fall into four categories [14, 15]: Bayesian, maximum likelihood, covariance matching, and correlation techniques.

3.1 Bayesian Methods

Bayesian adaptive filters were the first attempt at computing the unknown parameters [99]. Several other papers followed using Bayesian estimation [3, 4, 57, 135, 142, 149]. The principle behind the Bayesian adaptive filter is to recursively find the a posteriori probability distribution function of the states and a vector of unknowns. The vector could include the system matrices, but for simplicity we assume that it includes only the unknown covariances Q_w and R_v. Therefore, the Bayesian approach attempts to find

\[ p(x_k, Q_w, R_v \mid y_0 \ldots y_k) = p(x_k \mid Q_w, R_v, y_0 \ldots y_k)\, p(Q_w, R_v \mid y_0 \ldots y_k) \tag{3.1} \]

The first term is the probability distribution of the state (from the Kalman filter, say), evaluated at a particular (Q_w, R_v) pair. The second term is more difficult to evaluate [142]

\[ p(Q_w, R_v \mid y_0 \ldots y_k) = \frac{p(y_k \mid Q_w, R_v, y_0 \ldots y_{k-1})\, p(Q_w, R_v \mid y_0 \ldots y_{k-1})}{\int_\Omega p(y_k \mid Q_w, R_v, y_0 \ldots y_{k-1})\, p(Q_w, R_v \mid y_0 \ldots y_{k-1})\, d\Omega} \tag{3.2} \]

In this case, Ω represents the entire set of (Q_w, R_v). Evaluating both the Kalman filter equations (for every Q_w, R_v) and the probability distributions in Equation 3.2 may be too computationally expensive for real-time application. If the set of values that (Q_w, R_v) are drawn from is known and finite, then the computations become more tractable, and the problem is well suited to a multi-model approach as in [9]. Burkhart and Bishop [26] use this approach with an example of tracking the Mars Pathfinder mission. The state estimates are computed from a weighted summation of the state estimates produced from M filters, each with different covariances.

\[ \hat{x}_{k|k} = \sum_{i=1}^{M} \hat{x}_{k|k}\big( (Q_w, R_v)_i \big)\, p\big( (Q_w, R_v)_i \mid y_0 \ldots y_k \big) \tag{3.3} \]

in which

\[ p\big( (Q_w, R_v)_i \mid y_0 \ldots y_k \big) = \frac{p\big( y_0 \ldots y_k \mid (Q_w, R_v)_i \big)\, p\big( (Q_w, R_v)_i \big)}{\sum_{j=1}^{M} p\big( y_0 \ldots y_k \mid (Q_w, R_v)_j \big)\, p\big( (Q_w, R_v)_j \big)} \]

In this framework, the weighting on one filter in the bank converges to one, and the rest converge to zero [57, 99]. Once the weightings have converged, the choice of filter is fixed. Other researchers have used this filter bank approach [24, 46, 1]. For this project, we prefer to keep the generality of the covariances, and dismiss the Bayesian approach for our covariance estimator.
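The filter-bank weighting of Equation (3.3) can be sketched for a scalar system with two candidate sensor noise covariances; all numerical values below are illustrative assumptions. Each candidate filter is propagated in parallel, and the Bayesian weights are updated recursively from the innovation likelihoods and renormalized.

```python
import numpy as np

rng = np.random.default_rng(1)
A, C, Q, R_true = 0.9, 1.0, 0.2, 0.5      # scalar truth (illustrative)
R_cands = [0.5, 5.0]                      # candidate sensor noise covariances

# Simulate output data from the true system
N = 500
x = 0.0
y = np.empty(N)
for k in range(N):
    x = A * x + rng.normal(0.0, np.sqrt(Q))
    y[k] = C * x + rng.normal(0.0, np.sqrt(R_true))

# One Kalman filter per candidate covariance, with Bayesian weights
xhat = np.zeros(len(R_cands))             # predicted states, per filter
P = np.ones(len(R_cands))                 # predicted error variances
w = np.full(len(R_cands), 1.0 / len(R_cands))
for k in range(N):
    for i, R in enumerate(R_cands):
        S = C * P[i] * C + R              # innovation variance
        nu = y[k] - C * xhat[i]           # innovation
        lik = np.exp(-0.5 * nu**2 / S) / np.sqrt(2.0 * np.pi * S)
        w[i] *= lik                       # accumulate likelihood
        L = P[i] * C / S
        xhat[i] = A * (xhat[i] + L * nu)  # measurement + time update
        P[i] = A * (1.0 - L * C) * P[i] * A + Q
    w /= w.sum()                          # normalize the bank weights
```

As the section notes, the weight on the candidate closest to the truth converges toward one, after which the choice of filter is effectively fixed.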

3.2 Maximum Likelihood Methods

The second approach to the adaptive filtering problem is to compute the unknown covariances by maximizing a likelihood function [19, 74, 15]. There are three different types [14] of likelihood functions that can be maximized.

1. Joint estimates: maximize p(x_k, Q_w, R_v | y_0 … y_k) with respect to x_k, Q_w, R_v.
2. Marginal estimates: maximize p(Q_w, R_v | y_0 … y_k) with respect to Q_w, R_v.
3. Conditional estimates: maximize p(x_k | y_0 … y_k) with respect to x_k.

The marginal estimate can be found using Bayes' rule

\[ p(Q_w, R_v \mid y_0 \ldots y_k) = \frac{p(y_0 \ldots y_k \mid Q_w, R_v)\, p(Q_w, R_v)}{p(y_0 \ldots y_k)} \tag{3.4} \]

The marginal likelihood estimate of Q_w, R_v can be found by maximizing

\[ L(Q_w, R_v) = \log p(Q_w, R_v \mid y_0 \ldots y_k) \tag{3.5a} \]
\[ = -\frac{1}{2} \sum_{j=1}^{k} \left( \| y_j - C\hat{x}_j \|^2_{(C P_j C^T + R_v)^{-1}} + \log \left| C P_j C^T + R_v \right| \right) + \log p(Q_w, R_v) + c \tag{3.5b} \]

The joint likelihood estimate can be found in similar fashion, maximizing the following function

\[ L(x_k, Q_w, R_v) = \log p(x_k, Q_w, R_v \mid y_0 \ldots y_k) \tag{3.6} \]

The maximum likelihood method is complex, and is derived in [14]. Even the simpler case of the marginal likelihood method can be computationally expensive [14, 15, 117] and is not guaranteed to converge. As in the Bayesian case, the methods are simplified if the maximum likelihood function is evaluated over a limited set of unknowns. As illustrated in [3], the unknown set consists of a set of noise covariances, with a network to weight the individual Kalman filters according to their performance. Zhou and Luecke [174] attempt the maximum likelihood method on the cumulative error of the system. This method is limited to diagonal Q_w matrices.

3.3 Covariance Matching Methods

Covariance matching techniques make the covariances of the state estimate residuals consistent with their theoretical covariances [49, 53, 135]. We expect the residuals of the output predictions to be

\[ E[(y_k - C\hat{x}_{k|k-1})(y_k - C\hat{x}_{k|k-1})^T] = E[Y_k Y_k^T] \tag{3.7} \]

From this equation, the noise covariances can be estimated as in Myers and Tapley [112], corrected to be unbiased from sampling

\[ \hat{R}_v = \frac{1}{N_d - 1} \sum_{j=1}^{N_d} \left[ Y_j Y_j^T - \left( \frac{N_d - 1}{N_d} \right) C P_j C^T \right] \tag{3.8} \]

In the limit of large N_d

\[ \hat{R}_v = E[Y_k Y_k^T] - C P C^T \tag{3.9} \]

Leathrum [9] claimed that the Myers/Tapley algorithm could diverge, and offered a correction term. The correction does not change the structure of the problem.
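The large-sample form of Equation (3.9) can be sketched for a scalar system; the numerical values below are illustrative assumptions. The steady-state P is obtained by iterating the Riccati equation, innovations are collected from a simulation, and their sample second moment is corrected by C P C^T. Note that P here is computed with the true covariances; as the analysis later in this chapter shows, a mistuned filter makes the same formula biased.

```python
import numpy as np

rng = np.random.default_rng(3)
A, C, Q, R = 0.9, 1.0, 0.2, 0.5            # scalar truth (illustrative)

# Steady-state prediction-error variance from the Riccati recursion
P = 1.0
for _ in range(500):
    P = A * P * A + Q - (A * P * C) ** 2 / (C * P * C + R)

# Simulate and collect innovations Y_k = y_k - C xhat_{k|k-1}
N = 20000
L = P * C / (C * P * C + R)
x, xhat = 0.0, 0.0
Y = np.empty(N)
for k in range(N):
    x = A * x + rng.normal(0.0, np.sqrt(Q))
    yk = C * x + rng.normal(0.0, np.sqrt(R))
    Y[k] = yk - C * xhat
    xhat = A * (xhat + L * Y[k])           # predictor-form filter update

# Large-sample form of Equation (3.9)
Rv_hat = np.mean(Y**2) - C * P * C
```

With a correctly tuned filter the estimate lands near the true R; the point of Section 3.7 is that this consistency breaks down when P itself is computed from wrong covariances.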

A similar correlation can be written to estimate the state noise covariance Q_w. The proposed idea is that if the second moment of the residuals is larger than the theoretical covariance, then the covariance of the state noise is increased (and vice versa). However, there is no guarantee that the iterative scheme converges [93]. The sensor noise covariance estimate, \(\hat{R}_v\), is a function of the estimate error covariance, P. However, the estimate error covariance is not known correctly unless the covariances of the disturbances are known. More recent papers that make use of these techniques use simplifications to avoid the iteration [61, 62, 11, 156]. Hull et al. [63] use this method to avoid divergence in homing missile tracking. A popular assumption for covariance matching is that the sensor noise covariance (R_v) comes from a measurement device and can be specified by the vendor [5, 62, 158]. Louv [98] proposed an adaptive filtering algorithm based on the full information estimation problem, built on the following linear model

\[ \begin{bmatrix} y_1 \\ 0 \\ y_2 \\ 0 \\ \vdots \\ y_k \end{bmatrix} = \begin{bmatrix} C & & & \\ -A & I & & \\ & C & & \\ & -A & I & \\ & & \ddots & \\ & & & C \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} + \underbrace{\begin{bmatrix} v_1 \\ w_1 \\ v_2 \\ w_2 \\ \vdots \\ v_k \end{bmatrix}}_{\varepsilon} \tag{3.10} \]

in which

\[ E[\varepsilon \varepsilon^T] = \mathrm{diag}\left( R_v, Q_w, R_v, Q_w, \ldots \right) \quad (k \text{ times}) \]

This equation is a more general form of the covariance matching techniques presented, and as a result suffers from the same problems already described. Fioretti and Jetto [4] developed a covariance matching technique based on smoothed state estimates. This result and the ones previously presented are analyzed further later in this chapter. Jazwinski also presents methods [66, 67] for adaptive estimation of the state noise covariance. This method compares the predicted residuals against the predicted residuals when the state noise covariance is assumed to be zero. When the predicted residuals are within their one-sigma limit, the state noise covariance is taken to be zero. In the scalar case,

\[ \hat{q}_k = \begin{cases} \dfrac{\nu^2_{k+1} - E[\nu^2_{k+1} \mid q = 0]}{C G G^T C^T} & \text{if positive} \\[4pt] 0 & \text{otherwise} \end{cases} \tag{3.11} \]

This method can be considered a special case of the covariance matching techniques. Moreover, it requires knowledge of the sensor noise covariance.

3.4 Time Series Method

In addition to the four categories defined previously, the time series approach to the adaptive filter has a strong following. The basis for these methods is a Box and

Jenkins [21] style time series polynomial, as shown in Lee [92]

\[ z(k) = C(\alpha_1 w_{k-1} + \alpha_2 w_{k-2} + \cdots + \alpha_m w_{k-m}) + a_0 v_k + a_1 v_{k-1} + \cdots + a_m v_{k-m} \tag{3.12} \]

where

\[ \alpha_1 = a_0 G \qquad \alpha_j = A \alpha_{j-1} + a_{j-1} G \]

The coefficients of the time series model can be related to the covariances of the system, which can then be computed in least-squares fashion using the results of a lagged autocorrelation procedure. Most of the papers making use of the time series method [1, 32, 69, 92, 16, 115, 116, 153] follow the same basic pattern. Ramirez-Beltran [127] applied the time series method to hurricane track predictions. An advantage of this method cited by almost all of these papers is that filtered estimates of the states do not need to be computed, saving computation time and avoiding the risk of divergence due to bad initial construction of the Kalman filter. This procedure is similar to the correlation procedure presented next, but includes the extra step of fitting the time series model to the outputs. A second class of time series method was presented by Hagander and Wittenmark [52]. This method involves fitting a smoother to the data, and then computing the covariances from the fitted parameters. This method was extended by Moir and Grimble [17] to include filtering in an attempt to generalize the method.

3.5 Correlation Methods

The covariance correlation methods began with the work of Anderson [7]. The correlation methods encompass the general topic of correlating the outputs (or innovations) over time to monitor the propagation of the noise terms. The first major contribution to the field was by Mehra [12, 13, 14]. Mehra's approach uses the autocorrelations of the outputs [12] and later the innovations [14] to ensure a stationary result. Analysis of the works of Mehra is postponed to Chapter 4. Many papers have made use of this correlation strategy in one form or another [55, 14, 143]. Wojcik [172] modified the Mehra method to perform better in the SISO case. While SISO is too specific for our project, it is well suited to certain signal processing applications. In 1973, Carew and Bélanger [28] followed up on the works of Mehra. They demonstrated that the methods of Mehra require the solution of Lyapunov-style equations at every iteration, and may not converge to the correct solution. Carew and Bélanger proposed an iterative scheme that requires only matrix multiplications. This work is explored in greater detail in Chapter 5. In 1974, Bélanger [18] extended the correlation techniques and noted that the covariances are linear in a set of parameters that can be fit to the autocovariance. In this formulation, R_i, Q_i are known matrices, and the correlation methods are parameterized by a scaling vector. Bélanger claims his method handles the nonstationary case and is more efficient. This branch of the correlation methods was further developed by several others [34, 37, 94, 132, 157]. Godbole [47] extended the

Mehra method to include correlated noise and unknown (non-zero) means. Sinha and Tom [144] use the Carew method as an initial estimate and apply stochastic approximation to account for non-stationary systems, but this additional step may be prohibitively slow. Alspach [2] commented on Mehra's work and made a recommendation for the termination step of the method. Hong applied the correlation methods [59] to multicoordinated systems. Guu and Wei [51] use a heuristic approach to account for correlated noises so they can be used in the correlation methods. The correlation methods have already found several applications in the literature. Chen and Chui [33] use a modified Mehra procedure with statistical approximation for the target tracking problem. Gabrea et al. [43] applied Mehra's method to recover the speech signal from a noisy voice signal. Dee [36] uses Bélanger's method for atmospheric data assimilation. Chang and Tabaczynski [31] apply the Mehra method to the target tracking problem: for example, tracking a moving target by some method (e.g. radar) without losing track because of a poorly tuned estimator. Jang et al. [64] apply the Mehra method to the GPS positioning problem, and also use it as a fault identification tool. They demonstrate that the adaptive method detects a fault in the GPS estimate earlier than traditional methods. There have been few major changes to the correlation methods since the works of Carew and Bélanger. There have been several incremental changes, some of which are listed above. The correlation methods are further explored in Chapters 4-6.
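The raw material of all of these correlation methods is the lagged autocovariance of the output. For a stationary scalar example (all values illustrative), the theoretical lags follow from the Lyapunov equation and can be checked against their empirical estimates; this is the quantity the parameterizations above are fit to.

```python
import numpy as np

rng = np.random.default_rng(4)
A, C, Q, R = 0.8, 1.0, 0.2, 0.5           # scalar truth (illustrative)

# Theoretical output autocovariances of the stationary open-loop process:
#   Sigma = A Sigma A^T + Q   (state covariance, Lyapunov equation)
#   Lam_0 = C Sigma C^T + R,  Lam_j = C A^j Sigma C^T  for j >= 1
Sigma = Q / (1.0 - A**2)
Lam0 = C * Sigma * C + R
Lam1 = C * A * Sigma * C

# Empirical lagged autocovariances from simulated data
N = 50000
x, y = 0.0, np.empty(N)
for k in range(N):
    x = A * x + rng.normal(0.0, np.sqrt(Q))
    y[k] = C * x + rng.normal(0.0, np.sqrt(R))
Lam0_hat = np.mean(y * y)
Lam1_hat = np.mean(y[:-1] * y[1:])
```

In the multivariable case the same computation uses the matrix Lyapunov equation; the correlation methods then solve (in least-squares fashion) for the Q_w and R_v consistent with the measured lags.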

Up until now, these methods have been used only in open-loop situations.

3.6 Subspace Identification

Subspace identification has been the subject of much research over the past decade. Proponents claim that this method uses clever, yet complicated, projections of the input-output data to obtain the full system model (A, B, C, D, Q_w, R_v). While this dissertation is not a system identification project, the techniques and results of subspace identification must be explored, since they claim to find the optimal tuning of the estimator as a by-product. Rudimentary identification methods have been present for a long time, although the subspace methods first began to emerge with the work of Ho and Kalman [58] in 1966. Their method begins the so-called realization-based subspace identification [165] field. The realization-based methods attempt to recover the system matrices from a collection of impulse responses [175], taking advantage of the structure of the Hankel matrices.

\[ y_k = \sum_{j=0}^{\infty} h_j u_{k-j} \tag{3.13} \]

There were several contributions to the realization-based methods in recent years [17, 77, 95, 97], although these methods suffer from several drawbacks. It can be difficult to obtain the proper impulse responses needed by realization methods, and they are applicable only for certain types of inputs [96].

A new class of subspace algorithms tries to circumvent forming the impulse response matrix. The inputs and outputs of the system are stacked in a structured way (without noise)

\[ \begin{bmatrix} y_1 & y_2 & \cdots & y_N \\ y_2 & y_3 & \cdots & y_{N+1} \\ \vdots & & & \vdots \\ y_d & y_{d+1} & \cdots & y_{N+d-1} \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{d-1} \end{bmatrix} \begin{bmatrix} x_1 & x_2 & \cdots & x_N \end{bmatrix} + \begin{bmatrix} D & & & \\ CB & D & & \\ \vdots & \ddots & \ddots & \\ CA^{d-2}B & \cdots & CB & D \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \cdots & u_N \\ u_2 & u_3 & \cdots & u_{N+1} \\ \vdots & & & \vdots \\ u_d & u_{d+1} & \cdots & u_{N+d-1} \end{bmatrix} \tag{3.14a} \]

\[ Y = \Gamma_d X + \Phi_d U \tag{3.14b} \]

These so-called direct 4SID methods attempt to solve Equation 3.14b for the extended observability matrix \(\Gamma_d\) and the Toeplitz matrix of impulse responses \(\Phi_d\) from the known input-output matrices Y and U. The extended observability matrix is found by projecting the outputs onto the null space of the inputs, effectively removing the effects of the inputs [38, 18, 163, 164]. These methods are effective in the absence of noise, or if the covariance of the innovations is proportional to the identity matrix. However, when process noise is present, the method is not effective unless the signal-to-noise ratio is large [164, 167]. The last major class of subspace methods are the instrumental variable methods. These methods are designed to overcome the restrictions on the noises that are present in the direct 4SID methods. In this method, the input-output data is divided into past and future elements. The state can be thought of as the interface between the past and the future [15]. In the stochastic identification case (u_k = 0), the outputs are stacked into a block Hankel form [122]

\[ Y_{p/f} = \begin{bmatrix} y_0 & y_1 & \cdots & y_{j-1} \\ \vdots & & & \vdots \\ y_{i-1} & y_i & \cdots & y_{i+j-2} \\ y_i & y_{i+1} & \cdots & y_{i+j-1} \\ y_{i+1} & y_{i+2} & \cdots & y_{i+j} \\ \vdots & & & \vdots \\ y_{2i-1} & y_{2i} & \cdots & y_{2i+j-2} \end{bmatrix} = \begin{bmatrix} Y_p \\ Y_f \end{bmatrix} \tag{3.15} \]

Next, the past inputs and outputs are projected onto the future outputs, yielding some linear transformation O. Then a singular value decomposition is performed [39, 83]. In this SVD, the weighting matrices W_1 and W_2 are chosen based on the type of subspace method being used.

\[ W_1 O W_2 = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} S_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} \tag{3.16} \]

From this SVD, either the extended observability matrix or estimates of the states can be recovered. This information is sufficient to compute the system matrices. As shown, if the states are replaced with the state estimates, then the problem becomes a linear regression in the system matrices A, B, C, D, and the noise covariances can be

computed from the residuals.

\[ \begin{bmatrix} x_{k+1} \\ y_k \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + \begin{bmatrix} w_k \\ v_k \end{bmatrix} \tag{3.17} \]

All of the subspace algorithms follow this general pattern, differing only in the weighting matrices in Equation 3.16. We can extend these projections to the deterministic case as well, in which the projections are performed between past inputs, past outputs, and future outputs. The newest methods are based on the instrumental variable approach [159, 16, 161, 162, 168]. Another popular instrumental approach is canonical variate analysis, spearheaded by Larimore [81, 82]. A critical analysis of these methods and how they relate to this project can be found later in this chapter, along with a discussion of the consistency of the results.

3.7 Critical Analysis of Covariance Matching

Covariance matching is a popular choice for the adaptive filtering problem, so we perform an analysis to determine whether it is appropriate for our monitoring layer. Subspace identification is a popular tool that also claims to deliver the correct tuning for the estimator, so we perform an analysis to determine whether it is an effective tool for this project.

Filtered Estimates

The simplest covariance matching technique is to use the residuals of the standard Kalman filter. From the information provided in the filter, the covariances can be extracted and compared to the theoretical probability distribution function defined by the model. In the estimation problem, the standard steady-state Kalman filter in predictor form is used

\[ \hat{x}_{k+1|k} = A \hat{x}_{k|k-1} + A L (y_k - C \hat{x}_{k|k-1}) \tag{3.18} \]

in which

\[ L = P C^T \left[ C P C^T + R_v \right]^{-1} \tag{3.19} \]

Recall that P is the steady-state estimate error covariance before update, and the solution to the steady-state Riccati equation

\[ P = A P A^T + G Q_w G^T - A P C^T \left[ C P C^T + R_v \right]^{-1} C P A^T \tag{3.20} \]

in which

\[ P = E[(x_k - \hat{x}_{k|k-1})(x_k - \hat{x}_{k|k-1})^T] \tag{3.21} \]

An estimate of the sensor noise covariance (R_v) can be computed from the residuals of the state estimates

\[ E[Y_k Y_k^T] = E[(y_k - C \hat{x}_{k|k-1})(y_k - C \hat{x}_{k|k-1})^T] \tag{3.22} \]
\[ = E[(C x_k + v_k - C \hat{x}_{k|k-1})(C x_k + v_k - C \hat{x}_{k|k-1})^T] = C P C^T + R_v \tag{3.23} \]

An estimate of the state noise covariance (Q_w) can be computed by a similar method

\[ E[G \hat{w}_k \hat{w}_k^T G^T] = E[(\hat{x}_{k+1|k} - A \hat{x}_{k|k-1})(\hat{x}_{k+1|k} - A \hat{x}_{k|k-1})^T] \tag{3.24} \]
\[ = E[(A L \hat{v}_k)(A L \hat{v}_k)^T] = A L (C P C^T + R_v) L^T A^T \tag{3.25} \]

Example 3.1 Covariance matching with filtered state estimates

The state estimates can be used to construct estimates of the noise terms, \(\hat{v}_k\) and \(\hat{w}_k\). These values are plotted as a histogram and compared to the theoretical normal curve predicted by the initial values of \(\hat{Q}_w\) and \(\hat{R}_v\), as in Equations 3.23 and 3.25. For this example, we use the same two-state, one-output system as in the previous chapter

\[ A = \begin{bmatrix} \cdot & \cdot \\ \cdot & \cdot \end{bmatrix} \qquad B = \begin{bmatrix} \cdot \\ \cdot \end{bmatrix} \qquad G = I \qquad C = \begin{bmatrix} 1 & 0 \end{bmatrix} \tag{3.26} \]

The state noise covariance is known exactly, whereas the sensor noise covariance is not.

\[ Q_w = \begin{bmatrix} .1 & 0 \\ 0 & .1 \end{bmatrix} \qquad \hat{Q}_w = \begin{bmatrix} .1 & 0 \\ 0 & .1 \end{bmatrix} \tag{3.27} \]

\[ R_v = .5 \qquad \hat{R}_v = 1 \tag{3.28} \]

As described previously, the estimates of the noise elements are computed from the data and plotted as a histogram, resulting in a Gaussian distribution. The corresponding theoretical covariance is computed from the model and plotted as a continuous curve. As shown in Figure 3.1, the correlation for the sensor noise term (\(\hat{v}_k\)) does not

match the theoretical value predicted from the model, as expected with an incorrect estimate of R_v in the filter. At the same time, the state noise histograms in Figure 3.2 do not correspond with their theoretical plots.

Figure 3.1: Filtered sensor noise frequency distribution using an incorrect R_v

Even though the state noise covariance Q_w is known correctly, the incorrect sensor noise covariance affects the matching of the state noise covariance. If the correlations were performed using the correct estimate of the sensor noise covariance, then the data histogram and the theoretical model prediction curve align, as shown in Figure 3.3.

\[ R_v = .5 \qquad \hat{R}_v = .5 \tag{3.29} \]

Correcting the sensor noise covariance estimate has a similar effect on the state noise correlation plots, as shown in Figure 3.4. From the previous section

Figure 3.2: Filtered state noise frequency distribution using an incorrect R_v

Figure 3.3: Filtered sensor noise frequency distribution using correct covariances

Figure 3.4: Filtered state noise frequency distribution using correct covariances

\[ E[\hat{v}_k \hat{v}_k^T] = \hat{R}_v = C P C^T + R_v \ne R_v \tag{3.30} \]
\[ E[\hat{w}_k \hat{w}_k^T] = \hat{Q}_w = A L (C P C^T + R_v) L^T A^T \ne Q_w \]

In addition to the covariance estimates not being equal to the true covariances, the estimate of the state noise covariance (\(\hat{Q}_w\)) is a linear transformation of the sensor noise covariance estimate (\(\hat{R}_v\)).

\[ G \hat{Q}_w G^T = A L \hat{R}_v L^T A^T \tag{3.31} \]

There are not enough degrees of freedom to obtain unique solutions for both Q_w and R_v. Even if one of the covariances were known a priori, covariance matching does not work, since the filter used to obtain the residuals is a function of both true covariances, one of which is unknown.

Smoothed Estimates

It has been demonstrated that filtering does not give enough information to completely solve for the unknown covariance matrices. A better solution may be to use a smoothing approach in order to use a greater portion of the available data. The state estimates from filtering and smoothing differ in the amount of data that is processed. In the filtering method, the state estimate for a given time step k (\(\hat{x}_{k|k-1}\)) is computed using the measurements available from zero to k − 1, as illustrated in Figure 3.5. The smoothing method computes state estimates at a given time step k using all measurements available from zero to N (\(\hat{x}_{k|N}\)), as shown in Figure 3.6. Given the additional data to be processed, smoothing might provide enough degrees of freedom to compute the correct covariance matrices. The smoothed estimates are computed from the following objective function, subject to the model

\[ \min \Phi = \| \hat{x}_{0|N} - \bar{x}_0 \|^2_{Q_0^{-1}} + \sum_{k=0}^{N} \| \hat{v}_{k|N} \|^2_{R_v^{-1}} + \| \hat{w}_{k|N} \|^2_{Q_w^{-1}} \tag{3.32} \]

\[ \text{s.t.} \quad \hat{x}_{k+1|N} = A \hat{x}_{k|N} + \hat{w}_{k|N} \qquad y_k = C \hat{x}_{k|N} + \hat{v}_{k|N} \]
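Once its variables are stacked, a problem of this form reduces to an equality-constrained least squares, min x^T H x subject to D x = d, which is solved through its KKT (Lagrange multiplier) system. A toy numpy sketch with illustrative H, D, and d (not the smoothing matrices built later in this section):

```python
import numpy as np

# Toy equality-constrained problem: min x^T H x  s.t.  D x = d (illustrative)
H = np.diag([1.0, 2.0, 4.0])
D = np.array([[1.0, 1.0, 1.0]])
d = np.array([3.0])

# KKT system: [[2H, D^T], [D, 0]] [x; lam] = [0; d]
n, m = H.shape[0], D.shape[0]
K = np.block([[2.0 * H, D.T], [D, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), d]))
x, lam = sol[:n], sol[n:]
```

The first block row of the KKT system is the stationarity condition 2Hx + D^T λ = 0 and the second enforces feasibility Dx = d; the full smoothing problem is solved the same way once H and D are assembled.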

Figure 3.5: Filtered state estimates (the estimate at k = 6 uses E(x_6 | y_1, …, y_5))

Figure 3.6: Smoothed state estimates (the estimate at k = 6 uses E(x_6 | y_1, …, y_11))

The smoothing problem can be collapsed to the following matrix form

\[ \Phi = x^T H x \tag{3.33} \]

in which x stacks all of the estimated quantities,

\[ x = \begin{bmatrix} \hat{w}_{0|N}^T & \cdots & \hat{w}_{k|N}^T & \hat{v}_{0|N}^T & \cdots & \hat{v}_{k|N}^T & \hat{x}_{0|N}^T & \cdots & \hat{x}_{k+1|N}^T \end{bmatrix}^T \]

and

\[ H = \mathrm{diag}\left( Q_w^{-1}, \ldots, Q_w^{-1},\; R_v^{-1}, \ldots, R_v^{-1},\; Q_0^{-1}, 0, \ldots, 0 \right) \]

The constraints (the model) are formulated as

\[ D x = d, \qquad d = \begin{bmatrix} 0 & \cdots & 0 & y_0^T & y_1^T & \cdots & y_k^T \end{bmatrix}^T \tag{3.34} \]

in which the rows of D encode the model equations \(\hat{x}_{j+1|N} - A \hat{x}_{j|N} - \hat{w}_{j|N} = 0\) and the measurement equations \(C \hat{x}_{j|N} + \hat{v}_{j|N} = y_j\). This problem is now reduced to a simple optimization problem

\[ \min \Phi(x) = x^T H x \quad \text{s.t.} \quad D x = d \]

To determine whether or not the covariance matrices have been chosen correctly, the estimates compiled from the data must be compared to a theoretical probability distribution function predicted by the model. We begin with the general form of the constrained least-squares problem with Lagrange multipliers.

\[ \begin{bmatrix} 2H & D^T \\ D & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ d \end{bmatrix} \tag{3.35} \]

The vector x in this case is all of the estimates of the error terms stacked into a single vector. We wish to know the covariance of this vector. The least-squares problem in

Equation 3.35 is solved in the standard way.

\[ \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} 2H & D^T \\ D & 0 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ d \end{bmatrix} \tag{3.36} \]

The covariance of the left-hand side of the solution can easily be expressed as shown.

\[ E \begin{bmatrix} x x^T & x \lambda^T \\ \lambda x^T & \lambda \lambda^T \end{bmatrix} = \begin{bmatrix} 2H & D^T \\ D & 0 \end{bmatrix}^{-1} \begin{bmatrix} 0 & 0 \\ 0 & E[d d^T] \end{bmatrix} \begin{bmatrix} 2H & D^T \\ D & 0 \end{bmatrix}^{-T} \tag{3.37} \]

The term E[d d^T] that appears in the last equation represents the covariance of the output values themselves. This method is explored further in the next chapter and is not covered in great detail here.

\[ E[d d^T] = E \begin{bmatrix} y_k y_k^T & y_k y_{k+1}^T & \cdots & y_k y_{k+3}^T \\ y_{k+1} y_k^T & y_{k+1} y_{k+1}^T & \cdots & y_{k+1} y_{k+3}^T \\ \vdots & & \ddots & \vdots \\ y_{k+3} y_k^T & y_{k+3} y_{k+1}^T & \cdots & y_{k+3} y_{k+3}^T \end{bmatrix} \tag{3.38} \]

Due to the zeros in the central matrix, the problem can be further simplified.

\[ E[x x^T] = \begin{bmatrix} I & 0 \end{bmatrix} \begin{bmatrix} 2H & D^T \\ D & 0 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ I \end{bmatrix} E[d d^T] \begin{bmatrix} 0 & I \end{bmatrix} \begin{bmatrix} 2H & D^T \\ D & 0 \end{bmatrix}^{-T} \begin{bmatrix} I \\ 0 \end{bmatrix} \]

Example 3.2 Covariance matching with smoothed state estimates

We use the same example as before, but base the estimate of the state x_k on y_0 … y_N instead of y_0 … y_{k-1}.

\[ A = \begin{bmatrix} \cdot & \cdot \\ \cdot & \cdot \end{bmatrix} \qquad B = \begin{bmatrix} \cdot \\ \cdot \end{bmatrix} \tag{3.39} \]

\[ G = I \qquad C = \begin{bmatrix} 1 & 0 \end{bmatrix} \tag{3.40} \]

We assume that the filter is tuned using the same incorrect covariances (Equations 3.27 and 3.28). The sensor noise covariance match is shown in Figure 3.7, and the state noise covariance match is shown in Figure 3.8. Both correlations are smeared due to the incorrect estimate of R_v.

Figure 3.7: Smoothed sensor noise frequency distribution using an incorrect sensor noise covariance

If both covariances are known correctly, the sensor noise and state noise covariance matches are shown in Figures 3.9 and 3.10, respectively. The covariance matching method using smoothed state estimates has the same problem as the filtering method. If only one of the covariances is known incorrectly, the effects appear in both distributions. The additional uncertainty in the system that comes from generating state estimates is manifested in both probability distribution functions and cannot be reconciled.

Figure 3.8: Smoothed state noise frequency distribution using an incorrect sensor noise covariance

Figure 3.9: Smoothed sensor noise frequency distribution using correct covariances

Figure 3.10: Smoothed state noise frequency distribution using correct covariances

3.8 Critical Analysis of Instrumental Variables Subspace Identification (IV-4SID)

As mentioned in Chapter 2, the subspace identification methods compute the entire system model, including the noise covariances to which this dissertation is devoted. For illustration purposes, we choose one of the instrumental variable methods, canonical variate analysis (CVA). The biggest proponent of CVA has been Larimore, with an impressive body of research on the subject [7, 76, 84, 85, 87, 139]. The basic principle of instrumental variables subspace identification is the following augmented

state-space equation

\[ \begin{bmatrix} x_{k+1} \\ y_k \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + \begin{bmatrix} w_k \\ v_k \end{bmatrix} \tag{3.41} \]

If the states of the system were known, Equation 3.41 could be solved for the system model (A, B, C, D, Q_w, R_v) via linear regression. Of course, the true states of the system are not known, so we substitute an estimate of the states. First, two vectors are defined: the past vector, consisting of d past inputs and outputs, and the future vector, consisting of d future outputs. The sizes of the past and future vectors do not need to be the same, but for simplicity we assume the number of past inputs and outputs is equal to the number of future outputs.

\[ p_k = \begin{bmatrix} y_{k-1}^T & \cdots & y_{k-d}^T & u_{k-1}^T & \cdots & u_{k-d}^T \end{bmatrix}^T \qquad f_k = \begin{bmatrix} y_k^T & \cdots & y_{k+d}^T \end{bmatrix}^T \tag{3.42} \]

Using these vectors, a transformation is computed to map the past inputs and outputs to the future. The transformation is then used to compute a memory of the process

\[ m_k = \begin{bmatrix} I_n & 0 \end{bmatrix} J p_k \tag{3.43} \]

The linear transformation matrix J is computed to project the past inputs and outputs onto the future outputs. The derivation of J is complicated and is omitted in this dissertation; for our purposes, it is not necessary to know how to derive J, only to know that it is simply a linear transformation of the data. The interested reader is referred to [86, 88, 89, 138] for more details. The memory is substituted into Equation 3.41 for the states, and the system

matrices can be computed from linear regression as follows

\[ \begin{bmatrix} \hat A & \hat B \\ \hat C & \hat D \end{bmatrix} = E\left[ \begin{bmatrix} m_{k+1} \\ y_k \end{bmatrix} \begin{bmatrix} m_k \\ u_k \end{bmatrix}^T \right] E\left[ \begin{bmatrix} m_k \\ u_k \end{bmatrix} \begin{bmatrix} m_k \\ u_k \end{bmatrix}^T \right]^{-1} \tag{3.44} \]

The noise covariances are computed from the residuals

\[ \begin{bmatrix} \hat w_k \\ \hat v_k \end{bmatrix} = \begin{bmatrix} m_{k+1} \\ y_k \end{bmatrix} - \begin{bmatrix} \hat A & \hat B \\ \hat C & \hat D \end{bmatrix} \begin{bmatrix} m_k \\ u_k \end{bmatrix} \tag{3.45} \]

The order of the system is usually not known and must also be estimated. In the presence of noise, there is no clear way to estimate the order. There are, however, several heuristic methods of estimating the order [1, 6, 14, 17]. Uncertainty in the model order is a shortcoming of any subspace method, but for the purposes of this analysis, we assume the model order is known.

3.8.1 Consistency of IV-4SID

The asymptotic properties of the subspace methods have only recently begun to appear in the literature. There has been significant research into proving the consistency of the deterministic part of the identification process [15, 16, 65, 79, 169]. Given one minimal state-space realization of a process

\[ x_{k+1} = Ax_k + Bu_k + w_k \qquad y_k = Cx_k + Du_k + v_k \tag{3.46} \]
\[ w_k \sim N(0, Q_w) \qquad v_k \sim N(0, R_v) \]

alternate realizations of the same process can be defined by any state transformation z_k = Tx_k, in which T is full rank:

\[ z_{k+1} = TAT^{-1}z_k + TBu_k + \bar w_k \qquad y_k = CT^{-1}z_k + v_k \tag{3.47} \]
\[ \bar w_k \sim N(0, TQ_wT^T) \qquad v_k \sim N(0, R_v) \]

The process noise is transformed by the linear transformation in the alternate state-space representation, but the measurement noise is unaffected. For simplicity, when comparing the covariances from different state-space realizations, we compare the measurement noise covariance (R_v), since it does not change between realizations.

Example 3.3 Simple CVA covariance estimates
As a simple motivating example, we consider the standard CVA method, using a single lag. We correlate one past input and one past output to one future output, which we substitute into Equation 3.41.

\[ \begin{bmatrix} J_1 y_k + J_2 u_k \\ y_k \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} J_1 y_{k-1} + J_2 u_{k-1} \\ u_k \end{bmatrix} + \begin{bmatrix} w_k \\ v_k \end{bmatrix} \tag{3.48} \]

We also make the assumption (without loss of generality) that the input u_k to the system is Gaussian with covariance U, and define Σ = E[x_{k-1} x_{k-1}^T].

\[ \begin{bmatrix} \hat A & \hat B \\ \hat C & \hat D \end{bmatrix} = \begin{bmatrix} J_1\left(CA\Sigma C^TJ_1^T + CBUJ_2^T\right)\left[J_1(C\Sigma C^T + R_v)J_1^T + J_2UJ_2^T\right]^{-1} & J_2 \\ \left(CA\Sigma C^TJ_1^T + CBUJ_2^T\right)\left[J_1(C\Sigma C^T + R_v)J_1^T + J_2UJ_2^T\right]^{-1} & 0 \end{bmatrix} \tag{3.49} \]

\[ \begin{bmatrix} \hat w_k \\ \hat v_k \end{bmatrix} = \begin{bmatrix} J_1 y_k + J_2 u_k \\ y_k \end{bmatrix} - \begin{bmatrix} \hat A & \hat B \\ \hat C & \hat D \end{bmatrix} \begin{bmatrix} J_1 y_{k-1} + J_2 u_{k-1} \\ u_k \end{bmatrix} \tag{3.50} \]

From Equation 3.49, \hat A = J_1\hat C, \hat B = J_2, and \hat D = 0.

\[ \begin{bmatrix} \hat w_k \\ \hat v_k \end{bmatrix} = \begin{bmatrix} J_1\left(y_k - \hat CJ_1 y_{k-1} - \hat CJ_2 u_{k-1}\right) \\ y_k - \hat CJ_1 y_{k-1} - \hat CJ_2 u_{k-1} \end{bmatrix} \tag{3.51} \]

\[ \begin{bmatrix} \hat w_k \\ \hat v_k \end{bmatrix} = \begin{bmatrix} J_1 e_k \\ e_k \end{bmatrix} \tag{3.52} \]

The state residuals are a linear combination of the sensor residuals. There is not enough information to compute two independent covariances, thus resulting in a biased noise model.

Example 3.4 Numerical example
For a general subspace ID method, the residuals can be written as

\[ \begin{bmatrix} \hat w_k \\ \hat v_k \end{bmatrix} = \begin{bmatrix} \hat x_{k+1} - \hat A\hat x_k - \hat Bu_k \\ y_k - \hat C\hat x_k - \hat Du_k \end{bmatrix} \tag{3.53} \]

Since the sensor noise covariance R_v should be the same regardless of realization, we illustrate only that covariance

\[ \hat v_k = Cx_k - \hat C\hat x_k + v_k - \hat Du_k \tag{3.54} \]

The estimated covariance of these residuals is

\[ \hat R_v = C\Sigma C^T - \hat C\hat\Sigma\hat C^T + R_v - \hat DU\hat D^T \tag{3.55} \]

in which

\[ \Sigma = E[x_k x_k^T] \qquad \hat\Sigma = E[\hat x_k \hat x_k^T] \tag{3.56} \]

While Σ is known, \hat\Sigma is not. The covariance estimate is correct only if

\[ CE[x_k x_k^T]C^T - \hat CE[\hat x_k \hat x_k^T]\hat C^T - \hat DU\hat D^T = 0 \tag{3.57} \]

A determination of when this condition is true (if at all) has not yet been found. There is practically no information in the literature regarding the consistency of the noise model. Kawauchi et al. [75] recognized that the covariance matrices vary with time except in the limit of infinite data. A recent paper by Viberg [166] demonstrates bias in the noise model, but does not offer an explanation for it. We are able, however, to test this condition numerically. We have tested a variety of data points and lags in the CVA method. At the extreme, we simulate a two-state, two-output system with an excessive number of data points and lags, for which computation times are on the order of days. Figure 3.11 shows the comparison of the subspace noise model to the actual model. The histogram depicts the residuals from the subspace method, and the solid line illustrates the true probability distribution. As can be seen, even with extreme amounts of data, the noise model for the subspace approach is biased. We do not dispute the effectiveness of subspace identification for computing the deterministic state-space model (A, B, C, D). However, we have shown that when using subspace identification, as N_d → ∞,

\[ (\hat A, \hat B, \hat C, \hat D) \to (A, B, C, D) \qquad (\hat Q_w, \hat R_v) \nrightarrow (Q_w, R_v) \]

[Figure 3.11: Estimates of sensor noise covariance (R_v) from subspace identification]

Computing the noise model based on residuals is biased due to the uncertainty in the state estimates that is propagated into the residuals.

Chapter 4

Correlation Methods - Old and New

Assassins!
Arturo Toscanini, to his orchestra

As mentioned in Chapter 3, correlation techniques are a popular method for adaptive filtering, and they form the basis of the methods we have developed. In this chapter, we derive output-based correlation techniques for the open-loop system and compare these techniques to those available in the literature.

4.1 Stochastically Driven Outputs

With no inputs to the system (u_k = 0), the outputs become summations of the random variable x_0 and the random sequences {w_k} and {v_k}. The outputs can be expressed as

\[ y_0 = Cx_0 + v_0 \]
\[ y_1 = Cx_1 + v_1 = CAx_0 + CGw_0 + v_1 \]
\[ y_2 = Cx_2 + v_2 = CA^2x_0 + CAGw_0 + CGw_1 + v_2 \]
\[ y_3 = Cx_3 + v_3 = CA^3x_0 + CA^2Gw_0 + CAGw_1 + CGw_2 + v_3 \]

The sequence of outputs can be summarized as

\[ y_k = CA^kx_0 + C\sum_{h=0}^{k-1}A^{k-h-1}Gw_h + v_k \tag{4.1} \]

At this point, we assume that the system is stable, in that all of the eigenvalues of the state transition matrix A have magnitude less than unity. This assumption is limiting, and it is removed in Chapter 5, but for simplicity of explanation, we assume it here.

4.2 Autocovariance Matrix

We begin by writing the expectation of the second moment of the output variables, assuming the outputs are dependent only on a combination of the random sequences

{w_k} and {v_k}. We define the covariance of the initial state as Σ_k ≡ E[x_k x_k^T]. The covariance matrix is always symmetric, so the upper triangular elements are omitted in the interest of compactness.

\[ E\begin{bmatrix} y_0y_0^T & & \\ y_1y_0^T & y_1y_1^T & \\ y_2y_0^T & y_2y_1^T & y_2y_2^T \end{bmatrix} = \begin{bmatrix} C\Sigma_0C^T + R_v & & \\ CA\Sigma_0C^T & CA\Sigma_0A^TC^T + CGQ_wG^TC^T + R_v & \\ CA^2\Sigma_0C^T & CA^2\Sigma_0A^TC^T + CAGQ_wG^TC^T & E[y_2y_2^T] \end{bmatrix} \tag{4.2} \]

in which E[y_2y_2^T] = CA^2Σ_0A^{2T}C^T + CAGQ_wG^TA^TC^T + CGQ_wG^TC^T + R_v.

The autocovariance matrix (ACM) is defined as the expectation of the second moment of the outputs with lagged versions of itself.

Definition 4.1 The autocovariance matrix of the outputs is defined as

\[ R(N) = E\begin{bmatrix} y_ky_k^T & \cdots & y_ky_{k+N}^T \\ \vdots & \ddots & \vdots \\ y_{k+N}y_k^T & \cdots & y_{k+N}y_{k+N}^T \end{bmatrix} \tag{4.3} \]

in which N is the number of lags, or the window size of the autocovariance matrix. The initial time step reference is arbitrary, so any lower diagonal element of the ACM can be written as

\[ E\left[y_{k+j}y_k^T\right] = CA^j\Sigma_k(A^T)^kC^T + \sum_{h=1}^{k}CA^{j-k+h-1}GQ_wG^T(A^T)^{h-1}C^T + R_v\delta_{jk} \tag{4.4} \]

in which δ_jk = 1 if j = k and δ_jk = 0 if j ≠ k. The unknown covariances can be estimated from this equation in four different ways: full triple matrix, single column matrix-matrix (SCMM), single column matrix-vector (SCMV), and full matrix-vector form. We derive these four methods and compare them to those in the literature. Computing the autocovariance matrix from data is discussed later in this chapter.

4.3 Full Triple Matrix Method

The objective is to compute the autocovariances from data and estimate the unknown covariances from the autocovariance matrix. The ACM can be expanded to a form in which the three unknown matrices Σ_k, Q_w, and R_v are separated.

\[ R(N) = \underbrace{\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \end{bmatrix}}_{O}\Sigma_kO^T + \begin{bmatrix} R_v & & \\ & \ddots & \\ & & R_v \end{bmatrix} + \underbrace{\begin{bmatrix} 0 & & & \\ CG & 0 & & \\ CAG & CG & 0 & \\ CA^2G & CAG & CG & 0 \end{bmatrix}}_{\Gamma}\begin{bmatrix} Q_w & & \\ & \ddots & \\ & & Q_w \end{bmatrix}\Gamma^T \tag{4.5} \]
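The autocovariance blocks of Definition 4.1 can be estimated from data by time averaging. The following is a minimal sketch in Python/NumPy, assuming a hypothetical scalar system whose values (a, c, g, Qw, Rv) are illustrative and not taken from the dissertation:

```python
import numpy as np

# Simulate a scalar stable system x_{k+1} = a x_k + g w_k, y_k = c x_k + v_k,
# then estimate the lagged output autocovariances C_j = E[y_{k+j} y_k^T]
# and assemble the (block-)Toeplitz autocovariance matrix R(N).
rng = np.random.default_rng(1)
a, c, g, Qw, Rv = 0.5, 1.0, 1.0, 1.0, 1.0
Nd = 200_000
x, y = 0.0, np.empty(Nd)
for k in range(Nd):
    y[k] = c * x + rng.normal(scale=np.sqrt(Rv))
    x = a * x + g * rng.normal(scale=np.sqrt(Qw))

lags = 4
Chat = [np.mean(y[j:] * y[: Nd - j]) for j in range(lags)]
R_N = np.array([[Chat[abs(i - j)] for j in range(lags)] for i in range(lags)])

S = Qw / (1 - a**2)          # steady-state Lyapunov solution for this scalar case
C0_true = c * S * c + Rv     # E[y_k y_k^T] = CSC^T + R_v
print(R_N[0, 0], C0_true)    # the two agree to within sampling error
```

The diagonal entries of the estimated matrix approach CSC^T + R_v, and the off-diagonal entries approach CA^jSC^T, matching the steady-state form derived in the following subsections.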

This equation can be expressed in the form AXA^T = B and solved as described in Appendix A.

\[ R(N) = \begin{bmatrix} O & \Gamma & I_{pN} \end{bmatrix} \begin{bmatrix} \Sigma_k & & \\ & \mathrm{diag}(Q_w, \ldots, Q_w) & \\ & & \mathrm{diag}(R_v, \ldots, R_v) \end{bmatrix} \begin{bmatrix} O^T \\ \Gamma^T \\ I_{pN} \end{bmatrix} \tag{4.6} \]

Solving this matrix equation, however, requires a method to eliminate Σ_k, constraints to force each of the N Q_w matrices to be equal, and likewise for the N R_v matrices. All of the off-diagonal terms (except those contained within the partitioned matrices on the diagonal) must also be constrained to zero. As is, the structure of the problem is not informative in terms of existence or uniqueness of covariance estimates. This form is revisited in Section 4.7.

4.3.1 Removing the Initial Condition

The initial state covariance Σ_k must be removed from the problem. We assume that the system has reached steady state, so the initial state covariance disappears due to the increasing powers of A (assuming that A is stable). The remaining terms form an infinite series in GQ_wG^T that collapses to a steady-state value given by the solution

to the Lyapunov equation

\[ S = GQ_wG^T + AGQ_wG^TA^T + A^2GQ_wG^TA^{2T} + \cdots \tag{4.7} \]

Thus, at sufficiently long times away from the initial state, the autocovariance matrix collapses to the following.

\[ R(N) = \begin{bmatrix} CSC^T + R_v & CSA^TC^T & CSA^{2T}C^T & CSA^{3T}C^T \\ CASC^T & CSC^T + R_v & CSA^TC^T & CSA^{2T}C^T \\ CA^2SC^T & CASC^T & CSC^T + R_v & CSA^TC^T \\ CA^3SC^T & CA^2SC^T & CASC^T & CSC^T + R_v \end{bmatrix} \tag{4.8} \]

This result can be verified using another form of the Lyapunov equation,

\[ S = ASA^T + GQ_wG^T \tag{4.9} \]

which can be used to reduce each of the autocovariance matrix elements to a single term without a summation.

Example 4.1 Lyapunov convergence
As an example, we choose the element E[y_{k+2}y_{k+2}^T] from the correlation and illustrate how the steady-state Lyapunov equation collapses the summation of GQ_wG^T terms into a single quantity.

\[ E[y_{k+2}y_{k+2}^T] = CA^2SA^{2T}C^T + CAGQ_wG^TA^TC^T + CGQ_wG^TC^T + R_v \tag{4.10} \]
\[ = CA\underbrace{\left(ASA^T + GQ_wG^T\right)}_{S}A^TC^T + CGQ_wG^TC^T + R_v \]
\[ = C\underbrace{\left(ASA^T + GQ_wG^T\right)}_{S}C^T + R_v \]
\[ = CSC^T + R_v \]

4.3.2 Solving the Least-Squares Problem

Since Equation 4.8 is still in triple matrix form, a simplification is required. By taking the first p columns of the autocovariance matrix, the problem is reduced to single column matrix-matrix (SCMM) form.

\[ \underbrace{E\begin{bmatrix} y_ky_k^T \\ y_{k+1}y_k^T \\ \vdots \\ y_{k+N}y_k^T \end{bmatrix}}_{R_1(N)} = \underbrace{\begin{bmatrix} C & I \\ CA & 0 \\ \vdots & \vdots \\ CA^N & 0 \end{bmatrix}}_{A^*}\begin{bmatrix} SC^T \\ R_v \end{bmatrix} \tag{4.11} \]

The left-hand side of the equation is computed from data, and the problem is solved (see Appendix A) for the unknowns R_v and SC^T.

\[ \begin{bmatrix} \widehat{SC^T} \\ \hat R_v \end{bmatrix} = \left[(A^*)^TA^*\right]^{-1}(A^*)^TR_1(N) \tag{4.12} \]

The SCMM method is not unlike ones found in the literature [12]. A more detailed comparison is given in the next section.

Example 4.2 Solving for R_v and SC^T
We use this least-squares method to compute estimates of the output noise covariance R_v and the modified Lyapunov solution SC^T for the plant illustrated in Example 3.1. The simulation is repeated many times, and the covariance estimator approximates the true sensor noise covariance and Lyapunov solution elements, as shown in Figure 4.1.
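The SCMM problem of Equations 4.11 and 4.12 can be sketched numerically. The scalar plant below is a hypothetical stand-in (its values are not those of Example 3.1, which is not reproduced in this chunk):

```python
import numpy as np

# Estimate R_v and SC^T by stacking lagged autocovariances C_j estimated
# from data and solving the least-squares problem of Equation 4.11.
rng = np.random.default_rng(2)
a, c, Qw, Rv = 0.5, 1.0, 1.0, 1.0
Nd = 200_000
x, y = 0.0, np.empty(Nd)
for k in range(Nd):
    y[k] = c * x + rng.normal(scale=np.sqrt(Rv))
    x = a * x + rng.normal(scale=np.sqrt(Qw))

Nlag = 5
b = np.array([np.mean(y[j:] * y[: Nd - j]) for j in range(Nlag + 1)])
# A* stacks [C, I; CA, 0; ...; CA^N, 0] (scalar case shown here)
Astar = np.column_stack([c * a ** np.arange(Nlag + 1),
                         np.r_[1.0, np.zeros(Nlag)]])
SCt_hat, Rv_hat = np.linalg.lstsq(Astar, b, rcond=None)[0]

S_true = Qw / (1 - a**2)
print(SCt_hat, S_true * c)   # estimate of SC^T vs. true value
print(Rv_hat, Rv)            # estimate of R_v vs. true value
```

With a long enough data record, both estimates converge to the plant values, mirroring the behavior shown in Figure 4.1.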

[Figure 4.1: Estimation of R_v and SC^T from open-loop output data]

4.3.3 Connections to the Literature

As mentioned, similar methods are available in the literature [14]. The output correlation method can be described using the jth lagged autocovariance of the outputs

\[ C_j = E[y_ky_{k+j}^T] \tag{4.13} \]

A collection of these autocovariances results in a least-squares problem in the sensor noise covariance R_v and the modified state covariance ΣC^T, in which Σ = E[x_kx_k^T]. The correlations are performed with a fixed window size equal to the order of the system, n:

\[ \begin{bmatrix} C_1 \\ C_2 \\ \vdots \\ C_n \end{bmatrix} = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^n \end{bmatrix}\Sigma C^T \tag{4.14} \]

and

\[ R_v = C_0 - C\Sigma C^T \tag{4.15} \]

Solving Equation 4.11 is identical to sequentially solving the least-squares problem in Equation 4.14 and then the equality in Equation 4.15. Practical experience has shown that the numerical conditioning of the simultaneous problem is better than that of the sequential version. Solving the two equations sequentially assumes that C_0 is estimated perfectly from data. This observation is supported by Neethling [113].

4.4 Augmented Observability Matrix

From Equation 4.11, the least-squares problem to find the sensor noise covariance and the Lyapunov solution is based on the following augmented observability matrix.

\[ A^* = \begin{bmatrix} C & I \\ CA & 0 \\ \vdots & \vdots \\ CA^N & 0 \end{bmatrix} \tag{4.16} \]

Equation 4.11 can be solved if the inverse is defined.

Lemma 4.1 The inverse [(A^*)^TA^*]^{-1} exists if and only if A^* has full rank.
Proof: This proof is in Appendix A.

Lemma 4.2 In the least-squares problem Ax = b, the solution

\[ x = (A^TA)^{-1}A^Tb \tag{4.17} \]

is a unique minimizer of ||Ax − b|| if and only if A has full column rank.
Proof: This proof is in Appendix A.

Definition 4.2 We define the Hautus matrix [151]

\[ H(A, C) = \begin{bmatrix} \lambda I - A \\ C \end{bmatrix} \tag{4.18} \]

The Hautus lemma states that the system is observable if and only if

\[ \mathrm{rank}[H(A, C)] = n \quad \forall\,\lambda \in \lambda(A) \tag{4.19} \]

The augmented observability matrix and the Hautus lemma can be used to develop conditions for solving the least-squares problem.

Lemma 4.3 The estimates \widehat{SC^T} and \hat R_v from Equation 4.12 uniquely minimize Equation 4.11 if and only if A is full rank and (A, C) is observable.

Proof: From Lemmas 4.1 and 4.2, the estimated covariances from Equation 4.12 are unique if and only if A^* has full column rank.

1. (A, C) observable and A full rank ⇒ rank(A^*) = n + p

We see that A^* can be written as an observability matrix if N = n + p:

\[ A^* = O(\tilde A, \tilde C) \tag{4.20} \]

in which

\[ \tilde A = \begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix} \qquad \tilde C = \begin{bmatrix} C & I_p \end{bmatrix} \tag{4.21} \]

We show that A^* is full rank by showing the Hautus matrix is full rank.

\[ H(\tilde A, \tilde C) = \begin{bmatrix} \lambda I - \tilde A \\ \tilde C \end{bmatrix} = \begin{bmatrix} \lambda I - A & 0 \\ 0 & \lambda I_p \\ C & I_p \end{bmatrix} \tag{4.22} \]

For λ ≠ 0,

\[ H(\tilde A, \tilde C) = \begin{bmatrix} \lambda I - A & 0 \\ 0 & \lambda I_p \\ C & I_p \end{bmatrix} \tag{4.23} \]

The first n columns have rank n due to the observability of (A, C). The remaining p columns have rank p and are independent of the first n columns due to the zero in the second block row. For λ = 0,

\[ H(\tilde A, \tilde C) = \begin{bmatrix} -A & 0 \\ C & I_p \end{bmatrix} \tag{4.24} \]

\[ \mathrm{rank}[H(\tilde A, \tilde C)] = \mathrm{rank}\begin{bmatrix} -A & 0 \\ C & I_p \end{bmatrix} \tag{4.25} \]

We show the columns are independent by showing that the multipliers (α_i, β_i) of the matrix must be zero.

\[ \begin{bmatrix} -A & 0 \\ C & I_p \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = 0 \tag{4.26} \]

In the first block row, the α_i must all be zero if A is full rank. In the second block row, since the α_i are zero, all of the β_i are also zero. Therefore, if A is full rank, then the Hautus matrix is full rank.

2. rank(A^*) = n + p ⇒ (A, C) observable and A full rank, or equivalently, not[(A, C) observable and A full rank] ⇒ not[rank(A^*) = n + p]

From Equation 4.23, it is apparent that the first n columns are rank deficient if (A, C) is unobservable, and rank(H) < n + p. If A is rank deficient, the first n rows of Equation 4.25 are not independent, and rank(H) < n + p. If

101 rank(h ) < n + p, then A does not have full column rank, and the estimates of SC T, R v are not unique. 73 To find unique estimates in this least-squares problem, the augmented observability matrix must be full column rank, which is analogous to having a finite condition number. The condition number of matrix is the product of the Euclidean norm of the matrix with the norm of the inverse. γ(a) = A A 1 (4.27) It is also the ratio of the maximum singular value to the minimum singular value of the matrix γ(a) = σ max(a) σ min (A) (4.28) If A does not have full rank, then σ min is equal to zero and the condition number is undefined. The condition number of the augmented observability matrix is related to the size of the confidence intervals on the covariance estimates. We have already shown that when (A, C) is unobservable, the least-squares problem is not full rank, and thus has an infinite condition number. Several other factors influence this condition number including the window size, N, and the eigenvalues of the state transition matrix.

4.4.1 Window Size

The number of terms included in the least-squares problem is the window size N. From Lemma 4.3, we assume that N ≥ n + p.

\[ E\begin{bmatrix} y_ky_k^T \\ y_{k+1}y_k^T \\ \vdots \\ y_{k+N}y_k^T \end{bmatrix} = \begin{bmatrix} CSC^T + R_v \\ CASC^T \\ \vdots \\ CA^NSC^T \end{bmatrix} \tag{4.29} \]

On initial inspection, it may appear that using more lags improves the efficiency of the estimation: with more equations in the least-squares problem, the resulting estimate should be closer to the true value. Figure 4.2 illustrates the results of repeated simulations of the plant in Example 3.1, with the window size varying from 3 to 20. As the window size increases past seven, the confidence intervals on the covariance estimates begin to diverge. The divergence of the covariance estimates is due to the increasing powers of the state transition matrix, which ultimately cause the least-squares problem to begin fitting noise. The window size in these least-squares problems must be chosen carefully. Otherwise, the increasing powers of A cause rows of near-zeros in the augmented observability matrix, resulting in large confidence intervals on the covariance estimates.

[Figure 4.2: Estimate of R_v and SC^T with increasing window size]

4.4.2 Eigenvalues of the Plant

In Lemma 4.3, it is shown that unique estimates cannot be found when the state transition matrix is singular. In the limiting case, the state transition matrix A is equal to zero.

Example 4.3 Rank condition of A
In this case, the states are simply instantaneous random variables, shaped by the noise shaping matrix G.

\[ x_{k+1} = Gw_k \tag{4.30a} \]
\[ y_k = Cx_k + v_k \tag{4.30b} \]

The least-squares problem is given as

\[ E\begin{bmatrix} y_ky_k^T \\ y_{k+1}y_k^T \\ \vdots \\ y_{k+N}y_k^T \end{bmatrix} = \underbrace{\begin{bmatrix} C & I \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{bmatrix}}_{A^*}\begin{bmatrix} SC^T \\ R_v \end{bmatrix} \tag{4.31} \]

For unique covariance estimates, the column rank of the A^* matrix must be equal to n + p. Here the column rank can never be greater than p, so only the combination CSC^T + R_v can be estimated uniquely. Even if the A matrix is full rank, the eigenvalues still have a large effect on the estimates of the covariances.

Example 4.4 Eigenvalue effects on covariance estimates
We use an example presented by Muske and Rawlings [111] to motivate the eigenvalue argument. We use the state-space realization of the z-domain transfer function

G (z) = z 2z 1.5z 2z 1 z 2.5z z 2.5z 1.5 (4.32)

which yields

A = B = C = (4.33)

and

GQ_wG^T = R_v = .2 .6 (4.34)

The estimation of the (1, 1) element of the R_v matrix is shown in Figure 4.3. As can be seen, the confidence interval around the estimated covariance shrinks as more data are processed. In the second part of this example, the (1, 1) element of the A matrix (and subsequently the eigenvalue) is moved to 0.5. The estimates of the (1, 1) element of the R_v matrix are shown in Figure 4.4. While the mean of the estimated covariance approximates the plant value, the confidence intervals are much larger than the ones

[Figure 4.3: Estimate of R_v(1, 1) with increasing points, well conditioned model]

[Figure 4.4: Estimate of R_v(1, 1) with increasing points, badly conditioned model]

presented in Figure 4.3. When the eigenvalues of the state transition matrix are small, the effects of the state noise sequence {w_k} do not persist in the data set long enough to be estimated reliably, since the time constant of a system with a small eigenvalue is small.
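The eigenvalue effect above can be seen directly in the conditioning of the augmented observability matrix. A minimal sketch, assuming a hypothetical scalar system (so A^* reduces to two columns):

```python
import numpy as np

# Condition number of A* = [C, I; CA, 0; ...; CA^N, 0] (scalar case)
# as the eigenvalue a of the state transition matrix approaches zero.
def cond_Astar(a, Nlag=10, c=1.0):
    col1 = c * a ** np.arange(Nlag + 1)   # coefficients multiplying SC^T
    col2 = np.r_[1.0, np.zeros(Nlag)]     # coefficient multiplying R_v
    return np.linalg.cond(np.column_stack([col1, col2]))

for a in (0.9, 0.5, 0.1, 0.01):
    print(a, cond_Astar(a))   # condition number grows as a -> 0
```

As the eigenvalue shrinks, the rows CA^j collapse toward zero, the two columns become nearly parallel, and the condition number blows up, consistent with the divergence of the confidence intervals in Figure 4.4.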

Example 4.5 Condition number of the eigenvalue problem
Illustrating this effect is a two-state A matrix with off-diagonal zeros and the eigenvalues on the diagonal.

\[ A = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \tag{4.35} \]

As the condition number of the augmented observability matrix approaches infinity, the confidence intervals on the covariance estimates grow without bound. In Figure 4.5, the condition number of the least-squares coefficient matrix is plotted versus the eigenvalues. When either eigenvalue approaches zero, the condition number grows exponentially. In the case in which only one of the states is measured,

\[ C = \begin{bmatrix} 1 & 0 \end{bmatrix} \]

the condition number of the observability matrix takes on additional infinite regions due to the unobservable state, as seen in Figure 4.6.

4.5 Sliding Window

If the eigenvalues of the system are well behaved, then a sliding window approach may be used to track the noise disturbances over time. From a long string of data, a subset of the data is processed using the ALS method to compute estimates of the noise covariances. The data window is then slid forward in time, and the process is repeated. This method is effective at estimating covariances that persist over time,

110 82 γ(a * ) λ λ 2 3 Figure 4.5: Condition number of the augmented observability matrix, full rank measurement

111 83 γ(a * ) λ λ 2 3 Figure 4.6: Condition number of the augmented observability matrix, limited measurement

since the ALS method cannot adapt to disturbances until after they have persisted in the outputs for some duration.

Example 4.6 Sliding window to estimate changes in covariances
In this example, the system from Example 4.4 is used to find the covariances using a single string of data and the sliding window approach. The sensor noise sequence can be represented by a three-dimensional Gaussian distribution and plotted as ellipses, as shown in Figure 2.5. In this simulation, the plant covariance matrix undergoes a dynamic shift midstream, as summarized in Table 4.1. Figures 4.7 and 4.8 depict the performance of the sensor noise estimator over time. The time stamps of the subplots are summarized in Table 4.2.

[Table 4.1: Summary of changes in sensor noise characteristics]
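The sliding window idea can be sketched as follows. This is a minimal illustration on a hypothetical scalar plant (not the system of Example 4.4) in which the sensor noise covariance jumps midstream:

```python
import numpy as np

# A sliding window of output data is passed to the SCMM least-squares
# estimator so a mid-stream change in R_v can be tracked over time.
rng = np.random.default_rng(3)
a, c, Qw = 0.5, 1.0, 1.0
Nd = 200_000
Rv_true = np.where(np.arange(Nd) < Nd // 2, 1.0, 4.0)  # shift at the midpoint
x, y = 0.0, np.empty(Nd)
for k in range(Nd):
    y[k] = c * x + rng.normal(scale=np.sqrt(Rv_true[k]))
    x = a * x + rng.normal(scale=np.sqrt(Qw))

def estimate_Rv(win, Nlag=5):
    # SCMM least squares on one window of data (scalar case)
    b = np.array([np.mean(win[j:] * win[: len(win) - j]) for j in range(Nlag + 1)])
    Astar = np.column_stack([c * a ** np.arange(Nlag + 1),
                             np.r_[1.0, np.zeros(Nlag)]])
    return np.linalg.lstsq(Astar, b, rcond=None)[0][1]

width, step = 50_000, 25_000
ests = [estimate_Rv(y[s:s + width]) for s in range(0, Nd - width + 1, step)]
print(ests)  # estimates move from about 1.0 toward 4.0 after the shift
```

As the window slides past the disturbance change, the estimate transitions between the two covariance levels, with a lag set by the window width, which is the behavior depicted in Figures 4.7 and 4.8.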

[Figure 4.7: Estimation of sensor noise probability distribution with incorrect a priori estimate]

[Table 4.2: Details of sensor noise covariance estimates over time]

[Figure 4.8: Estimation of dynamic shift in sensor noise probability distribution]

4.6 Extraction of the Q_w Matrix

From SC^T, we must extract Q_w by solving the Lyapunov equation. The Lyapunov equation can be solved in a number of ways [11, 54, 78, 11]. It can be expressed as an infinite series.

\[ S = GQ_wG^T + AGQ_wG^TA^T + A^2GQ_wG^TA^{2T} + \cdots \tag{4.36} \]

Applying the vec operator to the equation (see Appendix A) yields

\[ S_s = (I_{n^2})(GQ_wG^T)_s + (A \otimes A)(GQ_wG^T)_s + \left(A^2 \otimes A^2\right)(GQ_wG^T)_s + \cdots \tag{4.37} \]

Definition 4.3 Kronecker product

\[ A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix} \tag{4.38} \]

Definition 4.4 vec operator
The vec operator is the columnwise stacking of a matrix into a vector [23], in which a_j is the jth column of the matrix:

\[ \mathrm{vec}(A) = \begin{bmatrix} a_1^T & \cdots & a_k^T \end{bmatrix}^T = A_s \tag{4.39} \]

In this dissertation, the vec operator is denoted by the s subscript. We also define the inverse vec operator s^{-1}(i, j), which stacks a vector back into a matrix of dimension i × j.

It is shown in [23, 152] and in Appendix A that solving the matrix equation

\[ AXC^T = B \tag{4.40} \]

in which

\[ A \in \mathbb{R}^{m \times n} \quad X \in \mathbb{R}^{n \times n} \quad C \in \mathbb{R}^{p \times n} \quad B \in \mathbb{R}^{m \times p} \tag{4.41} \]

is identical to solving the matrix-vector problem

\[ (C \otimes A)X_s = B_s \tag{4.42} \]

or

\[ X = \left\{(C \otimes A)^{-1}B_s\right\}_{s^{-1}(n,n)} \tag{4.43} \]

Lemma 4.4 Solving the equation AXC^T = B is equivalent to solving (C ⊗ A)X_s = B_s.
Proof: The proof can be found in [152] and also in Appendix A.

Based on the property of Kronecker products [12],

\[ (A \otimes B)(C \otimes D) = AC \otimes BD \]

the following property is also true:

\[ \left(A^k \otimes A^k\right) = (A \otimes A)^k \]

Therefore, Equation 4.37 becomes

\[ S_s = (I_{n^2})(GQ_wG^T)_s + (A \otimes A)(GQ_wG^T)_s + (A \otimes A)^2(GQ_wG^T)_s + \cdots \tag{4.44} \]

Applying the substitution X = A ⊗ A leads to the following infinite series.

\[ S_s = \left(I + X + X^2 + \cdots\right)(GQ_wG^T)_s \tag{4.45} \]
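The equivalence in Lemma 4.4 is easy to check numerically. A quick sketch with arbitrary random matrices (columnwise stacking corresponds to NumPy's Fortran order):

```python
import numpy as np

# Verify that AXC^T = B is equivalent to (C kron A) vec(X) = vec(B),
# where vec() stacks a matrix columnwise.
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 4))
C = rng.standard_normal((2, 4))

vec = lambda M: M.flatten(order="F")     # columnwise (vec) stacking
lhs = np.kron(C, A) @ vec(X)
rhs = vec(A @ X @ C.T)
assert np.allclose(lhs, rhs)
```

Note the ordering: with columnwise stacking, vec(AXB) = (B^T ⊗ A) vec(X), so AXC^T gives the factor C ⊗ A used above.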

The scalar case of the infinite series is given as

\[ \sum_{n=0}^{\infty}x^n = 1 + x + x^2 + \cdots = \frac{1}{1-x} \qquad |x| < 1 \tag{4.46} \]

The analogous matrix infinite series is

\[ \sum_{n=0}^{\infty}X^n = I + X + X^2 + \cdots = (I - X)^{-1} \qquad |\lambda(X)| < 1 \tag{4.47} \]

Thus the solution of the Lyapunov equation can be computed in stacked form. Unstacking the vector into a matrix gives the solution to the discrete Lyapunov equation.

\[ S = \left\{(I_{n^2} - A \otimes A)^{-1}(GQ_wG^T)_s\right\}_{s^{-1}(n,n)} \tag{4.48} \]

The Lyapunov equation can also be written as

\[ ASA^T - S + GQ_wG^T = 0 \tag{4.49} \]

Again applying the vec operator to the equation yields

\[ (GQ_wG^T)_s = S_s - (A \otimes A)S_s \tag{4.50} \]
\[ (GQ_wG^T)_s = (I_{n^2} - A \otimes A)S_s \tag{4.51} \]

which yields the same result as in Equation 4.48. Next, Q_w is extracted from the Lyapunov solution. From our least-squares method, an estimate of S is not available, only SC^T, which has the relationship

\[ [SC^T]_s = (C \otimes I_n)S_s \tag{4.52} \]
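The stacked-form solution of Equation 4.48 can be sketched directly. A minimal example with hypothetical system matrices (not from the dissertation):

```python
import numpy as np

# Solve the discrete Lyapunov equation S = A S A^T + G Q_w G^T in stacked
# form, S_s = (I - A kron A)^{-1} (G Q_w G^T)_s, as in Equation 4.48.
A = np.array([[0.5, 0.2], [0.0, 0.4]])   # stable: eigenvalues 0.5, 0.4
G = np.eye(2)
Qw = np.array([[1.0, 0.3], [0.3, 2.0]])

n = A.shape[0]
vec = lambda M: M.flatten(order="F")
rhs = vec(G @ Qw @ G.T)
Ss = np.linalg.solve(np.eye(n * n) - np.kron(A, A), rhs)
S = Ss.reshape((n, n), order="F")        # inverse vec operator s^{-1}(n, n)

# Check against the fixed-point form of Equation 4.9
assert np.allclose(S, A @ S @ A.T + G @ Qw @ G.T)
```

The reshape at the end plays the role of the inverse vec operator, and the final assertion confirms that the stacked solution satisfies the Lyapunov equation.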

Definition 4.5 Direct sum
The direct sum [141, 152] is defined as

\[ \bigoplus_{j=1}^{k}A = \begin{bmatrix} A & & \\ & \ddots & \\ & & A \end{bmatrix} \quad (k \text{ times}) \tag{4.53} \]

Equation 4.11 becomes

\[ A^*\begin{bmatrix} \bar C(Q_w)_s \\ (R_v)_s \end{bmatrix} = R_1(N)_s \tag{4.54} \]

in which

\[ \bar C = (C \otimes I_n)\left[I_{n^2} - (A \otimes A)\right]^{-1}(G \otimes G) \tag{4.55} \]

Theorem 4.1 Given estimates of SC^T and R_v, unique estimates of Q_w and R_v can be found if and only if \bar C has full column rank.

Proof: Lemma 4.3 proves that estimates of SC^T and R_v can be found if and only if (A, C) is observable and A has full rank. The covariances Q_w and R_v can be found from

\[ \begin{bmatrix} \bar C & 0 \\ 0 & I \end{bmatrix}\begin{bmatrix} (Q_w)_s \\ (R_v)_s \end{bmatrix} = \begin{bmatrix} (SC^T)_s \\ (R_v)_s \end{bmatrix} \tag{4.56} \]

1. \bar C has full column rank ⇒ unique estimates of Q_w, R_v
The proof begins by showing that the multipliers (α_i, β_i) must be zero.

\[ \begin{bmatrix} \bar C & 0 \\ 0 & I \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = 0 \tag{4.57} \]

From the second block row, the multipliers β_i must be zero. Substituting into the first block row, the multipliers α_i must be zero if \bar C is full rank.

2. \bar C does not have full column rank ⇒ nonunique estimates of Q_w, R_v
If \bar C is rank deficient, then the matrix

\[ \begin{bmatrix} \bar C & 0 \\ 0 & I \end{bmatrix} \]

cannot have full column rank.

Example 4.7 Q_w estimation, short measurement
In Mehra's work [14], he claims that an estimate of Q_w can be found only when the number of unknowns is less than np. These conditions are neither necessary nor sufficient, since counterexamples can be generated, as discussed in Section 5.2. We repeat Example 4.4.

C = G = (4.58)

Q w = Q w = (4.59)

R v = R v = (4.60)

The results of this example are shown in Figure 4.9. Since the state noise sequence is a three-dimensional probability distribution, we can illustrate the results as two-dimensional ellipses. The first plot compares the initial estimate of the state noise distribution to the actual noise distribution. Similarly, the second plot shows the final estimate of the state noise distribution using the aforementioned method versus the actual distribution.

[Figure 4.9: Estimating Q_w with limited measurements]

4.7 Derivation of State-Based Full Matrix Least-Squares Problem

In this section, we revisit the full matrix form of the problem from Equation 4.4. The full autocovariance matrix can be written as

\[ R(N) = \underbrace{\begin{bmatrix} C \\ CA \\ CA^2 \\ CA^3 \end{bmatrix}}_{O}\Sigma_kO^T + \begin{bmatrix} R_v & & & \\ & R_v & & \\ & & R_v & \\ & & & R_v \end{bmatrix} + \underbrace{\begin{bmatrix} 0 & & & \\ C & 0 & & \\ CA & C & 0 & \\ CA^2 & CA & C & 0 \end{bmatrix}}_{\Gamma}\begin{bmatrix} GQ_wG^T & & \\ & \ddots & \\ & & GQ_wG^T \end{bmatrix}\Gamma^T \tag{4.61} \]

or

\[ R(N) = O\Sigma_kO^T + \Gamma\left(\bigoplus_{i=1}^{N}GQ_wG^T\right)\Gamma^T + \bigoplus_{i=1}^{N}R_v \tag{4.62} \]

Applying the vec operator to the equation yields

\[ R(N)_s = (O \otimes O)\Sigma_s + (\Gamma \otimes \Gamma)\left(\bigoplus_{i=1}^{N}GQ_wG^T\right)_s + \left(\bigoplus_{i=1}^{N}R_v\right)_s = (O \otimes O)\Sigma_s + (\Gamma \otimes \Gamma)I_{n,N}(G \otimes G)(Q_w)_s + I_{p,N}(R_v)_s \tag{4.63} \]

in which I_{x,N} is a permutation matrix that converts the direct sum to a vector of the unknown elements of the covariance matrix. Applying the steady-state state covariance (Σ ≡ Σ_{k+1} = Σ_k),

\[ \Sigma_s = (I_{n^2} - A \otimes A)^{-1}(G \otimes G)(Q_w)_s \tag{4.64} \]

yields

\[ R(N)_s = \left\{\left[(O \otimes O)(I_{n^2} - A \otimes A)^{-1} + (\Gamma \otimes \Gamma)I_{n,N}\right](G \otimes G)\right\}(Q_w)_s + I_{p,N}(R_v)_s \tag{4.65} \]

This equation can be cast in a least-squares framework,

\[ \min_{Q_w, R_v}\Phi = \|Ax - b\| \tag{4.66} \]

in which

\[ A = \begin{bmatrix} \left[(O \otimes O)(I_{n^2} - A \otimes A)^{-1} + (\Gamma \otimes \Gamma)I_{n,N}\right](G \otimes G) & I_{p,N} \end{bmatrix} \tag{4.67} \]

and

\[ x = \begin{bmatrix} (Q_w)_s \\ (R_v)_s \end{bmatrix} \qquad b = R(N)_s \tag{4.68} \]

The covariances are estimated from

\[ \begin{bmatrix} (\hat Q_w)_s \\ (\hat R_v)_s \end{bmatrix} = (A^TA)^{-1}A^Tb \tag{4.69} \]

and

\[ \hat Q_w = \left[(\hat Q_w)_s\right]_{s^{-1}(g,g)} \qquad \hat R_v = \left[(\hat R_v)_s\right]_{s^{-1}(p,p)} \tag{4.70} \]

The structure of the problem allows constraints to be added:

\[ A = \begin{bmatrix} \left[(O \otimes O)(I_{n^2} - A \otimes A)^{-1} + (\Gamma \otimes \Gamma)I_{n,N}\right](G \otimes G) & I_{p,N} \\ S_g & 0 \\ 0 & S_p \end{bmatrix} \tag{4.71} \]

and

\[ b = \begin{bmatrix} R(N)_s \\ s_g \\ s_p \end{bmatrix} \tag{4.72} \]

The additional equations S_g x = s_g and S_p x = s_p can be used to enforce constraints (e.g., diagonality, positive definiteness) on the covariance estimates. The advantages of constructing the problem in the full matrix formulation are better numerical conditioning as well as better structuring of the problem.


Chapter 5

Innovations, Noise Shaping, and Optimal Filtering 1

When I am working on a problem I never think about beauty. I only think about how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong.
Buckminster Fuller

There are a number of reasons why the output-based covariance estimator is inadequate for practical applications. In the output-based formulation, the control law has to be specified in order to close the loop. Additionally, output-based autocovariance methods are not suitable for estimating integrated white noise disturbances, as shown in Chapter 6.

1 Portions of this chapter were published in Odelson and Rawlings [12]

5.1 Innovations Form

We begin with the estimate error using the predictor form of the Kalman filter

\[ \hat x_{k+1|k} = A\hat x_{k|k-1} + Bu_k + AL\left(y_k - C\hat x_{k|k-1}\right) \tag{5.1} \]

The explicit values of the estimate errors are not known, but we can write an expression for their evolution. As before, we define the state estimate error as ε_k ≡ x_k − \hat x_{k|k−1}, so the evolution of the state estimate error can be written as

\[ \varepsilon_{k+1} = (A - ALC)\varepsilon_k - ALv_k + Gw_k \tag{5.2} \]

At this point, we replace the stability assumption with the far less restrictive assumption that the regulator can stabilize the plant. The estimate error evolution can be written in standard state-space form, in which the output is the prediction error (innovations) of the system, given as Y_k ≡ y_k − C\hat x_{k|k−1}. We define the state-space model of the innovations as

\[ \varepsilon_{k+1} = \bar A\varepsilon_k + \bar G\bar w_k \tag{5.3a} \]
\[ Y_k = C\varepsilon_k + v_k \tag{5.3b} \]

In the previous chapter, it is assumed that the two noise terms {w_k} and {v_k} are uncorrelated. In the innovations model, this assumption no longer holds, since the augmented vector \bar w_k contains the output noise term v_k.

\[ E\left[\bar w_kv_k^T\right] = \begin{bmatrix} 0 \\ R_v \end{bmatrix} \equiv \chi \tag{5.4} \]
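The innovations-form error recursion of Equation 5.2 can be checked by simulation. A minimal sketch, assuming a hypothetical scalar system and an arbitrary stabilizing gain L (these values are illustrative, not from the dissertation):

```python
import numpy as np

# Simulate plant and one-step predictor, and confirm that the directly
# computed estimate error eps_k = x_k - xhat_{k|k-1} follows
# eps_{k+1} = (A - ALC) eps_k - AL v_k + G w_k, with innovation
# Y_k = C eps_k + v_k.
rng = np.random.default_rng(5)
a, c, g, L = 0.9, 1.0, 1.0, 0.5
Abar = a - a * L * c                        # \bar A = A - ALC, stable here
x, xhat, eps = 1.0, 0.0, 1.0                # initial error eps = x - xhat
for _ in range(50):
    w, v = rng.standard_normal(2)
    y = c * x + v
    Y = y - c * xhat                        # innovation
    assert np.isclose(Y, c * (x - xhat) + v)
    # propagate plant, predictor, and the error recursion of Equation 5.2
    x = a * x + g * w
    xhat = a * xhat + a * L * Y
    eps = Abar * eps - a * L * v + g * w
    assert np.isclose(eps, x - xhat)
```

Each step confirms that the recursion reproduces the true estimate error, so the innovations really are the output of the state-space model in Equation 5.3.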

5.1.1 Cross Covariance Terms

We begin by revisiting the expressions for each of the output terms, again treating the system as if driven only by random effects. Previously, when the outputs were correlated with each other, only the like noise terms at the same time step were correlated.

\[ Y_k = C\varepsilon_k + v_k \]
\[ Y_{k+1} = C\bar A\varepsilon_k + C\bar G\bar w_k + v_{k+1} \]
\[ Y_{k+2} = C\bar A^2\varepsilon_k + C\bar A\bar G\bar w_k + C\bar G\bar w_{k+1} + v_{k+2} \]
\[ Y_{k+3} = C\bar A^3\varepsilon_k + C\bar A^2\bar G\bar w_k + C\bar A\bar G\bar w_{k+1} + C\bar G\bar w_{k+2} + v_{k+3} \]

Definition 5.1 We define the innovations autocovariance matrix as

\[ \mathcal{R}(N) \equiv E\begin{bmatrix} Y_kY_k^T & \cdots & Y_kY_{k+N}^T \\ \vdots & \ddots & \vdots \\ Y_{k+N}Y_k^T & \cdots & Y_{k+N}Y_{k+N}^T \end{bmatrix} \tag{5.5} \]

We write out several of the cross covariance terms and observe how the new covariance terms between {w_k} and {v_k} enter into the expressions. We see that

\[ E\left[\bar w_jv_k^T\right] = \chi\delta_{jk} \tag{5.6} \]

The covariance of the initial state estimate error is defined as P instead of Σ_k.

\[ E[Y_kY_k^T] = CPC^T + R_v \]
\[ E[Y_{k+1}Y_k^T] = C\bar APC^T + C\bar G\chi \]
\[ E[Y_{k+2}Y_k^T] = C\bar A^2PC^T + C\bar A\bar G\chi \]
\[ E[Y_{k+1}Y_{k+1}^T] = C\bar AP\bar A^TC^T + C\bar G\bar Q\bar G^TC^T + R_v \]
\[ E[Y_{k+2}Y_{k+1}^T] = C\bar A^2P\bar A^TC^T + C\bar A\bar G\bar Q\bar G^TC^T + C\bar G\chi \]
\[ E[Y_{k+2}Y_{k+2}^T] = C\bar A^2P\bar A^{2T}C^T + C\bar A\bar G\bar Q\bar G^T\bar A^TC^T + C\bar G\bar Q\bar G^TC^T + R_v \]

The cross covariance terms are not additive, as in the case of the {w_k} sequence with itself. Therefore, a convergent sequence is not needed, and the additional terms can be added directly to the previous formulation.

\[ \mathcal{R}(N) = \begin{bmatrix} CPC^T + R_v & & & \\ C\bar APC^T & CPC^T + R_v & & \\ \vdots & & \ddots & \\ C\bar A^NPC^T & C\bar A^{N-1}PC^T & \cdots & CPC^T + R_v \end{bmatrix} + \begin{bmatrix} 0 & & & \\ C\bar G\chi & 0 & & \\ \vdots & \ddots & \ddots & \\ C\bar A^{N-1}\bar G\chi & \cdots & C\bar G\chi & 0 \end{bmatrix} \tag{5.7} \]

The least-squares problem can then be set up in the same way as in the previous sections. However, this presents an additional challenge in that three matrices need

131 13 to be estimated from the problem as opposed to two in the previous version. C I CĀ CḠ CĀ 2 CĀḠ... CĀ N CĀ N 1 Ḡ P C T χ R v = R 1 (N) (5.8) However, the cross covariance term (χ) is known in terms of the sensor noise covariance, R v, and can effectively be eliminated from the problem. G R v = Ḡχ = [ G AL ] R v G = [ AL ] yielding C I CĀ CG CĀ 2 CĀG... CĀ N CĀ N 1 G P C T R v R v = R 1 (N) (5.9)
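As a concreteness check on the structure of Equation 5.9, the sketch below (with small illustrative matrices A, C, G, L and covariances that are not from this chapter) builds the stacked coefficient matrix from the closed-loop matrix Ā = A − ALC, forms the exact innovation autocovariances implied by a known steady-state P and R_v, and verifies that least squares recovers P C^T and R_v.

```python
import numpy as np

# Illustrative system (not from the text): n = 2 states, p = 1 output,
# and an arbitrary stabilizing (suboptimal) filter gain L
A = np.array([[0.9, 0.2], [0.0, 0.7]])
C = np.array([[1.0, 0.0]])
G = np.eye(2)
Qw = np.diag([0.5, 0.2])
Rv = np.array([[0.1]])
L = np.array([[0.3], [0.1]])

Abar = A - A @ L @ C  # closed-loop estimator matrix

# Steady-state prediction-error covariance:
#   P = Abar P Abar' + G Qw G' + (AL) Rv (AL)'
P = np.zeros((2, 2))
for _ in range(2000):
    P = Abar @ P @ Abar.T + G @ Qw @ G.T + (A @ L) @ Rv @ (A @ L).T

# Exact innovation autocovariances:
#   C_0 = C P C' + Rv,   C_j = C Abar^j P C' - C Abar^(j-1) (AL) Rv
N = 10
rows = [np.hstack([C, np.eye(1)])]  # lag-0 coefficient row: [C, I]
rhs = [C @ P @ C.T + Rv]
Aj = np.eye(2)                      # holds Abar^(j-1)
for j in range(1, N + 1):
    rows.append(np.hstack([C @ Aj @ Abar, -C @ Aj @ A @ L]))
    rhs.append(C @ Aj @ Abar @ P @ C.T - C @ Aj @ A @ L @ Rv)
    Aj = Aj @ Abar

A1 = np.vstack(rows)                # the stacked coefficient matrix
b = np.vstack(rhs)
x, *_ = np.linalg.lstsq(A1, b, rcond=None)
PCt_hat, Rv_hat = x[:2], x[2:]
```

With p = 1 output the Kronecker stacking reduces to this single-column problem; PCt_hat and Rv_hat match P C^T and R_v to machine precision here only because the autocovariances are exact rather than estimated from finite data.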

132 14 This transformation leads to a least-squares problem similar to the original problem. C I CĀ CG P C T CĀ 2 CĀG R v.. CĀ N CĀ N 1 G } {{ } A 1 = R 1(N) (5.1) The least-squares problem becomes (I p A 1 ) (P C T ) s (R v ) s = [R 1(N)] s (5.11) and ( P C T ) s (R v ) s = arg min x (I p A 1 )x [R 1 (N)] s (5.12) and ] P C T = [( P C T ) s s 1 (n,p) R v = [(R v ) s ] s 1 (p,p) (5.13) 5.2 Uniqueness Conditions We can use Equation 5.1 to develop conditions for finding unique estimates of Q w and R v. Lemma 5.1 The pair (Ā, C) is observable if and only if (A, C) is observable.

133 Proof: The pair (Ā, C) is observable if and only if the Hautus matrix H (Ā, C) has full 15 rank for all eigenvalues in the complex plane. λi A + ALC H (Ā, C) = C (5.14) This matrix can be written as the product λi A + ALC C = I AL I λi A C (5.15) therefore H (Ā, C) = FH (A, C) (5.16) Note that F has full rank and that all the eigenvalues are equal to one since the matrix is upper triangular. H (A, C) is full rank for all eigenvalues in the complex plane if and only if (A, C) is observable. Therefore H (Ā, C) is full rank if and only if (A, C) is observable. This lemma is used in conjunction with Equation 5.1 to find conditions for unique minimizers of the least-squares objective function, P and R v. Lemma 5.2 The estimates P C T, R v uniquely minimize the least-squares problem in Equation 5.1 if and only if (A, C) observable and A full rank. Proof: 1. (A, C) observable, A full rank A 1 is full rank This proof is similar to Theorem 4.3. Proof of unique estimates is the same as

134 16 proving observability of the augmented system (Ã, C), Ā AL [ ] Ã = C = C I p (5.17) H (Ã, C) = Ā AL λi [ ] C I p = λi Ā AL λi p C I p (5.18) λ The first n columns are independent if (Ā, C) is observable, which is equivalent to (A, C) observable by Lemma 5.1. The next p columns are independent due to the zero in the second block row. λ = H (Ã, C) = Ā AL C I p (5.19) In this case, to prove the columns are independent, we prove that all of the multipliers are zero. ALC A C AL I p α i β i = (5.2) Solving the second equation, we find β i = Cα i. Substituting into the first equation gives Aα i =.

135 17 Thus, if A is full rank, the α i s must be zero. Substituting into the first equation, the β i s are also zero. Thus, the columns are independent if A is full rank. 2. Unique estimates (A, C) full rank, A full rank, or not[(a, C) full rank or A full rank] A 1 is not full rank In Equation 5.18, the first n columns are not independent if (A, C) is not observable. In Equation 5.19, if A is rank deficient, then H (Ã, C) is also rank deficient. These conditions are the same for the state-based method in Chapter 4. Finally, we have Theorem 5.1 Given estimates of P C T and R v, unique estimates of Q w and R v can be found if and only if C has full column rank. Proof: This proof is identical to Theorem 4.1. The least-squares problem is modified to include Q w [R 1 (N)] s = (I A 1 ) C AL AL I } {{ } B Q w R v s (5.21) in which [( Q w ) s, ( R v ) s ] = arg min x [R 1 (N)] s [(I A 1 )B] x (5.22)
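Lemma 5.2 ties uniqueness of the estimates to observability of (A, C) and full rank of A. A numerical sketch of the rank behavior of the coefficient matrix \mathcal{A}_1 (illustrative matrices, not from the text): the matrix has full column rank for an observable system with nonsingular A, and loses rank as soon as A is made singular.

```python
import numpy as np

def build_A1(A, C, L, N):
    """Stack the coefficient matrix [C, I; C Abar, -C AL; C Abar^2, -C Abar AL; ...]."""
    Abar = A - A @ L @ C
    rows = [np.hstack([C, np.eye(C.shape[0])])]
    Aj = np.eye(A.shape[0])  # holds Abar^(j-1)
    for _ in range(N):
        rows.append(np.hstack([C @ Aj @ Abar, -C @ Aj @ A @ L]))
        Aj = Aj @ Abar
    return np.vstack(rows)

C = np.array([[1.0, 0.0]])
L = np.array([[0.3], [0.1]])

A_good = np.array([[0.9, 0.2], [0.0, 0.7]])   # observable, A full rank
A_sing = np.array([[0.9, 0.2], [0.0, 0.0]])   # still observable, but A singular

r_good = np.linalg.matrix_rank(build_A1(A_good, C, L, 8))   # full column rank, n + p = 3
r_sing = np.linalg.matrix_rank(build_A1(A_sing, C, L, 8))   # rank deficient
```

For the singular A, every row of the coefficient matrix beyond lag 0 becomes a scalar multiple of the lag-1 row, so only two independent columns remain, exactly the failure mode the lemma predicts.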

Assumption 5.1 C has full column rank.

Assumption 5.2 pq \le np - \bar{g} + p^2 - \bar{p}, in which q is the dimension of the null space of \mathcal{A}_1, and the numbers of unknowns in the covariance matrices Q_w, R_v are \bar{g} and \bar{p} respectively, in which

    \bar{g} = \frac{g(g+1)}{2} \qquad \bar{p} = \frac{p(p+1)}{2}    (5.23)

Theorem 5.2 The pair (\hat{Q}_w, \hat{R}_v) is a unique minimizer of the autocovariance least-squares problem in Equation 5.21 if and only if Assumptions 5.1 and 5.2 are true.

Proof:

1. pq \le np - \bar{g} + p^2 - \bar{p} and C full rank \Rightarrow (\hat{Q}_w, \hat{R}_v) is a unique minimizer of Equation 5.21

The proof involves showing the coefficient matrix (I_p \otimes \mathcal{A}_1)\mathcal{B} has full column rank. The dimensions of the matrices are

    I_p \otimes \mathcal{A}_1 \in \mathbb{R}^{p^2 N \times p(n+p)} \qquad \mathcal{B} \in \mathbb{R}^{(np+p^2) \times (\bar{g}+\bar{p})}    (5.24)

in which

    \mathrm{rank}(I_p \otimes \mathcal{A}_1) = np + p^2 - pq    (5.25)

and

    \mathrm{rank}(\mathcal{B}) = \min\{np + p^2,\; \bar{g} + \bar{p}\} - m    (5.26)

and

    q = \dim[\mathrm{null}(\mathcal{A}_1)] \qquad m = \dim[\mathrm{null}(C)]    (5.27)

137 19 We know rank(i p A 1 ) = p rank(a 1 ) [23]. We next note that B is full rank if and only if C is full rank, which is proven by showing the multipliers of C AL AL I α i β i = (5.28) are equal to zero. This proof is identical to Theorem 4.1. From the rank property of the matrix product [6] rank(i p A) + rank(b) np p 2 rank[(i p A)B] min { rank(i p A), rank(b) } rank[(i p A)B] (5.29a) (5.29b) we can trap the rank of the matrix product between the inequalities, resulting in { } rank[(i p A)B] = min np + p 2 pq, g + p m (5.3) If pq np g + p 2 p (5.31) and C is full rank (m = ), then rank[(i p A)B] = g + p (5.32) 2. pq np g + p 2 p or not[c full rank] minimizer of Equation 5.21 ( Q w, R v ) is not a unique If the condition on q is not met or if m, then from Equation 5.3, [(I p A)B] cannot have full rank.

138 11 From Lemma 5.2, the null space of A 1, has a nonzero dimension when (A, C) is unobservable or when A is not full rank. The dimension of the null space is equal to the number of unobservable modes of (A, C), plus the number of zero eigenvalues of A, minus the common values (to avoid double counting). If the state-space system is in observer canonical form x 1 x 2 = k+1 [ A 11 A 21 A 22 ] x 1 x 2 + Bu k + Gw k (5.33a) y k = C 1 x k + v k (5.33b) then q = dim [null (A 11 )] + dim(a 22 ) (5.34) 5.3 Connections to the Literature The covariance method can be described [12] using the j th lagged autocovariance of the innovations C k = E[Y k Y T k+j ] (5.35) These autocovariances are used to estimate the unknown parameters, P C T and R v by sequentially solving the least-squares problem defined by Equation 22 in [12], and the equality in Equation 23 of [12]. R v = C C(P C T ) (5.36)

139 and the least-squares equation (Eq. 22 [12]) C 1 + CALC C 2 + CALC 1 + CA 2 LC P C T = (OA) #. C n + CALC n CA n LC (5.37) which can also be written as P C T = A # C 1 C 2. C n + LC (5.38) As mentioned in the previous chapter, solving for R v and P C T sequentially assumes that C has been estimated perfectly. Simultaneous solution is superior for numerical conditioning. Corollary 5.1 If n = N, solving the least-squares problem in Equation 5.1 is the same as the sequentially minimizing the least-squares problem in Equation 22 [12], and solving the equality in Equation 23 [12]. Proof: Beginning with Equation 5.1 C CĀ. CĀ n I CG. CĀ n 1 G P C T R v = C C 1. C n (5.39) 2 The superscript # denotes the pseudo-inverse of the matrix

140 112 Separate and use Mehra [12] Equation 23, C = CP C T + R v CĀ. CĀ n P C T + CG. CĀ n 1 G (C CP C T ) = C 1. C n (5.4) CĀ CG C. CĀ n CĀ n 1 G C P C T = C 1. C n CG. CĀ n 1 G C (5.41) Substitute G = AL C(Ā + ALC). CĀ n 1 (Ā + ALC) P C T = C 1. C n + CA. CĀ n 1 A LC (5.42) Substitute Ā = A ALC CA CĀA. CĀ n 1 A }{{} A P C T = C 1 C 2. C n + CA CĀA. CĀ n 1 A }{{} A LC (5.43)

141 Applying the pseudo-inverse of A to both sides of the equation yields Equation 22 in 113 Mehra [12] P C T = A # C 1 C 2. C n + LC (5.44) Conditions for Computing Q w In [12], Q w is also extracted from the P C T matrix by solving a series of matrix equations k 1 j= CA j GQG T (A j k ) T = (P C T ) T (A k ) T C T CA k (P C T ) (5.45) k 1 j= CA j Ω(A j k ) T C T (5.46) in which [ ] Ω = A L(P C T ) T (P C T )L T + LC L T A T (5.47) by substitution. This method can be tedious, and more difficult to automate. We prefer the C quantity for its simplicity, as well as the information it contains about uniqueness of the estimates Uniqueness Conditions In [12], the conditions given for finding unique estimates for Q w, R v are

1. (A, C) observable

2. A full rank

3. The number of unknown elements in the Q_w matrix, \frac{1}{2}g(g+1), is less than or equal to np

The above conditions predict the following systems have unique estimates for Q_w, R_v

    A = G = (5.48)

    C = (5.49)

In this case, the second state does not have a driving noise term, and only a unique Q_w(1,1) + Q_w(2,2) can be found. This situation could be resolved by simply remodeling the state noise structure. However, the aforementioned conditions also predict unique estimates for the following system

    A = G = (5.50)

    C = (5.51)

In this case, however, unique estimates for Q_w cannot be found. In [14], if G = I_n, the uniqueness conditions are given as

1. (A, C) observable

2. A full rank

3. p n

Again, a counterexample can be provided to these conditions.

    A = C = (5.52)

If it is assumed that Q_w is diagonal, then the number of unknowns is less than np for any number of nonzero measurements.

Remark 5.1 The conditions proposed by Mehra, (1)-(3), imply Assumption 5.2, but not Assumption 5.1. Mehra's conditions of observability and full rank of the A matrix imply q = 0. Further, he explicitly states (condition 3) that np - \bar{g} is nonnegative. Since p^2 - \bar{p} is nonnegative for p \ge 1, Mehra's conditions imply Assumption 5.2. Note that the rank condition on C (Assumption 5.1) is missing from the earlier work.

144 Derivation of Full Matrix Innovations-Based Least-Squares Problem While the single column methods are effective at simplifying the problem, the structure of the autocovariance matrix is lost. We develop the full matrix form, beginning with N R(N) = OP O T + Γ Ḡ Q w Ḡ T Γ T i=1 i=1 N + Ψ R v + N i=1 R v Ψ T + N R v (5.53) i=1 in which Γ = C CĀ C CĀ 2 CĀ C N Ψ = Γ ( AL) (5.54) j=1 Applying the vec operator to the equation and using the steady-state estimate error covariance (P k+1 k = P k k 1 = P ): P s = (Ā Ā)P s + (Ḡ Q w Ḡ T ) s (5.55) yields [R(N)] s = [(O O)(I n 2 Ā Ā) 1 + (Γ Γ )I n,n ](G G)(Q w ) s { [ ] + (O O)(I n2 Ā Ā) 1 + (Γ Γ ) I n,n (AL AL) + [Ψ Ψ + I p 2 N 2]I p,n } (R v ) s (5.56)

145 117 Definition 5.2 The Kronecker sum is defined as A B = A I + I B (5.57) Which results in the least-squares problem min x Ax b (5.58) in which A = D (G G) D (AL AL) + [Ψ Ψ + I p 2 N 2]I p,n S g S p (5.59) and x = (Q w ) s (R v ) s b = [R(N)] s s g s p (5.6) and D = (O O) ( I n 2 Ā Ā ) 1 + (Γ Γ ) In,N (5.61) Again the additional equations S g x = s g and S p x = s p are optional and can be used to enforce constraints on the solution. The least-squares problem is solved and yields ( Q w ) s ( R v ) s = arg min Ax b (5.62) x Theorem 5.3 The covariance estimates ( Q w, R v ) from Equation 5.62 are unique if and only if A has full column rank Proof: This proof comes directly from Lemma 4.2.
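The vec/Kronecker step behind Equation 5.55 — vec(Ā P Ā^T) = (Ā ⊗ Ā) vec(P) under column stacking — lets the steady-state Lyapunov equation be solved as a single linear system. A small sketch with illustrative matrices:

```python
import numpy as np

Abar = np.array([[0.6, 0.2], [-0.1, 0.5]])
W = np.array([[0.7, 0.1], [0.1, 0.4]])  # stands in for G Qw G^T (illustrative)
n = Abar.shape[0]

# P satisfies P = Abar P Abar' + W, i.e. (I - Abar (x) Abar) vec(P) = vec(W)
Ps = np.linalg.solve(np.eye(n * n) - np.kron(Abar, Abar), W.flatten(order="F"))
P = Ps.reshape((n, n), order="F")  # undo the (.)_s column-stacking
```

Column stacking (`order="F"`) matches the convention vec(AXB) = (B^T ⊗ A) vec(X); with B = Ā^T this produces exactly the (Ā ⊗ Ā) term appearing in Equation 5.55.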

5.5 Discussion of the Autocovariance Estimates

Ideally, we wish to compute the autocovariances as the ensemble average of the product \mathcal{Y}_i\mathcal{Y}_{i+k}^T as shown in Figure 5.1. Practically, we approximate the ensemble average with the time average as shown in Figure 5.2. Replacing the ensemble average with the time average is valid if the process is ergodic. Since all of the computations in this dissertation deal with the autocovariances, a detailed discussion of the properties of the autocovariance estimates is appropriate. We define two versions of the autocovariance estimator: the biased and the unbiased estimator. The biased autocovariance estimates are computed from

    \bar{\mathcal{C}}_k = \frac{1}{N_d} \sum_{i=1}^{N_d - k} \mathcal{Y}_i \mathcal{Y}_{i+k}^T    (5.63)

and the unbiased estimates are computed from

    \hat{\mathcal{C}}_k = \frac{1}{N_d - k} \sum_{i=1}^{N_d - k} \mathcal{Y}_i \mathcal{Y}_{i+k}^T    (5.64)

Lemma 5.3 The biased autocovariance estimate converges asymptotically to the true autocovariance.

Proof: Taking the expectation of the biased autocovariance estimate yields

    E[\bar{\mathcal{C}}_k] = \frac{1}{N_d} \sum_{i=1}^{N_d - k} E[\mathcal{Y}_i \mathcal{Y}_{i+k}^T] = \left(1 - \frac{k}{N_d}\right)\mathcal{C}_k    (5.65)

Thus, the autocovariance estimate \bar{\mathcal{C}}_k is biased as O(1/N_d). In the limit of large amounts of data, the autocovariance estimate converges to the true autocovariance

    \bar{\mathcal{C}}_k \to \mathcal{C}_k \quad \forall k \quad \text{as } N_d \to \infty    (5.66)
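The two estimators in Equations 5.63-5.64 differ only in their normalization, so the bias factor in Lemma 5.3 can be checked directly. A sketch with a hypothetical scalar data sequence:

```python
import numpy as np

def autocov(Y, k, biased=True):
    """Lag-k autocovariance estimate of a zero-mean sequence Y of shape (Nd, p)."""
    Nd = Y.shape[0]
    S = sum(np.outer(Y[i], Y[i + k]) for i in range(Nd - k))
    return S / Nd if biased else S / (Nd - k)

rng = np.random.default_rng(0)
Y = rng.normal(size=(500, 1))     # white, zero-mean test sequence
Cb = autocov(Y, 3, biased=True)
Cu = autocov(Y, 3, biased=False)
# Cb equals (1 - k/Nd) * Cu term for term, mirroring E[Cb_k] = (1 - k/Nd) C_k
```

For k much smaller than N_d the two estimates are nearly identical, which is why the choice between them is negligible for the problems treated in this chapter.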

[Figure 5.1: Ensemble average of the autocovariance]

[Figure 5.2: Time average estimate of the autocovariance]

Lemma 5.4 The expectation of the unbiased autocovariance estimate is equal to the true autocovariance.

Proof: Taking the expectation of the unbiased autocovariance estimate yields

    E[\hat{\mathcal{C}}_k] = \mathcal{C}_k \quad \forall k    (5.67)

Lemma 5.5 The variance of the estimated autocovariance (\bar{\mathcal{C}}_k or \hat{\mathcal{C}}_k) converges asymptotically to zero.

Proof: The variance of the autocovariance involves the fourth moment of the observations

    \mathrm{cov}(\mathcal{C}_k) = E[\mathcal{Y}_i\mathcal{Y}_{i+j}^T\mathcal{Y}_{i+m}\mathcal{Y}_{i+n}^T]    (5.68)

149 121 For the biased form of the autocovariance estimates [126] cov( C k, C k+r ) = E[ C k C T k+r ] E[ C k ]E[ C k+r ] (5.69) = 1 N 2 E d N d r t=1 N d r v s=1 Y t Y T t+r Y s Y T s+r +v ( 1 r ) ( 1 r + v N d N d ) C r C r +v (5.7) A change of variables as suggested by Jenkins and Watts [68] gives = 1 N 2 d N d r t=1 N d r v s=1 [ ] C s t Cs+v t T + C s+r +v tcs t r T (5.71) This equation was approximated (for large N d ) by Bartlett [13] var( C k ) 1 N d m= [ ] C m Cm T + C m+k C T m k (5.72) This variance can be easily modified for the unbiased autocovariance estimator as well. var( C k ) 1 N d k m= [ ] C m Cm T + C m+kc T m k (5.73) The biases of the two estimators are summarized in Table 5.1. The biased estimate of the autocovariance is the more popular choice of statistical packages, despite the biased mean. This choice is due to the exploding variance of C k when k approaches N d in the unbiased estimator. When k N d, as is always the case in the types of problems to be solved in this dissertation, the difference between the two autocovariance estimators is negligible. These properties can be used to prove that the estimates of the covariances converge to the correct values.

150 122 Table 5.1: Convergence properties of autocovariance estimates Mean C k ( ) O 1 N d C k - Variance ( ) O 1 N d ( ) O 1 N d k Theorem 5.4 If A in Equation 5.59 has full column rank, and the autocovariances are computed using the biased estimator, then the covariance estimates ( Q w, R v ) from Equation 5.58 converge asymptotically to the true covariances of the plant (Q w, R v ) as N d. Proof: The autocovariance problem can be written for both the covariances of the plant and the covariances estimated from data. Subtracting the two problems results in A ls (Q w ) s ( Q w ) s (R v ) s ( R v ) s = C C C 1 C 1. C N C N (5.74)

151 Using Lemma 5.3, the autocovariance estimates computed from data can be related 123 to the true autocovariances of the system C C C 1 C 1 E =. C N C N 1 N d C 1. N N d C N (5.75) The right-hand side of Equation 5.74 converges asymptotically to zero as 1/N d. Thus the left-hand size of Equation 5.74 must also converge asymptotically to zero. Since we have proven that A ls in the least-squares problem is full column rank, it does not have a null space. Consequently, (Q w ) s ( Q w ) s (R v ) s ( R v ) s as N d (5.76) Theorem 5.5 If A in Equation 5.59 has full column rank, and the autocovariances are computed using the unbiased estimator, then the expectations of the covariance estimates ( Q w, R v ) from Equation 5.58 are equal to the covariances of the plant (Q w, R v ). Proof: Using the unbiased autocovariance estimator, the expectation of the estimated autocovariances are equal to the true autocovariances as shown in Lemma 5.4. The right-hand side of Equation 5.74 is equal to zero, and therefore Q w E = R v Q w R v (5.77)

152 124 Theorem 5.6 The second moment (the covariance) of the covariance estimates converges asymptotically to zero as N d. Proof: This proof is similar to Theorem 5.4. From Section 5.4, we can show T ( Q AE w ) s ( Q w ) s AT = cov ( R v ) s ( R v ) s C C 1. C N (5.78) Applying the vec operator, A AE ( Q w ) s ( R v ) s ( Q w ) s ( R v ) s T s = cov s C. C N (5.79) From Lemma 5.5, the right-hand side of the equation goes to zero as 1/N d (or 1/(N d k) if using the unbiased autocovariance estimator). If the right hand side of Equation 5.79 goes to zero, and if A does not have a null space, then (A A) does not have a null space [152], and E ( Q w ) s ( R v ) s ( Q w ) s ( R v ) s T as N d (5.8)

153 5.6 Different Ways to Process the Data 125 There a number of ways to use the available output data to compute the covariances of the disturbances 1. No filtering - the output based methods of Chapter 4 2. Innovations - the autocovariances of the innovations as discussed previously in this chapter 3. Measurement errors - the autocovariances of the update errors, Y k y k C x k k The equivalent state-space model for the update errors is [ ] ε k+1 = A ALC ε k + } {{ } Ā [ ] G AL } {{ } Ḡ [ ] [ ] Y k = C(I LC) ε k + I CL } {{ } } {{ } C D w k v k } {{ } w k v k (5.81a) (5.81b) The full matrix-matrix solution is then N R (N) = OP O T + Γ Ḡ Q w Ḡ T Γ T i=1 i=1 N + Ψ R v D T + N i=1 DR v Ψ T + N DR v D T (5.82) i=1

154 126 in which O is the observability matrix of (Ā, C) and Γ = C CA C CA 2 CA C N Ψ = Γ ( AL) (5.83) Applying the vec operator to the equation and applying the steady-state Lyapunov equation yields j=1 R (N) = + + { } (O O)(I n 2 Ā Ā) 1 + (Γ Γ )I }{{ n,n (G G)(Q } w ) s D { N N D (AL AL) + D Ψ Ip,N + Ψ D N D j=1 j=1 j=1 Ip,N N D Ip,N }(R v ) s (5.84) j=1 which can be cast in the usual least-squares framework A = D (G G) D (AL AL) [ ( N + D ) j=1 Ψ ( + Ψ N D ) j=1 ( N + D j=1 D) ] N j=1 I p,n (5.85) and x = (Q w ) s (R v ) s [ b = [R (N)] s ] (5.86) Using the measurement error covariances in place of the innovations covariance does not yield higher degrees of freedom in finding unique estimates of Q w, R v. However,

155 it can provide better numerical conditioning and smaller confidence intervals than the innovations methods alone Noise Shaping Methods If the noise shaping matrix, G, is not known, then a typical solution of the problem might involve finding R v, GQ w G T. However, unless the number of sensors is equal to the number of states, a unique estimates generally cannot be found. A better strategy might be to try and identify G Q w, reducing the number of covariance parameters from 1 n(n + 1) to ng. Once this procedure has been performed, the following sub- 2 stitution can be used G G Q w Q w I g The least-squares problem from Equation 5.58 can be rewritten as A = D D (AL AL)+ +[Ψ Ψ + I p 2 N 2]I p,n (5.87) and x = [ (G Qw )(G Q w ) T ] s (R v ) s (5.88) The nonlinear optimization is performed over the elements of P = G Q w and the elements of R v. min P ij, R ij (Ax b) (5.89)

156 128 Example 5.1 Full information computation of G Q w A = [ ] G = Q w =.5.5 C = I 2 (5.9) Ĝ = I 2 (5.91) in which G Q w = p 1 p 2 (5.92) Using this model, a unique estimate of GQ w G T can be computed. Subsequently, a unique estimate of G Q w can also be found. The vector of unknowns becomes [ p 2 1 p 1 p 2 p 2 2 (R v ) T s ] T = arg min Ax b (5.93) x The nonlinear optimization is performed over the elements of G Q w and the elements of R v. min Ax b (5.94) p 1,p 2,R ij In Figure 5.3, the contour plot of Φ = Ax b in the p ij space (G Q w ) has two strong optimum values. The two solutions are equivalent due to the symmetry of the system. Example 5.2 Short information computation of G Q w [ A = C = ] (5.95)

[Figure 5.3: Objective function values versus elements of G\sqrt{Q_w}]

158 13 G = 1.5 Q w = [.5 ] Ĝ = I 2 (5.96) In this case, the least-squares problem becomes min Ax b (5.97) p 1,p 2,r and the vector of unknowns becomes [ p 2 1 p 1 p 2 p 2 2 r v ] T = arg min x Ax b (5.98) The contour plot of the objective function is shown in Figure 5.4. As can be seen, it appears that the minima exists in a trough. In fact, there is a global minimum within the trough, albeit a weak one.

[Figure 5.4: Objective function values versus elements of G\sqrt{Q_w}]

160 Multiple Observers Due to the structure of the method used in this dissertation, it is easy to make use of multiple observers, an idea that has roots in multi-model approaches. Friedland [42], generates linear relationships between the innovations and the covariance matrices, using covariance matching. Using a number of observers equal to the number of covariances to be estimated, the same number of innovations can be used to solve for the unknown covariances. When developing the methods of Section 5.4, it was assumed that the output data had been processed with a Kalman filter with gain L 1 (say). The data can also be processed by multiple observers (L 2, L 3, ) in parallel. The least-squares problem can be modified to include a second set of innovations, based on the filter, L 2. D D 1 (G G) 1 (AL 1 AL 1 )+ +[Ψ 1 Ψ 1 + I p A = 2 N 2] I p,n (5.99) D D 2 (G G) 2 (AL 2 AL 2 )+ +[Ψ 2 Ψ 2 + I p 2 N 2] I p,n and x = (Q w ) s (R v ) s b = [R 1 (N)] s [R 2 (N)] s (5.1) Example 5.3 Revisiting the short measurement example Revisiting the previous example, we can apply multiple observers to better estimate G Q w, R v. The two ellipses in Figure 5.5 represent the confidence intervals [13]

around the estimates that are found using each observer independently. When used in conjunction, the confidence interval becomes much smaller.

[Figure 5.5: Improving convergence with a double filter]

By manipulating the two observers that are used, the confidence intervals are reduced further, as shown in Figure 5.6.

[Figure 5.6: Improving convergence with a different double filter]

163 5.8 Optimal Filtering 135 Even if there is not enough information to find Q w, we may still find the optimal Kalman filter gain. As mentioned in Chapter 1, the state estimate error is defined as ε k x k x k k 1, with steady-state covariance, P. We suggest three ways to find the optimal gain: iterative, minimizing estimate error, and optimization Iterative Procedure From Equation 5.1 C C 1. C N = C CĀ. CĀ N I CAL. CĀ N 1 AL P C T R v (5.11) Unique estimates of P C T and R v are all that are needed to compute the optimal Kalman gain L. Mehra [12, 14] suggests processing the data with a filter, computing a new filter gain based on the computed values of P C T and R v, and reprocessing the data with the new filter, given by L new = P C T [ C P C T + R v ] 1 (5.12) This procedure can be repeated until the gain, L, converges. The eigenvalues of the closed-loop estimator gain, A ALC, can be checked at each time step, and terminated suboptimally if the magnitude of the real part of the eigenvalues become greater than

164 136 one. A negative eigenvalue can occur if not enough data was processed, or if there is severe model mismatch in the system. The disadvantage to this approach is that the state estimates have to be recomputed for every iteration, which can be time consuming if the data set is long Minimizing the Estimate Error Carew and Bélanger [28] assume the data has been processed with some initial filter, L 1. If L 1 is not optimal, then the state estimates are also suboptimal. The covariance of the optimal state estimates, x k k 1, minus the suboptimal state estimates, x k k 1, are driven to zero. P k = E[( x k k 1 x k k 1 )( x k k 1 x k k 1 )T ] (5.13) An iterative procedure was developed to force P k =. W ( P k ) = C P C T + R v C( P)C L( P k ) = ( P C T P k C T )W 1 ( P k ) (5.14a) (5.14b) P k+1 = (A AL 1 C) P k (A AL 1 C) T + A(L 1 L )W ( P k )(L 1 L )A T (5.14c) This approach circumvents the calculation of the state estimates for every time step.
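The iterative gain update L_new = P C^T (C P C^T + R_v)^{-1} can be sketched as a fixed-point iteration. In the sketch below (illustrative matrices; the exact steady-state P for the current gain stands in for what the ALS step would estimate from data), repeated updates drive L to the optimal Kalman gain, with the closed-loop eigenvalue check performed on each pass:

```python
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.7]])
C = np.array([[1.0, 0.0]])
G = np.eye(2)
Qw = np.diag([0.5, 0.2])
Rv = np.array([[0.1]])

def pred_cov(L):
    """Steady-state prediction-error covariance for a fixed filter gain L."""
    Abar = A - A @ L @ C
    P = np.zeros((2, 2))
    for _ in range(3000):
        P = Abar @ P @ Abar.T + G @ Qw @ G.T + (A @ L) @ Rv @ (A @ L).T
    return P

L = np.zeros((2, 1))             # start with the estimator effectively turned off
for _ in range(25):
    P = pred_cov(L)              # in practice: the ALS estimates of P C', Rv
    L = P @ C.T @ np.linalg.inv(C @ P @ C.T + Rv)   # gain update step
    # stability check on the closed-loop estimator at each pass
    assert np.max(np.abs(np.linalg.eigvals(A - A @ L @ C))) < 1.0

# reference: optimal gain from iterating the predictor Riccati equation
Pr = np.zeros((2, 2))
for _ in range(3000):
    K = Pr @ C.T @ np.linalg.inv(C @ Pr @ C.T + Rv)
    Pr = A @ (Pr - K @ C @ Pr) @ A.T + G @ Qw @ G.T
L_opt = Pr @ C.T @ np.linalg.inv(C @ Pr @ C.T + Rv)
```

At the fixed point the Lyapunov equation for P collapses to the predictor Riccati equation, so the iterated gain coincides with the optimal Kalman gain.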

165 Optimization Finding the optimal filter gain can also be cast as an optimization problem. Taking the expectation of the estimate error and its transpose and assuming steady state P = ĀP Ā T + GQ w G T + ALR v L T A T (5.15) The stacked version of the estimate error covariance is given as P s = (I n 2 Ā Ā) 1 ( GQ w G T + ALR v L T A T ) s (5.16) The optimization is carried out over some scalar function. In this case we use the trace of the estimate error covariance. min L tr(p ) = { ( [I n 2 Ā(L) Ā(L)] 1 C P + AL R ) } v L T A T I n (5.17) s in which I n = {(I n ) s } T (5.18) Minimizing the trace of the estimate error covariance (P ) over the gain L, yields the optimal filter gain. Example 5.4 Insufficient information to compute Q w To illustrate a problem with too many unknowns in the Q w matrix to find uniquely, we begin with the system in Example 3.1 with two states and one output. The covariance estimation is performed using 1, data points. The covariances of actual noise

sequences are

    Q_w = \qquad R_v = [0.8]    (5.109)

Assuming there is no knowledge about the covariances, the initial covariances are set in a way that effectively turns off the estimator.

    Q_w = \qquad R_v =    (5.110)

Based on the initial tuning of the estimator, the probability distribution of the estimate error is shown in Figure 5.7. A smaller ellipse in this distribution represents better estimation of the states, which subsequently leads to better control performance.

[Figure 5.7: Probability distribution of (x_k - \hat{x}_{k|k-1}), Kalman gain unknown]

As

shown previously, the estimates of Q_w and R_v are not unique, but we are still able to find the optimal filter gain. We process the prediction errors three times, updating the filter gain each time with the solution from the covariance estimator. The estimated filter gain converges in three iterations

    L_3 =    (5.111)

in which the optimal gain based on the correct covariances is

    L_{optimal} =    (5.112)

The state estimate error distribution using the estimated Kalman gain is shown below in Figure 5.8. The estimate error has been nearly minimized.

Whitening the Innovations

A necessary condition for an optimal filter [25] is that the innovations (\mathcal{Y}_k) of the process are white. Whiteness of a sequence y_k is evaluated with the autocorrelation function

Definition 5.3

    \rho(k) = \frac{\sum_i (y_i - \bar{y})(y_{i+k} - \bar{y})}{\sum_i (y_i - \bar{y})^2}    (5.113)

For a purely random (white) sequence, the autocorrelation is given by [2]

    \rho(k) = \begin{cases} 1 & k = 0 \\ 0 & k \ne 0 \end{cases}    (5.114)

[Figure 5.8: Probability distribution of (x_k - \hat{x}_{k|k-1}) using Kalman gain from covariance estimator]

and illustrated for a purely random sequence in Figure 5.9.

[Figure 5.9: Autocorrelation function of white noise]

As mentioned, a requirement for the optimal gain (and optimal state estimator) is for the innovations to be white, so we look at the autocorrelation of the innovations

    \rho_{\mathcal{Y}}(k) = \frac{\sum_i (\mathcal{Y}_i - \bar{\mathcal{Y}})^T(\mathcal{Y}_{i+k} - \bar{\mathcal{Y}})}{\sum_i (\mathcal{Y}_i - \bar{\mathcal{Y}})^T(\mathcal{Y}_i - \bar{\mathcal{Y}})}    (5.115)

Example 5.5 Revisiting the example

We revisit Example 5.4 to analyze the autocorrelation of the innovations. The autocorrelation function for the example is shown below in Figure 5.10. The autocorrelation is biased on the first iteration, since the filter is not optimal. However, after the

[Figure 5.10: Evolution of the innovations autocorrelation while updating the Kalman gain]

third iteration, the autocorrelation function closely approximates the autocorrelation function of white noise, when compared to Figure 5.9. Finally, we look at the evolution of the eigenvalues of the closed-loop estimator for this example. The eigenvalues of (A - ALC) converge rapidly to the eigenvalues of the optimal closed-loop estimator. The evolution of the eigenvalues is shown in Figure 5.11.

[Figure 5.11: Evolution of the estimator eigenvalues while updating the Kalman gain]
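The whiteness test of Definition 5.3 is straightforward to apply directly to a sequence of innovations; a sketch with a hypothetical white sequence:

```python
import numpy as np

def autocorr(y, k):
    """Sample autocorrelation rho(k) of a scalar sequence (Definition 5.3)."""
    yb = y - y.mean()
    return float(np.dot(yb[: len(y) - k], yb[k:]) / np.dot(yb, yb))

rng = np.random.default_rng(1)
y = rng.normal(size=20000)                # white sequence: rho(0) = 1, rho(k) ~ 0
rhos = [autocorr(y, k) for k in range(5)]
```

For an optimal filter the innovations should behave like y here: rho(0) = 1 exactly, and the remaining lags fall inside a roughly ±2/sqrt(N_d) confidence band around zero.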

5.9 Design Procedure

We summarize this chapter with a procedure for solving these autocovariance least-squares problems. The innovations procedure (Section 5.4) is used when there are enough sensors to estimate Q_w and R_v uniquely. The noise shaping method (Section 5.7) is used when the covariances cannot be estimated uniquely. Finally, if G\sqrt{Q_w} and R_v cannot be estimated uniquely using the noise shaping method, then the optimal filter gain method (Section 5.8) is used. This procedure grows in computational complexity with each step, as summarized in Figure 5.12.

[Figure 5.12: ALS design procedure]

173 145 Chapter 6 Closing the Loop 1 Things should be made as simple as possible, but not any simpler. Albert Einstein In this chapter, we explore how the ALS methods perform on closed-loop systems. We also explore the disturbance model that is used to ensure offset-free control in model predictive control. 1 Portions of this chapter were published in Odelson and Rawlings [121] and [12]

6.1 Output-based Solution

From Equation 4.1, the outputs can be written as

    y_k = CA^k x_0 + \underbrace{C\sum_{h=0}^{k-1} A^{k-h-1}Bu_h}_{\text{input effects}} + \underbrace{C\sum_{h=0}^{k-1} A^{k-h-1}Gw_h + v_k}_{\text{random effects}}    (6.1)

Since the inputs are known, the input effects term in the previous equation can simply be subtracted from the outputs. The result is the outputs driven only by the random noise sequences, \{w_k\}_{k=0}^{N_d} and \{v_k\}_{k=0}^{N_d}. This method works, however, only if the system is stable.

6.2 Innovations-based Solution

When closing the loop, the innovations method is especially attractive since the control law does not need to be specified. Model predictive controllers use an integrated white noise disturbance to ensure offset-free control. The covariance of the driving term of the integrated white noise sequence must also be specified.
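The input-effect subtraction in Equation 6.1 can be sketched directly: simulate the deterministic input response in parallel and subtract it, which by linearity leaves exactly the noise-driven outputs (the matrices and noise levels below are illustrative, not from the text):

```python
import numpy as np

A = np.array([[0.8, 0.1], [0.0, 0.9]])   # stable, as the text requires
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])

rng = np.random.default_rng(2)
Nd = 200
u = rng.normal(size=(Nd, 1))
w = rng.normal(scale=0.1, size=(Nd, 2))
v = rng.normal(scale=0.05, size=(Nd, 1))

x = np.zeros(2)         # plant state
xd = np.zeros(2)        # deterministic (input-driven) state
xn = np.zeros(2)        # reference state: same noise, zero input
y_rand = np.zeros((Nd, 1))
y_noise_only = np.zeros((Nd, 1))
for k in range(Nd):
    y = C @ x + v[k]
    y_rand[k] = y - C @ xd              # subtract the input effects
    y_noise_only[k] = C @ xn + v[k]     # reference simulation with u = 0
    x = A @ x + B @ u[k] + w[k]
    xd = A @ xd + B @ u[k]
    xn = A @ xn + w[k]
```

The two signals agree exactly, confirming that subtracting the simulated input response recovers the purely noise-driven outputs needed by the output-based ALS method.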

175 Fixed Disturbance Model From Chapter 2, we write a general expression for the disturbance model. x k+1 = Ax k + Bu k + B d d k + Gw k d k+1 = d k + ξ k y k = Cx k + C d d k + v k (6.2a) (6.2b) (6.2c) in which the covariances of the white noise sequences are given as E[w k w T k ] = Q w E[v k v T k ] = R v E[ξ k ξ T k ] = Q ξ (6.3) Generally, we deal with two specific cases of disturbance models. The pure output disturbance model B d = n p C d = I p (6.4) is a common industrial choice. We also deal with the pure input disturbance model B d = B C d = 2 2 (6.5) Other choices for disturbance models are beyond the scope of this dissertation. The interested reader is referred to [123]. We define two variants of the ALS method: the fixed and updated disturbance models. In the fixed disturbance model, it is assumed that a disturbance model is being used in the target tracking problem, and was not designed to reject a specific disturbance. In this case, our definition of the prediction error, Y k = y k C x k k 1, is inadequate. With a disturbance model, the prediction errors may not be stationary or

zero mean. We redefine the prediction error as \mathcal{Y}_k \triangleq y_k - C\hat{x}_{k|k-1} - C_d\hat{d}_{k|k-1}. Since this quantity is zero mean and stationary (with no deterministic disturbances or model mismatch), we can use the ALS methods.

    \begin{bmatrix} \varepsilon_{k+1} \\ \tilde{d}_{k+1|k} \end{bmatrix} = \begin{bmatrix} A - AL_xC & B_d - AL_xC_d \\ -L_dC & I - L_dC_d \end{bmatrix} \begin{bmatrix} \varepsilon_k \\ \tilde{d}_{k|k-1} \end{bmatrix} + \begin{bmatrix} G & -AL_x \\ 0 & -L_d \end{bmatrix} \begin{bmatrix} w_k \\ v_k \end{bmatrix}    (6.6a)

    \mathcal{Y}_k = \begin{bmatrix} C & C_d \end{bmatrix} \begin{bmatrix} \varepsilon_k \\ \tilde{d}_{k|k-1} \end{bmatrix} + v_k    (6.6b)

In this formulation, the effects of the disturbance model are subtracted from the prediction errors, leaving only the effects of the underlying state and sensor noise sequences.

Example 6.1 Shell Control Problem

To demonstrate this method on a more complex system, we use a fractionator model from Shell [124].

    G(s) = \begin{bmatrix} \frac{4.05e^{-27s}}{50s+1} & \frac{1.77e^{-28s}}{60s+1} & \frac{5.88e^{-27s}}{50s+1} \\ \frac{5.39e^{-18s}}{50s+1} & \frac{5.72e^{-14s}}{60s+1} & \frac{6.90e^{-15s}}{40s+1} \\ \frac{4.38e^{-20s}}{33s+1} & \frac{4.42e^{-22s}}{44s+1} & \frac{7.20}{19s+1} \end{bmatrix}    (6.7)

The inputs are constrained as

    -0.5 \le u_k \le 0.5    (6.8)

The discrete time model contains 3 states, 3 inputs, and 3 outputs. A setpoint

change is made

    r_k =    (6.9)

but due to the input constraints, it is not reachable, as illustrated in Figure 6.1.

[Figure 6.1: Shell control problem - constrained outputs]

Figure 6.2 illustrates the prediction error and the three inputs (one saturated at its upper bound).

[Figure 6.2: Shell control problem - prediction error, control actions]

Figure 6.3 shows the results of the covariance estimation, on an element-by-element basis, in which

    R_v = \qquad k < 2    (6.10)

and

    R_v = \qquad k \ge 2    (6.11)

The covariance estimator correctly estimates the elements of the covariance.

6.3 Updated Disturbance Models

The second type of ALS method for disturbance models arises when the disturbance model is used to adapt to specific disturbances in the plant. We can use the following augmented system

    \begin{bmatrix} \varepsilon_{k+1} \\ \tilde{d}_{k+1} \end{bmatrix} = \begin{bmatrix} A - AL_xC & B_d - AL_xC_d \\ -L_dC & I - L_dC_d \end{bmatrix} \begin{bmatrix} \varepsilon_k \\ \tilde{d}_k \end{bmatrix} + \begin{bmatrix} G & 0 & -AL_x \\ 0 & I & -L_d \end{bmatrix} \begin{bmatrix} w_k \\ \xi_k \\ v_k \end{bmatrix}    (6.12a)

    \mathcal{Y}_k = \begin{bmatrix} C & C_d \end{bmatrix} \begin{bmatrix} \varepsilon_k \\ \tilde{d}_k \end{bmatrix} + v_k    (6.12b)

Figure 6.3: Shell control problem - covariance matrix, element-by-element (diagonal elements r_11, r_22, r_33; off-diagonal elements r_21, r_31, r_32)

An additional tuning parameter is now required: the covariance of the integrated white noise disturbance. Unless the integrated white noise covariance is small relative to the length of the data record (i.e., a slow-drift disturbance), it is unrealistic to expect this type of disturbance to actually be present in the plant. As a pure random walk, the covariance of the output disturbance, E[d_k d_k^T] = k Q_ξ, grows without bound, as shown in Figure 6.4.

Figure 6.4: Examples of integrated white noise disturbances

To reject the disturbance, the control action is unbounded in the unconstrained case, and the outputs are unbounded in the constrained case. In most cases, modeling the disturbance as integrated white noise introduces model mismatch into the system, even if the system matrices (A, B, C, D) are known perfectly.
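The unbounded growth E[d_k d_k^T] = k Q_ξ follows directly from the exact covariance recursion of the random walk, which can be checked in a few lines (the value of Q_ξ below is illustrative):

```python
import numpy as np

# Sketch: for the pure random walk d_{k+1} = d_k + xi_k with d_0 = 0,
# the exact covariance recursion P_{k+1} = P_k + Q_xi gives
# E[d_k d_k'] = k * Q_xi, which grows without bound -- the behavior
# pictured in Figure 6.4. Q_xi below is an illustrative value.
Q_xi = np.array([[0.01]])
P = np.zeros((1, 1))
variance = []
for k in range(1, 101):
    P = P + Q_xi                 # no approximation: exact propagation
    variance.append(P[0, 0])
```

After k steps the variance is exactly k·Q_ξ, so the only way the model is consistent with a finite data record is for Q_ξ to be small relative to the record length, as argued above.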

However, no other disturbance model retains the offset-free properties desired for MPC applications. With no a priori knowledge of the disturbance statistics or of the introduced mismatch, initial tuning is even more difficult in practical applications. We can use the proposed methods to migrate from some initial tuning to a tuning specified by the data. For example, an initial tuning could be the QDMC-like choice [91], which has the form

[L_x; L_d] = [0; I]    (6.13)

6.4 Time-delay Systems

One class of systems not previously considered is time-delay systems, described by the state-space model

x_{k+1} = A x_k + B u_{k-d} + G w_k    (6.14a)
y_k = C x_k + v_k    (6.14b)

Given a state-space model with d time delays, the model can be written in augmented form:

[x_{k+1}; u_{k-d+1}; ...; u_{k-1}; u_k] =
[A, B, 0, ..., 0;
 0, 0, I, ..., 0;
 ...        ...
 0, 0, ..., 0, I;
 0, 0, ..., 0, 0] [x_k; u_{k-d}; ...; u_{k-2}; u_{k-1}] + [0; ...; 0; I] u_k + [G; 0; ...; 0] w_k    (6.15a)

y_k = [C  0  ...  0] [x_k; u_{k-d}; ...; u_{k-1}] + v_k    (6.15b)

As mentioned previously in Chapters 4 and 5, the ALS methods require that the A matrix have full rank. Time-delay systems clearly violate this rank condition because of the final row of zeros in the augmented A matrix. Accounting for the row of zeros requires an additional step in the ALS methods. The evolution equation for the last block row of the augmented system simply stores the current input u_k; it contains no uncertainty and thus should not be updated by the Kalman filter. We have shown that if A is singular, the condition number of the least-squares problem is infinite. Instead of estimating all of the elements of P C^T, therefore, we estimate only those corresponding to the actual states of the system.
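The delay augmentation of (6.15), and the rank deficiency it introduces, can be sketched for a scalar system. The numbers a, b and the delay d below are illustrative assumptions:

```python
import numpy as np

# Sketch of the delay augmentation (6.15) for a scalar system
# x_{k+1} = a x_k + b u_{k-d}, with illustrative a, b and d = 3.
a, b, d = 0.9, 1.0, 3
n = 1 + d                                    # state plus d stored inputs
Aaug = np.zeros((n, n))
Aaug[0, 0] = a
Aaug[0, 1] = b                               # oldest stored input u_{k-d}
for i in range(1, d):
    Aaug[i, i + 1] = 1.0                     # shift register for the delays
Baug = np.zeros((n, 1)); Baug[-1, 0] = 1.0   # newest input enters last slot
Caug = np.zeros((1, n)); Caug[0, 0] = 1.0

# A unit step applied from k = 0 first reaches the output at k = d + 1
x = np.zeros((n, 1))
outputs = []
for k in range(8):
    outputs.append((Caug @ x).item())
    x = Aaug @ x + Baug * 1.0

# The final row of Aaug is identically zero, so Aaug is singular --
# the rank condition the ALS method must work around.
rank = np.linalg.matrix_rank(Aaug)
```

The simulated output stays at zero for the delay interval and the augmented A matrix has rank n - 1, which is exactly why the elements of P C^T attached to the stored-input states are excluded from the estimation.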

P C^T parameterizations considered: [pc_1, pc_2, pc_3, pc_4, pc_5, pc_6], [pc_1, ..., pc_5, 0], [pc_1, ..., pc_4, 0, 0], [pc_1, pc_2, pc_3, 0, 0, 0], each with its condition number γ.

Table 6.1: Condition numbers of the time-delay example

6.5 Illustrative Examples

In this section we provide a number of closed-loop simulation results that demonstrate the effectiveness of the ALS methods and highlight some of the main points of this dissertation.

Example 6.2 Time-delay system

Example 3.1 is modified to include a time delay of four time steps. Table 6.1 summarizes the condition numbers of the least-squares problem as a function of the number of elements of P C^T being estimated. If the time delay is known correctly, then estimating only the elements of P C^T corresponding to the actual states is appropriate, and the condition number γ is 27. If extra elements are estimated (γ = 524), they are estimated to be zero (or nearly so). If the time delay is not known correctly (eight time steps, for example), then estimating the other elements of P C^T is appropriate. The input/output results of updating the estimator to account for the incorrect time

delay are shown in Figure 6.5. The ALS estimator is able to partially account for the incorrect time delay. Based on the condition numbers of the least-squares problem, more data are required to obtain reliable covariance estimates when more parameters are being estimated.

Example 6.3 Nonisothermal CSTR

Consider a classic CSTR example [56] in which the irreversible reaction A → B takes place. The system is governed by the following material and energy balances:

dc_A/dt = (q/V)(c_{A,f} - c_A) - k_0 exp(-E/RT) c_A    (6.16a)
dT/dt = (q/V)(T_f - T) + ((-ΔH)/(ρ C_p)) k_0 exp(-E/RT) c_A + (UA/(V ρ C_p))(T_c - T)    (6.16b)

in which the parameters are given in Table 6.2.

Table 6.2: Table of parameters for the nonisothermal CSTR example

Parameter   Value        Units
q           100          L/min
c_{A,f}     1            mol/L
T_f         350          K
V           100          L
ρ           1000         g/L
C_p         0.239        J/g K
ΔH          -5 × 10^4    J/mol
E/R         8750         K
k_0         7.2 × 10^10  min^-1
UA          5 × 10^4     J/min K
T_c         300          K
c_A         0.5          mol/L
T           350          K
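The balances (6.16) can be evaluated directly. The parameter values below are the Table 6.2 numbers as reconstructed here (carried over from the standard example [56]); treat the exact figures as assumptions. A useful consistency check is that the tabulated operating point nearly zeroes both balances:

```python
import math

# Sketch: right-hand side of the balances (6.16) using the Table 6.2
# values as reconstructed here (the exact numbers are assumptions).
q, cAf, Tf, V = 100.0, 1.0, 350.0, 100.0     # L/min, mol/L, K, L
rho, Cp = 1000.0, 0.239                      # g/L, J/(g K)
dH, EoverR, k0, UA = -5.0e4, 8750.0, 7.2e10, 5.0e4
Tc = 300.0                                   # K

def rhs(cA, T, Tc=Tc):
    rate = k0 * math.exp(-EoverR / T) * cA
    dcA = q / V * (cAf - cA) - rate
    dT = (q / V * (Tf - T)
          + (-dH) / (rho * Cp) * rate        # exothermic heat generation
          + UA / (V * rho * Cp) * (Tc - T))  # jacket cooling
    return dcA, dT

# Consistency check: the tabulated operating point (cA = 0.5, T = 350)
# should nearly zero both balances.
dcA0, dT0 = rhs(0.5, 350.0)
```

Both derivatives come out near zero at (c_A, T) = (0.5, 350), confirming that the tabulated values form a (near) steady state of the model.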

Figure 6.5: Time-delay example - outputs (ALS, nominal, setpoint) and inputs (ALS, nominal)

We control the composition and temperature in the reactor using an unconstrained MPC controller. The inlet flow rate (q) and the cooling jacket temperature (T_c) are the manipulated variables. We use the linearized version of the system, with the linearization performed around the point

c_ss = 0.5 mol/L,   q_ss = 100 L/min,   T_ss = 350 K,   (T_c)_ss = 300 K

The regulator objective function is defined on the infinite horizon, penalizing deviations of the outputs from the setpoint (r_k) and of the inputs from their steady-state targets:

Φ(k) = Σ_{j=0}^{∞} (y_{k+j} - r_k)^T Q (y_{k+j} - r_k) + (u_{k+j} - u_{s,k})^T R (u_{k+j} - u_{s,k})    (6.17)

We demonstrate the effectiveness of our methods using no a priori information. Our nominal tuning of the filter is QDMC-like (Q_ξ ≫ Q_w, R_v). This tuning represents an upper bound on the performance of the estimator, as it is a robust (albeit inefficient) choice for initial tuning with no a priori knowledge. A slow-drift output disturbance is injected into the plant, and the estimate error distributions are compared using the elliptical representation of the probability distributions shown in Chapter 2, in which

[λ_1  λ_2] P^{-1} [λ_1; λ_2] = c    (6.18)

and c is a constant. Figure 6.6 shows the estimate of the probability distribution for the state estimate error, with

P = E[(x_k - x̂_{k|k-1})(x_k - x̂_{k|k-1})^T]    (6.19)
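The elliptical representation of (6.18) follows from the eigen-decomposition of P: the level set is an ellipse with semi-axes sqrt(c · eig(P)) along the eigenvectors of P. A minimal sketch with an assumed illustrative covariance:

```python
import numpy as np

# Sketch: the level set {lambda : lambda' P^{-1} lambda = c} of (6.18)
# is an ellipse with semi-axes sqrt(c * eig(P)) along the eigenvectors
# of P. The covariance P below is illustrative.
P = np.array([[4.0, 0.0], [0.0, 1.0]])
c = 1.0
evals, evecs = np.linalg.eigh(P)
semi_axes = np.sqrt(c * evals)

# Mapping the unit circle through the eigen-decomposition lands exactly
# on the level set, for any angle theta.
theta = 0.7
unit = np.array([np.cos(theta), np.sin(theta)])
lam = evecs @ (semi_axes * unit)
level = float(lam @ np.linalg.inv(P) @ lam)
```

For this diagonal P the semi-axes are 1 and 2, and any point generated this way satisfies the level-set equation exactly, which is how the cross-section plots in Figures 6.6 and 6.7 are drawn.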

Figure 6.6: CSTR example - estimation of state noise probability distribution

Figure 6.7 illustrates the estimation of the slow-drift disturbance, shown as a cross-section of the disturbance estimate probability distribution, with

P = E[(d_k - d̂_{k|k-1})(d_k - d̂_{k|k-1})^T]    (6.20)

Figure 6.7: CSTR example - estimation of slow-drift disturbance probability distribution

Next, the simulation is repeated to compare the regulator performance using the nominal estimator, the ALS estimator, and the optimal estimator. Again the slow drift is present, and the regulator tries to remain at the setpoint

r_k = [0.5; 350]    (6.21)

The concentration outputs are compared in Figure 6.8, and the tracking performance is similar.

Figure 6.8: CSTR example - comparison of composition outputs

The cooling jacket temperature input is shown in Figure 6.9. The input action for the adaptive and optimal cases is similar, while the QDMC-like tuning uses inputs that are unnecessarily aggressive.

Figure 6.9: CSTR example - comparison of cooling jacket temperature inputs

The temperature outputs are compared in Figure 6.10 and the flow rate inputs are compared in Figure 6.11. Due to feedback control, the regulator compensates for a poorly tuned estimator. However, additional performance (i.e., reduced regulator objective function cost) can be realized by filtering disturbances in the estimator, instead of relying on the regulator to reject them. Figure 6.12 compares the average objective function cost (per unit time), defined as

Φ̄_k = (1/k) Σ_{j=0}^{k} (y_{k+j} - r_k)^T Q (y_{k+j} - r_k) + (u_{k+j} - u_{s,k})^T R (u_{k+j} - u_{s,k})    (6.22)
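The running-average cost of (6.22) is straightforward to compute; the scalar weights and signal values below are illustrative assumptions. A falling running average is the signature of the better-tuned estimator in Figure 6.12:

```python
# Sketch: running average of the stage cost in (6.22) for scalar
# signals; the weights and signal values are illustrative.
Q, R = 1.0, 0.1
y = [0.4, 0.2, 0.1, 0.05]       # outputs approaching the setpoint r = 0
u = [1.0, 0.5, 0.25, 0.1]       # inputs approaching the target u_s = 0
r, us = 0.0, 0.0

stage = [Q * (yk - r) ** 2 + R * (uk - us) ** 2 for yk, uk in zip(y, u)]
avg_cost = [sum(stage[: k + 1]) / (k + 1) for k in range(len(stage))]
```

Because both signals approach their targets, the running average decreases monotonically here; a persistently high average indicates cost that better estimator tuning could recover.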

Figure 6.10: CSTR example - comparison of temperature outputs

Figure 6.11: CSTR example - comparison of flow rate inputs

A reduction in the average objective function cost translates directly into better tracking performance and/or less control action.

Figure 6.12: CSTR example - comparison of regulator costs

Example 6.4 Distillation Example - 15% mismatch

The motivating example in Chapter 2 corresponds to the model for a high-purity distillation column, shown in Figure 6.13. A linear model has been proposed [19, 147] that captures the important elements of the column:

G(s) = (1/(75s+1)) [0.878, -0.864; 1.082, -1.096]    (6.23)

Figure 6.13: High purity distillation column

This particular model is ill-conditioned, however. The model has a strong directionality, and the condition number of the plant is large, γ = 142 in this case. We sample at one minute and add mismatch in the unfavorable direction [173],

B̃ = [1+δ, 1; 1, 1-δ] ∘ B    (6.24)

in which ∘ denotes elementwise multiplication. If δ ≥ 0.168, the determinants of the gain matrices of G(s) and G_plant(s) have opposite signs, and we expect any controller with an integrating disturbance to be unstable [148]. The model mismatch becomes the disturbance that the estimator rejects. The regulator can compensate for the model mismatch, but at a greater objective function cost than is realized by using the estimator tuning based on the data.

The uncertainty parameter in the model is δ = 0.15, with a slow-drift disturbance present in the plant. As mentioned in Chapter 2, choosing the noise covariances to be the covariances of the disturbances actually entering the plant,

Q̂_w = Q_w,   R̂_v = R_v,   Q̂_ξ = Q_ξ    (6.25)

causes the closed-loop system to be unstable. As suggested in Anderson and Moore [5], because the plant is stable, Q_w (or Q_ξ) can be increased until the system stabilizes. As a comparison, the control performance of the adaptive filter is compared to the covariance-matching solution: given a tuning that stabilizes the system, the covariances of the disturbances are computed from the residuals of the Kalman filter, as discussed in Chapter 3. It was shown there that the residuals approach gives biased estimates of the disturbance covariances. Nevertheless, this type of tuning might be perfectly acceptable for an industrial process because of its ease of implementation. The regulator is an unconstrained infinite-horizon LQR,

Φ(x_k, r_k, u_k) = Σ_{j=k}^{∞} ||y_j - r_j||²_Q + ||u_j - u_{s,k}||²_R + ||u_j - u_{j-1}||²_S    (6.26)

in which

Q = 5 I_2,   R = 0_{2×2},   S = I_2    (6.27)

and a pure output disturbance model is selected,

B_d = 0_{2×2},   C_d = I_2    (6.28)

A setpoint change is made from the origin,

r_k = [0.781; 0.625]    (6.29)

and the outputs of the column are shown in Figure 6.14. The covariance estimation techniques perform well in the presence of plant/model mismatch, with a settling time similar to that of the optimal estimator and much smaller than that of the nominal case.

Figure 6.14: Column example - target tracking comparison

The inputs to the column are shown in Figure 6.15. The payoff of using the adaptive methods is reflected in the inputs of the system as well.

Figure 6.15: Column example - input comparison

Figure 6.16 compares the average objective function cost of the three cases. The average control cost using the ALS estimator is 3-4 times better than with the nominal tuning.

Figure 6.16: Column example - comparison of regulator costs

To demonstrate that the adaptive techniques do not tune the estimator so aggressively for servo control that regulatory control is sacrificed, an unmodeled impulse disturbance of magnitude 0.5 is injected into the output. The results in Figures 6.17 and 6.18 show that the input/output behavior using the ALS estimator is not only stable, but better than the nominal case.

Figure 6.17: Column example - outputs while rejecting a disturbance

Figure 6.18: Column example - inputs while rejecting a disturbance

Next, a comparison is made using the input disturbance model, in which

B_d = I_2,   C_d = 0_{2×2}    (6.30)

The output comparisons are shown in Figure 6.19 and the inputs in Figure 6.20. The input disturbance model controls the process better than the output disturbance model, and updating the estimator with the ALS methods improves on the nominal input disturbance model. The ALS methods applied to the output disturbance model achieve nearly the same performance as the input disturbance model. We conclude that a good choice of disturbance model is critical for the distillation column. However, if a bad choice is made, the ALS estimator can compensate to a degree. If the correct disturbance model is used (the input disturbance model), the ALS methods improve the closed-loop control performance over the nominal choice.

Figure 6.19: Column example - outputs using an input disturbance model

Figure 6.20: Column example - inputs using an input disturbance model

6.6 Discussion of Model Mismatch and Regulator Performance

Our conclusion from these examples is that while diagnosing incorrect disturbance covariances can produce a significant reduction in controller cost, model mismatch appears to be the dominant opportunity for the ALS methods to reduce controller cost. The model mismatch can be in the input-output model, or it can be a poor choice of disturbance model. In Figure 6.21, we repeat the previous example while varying the mismatch parameter δ from zero (no mismatch) to 0.15 (nearly unstable). In these simulations the setpoint is constant, and the controller cost is the cost to stay at the setpoint in the presence of a slow-drift disturbance. As can be seen, the potential for the ALS methods to reduce controller cost grows rapidly with model mismatch. In Figure 6.22 the experiment is repeated with a setpoint change, and the same behavior is observed: the potential benefit of the ALS methods grows with mismatch. The difference in scale between the two plots is large; the potential payoff of using the ALS methods is much greater for servo control than for regulatory control.
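The instability threshold quoted for this example can be checked from the gain matrix. The gain values and the elementwise perturbation form are the reconstructions used here (treat them as assumptions); the determinant of the perturbed gain changes sign near δ = 1/sqrt(λ_11) ≈ 0.168:

```python
import numpy as np

# Sketch: gain matrix of the column model (6.23) under the elementwise
# mismatch (6.24); both are reconstructions, so treat the numbers as
# assumptions. The perturbed determinant changes sign near delta ~ 0.168.
K = np.array([[0.878, -0.864], [1.082, -1.096]])

def det_perturbed(delta):
    Kp = K * np.array([[1.0 + delta, 1.0], [1.0, 1.0 - delta]])
    return float(np.linalg.det(Kp))

lambda11 = K[0, 0] * K[1, 1] / np.linalg.det(K)   # relative gain, ~35
delta_star = float(1.0 / np.sqrt(lambda11))       # sign-flip threshold
flip = (det_perturbed(0.15) < 0.0) and (det_perturbed(0.20) > 0.0)
```

Setting det(K̃) = 0 gives δ² = 1/λ_11, so the large relative gain of the ill-conditioned column is precisely what makes a modest 15% mismatch sit so close to the instability boundary.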

Figure 6.21: Column example - effects of model mismatch on regulatory control cost

Figure 6.22: Column example - effects of model mismatch on servo control cost

Chapter 7

Application to Industrial Data

Nevertheless, determining suitable values for these covariance parameters continues to be an impediment in the industrial implementation of MPC.
- Steve Miller, Eastman Chemical Company

An important goal of this project is to apply the developed methods to an industrial process. Using industrial data presents a number of unique challenges, including unmodeled nonlinearities in the plant, pre-filtering of data in the DCS system, and unmodeled deterministic disturbances.

7.1 Introduction

We have worked with Eastman Chemical Company to apply this method to data from a gas-phase reactor. The process is represented in Figure 7.1. The composition in the reactor is the controlled variable, and the feed fraction to the reactor is the manipulated variable. The temperature drop across the reactor is also measured. While the temperature drop does not have a setpoint, it does have the constraint

ΔT_min < ΔT < ΔT_max    (7.1)

The setpoint for the composition comes from a downstream process and is time-varying. All of the data presented have been scaled to remove any information about the operating conditions of the reactor.

Figure 7.1: Eastman gas phase reactor control

As mentioned, there are two outputs and one input in the process. There are n states in the system, including a large number, n_d, of time-delay states.

7.2 Data Set 1

We were provided with three data sets from Eastman. Each data set spans several days of operating data, and the sets are spaced months apart. The entire first data set is shown in Figure 7.2. The set is divided into six 5,000-point subsets: the first 5,000 points comprise the training set, and the subsequent subsets are used as validation sets.

In the first formulation, a pure output disturbance model is used:

x_{k+1} = A x_k + B u_k    (7.2a)
d_{k+1} = d_k + ξ_k    (7.2b)
y_k = C x_k + d_k    (7.2c)

We further assume that the number of time-delay states is correct. The methods of Chapter 6 are applied to find the optimal gain of the form

[L_x; L_d],   L_x = [L̄_x; 0_{n_d × 2}],   L̄_x ∈ R^{(n-n_d) × 2}    (7.3)

in which the rows of L_x corresponding to the n_d time-delay states are fixed at zero. Recall that the covariance estimation techniques are based on the properties of the prediction errors (innovations) of the process. The prediction errors for both the original and updated tunings are shown in Figure 7.3. The prediction error variances have

Figure 7.2: Normalized process data - set 1 (input, composition with setpoint, temperature)

been reduced by more than an order of magnitude, and the innovations have been significantly whitened as well. Once an updated filter gain has been computed, we use it to process the rest of the data record.

Figure 7.3: Prediction error - training set (0-5,000)

The covariance estimation techniques would not be useful if the training set were over-fit in such a way that the model did not match the validation data. Figures 7.4-7.6 each contain eight subplots that fully characterize the behavior of the updated filter; Table 7.1 outlines the results contained in each of the subplots. The left column represents the original tuning in the process, and the right column represents the results based on the covariance estimator. The bottom row gives the autocorrelation of the data, used to determine the whiteness of the results as described previously in Chapter 5.
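The whiteness diagnostic in the bottom rows of those figures is the sample autocorrelation of the innovations, which can be sketched in a few lines (the test sequence below is a deliberately non-white, illustrative signal):

```python
# Sketch: sample autocorrelation, the whiteness diagnostic shown in the
# bottom rows of Figures 7.3-7.6. White innovations have near-zero
# autocorrelation at all nonzero lags; the alternating test sequence
# below is deliberately non-white.
def autocorr(x, max_lag):
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
        / (n * var)
        for lag in range(max_lag + 1)
    ]

seq = [1.0, -1.0] * 50            # strongly correlated at lag 1
rho = autocorr(seq, 2)
```

A well-tuned filter produces innovations whose autocorrelation stays inside the confidence band at every nonzero lag; the alternating sequence above fails that test badly, which is what the original tuning's correlated prediction errors look like.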

For each validation set, the left column of subplots shows the original estimator and the right column the ALS estimator:

1. Prediction error
2. Frequency distribution of the composition prediction error
3. Frequency distribution of the temperature prediction error
4. Autocorrelation of the prediction errors

Table 7.1: Map of results

Note that in all three cases the updated filter performs well: the variability of the prediction error is decreased by at least an order of magnitude, and the autocorrelation plots show that the prediction errors are white, a necessary condition for an optimal filter.

Figure 7.4: Validation set (5,000-10,000)

Figure 7.5: Validation set (15,000-20,000)

Figure 7.6: Validation set (25,000-30,000)

7.2.1 Consistency

To test the consistency of the ALS technique, we use each of the six data subsets to compute a new filter gain. By overlapping these subsets, we compute eleven different filter gains, as shown in Figure 7.7.

Figure 7.7: Division of data for consistency discussion

The eigenvalues of the estimator,

[A, 0; 0, I] - [A, 0; 0, I] [L_x; L_d] [C  I]    (7.4)

are computed for each new filter and plotted in Figure 7.8. With the exception of subsets four and five, the results are consistent.

Figure 7.8: Eigenvalues of the ALS estimators - replicates

Referring back to Figure 7.7, subsets four and five contain a large temperature excursion. Figure 7.9 shows the prediction error of the temperature. There is a large spike around point 11,400, caused by a large unmodeled nonzero disturbance in the process.

Figure 7.9: Prediction errors containing an unmodeled disturbance

The deterministic disturbance is visible in the plot of the disturbance estimate in Figure 7.10. Once the estimator accounts for the unmodeled disturbance, the prediction errors return to normal. If the outlier is removed from the innovations data, the eigenvalues for subsets four and five in Figure 7.8 rejoin the group.

Figure 7.10: Disturbance estimates for the temperature excursion

7.2.2 Regulator Payoff

We next demonstrate the benefits of adaptive estimation in terms of closed-loop control performance. A challenge of using industrial data is that the true plant is not available for simulation. Intuitively, if the prediction error decreases, then the state estimates are closer to the true states of the system, and regulator performance should improve. The target calculation follows the conventions of Muske and Rawlings [111]: the following optimization is performed to compute the steady-state targets x^s_k, u^s_k,

min_{x^s_k, u^s_k} Φ = (u^s_k - ū)^T R (u^s_k - ū)    (7.5)

subject to

x^s_k = A x^s_k + B u^s_k    (7.6a)
r = C x^s_k    (7.6b)
E u^s_k ≤ e    (7.6c)
F (C x^s_k + C_d d̂_{k|k}) ≤ f    (7.6d)

Using the targets just computed, another optimization is performed to find the optimal inputs to the system:

min_{u_k} Φ = Σ_{k=0}^{∞} (z_k - z̄_k)^T Q (z_k - z̄_k) + (u_k - u^s_k)^T R (u_k - u^s_k)    (7.7)

subject to

E u_k ≤ e    (7.8a)
F y_k ≤ f    (7.8b)

Note that while the temperature does not have a setpoint, it still carries a penalty in the objective function through

z̄_k = C x^s_k + C_d d̂_{k|k}    (7.9)
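When no inequality constraint of (7.6c)-(7.6d) is active, the target problem reduces to a linear solve of the equality constraints. A minimal sketch for an assumed square (one-input, one-output) system, where the solution is unique and the R-weighted objective plays no role:

```python
import numpy as np

# Sketch: with no active inequality constraints, the target problem
# (7.5)-(7.6) reduces to the equality constraints
#     x_s = A x_s + B u_s,   r = C x_s.
# Illustrative square system, so the solution is unique.
A = np.array([[0.5]])
B = np.array([[2.0]])
C = np.array([[1.0]])
r = np.array([3.0])

n, m = B.shape
p = C.shape[0]
M = np.block([[np.eye(n) - A, -B], [C, np.zeros((p, m))]])
sol = np.linalg.solve(M, np.concatenate([np.zeros(n), r]))
xs, us = sol[:n], sol[n:]
```

For non-square systems, or when a constraint is active, the full quadratic program of (7.5)-(7.6) must be solved instead; the linear solve above is the special case that shows what the target calculation is doing.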

In order to accurately quantify the regulator payoff, we first simulate the system with an additional disturbance,

x^m_{k+1} = Â x^m_k + B̂ u^m_k    (7.10a)
y^m_k = Ĉ x^m_k + b_k    (7.10b)

in which

b_k = y^p_k - Ĉ x^m_k    (7.11)

We have used an output disturbance b_k to account for the differences between (A, B, C) and (Â, B̂, Ĉ), as well as for any unmodeled disturbances that cause differences between the plant outputs y^p_k and the predicted outputs y^m_k. Put another way, we use b_k to make the prediction errors of the simulated data look identical to those of the process data under the original estimator.

In Figure 7.11, a snapshot of the composition tracking is shown, which has significantly improved. While the temperature is not being controlled, the simulated temperature results from early in the data set can be seen in Figure 7.12. The effects on the manipulated variable are also shown: in Figure 7.13, the difference between the inputs and the steady-state targets is shown. The results are repeated several days later in the data set to again demonstrate consistency: the composition results are shown in Figure 7.14, the temperature outputs in Figure 7.15, and the manipulated variables in Figure 7.16.

Figure 7.11: Comparison of composition behavior, early set 1

Figure 7.12: Comparison of temperature behavior, early set 1

Figure 7.13: Comparison of normalized input behavior, early set 1

Figure 7.14: Comparison of composition behavior, late set 1

Figure 7.15: Comparison of temperature behavior, late set 1

Figure 7.16: Comparison of normalized input behavior, late set 1

The regulator payoff can also be quantified in terms of objective function cost. We examine the expected value of the average objective function cost over time and expect it to converge to a steady-state value:

E[Φ_k] = (1/k) Σ_{j=0}^{k} (z_j - r_j)^T Q (z_j - r_j) + (u_j - u^s_j)^T R (u_j - u^s_j)    (7.12)

The expected average objective function cost is shown in Figure 7.17. The regulator performance using the updated estimator is nearly three times better than using the original estimator.

Figure 7.17: Comparison of regulator objective function costs

We also define an effectiveness factor η, similar to the one used in Tyler and Morari [155] to compute the estimated regulator benefit of additional sensors,

η = E[Φ_k | original tuning] / E[Φ_k | updated estimator]    (7.13)

Instead of evaluating the benefits of additional sensors, we aim to quantify the advantage of using the updated estimator over the original model. In the first data set, the steady-state effectiveness factor works out to be about 3.

7.3 Input Disturbance Model

All of the results to this point have been based on a model using an integrated white noise disturbance in the output. We wish to evaluate whether additional benefits can be realized by using an input/output disturbance model. As shown in Pannocchia and Rawlings [123], it is insufficient to have m integrating disturbances to remove offset in the outputs; they show that p integrating disturbances are sufficient to guarantee offset-free control. Therefore, we use a mixed input/output disturbance model:

x_{k+1} = A x_k + B u_k + B_d p_k    (7.14a)
p_{k+1} = p_k + ν_k    (7.14b)
d_{k+1} = d_k + ξ_k    (7.14c)
y_k = C x_k + C_d d_k    (7.14d)

We assign the composition to the input disturbance and the temperature to the output

disturbance,

B_d = B,   C_d = [0; 1]    (7.15)

The covariance estimation techniques are applied to find

[L_x; L_p; L_d]    (7.16)

with the rows of L_x corresponding to the n_d time-delay states again fixed at zero. The covariance estimation methods are unable to converge to an optimal solution; after the second iteration, the method drives the estimator into the unstable region. The prediction errors of the first two iterations are shown in Figure 7.18.

Figure 7.18: Before and after prediction errors, input disturbance model

The regulator is simulated with the suboptimally terminated input disturbance model and compared to the regulator performance using the output disturbance model (Figure 7.19). The input disturbance model has an efficiency factor less than one, indicating that it is less suitable than the original tuning provided.

Figure 7.19: Regulator efficiency factors, input versus output models

We conjecture that the poor performance of the input disturbance model is due to the long time delay in the system.

7.3.1 Time-delay Parameter

To this point, we have assumed that the time delay used in constructing the model is correct. We now remove this assumption and adapt the methods to find the optimal filter of the form

[L_x; L_d]    (7.17)

in which the gain rows corresponding to n_ndp of the time-delay states are also estimated, where n_ndp is the number of time-delay parameters to be estimated. When the new filter is computed, the time-delay parameters are estimated to be zero (or nearly so), indicating that the time delay was known correctly and that the output disturbance model without time-delay parameters is sufficient.

7.3.2 Computational Summary

The following computation times are based on a Pentium 4, 2.4 GHz machine. As the filter gain estimate changes, the condition number of the least-squares problem also changes; both the minimum and maximum condition numbers are listed for each

method. Table 7.2 summarizes, for each method (output disturbance; input disturbance, which terminated suboptimally; and output disturbance with time-delay parameters), the number of iterations, the CPU time in seconds, and the minimum and maximum condition numbers min(γ) and max(γ).

Table 7.2: Computational summary - Eastman set 1

7.4 Data Set 2

A second data set was acquired from the plant several weeks after the first. An overview of the data is shown in Figure 7.20. The periodic behavior in the data is due to a downstream controller being turned off; as a result, no setpoint is available for this data set, and we are therefore unable to reconstruct the regulator. We can, however, compare the covariances of the prediction errors. Table 7.3 compares the prediction-error covariances of the original estimator and of the filter computed from the first data set. The prediction error variance is greatly reduced, and the whiteness of the innovations (not shown) is improved as well. While we are unable to compare regulator performance in this situation, we anticipate that the efficiency factor would be significantly greater than one.
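The covariance comparison of Table 7.3 can be sketched with synthetic data. The "ALS-filtered" errors below are stood in for by scaling the originals by 0.1, an assumption chosen only to reproduce the kind of order-of-magnitude variance reduction the table reports:

```python
import numpy as np

# Sketch: comparing prediction-error covariances as in Table 7.3.
# Scaling the innovations by 0.1 shrinks Cov[Y_k] by a factor of 100.
rng = np.random.default_rng(0)
Y_original = rng.standard_normal((5000, 2))   # stand-in original errors
Y_updated = 0.1 * Y_original                  # stand-in ALS-filtered errors

cov_orig = np.cov(Y_original, rowvar=False)
cov_upd = np.cov(Y_updated, rowvar=False)
ratio = float(np.trace(cov_upd) / np.trace(cov_orig))
```

Computing the 2×2 sample covariance of the two innovation channels over each 5,000-point segment is exactly the comparison tabulated for the Eastman data.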

Figure 7.20: Normalized process data - set 2

Table 7.3: Summary of the new filter applied to set 2. For each 5,000-point segment of the data, the table lists Cov[Y_k] under the original estimator and under the ALS estimator.

7.5 Data Set 3

The third data set consists of approximately 30,000 points, but only 6,000 of those points are under active control. The data under control are shown in Figure 7.21. Even though the entire data set is not under control, we can again compare the prediction errors. Table 7.4 shows the covariance of the prediction errors using the original and ALS estimators. The prediction error variance is again greatly reduced by the updated estimator. Recall that the third data set was collected several months after the first; the fact that the updated estimator from the first data set performs well on the third indicates that the unmodeled effects do not vary over time. As a further test of consistency, we compute a new filter gain based on the third data set, simulate the control performance using this new gain, and compare it to the performance using the estimator computed from the first data set. The efficiency factor is about 4.0 in both cases.

We conclude the following from the data provided by Eastman:

1. The closed-loop control of the temperature and composition can be improved by using an updated state estimator.
2. An output disturbance model appears to be a better choice than an input disturbance model.
3. The time delay in the model appears to be appropriate.
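The effectiveness factor of (7.13) used throughout this comparison is just a ratio of average costs; the stage costs below are illustrative, chosen to reproduce the "about 4" factor reported for this data set:

```python
# Sketch: the effectiveness factor (7.13) from average regulator costs;
# the stage costs below are illustrative assumptions.
costs_original = [10.0, 12.0, 14.0, 12.0]
costs_updated = [3.0, 2.0, 4.0, 3.0]

E_orig = sum(costs_original) / len(costs_original)
E_upd = sum(costs_updated) / len(costs_updated)
eta = E_orig / E_upd            # eta > 1 favors the updated estimator
```

In practice the two averages are the steady-state values of the running cost E[Φ_k] under each tuning, so η summarizes the entire closed-loop comparison in a single number.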

Figure 7.21: Normalized process data - set 3

Table 7.4: Summary of the new filter applied to set 3. For each 5,000-point segment of the data, the table lists Cov[Y_k] under the original estimator and under the ALS estimator.

Chapter 8

Application to Laboratory Data¹

A mathematician is a device for turning coffee into theorems.
- Paul Erdős

In addition to applying the ALS methods to industrial data, we applied them to a laboratory apparatus. Using laboratory data allows more experiments to be run and the potential regulator benefits to be quantified accurately.

¹ An expanded version of this chapter is available in [118], and portions were published in Odelson, Lutz, and Rawlings [119].

8.1 Apparatus and Reaction

We construct a laboratory CSTR to convert acetic anhydride and water to acetic acid:

Ac₂O + H₂O → 2 AcOH    (8.1)

A schematic of the CSTR is shown in Figure 8.1. The controlled variable is the concentration of acetic acid.

Figure 8.1: Schematic of reactor

The water flow rate is constant and far greater than the flow rate of acetic anhydride, which is used as the input to the system. We measure the temperature in the reactor as well as the conductance of the solution; the conductance is used to calculate the concentration of acetic acid. The output flow is driven by the pressure difference between the top and bottom of the reactor. We assume that the volume in the reactor is constant. This assumption is valid because the constant flow rate of water into the reactor is much greater than the varying flow rate of acetic

anhydride, so its effect on the volume is neglected. A picture of the laboratory apparatus is shown in Figure 8.2.

Figure 8.2: Laboratory setup

8.2 Model

For model predictive control, a dynamic model of the plant is required; we develop this model from first principles. The reaction consists of several steps with many intermediates [29]. However, our data show that the complicated reaction expressions can be well approximated by a single reaction equation as long as the concentration of water is in excess. Beginning with the material balances for an


OPTIMAL CONTROL AND ESTIMATION OPTIMAL CONTROL AND ESTIMATION Robert F. Stengel Department of Mechanical and Aerospace Engineering Princeton University, Princeton, New Jersey DOVER PUBLICATIONS, INC. New York CONTENTS 1. INTRODUCTION

More information

Quis custodiet ipsos custodes?

Quis custodiet ipsos custodes? Quis custodiet ipsos custodes? James B. Rawlings, Megan Zagrobelny, Luo Ji Dept. of Chemical and Biological Engineering, Univ. of Wisconsin-Madison, WI, USA IFAC Conference on Nonlinear Model Predictive

More information

A FAST, EASILY TUNED, SISO, MODEL PREDICTIVE CONTROLLER. Gabriele Pannocchia,1 Nabil Laachi James B. Rawlings

A FAST, EASILY TUNED, SISO, MODEL PREDICTIVE CONTROLLER. Gabriele Pannocchia,1 Nabil Laachi James B. Rawlings A FAST, EASILY TUNED, SISO, MODEL PREDICTIVE CONTROLLER Gabriele Pannocchia, Nabil Laachi James B. Rawlings Department of Chemical Engineering Univ. of Pisa Via Diotisalvi 2, 5626 Pisa (Italy) Department

More information

Independent Component Analysis. Contents

Independent Component Analysis. Contents Contents Preface xvii 1 Introduction 1 1.1 Linear representation of multivariate data 1 1.1.1 The general statistical setting 1 1.1.2 Dimension reduction methods 2 1.1.3 Independence as a guiding principle

More information

Lessons in Estimation Theory for Signal Processing, Communications, and Control

Lessons in Estimation Theory for Signal Processing, Communications, and Control Lessons in Estimation Theory for Signal Processing, Communications, and Control Jerry M. Mendel Department of Electrical Engineering University of Southern California Los Angeles, California PRENTICE HALL

More information

Kalman-Filter-Based Time-Varying Parameter Estimation via Retrospective Optimization of the Process Noise Covariance

Kalman-Filter-Based Time-Varying Parameter Estimation via Retrospective Optimization of the Process Noise Covariance 2016 American Control Conference (ACC) Boston Marriott Copley Place July 6-8, 2016. Boston, MA, USA Kalman-Filter-Based Time-Varying Parameter Estimation via Retrospective Optimization of the Process Noise

More information

Stochastic Models, Estimation and Control Peter S. Maybeck Volumes 1, 2 & 3 Tables of Contents

Stochastic Models, Estimation and Control Peter S. Maybeck Volumes 1, 2 & 3 Tables of Contents Navtech Part #s Volume 1 #1277 Volume 2 #1278 Volume 3 #1279 3 Volume Set #1280 Stochastic Models, Estimation and Control Peter S. Maybeck Volumes 1, 2 & 3 Tables of Contents Volume 1 Preface Contents

More information

New Introduction to Multiple Time Series Analysis

New Introduction to Multiple Time Series Analysis Helmut Lütkepohl New Introduction to Multiple Time Series Analysis With 49 Figures and 36 Tables Springer Contents 1 Introduction 1 1.1 Objectives of Analyzing Multiple Time Series 1 1.2 Some Basics 2

More information

Linear-Quadratic Optimal Control: Full-State Feedback

Linear-Quadratic Optimal Control: Full-State Feedback Chapter 4 Linear-Quadratic Optimal Control: Full-State Feedback 1 Linear quadratic optimization is a basic method for designing controllers for linear (and often nonlinear) dynamical systems and is actually

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: ony Jebara Kalman Filtering Linear Dynamical Systems and Kalman Filtering Structure from Motion Linear Dynamical Systems Audio: x=pitch y=acoustic waveform Vision: x=object

More information

State Estimation of Linear and Nonlinear Dynamic Systems

State Estimation of Linear and Nonlinear Dynamic Systems State Estimation of Linear and Nonlinear Dynamic Systems Part I: Linear Systems with Gaussian Noise James B. Rawlings and Fernando V. Lima Department of Chemical and Biological Engineering University of

More information

Theory in Model Predictive Control :" Constraint Satisfaction and Stability!

Theory in Model Predictive Control : Constraint Satisfaction and Stability! Theory in Model Predictive Control :" Constraint Satisfaction and Stability Colin Jones, Melanie Zeilinger Automatic Control Laboratory, EPFL Example: Cessna Citation Aircraft Linearized continuous-time

More information

Kalman Filter. Predict: Update: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q

Kalman Filter. Predict: Update: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q Kalman Filter Kalman Filter Predict: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q Update: K = P k k 1 Hk T (H k P k k 1 Hk T + R) 1 x k k = x k k 1 + K(z k H k x k k 1 ) P k k =(I

More information

State Estimation of Linear and Nonlinear Dynamic Systems

State Estimation of Linear and Nonlinear Dynamic Systems State Estimation of Linear and Nonlinear Dynamic Systems Part II: Observability and Stability James B. Rawlings and Fernando V. Lima Department of Chemical and Biological Engineering University of Wisconsin

More information

RECURSIVE ESTIMATION AND KALMAN FILTERING

RECURSIVE ESTIMATION AND KALMAN FILTERING Chapter 3 RECURSIVE ESTIMATION AND KALMAN FILTERING 3. The Discrete Time Kalman Filter Consider the following estimation problem. Given the stochastic system with x k+ = Ax k + Gw k (3.) y k = Cx k + Hv

More information

Riccati difference equations to non linear extended Kalman filter constraints

Riccati difference equations to non linear extended Kalman filter constraints International Journal of Scientific & Engineering Research Volume 3, Issue 12, December-2012 1 Riccati difference equations to non linear extended Kalman filter constraints Abstract Elizabeth.S 1 & Jothilakshmi.R

More information

In search of the unreachable setpoint

In search of the unreachable setpoint In search of the unreachable setpoint Adventures with Prof. Sten Bay Jørgensen James B. Rawlings Department of Chemical and Biological Engineering June 19, 2009 Seminar Honoring Prof. Sten Bay Jørgensen

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Real-Time Software Transactional Memory: Contention Managers, Time Bounds, and Implementations

Real-Time Software Transactional Memory: Contention Managers, Time Bounds, and Implementations Real-Time Software Transactional Memory: Contention Managers, Time Bounds, and Implementations Mohammed El-Shambakey Dissertation Submitted to the Faculty of the Virginia Polytechnic Institute and State

More information

Cramér-Rao Bounds for Estimation of Linear System Noise Covariances

Cramér-Rao Bounds for Estimation of Linear System Noise Covariances Journal of Mechanical Engineering and Automation (): 6- DOI: 593/jjmea Cramér-Rao Bounds for Estimation of Linear System oise Covariances Peter Matiso * Vladimír Havlena Czech echnical University in Prague

More information

Nonlinear Stochastic Modeling and State Estimation of Weakly Observable Systems: Application to Industrial Polymerization Processes

Nonlinear Stochastic Modeling and State Estimation of Weakly Observable Systems: Application to Industrial Polymerization Processes Nonlinear Stochastic Modeling and State Estimation of Weakly Observable Systems: Application to Industrial Polymerization Processes Fernando V. Lima, James B. Rawlings and Tyler A. Soderstrom Department

More information

A Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley

A Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley A Tour of Reinforcement Learning The View from Continuous Control Benjamin Recht University of California, Berkeley trustable, scalable, predictable Control Theory! Reinforcement Learning is the study

More information

Dimension Reduction. David M. Blei. April 23, 2012

Dimension Reduction. David M. Blei. April 23, 2012 Dimension Reduction David M. Blei April 23, 2012 1 Basic idea Goal: Compute a reduced representation of data from p -dimensional to q-dimensional, where q < p. x 1,...,x p z 1,...,z q (1) We want to do

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

LQR, Kalman Filter, and LQG. Postgraduate Course, M.Sc. Electrical Engineering Department College of Engineering University of Salahaddin

LQR, Kalman Filter, and LQG. Postgraduate Course, M.Sc. Electrical Engineering Department College of Engineering University of Salahaddin LQR, Kalman Filter, and LQG Postgraduate Course, M.Sc. Electrical Engineering Department College of Engineering University of Salahaddin May 2015 Linear Quadratic Regulator (LQR) Consider a linear system

More information

Optimal Polynomial Control for Discrete-Time Systems

Optimal Polynomial Control for Discrete-Time Systems 1 Optimal Polynomial Control for Discrete-Time Systems Prof Guy Beale Electrical and Computer Engineering Department George Mason University Fairfax, Virginia Correspondence concerning this paper should

More information

c 2011 JOSHUA DAVID JOHNSTON ALL RIGHTS RESERVED

c 2011 JOSHUA DAVID JOHNSTON ALL RIGHTS RESERVED c 211 JOSHUA DAVID JOHNSTON ALL RIGHTS RESERVED ANALYTICALLY AND NUMERICALLY MODELING RESERVOIR-EXTENDED POROUS SLIDER AND JOURNAL BEARINGS INCORPORATING CAVITATION EFFECTS A Dissertation Presented to

More information

Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

More information

Subject: Optimal Control Assignment-1 (Related to Lecture notes 1-10)

Subject: Optimal Control Assignment-1 (Related to Lecture notes 1-10) Subject: Optimal Control Assignment- (Related to Lecture notes -). Design a oil mug, shown in fig., to hold as much oil possible. The height and radius of the mug should not be more than 6cm. The mug must

More information

Optimal control and estimation

Optimal control and estimation Automatic Control 2 Optimal control and estimation Prof. Alberto Bemporad University of Trento Academic year 2010-2011 Prof. Alberto Bemporad (University of Trento) Automatic Control 2 Academic year 2010-2011

More information

9 Multi-Model State Estimation

9 Multi-Model State Estimation Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 9 Multi-Model State

More information

OPTIMAL ESTIMATION of DYNAMIC SYSTEMS

OPTIMAL ESTIMATION of DYNAMIC SYSTEMS CHAPMAN & HALL/CRC APPLIED MATHEMATICS -. AND NONLINEAR SCIENCE SERIES OPTIMAL ESTIMATION of DYNAMIC SYSTEMS John L Crassidis and John L. Junkins CHAPMAN & HALL/CRC A CRC Press Company Boca Raton London

More information

Partially Observable Markov Decision Processes (POMDPs)

Partially Observable Markov Decision Processes (POMDPs) Partially Observable Markov Decision Processes (POMDPs) Sachin Patil Guest Lecture: CS287 Advanced Robotics Slides adapted from Pieter Abbeel, Alex Lee Outline Introduction to POMDPs Locally Optimal Solutions

More information

2D Image Processing. Bayes filter implementation: Kalman filter

2D Image Processing. Bayes filter implementation: Kalman filter 2D Image Processing Bayes filter implementation: Kalman filter Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de

More information

2D Image Processing. Bayes filter implementation: Kalman filter

2D Image Processing. Bayes filter implementation: Kalman filter 2D Image Processing Bayes filter implementation: Kalman filter Prof. Didier Stricker Dr. Gabriele Bleser Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche

More information

Process Modelling, Identification, and Control

Process Modelling, Identification, and Control Jan Mikles Miroslav Fikar 2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. Process Modelling, Identification, and

More information

ASIGNIFICANT research effort has been devoted to the. Optimal State Estimation for Stochastic Systems: An Information Theoretic Approach

ASIGNIFICANT research effort has been devoted to the. Optimal State Estimation for Stochastic Systems: An Information Theoretic Approach IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 42, NO 6, JUNE 1997 771 Optimal State Estimation for Stochastic Systems: An Information Theoretic Approach Xiangbo Feng, Kenneth A Loparo, Senior Member, IEEE,

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Robustness of MPC and Disturbance Models for Multivariable Ill-conditioned Processes

Robustness of MPC and Disturbance Models for Multivariable Ill-conditioned Processes 2 TWMCC Texas-Wisconsin Modeling and Control Consortium 1 Technical report number 21-2 Robustness of MPC and Disturbance Models for Multivariable Ill-conditioned Processes Gabriele Pannocchia and James

More information

State Estimation using Moving Horizon Estimation and Particle Filtering

State Estimation using Moving Horizon Estimation and Particle Filtering State Estimation using Moving Horizon Estimation and Particle Filtering James B. Rawlings Department of Chemical and Biological Engineering UW Math Probability Seminar Spring 2009 Rawlings MHE & PF 1 /

More information

4 Derivations of the Discrete-Time Kalman Filter

4 Derivations of the Discrete-Time Kalman Filter Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof N Shimkin 4 Derivations of the Discrete-Time

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Outline. 1 Linear Quadratic Problem. 2 Constraints. 3 Dynamic Programming Solution. 4 The Infinite Horizon LQ Problem.

Outline. 1 Linear Quadratic Problem. 2 Constraints. 3 Dynamic Programming Solution. 4 The Infinite Horizon LQ Problem. Model Predictive Control Short Course Regulation James B. Rawlings Michael J. Risbeck Nishith R. Patel Department of Chemical and Biological Engineering Copyright c 217 by James B. Rawlings Outline 1 Linear

More information

The Uncertainty Threshold Principle: Some Fundamental Limitations of Optimal Decision Making under Dynamic Uncertainty

The Uncertainty Threshold Principle: Some Fundamental Limitations of Optimal Decision Making under Dynamic Uncertainty The Uncertainty Threshold Principle: Some Fundamental Limitations of Optimal Decision Making under Dynamic Uncertainty Michael Athans, Richard Ku, Stanley Gershwin (Nicholas Ballard 538477) Introduction

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

ARIMA Modelling and Forecasting

ARIMA Modelling and Forecasting ARIMA Modelling and Forecasting Economic time series often appear nonstationary, because of trends, seasonal patterns, cycles, etc. However, the differences may appear stationary. Δx t x t x t 1 (first

More information

UNIVERSITY OF CALIFORNIA. Los Angeles. Economic Model Predictive Control Using Data-Based Empirical Models

UNIVERSITY OF CALIFORNIA. Los Angeles. Economic Model Predictive Control Using Data-Based Empirical Models UNIVERSITY OF CALIFORNIA Los Angeles Economic Model Predictive Control Using Data-Based Empirical Models A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy

More information

A Study of Covariances within Basic and Extended Kalman Filters

A Study of Covariances within Basic and Extended Kalman Filters A Study of Covariances within Basic and Extended Kalman Filters David Wheeler Kyle Ingersoll December 2, 2013 Abstract This paper explores the role of covariance in the context of Kalman filters. The underlying

More information

Coordinating multiple optimization-based controllers: new opportunities and challenges

Coordinating multiple optimization-based controllers: new opportunities and challenges Coordinating multiple optimization-based controllers: new opportunities and challenges James B. Rawlings and Brett T. Stewart Department of Chemical and Biological Engineering University of Wisconsin Madison

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information

Experimental designs for multiple responses with different models

Experimental designs for multiple responses with different models Graduate Theses and Dissertations Graduate College 2015 Experimental designs for multiple responses with different models Wilmina Mary Marget Iowa State University Follow this and additional works at:

More information

Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics )

Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics ) Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics ) James Martens University of Toronto June 24, 2010 Computer Science UNIVERSITY OF TORONTO James Martens (U of T) Learning

More information

OPTIMAL SPACECRAF1 ROTATIONAL MANEUVERS

OPTIMAL SPACECRAF1 ROTATIONAL MANEUVERS STUDIES IN ASTRONAUTICS 3 OPTIMAL SPACECRAF1 ROTATIONAL MANEUVERS JOHNL.JUNKINS Texas A&M University, College Station, Texas, U.S.A. and JAMES D.TURNER Cambridge Research, Division of PRA, Inc., Cambridge,

More information

UNIVERSITY OF CALIFORNIA. Los Angeles. Distributed Model Predictive Control of Nonlinear. and Two-Time-Scale Process Networks

UNIVERSITY OF CALIFORNIA. Los Angeles. Distributed Model Predictive Control of Nonlinear. and Two-Time-Scale Process Networks UNIVERSITY OF CALIFORNIA Los Angeles Distributed Model Predictive Control of Nonlinear and Two-Time-Scale Process Networks A dissertation submitted in partial satisfaction of the requirements for the degree

More information

Constrained State Estimation Using the Unscented Kalman Filter

Constrained State Estimation Using the Unscented Kalman Filter 16th Mediterranean Conference on Control and Automation Congress Centre, Ajaccio, France June 25-27, 28 Constrained State Estimation Using the Unscented Kalman Filter Rambabu Kandepu, Lars Imsland and

More information

Model predictive control of industrial processes. Vitali Vansovitš

Model predictive control of industrial processes. Vitali Vansovitš Model predictive control of industrial processes Vitali Vansovitš Contents Industrial process (Iru Power Plant) Neural networ identification Process identification linear model Model predictive controller

More information

Model Predictive Controller of Boost Converter with RLE Load

Model Predictive Controller of Boost Converter with RLE Load Model Predictive Controller of Boost Converter with RLE Load N. Murali K.V.Shriram S.Muthukumar Nizwa College of Vellore Institute of Nizwa College of Technology Technology University Technology Ministry

More information

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori

More information

Organization. I MCMC discussion. I project talks. I Lecture.

Organization. I MCMC discussion. I project talks. I Lecture. Organization I MCMC discussion I project talks. I Lecture. Content I Uncertainty Propagation Overview I Forward-Backward with an Ensemble I Model Reduction (Intro) Uncertainty Propagation in Causal Systems

More information

Stochastic Analogues to Deterministic Optimizers

Stochastic Analogues to Deterministic Optimizers Stochastic Analogues to Deterministic Optimizers ISMP 2018 Bordeaux, France Vivak Patel Presented by: Mihai Anitescu July 6, 2018 1 Apology I apologize for not being here to give this talk myself. I injured

More information

DESIGN OF AN ON-LINE TITRATOR FOR NONLINEAR ph CONTROL

DESIGN OF AN ON-LINE TITRATOR FOR NONLINEAR ph CONTROL DESIGN OF AN ON-LINE TITRATOR FOR NONLINEAR CONTROL Alex D. Kalafatis Liuping Wang William R. Cluett AspenTech, Toronto, Canada School of Electrical & Computer Engineering, RMIT University, Melbourne,

More information

State Observers and the Kalman filter

State Observers and the Kalman filter Modelling and Control of Dynamic Systems State Observers and the Kalman filter Prof. Oreste S. Bursi University of Trento Page 1 Feedback System State variable feedback system: Control feedback law:u =

More information

EE C128 / ME C134 Feedback Control Systems

EE C128 / ME C134 Feedback Control Systems EE C128 / ME C134 Feedback Control Systems Lecture Additional Material Introduction to Model Predictive Control Maximilian Balandat Department of Electrical Engineering & Computer Science University of

More information

Basic Concepts in Data Reconciliation. Chapter 6: Steady-State Data Reconciliation with Model Uncertainties

Basic Concepts in Data Reconciliation. Chapter 6: Steady-State Data Reconciliation with Model Uncertainties Chapter 6: Steady-State Data with Model Uncertainties CHAPTER 6 Steady-State Data with Model Uncertainties 6.1 Models with Uncertainties In the previous chapters, the models employed in the DR were considered

More information

inputs. The velocity form is used in the digital implementation to avoid wind-up [7]. The unified LQR scheme has been developed due to several reasons

inputs. The velocity form is used in the digital implementation to avoid wind-up [7]. The unified LQR scheme has been developed due to several reasons A LQR Scheme for SCR Process in Combined-Cycle Thermal Power Plants Santo Wijaya 1 Keiko Shimizu 1 and Masashi Nakamoto 2 Abstract The paper presents a feedback control of Linear Quadratic Regulator (LQR)

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

Gaussian Process Approximations of Stochastic Differential Equations

Gaussian Process Approximations of Stochastic Differential Equations Gaussian Process Approximations of Stochastic Differential Equations Cédric Archambeau Dan Cawford Manfred Opper John Shawe-Taylor May, 2006 1 Introduction Some of the most complex models routinely run

More information

EECE Adaptive Control

EECE Adaptive Control EECE 574 - Adaptive Control Overview Guy Dumont Department of Electrical and Computer Engineering University of British Columbia Lectures: Thursday 09h00-12h00 Location: PPC 101 Guy Dumont (UBC) EECE 574

More information

Data assimilation with and without a model

Data assimilation with and without a model Data assimilation with and without a model Tim Sauer George Mason University Parameter estimation and UQ U. Pittsburgh Mar. 5, 2017 Partially supported by NSF Most of this work is due to: Tyrus Berry,

More information

MIMO Identification and Controller design for Distillation Column

MIMO Identification and Controller design for Distillation Column MIMO Identification and Controller design for Distillation Column S.Meenakshi 1, A.Almusthaliba 2, V.Vijayageetha 3 Assistant Professor, EIE Dept, Sethu Institute of Technology, Tamilnadu, India 1 PG Student,

More information

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R. Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third

More information

Identification and estimation of state variables on reduced model using balanced truncation method

Identification and estimation of state variables on reduced model using balanced truncation method Journal of Physics: Conference Series PAPER OPEN ACCESS Identification and estimation of state variables on reduced model using balanced truncation method To cite this article: Trifena Punana Lesnussa

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30 Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain

More information

Elements of Multivariate Time Series Analysis

Elements of Multivariate Time Series Analysis Gregory C. Reinsel Elements of Multivariate Time Series Analysis Second Edition With 14 Figures Springer Contents Preface to the Second Edition Preface to the First Edition vii ix 1. Vector Time Series

More information

Course on Model Predictive Control Part II Linear MPC design

Course on Model Predictive Control Part II Linear MPC design Course on Model Predictive Control Part II Linear MPC design Gabriele Pannocchia Department of Chemical Engineering, University of Pisa, Italy Email: g.pannocchia@diccism.unipi.it Facoltà di Ingegneria,

More information
