A Robust Strategy for Joint Data Reconciliation and Parameter Estimation

Yen Yen Joe 1) 3), David Wang 2), Chi Bun Ching 3), Arthur Tay 1), Weng Khuen Ho 1) and Jose Romagnoli 2) *

1) Dept. of Electrical & Computer Engineering, The National University of Singapore, 10 Kent Ridge Cres., Singapore 119260
2) Dept. of Chemical Engineering, The University of Sydney, NSW 2006, Australia
3) Institute of Chemical and Engineering Sciences, Ayer Rajah Cres., Block 28, Unit #02-08, Singapore 139959

Abstract

In this work, the generalized T (GT) distribution is used to develop a statistically robust joint data reconciliation and parameter estimation (DRPE) strategy. The robustness is provided by the GT distribution, which includes the Normal, Laplacian and Cauchy distributions as special cases. We first use historical data to estimate the parameters of the GT distribution, so that the resulting estimator is efficient when the error is in the GT family. The strategy is implemented in a simulation of a practical chemical engineering plant. The results confirm the robustness and efficiency of the estimator.

Keywords: parameter estimation, data reconciliation, error-in-all-variables, robust, estimators

1. Introduction

Performing data reconciliation (DR) and parameter estimation (PE) jointly (DRPE) is a more efficient approach than the sequential DR and PE that is common in practice, because the resulting reconciled data and model parameters are consistent with respect to both the process model and the DR constraints. The DRPE can also be viewed as the error-in-all-variables-measured (EVM) formulation, which is the generalization of conventional PE: in EVM, all measurements are subject to errors, so that the distinction between independent and dependent variables is no longer clear (Romagnoli et al, 2000).
The three main aspects of EVM discussed in the literature are the EVM algorithm (Valko et al, 1987), the optimization strategy (Kim et al, 1990; Tjoa et al, 1991) and the robustness of the EVM estimation (Albuquerque et al, 1996; Arora et al, 2001). In this paper we focus mainly on the robustness of the EVM estimation. Various robust estimation approaches such as the M-estimators have been proposed, but most assume a priori some form of error distribution which, although robust, might not be representative of the actual distribution. On the other hand, nonparametric methods such as the kernel function (Wang et al, 2003) are free of such assumptions and fully flexible, but are also complex and computationally demanding.

* Author to whom correspondence should be addressed: jose@chem.eng.usyd.edu.au

An alternative is to strike a balance between the simplicity of the parametric approach and the flexibility of the non-parametric approach, i.e. to adopt a specific objective function that covers a wide variety of common distributions. This corresponds to the generalized T (GT) distribution. The parameters of the GT distribution can be estimated a posteriori to ensure its suitability to the data. In this work, we extend the robust DR strategy using the GT distribution (Wang et al, 2003) to incorporate parameter estimation. This results in a statistically robust EVM strategy that is also efficient.

The paper is organised as follows. The next section discusses the incorporation of the robustness feature into DRPE within a probabilistic framework. Section 3 describes the DRPE strategy using the GT distribution, which is then applied to a case study of a general purpose chemical engineering plant in Section 4. Finally, Section 5 concludes the paper.

2. The Robustness of DRPE Estimator

Within a probabilistic framework, by the maximum-likelihood principle, the DRPE can be formulated as:

$$
\max_{x,u,\theta} f(\varepsilon) = \min_{x,u,\theta} \left[ -\log f(\varepsilon) \right] = \min_{x,u,\theta} \rho(\varepsilon) \qquad (2)
$$

s.t. model and bounds

where ε = y − x is the measurement error and f(ε) is the probability density of the error. As the efficiency of the estimator depends on how well f(ε) characterizes the actual error, the estimator can be made robust by reducing the sensitivity of ρ(ε) to large values of ε. This corresponds to the robust M-estimator. The robustness of the M-estimators can be explained by the influence function (IF), defined by ψ(ε) = ∂ρ(ε)/∂ε. Essentially, the IF gives a rough measure of how much influence a particular residual has on the estimation (McDonald et al, 1988; Hampel et al, 1986), so it is desirable to have an IF that is bounded for large residuals, so that they have limited influence on the estimation.
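The definition of the IF can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it approximates ψ(ε) by a central difference and compares a least-squares objective with the fair function in its usual textbook form. The function names, the step size h and the tuning constant c = 2 are hypothetical choices, not values from this work.

```python
import math

def influence(rho, eps, h=1e-6):
    """Central-difference estimate of psi(eps) = d rho / d eps."""
    return (rho(eps + h) - rho(eps - h)) / (2.0 * h)

def rho_ls(eps):
    # Least-squares objective for a residual already scaled by its sigma.
    return 0.5 * eps ** 2

def rho_fair(eps, c=2.0):
    # Fair function in its usual form; c is a hypothetical tuning constant.
    a = abs(eps) / c
    return c ** 2 * (a - math.log1p(a))

# The least-squares IF grows with the residual; the fair-function IF stays below c.
for eps in (1.0, 10.0, 100.0):
    print(eps, influence(rho_ls, eps), influence(rho_fair, eps))
```

For the fair function the derivative is ε/(1 + |ε|/c), so its influence never exceeds c, whereas the least-squares influence equals the residual itself.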
Note that the conventional WLS, for which ρ(ε) ∝ ε², has an IF that is a straight line. Large residuals therefore have unlimited influence and can dominate the estimation, resulting in biased estimates. Common choices of robust M-estimator such as the contaminated normal (Tjoa et al, 1991), the combination of Laplace and Normal distributions (Wang et al, 2003), the fair function (Albuquerque et al, 1996) and the redescending estimator (Arora et al, 2001) depend on parameters which are either assigned a priori or have no meaningful association with the error distribution. As a result, the underlying error distribution may not be well characterised and the estimators may not be efficient in the MLE sense (Wang et al, 2003). We therefore propose the use of the GT distribution as a robust estimator for DRPE problems, as its properties enable it to be robust without sacrificing efficiency. This is elaborated in the next section.

3. Robust DRPE Using the Generalized T (GT) Distribution

The use of the GT distribution in estimation was first proposed by McDonald et al (1988) because of its flexibility in accommodating various distributional shapes. The density function is given by:

$$
f_{GT}(\varepsilon; \sigma, p, q) = \frac{p}{2 \sigma q^{1/p} B(1/p, q) \left( 1 + \frac{|\varepsilon|^{p}}{q \sigma^{p}} \right)^{q + 1/p}}, \qquad -\infty < \varepsilon < \infty \qquad (3)
$$

Depending on the values taken by {p, q, σ}, f_GT can take the shape of any distribution within the family defined by the GT distribution. As illustrated in Figure 1, it covers most of the important distributions that are commonly encountered in practice.

Figure 1: GT Distribution Tree (GT; power exponential/Box-Tiao; T-distribution; double exponential/Laplacian; Normal; Cauchy)

Figure 2: Influence Functions of GT with different parameter settings ((a) varying p with q and σ fixed; (b) varying q with p and σ fixed)

The IFs of f_GT with different settings of {p, q, σ} are shown in Figure 2. σ only affects the distribution spread, while p and q determine the shape. It is seen that the robustness criteria are satisfied, as the IFs are bounded and actually descending when the residuals get large.

The two properties of the GT demonstrated above, flexibility and robustness, enable us to achieve an estimator that is robust yet efficient in the MLE sense. This is our main motivation for selecting the GT over other M-estimators which, as mentioned in Section 2, may not be efficient in the MLE sense because they may not characterize the error distribution well. The GT, on the other hand, can take a wide range of distributional shapes depending on its parameter values. When these parameters are estimated from the data, it is able to adapt its shape to the data. We therefore use a set of historical data to estimate the distribution parameters.

To ensure robustness, however, care must be taken in estimating the distribution parameters. We see that as p increases (Figure 2a), the IF value for large residuals increases, and as q increases (Figure 2b), the IF becomes less bounded. It is thus necessary to impose bounds on p and q if robustness is to be preserved (in this work, p ≤ 5, q ≤ 50).
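As an illustration of these two properties, the GT density of equation (3) and its influence function can be evaluated directly. The sketch below is a minimal Python rendering and not the authors' implementation: the function names are hypothetical, and the closed-form IF, ψ(ε) = (pq + 1) sign(ε) |ε|^(p−1) / (qσ^p + |ε|^p), is obtained here by differentiating −log f_GT by hand rather than taken from the paper.

```python
import math

def gt_logpdf(eps, sigma, p, q):
    """Log of the GT density f_GT(eps; sigma, p, q) of equation (3)."""
    # log B(1/p, q) via log-gamma, which stays stable for large q.
    log_beta = math.lgamma(1.0 / p) + math.lgamma(q) - math.lgamma(1.0 / p + q)
    log_norm = math.log(p) - math.log(2.0 * sigma) - math.log(q) / p - log_beta
    tail = math.log1p(abs(eps) ** p / (q * sigma ** p))
    return log_norm - (q + 1.0 / p) * tail

def gt_influence(eps, sigma, p, q):
    """Hand-derived psi(eps) = d[-log f_GT]/d eps."""
    a = abs(eps)
    if a == 0.0:
        return 0.0  # convention at the origin; exact for p > 1
    magnitude = (p * q + 1.0) * a ** (p - 1.0) / (q * sigma ** p + a ** p)
    return math.copysign(magnitude, eps)

# Illustrative settings only (sigma = 1, p = 2, q = 5): the influence rises
# for small residuals and then descends as the residual grows.
for eps in (0.5, 2.0, 10.0, 100.0):
    print(eps, gt_influence(eps, sigma=1.0, p=2.0, q=5.0))
```

With p = 2 and q = 0.5 this density coincides with a Cauchy density of scale σ/√2, and for large residuals ψ decays like (pq + 1)/|ε|, which is consistent with the descending IFs described above.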
We underline that although the bounds exclude a part of the GT family, by estimating p and q from the data, we effectively fit the data with the GT distribution within the parameter bounds. This ensures that the exclusion will have little effect on the asymptotic efficiency of the estimator (McDonald et al, 1988).
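Fitting the GT distribution to residuals within parameter bounds amounts to a bounded maximum likelihood problem. The sketch below illustrates the idea with a coarse grid search, which is a deliberately simple stand-in for a proper bounded optimizer and is not the procedure used in this work. The function names, the grids, the synthetic Laplacian residuals and the use of the median absolute residual as a reference scale are all assumptions made for illustration.

```python
import math
import random

def gt_logpdf(eps, sigma, p, q):
    """Log of the GT density f_GT(eps; sigma, p, q)."""
    log_beta = math.lgamma(1.0 / p) + math.lgamma(q) - math.lgamma(1.0 / p + q)
    log_norm = math.log(p) - math.log(2.0 * sigma) - math.log(q) / p - log_beta
    return log_norm - (q + 1.0 / p) * math.log1p(abs(eps) ** p / (q * sigma ** p))

def neg_loglik(residuals, sigma, p, q):
    """Negative GT log-likelihood of a residual sample."""
    return -sum(gt_logpdf(e, sigma, p, q) for e in residuals)

def fit_gt_grid(residuals, p_grid, q_grid, scale_grid):
    """Return (nll, sigma, p, q) minimising the negative log-likelihood on a grid."""
    # Median absolute residual as a robust reference scale for sigma.
    mad = sorted(abs(e) for e in residuals)[len(residuals) // 2]
    mad = max(mad, 1e-12)  # guard against an all-zero sample
    best = None
    for p in p_grid:
        for q in q_grid:
            for s in scale_grid:
                nll = neg_loglik(residuals, s * mad, p, q)
                if best is None or nll < best[0]:
                    best = (nll, s * mad, p, q)
    return best

# Hypothetical "historical" residuals: synthetic Laplacian noise.
random.seed(0)
residuals = [random.choice((-1.0, 1.0)) * random.expovariate(1.0) for _ in range(300)]

# The grid limits play the role of the bounds on p and q; values are illustrative.
P_GRID = (1.0, 1.5, 2.0, 3.0, 5.0)
Q_GRID = (0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0)
SCALE_GRID = (0.5, 1.0, 2.0, 4.0, 8.0)

nll, sigma_hat, p_hat, q_hat = fit_gt_grid(residuals, P_GRID, Q_GRID, SCALE_GRID)
print("sigma=%.3f p=%.1f q=%.1f nll=%.2f" % (sigma_hat, p_hat, q_hat, nll))
```

Because the search is restricted to the grid, the fitted p and q can never leave the chosen bounds. In practice a continuous bounded optimizer would normally replace the grid, at the cost of needing a starting point.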

To estimate the GT distribution parameters, a preliminary reconciliation of the historical data is performed to obtain the residuals ε, which are then fed to the maximum likelihood estimator, given by (Wang et al, 2003):

$$
\max_{\{p, q, \sigma\}} \log f_{GT}(\varepsilon; p, q, \sigma) \qquad (4)
$$

The estimates of {p, q, σ} are then obtained as the parameters of the GT member from which the data are most likely sampled.

In DRPE, some of the measured variables are non-redundant, which complicates the estimation of the distribution parameters. Since this work deals with steady-state data, we take the median as the estimated value of the non-redundant measurements. Taking the median as the estimated value corresponds to the use of a robust L-estimator (Albuquerque et al, 1996; Hampel et al, 1986); however, it can only be used when the variables have repeated measurements or are known to be constant over the time horizon considered. Where these conditions do not hold, a robust preliminary parameter estimation or DRPE has to be performed to obtain the residuals for the non-redundant measurements. This is more complex and computationally expensive, especially if the distribution parameters are to be updated online, as their estimation may need a number of iterations. Another alternative is to assign fixed values of {p, q, σ} that are sufficiently robust. In this case, efficiency is traded off for convenience.

Figure 3: Case Study Plant Flowsheet

4. Application Case Study

The proposed robust DRPE strategy is applied to a case study of a pilot-scale setting containing two CSTRs, a mixer and a number of heat exchangers (Figure 3). Material feed from the feed tank is heated before being fed to the first reactor and the mixer. The effluent from the first reactor is then mixed with the material feed in the mixer, and then fed to the second reactor. The effluent from the second reactor is, in turn, fed back to the feed tank and the cycle continues.
Steady-state analysis of the system structure results in seven redundant equations involving 14 redundant variables. The model parameters estimated are the product of the heat transfer coefficient with the effective heat transfer area of the steam jacket (P1)

and the cooling coil (P2) of the first reactor. For parameter estimation, two more model equations with five non-redundant variables are included.

Associated with the pilot-scale plant, a virtual environment that mimics the actual plant behaviour has been developed within the Matlab/Simulink framework; it is used in this paper while the plant is being commissioned. Simulation data are generated with several different error distributions: Normal, Laplacian and Cauchy. The different distributions are considered as outliers. A data set with Normal noise and with large random shifts as gross errors is also generated. We then perform DRPE using three different methods: the conventional WLS; the contaminated (bivariate) Gaussian distribution with gross error probability p =. and gross error ratio b = (Tjoa et al, 1991); and the GT distribution with distribution parameters {p, q, σ} estimated from a historical data set (n=1) having a similar distribution to the current data set.

The performance criterion used to compare the efficiencies of the different methods is the mean-squared error (MSE):

$$
MSE = \frac{1}{mK} \sum_{j=1}^{K} \sum_{i=1}^{m} \left( \frac{\hat{x}_{i,j} - x_{i,j}}{\sigma_i} \right)^{2} \qquad (5)
$$

where m is the number of measured variables and K is the number of data sets used for the DRPE (m=19, K=1 in this study). $\hat{x}_{i,j}$ and $x_{i,j}$ are the estimate of the reconciled data and the actual value of the variable, respectively, while σ_i is the standard deviation of the Gaussian noise on sensor i.

The MSE results are shown in Figure 4, while Table 1 lists the estimated model parameters P1 and P2. The bivariate Gaussian and GT methods are more efficient (lower MSE and % discrepancy of parameter estimates) than WLS for distributions other than Normal, and for Normal noise with gross error, which demonstrates the robustness of the two M-estimators. Compared to the bivariate Gaussian method, the GT method is more efficient for the Laplacian and Cauchy error distributions, which are special cases of the GT distribution.
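The MSE criterion scales the squared estimation error of each variable by the variance of its sensor noise and averages over all m variables and K data sets. A minimal sketch of that computation is given below; the function name, the nested-list data layout and the example numbers are hypothetical and are not taken from the case study.

```python
def normalized_mse(x_hat, x_true, sigma):
    """Scaled mean-squared error over K data sets and m variables.

    x_hat[j][i] and x_true[j][i] hold the estimate and the actual value of
    variable i in data set j; sigma[i] is the noise standard deviation of sensor i.
    """
    K = len(x_hat)
    m = len(sigma)
    total = 0.0
    for est_row, true_row in zip(x_hat, x_true):
        for est, true, s in zip(est_row, true_row, sigma):
            total += ((est - true) / s) ** 2
    return total / (m * K)

# Hypothetical example: two variables, two data sets.
x_true = [[100.0, 50.0], [100.0, 50.0]]
x_hat = [[101.0, 50.0], [99.0, 51.0]]
sigma = [1.0, 0.5]
mse = normalized_mse(x_hat, x_true, sigma)
print(mse)
```

Scaling by σ_i makes variables with different units and noise levels comparable, so that a noisy sensor does not dominate the average.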
The GT distribution parameters for some variables for the cases of Laplacian and Cauchy noise are listed in Table 2. Referring back to the distribution tree in Figure 1, the values of the estimated p and q are close to the ideal p and q for the respective distributions. For example, for Laplacian noise, ideally p = 1 and q → ∞; the estimated p are close to one, while the estimated q are large or close to the upper bound, i.e. q = 50. Figure 5 plots, for a temperature variable with Laplacian noise, the relative frequency distribution of the noise, the estimated GT density with estimated {p, q, σ}, the bivariate density (p=.,q=), and the normal distribution N(0, σ²), corresponding to the actual noise and to the GT, bivariate and WLS estimators, respectively. It is seen that the GT estimator characterizes the data best, which explains its lowest MSE and % discrepancy for Laplacian noise in Figure 4 and Table 1. The same can be concluded for Cauchy noise.

5. Conclusion

The DRPE based on the GT distribution is robust and efficient, especially when the underlying error distribution is within the GT family. Since the GT family encompasses

a wide variety of important statistical distributions, the GT-based estimator is a very viable choice of estimator considering its simplicity.

Figure 4: Performance Comparison for Different Noise Profiles (MSE of WLS, Bivariate and GT for Normal, Normal + Gross Error, Laplacian and Cauchy noise)

Figure 5: Distribution Plots (relative frequency, GT, Bivariate and WLS densities)

Table 1: Model Parameter Estimates and Their Accuracies

Noise                  Method   P1       P2       %discrepancy P1   %discrepancy P2
Actual value           --       37,500   56,500   --                --
Normal                 WLS      39,170   57,747    4.45              2.21
Normal                 Biv.     36,006   58,000   -3.98              2.65
Normal                 GT       37,296   57,769   -0.54              2.25
Normal + Gross Error   WLS      40,331   57,426    7.55              1.64
Normal + Gross Error   Biv.     39,245   57,539    4.65              1.84
Normal + Gross Error   GT       36,032   57,994   -3.91              2.64
Laplace                WLS      39,205   57,761    4.55              2.23
Laplace                Biv.     39,107   57,797    4.29              2.30
Laplace                GT       36,545   57,951   -2.55              2.57
Cauchy                 WLS      41,342   51,988   10.25             -7.99
Cauchy                 Biv.     36,001   58,000   -4.00              2.65
Cauchy                 GT       36,001   58,000   -4.00              2.65

Table 2: GT Distribution Parameter Estimates

Cauchy Noise Laplacian Noise
Variable p=1 q =.5 p=1 q inf
T7 1.9387.5 1.464 4.6
T9 1. 1.1839 1.135 47.881
T1 1.478.76.186 1.3741
T1 1.8465.5 1.3936 19.733
Trx.699.5 1.84 5.
Tmx.993.5 1. 48.5511.1

References

Albuquerque, J.S., Biegler, L.T., 1996. AIChE J., Vol. 42, No. 10, pp. 2841-2856.
Arora, N., Biegler, L.T., 2001. Comp. Chem. Eng., Vol. 25, pp. 1585-1599.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A., 1986. Robust Statistics: The Approach Based on Influence Functions, Wiley.
Kim, I.W., Liebman, M.J., Edgar, T.F., 1990. AIChE J., Vol. 36, pp. 985-993.
McDonald, J.B., Newey, W.K., 1988. Partially Adaptive Estimation of Regression Models via the Generalized T Distribution, Econometric Theory, Vol. 4, pp. 428-457.
Romagnoli, J.A., Sanchez, M.C., 2000. Data Processing and Reconciliation for Chemical Process Operations, Academic Press.
Tjoa, I.B., Biegler, L.T., 1991. Comp. Chem. Eng., Vol. 15, No. 10, pp. 679-690.
Valko, P., Vajda, S., 1987. Comp. Chem. Eng., Vol. 11, pp. 37-43.
Wang, D., Romagnoli, J.A., 2003. Ind. Eng. Chem. Res., Vol. 42, No. 13, pp. 3075-3084.