
On Input Design for System Identification
Input Design Using Markov Chains

CHIARA BRIGHENTI

Master's Degree Project
Stockholm, Sweden, March 2009

XR-EE-RT 2009:002

Abstract

When system identification methods are used to construct mathematical models of real systems, it is important to collect data that reveal useful information about the system's dynamics. Experimental data are always corrupted by noise, and this causes uncertainty in the model estimate. Therefore, the design of input signals that guarantee a certain model accuracy is an important issue in system identification. This thesis studies input design problems for system identification in which time domain constraints have to be considered. A finite Markov chain is used to model the input of the system. This makes it possible to include input amplitude constraints directly in the input model, by properly choosing the state space of the Markov chain. The state space is defined so that the model generates a binary signal. The probability distribution of the Markov chain is shaped in order to minimize an objective function defined in the input design problem. Two identification issues are considered in this thesis: parameter estimation and estimation of non-minimum phase (NMP) zeros of linear systems. Stochastic approximation is needed to minimize the objective function in the parameter estimation problem, while an adaptive algorithm is used to consistently estimate NMP zeros. One of the main advantages of this approach is that the input signal can be easily generated by extracting samples from the designed optimal distribution. No spectral factorization techniques or realization algorithms are required to generate the input signal. Numerical examples show how these models can improve system identification with respect to other input realization techniques.

Acknowledgements

Working on my thesis at the Automatic Control department at KTH has been a great experience. I would like to thank my supervisor Professor Bo Wahlberg and my advisor Dr. Cristian R. Rojas for having given me the possibility to work on very interesting research topics. Thank you for your guidance. A special thanks goes to Cristian R. Rojas, for the time he dedicated to me and for the many ideas he shared with me. Thanks to all the people I had the pleasure of knowing during this work period: Mitra, Andre B., Fotis, Pedro, Mohammad, Andre, Pierluigi, Matteo, Davide and Alessandro. I really enjoyed your company. I would like to thank Professor Giorgio Picci, who made this experience possible. Finally, thanks to my family and my friends for their continuous help and support.

Acronyms

AR    Autoregressive
FDSA  Finite Difference Stochastic Approximation
FIR   Finite Impulse Response
LMI   Linear Matrix Inequality
LTI   Linear Time Invariant
NMP   Non-Minimum Phase
PEM   Prediction Error Method
PRBS  Pseudo Random Binary Signal
RLS   Recursive Least Squares
SISO  Single Input Single Output
SPSA  Simultaneous Perturbation Stochastic Approximation
WN    White Noise

Contents

Abstract
1 Introduction
  1.1 Thesis outline and contributions
2 System Identification
  2.1 System and model description
  2.2 Identification method
  2.3 Estimate uncertainty
    2.3.1 Parameter uncertainty
    2.3.2 Frequency response uncertainty
  2.4 Conclusions
3 Input Design for System Identification
  3.1 Optimal input design problem
  3.2 Measures of estimate accuracy
    3.2.1 Quality constraint based on the parameter covariance
    3.2.2 Quality constraint based on the model variance
    3.2.3 Quality constraint based on the confidence region
  3.3 Input spectra parametrization
    3.3.1 Finite dimensional spectrum parametrization
    3.3.2 Partial correlation parametrization
  3.4 Covariance matrix parametrization
  3.5 Signal constraints parametrization
  3.6 Limitations of input design in the frequency domain
  3.7 Conclusions
4 Markov Chain Input Model
  4.1 Introduction
  4.2 Markov chains model
    4.2.1 Markov chain state space
    4.2.2 Markov chains spectra
  4.3 More general Markov chains
  4.4 Conclusions
5 Estimation Using Markov Chains
  5.1 Problem formulation
  5.2 Solution approach
  5.3 Cost function evaluation
  5.4 Algorithm description
  5.5 Numerical example
  5.6 Conclusions
6 Zero Estimation
  6.1 Problem formulation
  6.2 FIR
  6.3 ARX
  6.4 General linear SISO systems
  6.5 Adaptive algorithm for time domain input design
    6.5.1 A motivation
    6.5.2 Algorithm description
    6.5.3 Algorithm modification for Markov chain signals generation
    6.5.4 Numerical example
  6.6 Conclusions
7 Summary and future work
  7.1 Summary
  7.2 Future work
References

List of Figures

4.1 Graph representation of the two states Markov chain S_2.
4.2 Graph representation of the four states Markov chain S_4.
4.3 Graph representation of the three states Markov chain S_3.
5.1 Mass-spring-damper system.
5.2 Cost functions f_2 and f_3 in the interval of acceptable values of the variables u_max and y_max, respectively.
5.3 Cost functions f_1, f_4, f_5 in the interval of acceptable values.
5.4 Estimate of the cost function on a discrete set of points for the two states Markov chain in case 2.
5.5 Estimation of the best transition probability for the two states Markov chain in case 2.
5.6 Estimation of the best transition probability p for the four states Markov chain in case 2.
5.7 Estimation of the best transition probability r for the four states Markov chain in case 2.
5.8 Estimate of the cost function on a discrete set of points for the two states Markov chain in case 1.
5.9 Estimate of the cost function on a discrete set of points for the four states Markov chain in case 1.
5.10 Bode diagrams of the optimal spectra of the 2 states Markov chains in cases 1, 2 and 3 of Table 5.3, and of the real discrete system.
5.11 Bode diagrams of the optimal spectra of the 4 states Markov chains in cases 1, 2 and 3 of Table 5.3, and of the real discrete system.
5.12 Estimates of the frequency response of the system using the optimal two states Markov chains for cases 1, 2 and 3 in Table 5.1.
5.13 Estimates of the frequency response of the system using the optimal four states Markov chains for cases 1, 2 and 3 in Table 5.1.
6.1 Representation of the adaptive algorithm iteration at step k. u_k denotes the vector of all collected input values from the beginning of the experiment, used for the output prediction ŷ(k).
6.2 A zero estimate trajectory produced by the adaptive algorithms described in Sections 6.5.2 and 6.5.3.
6.3 Normalized variance of the estimation error for the adaptive algorithms described in Sections 6.5.2 and 6.5.3.

List of Tables

4.1 Poles and zeros of the canonical spectral factor of the spectrum of s_m of Example 4.2.4.
5.1 Maximum threshold values in the three analyzed cases.
5.2 Results of 100 Monte Carlo simulations of the algorithm with the 2 states Markov chain.
5.3 Optimal values of the transition probabilities in cases 1, 2 and 3, obtained after 30000 algorithm iterations.
5.4 Total cost function values obtained with the optimal Markov inputs, a PRBS and white noise in case 1.
5.5 Trace of the covariance matrix obtained with the optimal Markov inputs, a PRBS, white noise, and a binary input having the optimal correlation function and the optimal spectrum in case 2.
5.6 Total cost function value obtained with the optimal Markov inputs, a PRBS and white noise in case 3.
5.7 Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal two states Markov chains.
5.8 Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal four states Markov chains.

Chapter 1
Introduction

Mathematical models of systems are necessary in order to predict their behavior and to serve as parts of their control systems. This work focuses on models constructed and validated from experimental input/output data by means of identification methods. The information obtained through experiments on the real system depends on the input excitation, which is often limited by amplitude or power constraints. For this reason, experiment design is necessary in order to obtain system estimates within a given accuracy while saving experiment time and cost [1]. Robustness of input design for system identification is also one of the most important issues, especially when the model of the system is used for designing its control system. In [2]-[7] some studies on this problem are presented. The effects of undermodeling on input design are pointed out in [8] and [9].

Depending on the cost function considered in this setting, input design can typically be solved as a constrained optimization problem. In the Prediction Error Method (PEM) framework it is common to use, as a measure of the estimate accuracy, a function of the asymptotic covariance matrix of the parameter estimate. This matrix depends on the input spectrum, which can then be shaped in order to obtain a small covariance matrix and improve the estimate accuracy (see [10], [11]). Usually, a constraint on the input power is also included; in this way, time domain amplitude constraints are approximately translated into the frequency domain [12].

A first disadvantage of these methods is that they are strongly influenced by the initial knowledge of the system. Secondly, solving the problem in the frequency domain does not provide any further information on how to generate the input signal in the time domain: the input can be represented as filtered white noise, but many probability distributions can be used to generate white noise.

Furthermore, in practical applications time domain constraints on signals have to be considered, and the power constraint that is usually set in the frequency domain does not assure that these constraints are respected. For this reason, in [13] a method is proposed to generate a binary input with a prescribed correlation function; once an optimal spectrum or correlation function is found by solving the input design problem in the frequency domain, it is possible to generate a binary signal which approximates the optimal input. Also, in [14] a method is proposed that provides a locally optimal binary input in the time domain.

This thesis studies the input design problem in the probability domain. Compared to design methods in the frequency domain, a solution in the probability domain makes it easier to generate input trajectories to apply to the real system, by extracting samples from a given distribution. Inputs are modeled by finite stationary Markov chains which generate binary signals. Binary signals are often used in system identification, one reason being that they achieve the largest power in the set of all signals having the same maximum amplitude, and it is well known that this improves parameter estimation for linear models.

The idea of modeling the input by a finite Markov chain derives from the possibility of including input amplitude constraints directly in the input model, by suitably choosing the state space of the Markov chain. Furthermore, unlike design in the frequency domain, this approach keeps more degrees of freedom in the choice of the optimal spectrum, which in general is non-unique [12].

Two identification problems are considered here: parameter estimation and non-minimum phase zero estimation for LTI systems. For the first problem, the optimal distribution is found by minimizing the cost function defined in the input design problem with respect to the one-step transition probabilities of the Markov chain. In this analysis, a stochastic approximation algorithm is used, since a closed-form solution to the optimization problem is not available and the cost is a stochastic function of these transition probabilities, contaminated with noise (see [15], [16] for details). For the second problem, it will be shown that the Markov chain input model has exactly the optimal spectrum for the zero estimation problem of an FIR or ARX model [6]. In general, the spectrum of a two states Markov chain can be made equal to the spectrum of the AR process which guarantees a consistent estimate of the NMP zero of a linear SISO system [17]. Therefore, an optimal or consistent input can be generated in the time domain by Markov chain distributions.

The adaptive algorithm introduced in [18] for input design in the time domain under undermodeling is modified here in order to generate Markov chain signals having the same spectrum as the general inputs designed in the original version of the algorithm. The advantage is that a binary signal can then be used to identify the non-minimum phase zero, while keeping the same input variance and spectrum. The outline of the thesis is presented in the next section.

1.1 Thesis outline and contributions

The subject of this thesis is input design for system identification. In particular, the objective of this study is to analyze a new approach to input design: working in the probability domain, the input signal is modeled as a finite Markov chain. The first chapters summarize some known results on system identification in the PEM framework and on input design in the frequency domain, in order to compare the methods and results of the classical frequency domain approach with those of the method proposed here.

In Chapter 2, PEM and its asymptotic properties are reviewed. In Chapter 3 the most commonly adopted input design methods are described. These formulate input design problems as convex optimization problems; the solution is given as an optimal input spectrum. In Chapter 4, general Markov chains that model binary input signals are defined, and some of their spectral properties are described. Chapter 5 presents the input design problem for parameter estimation and the solution approach based on the Markov chain input model. In Chapter 6, input design for identification of NMP zeros of an LTI system is considered. The chapter discusses classical solutions in the frequency domain and adaptive solutions in the time domain under undermodeling, where the input is modeled as an AR or a Markov chain process. Chapter 7 concludes the thesis.

Chapter 2
System Identification

This chapter introduces system identification in the PEM framework for parameter estimation of LTI models. Once a model structure is defined, this method finds the model's parameters that minimize the prediction error. Even when the model structure is able to capture the true system dynamics, the estimation error will not be zero, since the data used in the identification procedure are finite and corrupted by noise. The purpose of input design is to minimize the estimation error by minimizing the variance of the parameter estimates, assuming the estimation method is consistent. Section 2.1 defines the systems and model structures considered in this work, while Section 2.2 discusses PEM and its asymptotic properties.

2.1 System and model description

This thesis considers discrete-time LTI SISO systems, lying in the set M of parametric models

y(t) = G(q, θ) u(t) + H(q, θ) e(t),   (2.1)

G(q, θ) = q^{-n_k} \frac{b_1 + b_2 q^{-1} + \cdots + b_{n_b} q^{-n_b+1}}{1 + a_1 q^{-1} + \cdots + a_{n_a} q^{-n_a}},

H(q, θ) = \frac{1 + c_1 q^{-1} + \cdots + c_{n_c} q^{-n_c}}{1 + d_1 q^{-1} + \cdots + d_{n_d} q^{-n_d}},

θ = [b_1, ..., b_{n_b}, a_1, ..., a_{n_a}, c_1, ..., c_{n_c}, d_1, ..., d_{n_d}]^T ∈ ℝ^{n_b + n_a + n_c + n_d},

where u(t) is the input, y(t) is the output and e(t) is zero mean white noise with finite variance. The symbol q^{-1} represents the delay operator (q^{-1} u(t) = u(t-1)). Assume H(q, θ) is stable, monic and minimum phase, i.e. its poles and zeros lie inside the unit circle.

The real system S is given as

y(t) = G_0(q) u(t) + H_0(q) e_0(t),   (2.2)

where e_0(t) has finite variance λ_0. Assume there exists a parameter vector θ_0 such that G(q, θ_0) = G_0(q) and H(q, θ_0) = H_0(q), i.e. assume there is no undermodeling:

S ∈ M.   (2.3)

This condition is hardly ever satisfied in practice, since real systems are often of high order or nonlinear. Nevertheless, as will be explained in the next section, this condition is crucial for the consistency and the asymptotic properties of PEM. In Chapter 5, regarding the parameter estimation problem, (2.3) will be supposed to hold, while in Chapter 6, where the zero estimation problem is analyzed, this condition will not necessarily be assumed.

2.2 Identification method

System identification aims at describing a real system through a mathematical model constructed and validated from experimental input-output data. The identification method considered here is PEM [19]. This method minimizes a function of the prediction error ε_F(t, θ):

V_N(θ, Z^N) = \frac{1}{2N} \sum_{t=1}^{N} ε_F^2(t, θ),   (2.4)

where Z^N is a vector containing the collected input-output data, i.e. Z^N = [y(1), u(1), y(2), u(2), ..., y(N), u(N)]. The prediction error is defined as ε_F(t, θ) = y(t) − ŷ(t, θ), where the one-step ahead predictor is given by

ŷ(t, θ) = H^{-1}(q, θ) G(q, θ) u(t) + [1 − H^{-1}(q, θ)] y(t).

Suppose all the hypotheses for the consistency of PEM are satisfied; in that case, the parameter estimate θ̂_N converges to the true parameter vector θ_0 as N tends to infinity. Briefly, these conditions are:

1. Condition (2.3) holds, i.e. there is no undermodeling
2. The signals y(t) and u(t) are jointly quasi-stationary
3. u(t) is persistently exciting of sufficiently large order
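As an illustration, the following Python sketch (not from the thesis; the first-order model and all numerical values are hypothetical) simulates data from a model of the form (2.1) and evaluates the criterion (2.4) through the one-step ahead predictor:

```python
import numpy as np
from scipy.signal import lfilter

def pem_cost(B, A, C, D, u, y):
    """PEM criterion V_N = (1/2N) sum_t eps_F(t, theta)^2 for the model
    y = G u + H e, with G = B/A and H = C/D given as polynomials in q^{-1}
    (C and D monic). Predictor: yhat = H^{-1} G u + (1 - H^{-1}) y."""
    Gu = lfilter(B, A, u)                             # G(q) u(t)
    yhat = lfilter(D, C, Gu) + y - lfilter(D, C, y)   # H^{-1} G u + (1 - H^{-1}) y
    eps = y - yhat                                    # prediction error
    return np.mean(eps ** 2) / 2.0

# Hypothetical system: G = 0.5 q^{-1} / (1 - 0.8 q^{-1}), H = 1, binary input.
rng = np.random.default_rng(0)
N = 1000
u = rng.choice([-1.0, 1.0], size=N)
e = 0.1 * rng.standard_normal(N)
y = lfilter([0.0, 0.5], [1.0, -0.8], u) + e
print(pem_cost([0.0, 0.5], [1.0, -0.8], [1.0], [1.0], u, y))  # ~ lambda_0 / 2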

2.3 Estimate uncertainty

Measuring the quality of the model estimate is an important issue in system identification. The measure is chosen depending on the application for which the model is required. One possibility is to use a function of the covariance matrix of the parameter estimate. In other cases, such as in control applications, it can be better to use the variance of the frequency response estimate in the frequency domain. These two cases are now presented in more detail.

2.3.1 Parameter uncertainty

Under the assumptions for the consistency of PEM, it holds that

\sqrt{N} (θ̂_N − θ_0) → N(0, P_{θ_0}) as N → ∞,   (2.5)

P_{θ_0}^{-1} = \frac{1}{λ_0} E[ψ(t, θ_0) ψ^T(t, θ_0)],   ψ(t, θ_0) = \frac{∂ŷ(t, θ)}{∂θ}\Big|_{θ_0},

where N(·, ·) denotes the Normal distribution [19]. Therefore, when the model class is sufficiently flexible to describe the real system, the parameter estimate will converge to the true parameter vector as the number of data N used in the estimation goes to infinity, with covariance decaying as 1/N. From (2.5) it follows that a confidence region in which the parameter estimate will lie with probability α is

U_θ = { θ : N (θ − θ̂_N)^T P_{θ_0}^{-1} (θ − θ̂_N) ≤ χ²_α(n) }.   (2.6)

The covariance matrix defines an ellipsoid asymptotically centered at θ_0. Under the condition that u and e are independent (that is, data are collected in open loop), the asymptotic expression in the number of data points N of the inverse of the covariance matrix of the parameter estimate is

P_{θ_0}^{-1} = \frac{N}{2πλ_0} \int_{-π}^{π} F_u(e^{iω}, θ_0) Φ_u(ω) F_u^*(e^{iω}, θ_0) \, dω + R_e(θ_0),   (2.7)

R_e(θ_0) = \frac{N}{2π} \int_{-π}^{π} F_e(e^{iω}, θ_0) F_e^*(e^{iω}, θ_0) \, dω,

where

F_u(e^{iω}, θ_0) = H^{-1}(e^{iω}, θ_0) \frac{∂G(e^{iω}, θ)}{∂θ}\Big|_{θ_0},   (2.8)

F_e(e^{iω}, θ_0) = H^{-1}(e^{iω}, θ_0) \frac{∂H(e^{iω}, θ)}{∂θ}\Big|_{θ_0},   (2.9)

and Φ_u(ω) is the power spectral density of the input u(t). Here * denotes the complex conjugate transpose. Expression (2.7) shows that the asymptotic covariance matrix of the parameter estimate depends on the input spectrum. Therefore, by shaping Φ_u(ω) it is possible to obtain estimates within a given accuracy. Throughout the thesis it will be assumed that there is no feedback in the system, i.e. u and e are independent.

2.3.2 Frequency response uncertainty

In many applications it may be preferable to measure the quality of the model estimate using the variance of the frequency response estimate, frequency by frequency. In [19] it is shown that under condition (2.3), the variance of G(e^{iω}, θ̂_N) can be approximated by

Var G(e^{iω}, θ̂_N) ≈ \frac{m}{N} \frac{Φ_v(ω)}{Φ_u(ω)}   (2.10)

for large but finite model order m and number of data N, where v is the process defined as v(t) = H_0(q) e_0(t). If the model order is not large enough, the previous expression is not a good approximation. Instead, by the Gauss approximation formula, it is possible to write

Var G(e^{iω}, θ̂_N) ≈ \frac{1}{N} \frac{∂G(e^{iω}, θ)}{∂θ}\Big|_{θ_0}^{*} P_{θ_0} \frac{∂G(e^{iω}, θ)}{∂θ}\Big|_{θ_0}.   (2.11)

Equation (2.11) expresses the frequency response uncertainty in terms of the parameter uncertainty. Therefore, both equations (2.10) and (2.11) show that a proper choice of the input spectrum can reduce the variance of the frequency response estimate. This is the purpose of input design.

2.4 Conclusions

This chapter introduced PEM for system identification and the asymptotic properties that are often used to solve input design problems.

Models constructed from experimental data are always affected by uncertainty. In Section 2.3 two possible measures of the model uncertainty were discussed: parameter uncertainty and frequency response uncertainty. The choice between the two depends on the application. Typically, in control applications a measure in the frequency domain is preferable. This work considers parameter uncertainty.

Chapter 3
Input Design for System Identification

This chapter presents general input design problems for system identification. Typically, input design aims at optimizing some performance function under constraints on the estimate accuracy and on the input signal. The solution approach will be presented in detail, describing how input design problems can be formulated as convex optimization problems. Ideas and drawbacks of the general input design framework are reviewed in Section 3.1. The most widely used measures of estimate accuracy are presented in Section 3.2, where it is also shown how quality constraints can be written as convex constraints. Sections 3.3 to 3.5 describe some techniques used for the parametrization of spectra and signal constraints, needed to obtain finitely parametrized problems.

3.1 Optimal input design problem

In a general formulation, input design problems are constrained optimization problems, where the constraints are typically on the input signal spectrum or power and on the estimate accuracy. In this framework the objective function to be optimized can be any performance criterion, which usually depends on the practical application. For example, the input power or the experiment time can be minimized. Common input design problem formulations are:

1. Optimize some measure of the estimate accuracy, under constraints on the input excitation.
2. Optimize some property of the input signal, given constraints on the estimate accuracy.

As will be discussed in the next section, typical measures of the estimate accuracy are functions of the uncertainty in the model estimate, like (2.7), (2.10) and (2.11). As was shown in Section 2.3, these functions depend on the input spectrum, which can therefore be used to optimize the objective function. A formal expression of the first problem formulation is

min_{Φ_u} f(P_{θ_0})  subject to  g(Φ_u) ≤ α,   (3.1)

which can also be written as

min_{Φ_u} γ  subject to  f(P_{θ_0}) ≤ γ,  g(Φ_u) ≤ α.   (3.2)

The next sections will show how this type of constraint can be formulated as a convex constraint, under certain conditions on the functions f and g. The following drawbacks of input design problems like (3.2) have to be highlighted. First of all, notice that the asymptotic expression of the covariance matrix depends on the true parameter θ_0, which is not known. Secondly, the constraints may be non-convex and infinite dimensional. In that case, a parametrization of the input spectrum is necessary in order to handle finitely parametrized optimization problems. Furthermore, once the optimal input spectrum has been found, an input signal having that optimal spectrum has to be generated. This can be done by filtering white noise with an input spectral factor¹. Nevertheless, no information on the probability distribution of the white noise is given in this solution approach.

¹ By spectral factor is meant an analytic function L(z) such that Φ_u(z) = L(z) L(z^{-1}).
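To make the realization step concrete, here is a minimal Python sketch (an illustration, not part of the thesis) of spectral factorization for an FIR spectrum Φ_u(z) = Σ_{|k|<M} r_k z^{-k}: the canonical factor is obtained by keeping the roots inside the unit circle, assuming Φ_u is strictly positive on the unit circle. Filtering white noise, of any distribution, through L(q) then yields a signal with spectrum Φ_u.

```python
import numpy as np

def canonical_spectral_factor(r):
    """Minimum phase L(z) with Phi(z) = L(z) L(1/z) for the FIR spectrum
    Phi(z) = sum_{|k|<M} r_k z^{-k}, r = [r_0, ..., r_{M-1}].
    Assumes Phi(e^{iw}) > 0 for all w (no roots on the unit circle)."""
    c = np.concatenate([r[::-1], r[1:]])                  # coeffs of z^{M-1} Phi(z)
    roots = np.roots(c)                                   # roots come in (z, 1/z) pairs
    L = np.real(np.poly(roots[np.abs(roots) < 1.0]))      # keep the inside roots
    gain = np.sqrt(np.sum(c)) / abs(np.polyval(L, 1.0))   # match Phi(1) = sum_k r_|k|
    return gain * L

# Example: r_0 = 1, r_1 = 0.4 gives L(z) = 0.894 z + 0.447 (first order MA filter)
print(canonical_spectral_factor(np.array([1.0, 0.4])))
```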

3.2 Measures of estimate accuracy

In the usual input design framework, three types of quality measures are typically considered. These are described in the following sections.

3.2.1 Quality constraint based on the parameter covariance

In Section 2.3 it was shown that the inverse of the asymptotic covariance matrix of the parameter estimate is an affine function of the input spectrum, through (2.7). If the purpose of the identification procedure is parameter estimation, then typical scalar measures of estimate accuracy are the trace, the determinant or the maximum eigenvalue of the covariance matrix [12]. The following quality constraints can then be introduced:

Tr P_{θ_0} ≤ γ,   (3.3)
det P_{θ_0} ≤ δ,   (3.4)
λ_max(P_{θ_0}) ≤ ε.   (3.5)

It is possible to prove that these constraints can be manipulated to be convex in P_{θ_0}^{-1}; proofs can be found in [20] and [21] for (3.4) and (3.5), respectively. For example, (3.5) is equivalent to

\begin{pmatrix} εI & I \\ I & P_{θ_0}^{-1} \end{pmatrix} ⪰ 0,   (3.6)

which is an LMI in P_{θ_0}^{-1}. The constraint (3.3) is a special case of the more general weighted trace constraint that will be considered in the next subsection. Notice that all these quality constraints depend on the true parameter vector θ_0. Many solutions have been presented in the literature to handle this problem, as will be discussed afterwards.

3.2.2 Quality constraint based on the model variance

Consider the quality constraint based on the variance of the frequency response:

\frac{1}{2π} \int_{-π}^{π} F(ω) \, Var\, G(e^{iω}, θ̂_N) \, dω ≤ γ,   (3.7)

where F(ω) is a weighting function. By substituting the variance expression (2.11), this quality constraint can be written as

\frac{1}{N} Tr(W P_{θ_0}) ≤ γ,   (3.8)

where

W = \frac{1}{2π} \int_{-π}^{π} \frac{∂G(e^{iω}, θ)}{∂θ}\Big|_{θ_0} F(ω) \frac{∂G(e^{iω}, θ)}{∂θ}\Big|_{θ_0}^{*} \, dω.   (3.9)

See [12] for details. The following lemma generalizes the previous result; a proof can be found in [6].

Lemma 3.2.1 The problem

Tr(W(ω) P_{θ_0}) ≤ γ ∀ω,   W(ω) = V(ω) V^*(ω) ⪰ 0 ∀ω,   P_{θ_0} ⪰ 0,

can be formulated as an LMI of the form

γ − Tr Z ≥ 0,   \begin{pmatrix} Z & V^*(ω) \\ V(ω) & P_{θ_0}^{-1} \end{pmatrix} ⪰ 0, ∀ω.   (3.10)

Notice that this formulation is convex in the matrices P_{θ_0}^{-1} and Z (see [12], [6]). Notice also that this type of constraint includes the constraint on the trace of the covariance matrix (3.3) as a special case, when W is the identity matrix.

3.2.3 Quality constraint based on the confidence region

In control applications it is often preferable to have frequency by frequency constraints on the estimation error. Consider the measure [6, 12]

Δ(e^{iω}, θ) = \left| \frac{T(e^{iω}) \left( G_0(e^{iω}) − G(e^{iω}, θ) \right)}{G(e^{iω}, θ)} \right|.   (3.11)

The input has to be designed so that

Δ(e^{iω}, θ) ≤ γ, ∀ω, ∀θ ∈ U_θ.   (3.12)

This constraint can also be formulated as a convex constraint in P_{θ_0}^{-1}, as proven in [22].

3.3 Input spectra parametrization

As discussed in the last section, the typical measures of estimate accuracy are functions of the covariance matrix P_{θ_0}. Expression (2.7), derived in the asymptotic analysis of PEM, shows that the input spectrum Φ_u can be used to optimize the estimation performance. The problem of finding an optimal input spectrum has an infinite number of parameters, since Φ_u(ω) is a continuous function of the frequency ω.

Nevertheless, by a proper spectrum parametrization, it is possible to formulate the problem as a convex and finite dimensional optimization problem, since the parametrization of the input spectrum leads to a parametrization of the inverse of the covariance matrix [6, 12]. A spectrum can always be written in the general form

Φ_u(ω) = \sum_{k=−∞}^{∞} c_k B_k(e^{iω}),   (3.13)

where {B_k(e^{iω})}_{k=−∞}^{∞} are proper stable rational basis functions that span² L_2. It is always possible to choose basis functions having the hermitian properties B_{−k} = B_k^* and B_k(e^{−iω}) = B_k^*(e^{iω}) [6]. The coefficients c_k satisfy the symmetry property c_{−k} = c_k and must be such that

Φ_u(ω) ≥ 0, ∀ω,   (3.14)

otherwise Φ_u would not be a spectrum. For example, the FIR representation of a spectrum is obtained by choosing B_k(e^{iω}) = e^{iωk}; in this case c_k = r_k, where r_k is the correlation function of the process u. By substituting (3.13) into (2.7), the inverse of the covariance matrix becomes an affine function of the coefficients c_k, of the form

P_{θ_0}^{-1} = \sum_{k=−∞}^{∞} c_k Q_k + Q.   (3.15)

This parametrization of the input spectrum leads to a denumerable but still infinite number of parameters in the optimization problem. Two possible spectrum parametrizations that make the problem finitely parametrized are described in the following subsections.

² L_2 denotes the set {f : ∫ |f(x)|² dx < ∞}.

3.3.1 Finite dimensional spectrum parametrization

The finite dimensional spectrum parametrization has the form

Φ_u(ω) = Ψ(e^{iω}) + Ψ^*(e^{iω}),   Ψ(e^{iω}) = \sum_{k=0}^{M−1} c_k B_k(e^{iω}).   (3.16)

This parametrization forces the coefficients c_M, c_{M+1}, ... to be zero. Therefore, the condition Φ_u(ω) ≥ 0 must be assured through the coefficients {c_k}_{k=0}^{M−1}. The following result, which derives from an application of the Positive Real Lemma [23], can be used to assure the constraint (3.14).

Lemma 3.3.1 Let {A, B, C, D} be a controllable state-space realization of Ψ(e^{iω}). Then there exists a matrix Q = Q^T such that

\begin{pmatrix} Q − A^T Q A & −A^T Q B \\ −B^T Q A & −B^T Q B \end{pmatrix} + \begin{pmatrix} 0 & C^T \\ C & D + D^T \end{pmatrix} ⪰ 0   (3.17)

if and only if

Φ_u(ω) = \sum_{k=0}^{M−1} c_k \left[ B_k(e^{iω}) + B_k^*(e^{iω}) \right] ≥ 0, ∀ω.

The state-space realization of the positive real part of the input spectrum can be easily constructed. For example (Example 3.5 in [6]), an FIR spectrum has positive real part given by

Ψ(e^{iω}) = \frac{1}{2} r_0 + \sum_{k=1}^{M−1} r_k e^{iωk}.   (3.18)

A controllable state-space realization of Ψ(e^{iω}) is

A = \begin{pmatrix} O_{1×(M−2)} & 0 \\ I_{M−2} & O_{(M−2)×1} \end{pmatrix},   B = (1, 0, ..., 0)^T,
C = (r_1, r_2, ..., r_{M−1}),   D = \frac{1}{2} r_0.   (3.19)

Therefore, in this example the constraint (3.14) can be written as an LMI in Q and r_1, ..., r_{M−1}.
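A quick numerical check of the realization (3.19) (a sketch with made-up correlation values, not from the thesis): the transfer function of {A, B, C, D} evaluated on the unit circle must reproduce Ψ, so that Φ_u(ω) = 2 Re Ψ(e^{iω}) = r_0 + 2 Σ_k r_k cos(kω).

```python
import numpy as np

def fir_psd_realization(r):
    """Controllable state-space realization (3.19) of the positive real part
    Psi(e^{iw}) = r_0/2 + sum_{k=1}^{M-1} r_k e^{iwk} of an FIR spectrum."""
    M = len(r)
    A = np.zeros((M - 1, M - 1))
    A[1:, :-1] = np.eye(M - 2)              # shift structure of (3.19)
    B = np.zeros(M - 1); B[0] = 1.0
    C = np.asarray(r[1:], dtype=float)
    D = r[0] / 2.0
    return A, B, C, D

r = [1.0, 0.4, 0.1]                          # hypothetical correlation coefficients
A, B, C, D = fir_psd_realization(r)
w = np.linspace(-np.pi, np.pi, 512)
phi_direct = r[0] + 2 * sum(r[k] * np.cos(k * w) for k in range(1, len(r)))
# Psi has positive powers of e^{iw}, so evaluate C (zI - A)^{-1} B + D at z = e^{-iw}
psi = np.array([C @ np.linalg.solve(zi * np.eye(len(r) - 1) - A, B) + D
                for zi in np.exp(-1j * w)])
assert np.allclose(phi_direct, 2 * psi.real)
print(phi_direct.min())                      # > 0: this r defines a valid spectrum
```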

3.3.2 Partial correlation parametrization

The partial correlation parametrization uses the finite expansion

\sum_{k=−(M−1)}^{M−1} c_k B_k(e^{iω})   (3.20)

in order to design only the first M coefficients c_k. In this case it is necessary to assure that there exists a sequence c_M, c_{M+1}, ... such that the complete sequence {c_k}_{k=0}^{∞} defines a spectrum; that is, the condition (3.14) must hold. This means that (3.20) does not necessarily define a spectrum itself, but the designed coefficients are extendable to a sequence that parametrizes a spectrum. As explained in [6], if an FIR spectrum is considered, a necessary and sufficient condition for (3.14) to hold is

\begin{pmatrix} r_0 & r_1 & \cdots & r_{M−1} \\ r_1 & r_0 & \cdots & r_{M−2} \\ \vdots & & \ddots & \vdots \\ r_{M−1} & r_{M−2} & \cdots & r_0 \end{pmatrix} ⪰ 0   (3.21)

(see [24] and [25]). This condition also applies to more general basis functions, like B_k(e^{iω}) = L(e^{iω}) e^{iωk} with L(e^{iω}) > 0 [6, 12]. The constraint (3.21) is an LMI in the first M correlation coefficients and is therefore convex in these variables. Notice that (3.21) is less restrictive than the condition imposed in Lemma 3.3.1 for the finite dimensional parametrization. Furthermore, as will be discussed in the next section, the finite spectrum parametrization makes it possible to handle spectral constraints on the input and output signals, which the partial correlation parametrization cannot handle, since the parametrization (3.20) is not a spectrum. An advantage of the latter parametrization, though, is that it uses the minimum number of free parameters.

3.4 Covariance matrix parametrization

By using one of the input spectrum parametrizations in the expression (2.7), the inverse of the asymptotic covariance matrix can be written as

P_{θ_0}^{-1} = \sum_{k=−(M−1)}^{M−1} c_k Q_k + Q,   (3.22)

where

Q_k = \frac{N}{2πλ_0} \int_{-π}^{π} F_u(e^{iω}, θ_0) B_k(e^{iω}) F_u^*(e^{iω}, θ_0) \, dω

and Q = R_e(θ_0). Then P_{θ_0}^{-1} is expressed as a linear and finitely parametrized function of the coefficients c_0, ..., c_{M−1} (since the symmetry condition c_{−k} = c_k holds). Therefore, any quality constraint that is convex in P_{θ_0}^{-1} is also convex in c_0, ..., c_{M−1}. Some common quality constraints, all of them convex functions of P_{θ_0}^{-1}, were introduced in Section 3.2.
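Putting the pieces together, the following cvxpy sketch illustrates a design problem of this form: it minimizes the bound ε of (3.5)-(3.6) over FIR correlation coefficients, with P_{θ_0}^{-1} affine in the coefficients as in (3.22), the extendability condition (3.21), and a power-type bound on r_0. The matrices Q_k, Q and all constants are hypothetical placeholders; in a real design they would come from the integrals above and would therefore depend on θ_0.

```python
import numpy as np
import cvxpy as cp

n, M = 3, 5                        # parameter dimension, number of lags (illustrative)
rng = np.random.default_rng(1)
Qk = [(lambda F: F @ F.T)(rng.standard_normal((n, n))) for _ in range(M)]
Q = 0.1 * np.eye(n)                # placeholder for the noise term R_e(theta_0)

r = cp.Variable(M)                 # correlation coefficients r_0, ..., r_{M-1}
eps = cp.Variable()                # bound on lambda_max(P_theta0)

# (3.22) with c_k = r_k, c_{-k} = c_k (placeholder Q_k symmetric, so Q_{-k} = Q_k)
Pinv = Q + r[0] * Qk[0] + 2 * sum(r[k] * Qk[k] for k in range(1, M))
toeplitz = cp.bmat([[r[abs(i - j)] for j in range(M)] for i in range(M)])

prob = cp.Problem(cp.Minimize(eps), [
    cp.bmat([[eps * np.eye(n), np.eye(n)],
             [np.eye(n), Pinv]]) >> 0,     # LMI (3.6): lambda_max(P) <= eps
    toeplitz >> 0,                         # (3.21): r extendable to a spectrum
    r[0] <= 1.0,                           # input power bound r_0 <= alpha_u
])
prob.solve()
print(eps.value, r.value)
```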

3.5 Signal constraints parametrization

Constraints on the input spectrum are also considered in input design. Typically, they are frequency by frequency or power constraints. A detailed discussion of the power constraints parametrization is presented in [6]. Briefly, consider power constraints of the type

\frac{1}{2π} \int_{-π}^{π} |W_u(e^{iω})|² Φ_u(ω) \, dω ≤ α_u,   (3.23)

\frac{1}{2π} \int_{-π}^{π} |W_y(e^{iω})|² Φ_y(ω) \, dω ≤ α_y.   (3.24)

By using a finite spectrum parametrization, these constraints can be written as convex finite-dimensional functions of {c_k}_{k=0}^{M−1}. For example, a constraint on the input power for an FIR spectrum becomes r_0 ≤ α_u. For frequency by frequency constraints of the form

β_u(ω) ≤ Φ_u(ω) ≤ γ_u(ω),   (3.25)
β_y(ω) ≤ Φ_y(ω) ≤ γ_y(ω),   (3.26)

Lemma 3.3.1 can be applied to write them as convex constraints in {c_k}_{k=0}^{M−1}, under the condition that the constraining functions are rational [6].

3.6 Limitations of input design in the frequency domain

The previous sections introduced input design problems where constraints on the signal spectra as well as on a measure of the estimate accuracy are considered. It has been shown that they can be formulated as finitely parametrized convex optimization problems, under the condition that the measure of the estimate accuracy is a convex function of P_{θ_0}^{-1} and the input spectrum is parametrized as proposed in Section 3.3. Therefore, by solving a constrained optimization problem, the optimal variables c_0, ..., c_{M−1} are found. The FIR spectrum representation is commonly used, so that the optimization procedure returns the first M terms of the correlation function. If a partial correlation parametrization is used, the optimal spectrum can be found by solving the Yule-Walker equations, as described in [26].

From the optimal spectrum it is then necessary to generate a signal in the time domain to apply to the real system. This is a realization problem that characterizes solutions in the frequency domain. The input can be generated as filtered white noise, by spectral factorization of the optimal spectrum. Nevertheless, many probability distributions can be used to generate white noise. Also, it has to be noticed that in general the optimal spectrum is non-unique, and the input design approach considered so far only finds one of the optimal spectra. In fact, a finite dimensional spectrum parametrization forces the input correlation coefficients r_M, r_{M+1}, ... to be zero; on the other hand, the partial correlation parametrization needs to complete the correlation sequence by solving the Yule-Walker equations, which give only one particular correlation sequence.

Furthermore, in practical applications time domain constraints on the signals have to be considered, and the power constraint that is usually set in the frequency domain does not assure that these constraints are respected. For these reasons, this thesis proposes to analyze the performance of an input design method in the probability domain, as will be presented in the next chapters.

3.7 Conclusions

Classical input design in the frequency domain has been presented. The advantage of this approach is that input design problems can be formulated as convex optimization problems. Some limitations of the method concern time domain constraints on signals and realization techniques.

Chapter 4
Markov Chain Input Model

The drawbacks of input design in the frequency domain, presented in Section 3.6, suggest studying the possibility of a different approach. This chapter will introduce the idea of input design in the probability domain. In particular, the reasons for and advantages of modeling the input signal as a Markov chain will be presented in Section 4.1. In Section 4.2 the Markov chain input model will be described in detail.

4.1 Introduction

What is generally required for system identification in practical applications is an input signal in the time domain that guarantees a sufficiently accurate estimate of the system while respecting some amplitude constraints. As discussed above, input design in the frequency domain does not handle time domain constraints on signals. Another disadvantage is that it does not define how to generate, from the optimal spectrum, the input signal to apply to the real system. Furthermore, input design in the frequency domain does not use the degrees of freedom in the choice of the optimal spectrum, which is generally non-unique. That approach, in fact, only finds one optimal solution that fits the optimal correlation coefficients r_0, ..., r_{M−1}; all the other possible solutions are not considered.

The idea of input design in the probability domain arises from this observation: a solution in the probability domain makes it easier to generate input trajectories to apply to the real system, by extracting samples from the optimal distribution. In this way, no spectral factorization or realization algorithms are required.

Markov chains having a finite state space can then be used to include the time domain amplitude constraints directly in the input model, by suitably choosing the state space. The idea is to use Markov chain distributions to generate binary signals. Binary signals are often used in system identification, one reason being that they achieve the largest power in the set of all signals having the same maximum amplitude, which improves parameter estimation for linear models. Also, if the Markov chain has a spectrum of sufficiently high order (and this depends on the state space dimension), there are more degrees of freedom in the choice of the optimal spectrum when designing the optimal probability distribution.

A finite stationary Markov chain is used as the input model for system identification. The probability distribution will be shaped in order to optimize the objective function defined in the input design problem. This function is minimized with respect to the transition probabilities of the Markov chain, which completely define its distribution [27].

4.2 Markov chains model

This section describes the state space structure of the general Markov chain input model and some of its spectral features.

4.2.1 Markov chain state space

Consider a finite stationary Markov chain having states of the form

(u_{t−n}, u_{t−n+1}, ..., u_t),   (4.1)

where u_i represents the value of the input at time instant i; it can be equal to either u_max or −u_max, where u_max is the maximum tolerable input amplitude, imposed by the real system. This model allows the present value of the input to depend on the last n past values, rather than only on the previous one. Note that at time instant t, the state can transit only to either the state (u_{t−n+1}, u_{t−n+2}, ..., u_t, u_max) or (u_{t−n+1}, u_{t−n+2}, ..., u_t, −u_max), with probabilities p_{(u_{t−n},...,u_t)} and 1 − p_{(u_{t−n},...,u_t)}, respectively. Not all transitions between states are possible; therefore the transition matrix will contain several zeros, corresponding to the forbidden state transitions. The last component of the Markov chain state generates the binary signal to apply to the real system.

Example 4.2.1 Consider a Markov chain having state space S_2 = {1, −1}. The graph representation is shown in Figure 4.1, and the corresponding transition matrix is

Π_2 = \begin{pmatrix} p & 1−p \\ 1−q & q \end{pmatrix}.   (4.2)

Figure 4.1: Graph representation of the two states Markov chain S_2.

This simple model generates a binary signal where each sample at time t depends only on the previous value at time t−1.

Example 4.2.2 A more general model is the four states Markov chain, with state space S_4 = {(1, 1), (1, −1), (−1, −1), (−1, 1)}. The transition matrix is

Π = \begin{pmatrix} p & 1−p & 0 & 0 \\ 0 & 0 & s & 1−s \\ 0 & 0 & q & 1−q \\ r & 1−r & 0 & 0 \end{pmatrix},   (4.3)

and the corresponding graph is shown in Figure 4.2.

Figure 4.2: Graph representation of the four states Markov chain S_4.

Note that when p = r and s = q, the four states Markov chain model is equivalent to the two states Markov chain of Example 4.2.1.
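To make the sampling mechanism concrete, the following Python sketch (illustrative, not from the thesis) draws a binary input of length N from the two states chain of Example 4.2.1:

```python
import numpy as np

def simulate_two_state_input(p, q, N, u_max=1.0, seed=0):
    """Binary input from the two states Markov chain (4.2):
    stay at +u_max with probability p, stay at -u_max with probability q."""
    rng = np.random.default_rng(seed)
    u = np.empty(N)
    state = u_max                        # arbitrary initial condition
    for t in range(N):
        stay = p if state > 0 else q
        if rng.random() >= stay:         # leave the current level
            state = -state
        u[t] = state
    return u

u = simulate_two_state_input(p=0.9, q=0.9, N=10000)
print(u[:20], u.mean())                  # slowly switching signal, mean near zero
```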

These examples show one of the advantages of the proposed Markov chain input model: each model includes all the models of lower dimension as special cases, for proper choices of the transition probabilities.

4.2.2 Markov chains spectra

This section presents a method to calculate the expression of the spectra of the Markov chains. Some examples will illustrate the type of signals these models generate. By means of Markov chain and state-space realization theory (see [27] and [28]), it is possible to derive a general expression for the spectrum of a finite stationary Markov chain s_m having state space S = {S_1, S_2, ..., S_J}. For the general Markov chains considered in the previous section, each state has the form (4.1) and the number of states is J = 2^{n+1}. Let Π denote the transition matrix, whose elements are the conditional probabilities Π(i, j) = P{s_{m+1} = S_j | s_m = S_i}, and p = (p_1, ..., p_J) the solution of the linear system p = pΠ, containing the stationary probabilities p_i = P{s_m = S_i}. Consider the states S_i as column vectors. Defining

A_s = (S_1 \cdots S_J),   D_s = diag(p_1, ..., p_J),

it is possible to write the correlation coefficients of the output signal in matrix form:

r_k = A_s D_s Π^k A_s^T,   k = 0, 1, 2, ...   (4.4)

For k < 0 the correlation can be obtained from the symmetry condition r_{−k} = r_k, since the process s_m is real. To calculate the spectrum of s_m as the Fourier transform of the correlation function, note that r_k can be viewed as the impulse response, for k = 1, 2, ..., of the linear system

x_{k+1} = Π x_k + Π A_s^T u_k,
y_k = A_s D_s x_k.   (4.5)

Therefore, the transfer function W(z) = A_s D_s (zI − Π)^{-1} Π A_s^T of the system (4.5) is the Z-transform of the causal part of the correlation function, that is, of {r_k}_{k=1}^{∞}. Consequently, W(z^{-1}) is the Z-transform of the anticausal part of the correlation function, {r_k}_{k=−∞}^{−1}. The spectrum of the Markov chain signal s_m can then be expressed as

Φ_s(z) = W(z) + r_0 + W(z^{-1}).   (4.6)
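For a zero-mean chain, (4.4) and (4.6) can be evaluated numerically as in the following sketch (illustrative; the truncation length K and the example probabilities are arbitrary choices):

```python
import numpy as np

def markov_spectrum(Pi, states, K=200, n_freq=512):
    """Correlations (4.4) and spectrum of a finite stationary Markov chain
    with transition matrix Pi and scalar state values `states`."""
    w, V = np.linalg.eig(Pi.T)                   # stationary distribution:
    pi = np.real(V[:, np.argmax(np.real(w))])    # left eigenvector for eigenvalue 1
    pi /= pi.sum()
    As = np.asarray(states, dtype=float)
    Ds = np.diag(pi)
    r = np.array([As @ Ds @ np.linalg.matrix_power(Pi, k) @ As
                  for k in range(K)])            # r_k = A_s D_s Pi^k A_s^T
    omega = np.linspace(0, np.pi, n_freq)
    # Phi(w) = r_0 + 2 sum_{k>=1} r_k cos(k w), a truncation of (4.6)
    phi = r[0] + 2 * np.sum(r[1:, None] * np.cos(np.outer(np.arange(1, K), omega)),
                            axis=0)
    return omega, phi

p = q = 0.9                                      # alpha = 0.8: a low-pass binary input
Pi = np.array([[p, 1 - p], [1 - q, q]])
omega, phi = markov_spectrum(Pi, [1.0, -1.0])
print(phi[0])                                    # ~ (1 + alpha)/(1 - alpha) = 9 at w = 0
```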

The correlation r_k is in general a matrix-valued function, i.e. for each k, r_k is a matrix. The correlation function of the input signal is given by the sequence obtained for the element in the last position, {r_k(n, n)}_{k=0}^{∞}, and the input signal spectrum is therefore given by the corresponding element of the matrix Φ_s(z). Consider now the two Markov chains of Examples 4.2.1 and 4.2.2 of the previous section.

Example 4.2.3 By calculating the transfer function W(z) and the autocorrelation r_0, the following expression for the spectrum of the Markov chain of Figure 4.1 is obtained:

Φ_s(z) = \frac{(1 + α)(1 − γ)}{(z − α)(z^{-1} − α)},   (4.7)

where α = p + q − 1 ∈ [−1, 1] and

γ = \frac{(p + q − 1)(p + q − 2) − (p − q)²}{p + q − 2}.

Notice that (4.7) is the spectrum of a first order AR process. This also means that it is possible to generate any first order AR process through a two states Markov chain. The mean value of s_m is

E[s_m] = \frac{p − q}{2 − p − q}.

By forcing p = q the mean value would be zero, and since in that case γ = α, the spectrum would depend on only one parameter. If p = q, the variance of the Markov chain is 1.

Example 4.2.4 Consider again Example 4.2.2 of the previous section, where the mean value of the Markov chain is set to zero. Then

s = \frac{(1 − q) r}{1 − p}

and the stationary probabilities are

p = \frac{1}{2(1 − p)(1 − q) + 2r(1 − q)} \left( r(1 − q), (1 − p)(1 − q), r(1 − q), (1 − p)(1 − q) \right)^T.

By definition, A_s = \begin{pmatrix} 1 & 1 & −1 & −1 \\ 1 & −1 & −1 & 1 \end{pmatrix}. However, only the second component of the Markov chain is of interest, since it will represent the input signal.

Therefore, the correlation r_k of the input process can be calculated using the vector Ā_s = (1, −1, −1, 1) instead of A_s. The analytic calculation of the spectrum turns out to be too involved, so it has only been evaluated numerically. Some values of the poles and zeros of the canonical spectral factor¹ are reported in Table 4.1. These data show that it is not possible to model s_m by an AR process, because in general there are also non-null zeros in the spectrum. The four states Markov chain has a higher order spectrum than the two states Markov chain of Example 4.2.3; the number of poles and zeros depends on the values of the probabilities p, r and q and can be up to eight. For some values of the probabilities p, r and q there are zero-pole cancellations that reduce the spectrum order; in particular, when p = q = r the spectrum has the same simple structure obtained in the previous case (see Table 4.1), as has already been shown in the previous section.

A particular choice of the transition matrix for this Markov chain is

Π_4 = \begin{pmatrix} p & 1−p & 0 & 0 \\ 0 & 0 & r & 1−r \\ 0 & 0 & p & 1−p \\ r & 1−r & 0 & 0 \end{pmatrix}.   (4.8)

This choice of the transition matrix makes the Markov chain symmetric, in the sense that the transition probabilities are invariant with respect to exchanges in the sign of the state components.

Even if the input is designed in the probability domain, the spectrum is shaped by the choice of n and of the transition probabilities of the Markov chain. Notice that Φ_u is only subject to (4.6), and no other structural constraints are imposed, except for the constraint that the transition probabilities have to lie in [0, 1]. For the spectrum parametrizations described in Section 3.3 this is not the case. For example, if the FIR representation of the spectrum is used, the finite dimensional parametrization forces the positive real part of the spectrum to be an FIR system. The Markov chain signals have a spectrum where poles and zeros are related to each other, since the number of free parameters in the problem is J (the transition probabilities), which equals the number of poles of W(z); however, the positive real part of the spectrum is not forced to be FIR. From these observations it is possible to conclude that, viewed from the frequency domain, input design using Markov chains preserves more degrees of freedom in the choice of the optimal spectrum than the input design approach described in Chapter 3, since fewer structural constraints are imposed.

¹ By canonical spectral factor is meant the analytic and minimum phase function L(z) such that Φ_s(z) = L(z) L(z^{-1}).

p    q    r    poles                          zeros
0.2  0.2  0.2  −0.6000                        0
0.8  0.2  0.2  0.3557 ± 0.6161i, 0.7114       0, 0, 0
0.2  0.8  0.2  0.7746, −0.7746                0, 0.4514
0.2  0.2  0.8  0.7746i, −0.7746i              0, 0
0.8  0.2  0.8  0.7746i, −0.7746i              0, 0
0.8  0.8  0.2  0.7746, −0.7746                0, 0
0.2  0.8  0.8  0.3557 ± 0.6161i, 0.7114       0, 0, 0
0.8  0.8  0.8  0.6000                         0
0.3  0.5  0.8  0.0302 ± 0.5049i, 0.1396       0, 0, 0.3008
0.2  0.5  0.8  0.1500 ± 0.5268i               0, 0.3333

Table 4.1: Poles and zeros of the canonical spectral factor of the spectrum of s_m of Example 4.2.4.

In order to optimize an objective function for input design purposes by shaping the input process distribution, it is necessary to define the Markov chain state space and its transition probabilities. That is, once the input model structure is defined, the objective function is optimized with respect to the transition probabilities. For the examples considered in this section, the purpose of the input design problem would be to optimize the objective function J(u, θ_0) with respect to the transition probabilities: p in the first case, p and r in the second.

4.3 More general Markov chains

The input signals generated by the Markov chains described in the previous sections are binary signals. It is also possible to extend this type of Markov model to more general input signals. For example, Markov chains with state space S = {0, 1, −1, 2, −2, 3, −3, ...} would generate signals having more than two amplitude levels. As a simple example, consider a three states Markov chain generating a ternary input signal. The Markov chain has state space S_3 = {0, 1, −1} and the transition graph shown in Figure 4.3.

Figure 4.3: Graph representation of the three states Markov chain S_3.

To easily calculate an expression for the spectrum, the mean value of the process is set to zero, so that p = q.

By solving p = pΠ, the vector

p = \left( \frac{1 − r}{3 − 2r − p}, \frac{1 − p}{3 − 2r − p}, \frac{1 − r}{3 − 2r − p} \right)

is found. The resulting spectrum is

Φ_s(z) = \frac{3(1 − r)(1 − p)(1 + 3p)}{2(3 − 2r − p) \left( z − \frac{3p − 1}{2} \right) \left( z^{-1} − \frac{3p − 1}{2} \right)}.

This spectrum has the same structure as the one found for the two states Markov chain in Example 4.2.3. In this case Φ_s depends on both r and p: p determines the pole and r the gain of the spectrum. The expression found for Φ_s(z) shows that this input model does not provide a higher order spectrum than the two states Markov chain; in this case it is then preferable to use the simpler model. This work focuses on input models that generate binary signals; the input models described in this section will not be considered further.

4.4 Conclusions

This chapter defined finite stationary Markov chains generating binary signals. These processes have been introduced to model input signals for system identification purposes. The main advantages of using Markov chains as input models are that amplitude constraints are directly included in the input model, and that the input signal can be easily generated from the optimal distribution, avoiding the realization problem. Spectral properties of these processes have also been analyzed through some examples in Section 4.2.2.

Chapter 5

Estimation Using Markov Chains

This chapter proposes a method for parameter estimation of an LTI SISO system using the Markov chain input model presented in Chapter 4. Section 5.1 defines the input design problem that will be studied here. The solution approach is presented in Section 5.2. The design method is described in detail in Sections 5.3 and 5.4. A numerical example, analyzed in Section 5.5, concludes the chapter.

5.1 Problem formulation

Consider the system (2.2) and the parametric model (2.1) defined in Section 2.1. In this chapter, the objective of the identification procedure is the estimation of the parameter vector $\theta$. The input design problem considered in this study is to minimize a measure of the estimation error, $f(P_{\theta_0})$, where $f$ is a convex function of the covariance matrix of the parameter estimate. In practice, it is often also necessary to take into account constraints on the physical signals. In that case, a general cost function can be considered:

$$J(u, \theta_0) = f(P_{\theta_0}(u)) + g, \qquad (5.1)$$

where $g$ is a term which represents the cost of the experiment. As explained in Chapter 3, typical functions $f$ are the trace, the determinant or the largest eigenvalue of the covariance matrix $P_{\theta_0}$ [1]; a small sketch of these criteria is given below. This problem formulation is slightly different from the one presented in Chapter 3: no constraints are set explicitly in the optimization problem, but they are included in the cost function through the term $g$.
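As an illustration of these criteria, the following sketch implements the three scalar measures named above (the function names are ours, not from this work; the log-determinant is a numerically convenient stand-in for the determinant):

```python
import numpy as np

# Typical convex scalar measures of the covariance matrix P, matching the
# trace / determinant / largest eigenvalue mentioned above.
def f_trace(P):
    return np.trace(P)                  # A-optimality

def f_logdet(P):
    return np.linalg.slogdet(P)[1]      # D-optimality (log det, for numerics)

def f_max_eig(P):
    return np.linalg.eigvalsh(P).max()  # E-optimality

# Composite cost in the spirit of (5.1), with g a user-supplied
# experiment-cost term (here just a placeholder constant).
def J(P, g=0.0):
    return f_trace(P) + g
```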

The reason for this is that a stochastic approximation framework is used to minimize the objective function, since no analytic convex formulation of the problem is available. This can be seen as a classical multiobjective optimization approach (an overview can be found in [29] and [30]).

5.2 Solution approach

Since the analytic expression for the covariance matrix $P_{\theta_0}$ as a function of the transition probabilities of the Markov chain modeling the input is quite involved, simulation techniques are required to evaluate the cost function. The estimate of $P_{\theta_0}$ is a stochastic function of the one-step transition probabilities and is contaminated with noise; therefore, it can only be evaluated through randomly generated input and noise signals. In this framework, stochastic approximation algorithms are needed to minimize the cost function (5.1) with respect to the transition probabilities of the Markov chain (see [15], [16] for details). An expression for $P_{\theta_0}$ that suits the stochastic approximation approach is derived in Section 5.3; $P_{\theta_0}$ is estimated as a function of input and noise data. The stochastic algorithm used to minimize the cost function (5.1) is described in Section 5.4.

5.3 Cost function evaluation

From the model expression (2.1) it is possible to write

$$e(t) = H(q, \theta)^{-1} \left( y(t) - G(q, \theta) u(t) \right)$$

and, by linearizing the functions $G(q, \theta)$ and $H(q, \theta)$ at $\theta = \theta_0$,

$$G(q, \theta) \approx G_0(q) + \Delta G(q, \theta), \qquad H(q, \theta) \approx H_0(q) + \Delta H(q, \theta),$$

where $\Delta G(q, \theta) = (\theta - \theta_0)^T \left. \frac{\partial G(q, \theta)}{\partial \theta} \right|_{\theta_0}$ and $\Delta H(q, \theta) = (\theta - \theta_0)^T \left. \frac{\partial H(q, \theta)}{\partial \theta} \right|_{\theta_0}$, the following expression is derived:

$$e(t) = \left( H_0(q) + \Delta H(q, \theta) \right)^{-1} \left( H_0(q) e_0(t) - \Delta G(q, \theta) u(t) \right). \qquad (5.2)$$

By substituting the Taylor expansion

$$\left( H_0(q) + \Delta H(q, \theta) \right)^{-1} \approx \frac{1}{H_0(q)} - \frac{\Delta H(q, \theta)}{H_0(q)^2}$$

and the expressions of $\Delta G(q, \theta)$ and $\Delta H(q, \theta)$, it results in

$$e(t) \approx e_0(t) - (\theta - \theta_0)^T \left( \frac{1}{H_0(q)} \left. \frac{\partial H(q, \theta)}{\partial \theta} \right|_{\theta_0} e_0(t) + \frac{1}{H_0(q)} \left. \frac{\partial G(q, \theta)}{\partial \theta} \right|_{\theta_0} u(t) \right). \qquad (5.3)$$

The problem of estimating the parameter $\theta$ for the model (2.1) is asymptotically equivalent to solving the least squares problem for (5.3), where $e_0$ and $u$ are known, when the number of data points $N$ used for estimation goes to infinity. Therefore, the asymptotic expression (2.7) can be approximated as

$$P_{\theta_0}^{-1} = \frac{1}{\lambda_0} \left( S^T S \right),$$

where $S = \begin{pmatrix} w_1 & \ldots & w_b \end{pmatrix} \in \mathbb{R}^{N \times b}$ and $w_i \in \mathbb{R}^{N \times 1}$ is the sequence obtained from

$$w_{it} = \frac{1}{H_0(q)} \left. \frac{\partial G(q, \theta)}{\partial \theta_i} \right|_{\theta_0} u(t) + \frac{1}{H_0(q)} \left. \frac{\partial H(q, \theta)}{\partial \theta_i} \right|_{\theta_0} e_0(t).$$

Therefore, at each iteration of the algorithm, the cost function is evaluated using randomly generated input and noise signals.
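A sketch of this evaluation is given below, assuming the derivative filters $\partial G / \partial \theta_i$ and $\partial H / \partial \theta_i$ at $\theta_0$ are available as rational-filter coefficient pairs; all names are illustrative, not from this work:

```python
import numpy as np
from scipy.signal import lfilter

def covariance_estimate(u, e0, dG_filters, dH_filters, H0, lam0):
    """Estimate P_theta0 from one realization of u and e0.

    dG_filters, dH_filters : per-parameter (b, a) coefficient pairs for the
                             derivative filters dG/d(theta_i), dH/d(theta_i)
                             evaluated at theta_0 (assumed available)
    H0                     : (b, a) coefficients of the true noise filter
    lam0                   : variance of the innovation e0
    """
    N, b = len(u), len(dG_filters)
    S = np.empty((N, b))
    for i in range(b):
        gu = lfilter(*dG_filters[i], u)     # (dG/dtheta_i) u
        he = lfilter(*dH_filters[i], e0)    # (dH/dtheta_i) e0
        # w_i = H0^{-1} (dG_i u + dH_i e0): invert minimum-phase H0 by
        # filtering with its coefficients swapped
        S[:, i] = lfilter(H0[1], H0[0], gu + he)
    P_inv = (S.T @ S) / lam0
    return np.linalg.inv(P_inv)
```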

5.4 Algorithm description

When evaluating the cost function by simulation, it is necessary to consider that the cost function estimate is a stochastic variable that depends on the transition probabilities of the Markov chain and on the noise process $e(t)$. Therefore, the cost function values generated through simulation have to be considered as samples of that stochastic variable; the true value of the cost function for a given transition probability is the mean of that stochastic variable. For these reasons, stochastic approximation is necessary in order to minimize the cost function with respect to the transition probabilities of the Markov chain describing the input.

One of the most common stochastic approximation methods that do not require knowledge of the cost function gradient is finite difference stochastic approximation (FDSA) [15]. It uses the recursion

$$\hat{p}_{k+1} = \hat{p}_k - a_k \hat{J}_k, \qquad (5.4)$$

where $\hat{J}_k$ is an estimate of the gradient of $J$ at the $k$-th step and $a_k$ is a sequence such that $\lim_{k \to \infty} a_k = 0$. The FDSA estimates the gradient of the cost function as

$$\hat{J}_{ki} = \frac{J(\hat{p}_k + c_k e_i) - J(\hat{p}_k - c_k e_i)}{2 c_k},$$

where $e_i$ denotes the unit vector in the $i$-th direction, $\hat{J}_{ki}$ is the $i$-th component of the gradient vector and $c_k$ is a sequence of coefficients converging to zero as $k \to \infty$.

Depending on the number $d$ of parameters with respect to which the cost function is minimized, simultaneous perturbation stochastic approximation (SPSA) may be more efficient than FDSA [31]: when $d$ increases, the number of cost function evaluations in an FDSA procedure may become too large and the algorithm very slow. In that case the SPSA algorithm described in [31] gives better performance, since it requires only two evaluations of the cost function per iteration regardless of $d$. SPSA estimates the gradient by

$$\hat{J}_{ki} = \frac{J(\hat{p}_k + c_k \Delta_k) - J(\hat{p}_k - c_k \Delta_k)}{2 c_k \Delta_{ki}},$$

where $\Delta_k$ is a $d$-dimensional random perturbation vector whose components are independently generated from a Bernoulli $\pm 1$ distribution with probability 0.5 for each outcome [32]. A sketch of both gradient estimators is given below.

The iteration (5.4) is initialized by first evaluating the cost function on a discrete set of points and choosing the minimum in that set. At each point in this set, the cost function is evaluated only once; the value obtained is therefore a sample drawn from the stochastic variable describing the cost function at that point. It could thus turn out that the initial condition is not close to the true minimum of the cost function, due to noise in the measurements. Nevertheless, in some cases the result of the initialization procedure may be sufficiently accurate, so there may be no need to run many algorithm iterations. This of course depends on the shape of the cost function and on the choice of the grid of points.

The sequences $a_k$ and $c_k$ can be chosen as $a_k = \frac{a}{A + k + 1}$ and $c_k = \frac{c}{(k+1)^{1/3}}$, which are asymptotically optimal for the FDSA algorithm (see [15]). A method for choosing $A$, $a$ and $c$ is to estimate the gradient of the cost function at the initial condition, so that the product $a_0 \hat{J}_0$ has magnitude approximately equal to the expected changes among the elements of $\hat{p}_k$ in the early iterations [32]. The coefficient $c$ (as suggested in [32]) ought to be greater than the variance of the noise in the cost function measurements, in order to obtain a good estimate of the gradient. This variance may be estimated at the initial condition of the algorithm. An analytic proof of the algorithm convergence can be found in [15].
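As referenced above, the following sketch implements both gradient estimators and the recursion (5.4) with the gains just described. The default values of $a$, $A$ and $c$ and the projection of $\hat{p}_k$ onto $(0, 1)$ are our own assumptions:

```python
import numpy as np

def spsa_gradient(J, p, ck, rng):
    """SPSA gradient estimate: two cost evaluations regardless of dim(p)."""
    delta = rng.choice([-1.0, 1.0], size=p.shape)   # Bernoulli +/-1 perturbation
    return (J(p + ck * delta) - J(p - ck * delta)) / (2 * ck * delta)

def fdsa_gradient(J, p, ck):
    """FDSA gradient estimate: 2*dim(p) cost evaluations."""
    g = np.empty_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = 1.0
        g[i] = (J(p + ck * e) - J(p - ck * e)) / (2 * ck)
    return g

def minimize_sa(J, p0, n_iter, a=0.1, A=50.0, c=0.1, use_spsa=False, seed=0):
    """Run the recursion (5.4) with gains a_k = a/(A+k+1), c_k = c/(k+1)^(1/3)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    for k in range(n_iter):
        ak = a / (A + k + 1)
        ck = c / (k + 1) ** (1.0 / 3.0)
        g = spsa_gradient(J, p, ck, rng) if use_spsa else fdsa_gradient(J, p, ck)
        p = np.clip(p - ak * g, 1e-3, 1 - 1e-3)  # keep probabilities in (0, 1)
    return p
```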

Figure 5.1: Mass-spring-damper system.

5.5 Numerical example

Consider a mass-spring-damper system (Figure 5.1), where the input $u$ is the force applied to the mass and the output $y$ is the mass position. It is described by the transfer function

$$G_0(s) = \frac{\frac{1}{m}}{s^2 + \frac{c}{m} s + \frac{k}{m}}$$

with $m = 100$ kg, $k = 10$ N/m and $c = 6.3246$ Ns/m, resulting in the natural frequency $\omega_n = 0.3162$ rad/s and the damping ratio $\xi = 0.1$. The power is here defined as $pw(t) = u(t)\,\dot{y}(t)$. White noise with variance $\lambda_0 = 0.0001$ is added at the output and an output-error model is used [19]. Data are sampled with $T_s = 1$ s and the number of data points generated is $N = 1000$; a data-generation sketch is given below.

As a measure of the estimate accuracy, the trace of the covariance matrix $P_{\theta_0}$ is used. In order to also consider some practical constraints on the amplitude of the input and output signals and on the maximum and mean input power, a general cost function will be used:

$$J(u, \theta_0) = f_1(\mathrm{Tr}\,P_{\theta_0}(u)) + f_2(u_{max}) + f_3(y_{max}) + f_4(pw_{max}) + f_5(pw_{mean}), \qquad (5.5)$$

where $u_{max}$ and $y_{max}$ are the absolute maximum values of the input and output signals, and $pw_{max}$ and $pw_{mean}$ are the maximum and mean input power. Thresholds for $\mathrm{Tr}\,P_{\theta_0}$, $u_{max}$, $y_{max}$, $pw_{max}$ and $pw_{mean}$ have been set, which define the maximum values allowed for each of these variables.
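A minimal data-generation sketch for this example follows, under stated assumptions: zero-order-hold discretization, a placeholder random binary input in place of the designed one, and a first-difference approximation of $\dot{y}$:

```python
import numpy as np
from scipy.signal import cont2discrete, lfilter

# System and experiment parameters from the example above
m, k, c = 100.0, 10.0, 6.3246
Ts, N, lam0 = 1.0, 1000, 1e-4

# G0(s) = (1/m) / (s^2 + (c/m) s + (k/m)), discretized with zero-order hold
num, den = [1.0 / m], [1.0, c / m, k / m]
numd, dend, _ = cont2discrete((num, den), Ts, method='zoh')

rng = np.random.default_rng(0)
u = np.sign(rng.standard_normal(N))          # placeholder binary input
y = lfilter(numd.ravel(), dend, u) + np.sqrt(lam0) * rng.standard_normal(N)

# Instantaneous input power pw(t) = u(t) * dy/dt, with the derivative
# approximated by a first difference (an assumption of this sketch)
pw = u[1:] * np.diff(y) / Ts
```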

Figure 5.2: Cost functions $f_2$ and $f_3$ in the interval of acceptable values of the variables $u_{max}$ and $y_{max}$, respectively.

Figure 5.3: Cost functions $f_1$, $f_4$, $f_5$ in the interval of acceptable values.

Figures 5.2 and 5.3 show the cost functions $f_2$, $f_3$ and $f_1$, $f_4$, $f_5$, respectively: when the variables $\mathrm{Tr}\,P_{\theta_0}$, $u_{max}$, $y_{max}$, $pw_{max}$ and $pw_{mean}$ reach their maximum acceptable value (100%), the cost is one. Outside the interval of acceptable values, the cost functions continue growing linearly; a sketch of such a penalty is given below.

As input models, the two simple examples of Markov chains introduced in Section 4.2 are considered here. They are described by the transition matrices

$$\Pi_2 = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}, \qquad \Pi_4 = \begin{pmatrix} p & 1-p & 0 & 0 \\ 0 & 0 & r & 1-r \\ 0 & 0 & p & 1-p \\ r & 1-r & 0 & 0 \end{pmatrix}.$$

The set of points used for the algorithm initialization, as explained in Section 5.4, is $\{0.1, 0.2, \ldots, 0.9\}$. In the case analyzed here, since the cost function depends on no more than two parameters, FDSA is used. The algorithm coefficients have been chosen by the method suggested in the previous section. Three cases have been studied:

1. The cost associated to $\mathrm{Tr}\,P_{\theta_0}$ and the costs associated to the physical constraints have comparable values.

2. No power and amplitude constraints are considered.

3. Very strict power constraints are considered.

These cases are summarized in Table 5.1.

| Case | $\mathrm{Tr}\,P_{\theta_0}$ | $u_{max}$ | $y_{max}$ | $pw_{max}$ | $pw_{mean}$ |
|------|-------------------|-----------|-----------|-------------|--------------|
| 1 | $5 \cdot 10^{-6}$ | 1 N | 1 m | 0.3 Nm/s | 0.03 Nm/s |
| 2 | $5 \cdot 10^{-6}$ | Inf N | Inf m | Inf Nm/s | Inf Nm/s |
| 3 | $5 \cdot 10^{-6}$ | 1 N | 1 m | 0.03 Nm/s | 0.003 Nm/s |

Table 5.1: Maximum threshold values in the three analyzed cases.

As a term of comparison for the performance of the Markov input model, a pseudo-random binary signal (PRBS) and white noise with unit variance (the same as the variance of the Markov chains) have been applied as inputs to the system. The results of the simulation runs for the three cases listed above are shown in Tables 5.4, 5.5 and 5.6. The cost function values are estimated by averaging 100 simulation runs using the optimal input found by the algorithm, the PRBS and the white noise inputs.
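As referenced above, here is a sketch of one possible penalty shape consistent with the description: cost one at 100% of the threshold and linear growth beyond it. The exact shape used in Figures 5.2 and 5.3 is not reproduced here, so this is an assumption:

```python
def penalty(value, threshold):
    """Piecewise-linear constraint cost: equals one when the variable hits
    100% of its threshold and keeps growing linearly past it (assumed shape)."""
    return value / threshold   # passes through (0, 0) and (threshold, 1)

# Example with the case-1 thresholds of Table 5.1
u_max, y_max = 0.9, 1.2
g = penalty(u_max, 1.0) + penalty(y_max, 1.0)   # y_max exceeds its bound
```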

Table 5.5, related to case 2, shows the optimal value of the trace of the covariance matrix calculated by solving the LMI formulation of the input design problem in the frequency domain, as explained in [12]. Furthermore, by the method described in [13], a binary signal having the optimal correlation function is generated; the minimum obtained with this input signal is also shown in Table 5.5.

The second case, which is the most standard in input design problems, is analyzed in detail first. Figure 5.4 presents the cost function, estimated on a fine grid of points, as the average of 100 simulations.

Figure 5.4: Estimate of the cost function on a discrete set of points for the two states Markov chain in case 2.

Table 5.2 shows the results of two Monte-Carlo simulations (each consisting of 100 runs), which indicate that the variance of the algorithm output decreases approximately as $1/N_{Iter}$, where $N_{Iter}$ is the number of algorithm iterations; this supports the empirical convergence of the algorithm. A toy verification sketch is given below.

| $N_{Iter}$ | Mean value $\mathrm{E}\,\hat{p}$ | Variance $\mathrm{Var}\,\hat{p}$ |
|------------|----------------------------------|----------------------------------|
| 1000 | 0.8657 | $4.6 \cdot 10^{-4}$ |
| 2000 | 0.8671 | $2.5 \cdot 10^{-4}$ |

Table 5.2: Results of 100 Monte-Carlo simulations of the algorithm with the two states Markov chain.

With 10000 iterations the algorithm produces the results in Figure 5.5.
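As referenced above, the $1/N_{Iter}$ variance behaviour can be checked empirically on a toy problem; the noisy quadratic below stands in for the true cost $J(u, \theta_0)$, and all tuning constants are arbitrary:

```python
import numpy as np

def noisy_cost(p, rng, p_star=0.87, sigma=0.05):
    """Noisy one-dimensional cost with minimum at p_star (illustrative only)."""
    return (p - p_star) ** 2 + sigma * rng.standard_normal()

def run_once(n_iter, rng, p0=0.5, a=0.5, A=50.0, c=0.1):
    """One FDSA run with the gains a_k = a/(A+k+1), c_k = c/(k+1)^(1/3)."""
    p = p0
    for k in range(n_iter):
        ak, ck = a / (A + k + 1), c / (k + 1) ** (1.0 / 3.0)
        g = (noisy_cost(p + ck, rng) - noisy_cost(p - ck, rng)) / (2 * ck)
        p = min(max(p - ak * g, 1e-3), 1 - 1e-3)
    return p

rng = np.random.default_rng(1)
for n_iter in (1000, 2000):
    p_hat = np.array([run_once(n_iter, rng) for _ in range(100)])
    print(n_iter, p_hat.mean(), p_hat.var())   # expect the variance to shrink
```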

The optimality of the probability $\hat{p}$ found by the algorithm has been verified by using the expression of the two states Markov chain spectrum in the asymptotic expression (2.7) and minimizing $\mathrm{Tr}\,P_{\theta_0}$ with respect to $\alpha$; it turns out that the optimal value $\hat{p} = 0.8714$ is very close to the one found by the stochastic algorithm after 30000 iterations, that is $\hat{p} = 0.8712$ (Table 5.3). This confirms that the stochastic algorithm converges to the true optimal value. In practice, it is not necessary to run the algorithm for 30000 iterations, since already at the initial condition the cost function is very close to the minimum, and the variance of the estimate after 10000 iterations is of the order of $10^{-5}$. It has been done here, anyway, to show that the final value obtained is the true optimal one.

Figure 5.5: Estimation of the best transition probability for the two states Markov chain in case 2.

| Case | $S_2$ | $S_4$ |
|------|--------------------|----------------------------------------|
| 1 | $\hat{p} = 0.4720$ | $\hat{p} = 0.6794$, $\hat{r} = 0.4730$ |
| 2 | $\hat{p} = 0.8712$ | $\hat{p} = 0.6445$, $\hat{r} = 0.8494$ |
| 3 | $\hat{p} = 0.1100$ | $\hat{p} = 0.2981$, $\hat{r} = 0.0005$ |

Table 5.3: Optimal values of the transition probabilities in cases 1, 2 and 3, obtained after 30000 algorithm iterations.

Notice from the results in Table 5.5 that the Markov chains give lower values of the trace of $P_{\theta_0}(u)$ than all the other inputs, except the true optimal spectrum.

| | $S_2$ | $S_4$ | PRBS | WN |
|------------------|--------|--------|--------|---------|
| $J(u, \theta_0)$ | 1.2758 | 1.2788 | 1.2564 | 20.1326 |

Table 5.4: Total cost function values obtained with the optimal Markov inputs, a PRBS and white noise in case 1.

| | $S_2$ | $S_4$ | PRBS | WN | BI | Optimum |
|-----------------------------|---------|---------|---------|---------|---------|---------|
| $\mathrm{Tr}\,P_{\theta_0}$ | 1.43e-7 | 1.59e-7 | 4.35e-7 | 4.66e-7 | 2.18e-6 | 2.85e-8 |

Table 5.5: Trace of the covariance matrix obtained with the optimal Markov inputs, a PRBS, white noise, a binary input having the optimal correlation function (BI) and the optimal spectrum in case 2.

| | $S_2$ | $S_4$ | PRBS | WN |
|------------------|-------|-------|--------|--------|
| $J(u, \theta_0)$ | 78.51 | 73.58 | 163.85 | 484.94 |

Table 5.6: Total cost function values obtained with the optimal Markov inputs, a PRBS and white noise in case 3.

The frequencies of the optimal input spectrum for case 2 have been estimated by means of the Multiple Signal Classification (MUSIC) method, described in [26]; a sketch is given below. It results that the optimal input consists of two sinusoids of frequencies 0.3023 rad/s and 0.3571 rad/s, where the main contribution is given by the higher-frequency sinusoid, which has approximately 5.6 times the power of the first component. Note that these frequencies are very close to the natural frequency of the system and to the poles of the Markov chain spectra (Figures 5.10 and 5.11).

Figures 5.6 and 5.7 show the trajectories of the probability estimates obtained for the four states Markov chain in case 2. The empirical speed of convergence is lower than for the two states Markov chain. Nevertheless, the cost function value does not change significantly if the algorithm is stopped after 2000 iterations.

Case 1 analyzes the more practical situation in which amplitude and power constraints on the signals have to be considered. In this case, the cost functions obtained for the two and the four states Markov chains are presented in Figures 5.8 and 5.9. Notice that, despite the presence of noise, the estimated cost function is convex; therefore, the problem has a solution.
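As referenced above, a generic MUSIC sketch follows. It is not necessarily the exact variant of [26]; the window length, frequency grid and peak picking are our choices:

```python
import numpy as np

def music_frequencies(x, n_sinusoids, m=50, grid=2048):
    """Estimate sinusoid frequencies (rad/sample) in x via MUSIC.

    Builds an m x m sample covariance from sliding windows, splits its
    eigenvectors into signal and noise subspaces, and locates the peaks
    of the MUSIC pseudospectrum on a frequency grid.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    # covariance matrix from overlapping length-m snapshots
    X = np.lib.stride_tricks.sliding_window_view(x, m)
    R = X.T @ X / X.shape[0]
    w_eig, V = np.linalg.eigh(R)               # eigenvalues in ascending order
    En = V[:, : m - 2 * n_sinusoids]           # noise subspace (2 dims per real sinusoid)
    omega = np.linspace(0, np.pi, grid)
    a = np.exp(1j * np.outer(np.arange(m), omega))   # steering vectors
    pseudo = 1.0 / np.sum(np.abs(En.conj().T @ a) ** 2, axis=0)
    # pick the n_sinusoids largest local maxima of the pseudospectrum
    peaks = [i for i in range(1, grid - 1)
             if pseudo[i] > pseudo[i - 1] and pseudo[i] > pseudo[i + 1]]
    peaks = sorted(peaks, key=lambda i: pseudo[i], reverse=True)[:n_sinusoids]
    return omega[sorted(peaks)]

# With Ts = 1 s, rad/sample equals rad/s; for the optimal case-2 input one
# would expect peaks near 0.30 and 0.36 rad/s.
```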

Figure 5.6: Estimation of the best transition probability $p$ for the four states Markov chain in case 2.

Figure 5.7: Estimation of the best transition probability $r$ for the four states Markov chain in case 2.

Figure 5.8: Estimate of the cost function on a discrete set of points for the two states Markov chain in case 1.

Figure 5.9: Estimate of the cost function on a discrete set of points for the four states Markov chain in case 1.

Power constraints move the minimum of the cost function to smaller probability values; that is, the transition probabilities cannot be too large, otherwise the input excitation would not respect the power constraints. In case 1, the Markov inputs and the PRBS signal give almost the same cost value