System identification with information theoretic criteria

Centrum voor Wiskunde en Informatica

REPORT RAPPORT

System identification with information theoretic criteria

A.A. Stoorvogel and J.H. van Schuppen

Department of Operations Research, Statistics, and System Theory

BS-R953

Report BS-R953
CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands

CWI is the National Research Institute for Mathematics and Computer Science. CWI is part of the Stichting Mathematisch Centrum (SMC), the Dutch foundation for promotion of mathematics and computer science and their applications. SMC is sponsored by the Netherlands Organization for Scientific Research (NWO). CWI is a member of ERCIM, the European Research Consortium for Informatics and Mathematics.

Copyright © Stichting Mathematisch Centrum
P.O. Box 94079, 1090 GB Amsterdam (NL)
Kruislaan 413, 1098 SJ Amsterdam (NL)

System Identification with Information Theoretic Criteria

A.A. Stoorvogel
Department of Mathematics and Computing Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands

J.H. van Schuppen
CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands

Abstract

Attention is focused in this paper on the approximation problem of system identification with information theoretic criteria. For a class of problems it is shown that the criterion of mutual information rate is identical to the criterion of exponential-of-quadratic cost and to H∞ entropy. In addition the relation between the likelihood function and divergence is explored. As a consequence of these relations a parameter estimator is derived by four methods for the approximation of a stationary Gaussian process by the output of a Gaussian system. The unified framework for approximation problems of system identification formulated in this paper seems extremely useful.

AMS Subject Classification (1991): 93B30, 93B36, 93E12, 93E20, 94A17.
Keywords and Phrases: System Identification, Information Theory, Parameter Estimation, Maximum Likelihood Method, Linear-Exponential-Quadratic-Gaussian Control, H∞ Optimal Control.
Note: The text of this report will appear in the proceedings of the NATO Advanced Study Institute `From Identification to Learning' that was held in Como, Italy, in August 1994.

The research of dr. A.A. Stoorvogel has been made possible by a fellowship of the Royal Netherlands Academy of Sciences and Arts. His cooperation with the second author is supported in part by CWI.

1 Introduction

The purpose of this paper is to clarify and to explore the relationship between several approximation criteria in system identification. The original motivation for the research reported on in this paper was to clarify the relation between LEQG optimal stochastic control and H∞ optimal control with an entropy criterion, and to explore the use of this relation in system identification. The investigation

led to a study of information theoretic criteria. The unified framework that appeared seems quite useful for the theory of system identification.

A motivating problem of this paper is the choice of an approximation criterion for system identification. Until recently the approximation criteria used mainly in system identification of stochastic processes were: (i) the likelihood function (maximum likelihood method); (ii) a quadratic cost function (least squares method). For stationary Gaussian processes these criteria are identical. The resulting algorithms for parameter estimation with these criteria are well known.

The approximation criteria considered in this paper are: (i) mutual information rate; (ii) a weighted likelihood function and divergence rate between the measure associated with a simple model and that associated with an element in the model class. In the following sections these criteria are discussed in detail. It will be shown that the criterion of mutual information rate is related to that of the LEQG optimal stochastic control problem and to that of an H∞ optimal control problem with an entropy criterion. Mutual information is related to the information theoretic concept of divergence. The divergence concept is the same as the Kullback-Leibler pseudo-distance, which in turn is related to the likelihood function.

The discussion of the approximation criteria is rather general. Thus the discussion is formulated in terms of stationary Gaussian processes but can easily be extended to other classes of processes. In summary, in this paper several approximation criteria for system identification of stationary Gaussian processes will be introduced and their relationship will be discussed. Subsequently several parameter estimation problems are posed and solved.

The character of this paper is expository. The paper is primarily addressed to doctoral students in engineering, mathematics, and econometrics. The reader is assumed to be familiar with system and control theory, and with probability at a first year graduate level. The paper contains many definitions and elementary results that are known although not available in a concise and compact publication. The body of the paper contains the main results while the appendices contain details on concepts and theorems. A brief conference version of this paper appeared as [68].

A description of the contents by section follows. Section 2 contains a brief problem formulation. Parameter estimation by the approximation criterion of mutual information is the subject of Section 3 and by the approximation criterion of the likelihood function and divergence is the subject of Section 4. Section 5 presents concluding remarks. Appendix A introduces concepts from probability theory and stochastic processes and Appendix B concepts from system theory. Appendix C contains definitions from information theory, Appendix D formulas of information measures for Gaussian random variables, and Appendix E formulas of information measures for stationary Gaussian processes. Appendix F summarizes results for the LEQG optimal stochastic control problem and Appendix G the H∞ optimal control problem with an entropy criterion.

2 Problem formulation

System identification is a topic of the research area of systems and control. The problem of system identification is to obtain from data a mathematical model in the form of a dynamic system for a phenomenon. Examples of applications are system identification of a wind turbine, of a gas boiler, and of the flow of benzo-a-pyrene through the human body.

System identification of a phenomenon often proceeds by the following procedure:
(i) Physical modeling and selection of a model class of dynamic systems.
(ii) Experiment design, data collection, and preprocessing of the data.
(iii) Check on the identifiability of the parametrization of the model class.
(iv) Selection of an element in the model class that is an approximation of the data.
(v) Evaluation of the model determined and, possibly, redoing one or more of the previous steps.
In this paper attention is restricted to step (iv) of the above procedure, the selection of an element in the model class that is an approximation of the data.

In this paper attention is restricted to phenomena that, after preliminary physical or domain modeling, can be modeled by a stationary Gaussian process. See the appendices for concepts and terminology used in the body of the paper. It is well known from stochastic realization theory for stationary Gaussian processes that such a process can under certain conditions be represented as the output of a Gaussian system in state space form

  $x(t+1) = Ax(t) + Kv(t)$,  $y(t) = Cx(t) + v(t)$,

where $v : \Omega \times T \to R^p$ is a Gaussian white noise process. Such a process can also be represented by an ARMA (Auto-Regressive-Moving-Average) representation of the form

  $\sum_{i=0}^{n} a_i y(t-i) = \sum_{j=0}^{m} b_j v(t-j)$.

Below attention is sometimes restricted to an AR (Auto-Regressive) representation of the form

  $y(t) + \sum_{i=1}^{n} a_i y(t-i) = v(t)$.   (2.1)

With the notation

  $\theta = (-a_1, \ldots, -a_n)^T$,  $\varphi(t) = (y(t-1), \ldots, y(t-n))^T$,

this representation may be rewritten as

  $y(t) = \varphi(t)^T \theta + v(t)$.   (2.2)
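The regression form (2.2) is the representation used in the parameter estimation problems below. As a minimal illustration (the model coefficients, noise, and sample size are arbitrary assumptions, not taken from the paper), the following sketch simulates data from an AR(2) model, stacks the regressors $\varphi(t)$, and recovers $\theta$ by ordinary least squares.

```python
import numpy as np

# Minimal illustration of the regression form (2.2): simulate an assumed AR(2)
# model y(t) = phi(t)^T theta + v(t), phi(t) = (y(t-1), y(t-2))^T, and recover
# theta by ordinary least squares.  All numerical values are arbitrary.
rng = np.random.default_rng(0)
theta_true = np.array([-0.5, 0.3])            # theta = (-a1, -a2)^T
T = 500
y = np.zeros(T)
for t in range(2, T):
    phi_t = np.array([y[t - 1], y[t - 2]])
    y[t] = phi_t @ theta_true + rng.standard_normal()

# Stack the regressions y(t) = phi(t)^T theta + v(t) for t = 2, ..., T-1.
Phi = np.column_stack([y[1:T - 1], y[0:T - 2]])       # rows are phi(t)^T
theta_ls = np.linalg.lstsq(Phi, y[2:], rcond=None)[0]
print(theta_true, theta_ls)                           # estimates should be close
```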

Problem 2.1 Parameter estimation problem for time-invariant Gaussian systems. Consider observations of a stationary Gaussian process with values in $R^p$. Consider the model class of time-invariant Gaussian systems

  $x(t+1) = Ax(t) + Kv(t)$,  $y(t) = Cx(t) + v(t)$,   (2.3)

or, properly restricted, in AR representation,

  $y(t) = \varphi(t)^T \theta + v(t)$.   (2.4)

Select an element in the model class that approximates the observations by one or more of the approximation criteria mentioned in the following sections.

3 Approximation with mutual information

Successively this section presents the topics of mutual information, a parameter estimation problem, the relation between mutual information, LEQG, and H∞ entropy, parameter estimation by LEQG optimal stochastic control, and parameter estimation by the H∞ method.

3.1 Mutual information

There follows an introduction to the concept of mutual information. For a detailed treatment and references see the Appendices C, D, and E. The mutual information of two discrete-valued random variables $x : \Omega \to X = \{a_1, \ldots, a_n\}$ and $y : \Omega \to Y = \{b_1, \ldots, b_m\}$ is defined by the formula

  $J(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{m} p_{xy}(a_i, b_j) \ln \frac{p_{xy}(a_i, b_j)}{p_x(a_i) p_y(b_j)}$,   (3.1)

where

  $p_{xy}(a_i, b_j) = P(\{x = a_i, y = b_j\})$,  $p_x(a_i) = P(\{x = a_i\})$,  $p_y(b_j) = P(\{y = b_j\})$.

The mutual information of two random variables describes the dependence between the associated probability measures. In general $0 \le J(x, y) \le +\infty$ and $J(x, y) = 0$ if and only if $x$ and $y$ are independent. For random variables with continuous values the mutual information is defined as the limit of the mutual information of discrete-valued random variables, in which the discrete random variables are required to approximate the continuous-valued random variables. In case the random variables $x, y$ are jointly Gaussian random variables with a strictly positive definite variance, or $(x, y) \in G(0, Q)$ with $Q = Q^T > 0$ where

  $Q = \begin{pmatrix} Q_x & Q_{xy} \\ Q_{xy}^T & Q_y \end{pmatrix}$,

then

  $J(x, y) = -\frac{1}{2} \ln \frac{\det Q}{\det Q_x \det Q_y}$.   (3.2)

For stationary processes the corresponding concept is the mutual information rate. Let $x : \Omega \times T \to R^m$, $y : \Omega \times T \to R^p$ be stationary stochastic processes on $T = Z$. The mutual information rate of $x$ and $y$ is defined by the formula

  $J(x, y) = \limsup_{n \to \infty} \frac{1}{2n+1} J\left( x|_{[-n,n]}, y|_{[-n,n]} \right)$,   (3.3)

where $x|_{[-n,n]}$ denotes the random variable defined by the restriction of the process $x$ to the interval $\{-n, \ldots, -1, 0, 1, \ldots, n\}$.

Let, for $T = Z$, $x : \Omega \times T \to R^m$ and $y : \Omega \times T \to R^p$ be two jointly stationary Gaussian processes that admit a spectral density. Denote their joint spectral density function by

  $\hat W : C \to C^{(m+p) \times (m+p)}$,  $\hat W = \begin{pmatrix} \hat W_x & \hat W_{xy} \\ \hat W_{xy}^T & \hat W_y \end{pmatrix}$.

It follows from [24] that the mutual information of the processes $x, y$ is invariant under scaling by nonsingular linear systems. Let $S_x : C \to C^{m \times m}$ and $S_y : C \to C^{p \times p}$ be minimal and square spectral factors of respectively $\hat W_x$ and $\hat W_y$, hence

  $\hat W_x(z) = S_x(z) S_x(z^{-1})^T$,  $\hat W_y(z) = S_y(z) S_y(z^{-1})^T$.

Define

  $S(z) = S_x(z)^{-1} \hat W_{xy}(z) S_y(z^{-1})^{-T}$.

The mutual information rate of these processes is given by the formula

  $J(x, y) = -\frac{1}{4\pi i} \oint_{D} \frac{1}{z} \ln \det\left[ I - S(z^{-1})^T S(z) \right] dz$.   (3.4)

Assume that the function $S$ admits a realization as a finite-dimensional linear system of the form

  $S(z) = C(zI - A)^{-1} B + D$,  $\mathrm{sp}(A) \subset C^-$,   (3.5)

and that the following set of relations admits a unique solution $Q \in R^{n \times n}$:

  $Q = Q^T \ge 0$,   (3.6)
  $Q = A^T Q A + C^T C + (A^T Q B + C^T D) N^{-1} (B^T Q A + D^T C)$,   (3.7)
  $N = I - D^T D - B^T Q B > 0$,   (3.8)
  $\mathrm{sp}\left(A + B N^{-1} (B^T Q A + D^T C)\right) \subset C^-$.   (3.9)

Then the mutual information rate of the processes $x, y$ is given by the formula

  $J(x, y) = -\frac{1}{2} \ln \det(I - B^T Q B - D^T D)$.   (3.10)
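Formula (3.2) is straightforward to evaluate numerically. The sketch below computes the mutual information of two jointly Gaussian random vectors from the blocks of their joint covariance matrix; the particular blocks are arbitrary illustrative values, not taken from the paper.

```python
import numpy as np

# Sketch of formula (3.2): mutual information of jointly Gaussian vectors (x, y)
# with joint covariance Q = [[Qx, Qxy], [Qxy^T, Qy]] > 0.
def gaussian_mutual_information(Qx, Qy, Qxy):
    Q = np.block([[Qx, Qxy], [Qxy.T, Qy]])
    sign, logdet_Q = np.linalg.slogdet(Q)
    assert sign > 0, "joint covariance must be strictly positive definite"
    return -0.5 * (logdet_Q - np.linalg.slogdet(Qx)[1] - np.linalg.slogdet(Qy)[1])

Qx = np.array([[2.0, 0.3], [0.3, 1.0]])
Qy = np.array([[1.5]])
Qxy = np.array([[0.4], [0.2]])
print(gaussian_mutual_information(Qx, Qy, Qxy))   # equals 0 iff Qxy = 0 (independence)
```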

3.2 A parameter estimation problem

Consider the model class of Gaussian systems in AR representation as stated in Section 2,

  $\theta(t+1) = \theta(t) + v(t)$,  $\theta(0) = \theta_0$,
  $y(t) = \varphi(t)^T \theta(t) + N w(t)$,   (3.11)

with $\theta_0 \in G(m_0, Q_0)$, $v(t) \in G(0, I)$, $w(t) \in G(0, I)$. The problem is to determine a linear recursive parameter estimator, say,

  $\hat\theta(t+1) = \hat\theta(t) + K(t)[y(t) - \varphi(t)^T \hat\theta(t)]$,  $\hat\theta(0) = m_0$,   (3.12)

such that the estimation error

  $e : \Omega \times T \to R^n$,  $e(t) = \theta(t) - \hat\theta(t)$,

is small according to a criterion specified below. The recursion for the estimation error is then

  $e(t+1) = [I - K(t)\varphi(t)^T] e(t) + v(t) - K(t) N w(t)$,  $e(0) = \theta_0 - \hat\theta_0$.

Assume that $e$ is such that there exists a Gaussian random process $r$, which is independent of $e$, such that $z := e + r$ is a standard white noise process, i.e. $z(t) \in G(0, I)$. Our objective is to determine a parameter estimator such that the mutual information rate $J(e, z)$ between $e$ and $z$ is minimized. This implies that the error signal $e$ should be as small as possible. Clearly this measure is not very intuitive but it turns out to be closely related to LEQG and H∞ parameter estimators. In particular, assume $e$ is asymptotically stationary with asymptotic spectral density function $R$. The assumption regarding the existence of $r$ and $z$ implies

  $0 \le R(z) \le I$.

It is then easy to show that the mutual information rate of the processes $z$ and $e$ is given by

  $J(z, e) = -\frac{1}{4\pi i} \oint_{D} \frac{1}{z} \ln \det[I - R(z)]\, dz = -\frac{1}{4\pi i} \oint_{D} \frac{1}{z} \ln \det[I - S(z^{-1})^T S(z)]\, dz$,   (3.13)

where $S$ is a spectral factor of $R$. Expressing the mutual information in terms of the spectral factor $S$ will be useful later on.

Problem 3.1 Consider the parameter estimation setting defined above, with the observation equation and the parameter model as given in (3.11). Determine a parameter estimator of the form (3.12) that minimizes the mutual information rate (3.13) between the processes $z$ and $e$.

Problem 3.1 will not be solved directly but via LEQG optimal stochastic control and via H∞ optimal control theory.

3.3 Relation of mutual information, H∞ entropy, and LEQG cost

An investigation of the authors has established that the criterion of mutual information of two stationary Gaussian processes is identical to the H∞ entropy criterion and to the limit of the cost of a stochastic control problem with an exponential-of-quadratic cost function. This result is first stated as a theorem and then explained.

Theorem 3.2 Consider a finite-dimensional linear system

  $x(t+1) = Ax(t) + Bu(t)$,  $y(t) = Cx(t) + Du(t)$,   (3.14)

that is a minimal realization of its impulse response function and such that $\mathrm{sp}(A) \subset C^-$. Denote the transfer function of this system by $S$,

  $S(z) = C(zI - A)^{-1} B + D$.   (3.15)

Assume that the set of relations (3.6)-(3.9) admits a unique solution $Q \in R^{n \times n}$. Consider the expression

  $-\frac{1}{2} \ln \det(I - B^T Q B - D^T D)$.   (3.16)

a. If $e : \Omega \times T \to R^m$ and $z : \Omega \times T \to R^p$ are stationary Gaussian processes with spectral density

  $\begin{pmatrix} I & S(z^{-1})^T \\ S(z) & I \end{pmatrix}$,   (3.17)

then the mutual information rate of $(e, z)$ is given by the expression (3.16).

b. The H∞ entropy of the system (3.14), defined by

  $J = -\frac{1}{4\pi i} \oint_{D} \frac{1}{z} \ln \det\left[ I - S(z^{-1})^T S(z) \right] dz$,   (3.18)

is equal to the expression (3.16).

c. Let $u : \Omega \times T \to R^m$ be a stationary Gaussian process in (3.14) with $u(t) \in G(0, I)$. Then

  $\lim_{t_1 \to \infty} \frac{1}{t_1} \ln E\left[ \exp\left( \frac{1}{2} \sum_{s=0}^{t_1} y(s)^T y(s) \right) \right]$

is equal to the expression (3.16).

Proof a. See Theorem E.3. b. See Theorem G.1. c. See Proposition F.3. □
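The equality of the entropy integral (3.18) and the closed-form expression (3.16) can be checked numerically on a small example. The sketch below uses an arbitrary stable, strictly contractive single-input single-output system and solves the relation (3.7)-(3.8) by naive fixed-point iteration; both the example system and the use of fixed-point iteration instead of a dedicated Riccati solver are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Example system (an arbitrary stable, strictly contractive SISO choice).
A = np.array([[0.5]]); B = np.array([[0.5]])
C = np.array([[0.5]]); D = np.array([[0.1]])

def S(z):
    # Transfer function (3.15): S(z) = C (zI - A)^{-1} B + D.
    return C @ np.linalg.inv(z * np.eye(A.shape[0]) - A) @ B + D

# Closed form (3.16): solve (3.7)-(3.8) by naive fixed-point iteration from Q = 0.
Q = np.zeros_like(A)
for _ in range(500):
    N = np.eye(B.shape[1]) - D.T @ D - B.T @ Q @ B
    Q = A.T @ Q @ A + C.T @ C + (A.T @ Q @ B + C.T @ D) @ np.linalg.inv(N) @ (B.T @ Q @ A + D.T @ C)
closed_form = -0.5 * np.linalg.slogdet(np.eye(B.shape[1]) - B.T @ Q @ B - D.T @ D)[1]

# Entropy integral (3.18) on the unit circle: with z = e^{i t}, dz/z = i dt, so the
# integral becomes -(1/4pi) * integral over [0, 2pi] of ln det[I - S(e^{-it})^T S(e^{it})] dt.
ts = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
vals = [np.linalg.slogdet(np.eye(D.shape[0]) - S(np.exp(-1j * t)).T @ S(np.exp(1j * t)))[1]
        for t in ts]
entropy_integral = -0.5 * float(np.mean(vals))

print(closed_form, entropy_integral)      # the two numbers should agree closely
```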

The discovery of the relation between the criterion of LEQG and the H∞ entropy has been published by K. Glover and J.C. Doyle [26]. That the H∞ entropy criterion is identical to mutual information is new as far as the authors know. The H∞ entropy criterion for continuous-time systems was introduced by K. Glover and D. Mustafa in [27] and has been used extensively since then in H∞ optimal control theory. In that paper the concept of H∞ entropy is referred to the paper [8] by D.Z. Arov and M.G. Krein. In the latter paper the formula (3.18) is presented with the explanation that this formula `has a definite entropy sense' under reference to a publication by Kolmogorov, Gelfand, and Yaglom. In fact, the formula (3.18) is the mutual information of two stationary Gaussian processes that was apparently first analyzed in [24]. The expression of mutual information provided in [24] is in a geometric formulation from which (3.18) may be derived. The formula for the mutual information in the case of scalar processes is stated in [57]. The authors therefore conclude that H∞ entropy is actually mutual information.

3.4 Parameter estimation with an exponential-of-quadratic cost

A short introduction to the LEQG optimal stochastic control problem follows. A detailed treatment may be found in Appendix F. Consider the Gaussian stochastic control system

  $x(t+1) = Ax(t) + Bu(t) + Mv(t)$,
  $y(t) = C_1 x(t) + D_1 u(t) + N v(t)$,
  $z(t) = C_2 x(t) + D_2 u(t)$,   (3.19)

where $v$ is Gaussian white noise with $v(t) \in G(0, V)$. A control law is a collection of maps $g = \{g_t, t \in T\}$, $g_t : Y^t \times U^t \to U$, such that the input $u(t)$ is given by

  $u(t) = g_t(y(0), \ldots, y(t-1), u(0), \ldots, u(t-1))$.

Let $G$ be the set of control laws. Consider the cost function on a finite horizon $J_{t_1} : G \to R$,

  $J_{t_1}(g) = E\left[ c \exp\left( \frac{1}{2} c \sum_{t=0}^{t_1} z(t)^T z(t) \right) \right]$,   (3.20)

for $c \in R$, $c \neq 0$. The LEQG optimal stochastic control problem is then to determine a control law $g^* \in G$ such that

  $J_{t_1}(g^*) = \inf_{g \in G} J_{t_1}(g)$.   (3.21)

Assume that we apply a controller $g$ to the stochastic system (3.19) such that the closed loop system is linear and time-invariant. We can then determine the limit of the cost function

  $E\left[ c \exp\left( \frac{1}{2} c \sum_{s=0}^{t_1} z(s)^T z(s) \right) \right]$   (3.22)

as $t_1 \to \infty$. In Appendix F it is proven that

  $\lim_{t_1 \to \infty} \frac{1}{t_1} \ln E\left[ c \exp\left( \frac{1}{2} c \sum_{s=0}^{t_1} z(s)^T z(s) \right) \right] = -\frac{1}{2} \ln \det\left( V^{-1} - c M^T Q M - c N^T N \right) + \frac{1}{2} \ln \det V^{-1}$.   (3.23)

In (3.23) $Q$ is the solution of a set of relations similar to (3.6)-(3.9). Next the parameter estimation problem is posed in terms of the LEQG problem.

Problem 3.3 Consider the Gaussian system

  $\theta(t+1) = \theta(t) + v(t)$,  $\theta(0) = \theta_0$,
  $y(t) = \varphi(t)^T \theta(t) + w(t)$,   (3.24)

where $T = \{0, 1, \ldots, t_1\}$, $\theta_0 : \Omega \to R^n$, $\theta_0 \in G(m_0, Q_0)$, $v : \Omega \times T \to R^n$ and $w : \Omega \times T \to R^p$ are Gaussian white noise processes with $v(t) \in G(0, Q_v)$, $w(t) \in G(0, Q_w)$ with $Q_w > 0$, $\theta_0, v, w$ are independent, and $\varphi : \Omega \times T \to R^{n \times p}$ is as specified in (2.2). Determine the process $\hat\theta : \Omega \times T \to R^n$ and a recursion for it such that $\hat\theta(t)$ is $F^y_{t-1}$ measurable for all $t \in T$, and such that the following expression is minimized:

  $E\left[ c \exp\left( \frac{1}{2} c \sum_{s=0}^{t_1} z(s)^T z(s) \right) \right]$,   (3.25)

where $z(t) = C[\theta(t) - \hat\theta(t)]$, $c \in R$, $c \neq 0$.

Theorem 3.4 Consider Problem 3.3. Define the functions $K : T \to R^{n \times p}$, $Q : T \to R^{n \times n}$ by the recursions

  $K(t) = Q(t) \begin{pmatrix} \varphi(t) & C^T \end{pmatrix} S(t)^{-1} \begin{pmatrix} I \\ 0 \end{pmatrix}$,   (3.26)

  $Q(t+1) = Q(t) + Q_v - Q(t) \begin{pmatrix} \varphi(t) & C^T \end{pmatrix} S(t)^{-1} \begin{pmatrix} \varphi(t)^T \\ C \end{pmatrix} Q(t)$,   (3.27)

  $Q(0) = Q_0$,   (3.28)

  $S(t) = \begin{pmatrix} \varphi(t)^T \\ C \end{pmatrix} Q(t) \begin{pmatrix} \varphi(t) & C^T \end{pmatrix} + \begin{pmatrix} Q_w & 0 \\ 0 & -c^{-1} I \end{pmatrix}$.   (3.29)

Assume that for all $t \in T$ we have $Q(t) > 0$. The optimal LEQG parameter estimator is then specified by the recursion

  $\hat\theta(t+1) = \hat\theta(t) + K(t)[y(t) - \varphi(t)^T \hat\theta(t)]$.   (3.30)

If in addition $Q_v = 0$ then

  $Q(t+1)^{-1} = Q(t)^{-1} + \varphi(t) Q_w^{-1} \varphi(t)^T - c\, C^T C$,   (3.31)
  $K(t) = Q(t) \varphi(t) \left( \varphi(t)^T Q(t) \varphi(t) + Q_w \right)^{-1}$.   (3.32)

The details of the proof of Theorem 3.4 will not be provided in this paper because of space limitations. The problem is related to Problem F.1. It differs from that problem in that Problem 3.3 has an output matrix, $\varphi(t)$, that depends on the past observations. Therefore a conditional version of Problem F.1 is obtained. The result of Problem F.1 may be found in [32]. The second author of this paper is preparing a publication that contains the solution to the conditional version of Problem F.1. As pointed out in Appendix F, P. Whittle first solved the discrete time case of Problem F.1 but for a system representation slightly different from that used in this paper.
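For the simplified case $Q_v = 0$ the recursion (3.30)-(3.32) resembles recursive least squares with an additional $-cC^TC$ term in the information update. The following sketch implements it literally on simulated data; the AR(2) data generator, the dimensions, and the small negative value of $c$ (chosen so that $Q(t)$ remains positive definite) are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2, 1
theta = np.array([0.5, -0.3])                 # constant parameter, i.e. Q_v = 0
CC = np.eye(n)                                # C in z(t) = C[theta(t) - theta_hat(t)]
c = -0.05                                     # risk parameter, c != 0 (illustrative)
Q_w = np.eye(p)

theta_hat = np.zeros(n)
Q = np.eye(n)                                 # Q(0) = Q_0
y_hist = np.zeros(n)                          # past outputs used to build phi(t)
for t in range(500):
    phi = y_hist.reshape(n, p)                                    # phi(t), cf. (2.2)
    y = phi.T @ theta + rng.standard_normal(p)                    # y(t) = phi(t)^T theta(t) + w(t)
    K = Q @ phi @ np.linalg.inv(phi.T @ Q @ phi + Q_w)            # gain (3.32)
    theta_hat = theta_hat + K @ (y - phi.T @ theta_hat)           # update (3.30)
    Q = np.linalg.inv(np.linalg.inv(Q) + phi @ np.linalg.inv(Q_w) @ phi.T - c * CC.T @ CC)  # (3.31)
    y_hist = np.concatenate(([y[0]], y_hist[:-1]))                # shift the regressor
print(theta, theta_hat)                       # the estimate should approach theta
```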

3.5 Parameter estimation with H∞ entropy

A short introduction to the H∞ optimal control problem follows. A detailed treatment may be found in Appendix G. Consider the linear control system

  $x(t+1) = Ax(t) + Bu(t) + Mv(t)$,
  $y(t) = C_1 x(t) + D_1 u(t) + N v(t)$,
  $z(t) = C_2 x(t) + D_2 u(t)$,   (3.33)

where $v$ is this time an unknown deterministic signal. As before, a control law is a collection of maps $g = \{g_t, t \in T\}$, $g_t : Y^t \times U^t \to U$, such that the input $u(t)$ is given by

  $u(t) = g_t(y(0), \ldots, y(t-1), u(0), \ldots, u(t-1))$.

Let $G$ be the set of control laws. Consider the cost function $J : G \to R$ where

  $J(g) = \sup_{v, x_0} \frac{\|z\|^2_{2,t_1}}{\|v\|^2_{2,t_1} + x_0^T R^{-1} x_0}$,   (3.34)

  $\|v\|^2_{2,t_1} := \sum_{t=0}^{t_1} v(t)^T v(t)$.   (3.35)

Note that $R$ plays a role similar to the covariance of the initial condition for the LEQG problem. The larger $R$, the more uncertainty we have for the initial condition. If $t_1 = \infty$ then we must constrain $v$ to $\ell_2$, the class of signals for which the infinite sum (3.35) converges. The H∞ optimal control problem is then to determine a control law $g^* \in G$ such that

  $J(g^*) = \inf_{g \in G} J(g)$.   (3.36)

This problem turns out to be very hard. Hence instead we will look for a control law $g$ such that

  $J(g) < c^{-1}$.

This is equivalent to minimizing the following criterion:

  $J_c(g) = \sup_{v, x_0} \left( c \|z\|^2_{2,t_1} - \|v\|^2_{2,t_1} - x_0^T R^{-1} x_0 \right)$.

As a matter of fact $J(g) < c^{-1}$ if $J_c(g) < \infty$. We will actually solve the problem to find $g^*$ such that

  $J_c(g^*) = \inf_{g \in G} J_c(g)$.   (3.37)

If $t_1 = \infty$, $x(0) = 0$ ($R = 0$), and both the system and the controller are time-invariant, then this problem has another interpretation. Let us define

  $J_e(g) = -\frac{1}{4\pi i} \oint_{D} \frac{1}{z} \ln \det\left[ I - c\, S^T(z^{-1}) S(z) \right] dz$,

where $S$ is the closed loop transfer matrix from $v$ to $z$. Then $g^*$ satisfies (3.37) if and only if $J_e(g^*) = \inf_{g \in G} J_e(g)$.

Next the parameter estimation problem is posed in terms of an H∞ control problem.

Problem 3.5 Consider the Gaussian system

  $\theta(t+1) = \theta(t) + v(t)$,  $\theta(0) = \theta_0$,
  $y(t) = \varphi(t)^T \theta(t) + w(t)$,   (3.38)

where $T = \{0, 1, \ldots, t_1\}$ and $\varphi$ is as specified in (2.2). Determine the process $\hat\theta : \Omega \times T \to R^n$ and a recursion for it such that for all $t \in T$ we have that $\hat\theta(t)$ is a function of the past measurements $y(0), \ldots, y(t-1)$, and such that the following expression is minimized:

  $\sup_{w, v, x_0} \left( -x_0^T Q_0^{-1} x_0 + \sum_{t=0}^{t_1} \left[ c\, z(t)^T z(t) - v(t)^T Q_v^{-1} v(t) - w(t)^T Q_w^{-1} w(t) \right] \right)$,   (3.39)

where $z(t) = C[\theta(t) - \hat\theta(t)]$, $c \in R$, $c \neq 0$, $Q_v = Q_v^T > 0$, $Q_w = Q_w^T > 0$.

Proposition 3.6 Consider Problem 3.5. There exists a $\hat\theta$ such that the supremum (3.39) is finite if and only if there exists a matrix $Q$ satisfying (3.27) and (3.28). Moreover, in that case the optimal estimator is given by (3.30).

4 Approximation with likelihood and divergence

In this section the reader is presented successively with text on the approximation with likelihood, divergence, the relation of likelihood and divergence, approximation with divergence, and parameter estimation by divergence minimization.

4.1 Approximation with the likelihood function

In this subsection a particular recursive weighted maximum likelihood problem will be formulated and solved. The maximum likelihood method for parameter estimation has been proposed and analyzed by Sir Ronald Fisher, see [2, 22]. In most lectures and most textbooks the likelihood function is defined by example only. In general the likelihood function may be defined as the Radon-Nikodym derivative of the measure of the observations with respect to the measure with respect to which the observations are independent in discrete time, or with respect to the measure with respect to which the observation process is an independent increment process in continuous time. The likelihood function for a Gaussian or ARMA representation is presented in [33, 63].

Problem 4.1 Consider the parameter estimation model of the first paragraph of Section 3, with the representation

  $\theta(t+1) = \theta(t)$,  $\theta(0) = \theta$,
  $y(t) = \varphi(t)^T \theta(t) + w(t)$.   (4.1)

Let $c \in R$, $c \neq 0$, $\theta \in G(0, Q_0)$, $Q_0 = Q_0^T > 0$. Suppose that at $t \in T$ the parameter estimates $\{\hat\theta(s), s = 0, \ldots, t\}$ have been determined. Choose the next parameter estimate $\hat\theta(t+1) \in R^n$ as the solution of the optimization problem

  $\sup_{\theta \in R^n} \exp\left( \frac{1}{2} c \sum_{s=0}^{t} \|\theta - \hat\theta(s)\|^2 \right) f(y(0), \ldots, y(t); \theta)$,   (4.2)

where $f$ is the joint density function of the observation process $\{y(s), s = 0, \ldots, t\}$ and $y(0), \ldots, y(t)$ are the numerical values of the observations. The density function $f$ depends on the parameter vector $\theta$.

Theorem 4.2 Consider Problem 4.1 with $Q_w = I$. Assume that the model is such that $Q(t)$ is well-defined by

  $Q(t+1)^{-1} = Q(t)^{-1} + \varphi(t) Q_w^{-1} \varphi(t)^T - cI$,   (4.3)

with $Q(0) = Q_0$ and $Q(t) > 0$ for all $t \in T$. Then the optimal estimate at $t \in T$ is given by the recursion

  $\hat\theta(t+1) = \hat\theta(t) + K(t)[y(t) - \varphi(t)^T \hat\theta(t)]$,  $\hat\theta(0) = 0$,   (4.4)

where

  $K(t) = Q(t) \varphi(t) \left[ \varphi(t)^T Q(t) \varphi(t) + I \right]^{-1}$.   (4.5)

Note that the parameter estimator (4.4) is identical to that of Theorem 3.4.

Proof: The logarithm of the criterion is, up to a term that does not depend on $\theta$, equal to

  $2 h(\theta) = -\theta^T Q_0^{-1} \theta - \sum_{s=0}^{t} \|y(s) - \varphi(s)^T \theta\|^2 + c \sum_{s=0}^{t} \|\theta - \hat\theta(s)\|^2$.   (4.6)

By assumption

  $\frac{d^2 h(\theta)}{d\theta^2} = c(t+1) I - Q_0^{-1} - \sum_{s=0}^{t} \varphi(s)\varphi(s)^T = -Q(t+1)^{-1} < 0$.

The first order conditions then yield

  $Q(t+1)^{-1} \theta = \sum_{s=0}^{t} \varphi(s) y(s) - c \sum_{s=0}^{t} \hat\theta(s)$,

and after simple calculations one gets (4.4). □
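For $c \to 0$ the weighting term in (4.2) disappears and the recursion (4.3)-(4.5) reduces to ordinary recursive least squares, which reproduces the batch regularized least-squares estimate exactly. The sketch below checks this on simulated data; the AR(2) data generator and the prior covariance $Q_0$ are illustrative assumptions, and the check itself is not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.array([0.5, -0.3])                 # assumed AR(2) parameter
T = 200
y = np.zeros(T + 2)
for t in range(2, T + 2):
    y[t] = theta @ np.array([y[t - 1], y[t - 2]]) + rng.standard_normal()

Q0 = 10.0 * np.eye(2)                         # prior covariance Q_0 (illustrative)
Q = Q0.copy()
theta_hat = np.zeros(2)
P, b = np.linalg.inv(Q0), np.zeros(2)         # information form of the batch problem
for t in range(2, T + 2):
    phi = np.array([y[t - 1], y[t - 2]])
    e = y[t] - phi @ theta_hat
    K = Q @ phi / (1.0 + phi @ Q @ phi)       # (4.5) with c = 0 and Q_w = I
    theta_hat = theta_hat + K * e             # (4.4)
    Q = np.linalg.inv(np.linalg.inv(Q) + np.outer(phi, phi))   # (4.3) with c = 0
    P += np.outer(phi, phi)
    b += phi * y[t]

print(theta_hat, np.linalg.solve(P, b))       # recursive and batch estimates coincide
```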

4.2 Divergence

The Kullback-Leibler pseudo-distance or divergence is a function that measures the relation between probability measures. For system identification it is an important approximation criterion. There follows a brief introduction to divergence; the full details may be found in the Appendices C, D, and E.

Consider the measurable space $(\Omega, F)$ and two probability measures $P_1, P_2$ defined on it. Let $Q$ be a $\sigma$-finite measure on $(\Omega, F)$ such that $P_1$ and $P_2$ are absolutely continuous with respect to $Q$, say with $r_1 = dP_1/dQ$ and $r_2 = dP_2/dQ$. Such a measure always exists. The divergence of $P_1, P_2$ is defined by the formula

  $D(P_1 \| P_2) = E_Q\left[ r_1 \ln\left( \frac{r_1}{r_2} \right) I_{(r_2 > 0)} \right]$.   (4.7)

It may be shown that the definition of $D(P_1 \| P_2)$ does not depend on the measure $Q$ chosen, that for all $P_1, P_2$ we have $D(P_1 \| P_2) \ge 0$, and that $D(P_1 \| P_2) = 0$ iff $P_1 = P_2$. The triangle inequality is not satisfied by $D$, hence it is not a distance but a pseudo-distance. S. Kullback and R.A. Leibler in [46] introduced this concept, which is therefore also known as the Kullback-Leibler pseudo-distance. The term divergence is used in information theory. Divergence and mutual information for measures induced by two real valued random variables are related by

  $J(x, y) = D(P_{xy} \| P_x \times P_y)$.

Let $G(m_1, Q_1)$ and $G(m_2, Q_2)$ be two Gaussian measures on $R^n$. Assume that $Q_1, Q_2 > 0$. The divergence between these measures is given by the formula (see D.5)

  $2 D\left( G(m_1, Q_1) \| G(m_2, Q_2) \right) = -\ln \frac{\det Q_1}{\det Q_2} + \mathrm{tr}\left( [Q_2^{-1} - Q_1^{-1}] Q_1 \right) + (m_2 - m_1)^T Q_2^{-1} (m_2 - m_1)$.   (4.8)

Consider next two jointly stationary Gaussian processes $x_1, x_2 : \Omega \times T \to R^n$ on $T = Z$. Assume that they have a mean value function that is zero and that the joint processes admit a spectral density. Under additional conditions, see Theorem E.5, the divergence between the measures associated with these processes is given by the formula

  $D(P_1 \| P_2) = -\frac{1}{2} \ln \det(B^T Q B + D^T D) + \frac{1}{2} \mathrm{tr}(B^T R B + D^T D - I)$,   (4.9)

where $Q \in R^{n \times n}$ is the solution of an algebraic Riccati equation and $R \in R^{n \times n}$ is the solution of a Lyapunov equation.
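A small numerical sketch of formula (4.8) follows; the particular means and covariances are arbitrary illustrations, not taken from the paper.

```python
import numpy as np

# Minimal sketch of formula (4.8) for two nondegenerate Gaussian measures on R^n.
def gaussian_divergence(m1, Q1, m2, Q2):
    """D( G(m1, Q1) || G(m2, Q2) ), with Q1 and Q2 strictly positive definite."""
    Q2inv = np.linalg.inv(Q2)
    term_logdet = -np.linalg.slogdet(Q1)[1] + np.linalg.slogdet(Q2)[1]
    term_trace = np.trace((Q2inv - np.linalg.inv(Q1)) @ Q1)
    term_mean = (m2 - m1) @ Q2inv @ (m2 - m1)
    return 0.5 * (term_logdet + term_trace + term_mean)

m1, Q1 = np.array([0.0, 1.0]), np.array([[1.0, 0.2], [0.2, 0.5]])
m2, Q2 = np.array([0.5, 0.5]), np.eye(2)
print(gaussian_divergence(m1, Q1, m2, Q2))     # nonnegative
print(gaussian_divergence(m1, Q1, m1, Q1))     # 0.0, since the measures coincide
```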

4.3 Relation of likelihood function and divergence

The likelihood function as an approximation criterion is related to the approximation criterion of divergence discussed in the previous subsection. The relation between these criteria will be described for Gaussian random variables.

Consider $n$ real valued observations. The model class of the variable considered is the class of Gaussian random variables with values in $R$, say $G(m, q)$, $m \in R$, $q \in R$, $q > 0$. The observations are assumed to be independent. The measure of these observations and their representation are given by $y : \Omega \to R^n$, $y \in G(me, Q)$, where

  $e = (1, \ldots, 1)^T \in R^n$,  $Q = qI$.

The density function of $y$ with respect to Lebesgue measure is

  $p(v; me, Q) = \frac{1}{\sqrt{(2\pi)^n \det Q}} \exp\left( -\frac{1}{2} (v - me)^T Q^{-1} (v - me) \right)$.

The likelihood function $L : R \times R_+ \times R^n \to R_+$ is therefore equal to

  $L(m, q; y) = \frac{1}{\sqrt{(2\pi)^n \det Q}} \exp\left( -\frac{1}{2} (y - me)^T Q^{-1} (y - me) \right)$,

where $y \in R^n$ is the vector with the numerical values of the observations. As is well known, the maximum likelihood estimates of the parameters $m, q$ are given by the formulas

  $\hat m = \frac{1}{n} \sum_{k=1}^{n} y_k$,  $\hat q = \frac{1}{n} \sum_{k=1}^{n} (y_k - \hat m)^2$.

The density function of a Gaussian measure with $\hat m, \hat q$ as parameters is denoted by $p(\cdot; \hat m, \hat q)$. Formula (4.8) for the divergence of two Gaussian measures on $R$ reduces in this case to (assuming $q_1 > 0$ and $q_2 > 0$)

  $2 D\left( p(\cdot; m_1, q_1) \| p(\cdot; m_2, q_2) \right) = -\ln \frac{q_1}{q_2} - 1 + \frac{q_1}{q_2} + \frac{(m_1 - m_2)^2}{q_2}$.

Proposition 4.3 Consider the likelihood function and the divergence for the Gaussian random variables defined above. Then

  $L(m, q; y) = \frac{1}{\sqrt{(2\pi)^n}} \exp\left( -\frac{n}{2} \left[ 2 D\left( G(\hat m, \hat q) \| G(m, q) \right) + 1 + \ln \hat q \right] \right)$.

Proof: From the discussion above it follows that

  $\ln L(m, q; y) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \left[ \ln q + \frac{\bar q}{q} \right]$,

  $2 D\left( G(\hat m, \hat q) \| G(m, q) \right) = -\ln\frac{\hat q}{q} - 1 + \frac{\hat q}{q} + \frac{(\hat m - m)^2}{q}$,

where

  $\bar q = \frac{1}{n} \sum_{k=1}^{n} (y_k - m)^2$.

Note that

  $\bar q = \frac{1}{n} \sum_{k=1}^{n} (y_k - m)^2 = \frac{1}{n} \sum_{k=1}^{n} (y_k - \hat m + \hat m - m)^2 = \hat q + (\hat m - m)^2$,

hence the result. □
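The identity of Proposition 4.3 can be checked numerically. The sketch below does so for simulated scalar observations and one arbitrary candidate model $G(m, q)$; it is an illustrative sanity check, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(1.0, 2.0, size=50)
n = len(y)
m_hat, q_hat = y.mean(), y.var()               # maximum likelihood estimates

def log_likelihood(m, q):
    return -0.5 * n * np.log(2 * np.pi * q) - 0.5 * np.sum((y - m) ** 2) / q

def divergence(m1, q1, m2, q2):                # scalar case of formula (4.8)
    return 0.5 * (-np.log(q1 / q2) - 1.0 + q1 / q2 + (m1 - m2) ** 2 / q2)

m, q = 0.7, 3.0                                # an arbitrary candidate model G(m, q)
lhs = log_likelihood(m, q)
rhs = -0.5 * n * np.log(2 * np.pi) \
      - 0.5 * n * (2 * divergence(m_hat, q_hat, m, q) + 1 + np.log(q_hat))
print(lhs, rhs)                                # the two numbers agree
```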

The conclusion from Proposition 4.3 is that maximizing the likelihood function is equivalent to minimizing the divergence between a measure in the model class and the measure associated with the maximum likelihood estimates. However, note that this fact cannot be used to find the maximum likelihood estimates, since the maximum likelihood estimates $\hat m$ and $\hat q$ appear in the expression. Rather, it shows that the likelihood function is equivalent to the divergence criterion. In the example above both measures appearing in the divergence, $G(m, q)$ associated with the model class and $G(\hat m, \hat q)$ associated with the maximum, are members of the model class. Another way to consider the relation between the likelihood function and divergence has been proposed by H. Akaike in [1].

Does the relation between the likelihood function and divergence extend to other classes of stochastic systems? In the maximum likelihood approach one maximizes the likelihood function with respect to the parameters of the model class. The divergence between the measure associated with the observations and the measure associated with the model class is a pseudo-distance. Therefore the maximum likelihood approach and the minimum divergence approach are related. The exponential transformation between both criteria as in Proposition 4.3 for the Gaussian case may not hold in general, although this relation does hold for a larger class of distributions than the Gaussian one. S.-I. Amari has investigated the geometric structure of exponential families of probability distributions and explored the use of the likelihood function and of divergence for this family, see [4, 6, 5]. The minimum divergence method differs from the maximum likelihood method in that for the first method a measure must be chosen that represents the observations. In this point the maximum likelihood approach requires less.

4.4 Approximation with divergence

For system identification an approximation problem with the divergence criterion may be considered. The relationship between the likelihood function and divergence, see Proposition 4.3, suggests such an approach. In system identification one is given observations. According to the procedure outlined in Section 2 one then selects a model class. In the case considered in this paper this is the class of stationary Gaussian processes which can be represented as a Gaussian system. With the observations one can associate a measure of a stationary Gaussian process. For example, one can take the measure associated with the mean value function of this process to be the zero function and the covariance function to be an estimate, say

  $\hat W(t) = \begin{cases} \frac{1}{t_1 - t} \sum_{s=0}^{t_1 - t - 1} y(s + t) y(s)^T, & t = 0, \ldots, t_2, \\ 0, & \text{else.} \end{cases}$   (4.10)

Another choice is a measure associated with an AR representation of high order.
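The estimate (4.10) is easy to compute from data. The sketch below evaluates it for a simulated scalar process; the AR(1) data generator and the lag horizon are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
t1 = 1000
y = np.zeros(t1)
for t in range(1, t1):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()

def w_hat(tau, t2=50):
    """Estimate (4.10) of W(tau) = E[y(s + tau) y(s)^T] for 0 <= tau <= t2, zero beyond."""
    if 0 <= tau <= t2:
        return np.dot(y[tau:], y[:t1 - tau]) / (t1 - tau)
    return 0.0

print([round(w_hat(k), 3) for k in range(5)])  # decays roughly like 0.8**k / (1 - 0.8**2)
```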

Problem 4.4 Approximation of a stationary Gaussian process by a time-invariant Gaussian system according to the divergence criterion. Consider given a probability measure $P$ on the space of Gaussian processes with values in $R^p$ and zero mean value function. Consider the model class $GS(n)$ of Gaussian systems up to order $n$ and the associated set of probability measures $\{P(\sigma), \sigma \in GS(n)\}$ on the output processes of such systems. Solve the approximation problem

  $\inf_{\sigma \in GS(n)} D\left( P \| P(\sigma) \right)$.   (4.11)

A solution to Problem 4.4, if it exists, is a Gaussian system of which the probability measure on the output process minimizes the divergence to the probability measure associated with the observations. Note that the probability measure associated with the observations may not be in the model class, for example because the covariance estimate is not a rational function. Problem 4.4 is therefore equivalent to the determination of an element in the model class that minimizes the divergence to the measure associated with the observations.

The principle of approximation by the divergence criterion is used in the literature. It has apparently first been proposed by H. Akaike in [1], who explored its use in [2, 3] and who derived the Akaike Information Criterion (AIC) from it. Other publications on this approach are those of I. Csiszar, [4, 5, 7] and, in the case of the ARMA model class, [20, 53].

The following model reduction problem for Gaussian random variables is of interest to system identification. In system identification a technique is used called subspace methods that is closely related to the following problem. It is expected that the solution of this problem will be useful to subspace methods. The problem was first formulated in [69, Subsec. 3.8].

Problem 4.5 Model reduction for a pair of Gaussian random variables. Let $y_1 : \Omega \to R^{p_1}$, $y_2 : \Omega \to R^{p_2}$ with $(y_1, y_2) \in G(0, S)$,

  $S = \begin{pmatrix} S_{11} & S_{12} \\ S_{12}^T & S_{22} \end{pmatrix}$,  $\mathrm{rank}(S_{12}) = n_1$.

Consider the model class, for $n_2 \in Z_+$, $n_2 < n_1$,

  $G(n_2) = \left\{ G(0, Q) \text{ on } R^{p_1 + p_2} \mid \mathrm{rank}(Q_{12}) = n_2 \right\}$,   (4.12)

  $Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{12}^T & Q_{22} \end{pmatrix}$,   (4.13)

in which the components of $Q$ are compatible with the decomposition $(y_1, y_2)$. Solve, for fixed $n_2 \in Z_+$, $n_2 < n_1$,

  $\inf_{G(0, Q) \in G(n_2)} D\left( G(0, S) \| G(0, Q) \right)$.   (4.14)

An interpretation of this problem follows. In stochastic realization theory of stochastic processes one uses the framework of past, future, and present or state. If one restricts attention to the case of Gaussian random variables, thus not of processes, then the past and the future are represented by Gaussian random variables, say $y_1, y_2$ as above. From observations one can estimate the measure of the past and future observations, say $G(0, S)$. The state space associated with the past and the future then has dimension $n_1 = \mathrm{rank}(S_{12})$. In the model reduction problem one seeks a model or measure of the observations in which the dimension of the state space is less than that initially fixed, thus $\mathrm{rank}(Q_{12}) = n_2 < n_1$.

The approximation criterion of divergence seems quite appropriate, also because of its relation with the likelihood function. For initial results on this problem see [69, Subsec. 3.8]. The problem has a rich convex structure that remains to be explored.

4.5 Parameter estimation by divergence minimization

Consider Problem 2.1 of parameter estimation. Consider the time moment $t \in T$. Suppose that the observations $\{y(s), s = 0, \ldots, t\}$ are available and that the estimates $\{\hat\theta(s), s = 0, \ldots, t-1\}$ have been chosen. The question is then how to choose $\hat\theta(t)$. With the observations $\{y(s), s = 0, \ldots, t\}$ we associate the probability measure $G(y, I)$ and with the model class the probability measure $G(\mu, I)$, where

  $\mu = \begin{pmatrix} \varphi(0)^T \\ \varphi(1)^T \\ \vdots \\ \varphi(t)^T \end{pmatrix} \theta$,  $y = \begin{pmatrix} y(0) \\ y(1) \\ \vdots \\ y(t) \end{pmatrix}$.

The unknown parameter $\theta$ has to be estimated. In this setting $\varphi : T \to R^n$ is a deterministic function. Associate with the past parameter estimates the probability measure $G(\bar\theta, W)$ and with the model class the probability measure $G(\theta, c^{-1} I)$, where $c \in R$, $c \neq 0$,

  $\bar\theta = \frac{1}{t} \sum_{s=0}^{t-1} \hat\theta(s)$,  $W = \frac{1}{t} \sum_{s=0}^{t-1} (\hat\theta(s) - \bar\theta)(\hat\theta(s) - \bar\theta)^T$.

Problem 4.6 Consider Problem 2.1 and the notation introduced above. For $t \in T$ choose the parameter estimate $\hat\theta(t)$ as the solution of the minimization problem

  $\inf_{\theta \in R^n} \left[ \frac{1}{t} D\left( G(y, I) \| G(\mu, I) \right) - D\left( G(\bar\theta, W) \| G(\theta, c^{-1} I) \right) \right]$.   (4.15)

The problem formulated above is related to but different from that of the papers [34, 35].

Theorem 4.7 Consider Problem 4.6.

a. The divergence criterion of that problem is equivalent to the criterion

  $\inf_{\theta \in R^n} \left[ \frac{1}{t} \|y - \mu\|^2 - c \|\theta - \bar\theta\|^2 \right]$.   (4.16)

b. The parameter estimator that solves this problem is given by the formula

  $\hat\theta(t+1) = \hat\theta(t) + K(t)[y(t) - \varphi(t)^T \hat\theta(t)]$,  $\hat\theta(0) = 0$,   (4.17)

where the formula for the gain and the Riccati recursion are as stated in Theorem 4.2.

Note that the parameter estimator (4.17) is identical to the estimators of Theorem 3.4 and Theorem 4.2.

Proof a. A simple calculation yields that

  $2 D\left( G(y, I) \| G(\mu, I) \right) = -\ln\frac{\det I}{\det I} + \mathrm{tr}([I - I] I) + \|y - \mu\|^2 = \|y - \mu\|^2$,

  $2 D\left( G(\bar\theta, W) \| G(\theta, c^{-1} I) \right) = -\ln\frac{\det W}{\det(c^{-1} I)} + c\,\mathrm{tr}(W) - n + c\|\theta - \bar\theta\|^2$,

  $\frac{2}{t} D\left( G(y, I) \| G(\mu, I) \right) - 2 D\left( G(\bar\theta, W) \| G(\theta, c^{-1} I) \right) = \frac{1}{t}\|y - \mu\|^2 - c\|\theta - \bar\theta\|^2 + \ln\det W + n - c\,\mathrm{tr}(W) + n \ln c$.

Note that the last four terms depend on the observations and not on the parameter $\theta$. Hence they can be neglected in the criterion.

b. It will be shown that the criterion of part a of the theorem is equivalent to the criterion

  $\|y - \mu\|^2 - c\|\theta - \hat\theta\|^2$.

This will be established by showing that infimization of the functions

  $f(\theta) = \|\theta - \bar\theta\|^2$,  $g(\theta) = \frac{1}{t}\|\theta - \hat\theta\|^2 = \frac{1}{t} \sum_{s=0}^{t-1} (\theta - \hat\theta(s))^T (\theta - \hat\theta(s))$,

is equivalent. Both functions are quadratic forms in $\theta$. The derivatives are

  $\frac{df(\theta)}{d\theta} = 2(\theta - \bar\theta)^T$,  $\frac{dg(\theta)}{d\theta} = \frac{2}{t} \sum_{s=0}^{t-1} (\theta - \hat\theta(s))^T = 2\theta^T - \frac{2}{t} \sum_{s=0}^{t-1} \hat\theta(s)^T = 2(\theta - \bar\theta)^T$.

Hence the first order conditions are identical and therefore the optimal $\theta$ is the same for both optimization problems. This optimization problem is then given by

  $\inf_{\theta \in R^n} \frac{1}{t} \left[ \|y - \mu\|^2 - c\|\theta - \hat\theta\|^2 \right]$.

The result then follows from Theorem 4.2. □

5 Concluding remarks

What is the contribution of this paper to the theory of system identification? The main contribution of the paper is the relation between several information theoretic criteria and their use in system identification. The concept of mutual information rate for stationary Gaussian processes is identical to H∞ entropy and to the exponential-of-quadratic cost in optimal stochastic control. The likelihood function is identical to divergence for Gaussian random variables. As a consequence of this relation a parameter estimator is presented that may be derived by four different approximation criteria. The problem concerns the approximation of observations from a stationary Gaussian process

by the output of a Gaussian system of a specified order. Another product of the paper is a set of formulas for information theoretic criteria of stationary Gaussian processes in case these processes are outputs of time-invariant Gaussian systems.

The relationship between the H∞ entropy for a finite-dimensional linear system and the exponential-of-quadratic cost for a Gaussian stochastic control system is understood at the level of the formulas. It is a question as to whether a deeper understanding is possible. By rephrasing the relation as a relation between mutual information and the exponential-of-quadratic cost, the interpretation becomes different but it is still not enlightening. The parameter estimator derived in this paper should be tested on data. Its robustness properties require further study.

Which problems of system identification theory require further attention? The framework for system identification with information theoretic criteria should be explored further. For stationary Gaussian processes also the case of ARMA representations may be considered. Besides Gaussian processes also finite valued processes, with the hidden Markov model as stochastic system, and counting processes should be considered. Theoretical problems of system identification and of realization may be considered using the framework of this paper. The relation between the likelihood function and divergence rate may be explored for model reduction of Gaussian systems. Possibly minimization of the divergence can be solved partly analytically. The approximation problem as such also requires further study.

System identification theory in general may benefit from studying other classes of dynamic systems in relation with practical problems. Examples of such classes are positive linear systems, particular classes of nonlinear systems, and the class of errors-in-variables systems. Approximation techniques for nonlinear systems, such as artificial neural nets, wavelets, and fuzzy modeling, may be explored.

Acknowledgements

The authors thank S. Bittanti and G. Picci for the organization of the NATO Advanced Study Institute From Identification to Learning and for the invitation to the second named author to lecture on the topic of this paper. The meeting and its proceedings are very useful ways to transfer knowledge in the area of system identification. The second named author of this paper also thanks L. Finesso of LADSEB, Padova, Italy, for a discussion on the subject of this paper and for useful references. The research of this paper has been sponsored in part by the Commission of the European Communities through the SCIENCE Program by the project SC*-CT.

A Concepts from probability and the theory of stochastic processes

In this appendix concepts and notation of probability and of the theory of stochastic processes are introduced.

General mathematics notation follows. The set of integers is denoted by $Z$ and the set of positive integers by $Z_+$. The set of real numbers is denoted by $R$ and the set of the positive real numbers by $R_+ = [0, \infty)$. The $n$-dimensional vector space over $R$ is denoted by $R^n$ and the set of $n \times m$ matrices over this vector space by $R^{n \times m}$. The transpose of a matrix $A \in R^{n \times n}$ is denoted by $A^T$. The matrix $A \in R^{n \times n}$ is said to be positive definite if $x^T A x \ge 0$ for all $x \in R^n$ and strictly positive definite if $x^T A x > 0$ for all $x \in R^n$, $x \neq 0$. The set of complex numbers is denoted by $C$ and

  $C^- := \{ c \in C \mid |c| < 1 \}$,  $D := \{ c \in C \mid |c| = 1 \}$.

For $A \in R^{n \times n}$, $\mathrm{sp}(A) \subset C$ denotes the spectrum of the matrix, in other words the set of eigenvalues.

A.1 Probability concepts

A measurable space, denoted by $(\Omega, F)$, is defined as a set $\Omega$ and a $\sigma$-algebra $F$. A probability space is defined as a triple $(\Omega, F, P)$ where $(\Omega, F)$ is a measurable space and $P : F \to R$ is a probability measure. Let $I_A : \Omega \to R$ be the indicator function of the event $A \in F$, defined by $I_A(\omega) = 1$ if $\omega \in A$ and $I_A(\omega) = 0$ otherwise. For any random variable $x$ let $F^x$ denote the smallest $\sigma$-algebra on which the random variable $x$ is measurable.

Let $P, Q$ be two probability measures on a measurable space $(\Omega, F)$. Then $P$ is said to be absolutely continuous with respect to $Q$, denoted by $P \ll Q$, if $P(A) = 0$ is implied by $Q(A) = 0$. If $P \ll Q$ then it follows from the Radon-Nikodym theorem that there exists a random variable $r : \Omega \to R_+$ such that

  $P(A) = E_Q[r I_A] = \int_{\Omega} r(\omega) I_A(\omega)\, dQ$.

A.2 Gaussian random variables

The Gaussian probability distribution function with parameters $m \in R^n$ and $Q \in R^{n \times n}$, with $Q$ strictly positive definite, is defined by the probability density function

  $p(v) = \frac{1}{\sqrt{(2\pi)^n \det Q}} \exp\left( -\frac{1}{2} (v - m)^T Q^{-1} (v - m) \right)$.   (A.1)

A random variable $x : \Omega \to R^n$ is said to be a Gaussian random variable with parameters $m \in R^n$ and $Q \in R^{n \times n}$, $Q$ positive definite, if for all $u \in R^n$

  $E[\exp(i u^T x)] = \exp\left( i u^T m - \frac{1}{2} u^T Q u \right)$.   (A.2)

The notation $x \in G(m, Q)$ will be used in this case. Moreover, $(x_1, \ldots, x_n) \in G(m, Q)$ denotes that, with $x = (x_1, \ldots, x_n)^T$, $x \in G(m, Q)$. In this case $x_1, \ldots, x_n$ are said to be jointly Gaussian random variables. A random variable with Gaussian probability distribution function is a Gaussian random variable. A Gaussian random variable does not necessarily have a Gaussian probability density function, only so in the case its variance is strictly positive definite.

In the following the geometric approach to Gaussian random variables is introduced. In this approach one considers the space that the random variables generate rather than the random variables themselves.

Proposition A.1 Let $x : \Omega \to R^{n_x}$, $x \in G(m, Q)$, $S \in R^{n_1 \times n_x}$.
a. Then $F^{Sx} \subset F^x$.
b. $F^{Sx} = F^x$ iff $\ker Q = \ker SQ$.

The above result motivates the following definition.

Definition A.2 Let $x : \Omega \to R^{n_x}$, $x \in G$, and consider the $\sigma$-algebra $F^x$.
a. A basis for $F^x$ is a triple $(n_1, m_1, Q_1) \in N \times R^{n_1} \times R^{n_1 \times n_1}$ such that there exists an $x_1 : \Omega \to R^{n_1}$ satisfying $x_1 \in G(m_1, Q_1)$, $Q_1 = Q_1^T \ge 0$, and $F^{x_1} = F^x$.
b. A minimal basis for $F^x$ is a basis $(n_1, m_1, Q_1)$ such that $\mathrm{rank}(Q_1) = n_1$.
c. A basis transformation of $F^x$ is a map $x_1 \mapsto S x_1$, with $S \in R^{n_1 \times n_1}$, such that $F^{S x_1} = F^{x_1}$.

By use of linear algebra one can, given a basis $(n_1, m_1, Q_1)$, always construct a minimal basis for $F^x$.

Proposition A.3 Let $x_1 : \Omega \to R^{n_1}$, $x_1 \in G(0, Q_1)$, and $Q_1 = Q_1^T \ge 0$. Then there exist an $n_2 \in N$ and a basis transformation $S \in R^{n_2 \times n_1}$ such that, if $x_2 : \Omega \to R^{n_2}$, $x_2 = S x_1$, then $x_2 \in G(0, Q_2)$, $Q_2 = Q_2^T > 0$, and $F^{x_2} = F^{x_1}$.

Problem A.4 Let $y_1 : \Omega \to R^{k_1}$ and $y_2 : \Omega \to R^{k_2}$ be jointly Gaussian random variables with $(y_1, y_2) \in G(0, Q)$. Determine a canonical form for the spaces $F^{y_1}, F^{y_2}$. Note that a basis transformation of the form $S = \mathrm{block\text{-}diag}(S_1, S_2)$ with $S_1, S_2$ nonsingular leaves the spaces $F^{y_1}, F^{y_2}$ invariant. Therefore this operation introduces an equivalence relation on the spaces $F^{y_1}, F^{y_2}$, hence one can speak of a canonical form.

Definition A.5 Let $y_1 : \Omega \to R^{k_1}$ and $y_2 : \Omega \to R^{k_2}$ be jointly Gaussian random variables with $(y_1, y_2) \in G(0, Q)$. Then $(y_1, y_2)$ are said to be in canonical variable form if

  $Q = \begin{pmatrix} I & \Lambda \\ \Lambda^T & I \end{pmatrix} \in R^{(k_1 + k_2) \times (k_1 + k_2)}$,

where $\Lambda \in R^{k_1 \times k_2}$, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_{k_2})$, $\lambda_1 \ge \cdots \ge \lambda_{k_2} > 0$. One then says that $(y_{11}, \ldots, y_{1 k_1})$, $(y_{21}, \ldots, y_{2 k_2})$ are the canonical variables and $(\lambda_1, \ldots, \lambda_{k_2})$ the canonical correlation coefficients.
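Definition A.5 suggests how the canonical correlation coefficients can be computed in practice: after whitening $y_1$ and $y_2$ they are the singular values of the whitened cross-covariance. The sketch below does this for an arbitrary illustrative joint covariance; the existence of the required basis transformation is the content of the theorem that follows.

```python
import numpy as np

# Canonical correlation coefficients of (y1, y2) with joint covariance
# Q = [[Q11, Q12], [Q12^T, Q22]]: the singular values of L1^{-1} Q12 L2^{-T},
# where Q11 = L1 L1^T and Q22 = L2 L2^T.  The numbers below are arbitrary.
Q11 = np.array([[1.0, 0.3], [0.3, 1.0]])
Q22 = np.array([[1.0, 0.1], [0.1, 0.5]])
Q12 = np.array([[0.5, 0.0], [0.2, 0.1]])

L1 = np.linalg.cholesky(Q11)
L2 = np.linalg.cholesky(Q22)
M = np.linalg.solve(L1, Q12) @ np.linalg.inv(L2).T
U, sigma, Vt = np.linalg.svd(M)
print(sigma)     # canonical correlation coefficients, 1 >= lambda_1 >= ... >= 0
# S1 = U.T @ inv(L1) and S2 = Vt @ inv(L2) give the canonical variable basis
# of Definition A.5: cov(S1 y1) = I, cov(S2 y2) = I, cov(S1 y1, S2 y2) = diag(sigma).
```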

Theorem A.6 Let $y_1 : \Omega \to R^{k_1}$ and $y_2 : \Omega \to R^{k_2}$ be jointly Gaussian random variables with $(y_1, y_2) \in G(0, Q)$. Then there exists a basis transformation $S = \mathrm{block\text{-}diag}(S_1, S_2)$ such that with respect to the new basis $(S_1 y_1, S_2 y_2) \in G(0, Q_1)$ has the canonical variable form.

The transformation to canonical variable form is not unique in general. The remaining invariance of the canonical variable form will not be stated here because of space limitation. For additional results on canonical variables the reader is referred to [7, 25].

A.3 Concepts from the theory of stochastic processes

A real valued stochastic process $x : \Omega \times T \to R$ on a probability space $(\Omega, F, P)$ and a time index set $T$ is defined to be a function such that for all $t \in T$ the map $x(\cdot, t) : \Omega \to R$ is a random variable. A stochastic process $x : \Omega \times T \to R^n$ is said to be a Gaussian process if every finite dimensional distribution is Gaussian; thus, if for all $m \in Z_+$ and $(t_1, \ldots, t_m) \in T^m$ the collection of random variables $x(t_1), \ldots, x(t_m)$ is jointly Gaussian. Such a process is completely specified by its mean value function $m : T \to R^n$, $m(t) = E[x(t)]$, and its covariance function $W : T \times T \to R^{n \times n}$, $W(t, s) = E[(x(t) - m(t))(x(s) - m(s))^T]$.

A (discrete time) Gaussian white noise process with values in $R^k$ and intensity $V : T \to R^{k \times k}$, $V(t) = V(t)^T \ge 0$, is a stochastic process such that (1) $\{v(t), t \in T\}$ is a collection of independent Gaussian random variables; (2) $v$ is a Gaussian process with $v(t) \in G(0, V(t))$.

A stochastic process $x : \Omega \times T \to R^n$ on $T = Z$ or $T = R$ is said to be stationary if for all $m \in Z_+$, $s \in T$, and $t_1, \ldots, t_m \in T$ the joint distribution of $(x(t_1), \ldots, x(t_m))$ equals the joint distribution of $(x(t_1 + s), \ldots, x(t_m + s))$. If for all $m \in Z_+$ and $t_1, \ldots, t_m \in T$ the distribution function of $(x(t_1 + s), \ldots, x(t_m + s))$ converges to a limit as $s \to \infty$, then we call the process asymptotically stationary. A Gaussian process with mean value function $m : T \to R^n$ and covariance function $W$ is stationary iff $m(t) = m(0)$ for all $t \in T$ and $W(t, s) = W(t + u, s + u)$ for all $t, s, u \in T$. In this case the function $W : T \to R^{n \times n}$, $W(t) = W(t, 0)$, is also called the covariance function.

The concept of canonical correlation process has been defined for stationary Gaussian processes in analogy with the canonical variable form and canonical correlation coefficients for Gaussian random variables. That this may be done follows from the fact that many properties of a stationary Gaussian process depend only on the spaces generated by such a process. For references on this topic see [28, 42, 43, 54].

B Concepts from system theory

This appendix contains several definitions of concepts from system theory that are needed at several places in the paper.

Definition B.1 A discrete-time finite-dimensional linear dynamic system or, by way of abbreviation, a linear system, is a dynamic system

  $\sigma = \{ T, R^p, Y, R^n, R^m, U, \phi, r \}$

in which the state transition map $\phi$ and the read-out map $r$ are specified by

  $\phi : x(t+1) = A(t) x(t) + B(t) u(t)$,  $x(0) = x_0$,
  $r : y(t) = C(t) x(t) + D(t) u(t)$.   (B.1)

Here $T \subseteq Z$; $m, n, p \in Z_+$; $u : T \to R^m$, $u \in U$; $x_0 \in R^n$; $A : T \to R^{n \times n}$, $B : T \to R^{n \times m}$, $C : T \to R^{p \times n}$, $D : T \to R^{p \times m}$; and $x : T \to R^n$, $y : T \to R^p$ are determined by the above recursions and are called respectively the state function and the output function. Such a system is called time-invariant if for all $t \in T$, $A(t) = A(0)$, $B(t) = B(0)$, $C(t) = C(0)$, $D(t) = D(0)$. The parameters of such a system are denoted by $\{n, m, p, A, B, C, D\}$.

Definition B.2 A Gaussian stochastic control system is defined by the representation

  $x(t+1) = A(t) x(t) + B(t) u(t) + M(t) v(t)$,  $x(0) = x_0$,
  $y(t) = C(t) x(t) + D(t) u(t) + N(t) v(t)$,   (B.2)

where $T = \{0, 1, \ldots, t_1\}$, $t_1 \in Z_+$, $X = R^n$, $U = R^m$, $Y = R^p$; $x_0 : \Omega \to X$, $x_0 \in G(m_0, Q_0)$; $v : \Omega \times T \to R^r$ is a Gaussian white noise process with, for all $t \in T$, $v(t) \in G(0, V(t))$, $V : T \to R^{r \times r}$, $V(t) = V(t)^T \ge 0$; the $\sigma$-algebras $F^{x_0}$ and $F^v_{t_1}$ are independent; $A : T \to R^{n \times n}$, $B : T \to R^{n \times m}$, $C : T \to R^{p \times n}$, $D : T \to R^{p \times m}$, $M : T \to R^{n \times r}$, $N : T \to R^{p \times r}$, $x : \Omega \times T \to X$, and $y : \Omega \times T \to Y$.

Stochastic realization

The problem of obtaining a representation of a stochastic process as the output of a stochastic system is called the stochastic realization problem. Below the weak stochastic realization problem for stationary Gaussian processes is mentioned. For references on this problem see [9] and the paper by G. Picci elsewhere in this volume.

Consider a stationary Gaussian process with mean value function equal to zero and covariance function $W : T \to R^{p \times p}$. The problem is whether there exists a time-invariant Gaussian system, say of the form

  $x(t+1) = A x(t) + M v(t)$,  $y(t) = C x(t) + N v(t)$,   (B.3)

with $\mathrm{sp}(A) \subset C^-$, such that the covariance function of the output process $y$ equals the given covariance function. If so, then this system is said to be a stochastic realization of the given process. There is a necessary and sufficient condition for the existence of such a realization. When a realization exists there may be many realizations. Attention is then restricted to minimal realizations, for which the dimension of the state space is as small as possible. There is a classification of all minimal stochastic realizations. In addition, there is an algorithm to construct a minimal realization.

C Concepts from information theory

The purpose of this appendix is to describe for the reader the main concepts of information theory. In Appendix D formulas are presented for information measures of Gaussian

random variables and in Appendix E formulas for information measures of stationary Gaussian processes.

Information theory originated with the work of C.E. Shannon [61, 62]. The Russian mathematicians A.N. Kolmogorov, I.M. Gelfand, and A.M. Yaglom partly developed the measure theoretic formulation of information theory, see [24]. The book [57] by M.S. Pinsker partly summarizes information theory as developed by the Russian school. The publications of A. Perez, [55, 56], offer another measure theoretic formulation of information theory in which martingale theory is used as a technique to establish convergence results. Of more recent contributions to information theory we mention the work of I. Csiszar [6]. A recent textbook on information theory is the one by T.M. Cover and J.A. Thomas [3]. Information theory of continuous stochastic processes is treated in the book by S. Ihara [37]. Other books are [30, 45].

Definition C.1 Let $x : \Omega \to X$ with $X = \{a_1, a_2, \ldots, a_n\}$ be a finite valued random variable and let $p_x : X \to R$ be its frequency function, $p_x(a_i) = P(\{x = a_i\})$. Define the entropy of the finite valued random variable $x$ as

  $H_C(x) = \sum_{i=1}^{n} p_x(a_i) \ln \frac{1}{p_x(a_i)}$.   (C.1)

The concept of entropy of a finite valued random variable should be regarded as entropy with respect to a counting measure. Entropy is a property of a probability measure, not of a random variable. Nevertheless, the terminology of information theory is followed in referring to entropy of a random variable.

Definition C.2 Let $x : \Omega \to R^n$ be a random variable whose probability distribution function is absolutely continuous with respect to Lebesgue measure, with density $p_x : R^n \to R_+$. Define the entropy with respect to Lebesgue measure of such a random variable by

  $H_L(x) = \int_{R^n \cap S} p_x(v) \ln \frac{1}{p_x(v)}\, dv$,   (C.2)

where $S$ is the support of $p_x$. The entropy defined above depends on Lebesgue measure, or in general on the measure with respect to which it is defined.

Definition C.3 [57, p. 9] Let $(\Omega, F)$ be a measurable space consisting of a set $\Omega$ and a $\sigma$-algebra $F$. Let $P_1, P_2$ be two probability measures defined on $(\Omega, F)$. Define the entropy of $P_1$ with respect to $P_2$ by the formula

  $H(P_1; P_2) = \sup_{\{E_1, \ldots, E_n\}} \sum_{i=1}^{n} P_1(E_i) \ln \frac{P_1(E_i)}{P_2(E_i)}$,   (C.3)

where the supremum is taken over all finite measurable partitions of $\Omega$.

Besides entropy there is the concept of mutual information.
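As a small illustration of Definitions C.1 and C.2, the sketch below evaluates both entropies numerically. The closed form used for the Gaussian case, $H_L(x) = \frac{1}{2}\ln\left((2\pi e)^n \det Q\right)$ for $x \in G(m, Q)$ with $Q > 0$, is a standard fact quoted for the illustration rather than taken from the text above, and the numerical inputs are arbitrary.

```python
import numpy as np

def entropy_finite(p):
    """H_C(x) of Definition C.1 for a finite-valued random variable with frequency function p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log(1.0 / p)))

def entropy_gaussian(Q):
    """H_L(x) of Definition C.2 for x in G(m, Q), Q > 0: 0.5 * ln((2*pi*e)^n det Q)."""
    n = Q.shape[0]
    return 0.5 * (n * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Q)[1])

print(entropy_finite([0.5, 0.25, 0.25]))                    # = 1.5 * ln 2
print(entropy_gaussian(np.array([[2.0, 0.0], [0.0, 0.5]])))
```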


More information

Robust control and applications in economic theory

Robust control and applications in economic theory Robust control and applications in economic theory In honour of Professor Emeritus Grigoris Kalogeropoulos on the occasion of his retirement A. N. Yannacopoulos Department of Statistics AUEB 24 May 2013

More information

Special Classes of Fuzzy Integer Programming Models with All-Dierent Constraints

Special Classes of Fuzzy Integer Programming Models with All-Dierent Constraints Transaction E: Industrial Engineering Vol. 16, No. 1, pp. 1{10 c Sharif University of Technology, June 2009 Special Classes of Fuzzy Integer Programming Models with All-Dierent Constraints Abstract. K.

More information

A Control Methodology for Constrained Linear Systems Based on Positive Invariance of Polyhedra

A Control Methodology for Constrained Linear Systems Based on Positive Invariance of Polyhedra A Control Methodology for Constrained Linear Systems Based on Positive Invariance of Polyhedra Jean-Claude HENNET LAAS-CNRS Toulouse, France Co-workers: Marina VASSILAKI University of Patras, GREECE Jean-Paul

More information

Mathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector

Mathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector On Minimax Filtering over Ellipsoids Eduard N. Belitser and Boris Y. Levit Mathematical Institute, University of Utrecht Budapestlaan 6, 3584 CD Utrecht, The Netherlands The problem of estimating the mean

More information

Nonparametric regression with martingale increment errors

Nonparametric regression with martingale increment errors S. Gaïffas (LSTA - Paris 6) joint work with S. Delattre (LPMA - Paris 7) work in progress Motivations Some facts: Theoretical study of statistical algorithms requires stationary and ergodicity. Concentration

More information

1 Linear Algebra Problems

1 Linear Algebra Problems Linear Algebra Problems. Let A be the conjugate transpose of the complex matrix A; i.e., A = A t : A is said to be Hermitian if A = A; real symmetric if A is real and A t = A; skew-hermitian if A = A and

More information

1 Review of di erential calculus

1 Review of di erential calculus Review of di erential calculus This chapter presents the main elements of di erential calculus needed in probability theory. Often, students taking a course on probability theory have problems with concepts

More information

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University Introduction to the Mathematical and Statistical Foundations of Econometrics 1 Herman J. Bierens Pennsylvania State University November 13, 2003 Revised: March 15, 2004 2 Contents Preface Chapter 1: Probability

More information

ELEG 3143 Probability & Stochastic Process Ch. 6 Stochastic Process

ELEG 3143 Probability & Stochastic Process Ch. 6 Stochastic Process Department of Electrical Engineering University of Arkansas ELEG 3143 Probability & Stochastic Process Ch. 6 Stochastic Process Dr. Jingxian Wu wuj@uark.edu OUTLINE 2 Definition of stochastic process (random

More information

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses Statistica Sinica 5(1995), 459-473 OPTIMAL DESIGNS FOR POLYNOMIAL REGRESSION WHEN THE DEGREE IS NOT KNOWN Holger Dette and William J Studden Technische Universitat Dresden and Purdue University Abstract:

More information

Model reduction for linear systems by balancing

Model reduction for linear systems by balancing Model reduction for linear systems by balancing Bart Besselink Jan C. Willems Center for Systems and Control Johann Bernoulli Institute for Mathematics and Computer Science University of Groningen, Groningen,

More information

Positive systems in the behavioral approach: main issues and recent results

Positive systems in the behavioral approach: main issues and recent results Positive systems in the behavioral approach: main issues and recent results Maria Elena Valcher Dipartimento di Elettronica ed Informatica Università dipadova via Gradenigo 6A 35131 Padova Italy Abstract

More information

2 Interval-valued Probability Measures

2 Interval-valued Probability Measures Interval-Valued Probability Measures K. David Jamison 1, Weldon A. Lodwick 2 1. Watson Wyatt & Company, 950 17th Street,Suite 1400, Denver, CO 80202, U.S.A 2. Department of Mathematics, Campus Box 170,

More information

Problem Description The problem we consider is stabilization of a single-input multiple-state system with simultaneous magnitude and rate saturations,

Problem Description The problem we consider is stabilization of a single-input multiple-state system with simultaneous magnitude and rate saturations, SEMI-GLOBAL RESULTS ON STABILIZATION OF LINEAR SYSTEMS WITH INPUT RATE AND MAGNITUDE SATURATIONS Trygve Lauvdal and Thor I. Fossen y Norwegian University of Science and Technology, N-7 Trondheim, NORWAY.

More information

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition)

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition) Vector Space Basics (Remark: these notes are highly formal and may be a useful reference to some students however I am also posting Ray Heitmann's notes to Canvas for students interested in a direct computational

More information

14 - Gaussian Stochastic Processes

14 - Gaussian Stochastic Processes 14-1 Gaussian Stochastic Processes S. Lall, Stanford 211.2.24.1 14 - Gaussian Stochastic Processes Linear systems driven by IID noise Evolution of mean and covariance Example: mass-spring system Steady-state

More information

Kalman Filter and Parameter Identification. Florian Herzog

Kalman Filter and Parameter Identification. Florian Herzog Kalman Filter and Parameter Identification Florian Herzog 2013 Continuous-time Kalman Filter In this chapter, we shall use stochastic processes with independent increments w 1 (.) and w 2 (.) at the input

More information

6.241 Dynamic Systems and Control

6.241 Dynamic Systems and Control 6.241 Dynamic Systems and Control Lecture 24: H2 Synthesis Emilio Frazzoli Aeronautics and Astronautics Massachusetts Institute of Technology May 4, 2011 E. Frazzoli (MIT) Lecture 24: H 2 Synthesis May

More information

1 Outline Part I: Linear Programming (LP) Interior-Point Approach 1. Simplex Approach Comparison Part II: Semidenite Programming (SDP) Concludin

1 Outline Part I: Linear Programming (LP) Interior-Point Approach 1. Simplex Approach Comparison Part II: Semidenite Programming (SDP) Concludin Sensitivity Analysis in LP and SDP Using Interior-Point Methods E. Alper Yldrm School of Operations Research and Industrial Engineering Cornell University Ithaca, NY joint with Michael J. Todd INFORMS

More information

Computation Of Asymptotic Distribution. For Semiparametric GMM Estimators. Hidehiko Ichimura. Graduate School of Public Policy

Computation Of Asymptotic Distribution. For Semiparametric GMM Estimators. Hidehiko Ichimura. Graduate School of Public Policy Computation Of Asymptotic Distribution For Semiparametric GMM Estimators Hidehiko Ichimura Graduate School of Public Policy and Graduate School of Economics University of Tokyo A Conference in honor of

More information

ECE504: Lecture 8. D. Richard Brown III. Worcester Polytechnic Institute. 28-Oct-2008

ECE504: Lecture 8. D. Richard Brown III. Worcester Polytechnic Institute. 28-Oct-2008 ECE504: Lecture 8 D. Richard Brown III Worcester Polytechnic Institute 28-Oct-2008 Worcester Polytechnic Institute D. Richard Brown III 28-Oct-2008 1 / 30 Lecture 8 Major Topics ECE504: Lecture 8 We are

More information

Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains.

Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains. Institute for Applied Mathematics WS17/18 Massimiliano Gubinelli Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains. [version 1, 2017.11.1] We introduce

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N Problem 1. Let f : A R R have the property that for every x A, there exists ɛ > 0 such that f(t) > ɛ if t (x ɛ, x + ɛ) A. If the set A is compact, prove there exists c > 0 such that f(x) > c for all x

More information

MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES. Contents

MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES. Contents MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES JAMES READY Abstract. In this paper, we rst introduce the concepts of Markov Chains and their stationary distributions. We then discuss

More information

7.1. Calculus of inverse functions. Text Section 7.1 Exercise:

7.1. Calculus of inverse functions. Text Section 7.1 Exercise: Contents 7. Inverse functions 1 7.1. Calculus of inverse functions 2 7.2. Derivatives of exponential function 4 7.3. Logarithmic function 6 7.4. Derivatives of logarithmic functions 7 7.5. Exponential

More information

Lecture 2: Review of Prerequisites. Table of contents

Lecture 2: Review of Prerequisites. Table of contents Math 348 Fall 217 Lecture 2: Review of Prerequisites Disclaimer. As we have a textbook, this lecture note is for guidance and supplement only. It should not be relied on when preparing for exams. In this

More information

satisfying ( i ; j ) = ij Here ij = if i = j and 0 otherwise The idea to use lattices is the following Suppose we are given a lattice L and a point ~x

satisfying ( i ; j ) = ij Here ij = if i = j and 0 otherwise The idea to use lattices is the following Suppose we are given a lattice L and a point ~x Dual Vectors and Lower Bounds for the Nearest Lattice Point Problem Johan Hastad* MIT Abstract: We prove that given a point ~z outside a given lattice L then there is a dual vector which gives a fairly

More information

A Concise Course on Stochastic Partial Differential Equations

A Concise Course on Stochastic Partial Differential Equations A Concise Course on Stochastic Partial Differential Equations Michael Röckner Reference: C. Prevot, M. Röckner: Springer LN in Math. 1905, Berlin (2007) And see the references therein for the original

More information

On a class of stochastic differential equations in a financial network model

On a class of stochastic differential equations in a financial network model 1 On a class of stochastic differential equations in a financial network model Tomoyuki Ichiba Department of Statistics & Applied Probability, Center for Financial Mathematics and Actuarial Research, University

More information

g(.) 1/ N 1/ N Decision Decision Device u u u u CP

g(.) 1/ N 1/ N Decision Decision Device u u u u CP Distributed Weak Signal Detection and Asymptotic Relative Eciency in Dependent Noise Hakan Delic Signal and Image Processing Laboratory (BUSI) Department of Electrical and Electronics Engineering Bogazici

More information

Statistical Signal Processing Detection, Estimation, and Time Series Analysis

Statistical Signal Processing Detection, Estimation, and Time Series Analysis Statistical Signal Processing Detection, Estimation, and Time Series Analysis Louis L. Scharf University of Colorado at Boulder with Cedric Demeure collaborating on Chapters 10 and 11 A TT ADDISON-WESLEY

More information

ECON 582: Dynamic Programming (Chapter 6, Acemoglu) Instructor: Dmytro Hryshko

ECON 582: Dynamic Programming (Chapter 6, Acemoglu) Instructor: Dmytro Hryshko ECON 582: Dynamic Programming (Chapter 6, Acemoglu) Instructor: Dmytro Hryshko Indirect Utility Recall: static consumer theory; J goods, p j is the price of good j (j = 1; : : : ; J), c j is consumption

More information

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157 Lecture 6: Gaussian Channels Copyright G. Caire (Sample Lectures) 157 Differential entropy (1) Definition 18. The (joint) differential entropy of a continuous random vector X n p X n(x) over R is: Z h(x

More information

1 More concise proof of part (a) of the monotone convergence theorem.

1 More concise proof of part (a) of the monotone convergence theorem. Math 0450 Honors intro to analysis Spring, 009 More concise proof of part (a) of the monotone convergence theorem. Theorem If (x n ) is a monotone and bounded sequence, then lim (x n ) exists. Proof. (a)

More information

Dynamic Data Factorization

Dynamic Data Factorization Dynamic Data Factorization Stefano Soatto Alessro Chiuso Department of Computer Science, UCLA, Los Angeles - CA 90095 Department of Electrical Engineering, Washington University, StLouis - MO 63130 Dipartimento

More information

A quasi-separation theorem for LQG optimal control with IQ constraints Andrew E.B. Lim y & John B. Moore z February 1997 Abstract We consider the dete

A quasi-separation theorem for LQG optimal control with IQ constraints Andrew E.B. Lim y & John B. Moore z February 1997 Abstract We consider the dete A quasi-separation theorem for LQG optimal control with IQ constraints Andrew E.B. Lim y & John B. Moore z February 1997 Abstract We consider the deterministic, the full observation and the partial observation

More information

Notes on Measure Theory and Markov Processes

Notes on Measure Theory and Markov Processes Notes on Measure Theory and Markov Processes Diego Daruich March 28, 2014 1 Preliminaries 1.1 Motivation The objective of these notes will be to develop tools from measure theory and probability to allow

More information

Optimal Data and Training Symbol Ratio for Communication over Uncertain Channels

Optimal Data and Training Symbol Ratio for Communication over Uncertain Channels Optimal Data and Training Symbol Ratio for Communication over Uncertain Channels Ather Gattami Ericsson Research Stockholm, Sweden Email: athergattami@ericssoncom arxiv:50502997v [csit] 2 May 205 Abstract

More information

Stochastic Realization of Binary Exchangeable Processes

Stochastic Realization of Binary Exchangeable Processes Stochastic Realization of Binary Exchangeable Processes Lorenzo Finesso and Cecilia Prosdocimi Abstract A discrete time stochastic process is called exchangeable if its n-dimensional distributions are,

More information

1 Vectors. Notes for Bindel, Spring 2017 Numerical Analysis (CS 4220)

1 Vectors. Notes for Bindel, Spring 2017 Numerical Analysis (CS 4220) Notes for 2017-01-30 Most of mathematics is best learned by doing. Linear algebra is no exception. You have had a previous class in which you learned the basics of linear algebra, and you will have plenty

More information

Law of Cosines and Shannon-Pythagorean Theorem for Quantum Information

Law of Cosines and Shannon-Pythagorean Theorem for Quantum Information In Geometric Science of Information, 2013, Paris. Law of Cosines and Shannon-Pythagorean Theorem for Quantum Information Roman V. Belavkin 1 School of Engineering and Information Sciences Middlesex University,

More information

Chapter 9 Observers, Model-based Controllers 9. Introduction In here we deal with the general case where only a subset of the states, or linear combin

Chapter 9 Observers, Model-based Controllers 9. Introduction In here we deal with the general case where only a subset of the states, or linear combin Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A. Dahleh George Verghese Department of Electrical Engineering and Computer Science Massachuasetts Institute of Technology c Chapter 9 Observers,

More information

Modeling and Analysis of Dynamic Systems

Modeling and Analysis of Dynamic Systems Modeling and Analysis of Dynamic Systems Dr. Guillaume Ducard Fall 2017 Institute for Dynamic Systems and Control ETH Zurich, Switzerland G. Ducard c 1 / 57 Outline 1 Lecture 13: Linear System - Stability

More information

Iterative procedure for multidimesional Euler equations Abstracts A numerical iterative scheme is suggested to solve the Euler equations in two and th

Iterative procedure for multidimesional Euler equations Abstracts A numerical iterative scheme is suggested to solve the Euler equations in two and th Iterative procedure for multidimensional Euler equations W. Dreyer, M. Kunik, K. Sabelfeld, N. Simonov, and K. Wilmanski Weierstra Institute for Applied Analysis and Stochastics Mohrenstra e 39, 07 Berlin,

More information

Gaussian processes. Basic Properties VAG002-

Gaussian processes. Basic Properties VAG002- Gaussian processes The class of Gaussian processes is one of the most widely used families of stochastic processes for modeling dependent data observed over time, or space, or time and space. The popularity

More information

Chapter 7 Interconnected Systems and Feedback: Well-Posedness, Stability, and Performance 7. Introduction Feedback control is a powerful approach to o

Chapter 7 Interconnected Systems and Feedback: Well-Posedness, Stability, and Performance 7. Introduction Feedback control is a powerful approach to o Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A. Dahleh George Verghese Department of Electrical Engineering and Computer Science Massachuasetts Institute of Technology c Chapter 7 Interconnected

More information

PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS

PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS Abstract. We present elementary proofs of the Cauchy-Binet Theorem on determinants and of the fact that the eigenvalues of a matrix

More information

MAS. Modelling, Analysis and Simulation. Modelling, Analysis and Simulation. Realization theory for linear switched systems. M.

MAS. Modelling, Analysis and Simulation. Modelling, Analysis and Simulation. Realization theory for linear switched systems. M. C e n t r u m v o o r W i s k u n d e e n I n f o r m a t i c a MAS Modelling, Analysis and Simulation Modelling, Analysis and Simulation Realization theory for linear switched systems M. Petreczky REPORT

More information

11 a 12 a 21 a 11 a 22 a 12 a 21. (C.11) A = The determinant of a product of two matrices is given by AB = A B 1 1 = (C.13) and similarly.

11 a 12 a 21 a 11 a 22 a 12 a 21. (C.11) A = The determinant of a product of two matrices is given by AB = A B 1 1 = (C.13) and similarly. C PROPERTIES OF MATRICES 697 to whether the permutation i 1 i 2 i N is even or odd, respectively Note that I =1 Thus, for a 2 2 matrix, the determinant takes the form A = a 11 a 12 = a a 21 a 11 a 22 a

More information

Theory of Probability Fall 2008

Theory of Probability Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 8.75 Theory of Probability Fall 008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. Section Prekopa-Leindler inequality,

More information

Quick Review on Linear Multiple Regression

Quick Review on Linear Multiple Regression Quick Review on Linear Multiple Regression Mei-Yuan Chen Department of Finance National Chung Hsing University March 6, 2007 Introduction for Conditional Mean Modeling Suppose random variables Y, X 1,

More information

1 Introduction This work follows a paper by P. Shields [1] concerned with a problem of a relation between the entropy rate of a nite-valued stationary

1 Introduction This work follows a paper by P. Shields [1] concerned with a problem of a relation between the entropy rate of a nite-valued stationary Prexes and the Entropy Rate for Long-Range Sources Ioannis Kontoyiannis Information Systems Laboratory, Electrical Engineering, Stanford University. Yurii M. Suhov Statistical Laboratory, Pure Math. &

More information

N.G.Bean, D.A.Green and P.G.Taylor. University of Adelaide. Adelaide. Abstract. process of an MMPP/M/1 queue is not a MAP unless the queue is a

N.G.Bean, D.A.Green and P.G.Taylor. University of Adelaide. Adelaide. Abstract. process of an MMPP/M/1 queue is not a MAP unless the queue is a WHEN IS A MAP POISSON N.G.Bean, D.A.Green and P.G.Taylor Department of Applied Mathematics University of Adelaide Adelaide 55 Abstract In a recent paper, Olivier and Walrand (994) claimed that the departure

More information

SOME MEASURABILITY AND CONTINUITY PROPERTIES OF ARBITRARY REAL FUNCTIONS

SOME MEASURABILITY AND CONTINUITY PROPERTIES OF ARBITRARY REAL FUNCTIONS LE MATEMATICHE Vol. LVII (2002) Fasc. I, pp. 6382 SOME MEASURABILITY AND CONTINUITY PROPERTIES OF ARBITRARY REAL FUNCTIONS VITTORINO PATA - ALFONSO VILLANI Given an arbitrary real function f, the set D

More information

BUMPLESS SWITCHING CONTROLLERS. William A. Wolovich and Alan B. Arehart 1. December 27, Abstract

BUMPLESS SWITCHING CONTROLLERS. William A. Wolovich and Alan B. Arehart 1. December 27, Abstract BUMPLESS SWITCHING CONTROLLERS William A. Wolovich and Alan B. Arehart 1 December 7, 1995 Abstract This paper outlines the design of bumpless switching controllers that can be used to stabilize MIMO plants

More information

Lecture 5 Channel Coding over Continuous Channels

Lecture 5 Channel Coding over Continuous Channels Lecture 5 Channel Coding over Continuous Channels I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw November 14, 2014 1 / 34 I-Hsiang Wang NIT Lecture 5 From

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

1. Introduction Over the last three decades a number of model selection criteria have been proposed, including AIC (Akaike, 1973), AICC (Hurvich & Tsa

1. Introduction Over the last three decades a number of model selection criteria have been proposed, including AIC (Akaike, 1973), AICC (Hurvich & Tsa On the Use of Marginal Likelihood in Model Selection Peide Shi Department of Probability and Statistics Peking University, Beijing 100871 P. R. China Chih-Ling Tsai Graduate School of Management University

More information

Gaussian, Markov and stationary processes

Gaussian, Markov and stationary processes Gaussian, Markov and stationary processes Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ November

More information

Module 9: Stationary Processes

Module 9: Stationary Processes Module 9: Stationary Processes Lecture 1 Stationary Processes 1 Introduction A stationary process is a stochastic process whose joint probability distribution does not change when shifted in time or space.

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

No. of dimensions 1. No. of centers

No. of dimensions 1. No. of centers Contents 8.6 Course of dimensionality............................ 15 8.7 Computational aspects of linear estimators.................. 15 8.7.1 Diagonalization of circulant andblock-circulant matrices......

More information

ECON 616: Lecture 1: Time Series Basics

ECON 616: Lecture 1: Time Series Basics ECON 616: Lecture 1: Time Series Basics ED HERBST August 30, 2017 References Overview: Chapters 1-3 from Hamilton (1994). Technical Details: Chapters 2-3 from Brockwell and Davis (1987). Intuition: Chapters

More information

4. Conditional risk measures and their robust representation

4. Conditional risk measures and their robust representation 4. Conditional risk measures and their robust representation We consider a discrete-time information structure given by a filtration (F t ) t=0,...,t on our probability space (Ω, F, P ). The time horizon

More information

Gaussian distributions and processes have long been accepted as useful tools for stochastic

Gaussian distributions and processes have long been accepted as useful tools for stochastic Chapter 3 Alpha-Stable Random Variables and Processes Gaussian distributions and processes have long been accepted as useful tools for stochastic modeling. In this section, we introduce a statistical model

More information

Linear-Quadratic Optimal Control: Full-State Feedback

Linear-Quadratic Optimal Control: Full-State Feedback Chapter 4 Linear-Quadratic Optimal Control: Full-State Feedback 1 Linear quadratic optimization is a basic method for designing controllers for linear (and often nonlinear) dynamical systems and is actually

More information

Chapter Robust Performance and Introduction to the Structured Singular Value Function Introduction As discussed in Lecture 0, a process is better desc

Chapter Robust Performance and Introduction to the Structured Singular Value Function Introduction As discussed in Lecture 0, a process is better desc Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A Dahleh George Verghese Department of Electrical Engineering and Computer Science Massachuasetts Institute of Technology c Chapter Robust

More information

REPORTRAPPORT. Centrum voor Wiskunde en Informatica. Asymptotics of zeros of incomplete gamma functions. N.M. Temme

REPORTRAPPORT. Centrum voor Wiskunde en Informatica. Asymptotics of zeros of incomplete gamma functions. N.M. Temme Centrum voor Wiskunde en Informatica REPORTRAPPORT Asymptotics of zeros of incomplete gamma functions N.M. Temme Department of Analysis, Algebra and Geometry AM-R9402 994 Asymptotics of Zeros of Incomplete

More information

Notes on uniform convergence

Notes on uniform convergence Notes on uniform convergence Erik Wahlén erik.wahlen@math.lu.se January 17, 2012 1 Numerical sequences We begin by recalling some properties of numerical sequences. By a numerical sequence we simply mean

More information

Chapter Stability Robustness Introduction Last chapter showed how the Nyquist stability criterion provides conditions for the stability robustness of

Chapter Stability Robustness Introduction Last chapter showed how the Nyquist stability criterion provides conditions for the stability robustness of Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A Dahleh George Verghese Department of Electrical Engineering and Computer Science Massachuasetts Institute of Technology c Chapter Stability

More information

Parameter Selection Techniques and Surrogate Models

Parameter Selection Techniques and Surrogate Models Parameter Selection Techniques and Surrogate Models Model Reduction: Will discuss two forms Parameter space reduction Surrogate models to reduce model complexity Input Representation Local Sensitivity

More information

Exploring Granger Causality for Time series via Wald Test on Estimated Models with Guaranteed Stability

Exploring Granger Causality for Time series via Wald Test on Estimated Models with Guaranteed Stability Exploring Granger Causality for Time series via Wald Test on Estimated Models with Guaranteed Stability Nuntanut Raksasri Jitkomut Songsiri Department of Electrical Engineering, Faculty of Engineering,

More information

THE INVERSE FUNCTION THEOREM

THE INVERSE FUNCTION THEOREM THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)

More information

The Uniformity Principle: A New Tool for. Probabilistic Robustness Analysis. B. R. Barmish and C. M. Lagoa. further discussion.

The Uniformity Principle: A New Tool for. Probabilistic Robustness Analysis. B. R. Barmish and C. M. Lagoa. further discussion. The Uniformity Principle A New Tool for Probabilistic Robustness Analysis B. R. Barmish and C. M. Lagoa Department of Electrical and Computer Engineering University of Wisconsin-Madison, Madison, WI 53706

More information

Balanced Truncation 1

Balanced Truncation 1 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.242, Fall 2004: MODEL REDUCTION Balanced Truncation This lecture introduces balanced truncation for LTI

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 15. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

Algebraic Information Geometry for Learning Machines with Singularities

Algebraic Information Geometry for Learning Machines with Singularities Algebraic Information Geometry for Learning Machines with Singularities Sumio Watanabe Precision and Intelligence Laboratory Tokyo Institute of Technology 4259 Nagatsuta, Midori-ku, Yokohama, 226-8503

More information