Optimal Experiment Design with Diffuse Prior Information

Cristian R. Rojas, Graham C. Goodwin, James S. Welsh, Arie Feuer

Abstract: In system identification one always aims to learn as much as possible about a system from a given observation period. This has led to an on-going interest in the problem of optimal experiment design. Not surprisingly, the more one knows about a system the more focused the experiment can be. Indeed, many procedures for optimal experiment design depend, paradoxically, on exact knowledge of the system parameters. This has motivated recent research on, so called, robust experiment design, where one assumes only partial prior knowledge of the system. Here we go further and study the question of optimal experiment design when the a-priori information about the system is diffuse. We show that bandlimited 1/f noise is optimal for a particular choice of cost function.

I. INTRODUCTION

In system identification, there is always a strong incentive to learn as much about a system as possible from a given observation period. This has motivated substantial interest in the topic of optimal experiment design. Indeed, there exists a body of work on this topic, both in the statistics literature [5, 14, 7] and in the engineering literature [17, 10, 27]. Much of the existing literature is based on designing the experiment to optimize some scalar function of the Fisher Information Matrix [10, pg. 6]. However, a fundamental difficulty is that, when the system response depends non-linearly on the parameters, the Information Matrix depends, inter alia, on the true system parameters. Moreover, we note that models for dynamical systems (even if linear) typically have the characteristic that their response depends non-linearly on the parameters. Hence, the information matrix for models of dynamical systems generally depends upon the true system parameters. This means that experiment designs which are based on the Fisher Information Matrix will, in principle, depend upon knowledge of the true system parameters.
This is paradoxical since the optimal experiment then depends on the very thing that the experiment is aimed at estimating [13, pg. 427]. The above reasoning has motivated the study of, so called, robust optimal experiment designs with respect to uncertainty in the a priori information. In this vein, various approaches have been proposed, e.g. (i) Iterative design, where one alternates between parameter estimation and experiment design based on the current estimates [4, 18, 25]. (ii) Bayesian design, where one optimizes some function of the expected information matrix, with the expectation taken over some a-priori distribution of the parameters [1, 3, 6]. (iii) Min-Max design, in which one optimizes the worst case over a bounded set of a-priori given parameter values [20, 8, 21].

The latter designs mentioned above are closely related to game theory. Indeed, game-theoretical ideas have been used to characterize the optimal robust (in the min-max sense) experiment. For example, several papers have studied different types of one-parameter robust experiment design problems [21, 11]. It has been shown for these problems that the optimal min-max experiment has many interesting properties, e.g. it exists, it is unique, it has compact support in the frequency domain and it is characterized by a line spectrum. For multi-parameter problems, one usually needs to use gridding strategies to carry out the robust designs numerically [21, 25]. A surprising observation from recent work on min-max optimal experiment design is that band-limited 1/f noise is actually quite close to optimal for particular problems.

C. R. Rojas, G. C. Goodwin and J. S. Welsh are with the School of Electrical Engineering & Computer Science, The University of Newcastle, NSW, Australia 2308 (cristian.rojas@studentmail.newcastle.edu.au, james.welsh@newcastle.edu.au, graham.goodwin@newcastle.edu.au). A. Feuer is with the Department of Electrical Engineering, Technion, Haifa 32000, Israel (feuer@ee.technion.ac.il).
Indeed, 1/f noise has been shown to have performance which is within a factor of 2 of the performance of robust optimal designs for first-order and resonant systems [21, 11]. It is important to note, however, that the proof of near optimality depends on a particular property of these systems which allows one to scale the parameters with respect to frequency. Here we ask a more general question: Say we are just beginning to experiment on a system and thus have very little (i.e. diffuse) prior knowledge about it. What would be a good initial experiment to use to estimate the system? In this case we consider as diffuse prior information that the interesting part of the frequency response of the system lies in an interval [a, b]. This implies that we are seeking an experiment which is good over a very broad class of possible systems. In this paper, we propose a possible solution to this problem, being that the experiment should consist of bandlimited 1/f noise.

The paper is structured as follows. In Section II we discuss the problem of measuring the goodness of an experiment by using a system independent criterion. Section III gives some desirable properties that such a measure would be expected to possess. In Section IV we consider a typical input constraint generally used in experiment design. Section V shows a preliminary result for choosing a suitable cost function which satisfies the properties developed in Section III. Sections VI and VII develop the form of the cost function which satisfies the properties in Section III. In Section VIII we show that bandlimited 1/f noise is an optimal input signal according to this cost function, and Section IX clearly illustrates the advantages of bandlimited 1/f noise by means of an example. We present conclusions in Section X.

II. A MEASURE OF THE GOODNESS OF AN EXPERIMENT

Our aim is to design an experiment which is good for a very broad class of systems. This means that we need a measure of goodness of an experiment which is system independent. To construct such a measure, we make use of the work of Ljung [15], who has shown that, for a broad class of linear systems, the variance of the error in the estimated discrete time frequency response takes the following asymptotic (in both system order and data points) form:

Var(Ĝ(e^{jω})) = K φ_n(ω)/φ_u(ω); ω ∈ [0, 2π], (1)

where φ_n is the noise spectral density and φ_u is the input spectral density. Here K is a function of the number of system parameters and the number of observations. Figure 1 shows how the input u, the noise n and the output y of the system are related, i.e.

y(t) = G(q)u(t) + n(t), (2)

where G is the transfer function of the system, and q is the forward shift operator.

Fig. 1. Block diagram describing the relationship between the input u, the noise n and the output y of the system G to be identified.

Actually, it has been argued in [19] that better approximations exist to that given in (1), but the simpler expression (1) suffices for our purposes. In fact, the expressions given in [19] for Box-Jenkins and Output-Error models include a factor which is dependent, for some particular special cases, only upon the poles of G. We note that this can be incorporated into φ_n, thus obtaining a test signal which is independent of φ_n and the plant. This implies that the results given here are exact for some classes of models of finite order. An interesting and highly desirable property of (1) is that it is essentially independent of the system parameters.
This is because it depends only on φ_n and φ_u. Of course, φ_n is somewhat problematic since it would also be desirable to have (1) independent of the real characteristics of the noise. This will also be part of our consideration.

As argued in [12, 21, 11, 26], absolute variances are not particularly useful when one wants to carry out an experiment design that applies to a broad class of systems. Specifically, an error standard deviation of 10^{-2} in a variable of nominal size 1 would be considered to be insignificant, whereas the same error standard deviation of 10^{-2} in a variable of nominal size 10^{-3} would be considered catastrophic. Hence, it seems preferable to work with relative errors. Thus, if |G(e^{jω})| is the magnitude of the frequency response of the plant at frequency ω, then equation (1) suggests that the relative variance at frequency ω is given by

Rel. Var(Ĝ(e^{jω})) = K φ_n(ω) / (φ_u(ω) |G(e^{jω})|²); ω ∈ [0, 2π]. (3)

Finally, rather than look at a single frequency ω, we will look at an average measure over a range of frequencies. This leads to a general measure of the goodness of an experiment of the form:

J(φ_u) = ∫_a^b F(Var(Ĝ(e^{jω}))/|G(e^{jω})|²) W(ω) dω
       = ∫_a^b F(K φ_n(ω)/(φ_u(ω) |G(e^{jω})|²)) W(ω) dω, (4)

where F and W are functions to be specified later, and 0 < a < b < 2π. Here, W is a weighting function that allows the control engineer to define at which frequencies it would be preferable to obtain a better model (depending on the control requirements, but not necessarily on the true plant characteristics). In the next Section we propose some desirable properties of the functions F and W.

III. DESIRABLE PROPERTIES OF THE COST FUNCTION

We consider two sets of criteria. The first relates principally to the function F, the second to the function W. In addition to these properties, we will also assume that F ∈ C¹([a, b], R₀⁺) and W ∈ C¹([a, b], R⁺), where C¹(X, Y) is the space of all functions from X ⊆ R to Y ⊆ R having a continuous derivative.

Criteria A

It is reasonable to consider a cost function (4) whose minimum is achieved by a function which does not depend on the actual system characteristics.
The reason being that these characteristics are typically unknown at the time the experiment is applied, and in fact it is the purpose of the experiment to reveal this information. On the other hand, the cost function (4) should be a measure of the size of the variance in the estimation of the plant frequency response. Hence, loosely speaking, the cost function should increase according to an increase of the variance at any frequency. The above argument implies that the function F for measure (4) should be chosen so as to satisfy the following requirements:
A.1) The optimal experiment, φ_u*, which minimizes J in (4), should be independent of the plant |G(e^{jω})|² and the noise variance φ_n.

A.2) The integrand in (4) should increase if the variance Var(Ĝ(e^{jω})) increases at any frequency. This implies that F should be a monotonically increasing function.

Criterion B

Many properties of linear systems depend on the ratio of poles and zeros rather than on their absolute locations in the frequency domain [2, 9, 22]. This implies that if we scale the frequency ω by a constant, the optimal input must keep its shape, as the poles and zeros of the new plant will have the same ratios as before. This invariance property must be reflected in the weighting function W, which has to give equal weight to frequency intervals whose endpoints are in the same proportion. Thus, the weighting function W should be such that for every 0 < α < β < 2π and every k > 0 such that 0 < kα < kβ < 2π we have that

∫_α^β W(ω) dω = ∫_{kα}^{kβ} W(ω) dω. (5)

IV. CONSTRAINTS

Our goal will then be to optimize a cost function as in (4) where φ_u is constrained in some fashion. A typical constraint used in experiment design is that the total input energy should be constrained [10, pg. 125]. Thus, we need to optimize J(φ_u) subject to a constraint of the form

∫_a^b φ_u(ω) dω = 1. (6)

Specifically, our goal is to adjust F and W such that the optimal experiment that minimizes (4) subject to the constraint (6) satisfies the criteria A.1, A.2 and B in Section III.

V. A PRELIMINARY TECHNICAL RESULT

Motivated by the need for a measure to be independent of the system and such that criteria A.1, A.2 and B are met subject to a constraint on the input, we have established the following result:

Lemma 1: For 0 < a < b < 2π, let g, F ∈ C¹([a, b], R₀⁺) and W ∈ C¹([a, b], R⁺). Define, if it exists,

f*(g) := arg min over f ∈ C¹([a, b], R⁺) with ∫_a^b f(x) dx = 1 of ∫_a^b F(g(x)/f(x)) W(x) dx. (7)

If f*(g) does not depend on g, then there are constants α, β, γ ∈ R such that

F(y) = α ln y + β; inf_{x∈[a,b]} g(x)/f*(x) ≤ y ≤ sup_{x∈[a,b]} g(x)/f*(x), (8)

and f* = γW.
Proof: Let g, F ∈ C¹([a, b], R₀⁺) and W ∈ C¹([a, b], R⁺) be fixed, and such that f*(g), as defined in (7), exists. Then, by [16, Section 7.7, Theorem 2], there is a constant λ ∈ R for which f*(g) is a stationary point of

J_λ(f) := ∫_a^b F(g(x)/f(x)) W(x) dx + λ ∫_a^b f(x) dx. (9)

Thus, for any h ∈ C¹([a, b], R₀⁺) we have that δJ_λ(f*; h) = 0, which means [16, Section 7.5] that

∫_a^b [F'(g(x)/f*(x)) (−g(x)/(f*(x))²) W(x) + λ] h(x) dx = 0, (10)

thus, by [16, Section 7.5, Lemma 1],

F'(g(x)/f*(x)) (g(x)/(f*(x))²) W(x) = λ; x ∈ [a, b]. (11)

Let l(x) := g(x)/f*(x); then (11) can be written as

F'(l(x)) l(x) = λ f*(x)/W(x); x ∈ [a, b]. (12)

The left side of (12) depends on g, but the right side does not (because of the assumption of the independence of f* from g). Thus, both sides are equal to a constant, say, α ∈ R, which implies that

F'(l(x)) = α/l(x); x ∈ [a, b]. (13)

Now, by integrating both sides with respect to l between inf_{x∈[a,b]} l(x) and sup_{x∈[a,b]} l(x), we obtain

F(l(x)) = α ln l(x) + β; x ∈ [a, b], (14)

for some constant β ∈ R. On the other hand, we have that

λ f*(x)/W(x) = α, (15)

so if we define γ := α/λ, we conclude that f* = γW. This concludes the proof.

VI. CHOICE OF THE FUNCTION F

In this Section we use the result of the previous Section to find a suitable function F which satisfies Criteria A.1 and A.2, and to find the optimal input signal for the resulting cost function. We first examine the choice of the function F in (4). Now, we may take, without loss of generality, α = 1 and β = 0 for the function F given by Lemma 1. This is because, according to Lemma 1, every cost function (4) satisfying Criteria A.1 and A.2 is minimized by the same f* ∈ C¹([a, b], R⁺). Thus, such a cost function can be written as

J(φ_u) = ∫_a^b ln(K φ_n(ω)/(φ_u(ω) |G(e^{jω})|²)) W(ω) dω. (16)
It is then relatively straightforward to optimize (16) subject to the constraint given by (6). Indeed, by Lemma 1 the optimal experiment will be essentially given by a scaled version of W, i.e.

φ_u*(ω) = W(ω) / ∫_a^b W(x) dx; ω ∈ [a, b]. (17)

The following Lemma establishes that φ_u* gives not only an extremum, but a global minimum for the cost function (16).

Lemma 2: The function φ_u* defined in (17) gives the global minimum of the cost function (16). In other words, for 0 < a < b < 2π, let W ∈ C¹([a, b], R⁺); then,

φ_u* = arg min over φ_u ∈ C¹([a, b], R⁺) with ∫_a^b φ_u(ω) dω = 1 of ∫_a^b ln(K φ_n(ω)/(φ_u(ω) |G(e^{jω})|²)) W(ω) dω. (18)

Proof: The cost function (16) can be written as

J(φ_u) = C − ∫_a^b ln(φ_u(ω)) W(ω) dω, (19)

where C is a constant, independent of φ_u, given by

C := ∫_a^b ln(K φ_n(ω)/|G(e^{jω})|²) W(ω) dω. (20)

Now, if φ_u is any function in C¹([a, b], R⁺) such that ∫_a^b φ_u(ω) dω = 1, then by (17) we have that

J(φ_u) = C − ∫_a^b ln[φ_u*(ω) + (φ_u(ω) − φ_u*(ω))] W(ω) dω
       = C − ∫_a^b ln(φ_u*(ω)) W(ω) dω − ∫_a^b (1/φ_u*(ω)) (φ_u(ω) − φ_u*(ω)) W(ω) dω − ∫_a^b h(φ_u(ω), φ_u*(ω)) W(ω) dω
       = J(φ_u*) − (∫_a^b W(ω) dω) ∫_a^b (φ_u(ω) − φ_u*(ω)) dω − ∫_a^b h(φ_u(ω), φ_u*(ω)) W(ω) dω (21)
       = J(φ_u*) − ∫_a^b h(φ_u(ω), φ_u*(ω)) W(ω) dω,

where h: R⁺ × R⁺ → R is given by

h(x, y) := ln x − ln y − (1/y)(x − y). (22)

Thus, since W > 0, to prove that φ_u* gives the global minimum for the cost function (16), it suffices to show that h(x, y) < 0 for every x, y ∈ R⁺ such that x ≠ y. To this end, notice that

∂h/∂x (x, y) = 1/x − 1/y, (23)

thus if x > y, then

h(x, y) = h(y, y) + ∫_y^x (∂h/∂x)(x̃, y) dx̃ < 0, (24)

and similarly for x < y. This proves the Lemma.

The relationship given in (17) highlights the importance of choosing the correct function W so as to reflect the desired relative frequency weighting. The choice of W will be explored in the next Section.

VII. CHOICE OF THE FUNCTION W

A weighting function which is reasonable in the sense that it satisfies Criterion B is described below:

Lemma 3: For 0 < a < b < 2π, let W ∈ C¹([a, b], R⁺). If W satisfies

∫_α^β W(ω) dω = ∫_{kα}^{kβ} W(ω) dω (25)

for every a ≤ α < β ≤ b and every k > 0 such that a ≤ kα < kβ ≤ b, then there is λ > 0 such that W(x) = λ/x for every x ∈ [a, b].
Proof: Since W is continuous, we have from (25) that

W(a) = lim_{ε→0⁺} (1/ε) ∫_a^{a+ε} W(ω) dω = lim_{ε→0⁺} (1/ε) ∫_{ka}^{ka+kε} W(ω) dω = k W(ka) (26)

for 1 ≤ k < b/a. Thus,

W(ka) = (1/k) W(a); a ≤ ka < b, (27)

or, by defining x := ka and λ := a W(a),

W(x) = (a/x) W(a) = λ/x; a ≤ x < b. (28)

By the continuity of W, we also have that W(b) = λ/b. This proves the Lemma.

With this last result, and those of the previous Sections, we can now proceed to establish the form of a suitable measure of the goodness of an experiment, and an optimal input signal according to this cost function. This will be done in the next Section.
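As a quick numerical sanity check of the results so far, the following Python sketch verifies that W(ω) = λ/ω satisfies the scale-invariance property (5)/(25), and that the optimal input (17) it induces integrates to one. The interval endpoints and the constant λ below are arbitrary illustrative choices, not values taken from the paper:

```python
import math

# Sanity checks for W(w) = lam / w on [a, b] (lam, a, b chosen arbitrarily).
lam, a, b = 2.0, 0.5, 8.0

def int_W(lo, hi):
    # Closed-form integral of lam/w over [lo, hi].
    return lam * math.log(hi / lo)

# (25): intervals whose endpoints are in the same proportion get equal weight.
alpha, beta, k = 0.6, 1.8, 3.5
assert abs(int_W(alpha, beta) - int_W(k * alpha, k * beta)) < 1e-12

# (17): phi_u*(w) = W(w) / int_a^b W(x) dx; for W = lam/w this reduces to
# 1 / (w * ln(b/a)), independent of lam.
def phi_star(w):
    return (lam / w) / int_W(a, b)

assert abs(phi_star(1.0) - 1.0 / math.log(b / a)) < 1e-12

# The constraint (6): phi_u* integrates to 1 (midpoint-rule quadrature).
n = 100000
h = (b - a) / n
total = sum(phi_star(a + (i + 0.5) * h) for i in range(n)) * h
assert abs(total - 1.0) < 1e-6
```

The last check also illustrates why λ drops out of the design: only the shape of W, not its scale, affects the normalized optimal spectrum.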
VIII. BAND-LIMITED 1/f NOISE

If we apply the results of the previous sections to the cost function (16), then we immediately see that a reasonable cost function for measuring the goodness of an experiment, when having only diffuse prior knowledge about a plant, is

J(φ_u) = ∫_a^b ln(K φ_n(ω)/(φ_u(ω) |G(e^{jω})|²)) (1/ω) dω. (29)

Therefore, according to (17) and Lemma 2, the optimal input spectrum is given by

φ_u*(ω) = (1/ω) / ∫_a^b (1/ω̃) dω̃ = 1/(ω ln(b/a)); ω ∈ [a, b]. (30)

Figure 2 shows the spectral density of this type of signal, known as bandlimited 1/f noise, for a = 1 and b = 2.

Fig. 2. Power spectral density of a bandlimited 1/f noise signal for a = 1 and b = 2.

Thus we see that, subject to the assumptions introduced above, i.e. Criteria A.1, A.2 and B, 1/f noise is the robust input signal for identifying a system when one has only diffuse prior knowledge.

Remark 1: The fact that bandlimited 1/f noise is the solution of a variational problem means that it is possible to consider additional prior information by imposing constraints in the optimisation problem. In this sense, the problem of experiment design resembles the development of the Principle of Maximum Entropy as given in [23, 24].

IX. EXAMPLE

We have seen above that bandlimited 1/f noise can be regarded as a robust optimal test signal in the sense described in Section VIII. This result is consistent with earlier findings in the literature, which show that bandlimited 1/f noise has near optimal properties for specific classes of systems. For example, it is known to yield performance which is within a factor of 2 of the optimum for certain families of one-parameter problems [12, 21, 11], although general results for multi-parameter problems are not yet available.

TABLE I
RELATIVE VALUES OF COST FOR DIFFERENT INPUT SIGNALS

Input signal                  | max_{θ∈Θ} [θ² M(θ, φ_u)]⁻¹
Single frequency at ω = 1     | 7.75
Bandlimited white noise       | 12.09
Bandlimited 1/f noise         | 1.43
Robust min-max optimal input  | 1.00

Table I, reproduced from [21], shows some interesting results.
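The kind of comparison behind such a table can be sketched numerically. The Python fragment below evaluates the worst-case normalized variance criterion of [21], i.e. the maximum over θ ∈ [0.1, 10] of the inverse of ∫ (ω²/θ²)/((ω²/θ² + 1)²) φ_u(ω) dω, for bandlimited white noise and bandlimited 1/f noise on [0.1, 10]. The quadrature and θ grids are ad-hoc choices, so the resulting numbers only qualitatively reflect the table rather than reproduce it:

```python
import math

# Worst-case (over theta in [0.1, 10]) inverse normalized information for the
# plant G(s) = 1/(s/theta + 1), comparing bandlimited white noise against
# bandlimited 1/f noise on [0.1, 10]. Grid sizes are ad-hoc choices.

A, B = 0.1, 10.0

def info(theta, phi, n=4000):
    # Midpoint-rule approximation of the integral in the worst-case cost:
    #   M(theta) = int_A^B (w^2/theta^2) / (w^2/theta^2 + 1)^2 * phi(w) dw
    h = (B - A) / n
    total = 0.0
    for i in range(n):
        w = A + (i + 0.5) * h
        x2 = (w / theta) ** 2
        total += x2 / (1.0 + x2) ** 2 * phi(w)
    return total * h

def worst_cost(phi, m=100):
    # Maximum of 1/M(theta) over a log-spaced grid of theta in [A, B].
    thetas = [A * (B / A) ** (i / (m - 1)) for i in range(m)]
    return max(1.0 / info(t, phi) for t in thetas)

white = lambda w: 1.0 / (B - A)                      # bandlimited white noise
one_over_f = lambda w: 1.0 / (w * math.log(B / A))   # bandlimited 1/f noise

cost_white = worst_cost(white)
cost_1f = worst_cost(one_over_f)
# 1/f noise should be markedly more robust in the worst case over theta.
assert cost_1f < cost_white
```

Note how the scale invariance of the 1/f spectrum shows up here: its information integral takes (nearly) the same value at both ends of the θ range, which is precisely what limits the worst case.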
In particular, this Table shows the numerical results for the problem of designing an input signal to identify the parameter θ of the plant

G(s) = 1/(s/θ + 1), (31)

where it is assumed a priori that θ lies in the range Θ := [0.1, 10]. The cost function used for comparison is the worst case normalized variance of an efficient estimator of θ,

J̃(φ_u) := max_{θ∈Θ} [∫_0^∞ (ω²/θ²)/((ω²/θ² + 1)²) φ_u(ω) dω]⁻¹, (32)

where the inputs being compared are (i) a sine wave of frequency 1 (this is the optimal input if the true parameter is θ = 1); (ii) a bandlimited white noise input, limited to the frequency range [0.1, 10]; (iii) a bandlimited 1/f noise input, limited to the frequency range [0.1, 10]; (iv) the approximate discretised robust optimal input generated by Linear Programming [21]. Notice that, for ease of comparison, the costs in Table I have been normalized so that the robust optimal input has cost 1.00. Figure 3 shows the performance of these signals according to the normalized variance obtained as a function of the true value of θ. Both Table I and Figure 3 demonstrate that bandlimited 1/f noise does indeed yield good performance, at least in terms of a specific example. The results presented in the current paper give theoretical support to these earlier observations.

X. CONCLUSIONS

In this paper, we have studied the problem of robust experiment design in the face of diffuse prior information. We have analysed a general class of criteria for measuring how good an experiment is, and have found that there is a specific measure within this class that gives a system independent optimal experiment design, which is suitable for the case when one only has a vague idea about the plant to be identified. We have also shown that 1/f noise is optimal according to this cost function.
Fig. 3. Variation of the normalized variance [∫_0^∞ (ω²/θ²)/((ω²/θ² + 1)²) φ_u(ω) dω]⁻¹ as a function of θ for various input signals: the robust optimal input (solid), a sine wave of frequency 1 (dotted), bandlimited white noise (dashed) and bandlimited 1/f noise (dash-dotted).

REFERENCES

[1] A. C. Atkinson and A. N. Donev. Optimum Experimental Designs. Clarendon Press, Oxford, 1992.
[2] H. W. Bode. Network Analysis and Feedback Amplifier Design. D. Van Nostrand, 1945.
[3] K. Chaloner and I. Verdinelli. Bayesian experimental design: A review. Statistical Science, 10(3):273-304, 1995.
[4] H. Chernoff. Approaches in sequential design of experiments. In J. N. Srivastava, editor, A Survey of Statistical Design and Linear Models, pages 67-90. North-Holland, Amsterdam, 1975.
[5] D. R. Cox. Planning of Experiments. Wiley, New York, 1958.
[6] M. A. El-Gamal and T. R. Palfrey. Economical experiments: Bayesian efficient experimental design. Int. J. Game Theory, 25(4):495-517, 1996.
[7] V. V. Fedorov. Theory of Optimal Experiments. Academic Press, New York and London, 1972.
[8] V. V. Fedorov. Convex design theory. Math. Operationsforsch. Statist. Ser. Statistics, 11(3):403-413, 1980.
[9] G. C. Goodwin, S. F. Graebe, and M. E. Salgado. Control System Design. Prentice Hall, Upper Saddle River, New Jersey, 2001.
[10] G. C. Goodwin and R. L. Payne. Dynamic System Identification: Experiment Design and Data Analysis. Academic Press, New York, 1977.
[11] G. C. Goodwin, C. R. Rojas, and J. S. Welsh. Good, bad and optimal experiments for identification. In T. Glad, editor, Forever Ljung in System Identification - Workshop on the occasion of Lennart Ljung's 60th birthday, September 2006.
[12] G. C. Goodwin, J. S. Welsh, A. Feuer, and M. Derpich. Utilizing prior knowledge in robust optimal experiment design. In Proc. of the 14th IFAC SYSID, Newcastle, Australia, pages 1358-1363, 2006.
[13] H. Hjalmarsson. From experiment design to closed-loop control. Automatica, 41(3):393-438, March 2005.
[14] O. Kempthorne. Design and Analysis of Experiments. Wiley, New York, 1952.
[15] L. Ljung. Asymptotic variance expressions for identified black-box transfer function models. IEEE Transactions on Automatic Control, 30:834-844, September 1985.
[16] D. G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, 1969.
[17] R. K. Mehra. Optimal inputs for system identification. IEEE Transactions on Automatic Control, AC-19:192-200, 1974.
[18] W. G. Müller and B. M. Pötscher. Batch sequential design for a nonlinear estimation problem. In V. V. Fedorov, W. G. Müller, and I. N. Vuchkov, editors, Model-Oriented Data Analysis, a Survey of Recent Methods, pages 77-87. Physica-Verlag, Heidelberg, 1992.
[19] B. Ninness and H. Hjalmarsson. Variance error quantifications that are exact for finite-model order. IEEE Transactions on Automatic Control, 49(8):1275-1291, 2004.
[20] L. Pronzato and E. Walter. Robust experiment design via maximin optimization. Mathematical Biosciences, 89:161-176, 1988.
[21] C. R. Rojas, G. C. Goodwin, J. S. Welsh, and A. Feuer. Robust optimal experiment design for system identification. Automatica (accepted for publication), 2006.
[22] M. M. Seron, J. H. Braslavsky, and G. C. Goodwin. Fundamental Limitations in Filtering and Control. Springer-Verlag, 1997.
[23] J. E. Shore and R. W. Johnson. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory, 26(1):26-37, 1980.
[24] J. Skilling. The axioms of maximum entropy. In G. J. Erickson and C. R. Smith, editors, Maximum-Entropy and Bayesian Methods in Science and Engineering (Vol. 1), pages 173-187. Kluwer Academic Publishers, 1988.
[25] E. Walter and L. Pronzato. Identification of Parametric Models from Experimental Data. Springer-Verlag, Berlin, Heidelberg, New York, 1997.
[26] J. S. Welsh, G. C. Goodwin, and A. Feuer. Evaluation and comparison of robust optimal experiment design criteria. In Proceedings of the American Control Conference, Minneapolis, USA, pages 1659-1664, 2006.
[27] M. Zarrop. Optimal Experiment Design for Dynamic System Identification, volume 21 of Lecture Notes in Control and Information Sciences. Springer, Berlin, New York, 1979.