Monotonic convergence of a general algorithm for computing optimal designs


Yaming Yu
Department of Statistics
University of California
Irvine, CA 92697, USA
yamingy@uci.edu

Abstract

Monotonic convergence is established for a general class of multiplicative algorithms introduced by Silvey et al. (1978) for computing optimal designs. A conjecture of Titterington (1978) is confirmed as a consequence. Optimal designs for logistic regression are used as an illustration.

Keywords: A-optimality; auxiliary variables; c-optimality; D-optimality; experimental design; generalized linear models; multiplicative algorithm.

1 A general class of algorithms

Optimal experimental design (approximate theory) is a well-developed area, and we refer to Kiefer (1974), Silvey (1980), Pázman (1986), and Pukelsheim (1993) for general introductions and basic results. We consider computational aspects of optimal designs, focusing on a finite design space $\mathcal{X} = \{x_1, \ldots, x_n\}$. Suppose the probability density or mass function of the response is specified as $p(y|x, \theta)$, where $\theta$ is the $m \times 1$ parameter of interest. Let $A_i$ denote the $m \times m$ expected Fisher information matrix from a unit assigned to $x_i$ (the expectation is with respect to $p(y|x_i, \theta)$):

$$A_i = E\big[s(\theta; y, x_i)\, s^\top(\theta; y, x_i)\big], \qquad s(\theta; y, x_i) = \frac{\partial}{\partial \theta} \log p(y|x_i, \theta).$$

The moment matrix, as a function of the design measure $w = (w_1, \ldots, w_n)$, is defined as

$$M(w) = \sum_{i=1}^n w_i A_i,$$

which is proportional to the Fisher information for $\theta$ when the number of units assigned to $x_i$ is proportional to $w_i$. Here $w \in \bar\Omega$, where $\bar\Omega$ denotes the closure of $\Omega = \{w :\ w_i > 0,\ \sum_{i=1}^n w_i = 1\}$. Throughout we assume that the $A_i$ are well-defined and nonnegative definite (for fixed $\theta$). The set

$$\Omega_+ \equiv \{w \in \bar\Omega :\ M(w) > 0\ \text{(positive definite)}\}$$

is assumed nonempty. Our approach may conceivably extend to the case where $M(w)$ is allowed to be singular, by using generalized inverses, although we do not pursue this here.

Given an optimality criterion $\phi$, defined on positive definite matrices, the goal is to maximize $\phi(M(w))$ with respect to $w \in \Omega_+$.
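For concreteness, here is a minimal numerical sketch (an illustration assumed for this rewrite, not taken from the paper) of the objects just defined: rank-one information matrices $A_i = x_i x_i^\top$ for a quadratic regression, the moment matrix $M(w)$, and two of the criteria introduced in the next paragraph.

```python
import numpy as np

# Hypothetical example: quadratic regression x_i = (1, t_i, t_i^2)' on 11
# equispaced points, so each A_i = x_i x_i' is a rank-one information matrix.
t = np.linspace(-1.0, 1.0, 11)
X = np.column_stack([np.ones_like(t), t, t**2])   # n x m model matrix
A = [np.outer(x, x) for x in X]                   # the A_i, each m x m

def moment_matrix(w, A):
    """M(w) = sum_i w_i A_i."""
    return sum(wi * Ai for wi, Ai in zip(w, A))

w = np.full(len(A), 1.0 / len(A))                 # uniform design in Omega
M = moment_matrix(w, A)
print("phi_0(M(w))  = log det M(w) =", np.linalg.slogdet(M)[1])
print("phi_-1(M(w)) = -tr(M^{-1})  =", -np.trace(np.linalg.inv(M)))
```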

Typical optimality criteria include (i) the D-criterion $\phi_0(M) = \log\det(M)$; (ii) the A-criterion $\phi_{-1}(M) = -\mathrm{tr}(M^{-1})$; (iii) more generally, the $p$th mean criterion $\phi_p(M) = -\mathrm{tr}(M^p)$, $p < 0$; and (iv) the c-criterion $\phi_c(M) = -c^\top M^{-1} c$, where $c$ is a nonzero constant vector. (Our notation is mostly for convenience and may not correspond exactly to that of well-known texts.)

Often only a linear combination $K^\top\theta$, e.g., a subvector of $\theta$, is of interest. Assuming invertibility, the Fisher information for $K^\top\theta$ is naturally defined as $(K^\top M^{-1} K)^{-1}$ (Pukelsheim, 1993). We may therefore consider the D- and A-criteria for $K^\top\theta$ defined respectively as

$$\phi_{0,K}(M) = -\log\det(K^\top M^{-1}K); \qquad \phi_{-1,K}(M) = -\mathrm{tr}(K^\top M^{-1}K). \qquad (1)$$

The c-criterion is a special case of $\phi_{-1,K}(M)$. Motivations for such optimality criteria are well known. For example, in a linear problem, the A-criterion seeks to minimize the sum of variances of the best linear unbiased estimators (BLUEs) for all coordinates of $\theta$, while the c-criterion seeks to minimize the variance of the BLUE for $c^\top\theta$. Similar interpretations (with asymptotic arguments) apply to nonlinear problems.

In general $M(w)$ also depends on the unknown parameter $\theta$, which complicates the definition of an optimality criterion. A simple solution is to maximize $\phi(M(w))$ with $\theta$ fixed at a prior guess $\theta^*$; this leads to local optimality (Chernoff 1953). Local optimality may be criticized for ignoring uncertainty in $\theta$. However, in a situation where real prior information is available, or where the dependence of $M$ on $\theta$ is weak, it is nevertheless a viable approach, and has been adopted routinely (see, for example, Li and Majumdar 2008). Henceforth we assume a fixed $\theta$ and suppress the dependence of $M$ on $\theta$. Possible extensions are mentioned in Section 5.

Optimal designs do not usually come in closed form. As early as Wynn (1972), Fedorov (1972) and Atwood (1973), and as late as Torsney (2007), Harman and Pronzato (2007), and Dette et al. (2008), various procedures have been studied for numerical computation. We shall focus on the following multiplicative algorithm (Titterington 1976, 1978; Silvey et al. 1978), which is specified through a power parameter $\lambda \in (0, 1]$.

Algorithm I. Set $\lambda \in (0, 1]$ and $w^{(0)} \in \Omega$. For $t = 0, 1, \ldots$, compute

$$w_i^{(t+1)} = \frac{w_i^{(t)}\, d_i^{\lambda}(w^{(t)})}{\sum_{j=1}^n w_j^{(t)}\, d_j^{\lambda}(w^{(t)})}, \qquad i = 1, \ldots, n, \qquad (2)$$

where

$$d_i(w) = \mathrm{tr}\big(\phi'(M(w))\, A_i\big), \qquad \phi'(M) \equiv \frac{\partial \phi(M)}{\partial M}. \qquad (3)$$

Iterate until convergence. We defer the discussion of convergence criteria until Section 4.

For a heuristic explanation, observe that (2) is equivalent to

$$w_i^{(t+1)} \propto w_i^{(t)} \left(\frac{\partial \phi(M(w))}{\partial w_i}\bigg|_{w = w^{(t)}}\right)^{\lambda}, \qquad i = 1, \ldots, n. \qquad (4)$$

The value of $\partial\phi(M(w))/\partial w_i$ indicates the amount of gain in information, as measured by $\phi$, from a slight increase in $w_i$, the weight on the $i$th design point. So (4) can be seen as adjusting $w$ so that relatively more weight is placed on design points whose increased weight may result in a larger gain in $\phi$.
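A minimal sketch of Algorithm I follows (the helper names are hypothetical, not from the paper; the D-criterion instance uses $\phi_0'(M) = M^{-1}$, which follows from $\phi_0(M) = \log\det M$).

```python
import numpy as np

def algorithm_I(A, phi_grad, w0, lam=1.0, iters=1000):
    """Multiplicative update (2): w_i <- w_i d_i^lam / sum_j w_j d_j^lam,
    with d_i = tr(phi'(M(w)) A_i) as in (3)."""
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        M = sum(wi * Ai for wi, Ai in zip(w, A))
        C = phi_grad(M)                                # C = phi'(M(w))
        d = np.array([np.trace(C @ Ai) for Ai in A])   # d_i(w)
        w = w * d**lam
        w /= w.sum()
    return w

# D-criterion instance: phi_0 = log det M, so phi'(M) = M^{-1}.
t = np.linspace(-1.0, 1.0, 11)
X = np.column_stack([np.ones_like(t), t, t**2])        # quadratic regression
A = [np.outer(x, x) for x in X]
w = algorithm_I(A, np.linalg.inv, np.full(len(A), 1.0 / len(A)))
print(np.round(w, 3))  # mass concentrates near the D-optimal support {-1, 0, 1}
```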

Algorithm I is remarkable in its generality. For example, little restriction is placed on the underlying model $p(y|x, \theta)$. Part of the reason, of course, is that we focus on Fisher information and local optimality, which essentially reduces the problem to a linear one. There exists a large literature on Algorithm I and its relatives; see, for example, Titterington (1976, 1978), Silvey et al. (1978), Pázman (1986), Fellman (1989), Pukelsheim and Torsney (1991), Torsney and Mandal (2006), Harman and Pronzato (2007), and Dette et al. (2008).

One feature that has attracted much attention is that Algorithm I appears to be monotonic, i.e., $\phi(M(w^{(t)}))$ increases in $t$, at least in some special cases. For example, when $\phi = \phi_0$ (for D-optimality) and $\lambda = 1$, Titterington (1976) and Pázman (1986) have shown monotonicity using clever probabilistic and analytic inequalities; see also Dette et al. (2008) and Harman and Trnovská (2009). Algorithm I is also known to be monotonic for $\phi = \phi_{-1,K}$ as in (1), assuming $\lambda = 1/2$ and rank-one $A_i$ (Fellman 1974; Torsney 1983). Monotonicity is important because convergence then holds under mild assumptions (see Section 4). Results in these special cases suggest a monotonic convergence theory for a broad class of $\phi$, which is also supported by numerical evidence presented in some of the references above.

2 Main result

We aim to state general conditions on $\phi$ that ensure that Algorithm I converges monotonically. As a consequence certain known theoretical results are unified and generalized, and one particular conjecture (Titterington 1978) is confirmed.

Define $\psi(M) \equiv -\phi(M^{-1})$, $M > 0$. The functions $\phi$ and $\psi$ are assumed to be differentiable on positive definite matrices. Our conditions are conveniently stated in terms of $\psi$. As usual, for two symmetric matrices, $M_1 \le (<)\, M_2$ means $M_2 - M_1$ is nonnegative (positive) definite.

$\psi(M)$ is increasing:
$$0 < M_1 \le M_2 \implies \psi(M_1) \le \psi(M_2), \qquad (5)$$
or, equivalently, $\psi'(M)$ is nonnegative definite for positive definite $M$.

$\psi(M)$ is concave:
$$\alpha\psi(M_1) + (1 - \alpha)\psi(M_2) \le \psi\big(\alpha M_1 + (1 - \alpha)M_2\big), \qquad (6)$$
for $\alpha \in [0, 1]$ and $M_1, M_2 > 0$. Equivalently,
$$\psi(M_2) \le \psi(M_1) + \mathrm{tr}\big(\psi'(M_1)(M_2 - M_1)\big), \qquad M_1, M_2 > 0. \qquad (7)$$

Condition (5) is usually satisfied by any reasonable information criterion (Pukelsheim 1993). Also note that, if (5) fails, then $\partial\phi(M(w))/\partial w_i$ on the right hand side of (4) is not even guaranteed to be nonnegative. The real restriction is the concavity condition (6). For example, (6) is not satisfied by $\psi_p(M) = -\phi_p(M^{-1})$ (the $p$th mean criterion) when $p < -1$. (It is usually assumed that $\phi(M)$, rather than $\psi(M)$, is concave.) Nevertheless, (6) is satisfied by a wide range of criteria, including the commonly used D-, A- and c-criteria (see Cases (i) and (iii) in the illustration of the main result below).
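The failure of (6) for $p < -1$ is easy to probe numerically. The sketch below (an illustration under assumed random test matrices, not part of the paper) tests midpoint concavity of $\psi_p(M) = \mathrm{tr}(M^{-p})$:

```python
import numpy as np

rng = np.random.default_rng(0)

def psi_p(M, p):
    """psi_p(M) = -phi_p(M^{-1}) = tr(M^{-p}); concave iff -p <= 1."""
    vals = np.linalg.eigvalsh(M)
    return np.sum(vals ** (-p))

def rand_pd(m):
    B = rng.standard_normal((m, m))
    return B @ B.T + np.eye(m)          # random positive definite matrix

# Midpoint concavity: psi((M1 + M2)/2) >= (psi(M1) + psi(M2))/2 ?
for p in (-0.5, -2.0):
    ok = all(
        psi_p(0.5 * (M1 + M2), p) >= 0.5 * (psi_p(M1, p) + psi_p(M2, p)) - 1e-9
        for M1, M2 in ((rand_pd(3), rand_pd(3)) for _ in range(1000))
    )
    print(f"p = {p}: midpoint concavity held on all samples: {ok}")
# Expected: True for p = -0.5 (covered by Theorem 1 below), False for p = -2.
```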

Our main result is as follows.

Theorem 1 (General monotonicity). Assume (5) and (6). Assume that in iteration (2), $M(w^{(t)}) > 0$, $\phi'(M(w^{(t)})) \neq 0$, and $M(w^{(t+1)}) > 0$. Then $\phi(M(w^{(t+1)})) \ge \phi(M(w^{(t)}))$.

In other words, under mild conditions that ensure that (2) is well-defined (specifically, that the denominator in (2) is nonzero), (5) and (6) imply that (2) monotonically increases the criterion $\phi$.

Let us illustrate Theorem 1 with some examples. For simplicity, in (i)–(iv) we display formulae for $\lambda = 1$ only, although the discussion applies to all $\lambda \in (0, 1]$.

(i) Take
$$\phi_p(M) = \begin{cases} \log\det M, & p = 0;\\ -\mathrm{tr}(M^p), & p \in [-1, 0).\end{cases}$$
Then $\psi_p(M) \equiv -\phi_p(M^{-1})$ satisfies (5) and (6) (see Appendix A). By Theorem 1, Algorithm I is monotonic for $\phi = \phi_p$, $p \in [-1, 0]$. This generalizes the previously known cases $p = 0$ and $p = -1$ (with particular values of $\lambda$). The iteration (2) reads
$$w_i^{(t+1)} = w_i^{(t)}\, \frac{\mathrm{tr}\big(M^{p-1}(w^{(t)})A_i\big)}{\mathrm{tr}\big(M^{p}(w^{(t)})\big)}, \qquad i = 1, \ldots, n.$$

(ii) More generally, given a full rank $m \times r$ matrix $K$ ($r \le m$), consider
$$\phi_{p,K}(M) = \begin{cases} -\log\det(K^\top M^{-1}K), & p = 0;\\ -\mathrm{tr}\big((K^\top M^{-1}K)^{-p}\big), & p \in [-1, 0).\end{cases}$$
Then $\psi_{p,K}(M) \equiv -\phi_{p,K}(M^{-1})$ satisfies (5) and (6) (the proof is the same as in Case (i)). By Theorem 1, Algorithm I is monotonic for $\phi = \phi_{p,K}$, $p \in [-1, 0]$. The iteration (2) reads
$$w_i^{(t+1)} = w_i^{(t)}\, \frac{\mathrm{tr}\big(M^{-1}K(K^\top M^{-1}K)^{-p-1}K^\top M^{-1}A_i\big)}{\mathrm{tr}\big((K^\top M^{-1}K)^{-p}\big)}\Bigg|_{M = M(w^{(t)})}. \qquad (8)$$

(iii) In particular, taking $r = 1$, $K = c$ (an $m \times 1$ vector) and $p = -1$ in Case (ii), we obtain that Algorithm I is monotonic for the c-criterion $\phi_c$. The iteration (8) reduces to (compare with Fellman 1974)
$$w_i^{(t+1)} = w_i^{(t)}\, \frac{c^\top M^{-1}(w^{(t)}) A_i M^{-1}(w^{(t)}) c}{c^\top M^{-1}(w^{(t)}) c}, \qquad i = 1, \ldots, n.$$
(A numerical sketch of this update is given after Case (iv) below.)

(iv) Consider another example of Case (ii), with $p = 0$, $r = m - 1$ and $K = (0_r, I_r)^\top$. Henceforth $0_r$ denotes the $r \times 1$ vector of zeros, and $I_r$ denotes the $r \times r$ identity matrix. Assume $A_i = x_i x_i^\top$, $x_i = (1, z_i^\top)^\top$, where $z_i$ is $(m-1) \times 1$. This corresponds to a D-optimal design problem for the linear model with intercept, $y \,|\, x_i, \theta \sim N(x_i^\top\theta, \sigma^2)$, where the parameter is $\theta = (\theta_0, \theta_1, \ldots, \theta_{m-1})^\top$ but interest centers on $(\theta_1, \ldots, \theta_{m-1})^\top$, not $\theta_0$. Nevertheless, as far as the design measure $w$ is concerned, the optimality criterion $\phi_{0,K}(M)$ coincides with $\phi_0(M)$, i.e., $-\log\det\big(K^\top M^{-1}(w)K\big) = \log\det M(w)$.
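As promised in Case (iii), here is a minimal sketch of the c-criterion update (the two-parameter design space and the choice of $c$ are assumptions for illustration only):

```python
import numpy as np

def c_criterion_step(w, A, c, lam=1.0):
    """One step of (2) for phi_c; with lam = 1 this is the displayed update
    w_i <- w_i * (c'M^{-1}A_iM^{-1}c) / (c'M^{-1}c)."""
    M = sum(wi * Ai for wi, Ai in zip(w, A))
    u = np.linalg.solve(M, c)                      # u = M^{-1}c
    d = np.array([u @ Ai @ u for Ai in A])         # d_i = c'M^{-1}A_iM^{-1}c
    w_new = w * d**lam
    return w_new / w_new.sum()

# Hypothetical usage: simple linear regression, interest in the slope.
t = np.linspace(-1.0, 1.0, 11)
X = np.column_stack([np.ones_like(t), t])
A = [np.outer(x, x) for x in X]
c = np.array([0.0, 1.0])
w = np.full(len(A), 1.0 / len(A))
for _ in range(200):
    w = c_criterion_step(w, A, c)   # c'M^{-1}c decreases monotonically
```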

After some algebra, (8) reduces to
$$w_i^{(t+1)} = w_i^{(t)}\, \frac{(z_i - \bar z)^\top M_c^{-1}(w^{(t)})\,(z_i - \bar z)}{m - 1}, \qquad i = 1, \ldots, n, \qquad (9)$$
where
$$\bar z = \sum_{i=1}^n w_i^{(t)} z_i; \qquad M_c(w^{(t)}) = \sum_{i=1}^n w_i^{(t)}\,(z_i - \bar z)(z_i - \bar z)^\top.$$
Thus (9) satisfies $\det M(w^{(t+1)}) \ge \det M(w^{(t)})$. Monotonicity of (9) has been conjectured since Titterington (1978), and considerable numerical evidence has accumulated over the years. Recently, extending the arguments of Pázman (1986), Dette et al. (2008) have obtained results that come very close to resolving Titterington's conjecture. Nevertheless, we have been unable to extend their arguments any further. Instead we prove the more general Theorem 1 using a different approach, and settle this conjecture as a consequence. (A numerical sketch of iteration (9) is given at the end of this section.)

The proof of Theorem 1 is achieved by using a method of auxiliary variables. When a function $f(w)$ (e.g., $-\det M(w)$) to be minimized is complicated, we introduce a new variable $Q$ and a function $g(w, Q)$ such that $\min_Q g(w, Q) = f(w)$ for all $w$, thus transforming the problem into minimizing $g(w, Q)$ over $w$ and $Q$ jointly. Then we may use an iterative conditional minimization strategy on $g(w, Q)$. This is inspired by the EM algorithm (Dempster et al. 1977; Meng and van Dyk 1997); in particular, see Csiszár and Tusnády's ([6]) interpretation (see [30] for a related interpretation of the data augmentation algorithm). In Section 3 we analyze Algorithm I using this strategy. Although attention is paid to the mathematics, our focus is on intuitively appealing interpretations, which may lead to further extensions of Algorithm I with the same desirable monotonicity properties. If the algorithm is monotonic, then convergence can be established under mild conditions (Section 4). Section 5 contains an illustration with locally optimal designs for generalized linear models.
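A minimal sketch of iteration (9) follows (the helper name is hypothetical; `Z` holds the $z_i$ as rows). Note that $\sum_i w_i^{(t)} (z_i - \bar z)^\top M_c^{-1}(z_i - \bar z) = \mathrm{tr}(I_{m-1}) = m - 1$, so the updated weights automatically sum to one.

```python
import numpy as np

def titterington_step(w, Z):
    """One step of iteration (9).  Z is n x (m-1), with row i equal to z_i'.
    With zbar = sum_i w_i z_i and M_c(w) = sum_i w_i (z_i - zbar)(z_i - zbar)',
    set w_i <- w_i (z_i - zbar)' M_c^{-1}(w) (z_i - zbar) / (m - 1)."""
    zbar = w @ Z                                   # weighted mean of the z_i
    Zc = Z - zbar                                  # centered design points
    Mc = (Zc * w[:, None]).T @ Zc                  # M_c(w)
    d = np.einsum('ij,jk,ik->i', Zc, np.linalg.inv(Mc), Zc)
    return w * d / Z.shape[1]                      # divide by m - 1
```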

3 Explaining the monotonicity

A key observation is that the problem of maximizing $\phi(M(w))$ or, equivalently, minimizing $\psi(M^{-1}(w))$, can be formulated as a joint minimization over both the design and the estimator. Specifically, let us compare the original Problem P1 with its companion P2. Throughout $A^{1/2}$ denotes the symmetric nonnegative definite (SNND) square root of an SNND matrix $A$.

Problem P1: Minimize $-\phi(M(w)) \equiv \psi\big((\sum_{i=1}^n w_i A_i)^{-1}\big)$ over $w \in \Omega$.

Problem P2: Minimize
$$g(w, Q) \equiv \psi\big(Q \Sigma_w Q^\top\big) \qquad (10)$$
over $w \in \Omega$ and $Q$ (an $m \times (mn)$ matrix), subject to $QG = I_m$, where
$$\Sigma_w \equiv \mathrm{Diag}\big(w_1^{-1}, \ldots, w_n^{-1}\big) \otimes I_m; \qquad G \equiv \big(A_1^{1/2}, \ldots, A_n^{1/2}\big)^\top.$$

Though not immediately obvious, P1 and P2 are equivalent, and this may be explained in statistical terms as follows. In (10), $Q\Sigma_w Q^\top$ is simply the variance matrix of a linear unbiased estimator, $QY$, of the $m \times 1$ parameter $\theta$ in the model
$$Y = G\theta + \epsilon, \qquad \epsilon \sim N(0, \Sigma_w),$$
where $Y$ is the $(mn) \times 1$ vector of observations. The constraint $QG = I_m$ ensures unbiasedness. (Note that $G$ is full-rank since $M(w)$ is nonsingular by assumption.) Of course, the weighted least squares (WLS) estimator is the best linear unbiased estimator, having the smallest variance matrix (in the sense of positive definite ordering) and, by (5), the smallest $\psi$ for that matrix. It follows that, for fixed $w$, $g(w, Q)$ is minimized by choosing $QY$ as the WLS estimator:
$$g(w, \hat Q_{\mathrm{WLS}}) = \inf_{QG = I_m} g(w, Q), \qquad \hat Q_{\mathrm{WLS}} = M^{-1}(w)\big(w_1 A_1^{1/2}, \ldots, w_n A_n^{1/2}\big). \qquad (11)$$
However, from (10) and (11) we get
$$g(w, \hat Q_{\mathrm{WLS}}) = \psi\big(M^{-1}(w)\big). \qquad (12)$$
That is, P2 reduces to P1 upon minimizing over $Q$.

Since P2 is not immediately solvable, it is natural to consider the subproblems: (i) minimizing $g(w, Q)$ over $Q$ for fixed $w$, and (ii) minimizing $g(w, Q)$ over $w$ for fixed $Q$. Part (ii) is again formulated as a joint minimization problem. For a fixed $m \times (mn)$ matrix $Q$ such that $QG = I_m$, let us consider Problems P3 and P4.

Problem P3: Minimize $g(w, Q)$ as in (10) over $w \in \Omega$.

Problem P4: Minimize the function
$$h(\Sigma, w, Q) = \psi(\Sigma) + \mathrm{tr}\Big(\psi'(\Sigma)\big(Q\Sigma_w Q^\top - \Sigma\big)\Big) \qquad (13)$$
over $w \in \Omega$ and the $m \times m$ positive definite matrix $\Sigma$.

To relate P4 to P3, note that for fixed $w$ and $Q$, the concavity assumption (7) implies that
$$h(\Sigma, w, Q) \ge \psi\big(Q\Sigma_w Q^\top\big) \qquad (14)$$
with equality when $\Sigma = Q\Sigma_w Q^\top$, i.e., Problem P4 reduces to P3 upon minimizing over $\Sigma$.

Since P4 is not immediately solvable, it is natural to consider the subproblems: (i) minimizing $h(\Sigma, w, Q)$ over $\Sigma$ for fixed $w$ and $Q$, and (ii) minimizing $h(\Sigma, w, Q)$ over $w$ for fixed $\Sigma$ and $Q$. Part (ii), which amounts to minimizing
$$\mathrm{tr}\big(\psi'(\Sigma)\, Q\Sigma_w Q^\top\big) = \mathrm{tr}\big(Q^\top\psi'(\Sigma)Q\, \Sigma_w\big),$$
admits a closed-form solution: if we write $Q = (Q_1, \ldots, Q_n)$ where each $Q_i$ is $m \times m$, then $w_i^2$ should be proportional to $\mathrm{tr}\big(Q_i^\top\psi'(\Sigma)Q_i\big)$. (But Algorithm I may not perform an exact minimization here; see (15).)
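The equivalence of P1 and P2 can be checked numerically. The sketch below (with assumed random $A_i$, made positive definite purely for convenience) verifies (11)–(12) for $\psi_0(M) = \log\det M$ and confirms that any other unbiased $Q$ does no better:

```python
import numpy as np

rng = np.random.default_rng(1)

def sqrtm_psd(A):
    """Symmetric nonnegative definite square root A^{1/2}."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

m, n = 2, 4
xs = rng.standard_normal((n, m))
A = [np.outer(x, x) + 0.1 * np.eye(m) for x in xs]    # each A_i > 0
w = rng.dirichlet(np.ones(n))                         # a design in Omega

G = np.vstack([sqrtm_psd(Ai) for Ai in A])            # (mn) x m
Sigma_w = np.kron(np.diag(1.0 / w), np.eye(m))        # Sigma_w
M = sum(wi * Ai for wi, Ai in zip(w, A))
Q_wls = np.linalg.inv(M) @ np.hstack([wi * sqrtm_psd(Ai) for wi, Ai in zip(w, A)])

assert np.allclose(Q_wls @ G, np.eye(m))              # unbiasedness: QG = I_m
assert np.allclose(Q_wls @ Sigma_w @ Q_wls.T, np.linalg.inv(M))  # var = M^{-1}

# Any other unbiased Q satisfies psi_0(Q Sigma_w Q') >= psi_0(M^{-1}):
D = rng.standard_normal((m, m * n))
D = D - D @ G @ np.linalg.solve(G.T @ G, G.T)         # project so that DG = 0
Q = Q_wls + 0.1 * D
logdet = lambda S: np.linalg.slogdet(S)[1]
assert logdet(Q @ Sigma_w @ Q.T) >= logdet(np.linalg.inv(M))
```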

Based on the above discussion, we can express Algorithm I as an iterative conditional minimization algorithm involving $w$, $Q$ and $\Sigma$. At iteration $t$, define
$$Q^{(t)} = \big(Q_1^{(t)}, \ldots, Q_n^{(t)}\big); \qquad Q_i^{(t)} = w_i^{(t)} M^{-1}(w^{(t)}) A_i^{1/2}, \quad i = 1, \ldots, n; \qquad \Sigma^{(t)} = Q^{(t)}\Sigma_{w^{(t)}} Q^{(t)\top} = M^{-1}(w^{(t)}).$$

Then we have
$$\psi\big(M^{-1}(w^{(t)})\big) = g\big(w^{(t)}, Q^{(t)}\big) \qquad \text{(by (12))}$$
$$= h\big(\Sigma^{(t)}, w^{(t)}, Q^{(t)}\big) \qquad \text{(by (13))}$$
$$\ge h\big(\Sigma^{(t)}, w^{(t+1)}, Q^{(t)}\big) \qquad \text{(see below)} \qquad (15)$$
$$\ge g\big(w^{(t+1)}, Q^{(t)}\big) \qquad \text{(by (14) and (10))} \qquad (16)$$
$$\ge \psi\big(M^{-1}(w^{(t+1)})\big) \qquad \text{(by (11) and (12)).} \qquad (17)$$

The choice of $w^{(t+1)}$ leads to (15) as follows. After simple algebra, the iteration (2) becomes
$$w_i^{(t+1)} = \frac{r_i^{\lambda}\, w_i^{1-2\lambda}}{\sum_{j=1}^n r_j^{\lambda}\, w_j^{1-2\lambda}}, \qquad i = 1, \ldots, n, \qquad \text{where } w \equiv w^{(t)}, \quad r_i \equiv \mathrm{tr}\big(Q_i^{(t)\top}\psi'(\Sigma^{(t)})Q_i^{(t)}\big).$$

Since $0 < \lambda \le 1$, Jensen's inequality yields
$$\sum_{i=1}^n w_i\left(\frac{r_i}{w_i^2}\right)^{1-\lambda} \le \left(\sum_{i=1}^n \frac{r_i}{w_i}\right)^{1-\lambda}; \qquad \sum_{i=1}^n w_i\left(\frac{r_i}{w_i^2}\right)^{\lambda} \le \left(\sum_{i=1}^n \frac{r_i}{w_i}\right)^{\lambda}.$$
That is,
$$\left(\sum_{i=1}^n r_i^{1-\lambda} w_i^{2\lambda - 1}\right)\left(\sum_{i=1}^n r_i^{\lambda} w_i^{1-2\lambda}\right) \le \sum_{i=1}^n \frac{r_i}{w_i}.$$
Hence
$$\mathrm{tr}\big(\psi'(\Sigma^{(t)})\, Q^{(t)}\Sigma_{w^{(t)}}Q^{(t)\top}\big) = \sum_{i=1}^n \frac{r_i}{w_i^{(t)}} \ge \sum_{i=1}^n \frac{r_i}{w_i^{(t+1)}} = \mathrm{tr}\big(\psi'(\Sigma^{(t)})\, Q^{(t)}\Sigma_{w^{(t+1)}}Q^{(t)\top}\big),$$
which produces (15).

Choosing $\lambda = 1/2$, i.e., $w_i^{(t+1)} \propto \sqrt{r_i}$, leads to exact minimization in (15); choosing $\lambda = 1$ yields equality in (15). But any choice of $w^{(t+1)}$ that decreases $h(\Sigma^{(t)}, w, Q^{(t)})$ at (15) would have resulted in the desired inequality $\psi(M^{-1}(w^{(t)})) \ge \psi(M^{-1}(w^{(t+1)}))$. We may allow $\lambda$ to change from iteration to iteration, and monotonicity still holds, as long as $\lambda \in (0, 1]$. See Silvey et al. (1978) and Fellman (1989) for investigations concerning the choice of $\lambda$.

Also note that we assume $w_i^{(t)}, w_i^{(t+1)} > 0$ for all $i$. This is not essential, however, because (i) the possibility of $w_i^{(t)} = 0$ can be handled by restricting our analysis to all design points such that $w_i^{(t)} > 0$, and (ii) the possibility of $w_i^{(t+1)} = 0$ can be handled by a standard limiting argument. Monotonicity holds as long as $M(w^{(t)})$ and $M(w^{(t+1)})$ are both positive definite, as noted in the statement of Theorem 1.
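As a quick numerical illustration of the argument just given, the following sketch (design space as in the earlier sketches, an assumption for illustration) runs (2) for the A-criterion $\phi_{-1}(M) = -\mathrm{tr}(M^{-1})$ with $\lambda = 1/2$, Fellman's case, and asserts the monotone increase guaranteed by Theorem 1:

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 11)
X = np.column_stack([np.ones_like(t), t, t**2])
A = [np.outer(x, x) for x in X]
w = np.full(len(A), 1.0 / len(A))

phis = []
for _ in range(100):
    M = sum(wi * Ai for wi, Ai in zip(w, A))
    Minv = np.linalg.inv(M)
    d = np.array([np.trace(Minv @ Minv @ Ai) for Ai in A])  # d_i = tr(M^{-2}A_i)
    phis.append(-np.trace(Minv))                            # phi_{-1}(M(w))
    w = w * np.sqrt(d)                                      # lambda = 1/2
    w /= w.sum()

assert all(a <= b + 1e-12 for a, b in zip(phis, phis[1:]))  # monotone increase
```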

4 Global convergence

Let us review characterizations of optimal designs, commonly referred to as general equivalence theorems; see Kiefer and Wolfowitz (1960) and Whittle (1973).

Theorem 2. Assume the criterion $\phi$ is differentiable and concave. Define
$$\bar d(w) \equiv \sum_{i=1}^n w_i d_i(w) = \mathrm{tr}\big(\phi'(M(w))M(w)\big),$$
where, as in (3), $d_i(w) \equiv \mathrm{tr}(\phi'(M(w))A_i)$. Then for every $w^* \in \Omega_+$, the following are equivalent:

a) $w^*$ maximizes $\phi(M(w))$ on $\Omega_+$;
b) $d_i(w^*) \le \bar d(w^*)$ for all $i$;
c) for all $i$, if $w_i^* \neq 0$ then $d_i(w^*) = \bar d(w^*)$, and if $w_i^* = 0$ then $d_i(w^*) \le \bar d(w^*)$.

Theorem 2 suggests the following convergence criterion for Algorithm I: stop when
$$\max_{1 \le i \le n} d_i(w^{(t)}) \le (1 + \delta)\, \bar d(w^{(t)}) \qquad (18)$$
for some small $\delta > 0$. Theorem 2 also plays an important role in the proof of our main convergence result (Theorem 3). Of course, the driving force behind Theorem 3 is monotonicity (Theorem 1).

Theorem 3 (Global convergence). Denote the mapping (2) by $T$.

a) Assume
$$\phi'(M(w)) \ge 0; \qquad \phi'(M(w))A_i \neq 0, \qquad w \in \Omega_+, \quad i = 1, \ldots, n.$$
b) Assume (2) is strictly monotonic, i.e.,
$$w \in \Omega_+, \ Tw \neq w \implies \phi(M(Tw)) > \phi(M(w)). \qquad (19)$$
c) Assume $\phi$ is strictly concave and $\phi'$ is continuous on positive definite matrices.
d) Assume that if a sequence of positive definite matrices $\tilde M_k$ tends to $\tilde M$ in such a way that $\phi(\tilde M_k)$ increases monotonically, then $\tilde M$ is nonsingular.

Let $w^{(t)}$ be generated by (2) with $w_i^{(0)} > 0$ for all $i$. Then (i) all limit points of $w^{(t)}$ are global maxima of $\phi(M(w))$, and (ii) as $t \to \infty$, $\phi(M(w^{(t)}))$ increases monotonically to $\sup_{w \in \Omega_+}\phi(M(w))$.

The proof of Theorem 3 is somewhat subtle. Standard arguments (see Lemma 1 in Appendix B) show that all limit points of $w^{(t)}$ are fixed points of the mapping $T$. This alone does not imply convergence to a global maximum, however, because there often exist sub-optimal fixed points on the boundary of $\Omega$. (Global maxima occur routinely on the boundary also.) Our goal is therefore to rule out possible convergence to such sub-optimal points; Appendix B presents the details.
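Returning to the stopping rule (18): a minimal sketch (hypothetical helper name) is

```python
import numpy as np

def converged(w, A, phi_grad, delta=1e-6):
    """Stopping rule (18): stop once
    max_i d_i(w) <= (1 + delta) * dbar(w),
    where dbar(w) = sum_i w_i d_i(w) = tr(phi'(M(w)) M(w))."""
    M = sum(wi * Ai for wi, Ai in zip(w, A))
    C = phi_grad(M)
    d = np.array([np.trace(C @ Ai) for Ai in A])
    return d.max() <= (1.0 + delta) * (w @ d)
```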

We shall comment on Conditions a)–d). Condition a) ensures that, starting with $w^{(0)} \in \Omega_+$, all iterations are well-defined. Moreover, if $w_i^{(0)} > 0$ for all $i$, then $w_i^{(t)} > 0$ for all $t$ and $i$. This highlights the following basic idea. In order to converge to a global maximum $w^*$, the starting value $w^{(0)}$ must assign positive weight to every support point of $w^*$. Such a requirement is not necessary for monotonicity. On the other hand, assigning weight to non-supporting points of $w^*$ tends to slow the algorithm down. Hence methods that quickly eliminate non-optimal support points are valuable (Harman and Pronzato, 2007).

Condition b) simply says that unless $w$ is a fixed point, the mapping $T$ should produce a better solution. Let us assume (5), (7) and Condition a), so that Theorem 1 applies. Then, by checking the equality condition in (15), it is easy to see that Condition b) is satisfied if $0 < \lambda < 1$. (The argument leading to (19) technically assumes that all coordinates of $w$ are nonzero, but we can apply it to the appropriate subvector of $w$.) If $\lambda = 1$, then (15) reduces to an equality. However, by checking the equality conditions in (16) and (17), we can show that Condition b) is satisfied if $\psi$ is strictly increasing and strictly concave:
$$M_2 \ge M_1 > 0, \ M_1 \neq M_2 \implies \psi(M_1) < \psi(M_2); \qquad (20)$$
$$M_1, M_2 > 0, \ M_1 \neq M_2 \implies \psi(M_2) < \psi(M_1) + \mathrm{tr}\big(\psi'(M_1)(M_2 - M_1)\big). \qquad (21)$$

Conditions c) and d) are technical requirements that concern $\phi$ alone. Condition c) ensures uniqueness of the optimal moment matrix, which simplifies the analysis. Condition d) ensures that positive definiteness of $M(w)$ is maintained in the limit. Conditions c) and d) are satisfied by $\phi = \phi_p$ with $p \in [-1, 0]$, for example.

Let us mention a typical example of Theorem 3.

Corollary 1. Assume $A_i \neq 0$, $w_i^{(0)} > 0$, $i = 1, \ldots, n$, and $M(w^{(0)}) > 0$. Then the conclusion of Theorem 3 holds for Algorithm I with $\phi = \phi_0$.

Proof. Conditions a), c) and d) are readily verified. Condition b) is satisfied by (20) and (21). The claim follows from Theorem 3.

A more specialized example concerns (9).

Proposition 1. Assume that $X = (x_1, \ldots, x_n)^\top$ as in (9) has rank $m$, and that $w_i^{(0)} > 0$ for all $i$. Then the conclusion of Theorem 3, with $\phi = \phi_0$, holds for the iteration (9).

The iteration (9) technically does not satisfy the assumptions of Theorem 3. For example, we do not have $w_i^{(t+1)} > 0$ even if $w_i^{(t)} > 0$ for all $i$. However, inspection of (9) shows that $w_i^{(t+1)}$ is set to zero only when $z_i = \bar z$, in which case it can be shown that an optimal design need not include $x_i$ as a support point, i.e., $x_i$ is safely eliminated. Proposition 1 can be established by following the proof of Theorem 3 step by step, rather than appealing to Theorem 3 directly. We omit the details.

5 Remarks and illustrations

The literature abounds with numerical examples of Algorithm I and its relatives. There are several reasons for such wide interest. Similar to the EM algorithm, Algorithm I is simple, easy to implement, and monotonically convergent for a large class of optimality criteria (although this was not proved in the present generality). Algorithm I is known to be slow sometimes. But it serves as a foundation upon which more effective variants can be built (see, e.g., Harman and Pronzato 2007, and Dette et al. 2008). While settling the conjectured monotonicity of (9) holds mathematical interest, our main contribution is a way of interpreting such algorithms as optimization on augmented spaces. This opens up new possibilities in constructing algorithms with the same desirable monotonic convergence properties.

As a numerical example, consider the logistic regression model
$$p(y|x, \theta) = \frac{\exp(y\, x^\top\theta)}{1 + \exp(x^\top\theta)}, \qquad y = 0, 1.$$
The expected Fisher information for $\theta$ from a unit assigned to $x_i$ is
$$A_i = \frac{\exp(x_i^\top\theta)}{\big(1 + \exp(x_i^\top\theta)\big)^2}\; x_i x_i^\top.$$
We compute locally optimal designs with prior guess $\theta^* = (1, 1)^\top$ ($m = 2$), and design spaces
$$\mathcal{X}_1 = \{x_i = (1, i/20)^\top,\ i = 1, \ldots, 20\}; \qquad \mathcal{X}_2 = \{x_i = (1, i/10)^\top,\ i = 1, \ldots, 30\}.$$
The design criteria considered are $\phi_0$ (for D-optimality) and $\phi_{-2}$. We use Algorithm I with $\lambda = 1$, starting with equally weighted designs.

For $\phi_0$, Corollary 1 guarantees monotonic convergence. This is illustrated by Figure 1, the first row, where $\phi_0 = \log\det M(w)$ is plotted against iteration $t$. Using the convergence criterion (18) with a small $\delta$, the number of iterations until convergence is 93 for $\mathcal{X}_1$ and 2121 for $\mathcal{X}_2$. The actual locally D-optimal designs are $w_1^* = w_{20}^* = 0.5$ for $\mathcal{X}_1$ and $w_1^* = w_{23}^* = 0.5$ for $\mathcal{X}_2$, as can be verified using the general equivalence theorem. This simple example serves to illustrate both the monotonicity of Algorithm I (when Theorem 1 applies) and its potentially slow convergence.

For $\phi_{-2}$, although Algorithm I can be implemented just as easily, Theorem 1 does not apply, because the concavity condition (7) no longer holds. Indeed, Algorithm I (with $\lambda = 1$) is not monotonic, as is evident from Figure 1, the second row, where $\phi_{-2} = -\mathrm{tr}(M^{-2}(w))$ is plotted against iteration $t$. This shows the potential danger of using Algorithm I when monotonicity is not guaranteed. It seems worthwhile to investigate modifications that may again lead to monotonic convergence in such situations.

[Figure 1: Values of $\phi_0 = \log\det M$ (row 1) and $\phi_{-2} = -\mathrm{tr}(M^{-2})$ (row 2) for Algorithm I with design spaces $\mathcal{X}_1$ (left) and $\mathcal{X}_2$ (right).]
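The following sketch reproduces the $\mathcal{X}_1$ computation in outline (Algorithm I with $\lambda = 1$ for $\phi_0$; the generous fixed iteration count, used here instead of rule (18), is an assumption of this sketch):

```python
import numpy as np

theta = np.array([1.0, 1.0])                        # prior guess theta*
X1 = np.column_stack([np.ones(20), np.arange(1, 21) / 20.0])  # design space X_1

eta = X1 @ theta
wgt = np.exp(eta) / (1.0 + np.exp(eta))**2          # logistic information weight
A = [v * np.outer(x, x) for v, x in zip(wgt, X1)]   # A_i for each design point

w = np.full(len(A), 1.0 / len(A))                   # equally weighted start
for _ in range(2000):                               # Algorithm I, lambda = 1
    M = sum(wi * Ai for wi, Ai in zip(w, A))
    d = np.array([np.trace(np.linalg.solve(M, Ai)) for Ai in A])  # tr(M^{-1}A_i)
    w = w * d / (w @ d)

print(np.round(w, 3))   # weight ~0.5 on x_1 and x_20, matching the text
```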

We have focused on local optimality criteria. An alternative, Bayesian optimality (Chaloner and Larntz, 1989; Chaloner and Verdinelli, 1995), seeks to maximize the expected value of $\phi(M(\theta; w))$ over a prior distribution $\pi(\theta)$. The notation $M(\theta; w)$ emphasizes the dependence of the moment matrix on the parameter $\theta$. It would be worthwhile to extend our strategy in Section 3 to Bayesian optimality, and we plan to report both theoretical and empirical evaluations of such extensions in future work.

Acknowledgement

The author would like to thank Professors Donald Rubin, Xiao-Li Meng, and David van Dyk for introducing him to the field of statistical computing.

Appendix A: Details of Case (i)

We only verify (6), since (5) is easy. For $\psi_p(M) = \mathrm{tr}(M^{-p})$, $p \in [-1, 0)$, (6) holds because the function $f(x) = x^{-p}$, $x \in [0, \infty)$, $p \in [-1, 0)$, is operator concave (Bhatia 1997, Chapter V). For $\psi_0(M) = \log\det M$, a simple proof of (6) is to choose an $m \times m$ invertible matrix $U$ such that
$$U^\top M_1 U = \mathrm{Diag}(a_1, \ldots, a_m); \qquad U^\top M_2 U = \mathrm{Diag}(b_1, \ldots, b_m).$$
Then (6) reduces to
$$\alpha\sum_{i=1}^m \log a_i + (1 - \alpha)\sum_{i=1}^m \log b_i \le \sum_{i=1}^m \log\big(\alpha a_i + (1 - \alpha)b_i\big),$$
which holds by Jensen's inequality.

Appendix B: Proof of Theorem 3

Lemma 1. Any limit point $w^*$ of $w^{(t)}$ is a fixed point of $T$, i.e., $Tw^* = w^*$.

Proof. Let $w^{(t_j)}$ be a subsequence converging to $w^*$. By Theorem 1 and Condition d), $M(w^*)$ is positive definite, i.e., $w^* \in \Omega_+$. Moreover, since both $T$ and the function $\phi(M(\cdot))$ are continuous on $\Omega_+$, we have
$$\phi(M(w^*)) = \lim_{j\to\infty}\phi\big(M(w^{(t_j)})\big) = \lim_{j\to\infty}\phi\big(M(w^{(t_j+1)})\big) = \phi(M(Tw^*)),$$
where the two limits are equal by monotonicity. From Condition b) we deduce that $Tw^* = w^*$.

Lemma 2. Suppose $w^*$ is a limit point of $w^{(t)}$, and define
$$S_+ = \{w \in \Omega_+ :\ w_i^* = 0 \implies w_i = 0\},$$
i.e., $S_+$ collects all $w$ that are absolutely continuous with respect to $w^*$ and satisfy $M(w) > 0$. Then we have
$$\phi(M(w^*)) = \sup_{w \in S_+}\phi(M(w)); \qquad (22)$$
$$\hat w \in S_+,\ \phi(M(\hat w)) = \sup_{w \in S_+}\phi(M(w)) \implies M(\hat w) = M(w^*). \qquad (23)$$

Proof. By Lemma 1, $w^*$ is a fixed point of $T$. That is,
$$w_i^* \neq 0 \implies \mathrm{tr}\big(\phi'(M(w^*))A_i\big) = \mathrm{tr}\big(\phi'(M(w^*))M(w^*)\big).$$
By Theorem 2, $w^*$ maximizes $\phi(M(w))$ on $S_+$, and (22) is proved. The implication (23) holds because we assume $\phi(\cdot)$ is strictly concave (Condition c)).

Lemma 3. The sequence $M(w^{(t)})$ has finitely many limit points.

Proof. Since $M(\cdot)$ is continuous, any limit point of $M(w^{(t)})$ is of the form $M(w^*)$ for some limit point $w^*$ of $w^{(t)}$. By Lemma 2, $M(w^*)$ is the unique maximizer of $\phi(M)$ among all $M > 0$ that can be written as $M = M(w)$ with $w \ll w^*$. Depending on which coordinates of $w^*$ are zero, there are fewer than $2^n$ such degenerate maximizers.

Lemma 4. The limit $M^* \equiv \lim_{t\to\infty} M(w^{(t)})$ exists.

Proof. Assume $M(w^{(t)})$ has $L < \infty$ limit points, and let $B_i$, $i = 1, \ldots, L$, be non-intersecting balls (neighborhoods) centered on these. We know $L \ge 1$ because $M(w^{(t)})$ is bounded; the choice of a metric is immaterial. Again by boundedness, for large enough $t$, each $M(w^{(t)})$ belongs to exactly one of the $B_i$. Assume $L \ge 2$, i.e., $M(w^{(t)})$ does not converge. Then there exists a subsequence such that $M(w^{(t_j)})$ and $M(w^{(t_j+1)})$ always belong to different $B_i$. By passing to a sub-subsequence if necessary, we may assume $w^{(t_j)} \to w^*$; by Lemma 1, $w^{(t_j+1)} \to w^*$. It follows that $M(w^{(t_j)}) \to M(w^*)$ and $M(w^{(t_j+1)}) \to M(w^*)$, which contradicts the assumption of distinct neighborhoods.

Lemma 5. The limit $M^*$ as defined in Lemma 4 satisfies $\phi(M^*) = \sup_{w\in\Omega_+}\phi(M(w))$.

Proof. Let us check the conditions of the general equivalence theorem, i.e.,
$$\mathrm{tr}\big(\phi'(M^*)A_i\big) \le \mathrm{tr}\big(\phi'(M^*)M^*\big), \qquad i = 1, \ldots, n.$$
Suppose this fails for $i = 1$, say. Then by Lemma 4 there exists $\delta > 0$ such that, for sufficiently large $t$, we have
$$\mathrm{tr}\big(\phi'(M(w^{(t)}))A_1\big) > (1 + \delta)\, \mathrm{tr}\big(\phi'(M(w^{(t)}))M(w^{(t)})\big).$$
It follows from the definition of $T$ that
$$\frac{w_1^{(t+1)}}{w_1^{(t)}} > (1 + \delta)^{\lambda}\, \frac{\big(\sum_{i=1}^n w_i d_i\big)^{\lambda}}{\sum_{i=1}^n w_i d_i^{\lambda}}, \qquad (24)$$
where $d_i = \mathrm{tr}\big(\phi'(M(w^{(t)}))A_i\big)$ and $w \equiv w^{(t)}$. However, the right hand side of (24) is at least $(1 + \delta)^{\lambda}$ due to Jensen's inequality. That is, $w_1^{(t+1)} > (1 + \delta)^{\lambda} w_1^{(t)}$ for all $t$ large enough, which contradicts the obvious constraint $0 < w_1^{(t)} \le 1$.

Theorem 3 then follows from Lemma 5.

References

[1] C.L. Atwood, Sequences converging to D-optimal designs of experiments, Ann. Statist. 1 (1973), pp. 342–352.

[2] R. Bhatia, Matrix Analysis, Springer, New York (1997).

[3] K. Chaloner and K. Larntz, Optimal Bayesian design applied to logistic regression experiments, J. Statist. Plann. Inference 21 (1989), pp. 191–208.

[4] K. Chaloner and I. Verdinelli, Bayesian experimental design: a review, Statist. Sci. 10 (1995), pp. 273–304.

[5] H. Chernoff, Locally optimal designs for estimating parameters, Ann. Math. Statist. 24 (1953), pp. 586–602.

[6] I. Csiszár and G. Tusnády, Information geometry and alternating minimization procedures, Statistics & Decisions, Supplement Issue 1 (1984), pp. 205–237.

[7] A.P. Dempster, N.M. Laird and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. Roy. Statist. Soc. B 39 (1977), pp. 1–38.

[8] H. Dette, A. Pepelyshev and A. Zhigljavsky, Improving updating rules in multiplicative algorithms for computing D-optimal designs, Computational Statistics & Data Analysis 53 (2008), pp. 312–320.

[9] V.V. Fedorov, Theory of Optimal Experiments, Academic Press, New York (1972).

[10] J. Fellman, On the allocation of linear observations (Thesis), Comment. Phys.-Math. 44 (1974), pp. 27–78.

[11] J. Fellman, An empirical study of a class of iterative searches for optimal designs, J. Statist. Plann. Inference 21 (1989), pp. 85–92.

[12] R. Harman and L. Pronzato, Improvements on removing nonoptimal support points in D-optimum design algorithms, Statist. Probab. Lett. 77 (2007), pp. 90–94.

[13] R. Harman and M. Trnovská, Approximate D-optimal designs of experiments on the convex hull of a finite set of information matrices, Mathematica Slovaca, accepted.

[14] J. Kiefer, General equivalence theory for optimum designs (approximate theory), Ann. Statist. 2 (1974), pp. 849–879.

[15] J. Kiefer and J. Wolfowitz, The equivalence of two extremum problems, Canad. J. Math. 12 (1960), pp. 363–366.

[16] G. Li and D. Majumdar, D-optimal designs for logistic models with three and four parameters, J. Statist. Plann. Inference 138 (2008), pp. 1950–1959.

[17] X.-L. Meng and D. van Dyk, The EM algorithm — an old folk-song sung to a fast new tune (with discussion), J. Roy. Statist. Soc. B 59 (1997), pp. 511–567.

[18] A. Pázman, Foundations of Optimum Experimental Design, Reidel, Dordrecht (1986).

[19] F. Pukelsheim, Optimal Design of Experiments, John Wiley & Sons, New York (1993).

[20] F. Pukelsheim and B. Torsney, Optimal weights for experimental designs on linearly independent support points, Ann. Statist. 19 (1991), pp. 1614–1625.

[21] S.D. Silvey, Optimal Design, Chapman & Hall, London (1980).

[22] S.D. Silvey, D.M. Titterington and B. Torsney, An algorithm for optimal designs on a finite design space, Commun. Statist. Theory Methods 7 (1978), pp. 1379–1389.

[23] D.M. Titterington, Algorithms for computing D-optimal designs on finite design spaces, in Proc. of the 1976 Conf. on Information Sciences and Systems, Johns Hopkins University (1976).

[24] D.M. Titterington, Estimation of correlation coefficients by ellipsoidal trimming, Appl. Statist. 27 (1978), pp. 227–234.

[25] B. Torsney, A moment inequality and monotonicity of an algorithm, in K.O. Kortanek and A.V. Fiacco (eds.), Proceedings of the International Symposium on Semi-Infinite Programming and Applications, Lecture Notes in Economics and Mathematical Systems 215, University of Texas at Austin (1983).

[26] B. Torsney, W-iterations and ripples therefrom, in L. Pronzato and A. Zhigljavsky (eds.), Optimal Design and Related Areas in Optimization and Statistics, Springer-Verlag, New York (2007).

[27] B. Torsney and S. Mandal, Two classes of multiplicative algorithms for constructing optimizing distributions, Computational Statistics & Data Analysis 51 (2006), pp. 1591–1601.

[28] P. Whittle, Some general points in the theory of optimal experimental design, J. Roy. Statist. Soc. B 35 (1973), pp. 123–130.

[29] H.P. Wynn, Results in the theory and construction of D-optimum experimental designs, J. Roy. Statist. Soc. B 34 (1972), pp. 133–147.

[30] Y. Yu, A bit of information theory, and the data augmentation algorithm converges, IEEE Trans. Inform. Theory 54 (2008), pp. 5186–5188.


More information

Bayesian Planning of Hit-Miss Inspection Tests

Bayesian Planning of Hit-Miss Inspection Tests Bayesan Plannng of Ht-Mss Inspecton Tests Yew-Meng Koh a and Wllam Q Meeker a a Center for Nondestructve Evaluaton, Department of Statstcs, Iowa State Unversty, Ames, Iowa 5000 Abstract Although some useful

More information

FACTORIZATION IN KRULL MONOIDS WITH INFINITE CLASS GROUP

FACTORIZATION IN KRULL MONOIDS WITH INFINITE CLASS GROUP C O L L O Q U I U M M A T H E M A T I C U M VOL. 80 1999 NO. 1 FACTORIZATION IN KRULL MONOIDS WITH INFINITE CLASS GROUP BY FLORIAN K A I N R A T H (GRAZ) Abstract. Let H be a Krull monod wth nfnte class

More information

MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS

MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS These are nformal notes whch cover some of the materal whch s not n the course book. The man purpose s to gve a number of nontrval examples

More information

Classification as a Regression Problem

Classification as a Regression Problem Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

On the Multicriteria Integer Network Flow Problem

On the Multicriteria Integer Network Flow Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of

More information

Hidden Markov Models

Hidden Markov Models Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,

More information

P exp(tx) = 1 + t 2k M 2k. k N

P exp(tx) = 1 + t 2k M 2k. k N 1. Subgaussan tals Defnton. Say that a random varable X has a subgaussan dstrbuton wth scale factor σ< f P exp(tx) exp(σ 2 t 2 /2) for all real t. For example, f X s dstrbuted N(,σ 2 ) then t s subgaussan.

More information

10) Activity analysis

10) Activity analysis 3C3 Mathematcal Methods for Economsts (6 cr) 1) Actvty analyss Abolfazl Keshvar Ph.D. Aalto Unversty School of Busness Sldes orgnally by: Tmo Kuosmanen Updated by: Abolfazl Keshvar 1 Outlne Hstorcal development

More information

Perron Vectors of an Irreducible Nonnegative Interval Matrix

Perron Vectors of an Irreducible Nonnegative Interval Matrix Perron Vectors of an Irreducble Nonnegatve Interval Matrx Jr Rohn August 4 2005 Abstract As s well known an rreducble nonnegatve matrx possesses a unquely determned Perron vector. As the man result of

More information

Lecture 4: September 12

Lecture 4: September 12 36-755: Advanced Statstcal Theory Fall 016 Lecture 4: September 1 Lecturer: Alessandro Rnaldo Scrbe: Xao Hu Ta Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer: These notes have not been

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information