On construction of constrained optimum designs

On construction of constrained optimum designs. Institute of Control and Computation Engineering, University of Zielona Góra, Poland. DEMA2008, Cambridge, 15 August 2008.

Numerical algorithms to construct optimal designs

1. Sequential algorithms with selection of support points: the Wynn-Fedorov scheme (Atkinson, Donev and Tobias, 2007; Fedorov and Hackl, 1997; Walter and Pronzato, 1997; Pázman, 1986; Silvey, 1980).
2. Sequential numerical design algorithms with support points given a priori: the multiplicative scheme (Torsney, 1988; Silvey, Titterington and Torsney, 1978; Torsney and Mandal, 2001, 2004; Pázman, 1986) and linear matrix inequalities (Boyd and Vandenberghe, 2004).

In practice, various inequality constraints must sometimes be considered due to cost limitations, restrictions required to achieve certain robustness properties, or restrictions on the experimental space. Although much theoretical work has been done (Fedorov and Hackl, 1997; Cook and Fedorov, 1995), publications on the algorithmic aspects of constrained design optimization are still scarce.

Classical framework

Multiresponse parametric model:
y_{ij} = η(x_i, θ) + ε_{ij},  j = 1, ..., r_i,  i = 1, ..., n.

Notation:
y_{ij} — observations of the response variables,
x_i — fixed values of the explanatory (independent) variables (e.g., time, temperature, spatial location, drug doses),
r_i ≥ 1 — number of replications at the setting x_i, with N = Σ_{i=1}^n r_i,
η(·, ·) — known regression function,
θ — vector of constant but unknown parameters.

Classical framework: additive random errors

E(ε_{ij}) = 0,  E(ε_{ij} ε_{kl}^T) = δ_{ik} δ_{jl} V(x_i).

Notation:
V(x_i) ≻ 0 — dispersion matrices, known (possibly up to a common constant multiplier),
δ — the Kronecker delta.

Simplification for linear models

Linear regression: η(x_i, θ) = F(x_i)^T θ, where the F(x_i) are known matrices.

BLUE of θ:
θ̂ = M^{-1} Σ_{i=1}^n r_i F(x_i) V(x_i)^{-1} ȳ_i.

Notation:
ȳ_i = (1/r_i) Σ_{j=1}^{r_i} y_{ij},
M_i = F(x_i) V(x_i)^{-1} F(x_i)^T,  i = 1, ..., n,
M = Σ_{i=1}^n r_i M_i — the Fisher information matrix (FIM).

Linear models (ctd)

Covariance matrix of θ̂: cov(θ̂) = M^{-1}.

We assume that the values x_i, i = 1, ..., n, are fixed and may not be altered, but we have full control over the corresponding numbers of replications r_i, i = 1, ..., n. We wish to choose them in an optimal way so as to enhance the process of estimating θ.

Convenient formulation

Discrete design:
ξ = { x_1, ..., x_n ; p_1, ..., p_n }.

Notation:
x_i — support points,
p_i = r_i / N — weights.

P.m.f. property of the weights: 1^T p = 1, p ≥ 0, where 1 = (1, 1, ..., 1).

Optimality criterion

Normalized FIM:
M̃(p) = (1/N) M = Σ_{i=1}^n p_i M_i.

D-optimality criterion:
Φ[M̃(p)] = log det(M̃(p)) → max.

Further, for simplicity of notation, the tilde over M̃(·) will be dropped.

Problems involved

Problem 1. The resulting optimization problem constitutes a classical discrete resource allocation problem. Its combinatorial nature excludes calculus techniques and implies prohibitive computational complexity.

Way round: relaxation. The feasible weights p_i are treated as arbitrary real numbers in the interval [0, 1] that sum to unity, not necessarily integer multiples of 1/N.

Advantage: a simple and efficient multiplicative algorithm can be exploited (cf. the previous talk by Ben Torsney).

Problems involved (ctd)

Problem 2. The resulting designs concentrate on a relatively small number of support points (close to the number of estimated parameters), rather than spreading the measurement effort around appropriately, which is what many practising statisticians tend to do.

Solution: prevent spending the overall experimental effort at a few points by directly bounding the frequencies of observations from above: p ≤ b, where the vector b ≤ 1 (componentwise) is fixed.

Problem statement once again

Ultimate formulation: given a vector b ≥ 0 satisfying 1^T b ≥ 1, find a vector of weights p = (p_1, ..., p_n) to maximize
Φ[M(p)] = log det(M(p))
subject to
0 ≤ p ≤ b,  1^T p = 1.

Properties

1. The performance index Φ is concave over the canonical simplex S_n = { p ≥ 0 : 1^T p = 1 }.
2. It is differentiable at points yielding nonsingular FIMs, with gradient
φ(p) := ∇Φ(p) = [ tr{ M(p)^{-1} M_1 }, ..., tr{ M(p)^{-1} M_n } ]^T.
3. The constraint set P is a rather nice convex set (e.g., fast algorithms for orthogonal projection onto P exist).

Numerous computational methods can potentially be employed, e.g., the conditional gradient method or a gradient projection method, but if the number of support points is large, they may lead to unsatisfactorily long computation times. (A sketch of evaluating Φ and φ is given below.)

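As a minimal sketch (not part of the original talk), the criterion and its gradient can be evaluated as follows, assuming the elementary information matrices M_i are stacked in a NumPy array M_list of shape (n, m, m); the function names are illustrative.

```python
import numpy as np

def fim(p, M_list):
    """Normalized FIM M(p) = sum_i p_i M_i for weights p and stacked matrices M_i."""
    return np.einsum('i,ijk->jk', p, M_list)

def criterion(p, M_list):
    """D-optimality criterion Phi(p) = log det M(p); -inf if M(p) is singular."""
    sign, logdet = np.linalg.slogdet(fim(p, M_list))
    return logdet if sign > 0 else -np.inf

def gradient(p, M_list):
    """Gradient phi_i(p) = tr{ M(p)^{-1} M_i }, valid when M(p) is nonsingular."""
    Minv = np.linalg.inv(fim(p, M_list))
    return np.einsum('jk,ikj->i', Minv, M_list)
```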

Characterization of the optimal design

Proposition 1. Suppose that the matrix M(p*) is nonsingular for some p* ∈ P. The vector p* constitutes a global maximum of Φ over P if, and only if, there exists a number λ such that, for i = 1, ..., n,
φ_i(p*) ≥ λ if p*_i = b_i,
φ_i(p*) = λ if 0 < p*_i < b_i,
φ_i(p*) ≤ λ if p*_i = 0.

Simplicial decomposition

Simplicial decomposition (SD) stands for a class of methods for solving large-scale continuous problems in mathematical programming with convex feasible sets (von Hohenbalken, 1977). It iterates by alternately solving
1. a linear programming subproblem (the so-called column generation problem), which generates an extreme point of the polyhedron, and
2. a nonlinear restricted master problem (RMP), which finds the maximum of the objective function over the convex hull (a simplex) of previously generated extreme points.
Its principal characteristic is that the sequence of successive solutions to the master problem tends to a solution of the original problem in such a way that the objective function strictly monotonically approaches its optimal value.

[Figures: a sequence of animation frames giving a geometric illustration of the simplicial decomposition iterations in the (p1, p2) plane.]

Algorithm SD

Step 0 (Initialization): Guess an initial solution p^(0) ∈ P such that M(p^(0)) is nonsingular. Set I = { 1, ..., n }, Q^(0) = { p^(0) } and k = 0.

Step 1 (Termination check): Set
I_ub^(k) = { i ∈ I : p_i^(k) = b_i },
I_im^(k) = { i ∈ I : 0 < p_i^(k) < b_i },
I_lb^(k) = { i ∈ I : p_i^(k) = 0 }.
If, for some λ ∈ R_+,
φ_i(p^(k)) ≥ λ for i ∈ I_ub^(k),
φ_i(p^(k)) = λ for i ∈ I_im^(k),
φ_i(p^(k)) ≤ λ for i ∈ I_lb^(k),
then STOP: p^(k) is optimal. (A sketch of this test in code is given below.)

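The termination test of Step 1 can be coded directly from Proposition 1. The sketch below is illustrative (the helper name and the tolerance tol are not from the talk); it classifies the indices and checks whether a single multiplier λ is consistent with all three conditions.

```python
import numpy as np

def is_optimal(p, b, phi, tol=1e-6):
    """Step 1 / Proposition 1: check the optimality conditions for the design p.

    phi is the gradient phi(p), b the vector of upper bounds, tol a numerical tolerance.
    """
    ub = p >= b - tol                             # indices with p_i = b_i
    lb = p <= tol                                 # indices with p_i = 0
    im = ~(ub | lb)                               # interior indices, 0 < p_i < b_i
    hi = phi[ub].min() if ub.any() else np.inf    # lambda may not exceed this
    lo = phi[lb].max() if lb.any() else -np.inf   # lambda must be at least this
    if im.any():
        lam = phi[im].mean()
        if np.abs(phi[im] - lam).max() > tol:     # interior gradients must coincide
            return False
        return lo - tol <= lam <= hi + tol
    return lo <= hi + tol                         # otherwise any lambda in [lo, hi] works
```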

Step 2 (Solution of the column generation subproblem): Compute
q^(k+1) = arg max_{p ∈ P} φ(p^(k))^T p
and set Q^(k+1) = Q^(k) ∪ { q^(k+1) }.

Step 3 (Solution of the restricted master subproblem): Find
p^(k+1) = arg max_{p ∈ co(Q^(k+1))} Φ[M(p)]
and purge Q^(k+1) of all extreme points with zero weights in the resulting expression of p^(k+1) as a convex combination of elements of Q^(k+1). Increment k by one and go back to Step 1. (A sketch of the whole loop is given below.)

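Putting Steps 0-3 together, the outer loop of Algorithm SD might look as follows. This is only a sketch: column_generation and solve_rmp stand for the subproblem solvers sketched further below, criterion/gradient/is_optimal for the helpers above, and all names are illustrative rather than the authors' code.

```python
import numpy as np

def simplicial_decomposition(M_list, b, p0, max_iter=100, tol=1e-6):
    """Algorithm SD: maximize log det M(p) over 0 <= p <= b, 1'p = 1 (sketch)."""
    p = np.asarray(p0, dtype=float).copy()
    Q = [p.copy()]                                   # Step 0: Q^(0) = { p^(0) }
    for _ in range(max_iter):
        phi = gradient(p, M_list)
        if is_optimal(p, b, phi, tol):               # Step 1: termination check
            break
        q = column_generation(phi, b)                # Step 2: new extreme point
        Q.append(q)
        w = solve_rmp(Q, M_list)                     # Step 3: restricted master problem
        p = sum(wj * qj for wj, qj in zip(w, Q))
        Q = [qj for wj, qj in zip(w, Q) if wj > tol] # purge zero-weight points
    return p
```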

Column generation problem

Basically, it is a linear programming problem: maximize c^T p subject to p ∈ P, where c = φ(p^(k)).

A vector q ∈ P constitutes its global solution if, and only if, there exists a scalar ρ such that, for i = 1, ..., n,
c_i ≥ ρ if q_i = b_i,
c_i = ρ if 0 < q_i < b_i,
c_i ≤ ρ if q_i = 0.

Solution of the column generation problem

Step 0 (Initialization): Set j = 0 and v^(0) = 0.

Step 1 (Sorting): Sort the elements of c in nonincreasing order, i.e., find a permutation π on the index set I = { 1, ..., n } such that c_{π(i)} ≥ c_{π(i+1)}, i = 1, ..., n − 1.

Step 2 (Identification of nonzero weights):
Step 2.1: If v^(j) + b_{π(j+1)} < 1, set v^(j+1) = v^(j) + b_{π(j+1)}; otherwise, go to Step 3.
Step 2.2: Increment j by one and go to Step 2.1.

Step 3 (Form the ultimate solution): Set
q_{π(i)} = b_{π(i)} for i = 1, ..., j,
q_{π(i)} = 1 − v^(j) for i = j + 1,
q_{π(i)} = 0 for i = j + 2, ..., n.

The algorithm starts by picking the consecutive largest components c_i of c and setting the corresponding weights q_i to their maximal allowable values b_i. The process is repeated until the sum of the assigned weights would exceed one; the value of the last weight set in this manner is then corrected so that the weights sum to one. The remaining (i.e., unassigned) weights are set to zero. (A sketch in code is given below.)

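A compact sketch of this greedy procedure (illustrative name; it assumes 1^T b ≥ 1 so that a feasible q exists):

```python
import numpy as np

def column_generation(c, b):
    """Maximize c'p over 0 <= p <= b, 1'p = 1 by filling the largest c_i first."""
    order = np.argsort(-c)                 # indices sorted by nonincreasing c_i
    q = np.zeros_like(np.asarray(b, dtype=float))
    remaining = 1.0
    for i in order:
        q[i] = min(b[i], remaining)        # assign the maximal allowable weight
        remaining -= q[i]
        if remaining <= 0.0:               # budget exhausted: remaining weights stay zero
            break
    return q
```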

Solution of the restricted master problem

Suppose that in the (k+1)-th iteration of SD we have Q^(k+1) = { q_1, ..., q_r }, possibly with r < k + 1 (owing to the deletion mechanism for uninformative points). Step 3 of Algorithm SD involves the maximization of Φ[M(p)] = log det(M(p)) over
co(Q^(k+1)) = { p = Σ_{j=1}^r w_j q_j : w ≥ 0, 1^T w = 1 }.

Solution of the restricted master problem (ctd)

From the representation of any p ∈ co(Q^(k+1)) as p = Σ_{j=1}^r w_j q_j, or, in component-wise form, p_i = Σ_{j=1}^r w_j q_{j,i}, i = 1, ..., n, with q_{j,i} the i-th component of q_j, it follows that
M(p) = Σ_{i=1}^n p_i M_i = Σ_{j=1}^r w_j ( Σ_{i=1}^n q_{j,i} M_i ) = Σ_{j=1}^r w_j M(q_j).

Solution of the restricted master problem (ctd)

Equivalent formulation of the RMP: find the vector of weights w ∈ R^r to maximize
Ψ(w) = log det(H(w))
subject to the constraints 1^T w = 1, w ≥ 0, where H(w) = Σ_{j=1}^r w_j H_j and H_j = M(q_j).

Proposition 2. Suppose that the matrix H(w*) is nonsingular for some w* ∈ S_r. The vector w* constitutes a global solution to the RMP if and only if, for each j = 1, ..., r,
ψ_j(w*) = m if w*_j > 0,
ψ_j(w*) ≤ m if w*_j = 0,
where ψ_j(w) = tr[ H(w)^{-1} H_j ], j = 1, ..., r, and m denotes the number of estimated parameters (the dimension of the FIM).

Multiplicative algorithm for the RMP

Step 0 (Initialization): Select a weight vector w^(0) ∈ S_r ∩ R^r_{++}, e.g., set w^(0) = (1/r) 1. Set l = 0.

Step 1 (Termination check): If ‖ (1/m) ψ(w^(l)) − 1 ‖ ≤ ε for a prescribed tolerance ε, then STOP.

Step 2 (Multiplicative update): Evaluate
w^(l+1) = (1/m) ψ(w^(l)) ∘ w^(l)
(the product being componentwise). Increment l by one and go to Step 1. (A sketch in code is given below.)

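A sketch of the multiplicative algorithm for the RMP (illustrative names, max-norm stopping rule assumed); it also assembles the matrices H_j = M(q_j) from the current point set Q, so it can serve as the solve_rmp used in the outer loop above.

```python
import numpy as np

def solve_rmp(Q, M_list, tol=1e-6, max_iter=1000):
    """Restricted master problem: maximize log det( sum_j w_j M(q_j) ) over the simplex."""
    H = np.stack([np.einsum('i,ijk->jk', q, M_list) for q in Q])  # H_j = M(q_j)
    r, m = len(Q), H.shape[1]                    # m = number of parameters
    w = np.full(r, 1.0 / r)                      # start at the centre of the simplex
    for _ in range(max_iter):
        Hinv = np.linalg.inv(np.einsum('j,jkl->kl', w, H))
        psi = np.einsum('kl,jlk->j', Hinv, H)    # psi_j = tr{ H(w)^{-1} H_j }
        if np.abs(psi / m - 1.0).max() <= tol:   # termination check (Proposition 2)
            break
        w = w * psi / m                          # multiplicative update
    return w
```

Note that the update preserves the simplex constraint automatically, since Σ_j w_j ψ_j(w) = tr{ H(w)^{-1} H(w) } = m.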

Numerical example

Consider a batch reactor initially loaded with an aqueous solution of component A. In the presence of a solid catalyst, this reacts to form components B and C according to the consecutive reaction scheme A → B → C. The time changes of the concentrations [A], [B] and [C] are governed by
d[A]/dt = −k_1 [A]^{γ_1},                 [A]|_{t=0} = 1,
d[B]/dt = k_1 [A]^{γ_1} − k_2 [B]^{γ_2},  [B]|_{t=0} = 0,
d[C]/dt = k_2 [B]^{γ_2},                  [C]|_{t=0} = 0,
where k_1 and k_2 are the rates and γ_1 and γ_2 the orders of the reactions. Usually, the coefficients k_1, k_2, γ_1 and γ_2 are not known in advance.

Numerical example (ctd)

We set x_i = t_i, i = 1, ..., n, θ = (k_1, k_2, γ_1, γ_2) and η(t, θ) = ([A](t; θ), [B](t; θ), [C](t; θ)).

Moreover,
θ^0 = (0.7, 0.2, 1.1, 1.5),
V(t_i) = I_3, i = 1, ..., n,
F(t_i)^T = ∂η/∂θ (t_i, θ^0), i = 1, ..., n.

Consider n = 100 potential support points evenly distributed over the time interval [0, 20].
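One possible way (a sketch, not the authors' code) to assemble the elementary information matrices for this example: integrate the kinetics with SciPy and approximate the sensitivities F(t_i)^T = ∂η/∂θ(t_i, θ^0) by finite differences; with V(t_i) = I_3 this gives M_i = F(t_i) F(t_i)^T. The grid below is one plausible reading of "100 points evenly distributed on [0, 20]".

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, k1, k2, g1, g2):
    """Consecutive reactions A -> B -> C with rates k1, k2 and orders g1, g2."""
    A, B, _ = y
    r1 = k1 * max(A, 0.0) ** g1
    r2 = k2 * max(B, 0.0) ** g2
    return [-r1, r1 - r2, r2]

def responses(theta, t_grid):
    """Concentrations [A], [B], [C] at the candidate sampling times t_grid."""
    sol = solve_ivp(rhs, (0.0, t_grid[-1]), [1.0, 0.0, 0.0],
                    t_eval=t_grid, args=tuple(theta), rtol=1e-8, atol=1e-10)
    return sol.y.T                                    # shape (n, 3)

def information_matrices(theta, t_grid, h=1e-5):
    """M_i = J_i^T J_i, with J_i = d eta / d theta at t_i (finite-difference Jacobians)."""
    base = responses(theta, t_grid)
    n, m = len(t_grid), len(theta)
    J = np.zeros((n, 3, m))
    for k in range(m):
        pert = np.array(theta, dtype=float)
        pert[k] += h
        J[:, :, k] = (responses(pert, t_grid) - base) / h
    return np.einsum('isk,isl->ikl', J, J)            # stack of m x m matrices M_i

t_grid = np.linspace(0.2, 20.0, 100)                  # candidate sampling times
M_list = information_matrices([0.7, 0.2, 1.1, 1.5], t_grid)
```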

[Figures: responses [A](t), [B](t), [C](t) over t ∈ [0, 20] together with the resulting designs for the bounds p ≤ 0.35·1, p ≤ 0.15·1 and p ≤ 0.05·1.]

[Figures: variance function and design weights over t ∈ [0, 20] for the bounds p ≤ 0.35·1, p ≤ 0.15·1 and p ≤ 0.05·1.]

[Figures: convergence of det(M(p^(k))) versus the iteration number k for the bounds p ≤ 0.35·1, p ≤ 0.15·1 and p ≤ 0.05·1.]

Conclusions

A simple algorithm was developed for constructing constrained D-optimum designs on finite design spaces. Extensive numerical experiments demonstrate that it can outperform approaches based on sophisticated general-purpose nonlinear programming solvers. Its unquestionable advantage is the simplicity of implementation, which requires neither additional numerical routines nor painstaking programming effort.

A refinement: restricted simplicial decomposition, based on the observation that a particular feasible solution, such as the optimal one, can often be represented as a convex combination of far fewer extreme points than implied by Carathéodory's theorem (Hearn et al., 1985; 1997; Ventura and Hearn, 1993).

Conclusions (ctd)

Apart from that, some improvements aimed at removing nonoptimal support points, proposed by Luc Pronzato, can be incorporated into the restricted master problem to speed up its solution.

The method can also be employed to find upper bounds on the maximum value of the objective function in the design of a monitoring network for parameter estimation of systems described by partial differential equations. Using this technique in conjunction with the branch-and-bound method, it was possible to select hundreds of gauged sites from among thousands of admissible sites within no more than five minutes on a low-cost PC (Uciński and Patan, 2007).

Conclusions (ctd)

Although the interest here was in constructing D-optimum designs under bound constraints, the same simplicial decomposition technique can be applied to other smooth optimality criteria (e.g., A-optimality), and other linear constraints on the design weights can easily be included.

Efficient parallelization is possible via parallel variable distribution (Ferris and Mangasarian, 1994; Solodov, 1998).

Extension to continuous designs is possible as well (Ermoliev et al., 1985; Higgins and Polak, 1990; Cook and Fedorov, 1995; Shapiro and Ahmed, 2004).
