A new algorithm for deriving optimal designs


1 A new algorithm for deriving optimal designs
Stefanie Biedermann, University of Southampton, UK
Joint work with Min Yang, University of Illinois at Chicago
18 October 2012, DAE, University of Georgia, Athens, GA

2 Outline
1 Introduction
2 Current algorithms
3 New algorithm
4 Examples: Example I, Example II, Example III
5 Summary

3 Statistics and optimal design of experiments
The main goal of statistics is to gain information from collected data.
Design of experiments concerns how the data are collected.
An optimal/efficient design can reduce the sample size and cost needed to achieve a specified precision of estimation, or can improve the precision for a given sample size.

4 Exact and approximate designs
Exact design $\xi_n$ for sample size $n$:
$\xi_n = \begin{Bmatrix} x_1 & x_2 & \dots & x_m \\ n_1/n & n_2/n & \dots & n_m/n \end{Bmatrix}$, where the $n_i$ are integers and $\sum_{i=1}^m n_i = n$.
Approximate design $\xi$:
$\xi = \begin{Bmatrix} x_1 & x_2 & \dots & x_m \\ \omega_1 & \omega_2 & \dots & \omega_m \end{Bmatrix}$, where $\omega_i > 0$ and $\sum_{i=1}^m \omega_i = 1$.
Note: To run an approximate design, some rounding procedure is required.
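The slides leave the rounding procedure unspecified; as a minimal sketch, one common choice is to allocate $\lfloor n\,\omega_i \rfloor$ runs per support point and hand out the remainder by largest fractional part. This particular rule is an illustrative assumption, not the authors' method.

```python
import numpy as np

def round_design(weights, n):
    """Round approximate-design weights to integer replications summing to n.

    A minimal sketch: give each support point floor(n * w_i) runs, then
    distribute the leftover runs to the points with the largest fractional
    parts. (Illustrative choice only; the slides just say "some rounding
    procedure is required".)
    """
    weights = np.asarray(weights, dtype=float)
    raw = n * weights
    counts = np.floor(raw).astype(int)
    leftover = n - counts.sum()            # how many runs remain unassigned
    order = np.argsort(raw - counts)[::-1] # largest fractional parts first
    counts[order[:leftover]] += 1
    return counts

# Example: a 3-point approximate design rounded to n = 10 runs.
print(round_design([0.5, 0.3, 0.2], 10))  # -> [5 3 2]
```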

5 Optimality criteria
Suppose interest is in $F(\theta)$, a vector of functions of the model parameters.
The (asymptotic) covariance matrix of the MLE of $F(\theta)$ under design $\xi$, $\Sigma_\xi(F)$, can be written as
$\Sigma_\xi(F) = \frac{\partial F(\theta)}{\partial \theta^T}\, I_\xi^{-1}\, \left( \frac{\partial F(\theta)}{\partial \theta^T} \right)^T$,
where $I_\xi$ is the information matrix of the design $\xi$.
Aim: Obtain precise estimates, i.e. minimise $\Phi(\Sigma_\xi(F))$, an optimality criterion.
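As a minimal illustration, this delta-method covariance is a one-liner once the Jacobian $\partial F/\partial\theta^T$ and the information matrix $I_\xi$ are available; both inputs are assumed here, and the sketch assumes $I_\xi$ is invertible.

```python
import numpy as np

def cov_of_interest(jacobian, info):
    """Delta-method covariance of F(theta) under design xi (a sketch):
    Sigma_xi(F) = (dF/dtheta^T) I_xi^{-1} (dF/dtheta^T)^T,
    assuming the information matrix `info` is non-singular."""
    return jacobian @ np.linalg.solve(info, jacobian.T)
```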

6 Optimality criteria
Example 1: A D-optimal design minimises the determinant of the covariance matrix, i.e. minimises the volume of a confidence ellipsoid for $F(\theta)$.
Example 2: An A-optimal design minimises the trace of the covariance matrix, i.e. minimises the sum of the variances of the elements of $F(\hat\theta)$.

7 Locally optimal designs
A large and popular class of optimality criteria are the $\Phi_p$-criteria, of which A- and D-optimality are special cases:
$\Phi_p(\Sigma_\xi(F)) = \left[ \frac{1}{v} \operatorname{trace}\!\left( \Sigma_\xi(F)^p \right) \right]^{1/p}$,
where $0 \le p < \infty$ and $\Sigma_\xi(F) \in \mathbb{R}^{v \times v}$.
Problem: In many situations, $\Phi_p$-optimal designs depend on the unknown parameter vector $\theta$, leading to locally optimal designs.
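A short sketch of how $\Phi_p$ can be evaluated via the eigenvalues of $\Sigma_\xi(F)$: $p = 1$ recovers the (scaled) A-criterion, and the limit $p \to 0$ gives $\det(\Sigma)^{1/v}$, the D-criterion, which the sketch returns directly at $p = 0$. It assumes a symmetric positive-definite input.

```python
import numpy as np

def phi_p(sigma, p):
    """Phi_p criterion [ (1/v) trace(Sigma^p) ]^(1/p) via eigenvalues.

    A minimal sketch for symmetric positive-definite Sigma. p = 1 is the
    A-criterion up to the 1/v factor; p = 0 is treated as the D-criterion
    det(Sigma)^(1/v), the p -> 0 limit of the Phi_p family.
    """
    lam = np.linalg.eigvalsh(sigma)           # eigenvalues of Sigma
    if p == 0:
        return np.exp(np.mean(np.log(lam)))   # geometric mean = det^(1/v)
    return (np.mean(lam ** p)) ** (1.0 / p)

sigma = np.diag([1.0, 4.0])
print(phi_p(sigma, 1))  # A-type value: mean of variances = 2.5
print(phi_p(sigma, 0))  # D-type value: det^(1/2) = 2.0
```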

8 Optimal multi-stage designs
Use a sequential strategy: Suppose $n_0$ observations are already available, from design $\xi_{n_0}$, and another $n$ observations can be made.
Estimate $\theta$ by $\hat\theta_0$.
Design problem: Find $\xi$ such that the combined design
$\bar\xi = \frac{n_0}{n_0+n}\,\xi_{n_0} + \frac{n}{n_0+n}\,\xi$
minimises $\Phi(\Sigma_{\bar\xi}(F))$ at $\hat\theta_0$.
Note: $\xi$ and thus $\bar\xi$ are approximate designs.
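Because an information matrix is a weight-wise average over the design's support, the combined design's information matrix is the same convex combination of the two stages' matrices. A sketch, assuming both matrices are evaluated at the same parameter guess $\hat\theta_0$:

```python
import numpy as np

def combined_information(info_stage0, info_new, n0, n):
    """Information matrix of the combined two-stage design
    xi_bar = n0/(n0+n) * xi_{n0} + n/(n0+n) * xi (a sketch)."""
    w0 = n0 / (n0 + n)
    return w0 * info_stage0 + (1 - w0) * info_new

# Example: 40 first-stage and 80 new observations.
I0, I1 = np.eye(4), 2 * np.eye(4)
print(combined_information(I0, I1, 40, 80))  # = (1/3) I0 + (2/3) I1
```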

9 Aim of this project
Provide a fast, general, convenient, and easy-to-use approach to find locally and multi-stage optimal designs.

10 Outline
1 Introduction
2 Current algorithms
3 New algorithm
4 Examples: Example I, Example II, Example III
5 Summary

11 Current algorithms
Three well-known algorithms are:
- Vertex direction method (VDM; Fedorov, 1972)
- Multiplicative algorithm (MA; Silvey, Titterington and Torsney, 1978)
- Vertex exchange method (VEM; Böhning, 1986)
Yu (2011) combines the best features of these algorithms with a nearest-neighbour exchange step to improve speed: the Cocktail algorithm.

12 Computing time comparison I
Model: $Y = \theta_1 e^{-\theta_2 x} + \theta_3 e^{-\theta_4 x} + \varepsilon$
Design space $[0, 3]$ discretised to $X = \{3i/N,\ i = 1, \dots, N\}$.
Computing time (in seconds), Table 1 of Yu (2011): MA, VEM, and Cocktail compared at $N$ = 20, 50, 100, 200, 500 [table values omitted].
(VDM is omitted since it is known to be slower than VEM.)

13 Computing time comparison II
Model: $Y = \theta_1 + \theta_2 r + \theta_3 r^2 + \theta_4 s + \theta_5 rs + \varepsilon$
Design space $[-1, 1] \times [0, 1]$ discretised to $X = \{(r_i, s_j):\ r_i = 2i/k - 1,\ i = 1, \dots, k;\ s_j = j/k,\ j = 1, \dots, k\}$, $N = k^2$.
Computing time (in seconds), Table 4 of Yu (2011): MA, VEM, and Cocktail compared at $N = 20^2, 50^2, 100^2, 200^2$ [table values omitted; MA and VEM aborted on the larger grids].

14 Summary of current algorithms
All algorithms are relatively easy to implement.
The Cocktail algorithm leads to dramatically improved speed, sometimes by several orders of magnitude, relative to MA, VDM, and VEM.
However, the Cocktail algorithm is restricted to D-optimal designs for the full parameter vector, and to (one-stage) locally optimal designs.

15 Outline
1 Introduction
2 Current algorithms
3 New algorithm
4 Examples: Example I, Example II, Example III
5 Summary

16 Goal of the new algorithm
- Any parameters (or functions thereof) can be of interest
- Any optimality criterion: $\Phi_p$-optimality (includes D-, A-, and E-optimality)
- One-stage and multi-stage
- Convenient to use
- Even faster

17 Rationale
1. The existing algorithms typically seek monotonic convergence, which may not be the fastest way to converge to the optimal weights.
2. New idea: instead of seeking monotonic convergence, derive the optimal weights directly by solving a system of nonlinear equations.

18 Notation for the new algorithm
- $X = \{x_1, \dots, x_N\}$, the set of all possible design points
- $S^{(t)}$, the support after the $t$-th iteration
- $\xi_{S^{(t)}}$, the design with support $S^{(t)}$ and optimal weights
- $d(x, \xi_{S^{(t)}})$, the sensitivity function of the design $\xi_{S^{(t)}}$

19 The new algorithm
Start with an initial design support $S^{(0)}$. In the $t$-th step:
- Derive $\xi_{S^{(t)}}$ using the Newton-Raphson method
- Compute the sensitivity function $d(x_i, \xi_{S^{(t)}})$ for $i = 1, \dots, N$
- Select $1 \le i_{\max} \le N$ such that $d(x_{i_{\max}}, \xi_{S^{(t)}}) = \max_{1 \le i \le N} d(x_i, \xi_{S^{(t)}})$
- Stop when $d(x_{i_{\max}}, \xi_{S^{(t)}}) \le \epsilon$ for some very small $\epsilon > 0$; otherwise set $S^{(t+1)} = S^{(t)} \cup \{x_{i_{\max}}\}$
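A minimal sketch of this support-augmentation loop follows. Only the loop structure comes from the slide; `optimal_weights` (the inner Newton-Raphson solve) and `sensitivity` are placeholder callables standing in for the model- and criterion-specific pieces.

```python
import numpy as np

def new_algorithm(X, optimal_weights, sensitivity, S0, eps=1e-6, max_iter=100):
    """Support-augmentation loop of the new algorithm (a sketch).

    `optimal_weights(S)` returns criterion-optimal weights on support S;
    `sensitivity(x, S, w)` evaluates d(x, xi_{S^{(t)}}). Both are assumed
    inputs; X is the list of all candidate design points.
    """
    S = list(S0)
    for _ in range(max_iter):
        w = optimal_weights(S)                        # inner Newton-Raphson solve
        d = np.array([sensitivity(x, S, w) for x in X])
        i_max = int(np.argmax(d))                     # worst-covered candidate
        if d[i_max] <= eps:                           # equivalence-theorem check
            return S, w
        if X[i_max] not in S:                         # augment the support
            S.append(X[i_max])
    return S, w
```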

20 Convergence (I)
Theorem 1: For a multi-stage design, let $I_{\xi_0}$ be non-singular. Then for any initial support $S^{(0)}$, the sequence of designs $\{\xi_{S^{(t)}},\ t \ge 0\}$ converges to an optimal design minimising $\Phi_p(\Sigma_\xi(F))$ as $t \to \infty$, provided the optimal weights are found in each updating step.
For a one-stage design, the above result holds if $I_{\xi_{S^{(0)}}}$ is non-singular and the Jacobian of $F$ is square and of full rank.

21 Convergence (II)
Theorem 2: For a multi-stage design, let $p$ be an integer and $I_{\xi_0}$ be non-singular. For a given support, $\Phi_p(\Sigma_\xi(F))$ is minimised (w.r.t. the weights) at any critical point in
$\Omega = \{w = (w_1, \dots, w_{m-1})^T:\ w_i \ge 0,\ w_1 + \dots + w_{m-1} \le 1\}$
or on its boundary. In addition, the Hessian of $\Phi_p(\Sigma_\xi(F))$ is non-negative definite.
The Newton-Raphson method will converge provided the starting point is close enough to the critical point. Since $\Omega$ is compact, we can always find a critical point (given it exists) by using sufficiently many starting values.

22 Outline
1 Introduction
2 Current algorithms
3 New algorithm
4 Examples: Example I, Example II, Example III
5 Summary

23 Example I: Set-up
Consider the model $Y = \theta_1 e^{-\theta_2 x} + \theta_3 e^{-\theta_4 x} + \varepsilon$. Let $\theta = (\theta_1, \theta_2, \theta_3, \theta_4)$ and $X = \{3i/N,\ i = 1, \dots, N\}$. Assume that $\theta = (1, 1, 1, 2)$.
The same set-up as that of Table 1 of Yu (2011).
Algorithms are coded in SAS IML and run on a Dell laptop (2.2 GHz, 8 GB RAM).

24 Example I: Computing time comparison I
Computing time (in seconds) for the D-optimal design for the full parameter vector $\theta$: Cocktail vs. the new algorithm at $N$ = 500, 1000, 5000, … [table values omitted].

25 Example I: Locally A- and D-optimal designs
Computing time (in seconds) under the new algorithm for D- and A-optimality, with interest in $\theta$, $(\theta_1, \theta_3)$, and $(\theta_2, \theta_4)$, at four grid sizes $N$ [table values omitted].

26 Example I: Multi-stage A- and D-optimal designs
Suppose we have an initial design $\xi_0 = \{(0, 0.25), (1, 0.25), (2, 0.25), (3, 0.25)\}$ with $n_0 = 40$.
Aim: Allocate the next 80 observations.
Computing time (in seconds) under the new algorithm for D- and A-optimality, with interest in $\theta$, $(\theta_1, \theta_3)$, and $(\theta_2, \theta_4)$, at four grid sizes $N$ [table values omitted].

27 Example II: Set-up
Consider the model $Y = \theta_1 + \theta_2 r + \theta_3 r^2 + \theta_4 s + \theta_5 rs + \varepsilon$. Let $\theta = (\theta_1, \theta_2, \theta_3, \theta_4, \theta_5)$ and $X = \{(r_i, s_j):\ r_i = 2i/k - 1,\ i = 1, \dots, k;\ s_j = j/k,\ j = 1, \dots, k\}$.
The same set-up as that of Table 4 of Yu (2011).
Algorithms are coded in SAS IML and run on a Dell laptop (2.2 GHz, 8 GB RAM).

28 Example II: Computing time comparison II
Computing time (in seconds) for the D-optimal design for the full parameter vector $\theta$: Cocktail vs. the new algorithm at $N = 20^2, 50^2, \dots$ [table values omitted].

29 Example III: Set-up
Consider a multinomial logit model with 3 categories and 3 variables: response $Y = (Y_1, Y_2, Y_3)^T$ at experimental condition $x$, with $Y_1 + Y_2 + Y_3 = 1$ and
$\pi_i(x) = P(Y_i = 1 \mid x) = \frac{e^{g(x)^T \theta_i}}{1 + e^{g(x)^T \theta_1} + e^{g(x)^T \theta_2}}, \quad i = 1, 2,$
where $g(x)^T = (1, x_1, x_2, x_3)$.

30 Example III: Modification to the algorithm (Example III only)
Design space $X = \{(6i/k, 6j/k, 6l/k),\ i, j, l = 0, 1, \dots, k\}$.
The number of design points in $X$ increases rapidly as $k$ increases ($N \approx k^3$), so the grid is refined in stages (a sketch follows below):
- Start with a coarse grid.
- Identify the best design based on the grid at that stage.
- For the next stage, restrict a finer grid to neighbourhoods of the best support points found at the current stage.
- Continue until a specified accuracy for the design points is reached.
- Verify optimality through the equivalence theorem.
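A sketch of the grid-refinement step under stated assumptions: the neighbourhood `width`, the sub-grid resolution `k`, and the clipping bounds are illustrative tuning choices, not values from the slides.

```python
import numpy as np
from itertools import product

def refine_grid(best_points, width, k=6, lo=0.0, hi=6.0):
    """Build a finer candidate grid restricted to neighbourhoods of the
    current best support points (a sketch of the multi-stage refinement).

    For each support point a (k+1)^3 sub-grid is laid over the box
    [point - width, point + width] per coordinate, clipped to [lo, hi].
    """
    new_points = set()
    for p in best_points:
        axes = [np.clip(np.linspace(c - width, c + width, k + 1), lo, hi)
                for c in p]
        new_points.update(product(*axes))   # all coordinate combinations
    return sorted(new_points)

# Example: refine around two 3-d support points from a coarse stage.
coarse_best = [(0.0, 0.0, 6.0), (6.0, 6.0, 0.0)]
print(len(refine_grid(coarse_best, width=0.5)))  # up to 2 * 7**3 candidates
```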

31 Example III: Computing time (in seconds)
Locally optimal and multi-stage optimal designs, D- and A-optimality, for interest in $\theta$ and in $(\theta_{11}, \theta_{12}, \theta_{13}, \theta_{21}, \theta_{22}, \theta_{23})$, at various $N$ [table values omitted].
Here $\theta^T = (\theta_1^T, \theta_2^T)$ with $\theta_1 = (1, 1, 1, 2)$ and $\theta_2 = (-1, 2, 1, 1)$.

32 Outline
1 Introduction
2 Current algorithms
3 New algorithm
4 Examples: Example I, Example II, Example III
5 Summary

33 Summary
The new algorithm is even faster than the Cocktail algorithm.
More importantly, the new algorithm is more widely applicable: $\Phi_p$-optimality, single- and multi-stage designs, and interest in functions of the parameters.
Future work: Extend the algorithm to find Bayesian and (hopefully) minimax designs. Compare its performance with further algorithms, e.g. PSO.

34 Thank You!

35 Selected references
Böhning, D. (1986). A vertex-exchange-method in D-optimal design theory. Metrika 33.
Fedorov, V.V. (1972). Theory of Optimal Experiments (transl. and ed. by W.J. Studden and E.M. Klimko). New York: Academic Press.
Silvey, S.D., Titterington, D.M., and Torsney, B. (1978). An algorithm for optimal designs on a finite design space. Communications in Statistics - Theory and Methods, 14.
Yang, M. and Biedermann, S. (2012). On optimal designs for nonlinear models: a general and efficient algorithm. Under revision.
Yu, Y. (2011). D-optimal designs via a cocktail algorithm. Statistics and Computing 21.


37 VDM
Start with an initial design $\xi^{(0)}$ (a uniform design over a set of support points). Let $\xi^{(t)}$ be the design with corresponding weight vector $\omega^{(t)}$. Then $\xi^{(t+1)}$ is updated through the following steps:
- Select $1 \le i_{\max} \le N$ such that $d(i_{\max}, \xi^{(t)}) = \max_{1 \le i \le N} d(i, \xi^{(t)})$
- Stop when $d(i_{\max}, \xi^{(t)}) \le m + \epsilon$; otherwise set
$\omega_i^{(t+1)} = \begin{cases} (1-\delta)\,\omega_i^{(t)}, & i \ne i_{\max} \\ (1-\delta)\,\omega_i^{(t)} + \delta, & i = i_{\max} \end{cases}$
where $\delta = \dfrac{d(i_{\max}, \xi^{(t)})/m - 1}{d(i_{\max}, \xi^{(t)}) - 1}$.
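A sketch of one VDM update for the D-criterion in a linear model, where $d(i, \xi) = f(x_i)^T I_\xi^{-1} f(x_i)$; the quadratic-regression demo at the end is an illustrative assumption, not an example from the slides.

```python
import numpy as np

def vdm_step(F, w):
    """One vertex-direction-method update for D-optimality (a sketch).

    F is the N x m matrix of regression vectors f(x_i)^T; w holds the
    current weights. Uses the slide's step length
    delta = (d_max/m - 1) / (d_max - 1).
    """
    m = F.shape[1]
    M = F.T @ (w[:, None] * F)                             # information matrix
    d = np.einsum('ij,jk,ik->i', F, np.linalg.inv(M), F)   # d(i, xi) for all i
    i_max = int(np.argmax(d))
    delta = (d[i_max] / m - 1.0) / (d[i_max] - 1.0)
    w_new = (1.0 - delta) * w
    w_new[i_max] += delta
    return w_new, d[i_max]

# Tiny demo: quadratic regression on 5 equally spaced points in [-1, 1].
x = np.linspace(-1, 1, 5)
F = np.column_stack([np.ones_like(x), x, x**2])
w = np.full(5, 0.2)
for _ in range(200):
    w, d_max = vdm_step(F, w)
print(np.round(w, 3))  # weights approach 1/3 at each of {-1, 0, 1}
```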

38 MA
Start with an initial design $\xi^{(0)}$ (a uniform design over a set of support points). Let $\xi^{(t)}$ be the design with corresponding weight vector $\omega^{(t)}$. Then $\xi^{(t+1)}$ is updated as follows:
Stop when $d(i_{\max}, \xi^{(t)}) \le m + \epsilon$; otherwise set
$\omega_i^{(t+1)} = \omega_i^{(t)}\, d(i, \xi^{(t)})/m$,
where $m$ is the dimension of $I_\xi$.
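The corresponding multiplicative update is one line; a sketch reusing the `F`, `w` conventions of the VDM sketch above. The weights stay normalised because $\sum_i \omega_i\, d(i, \xi) = m$ for the D-criterion.

```python
import numpy as np

def ma_step(F, w):
    """One multiplicative-algorithm update for D-optimality (a sketch):
    each weight is rescaled by its sensitivity d(i, xi)/m."""
    m = F.shape[1]
    M = F.T @ (w[:, None] * F)
    d = np.einsum('ij,jk,ik->i', F, np.linalg.inv(M), F)
    return w * d / m
```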

39 VEM
VEM (vertex exchange method) is close to VDM but performs better.
- Select $i_{\min}$ and $i_{\max}$ such that $d(i_{\min}, \xi^{(t)}) = \min\{d(i, \xi^{(t)}):\ \omega_i^{(t)} > 0\}$ and $d(i_{\max}, \xi^{(t)}) = \max_{1 \le i \le N} d(i, \xi^{(t)})$
- Stop when $d(i_{\max}, \xi^{(t)}) \le m + \epsilon$; otherwise set $\omega^{(t+1)} = \mathrm{VE}(i_{\max}, i_{\min}, \omega^{(t)})$.

40 Vertex exchanges (VE)
$\mathrm{VE}(j, k, \xi^{(t)})$ exchanges weight between $x_j$ and $x_k$:
$\omega_i^{(t+1)} = \begin{cases} \omega_i^{(t)}, & i \ne j, k \\ \omega_i^{(t)} - \delta, & i = j \\ \omega_i^{(t)} + \delta, & i = k \end{cases}$
Here $\delta = \min\{\omega_j, \max\{-\omega_k, \delta^*(j, k)\}\}$ and
$\delta^*(j, k) = \dfrac{d(k, \xi^{(t)}) - d(j, \xi^{(t)})}{2\left( d(j, \xi^{(t)})\, d(k, \xi^{(t)}) - \left( f(x_j)^T I_{\xi^{(t)}}^{-1} f(x_k) \right)^2 \right)}$
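A direct transcription of this exchange into code, again for the D-criterion in a linear model with the `F`, `w` conventions above; the truncation of $\delta$ keeps both weights non-negative.

```python
import numpy as np

def vertex_exchange(F, w, j, k):
    """Vertex exchange VE(j, k) between design points x_j and x_k
    (a sketch of the slide's formula)."""
    M = F.T @ (w[:, None] * F)
    Minv = np.linalg.inv(M)
    dj = F[j] @ Minv @ F[j]
    dk = F[k] @ Minv @ F[k]
    cross = F[j] @ Minv @ F[k]
    delta_star = (dk - dj) / (2.0 * (dj * dk - cross**2))
    delta = min(w[j], max(-w[k], delta_star))  # keep weights non-negative
    w_new = w.copy()
    w_new[j] -= delta
    w_new[k] += delta
    return w_new
```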

41 Nearest neighbour exchanges (NNE)
Let $i_1, \dots, i_{q+1}$ be the elements of $\{i:\ \omega_i^{(t)} > 0\}$, where $q+1$ is the number of support points of $\omega^{(t)}$. For each $i_j$, $j = 1, \dots, q$, let $i_j^*$ be any index $i \in \{i_{j+1}, \dots, i_{q+1}\}$ such that $\|x_i - x_{i_j}\|$ is minimised. Perform VE between $x_{i_j}$ and $x_{i_j^*}$ for $j = 1, \dots, q$ in turn, i.e.
$\omega^{(t+j/q)} = \mathrm{VE}(i_j, i_j^*, \xi^{(t+(j-1)/q)})$.
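A sketch of one NNE pass, reusing `vertex_exchange()` from the previous sketch; `X` is assumed to hold array-like design points so distances can be computed.

```python
import numpy as np

def nne_pass(X, F, w):
    """One nearest-neighbour-exchange pass (a sketch): pair each support
    point with its nearest later support point and run a vertex exchange."""
    support = [i for i in range(len(w)) if w[i] > 0]
    for j, ij in enumerate(support[:-1]):
        later = support[j + 1:]
        dists = [np.linalg.norm(np.asarray(X[i]) - np.asarray(X[ij]))
                 for i in later]
        i_star = later[int(np.argmin(dists))]   # nearest later support point
        w = vertex_exchange(F, w, ij, i_star)
    return w
```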


43 Computing optimal weights
An initial weight vector $\omega_0^{(t)}$ for $S^{(t)}$ can be updated through the following iteration:
(a) $\omega_j^{(t)} = \omega_{j-1}^{(t)} - \alpha \left( \dfrac{\partial^2 \Phi_p(\Sigma_{\xi_0+\xi}(g))}{\partial \omega\, \partial \omega^T} \right)^{-1}_{\omega = \omega_{j-1}^{(t)}} \left( \dfrac{\partial \Phi_p(\Sigma_{\xi_0+\xi}(g))}{\partial \omega} \right)_{\omega = \omega_{j-1}^{(t)}}$
(b) Check whether $\omega_j^{(t)}$ has any non-positive components.
(c) If the answer in (b) is no, check whether $\partial \Phi_p(\Sigma_{\xi_0+\xi}(g))/\partial \omega$ at $\omega = \omega_j^{(t)}$ is less than some pre-specified $\epsilon_1$. If yes, $\omega_j^{(t)}$ holds the optimal weights; if no, start the next iteration.
(d) If the answer in (b) is yes, reduce $\alpha$ to $\alpha/2$ and repeat (a) and (b) until $\alpha$ reaches a pre-specified value. Then remove the support point with the smallest weight and repeat (a) and (b).
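A sketch of this damped Newton iteration; `grad(w)` and `hess(w)` are assumed callables for the derivatives of $\Phi_p$ with respect to the free weights, and the second part of step (d), dropping the smallest-weight support point, is signalled to the caller rather than performed in place.

```python
import numpy as np

def newton_weights(grad, hess, w0, alpha=1.0, eps1=1e-8,
                   alpha_min=1e-6, max_iter=100):
    """Damped Newton iteration for the optimal weights (a sketch of
    steps (a)-(d)). Returns (weights, converged)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(w), grad(w))   # Newton direction, step (a)
        a = alpha
        w_new = w - a * step
        while (w_new <= 0).any() and a > alpha_min:  # steps (b) and (d): halve alpha
            a /= 2.0
            w_new = w - a * step
        if (w_new <= 0).any():
            # alpha exhausted: caller should remove the support point with
            # the smallest weight and restart (second part of step (d))
            return w, False
        w = w_new
        if np.linalg.norm(grad(w)) < eps1:           # step (c): converged
            return w, True
    return w, False
```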


45 Sensitivity function:
$d(x_i, \xi_{S^{(t)}}) = \operatorname{Tr}\!\left[ \left( \Sigma_{\xi_{S^{(t)}}}(F) \right)^{p-1} \frac{\partial F(\theta)}{\partial \theta^T} \left( I_{\xi_0} + r I_{\xi_{S^{(t)}}} \right)^{-1} \left( I(x_i) - I_{\xi_{S^{(t)}}} \right) \left( I_{\xi_0} + r I_{\xi_{S^{(t)}}} \right)^{-1} \left( \frac{\partial F(\theta)}{\partial \theta^T} \right)^T \right]$
For D-optimality, $p = 0$.
