Optimal Designs for 2^k Factorial Experiments with Binary Response


Jie Yang^1, Abhyuday Mandal^2 and Dibyen Majumdar^1

^1 University of Illinois at Chicago and ^2 University of Georgia

arXiv: v4 [math.ST] 29 Mar 2013

April 2, 2013

Abstract: We consider the problem of obtaining locally D-optimal designs for factorial experiments with binary response and k qualitative factors at two levels each. We obtain a characterization for a design to be locally D-optimal. Based on this characterization, we develop efficient numerical techniques to search for locally D-optimal designs. We also investigate the properties of fractional factorial designs and study the robustness of locally D-optimal designs with respect to the assumed parameter values. Using prior distributions on the parameters, we investigate EW D-optimal designs, that is, designs which maximize the determinant of the expected information matrix. It turns out that these designs are much easier to find than Bayes D-optimal designs while remaining highly efficient, and they are quite robust as well.

Key words and phrases: Generalized linear model, full factorial design, fractional factorial design, D-optimality, uniform design, EW D-optimal design

1 Introduction

Our goal is to determine optimal and efficient designs for factorial experiments with qualitative factors and a binary response. If the response is continuous and a linear model is appropriate, the rich body of literature on traditional factorial designs may be used. On the other hand, if the factors are quantitative, there is a wide-ranging, growing literature on optimal design theory that would provide solutions. For the specific experiments we study, however, there is not a wealth of literature available on the determination and properties of optimal and efficient designs. A motivating example is the odor removal study conducted by the textile engineers at the University of Georgia. The scientists study the manufacture of bio-plastics

from algae that contain odorous volatiles. These odorous volatiles, generated from algae bio-plastics, either occur naturally within the algae or are generated through the thermoplastic processing due to heat and pressure. In order to commercialize these algae bio-plastics, the odor-causing volatiles must be removed. Static headspace microextraction and gas chromatography-mass spectroscopy were used to identify the odorous compounds and qualitatively compare the samples for odor removal. For that purpose, a study was conducted with a resolution IV design, with five replicates, using algae and synthetic plastic resin blends. The four factors were: type of algae, scavenger materials (adsorbents), synthetic resin and compatabilizers.

Factor                    Level (-)                               Level (+)
A  algae                  raffinated or solvent extracted algae   catfish pond algae
B  scavenger materials    Aqua Tech activated carbon              BYK-P 4200, purchased from BYK Additives & Instruments
C  synthetic resin        polyethylene                            polypropylene
D  compatabilizers        absent                                  present

The samples were hand blended together and combined so that the total weight was 3.0 g. Compression molding of the algae blend formulations was performed on a 24-ton bench-top press (Carver Model 3850, Washburn, IN). The samples were poured into stainless steel molds and compressed between two electrically heated platens for 20 minutes at 150 C. They were then water-cooled while still under pressure for another 10 minutes. The samples were then removed and were available for odor testing. The test samples were placed in separate desiccators, stored at 37 C and 55% relative humidity for 24 hours, and assessed by a panel. The response was binary: the samples were determined to be either non-detectable or unbearable. The design and the observations are given in Table 4. There are many scientific and industrial experiments with qualitative factors and a binary response. Nair et al.
(2008) reported a fractional factorial experiment used in the study called Project Quit, a Center for Health Communications Research project on smoking cessation. They studied six qualitative factors, each at two levels, and the primary outcome measure was abstinence over a 7-day period assessed by the question, "Did you smoke a tobacco cigarette in the past 7 days?" The design is provided in Example

Factor                           Level (-)                                             Level (+)
A  type of exposure              a single, large set of materials                      multiple correspondences over several weeks
B  outcome expectation depth     general feedback related to motives for quitting      individualized feedback and advice for quitting
C  success story depth           read a story tailored to their gender                 read a story tailored to their specific sociodemographic backgrounds
D  efficacy expectation          received advice on a broader range of barriers        received feedback and advice on the most significant barriers to quitting
E  source personalization level  the low-depth group version included only a photograph from the study HMO's individual smoking cessation team     received personalized materials, including a photograph of and supportive text from the study HMO's individual smoking cessation team
F  type of framing               loss framing (negative effects of continued smoking)  gain framing (positive aspects of quitting)

There is also a class of experiments that includes both qualitative and quantitative factors. For instance, Seo, Goldschmidt-Clermont and West (2007) discussed an experiment where the response is whether or not a particular symptom developed in mice. This study investigated three key environmental factors: (A) age (young, 6 weeks, and old, 12 weeks), (B) gender (male, female) and (C) dietary fat intake (chow diet and Western diet), along with (D) a key genetic factor related to the Apolipoprotein E gene pathway (wild type and ApoE-/-). Other examples can be found in Miller and Sitter (2001), Grimshaw et al. (2001), Hamada and Nelder (1997), Vasandani and Govindaraj (1995), and Street and Burgess (2007). Determination of optimal and efficient designs for experiments with both qualitative and quantitative factors is part of our ongoing research project.
Our preliminary results indicate that as long as the assumed parameter values are not far from zero, it is sufficient to consider optimal designs for continuous quantitative factors that are supported on the boundary points, and hence the results of this paper will be of direct relevance. A special case is given in Result 7.1 in the Appendix. This is a complete-class result that effectively tells us that the theory for optimal and efficient designs with qualitative factors may be applied to these experiments. We also refer the reader to Yang, Zhang and Huang (2011) for related results. The special case of 2^2 experiments with qualitative factors and a binary response was studied by Yang, Mandal and Majumdar (2012). As mentioned there, the paper by Graßhoff and Schwabe (2008) has some relevant results for the k = 2 factor case. It turns out that the extension from k = 2 to the general case of k factors is substantial, since the former does not have the complexities associated with the determination, computation and robustness of optimal designs that are present in the general case. This

paper, therefore, is not a mere generalization of our earlier work. For the general case of 2^k experiments with binary response, Dorta-Guerra, González-Dávila and Ginebra (2008) obtained an expression for the D-criterion and studied several special cases. We will use a Generalized Linear Model (GLM) to describe our setup. The methods for analyzing data from GLMs have been discussed in depth in the literature (McCullagh and Nelder (1989), Agresti (2002), Lindsey (1997), McCulloch and Searle (2001), Dobson (2002) and Myers, Montgomery and Vining (2002)). Khuri, Mukherjee, Sinha and Ghosh (2006) provided a systematic study of the optimal design problem in the GLM setup, and recently there has been an upsurge in research in both the theory and the computation of optimal designs. Russell et al. (2009), Yang and Stufken (2009), Yang et al. (2011), and Stufken and Yang (2012) are some of the papers that developed theory, while Woods et al. (2006), Dror and Steinberg (2006, 2008), Waterhouse et al. (2008), and Woods and van de Ven (2011) focused on developing efficient numerical techniques for obtaining optimal designs under generalized linear models. We obtain theoretical results and algorithms for locally optimal designs for qualitative factors at two levels each and a binary response in the generalized linear model setup. Optimal designs are obtained using the D-optimality criterion, which maximizes the determinant of the information matrix. In order to overcome the difficulty posed by the dependence of the design optimality criterion on the unknown parameters, we primarily use the local optimality approach of Chernoff (1953), in which the parameters are replaced by assumed values. Although we explore designs for full factorials, i.e., ones in which observations are taken at every possible level combination, when the number of factors is large, full factorials are practically infeasible.
Hence the study of fractional factorial designs occupies a substantial part of the linear-model based design literature, and we too study these designs in our generalized linear model setup. A natural question that arises when we use local optimality is whether the optimal designs are robust. The study of the robustness of locally D-optimal designs with respect to the assumed parameter values, therefore, is a focus of this study. Bayes optimality (Chaloner and Verdinelli, 1995) is an alternative approach to design optimality. For our problem, however, for large k (k ≥ 4) the computations quickly become too expensive and intractable. Hence, as a surrogate criterion, we explore a D-optimality criterion with the information matrix replaced by its expectation under the prior. This is one of the suggested alternatives to formal Bayes optimality in Atkinson, Donev and Tobias (2007). It has been used by Zayats and Steinberg (2010) for optimal designs for detection capability of networks. We call this EW optimality (E for expectation, W for the notation w_i used for the information in an individual observation). Effectively this reduces to a locally optimal design with the local values of the weight parameters replaced by their expectations. It turns out that the EW optimal designs have strong efficiency and robustness characteristics as local designs, and in addition, they are very good and easy-to-compute surrogates for Bayes optimal designs. Unless the experimenter has reliable assumed parameter values, we suggest

the use of EW optimal designs. Note that the use of surrogates of Bayes optimality has been recommended by authors including Woods et al. (2006) and Gotwalt et al. (2009). Beyond theoretical results, the question that may be asked is whether these results give the user any advantage in real experiments. Would the experimenter gain much by using these results instead of designs that are optimal for the linear model, for instance, uniform complete designs or regular fractions? After all, for 2^2 experiments, Yang et al. (2012) concluded that the uniform design (i.e., full factorial) had many desirable properties. In the more complex setup of 2^k experiments, it turns out that the answer to the question posed above is a definite yes. In most situations we gain considerably by taking advantage of the results of this paper instead of using standard linear-model results. Unlike the linear model case, not all regular fractions are equivalent. Indeed, if we have some knowledge of the parameters, we will be able to identify an efficient fractional factorial design, which will often not be a regular fraction. This paper is organized as follows. In Section 2 we describe the preliminary setup. In Section 3 we provide several results for locally D-optimal designs, including the uniqueness of the D-optimal designs, a characterization for a design to be locally D-optimal, the concept of EW D-optimal designs, and algorithms for searching for D-optimal designs. In Section 4 we discuss the properties of fractional factorial designs. We address the robustness of D-optimal designs in Section 5 and revisit some examples in Section 6. Some concluding remarks and topics of future research are discussed in Section 7. Additional results, proofs and some details on the algorithms are relegated to the Appendix and the Supplementary Materials of this paper.

2 Preliminary Setup

Consider a 2^k experiment with binary response, i.e., an experiment with k explanatory variables at 2 levels each.
Suppose n_i units are allocated to the ith experimental condition such that n_i ≥ 0, i = 1,...,2^k, and n_1 + ... + n_{2^k} = n. We suppose that n is fixed and the problem is to determine the optimal n_i's. In fact, we write our optimality criterion in terms of the proportions p_i = n_i/n, i = 1,...,2^k, and determine the optimal p_i's over real values. Since the n_i's are integers, an optimal design obtained in this fashion may not always be feasible. In Section 3.3 we will consider the design problem over integer n_i's. We will use a generalized linear model setup to describe the model. Suppose η is a linear predictor that involves the main effects and interactions which are assumed to be in the model. For instance, for a 2^3 experiment with a model that includes the

main effects and the two-factor interaction of factors 1 and 2, η = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_12 x_1 x_2, where each x_i ∈ {-1, 1}. The aim of the experiment is to obtain inferences about the parameter vector of factor effects β = (β_0, β_1, β_2, β_3, β_12)'. In the framework of generalized linear models, the expectation of the response Y, E(Y) = π, is connected to the linear predictor η by the link function g: η = g(π) (McCullagh and Nelder (1989)). For a binary response, the commonly used link functions are the logit, probit, log-log, and complementary log-log links. The maximum likelihood estimator of β has an asymptotic covariance matrix (McCullagh and Nelder, 1989; Khuri, Mukherjee, Sinha and Ghosh, 2006) that is the inverse of nX'WX, where W = diag{w_1 p_1, ..., w_{2^k} p_{2^k}}, w_i = (dπ_i/dη_i)^2 / (π_i(1 - π_i)) ≥ 0, η_i and π_i correspond to the ith experimental condition for η and π, and X is the design matrix. For example, for a 2^3 experiment with model η = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_12 x_1 x_2,

        [ 1   1   1   1   1 ]
        [ 1   1   1  -1   1 ]
        [ 1   1  -1   1  -1 ]
    X = [ 1   1  -1  -1  -1 ]                                         (1)
        [ 1  -1   1   1  -1 ]
        [ 1  -1   1  -1  -1 ]
        [ 1  -1  -1   1   1 ]
        [ 1  -1  -1  -1   1 ]

The n_i's determine how many observations are made at each experimental condition, which are the rows of X. A D-optimal design maximizing |X'WX| depends on the w_i's, which in turn depend on the regression parameters β and the link function g. In this paper, we discuss D-optimal designs in terms of the w_i's so that our results do not depend on specific link functions.

3 Locally D-Optimal Designs

In this section, we consider locally D-optimal designs for the full factorial 2^k experiment. The goal is to find an optimal p = (p_1, p_2, ..., p_{2^k})' which maximizes f(p) := |X'WX| for specified values of w_i ≥ 0, i = 1,...,2^k. Here p_i ≥ 0, i = 1,...,2^k, and Σ_{i=1}^{2^k} p_i = 1. It is easy to see that there always exists a D-optimal allocation p since the set of all feasible allocations is bounded and closed. On the other hand, the uniqueness of D-optimal designs is usually not guaranteed (Section 7).
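To make the setup concrete, the sketch below (our illustration, not the authors' code; all function names are ours) builds the design matrix (1) and evaluates the w_i's and the objective |X'WX| under the logit link, for which w_i reduces to π_i(1 - π_i).

```python
import numpy as np

# Design matrix (1): columns (1, x1, x2, x3, x1*x2), x3 varying fastest.
rows = [(x1, x2, x3) for x1 in (1, -1) for x2 in (1, -1) for x3 in (1, -1)]
X = np.array([[1, x1, x2, x3, x1 * x2] for (x1, x2, x3) in rows])

def weights_logit(beta):
    """w_i = (dpi/deta)^2 / (pi(1 - pi)); for the logit link this is pi(1 - pi)."""
    eta = X @ beta
    pi = 1.0 / (1.0 + np.exp(-eta))
    return pi * (1.0 - pi)

def f(p, w):
    """Objective f(p) = |X' W X| with W = diag(w_1 p_1, ..., w_8 p_8)."""
    return np.linalg.det(X.T @ np.diag(w * p) @ X)

beta = np.zeros(5)            # all coefficients zero, so every w_i = 1/4
p_uniform = np.full(8, 1.0 / 8)
value = f(p_uniform, weights_logit(beta))
```

With β = 0 the columns of X are orthogonal and X'WX = 0.25 I_5, so the value above is 0.25^5.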

3.1 Characterization of locally D-optimal designs

Suppose the parameters (main effects and interactions) are β = (β_0, β_1, ..., β_d)', where k ≤ d ≤ 2^k - 1. The following lemma expresses our objective function as an order-(d+1) homogeneous polynomial in p_1, ..., p_{2^k}.

Lemma 3.1 Let X[i_1, i_2, ..., i_{d+1}] be the (d+1) × (d+1) sub-matrix consisting of the i_1th, i_2th, ..., i_{d+1}th rows of the design matrix X. Then

    f(p) = |X'WX| = Σ_{1 ≤ i_1 < i_2 < ... < i_{d+1} ≤ 2^k} |X[i_1, i_2, ..., i_{d+1}]|^2 p_{i_1} w_{i_1} p_{i_2} w_{i_2} ... p_{i_{d+1}} w_{i_{d+1}}.

González-Dávila, Dorta-Guerra and Ginebra (2007, Proposition 2.1) obtained essentially the same result. It can also be proved directly using results from Rao (1973, Chapter 1). From Lemma 3.1 it is immediate that at least (d+1) of the w_i's, as well as the corresponding p_i's, have to be positive for the determinant f(p) to be nonzero. This implies that if p is D-optimal, then p_i < 1 for each i. Theorem 3.1 below gives a sharper bound, p_i ≤ 1/(d+1) for each i = 1,...,2^k, for the optimal allocation. Let us define, for each i = 1,...,2^k,

    f_i(z) = f( (1-z)/(1-p_i) p_1, ..., (1-z)/(1-p_i) p_{i-1}, z, (1-z)/(1-p_i) p_{i+1}, ..., (1-z)/(1-p_i) p_{2^k} ),  0 ≤ z ≤ 1.    (2)

Note that f_i(z) is well defined for all p of interest.

Theorem 3.1 Suppose f(p) > 0. Then p is D-optimal if and only if, for each i = 1,...,2^k, one of the two conditions below is satisfied:
(i) p_i = 0 and f_i(1/2) ≤ ((d+2)/2^{d+1}) f(p);
(ii) 0 < p_i ≤ 1/(d+1) and f_i(0) = ((1 - (d+1)p_i)/(1-p_i)^{d+1}) f(p).

Designs that are supported on (d+1) points are attractive in many experiments because they require a minimum number of settings. In our context, a design p = (p_1,...,p_{2^k}) is called minimally supported if it contains exactly (d+1) nonzero p_i's. Based on Lemma 3.1, a minimally supported design p with p_{i_1} > 0, ..., p_{i_{d+1}} > 0 is D-optimal if and only if p_{i_1} = ... = p_{i_{d+1}} = 1/(d+1). Yang et al. (2012) found a necessary and sufficient condition for a minimally supported design to be D-optimal for the 2^2 main-effects model.
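Lemma 3.1 is a Cauchy-Binet-type expansion and is easy to check numerically. The sketch below (our illustration; the allocation and weights are arbitrary) compares both sides for the 2^2 main-effects model, where d + 1 = 3.

```python
import numpy as np
from itertools import combinations

# 2^2 main-effects model: X has columns (1, x1, x2).
X = np.array([[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1]])

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(4))          # a random feasible allocation
w = rng.uniform(0.05, 0.25, size=4)    # arbitrary positive weights

direct = np.linalg.det(X.T @ np.diag(w * p) @ X)

# Right-hand side of Lemma 3.1: sum over all (d+1)-subsets of rows.
expansion = sum(
    np.linalg.det(X[list(s)]) ** 2 * np.prod([p[i] * w[i] for i in s])
    for s in combinations(range(4), 3)
)
```

The two quantities agree up to floating-point rounding.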
With the aid of Theorem 3.1, we provide a generalization to 2^k designs in the next theorem. Note that w_i > 0 for each i for the commonly used link functions, including the logit, probit, and (complementary) log-log.

Theorem 3.2 Assume w_i > 0, i = 1,...,2^k. Let I = {i_1,...,i_{d+1}} ⊆ {1,...,2^k} be an index set satisfying |X[i_1,...,i_{d+1}]| ≠ 0. Then the minimally supported design satisfying p_{i_1} = p_{i_2} = ... = p_{i_{d+1}} = 1/(d+1) is D-optimal if and only if, for each i ∉ I,

    Σ_{j ∈ I} |X[{i} ∪ I\{j}]|^2 / w_j ≤ |X[i_1, i_2, ..., i_{d+1}]|^2 / w_i.

For example, under the 2^2 main-effects model, since |X[i_1,i_2,i_3]|^2 is constant across all choices of i_1, i_2, i_3, the design p_1 = p_2 = p_3 = 1/3 is D-optimal if and only if v_1 + v_2 + v_3 ≤ v_4, where v_i = 1/w_i, i = 1,2,3,4. This gives us Theorem 1 of Yang, Mandal and Majumdar (2012). For the 2^3 main-effects model, using the order of rows as in (1), the standard regular fractional factorial design p_1 = p_4 = p_6 = p_7 = 1/4 is D-optimal if and only if v_1 + v_4 + v_6 + v_7 ≤ 4 min{v_2, v_3, v_5, v_8}, and the other standard regular fraction p_2 = p_3 = p_5 = p_8 = 1/4 is D-optimal if and only if v_2 + v_3 + v_5 + v_8 ≤ 4 min{v_1, v_4, v_6, v_7}.

Remark 1 In order to characterize the uniqueness of the optimal allocation, we define a matrix X_w = [w ∘ 1, w ∘ γ_2, ..., w ∘ γ_s], where 1, γ_2, ..., γ_s are all the distinct pairwise Schur products (entrywise products) of the columns of the design matrix X, w = (w_1,...,w_{2^k})', and ∘ indicates the Schur product. It can be seen that any two feasible allocations generate the same matrix X'WX as long as their difference belongs to the null space of X_w'. If rank(X_w) < 2^k, any criterion based on X'WX yields not just a single solution but an affine set of solutions with dimension 2^k - rank(X_w). If rank(X_w) = 2^k, the D-optimal allocation p is unique. For example, for a 2^3 design with the model consisting of all main effects and one two-factor interaction, or for a 2^4 design with the model consisting of all main effects, all two-factor interactions, and one three-factor interaction, the D-optimal allocations are unique.

3.2 EW D-optimal designs

Since locally D-optimal designs depend on the w_i's, they require assumed values of the w_i's, or the β_i's, as input.
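The 2^2 specialization above can be probed numerically. In the sketch below (ours; the weight values are arbitrary and chosen so that v_1 + v_2 + v_3 ≤ v_4 holds), the minimally supported design on {1, 2, 3} beats the uniform design, and a random search over the simplex does not beat it either.

```python
import numpy as np

X = np.array([[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1]])

def f(p, w):
    return np.linalg.det(X.T @ np.diag(w * p) @ X)

# v = (1, 1, 1, 10): point 4 is the least informative and v1+v2+v3 = 3 <= v4 = 10,
# so Theorem 3.2 says the minimally supported design on {1,2,3} is D-optimal.
w = np.array([1.0, 1.0, 1.0, 0.1])
p_min = np.array([1/3, 1/3, 1/3, 0.0])
p_unif = np.full(4, 0.25)

rng = np.random.default_rng(1)
best_random = max(f(rng.dirichlet(np.ones(4)), w) for _ in range(20000))
```

By Lemma 3.1, f(p_min, w) = |X[1,2,3]|^2 (1/3)^3 w_1 w_2 w_3 = 16/27 here.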
In Section 5, we will examine the robustness of D-optimal designs to mis-specification of the β_i's. An alternative to local optimality is Bayes optimality (Chaloner and Verdinelli, 1995). In our setup, a Bayes D-optimal design maximizes E(log |X'WX|), where the expectation is taken over the prior on the β_i's. One difficulty with Bayes optimality is that it is computationally expensive. In order to reduce the computational time, we will explore an alternative suggested by Atkinson, Donev and Tobias (2007), in which W in the Bayes criterion is replaced by its expectation. We will call this EW (expectation of W) optimality.

Definition: An EW D-optimal design is an allocation p that maximizes |X'E(W)X|.

Note that EW optimality may be viewed as local optimality with the w_i's replaced by their expectations. All of the existence and uniqueness properties of locally D-optimal designs apply. Since w_i > 0 for all β under typical link functions, E(w_i) > 0 for each i. Note that, by Jensen's inequality, E(log |X'WX|) ≤ log |X'E(W)X|, since log |X'WX| is concave in W. Thus an EW D-optimal design maximizes an upper bound to the Bayesian D-optimality criterion. We will show that EW-optimal designs are often almost as efficient as Bayes-optimal ones with respect to the Bayes optimality criterion, while realizing considerable savings in computation time. Furthermore, EW-optimal designs are highly robust in terms of maximum relative efficiencies (Section 5). Note that given the link function g, w_i = ν(η_i) = ν(x_i'β), i = 1,...,2^k, where ν = ((g^{-1})')^2 / [g^{-1}(1 - g^{-1})], x_i' is the ith row of the design matrix, and β = (β_0, β_1, ..., β_d)'. Suppose the regression coefficients β_0, β_1, ..., β_d are independent and β_1, ..., β_d all have a symmetric distribution about 0 (not necessarily the same distribution); then all the w_i, i = 1,2,...,2^k, have the same distribution and the uniform design is an EW D-optimal design for any given link function. (By uniform design we mean a design with uniform allocation on its support points. For example, for a design that is supported on all 2^k points, the uniform design will have p_1 = ... = p_{2^k} = 2^{-k}.) On the other hand, in many experiments we may be able to assume that the slopes are non-negative (if β_i ≤ 0, then we can assume β_i ≥ 0 by re-defining x_i). If β_i ∈ [0, β_iu] for each i, the uniform design will not be EW D-optimal in general, as illustrated in the following examples.

Example 3.1 Consider a 2^2 experiment with the main-effects model. Suppose the experimenter has the following prior information for the parameters: β_0, β_1, β_2 are independent, β_0 ~ U[-1, 1], and β_1, β_2 ~ U[0, 1].
Under the logit link, E(w_1) = E(w_4) = 0.187, E(w_2) = E(w_3) = 0.224, and the EW D-optimal design is p_e = (0.239, 0.261, 0.261, 0.239). The Bayes D-optimal design, which maximizes φ(p) = E(log |X'WX|), is p_o = (0.235, 0.265, 0.265, 0.235). The relative efficiency of p_e with respect to p_o is

    exp{ (φ(p_e) - φ(p_o)) / (d+1) } × 100% = 99.99%.

The computational time for the EW-optimal design is 0.11 sec, while it is 5.45 secs for the Bayes-optimal design. It should also be noted that the relative efficiency of the uniform design p_u = (1/4, 1/4, 1/4, 1/4) with respect to p_o is 99.88% for the logit link, and is 89.6% for the complementary log-log link (the corresponding relative efficiency of the EW design is given in Section 7).
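The expectations E(w_i) in Example 3.1 can be reproduced by straightforward Monte Carlo integration. The sketch below is our illustration, assuming the logit link, the priors of the example, and the row order (+,+), (+,-), (-,+), (-,-).

```python
import numpy as np

# 2^2 main-effects model, logit link.
X = np.array([[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1]])

rng = np.random.default_rng(0)
n = 200_000
beta = np.column_stack([
    rng.uniform(-1, 1, n),   # beta_0 ~ U[-1, 1]
    rng.uniform(0, 1, n),    # beta_1 ~ U[0, 1]
    rng.uniform(0, 1, n),    # beta_2 ~ U[0, 1]
])
eta = beta @ X.T                      # n x 4 matrix of linear predictors
pi = 1.0 / (1.0 + np.exp(-eta))
Ew = (pi * (1.0 - pi)).mean(axis=0)   # Monte Carlo estimate of E(w_i)
```

The estimate should be close to (0.187, 0.224, 0.224, 0.187); E(w_1) = E(w_4) and E(w_2) = E(w_3) hold exactly because β_0 is symmetric about 0.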

Example 3.2 Consider a 2^3 experiment with the main-effects model. Suppose β_0, β_1, β_2, β_3 are independent, β_0 ~ U[-3, 3], and β_1, β_2, β_3 ~ U[0, 3]. Then E(w_1) = E(w_8) = 0.042, and E(w_2) = E(w_3) = ... = E(w_7) share a common, larger value. Under the logit link the EW D-optimal design is given by p_e = (0, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6, 0), and the Bayesian D-optimal design is p_o = (0.004, 0.165, 0.166, 0.165, 0.165, 0.166, 0.165, 0.004). The relative efficiency of p_e with respect to p_o is 99.98%, while the relative efficiency of the uniform design is 94.39%. It is remarkable that it takes only 2.39 seconds to find an EW solution, while finding a Bayes solution takes far longer; the difference in computational time is even more prominent in the 2^4 case (24 seconds versus 3147 seconds).

3.3 Algorithms to search for the locally D-optimal allocation

In this section, we develop efficient algorithms to search for locally D-optimal allocations with given w_i's. The same algorithms can be used for finding EW D-optimal designs.

3.3.1 Lift-one algorithm for maximizing f(p) = |X'WX|

Here we propose the lift-one algorithm for searching for a locally D-optimal p = (p_1, ..., p_{2^k}) with given w_i's. The basic idea is that, for a randomly chosen i ∈ {1,...,2^k}, we update p_i to p_i* and all the other p_j's to p_j* = p_j (1 - p_i*)/(1 - p_i). This technique is motivated by the coordinate descent algorithm (Zangwill, 1969). It is also similar in spirit to the idea of one-point correction in the literature (Wynn, 1970; Fedorov, 1972; Müller, 2007), where design points are added or adjusted one by one. The major advantage of the lift-one algorithm is that, in order to determine an optimal p_i*, we need to calculate |X'WX| only once due to Lemma 3.1 (see Step 3 of the algorithm below).

Lift-one algorithm:
1. Start with an arbitrary p_0 = (p_1,...,p_{2^k}) satisfying 0 < p_i < 1, i = 1,...,2^k, and compute f(p_0).
2. Set up a random order of i going through {1, 2, ..., 2^k}.
3. For each i, determine f_i(z) as in (A.8).
In this step, either f_i(0) or f_i(1/2) needs to be calculated according to equation (2).
4. Define p_*^{(i)} = ((1-z*)/(1-p_i) p_1, ..., (1-z*)/(1-p_i) p_{i-1}, z*, (1-z*)/(1-p_i) p_{i+1}, ..., (1-z*)/(1-p_i) p_{2^k}), where z* maximizes f_i(z) with 0 ≤ z ≤ 1 (see Lemma 7.2). Note that f(p_*^{(i)}) = f_i(z*).
5. Replace p_0 with p_*^{(i)}, and f(p_0) with f(p_*^{(i)}).
6. Repeat steps 2-5 until convergence, that is, f(p_0) = f(p_*^{(i)}) for each i.
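A minimal implementation of the steps above might look as follows. This is our sketch, not the authors' code: by Lemma 3.1, f_i(z) = a z(1-z)^d + b(1-z)^{d+1}, so the two evaluations f_i(0) and f_i(1/2) recover (a, b), and the maximizing z* we use below is obtained by differentiating that polynomial (it is meant to mirror the closed form of Lemma 7.2, which we do not have in front of us).

```python
import numpy as np

def lift_one(X, w, tol=1e-10, max_rounds=1000, seed=0):
    """Sketch of the lift-one algorithm for maximizing f(p) = |X' diag(w*p) X|."""
    rng = np.random.default_rng(seed)
    m, d1 = X.shape                       # d1 = d + 1 parameters
    d = d1 - 1
    f = lambda q: np.linalg.det(X.T @ np.diag(w * q) @ X)
    p = np.full(m, 1.0 / m)               # step 1: start from a positive allocation
    for _ in range(max_rounds):
        improved = False
        for i in rng.permutation(m):      # step 2: random order of coordinates
            q0 = p / (1.0 - p[i]); q0[i] = 0.0
            b = f(q0)                     # b = f_i(0)
            qh = p * (0.5 / (1.0 - p[i])); qh[i] = 0.5
            a = 2.0**d1 * f(qh) - b       # since f_i(1/2) = (a + b) / 2^(d+1)
            if a <= (d + 1) * b:          # boundary maximum of f_i at z = 0
                z = 0.0
            else:                         # interior maximum, z in (0, 1/(d+1)]
                z = (a - (d + 1) * b) / ((d + 1) * (a - b))
            fz = a * z * (1 - z)**d + b * (1 - z)**d1
            if fz > f(p) + tol:           # steps 4-5: lift p_i to z, rescale rest
                p = p * ((1 - z) / (1 - p[i])); p[i] = z
                improved = True
        if not improved:                  # step 6: convergence
            break
    return p

# 2^2 main-effects example with w = (1, 1, 1, 0.1): since v4 = 10 >= v1+v2+v3 = 3,
# the theory gives the minimally supported optimum p = (1/3, 1/3, 1/3, 0).
X2 = np.array([[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1]])
p_opt = lift_one(X2, np.array([1.0, 1.0, 1.0, 0.1]))
```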

While in all the examples that we studied the lift-one algorithm converges very fast, we do not have a proof of convergence. There is a modified lift-one algorithm, which is only slightly slower, that can be shown to converge. This algorithm can be described as follows. For the 10mth iteration, m = 1,2,..., and a fixed order of i = 1,...,2^k, we repeat steps 3-5; if p_*^{(i)} is a better allocation than the allocation p_0, instead of updating p_0 to p_*^{(i)} immediately, we obtain p_*^{(i)} for each i and replace p_0 with the first best one among {p_*^{(i)}, i = 1,...,2^k}. For iterations other than the 10mth, we follow the original lift-one algorithm update.

Theorem 3.3 When the lift-one algorithm or the modified lift-one algorithm converges, the resulting allocation p maximizes |X'WX| on the set of feasible allocations. Furthermore, the modified lift-one algorithm is guaranteed to converge.

Our simulation studies indicate that as k grows, the optimal designs produced by the lift-one algorithm for main-effects models are supported on only a fraction of all the 2^k design points. To illustrate this, we randomly generated the regression coefficients i.i.d. from U(-3, 3) and applied our algorithm to find the optimal designs under the logit link. Figure 1 gives histograms of the number of support points in the optimal designs. For example, with k = 2, 76% of the designs are supported on three points only and 24% of them are supported on all four points. As k becomes larger, the number of support points moves towards a smaller fraction. On the other hand, a narrower range of coefficients requires a larger portion of support points. For example, the mean numbers of support points with β_i's i.i.d. from U(-3, 3) are 3.2, 5.1, 8.0, 12.4, 18.7, 28.2 for k = 2, 3, 4, 5, 6, 7, respectively. The corresponding numbers increase to 4.0, 7.1, 11.9, 19.1, 30.6, 47.7 for U(-1, 1), and further to 4.0, 7.6, 14.1, 24.7, 41.2, 66.8 for U(-0.5, 0.5).
As demonstrated in Table 1, the lift-one algorithm is much faster than commonly used optimization techniques, including Nelder-Mead, quasi-Newton, conjugate gradient, and simulated annealing (for a comprehensive reference, see Nocedal and Wright (1999)). Our algorithm also finds the optimal solution with more accuracy. For example, for a 2^4 design problem, the relative efficiency of the design found by the Nelder-Mead method with respect to the design found by our algorithm is only 95% on average, and in 10% of the cases the relative efficiencies are less than 75%. When the number of factors becomes large, our investigation indicates that some algorithms do not converge within a reasonable amount of time or are not applicable due to the unavailability of the gradient of the objective function. These are denoted by NA in Table 1.

3.3.2 Algorithm for maximizing |X'WX| with integer solutions

To maximize |X'WX|, an alternative algorithm, called the exchange algorithm, is to adjust p_i and p_j simultaneously for a randomly chosen index pair (i, j) (see the Supplementary

[Figure 1: Number of support points in the optimal design (based on 1000 simulations); one histogram panel of proportions versus number of support points for each of K = 2,...,7.]

[Table 1: Performance of the lift-one algorithm (CPU time in seconds for 100 simulated β from U(-3,3) with logit link and main-effects model); columns: Nelder-Mead, quasi-Newton, conjugate gradient, simulated annealing, lift-one; rows: 2^2, 2^3, 2^4 designs, with NA entries where an algorithm did not converge or was not applicable.]

Materials for a detailed description). The original idea of exchange was suggested by Fedorov (1972). It follows from Lemma 3.1 that the optimal adjusted (p_i, p_j) can be obtained easily by maximizing a quadratic function. Unlike the lift-one algorithm, the exchange algorithm can be applied to search for an integer-valued optimal allocation n = (n_1,...,n_{2^k}), where Σ_i n_i = n.

Exchange algorithm for integer-valued allocations:
1. Start with an initial design n = (n_1,...,n_{2^k}) such that f(n) > 0.
2. Set up a random order of (i, j) going through all pairs {(1,2), (1,3), ..., (1, 2^k), (2,3), ..., (2^k - 1, 2^k)}.
3. For each (i, j), let m = n_i + n_j. If m = 0, let n_* = n. Otherwise, calculate f_ij(z) as given in equation (A.11), and let n_* = (n_1, ..., n_{i-1}, z*, n_{i+1}, ..., n_{j-1}, m - z*, n_{j+1}, ..., n_{2^k}), where the integer z* maximizes f_ij(z) with 0 ≤ z ≤ m according to Lemma 7.4 in the Appendix. Note that f(n_*) = f_ij(z*) ≥ f(n) > 0.
4. Repeat steps 2-3 until convergence (no further increase in f(n) by any pairwise adjustment).

As expected, the integer-valued optimal allocation (n_1,...,n_{2^k}) is consistent with the proportion-valued allocation (p_1,...,p_{2^k}) for large n. For small n, the algorithm may be used for the fractional design problem of Section 4. It should be noted that the exchange algorithm with a slight modification is guaranteed to converge to the optimal allocation when searching for proportions, but not for integer-valued solutions, especially when n is small compared to 2^k. In terms of finding optimal proportions, the exchange algorithm produces essentially the same results as the lift-one algorithm, although the former is relatively slower. For example, based on 1000 simulated β's from U(-3, 3) with the logit link and the main-effects model, the ratio of the computational time of the exchange algorithm to that of the lift-one algorithm is 6.2, 10.2, 16.8, 28.8, 39.5 and 51.3 for k = 2,...,7, respectively.
Note that it requires 2.02, 5.38, 19.2, 84.3, 352, and 1245 seconds, respectively, to finish the 1000 simulations using the lift-one algorithm on a regular PC with a 2.26GHz CPU and 2.0GB memory. As the total number of factors k becomes larger, the computation is more intensive. A detailed study of the computational properties of the proposed algorithms is a topic of future research.
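The exchange steps can be sketched as follows (our illustration, not the authors' code); brute force over the integers z = 0,...,m stands in for the closed-form quadratic maximization of equation (A.11). We use the 2^2 main-effects model with equal w_i's, for which the balanced allocation is optimal.

```python
import numpy as np
from itertools import combinations

X = np.array([[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1]])
w = np.full(4, 0.25)   # equal information at every design point

def f(n_vec):
    return np.linalg.det(X.T @ np.diag(w * n_vec) @ X)

def exchange_integer(n_start, seed=0):
    """Sketch of the exchange algorithm for integer-valued allocations."""
    n_vec = np.array(n_start, dtype=float)
    assert f(n_vec) > 0                   # step 1: valid starting design
    rng = np.random.default_rng(seed)
    pairs = list(combinations(range(len(n_vec)), 2))
    improved = True
    while improved:                       # step 4: repeat until no pair improves
        improved = False
        rng.shuffle(pairs)                # step 2: random order of pairs
        for i, j in pairs:                # step 3: redistribute n_i + n_j
            m = int(n_vec[i] + n_vec[j])
            if m == 0:
                continue
            cands = []
            for z in range(m + 1):        # brute force in place of (A.11)
                trial = n_vec.copy(); trial[i] = z; trial[j] = m - z
                cands.append((f(trial), z))
            fz, z_best = max(cands)
            if fz > f(n_vec) + 1e-12:
                n_vec[i] = z_best; n_vec[j] = m - z_best
                improved = True
    return n_vec.astype(int)

n_opt = exchange_integer([6, 3, 2, 1])    # n = 12 units
```

With equal w_i's any unbalanced pair can be improved by balancing it, so the algorithm ends at the allocation (3, 3, 3, 3).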

4 Fractional Factorial Designs

Unless the number of factors k is very small, the total number of experimental conditions 2^k is large, and it could be impossible to take observations at all 2^k level combinations. The reasons are that the total number of observations n would become large, as would the number of level changes, which are expensive in many applications. So when k is large, the experimenter may have to consider fractional designs. For linear models, the accepted practice is to use regular fractions due to their many desirable properties, such as minimum aberration and optimality. We will show that in our setup the regular fractions are not always optimal. First, we will examine the situations in which they are optimal. We use 2^3 designs for illustration. More discussion of general cases can be found in Section 5.1. The design matrix for the 2^3 main-effects model consists of the first four columns of X given in (1), and w_j represents the information in the jth experimental condition, i.e., the jth row of X. The maximum number of experimental conditions is fixed at a number less than 8, and the problem is to identify the experimental conditions and corresponding p_i's that optimize the design optimality criterion. Half fractions use 4 experimental conditions (hence the design is uniform). The fractions defined by rows {1,4,6,7} and {2,3,5,8} are regular fractions, given by the defining relations 1 = ABC and 1 = -ABC, respectively. If the assumed values of all the regression coefficients corresponding to the design factors are zero, then all the w_i's are equal, and the problem effectively reduces to the linear model case, where the regular fractions are D-optimal. The following theorem identifies necessary and sufficient conditions for regular fractions to be D-optimal in terms of the w_i's.

Theorem 4.1 For the 2^3 main-effects model, suppose β_1 = 0 (which implies w_1 = w_5, w_2 = w_6, w_3 = w_7, and w_4 = w_8).
The regular fractions {1,4,6,7} and {2,3,5,8} are D-optimal within the class of half-fractions if and only if 4 min{w_1,w_2,w_3,w_4} ≥ max{w_1,w_2,w_3,w_4}. Further suppose β_2 = 0 (thus w_1 = w_3 = w_5 = w_7 and w_2 = w_4 = w_6 = w_8). The two regular half-fractions {1,4,6,7} and {2,3,5,8} are D-optimal half-fractions if and only if 4 min{w_1,w_2} ≥ max{w_1,w_2}.

Example 4.1 Under the logit link, consider the 2^3 main-effects model with β_1 = β_2 = 0, which implies w_1 = w_3 = w_5 = w_7 and w_2 = w_4 = w_6 = w_8. The regular half-fractions {1,4,6,7} and {2,3,5,8} have the same |X′WX| but not the same X′WX. They are D-optimal half-fractions if and only if one of the following holds:

(i) |β_3| ≤ log 2;  (ii) |β_3| > log 2 and |β_0| ≤ log( (2e^{|β_3|} − 1) / (e^{|β_3|} − 2) ).    (3)
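These conditions can be checked numerically. The sketch below is an illustrative reconstruction, assuming the lexicographic row ordering with +1 listed first (an assumption, chosen because it reproduces the fractions {1,4,6,7} and {2,3,5,8} named above); it searches all 70 half-fractions by the criterion |X[i_1,...,i_4]|^2 w_{i_1}···w_{i_4} under the logit link:

```python
import itertools
import numpy as np

# Full 2^3 main-effects design matrix: intercept plus one +/-1 column per
# factor (A, B, C), rows in lexicographic order with +1 listed first.
levels = [1, -1]
X = np.array([[1, a, b, c] for a in levels for b in levels for c in levels], float)

# The ABC interaction identifies the two regular half-fractions (1-based rows).
ABC = X[:, 1] * X[:, 2] * X[:, 3]
assert list(np.where(ABC == 1)[0] + 1) == [1, 4, 6, 7]
assert list(np.where(ABC == -1)[0] + 1) == [2, 3, 5, 8]

def logit_weights(beta):
    """GLM weights w_i = p_i (1 - p_i) under the logit link."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return p * (1.0 - p)

def best_half_fraction(w):
    """Exhaustive search over all C(8,4) = 70 uniform half-fractions."""
    crit = lambda I: np.linalg.det(X[list(I)]) ** 2 * np.prod(w[list(I)])
    return set(max(itertools.combinations(range(8), 4), key=crit))

# beta1 = beta2 = 0 and |beta3| <= log 2: condition (i) holds, so a regular
# fraction (0-based {0,3,5,6} or {1,2,4,7}) is D-optimal.
w = logit_weights(np.array([0.5, 0.0, 0.0, 0.6]))
assert best_half_fraction(w) in ({0, 3, 5, 6}, {1, 2, 4, 7})

# beta0 * beta3 > 0 with the condition violated: the optimum takes three rows
# from {2,4,6,8} (0-based {1,3,5,7}) plus one row from the other fraction.
w = logit_weights(np.array([2.0, 0.0, 0.0, 3.0]))
assert len(best_half_fraction(w) & {1, 3, 5, 7}) == 3
```

Both checks agree with Theorem 4.1: for the first (illustrative) parameter vector, 4·min w_i ≥ max w_i holds and a regular fraction wins; for the second it fails and the optimum moves off the regular fractions.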

Figure 2: Partitioning of the parameter space (regions: Regular Fractions ABC = ±1, Modified C = +1, Modified C = −1).

When the regular half-fractions are not optimal, the goal is to find {i_1,i_2,i_3,i_4} that maximizes |X[i_1,i_2,i_3,i_4]|^2 w_{i_1} w_{i_2} w_{i_3} w_{i_4}, based on Lemma 3.1. Recall that in this case there are only two distinct w_i's. If β_0 β_3 > 0, the w_i's corresponding to {2,4,6,8} are larger than the others, so this fraction, given by C = −1, would maximize w_{i_1} w_{i_2} w_{i_3} w_{i_4}, but it leads to a singular design matrix. It is not surprising that the D-optimal half-fractions are close to the design {2,4,6,8}: they are in fact given by the 16 designs, each consisting of three elements of {2,4,6,8} and one of {1,3,5,7}, which we call modified C = −1. All 16 designs lead to the same |X′WX|, namely w_1 w_2^3 / 4. For β_0 β_3 < 0, D-optimal half-fractions are similarly obtained from the fraction C = +1.

Figure 2 partitions the parameter space for the 2^3 main-effects logit model. The left panel corresponds to the case β_1 = β_2 = 0. Here the parameters in the middle region make the regular fractions D-optimal, whereas the top-right and bottom-left regions correspond to the case β_0 β_3 > 0. Similarly, the other two regions correspond to the case β_0 β_3 < 0. The right panel of Figure 2 is for the case β_1 = 0 and shows the contour plot of the largest β_0's that would make the regular fractions D-optimal. (For details, see the Supplementary Materials of this paper.) Along with Figure 2, conditions (3) and (S.10) in the Supplementary Materials indicate that if β_1, β_2 and β_3 are small then the regular fractions are preferred (see also Table 2). However, when at least one β_i is large, the information is not uniformly distributed over the design points and the regular fractions may not be optimal.
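The value w_1 w_2^3/4 can be verified directly for one of the 16 modified designs. A small sketch with illustrative parameter values (β_1 = β_2 = 0 and β_0 β_3 > 0 are assumptions for the example, and the row ordering is the lexicographic one with +1 first):

```python
import numpy as np

levels = [1, -1]
X = np.array([[1, a, b, c] for a in levels for b in levels for c in levels], float)

# beta1 = beta2 = 0 with beta0 * beta3 > 0: rows {2,4,6,8} (C = -1) carry the
# larger of the two distinct logit weights.
beta = np.array([2.0, 0.0, 0.0, 3.0])
p = 1.0 / (1.0 + np.exp(-X @ beta))
w = p * (1.0 - p)
w1, w2 = w[0], w[1]  # weight at a C = +1 row and at a C = -1 row, respectively

# A "modified C = -1" design: rows 2, 4, 6 from {2,4,6,8} plus row 7 (1-based),
# with uniform allocation 1/4 on each support point.
rows = [1, 3, 5, 6]  # 0-based indices
X4 = X[rows]
M = X4.T @ np.diag(w[rows] / 4.0) @ X4
assert np.isclose(np.linalg.det(M), w1 * w2 ** 3 / 4.0)
```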
In general, when all the β_i's are nonzero, the regular fractions given by the rows {1,4,6,7} or {2,3,5,8} are not necessarily the optimal half-fractions. To explore this, we simulated the regression coefficients β_0, β_1, β_2, β_3 independently from different distributions and calculated the corresponding w's under the logit, probit and complementary log-log links, 10,000 times each. For each w, we found the best

(according to the D-criterion) design supported on 4 distinct rows of the design matrix. By Lemma 3.1, any such design has to be uniform. Table 2 gives the percentage of times each of those designs turned out to be the optimal one for the logit model; the results are somewhat similar for the other links. It shows that the regular fractions are optimal when the β_i's are close to zero. In Table 2, we reported only the non-regular fractions which turned out to be D-optimal more than 15% of the time. For the 2^4 case the results are similar; that is, when the β_i's are nonzero, the regular fractions given by 1 = ±ABCD are in general not very efficient.

Table 2: Distribution of D-optimal half-fractions under the 2^3 main-effects model

Simulation setup:
  β_0:  U(−10,10)                                 N(0,5)
  β_1:  U(−.3,.3)   U(−3,3)   U(−3,0)   U(0,1)   N(1,1)
  β_2:  U(−.3,.3)   U(0,3)    U(0,3)    U(0,3)   N(2,1)
  β_3:  U(−.3,.3)   U(1,5)    U(−2,2)   U(0,5)   N(3,1)

Percentages:
  Rows {1,4,6,7}:  (99.5)   0.07 (0.22)   0.86 (2.18)   0.95 (2.61)   0.04 (2.89)
  Rows {2,3,5,8}:  (99.5)   0.04 (0.23)   0.68 (2.12)   1.04 (2.56)   0.08 (2.86)

For investigating the robustness of a design, let us denote the D-criterion value by ψ(p,w) = |X′WX| for given w = (w_1,...,w_{2^k}) and p = (p_1,...,p_{2^k}). Suppose p_w is a D-optimal allocation with respect to w. Then the relative loss of efficiency of p with respect to w can be defined as

    R(p,w) = 1 − ( ψ(p,w) / ψ(p_w,w) )^{1/(d+1)}.    (4)

In Table 2 we have provided, within parentheses, the percentages of times the regular fractions were at least 95% efficient compared with the best half-fractions. It is clear that when the regular fractions are not D-optimal, they are usually not highly efficient either. On the other hand, for each of the five situations described in Table 2, we also calculated the corresponding EW D-optimal half-fractions. In all five cases, including the highly asymmetric fifth scenario, the regular fractions are the EW D-optimal half-fractions.
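The relative loss (4) is simple to compute once ψ is in hand. A minimal sketch in the 2^3 setting (the parameter values are illustrative, and the reference allocation stands in for p_w):

```python
import numpy as np

levels = [1, -1]
X = np.array([[1, a, b, c] for a in levels for b in levels for c in levels], float)

def psi(p, w):
    """D-criterion value psi(p, w) = |X' diag(p_i w_i) X|."""
    return np.linalg.det(X.T @ np.diag(p * w) @ X)

def rel_loss(p, p_ref, w, d):
    """Relative loss of efficiency of p against a reference allocation, eq. (4)."""
    return 1.0 - (psi(p, w) / psi(p_ref, w)) ** (1.0 / (d + 1))

beta = np.array([2.0, 0.0, 0.0, 3.0])
prob = 1.0 / (1.0 + np.exp(-X @ beta))
w = prob * (1.0 - prob)

p_reg = np.zeros(8); p_reg[[0, 3, 5, 6]] = 0.25  # regular fraction {1,4,6,7}
p_mod = np.zeros(8); p_mod[[1, 3, 5, 6]] = 0.25  # a modified C = -1 design

loss = rel_loss(p_reg, p_mod, w, d=3)
assert 0.0 < loss < 1.0                       # regular loses efficiency here
assert abs(rel_loss(p_mod, p_mod, w, d=3)) < 1e-12
```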
Remark 2 In Table 2 and later (Tables 3 and 6) we have used distributions for the β's in two ways. For locally optimal designs, these distributions are used to simulate the assumed values in order to study the properties of the designs, especially robustness. For EW D-optimal designs, these distributions are used as priors.

Remark 3 These priors should be chosen carefully in real applications. For example, a uniform prior on β ∈ [−a,a] would indicate that the experimenter does not know much about the corresponding factor, whereas β ∈ [0,b] means that the experimenter knows the direction of the effect. In our odor-study example, factor A (algae) has two levels: raffinated or solvent-extracted algae (−1) and catfish pond algae (+1). The scientists initially assessed that raffinated algae has residual lipid, which should prevent the absorber from interacting with volatiles, causing odor to be released. Hence the β for this factor is expected to be nonnegative, and one should take a prior supported on [0,b]. On the other hand, for factor B (scavenger), it is not known before conducting the experiment whether activated carbon (−1) is better or worse than zeolite (+1), so a prior of the form [−a,a] is appropriate.

Remark 4 Consider the problem of obtaining locally D-optimal fractional factorial designs when the number of experimental settings (m, say) is fixed. If the total number of factors under consideration is not too large, one can always calculate the D-efficiencies of all fractions and choose the best one. However, this is computationally expensive even for a moderately large number of factors, so a better strategy is needed. One such strategy is to choose the m largest w_i's and the corresponding rows, since those w_i represent the information at the corresponding design points. Another is to use our algorithms discussed in Section 3.3 to find an optimal allocation for the full factorial design first, and then choose the m largest p_i's and scale them appropriately. One has to be careful to avoid designs that do not allow the estimation of the model parameters. In this case, the exchange algorithm described earlier may be used to choose the fraction with the given m experimental units.
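The two strategies in Remark 4 can be sketched as follows. The helper names are hypothetical (not from the paper), and the rank check reflects the estimability caution noted above:

```python
import numpy as np

def fraction_by_weights(X, w, m):
    """Strategy 1 (hypothetical helper): keep the m rows with the largest
    weights w_i; fail loudly if the chosen rows do not allow estimation
    of all model parameters."""
    idx = np.sort(np.argsort(w)[::-1][:m])
    if np.linalg.matrix_rank(X[idx]) < X.shape[1]:
        raise ValueError("selected rows give a singular information matrix")
    return idx

def fraction_by_allocation(p_full, m):
    """Strategy 2 (hypothetical helper): keep the m largest p_i from a
    full-factorial optimal allocation and rescale them to sum to one."""
    idx = np.argsort(p_full)[::-1][:m]
    p = np.zeros_like(p_full)
    p[idx] = p_full[idx]
    return p / p.sum()

levels = [1, -1]
X = np.array([[1, a, b, c] for a in levels for b in levels for c in levels], float)
prob = 1.0 / (1.0 + np.exp(-X @ np.array([2.0, 0.0, 0.0, 3.0])))
w = prob * (1.0 - prob)

# With this illustrative beta, the 4 largest weights all sit on rows with
# C = -1, so strategy 1 hits exactly the estimability problem of Remark 4.
try:
    fraction_by_weights(X, w, 4)
    singular = False
except ValueError:
    singular = True
assert singular
```

Taking m = 5 instead admits a row from the other half and restores full rank, which is one simple way around the failure.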
Our simulations (not presented here) show that both of these methods perform satisfactorily, with the second method giving designs that are generally more than 95% efficient for the four-factor main-effects model. This method will be used for the computations in the next section.

5 Robustness

In this section, we will check the robustness of locally D-optimal designs with respect to the assumed parameter values, for both full factorial and fractional factorial setups.

5.1 Most robust minimally supported designs

Minimally supported designs have been studied extensively. For continuous, quantitative factors, these designs are D-optimal for many linear and nonlinear models. In our setup of qualitative factors, these designs are attractive since they use the minimal number, d + 1, of experimental conditions. In many applications, level changes are expensive, hence fewer experimental conditions are desirable. In this section, we will examine the robustness of minimally supported designs. Our next result gives necessary and sufficient conditions for a fraction to be a D-optimal minimally supported design. Note that Theorem 5.1 is an immediate consequence of Lemma 3.1.

Theorem 5.1 Let I = {i_1,...,i_{d+1}} ⊂ {1,...,2^k} be an index set. A design p_I = (p_1,...,p_{2^k}) satisfying p_i = 0 for i ∉ I is a D-optimal minimally supported design if and only if p_{i_1} = ··· = p_{i_{d+1}} = 1/(d+1) and I maximizes |X[i_1,...,i_{d+1}]|^2 w_{i_1} ··· w_{i_{d+1}}.

Recall that we denote by R(p,w) the relative loss of efficiency of p with respect to w, as in (4). Let us define the maximum relative loss of efficiency of a given design p with respect to a specified region W of w by

    R_max(p) = max_{w ∈ W} R(p,w).    (5)

It can be shown that the regions W take the form [a,b]^{2^k} for the 2^k main-effects model if the range of each of the regression coefficients is an interval symmetric about 0. For example, for a 2^4 main-effects model, if all the regression coefficients range over [−3,3], then W = [a, 0.25]^16 for the logit link, or [a, 0.637]^16 for the probit link, for the appropriate lower endpoints a. This is the rationale for the choice of the range of the w_i's in Theorem 5.2 below. A design which minimizes the maximum relative loss of efficiency will be called most robust. Note that this criterion is known as maximin efficiency in the literature; see, for example, Dette (1997). For unbounded β_i's with a prior distribution, one may use the .99 or .95 quantile instead of the maximum relative loss to measure robustness.

Theorem 5.2 Suppose k ≥ 3 and d(d+1) ≤ 2^{k+1} − 4. Suppose w_i ∈ [a,b], i = 1,...,2^k, with 0 < a < b. Let I = {i_1,...,i_{d+1}} be an index set which maximizes |X[i_1,i_2,...,i_{d+1}]|^2. Then the design p_I = (p_1,...,p_{2^k}) satisfying p_{i_1} = ··· = p_{i_{d+1}} = 1/(d+1) is a most robust minimally supported design, with maximum relative loss of efficiency 1 − a/b.

Based on Theorem 5.2, the maximum relative loss of efficiency depends on the possible range of the w_i's. The result is practically meaningful only if the interval [a,b] is bounded away from 0.
Figure 3 provides some idea about the possible bounds of the w_i's for commonly used link functions. For example, for 2^3 designs with the main-effects model, if w_i ≥ 0.105 under the logit link (see the Remark of Yang et al. (2012)), then the maximum relative loss of efficiency of the regular half-fraction design satisfying p_1 = p_4 = p_6 = p_7 = 1/4 is 1 − 0.105/0.25 = 58%. The more certain we are about the range of the w_i's, the more useful the result will be. Note that for k = 2, all 4 minimally supported designs perform equally well (or equally badly), so they are all most robust under the above definition. For main-effects models, the condition d(d+1) ≤ 2^{k+1} − 4 in Theorem 5.2 is guaranteed by

Table 3: Relative loss of efficiency of the 2^4 uniform design

Simulation setup:
  β_0:  U(−3,3)   U(−1,1)   U(−3,0)    N(0,5)
  β_1:  U(−1,1)   U(0,1)    U(1,3)     N(0,1)
  β_2:  U(−1,1)   U(0,1)    U(1,3)     N(2,1)
  β_3:  U(−1,1)   U(0,1)    U(−3,−1)   N(.5,2)
  β_4:  U(−1,1)   U(0,1)    U(−3,−1)   N(.5,2)

For each setup, the quantiles R_100, R_99 and R_95 are reported in columns (I), (II) and (III), where (I) = R_100α(p_u), (II) = min_{1 ≤ s ≤ 1000} R_100α(p_s), and (III) = R_100α(p_e).

k ≥ 3. A most robust minimally supported design can be obtained by searching for an index set {i_1,...,i_{d+1}} which maximizes |X[i_1,i_2,...,i_{d+1}]|^2. Note that such an index set is usually not unique. Based on Lemma 7.3, if the index set {i_1,...,i_{d+1}} maximizes |X[i_1,...,i_{d+1}]|^2, there always exists another index set {i′_1,...,i′_{d+1}} such that |X[i_1,...,i_{d+1}]|^2 = |X[i′_1,...,i′_{d+1}]|^2. It should also be noted that a most robust minimally supported design may involve a set of experimental conditions {i_1,...,i_{d+1}} which does not maximize |X[i_1,...,i_{d+1}]|^2. For example, consider a 2^3 design with the main-effects model, and suppose w_i ∈ [a,b], i = 1,...,8. If 4a > b, then the most robust minimally supported designs are the regular fractional ones. Otherwise, if 4a ≤ b, then any uniform design restricted to {i_1,i_2,i_3,i_4} satisfying |X[i_1,i_2,i_3,i_4]| ≠ 0 is a most robust minimally supported one.

5.2 Robustness of uniform designs

As mentioned before (see the examples in Section 3.2), a design is called uniform if the allocation of experimental units is the same for all points in the support of the design. Yang, Mandal and Majumdar (2012) showed that for a 2^2 main-effects model, the uniform design is the most robust design in terms of maximum relative loss of efficiency. In this section, we use simulation studies to examine the robustness of uniform designs and EW D-optimal designs in higher-order cases. For illustration, we use a 2^4 main-effects model.
We simulate β_0,...,β_4 from different distributions 1000 times each and calculate the corresponding w's, denoted by w_1,...,w_1000. For each w_s, we use the algorithm described earlier to obtain a D-optimal allocation p_s. For any allocation p, let R_100α(p) denote the αth quantile of the set of relative losses of efficiency {R(p,w_s), s = 1,...,1000}. Thus R_100(p) = R_max(p), the R_max defined in (5) with W = {w_1,...,w_1000}. Since W here is simulated, the quantities R_99(p) and R_95(p) are more reliable for measuring the robustness of p. Table 3 compares the R_100α of the uniform design p_u = (1/16,...,1/16) with the

minimum of R_100α(p_s) over the optimal allocations p_s, s = 1,...,1000, as well as the R_100α of the EW design p_e. In this table, if the value in column (I) is smaller than that in column (II), then the uniform design is better than all the D-optimal designs in terms of that quantile of relative loss of efficiency, which happens in many situations. Table 3 provides strong evidence that the uniform design p_u is one of the most robust designs if the β_i's are expected to come from an interval that is symmetric around zero. This is consistent with the conclusion of Cox (1988). However, there are situations where the uniform design does not perform well, as illustrated by the two middle blocks of Table 3. If the signs of the regression coefficients are known, it is advisable not to use the uniform design. In many practical applications, the experimenter will have some idea of the direction of the effects of the factors, which in statistical terms determines the signs of the regression coefficients. In these situations, it turns out that the performance of the EW D-optimal designs is comparable to that of the most robust designs, even when the uniform design does not perform well (see columns (III) of Table 3, where p_e is the EW design). Hence we recommend EW D-optimal designs when the experimenter has some idea about the signs of the β_i's, and uniform designs in the absence of prior knowledge of the signs of the regression parameters.

Now consider uniform designs restricted to regular fractions. Again we use the 2^4 main-effects model as illustration and consider the uniform designs restricted to the regular half-fractions identified by 1 = ±ABCD. We performed simulations as above, and our conclusions are similar: uniform designs on regular fractions are among the most robust ones if the signs of the regression parameters are unknown, but they may not perform well if the signs of the β_i's are known.
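The quantile-based comparison described above can be sketched in miniature. This is illustrative only: it uses a 2^3 model for speed, 200 draws instead of 1000, and approximates each D-optimal allocation by the best minimally supported design (the paper uses the lift-one algorithm for the exact optimum):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
levels = [1, -1]
X = np.array([[1, a, b, c] for a in levels for b in levels for c in levels], float)

def psi(p, w):
    """D-criterion value |X' diag(p_i w_i) X|."""
    return np.linalg.det(X.T @ np.diag(p * w) @ X)

def best_min_supported(w):
    """Best uniform 4-point design, a proxy for the D-optimal allocation p_w."""
    crit = lambda I: np.linalg.det(X[list(I)]) ** 2 * np.prod(w[list(I)])
    I = max(itertools.combinations(range(8), 4), key=crit)
    p = np.zeros(8)
    p[list(I)] = 0.25
    return p

p_u = np.full(8, 1.0 / 8.0)  # uniform design on the full 2^3 factorial
losses = []
for _ in range(200):
    beta = rng.uniform(-1.0, 1.0, size=4)  # symmetric ranges, as in column 1
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    w = prob * (1.0 - prob)
    losses.append(1.0 - (psi(p_u, w) / psi(best_min_supported(w), w)) ** 0.25)

R95, R100 = np.quantile(losses, 0.95), max(losses)
assert R95 <= R100  # quantiles below the maximum loss, as in Table 3's rows
```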
6 Examples

In this section we revisit the examples discussed in the introduction.

Example 6.1 Consider the odor study discussed in Section 1. The resolution IV design given by D = ABC was used, with 5 replications per experimental setting. The total number of samples in which the odor was removed successfully is given as "# of Success" in Table 4. After fitting a probit model, the estimate of the unknown parameter turns out to be β̂ = (0.209, 3.637, −2.843, 0.269, −0.269). If one wants to conduct a follow-up experiment restricted to this half-fraction, then it is reasonable to assume the value of the unknown parameter at a nearby point, say β = (0.2, 3.5, −3, 0.3, −0.3). In that case, the corresponding D-optimal design is given by p_a in Table 4. It is interesting to note that this design is supported on only six points rather than the original eight, and the relative efficiency of the original design with respect to p_a is only 76%, assuming β̂ is the true parameter. We have also calculated the exact D-optimal


More information

Bayes methods for categorical data. April 25, 2017

Bayes methods for categorical data. April 25, 2017 Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression

More information

Stat 705: Completely randomized and complete block designs

Stat 705: Completely randomized and complete block designs Stat 705: Completely randomized and complete block designs Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 16 Experimental design Our department offers

More information

ORTHOGONAL ARRAYS OF STRENGTH 3 AND SMALL RUN SIZES

ORTHOGONAL ARRAYS OF STRENGTH 3 AND SMALL RUN SIZES ORTHOGONAL ARRAYS OF STRENGTH 3 AND SMALL RUN SIZES ANDRIES E. BROUWER, ARJEH M. COHEN, MAN V.M. NGUYEN Abstract. All mixed (or asymmetric) orthogonal arrays of strength 3 with run size at most 64 are

More information

Lecture 5: Poisson and logistic regression

Lecture 5: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

A GENERAL CONSTRUCTION FOR SPACE-FILLING LATIN HYPERCUBES

A GENERAL CONSTRUCTION FOR SPACE-FILLING LATIN HYPERCUBES Statistica Sinica 6 (016), 675-690 doi:http://dx.doi.org/10.5705/ss.0015.0019 A GENERAL CONSTRUCTION FOR SPACE-FILLING LATIN HYPERCUBES C. Devon Lin and L. Kang Queen s University and Illinois Institute

More information

Stat664 Homework #3 due April 21 Turn in problems marked with

Stat664 Homework #3 due April 21 Turn in problems marked with Stat664 Homework #3 due April 2 Turn in problems marked with Note: Whenever appropriate/possible, inspect the possibilities for need of transformation, determine suitable power (considering only power

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

DEPARTMENT OF ENGINEERING MANAGEMENT. Two-level designs to estimate all main effects and two-factor interactions. Pieter T. Eendebak & Eric D.

DEPARTMENT OF ENGINEERING MANAGEMENT. Two-level designs to estimate all main effects and two-factor interactions. Pieter T. Eendebak & Eric D. DEPARTMENT OF ENGINEERING MANAGEMENT Two-level designs to estimate all main effects and two-factor interactions Pieter T. Eendebak & Eric D. Schoen UNIVERSITY OF ANTWERP Faculty of Applied Economics City

More information

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information

Other likelihoods. Patrick Breheny. April 25. Multinomial regression Robust regression Cox regression

Other likelihoods. Patrick Breheny. April 25. Multinomial regression Robust regression Cox regression Other likelihoods Patrick Breheny April 25 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/29 Introduction In principle, the idea of penalized regression can be extended to any sort of regression

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

UNIVERSAL OPTIMALITY OF BALANCED UNIFORM CROSSOVER DESIGNS 1

UNIVERSAL OPTIMALITY OF BALANCED UNIFORM CROSSOVER DESIGNS 1 The Annals of Statistics 2003, Vol. 31, No. 3, 978 983 Institute of Mathematical Statistics, 2003 UNIVERSAL OPTIMALITY OF BALANCED UNIFORM CROSSOVER DESIGNS 1 BY A. S. HEDAYAT AND MIN YANG University of

More information

Universal Optimality of Balanced Uniform Crossover Designs

Universal Optimality of Balanced Uniform Crossover Designs Universal Optimality of Balanced Uniform Crossover Designs A.S. Hedayat and Min Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago Abstract Kunert (1984)

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Problem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56

Problem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56 STAT 391 - Spring Quarter 2017 - Midterm 1 - April 27, 2017 Name: Student ID Number: Problem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56 Directions. Read directions carefully and show all your

More information

This manual is Copyright 1997 Gary W. Oehlert and Christopher Bingham, all rights reserved.

This manual is Copyright 1997 Gary W. Oehlert and Christopher Bingham, all rights reserved. This file consists of Chapter 4 of MacAnova User s Guide by Gary W. Oehlert and Christopher Bingham, issued as Technical Report Number 617, School of Statistics, University of Minnesota, March 1997, describing

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria 12. LOCAL SEARCH gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley h ttp://www.cs.princeton.edu/~wayne/kleinberg-tardos

More information

Logistic Regression. Seungjin Choi

Logistic Regression. Seungjin Choi Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

POLI 8501 Introduction to Maximum Likelihood Estimation

POLI 8501 Introduction to Maximum Likelihood Estimation POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,

More information

Sequential Experimental Designs for Generalized Linear Models

Sequential Experimental Designs for Generalized Linear Models Sequential Experimental Designs for Generalized Linear Models Hovav A. Dror and David M. Steinberg, JASA (2008) Bob A. Salim May 14, 2013 Bob A. Salim Sequential Experimental Designs for Generalized Linear

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To

More information

Midterm: CS 6375 Spring 2015 Solutions

Midterm: CS 6375 Spring 2015 Solutions Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an

More information

Recoverabilty Conditions for Rankings Under Partial Information

Recoverabilty Conditions for Rankings Under Partial Information Recoverabilty Conditions for Rankings Under Partial Information Srikanth Jagabathula Devavrat Shah Abstract We consider the problem of exact recovery of a function, defined on the space of permutations

More information

Lecture 2: Poisson and logistic regression

Lecture 2: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

On Two Class-Constrained Versions of the Multiple Knapsack Problem

On Two Class-Constrained Versions of the Multiple Knapsack Problem On Two Class-Constrained Versions of the Multiple Knapsack Problem Hadas Shachnai Tami Tamir Department of Computer Science The Technion, Haifa 32000, Israel Abstract We study two variants of the classic

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Communications of the Korean Statistical Society 2009, Vol 16, No 4, 697 705 Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Kwang Mo Jeong a, Hyun Yung Lee 1, a a Department

More information

Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm

Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm Zhixiang Chen (chen@cs.panam.edu) Department of Computer Science, University of Texas-Pan American, 1201 West University

More information

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation

More information

Optimal Design of Experiments. for Dual-Response Systems. Sarah Ellen Burke

Optimal Design of Experiments. for Dual-Response Systems. Sarah Ellen Burke Optimal Design of Experiments for Dual-Response Systems by Sarah Ellen Burke A Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy Approved July 2016 by

More information

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006 Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

Optimum designs for model. discrimination and estimation. in Binary Response Models

Optimum designs for model. discrimination and estimation. in Binary Response Models Optimum designs for model discrimination and estimation in Binary Response Models by Wei-Shan Hsieh Advisor Mong-Na Lo Huang Department of Applied Mathematics National Sun Yat-sen University Kaohsiung,

More information

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 35, No., May 010, pp. 84 305 issn 0364-765X eissn 156-5471 10 350 084 informs doi 10.187/moor.1090.0440 010 INFORMS On the Power of Robust Solutions in Two-Stage

More information