D-optimal Designs with Ordered Categorical Data
Jie Yang (University of Illinois at Chicago), Liping Tong (Loyola University Chicago), Abhyuday Mandal (University of Georgia)

February 20, 2015

Abstract

We consider D-optimal designs with ordered categorical responses and cumulative link models. In addition to characterizing locally D-optimal designs theoretically, we develop efficient algorithms for obtaining both approximate designs and exact designs. For ordinal data and general link functions, we obtain a simplified structure of the Fisher information matrix and express its determinant as a homogeneous polynomial. For a predetermined set of design points, we derive the necessary and sufficient conditions for an allocation to be locally D-optimal. We prove that the number of support points in a minimally supported design depends only on the number of predictors, which can be much less than the number of parameters in the model. We show that a D-optimal minimally supported allocation in this case is usually not uniform on its support points. We also provide EW D-optimal designs as a highly efficient surrogate for Bayesian D-optimal designs with ordinal data.

Keywords: Approximate design; exact design; multinomial response; cumulative link model; minimally supported design; ordinal data

1 Introduction

We consider optimal experimental designs with ordered categorical responses, or simply ordinal data. Design of experiments with ordinal data has been of
great importance in a rich variety of scientific disciplines, especially when human evaluations are involved (Christensen, 2013). Examples include a wine bitterness study (Randall, 1989), potato pathogen experiments (Omer et al., 2000), a radish seedlings' damping-off study (Krause et al., 2001), a polysilicon deposition study (Wu, 2008), beef cattle research (Osterstock et al., 2010), and a toxicity study (Agresti, 2013). This research is motivated by an odor removal study conducted by textile engineers at the University of Georgia. The scientists manufacture bio-plastics from algae that contain odorous elements. Following traditional factorial design theory for linear models, a regular $2^2$ experiment with an equal number of replicates was used to study the effect of types of algae and synthetic resins on removing the odor; the response was ordinal in nature: no odor, medium odor, and strong odor. In this paper we identify designs that are significantly more efficient than the one used for this purpose.

For an ordinal response $Y$ with $J$ categories and a set of $d$ predictors $\mathbf{x} = (x_1, \ldots, x_d)^T$, the most popular model is the cumulative logit model (also known as the proportional odds model; see Liu and Agresti (2005) for a detailed review). McCullagh (1980) extended the proportional odds model with a more general link function $g$, called the cumulative link model (also known as the ordinal regression model),
$$g\left(P(Y \leq j \mid \mathbf{x})\right) = \theta_j - \boldsymbol{\beta}^T \mathbf{x}, \quad j = 1, \ldots, J-1 \qquad (1)$$
and treated it as a special case of the multivariate generalized linear model. In this paper, we focus on the cumulative link model with a general link. If there are only two categories ($J = 2$), the cumulative link model (1) is essentially a generalized linear model for binary data (McCullagh and Nelder, 1989; Dobson and Barnett, 2008). For optimal designs under generalized linear models, there is a growing body of literature (see Khuri et al. (2006), Atkinson et al. (2007), Stufken and Yang (2012), and references therein).
When $J \geq 3$, the results on optimal designs are meagre and restricted to the logit link (Zocchi and Atkinson, 1999; Perevozskaya et al., 2003), due to the complexity of the Fisher information matrix $F$. In this paper, we obtain a special structure of $F$ (Lemmas 1 and 2) for a general link and reveal that the optimal designs with $J \geq 3$ are quite different from the cases with $J = 2$. We prove that the number of support points of a minimally supported design is $d+1$, which could be much less than the number of parameters $d+J-1$ (Theorems 3 and 4). We also show that the design weight of a minimally
supported design is usually not uniform on its support points when it is optimal (Section 6).

Among various design criteria, D-optimality is the most frequently used one (Zocchi and Atkinson, 1999) and often performs well according to other criteria (Atkinson et al., 2007). Throughout this paper, we focus on the D-criterion. In order to overcome the difficulty due to the dependence of D-optimal designs on the values of unknown parameters, we choose the local optimality approach of Chernoff (1953) with assumed parameter values. In terms of robust designs, we compare Bayesian D-optimal designs (Chaloner and Verdinelli, 1995) with EW D-optimal designs (Atkinson et al., 2007; Yang, Mandal and Majumdar, 2014) for ordinal data. As a surrogate for Bayesian designs, an EW design is much easier to find and retains high efficiency with respect to the Bayesian criterion (Section 7).

In the design literature, one type of experiment deals with quantitative or continuous factors only. Such a design problem includes identification of a set of design points $\{x_i\}_{i=1,\ldots,m}$ and the corresponding weights $\{p_i\}_{i=1,\ldots,m}$ (see, for example, Atkinson et al. (2007) and Stufken and Yang (2012)). For this type of optimal design problem, numerical algorithms are typically used for cases with two or more factors (see, for example, Woods et al. (2006)). Another type of experiment uses qualitative or discrete factors, where the set of design points $\{x_i\}_{i=1,\ldots,m}$ is predetermined and only the weights $\{p_i\}_{i=1,\ldots,m}$ are to be optimized (see, for example, Yang and Mandal (2014)). One connection between the two types of designs is that one can pick grid points of the continuous factors and turn the first type into the second. Tong et al. (2014) made another connection between the optimal designs for discrete factors and continuous factors (see Section 5 of that paper). In this paper, we concentrate on the second type of design and assume $\{x_i\}_{i=1,\ldots,m}$ is given and fixed.
This paper is organized as follows. In Section 2, we obtain the Fisher information matrix for the cumulative link model with a general link, which generalizes Perevozskaya et al. (2003)'s result for the logit link. Section 3 identifies a necessary and sufficient condition for the Fisher information matrix to be positive definite. In Sections 4 and 5, theoretical results and numerical algorithms for searching for locally D-optimal approximate or exact designs are provided. In Section 6, we identify analytic D-optimal designs for special cases to illustrate that a D-optimal minimally supported design is usually not uniform on its support points. In Section 7, we show by examples that the EW D-optimal design is highly efficient with respect to Bayesian D-optimality. Beyond the theoretical results provided in this paper, the question that might be asked is whether these results give users any advantage in real experiments. The answer is a definite yes, as demonstrated for the motivating example.

2 Cumulative link model and Fisher information matrix

Suppose there are $m$ ($m \geq 2$) experimental settings which are predetermined. For the $i$th experimental setting with corresponding covariates or predictors $x_i = (x_{i1}, \ldots, x_{id})^T \in \mathbb{R}^d$ ($d \geq 1$), there are $n_i$ experimental units assigned to it. Among them, the $k$th experimental unit generates a response $V_{ik}$ which belongs to one of $J$ ($J \geq 2$) ordered categories. In many real applications, $V_{i1}, \ldots, V_{in_i}$ are regarded as i.i.d. discrete random variables. Denote $\pi_{ij} = P(V_{ik} = j)$, where $i = 1, \ldots, m$, $j = 1, \ldots, J$, and $k = 1, \ldots, n_i$. Let $Y_{ij} = \#\{k \mid V_{ik} = j\}$ be the number of $V_{ik}$'s falling into the $j$th category. Then $(Y_{i1}, \ldots, Y_{iJ}) \sim \mathrm{Multinomial}(n_i; \pi_{i1}, \ldots, \pi_{iJ})$. Throughout this paper, we assume

Assumption 1. $0 < \pi_{ij} < 1$, $i = 1, \ldots, m$; $j = 1, \ldots, J$.

Denote $\gamma_{ij} = P(V_{ik} \leq j) = \pi_{i1} + \cdots + \pi_{ij}$, $j = 1, \ldots, J$. Based on Assumption 1, $0 < \gamma_{i1} < \gamma_{i2} < \cdots < \gamma_{i,J-1} < \gamma_{iJ} = 1$ for each $i = 1, \ldots, m$.

Consider independent multinomial observations $(Y_{i1}, \ldots, Y_{iJ})$, $i = 1, \ldots, m$, with corresponding predictors $x_1, \ldots, x_m$. Under a cumulative link model or ordinal regression model (McCullagh, 1980; Agresti, 2013; Christensen, 2013), there exist a link function $g$ and parameters of interest $\theta_1, \ldots, \theta_{J-1}$, $\beta = (\beta_1, \ldots, \beta_d)^T$, such that $g(\gamma_{ij}) = \theta_j - x_i^T\beta$, $j = 1, \ldots, J-1$. This leads to $m(J-1)$ equations in $d+J-1$ parameters $(\beta_1, \ldots, \beta_d, \theta_1, \ldots, \theta_{J-1})$. Furthermore, if $g$ is strictly increasing, then $\theta_1 < \theta_2 < \cdots < \theta_{J-1}$ under Assumption 1, which is the case for commonly used link functions including logit ($\log(\gamma/(1-\gamma))$), probit ($\Phi^{-1}(\gamma)$), log-log ($-\log(-\log(\gamma))$), complementary log-log ($\log(-\log(1-\gamma))$), and cauchit ($\tan(\pi(\gamma - 1/2))$) (McCullagh and Nelder, 1989; Christensen, 2013).
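As a concrete illustration of the model equations above, the category probabilities $\pi_{ij} = \gamma_{ij} - \gamma_{i,j-1}$ with $\gamma_{ij} = g^{-1}(\theta_j - x_i^T\beta)$ can be computed in a few lines. The sketch below is our own (not from the paper), uses the logit link, and plugs in the parameter estimates reported later in Example 4; the row order of `X` is an assumption for illustration:

```python
import numpy as np

def cumulative_probs(theta, beta, X):
    """pi_ij for a cumulative logit model: gamma_ij = expit(theta_j - x_i' beta)."""
    theta, beta, X = map(np.asarray, (theta, beta, X))
    eta = theta[None, :] - (X @ beta)[:, None]      # eta[i, j] = theta_j - x_i' beta
    gamma = 1.0 / (1.0 + np.exp(-eta))              # inverse logit link
    m = X.shape[0]
    # pad gamma_i0 = 0 and gamma_iJ = 1, then difference: pi_ij = gamma_ij - gamma_{i,j-1}
    gamma_full = np.hstack([np.zeros((m, 1)), gamma, np.ones((m, 1))])
    return np.diff(gamma_full, axis=1)

# 2^2 design points (rows are x_i); J = 3 categories
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], float)
theta = np.array([-2.67, -0.21])                    # cut-points, theta_1 < theta_2
beta = np.array([-2.45, -1.09])
pi = cumulative_probs(theta, beta, X)               # 4 x 3 matrix of pi_ij
```

Each row of `pi` is a probability vector over the $J = 3$ categories; with increasing cut-points every $\pi_{ij}$ is strictly positive, so Assumption 1 holds at these parameter values.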
Example 1. Consider the logit link $g(\gamma) = \log(\gamma/(1-\gamma))$ with two factors and three ordered categories. The model consists of $2m$ equations $g(\gamma_{ij}) = \theta_j - x_{i1}\beta_1 - x_{i2}\beta_2$, $i = 1, \ldots, m$; $j = 1, 2$, and 4 parameters $(\beta_1, \beta_2, \theta_1, \theta_2)$. Under Assumption 1, $\gamma_{i1} < \gamma_{i2}$ and $\theta_1 < \theta_2$ since $g$ is strictly increasing.

Example 2. Suppose the model consists of three covariates $x_1, x_2, x_3$ and a few second-order terms,
$$g(\gamma_{ij}) = \theta_j - x_{i1}\beta_1 - x_{i2}\beta_2 - x_{i3}\beta_3 - x_{i1}x_{i2}\beta_{12} - x_{i1}^2\beta_{11} - x_{i2}^2\beta_{22},$$
where $i = 1, \ldots, m$; $j = 1, \ldots, J-1$. Then $d = 6$.

Since $(Y_{i1}, \ldots, Y_{iJ})$, $i = 1, \ldots, m$, are independent, the log-likelihood function (up to a constant) of the cumulative link model is
$$l(\beta_1, \ldots, \beta_d, \theta_1, \ldots, \theta_{J-1}) = \sum_{i=1}^{m} \sum_{j=1}^{J} Y_{ij} \log(\pi_{ij})$$
where $\pi_{ij} = \gamma_{ij} - \gamma_{i,j-1}$ with $\gamma_{ij} = g^{-1}(\theta_j - x_i^T\beta)$ for $j = 1, \ldots, J-1$ and $\gamma_{i0} = 0$, $\gamma_{iJ} = 1$, $i = 1, \ldots, m$.

Assumption 2. The link function $g$ is differentiable and its derivative $g'$ is always strictly positive.

We keep Assumption 2 throughout the paper; it is satisfied for the logit, probit, log-log, complementary log-log, and cauchit links. Under Assumptions 1 and 2, $g$ is strictly increasing and thus $\theta_1 < \theta_2 < \cdots < \theta_{J-1}$. For $s = 1, \ldots, d$, $t = 1, \ldots, J-1$,
$$\frac{\partial l}{\partial \beta_s} = \sum_{i=1}^{m} (-x_{is}) \left\{ \frac{Y_{i1}}{\pi_{i1}} (g^{-1})'(\theta_1 - x_i^T\beta) + \frac{Y_{i2}}{\pi_{i2}} \left[ (g^{-1})'(\theta_2 - x_i^T\beta) - (g^{-1})'(\theta_1 - x_i^T\beta) \right] + \cdots + \frac{Y_{iJ}}{\pi_{iJ}} \left[ -(g^{-1})'(\theta_{J-1} - x_i^T\beta) \right] \right\}$$
$$\frac{\partial l}{\partial \theta_t} = \sum_{i=1}^{m} (g^{-1})'(\theta_t - x_i^T\beta) \left( \frac{Y_{it}}{\pi_{it}} - \frac{Y_{i,t+1}}{\pi_{i,t+1}} \right)$$
Since the $Y_{ij}$'s come from multinomial distributions, we know $E(Y_{ij}) = n_i\pi_{ij}$, $E(Y_{ij}^2) = n_i(n_i-1)\pi_{ij}^2 + n_i\pi_{ij}$, and $E(Y_{is}Y_{it}) = n_i(n_i-1)\pi_{is}\pi_{it}$ when $s \neq t$. Then we have the following lemma.
Lemma 1. Let $F = (F_{st})$ be the $(d+J-1) \times (d+J-1)$ Fisher information matrix.

(i) For $1 \leq s \leq d$, $1 \leq t \leq d$,
$$F_{st} = E\left( \frac{\partial l}{\partial \beta_s} \frac{\partial l}{\partial \beta_t} \right) = \sum_{i=1}^{m} n_i x_{is} x_{it} \sum_{j=1}^{J} \frac{(g_{ij} - g_{i,j-1})^2}{\pi_{ij}}$$
where $g_{ij} = (g^{-1})'(\theta_j - x_i^T\beta) > 0$ for $j = 1, \ldots, J-1$ and $g_{i0} = g_{iJ} = 0$.

(ii) For $1 \leq s \leq d$, $1 \leq t \leq J-1$,
$$F_{s,d+t} = E\left( \frac{\partial l}{\partial \beta_s} \frac{\partial l}{\partial \theta_t} \right) = \sum_{i=1}^{m} n_i (-x_{is}) g_{it} \left( \frac{g_{it} - g_{i,t-1}}{\pi_{it}} - \frac{g_{i,t+1} - g_{it}}{\pi_{i,t+1}} \right)$$

(iii) For $1 \leq s \leq J-1$, $1 \leq t \leq d$,
$$F_{d+s,t} = E\left( \frac{\partial l}{\partial \theta_s} \frac{\partial l}{\partial \beta_t} \right) = \sum_{i=1}^{m} n_i (-x_{it}) g_{is} \left( \frac{g_{is} - g_{i,s-1}}{\pi_{is}} - \frac{g_{i,s+1} - g_{is}}{\pi_{i,s+1}} \right)$$

(iv) For $1 \leq s \leq J-1$, $1 \leq t \leq J-1$,
$$F_{d+s,d+t} = E\left( \frac{\partial l}{\partial \theta_s} \frac{\partial l}{\partial \theta_t} \right) = \begin{cases} \sum_{i=1}^{m} n_i g_{is}^2 (\pi_{is}^{-1} + \pi_{i,s+1}^{-1}), & \text{if } s = t \\ -\sum_{i=1}^{m} n_i g_{is} g_{it} \pi_{i,s\vee t}^{-1}, & \text{if } |s-t| = 1 \\ 0, & \text{if } |s-t| \geq 2 \end{cases}$$
where $s \vee t = \max\{s, t\}$.

Perevozskaya et al. (2003) obtained a detailed form of the Fisher information matrix for the logit link and one predictor. Our expressions here hold for a fairly general link and $d$ predictors. To simplify the notation, we denote
$$e_i = \sum_{j=1}^{J} \frac{(g_{ij} - g_{i,j-1})^2}{\pi_{ij}} > 0, \quad i = 1, \ldots, m \qquad (2)$$
$$c_{it} = g_{it} \left( \frac{g_{it} - g_{i,t-1}}{\pi_{it}} - \frac{g_{i,t+1} - g_{it}}{\pi_{i,t+1}} \right), \quad i = 1, \ldots, m;\ t = 1, \ldots, J-1 \qquad (3)$$
$$u_{it} = g_{it}^2 (\pi_{it}^{-1} + \pi_{i,t+1}^{-1}) > 0, \quad i = 1, \ldots, m;\ t = 1, \ldots, J-1 \qquad (4)$$
$$b_{it} = g_{i,t-1} g_{it} \pi_{it}^{-1} > 0, \quad i = 1, \ldots, m;\ t = 2, \ldots, J-1 \text{ (if } J \geq 3\text{)} \qquad (5)$$
Note that $g_{ij}$ is defined in Lemma 1 (i). Then we obtain the following lemma, which plays a key role in the later calculation of $|F|$.
Lemma 2. $c_{it} = u_{it} - b_{it} - b_{i,t+1}$, $i = 1, \ldots, m$; $t = 1, \ldots, J-1$; and $e_i = \sum_{t=1}^{J-1} c_{it} = \sum_{t=1}^{J-1} (u_{it} - 2b_{it})$, $i = 1, \ldots, m$, where $b_{i1} = b_{iJ} = 0$ for $i = 1, \ldots, m$.

Example 1 (continued). For the logit link $g$, $g^{-1}(\eta) = e^\eta/(1+e^\eta)$ and $(g^{-1})' = g^{-1}(1 - g^{-1})$. Thus $g_{ij} = (g^{-1})'(\theta_j - x_i^T\beta) = \gamma_{ij}(1 - \gamma_{ij})$. With $J = 3$, we have $\pi_{i1} + \pi_{i2} + \pi_{i3} = 1$ for $i = 1, \ldots, m$. Then for $i = 1, \ldots, m$: $g_{i1} = \pi_{i1}(\pi_{i2} + \pi_{i3})$, $g_{i2} = (\pi_{i1} + \pi_{i2})\pi_{i3}$, $b_{i2} = \pi_{i1}\pi_{i3}\pi_{i2}^{-1}(\pi_{i1} + \pi_{i2})(\pi_{i2} + \pi_{i3})$, $u_{i1} = \pi_{i1}\pi_{i2}^{-1}(\pi_{i1} + \pi_{i2})(\pi_{i2} + \pi_{i3})^2$, $u_{i2} = \pi_{i3}\pi_{i2}^{-1}(\pi_{i1} + \pi_{i2})^2(\pi_{i2} + \pi_{i3})$, $c_{i1} = \pi_{i1}(\pi_{i1} + \pi_{i2})(\pi_{i2} + \pi_{i3})$, $c_{i2} = \pi_{i3}(\pi_{i1} + \pi_{i2})(\pi_{i2} + \pi_{i3})$, $e_i = (\pi_{i1} + \pi_{i2})(\pi_{i1} + \pi_{i3})(\pi_{i2} + \pi_{i3})$.

As a direct conclusion of Lemma 1 and Lemma 2, we obtain the following theorem.

Theorem 1. Under Assumptions 1 and 2, the Fisher information matrix $F$ can be written as
$$F = \sum_{i=1}^{m} n_i A_i \qquad (6)$$
where the $(d+J-1) \times (d+J-1)$ matrix
$$A_i = \begin{pmatrix} A_{i1} & A_{i2} \\ A_{i2}^T & A_{i3} \end{pmatrix} = \begin{pmatrix} (e_i x_{is} x_{it})_{s=1,\ldots,d;\, t=1,\ldots,d} & (-x_{is} c_{it})_{s=1,\ldots,d;\, t=1,\ldots,J-1} \\ (-c_{is} x_{it})_{s=1,\ldots,J-1;\, t=1,\ldots,d} & A_{i3} \end{pmatrix}$$
and the $(J-1) \times (J-1)$ matrix $A_{i3}$ is symmetric tri-diagonal with diagonal entries $u_{i1}, \ldots, u_{i,J-1}$ and off-diagonal entries $-b_{i2}, \ldots, -b_{i,J-1}$ for $J \geq 3$. Note that $A_{i3}$ contains only the single entry $u_{i1}$ for $J = 2$. Examples of $A_{i3}$ include
$$(u_{i1}), \quad \begin{pmatrix} u_{i1} & -b_{i2} \\ -b_{i2} & u_{i2} \end{pmatrix}, \quad \begin{pmatrix} u_{i1} & -b_{i2} & 0 \\ -b_{i2} & u_{i2} & -b_{i3} \\ 0 & -b_{i3} & u_{i3} \end{pmatrix}, \quad \begin{pmatrix} u_{i1} & -b_{i2} & 0 & 0 \\ -b_{i2} & u_{i2} & -b_{i3} & 0 \\ 0 & -b_{i3} & u_{i3} & -b_{i4} \\ 0 & 0 & -b_{i4} & u_{i4} \end{pmatrix}$$
for $J = 2, 3, 4$, or $5$, respectively.

Remark 1. As an important property of the Fisher information matrix, $F$ is always positive semi-definite (p.s.d.), which implies $|F| \geq 0$. As a special case, $A_i$ can be regarded as the Fisher information matrix at the support point $x_i$. Therefore, $A_i$ is also p.s.d. and $|A_i| \geq 0$ (actually $|A_i| = 0$ according to Lemma 3 in Section 3).
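For the logit link with $J = 3$, the identities of Lemma 2 and the closed forms of Example 1 (continued) can be checked numerically. The sketch below is our own: it computes the building blocks (2)-(5) directly from their definitions at one design point and compares them with the product formulas above.

```python
import numpy as np

def logit_blocks(p1, p2, p3):
    """Building blocks (2)-(5) at one design point, logit link, J = 3."""
    g1 = p1 * (p2 + p3)                 # g_i1 = gamma_i1 (1 - gamma_i1)
    g2 = (p1 + p2) * p3                 # g_i2 = gamma_i2 (1 - gamma_i2)
    u1 = g1**2 * (1/p1 + 1/p2)          # (4)
    u2 = g2**2 * (1/p2 + 1/p3)
    b2 = g1 * g2 / p2                   # (5)
    c1 = g1 * ((g1 - 0) / p1 - (g2 - g1) / p2)      # (3), with g_i0 = 0
    c2 = g2 * ((g2 - g1) / p2 - (0 - g2) / p3)      # (3), with g_i3 = 0
    e = g1**2/p1 + (g2 - g1)**2/p2 + g2**2/p3       # (2)
    return g1, g2, u1, u2, b2, c1, c2, e

p1, p2, p3 = 0.2, 0.5, 0.3
g1, g2, u1, u2, b2, c1, c2, e = logit_blocks(p1, p2, p3)

# Lemma 2: c_it = u_it - b_it - b_{i,t+1} (with b_i1 = b_i3 = 0), and e_i = c_i1 + c_i2
assert np.isclose(c1, u1 - b2) and np.isclose(c2, u2 - b2)
assert np.isclose(e, c1 + c2)

# Example 1 (continued): closed-form factorizations
assert np.isclose(c1, p1 * (p1 + p2) * (p2 + p3))
assert np.isclose(c2, p3 * (p1 + p2) * (p2 + p3))
assert np.isclose(e, (p1 + p2) * (p1 + p3) * (p2 + p3))
```

The probability vector `(0.2, 0.5, 0.3)` is an arbitrary choice; the identities hold for any vector satisfying Assumption 1.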
3 Determinant of the Fisher Information Matrix

Among the several criteria for optimal designs, the D-criterion looks for the allocation maximizing $|F|$, the determinant of $F$. A D-optimal design with $m$ predetermined design points $x_1, \ldots, x_m$ could either be an integer-valued allocation $(n_1, n_2, \ldots, n_m)$ maximizing $|F|$ with predetermined $n = \sum_{i=1}^{m} n_i > 0$, known as an exact design; or a real-valued allocation $(p_1, p_2, \ldots, p_m)$ maximizing $|n^{-1}F|$ with $p_i = n_i/n \geq 0$ and $\sum_{i=1}^{m} p_i = 1$, known as an approximate design.

To study the structure of $|F|$ as a polynomial function of $(n_1, \ldots, n_m)$, we denote the $(k,l)$th entry of $A_i$ by $a_{kl}^{(i)}$. Given a row map $\tau: \{1, 2, \ldots, d+J-1\} \to \{1, \ldots, m\}$, we define a $(d+J-1) \times (d+J-1)$ matrix $A_\tau = \left(a_{kl}^{(\tau(k))}\right)$ whose $k$th row is given by the $k$th row of $A_{\tau(k)}$. For a power index $(\alpha_1, \ldots, \alpha_m)$ with $\alpha_i \in \{0, 1, \ldots, d+J-1\}$ and $\sum_{i=1}^{m} \alpha_i = d+J-1$, we write $\tau \sim (\alpha_1, \ldots, \alpha_m)$ if $\alpha_i = \#\{k : \tau(k) = i\}$ for each $i = 1, \ldots, m$. In terms of the construction of $A_\tau$, this says that $\alpha_i$ rows of $A_\tau$ are taken from the matrix $A_i$.

Theorem 2. The determinant $|F|$ is an order-$(d+J-1)$ homogeneous polynomial of $(n_1, \ldots, n_m)$ and
$$|F| = \sum_{\alpha_1 + \cdots + \alpha_m = d+J-1} c_{\alpha_1,\ldots,\alpha_m} n_1^{\alpha_1} \cdots n_m^{\alpha_m}, \quad \text{where } c_{\alpha_1,\ldots,\alpha_m} = \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} |A_\tau| \qquad (7)$$

Proof of Theorem 2: According to the Leibniz formula for the determinant,
$$|F| = \left| \sum_{i=1}^{m} n_i A_i \right| = \sum_{\sigma \in S_{d+J-1}} (-1)^{\mathrm{sgn}(\sigma)} \prod_{k=1}^{d+J-1} \sum_{i=1}^{m} n_i a_{k,\sigma(k)}^{(i)}$$
where $\sigma$ is a permutation of $\{1, 2, \ldots, d+J-1\}$ and $\mathrm{sgn}(\sigma)$ is the sign or
signature of $\sigma$. Therefore,
$$c_{\alpha_1,\ldots,\alpha_m} = \sum_{\sigma \in S_{d+J-1}} (-1)^{\mathrm{sgn}(\sigma)} \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} \prod_{k=1}^{d+J-1} a_{k,\sigma(k)}^{(\tau(k))} = \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} \sum_{\sigma \in S_{d+J-1}} (-1)^{\mathrm{sgn}(\sigma)} \prod_{k=1}^{d+J-1} a_{k,\sigma(k)}^{(\tau(k))} = \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} |A_\tau| \qquad \Box$$

In order to obtain analytic properties of $|F|$, we need the following lemmas, derived from Lemma 2 and Theorem 1 as well as classical matrix theory and mathematical induction. Note that Lemma 3 below covers Lemma 1 in Perevozskaya et al. (2003) as a special case.

Lemma 3. $\mathrm{Rank}(A_i) = \mathrm{Rank}(A_{i3}) = J-1$. Furthermore, $A_{i3}$ is positive definite and
$$|A_{i3}| = \prod_{s=1}^{J-1} g_{is}^2 \prod_{t=1}^{J} \pi_{it}^{-1} > 0 \qquad (8)$$

Lemma 4. $\mathrm{Rank}((A_{i1}\ A_{i2})) \leq 1$, where equality holds if and only if $x_i \neq 0$.

Based on Lemma 3 and Lemma 4, we can obtain the two lemmas below on $c_{\alpha_1,\ldots,\alpha_m}$, which significantly simplify the structure of $|F|$ as a polynomial of $(n_1, \ldots, n_m)$.

Lemma 5. If $\max_{1 \leq i \leq m} \alpha_i \geq J$, then $|A_\tau| = 0$ for any $\tau \sim (\alpha_1, \ldots, \alpha_m)$ and thus $c_{\alpha_1,\ldots,\alpha_m} = 0$.

Proof of Lemma 5: Without any loss of generality, we assume $\alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_m$. Then $\max_{1 \leq i \leq m} \alpha_i \geq J$ implies $\alpha_1 \geq J$. In this case, for any $\tau \sim (\alpha_1, \ldots, \alpha_m)$, $\tau^{-1}(1) := \{k \mid \tau(k) = 1\} \subseteq \{1, \ldots, d+J-1\}$ and $|\tau^{-1}(1)| = \alpha_1$. If $|\tau^{-1}(1) \cap \{1, \ldots, d\}| \geq 2$, then $|A_\tau| = 0$ due to Lemma 4; otherwise $\{d+1, \ldots, d+J-1\} \subseteq \tau^{-1}(1)$ and thus $|A_\tau| = 0$ due to Lemma 3. Thus $c_{\alpha_1,\ldots,\alpha_m} = 0$ according to (7) in Theorem 2. $\Box$

Lemma 6. If $\#\{i : \alpha_i \geq 1\} \leq d$, then $|A_\tau| = 0$ for any $\tau \sim (\alpha_1, \ldots, \alpha_m)$ and thus $c_{\alpha_1,\ldots,\alpha_m} = 0$.
Proof of Lemma 6: Without any loss of generality, we assume $\alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_m$. Then $\#\{i : \alpha_i \geq 1\} \leq d$ indicates $\alpha_{d+1} = \cdots = \alpha_m = 0$. Let $\tau: \{1, 2, \ldots, d+J-1\} \to \{1, \ldots, m\}$ satisfy $\tau \sim (\alpha_1, \ldots, \alpha_m)$. Then the $(d+J-1) \times (d+J-1)$ matrix $A_\tau$ can be written as
$$A_\tau = \begin{pmatrix} A_{\tau 1} & A_{\tau 2} \\ A_{\tau 3} & A_{\tau 4} \end{pmatrix} = \begin{pmatrix} (e_{\tau(s)} x_{\tau(s)s} x_{\tau(s)t})_{s=1,\ldots,d;\, t=1,\ldots,d} & (-x_{\tau(s)s} c_{\tau(s)t})_{s=1,\ldots,d;\, t=1,\ldots,J-1} \\ (-c_{\tau(d+s)s} x_{\tau(d+s)t})_{s=1,\ldots,J-1;\, t=1,\ldots,d} & A_{\tau 4} \end{pmatrix}$$
where the $(J-1) \times (J-1)$ matrix $A_{\tau 4}$ is either the single entry $u_{\tau(d+1)1}$ (if $J = 2$) or tri-diagonal with diagonal entries $u_{\tau(d+1)1}, \ldots, u_{\tau(d+J-1),J-1}$, upper off-diagonal entries $-b_{\tau(d+1)2}, \ldots, -b_{\tau(d+J-2),J-1}$, and lower off-diagonal entries $-b_{\tau(d+2)2}, \ldots, -b_{\tau(d+J-1),J-1}$. Note that $A_\tau$ is asymmetric in general.

If $\#\{i : \alpha_i \geq 1\} \leq d-1$, then there exists an $i_0$ such that $1 \leq i_0 \leq d$ and $|\tau^{-1}(i_0) \cap \{1, \ldots, d\}| \geq 2$. In this case, $|A_\tau| = 0$ according to Lemma 4. If $\#\{i : \alpha_i \geq 1\} = d$, we may assume $|\tau^{-1}(i) \cap \{1, \ldots, d\}| = 1$ for $i = 1, \ldots, d$ (otherwise $|A_\tau| = 0$ according to Lemma 4). Suppose $\alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_k \geq 2 > \alpha_{k+1}$. Then $\{d+1, \ldots, d+J-1\} \subseteq \cup_{i=1}^{k} \tau^{-1}(i)$ and $\sum_{i=1}^{k} (\alpha_i - 1) = J-1$.

In order to show $|A_\tau| = 0$, we first replace $A_{\tau 1}$ with $A_{\tau 1}^{(1)} = (e_{\tau(s)} x_{\tau(s)t})_{s=1,\ldots,d;\, t=1,\ldots,d}$ and replace $A_{\tau 2}$ with $A_{\tau 2}^{(1)} = (-c_{\tau(s)t})_{s=1,\ldots,d;\, t=1,\ldots,J-1}$. This changes $A_\tau$ into a new matrix $A_\tau^{(1)}$. Note that $|A_\tau| = \prod_{s=1}^{d} x_{\tau(s)s} \cdot |A_\tau^{(1)}|$. According to Lemma 2, the sum of the columns of $A_{\tau 2}^{(1)}$ is $(-e_{\tau(1)}, \ldots, -e_{\tau(d)})^T$, and the elementwise sum of the columns of $A_{\tau 4}$ is $(c_{\tau(d+1)1}, c_{\tau(d+2)2}, \ldots, c_{\tau(d+J-1),J-1})^T$. Secondly, for $t = 1, \ldots, d$, we add $x_{1t}(-e_{\tau(1)}, \ldots, -e_{\tau(d)}, c_{\tau(d+1)1}, \ldots, c_{\tau(d+J-1),J-1})^T$ to the $t$th column of $A_\tau^{(1)}$. We denote the resulting matrix by $A_\tau^{(2)}$. Note that $|A_\tau^{(1)}| = |A_\tau^{(2)}|$. We consider the sub-matrix $A_{\tau d}^{(2)}$ which consists of the first $d$ columns of $A_\tau^{(2)}$. For $s \in \tau^{-1}(1)$, the $s$th row of $A_{\tau d}^{(2)}$ is simply $0$. For $i = 2, \ldots, k$, the $j$th row of $A_{\tau d}^{(2)}$ is proportional to $(x_{i1} - x_{11}, x_{i2} - x_{12}, \ldots, x_{id} - x_{1d})$ if $j \in \tau^{-1}(i)$.
Therefore, $\mathrm{Rank}(A_{\tau d}^{(2)}) \leq (d+J-1) - \alpha_1 - \sum_{i=2}^{k} (\alpha_i - 1) = d-1$, which leads to $|A_\tau^{(2)}| = 0$ and thus $|A_\tau^{(1)}| = 0$, $|A_\tau| = 0$. According to (7) in Theorem 2, $c_{\alpha_1,\ldots,\alpha_m} = 0$. $\Box$

Example 3. Suppose $d = 2$, $J = 3$ with link function $g$. According to Theorem 2, $|F|$ in this case is an order-4 homogeneous polynomial of $(n_1, \ldots, n_m)$. Due to Lemma 5 and Lemma 6, we can remove all the terms of the form
$n_i^4$, $n_i^3 n_j$, or $n_i^2 n_j^2$ from $|F|$. Therefore,
$$|F| = \sum_{i=1}^{m} \sum_{j<k,\ j\neq i,\ k\neq i} c_{ijk}\, n_i^2 n_j n_k + \sum_{i<j<k<l} c_{ijkl}\, n_i n_j n_k n_l$$
for some coefficients $c_{ijk}$ and $c_{ijkl}$.

Based on Lemma 5 and Lemma 6, in order to have $c_{\alpha_1,\ldots,\alpha_m} \neq 0$, the largest possible $\alpha_i$ is $J-1$ and the fewest possible number of positive $\alpha_i$'s is $d+1$. As a direct conclusion of Lemma 6, the following theorem states that a minimally supported design has at least $d+1$ support points. Note that this could be much less than the number of parameters $d+J-1$.

Theorem 3. $|F| > 0$ only if $m \geq d+1$.

In order to find out when $d+1$ support points are enough for a meaningful design (that is, $|F| > 0$), we study the leading terms of $|F|$ with $\max_{1 \leq i \leq m} \alpha_i = J-1$, that is, $\alpha_{i_0} = J-1$ for some $1 \leq i_0 \leq m$. Due to Lemma 6 and $\sum_{i=1}^{m} \alpha_i = d+J-1$, in order to have $c_{\alpha_1,\ldots,\alpha_m} \neq 0$, there must exist $1 \leq i_1 < i_2 < \cdots < i_d \leq m$, all different from $i_0$, such that $\alpha_{i_1} = \cdots = \alpha_{i_d} = 1$. The following lemma provides the explicit formula for such a coefficient $c_{\alpha_1,\ldots,\alpha_m}$.

Lemma 7. Suppose $\alpha_{i_0} = J-1$ and $\alpha_{i_1} = \cdots = \alpha_{i_d} = 1$, where $i_0, i_1, \ldots, i_d$ are distinct integers in $\{1, \ldots, m\}$. Then
$$c_{\alpha_1,\ldots,\alpha_m} = |A_{i_0 3}| \prod_{s=1}^{d} e_{i_s} \cdot |X_1[i_0, i_1, \ldots, i_d]|^2$$
where $e_{i_s}$ is defined by (2), $|A_{i_0 3}|$ can be calculated by (8), $X_1 = (\mathbf{1}\ X)$ is an $m \times (d+1)$ matrix with $\mathbf{1} = (1, \ldots, 1)^T$ and $X = (x_1, \ldots, x_m)^T$, and $X_1[i_0, i_1, \ldots, i_d]$ is the sub-matrix consisting of the $i_0$th, $i_1$th, ..., $i_d$th rows of $X_1$.

The proof of Lemma 7 is relegated to the Appendix. For the purpose of finding D-optimal allocations, we write $|F| = f(n_1, \ldots, n_m)$ for an order-$(d+J-1)$ homogeneous polynomial function $f$. The D-optimal exact design problem is to solve the integer-valued optimization problem: given a positive
integer $n$,
$$\max f(n_1, n_2, \ldots, n_m) \quad \text{subject to } n_i \in \{0, 1, \ldots, n\},\ i = 1, \ldots, m;\ n_1 + n_2 + \cdots + n_m = n \qquad (9)$$
Denote $p_i = n_i/n$, $i = 1, \ldots, m$. According to Theorem 1,
$$f(n_1, \ldots, n_m) = \left| \sum_{i=1}^{m} n_i A_i \right| = \left| n \sum_{i=1}^{m} p_i A_i \right| = n^{d+J-1} \left| \sum_{i=1}^{m} p_i A_i \right| = n^{d+J-1} f(p_1, \ldots, p_m)$$
Therefore, the D-optimal approximate design problem is to solve the real-valued optimization problem
$$\max f(p_1, p_2, \ldots, p_m) \quad \text{subject to } 0 \leq p_i \leq 1,\ i = 1, \ldots, m;\ p_1 + p_2 + \cdots + p_m = 1 \qquad (10)$$
According to Lemma 3, $|A_{i_0 3}| > 0$. Thus $c_{\alpha_1,\ldots,\alpha_m}$ in Lemma 7 is positive as long as $X_1[i_0, \ldots, i_d]$ is of full rank. Theorem 3 implies that a minimally supported design contains at least $d+1$ support points, while the following theorem states a necessary and sufficient condition for the minimum number of support points to be exactly $d+1$. Recall that $X_1 = (\mathbf{1}\ X)$ is defined in Lemma 7.

Theorem 4. $f(\mathbf{p}) > 0$ for some $\mathbf{p} = (p_1, \ldots, p_m)^T$ if and only if $\mathrm{Rank}(X_1) = d+1$.

Proof of Theorem 4: Suppose $\mathrm{Rank}(X_1) = d+1$. Then there exist $i_0, \ldots, i_d \in \{1, \ldots, m\}$ such that $|X_1[i_0, i_1, \ldots, i_d]| \neq 0$. According to Lemma 5, $f(\mathbf{p})$ can be regarded as an order-$(J-1)$ polynomial of $p_{i_0}$. Let $p_{i_0} = x \in (0, 1)$ and $p_i = (1-x)/(m-1)$ for $i \neq i_0$. Based on Lemma 7, $f(\mathbf{p})$ can be written as
$$f_{i_0}(x) = a_{J-1} x^{J-1} \left( \frac{1-x}{m-1} \right)^d + a_{J-2} x^{J-2} \left( \frac{1-x}{m-1} \right)^{d+1} + \cdots + a_1 x \left( \frac{1-x}{m-1} \right)^{d+J-2} + a_0 \left( \frac{1-x}{m-1} \right)^{d+J-1},$$
where
$$a_{J-1} = |A_{i_0 3}| \sum_{\{i_1,\ldots,i_d\} \subseteq \{1,\ldots,m\} \setminus \{i_0\}} \prod_{s=1}^{d} e_{i_s} \cdot |X_1[i_0, i_1, \ldots, i_d]|^2 > 0$$
Therefore, $\lim_{x \to 1} (1-x)^{-d} x^{1-J} f_{i_0}(x) = (m-1)^{-d} a_{J-1} > 0$. That is, $f(\mathbf{p}) > 0$ for $p_{i_0} = x$ close enough to 1 and $p_i = (1-x)/(m-1)$ for $i \neq i_0$.

In order to justify that the condition $\mathrm{Rank}(X_1) = d+1$ is also necessary, we only need to show that $f(\mathbf{p}) \equiv 0$ if $\mathrm{Rank}(X_1) \leq d$. Actually, for any $\tau: \{1, \ldots, d+J-1\} \to \{1, \ldots, m\}$, we construct $A_\tau^{(1)}$ as in the proof of Lemma 6. Then $|A_\tau| = \prod_{s=1}^{d} x_{\tau(s)s} \cdot |A_\tau^{(1)}|$. Similarly as in the proof of Lemma 6, for $t = 1, \ldots, d$, we add $x_{\tau(1)t}(-e_{\tau(1)}, \ldots, -e_{\tau(d)}, c_{\tau(d+1)1}, \ldots, c_{\tau(d+J-1),J-1})^T$ to the $t$th column of $A_\tau^{(1)}$. We denote the resulting matrix by $A_\tau^{(3)}$. Note that $|A_\tau^{(1)}| = |A_\tau^{(3)}|$. We consider the sub-matrix $A_{\tau d}^{(3)}$ which consists of the first $d$ columns of $A_\tau^{(3)}$. For $s \in \tau^{-1}(\tau(1))$, the $s$th row of $A_{\tau d}^{(3)}$ is simply $0$. For $s = 2, \ldots, d$, the $s$th row of $A_{\tau d}^{(3)}$ is $e_{\tau(s)}(x_{\tau(s)1} - x_{\tau(1)1}, \ldots, x_{\tau(s)d} - x_{\tau(1)d})$. For $s = 1, \ldots, J-1$, the $(d+s)$th row of $A_{\tau d}^{(3)}$ is $c_{\tau(d+s)s}(x_{\tau(d+s)1} - x_{\tau(1)1}, \ldots, x_{\tau(d+s)d} - x_{\tau(1)d})$.

We claim that $\mathrm{Rank}(A_{\tau d}^{(3)}) \leq d-1$. Otherwise, if $\mathrm{Rank}(A_{\tau d}^{(3)}) = d$, then there exist $i_1, \ldots, i_d \in \{2, \ldots, d+J-1\}$ such that the sub-matrix consisting of the $i_1$th, ..., $i_d$th rows of $A_{\tau d}^{(3)}$ is nonsingular. Then the sub-matrix consisting of the $\tau(1)$th, $\tau(i_1)$th, ..., $\tau(i_d)$th rows of $X_1$ is nonsingular, which implies $\mathrm{Rank}(X_1) = d+1$. The contradiction implies $\mathrm{Rank}(A_{\tau d}^{(3)}) \leq d-1$. Then $|A_\tau^{(3)}| = 0$ and thus $|A_\tau| = 0$ for each $\tau$. Based on Theorem 2, $|F| \equiv 0$ and thus $f(\mathbf{p}) \equiv 0$. $\Box$

4 Locally D-optimal Approximate Design

A D-optimal approximate design is an allocation $\mathbf{p} = (p_1, \ldots, p_m)^T$ solving the optimization problem (10). The solution always exists since $f$ is continuous and the set of feasible allocations
$$S := \left\{ (p_1, p_2, \ldots, p_m)^T \in \mathbb{R}^m \mid p_i \geq 0,\ i = 1, \ldots, m;\ \sum_{i=1}^{m} p_i = 1 \right\}$$
is convex and compact. Theorem 4 ascertains that a meaningful D-optimal approximate design problem requires the following assumption. We assume that it is true for the rest of the paper.

Assumption 3. $\mathrm{Rank}(X_1) = d+1$.
Under Assumption 3, the set of nontrivial allocations
$$S_+ := \{\mathbf{p} = (p_1, p_2, \ldots, p_m)^T \in S \mid f(\mathbf{p}) > 0\}$$
is nonempty. As discussed in Remark 1, the Fisher information matrix $F = \sum_{i=1}^{m} n_i A_i$ (see Theorem 1) is always positive semi-definite. Note that $f(\mathbf{p}) = n^{1-d-J}|F|$ given $p_i = n_i/n$, $i = 1, \ldots, m$. Since $F = n \sum_{i=1}^{m} p_i A_i$ is linear in $\mathbf{p}$ and $\varphi(\cdot) = \log|\cdot|$ is concave on positive semi-definite matrices, we know that $f(\mathbf{p})$ is log-concave (Silvey, 1980).

Lemma 8. $F = F(\mathbf{p})$ is always positive semi-definite. It is positive definite if and only if $\mathbf{p} \in S_+$. Furthermore, $\log f(\mathbf{p})$ is concave on $S$.

Lemma 8 assures that $S_+$ is convex given that it is nonempty. Following the proof of Theorem 4, we can justify that $S_+$ contains all $\mathbf{p}$ whose coordinates are all strictly positive.

Theorem 5. $f(\mathbf{p}) > 0$ if and only if $\mathrm{Rank}(X_1[\{i \mid p_i > 0\}]) = d+1$, where $\mathbf{p} = (p_1, \ldots, p_m)^T \in S$ and $X_1[\{i \mid p_i > 0\}]$ is the sub-matrix consisting of the $\{i \mid p_i > 0\}$th rows of $X_1$. In other words,
$$S_+ = \left\{ \mathbf{p} = (p_1, p_2, \ldots, p_m)^T \in S \mid \mathrm{Rank}(X_1[\{i \mid p_i > 0\}]) = d+1 \right\}$$

Proof of Theorem 5: Combining Theorem 1 and Theorem 4, it is straightforward that $f(\mathbf{p}) = 0$ if $\mathrm{Rank}(X_1[\{i \mid p_i > 0\}]) \leq d$. We only need to show that $f(\mathbf{p}) > 0$ if $\mathrm{Rank}(X_1[\{i \mid p_i > 0\}]) = d+1$. Due to Theorem 1, we only need to verify the case $p_i > 0$, $i = 1, \ldots, m$ (otherwise, we may simply remove all support points with $p_i = 0$). Suppose $p_i > 0$, $i = 1, \ldots, m$, and $\mathrm{Rank}(X_1) = d+1$. Then there exist $i_0, \ldots, i_d \in \{1, \ldots, m\}$ such that $|X_1[i_0, \ldots, i_d]| \neq 0$. According to the proof of Theorem 4, for each $i \in \{i_0, \ldots, i_d\}$, there exists an $\epsilon_i \in (0, 1)$ such that $f(\mathbf{p}) > 0$ as long as $p_i = x \in (1-\epsilon_i, 1)$ and $p_j = (1-x)/(m-1)$ for $j \neq i$. On the other hand, for each $i \notin \{i_0, \ldots, i_d\}$, if we denote the $j$th row of $X_1$ by $\alpha_j$, $j = 1, \ldots, m$, then $\alpha_i = a_0\alpha_{i_0} + \cdots + a_d\alpha_{i_d}$ for some real numbers $a_0, \ldots, a_d$. Since $\alpha_i \neq 0$, at least one $a_j \neq 0$. Without any loss of generality, we assume $a_0 \neq 0$. Then it can be verified that $|X_1[i, i_1, \ldots, i_d]| \neq 0$ too.
Following the proof of Theorem 4 again, for such an $i \notin \{i_0, \ldots, i_d\}$, there also exists an $\epsilon_i \in (0, 1)$ such that $f(\mathbf{p}) > 0$ as long as $p_i = x \in (1-\epsilon_i, 1)$ and $p_j = (1-x)/(m-1)$ for $j \neq i$. Let $\epsilon^* = \min\{\min_i \epsilon_i,\ (m-1)\min_i p_i,\ 1 - 1/m\}/2$. For $i = 1, \ldots, m$, denote $\delta_i = (\delta_{i1}, \ldots, \delta_{im})^T \in S$ with $\delta_{ii} = 1 - \epsilon^*$ and $\delta_{ij} = \epsilon^*/(m-1)$ for $j \neq i$. It can be verified that $\mathbf{p} = a_1\delta_1 + \cdots + a_m\delta_m$ with $a_i = (p_i - \epsilon^*/(m-1))/(1 - m\epsilon^*/(m-1))$. By the choice of $\epsilon^*$, $f(\delta_i) > 0$, $a_i > 0$, $i = 1, \ldots, m$, and $\sum_i a_i = 1$. Then $f(\mathbf{p}) > 0$ according to Lemma 8. $\Box$
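Theorem 5 reduces positivity of $f(\mathbf{p})$ to a rank check on the support rows of $X_1$. A minimal Python sketch of this check (our own, with a hypothetical numerical tolerance `tol`):

```python
import numpy as np

def is_nondegenerate(X, p, tol=1e-9):
    """f(p) > 0 iff the rows of X1 = (1 X) with p_i > 0 have rank d + 1 (Theorem 5)."""
    X = np.asarray(X, float)
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend the intercept column
    support = np.asarray(p) > tol                   # support points of the allocation
    return np.linalg.matrix_rank(X1[support]) == X.shape[1] + 1

# 2^2 factorial design (d = 2): any d + 1 = 3 support points give rank d + 1
X = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
print(is_nondegenerate(X, [0.45, 0.45, 0.0, 0.10]))   # minimally supported: True
print(is_nondegenerate(X, [0.5, 0.5, 0.0, 0.0]))      # only two points: False
```

The second allocation fails because two rows of $X_1$ can have rank at most 2, below $d+1 = 3$; this matches Theorem 3's lower bound of $d+1$ support points.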
Corollary 1. Under Assumption 3, $f(\mathbf{p}) > 0$ if $\mathbf{p} = (p_1, \ldots, p_m)^T \in S$ satisfies $p_i > 0$, $i = 1, \ldots, m$. As a special case, $f(\mathbf{p}_u) > 0$, where $\mathbf{p}_u = (1/m, \ldots, 1/m)^T$ is the uniform allocation.

Corollary 2. $|F| > 0$ if and only if $\mathrm{Rank}(X_1[\{i \mid n_i > 0\}]) = d+1$.

Since $f(\mathbf{p})$ is log-concave, the Karush-Kuhn-Tucker conditions (Karush (1939); Kuhn and Tucker (1951)) are also sufficient for $\mathbf{p}$ to be D-optimal. We have the following theorem as a direct conclusion.

Theorem 6. Suppose $\mathbf{p}^* = (p_1^*, \ldots, p_m^*)^T \in S_+$. Then $\mathbf{p}^*$ is D-optimal if and only if there exists a $\lambda \in \mathbb{R}$ such that for $i = 1, \ldots, m$, either $\partial f(\mathbf{p}^*)/\partial p_i = \lambda$ if $p_i^* > 0$, or $\partial f(\mathbf{p}^*)/\partial p_i \leq \lambda$ if $p_i^* = 0$.

Theorem 6 provides a Karush-Kuhn-Tucker type condition. It is especially useful for checking whether a minimally supported design is D-optimal (see Section 6). Another necessary and sufficient condition for D-optimal designs is of the general-equivalence-theorem type (Kiefer, 1974; Pukelsheim, 1993; Atkinson et al., 2007; Stufken and Yang, 2012; Fedorov and Leonov, 2014; Yang, Mandal and Majumdar, 2014). It is more convenient when searching for numerical solutions. Following Yang, Mandal and Majumdar (2014), for given $\mathbf{p} = (p_1, \ldots, p_m)^T \in S_+$ and $i \in \{1, \ldots, m\}$, we define
$$f_i(z) = f\left( \frac{1-z}{1-p_i} p_1, \ldots, \frac{1-z}{1-p_i} p_{i-1},\ z,\ \frac{1-z}{1-p_i} p_{i+1}, \ldots, \frac{1-z}{1-p_i} p_m \right) \qquad (11)$$
with $0 \leq z \leq 1$. Note that $f_i(z)$ is well defined as long as $p_i < 1$. Suppose $f(\mathbf{p}) > 0$. Following the proof of Theorem 4, we obtain the following theorem on the coefficients of $f_i(z)$.

Theorem 7. Suppose $\mathbf{p} = (p_1, \ldots, p_m)^T \in S_+$. Given $i \in \{1, \ldots, m\}$, for $0 \leq z \leq 1$,
$$f_i(z) = (1-z)^d \sum_{j=0}^{J-1} a_j z^j (1-z)^{J-1-j} \qquad (12)$$
where $a_0 = f_i(0)$, $(a_{J-1}, \ldots, a_1)^T = B_{J-1}^{-1}\mathbf{c}$, $B_{J-1} = (s^{t-1})_{st}$ is a $(J-1) \times (J-1)$ matrix, and $\mathbf{c} = (c_1, \ldots, c_{J-1})^T$ with $c_j = (j+1)^{d+J-1} j^{-d} f_i\left(\frac{1}{j+1}\right) - j^{J-1} f_i(0)$, $j = 1, \ldots, J-1$.
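The coefficient recovery in Theorem 7 can be verified numerically. In the sketch below (our own; the coefficient values are arbitrary), we build a polynomial of the form (12) with known coefficients, evaluate it at $0, 1/2, \ldots, 1/J$, and recover $(a_{J-1}, \ldots, a_1)$ through the linear system with matrix $B_{J-1}$:

```python
import numpy as np

d, J = 2, 4
rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, J)            # a_0, ..., a_{J-1}, chosen arbitrarily

def f_i(z):
    """Polynomial of form (12) with known coefficients a_j."""
    return (1 - z)**d * sum(a_j * z**j * (1 - z)**(J - 1 - j)
                            for j, a_j in enumerate(a))

# Theorem 7: a_0 = f_i(0), and (a_{J-1}, ..., a_1)^T = B_{J-1}^{-1} c with
# c_j = (j+1)^{d+J-1} j^{-d} f_i(1/(j+1)) - j^{J-1} f_i(0)
B = np.array([[s**(t - 1) for t in range(1, J)] for s in range(1, J)], float)
c = np.array([(j + 1)**(d + J - 1) * float(j)**(-d) * f_i(1 / (j + 1))
              - j**(J - 1) * f_i(0) for j in range(1, J)])
a_rec = np.linalg.solve(B, c)[::-1]     # reversed to the order (a_1, ..., a_{J-1})

assert np.isclose(f_i(0), a[0])
assert np.allclose(a_rec, a[1:])
```

Only $J$ evaluations of $f_i$ (that is, $J$ determinants via (11)) are needed, which is the computational point of Theorem 7.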
According to Theorem 7, $f_i(z)$ is an order-$(d+J-1)$ polynomial of $z$. In order to determine its coefficients $a_0, a_1, \ldots, a_{J-1}$ as in (12), we need to calculate $f_i(0), f_i(1/2), f_i(1/3), \ldots, f_i(1/J)$, which are $J$ determinants defined in (11). Note that $B_{J-1}^{-1}$ is a matrix determined by $J-1$ only. For example, $B_1^{-1} = 1$ for $J = 2$,
$$B_2^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}, \quad B_3^{-1} = \frac{1}{2}\begin{pmatrix} 6 & -6 & 2 \\ -5 & 8 & -3 \\ 1 & -2 & 1 \end{pmatrix}$$
for $J = 3$ or $4$, respectively, and similarly for $J = 5$. Once $a_0, \ldots, a_{J-1}$ in (12) are determined, the maximization of $f_i(z)$ on $z \in [0, 1]$ is numerically straightforward since it is a polynomial and its derivative is given by
$$f_i'(z) = (1-z)^d \sum_{j=1}^{J-1} j a_j z^{j-1} (1-z)^{J-1-j} - (1-z)^{d-1} \sum_{j=0}^{J-1} (d+J-1-j) a_j z^j (1-z)^{J-1-j} \qquad (13)$$
Following the proofs of Theorem 3.1.1 and Theorem 3.3.3 and the lift-one algorithm in Yang, Mandal and Majumdar (2014), we have similar results and an algorithm as follows:

Theorem 8. Suppose $\mathbf{p}^* = (p_1^*, \ldots, p_m^*)^T \in S_+$. Then $\mathbf{p}^*$ is D-optimal if and only if for each $i = 1, \ldots, m$, $f_i(z)$, $0 \leq z \leq 1$, attains its maximum at $z = p_i^*$.

Lift-one algorithm:

1. Start with an arbitrary $\mathbf{p}_0 = (p_1, \ldots, p_m)^T$ satisfying $0 < p_i < 1$, $i = 1, \ldots, m$, and compute $f(\mathbf{p}_0)$.

2. Set up a random order of $i$ going through $\{1, 2, \ldots, m\}$.

3. Following the random order of $i$ in step 2, for each $i$, determine $f_i(z)$ according to Theorem 7. In this step, the $J$ determinants $f_i(0), f_i(1/2), f_i(1/3), \ldots, f_i(1/J)$ are calculated based on (11).

4. Use a quasi-Newton method with the gradient defined in (13) to find $z^*$ maximizing $f_i(z)$ with $0 \leq z \leq 1$. If $f_i(z^*) \leq f_i(0)$, let $z^* = 0$. Define $\mathbf{p}^*_{(i)} = \left( \frac{1-z^*}{1-p_i} p_1, \ldots, \frac{1-z^*}{1-p_i} p_{i-1},\ z^*,\ \frac{1-z^*}{1-p_i} p_{i+1}, \ldots, \frac{1-z^*}{1-p_i} p_m \right)^T$. Note that $f(\mathbf{p}^*_{(i)}) = f_i(z^*)$.
5. Replace $\mathbf{p}_0$ with $\mathbf{p}^*_{(i)}$ and $f(\mathbf{p}_0)$ with $f(\mathbf{p}^*_{(i)})$.

6. Repeat steps 2-5 until convergence, that is, until $f(\mathbf{p}_0) = f(\mathbf{p}^*_{(i)})$ for each $i$.

Theorem 9. When the lift-one algorithm converges, the resulting allocation $\mathbf{p}$ maximizes $f(\mathbf{p})$ on the set of feasible allocations $S$.

Example 4. Odor removal study. The motivating example mentioned in the Introduction is the odor removal study conducted at the University of Georgia. The scientists study the manufacture of bio-plastics from algae that contain odorous volatiles. These odorous volatiles, generated from algae bio-plastics, either occur naturally within the algae or are generated through the thermoplastic processing due to heat and pressure. In order to commercialize these algae bio-plastics, the odor-causing volatiles must be removed. For that purpose, a $2^2$ factorial experiment was conducted using algae and synthetic plastic resin blends. The two factors were type of algae ($X_1$: raffinated or solvent-extracted algae ($-$), catfish pond algae ($+$)) and synthetic resin ($X_2$: polyethylene ($-$), polypropylene ($+$)). The responses had three categories: serious odor ($j = 1$), medium odor ($j = 2$), and almost no odor ($j = 3$). The results of a pilot study with a uniform design and ten replicates at each experimental setting are given in Table 1.

Table 1: Odor Removal Study

Group    $X_1$  $X_2$  Responses $y_{i1}, y_{i2}, y_{i3}$   # of replicates                      Model
$i = 1$   $+$    $+$                                        $n_1 = \sum_j y_{1j} = 10$           $g(\gamma_{1j}) = \theta_j - \beta_1 - \beta_2$
$i = 2$   $+$    $-$                                        $n_2 = \sum_j y_{2j} = 10$           $g(\gamma_{2j}) = \theta_j - \beta_1 + \beta_2$
$i = 3$   $-$    $+$                                        $n_3 = \sum_j y_{3j} = 10$           $g(\gamma_{3j}) = \theta_j + \beta_1 - \beta_2$
$i = 4$   $-$    $-$                                        $n_4 = \sum_j y_{4j} = 10$           $g(\gamma_{4j}) = \theta_j + \beta_1 + \beta_2$

We consider the logit link and fit the cumulative link model presented in Table 1. The estimated values of the model parameters are $(\hat\beta_1, \hat\beta_2, \hat\theta_1, \hat\theta_2) = (-2.45, -1.09, -2.67, -0.21)$. Suppose a follow-up experiment is planned and the estimated parameter values are regarded as the true values. Then the D-optimal approximate allocation found by the lift-one algorithm is $\mathbf{p}_o = (0.4454,\ \cdot\ , 0,\ \cdot\ )^T$.
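The lift-one updates above can be sketched in Python. This is our own simplified illustration, not the authors' implementation: it builds each $A_i$ from Theorem 1 for the logit link and, in place of the polynomial representation (12) and the quasi-Newton step, maximizes $f_i(z)$ over a grid of candidate $z$ values, which is slower but easier to follow. The parameter values are those fitted in Example 4; the row order and sign convention of `X` are our assumptions.

```python
import numpy as np

def A_matrix(theta, beta, x):
    """A_i of Theorem 1 at design point x, logit link."""
    d, J = len(beta), len(theta) + 1
    gamma = 1 / (1 + np.exp(-(np.asarray(theta) - np.asarray(x) @ beta)))
    pi = np.diff(np.concatenate([[0.0], gamma, [1.0]]))       # pi_1, ..., pi_J
    g = np.concatenate([[0.0], gamma * (1 - gamma), [0.0]])   # g_0, ..., g_J
    u = [g[t]**2 * (1/pi[t-1] + 1/pi[t]) for t in range(1, J)]          # (4)
    b = [g[t-1] * g[t] / pi[t-1] for t in range(2, J)]                  # (5)
    c = np.array([g[t] * ((g[t] - g[t-1])/pi[t-1] - (g[t+1] - g[t])/pi[t])
                  for t in range(1, J)])                                # (3)
    e = np.sum(np.diff(g)**2 / pi)                                      # (2)
    A = np.zeros((d + J - 1, d + J - 1))
    A[:d, :d] = e * np.outer(x, x)
    A[:d, d:] = -np.outer(x, c)
    A[d:, :d] = A[:d, d:].T
    A[d:, d:] = np.diag(u) - np.diag(b, 1) - np.diag(b, -1)
    return A

def lift_one(As, n_pass=15, n_grid=201, seed=0):
    m = len(As)
    f = lambda p: np.linalg.det(sum(w * A for w, A in zip(p, As)))
    p = np.full(m, 1.0 / m)                              # step 1: start uniform
    zs = np.linspace(0.0, 1.0 - 1e-9, n_grid)
    rng = np.random.default_rng(seed)
    for _ in range(n_pass):
        for i in rng.permutation(m):                     # step 2
            def f_i(z):                                  # eq. (11)
                q = p * ((1 - z) / (1 - p[i])); q[i] = z
                return f(q)
            cand = np.append(zs, p[i])                   # keep current value reachable
            z_star = cand[np.argmax([f_i(z) for z in cand])]   # steps 3-4 (grid search)
            p = p * ((1 - z_star) / (1 - p[i])); p[i] = z_star  # step 5
    return p / p.sum()

theta, beta = np.array([-2.67, -0.21]), np.array([-2.45, -1.09])
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], float)
As = [A_matrix(theta, beta, x) for x in X]
p_opt = lift_one(As)
```

Because the current $p_i$ is always among the candidates, every update is monotone in $f$, mimicking steps 4-6 of the algorithm; for these parameter values the returned allocation is minimally supported, in line with the $\mathbf{p}_o$ reported above.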
With respect to p_o, the relative efficiency of the uniform approximate allocation p_u = (1/4, 1/4, 1/4, 1/4)^T is (f(p_u)/f(p_o))^{1/4} = 79.6%, which is far from satisfactory.

In all examples that we studied, the lift-one algorithm converged very fast. Nevertheless, Yang, Mandal and Majumdar (2014) also provided a modified lift-one algorithm, which is slightly slower but guaranteed to converge. The same technique could easily be applied to the lift-one algorithm above if it does not converge within a pre-specified number of iterations.

5 Locally D-optimal Exact Design

A locally D-optimal exact design is an integer-valued allocation n = (n_1, ..., n_m)^T maximizing |F| given the total number n of experimental units or runs, where the n_i's are nonnegative integers satisfying Σ_{i=1}^m n_i = n. According to Corollary 2, we must have n ≥ d + 1 in order to make |F| > 0 possible. Thus we assume n ≥ d + 1 in this section to avoid trivial cases.

To maximize f(n) = f(n_1, ..., n_m) = |F|, we adopt the idea of the exchange algorithm, which was first suggested by Fedorov (1972). Following the algorithm described in Yang, Mandal and Majumdar (2014), the exchange algorithm here adjusts n_i and n_j simultaneously for a randomly chosen index pair (i, j) while keeping n_i + n_j = c constant. We start with an n = (n_1, ..., n_m)^T satisfying f(n) > 0; according to Corollary 2, this implies Rank(X_1[{i | n_i > 0}]) = d + 1. Following Yang, Mandal and Majumdar (2014), for 1 ≤ i < j ≤ m, we define

f_ij(z) = f(n_1, ..., n_{i−1}, z, n_{i+1}, ..., n_{j−1}, c − z, n_{j+1}, ..., n_m),   (14)

where c = n_i + n_j and z = 0, 1, ..., c. Note that f_ij(n_i) = f(n). As a conclusion of Theorem 2 and Lemmas 5 and 6, we have the following formula for calculating f_ij(z):

Theorem 10. Suppose n = (n_1, ..., n_m)^T satisfies f(n) > 0. Given 1 ≤ i < j ≤ m, suppose n_i + n_j ≥ J. For z = 0, 1, ..., n_i + n_j,

f_ij(z) = Σ_{s=0}^{J} c_s z^s,   (15)
where c_0 = f_ij(0), and c_1, ..., c_J can be obtained by (c_1, ..., c_J)^T = B_J^{−1}(d_1, ..., d_J)^T, with B_J = (s^{t−1})_{st} a J × J matrix and d_s = (f_ij(s) − f_ij(0))/s, s = 1, ..., J.

Note that the J × J matrix B_J in Theorem 10 shares the same form as B_{J−1} in Theorem 7. According to Theorem 10, in order to maximize f_ij(z) over z = 0, 1, ..., n_i + n_j, one can obtain the exact polynomial form of f_ij(z) by calculating f_ij(0), f_ij(1), ..., f_ij(J). There is no practical need to find the exact form of f_ij(z) if n_i + n_j < J, since one may simply calculate f_ij(z) for each z = 0, 1, ..., n_i + n_j. Following Yang, Mandal and Majumdar (2014), the algorithm below, based on Theorem 10, can be used to find a D-optimal exact allocation.

Exchange algorithm for a D-optimal allocation (n_1, ..., n_m)^T given n > 0:

1. Start with an initial design n = (n_1, ..., n_m)^T such that f(n) > 0.

2. Set up a random order of (i, j) going through all pairs {(1, 2), (1, 3), ..., (1, m), (2, 3), ..., (m − 1, m)}.

3. For each (i, j), let c = n_i + n_j. If c = 0, let n_ij* = n. Otherwise, there are two cases. Case one: 0 < c ≤ J; we calculate f_ij(z) as defined in (14) for z = 0, 1, ..., c directly and find z_* maximizing f_ij(z). Case two: c > J; we first calculate f_ij(z) for z = 0, 1, ..., J; secondly, determine c_0, c_1, ..., c_J in (15) according to Theorem 10; thirdly, calculate f_ij(z) for z = J + 1, ..., c based on (15); fourthly, find z_* maximizing f_ij(z) over z = 0, ..., c. For both cases, we define

n_ij* = (n_1, ..., n_{i−1}, z_*, n_{i+1}, ..., n_{j−1}, c − z_*, n_{j+1}, ..., n_m)^T.

Note that f(n_ij*) = f_ij(z_*) ≥ f(n) > 0. If f(n_ij*) > f(n), replace n with n_ij* and f(n) with f(n_ij*).

4. Repeat steps 2–3 until convergence, that is, until f(n_ij*) = f(n) in step 3 for every (i, j).

Example 4 (continued). Odor Removal Study. Suppose we want to conduct a follow-up experiment with n runs. Using the exchange algorithm described above, we obtain the D-optimal exact designs listed in Table 2.
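The exchange step can be sketched generically as follows. This is a hypothetical simplification, not the paper's implementation: the objective f over integer allocations and the starting design are supplied by the user, pairs are visited in a fixed rather than random order, and every split z = 0, ..., c is enumerated directly instead of using the degree-J polynomial shortcut (15).

```python
from itertools import combinations

def exchange(f, n0):
    """Exchange algorithm: for each pair (i, j), redistribute c = n_i + n_j
    over the two coordinates and keep the best split, until no pair improves."""
    n = list(n0)
    best = f(n)
    assert best > 0, "need a starting design with f(n) > 0"
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(n)), 2):
            c = n[i] + n[j]
            if c == 0:  # nothing to redistribute for this pair
                continue
            for z in range(c + 1):  # evaluate f_ij(z) for z = 0, 1, ..., c
                cand = list(n)
                cand[i], cand[j] = z, c - z
                val = f(cand)
                if val > best:
                    n, best = cand, val
                    improved = True
    return n, best
```

For a toy objective f(n) = n_1 n_2 n_3 (n_1 + n_2 + n_3) with n = 10 runs, the sketch recovers a balanced allocation such as (4, 3, 3), as symmetry suggests.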
It can be seen from the number of iterations that the algorithms for D-optimal exact and approximate designs converge very quickly.

Table 2: D-optimal Exact Designs and Approximate Design for Odor Removal Study (columns: n; the allocations n_1, n_2, n_3, n_4; n^{−4}|F|; number of iterations; time in seconds; the last row gives the approximate allocation p_o)

As expected, the D-optimal exact allocations (n_1, ..., n_4)^T are consistent with the D-optimal approximate allocation p_o = (p_1, ..., p_4)^T (last row of Table 2) for large n. The time costs in seconds (last column of Table 2) are recorded on a PC with a 2GHz CPU and 8GB memory. Suppose we rerun a design with n = 40. With respect to the D-optimal exact design n_o = (18, 11, 0, 11)^T, the relative efficiency of the uniform exact design n_u = (10, 10, 10, 10)^T is only (f(n_u)/f(n_o))^{1/4} = 79.7%.

6 Minimally Supported Designs

A minimally supported design is a design with the minimal number of support/design points while keeping |F| > 0. It is of practical significance since it indicates the minimal number of different experimental settings needed in the experiment. According to Theorem 3, a minimally supported design contains at least d + 1 support points. Note that the minimal number d + 1 does not depend on J and can be strictly smaller than the number of parameters d + J − 1. On the other hand, according to Theorem 4, a minimally supported design can contain exactly d + 1 support points as long as the extended design matrix X_1 = (1 X) is of full rank, that is, Rank(X_1) = d + 1.

Example 5. Suppose J = 2. The multinomial response is actually binomial. In this case, there are d + 1 parameters, θ_1, β_1, ..., β_d. Consider a general link function satisfying Assumptions 1 and 2. For i = 1, ..., m, g_{i0} = g_{i2} = 0, g_{i1} = (g^{−1})′(θ_1 − x_i^T β) > 0, and e_i = u_{i1} = c_{i1} = g_{i1}^2/[π_{i1}(1 − π_{i1})]. Then A_{i3}
in Theorem 1 contains only one entry, u_{i1}, and thus |A_{i3}| = u_{i1}, or simply e_i (Lemma 3 still holds). Assume that the m × d design matrix X satisfies Assumption 3. According to Theorem 2, Lemma 5, Lemma 6, and Lemma 7, for an approximate design p = (p_1, ..., p_m)^T,

f(p) = n^{−(d+1)}|F| = Σ_{1 ≤ i_0 < i_1 < ··· < i_d ≤ m} |X_1[i_0, i_1, ..., i_d]|^2 p_{i_0}e_{i_0} p_{i_1}e_{i_1} ··· p_{i_d}e_{i_d}.   (16)

It can be verified that equation (16) is essentially the same as Lemma 3.1 in Yang and Mandal (2014). According to Theorem 3.2 in Yang and Mandal (2014), a minimally supported design may contain d + 1 support points, and a D-optimal one must keep equal weight 1/(d + 1) on all support points.

For univariate responses (including binomial responses) under generalized linear models, a minimally supported design must keep equal weights on all its support points in order to retain D-optimality (Yang, Mandal and Majumdar, 2014; Yang and Mandal, 2014). However, for multinomial responses with J ≥ 3, this is usually not the case. In the rest of this section, we use the cases of d = 1 and d = 2 as illustrations.

6.1 Minimally supported designs with d = 1 and J ≥ 3

In this subsection, we consider the cases with d = 1 and J ≥ 3. That is, there is only one factor in the experiment, and the response belongs to one of J ≥ 3 categories. The corresponding parameters are β_1 and θ_1, ..., θ_{J−1}. We first set m = 2, that is, a design with only two support points (minimally supported). As a direct conclusion from Theorem 2, Lemma 5, and Lemma 6, we have the following result on the form of |F| for an approximate design p = (p_1, p_2)^T:

Theorem 11. Suppose d = 1, J ≥ 3, and m = 2. The objective function for a D-optimal approximate design is

f(p_1, p_2) = n^{−J}|F| = Σ_{s=1}^{J−1} c_s p_1^{J−s} p_2^s,   (17)

where c_1, ..., c_{J−1} can be obtained by (c_1, ..., c_{J−1})^T = B_{J−1}^{−1}(d_1, ..., d_{J−1})^T, with B_{J−1} = (s^{t−1})_{st} a (J − 1) × (J − 1) matrix and d_s = f(1/(s+1), s/(s+1)) · (s+1)^J/s, s = 1, ..., J − 1.
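The recipe in Theorem 11 — evaluate f at J − 1 designs, then solve a linear system with the matrix B_{J−1} = (s^{t−1})_{st} — is easy to code. A sketch, with a hypothetical objective f(p_1, p_2) supplied by the user and a plain Gaussian elimination standing in for a library solver:

```python
def gauss_solve(B, d):
    """Solve B x = d by Gaussian elimination with partial pivoting."""
    n = len(d)
    M = [row[:] + [di] for row, di in zip(B, d)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            t = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= t * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def recover_coeffs(f, J):
    """Recover c_1, ..., c_{J-1} in f(p1, p2) = sum_s c_s p1^(J-s) p2^s
    from J - 1 evaluations of f, following Theorem 11."""
    d = [f(1.0 / (s + 1), s / (s + 1.0)) * (s + 1) ** J / s for s in range(1, J)]
    B = [[float(s ** t) for t in range(J - 1)] for s in range(1, J)]  # (s^{t-1})_{st}
    return gauss_solve(B, d)
```

With J = 4 and the hypothetical polynomial f(p_1, p_2) = 2p_1^3 p_2 + 5p_1^2 p_2^2 + p_1 p_2^3, the recovered coefficients are (2, 5, 1), as expected.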
Actually, there is another way to calculate c_1, ..., c_{J−1} in equation (17). For example, according to Lemma 7,

c_1 = e_2 (Π_{s=1}^{J−1} g_{1s}^2)(Π_{t=1}^{J} π_{1t}^{−1})(x_1 − x_2)^2,
c_{J−1} = e_1 (Π_{s=1}^{J−1} g_{2s}^2)(Π_{t=1}^{J} π_{2t}^{−1})(x_1 − x_2)^2,

where x_1, x_2 are the two levels of the only factor. Nevertheless, Theorem 11 provides a practically convenient way to find the exact form of the objective function after calculating |F| for J − 1 different designs. The D-optimal problem is then to maximize an order-J polynomial (f(z, 1 − z) for z ∈ [0, 1]), which is numerically straightforward.

As a special case which can be solved explicitly, we set J = 3 and get the following result as a direct conclusion of Theorem 6 and Theorem 11.

Corollary 3. Suppose d = 1, J = 3, and m = 2. The objective function for a D-optimal approximate design is

f(p_1, p_2) = p_1 p_2 (c_1 p_1 + c_2 p_2),   (18)

where c_1 = e_2 g_{11}^2 g_{12}^2 (π_{11}π_{12}π_{13})^{−1}(x_1 − x_2)^2 > 0, c_2 = e_1 g_{21}^2 g_{22}^2 (π_{21}π_{22}π_{23})^{−1}(x_1 − x_2)^2 > 0, and x_1, x_2 are the two levels of the factor. The D-optimal design p_* = (p_1*, p_2*) which maximizes (18) can be obtained as follows:

p_1* = (c_1 − c_2 + √(c_1^2 − c_1c_2 + c_2^2)) / (2c_1 − c_2 + √(c_1^2 − c_1c_2 + c_2^2)),
p_2* = c_1 / (2c_1 − c_2 + √(c_1^2 − c_1c_2 + c_2^2)).   (19)

Furthermore, p_1* = p_2* = 1/2 if and only if c_1 = c_2.

For the case of (d, J, m) = (1, 3, 2), it can be verified that the D-optimal design satisfies p_1* = p_2* = 1/2 if β_1 = 0. However, p_1* ≠ p_2* in general, and p_1* > p_2* if and only if c_1 > c_2, where c_1, c_2 are defined as in Corollary 3. The following result provides a necessary and sufficient condition for a minimally supported design to be D-optimal in the case of d = 1 and J = 3. Its proof is relegated to the supplementary materials.

Corollary 4. Suppose d = 1, J = 3, and m ≥ 3. Let x_1, ..., x_m denote the m distinct levels of the factor.
A minimally supported design p = (p_1, p_2, 0, ..., 0)^T is D-optimal if and only if (1) p_1, p_2 are defined the same as in (19); and (2) for i = 3, ..., m,

s_{i3}(p_1*)^2 + (s_{i5} − 2c_1)p_1*p_2* + (s_{i4} − c_2)(p_2*)^2 ≤ 0,

where c_1, c_2 are the same as in Corollary 3, and

s_{i3} = e_i g_{11}^2 g_{12}^2 (π_{11}π_{12}π_{13})^{−1}(x_1 − x_i)^2 > 0,
s_{i4} = e_i g_{21}^2 g_{22}^2 (π_{21}π_{22}π_{23})^{−1}(x_2 − x_i)^2 > 0,
s_{i5} = e_1(u_{22}u_{i1} + u_{21}u_{i2} − 2b_{22}b_{i2})(x_1 − x_2)(x_1 − x_i) + e_2(u_{12}u_{i1} + u_{11}u_{i2} − 2b_{12}b_{i2})(x_2 − x_1)(x_2 − x_i) + e_i(u_{12}u_{21} + u_{11}u_{22} − 2b_{12}b_{22})(x_i − x_1)(x_i − x_2).
Figure 1: Regions for a two-point design to be D-optimal with d = 1, J = 3, x ∈ {−1, 0, 1}, and the logit link (note that θ_1 < θ_2 is required); panel (a) fixes β = 2, and panel (b) fixes θ_2 = 5.

Example 6. Suppose d = 1, J = 3, and m = 3, with three factor levels {−1, 0, 1}. Under the logit link g(γ) = log(γ/(1 − γ)), there are three parameters β, θ_1, θ_2 satisfying

g(γ_{1j}) = θ_j + β,  g(γ_{2j}) = θ_j,  g(γ_{3j}) = θ_j − β,  j = 1, 2.

It can be verified that the D-optimal design satisfies p_1 = p_3 = 1/2 if β = 0. Figure 1 shows cases with more general parameter values. In Figure 1(a), four regions in the (θ_1, θ_2)-plane are occupied by minimally supported designs (note that θ_1 < θ_2 is required). For example, a region labeled p_2 = 0 indicates that a minimally supported design satisfying p_2 = 0 is D-optimal for such a triple (θ_1, θ_2, β = 2). From Figure 1(b), one can see clearly that a design supported on {−1, 1} (that is, with p_2 = 0) is D-optimal if β is not far away from 0.

6.2 Minimally supported designs with d = 2 and J = 3

In this subsection, we consider experiments with two factors and three categories. The corresponding parameters are β_1, β_2, θ_1, θ_2. For cases with more than three categories, similar conclusions could be obtained accordingly, but with messier notation. According to Theorem 3, a minimally supported design in this case needs three support points, for example, (x_{i1}, x_{i2}), i = 1, 2, 3. Under Assumption 3, |X_1| ≠ 0, where X_1 = (1 X) is defined as in Lemma 7. In this case, X_1 is
a 3 × 3 matrix. Following Theorem 2 and Lemmas 5, 6, and 7, the objective function for a minimally supported design at (d, J, m) = (2, 3, 3) is

f(p_1, p_2, p_3) = |X_1|^2 e_1 e_2 e_3 p_1 p_2 p_3 (w_1 p_1 + w_2 p_2 + w_3 p_3),   (20)

where w_i = e_i^{−1} g_{i1}^2 g_{i2}^2 (π_{i1}π_{i2}π_{i3})^{−1} > 0, i = 1, 2, 3.

We first solve for the D-optimal design p_* = (p_1*, p_2*, p_3*)^T maximizing f(p_1, p_2, p_3) in (20), or equivalently maximizing p_1p_2p_3(p_1w_1 + p_2w_2 + p_3w_3). Since f(p_1, p_2, p_3) = 0 if p_1p_2p_3 = 0, a D-optimal p_* = (p_1*, p_2*, p_3*)^T maximizing f(p_1, p_2, p_3) must satisfy 0 < p_1*, p_2*, p_3* < 1. As a direct conclusion of Theorem 6, a necessary condition for (p_1, p_2, p_3) to maximize f(p_1, p_2, p_3) is

∂f/∂p_1 = ∂f/∂p_2 = ∂f/∂p_3.   (21)

Following Tong et al. (2014), we are able to find analytic solutions maximizing equation (20).

Theorem 12. Without any loss of generality, we assume 0 < w_3 ≤ w_2 ≤ w_1. The D-optimal allocation p_* = (p_1*, p_2*, p_3*)^T maximizing f(p_1, p_2, p_3) in (20) exists and is unique. It satisfies 0 < p_3* ≤ p_2* ≤ p_1* < 1 and can be obtained analytically as follows:

(i) If w_2 = w_3, then p_1* = Δ_1/(4w_1 + Δ_1) and p_2* = p_3* = 2w_1/(4w_1 + Δ_1), where Δ_1 = 2w_1 − 3w_2 + √(4w_1^2 − 4w_1w_2 + 9w_2^2). Note that a special case is p_1* = p_2* = p_3* = 1/3 if w_3 = w_2 = w_1.

(ii) If w_1 = w_2 ≠ w_3, then p_1* = p_2* = Δ_2/[2(Δ_2 + 2w_1)] and p_3* = 2w_1/(Δ_2 + 2w_1), where Δ_2 = 3w_1 − 2w_3 + √(9w_1^2 − 4w_1w_3 + 4w_3^2).

(iii) If 0 < w_3 < w_2 < w_1, then p_1* = y_1/(y_1 + y_2 + 1), p_2* = y_2/(y_1 + y_2 + 1), p_3* = 1/(y_1 + y_2 + 1), where

y_1 = −b_2/3 − (3b_1 − b_2^2)/(3A^{1/3}) + A^{1/3}/3,
y_2 = (w_1 − w_3)y_1 / [(w_2 − w_3) + (w_1 − w_2)y_1],

with A = (9b_1b_2 − 27b_0 − 2b_2^3)/2 + (3^{3/2}/2)(27b_0^2 + 4b_1^3 + 4b_0b_2^3 − 18b_0b_1b_2 − b_1^2b_2^2)^{1/2}, b_i = c_i/c_3, i = 0, 1, 2, and c_0 = w_3(w_2 − w_3) > 0, c_1 = 3w_1w_2 − w_1w_3 − 4w_2w_3 + 2w_3^2 > 0, c_2 = 2w_1^2 − 4w_1w_2 − w_1w_3 + 3w_2w_3, c_3 = w_1(w_2 − w_1) < 0.
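The case-(iii) formulas of Theorem 12 can be evaluated numerically as stated. The sketch below (function name hypothetical) uses the principal complex cube root of A, which also covers the situation where the underlying cubic has three real roots and A is complex; the stationarity condition (21) serves as a built-in check on the result.

```python
import cmath

def min_support_weights(w1, w2, w3):
    """D-optimal weights for a three-point design when 0 < w3 < w2 < w1
    (case (iii) of Theorem 12)."""
    c0 = w3 * (w2 - w3)
    c1 = 3*w1*w2 - w1*w3 - 4*w2*w3 + 2*w3**2
    c2 = 2*w1**2 - 4*w1*w2 - w1*w3 + 3*w2*w3
    c3 = w1 * (w2 - w1)
    b0, b1, b2 = c0 / c3, c1 / c3, c2 / c3
    # Cardano-type expression; cmath keeps the computation valid even when
    # the inner square root is of a negative number
    A = (9*b1*b2 - 27*b0 - 2*b2**3) / 2 + (3 * cmath.sqrt(3) / 2) * cmath.sqrt(
        27*b0**2 + 4*b1**3 + 4*b0*b2**3 - 18*b0*b1*b2 - b1**2 * b2**2)
    r = A ** (1.0 / 3.0)  # principal complex cube root
    y1 = (-b2 / 3 - (3*b1 - b2**2) / (3*r) + r / 3).real
    y2 = (w1 - w3) * y1 / ((w2 - w3) + (w1 - w2) * y1)
    s = y1 + y2 + 1.0
    return y1 / s, y2 / s, 1.0 / s
```

For example, with (w_1, w_2, w_3) = (3, 2, 1) the returned allocation is roughly (0.388, 0.328, 0.284): decreasing in the weights but clearly not uniform, and it equalizes the three partial derivatives in (21).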
The proof of Theorem 12 is relegated to the Appendix. A quick conclusion is that in this case a minimally supported design is usually not uniformly supported.

Corollary 5. Suppose d = 2, J = 3, and m = 3. Then p = (1/3, 1/3, 1/3)^T is D-optimal if and only if w_1 = w_2 = w_3, where w_1, w_2, w_3 are defined as in (20).

Example 7. Suppose d = 2, J = 3, and m = 4. Consider a typical 2 × 2 factorial design problem, that is, the four design points are (x_{i1}, x_{i2}) = (1, 1), (1, −1), (−1, 1), and (−1, −1) for i = 1, 2, 3, 4, respectively. Suppose the link function g is differentiable and strictly monotonic. Define w_i = e_i^{−1} g_{i1}^2 g_{i2}^2 (π_{i1}π_{i2}π_{i3})^{−1}, i = 1, 2, 3, 4.

(i) If β_1 = β_2 = 0, then w_1 = w_2 = w_3 = w_4.
(ii) If β_1 = 0 and β_2 ≠ 0, then w_1 = w_3 and w_2 = w_4, but w_1 ≠ w_2.
(iii) If β_1 ≠ 0 and β_2 = 0, then w_1 = w_2 and w_3 = w_4, but w_1 ≠ w_3.
(iv) If β_1 = β_2 ≠ 0, then w_2 = w_3, but w_1, w_2, w_4 are distinct.
(v) If β_1 = −β_2 ≠ 0, then w_1 = w_4, but w_1, w_2, w_3 are distinct.

Theorem 12 provides analytic forms of minimally supported designs with d = 2 and J = 3. As a direct conclusion of Theorem 6, the following corollary provides a necessary and sufficient condition for a minimally supported design to be D-optimal. Its proof is relegated to the supplementary materials.

Corollary 6. Suppose d = 2, J = 3, and m ≥ 4. Let (x_{i1}, x_{i2}), i = 1, ..., m, be the m distinct level combinations of the two factors. Let X_1 be the m × 3 matrix defined in Lemma 7.
Then a minimally supported design p = (p_1, p_2, p_3, 0, ..., 0)^T is D-optimal if and only if (1) p_1, p_2, p_3 are obtained according to Theorem 12; and (2) for i = 4, ..., m,

|X_1[1, 2, i]|^2 e_1e_2e_i p_1*p_2*(w_1p_1* + w_2p_2*) + |X_1[1, 3, i]|^2 e_1e_3e_i p_1*p_3*(w_1p_1* + w_3p_3*) + |X_1[2, 3, i]|^2 e_2e_3e_i p_2*p_3*(w_2p_2* + w_3p_3*) + D_i p_1*p_2*p_3* ≤ |X_1[1, 2, 3]|^2 e_1e_2e_3 p_2*p_3*(2w_1p_1* + w_2p_2* + w_3p_3*),

where e_j = u_{j1} + u_{j2} − 2b_{j2}, w_j = e_j^{−1} g_{j1}^2 g_{j2}^2 (π_{j1}π_{j2}π_{j3})^{−1}, j = 1, ..., m, and

D_i = Σ_{{j,k,s,t} = {1,2,3,i}} e_j e_k (u_{s1}u_{t2} + u_{s2}u_{t1} − 2b_{s2}b_{t2}) |X_1[j, k, s]| |X_1[j, k, t]|,

with the sum going through (j, k, s, t) = (1, 2, 3, i), (1, 3, 2, i), (1, i, 2, 3), (2, 3, 1, i), (2, i, 1, 3), (3, i, 1, 2).

Figure 2: Boundary lines for a three-point design to be D-optimal with the logit link. The region of (β_1, β_2) for given (θ_1, θ_2) is outside the boundary lines in panel (a); the region of (θ_1, θ_2) (with θ_1 < θ_2) for given (β_1, β_2) is between the boundary lines and the line θ_1 = θ_2 in panel (b).

Example 8. Suppose d = 2, J = 3, and m = 4, with the logit link function. We consider the typical 2 × 2 factorial design problem with four design points (1, 1), (1, −1), (−1, 1), and (−1, −1). According to Theorem 12 and Corollary 6, we can analytically calculate the best three-point design and determine whether or not it is D-optimal. Figure 2 provides the boundary lines of the regions of the parameters (β_1, β_2, θ_1, θ_2) for which the best three-point design is D-optimal. In particular, Figure 2(a) shows the region of (β_1, β_2) for given θ_1, θ_2. It clearly indicates that the best three-point design tends to be D-optimal when the absolute values of β_1 and β_2 are large. The region tends to be larger as the absolute values of θ_1 and θ_2 increase. On the other hand, Figure 2(b) displays the region of (θ_1, θ_2) for given β_1, β_2. The symmetry of the boundary lines about θ_1 + θ_2 = 0 is due to the logit link, which is symmetric about 0. An interesting conclusion based on Corollary 6 is that in this case a three-point design can never be D-optimal if β_1 = 0 or β_2 = 0.
7 EW D-optimal Design

The previous sections mainly focused on locally D-optimal designs, which require assumed parameter values (β_1, ..., β_d, θ_1, ..., θ_{J−1}). In many applications, the experimenter may have little or limited information about the values of the parameters. In this case, Bayes D-optimality (Chaloner and Verdinelli, 1995), which maximizes E(log |F|) given a prior distribution on the parameters, provides a reasonable solution. Here E stands for expectation, and F is the Fisher information matrix. An alternative to the Bayes criterion is EW D-optimality (Yang, Mandal and Majumdar, 2014; Atkinson et al., 2007), which essentially maximizes log |E(F)|. Compared with Bayes D-optimal designs, EW D-optimal designs are much easier to calculate and still highly efficient (Yang, Mandal and Majumdar, 2014).

Based on Theorem 1, an EW D-optimal design, which maximizes |E(F)|, may be viewed as a locally D-optimal design with e_i, c_{it}, u_{it}, and b_{it} replaced by their expectations. After the replacement, Lemma 2 still holds. Therefore, almost all the lemmas, theorems, corollaries, and algorithms in the previous sections can be applied directly to EW D-optimal designs as well. The only exception is due to Lemma 3, which provides the formula for |A_{i3}| in terms of g_{ij} and π_{ij}. In order to find EW D-optimal designs, |A_{i3}| has to be calculated in terms of u_{it} and b_{it}. For example, |A_{i3}| = u_{i1} if J = 2, |A_{i3}| = u_{i1}u_{i2} − b_{i2}^2 if J = 3, and |A_{i3}| = u_{i1}u_{i2}u_{i3} − u_{i1}b_{i3}^2 − u_{i3}b_{i2}^2 if J = 4. Then the formulas involving |A_{i3}| in Lemma 7, c_1, c_2 in Corollary 3, s_{i3}, s_{i4}, s_{i5} in Corollary 4, w_i in (20), and w_j in Corollary 6 need to be written in terms of u_{it} and b_{it} as well. According to Lemma 2, one only needs to calculate E(u_{it}), i = 1, ..., m, t = 1, ..., J − 1, and E(b_{it}), i = 1, ..., m, t = 2, ..., J − 1 (if J ≥ 3). Then E(c_{it}) = E(u_{it}) − E(b_{it}) − E(b_{i,t+1}) and E(e_i) = Σ_{t=1}^{J−1} E(c_{it}).
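The computational difference between the two criteria can be made concrete with a generic sketch. Everything below is hypothetical scaffolding rather than the paper's implementation: F(i, theta) returns the 2 × 2 information-matrix contribution of design point i at parameter theta, and the prior is represented by a fixed list of draws. The EW objective averages the matrices once and takes a single log-determinant, while the Bayes objective needs one log-determinant per prior draw; by concavity of the log-determinant, the EW value is always at least the Bayes value.

```python
import math

def log_det2(M):
    """Log-determinant of a 2x2 positive definite matrix."""
    return math.log(M[0][0] * M[1][1] - M[0][1] * M[1][0])

def weighted_sum(p, mats):
    """sum_i p_i * M_i for 2x2 matrices."""
    return [[sum(pi * M[r][c] for pi, M in zip(p, mats)) for c in range(2)]
            for r in range(2)]

def mean_matrix(mats):
    """Entrywise average of a list of 2x2 matrices."""
    k = len(mats)
    return [[sum(M[r][c] for M in mats) / k for c in range(2)] for r in range(2)]

def ew_objective(p, F, thetas):
    """EW criterion log|E(F)|: average over the prior draws first."""
    Fbar = [mean_matrix([F(i, th) for th in thetas]) for i in range(len(p))]
    return log_det2(weighted_sum(p, Fbar))

def bayes_objective(p, F, thetas):
    """Bayes criterion E(log|F|): one log-determinant per prior draw."""
    vals = [log_det2(weighted_sum(p, [F(i, th) for i in range(len(p))]))
            for th in thetas]
    return sum(vals) / len(vals)
```

A design maximizing the EW objective thus avoids recomputing determinants across the prior, which is the source of the speed advantage reported below.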
After that, we can use the lift-one algorithm in Section 4 or the exchange algorithm in Section 5 to find EW D-optimal designs. We use the odor removal example to illustrate how it works.

Example 4 (continued). Odor Removal Study. Again suppose that we want to conduct a follow-up experiment. Instead of using the assumed parameter values (β_1, β_2, θ_1, θ_2) = (−2.45, 1.09, −2.67, 0.21), suppose we believe that the true values of the parameters satisfy β_1 ∈ [−3, −1], β_2 ∈ [0, 2], θ_1 ∈ [−4, −2], and θ_2 ∈ [−1, 1]. In order to perform Bayes optimality, we assume that the four parameters are independently and uniformly distributed within their intervals. It takes the R function constrOptim 430 seconds to numerically find the Bayes D-optimal allocation p_b = (0.3879, , ,
Statistica Sinica 27 (2017), doi:https://doi.org/10.5705/ss.202016.0210. Also available as arXiv:1502.05990v5 [math.st], 8 Nov 2017.
More informationMIXED MODELS THE GENERAL MIXED MODEL
MIXED MODELS This chapter introduces best linear unbiased prediction (BLUP), a general method for predicting random effects, while Chapter 27 is concerned with the estimation of variances by restricted
More information12 Modelling Binomial Response Data
c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual
More information15-780: LinearProgramming
15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear
More informationDISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS
DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS M. N. ELLINGHAM AND JUSTIN Z. SCHROEDER In memory of Mike Albertson. Abstract. A distinguishing partition for an action of a group Γ on a set
More informationReview. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,
More informationEconometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit
Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit R. G. Pierse 1 Introduction In lecture 5 of last semester s course, we looked at the reasons for including dichotomous variables
More informationANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW
SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationLecture Notes on Game Theory
Lecture Notes on Game Theory Levent Koçkesen Strategic Form Games In this part we will analyze games in which the players choose their actions simultaneously (or without the knowledge of other players
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationBayesian Multivariate Logistic Regression
Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of
More informationRegression models for multivariate ordered responses via the Plackett distribution
Journal of Multivariate Analysis 99 (2008) 2472 2478 www.elsevier.com/locate/jmva Regression models for multivariate ordered responses via the Plackett distribution A. Forcina a,, V. Dardanoni b a Dipartimento
More informationSupport weight enumerators and coset weight distributions of isodual codes
Support weight enumerators and coset weight distributions of isodual codes Olgica Milenkovic Department of Electrical and Computer Engineering University of Colorado, Boulder March 31, 2003 Abstract In
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More information2 Describing Contingency Tables
2 Describing Contingency Tables I. Probability structure of a 2-way contingency table I.1 Contingency Tables X, Y : cat. var. Y usually random (except in a case-control study), response; X can be random
More informationCONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS
Statistica Sinica 23 (2013), 1117-1130 doi:http://dx.doi.org/10.5705/ss.2012.037 CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS Jian-Feng Yang, C. Devon Lin, Peter Z. G. Qian and Dennis K. J.
More informationNow consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.
Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)
More informationOn Multiple-Objective Nonlinear Optimal Designs
On Multiple-Objective Nonlinear Optimal Designs Qianshun Cheng, Dibyen Majumdar, and Min Yang December 1, 2015 Abstract Experiments with multiple objectives form a staple diet of modern scientific research.
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationLECTURE 2 LINEAR REGRESSION MODEL AND OLS
SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another
More informationRelation of Pure Minimum Cost Flow Model to Linear Programming
Appendix A Page 1 Relation of Pure Minimum Cost Flow Model to Linear Programming The Network Model The network pure minimum cost flow model has m nodes. The external flows given by the vector b with m
More informationMoment Aberration Projection for Nonregular Fractional Factorial Designs
Moment Aberration Projection for Nonregular Fractional Factorial Designs Hongquan Xu Department of Statistics University of California Los Angeles, CA 90095-1554 (hqxu@stat.ucla.edu) Lih-Yuan Deng Department
More informationBinary choice 3.3 Maximum likelihood estimation
Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood
More information1 Directional Derivatives and Differentiability
Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=
More informationUNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems
UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationPartition models and cluster processes
and cluster processes and cluster processes With applications to classification Jie Yang Department of Statistics University of Chicago ICM, Madrid, August 26 and cluster processes utline 1 and cluster
More informationOutline of GLMs. Definitions
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More informationReview of Vectors and Matrices
A P P E N D I X D Review of Vectors and Matrices D. VECTORS D.. Definition of a Vector Let p, p, Á, p n be any n real numbers and P an ordered set of these real numbers that is, P = p, p, Á, p n Then P
More informationChapter 1: Linear Programming
Chapter 1: Linear Programming Math 368 c Copyright 2013 R Clark Robinson May 22, 2013 Chapter 1: Linear Programming 1 Max and Min For f : D R n R, f (D) = {f (x) : x D } is set of attainable values of
More informationA strongly polynomial algorithm for linear systems having a binary solution
A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th
More informationA Distributed Newton Method for Network Utility Maximization, II: Convergence
A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility
More informationLatent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data
Journal of Data Science 9(2011), 43-54 Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data Haydar Demirhan Hacettepe University
More informationA Framework for the Construction of Golay Sequences
1 A Framework for the Construction of Golay Sequences Frank Fiedler, Jonathan Jedwab, and Matthew G Parker Abstract In 1999 Davis and Jedwab gave an explicit algebraic normal form for m! h(m+) ordered
More informationTRANSPORTATION PROBLEMS
Chapter 6 TRANSPORTATION PROBLEMS 61 Transportation Model Transportation models deal with the determination of a minimum-cost plan for transporting a commodity from a number of sources to a number of destinations
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationLatent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent
Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationLasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices
Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,
More informationWorst case analysis for a general class of on-line lot-sizing heuristics
Worst case analysis for a general class of on-line lot-sizing heuristics Wilco van den Heuvel a, Albert P.M. Wagelmans a a Econometric Institute and Erasmus Research Institute of Management, Erasmus University
More informationDescribing Contingency tables
Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds
More informationA Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra
International Mathematical Forum, 4, 2009, no. 24, 1157-1171 A Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra Zvi Retchkiman Königsberg Instituto Politécnico Nacional,
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationConcepts and Applications of Stochastically Weighted Stochastic Dominance
Concepts and Applications of Stochastically Weighted Stochastic Dominance Jian Hu Department of Industrial Engineering and Management Sciences Northwestern University jianhu@northwestern.edu Tito Homem-de-Mello
More informationSymmetric Matrices and Eigendecomposition
Symmetric Matrices and Eigendecomposition Robert M. Freund January, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 Symmetric Matrices and Convexity of Quadratic Functions
More informationOn Expected Gaussian Random Determinants
On Expected Gaussian Random Determinants Moo K. Chung 1 Department of Statistics University of Wisconsin-Madison 1210 West Dayton St. Madison, WI 53706 Abstract The expectation of random determinants whose
More informationThe initial involution patterns of permutations
The initial involution patterns of permutations Dongsu Kim Department of Mathematics Korea Advanced Institute of Science and Technology Daejeon 305-701, Korea dskim@math.kaist.ac.kr and Jang Soo Kim Department
More informationOptimal XOR based (2,n)-Visual Cryptography Schemes
Optimal XOR based (2,n)-Visual Cryptography Schemes Feng Liu and ChuanKun Wu State Key Laboratory Of Information Security, Institute of Software Chinese Academy of Sciences, Beijing 0090, China Email:
More information