D-optimal Designs with Ordered Categorical Data


Jie Yang (University of Illinois at Chicago), Liping Tong (Loyola University Chicago), Abhyuday Mandal (University of Georgia)

February 20, 2015

Abstract

We consider D-optimal designs with ordered categorical responses and cumulative link models. In addition to theoretically characterizing locally D-optimal designs, we develop efficient algorithms for obtaining both approximate designs and exact designs. For ordinal data and general link functions, we obtain a simplified structure of the Fisher information matrix and express its determinant as a homogeneous polynomial. For a predetermined set of design points, we derive the necessary and sufficient conditions for an allocation to be locally D-optimal. We prove that the number of support points in a minimally supported design depends only on the number of predictors, which can be much less than the number of parameters in the model. We show that a D-optimal minimally supported allocation in this case is usually not uniform on its support points. We also provide EW D-optimal designs as a highly efficient surrogate to Bayesian D-optimal designs with ordinal data.

Keywords: Approximate design; exact design; multinomial response; cumulative link model; minimally supported design; ordinal data

1 Introduction

We consider optimal experimental designs with ordered categorical responses, or simply ordinal data. Design of experiments with ordinal data has been of

great importance in a rich variety of scientific disciplines, especially when human evaluations are involved (Christensen, 2013). Examples include a wine bitterness study (Randall, 1989), potato pathogen experiments (Omer et al., 2000), a radish seedlings' damping-off study (Krause et al., 2001), a polysilicon deposition study (Wu, 2008), beef cattle research (Osterstock et al., 2010), and a toxicity study (Agresti, 2013). This research is motivated by an odor removal study conducted by the textile engineers at the University of Georgia. The scientists manufacture bio-plastics from algae that contain odorous elements. Following traditional factorial design theory for linear models, a regular $2\times 2$ experiment with an equal number of replicates was used to study the effect of types of algae and synthetic resins in removing the odor, and the response was ordinal in nature: no odor, medium odor, and strong odor. In this paper we identify designs that are significantly more efficient than the one used for this purpose.

For an ordinal response $Y$ with $J$ categories and a set of $d$ predictors $x = (x_1, \ldots, x_d)^T$, the most popular model is the cumulative logit model (also known as the proportional odds model; see Liu and Agresti (2005) for a detailed review). McCullagh (1980) extended the proportional odds model with a more general link function $g$, called the cumulative link model (also known as the ordinal regression model),
$$g(P(Y \le j \mid x)) = \theta_j - \beta^T x, \quad j = 1, \ldots, J-1 \qquad (1)$$
and treated it as a special case of the multivariate generalized linear model. In this paper, we focus on the cumulative link model with a general link.

If there are only two categories ($J = 2$), the cumulative link model (1) is essentially a generalized linear model for binary data (McCullagh and Nelder, 1989; Dobson and Barnett, 2008). For optimal designs under generalized linear models, there is a growing body of literature (see Khuri et al. (2006), Atkinson et al. (2007), Stufken and Yang (2012) and references therein). When $J \ge 3$, the results on optimal designs are meagre and restricted to the logit link (Zocchi and Atkinson, 1999; Perevozskaya et al., 2003) due to the complexity of the Fisher information matrix $F$. In this paper, we obtain a special structure of $F$ (Lemmas 1 and 2) for a general link and reveal that the optimal designs with $J \ge 3$ are quite different from the cases with $J = 2$. We prove that the number of support points of a minimally supported design is $d+1$, which could be much less than the number of parameters $d+J-1$ (Theorems 3 and 4). We also show that the design weights of a minimally

supported design are usually not uniform on its support points when it is optimal (Section 6).

Among various design criteria, D-optimality is the most frequently used one (Zocchi and Atkinson, 1999) and often performs well according to other criteria (Atkinson et al., 2007). Throughout this paper, we focus on the D-criterion. In order to overcome the difficulty due to the dependency of D-optimal designs on the values of unknown parameters, we choose the local optimality approach of Chernoff (1953) with assumed parameter values. In terms of robust designs, we compare Bayesian D-optimal designs (Chaloner and Verdinelli, 1995) with EW D-optimal designs (Atkinson et al., 2007; Yang, Mandal and Majumdar, 2014) for ordinal data. As a surrogate for Bayesian designs, an EW design is much easier to find and retains high efficiency with respect to the Bayesian criterion (Section 7).

In the design literature, one type of experiment deals with quantitative or continuous factors only. Such a design problem includes identification of a set of design points $\{x_i\}_{i=1,\ldots,m}$ and the corresponding weights $\{p_i\}_{i=1,\ldots,m}$ (see, for example, Atkinson et al. (2007) and Stufken and Yang (2012)). For this type of optimal design problem, numerical algorithms are typically used for cases with two or more factors (see, for example, Woods et al. (2006)). Another type of experiment uses qualitative or discrete factors, where the set of design points $\{x_i\}_{i=1,\ldots,m}$ is predetermined and only the weights $\{p_i\}_{i=1,\ldots,m}$ are to be optimized (see, for example, Yang and Mandal (2014)). One connection between the two types of designs is that one can pick grid points of the continuous factors and turn the first type into the second. Tong et al. (2014) made another connection between the optimal designs for discrete factors and continuous factors (see Section 5 of that paper). In this paper, we concentrate on the second type of designs and assume $\{x_i\}_{i=1,\ldots,m}$ is given and fixed.

This paper is organized as follows. In Section 2, we obtain the Fisher information matrix for the cumulative link model with a general link, which generalizes Perevozskaya et al. (2003)'s result for the logit link. Section 3 identifies a necessary and sufficient condition for the Fisher information matrix to be positive definite. In Sections 4 and 5, theoretical results and numerical algorithms for searching for locally D-optimal approximate or exact designs are provided. In Section 6, we identify analytic D-optimal designs for special cases to illustrate that a D-optimal minimally supported design is usually not uniform on its support points. In Section 7, we show by examples that the EW D-optimal design is highly efficient with respect to Bayesian

D-optimality. Beyond the theoretical results provided in this paper, the question that might be asked is whether these results give users any advantage in real experiments. The answer is a definite yes, as demonstrated for the motivating example.

2 Cumulative link model and Fisher information matrix

Suppose there are $m$ ($m \ge 2$) predetermined experimental settings. For the $i$th experimental setting with corresponding covariates or predictors $x_i = (x_{i1}, \ldots, x_{id})^T \in \mathbb{R}^d$ ($d \ge 1$), there are $n_i$ experimental units assigned to it. Among them, the $k$th experimental unit generates a response $V_{ik}$ which belongs to one of $J$ ($J \ge 2$) ordered categories. In many real applications, $V_{i1}, \ldots, V_{in_i}$ are regarded as i.i.d. discrete random variables. Denote $\pi_{ij} = P(V_{ik} = j)$, where $i = 1,\ldots,m$, $j = 1,\ldots,J$, and $k = 1,\ldots,n_i$. Let $Y_{ij} = \#\{k \mid V_{ik} = j\}$ be the number of $V_{ik}$'s falling into the $j$th category. Then $(Y_{i1}, \ldots, Y_{iJ}) \sim \text{Multinomial}(n_i;\, \pi_{i1}, \ldots, \pi_{iJ})$. Throughout this paper, we assume

Assumption 1. $0 < \pi_{ij} < 1$, $i = 1,\ldots,m$; $j = 1,\ldots,J$.

Denote $\gamma_{ij} = P(V_{ik} \le j) = \pi_{i1} + \cdots + \pi_{ij}$, $j = 1,\ldots,J$. Based on Assumption 1, $0 < \gamma_{i1} < \gamma_{i2} < \cdots < \gamma_{i,J-1} < \gamma_{iJ} = 1$ for each $i = 1,\ldots,m$.

Consider independent multinomial observations $(Y_{i1}, \ldots, Y_{iJ})$, $i = 1,\ldots,m$, with corresponding predictors $x_1, \ldots, x_m$. Under a cumulative link model or ordinal regression model (McCullagh, 1980; Agresti, 2013; Christensen, 2013), there exists a link function $g$ and parameters of interest $\theta_1, \ldots, \theta_{J-1}$, $\beta = (\beta_1, \ldots, \beta_d)^T$, such that
$$g(\gamma_{ij}) = \theta_j - x_i^T\beta, \quad j = 1,\ldots,J-1.$$
This leads to $m(J-1)$ equations in $d+J-1$ parameters $(\beta_1, \ldots, \beta_d, \theta_1, \ldots, \theta_{J-1})$. Furthermore, if $g$ is strictly increasing, then $\theta_1 < \theta_2 < \cdots < \theta_{J-1}$ under Assumption 1, which is the case for commonly used link functions including logit ($\log(\gamma/(1-\gamma))$), probit ($\Phi^{-1}(\gamma)$), log-log ($-\log(-\log(\gamma))$), complementary log-log ($\log(-\log(1-\gamma))$), and cauchit ($\tan(\pi(\gamma - 1/2))$) (McCullagh and Nelder, 1989; Christensen, 2013).
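As a small numerical illustration of the model, the sketch below computes the category probabilities $\pi_{ij}$ from $(\beta, \theta)$ under a chosen link; it assumes numpy and scipy are available, the default link is logit (so $g^{-1}$ is the logistic CDF), and the function name is illustrative only. The design points and parameter values are those of the odor removal example reported later in the paper.

```python
import numpy as np
from scipy.stats import logistic  # logistic.cdf is g^{-1} for the logit link

def category_probs(X, beta, theta, inv_link=logistic.cdf):
    """pi_ij under the cumulative link model g(gamma_ij) = theta_j - x_i^T beta.
    X: m x d design matrix; beta: length d; theta: length J-1 (increasing).
    Returns an m x J matrix whose rows sum to 1."""
    m = X.shape[0]
    gamma = inv_link(theta[None, :] - (X @ beta)[:, None])     # m x (J-1)
    # pad with gamma_i0 = 0 and gamma_iJ = 1, then take differences
    gamma = np.hstack([np.zeros((m, 1)), gamma, np.ones((m, 1))])
    return np.diff(gamma, axis=1)                              # pi_ij, m x J

# the 2x2 factorial points and fitted parameters of the odor removal example
X = np.array([[1., 1.], [1., -1.], [-1., 1.], [-1., -1.]])
pi = category_probs(X, beta=np.array([-2.45, 1.09]),
                    theta=np.array([-2.67, -0.21]))
```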

Example 1. Consider the logit link $g(\gamma) = \log(\gamma/(1-\gamma))$ with two factors and three ordered categories. The model consists of $2m$ equations
$$g(\gamma_{ij}) = \theta_j - x_{i1}\beta_1 - x_{i2}\beta_2, \quad i = 1,\ldots,m;\ j = 1, 2$$
and 4 parameters $(\beta_1, \beta_2, \theta_1, \theta_2)$. Under Assumption 1, $\gamma_{i1} < \gamma_{i2}$ and $\theta_1 < \theta_2$ since $g$ is strictly increasing.

Example 2. Suppose the model consists of three covariates $x_1, x_2, x_3$ and a few second-order terms,
$$g(\gamma_{ij}) = \theta_j - x_{i1}\beta_1 - x_{i2}\beta_2 - x_{i3}\beta_3 - x_{i1}x_{i2}\beta_{12} - x_{i1}^2\beta_{11} - x_{i2}^2\beta_{22},$$
where $i = 1,\ldots,m$; $j = 1,\ldots,J-1$. Then $d = 6$.

Since $(Y_{i1},\ldots,Y_{iJ})$, $i = 1,\ldots,m$, are independent, the log-likelihood function (up to a constant) of the cumulative link model is
$$\ell(\beta_1,\ldots,\beta_d,\theta_1,\ldots,\theta_{J-1}) = \sum_{i=1}^m \sum_{j=1}^J Y_{ij}\log(\pi_{ij})$$
where $\pi_{ij} = \gamma_{ij} - \gamma_{i,j-1}$ with $\gamma_{ij} = g^{-1}(\theta_j - x_i^T\beta)$ for $j = 1,\ldots,J-1$ and $\gamma_{i0} = 0$, $\gamma_{iJ} = 1$, $i = 1,\ldots,m$.

Assumption 2. The link function $g$ is differentiable and its derivative $g'$ is always strictly positive.

We keep Assumption 2 throughout the paper; it is satisfied by the logit, probit, log-log, complementary log-log, and cauchit links. Under Assumptions 1 and 2, $g$ is strictly increasing and thus $\theta_1 < \theta_2 < \cdots < \theta_{J-1}$. For $s = 1,\ldots,d$ and $t = 1,\ldots,J-1$,
$$\frac{\partial \ell}{\partial \beta_s} = \sum_{i=1}^m (-x_{is})\left\{ \frac{Y_{i1}}{\pi_{i1}}(g^{-1})'(\theta_1 - x_i^T\beta) + \frac{Y_{i2}}{\pi_{i2}}\left[(g^{-1})'(\theta_2 - x_i^T\beta) - (g^{-1})'(\theta_1 - x_i^T\beta)\right] + \cdots + \frac{Y_{iJ}}{\pi_{iJ}}\left[-(g^{-1})'(\theta_{J-1} - x_i^T\beta)\right]\right\}$$
$$\frac{\partial \ell}{\partial \theta_t} = \sum_{i=1}^m (g^{-1})'(\theta_t - x_i^T\beta)\left(\frac{Y_{it}}{\pi_{it}} - \frac{Y_{i,t+1}}{\pi_{i,t+1}}\right)$$
Since the $Y_{ij}$'s come from multinomial distributions, we know $E(Y_{ij}) = n_i\pi_{ij}$, $E(Y_{ij}^2) = n_i(n_i-1)\pi_{ij}^2 + n_i\pi_{ij}$, and $E(Y_{is}Y_{it}) = n_i(n_i-1)\pi_{is}\pi_{it}$ when $s \ne t$. Then we have the following lemma.

Lemma 1. Let $F = (F_{st})$ be the $(d+J-1)\times(d+J-1)$ Fisher information matrix.

(i) For $1 \le s \le d$, $1 \le t \le d$,
$$F_{st} = E\left(\frac{\partial \ell}{\partial \beta_s}\cdot\frac{\partial \ell}{\partial \beta_t}\right) = \sum_{i=1}^m n_i x_{is}x_{it} \sum_{j=1}^J \frac{(g_{ij} - g_{i,j-1})^2}{\pi_{ij}}$$
where $g_{ij} = (g^{-1})'(\theta_j - x_i^T\beta) > 0$ for $j = 1,\ldots,J-1$ and $g_{i0} = g_{iJ} = 0$.

(ii) For $1 \le s \le d$, $1 \le t \le J-1$,
$$F_{s,d+t} = E\left(\frac{\partial \ell}{\partial \beta_s}\cdot\frac{\partial \ell}{\partial \theta_t}\right) = \sum_{i=1}^m n_i(-x_{is})\,g_{it}\left(\frac{g_{it} - g_{i,t-1}}{\pi_{it}} - \frac{g_{i,t+1} - g_{it}}{\pi_{i,t+1}}\right)$$

(iii) For $1 \le s \le J-1$, $1 \le t \le d$,
$$F_{d+s,t} = E\left(\frac{\partial \ell}{\partial \theta_s}\cdot\frac{\partial \ell}{\partial \beta_t}\right) = \sum_{i=1}^m n_i(-x_{it})\,g_{is}\left(\frac{g_{is} - g_{i,s-1}}{\pi_{is}} - \frac{g_{i,s+1} - g_{is}}{\pi_{i,s+1}}\right)$$

(iv) For $1 \le s \le J-1$, $1 \le t \le J-1$,
$$F_{d+s,d+t} = E\left(\frac{\partial \ell}{\partial \theta_s}\cdot\frac{\partial \ell}{\partial \theta_t}\right) = \begin{cases} \sum_{i=1}^m n_i g_{is}^2\left(\pi_{is}^{-1} + \pi_{i,s+1}^{-1}\right), & \text{if } s = t\\ -\sum_{i=1}^m n_i g_{is}g_{it}\,\pi_{i,s\vee t}^{-1}, & \text{if } |s-t| = 1\\ 0, & \text{if } |s-t| \ge 2 \end{cases}$$
where $s \vee t = \max\{s, t\}$.

Perevozskaya et al. (2003) obtained a detailed form of the Fisher information matrix for the logit link and one predictor. Our expressions here are valid for a fairly general link and $d$ predictors. To simplify the notation, we denote
$$e_i = \sum_{j=1}^J \frac{(g_{ij} - g_{i,j-1})^2}{\pi_{ij}} > 0, \quad i = 1,\ldots,m \qquad (2)$$
$$c_{it} = g_{it}\left(\frac{g_{it} - g_{i,t-1}}{\pi_{it}} - \frac{g_{i,t+1} - g_{it}}{\pi_{i,t+1}}\right), \quad i = 1,\ldots,m;\ t = 1,\ldots,J-1 \qquad (3)$$
$$u_{it} = g_{it}^2\left(\pi_{it}^{-1} + \pi_{i,t+1}^{-1}\right) > 0, \quad i = 1,\ldots,m;\ t = 1,\ldots,J-1 \qquad (4)$$
$$b_{it} = g_{i,t-1}g_{it}\,\pi_{it}^{-1} > 0, \quad i = 1,\ldots,m;\ t = 2,\ldots,J-1 \ (\text{if } J \ge 3) \qquad (5)$$
Note that $g_{ij}$ is defined in Lemma 1 (i). Then we obtain the following lemma, which plays a key role in the calculation of $F$ later on.

Lemma 2. $c_{it} = u_{it} - b_{it} - b_{i,t+1}$, $i = 1,\ldots,m$; $t = 1,\ldots,J-1$, and $e_i = \sum_{t=1}^{J-1} c_{it} = \sum_{t=1}^{J-1}(u_{it} - 2b_{it})$, $i = 1,\ldots,m$, where $b_{i1} = b_{iJ} = 0$ for $i = 1,\ldots,m$.

Example 1 (continued). For the logit link $g$, $g^{-1}(\eta) = e^\eta/(1+e^\eta)$ and $(g^{-1})' = g^{-1}(1 - g^{-1})$. Thus $g_{ij} = (g^{-1})'(\theta_j - x_i^T\beta) = \gamma_{ij}(1-\gamma_{ij})$. With $J = 3$, we have $\pi_{i1} + \pi_{i2} + \pi_{i3} = 1$ for $i = 1,\ldots,m$. Then for $i = 1,\ldots,m$, $g_{i1} = \pi_{i1}(\pi_{i2}+\pi_{i3})$, $g_{i2} = (\pi_{i1}+\pi_{i2})\pi_{i3}$, $b_{i2} = \pi_{i1}\pi_{i3}\pi_{i2}^{-1}(\pi_{i1}+\pi_{i2})(\pi_{i2}+\pi_{i3})$, $u_{i1} = \pi_{i1}\pi_{i2}^{-1}(\pi_{i1}+\pi_{i2})(\pi_{i2}+\pi_{i3})^2$, $u_{i2} = \pi_{i3}\pi_{i2}^{-1}(\pi_{i1}+\pi_{i2})^2(\pi_{i2}+\pi_{i3})$, $c_{i1} = \pi_{i1}(\pi_{i1}+\pi_{i2})(\pi_{i2}+\pi_{i3})$, $c_{i2} = \pi_{i3}(\pi_{i1}+\pi_{i2})(\pi_{i2}+\pi_{i3})$, and $e_i = (\pi_{i1}+\pi_{i2})(\pi_{i1}+\pi_{i3})(\pi_{i2}+\pi_{i3})$.

As a direct conclusion of Lemma 1 and Lemma 2, we obtain the following theorem:

Theorem 1. Under Assumptions 1 and 2, the Fisher information matrix $F$ can be written as
$$F = \sum_{i=1}^m n_i A_i \qquad (6)$$
where the $(d+J-1)\times(d+J-1)$ matrix
$$A_i = \begin{pmatrix} A_{i1} & A_{i2}\\ A_{i2}^T & A_{i3} \end{pmatrix} = \begin{pmatrix} (e_i x_{is}x_{it})_{s=1,\ldots,d;\,t=1,\ldots,d} & (-x_{is}c_{it})_{s=1,\ldots,d;\,t=1,\ldots,J-1}\\ (-c_{is}x_{it})_{s=1,\ldots,J-1;\,t=1,\ldots,d} & A_{i3} \end{pmatrix}$$
and the $(J-1)\times(J-1)$ matrix $A_{i3}$ is symmetric tri-diagonal with diagonal entries $u_{i1},\ldots,u_{i,J-1}$ and off-diagonal entries $-b_{i2},\ldots,-b_{i,J-1}$ for $J \ge 3$. Note that $A_{i3}$ contains only one entry, $u_{i1}$, for $J = 2$. Examples of $A_{i3}$ include
$$(u_{i1}), \quad \begin{pmatrix} u_{i1} & -b_{i2}\\ -b_{i2} & u_{i2} \end{pmatrix}, \quad \begin{pmatrix} u_{i1} & -b_{i2} & 0\\ -b_{i2} & u_{i2} & -b_{i3}\\ 0 & -b_{i3} & u_{i3} \end{pmatrix}, \quad \begin{pmatrix} u_{i1} & -b_{i2} & 0 & 0\\ -b_{i2} & u_{i2} & -b_{i3} & 0\\ 0 & -b_{i3} & u_{i3} & -b_{i4}\\ 0 & 0 & -b_{i4} & u_{i4} \end{pmatrix}$$
for $J = 2, 3, 4$, or 5, respectively.

Remark 1. As an important property of the Fisher information matrix, $F$ is always positive semi-definite (p.s.d.), which implies $|F| \ge 0$. As a special case, $A_i$ can be regarded as the Fisher information matrix at the support point $x_i$. Therefore, $A_i$ is also p.s.d. and $|A_i| \ge 0$ (actually $|A_i| = 0$ according to Lemma 3 in Section 3).
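To make the notation concrete, here is a minimal numerical sketch, assuming numpy: it evaluates the quantities (2)-(5) from the probabilities $\pi_{ij}$ and the derivatives $g_{ij}$, assembles each $A_i$ as in Theorem 1, and sums them into $F$ as in (6). The function names are illustrative only.

```python
import numpy as np

def fisher_pieces(pi, gdot):
    # pi: m x J category probabilities; gdot: m x (J-1) values of
    # g_ij = (g^{-1})'(theta_j - x_i^T beta); for logit, g_ij = gamma_ij(1-gamma_ij)
    m, J = pi.shape
    g = np.hstack([np.zeros((m, 1)), gdot, np.zeros((m, 1))])  # g_i0 = g_iJ = 0
    diff = g[:, 1:] - g[:, :-1]                                # g_ij - g_{i,j-1}
    e = (diff**2 / pi).sum(axis=1)                             # (2)
    r = diff / pi
    c = gdot * (r[:, :-1] - r[:, 1:])                          # (3)
    u = gdot**2 * (1.0/pi[:, :-1] + 1.0/pi[:, 1:])             # (4)
    b = gdot[:, :-1] * gdot[:, 1:] / pi[:, 1:-1]               # (5), t = 2..J-1
    # sanity check via Lemma 2: c_it = u_it - b_it - b_{i,t+1}, b_i1 = b_iJ = 0
    return e, c, u, b

def A_matrix(x_i, e_i, c_i, u_i, b_i):
    # A_i of Theorem 1 at design point x_i (c_i, u_i: length J-1; b_i: J-2)
    d, Jm1 = len(x_i), len(u_i)
    A = np.zeros((d + Jm1, d + Jm1))
    A[:d, :d] = e_i * np.outer(x_i, x_i)       # (e_i x_is x_it)
    A[:d, d:] = -np.outer(x_i, c_i)            # (-x_is c_it)
    A[d:, :d] = A[:d, d:].T
    A[d:, d:] = np.diag(u_i)                   # tri-diagonal A_i3
    k = np.arange(Jm1 - 1)
    A[d + k, d + k + 1] = A[d + k + 1, d + k] = -b_i
    return A

def fisher_info(ns, X, e, c, u, b):
    # F = sum_i n_i A_i as in (6)
    return sum(n * A_matrix(X[i], e[i], c[i], u[i], b[i])
               for i, n in enumerate(ns))
```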

3 Determinant of the Fisher Information Matrix

Among the several criteria for optimal designs, the D-criterion looks for the allocation maximizing $|F|$, the determinant of $F$. A D-optimal design with $m$ predetermined design points $x_1,\ldots,x_m$ could either be an integer-valued allocation $(n_1, n_2, \ldots, n_m)$ maximizing $|F|$ with predetermined $n = \sum_{i=1}^m n_i > 0$, known as an exact design, or a real-valued allocation $(p_1, p_2, \ldots, p_m)$ maximizing $|n^{-1}F|$ with $p_i = n_i/n \ge 0$ and $\sum_{i=1}^m p_i = 1$, known as an approximate design.

To study the structure of $|F|$ as a polynomial function of $(n_1,\ldots,n_m)$, we denote the $(k,l)$th entry of $A_i$ by $a^{(i)}_{kl}$. Given a row map $\tau: \{1, 2, \ldots, d+J-1\} \to \{1,\ldots,m\}$, we define a $(d+J-1)\times(d+J-1)$ matrix $A_\tau = \left(a^{(\tau(k))}_{kl}\right)$ whose $k$th row is given by the $k$th row of $A_{\tau(k)}$. For a power index $(\alpha_1,\ldots,\alpha_m)$ with $\alpha_i \in \{0, 1, \ldots, d+J-1\}$ and $\sum_{i=1}^m \alpha_i = d+J-1$, we write $\tau \sim (\alpha_1,\ldots,\alpha_m)$ if $\alpha_i = \#\{k : \tau(k) = i\}$ for each $i = 1,\ldots,m$. In terms of the construction of $A_\tau$, this says that $\alpha_i$ rows of $A_\tau$ are from the matrix $A_i$.

Theorem 2. The determinant $|F|$ is an order-$(d+J-1)$ homogeneous polynomial of $(n_1,\ldots,n_m)$ and
$$|F| = \sum_{\alpha_1 + \cdots + \alpha_m = d+J-1} c_{\alpha_1,\ldots,\alpha_m}\, n_1^{\alpha_1} \cdots n_m^{\alpha_m}, \quad \text{where } c_{\alpha_1,\ldots,\alpha_m} = \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} |A_\tau| \qquad (7)$$

Proof of Theorem 2: According to the Leibniz formula for the determinant,
$$|F| = \left|\sum_{i=1}^m n_i A_i\right| = \sum_{\sigma \in S_{d+J-1}} \operatorname{sgn}(\sigma) \prod_{k=1}^{d+J-1} \sum_{i=1}^m n_i\, a^{(i)}_{k,\sigma(k)}$$
where $\sigma$ is a permutation of $\{1, 2, \ldots, d+J-1\}$ and $\operatorname{sgn}(\sigma)$ is the sign or signature of $\sigma$.

Therefore,
$$c_{\alpha_1,\ldots,\alpha_m} = \sum_{\sigma \in S_{d+J-1}} \operatorname{sgn}(\sigma) \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} \prod_{k=1}^{d+J-1} a^{(\tau(k))}_{k,\sigma(k)} = \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} \sum_{\sigma \in S_{d+J-1}} \operatorname{sgn}(\sigma) \prod_{k=1}^{d+J-1} a^{(\tau(k))}_{k,\sigma(k)} = \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} |A_\tau| \qquad \square$$

In order to obtain analytic properties of $|F|$, we need the following lemmas, derived from Lemma 2 and Theorem 1 as well as classical matrix theory and mathematical induction. Note that the following Lemma 3 covers Lemma 1 in Perevozskaya et al. (2003) as a special case.

Lemma 3. $\text{Rank}(A_i) = \text{Rank}(A_{i3}) = J-1$. Furthermore, $A_{i3}$ is positive definite and
$$|A_{i3}| = \prod_{s=1}^{J-1} g_{is}^2 \prod_{t=1}^{J} \pi_{it}^{-1} > 0 \qquad (8)$$

Lemma 4. $\text{Rank}((A_{i1}\ A_{i2})) \le 1$, where equality holds if and only if $x_i \ne 0$.

Based on Lemma 3 and Lemma 4, we can obtain the two lemmas below on $c_{\alpha_1,\ldots,\alpha_m}$, which significantly simplify the structure of $|F|$ as a polynomial of $(n_1,\ldots,n_m)$.

Lemma 5. If $\max_{1\le i\le m} \alpha_i \ge J$, then $|A_\tau| = 0$ for any $\tau \sim (\alpha_1,\ldots,\alpha_m)$ and thus $c_{\alpha_1,\ldots,\alpha_m} = 0$.

Proof of Lemma 5: Without any loss of generality, we assume $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_m$. Then $\max_{1\le i\le m} \alpha_i \ge J$ implies $\alpha_1 \ge J$. In this case, for any $\tau \sim (\alpha_1,\ldots,\alpha_m)$, $\tau^{-1}(1) := \{k \mid \tau(k) = 1\} \subseteq \{1,\ldots,d+J-1\}$ and $|\tau^{-1}(1)| = \alpha_1$. If $|\tau^{-1}(1) \cap \{1,\ldots,d\}| \ge 2$, then $|A_\tau| = 0$ due to Lemma 4; otherwise $\{d+1,\ldots,d+J-1\} \subseteq \tau^{-1}(1)$ and thus $|A_\tau| = 0$ due to Lemma 3. Thus $c_{\alpha_1,\ldots,\alpha_m} = 0$ according to (7) in Theorem 2. $\square$

Lemma 6. If $\#\{i : \alpha_i \ge 1\} \le d$, then $|A_\tau| = 0$ for any $\tau \sim (\alpha_1,\ldots,\alpha_m)$ and thus $c_{\alpha_1,\ldots,\alpha_m} = 0$.

Proof of Lemma 6: Without any loss of generality, we assume $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_m$. Then $\#\{i : \alpha_i \ge 1\} \le d$ indicates $\alpha_{d+1} = \cdots = \alpha_m = 0$. Let $\tau: \{1, 2, \ldots, d+J-1\} \to \{1,\ldots,m\}$ satisfy $\tau \sim (\alpha_1,\ldots,\alpha_m)$. Then the $(d+J-1)\times(d+J-1)$ matrix $A_\tau$ can be written as
$$A_\tau = \begin{pmatrix} A_{\tau 1} & A_{\tau 2}\\ A_{\tau 3} & A_{\tau 4} \end{pmatrix} = \begin{pmatrix} (e_{\tau(s)} x_{\tau(s)s} x_{\tau(s)t})_{s=1,\ldots,d;\,t=1,\ldots,d} & (-x_{\tau(s)s} c_{\tau(s)t})_{s=1,\ldots,d;\,t=1,\ldots,J-1}\\ (-c_{\tau(d+s)s} x_{\tau(d+s)t})_{s=1,\ldots,J-1;\,t=1,\ldots,d} & A_{\tau 4} \end{pmatrix}$$
where the $(J-1)\times(J-1)$ matrix $A_{\tau 4}$ is either a single entry $u_{\tau(d+1)1}$ (if $J = 2$) or tri-diagonal with diagonal entries $u_{\tau(d+1)1},\ldots,u_{\tau(d+J-1),J-1}$, upper off-diagonal entries $-b_{\tau(d+1)2},\ldots,-b_{\tau(d+J-2),J-1}$, and lower off-diagonal entries $-b_{\tau(d+2)2},\ldots,-b_{\tau(d+J-1),J-1}$. Note that $A_\tau$ is asymmetric in general.

If $\#\{i : \alpha_i \ge 1\} \le d-1$, then there exists an $i_0$ such that $1 \le i_0 \le d$ and $|\tau^{-1}(i_0) \cap \{1,\ldots,d\}| \ge 2$. In this case, $|A_\tau| = 0$ according to Lemma 4. If $\#\{i : \alpha_i \ge 1\} = d$, we may assume $|\tau^{-1}(i) \cap \{1,\ldots,d\}| = 1$ for $i = 1,\ldots,d$ (otherwise $|A_\tau| = 0$ according to Lemma 4). Suppose $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_k \ge 2 > \alpha_{k+1}$. Then $\{d+1,\ldots,d+J-1\} \subseteq \cup_{i=1}^k \tau^{-1}(i)$ and $\sum_{i=1}^k (\alpha_i - 1) = J-1$. In order to show $|A_\tau| = 0$, we first replace $A_{\tau 1}$ with $A^{(1)}_{\tau 1} = (e_{\tau(s)} x_{\tau(s)t})_{s=1,\ldots,d;\,t=1,\ldots,d}$ and replace $A_{\tau 2}$ with $A^{(1)}_{\tau 2} = (-c_{\tau(s)t})_{s=1,\ldots,d;\,t=1,\ldots,J-1}$. This changes $A_\tau$ into a new matrix $A^{(1)}_\tau$. Note that $|A_\tau| = \prod_{s=1}^d x_{\tau(s)s} \cdot |A^{(1)}_\tau|$. According to Lemma 2, the sum of the columns of $A^{(1)}_{\tau 2}$ is $(-e_{\tau(1)},\ldots,-e_{\tau(d)})^T$, and the elementwise sum of the columns of $A_{\tau 4}$ is $(c_{\tau(d+1)1}, c_{\tau(d+2)2}, \ldots, c_{\tau(d+J-1),J-1})^T$. Secondly, for $t = 1,\ldots,d$, we add $x_{1t}\,(-e_{\tau(1)},\ldots,-e_{\tau(d)}, c_{\tau(d+1)1},\ldots,c_{\tau(d+J-1),J-1})^T$ to the $t$th column of $A^{(1)}_\tau$. We denote the resulting matrix by $A^{(2)}_\tau$. Note that $|A^{(1)}_\tau| = |A^{(2)}_\tau|$. We consider the sub-matrix $A^{(2)}_{\tau d}$ which consists of the first $d$ columns of $A^{(2)}_\tau$. For $s \in \tau^{-1}(1)$, the $s$th row of $A^{(2)}_{\tau d}$ is simply $0$. For $i = 2,\ldots,k$, the $j$th row of $A^{(2)}_{\tau d}$ is proportional to $(x_{i1} - x_{11}, x_{i2} - x_{12}, \ldots, x_{id} - x_{1d})$ if $j \in \tau^{-1}(i)$. Therefore, $\text{Rank}(A^{(2)}_{\tau d}) \le (d+J-1) - \alpha_1 - \sum_{i=2}^k (\alpha_i - 1) = d-1$, which leads to $|A^{(2)}_\tau| = 0$ and thus $|A^{(1)}_\tau| = 0$, $|A_\tau| = 0$. According to (7) in Theorem 2, $c_{\alpha_1,\ldots,\alpha_m} = 0$. $\square$

Example 3. Suppose $d = 2$, $J = 3$ with link function $g$. According to Theorem 2, $|F|$ in this case is an order-4 homogeneous polynomial of $(n_1,\ldots,n_m)$. Due to Lemma 5 and Lemma 6, we can remove all the terms of the form

$n_i^4$, $n_i^3 n_j$, or $n_i^2 n_j^2$ from $|F|$. Therefore,
$$|F| = \sum_{i=1}^m \sum_{j<k,\, j\ne i,\, k\ne i} c_{ijk}\, n_i^2 n_j n_k + \sum_{i<j<k<l} c_{ijkl}\, n_i n_j n_k n_l$$
for some coefficients $c_{ijk}$ and $c_{ijkl}$.

Based on Lemma 5 and Lemma 6, in order to keep $c_{\alpha_1,\ldots,\alpha_m} \ne 0$, the largest possible $\alpha_i$ is $J-1$ and the fewest possible number of positive $\alpha_i$'s is $d+1$. As a direct conclusion of Lemma 6, the following theorem states that a minimally supported design has at least $d+1$ support points. Note that this number could be much less than the number of parameters $d+J-1$.

Theorem 3. $|F| > 0$ only if $m \ge d+1$.

In order to find out when $d+1$ support points are enough for a meaningful design (that is, $|F| > 0$), we study the leading terms of $|F|$ with $\max_{1\le i\le m} \alpha_i = J-1$, that is, $\alpha_{i_0} = J-1$ for some $1 \le i_0 \le m$. Due to Lemma 6 and $\sum_{i=1}^m \alpha_i = d+J-1$, in order to keep $c_{\alpha_1,\ldots,\alpha_m} \ne 0$, there must exist $1 \le i_1 < i_2 < \cdots < i_d \le m$, all different from $i_0$, such that $\alpha_{i_1} = \cdots = \alpha_{i_d} = 1$. The following lemma provides the explicit formula for such a coefficient $c_{\alpha_1,\ldots,\alpha_m}$.

Lemma 7. Suppose $\alpha_{i_0} = J-1$ and $\alpha_{i_1} = \cdots = \alpha_{i_d} = 1$, where $i_0, i_1, \ldots, i_d$ are distinct integers in $\{1,\ldots,m\}$. Then
$$c_{\alpha_1,\ldots,\alpha_m} = \prod_{s=1}^d e_{i_s} \cdot |A_{i_0 3}| \cdot |X_1[i_0, i_1, \ldots, i_d]|^2$$
where $e_{i_s}$ is defined by (2), $|A_{i_0 3}|$ can be calculated by (8), $X_1 = (\mathbf{1}\ X)$ is an $m\times(d+1)$ matrix with $\mathbf{1} = (1,\ldots,1)^T$ and $X = (x_1,\ldots,x_m)^T$, and $X_1[i_0, i_1, \ldots, i_d]$ is the sub-matrix consisting of the $i_0$th, $i_1$th, ..., $i_d$th rows of $X_1$.

The proof of Lemma 7 is relegated to the Appendix. For the purpose of finding D-optimal allocations, we write $|F| = f(n_1,\ldots,n_m)$ for an order-$(d+J-1)$ homogeneous polynomial function $f$. The D-optimal exact design problem is to solve the following integer-valued optimization problem given a positive

integer $n$:
$$\max f(n_1, n_2, \ldots, n_m) \quad \text{subject to } n_i \in \{0, 1, \ldots, n\},\ i = 1,\ldots,m;\ n_1 + n_2 + \cdots + n_m = n \qquad (9)$$
Denote $p_i = n_i/n$, $i = 1,\ldots,m$. According to Theorem 1,
$$f(n_1,\ldots,n_m) = \left|\sum_{i=1}^m n_i A_i\right| = \left|n \sum_{i=1}^m p_i A_i\right| = n^{d+J-1}\left|\sum_{i=1}^m p_i A_i\right| = n^{d+J-1} f(p_1,\ldots,p_m)$$
Therefore, the D-optimal approximate design problem is to solve the real-valued optimization problem
$$\max f(p_1, p_2, \ldots, p_m) \quad \text{subject to } 0 \le p_i \le 1,\ i = 1,\ldots,m;\ p_1 + p_2 + \cdots + p_m = 1 \qquad (10)$$

According to Lemma 3, $|A_{i_0 3}| > 0$. Thus $c_{\alpha_1,\ldots,\alpha_m}$ in Lemma 7 is positive as long as $X_1[i_0,\ldots,i_d]$ is of full rank. Theorem 3 implies that a minimally supported design contains at least $d+1$ support points, while the following theorem states a necessary and sufficient condition for the minimum number of support points to be exactly $d+1$. Recall that $X_1 = (\mathbf{1}\ X)$ is defined in Lemma 7.

Theorem 4. $f(\mathbf{p}) > 0$ for some $\mathbf{p} = (p_1,\ldots,p_m)^T$ if and only if $\text{Rank}(X_1) = d+1$.

Proof of Theorem 4: Suppose $\text{Rank}(X_1) = d+1$. Then there exist $i_0,\ldots,i_d \in \{1,\ldots,m\}$ such that $|X_1[i_0, i_1, \ldots, i_d]| \ne 0$. According to Lemma 5, $f(\mathbf{p})$ can be regarded as an order-$(J-1)$ polynomial of $p_{i_0}$. Let $p_{i_0} = x \in (0,1)$ and $p_i = (1-x)/(m-1)$ for $i \ne i_0$. Based on Lemma 7, $f(\mathbf{p})$ can be written as
$$f_{i_0}(x) = a_{J-1} x^{J-1}\left(\frac{1-x}{m-1}\right)^d + a_{J-2} x^{J-2}\left(\frac{1-x}{m-1}\right)^{d+1} + \cdots + a_1 x\left(\frac{1-x}{m-1}\right)^{d+J-2} + a_0\left(\frac{1-x}{m-1}\right)^{d+J-1},$$
where
$$a_{J-1} = |A_{i_0 3}| \sum_{\{i_1,\ldots,i_d\} \subseteq \{1,\ldots,m\}\setminus\{i_0\}} \prod_{s=1}^d e_{i_s}\, |X_1[i_0, i_1, \ldots, i_d]|^2 > 0$$

Therefore, $\lim_{x\to 1} (1-x)^{-d} x^{1-J} f_{i_0}(x) = (m-1)^{-d} a_{J-1} > 0$. That is, $f(\mathbf{p}) > 0$ for $p_{i_0} = x$ close enough to 1 and $p_i = (1-x)/(m-1)$ for $i \ne i_0$.

In order to justify that the condition $\text{Rank}(X_1) = d+1$ is also necessary, we only need to show that $f(\mathbf{p}) \equiv 0$ if $\text{Rank}(X_1) \le d$. Actually, for any $\tau: \{1,\ldots,d+J-1\} \to \{1,\ldots,m\}$, we construct $A^{(1)}_\tau$ as in the proof of Lemma 6. Then $|A_\tau| = \prod_{s=1}^d x_{\tau(s)s} \cdot |A^{(1)}_\tau|$. Similarly as in the proof of Lemma 6, for $t = 1,\ldots,d$, we add $x_{\tau(1)t}\,(-e_{\tau(1)},\ldots,-e_{\tau(d)}, c_{\tau(d+1)1},\ldots,c_{\tau(d+J-1),J-1})^T$ to the $t$th column of $A^{(1)}_\tau$. We denote the resulting matrix by $A^{(3)}_\tau$. Note that $|A^{(1)}_\tau| = |A^{(3)}_\tau|$. We consider the sub-matrix $A^{(3)}_{\tau d}$ which consists of the first $d$ columns of $A^{(3)}_\tau$. For $s \in \tau^{-1}(\tau(1))$, the $s$th row of $A^{(3)}_{\tau d}$ is simply $0$. For $s = 2,\ldots,d$, the $s$th row of $A^{(3)}_{\tau d}$ is $e_{\tau(s)}(x_{\tau(s)1} - x_{\tau(1)1}, \ldots, x_{\tau(s)d} - x_{\tau(1)d})$. For $s = 1,\ldots,J-1$, the $(d+s)$th row of $A^{(3)}_{\tau d}$ is $-c_{\tau(d+s)s}(x_{\tau(d+s)1} - x_{\tau(1)1}, \ldots, x_{\tau(d+s)d} - x_{\tau(1)d})$. We claim that $\text{Rank}(A^{(3)}_{\tau d}) \le d-1$. Otherwise, if $\text{Rank}(A^{(3)}_{\tau d}) = d$, then there exist $i_1,\ldots,i_d \in \{2,\ldots,d+J-1\}$ such that the sub-matrix consisting of the $i_1$th, ..., $i_d$th rows of $A^{(3)}_{\tau d}$ is nonsingular. Then the sub-matrix consisting of the $\tau(1)$th, $\tau(i_1)$th, ..., $\tau(i_d)$th rows of $X_1$ is nonsingular, which implies $\text{Rank}(X_1) = d+1$. The contradiction implies $\text{Rank}(A^{(3)}_{\tau d}) \le d-1$. Then $|A^{(3)}_\tau| = 0$ and thus $|A_\tau| = 0$ for each $\tau$. Based on Theorem 2, $|F| \equiv 0$ and thus $f(\mathbf{p}) \equiv 0$. $\square$

4 Locally D-optimal Approximate Design

A D-optimal approximate design is an allocation $\mathbf{p} = (p_1,\ldots,p_m)^T$ solving the optimization problem (10). The solution always exists since $f$ is continuous and the set of feasible allocations
$$S := \left\{(p_1, p_2, \ldots, p_m)^T \in \mathbb{R}^m \,\middle|\, p_i \ge 0,\ i = 1,\ldots,m;\ \sum_{i=1}^m p_i = 1\right\}$$
is convex and compact. Theorem 4 ascertains that a meaningful D-optimal approximate design problem requires the following assumption, which we assume to be true for the rest of the paper.

Assumption 3. $\text{Rank}(X_1) = d+1$.

Under Assumption 3, the set of nontrivial allocations
$$S_+ := \{\mathbf{p} = (p_1, p_2, \ldots, p_m)^T \in S \mid f(\mathbf{p}) > 0\}$$

is nonempty. As discussed in Remark 1, the Fisher information matrix $F = \sum_{i=1}^m n_i A_i$ (see Theorem 1) is always positive semi-definite. Note that $f(\mathbf{p}) = n^{1-d-J}|F|$ given $p_i = n_i/n$, $i = 1,\ldots,m$. Since $F = n\sum_{i=1}^m p_i A_i$ is linear in $\mathbf{p}$ and $\phi(\cdot) = \log|\cdot|$ is concave on positive semi-definite matrices, we know that $f(\mathbf{p})$ is log-concave (Silvey, 1980).

Lemma 8. $F = F(\mathbf{p})$ is always positive semi-definite. It is positive definite if and only if $\mathbf{p} \in S_+$. Furthermore, $\log f(\mathbf{p})$ is concave on $S$.

Lemma 8 assures that $S_+$ is convex given that it is nonempty. Following the proof of Theorem 4, we can justify that $S_+$ contains all $\mathbf{p}$ whose coordinates are all strictly positive.

Theorem 5. $f(\mathbf{p}) > 0$ if and only if $\text{Rank}(X_1[\{i \mid p_i > 0\}]) = d+1$, where $\mathbf{p} = (p_1,\ldots,p_m)^T \in S$ and $X_1[\{i \mid p_i > 0\}]$ is the sub-matrix consisting of the $\{i \mid p_i > 0\}$th rows of $X_1$. In other words,
$$S_+ = \left\{\mathbf{p} = (p_1, p_2, \ldots, p_m)^T \in S \mid \text{Rank}(X_1[\{i \mid p_i > 0\}]) = d+1\right\}$$

Proof of Theorem 5: Combining Theorem 1 and Theorem 4, it is straightforward that $f(\mathbf{p}) = 0$ if $\text{Rank}(X_1[\{i \mid p_i > 0\}]) \le d$. We only need to show that $f(\mathbf{p}) > 0$ if $\text{Rank}(X_1[\{i \mid p_i > 0\}]) = d+1$. Due to Theorem 1, we only need to verify the case $p_i > 0$, $i = 1,\ldots,m$ (otherwise, we may simply remove all support points with $p_i = 0$). Suppose $p_i > 0$, $i = 1,\ldots,m$, and $\text{Rank}(X_1) = d+1$. Then there exist $i_0,\ldots,i_d \in \{1,\ldots,m\}$ such that $|X_1[i_0,\ldots,i_d]| \ne 0$. According to the proof of Theorem 4, for each $i \in \{i_0,\ldots,i_d\}$, there exists an $\epsilon_i \in (0,1)$ such that $f(\mathbf{p}) > 0$ as long as $p_i = x \in (1-\epsilon_i, 1)$ and $p_j = (1-x)/(m-1)$ for $j \ne i$. On the other hand, for each $i \notin \{i_0,\ldots,i_d\}$, if we denote the $j$th row of $X_1$ by $\alpha_j$, $j = 1,\ldots,m$, then $\alpha_i = a_0\alpha_{i_0} + \cdots + a_d\alpha_{i_d}$ for some real numbers $a_0,\ldots,a_d$. Since $\alpha_i \ne 0$, at least one $a_s \ne 0$. Without any loss of generality, we assume $a_0 \ne 0$. Then it can be verified that $|X_1[i, i_1, \ldots, i_d]| \ne 0$ too. Following the proof of Theorem 4 again, for such an $i \notin \{i_0,\ldots,i_d\}$, there also exists an $\epsilon_i \in (0,1)$ such that $f(\mathbf{p}) > 0$ as long as $p_i = x \in (1-\epsilon_i, 1)$ and $p_j = (1-x)/(m-1)$ for $j \ne i$. Let $\epsilon^* = \min\{\min_i \epsilon_i,\ (m-1)\min_i p_i,\ 1 - 1/m\}/2$. For $i = 1,\ldots,m$, denote $\delta_i = (\delta_{i1},\ldots,\delta_{im})^T \in S$ with $\delta_{ii} = 1 - \epsilon^*$ and $\delta_{ij} = \epsilon^*/(m-1)$ for $j \ne i$. It can be verified that $\mathbf{p} = a_1\delta_1 + \cdots + a_m\delta_m$ with $a_i = (p_i - \epsilon^*/(m-1))/(1 - m\epsilon^*/(m-1))$. By the choice of $\epsilon^*$, $f(\delta_i) > 0$, $a_i > 0$, $i = 1,\ldots,m$, and $\sum_i a_i = 1$. Then $f(\mathbf{p}) > 0$ according to Lemma 8. $\square$
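As a numerical companion to Theorem 5, the following sketch (assuming numpy and reusing the $A_i$ matrices built earlier) evaluates $f(\mathbf{p})$ and checks membership in $S_+$ by the rank condition; the tolerance is an illustrative choice.

```python
import numpy as np

def f_obj(p, As):
    # f(p) = |sum_i p_i A_i|, the objective of (10); As is the list of A_i
    return np.linalg.det(sum(pi * Ai for pi, Ai in zip(p, As)))

def in_S_plus(p, X, tol=1e-12):
    # Theorem 5: f(p) > 0 iff X_1 restricted to the support of p has rank d+1
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    support = np.asarray(p) > tol
    return np.linalg.matrix_rank(X1[support]) == X.shape[1] + 1
```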

Corollary 1. Under Assumption 3, $f(\mathbf{p}) > 0$ if $\mathbf{p} = (p_1,\ldots,p_m)^T \in S$ satisfies $p_i > 0$, $i = 1,\ldots,m$. As a special case, $f(\mathbf{p}_u) > 0$, where $\mathbf{p}_u = (1/m,\ldots,1/m)^T$ is the uniform allocation.

Corollary 2. $|F| > 0$ if and only if $\text{Rank}(X_1[\{i \mid n_i > 0\}]) = d+1$.

Since $f(\mathbf{p})$ is log-concave, the Karush-Kuhn-Tucker conditions (Karush, 1939; Kuhn and Tucker, 1951) are also sufficient for $\mathbf{p}$ to be D-optimal. We have the following theorem as a direct conclusion.

Theorem 6. Suppose $\mathbf{p}^* = (p_1^*,\ldots,p_m^*)^T \in S_+$. Then $\mathbf{p}^*$ is D-optimal if and only if there exists a $\lambda \in \mathbb{R}$ such that for $i = 1,\ldots,m$, either $\partial f(\mathbf{p}^*)/\partial p_i = \lambda$ if $p_i^* > 0$, or $\partial f(\mathbf{p}^*)/\partial p_i \le \lambda$ if $p_i^* = 0$.

Theorem 6 provides a Karush-Kuhn-Tucker type condition. It is especially useful for checking when a minimally supported design is D-optimal (see Section 6). Another necessary and sufficient condition for D-optimal designs is of the general-equivalence-theorem type (Kiefer, 1974; Pukelsheim, 1993; Atkinson et al., 2007; Stufken and Yang, 2012; Fedorov and Leonov, 2014; Yang, Mandal and Majumdar, 2014). It is more convenient when searching for numerical solutions. Following Yang, Mandal and Majumdar (2014), for given $\mathbf{p} = (p_1,\ldots,p_m)^T \in S_+$ and $i \in \{1,\ldots,m\}$, we define
$$f_i(z) = f\left(\frac{1-z}{1-p_i}p_1, \ldots, \frac{1-z}{1-p_i}p_{i-1},\ z,\ \frac{1-z}{1-p_i}p_{i+1}, \ldots, \frac{1-z}{1-p_i}p_m\right) \qquad (11)$$
with $0 \le z \le 1$. Note that $f_i(z)$ is well defined as long as $p_i < 1$. Suppose $f(\mathbf{p}) > 0$. Following the proof of Theorem 4, we obtain the following theorem on the coefficients of $f_i(z)$.

Theorem 7. Suppose $\mathbf{p} = (p_1,\ldots,p_m)^T \in S_+$. Given $i \in \{1,\ldots,m\}$, for $0 \le z \le 1$,
$$f_i(z) = (1-z)^d \sum_{j=0}^{J-1} a_j z^j (1-z)^{J-1-j} \qquad (12)$$
where $a_0 = f_i(0)$, $(a_{J-1},\ldots,a_1)^T = B_{J-1}^{-1}\mathbf{c}$, $B_{J-1} = (s^{t-1})_{st}$ is a $(J-1)\times(J-1)$ matrix, and $\mathbf{c} = (c_1,\ldots,c_{J-1})^T$ with
$$c_j = (j+1)^{d+J-1} j^{-d}\, f_i\!\left(\frac{1}{j+1}\right) - j^{J-1} f_i(0), \quad j = 1,\ldots,J-1.$$

According to Theorem 7, $f_i(z)$ is an order-$(d+J-1)$ polynomial of $z$. In order to determine its coefficients $a_0, a_1, \ldots, a_{J-1}$ as in (12), we need to calculate $f_i(0), f_i(1/2), f_i(1/3), \ldots, f_i(1/J)$, which are $J$ determinants defined in (11). Note that $B_{J-1}^{-1}$ is a matrix determined by $J-1$ only. For example, $B_1^{-1} = 1$ for $J = 2$, and
$$B_2^{-1} = \begin{pmatrix} 2 & -1\\ -1 & 1 \end{pmatrix}, \quad B_3^{-1} = \frac{1}{2}\begin{pmatrix} 6 & -6 & 2\\ -5 & 8 & -3\\ 1 & -2 & 1 \end{pmatrix}, \quad B_4^{-1} = \frac{1}{6}\begin{pmatrix} 24 & -36 & 24 & -6\\ -26 & 57 & -42 & 11\\ 9 & -24 & 21 & -6\\ -1 & 3 & -3 & 1 \end{pmatrix}$$
for $J = 3, 4$, or 5, respectively. Once $a_0, \ldots, a_{J-1}$ in (12) are determined, the maximization of $f_i(z)$ on $z \in [0,1]$ is numerically straightforward since it is a polynomial and its derivative is given by
$$f_i'(z) = (1-z)^d \sum_{j=1}^{J-1} j a_j z^{j-1}(1-z)^{J-1-j} - (1-z)^{d-1}\sum_{j=0}^{J-1} (d+J-1-j)\, a_j z^j (1-z)^{J-1-j} \qquad (13)$$
Following the proofs of Theorem 3.1.1 and Theorem 3.3.3 and the lift-one algorithm in Yang, Mandal and Majumdar (2014), we have similar results and an algorithm as follows:

Theorem 8. Suppose $\mathbf{p}^* = (p_1^*,\ldots,p_m^*)^T \in S_+$. Then $\mathbf{p}^*$ is D-optimal if and only if for each $i = 1,\ldots,m$, $f_i(z)$, $0 \le z \le 1$, attains its maximum at $z = p_i^*$.

Lift-one algorithm:

1. Start with an arbitrary $\mathbf{p}_0 = (p_1,\ldots,p_m)^T$ satisfying $0 < p_i < 1$, $i = 1,\ldots,m$, and compute $f(\mathbf{p}_0)$.

2. Set up a random order of $i$ going through $\{1, 2, \ldots, m\}$.

3. Following the random order of $i$ in step 2, for each $i$, determine $f_i(z)$ according to Theorem 7. In this step, the $J$ determinants $f_i(0), f_i(1/2), f_i(1/3), \ldots, f_i(1/J)$ are calculated based on (11).

4. Use the quasi-Newton method with the gradient defined in (13) to find $z^*$ maximizing $f_i(z)$ with $0 \le z \le 1$. If $f_i(z^*) \le f_i(0)$, let $z^* = 0$. Define
$$\mathbf{p}^*_{(i)} = \left(\frac{1-z^*}{1-p_i}p_1, \ldots, \frac{1-z^*}{1-p_i}p_{i-1},\ z^*,\ \frac{1-z^*}{1-p_i}p_{i+1}, \ldots, \frac{1-z^*}{1-p_i}p_m\right)^T.$$
Note that $f(\mathbf{p}^*_{(i)}) = f_i(z^*)$.

5. Replace $\mathbf{p}_0$ with $\mathbf{p}^*_{(i)}$, and $f(\mathbf{p}_0)$ with $f(\mathbf{p}^*_{(i)})$.

6. Repeat steps 2-5 until convergence, that is, until $f(\mathbf{p}_0) = f(\mathbf{p}^*_{(i)})$ for each $i$.

Theorem 9. When the lift-one algorithm converges, the resulting allocation $\mathbf{p}$ maximizes $f(\mathbf{p})$ on the set of feasible allocations $S$.

Example 4. Odor removal study. The motivating example mentioned in the Introduction is the odor removal study conducted at the University of Georgia. The scientists study the manufacture of bio-plastics from algae that contain odorous volatiles. These odorous volatiles, generated from algae bio-plastics, either occur naturally within the algae or are generated through the thermoplastic processing due to heat and pressure. In order to commercialize these algae bio-plastics, the odor-causing volatiles must be removed. For that purpose, a $2\times 2$ factorial experiment was conducted using algae and synthetic plastic resin blends. The two factors were the type of algae ($X_1$: raffinated or solvent-extracted algae ($-$), catfish pond algae ($+$)) and the synthetic resin ($X_2$: polyethylene ($-$), polypropylene ($+$)). The responses had three categories: serious odor ($j = 1$), medium odor ($j = 2$), and almost no odor ($j = 3$). The results of a pilot study with a uniform design and ten replicates at each experimental setting are given in Table 1.

Table 1: Odor Removal Study

Group    $X_1$   $X_2$   Responses $(y_{i1}, y_{i2}, y_{i3})$   # of replicates                Model
$i = 1$   $+$     $+$     —                                     $n_1 = \sum_j y_{1j} = 10$     $g(\gamma_{1j}) = \theta_j - \beta_1 - \beta_2$
$i = 2$   $+$     $-$     —                                     $n_2 = \sum_j y_{2j} = 10$     $g(\gamma_{2j}) = \theta_j - \beta_1 + \beta_2$
$i = 3$   $-$     $+$     —                                     $n_3 = \sum_j y_{3j} = 10$     $g(\gamma_{3j}) = \theta_j + \beta_1 - \beta_2$
$i = 4$   $-$     $-$     —                                     $n_4 = \sum_j y_{4j} = 10$     $g(\gamma_{4j}) = \theta_j + \beta_1 + \beta_2$

We consider the logit link and fit the cumulative link model presented in Table 1. The estimated values of the model parameters are $(\hat\beta_1, \hat\beta_2, \hat\theta_1, \hat\theta_2) = (-2.45, 1.09, -2.67, -0.21)$. Suppose a follow-up experiment is planned and the estimated parameter values are regarded as the true values. Then the D-optimal approximate allocation found by the lift-one algorithm is $\mathbf{p}_o = (0.4454,\ \cdot\ ,\ 0,\ \cdot\ )^T$.
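A simplified version of the lift-one search is sketched below, assuming numpy and the $A_i$ matrices built earlier; the function name and tolerances are illustrative. Rather than the exact polynomial representation of Theorem 7 with a quasi-Newton step, this sketch maximizes each $f_i(z)$ by direct determinant evaluation on a grid of $z$ values, which is slower but easier to verify.

```python
import numpy as np

def lift_one(As, p=None, grid=1001, tol=1e-10, max_pass=200):
    # As: list of the A_i matrices; returns an approximate D-optimal allocation
    m = len(As)
    p = np.full(m, 1.0/m) if p is None else np.asarray(p, dtype=float)
    f = lambda q: np.linalg.det(sum(qi * Ai for qi, Ai in zip(q, As)))
    zs = np.linspace(0.0, 1.0, grid)[:-1]      # z < 1 keeps (11) well defined
    for _ in range(max_pass):
        improved = False
        for i in np.random.permutation(m):
            if p[i] >= 1.0:
                continue
            # candidate allocations: i-th weight z, the others rescaled as in (11)
            scale = (1.0 - zs[:, None]) / (1.0 - p[i])
            cand = scale * p[None, :]
            cand[:, i] = zs
            vals = np.array([f(q) for q in cand])
            k = int(vals.argmax())
            if vals[k] > f(p) + tol:
                p, improved = cand[k], True
        if not improved:                       # no coordinate improves f: stop
            break
    return p
```

For the odor removal example, running this on the four $A_i$ matrices built from the fitted parameters should reproduce an allocation close to $\mathbf{p}_o$, with the third weight equal to 0.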

With respect to $\mathbf{p}_o$, the relative efficiency of the uniform approximate allocation $\mathbf{p}_u = (1/4, 1/4, 1/4, 1/4)^T$ is $(f(\mathbf{p}_u)/f(\mathbf{p}_o))^{1/4} = 79.6\%$, which is far from satisfactory.

In all examples that we studied, the lift-one algorithm converged very fast. Nevertheless, Yang, Mandal and Majumdar (2014) also provided a modified lift-one algorithm, which is slightly slower but guaranteed to converge. The same technique could easily be applied to the lift-one algorithm above if it does not converge within a pre-specified number of iterations.

5 Locally D-optimal Exact Design

A locally D-optimal exact design is an integer-valued allocation $\mathbf{n} = (n_1,\ldots,n_m)^T$ maximizing $|F|$ given the total number $n$ of experimental units or runs, where the $n_i$'s are nonnegative integers satisfying $\sum_{i=1}^m n_i = n$. According to Corollary 2, we must have $n \ge d+1$ in order to make $|F| > 0$ possible. Thus we assume $n \ge d+1$ in this section to avoid trivial cases.

To maximize $f(\mathbf{n}) = f(n_1,\ldots,n_m) = |F|$, we adopt the idea of the exchange algorithm, which was first suggested by Fedorov (1972). Following the algorithm described in Yang, Mandal and Majumdar (2014), the exchange algorithm here adjusts $n_i$ and $n_j$ simultaneously for a randomly chosen index pair $(i,j)$ while keeping $n_i + n_j = c$ constant. We start with an $\mathbf{n} = (n_1,\ldots,n_m)^T$ satisfying $f(\mathbf{n}) > 0$. According to Corollary 2, this indicates $\text{Rank}(X_1[\{i \mid n_i > 0\}]) = d+1$. Following Yang, Mandal and Majumdar (2014), for $1 \le i < j \le m$, we define
$$f_{ij}(z) = f(n_1, \ldots, n_{i-1},\ z,\ n_{i+1}, \ldots, n_{j-1},\ c-z,\ n_{j+1}, \ldots, n_m) \qquad (14)$$
where $c = n_i + n_j$ and $z = 0, 1, \ldots, c$. Note that $f_{ij}(n_i) = f(\mathbf{n})$. As a conclusion of Theorem 2 and Lemmas 5 and 6, we have the following formula for calculating $f_{ij}(z)$:

Theorem 10. Suppose $\mathbf{n} = (n_1,\ldots,n_m)^T$ satisfies $f(\mathbf{n}) > 0$. Given $1 \le i < j \le m$, suppose $n_i + n_j \ge J$. For $z = 0, 1, \ldots, n_i + n_j$,
$$f_{ij}(z) = \sum_{s=0}^J c_s z^s \qquad (15)$$

where $c_0 = f_{ij}(0)$, and $c_1,\ldots,c_J$ can be obtained via $(c_1,\ldots,c_J)^T = B_J^{-1}(d_1,\ldots,d_J)^T$ with $B_J = (s^{t-1})_{st}$ a $J\times J$ matrix and $d_s = (f_{ij}(s) - f_{ij}(0))/s$, $s = 1,\ldots,J$.

Note that the $J\times J$ matrix $B_J$ in Theorem 10 shares the same form as $B_{J-1}$ in Theorem 7. According to Theorem 10, in order to maximize $f_{ij}(z)$ over $z = 0, 1, \ldots, n_i + n_j$, one can obtain the exact polynomial form of $f_{ij}(z)$ by calculating $f_{ij}(0), f_{ij}(1), \ldots, f_{ij}(J)$. There is no practical need to find the exact form of $f_{ij}(z)$ if $n_i + n_j < J$, since one may simply calculate $f_{ij}(z)$ for each $z = 0, 1, \ldots, n_i + n_j$. Following Yang, Mandal and Majumdar (2014), the algorithm below, based on Theorem 10, can be used to find a D-optimal exact allocation.

Exchange algorithm for a D-optimal allocation $(n_1,\ldots,n_m)^T$ given $n > 0$:

1. Start with an initial design $\mathbf{n} = (n_1,\ldots,n_m)^T$ such that $f(\mathbf{n}) > 0$.

2. Set up a random order of $(i,j)$ going through all pairs $\{(1,2), (1,3), \ldots, (1,m), (2,3), \ldots, (m-1,m)\}$.

3. For each $(i,j)$, let $c = n_i + n_j$. If $c = 0$, let $\mathbf{n}^*_{ij} = \mathbf{n}$. Otherwise, there are two cases. Case one: $0 < c \le J$; we calculate $f_{ij}(z)$ as defined in (14) for $z = 0, 1, \ldots, c$ directly and find $z^*$ which maximizes $f_{ij}(z)$. Case two: $c > J$; we first calculate $f_{ij}(z)$ for $z = 0, 1, \ldots, J$; secondly, determine $c_0, c_1, \ldots, c_J$ in (15) according to Theorem 10; thirdly, calculate $f_{ij}(z)$ for $z = J+1, \ldots, c$ based on (15); fourthly, find $z^*$ maximizing $f_{ij}(z)$ over $z = 0, \ldots, c$. For both cases, we define
$$\mathbf{n}^*_{ij} = (n_1, \ldots, n_{i-1},\ z^*,\ n_{i+1}, \ldots, n_{j-1},\ c - z^*,\ n_{j+1}, \ldots, n_m)^T.$$
Note that $f(\mathbf{n}^*_{ij}) = f_{ij}(z^*) \ge f(\mathbf{n}) > 0$. If $f(\mathbf{n}^*_{ij}) > f(\mathbf{n})$, replace $\mathbf{n}$ with $\mathbf{n}^*_{ij}$, and $f(\mathbf{n})$ with $f(\mathbf{n}^*_{ij})$.

4. Repeat steps 2-3 until convergence, that is, until $f(\mathbf{n}^*_{ij}) = f(\mathbf{n})$ in step 3 for every $(i,j)$.

Example 4: Odor Removal Study (continued). Suppose we want to conduct a follow-up experiment with $n$ runs. Using the exchange algorithm described above, we obtain the D-optimal exact designs listed in Table 2. It can be seen from the number of iterations that the algorithms for D-optimal exact and approximate designs converge very quickly.
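The following sketch, assuming numpy and the same illustrative conventions as before, implements the exchange idea with direct evaluation of $f_{ij}(z)$ for every $z = 0,\ldots,c$ (skipping the polynomial shortcut (15), which only matters for speed); the starting allocation is assumed to satisfy $f(\mathbf{n}) > 0$.

```python
import numpy as np
from itertools import combinations

def exchange(As, n, n0=None, max_pass=100):
    # As: list of the A_i matrices; n: total number of runs
    m = len(As)
    f = lambda counts: np.linalg.det(sum(k * Ai for k, Ai in zip(counts, As)))
    if n0 is None:                       # near-uniform start (assumed feasible)
        n0 = np.full(m, n // m, dtype=int)
        n0[:n % m] += 1
    n0 = np.asarray(n0, dtype=int)
    for _ in range(max_pass):
        improved = False
        for i, j in np.random.permutation(list(combinations(range(m), 2))):
            c = n0[i] + n0[j]
            best_z, best_val = n0[i], f(n0)
            for z in range(c + 1):       # evaluate f_ij(z) of (14) directly
                trial = n0.copy()
                trial[i], trial[j] = z, c - z
                val = f(trial)
                if val > best_val:
                    best_z, best_val, improved = z, val, True
            n0[i], n0[j] = best_z, c - best_z
        if not improved:
            break
    return n0
```

With $n = 40$ and the fitted parameters of Example 4, this search should recover an allocation such as the $(18, 11, 0, 11)^T$ reported below.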

Table 2: D-optimal Exact Designs and Approximate Design for Odor Removal Study (columns: $n$, $n_1$, $n_2$, $n_3$, $n_4$, $n^{-4}|F|$, # iterations, time in sec.; one row per run size $n$, with the approximate allocation $\mathbf{p}_o$ in the last row)

As expected, the D-optimal exact allocations $(n_1,\ldots,n_4)^T$ are consistent with the D-optimal approximate allocation $\mathbf{p}_o = (p_1,\ldots,p_4)^T$ (last row of Table 2) for large $n$. The time costs in seconds (last column of Table 2) were recorded on a PC with a 2GHz CPU and 8GB memory. Suppose we rerun a design with $n = 40$. With respect to the D-optimal exact design $\mathbf{n}_o = (18, 11, 0, 11)^T$, the relative efficiency of the uniform exact design $\mathbf{n}_u = (10, 10, 10, 10)^T$ is only $(f(\mathbf{n}_u)/f(\mathbf{n}_o))^{1/4} = 79.7\%$.

6 Minimally Supported Designs

A minimally supported design is a design with the minimal number of support/design points while keeping $|F| > 0$. It is of practical significance since it indicates the minimal number of distinct experimental settings needed in the experiment. According to Theorem 3, a minimally supported design contains at least $d+1$ support points. Note that the minimal number $d+1$ does not depend on $J$ and could be strictly smaller than the number of parameters $d+J-1$. On the other hand, according to Theorem 4, a minimally supported design can contain exactly $d+1$ support points as long as the extended design matrix $X_1 = (\mathbf{1}\ X)$ is of full rank, that is, $\text{Rank}(X_1) = d+1$.

Example 5. Suppose $J = 2$. The multinomial response is actually binomial. In this case, there are $d+1$ parameters, $\theta_1, \beta_1, \ldots, \beta_d$. Consider a general link function satisfying Assumptions 1 and 2. For $i = 1,\ldots,m$, $g_{i0} = g_{i2} = 0$, $g_{i1} = (g^{-1})'(\theta_1 - x_i^T\beta) > 0$, and $e_i = u_{i1} = c_{i1} = g_{i1}^2/[\pi_{i1}(1-\pi_{i1})]$. Then $A_{i3}$

in Theorem 1 contains only one entry, $u_{i1}$, and thus $|A_{i3}| = u_{i1}$, or simply $e_i$ (Lemma 3 still holds). Assume that the $m\times d$ design matrix $X$ satisfies Assumption 3. According to Theorem 2, Lemma 5, Lemma 6, and Lemma 7, for an approximate design $\mathbf{p} = (p_1,\ldots,p_m)^T$,
$$f(\mathbf{p}) = n^{-(d+1)}|F| = \sum_{1\le i_0 < i_1 < \cdots < i_d \le m} |X_1[i_0, i_1, \ldots, i_d]|^2\, p_{i_0}e_{i_0}\, p_{i_1}e_{i_1} \cdots p_{i_d}e_{i_d} \qquad (16)$$
It can be verified that equation (16) is essentially the same as Lemma 3.1 in Yang and Mandal (2014). According to Theorem 3.2 in Yang and Mandal (2014), a minimally supported design may contain $d+1$ support points, and a D-optimal one must put equal weight $1/(d+1)$ on all of its support points.

For univariate responses (including binomial responses) under generalized linear models, a minimally supported design must put equal weights on all its support points in order to be D-optimal (Yang, Mandal and Majumdar, 2014; Yang and Mandal, 2014). However, for multinomial responses with $J \ge 3$, this is usually not the case. In the remaining part of this section, we use the cases of $d = 1$ and $d = 2$ as illustrations.

6.1 Minimally supported designs with $d = 1$ and $J \ge 3$

In this subsection, we consider the cases with $d = 1$ and $J \ge 3$. That is, there is only one factor in the experiment and the response belongs to one of $J \ge 3$ categories. The corresponding parameters are $\beta_1$ and $\theta_1,\ldots,\theta_{J-1}$. We first set $m = 2$, that is, a design with only two support points (minimally supported). As a direct conclusion of Theorem 2, Lemma 5, and Lemma 6, for an approximate design $\mathbf{p} = (p_1, p_2)^T$, we have the following result on the form of $|F|$:

Theorem 11. Suppose $d = 1$, $J \ge 3$, and $m = 2$. The objective function for a D-optimal approximate design is
$$f(p_1, p_2) = n^{-J}|F| = \sum_{s=1}^{J-1} c_s\, p_1^{J-s} p_2^s \qquad (17)$$
where $c_1,\ldots,c_{J-1}$ can be obtained via $(c_1,\ldots,c_{J-1})^T = B_{J-1}^{-1}(d_1,\ldots,d_{J-1})^T$ with $B_{J-1} = (s^{t-1})_{st}$ a $(J-1)\times(J-1)$ matrix and $d_s = f\left(\frac{1}{s+1}, \frac{s}{s+1}\right)\cdot\frac{(s+1)^J}{s}$, $s = 1,\ldots,J-1$.

Actually, there is another way to calculate $c_1,\ldots,c_{J-1}$ in equation (17). For example, according to Lemma 7,
$$c_1 = e_2 \prod_{s=1}^{J-1} g_{1s}^2 \prod_{t=1}^J \pi_{1t}^{-1}\, (x_1 - x_2)^2, \qquad c_{J-1} = e_1 \prod_{s=1}^{J-1} g_{2s}^2 \prod_{t=1}^J \pi_{2t}^{-1}\, (x_1 - x_2)^2,$$
where $x_1, x_2$ are the two levels of the only factor. Nevertheless, Theorem 11 provides a practically convenient way to find the exact form of the objective function after calculating $|F|$ for $J-1$ different designs. The D-optimal design problem is then to maximize an order-$J$ polynomial ($f(z, 1-z)$ for $z \in [0,1]$), which is numerically straightforward.

As a special case which can be solved explicitly, we set $J = 3$ and get the following result as a direct conclusion of Theorem 6 and Theorem 11.

Corollary 3. Suppose $d = 1$, $J = 3$, and $m = 2$. The objective function for a D-optimal approximate design is
$$f(p_1, p_2) = p_1 p_2 (c_1 p_1 + c_2 p_2) \qquad (18)$$
where $c_1 = e_2\, g_{11}^2 g_{12}^2 (\pi_{11}\pi_{12}\pi_{13})^{-1}(x_1 - x_2)^2 > 0$, $c_2 = e_1\, g_{21}^2 g_{22}^2 (\pi_{21}\pi_{22}\pi_{23})^{-1}(x_1 - x_2)^2 > 0$, and $x_1, x_2$ are the two levels of the factor. The D-optimal design $\mathbf{p}^* = (p_1^*, p_2^*)$ which maximizes (18) can be obtained as follows:
$$p_1^* = \frac{c_1 - c_2 + \sqrt{c_1^2 - c_1c_2 + c_2^2}}{2c_1 - c_2 + \sqrt{c_1^2 - c_1c_2 + c_2^2}}, \qquad p_2^* = \frac{c_1}{2c_1 - c_2 + \sqrt{c_1^2 - c_1c_2 + c_2^2}} \qquad (19)$$
Furthermore, $p_1^* = p_2^* = 1/2$ if and only if $c_1 = c_2$.

For the case of $(d, J, m) = (1, 3, 2)$, it can be verified that the D-optimal design satisfies $p_1^* = p_2^* = 1/2$ if $\beta_1 = 0$. However, $p_1^* \ne p_2^*$ in general, and $p_1^* > p_2^*$ if and only if $c_1 > c_2$, where $c_1, c_2$ are defined as in Corollary 3. The following result provides a necessary and sufficient condition for a minimally supported design to be D-optimal for the case of $d = 1$ and $J = 3$. Its proof is relegated to the supplementary materials.

Corollary 4. Suppose $d = 1$, $J = 3$, and $m \ge 3$. Let $x_1,\ldots,x_m$ denote the $m$ distinct levels of the factor. A minimally supported design $\mathbf{p} = (p_1^*, p_2^*, 0, \ldots, 0)^T$ is D-optimal if and only if (1) $p_1^*, p_2^*$ are defined as in (19); and (2) for $i = 3,\ldots,m$,
$$s_{i3}(p_1^*)^2 + (s_{i5} - 2c_1)\,p_1^*p_2^* + (s_{i4} - c_2)(p_2^*)^2 \le 0,$$
where $c_1, c_2$ are the same as in Corollary 3, $s_{i3} = e_i\, g_{11}^2 g_{12}^2(\pi_{11}\pi_{12}\pi_{13})^{-1}(x_1 - x_i)^2 > 0$, $s_{i4} = e_i\, g_{21}^2 g_{22}^2(\pi_{21}\pi_{22}\pi_{23})^{-1}(x_2 - x_i)^2 > 0$, and
$$s_{i5} = e_1(u_{22}u_{i1} + u_{21}u_{i2} - 2b_{22}b_{i2})(x_1 - x_2)(x_1 - x_i) + e_2(u_{12}u_{i1} + u_{11}u_{i2} - 2b_{12}b_{i2})(x_2 - x_1)(x_2 - x_i) + e_i(u_{12}u_{21} + u_{11}u_{22} - 2b_{12}b_{22})(x_i - x_1)(x_i - x_2).$$
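A small helper, assuming numpy, evaluates the closed-form weights (19); the example values of $c_1, c_2$ are illustrative.

```python
import numpy as np

def two_point_weights(c1, c2):
    # optimal weights (19) for d = 1, J = 3, m = 2, given c_1, c_2 > 0 from (18)
    r = np.sqrt(c1*c1 - c1*c2 + c2*c2)
    denom = 2.0*c1 - c2 + r
    return (c1 - c2 + r) / denom, c1 / denom

print(two_point_weights(1.0, 1.0))   # c_1 = c_2 gives (0.5, 0.5)
print(two_point_weights(3.0, 1.0))   # c_1 > c_2 tilts weight: about (0.608, 0.392)
```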

[Figure 1: Regions for a two-point design to be D-optimal with $d = 1$, $J = 3$, $x \in \{-1, 0, 1\}$, and logit link (note that $\theta_1 < \theta_2$ is required); panel (a) fixes $\beta = -2$ and shows regions in the $(\theta_1, \theta_2)$-plane, panel (b) fixes $\theta_2 = 5$ and shows regions over $\beta$ and $\theta_1$; regions are labeled $p_1 = 0$, $p_2 = 0$, $p_3 = 0$, or $p_1 > 0, p_2 > 0, p_3 > 0$.]

Example 6. Suppose $d = 1$, $J = 3$, and $m = 3$ with three factor levels $\{-1, 0, 1\}$. Under the logit link $g(\gamma) = \log(\gamma/(1-\gamma))$, there are three parameters $\beta, \theta_1, \theta_2$ satisfying
$$g(\gamma_{1j}) = \theta_j + \beta, \quad g(\gamma_{2j}) = \theta_j, \quad g(\gamma_{3j}) = \theta_j - \beta, \quad j = 1, 2.$$
It can be verified that the D-optimal design satisfies $p_1 = p_3 = 1/2$ if $\beta = 0$. Figure 1 shows cases with more general parameter values. In Figure 1(a), four regions in the $(\theta_1, \theta_2)$-plane are occupied by minimally supported designs (note that $\theta_1 < \theta_2$ is required). For example, a region labeled with $p_2 = 0$ indicates that a minimally supported design satisfying $p_2 = 0$ is D-optimal given such a triple $(\theta_1, \theta_2, \beta = -2)$. From Figure 1(b), one can see clearly that a design supported on $\{-1, 1\}$ (that is, $p_2 = 0$) is D-optimal if $\beta$ is not far away from 0.

6.2 Minimally supported designs with $d = 2$ and $J = 3$

In this subsection, we consider experiments with two factors and three categories. The corresponding parameters are $\beta_1, \beta_2, \theta_1, \theta_2$. For cases with more than three categories, similar conclusions could be obtained accordingly, but with messier notation. According to Theorem 3, a minimally supported design in this case needs three support points, say $(x_{i1}, x_{i2})$, $i = 1, 2, 3$. Under Assumption 3, $|X_1| \ne 0$, where $X_1 = (\mathbf{1}\ X)$ is defined as in Lemma 7. In this case, $X_1$ is

a $3\times 3$ matrix. Following Theorem 2 and Lemmas 5, 6, and 7, the objective function for a minimally supported design at $(d, J, m) = (2, 3, 3)$ is
$$f(p_1, p_2, p_3) = |X_1|^2\, e_1 e_2 e_3\, p_1 p_2 p_3\,(w_1 p_1 + w_2 p_2 + w_3 p_3) \qquad (20)$$
where $w_i = e_i^{-1} g_{i1}^2 g_{i2}^2 (\pi_{i1}\pi_{i2}\pi_{i3})^{-1} > 0$, $i = 1, 2, 3$.

We first solve for the D-optimal design $\mathbf{p}^* = (p_1^*, p_2^*, p_3^*)^T$ maximizing $f(p_1, p_2, p_3)$ in (20), or equivalently maximizing $p_1p_2p_3(p_1w_1 + p_2w_2 + p_3w_3)$. Since $f(p_1, p_2, p_3) = 0$ if $p_1p_2p_3 = 0$, a D-optimal $\mathbf{p}^* = (p_1^*, p_2^*, p_3^*)^T$ maximizing $f(p_1, p_2, p_3)$ must satisfy $0 < p_1^*, p_2^*, p_3^* < 1$. As a direct conclusion of Theorem 6, a necessary condition for $(p_1, p_2, p_3)$ to maximize $f(p_1, p_2, p_3)$ is
$$\frac{\partial f}{\partial p_1} = \frac{\partial f}{\partial p_2} = \frac{\partial f}{\partial p_3} \qquad (21)$$
Following Tong et al. (2014), we are able to find analytic solutions maximizing equation (20).

Theorem 12. Without any loss of generality, we assume $0 < w_3 \le w_2 \le w_1$. The D-optimal allocation $\mathbf{p}^* = (p_1^*, p_2^*, p_3^*)^T$ maximizing $f(p_1, p_2, p_3)$ in (20) exists and is unique. It satisfies $0 < p_3^* \le p_2^* \le p_1^* < 1$ and can be obtained analytically as follows:

(i) If $w_2 = w_3$, then $p_1^* = \Delta_1/(4w_1 + \Delta_1)$ and $p_2^* = p_3^* = 2w_1/(4w_1 + \Delta_1)$, where $\Delta_1 = 2w_1 - 3w_2 + \sqrt{4w_1^2 - 4w_1w_2 + 9w_2^2}$. Note that a special case is $p_1^* = p_2^* = p_3^* = 1/3$ if $w_3 = w_2 = w_1$.

(ii) If $w_1 = w_2 \ne w_3$, then $p_1^* = p_2^* = \Delta_2/[2(\Delta_2 + 2w_1)]$ and $p_3^* = 2w_1/(\Delta_2 + 2w_1)$, where $\Delta_2 = 3w_1 - 2w_3 + \sqrt{9w_1^2 - 4w_1w_3 + 4w_3^2}$.

(iii) If $0 < w_3 < w_2 < w_1$, then $p_1^* = y_1/(y_1 + y_2 + 1)$, $p_2^* = y_2/(y_1 + y_2 + 1)$, and $p_3^* = 1/(y_1 + y_2 + 1)$, where
$$y_1 = -\frac{b_2}{3} - \frac{3b_1 - b_2^2}{3A^{1/3}} + \frac{A^{1/3}}{3}, \qquad y_2 = \frac{(w_1 - w_3)\,y_1}{(w_2 - w_3) + (w_1 - w_2)\,y_1}$$
with
$$A = \frac{-2b_2^3 + 9b_1b_2 - 27b_0 + \sqrt{(2b_2^3 - 9b_1b_2 + 27b_0)^2 + 4(3b_1 - b_2^2)^3}}{2},$$
$b_i = c_i/c_3$, $i = 0, 1, 2$, and $c_0 = w_3(w_2 - w_3) > 0$, $c_1 = 3w_1w_2 - w_1w_3 - 4w_2w_3 + 2w_3^2 > 0$, $c_2 = 2w_1^2 - 4w_1w_2 - w_1w_3 + 3w_2w_3$, $c_3 = w_1(w_2 - w_1) < 0$; here $y_1$ is the unique positive root of the cubic $c_3y^3 + c_2y^2 + c_1y + c_0 = 0$.
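Assuming numpy, the following sketch evaluates Theorem 12 numerically; for case (iii) it solves the cubic $c_3y^3 + c_2y^2 + c_1y + c_0 = 0$ with np.roots and picks its unique positive root rather than evaluating the Cardano expression above. The weights are returned in the order of the sorted $w$'s (the without-loss-of-generality ordering $w_1 \ge w_2 \ge w_3$ of the theorem).

```python
import numpy as np

def three_point_weights(w1, w2, w3):
    # optimal weights of Theorem 12 for (d, J, m) = (2, 3, 3)
    w1, w2, w3 = sorted((w1, w2, w3), reverse=True)   # enforce w1 >= w2 >= w3
    if np.isclose(w2, w3):                            # case (i), incl. all equal
        D1 = 2*w1 - 3*w2 + np.sqrt(4*w1**2 - 4*w1*w2 + 9*w2**2)
        return np.array([D1, 2*w1, 2*w1]) / (4*w1 + D1)
    if np.isclose(w1, w2):                            # case (ii)
        D2 = 3*w1 - 2*w3 + np.sqrt(9*w1**2 - 4*w1*w3 + 4*w3**2)
        return np.array([D2/2, D2/2, 2*w1]) / (D2 + 2*w1)
    c3 = w1*(w2 - w1)                                 # case (iii): cubic in y
    c2 = 2*w1**2 - 4*w1*w2 - w1*w3 + 3*w2*w3
    c1 = 3*w1*w2 - w1*w3 - 4*w2*w3 + 2*w3**2
    c0 = w3*(w2 - w3)
    roots = np.roots([c3, c2, c1, c0])
    y1 = max(r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0)
    y2 = (w1 - w3)*y1 / ((w2 - w3) + (w1 - w2)*y1)
    return np.array([y1, y2, 1.0]) / (y1 + y2 + 1.0)

print(three_point_weights(1.0, 1.0, 1.0))  # uniform 1/3 when all w's are equal
print(three_point_weights(3.0, 2.0, 1.0))  # unequal, approx (0.39, 0.33, 0.28)
```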

The proof of Theorem 12 is relegated to the Appendix. A quick conclusion is that in this case a minimally supported design is usually not uniformly weighted.

Corollary 5. Suppose $d = 2$, $J = 3$, and $m = 3$. Then $\mathbf{p} = (1/3, 1/3, 1/3)^T$ is D-optimal if and only if $w_1 = w_2 = w_3$, where $w_1, w_2, w_3$ are defined as in (20).

Example 7. Suppose $d = 2$, $J = 3$, and $m = 4$. Consider a typical $2^2$ factorial design problem, that is, the four design points are $(x_{i1}, x_{i2}) = (1, 1), (1, -1), (-1, 1)$, and $(-1, -1)$ for $i = 1, 2, 3, 4$, respectively. Suppose the link function $g$ is differentiable and strictly monotonic. Define $w_i = e_i^{-1} g_{i1}^2 g_{i2}^2 (\pi_{i1}\pi_{i2}\pi_{i3})^{-1}$, $i = 1, 2, 3, 4$.

(i) If $\beta_1 = \beta_2 = 0$, then $w_1 = w_2 = w_3 = w_4$.
(ii) If $\beta_1 = 0$, $\beta_2 \ne 0$, then $w_1 = w_3$, $w_2 = w_4$, but $w_1 \ne w_2$.
(iii) If $\beta_1 \ne 0$, $\beta_2 = 0$, then $w_1 = w_2$, $w_3 = w_4$, but $w_1 \ne w_3$.
(iv) If $\beta_1 = \beta_2 \ne 0$, then $w_2 = w_3$, but $w_1, w_2, w_4$ are distinct.
(v) If $\beta_1 = -\beta_2 \ne 0$, then $w_1 = w_4$, but $w_1, w_2, w_3$ are distinct.

Theorem 12 provides analytic forms of minimally supported designs with $d = 2$ and $J = 3$. As a direct conclusion of Theorem 6, the following corollary provides a necessary and sufficient condition for a minimally supported design to be D-optimal. Its proof is relegated to the supplementary materials.

Corollary 6. Suppose $d = 2$, $J = 3$, and $m \ge 4$. Let $(x_{i1}, x_{i2})$, $i = 1,\ldots,m$, be the $m$ distinct level combinations of the two factors. Let $X_1$ be the $m\times 3$ matrix defined in Lemma 7. Then a minimally supported design $\mathbf{p} = (p_1^*, p_2^*, p_3^*, 0, \ldots, 0)^T$ is D-optimal if and only if (1) $p_1^*, p_2^*, p_3^*$ are obtained according to Theorem 12; and (2) for $i = 4,\ldots,m$,
$$|X_1[1,2,i]|^2 e_1e_2e_i\, p_1^*p_2^*(w_1p_1^* + w_2p_2^*) + |X_1[1,3,i]|^2 e_1e_3e_i\, p_1^*p_3^*(w_1p_1^* + w_3p_3^*) + |X_1[2,3,i]|^2 e_2e_3e_i\, p_2^*p_3^*(w_2p_2^* + w_3p_3^*) + D_i\, p_1^*p_2^*p_3^* \le |X_1[1,2,3]|^2 e_1e_2e_3\, p_2^*p_3^*(2w_1p_1^* + w_2p_2^* + w_3p_3^*)$$
where $e_j = u_{j1} + u_{j2} - 2b_{j2}$, $w_j = e_j^{-1} g_{j1}^2 g_{j2}^2 (\pi_{j1}\pi_{j2}\pi_{j3})^{-1}$, $j = 1,\ldots,m$, and
$$D_i = \sum_{\{j,k,s,t\} = \{1,2,3,i\}} e_je_k\,(u_{s1}u_{t2} + u_{s2}u_{t1} - 2b_{s2}b_{t2})\, |X_1[j,k,s]|\,|X_1[j,k,t]|$$

with the sum going through $(j,k,s,t) = (1,2,3,i)$, $(1,3,2,i)$, $(1,i,2,3)$, $(2,3,1,i)$, $(2,i,1,3)$, $(3,i,1,2)$.

[Figure 2: Boundary lines for a three-point design to be D-optimal with logit link: the region of $(\beta_1, \beta_2)$ for given $(\theta_1, \theta_2)$ is outside the boundary lines in panel (a); the region of $(\theta_1, \theta_2)$ (with $\theta_1 < \theta_2$) for given $(\beta_1, \beta_2)$ is between the boundary lines and the line $\theta_1 = \theta_2$ in panel (b). Panel (a) shows boundary lines for $(\theta_1, \theta_2) = (1, 2), (3, 5), (0.2, 0.5)$; panel (b) shows boundary lines for several values of $(\beta_1, \beta_2)$, including $(1, 1)$ and $(1, 3)$.]

Example 8. Suppose $d = 2$, $J = 3$, $m = 4$ with the logit link function. We consider the typical $2^2$ factorial design problem with four design points $(1, 1), (1, -1), (-1, 1)$, and $(-1, -1)$. According to Theorem 12 and Corollary 6, we can analytically calculate the best three-point design and determine whether it is D-optimal or not. Figure 2 provides the boundary lines of the regions of parameters $(\beta_1, \beta_2, \theta_1, \theta_2)$ for which the best three-point design is D-optimal. In particular, Figure 2(a) shows the region of $(\beta_1, \beta_2)$ for given $\theta_1, \theta_2$. It clearly indicates that the best three-point design tends to be D-optimal when the absolute values of $\beta_1, \beta_2$ are large. The region tends to be larger as the absolute values of $\theta_1, \theta_2$ increase. On the other hand, Figure 2(b) displays the region of $(\theta_1, \theta_2)$ for given $\beta_1, \beta_2$. The symmetry of the boundary lines about $\theta_1 + \theta_2 = 0$ is due to the logit link, which is symmetric about 0. An interesting conclusion based on Corollary 6 is that in this case a three-point design can never be D-optimal if $\beta_1 = 0$ or $\beta_2 = 0$.
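The $w_i$'s in (20) are easy to compute from the earlier sketches; the check below, assuming numpy/scipy and the illustrative category_probs and fisher_pieces helpers from Section 2, exercises Example 7(iv): with $\beta_1 = \beta_2 \ne 0$, the two middle $w$ values coincide at the $2^2$ points. The parameter values are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import logistic

def w_values(X, beta, theta):
    # w_i = e_i^{-1} g_i1^2 g_i2^2 (pi_i1 pi_i2 pi_i3)^{-1} from (20), logit link
    gamma = logistic.cdf(theta[None, :] - (X @ beta)[:, None])
    pi = category_probs(X, beta, theta)
    gdot = gamma * (1.0 - gamma)               # g_ij for the logit link
    e, c, u, b = fisher_pieces(pi, gdot)
    return gdot.prod(axis=1)**2 / (e * pi.prod(axis=1))

X = np.array([[1., 1.], [1., -1.], [-1., 1.], [-1., -1.]])
w = w_values(X, beta=np.array([1.0, 1.0]), theta=np.array([-1.0, 1.0]))
print(w)   # expect w[1] == w[2] (points (1,-1) and (-1,1)), as in Example 7(iv)
```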

7 EW D-optimal Design

The previous sections mainly focused on locally D-optimal designs, which require assumed parameter values $(\beta_1,\ldots,\beta_d, \theta_1,\ldots,\theta_{J-1})$. For many applications, the experimenter may have little or limited information about the values of the parameters. In this case, Bayesian D-optimality (Chaloner and Verdinelli, 1995), which maximizes $E(\log|F|)$ given a prior distribution on the parameters, provides a reasonable solution. Here $E$ stands for expectation, and $F$ is the Fisher information matrix. An alternative to the Bayesian criterion is EW D-optimality (Yang, Mandal and Majumdar, 2014; Atkinson et al., 2007), which essentially maximizes $\log|E(F)|$. Compared with Bayesian D-optimal designs, EW D-optimal designs are much easier to calculate and still highly efficient (Yang, Mandal and Majumdar, 2014).

Based on Theorem 1, an EW D-optimal design, which maximizes $|E(F)|$, may be viewed as a locally D-optimal design with $e_i$, $c_{it}$, $u_{it}$ and $b_{it}$ replaced by their expectations. After the replacement, Lemma 2 still holds. Therefore, almost all the lemmas, theorems, corollaries, and algorithms in the previous sections can be applied directly to EW D-optimal designs as well. The only exception is due to Lemma 3, which provides the formula for $|A_{i3}|$ in terms of $g_{ij}$ and $\pi_{ij}$. In order to fit EW D-optimal designs, $|A_{i3}|$ has to be calculated in terms of $u_{it}$ and $b_{it}$. For example, $|A_{i3}| = u_{i1}$ if $J = 2$, $|A_{i3}| = u_{i1}u_{i2} - b_{i2}^2$ if $J = 3$, and $|A_{i3}| = u_{i1}u_{i2}u_{i3} - u_{i1}b_{i3}^2 - u_{i3}b_{i2}^2$ if $J = 4$. Then the formulas involving $|A_{i_0 3}|$ in Lemma 7, $c_1, c_2$ in Corollary 3, $s_{i3}, s_{i4}, s_{i5}$ in Corollary 4, $w_i$ in (20), and $w_j$ in Corollary 6 need to be written in terms of $u_{it}$ and $b_{it}$ as well. According to Lemma 2, one only needs to calculate $E(u_{it})$, $i = 1,\ldots,m$; $t = 1,\ldots,J-1$, and $E(b_{it})$, $i = 1,\ldots,m$; $t = 2,\ldots,J-1$ (if $J \ge 3$). Then $E(c_{it}) = E(u_{it}) - E(b_{it}) - E(b_{i,t+1})$ and $E(e_i) = \sum_{t=1}^{J-1} E(c_{it})$. After that, we can use the lift-one algorithm in Section 4 or the exchange algorithm in Section 5 to find EW D-optimal designs. We use the odor removal example to illustrate how it works.

Example 4: Odor Removal Study (continued). Again suppose that we want to conduct a follow-up experiment. Instead of using the assumed parameter values $(\beta_1, \beta_2, \theta_1, \theta_2) = (-2.45, 1.09, -2.67, -0.21)$, suppose we believe that the true values of the parameters satisfy $\beta_1 \in [-3, -1]$, $\beta_2 \in [0, 2]$, $\theta_1 \in [-4, -2]$, and $\theta_2 \in [-1, 1]$. In order to apply Bayesian optimality, we assume that the four parameters are independently and uniformly distributed within their intervals. It takes the R function constrOptim 430 seconds to numerically find the Bayesian D-optimal allocation $\mathbf{p}_b = (0.3879,\ \cdot\ ,\ \cdot\ ,$
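Under the uniform priors just stated, the expectations $E(u_{it})$ and $E(b_{it})$ can be approximated by simple Monte Carlo, as in the sketch below (assuming numpy/scipy and reusing the illustrative category_probs, fisher_pieces, A_matrix, and lift_one helpers from earlier sections; note that $\theta_1 < \theta_2$ holds automatically on these intervals). The averaged pieces are then assembled into $E(A_i)$ and fed to the lift-one sketch to produce an EW D-optimal allocation.

```python
import numpy as np
from scipy.stats import logistic

rng = np.random.default_rng(0)

def ew_pieces(X, draws=10_000):
    # Monte Carlo estimates of E(e_i), E(c_it), E(u_it), E(b_it) under the
    # odor removal priors: beta1 ~ U[-3,-1], beta2 ~ U[0,2],
    # theta1 ~ U[-4,-2], theta2 ~ U[-1,1], logit link
    acc = None
    for _ in range(draws):
        beta = rng.uniform([-3.0, 0.0], [-1.0, 2.0])
        theta = rng.uniform([-4.0, -1.0], [-2.0, 1.0])
        gamma = logistic.cdf(theta[None, :] - (X @ beta)[:, None])
        pi = category_probs(X, beta, theta)
        pieces = fisher_pieces(pi, gamma * (1.0 - gamma))
        acc = pieces if acc is None else tuple(a + q for a, q in zip(acc, pieces))
    return tuple(a / draws for a in acc)

X = np.array([[1., 1.], [1., -1.], [-1., 1.], [-1., -1.]])
Ee, Ec, Eu, Eb = ew_pieces(X)
EAs = [A_matrix(X[i], Ee[i], Ec[i], Eu[i], Eb[i]) for i in range(4)]
p_ew = lift_one(EAs)    # EW D-optimal allocation via the lift-one sketch
```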


More information

Notes taken by Graham Taylor. January 22, 2005

Notes taken by Graham Taylor. January 22, 2005 CSC4 - Linear Programming and Combinatorial Optimization Lecture : Different forms of LP. The algebraic objects behind LP. Basic Feasible Solutions Notes taken by Graham Taylor January, 5 Summary: We first

More information

On construction of constrained optimum designs

On construction of constrained optimum designs On construction of constrained optimum designs Institute of Control and Computation Engineering University of Zielona Góra, Poland DEMA2008, Cambridge, 15 August 2008 Numerical algorithms to construct

More information

OPTIMAL DESIGNS FOR GENERALIZED LINEAR MODELS WITH MULTIPLE DESIGN VARIABLES

OPTIMAL DESIGNS FOR GENERALIZED LINEAR MODELS WITH MULTIPLE DESIGN VARIABLES Statistica Sinica 21 (2011, 1415-1430 OPTIMAL DESIGNS FOR GENERALIZED LINEAR MODELS WITH MULTIPLE DESIGN VARIABLES Min Yang, Bin Zhang and Shuguang Huang University of Missouri, University of Alabama-Birmingham

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Optimum designs for model. discrimination and estimation. in Binary Response Models

Optimum designs for model. discrimination and estimation. in Binary Response Models Optimum designs for model discrimination and estimation in Binary Response Models by Wei-Shan Hsieh Advisor Mong-Na Lo Huang Department of Applied Mathematics National Sun Yat-sen University Kaohsiung,

More information

18.10 Addendum: Arbitrary number of pigeons

18.10 Addendum: Arbitrary number of pigeons 18 Resolution 18. Addendum: Arbitrary number of pigeons Razborov s idea is to use a more subtle concept of width of clauses, tailor made for this particular CNF formula. Theorem 18.22 For every m n + 1,

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

FRACTIONAL FACTORIAL DESIGNS OF STRENGTH 3 AND SMALL RUN SIZES

FRACTIONAL FACTORIAL DESIGNS OF STRENGTH 3 AND SMALL RUN SIZES FRACTIONAL FACTORIAL DESIGNS OF STRENGTH 3 AND SMALL RUN SIZES ANDRIES E. BROUWER, ARJEH M. COHEN, MAN V.M. NGUYEN Abstract. All mixed (or asymmetric) orthogonal arrays of strength 3 with run size at most

More information

CONSTRUCTION OF SLICED SPACE-FILLING DESIGNS BASED ON BALANCED SLICED ORTHOGONAL ARRAYS

CONSTRUCTION OF SLICED SPACE-FILLING DESIGNS BASED ON BALANCED SLICED ORTHOGONAL ARRAYS Statistica Sinica 24 (2014), 1685-1702 doi:http://dx.doi.org/10.5705/ss.2013.239 CONSTRUCTION OF SLICED SPACE-FILLING DESIGNS BASED ON BALANCED SLICED ORTHOGONAL ARRAYS Mingyao Ai 1, Bochuan Jiang 1,2

More information

A-optimal designs for generalized linear model with two parameters

A-optimal designs for generalized linear model with two parameters A-optimal designs for generalized linear model with two parameters Min Yang * University of Missouri - Columbia Abstract An algebraic method for constructing A-optimal designs for two parameter generalized

More information

Approximation algorithms for nonnegative polynomial optimization problems over unit spheres

Approximation algorithms for nonnegative polynomial optimization problems over unit spheres Front. Math. China 2017, 12(6): 1409 1426 https://doi.org/10.1007/s11464-017-0644-1 Approximation algorithms for nonnegative polynomial optimization problems over unit spheres Xinzhen ZHANG 1, Guanglu

More information

d-qpso: A Quantum-Behaved Particle Swarm Technique for Finding D-Optimal Designs for Models with Mixed Factors and a Binary Response

d-qpso: A Quantum-Behaved Particle Swarm Technique for Finding D-Optimal Designs for Models with Mixed Factors and a Binary Response d-qpso: A Quantum-Behaved Particle Swarm Technique for Finding D-Optimal Designs for Models with Mixed Factors and a Binary Response Joshua Lukemire, Abhyuday Mandal, Weng Kee Wong Abstract Identifying

More information

An Alternative Proof of Primitivity of Indecomposable Nonnegative Matrices with a Positive Trace

An Alternative Proof of Primitivity of Indecomposable Nonnegative Matrices with a Positive Trace An Alternative Proof of Primitivity of Indecomposable Nonnegative Matrices with a Positive Trace Takao Fujimoto Abstract. This research memorandum is aimed at presenting an alternative proof to a well

More information

Stochastic Design Criteria in Linear Models

Stochastic Design Criteria in Linear Models AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework

More information

Interactive Interference Alignment

Interactive Interference Alignment Interactive Interference Alignment Quan Geng, Sreeram annan, and Pramod Viswanath Coordinated Science Laboratory and Dept. of ECE University of Illinois, Urbana-Champaign, IL 61801 Email: {geng5, kannan1,

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

ORTHOGONAL ARRAYS OF STRENGTH 3 AND SMALL RUN SIZES

ORTHOGONAL ARRAYS OF STRENGTH 3 AND SMALL RUN SIZES ORTHOGONAL ARRAYS OF STRENGTH 3 AND SMALL RUN SIZES ANDRIES E. BROUWER, ARJEH M. COHEN, MAN V.M. NGUYEN Abstract. All mixed (or asymmetric) orthogonal arrays of strength 3 with run size at most 64 are

More information

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

Foundations of Matrix Analysis

Foundations of Matrix Analysis 1 Foundations of Matrix Analysis In this chapter we recall the basic elements of linear algebra which will be employed in the remainder of the text For most of the proofs as well as for the details, the

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods

More information

THE N-VALUE GAME OVER Z AND R

THE N-VALUE GAME OVER Z AND R THE N-VALUE GAME OVER Z AND R YIDA GAO, MATT REDMOND, ZACH STEWARD Abstract. The n-value game is an easily described mathematical diversion with deep underpinnings in dynamical systems analysis. We examine

More information

36-720: The Rasch Model

36-720: The Rasch Model 36-720: The Rasch Model Brian Junker October 15, 2007 Multivariate Binary Response Data Rasch Model Rasch Marginal Likelihood as a GLMM Rasch Marginal Likelihood as a Log-Linear Model Example For more

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

MIXED MODELS THE GENERAL MIXED MODEL

MIXED MODELS THE GENERAL MIXED MODEL MIXED MODELS This chapter introduces best linear unbiased prediction (BLUP), a general method for predicting random effects, while Chapter 27 is concerned with the estimation of variances by restricted

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

15-780: LinearProgramming

15-780: LinearProgramming 15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear

More information

DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS

DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS M. N. ELLINGHAM AND JUSTIN Z. SCHROEDER In memory of Mike Albertson. Abstract. A distinguishing partition for an action of a group Γ on a set

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit R. G. Pierse 1 Introduction In lecture 5 of last semester s course, we looked at the reasons for including dichotomous variables

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Lecture Notes on Game Theory

Lecture Notes on Game Theory Lecture Notes on Game Theory Levent Koçkesen Strategic Form Games In this part we will analyze games in which the players choose their actions simultaneously (or without the knowledge of other players

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Bayesian Multivariate Logistic Regression

Bayesian Multivariate Logistic Regression Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of

More information

Regression models for multivariate ordered responses via the Plackett distribution

Regression models for multivariate ordered responses via the Plackett distribution Journal of Multivariate Analysis 99 (2008) 2472 2478 www.elsevier.com/locate/jmva Regression models for multivariate ordered responses via the Plackett distribution A. Forcina a,, V. Dardanoni b a Dipartimento

More information

Support weight enumerators and coset weight distributions of isodual codes

Support weight enumerators and coset weight distributions of isodual codes Support weight enumerators and coset weight distributions of isodual codes Olgica Milenkovic Department of Electrical and Computer Engineering University of Colorado, Boulder March 31, 2003 Abstract In

More information

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,

More information

2 Describing Contingency Tables

2 Describing Contingency Tables 2 Describing Contingency Tables I. Probability structure of a 2-way contingency table I.1 Contingency Tables X, Y : cat. var. Y usually random (except in a case-control study), response; X can be random

More information

CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS

CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS Statistica Sinica 23 (2013), 1117-1130 doi:http://dx.doi.org/10.5705/ss.2012.037 CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS Jian-Feng Yang, C. Devon Lin, Peter Z. G. Qian and Dennis K. J.

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

On Multiple-Objective Nonlinear Optimal Designs

On Multiple-Objective Nonlinear Optimal Designs On Multiple-Objective Nonlinear Optimal Designs Qianshun Cheng, Dibyen Majumdar, and Min Yang December 1, 2015 Abstract Experiments with multiple objectives form a staple diet of modern scientific research.

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

Relation of Pure Minimum Cost Flow Model to Linear Programming

Relation of Pure Minimum Cost Flow Model to Linear Programming Appendix A Page 1 Relation of Pure Minimum Cost Flow Model to Linear Programming The Network Model The network pure minimum cost flow model has m nodes. The external flows given by the vector b with m

More information

Moment Aberration Projection for Nonregular Fractional Factorial Designs

Moment Aberration Projection for Nonregular Fractional Factorial Designs Moment Aberration Projection for Nonregular Fractional Factorial Designs Hongquan Xu Department of Statistics University of California Los Angeles, CA 90095-1554 (hqxu@stat.ucla.edu) Lih-Yuan Deng Department

More information

Binary choice 3.3 Maximum likelihood estimation

Binary choice 3.3 Maximum likelihood estimation Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Partition models and cluster processes

Partition models and cluster processes and cluster processes and cluster processes With applications to classification Jie Yang Department of Statistics University of Chicago ICM, Madrid, August 26 and cluster processes utline 1 and cluster

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Review of Vectors and Matrices

Review of Vectors and Matrices A P P E N D I X D Review of Vectors and Matrices D. VECTORS D.. Definition of a Vector Let p, p, Á, p n be any n real numbers and P an ordered set of these real numbers that is, P = p, p, Á, p n Then P

More information

Chapter 1: Linear Programming

Chapter 1: Linear Programming Chapter 1: Linear Programming Math 368 c Copyright 2013 R Clark Robinson May 22, 2013 Chapter 1: Linear Programming 1 Max and Min For f : D R n R, f (D) = {f (x) : x D } is set of attainable values of

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

A Distributed Newton Method for Network Utility Maximization, II: Convergence

A Distributed Newton Method for Network Utility Maximization, II: Convergence A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility

More information

Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data

Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data Journal of Data Science 9(2011), 43-54 Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data Haydar Demirhan Hacettepe University

More information

A Framework for the Construction of Golay Sequences

A Framework for the Construction of Golay Sequences 1 A Framework for the Construction of Golay Sequences Frank Fiedler, Jonathan Jedwab, and Matthew G Parker Abstract In 1999 Davis and Jedwab gave an explicit algebraic normal form for m! h(m+) ordered

More information

TRANSPORTATION PROBLEMS

TRANSPORTATION PROBLEMS Chapter 6 TRANSPORTATION PROBLEMS 61 Transportation Model Transportation models deal with the determination of a minimum-cost plan for transporting a commodity from a number of sources to a number of destinations

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Worst case analysis for a general class of on-line lot-sizing heuristics

Worst case analysis for a general class of on-line lot-sizing heuristics Worst case analysis for a general class of on-line lot-sizing heuristics Wilco van den Heuvel a, Albert P.M. Wagelmans a a Econometric Institute and Erasmus Research Institute of Management, Erasmus University

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

A Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra

A Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra International Mathematical Forum, 4, 2009, no. 24, 1157-1171 A Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra Zvi Retchkiman Königsberg Instituto Politécnico Nacional,

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Concepts and Applications of Stochastically Weighted Stochastic Dominance

Concepts and Applications of Stochastically Weighted Stochastic Dominance Concepts and Applications of Stochastically Weighted Stochastic Dominance Jian Hu Department of Industrial Engineering and Management Sciences Northwestern University jianhu@northwestern.edu Tito Homem-de-Mello

More information

Symmetric Matrices and Eigendecomposition

Symmetric Matrices and Eigendecomposition Symmetric Matrices and Eigendecomposition Robert M. Freund January, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 Symmetric Matrices and Convexity of Quadratic Functions

More information

On Expected Gaussian Random Determinants

On Expected Gaussian Random Determinants On Expected Gaussian Random Determinants Moo K. Chung 1 Department of Statistics University of Wisconsin-Madison 1210 West Dayton St. Madison, WI 53706 Abstract The expectation of random determinants whose

More information

The initial involution patterns of permutations

The initial involution patterns of permutations The initial involution patterns of permutations Dongsu Kim Department of Mathematics Korea Advanced Institute of Science and Technology Daejeon 305-701, Korea dskim@math.kaist.ac.kr and Jang Soo Kim Department

More information

Optimal XOR based (2,n)-Visual Cryptography Schemes

Optimal XOR based (2,n)-Visual Cryptography Schemes Optimal XOR based (2,n)-Visual Cryptography Schemes Feng Liu and ChuanKun Wu State Key Laboratory Of Information Security, Institute of Software Chinese Academy of Sciences, Beijing 0090, China Email:

More information