D-optimal Designs with Ordered Categorical Data
Jie Yang (University of Illinois at Chicago), Liping Tong (Loyola University Chicago), Abhyuday Mandal (University of Georgia)

February 20, 2015

Abstract

We consider D-optimal designs with ordered categorical responses and cumulative link models. In addition to characterizing locally D-optimal designs theoretically, we develop efficient algorithms for obtaining both approximate designs and exact designs. For ordinal data and general link functions, we obtain a simplified structure of the Fisher information matrix and express its determinant as a homogeneous polynomial. For a predetermined set of design points, we derive the necessary and sufficient conditions for an allocation to be locally D-optimal. We prove that the number of support points in a minimally supported design depends only on the number of predictors, which can be much less than the number of parameters in the model. We show that a D-optimal minimally supported allocation in this case is usually not uniform on its support points. We also provide EW D-optimal designs as a highly efficient surrogate for Bayesian D-optimal designs with ordinal data.

Keywords: Approximate design; exact design; multinomial response; cumulative link model; minimally supported design; ordinal data

1 Introduction

We consider optimal experimental designs with ordered categorical responses, or simply ordinal data. Design of experiments with ordinal data has been of
great importance in a rich variety of scientific disciplines, especially when human evaluations are involved (Christensen, 2013). Examples include a wine bitterness study (Randall, 1989), potato pathogen experiments (Omer et al., 2000), a radish seedlings' damping-off study (Krause et al., 2001), a polysilicon deposition study (Wu, 2008), beef cattle research (Osterstock et al., 2010), and a toxicity study (Agresti, 2013). This research is motivated by an odor removal study conducted by textile engineers at the University of Georgia. The scientists manufacture bio-plastics from algae that contain odorous elements. Following traditional factorial design theory for linear models, a regular $2^2$ experiment with an equal number of replicates was used to study the effect of types of algae and synthetic resins on removing the odor; the response was ordinal in nature: no odor, medium odor, and strong odor. In this paper we identify designs that are significantly more efficient than the one used for this purpose.

For an ordinal response $Y$ with $J$ categories and a set of $d$ predictors $\mathbf{x} = (x_1, \ldots, x_d)^T$, the most popular model is the cumulative logit model (also known as the proportional odds model; see Liu and Agresti (2005) for a detailed review). McCullagh (1980) extended the proportional odds model with a more general link function $g$, called the cumulative link model (also known as the ordinal regression model),
$$g\left(P(Y \leq j \mid \mathbf{x})\right) = \theta_j - \boldsymbol{\beta}^T \mathbf{x}, \quad j = 1, \ldots, J-1 \qquad (1)$$
and treated it as a special case of the multivariate generalized linear model. In this paper, we focus on the cumulative link model with a general link. If there are only two categories ($J = 2$), the cumulative link model (1) is essentially a generalized linear model for binary data (McCullagh and Nelder, 1989; Dobson and Barnett, 2008). For optimal designs under generalized linear models, there is a growing body of literature (see Khuri et al. (2006), Atkinson et al. (2007), Stufken and Yang (2012), and references therein).
When $J \geq 3$, the results on optimal designs are meagre and restricted to the logit link (Zocchi and Atkinson, 1999; Perevozskaya et al., 2003), due to the complexity of the Fisher information matrix $F$. In this paper, we obtain a special structure of $F$ (Lemmas 1 and 2) for a general link and reveal that the optimal designs with $J \geq 3$ are quite different from the cases with $J = 2$. We prove that the number of support points of a minimally supported design is $d+1$, which could be much less than the number of parameters $d+J-1$ (Theorems 3 and 4). We also show that the design weight of a minimally
supported design is usually not uniform on its support points when it is optimal (Section 6).

Among various design criteria, D-optimality is the most frequently used one (Zocchi and Atkinson, 1999) and often performs well according to other criteria (Atkinson et al., 2007). Throughout this paper, we focus on the D-criterion. In order to overcome the difficulty due to the dependence of D-optimal designs on the values of unknown parameters, we choose the local optimality approach of Chernoff (1953) with assumed parameter values. In terms of robust designs, we compare Bayesian D-optimal designs (Chaloner and Verdinelli, 1995) with EW D-optimal designs (Atkinson et al., 2007; Yang, Mandal and Majumdar, 2014) for ordinal data. As a surrogate for Bayesian designs, an EW design is much easier to find and retains high efficiency with respect to the Bayesian criterion (Section 7).

In the design literature, one type of experiment deals with quantitative or continuous factors only. Such a design problem includes identification of a set of design points $\{x_i\}_{i=1,\ldots,m}$ and the corresponding weights $\{p_i\}_{i=1,\ldots,m}$ (see, for example, Atkinson et al. (2007) and Stufken and Yang (2012)). For this type of optimal design problem, numerical algorithms are typically used for cases with two or more factors (see, for example, Woods et al. (2006)). Another type of experiment uses qualitative or discrete factors, where the set of design points $\{x_i\}_{i=1,\ldots,m}$ is predetermined and only the weights $\{p_i\}_{i=1,\ldots,m}$ are to be optimized (see, for example, Yang and Mandal (2014)). One connection between the two types of designs is that one can pick grid points of the continuous factors and turn the first type into the second. Tong et al. (2014) made another connection between the optimal designs for discrete factors and continuous factors (see Section 5 of that paper). In this paper, we concentrate on the second type of design and assume $\{x_i\}_{i=1,\ldots,m}$ is given and fixed.
This paper is organized as follows. In Section 2, we obtain the Fisher information matrix for the cumulative link model with a general link, which generalizes Perevozskaya et al. (2003)'s result for the logit link. Section 3 identifies a necessary and sufficient condition for the Fisher information matrix to be positive definite. In Sections 4 and 5, theoretical results and numerical algorithms for searching for locally D-optimal approximate or exact designs are provided. In Section 6, we identify analytic D-optimal designs for special cases to illustrate that a D-optimal minimally supported design is usually not uniform on its support points. In Section 7, we show by examples that the EW D-optimal design is highly efficient with respect to Bayesian D-optimality. Beyond the theoretical results provided in this paper, the question that might be asked is whether these results give users any advantage in real experiments. The answer is a definite yes, as demonstrated for the motivating example.

2 Cumulative link model and Fisher information matrix

Suppose there are $m$ ($m \geq 2$) experimental settings which are predetermined. For the $i$th experimental setting with corresponding covariates or predictors $x_i = (x_{i1}, \ldots, x_{id})^T \in \mathbb{R}^d$ ($d \geq 1$), there are $n_i$ experimental units assigned to it. Among them, the $k$th experimental unit generates a response $V_{ik}$ which belongs to one of $J$ ($J \geq 2$) ordered categories. In many real applications, $V_{i1}, \ldots, V_{in_i}$ are regarded as i.i.d. discrete random variables. Denote $\pi_{ij} = P(V_{ik} = j)$, where $i = 1, \ldots, m$, $j = 1, \ldots, J$, and $k = 1, \ldots, n_i$. Let $Y_{ij} = \#\{k \mid V_{ik} = j\}$ be the number of $V_{ik}$'s falling into the $j$th category. Then $(Y_{i1}, \ldots, Y_{iJ}) \sim \mathrm{Multinomial}(n_i; \pi_{i1}, \ldots, \pi_{iJ})$. Throughout this paper, we assume

Assumption 1. $0 < \pi_{ij} < 1$, $i = 1, \ldots, m$; $j = 1, \ldots, J$.

Denote $\gamma_{ij} = P(V_{ik} \leq j) = \pi_{i1} + \cdots + \pi_{ij}$, $j = 1, \ldots, J$. Based on Assumption 1, $0 < \gamma_{i1} < \gamma_{i2} < \cdots < \gamma_{i,J-1} < \gamma_{iJ} = 1$ for each $i = 1, \ldots, m$.

Consider independent multinomial observations $(Y_{i1}, \ldots, Y_{iJ})$, $i = 1, \ldots, m$, with corresponding predictors $x_1, \ldots, x_m$. Under a cumulative link model or ordinal regression model (McCullagh, 1980; Agresti, 2013; Christensen, 2013), there exist a link function $g$ and parameters of interest $\theta_1, \ldots, \theta_{J-1}$, $\beta = (\beta_1, \ldots, \beta_d)^T$, such that $g(\gamma_{ij}) = \theta_j - x_i^T\beta$, $j = 1, \ldots, J-1$. This leads to $m(J-1)$ equations in $d+J-1$ parameters $(\beta_1, \ldots, \beta_d, \theta_1, \ldots, \theta_{J-1})$. Furthermore, if $g$ is strictly increasing, then $\theta_1 < \theta_2 < \cdots < \theta_{J-1}$ under Assumption 1, which is the case for commonly used link functions including logit ($\log(\gamma/(1-\gamma))$), probit ($\Phi^{-1}(\gamma)$), log-log ($-\log(-\log(\gamma))$), complementary log-log ($\log(-\log(1-\gamma))$), and cauchit ($\tan(\pi(\gamma - 1/2))$) (McCullagh and Nelder, 1989; Christensen, 2013).
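As a concrete illustration of the model equations above, the category probabilities $\pi_{ij} = \gamma_{ij} - \gamma_{i,j-1}$ with $\gamma_{ij} = g^{-1}(\theta_j - x_i^T\beta)$ can be computed in a few lines. The sketch below is our own (not from the paper), uses the logit link, and plugs in the parameter estimates reported later in Example 4; the row order of `X` is an assumption for illustration:

```python
import numpy as np

def cumulative_probs(theta, beta, X):
    """pi_ij for a cumulative logit model: gamma_ij = expit(theta_j - x_i' beta)."""
    theta, beta, X = map(np.asarray, (theta, beta, X))
    eta = theta[None, :] - (X @ beta)[:, None]      # eta[i, j] = theta_j - x_i' beta
    gamma = 1.0 / (1.0 + np.exp(-eta))              # inverse logit link
    m = X.shape[0]
    # pad gamma_i0 = 0 and gamma_iJ = 1, then difference: pi_ij = gamma_ij - gamma_{i,j-1}
    gamma_full = np.hstack([np.zeros((m, 1)), gamma, np.ones((m, 1))])
    return np.diff(gamma_full, axis=1)

# 2^2 design points (rows are x_i); J = 3 categories
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], float)
theta = np.array([-2.67, -0.21])                    # cut-points, theta_1 < theta_2
beta = np.array([-2.45, -1.09])
pi = cumulative_probs(theta, beta, X)               # 4 x 3 matrix of pi_ij
```

Each row of `pi` is a probability vector over the $J = 3$ categories; with increasing cut-points every $\pi_{ij}$ is strictly positive, so Assumption 1 holds at these parameter values.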
Example 1. Consider the logit link $g(\gamma) = \log(\gamma/(1-\gamma))$ with two factors and three ordered categories. The model consists of $2m$ equations $g(\gamma_{ij}) = \theta_j - x_{i1}\beta_1 - x_{i2}\beta_2$, $i = 1, \ldots, m$; $j = 1, 2$, and 4 parameters $(\beta_1, \beta_2, \theta_1, \theta_2)$. Under Assumption 1, $\gamma_{i1} < \gamma_{i2}$ and $\theta_1 < \theta_2$ since $g$ is strictly increasing.

Example 2. Suppose the model consists of three covariates $x_1, x_2, x_3$ and a few second-order terms,
$$g(\gamma_{ij}) = \theta_j - x_{i1}\beta_1 - x_{i2}\beta_2 - x_{i3}\beta_3 - x_{i1}x_{i2}\beta_{12} - x_{i1}^2\beta_{11} - x_{i2}^2\beta_{22},$$
where $i = 1, \ldots, m$; $j = 1, \ldots, J-1$. Then $d = 6$.

Since $(Y_{i1}, \ldots, Y_{iJ})$, $i = 1, \ldots, m$, are independent, the log-likelihood function (up to a constant) of the cumulative link model is
$$l(\beta_1, \ldots, \beta_d, \theta_1, \ldots, \theta_{J-1}) = \sum_{i=1}^{m} \sum_{j=1}^{J} Y_{ij} \log(\pi_{ij})$$
where $\pi_{ij} = \gamma_{ij} - \gamma_{i,j-1}$ with $\gamma_{ij} = g^{-1}(\theta_j - x_i^T\beta)$ for $j = 1, \ldots, J-1$ and $\gamma_{i0} = 0$, $\gamma_{iJ} = 1$, $i = 1, \ldots, m$.

Assumption 2. The link function $g$ is differentiable and its derivative $g'$ is always strictly positive.

We keep Assumption 2 throughout the paper; it is satisfied for the logit, probit, log-log, complementary log-log, and cauchit links. Under Assumptions 1 and 2, $g$ is strictly increasing and thus $\theta_1 < \theta_2 < \cdots < \theta_{J-1}$. For $s = 1, \ldots, d$, $t = 1, \ldots, J-1$,
$$\frac{\partial l}{\partial \beta_s} = \sum_{i=1}^{m} (-x_{is}) \left\{ \frac{Y_{i1}}{\pi_{i1}} (g^{-1})'(\theta_1 - x_i^T\beta) + \frac{Y_{i2}}{\pi_{i2}} \left[ (g^{-1})'(\theta_2 - x_i^T\beta) - (g^{-1})'(\theta_1 - x_i^T\beta) \right] + \cdots + \frac{Y_{iJ}}{\pi_{iJ}} \left[ -(g^{-1})'(\theta_{J-1} - x_i^T\beta) \right] \right\}$$
$$\frac{\partial l}{\partial \theta_t} = \sum_{i=1}^{m} (g^{-1})'(\theta_t - x_i^T\beta) \left( \frac{Y_{it}}{\pi_{it}} - \frac{Y_{i,t+1}}{\pi_{i,t+1}} \right)$$
Since the $Y_{ij}$'s come from multinomial distributions, we know $E(Y_{ij}) = n_i\pi_{ij}$, $E(Y_{ij}^2) = n_i(n_i-1)\pi_{ij}^2 + n_i\pi_{ij}$, and $E(Y_{is}Y_{it}) = n_i(n_i-1)\pi_{is}\pi_{it}$ when $s \neq t$. Then we have the following lemma.
Lemma 1. Let $F = (F_{st})$ be the $(d+J-1) \times (d+J-1)$ Fisher information matrix.

(i) For $1 \leq s \leq d$, $1 \leq t \leq d$,
$$F_{st} = E\left( \frac{\partial l}{\partial \beta_s} \frac{\partial l}{\partial \beta_t} \right) = \sum_{i=1}^{m} n_i x_{is} x_{it} \sum_{j=1}^{J} \frac{(g_{ij} - g_{i,j-1})^2}{\pi_{ij}}$$
where $g_{ij} = (g^{-1})'(\theta_j - x_i^T\beta) > 0$ for $j = 1, \ldots, J-1$ and $g_{i0} = g_{iJ} = 0$.

(ii) For $1 \leq s \leq d$, $1 \leq t \leq J-1$,
$$F_{s,d+t} = E\left( \frac{\partial l}{\partial \beta_s} \frac{\partial l}{\partial \theta_t} \right) = \sum_{i=1}^{m} n_i (-x_{is}) g_{it} \left( \frac{g_{it} - g_{i,t-1}}{\pi_{it}} - \frac{g_{i,t+1} - g_{it}}{\pi_{i,t+1}} \right)$$

(iii) For $1 \leq s \leq J-1$, $1 \leq t \leq d$,
$$F_{d+s,t} = E\left( \frac{\partial l}{\partial \theta_s} \frac{\partial l}{\partial \beta_t} \right) = \sum_{i=1}^{m} n_i (-x_{it}) g_{is} \left( \frac{g_{is} - g_{i,s-1}}{\pi_{is}} - \frac{g_{i,s+1} - g_{is}}{\pi_{i,s+1}} \right)$$

(iv) For $1 \leq s \leq J-1$, $1 \leq t \leq J-1$,
$$F_{d+s,d+t} = E\left( \frac{\partial l}{\partial \theta_s} \frac{\partial l}{\partial \theta_t} \right) = \begin{cases} \sum_{i=1}^{m} n_i g_{is}^2 (\pi_{is}^{-1} + \pi_{i,s+1}^{-1}), & \text{if } s = t \\ -\sum_{i=1}^{m} n_i g_{is} g_{it} \pi_{i,s\vee t}^{-1}, & \text{if } |s-t| = 1 \\ 0, & \text{if } |s-t| \geq 2 \end{cases}$$
where $s \vee t = \max\{s, t\}$.

Perevozskaya et al. (2003) obtained a detailed form of the Fisher information matrix for the logit link and one predictor. Our expressions here hold for a fairly general link and $d$ predictors. To simplify the notation, we denote
$$e_i = \sum_{j=1}^{J} \frac{(g_{ij} - g_{i,j-1})^2}{\pi_{ij}} > 0, \quad i = 1, \ldots, m \qquad (2)$$
$$c_{it} = g_{it} \left( \frac{g_{it} - g_{i,t-1}}{\pi_{it}} - \frac{g_{i,t+1} - g_{it}}{\pi_{i,t+1}} \right), \quad i = 1, \ldots, m;\ t = 1, \ldots, J-1 \qquad (3)$$
$$u_{it} = g_{it}^2 (\pi_{it}^{-1} + \pi_{i,t+1}^{-1}) > 0, \quad i = 1, \ldots, m;\ t = 1, \ldots, J-1 \qquad (4)$$
$$b_{it} = g_{i,t-1} g_{it} \pi_{it}^{-1} > 0, \quad i = 1, \ldots, m;\ t = 2, \ldots, J-1 \text{ (if } J \geq 3\text{)} \qquad (5)$$
Note that $g_{ij}$ is defined in Lemma 1 (i). Then we obtain the following lemma, which plays a key role in the later calculation of $|F|$.
Lemma 2. $c_{it} = u_{it} - b_{it} - b_{i,t+1}$, $i = 1, \ldots, m$; $t = 1, \ldots, J-1$; and $e_i = \sum_{t=1}^{J-1} c_{it} = \sum_{t=1}^{J-1} (u_{it} - 2b_{it})$, $i = 1, \ldots, m$, where $b_{i1} = b_{iJ} = 0$ for $i = 1, \ldots, m$.

Example 1 (continued). For the logit link $g$, $g^{-1}(\eta) = e^\eta/(1+e^\eta)$ and $(g^{-1})' = g^{-1}(1 - g^{-1})$. Thus $g_{ij} = (g^{-1})'(\theta_j - x_i^T\beta) = \gamma_{ij}(1 - \gamma_{ij})$. With $J = 3$, we have $\pi_{i1} + \pi_{i2} + \pi_{i3} = 1$ for $i = 1, \ldots, m$. Then for $i = 1, \ldots, m$: $g_{i1} = \pi_{i1}(\pi_{i2} + \pi_{i3})$, $g_{i2} = (\pi_{i1} + \pi_{i2})\pi_{i3}$, $b_{i2} = \pi_{i1}\pi_{i3}\pi_{i2}^{-1}(\pi_{i1} + \pi_{i2})(\pi_{i2} + \pi_{i3})$, $u_{i1} = \pi_{i1}\pi_{i2}^{-1}(\pi_{i1} + \pi_{i2})(\pi_{i2} + \pi_{i3})^2$, $u_{i2} = \pi_{i3}\pi_{i2}^{-1}(\pi_{i1} + \pi_{i2})^2(\pi_{i2} + \pi_{i3})$, $c_{i1} = \pi_{i1}(\pi_{i1} + \pi_{i2})(\pi_{i2} + \pi_{i3})$, $c_{i2} = \pi_{i3}(\pi_{i1} + \pi_{i2})(\pi_{i2} + \pi_{i3})$, $e_i = (\pi_{i1} + \pi_{i2})(\pi_{i1} + \pi_{i3})(\pi_{i2} + \pi_{i3})$.

As a direct conclusion of Lemma 1 and Lemma 2, we obtain the following theorem.

Theorem 1. Under Assumptions 1 and 2, the Fisher information matrix $F$ can be written as
$$F = \sum_{i=1}^{m} n_i A_i \qquad (6)$$
where the $(d+J-1) \times (d+J-1)$ matrix
$$A_i = \begin{pmatrix} A_{i1} & A_{i2} \\ A_{i2}^T & A_{i3} \end{pmatrix} = \begin{pmatrix} (e_i x_{is} x_{it})_{s=1,\ldots,d;\, t=1,\ldots,d} & (-x_{is} c_{it})_{s=1,\ldots,d;\, t=1,\ldots,J-1} \\ (-c_{is} x_{it})_{s=1,\ldots,J-1;\, t=1,\ldots,d} & A_{i3} \end{pmatrix}$$
and the $(J-1) \times (J-1)$ matrix $A_{i3}$ is symmetric tri-diagonal with diagonal entries $u_{i1}, \ldots, u_{i,J-1}$ and off-diagonal entries $-b_{i2}, \ldots, -b_{i,J-1}$ for $J \geq 3$. Note that $A_{i3}$ contains only the single entry $u_{i1}$ for $J = 2$. Examples of $A_{i3}$ include
$$(u_{i1}), \quad \begin{pmatrix} u_{i1} & -b_{i2} \\ -b_{i2} & u_{i2} \end{pmatrix}, \quad \begin{pmatrix} u_{i1} & -b_{i2} & 0 \\ -b_{i2} & u_{i2} & -b_{i3} \\ 0 & -b_{i3} & u_{i3} \end{pmatrix}, \quad \begin{pmatrix} u_{i1} & -b_{i2} & 0 & 0 \\ -b_{i2} & u_{i2} & -b_{i3} & 0 \\ 0 & -b_{i3} & u_{i3} & -b_{i4} \\ 0 & 0 & -b_{i4} & u_{i4} \end{pmatrix}$$
for $J = 2, 3, 4$, or $5$, respectively.

Remark 1. As an important property of the Fisher information matrix, $F$ is always positive semi-definite (p.s.d.), which implies $|F| \geq 0$. As a special case, $A_i$ can be regarded as the Fisher information matrix at the support point $x_i$. Therefore, $A_i$ is also p.s.d. and $|A_i| \geq 0$ (actually $|A_i| = 0$ according to Lemma 3 in Section 3).
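For the logit link with $J = 3$, the identities of Lemma 2 and the closed forms of Example 1 (continued) can be checked numerically. The sketch below is our own: it computes the building blocks (2)-(5) directly from their definitions at one design point and compares them with the product formulas above.

```python
import numpy as np

def logit_blocks(p1, p2, p3):
    """Building blocks (2)-(5) at one design point, logit link, J = 3."""
    g1 = p1 * (p2 + p3)                 # g_i1 = gamma_i1 (1 - gamma_i1)
    g2 = (p1 + p2) * p3                 # g_i2 = gamma_i2 (1 - gamma_i2)
    u1 = g1**2 * (1/p1 + 1/p2)          # (4)
    u2 = g2**2 * (1/p2 + 1/p3)
    b2 = g1 * g2 / p2                   # (5)
    c1 = g1 * ((g1 - 0) / p1 - (g2 - g1) / p2)      # (3), with g_i0 = 0
    c2 = g2 * ((g2 - g1) / p2 - (0 - g2) / p3)      # (3), with g_i3 = 0
    e = g1**2/p1 + (g2 - g1)**2/p2 + g2**2/p3       # (2)
    return g1, g2, u1, u2, b2, c1, c2, e

p1, p2, p3 = 0.2, 0.5, 0.3
g1, g2, u1, u2, b2, c1, c2, e = logit_blocks(p1, p2, p3)

# Lemma 2: c_it = u_it - b_it - b_{i,t+1} (with b_i1 = b_i3 = 0), and e_i = c_i1 + c_i2
assert np.isclose(c1, u1 - b2) and np.isclose(c2, u2 - b2)
assert np.isclose(e, c1 + c2)

# Example 1 (continued): closed-form factorizations
assert np.isclose(c1, p1 * (p1 + p2) * (p2 + p3))
assert np.isclose(c2, p3 * (p1 + p2) * (p2 + p3))
assert np.isclose(e, (p1 + p2) * (p1 + p3) * (p2 + p3))
```

The probability vector `(0.2, 0.5, 0.3)` is an arbitrary choice; the identities hold for any vector satisfying Assumption 1.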
3 Determinant of the Fisher Information Matrix

Among the several criteria for optimal designs, the D-criterion looks for the allocation maximizing $|F|$, the determinant of $F$. A D-optimal design with $m$ predetermined design points $x_1, \ldots, x_m$ could either be an integer-valued allocation $(n_1, n_2, \ldots, n_m)$ maximizing $|F|$ with predetermined $n = \sum_{i=1}^{m} n_i > 0$, known as an exact design; or a real-valued allocation $(p_1, p_2, \ldots, p_m)$ maximizing $|n^{-1}F|$ with $p_i = n_i/n \geq 0$ and $\sum_{i=1}^{m} p_i = 1$, known as an approximate design.

To study the structure of $|F|$ as a polynomial function of $(n_1, \ldots, n_m)$, we denote the $(k,l)$th entry of $A_i$ by $a_{kl}^{(i)}$. Given a row map $\tau: \{1, 2, \ldots, d+J-1\} \to \{1, \ldots, m\}$, we define a $(d+J-1) \times (d+J-1)$ matrix $A_\tau = \left(a_{kl}^{(\tau(k))}\right)$ whose $k$th row is given by the $k$th row of $A_{\tau(k)}$. For a power index $(\alpha_1, \ldots, \alpha_m)$ with $\alpha_i \in \{0, 1, \ldots, d+J-1\}$ and $\sum_{i=1}^{m} \alpha_i = d+J-1$, we write $\tau \sim (\alpha_1, \ldots, \alpha_m)$ if $\alpha_i = \#\{k : \tau(k) = i\}$ for each $i = 1, \ldots, m$. In terms of the construction of $A_\tau$, this says that $\alpha_i$ rows of $A_\tau$ are taken from the matrix $A_i$.

Theorem 2. The determinant $|F|$ is an order-$(d+J-1)$ homogeneous polynomial of $(n_1, \ldots, n_m)$ and
$$|F| = \sum_{\alpha_1 + \cdots + \alpha_m = d+J-1} c_{\alpha_1,\ldots,\alpha_m} n_1^{\alpha_1} \cdots n_m^{\alpha_m}, \quad \text{where } c_{\alpha_1,\ldots,\alpha_m} = \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} |A_\tau| \qquad (7)$$

Proof of Theorem 2: According to the Leibniz formula for the determinant,
$$|F| = \left| \sum_{i=1}^{m} n_i A_i \right| = \sum_{\sigma \in S_{d+J-1}} (-1)^{\mathrm{sgn}(\sigma)} \prod_{k=1}^{d+J-1} \sum_{i=1}^{m} n_i a_{k,\sigma(k)}^{(i)}$$
where $\sigma$ is a permutation of $\{1, 2, \ldots, d+J-1\}$ and $\mathrm{sgn}(\sigma)$ is the sign or
signature of $\sigma$. Therefore,
$$c_{\alpha_1,\ldots,\alpha_m} = \sum_{\sigma \in S_{d+J-1}} (-1)^{\mathrm{sgn}(\sigma)} \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} \prod_{k=1}^{d+J-1} a_{k,\sigma(k)}^{(\tau(k))} = \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} \sum_{\sigma \in S_{d+J-1}} (-1)^{\mathrm{sgn}(\sigma)} \prod_{k=1}^{d+J-1} a_{k,\sigma(k)}^{(\tau(k))} = \sum_{\tau \sim (\alpha_1,\ldots,\alpha_m)} |A_\tau| \qquad \Box$$

In order to obtain analytic properties of $|F|$, we need the following lemmas, derived from Lemma 2 and Theorem 1 as well as classical matrix theory and mathematical induction. Note that Lemma 3 below covers Lemma 1 in Perevozskaya et al. (2003) as a special case.

Lemma 3. $\mathrm{Rank}(A_i) = \mathrm{Rank}(A_{i3}) = J-1$. Furthermore, $A_{i3}$ is positive definite and
$$|A_{i3}| = \prod_{s=1}^{J-1} g_{is}^2 \prod_{t=1}^{J} \pi_{it}^{-1} > 0 \qquad (8)$$

Lemma 4. $\mathrm{Rank}((A_{i1}\ A_{i2})) \leq 1$, where equality holds if and only if $x_i \neq 0$.

Based on Lemma 3 and Lemma 4, we can obtain the two lemmas below on $c_{\alpha_1,\ldots,\alpha_m}$, which significantly simplify the structure of $|F|$ as a polynomial of $(n_1, \ldots, n_m)$.

Lemma 5. If $\max_{1 \leq i \leq m} \alpha_i \geq J$, then $|A_\tau| = 0$ for any $\tau \sim (\alpha_1, \ldots, \alpha_m)$ and thus $c_{\alpha_1,\ldots,\alpha_m} = 0$.

Proof of Lemma 5: Without any loss of generality, we assume $\alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_m$. Then $\max_{1 \leq i \leq m} \alpha_i \geq J$ implies $\alpha_1 \geq J$. In this case, for any $\tau \sim (\alpha_1, \ldots, \alpha_m)$, $\tau^{-1}(1) := \{k \mid \tau(k) = 1\} \subseteq \{1, \ldots, d+J-1\}$ and $|\tau^{-1}(1)| = \alpha_1$. If $|\tau^{-1}(1) \cap \{1, \ldots, d\}| \geq 2$, then $|A_\tau| = 0$ due to Lemma 4; otherwise $\{d+1, \ldots, d+J-1\} \subseteq \tau^{-1}(1)$ and thus $|A_\tau| = 0$ due to Lemma 3. Thus $c_{\alpha_1,\ldots,\alpha_m} = 0$ according to (7) in Theorem 2. $\Box$

Lemma 6. If $\#\{i : \alpha_i \geq 1\} \leq d$, then $|A_\tau| = 0$ for any $\tau \sim (\alpha_1, \ldots, \alpha_m)$ and thus $c_{\alpha_1,\ldots,\alpha_m} = 0$.
Proof of Lemma 6: Without any loss of generality, we assume $\alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_m$. Then $\#\{i : \alpha_i \geq 1\} \leq d$ indicates $\alpha_{d+1} = \cdots = \alpha_m = 0$. Let $\tau: \{1, 2, \ldots, d+J-1\} \to \{1, \ldots, m\}$ satisfy $\tau \sim (\alpha_1, \ldots, \alpha_m)$. Then the $(d+J-1) \times (d+J-1)$ matrix $A_\tau$ can be written as
$$A_\tau = \begin{pmatrix} A_{\tau 1} & A_{\tau 2} \\ A_{\tau 3} & A_{\tau 4} \end{pmatrix} = \begin{pmatrix} (e_{\tau(s)} x_{\tau(s)s} x_{\tau(s)t})_{s=1,\ldots,d;\, t=1,\ldots,d} & (-x_{\tau(s)s} c_{\tau(s)t})_{s=1,\ldots,d;\, t=1,\ldots,J-1} \\ (-c_{\tau(d+s)s} x_{\tau(d+s)t})_{s=1,\ldots,J-1;\, t=1,\ldots,d} & A_{\tau 4} \end{pmatrix}$$
where the $(J-1) \times (J-1)$ matrix $A_{\tau 4}$ is either the single entry $u_{\tau(d+1)1}$ (if $J = 2$) or tri-diagonal with diagonal entries $u_{\tau(d+1)1}, \ldots, u_{\tau(d+J-1),J-1}$, upper off-diagonal entries $-b_{\tau(d+1)2}, \ldots, -b_{\tau(d+J-2),J-1}$, and lower off-diagonal entries $-b_{\tau(d+2)2}, \ldots, -b_{\tau(d+J-1),J-1}$. Note that $A_\tau$ is asymmetric in general.

If $\#\{i : \alpha_i \geq 1\} \leq d-1$, then there exists an $i_0$ such that $1 \leq i_0 \leq d$ and $|\tau^{-1}(i_0) \cap \{1, \ldots, d\}| \geq 2$. In this case, $|A_\tau| = 0$ according to Lemma 4. If $\#\{i : \alpha_i \geq 1\} = d$, we may assume $|\tau^{-1}(i) \cap \{1, \ldots, d\}| = 1$ for $i = 1, \ldots, d$ (otherwise $|A_\tau| = 0$ according to Lemma 4). Suppose $\alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_k \geq 2 > \alpha_{k+1}$. Then $\{d+1, \ldots, d+J-1\} \subseteq \cup_{i=1}^{k} \tau^{-1}(i)$ and $\sum_{i=1}^{k} (\alpha_i - 1) = J-1$.

In order to show $|A_\tau| = 0$, we first replace $A_{\tau 1}$ with $A_{\tau 1}^{(1)} = (e_{\tau(s)} x_{\tau(s)t})_{s=1,\ldots,d;\, t=1,\ldots,d}$ and replace $A_{\tau 2}$ with $A_{\tau 2}^{(1)} = (-c_{\tau(s)t})_{s=1,\ldots,d;\, t=1,\ldots,J-1}$. This changes $A_\tau$ into a new matrix $A_\tau^{(1)}$. Note that $|A_\tau| = \prod_{s=1}^{d} x_{\tau(s)s} \cdot |A_\tau^{(1)}|$. According to Lemma 2, the sum of the columns of $A_{\tau 2}^{(1)}$ is $(-e_{\tau(1)}, \ldots, -e_{\tau(d)})^T$, and the elementwise sum of the columns of $A_{\tau 4}$ is $(c_{\tau(d+1)1}, c_{\tau(d+2)2}, \ldots, c_{\tau(d+J-1),J-1})^T$. Secondly, for $t = 1, \ldots, d$, we add $x_{1t}(-e_{\tau(1)}, \ldots, -e_{\tau(d)}, c_{\tau(d+1)1}, \ldots, c_{\tau(d+J-1),J-1})^T$ to the $t$th column of $A_\tau^{(1)}$. We denote the resulting matrix by $A_\tau^{(2)}$. Note that $|A_\tau^{(1)}| = |A_\tau^{(2)}|$. We consider the sub-matrix $A_{\tau d}^{(2)}$ which consists of the first $d$ columns of $A_\tau^{(2)}$. For $s \in \tau^{-1}(1)$, the $s$th row of $A_{\tau d}^{(2)}$ is simply $0$. For $i = 2, \ldots, k$, the $j$th row of $A_{\tau d}^{(2)}$ is proportional to $(x_{i1} - x_{11}, x_{i2} - x_{12}, \ldots, x_{id} - x_{1d})$ if $j \in \tau^{-1}(i)$.
Therefore, $\mathrm{Rank}(A_{\tau d}^{(2)}) \leq (d+J-1) - \alpha_1 - \sum_{i=2}^{k} (\alpha_i - 1) = d-1$, which leads to $|A_\tau^{(2)}| = 0$ and thus $|A_\tau^{(1)}| = 0$, $|A_\tau| = 0$. According to (7) in Theorem 2, $c_{\alpha_1,\ldots,\alpha_m} = 0$. $\Box$

Example 3. Suppose $d = 2$, $J = 3$ with link function $g$. According to Theorem 2, $|F|$ in this case is an order-4 homogeneous polynomial of $(n_1, \ldots, n_m)$. Due to Lemma 5 and Lemma 6, we can remove all the terms of the form
$n_i^4$, $n_i^3 n_j$, or $n_i^2 n_j^2$ from $|F|$. Therefore,
$$|F| = \sum_{i=1}^{m} \sum_{j<k,\ j\neq i,\ k\neq i} c_{ijk}\, n_i^2 n_j n_k + \sum_{i<j<k<l} c_{ijkl}\, n_i n_j n_k n_l$$
for some coefficients $c_{ijk}$ and $c_{ijkl}$.

Based on Lemma 5 and Lemma 6, in order to have $c_{\alpha_1,\ldots,\alpha_m} \neq 0$, the largest possible $\alpha_i$ is $J-1$ and the fewest possible number of positive $\alpha_i$'s is $d+1$. As a direct conclusion of Lemma 6, the following theorem states that a minimally supported design has at least $d+1$ support points. Note that this could be much less than the number of parameters $d+J-1$.

Theorem 3. $|F| > 0$ only if $m \geq d+1$.

In order to find out when $d+1$ support points are enough for a meaningful design (that is, $|F| > 0$), we study the leading terms of $|F|$ with $\max_{1 \leq i \leq m} \alpha_i = J-1$, that is, $\alpha_{i_0} = J-1$ for some $1 \leq i_0 \leq m$. Due to Lemma 6 and $\sum_{i=1}^{m} \alpha_i = d+J-1$, in order to have $c_{\alpha_1,\ldots,\alpha_m} \neq 0$, there must exist $1 \leq i_1 < i_2 < \cdots < i_d \leq m$, all different from $i_0$, such that $\alpha_{i_1} = \cdots = \alpha_{i_d} = 1$. The following lemma provides the explicit formula for such a coefficient $c_{\alpha_1,\ldots,\alpha_m}$.

Lemma 7. Suppose $\alpha_{i_0} = J-1$ and $\alpha_{i_1} = \cdots = \alpha_{i_d} = 1$, where $i_0, i_1, \ldots, i_d$ are distinct integers in $\{1, \ldots, m\}$. Then
$$c_{\alpha_1,\ldots,\alpha_m} = |A_{i_0 3}| \prod_{s=1}^{d} e_{i_s} \cdot |X_1[i_0, i_1, \ldots, i_d]|^2$$
where $e_{i_s}$ is defined by (2), $|A_{i_0 3}|$ can be calculated by (8), $X_1 = (\mathbf{1}\ X)$ is an $m \times (d+1)$ matrix with $\mathbf{1} = (1, \ldots, 1)^T$ and $X = (x_1, \ldots, x_m)^T$, and $X_1[i_0, i_1, \ldots, i_d]$ is the sub-matrix consisting of the $i_0$th, $i_1$th, ..., $i_d$th rows of $X_1$.

The proof of Lemma 7 is relegated to the Appendix. For the purpose of finding D-optimal allocations, we write $|F| = f(n_1, \ldots, n_m)$ for an order-$(d+J-1)$ homogeneous polynomial function $f$. The D-optimal exact design problem is to solve the integer-valued optimization problem: given a positive
integer $n$,
$$\max f(n_1, n_2, \ldots, n_m) \quad \text{subject to } n_i \in \{0, 1, \ldots, n\},\ i = 1, \ldots, m;\ n_1 + n_2 + \cdots + n_m = n \qquad (9)$$
Denote $p_i = n_i/n$, $i = 1, \ldots, m$. According to Theorem 1,
$$f(n_1, \ldots, n_m) = \left| \sum_{i=1}^{m} n_i A_i \right| = \left| n \sum_{i=1}^{m} p_i A_i \right| = n^{d+J-1} \left| \sum_{i=1}^{m} p_i A_i \right| = n^{d+J-1} f(p_1, \ldots, p_m)$$
Therefore, the D-optimal approximate design problem is to solve the real-valued optimization problem
$$\max f(p_1, p_2, \ldots, p_m) \quad \text{subject to } 0 \leq p_i \leq 1,\ i = 1, \ldots, m;\ p_1 + p_2 + \cdots + p_m = 1 \qquad (10)$$
According to Lemma 3, $|A_{i_0 3}| > 0$. Thus $c_{\alpha_1,\ldots,\alpha_m}$ in Lemma 7 is positive as long as $X_1[i_0, \ldots, i_d]$ is of full rank. Theorem 3 implies that a minimally supported design contains at least $d+1$ support points, while the following theorem states a necessary and sufficient condition for the minimum number of support points to be exactly $d+1$. Recall that $X_1 = (\mathbf{1}\ X)$ is defined in Lemma 7.

Theorem 4. $f(\mathbf{p}) > 0$ for some $\mathbf{p} = (p_1, \ldots, p_m)^T$ if and only if $\mathrm{Rank}(X_1) = d+1$.

Proof of Theorem 4: Suppose $\mathrm{Rank}(X_1) = d+1$. Then there exist $i_0, \ldots, i_d \in \{1, \ldots, m\}$ such that $|X_1[i_0, i_1, \ldots, i_d]| \neq 0$. According to Lemma 5, $f(\mathbf{p})$ can be regarded as an order-$(J-1)$ polynomial of $p_{i_0}$. Let $p_{i_0} = x \in (0, 1)$ and $p_i = (1-x)/(m-1)$ for $i \neq i_0$. Based on Lemma 7, $f(\mathbf{p})$ can be written as
$$f_{i_0}(x) = a_{J-1} x^{J-1} \left( \frac{1-x}{m-1} \right)^d + a_{J-2} x^{J-2} \left( \frac{1-x}{m-1} \right)^{d+1} + \cdots + a_1 x \left( \frac{1-x}{m-1} \right)^{d+J-2} + a_0 \left( \frac{1-x}{m-1} \right)^{d+J-1},$$
where
$$a_{J-1} = |A_{i_0 3}| \sum_{\{i_1,\ldots,i_d\} \subseteq \{1,\ldots,m\} \setminus \{i_0\}} \prod_{s=1}^{d} e_{i_s} \cdot |X_1[i_0, i_1, \ldots, i_d]|^2 > 0$$
Therefore, $\lim_{x \to 1} (1-x)^{-d} x^{1-J} f_{i_0}(x) = (m-1)^{-d} a_{J-1} > 0$. That is, $f(\mathbf{p}) > 0$ for $p_{i_0} = x$ close enough to 1 and $p_i = (1-x)/(m-1)$ for $i \neq i_0$.

In order to justify that the condition $\mathrm{Rank}(X_1) = d+1$ is also necessary, we only need to show that $f(\mathbf{p}) \equiv 0$ if $\mathrm{Rank}(X_1) \leq d$. Actually, for any $\tau: \{1, \ldots, d+J-1\} \to \{1, \ldots, m\}$, we construct $A_\tau^{(1)}$ as in the proof of Lemma 6. Then $|A_\tau| = \prod_{s=1}^{d} x_{\tau(s)s} \cdot |A_\tau^{(1)}|$. Similarly as in the proof of Lemma 6, for $t = 1, \ldots, d$, we add $x_{\tau(1)t}(-e_{\tau(1)}, \ldots, -e_{\tau(d)}, c_{\tau(d+1)1}, \ldots, c_{\tau(d+J-1),J-1})^T$ to the $t$th column of $A_\tau^{(1)}$. We denote the resulting matrix by $A_\tau^{(3)}$. Note that $|A_\tau^{(1)}| = |A_\tau^{(3)}|$. We consider the sub-matrix $A_{\tau d}^{(3)}$ which consists of the first $d$ columns of $A_\tau^{(3)}$. For $s \in \tau^{-1}(\tau(1))$, the $s$th row of $A_{\tau d}^{(3)}$ is simply $0$. For $s = 2, \ldots, d$, the $s$th row of $A_{\tau d}^{(3)}$ is $e_{\tau(s)}(x_{\tau(s)1} - x_{\tau(1)1}, \ldots, x_{\tau(s)d} - x_{\tau(1)d})$. For $s = 1, \ldots, J-1$, the $(d+s)$th row of $A_{\tau d}^{(3)}$ is $c_{\tau(d+s)s}(x_{\tau(d+s)1} - x_{\tau(1)1}, \ldots, x_{\tau(d+s)d} - x_{\tau(1)d})$.

We claim that $\mathrm{Rank}(A_{\tau d}^{(3)}) \leq d-1$. Otherwise, if $\mathrm{Rank}(A_{\tau d}^{(3)}) = d$, then there exist $i_1, \ldots, i_d \in \{2, \ldots, d+J-1\}$ such that the sub-matrix consisting of the $i_1$th, ..., $i_d$th rows of $A_{\tau d}^{(3)}$ is nonsingular. Then the sub-matrix consisting of the $\tau(1)$th, $\tau(i_1)$th, ..., $\tau(i_d)$th rows of $X_1$ is nonsingular, which implies $\mathrm{Rank}(X_1) = d+1$. The contradiction implies $\mathrm{Rank}(A_{\tau d}^{(3)}) \leq d-1$. Then $|A_\tau^{(3)}| = 0$ and thus $|A_\tau| = 0$ for each $\tau$. Based on Theorem 2, $|F| \equiv 0$ and thus $f(\mathbf{p}) \equiv 0$. $\Box$

4 Locally D-optimal Approximate Design

A D-optimal approximate design is an allocation $\mathbf{p} = (p_1, \ldots, p_m)^T$ solving the optimization problem (10). The solution always exists since $f$ is continuous and the set of feasible allocations
$$S := \left\{ (p_1, p_2, \ldots, p_m)^T \in \mathbb{R}^m \mid p_i \geq 0,\ i = 1, \ldots, m;\ \sum_{i=1}^{m} p_i = 1 \right\}$$
is convex and compact. Theorem 4 ascertains that a meaningful D-optimal approximate design problem requires the following assumption. We assume that it is true for the rest of the paper.

Assumption 3. $\mathrm{Rank}(X_1) = d+1$.
Under Assumption 3, the set of nontrivial allocations
$$S_+ := \{\mathbf{p} = (p_1, p_2, \ldots, p_m)^T \in S \mid f(\mathbf{p}) > 0\}$$
is nonempty. As discussed in Remark 1, the Fisher information matrix $F = \sum_{i=1}^{m} n_i A_i$ (see Theorem 1) is always positive semi-definite. Note that $f(\mathbf{p}) = n^{1-d-J}|F|$ given $p_i = n_i/n$, $i = 1, \ldots, m$. Since $F = n \sum_{i=1}^{m} p_i A_i$ is linear in $\mathbf{p}$ and $\varphi(\cdot) = \log|\cdot|$ is concave on positive semi-definite matrices, we know that $f(\mathbf{p})$ is log-concave (Silvey, 1980).

Lemma 8. $F = F(\mathbf{p})$ is always positive semi-definite. It is positive definite if and only if $\mathbf{p} \in S_+$. Furthermore, $\log f(\mathbf{p})$ is concave on $S$.

Lemma 8 assures that $S_+$ is convex given that it is nonempty. Following the proof of Theorem 4, we can justify that $S_+$ contains all $\mathbf{p}$ whose coordinates are all strictly positive.

Theorem 5. $f(\mathbf{p}) > 0$ if and only if $\mathrm{Rank}(X_1[\{i \mid p_i > 0\}]) = d+1$, where $\mathbf{p} = (p_1, \ldots, p_m)^T \in S$ and $X_1[\{i \mid p_i > 0\}]$ is the sub-matrix consisting of the $\{i \mid p_i > 0\}$th rows of $X_1$. In other words,
$$S_+ = \left\{ \mathbf{p} = (p_1, p_2, \ldots, p_m)^T \in S \mid \mathrm{Rank}(X_1[\{i \mid p_i > 0\}]) = d+1 \right\}$$

Proof of Theorem 5: Combining Theorem 1 and Theorem 4, it is straightforward that $f(\mathbf{p}) = 0$ if $\mathrm{Rank}(X_1[\{i \mid p_i > 0\}]) \leq d$. We only need to show that $f(\mathbf{p}) > 0$ if $\mathrm{Rank}(X_1[\{i \mid p_i > 0\}]) = d+1$. Due to Theorem 1, we only need to verify the case $p_i > 0$, $i = 1, \ldots, m$ (otherwise, we may simply remove all support points with $p_i = 0$). Suppose $p_i > 0$, $i = 1, \ldots, m$, and $\mathrm{Rank}(X_1) = d+1$. Then there exist $i_0, \ldots, i_d \in \{1, \ldots, m\}$ such that $|X_1[i_0, \ldots, i_d]| \neq 0$. According to the proof of Theorem 4, for each $i \in \{i_0, \ldots, i_d\}$, there exists an $\epsilon_i \in (0, 1)$ such that $f(\mathbf{p}) > 0$ as long as $p_i = x \in (1-\epsilon_i, 1)$ and $p_j = (1-x)/(m-1)$ for $j \neq i$. On the other hand, for each $i \notin \{i_0, \ldots, i_d\}$, if we denote the $j$th row of $X_1$ by $\alpha_j$, $j = 1, \ldots, m$, then $\alpha_i = a_0\alpha_{i_0} + \cdots + a_d\alpha_{i_d}$ for some real numbers $a_0, \ldots, a_d$. Since $\alpha_i \neq 0$, at least one $a_j \neq 0$. Without any loss of generality, we assume $a_0 \neq 0$. Then it can be verified that $|X_1[i, i_1, \ldots, i_d]| \neq 0$ too.
Following the proof of Theorem 4 again, for such an $i \notin \{i_0, \ldots, i_d\}$, there also exists an $\epsilon_i \in (0, 1)$ such that $f(\mathbf{p}) > 0$ as long as $p_i = x \in (1-\epsilon_i, 1)$ and $p_j = (1-x)/(m-1)$ for $j \neq i$. Let $\epsilon^* = \min\{\min_i \epsilon_i,\ (m-1)\min_i p_i,\ 1 - 1/m\}/2$. For $i = 1, \ldots, m$, denote $\delta_i = (\delta_{i1}, \ldots, \delta_{im})^T \in S$ with $\delta_{ii} = 1 - \epsilon^*$ and $\delta_{ij} = \epsilon^*/(m-1)$ for $j \neq i$. It can be verified that $\mathbf{p} = a_1\delta_1 + \cdots + a_m\delta_m$ with $a_i = (p_i - \epsilon^*/(m-1))/(1 - m\epsilon^*/(m-1))$. By the choice of $\epsilon^*$, $f(\delta_i) > 0$, $a_i > 0$, $i = 1, \ldots, m$, and $\sum_i a_i = 1$. Then $f(\mathbf{p}) > 0$ according to Lemma 8. $\Box$
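Theorem 5 reduces positivity of $f(\mathbf{p})$ to a rank check on the support rows of $X_1$. A minimal Python sketch of this check (our own, with a hypothetical numerical tolerance `tol`):

```python
import numpy as np

def is_nondegenerate(X, p, tol=1e-9):
    """f(p) > 0 iff the rows of X1 = (1 X) with p_i > 0 have rank d + 1 (Theorem 5)."""
    X = np.asarray(X, float)
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend the intercept column
    support = np.asarray(p) > tol                   # support points of the allocation
    return np.linalg.matrix_rank(X1[support]) == X.shape[1] + 1

# 2^2 factorial design (d = 2): any d + 1 = 3 support points give rank d + 1
X = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
print(is_nondegenerate(X, [0.45, 0.45, 0.0, 0.10]))   # minimally supported: True
print(is_nondegenerate(X, [0.5, 0.5, 0.0, 0.0]))      # only two points: False
```

The second allocation fails because two rows of $X_1$ can have rank at most 2, below $d+1 = 3$; this matches Theorem 3's lower bound of $d+1$ support points.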
Corollary 1. Under Assumption 3, $f(\mathbf{p}) > 0$ if $\mathbf{p} = (p_1, \ldots, p_m)^T \in S$ satisfies $p_i > 0$, $i = 1, \ldots, m$. As a special case, $f(\mathbf{p}_u) > 0$, where $\mathbf{p}_u = (1/m, \ldots, 1/m)^T$ is the uniform allocation.

Corollary 2. $|F| > 0$ if and only if $\mathrm{Rank}(X_1[\{i \mid n_i > 0\}]) = d+1$.

Since $f(\mathbf{p})$ is log-concave, the Karush-Kuhn-Tucker conditions (Karush (1939); Kuhn and Tucker (1951)) are also sufficient for $\mathbf{p}$ to be D-optimal. We have the following theorem as a direct conclusion.

Theorem 6. Suppose $\mathbf{p}^* = (p_1^*, \ldots, p_m^*)^T \in S_+$. Then $\mathbf{p}^*$ is D-optimal if and only if there exists a $\lambda \in \mathbb{R}$ such that for $i = 1, \ldots, m$, either $\partial f(\mathbf{p}^*)/\partial p_i = \lambda$ if $p_i^* > 0$, or $\partial f(\mathbf{p}^*)/\partial p_i \leq \lambda$ if $p_i^* = 0$.

Theorem 6 provides a Karush-Kuhn-Tucker type condition. It is especially useful for checking whether a minimally supported design is D-optimal (see Section 6). Another necessary and sufficient condition for D-optimal designs is of the general-equivalence-theorem type (Kiefer, 1974; Pukelsheim, 1993; Atkinson et al., 2007; Stufken and Yang, 2012; Fedorov and Leonov, 2014; Yang, Mandal and Majumdar, 2014). It is more convenient when searching for numerical solutions. Following Yang, Mandal and Majumdar (2014), for given $\mathbf{p} = (p_1, \ldots, p_m)^T \in S_+$ and $i \in \{1, \ldots, m\}$, we define
$$f_i(z) = f\left( \frac{1-z}{1-p_i} p_1, \ldots, \frac{1-z}{1-p_i} p_{i-1},\ z,\ \frac{1-z}{1-p_i} p_{i+1}, \ldots, \frac{1-z}{1-p_i} p_m \right) \qquad (11)$$
with $0 \leq z \leq 1$. Note that $f_i(z)$ is well defined as long as $p_i < 1$. Suppose $f(\mathbf{p}) > 0$. Following the proof of Theorem 4, we obtain the following theorem on the coefficients of $f_i(z)$.

Theorem 7. Suppose $\mathbf{p} = (p_1, \ldots, p_m)^T \in S_+$. Given $i \in \{1, \ldots, m\}$, for $0 \leq z \leq 1$,
$$f_i(z) = (1-z)^d \sum_{j=0}^{J-1} a_j z^j (1-z)^{J-1-j} \qquad (12)$$
where $a_0 = f_i(0)$, $(a_{J-1}, \ldots, a_1)^T = B_{J-1}^{-1}\mathbf{c}$, $B_{J-1} = (s^{t-1})_{st}$ is a $(J-1) \times (J-1)$ matrix, and $\mathbf{c} = (c_1, \ldots, c_{J-1})^T$ with $c_j = (j+1)^{d+J-1} j^{-d} f_i\left(\frac{1}{j+1}\right) - j^{J-1} f_i(0)$, $j = 1, \ldots, J-1$.
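The coefficient recovery in Theorem 7 can be verified numerically. In the sketch below (our own; the coefficient values are arbitrary), we build a polynomial of the form (12) with known coefficients, evaluate it at $0, 1/2, \ldots, 1/J$, and recover $(a_{J-1}, \ldots, a_1)$ through the linear system with matrix $B_{J-1}$:

```python
import numpy as np

d, J = 2, 4
rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, J)            # a_0, ..., a_{J-1}, chosen arbitrarily

def f_i(z):
    """Polynomial of form (12) with known coefficients a_j."""
    return (1 - z)**d * sum(a_j * z**j * (1 - z)**(J - 1 - j)
                            for j, a_j in enumerate(a))

# Theorem 7: a_0 = f_i(0), and (a_{J-1}, ..., a_1)^T = B_{J-1}^{-1} c with
# c_j = (j+1)^{d+J-1} j^{-d} f_i(1/(j+1)) - j^{J-1} f_i(0)
B = np.array([[s**(t - 1) for t in range(1, J)] for s in range(1, J)], float)
c = np.array([(j + 1)**(d + J - 1) * float(j)**(-d) * f_i(1 / (j + 1))
              - j**(J - 1) * f_i(0) for j in range(1, J)])
a_rec = np.linalg.solve(B, c)[::-1]     # reversed to the order (a_1, ..., a_{J-1})

assert np.isclose(f_i(0), a[0])
assert np.allclose(a_rec, a[1:])
```

Only $J$ evaluations of $f_i$ (that is, $J$ determinants via (11)) are needed, which is the computational point of Theorem 7.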
According to Theorem 7, $f_i(z)$ is an order-$(d+J-1)$ polynomial of $z$. In order to determine its coefficients $a_0, a_1, \ldots, a_{J-1}$ as in (12), we need to calculate $f_i(0), f_i(1/2), f_i(1/3), \ldots, f_i(1/J)$, which are $J$ determinants defined in (11). Note that $B_{J-1}^{-1}$ is a matrix determined by $J-1$ only. For example, $B_1^{-1} = 1$ for $J = 2$,
$$B_2^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}, \quad B_3^{-1} = \frac{1}{2}\begin{pmatrix} 6 & -6 & 2 \\ -5 & 8 & -3 \\ 1 & -2 & 1 \end{pmatrix}$$
for $J = 3$ or $4$, respectively, and similarly for $J = 5$. Once $a_0, \ldots, a_{J-1}$ in (12) are determined, the maximization of $f_i(z)$ on $z \in [0, 1]$ is numerically straightforward since it is a polynomial and its derivative is given by
$$f_i'(z) = (1-z)^d \sum_{j=1}^{J-1} j a_j z^{j-1} (1-z)^{J-1-j} - (1-z)^{d-1} \sum_{j=0}^{J-1} (d+J-1-j) a_j z^j (1-z)^{J-1-j} \qquad (13)$$
Following the proofs of Theorem 3.1.1 and Theorem 3.3.3 and the lift-one algorithm in Yang, Mandal and Majumdar (2014), we have similar results and an algorithm as follows:

Theorem 8. Suppose $\mathbf{p}^* = (p_1^*, \ldots, p_m^*)^T \in S_+$. Then $\mathbf{p}^*$ is D-optimal if and only if for each $i = 1, \ldots, m$, $f_i(z)$, $0 \leq z \leq 1$, attains its maximum at $z = p_i^*$.

Lift-one algorithm:

1. Start with an arbitrary $\mathbf{p}_0 = (p_1, \ldots, p_m)^T$ satisfying $0 < p_i < 1$, $i = 1, \ldots, m$, and compute $f(\mathbf{p}_0)$.

2. Set up a random order of $i$ going through $\{1, 2, \ldots, m\}$.

3. Following the random order of $i$ in step 2, for each $i$, determine $f_i(z)$ according to Theorem 7. In this step, the $J$ determinants $f_i(0), f_i(1/2), f_i(1/3), \ldots, f_i(1/J)$ are calculated based on (11).

4. Use a quasi-Newton method with the gradient defined in (13) to find $z^*$ maximizing $f_i(z)$ with $0 \leq z \leq 1$. If $f_i(z^*) \leq f_i(0)$, let $z^* = 0$. Define $\mathbf{p}^*_{(i)} = \left( \frac{1-z^*}{1-p_i} p_1, \ldots, \frac{1-z^*}{1-p_i} p_{i-1},\ z^*,\ \frac{1-z^*}{1-p_i} p_{i+1}, \ldots, \frac{1-z^*}{1-p_i} p_m \right)^T$. Note that $f(\mathbf{p}^*_{(i)}) = f_i(z^*)$.
5. Replace $\mathbf{p}_0$ with $\mathbf{p}^*_{(i)}$ and $f(\mathbf{p}_0)$ with $f(\mathbf{p}^*_{(i)})$.

6. Repeat steps 2-5 until convergence, that is, until $f(\mathbf{p}_0) = f(\mathbf{p}^*_{(i)})$ for each $i$.

Theorem 9. When the lift-one algorithm converges, the resulting allocation $\mathbf{p}$ maximizes $f(\mathbf{p})$ on the set of feasible allocations $S$.

Example 4. Odor removal study. The motivating example mentioned in the Introduction is the odor removal study conducted at the University of Georgia. The scientists study the manufacture of bio-plastics from algae that contain odorous volatiles. These odorous volatiles, generated from algae bio-plastics, either occur naturally within the algae or are generated through the thermoplastic processing due to heat and pressure. In order to commercialize these algae bio-plastics, the odor-causing volatiles must be removed. For that purpose, a $2^2$ factorial experiment was conducted using algae and synthetic plastic resin blends. The two factors were type of algae ($X_1$: raffinated or solvent-extracted algae ($-$), catfish pond algae ($+$)) and synthetic resin ($X_2$: polyethylene ($-$), polypropylene ($+$)). The responses had three categories: serious odor ($j = 1$), medium odor ($j = 2$), and almost no odor ($j = 3$). The results of a pilot study with a uniform design and ten replicates at each experimental setting are given in Table 1.

Table 1: Odor Removal Study

Group    $X_1$  $X_2$  Responses $y_{i1}, y_{i2}, y_{i3}$   # of replicates                      Model
$i = 1$   $+$    $+$                                        $n_1 = \sum_j y_{1j} = 10$           $g(\gamma_{1j}) = \theta_j - \beta_1 - \beta_2$
$i = 2$   $+$    $-$                                        $n_2 = \sum_j y_{2j} = 10$           $g(\gamma_{2j}) = \theta_j - \beta_1 + \beta_2$
$i = 3$   $-$    $+$                                        $n_3 = \sum_j y_{3j} = 10$           $g(\gamma_{3j}) = \theta_j + \beta_1 - \beta_2$
$i = 4$   $-$    $-$                                        $n_4 = \sum_j y_{4j} = 10$           $g(\gamma_{4j}) = \theta_j + \beta_1 + \beta_2$

We consider the logit link and fit the cumulative link model presented in Table 1. The estimated values of the model parameters are $(\hat\beta_1, \hat\beta_2, \hat\theta_1, \hat\theta_2) = (-2.45, -1.09, -2.67, -0.21)$. Suppose a follow-up experiment is planned and the estimated parameter values are regarded as the true values. Then the D-optimal approximate allocation found by the lift-one algorithm is $\mathbf{p}_o = (0.4454,\ \cdot\ , 0,\ \cdot\ )^T$.
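The lift-one updates above can be sketched in Python. This is our own simplified illustration, not the authors' implementation: it builds each $A_i$ from Theorem 1 for the logit link and, in place of the polynomial representation (12) and the quasi-Newton step, maximizes $f_i(z)$ over a grid of candidate $z$ values, which is slower but easier to follow. The parameter values are those fitted in Example 4; the row order and sign convention of `X` are our assumptions.

```python
import numpy as np

def A_matrix(theta, beta, x):
    """A_i of Theorem 1 at design point x, logit link."""
    d, J = len(beta), len(theta) + 1
    gamma = 1 / (1 + np.exp(-(np.asarray(theta) - np.asarray(x) @ beta)))
    pi = np.diff(np.concatenate([[0.0], gamma, [1.0]]))       # pi_1, ..., pi_J
    g = np.concatenate([[0.0], gamma * (1 - gamma), [0.0]])   # g_0, ..., g_J
    u = [g[t]**2 * (1/pi[t-1] + 1/pi[t]) for t in range(1, J)]          # (4)
    b = [g[t-1] * g[t] / pi[t-1] for t in range(2, J)]                  # (5)
    c = np.array([g[t] * ((g[t] - g[t-1])/pi[t-1] - (g[t+1] - g[t])/pi[t])
                  for t in range(1, J)])                                # (3)
    e = np.sum(np.diff(g)**2 / pi)                                      # (2)
    A = np.zeros((d + J - 1, d + J - 1))
    A[:d, :d] = e * np.outer(x, x)
    A[:d, d:] = -np.outer(x, c)
    A[d:, :d] = A[:d, d:].T
    A[d:, d:] = np.diag(u) - np.diag(b, 1) - np.diag(b, -1)
    return A

def lift_one(As, n_pass=15, n_grid=201, seed=0):
    m = len(As)
    f = lambda p: np.linalg.det(sum(w * A for w, A in zip(p, As)))
    p = np.full(m, 1.0 / m)                              # step 1: start uniform
    zs = np.linspace(0.0, 1.0 - 1e-9, n_grid)
    rng = np.random.default_rng(seed)
    for _ in range(n_pass):
        for i in rng.permutation(m):                     # step 2
            def f_i(z):                                  # eq. (11)
                q = p * ((1 - z) / (1 - p[i])); q[i] = z
                return f(q)
            cand = np.append(zs, p[i])                   # keep current value reachable
            z_star = cand[np.argmax([f_i(z) for z in cand])]   # steps 3-4 (grid search)
            p = p * ((1 - z_star) / (1 - p[i])); p[i] = z_star  # step 5
    return p / p.sum()

theta, beta = np.array([-2.67, -0.21]), np.array([-2.45, -1.09])
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], float)
As = [A_matrix(theta, beta, x) for x in X]
p_opt = lift_one(As)
```

Because the current $p_i$ is always among the candidates, every update is monotone in $f$, mimicking steps 4-6 of the algorithm; for these parameter values the returned allocation is minimally supported, in line with the $\mathbf{p}_o$ reported above.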
With respect to p_o, the relative efficiency of the uniform approximate allocation p_u = (1/4, 1/4, 1/4, 1/4)^T is (f(p_u)/f(p_o))^{1/4} = 79.6%, which is far from satisfactory.

In all examples that we studied, the lift-one algorithm converged very fast. Nevertheless, Yang, Mandal and Majumdar (2014) also provided a modified lift-one algorithm, which is slightly slower but guaranteed to converge. The same technique could easily be applied to the lift-one algorithm above if it does not converge within a pre-specified number of iterations.

5 Locally D-optimal Exact Design

A locally D-optimal exact design is an integer-valued allocation n = (n_1, ..., n_m)^T maximizing |F| given the total number n of experimental units or runs, where the n_i's are nonnegative integers satisfying Σ_{i=1}^m n_i = n. According to Corollary 2, we must have n ≥ d + 1 in order to make |F| > 0 possible. Thus we assume n ≥ d + 1 in this section to avoid trivial cases.

To maximize f(n) = f(n_1, ..., n_m) = |F|, we adopt the idea of the exchange algorithm, which was first suggested by Fedorov (1972). Following the algorithm described in Yang, Mandal and Majumdar (2014), the exchange algorithm here adjusts n_i and n_j simultaneously for a randomly chosen index pair (i, j) while keeping n_i + n_j = c constant. We start with an n = (n_1, ..., n_m)^T satisfying f(n) > 0; according to Corollary 2, this implies Rank(X_1[{i | n_i > 0}]) = d + 1. Following Yang, Mandal and Majumdar (2014), for 1 ≤ i < j ≤ m, we define

f_ij(z) = f(n_1, ..., n_{i−1}, z, n_{i+1}, ..., n_{j−1}, c − z, n_{j+1}, ..., n_m),   (14)

where c = n_i + n_j and z = 0, 1, ..., c. Note that f_ij(n_i) = f(n). As a conclusion of Theorem 2 and Lemmas 5 and 6, we have the following formula for calculating f_ij(z):

Theorem 10. Suppose n = (n_1, ..., n_m)^T satisfies f(n) > 0. Given 1 ≤ i < j ≤ m, suppose n_i + n_j ≥ J. For z = 0, 1, ..., n_i + n_j,

f_ij(z) = Σ_{s=0}^{J} c_s z^s,   (15)
where c_0 = f_ij(0), and c_1, ..., c_J can be obtained by (c_1, ..., c_J)^T = B_J^{−1}(d_1, ..., d_J)^T, with B_J = (s^{t−1})_{st} a J × J matrix and d_s = (f_ij(s) − f_ij(0))/s, s = 1, ..., J.

Note that the J × J matrix B_J in Theorem 10 shares the same form as B_{J−1} in Theorem 7. According to Theorem 10, in order to maximize f_ij(z) over z = 0, 1, ..., n_i + n_j, one can obtain the exact polynomial form of f_ij(z) by calculating f_ij(0), f_ij(1), ..., f_ij(J). There is no practical need to find the exact form of f_ij(z) if n_i + n_j < J, since one may simply calculate f_ij(z) for each z = 0, 1, ..., n_i + n_j. Following Yang, Mandal and Majumdar (2014), the algorithm below, based on Theorem 10, can be used to find a D-optimal exact allocation.

Exchange algorithm for a D-optimal allocation (n_1, ..., n_m)^T given n > 0:

1. Start with an initial design n = (n_1, ..., n_m)^T such that f(n) > 0.

2. Set up a random order of (i, j) going through all pairs {(1, 2), (1, 3), ..., (1, m), (2, 3), ..., (m − 1, m)}.

3. For each (i, j), let c = n_i + n_j. If c = 0, let n_ij* = n. Otherwise, there are two cases. Case one: 0 < c ≤ J; we calculate f_ij(z) as defined in (14) for z = 0, 1, ..., c directly and find z_* maximizing f_ij(z). Case two: c > J; we first calculate f_ij(z) for z = 0, 1, ..., J; secondly, determine c_0, c_1, ..., c_J in (15) according to Theorem 10; thirdly, calculate f_ij(z) for z = J + 1, ..., c based on (15); fourthly, find z_* maximizing f_ij(z) over z = 0, ..., c. For both cases, we define

n_ij* = (n_1, ..., n_{i−1}, z_*, n_{i+1}, ..., n_{j−1}, c − z_*, n_{j+1}, ..., n_m)^T.

Note that f(n_ij*) = f_ij(z_*) ≥ f(n) > 0. If f(n_ij*) > f(n), replace n with n_ij* and f(n) with f(n_ij*).

4. Repeat steps 2–3 until convergence, that is, until f(n_ij*) = f(n) in step 3 for every (i, j).

Example 4 (continued). Odor Removal Study. Suppose we want to conduct a follow-up experiment with n runs. Using the exchange algorithm described above, we obtain the D-optimal exact designs listed in Table 2.
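The exchange step can be sketched generically as follows. This is a hypothetical simplification, not the paper's implementation: the objective f over integer allocations and the starting design are supplied by the user, pairs are visited in a fixed rather than random order, and every split z = 0, ..., c is enumerated directly instead of using the degree-J polynomial shortcut (15).

```python
from itertools import combinations

def exchange(f, n0):
    """Exchange algorithm: for each pair (i, j), redistribute c = n_i + n_j
    over the two coordinates and keep the best split, until no pair improves."""
    n = list(n0)
    best = f(n)
    assert best > 0, "need a starting design with f(n) > 0"
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(n)), 2):
            c = n[i] + n[j]
            if c == 0:  # nothing to redistribute for this pair
                continue
            for z in range(c + 1):  # evaluate f_ij(z) for z = 0, 1, ..., c
                cand = list(n)
                cand[i], cand[j] = z, c - z
                val = f(cand)
                if val > best:
                    n, best = cand, val
                    improved = True
    return n, best
```

For a toy objective f(n) = n_1 n_2 n_3 (n_1 + n_2 + n_3) with n = 10 runs, the sketch recovers a balanced allocation such as (4, 3, 3), as symmetry suggests.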
It can be seen from the number of iterations that the algorithms for D-optimal exact and approximate designs converge very quickly.

Table 2: D-optimal Exact Designs and Approximate Design for Odor Removal Study (columns: n; the allocations n_1, n_2, n_3, n_4; n^{−4}|F|; number of iterations; time in seconds; the last row gives the approximate allocation p_o)

As expected, the D-optimal exact allocations (n_1, ..., n_4)^T are consistent with the D-optimal approximate allocation p_o = (p_1, ..., p_4)^T (last row of Table 2) for large n. The time costs in seconds (last column of Table 2) are recorded on a PC with a 2GHz CPU and 8GB memory. Suppose we rerun a design with n = 40. With respect to the D-optimal exact design n_o = (18, 11, 0, 11)^T, the relative efficiency of the uniform exact design n_u = (10, 10, 10, 10)^T is only (f(n_u)/f(n_o))^{1/4} = 79.7%.

6 Minimally Supported Designs

A minimally supported design is a design with the minimal number of support/design points while keeping |F| > 0. It is of practical significance since it indicates the minimal number of different experimental settings needed in the experiment. According to Theorem 3, a minimally supported design contains at least d + 1 support points. Note that the minimal number d + 1 does not depend on J and can be strictly smaller than the number of parameters d + J − 1. On the other hand, according to Theorem 4, a minimally supported design can contain exactly d + 1 support points as long as the extended design matrix X_1 = (1 X) is of full rank, that is, Rank(X_1) = d + 1.

Example 5. Suppose J = 2. The multinomial response is actually binomial. In this case, there are d + 1 parameters, θ_1, β_1, ..., β_d. Consider a general link function satisfying Assumptions 1 and 2. For i = 1, ..., m, g_{i0} = g_{i2} = 0, g_{i1} = (g^{−1})′(θ_1 − x_i^T β) > 0, and e_i = u_{i1} = c_{i1} = g_{i1}^2/[π_{i1}(1 − π_{i1})]. Then A_{i3}
in Theorem 1 contains only one entry, u_{i1}, and thus |A_{i3}| = u_{i1}, or simply e_i (Lemma 3 still holds). Assume that the m × d design matrix X satisfies Assumption 3. According to Theorem 2, Lemma 5, Lemma 6, and Lemma 7, for an approximate design p = (p_1, ..., p_m)^T,

f(p) = n^{−(d+1)}|F| = Σ_{1 ≤ i_0 < i_1 < ··· < i_d ≤ m} |X_1[i_0, i_1, ..., i_d]|^2 p_{i_0}e_{i_0} p_{i_1}e_{i_1} ··· p_{i_d}e_{i_d}.   (16)

It can be verified that equation (16) is essentially the same as Lemma 3.1 in Yang and Mandal (2014). According to Theorem 3.2 in Yang and Mandal (2014), a minimally supported design may contain d + 1 support points, and a D-optimal one must keep equal weight 1/(d + 1) on all support points.

For univariate responses (including binomial responses) under generalized linear models, a minimally supported design must keep equal weights on all its support points in order to retain D-optimality (Yang, Mandal and Majumdar, 2014; Yang and Mandal, 2014). However, for multinomial responses with J ≥ 3, this is usually not the case. In the rest of this section, we use the cases of d = 1 and d = 2 as illustrations.

6.1 Minimally supported designs with d = 1 and J ≥ 3

In this subsection, we consider the cases with d = 1 and J ≥ 3. That is, there is only one factor in the experiment, and the response belongs to one of J ≥ 3 categories. The corresponding parameters are β_1 and θ_1, ..., θ_{J−1}. We first set m = 2, that is, a design with only two support points (minimally supported). As a direct conclusion from Theorem 2, Lemma 5, and Lemma 6, we have the following result on the form of |F| for an approximate design p = (p_1, p_2)^T:

Theorem 11. Suppose d = 1, J ≥ 3, and m = 2. The objective function for a D-optimal approximate design is

f(p_1, p_2) = n^{−J}|F| = Σ_{s=1}^{J−1} c_s p_1^{J−s} p_2^s,   (17)

where c_1, ..., c_{J−1} can be obtained by (c_1, ..., c_{J−1})^T = B_{J−1}^{−1}(d_1, ..., d_{J−1})^T, with B_{J−1} = (s^{t−1})_{st} a (J − 1) × (J − 1) matrix and d_s = f(1/(s+1), s/(s+1)) · (s+1)^J/s, s = 1, ..., J − 1.
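The recipe in Theorem 11 — evaluate f at J − 1 designs, then solve a linear system with the matrix B_{J−1} = (s^{t−1})_{st} — is easy to code. A sketch, with a hypothetical objective f(p_1, p_2) supplied by the user and a plain Gaussian elimination standing in for a library solver:

```python
def gauss_solve(B, d):
    """Solve B x = d by Gaussian elimination with partial pivoting."""
    n = len(d)
    M = [row[:] + [di] for row, di in zip(B, d)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            t = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= t * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def recover_coeffs(f, J):
    """Recover c_1, ..., c_{J-1} in f(p1, p2) = sum_s c_s p1^(J-s) p2^s
    from J - 1 evaluations of f, following Theorem 11."""
    d = [f(1.0 / (s + 1), s / (s + 1.0)) * (s + 1) ** J / s for s in range(1, J)]
    B = [[float(s ** t) for t in range(J - 1)] for s in range(1, J)]  # (s^{t-1})_{st}
    return gauss_solve(B, d)
```

With J = 4 and the hypothetical polynomial f(p_1, p_2) = 2p_1^3 p_2 + 5p_1^2 p_2^2 + p_1 p_2^3, the recovered coefficients are (2, 5, 1), as expected.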
Actually, there is another way to calculate c_1, ..., c_{J−1} in equation (17). For example, according to Lemma 7,

c_1 = e_2 (Π_{s=1}^{J−1} g_{1s}^2)(Π_{t=1}^{J} π_{1t}^{−1})(x_1 − x_2)^2,
c_{J−1} = e_1 (Π_{s=1}^{J−1} g_{2s}^2)(Π_{t=1}^{J} π_{2t}^{−1})(x_1 − x_2)^2,

where x_1, x_2 are the two levels of the only factor. Nevertheless, Theorem 11 provides a practically convenient way to find the exact form of the objective function after calculating |F| for J − 1 different designs. The D-optimal problem is then to maximize an order-J polynomial (f(z, 1 − z) for z ∈ [0, 1]), which is numerically straightforward.

As a special case which can be solved explicitly, we set J = 3 and get the following result as a direct conclusion of Theorem 6 and Theorem 11.

Corollary 3. Suppose d = 1, J = 3, and m = 2. The objective function for a D-optimal approximate design is

f(p_1, p_2) = p_1 p_2 (c_1 p_1 + c_2 p_2),   (18)

where c_1 = e_2 g_{11}^2 g_{12}^2 (π_{11}π_{12}π_{13})^{−1}(x_1 − x_2)^2 > 0, c_2 = e_1 g_{21}^2 g_{22}^2 (π_{21}π_{22}π_{23})^{−1}(x_1 − x_2)^2 > 0, and x_1, x_2 are the two levels of the factor. The D-optimal design p_* = (p_1*, p_2*) which maximizes (18) can be obtained as follows:

p_1* = (c_1 − c_2 + √(c_1^2 − c_1c_2 + c_2^2)) / (2c_1 − c_2 + √(c_1^2 − c_1c_2 + c_2^2)),
p_2* = c_1 / (2c_1 − c_2 + √(c_1^2 − c_1c_2 + c_2^2)).   (19)

Furthermore, p_1* = p_2* = 1/2 if and only if c_1 = c_2.

For the case of (d, J, m) = (1, 3, 2), it can be verified that the D-optimal design satisfies p_1* = p_2* = 1/2 if β_1 = 0. However, p_1* ≠ p_2* in general, and p_1* > p_2* if and only if c_1 > c_2, where c_1, c_2 are defined as in Corollary 3. The following result provides a necessary and sufficient condition for a minimally supported design to be D-optimal in the case of d = 1 and J = 3. Its proof is relegated to the supplementary materials.

Corollary 4. Suppose d = 1, J = 3, and m ≥ 3. Let x_1, ..., x_m denote the m distinct levels of the factor.
A minimally supported design p = (p_1, p_2, 0, ..., 0)^T is D-optimal if and only if (1) p_1, p_2 are defined the same as in (19); and (2) for i = 3, ..., m,

s_{i3}(p_1*)^2 + (s_{i5} − 2c_1)p_1*p_2* + (s_{i4} − c_2)(p_2*)^2 ≤ 0,

where c_1, c_2 are the same as in Corollary 3, and

s_{i3} = e_i g_{11}^2 g_{12}^2 (π_{11}π_{12}π_{13})^{−1}(x_1 − x_i)^2 > 0,
s_{i4} = e_i g_{21}^2 g_{22}^2 (π_{21}π_{22}π_{23})^{−1}(x_2 − x_i)^2 > 0,
s_{i5} = e_1(u_{22}u_{i1} + u_{21}u_{i2} − 2b_{22}b_{i2})(x_1 − x_2)(x_1 − x_i) + e_2(u_{12}u_{i1} + u_{11}u_{i2} − 2b_{12}b_{i2})(x_2 − x_1)(x_2 − x_i) + e_i(u_{12}u_{21} + u_{11}u_{22} − 2b_{12}b_{22})(x_i − x_1)(x_i − x_2).
Figure 1: Regions for a two-point design to be D-optimal with d = 1, J = 3, x ∈ {−1, 0, 1}, and the logit link (note that θ_1 < θ_2 is required); panel (a) fixes β = 2, and panel (b) fixes θ_2 = 5.

Example 6. Suppose d = 1, J = 3, and m = 3, with three factor levels {−1, 0, 1}. Under the logit link g(γ) = log(γ/(1 − γ)), there are three parameters β, θ_1, θ_2 satisfying

g(γ_{1j}) = θ_j + β,  g(γ_{2j}) = θ_j,  g(γ_{3j}) = θ_j − β,  j = 1, 2.

It can be verified that the D-optimal design satisfies p_1 = p_3 = 1/2 if β = 0. Figure 1 shows cases with more general parameter values. In Figure 1(a), four regions in the (θ_1, θ_2)-plane are occupied by minimally supported designs (note that θ_1 < θ_2 is required). For example, a region labeled p_2 = 0 indicates that a minimally supported design satisfying p_2 = 0 is D-optimal for such a triple (θ_1, θ_2, β = 2). From Figure 1(b), one can see clearly that a design supported on {−1, 1} (that is, with p_2 = 0) is D-optimal if β is not far away from 0.

6.2 Minimally supported designs with d = 2 and J = 3

In this subsection, we consider experiments with two factors and three categories. The corresponding parameters are β_1, β_2, θ_1, θ_2. For cases with more than three categories, similar conclusions could be obtained accordingly, but with messier notation. According to Theorem 3, a minimally supported design in this case needs three support points, for example, (x_{i1}, x_{i2}), i = 1, 2, 3. Under Assumption 3, |X_1| ≠ 0, where X_1 = (1 X) is defined as in Lemma 7. In this case, X_1 is
a 3 × 3 matrix. Following Theorem 2 and Lemmas 5, 6, and 7, the objective function for a minimally supported design at (d, J, m) = (2, 3, 3) is

f(p_1, p_2, p_3) = |X_1|^2 e_1 e_2 e_3 p_1 p_2 p_3 (w_1 p_1 + w_2 p_2 + w_3 p_3),   (20)

where w_i = e_i^{−1} g_{i1}^2 g_{i2}^2 (π_{i1}π_{i2}π_{i3})^{−1} > 0, i = 1, 2, 3.

We first solve for the D-optimal design p_* = (p_1*, p_2*, p_3*)^T maximizing f(p_1, p_2, p_3) in (20), or equivalently maximizing p_1p_2p_3(p_1w_1 + p_2w_2 + p_3w_3). Since f(p_1, p_2, p_3) = 0 if p_1p_2p_3 = 0, a D-optimal p_* = (p_1*, p_2*, p_3*)^T maximizing f(p_1, p_2, p_3) must satisfy 0 < p_1*, p_2*, p_3* < 1. As a direct conclusion of Theorem 6, a necessary condition for (p_1, p_2, p_3) to maximize f(p_1, p_2, p_3) is

∂f/∂p_1 = ∂f/∂p_2 = ∂f/∂p_3.   (21)

Following Tong et al. (2014), we are able to find analytic solutions maximizing equation (20).

Theorem 12. Without any loss of generality, we assume 0 < w_3 ≤ w_2 ≤ w_1. The D-optimal allocation p_* = (p_1*, p_2*, p_3*)^T maximizing f(p_1, p_2, p_3) in (20) exists and is unique. It satisfies 0 < p_3* ≤ p_2* ≤ p_1* < 1 and can be obtained analytically as follows:

(i) If w_2 = w_3, then p_1* = Δ_1/(4w_1 + Δ_1) and p_2* = p_3* = 2w_1/(4w_1 + Δ_1), where Δ_1 = 2w_1 − 3w_2 + √(4w_1^2 − 4w_1w_2 + 9w_2^2). Note that a special case is p_1* = p_2* = p_3* = 1/3 if w_3 = w_2 = w_1.

(ii) If w_1 = w_2 ≠ w_3, then p_1* = p_2* = Δ_2/[2(Δ_2 + 2w_1)] and p_3* = 2w_1/(Δ_2 + 2w_1), where Δ_2 = 3w_1 − 2w_3 + √(9w_1^2 − 4w_1w_3 + 4w_3^2).

(iii) If 0 < w_3 < w_2 < w_1, then p_1* = y_1/(y_1 + y_2 + 1), p_2* = y_2/(y_1 + y_2 + 1), p_3* = 1/(y_1 + y_2 + 1), where

y_1 = −b_2/3 − (3b_1 − b_2^2)/(3A^{1/3}) + A^{1/3}/3,
y_2 = (w_1 − w_3)y_1 / [(w_2 − w_3) + (w_1 − w_2)y_1],

with A = (9b_1b_2 − 27b_0 − 2b_2^3)/2 + (3^{3/2}/2)(27b_0^2 + 4b_1^3 + 4b_0b_2^3 − 18b_0b_1b_2 − b_1^2b_2^2)^{1/2}, b_i = c_i/c_3, i = 0, 1, 2, and c_0 = w_3(w_2 − w_3) > 0, c_1 = 3w_1w_2 − w_1w_3 − 4w_2w_3 + 2w_3^2 > 0, c_2 = 2w_1^2 − 4w_1w_2 − w_1w_3 + 3w_2w_3, c_3 = w_1(w_2 − w_1) < 0.
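The case-(iii) formulas of Theorem 12 can be evaluated numerically as stated. The sketch below (function name hypothetical) uses the principal complex cube root of A, which also covers the situation where the underlying cubic has three real roots and A is complex; the stationarity condition (21) serves as a built-in check on the result.

```python
import cmath

def min_support_weights(w1, w2, w3):
    """D-optimal weights for a three-point design when 0 < w3 < w2 < w1
    (case (iii) of Theorem 12)."""
    c0 = w3 * (w2 - w3)
    c1 = 3*w1*w2 - w1*w3 - 4*w2*w3 + 2*w3**2
    c2 = 2*w1**2 - 4*w1*w2 - w1*w3 + 3*w2*w3
    c3 = w1 * (w2 - w1)
    b0, b1, b2 = c0 / c3, c1 / c3, c2 / c3
    # Cardano-type expression; cmath keeps the computation valid even when
    # the inner square root is of a negative number
    A = (9*b1*b2 - 27*b0 - 2*b2**3) / 2 + (3 * cmath.sqrt(3) / 2) * cmath.sqrt(
        27*b0**2 + 4*b1**3 + 4*b0*b2**3 - 18*b0*b1*b2 - b1**2 * b2**2)
    r = A ** (1.0 / 3.0)  # principal complex cube root
    y1 = (-b2 / 3 - (3*b1 - b2**2) / (3*r) + r / 3).real
    y2 = (w1 - w3) * y1 / ((w2 - w3) + (w1 - w2) * y1)
    s = y1 + y2 + 1.0
    return y1 / s, y2 / s, 1.0 / s
```

For example, with (w_1, w_2, w_3) = (3, 2, 1) the returned allocation is roughly (0.388, 0.328, 0.284): decreasing in the weights but clearly not uniform, and it equalizes the three partial derivatives in (21).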
The proof of Theorem 12 is relegated to the Appendix. A quick conclusion is that in this case a minimally supported design is usually not uniformly supported.

Corollary 5. Suppose d = 2, J = 3, and m = 3. Then p = (1/3, 1/3, 1/3)^T is D-optimal if and only if w_1 = w_2 = w_3, where w_1, w_2, w_3 are defined as in (20).

Example 7. Suppose d = 2, J = 3, and m = 4. Consider a typical 2 × 2 factorial design problem, that is, the four design points are (x_{i1}, x_{i2}) = (1, 1), (1, −1), (−1, 1), and (−1, −1) for i = 1, 2, 3, 4, respectively. Suppose the link function g is differentiable and strictly monotonic. Define w_i = e_i^{−1} g_{i1}^2 g_{i2}^2 (π_{i1}π_{i2}π_{i3})^{−1}, i = 1, 2, 3, 4.

(i) If β_1 = β_2 = 0, then w_1 = w_2 = w_3 = w_4.
(ii) If β_1 = 0 and β_2 ≠ 0, then w_1 = w_3 and w_2 = w_4, but w_1 ≠ w_2.
(iii) If β_1 ≠ 0 and β_2 = 0, then w_1 = w_2 and w_3 = w_4, but w_1 ≠ w_3.
(iv) If β_1 = β_2 ≠ 0, then w_2 = w_3, but w_1, w_2, w_4 are distinct.
(v) If β_1 = −β_2 ≠ 0, then w_1 = w_4, but w_1, w_2, w_3 are distinct.

Theorem 12 provides analytic forms of minimally supported designs with d = 2 and J = 3. As a direct conclusion of Theorem 6, the following corollary provides a necessary and sufficient condition for a minimally supported design to be D-optimal. Its proof is relegated to the supplementary materials.

Corollary 6. Suppose d = 2, J = 3, and m ≥ 4. Let (x_{i1}, x_{i2}), i = 1, ..., m, be the m distinct level combinations of the two factors. Let X_1 be the m × 3 matrix defined in Lemma 7.
Then a minimally supported design p = (p_1, p_2, p_3, 0, ..., 0)^T is D-optimal if and only if (1) p_1, p_2, p_3 are obtained according to Theorem 12; and (2) for i = 4, ..., m,

|X_1[1, 2, i]|^2 e_1e_2e_i p_1*p_2*(w_1p_1* + w_2p_2*) + |X_1[1, 3, i]|^2 e_1e_3e_i p_1*p_3*(w_1p_1* + w_3p_3*) + |X_1[2, 3, i]|^2 e_2e_3e_i p_2*p_3*(w_2p_2* + w_3p_3*) + D_i p_1*p_2*p_3* ≤ |X_1[1, 2, 3]|^2 e_1e_2e_3 p_2*p_3*(2w_1p_1* + w_2p_2* + w_3p_3*),

where e_j = u_{j1} + u_{j2} − 2b_{j2}, w_j = e_j^{−1} g_{j1}^2 g_{j2}^2 (π_{j1}π_{j2}π_{j3})^{−1}, j = 1, ..., m, and

D_i = Σ_{{j,k,s,t} = {1,2,3,i}} e_j e_k (u_{s1}u_{t2} + u_{s2}u_{t1} − 2b_{s2}b_{t2}) |X_1[j, k, s]| |X_1[j, k, t]|,

with the sum going through (j, k, s, t) = (1, 2, 3, i), (1, 3, 2, i), (1, i, 2, 3), (2, 3, 1, i), (2, i, 1, 3), (3, i, 1, 2).

Figure 2: Boundary lines for a three-point design to be D-optimal with the logit link. The region of (β_1, β_2) for given (θ_1, θ_2) is outside the boundary lines in panel (a); the region of (θ_1, θ_2) (with θ_1 < θ_2) for given (β_1, β_2) is between the boundary lines and the line θ_1 = θ_2 in panel (b).

Example 8. Suppose d = 2, J = 3, and m = 4, with the logit link function. We consider the typical 2 × 2 factorial design problem with four design points (1, 1), (1, −1), (−1, 1), and (−1, −1). According to Theorem 12 and Corollary 6, we can analytically calculate the best three-point design and determine whether or not it is D-optimal. Figure 2 provides the boundary lines of the regions of the parameters (β_1, β_2, θ_1, θ_2) for which the best three-point design is D-optimal. In particular, Figure 2(a) shows the region of (β_1, β_2) for given θ_1, θ_2. It clearly indicates that the best three-point design tends to be D-optimal when the absolute values of β_1 and β_2 are large. The region tends to be larger as the absolute values of θ_1 and θ_2 increase. On the other hand, Figure 2(b) displays the region of (θ_1, θ_2) for given β_1, β_2. The symmetry of the boundary lines about θ_1 + θ_2 = 0 is due to the logit link, which is symmetric about 0. An interesting conclusion based on Corollary 6 is that in this case a three-point design can never be D-optimal if β_1 = 0 or β_2 = 0.
7 EW D-optimal Design

The previous sections mainly focused on locally D-optimal designs, which require assumed parameter values (β_1, ..., β_d, θ_1, ..., θ_{J−1}). In many applications, the experimenter may have little or limited information about the values of the parameters. In this case, Bayes D-optimality (Chaloner and Verdinelli, 1995), which maximizes E(log |F|) given a prior distribution on the parameters, provides a reasonable solution. Here E stands for expectation, and F is the Fisher information matrix. An alternative to the Bayes criterion is EW D-optimality (Yang, Mandal and Majumdar, 2014; Atkinson et al., 2007), which essentially maximizes log |E(F)|. Compared with Bayes D-optimal designs, EW D-optimal designs are much easier to calculate and still highly efficient (Yang, Mandal and Majumdar, 2014).

Based on Theorem 1, an EW D-optimal design, which maximizes |E(F)|, may be viewed as a locally D-optimal design with e_i, c_{it}, u_{it}, and b_{it} replaced by their expectations. After the replacement, Lemma 2 still holds. Therefore, almost all the lemmas, theorems, corollaries, and algorithms in the previous sections can be applied directly to EW D-optimal designs as well. The only exception is due to Lemma 3, which provides the formula for |A_{i3}| in terms of g_{ij} and π_{ij}. In order to find EW D-optimal designs, |A_{i3}| has to be calculated in terms of u_{it} and b_{it}. For example, |A_{i3}| = u_{i1} if J = 2, |A_{i3}| = u_{i1}u_{i2} − b_{i2}^2 if J = 3, and |A_{i3}| = u_{i1}u_{i2}u_{i3} − u_{i1}b_{i3}^2 − u_{i3}b_{i2}^2 if J = 4. Then the formulas involving |A_{i3}| in Lemma 7, c_1, c_2 in Corollary 3, s_{i3}, s_{i4}, s_{i5} in Corollary 4, w_i in (20), and w_j in Corollary 6 need to be written in terms of u_{it} and b_{it} as well. According to Lemma 2, one only needs to calculate E(u_{it}), i = 1, ..., m, t = 1, ..., J − 1, and E(b_{it}), i = 1, ..., m, t = 2, ..., J − 1 (if J ≥ 3). Then E(c_{it}) = E(u_{it}) − E(b_{it}) − E(b_{i,t+1}) and E(e_i) = Σ_{t=1}^{J−1} E(c_{it}).
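The computational difference between the two criteria can be made concrete with a generic sketch. Everything below is hypothetical scaffolding rather than the paper's implementation: F(i, theta) returns the 2 × 2 information-matrix contribution of design point i at parameter theta, and the prior is represented by a fixed list of draws. The EW objective averages the matrices once and takes a single log-determinant, while the Bayes objective needs one log-determinant per prior draw; by concavity of the log-determinant, the EW value is always at least the Bayes value.

```python
import math

def log_det2(M):
    """Log-determinant of a 2x2 positive definite matrix."""
    return math.log(M[0][0] * M[1][1] - M[0][1] * M[1][0])

def weighted_sum(p, mats):
    """sum_i p_i * M_i for 2x2 matrices."""
    return [[sum(pi * M[r][c] for pi, M in zip(p, mats)) for c in range(2)]
            for r in range(2)]

def mean_matrix(mats):
    """Entrywise average of a list of 2x2 matrices."""
    k = len(mats)
    return [[sum(M[r][c] for M in mats) / k for c in range(2)] for r in range(2)]

def ew_objective(p, F, thetas):
    """EW criterion log|E(F)|: average over the prior draws first."""
    Fbar = [mean_matrix([F(i, th) for th in thetas]) for i in range(len(p))]
    return log_det2(weighted_sum(p, Fbar))

def bayes_objective(p, F, thetas):
    """Bayes criterion E(log|F|): one log-determinant per prior draw."""
    vals = [log_det2(weighted_sum(p, [F(i, th) for i in range(len(p))]))
            for th in thetas]
    return sum(vals) / len(vals)
```

A design maximizing the EW objective thus avoids recomputing determinants across the prior, which is the source of the speed advantage reported below.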
After that, we can use the lift-one algorithm in Section 4 or the exchange algorithm in Section 5 to find EW D-optimal designs. We use the odor removal example to illustrate how it works.

Example 4 (continued). Odor Removal Study. Again suppose that we want to conduct a follow-up experiment. Instead of using the assumed parameter values (β_1, β_2, θ_1, θ_2) = (−2.45, 1.09, −2.67, 0.21), suppose we believe that the true values of the parameters satisfy β_1 ∈ [−3, −1], β_2 ∈ [0, 2], θ_1 ∈ [−4, −2], and θ_2 ∈ [−1, 1]. In order to perform Bayes optimality, we assume that the four parameters are independently and uniformly distributed within their intervals. It takes the R function constrOptim 430 seconds to numerically find the Bayes D-optimal allocation p_b = (0.3879, , ,
Statistica Sinica 27 (2017), doi:https://doi.org/10.5705/ss.202016.0210. Also available as arXiv:1502.05990v5 [math.st], 8 Nov 2017.
More informationMIXED MODELS THE GENERAL MIXED MODEL
MIXED MODELS This chapter introduces best linear unbiased prediction (BLUP), a general method for predicting random effects, while Chapter 27 is concerned with the estimation of variances by restricted
More information12 Modelling Binomial Response Data
c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual
More information15-780: LinearProgramming
15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear
More informationDISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS
DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS M. N. ELLINGHAM AND JUSTIN Z. SCHROEDER In memory of Mike Albertson. Abstract. A distinguishing partition for an action of a group Γ on a set
More informationReview. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,
More informationEconometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit
Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit R. G. Pierse 1 Introduction In lecture 5 of last semester s course, we looked at the reasons for including dichotomous variables
More informationANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW
SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationLecture Notes on Game Theory
Lecture Notes on Game Theory Levent Koçkesen Strategic Form Games In this part we will analyze games in which the players choose their actions simultaneously (or without the knowledge of other players
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationBayesian Multivariate Logistic Regression
Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of
More informationRegression models for multivariate ordered responses via the Plackett distribution
Journal of Multivariate Analysis 99 (2008) 2472 2478 www.elsevier.com/locate/jmva Regression models for multivariate ordered responses via the Plackett distribution A. Forcina a,, V. Dardanoni b a Dipartimento
More informationSupport weight enumerators and coset weight distributions of isodual codes
Support weight enumerators and coset weight distributions of isodual codes Olgica Milenkovic Department of Electrical and Computer Engineering University of Colorado, Boulder March 31, 2003 Abstract In
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More information2 Describing Contingency Tables
2 Describing Contingency Tables I. Probability structure of a 2-way contingency table I.1 Contingency Tables X, Y : cat. var. Y usually random (except in a case-control study), response; X can be random
More informationCONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS
Statistica Sinica 23 (2013), 1117-1130 doi:http://dx.doi.org/10.5705/ss.2012.037 CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS Jian-Feng Yang, C. Devon Lin, Peter Z. G. Qian and Dennis K. J.
More informationNow consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.
Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)
More informationOn Multiple-Objective Nonlinear Optimal Designs
On Multiple-Objective Nonlinear Optimal Designs Qianshun Cheng, Dibyen Majumdar, and Min Yang December 1, 2015 Abstract Experiments with multiple objectives form a staple diet of modern scientific research.
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationLECTURE 2 LINEAR REGRESSION MODEL AND OLS
SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another
More informationRelation of Pure Minimum Cost Flow Model to Linear Programming
Appendix A Page 1 Relation of Pure Minimum Cost Flow Model to Linear Programming The Network Model The network pure minimum cost flow model has m nodes. The external flows given by the vector b with m
More informationMoment Aberration Projection for Nonregular Fractional Factorial Designs
Moment Aberration Projection for Nonregular Fractional Factorial Designs Hongquan Xu Department of Statistics University of California Los Angeles, CA 90095-1554 (hqxu@stat.ucla.edu) Lih-Yuan Deng Department
More informationBinary choice 3.3 Maximum likelihood estimation
Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood
More information1 Directional Derivatives and Differentiability
Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=
More informationUNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems
UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationPartition models and cluster processes
and cluster processes and cluster processes With applications to classification Jie Yang Department of Statistics University of Chicago ICM, Madrid, August 26 and cluster processes utline 1 and cluster
More informationOutline of GLMs. Definitions
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More informationReview of Vectors and Matrices
A P P E N D I X D Review of Vectors and Matrices D. VECTORS D.. Definition of a Vector Let p, p, Á, p n be any n real numbers and P an ordered set of these real numbers that is, P = p, p, Á, p n Then P
More informationChapter 1: Linear Programming
Chapter 1: Linear Programming Math 368 c Copyright 2013 R Clark Robinson May 22, 2013 Chapter 1: Linear Programming 1 Max and Min For f : D R n R, f (D) = {f (x) : x D } is set of attainable values of
More informationA strongly polynomial algorithm for linear systems having a binary solution
A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th
More informationA Distributed Newton Method for Network Utility Maximization, II: Convergence
A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility
More informationLatent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data
Journal of Data Science 9(2011), 43-54 Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data Haydar Demirhan Hacettepe University
More informationA Framework for the Construction of Golay Sequences
1 A Framework for the Construction of Golay Sequences Frank Fiedler, Jonathan Jedwab, and Matthew G Parker Abstract In 1999 Davis and Jedwab gave an explicit algebraic normal form for m! h(m+) ordered
More informationTRANSPORTATION PROBLEMS
Chapter 6 TRANSPORTATION PROBLEMS 61 Transportation Model Transportation models deal with the determination of a minimum-cost plan for transporting a commodity from a number of sources to a number of destinations
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationLatent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent
Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationLasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices
Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,
More informationWorst case analysis for a general class of on-line lot-sizing heuristics
Worst case analysis for a general class of on-line lot-sizing heuristics Wilco van den Heuvel a, Albert P.M. Wagelmans a a Econometric Institute and Erasmus Research Institute of Management, Erasmus University
More informationDescribing Contingency tables
Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds
More informationA Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra
International Mathematical Forum, 4, 2009, no. 24, 1157-1171 A Generalized Eigenmode Algorithm for Reducible Regular Matrices over the Max-Plus Algebra Zvi Retchkiman Königsberg Instituto Politécnico Nacional,
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationConcepts and Applications of Stochastically Weighted Stochastic Dominance
Concepts and Applications of Stochastically Weighted Stochastic Dominance Jian Hu Department of Industrial Engineering and Management Sciences Northwestern University jianhu@northwestern.edu Tito Homem-de-Mello
More informationSymmetric Matrices and Eigendecomposition
Symmetric Matrices and Eigendecomposition Robert M. Freund January, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 Symmetric Matrices and Convexity of Quadratic Functions
More informationOn Expected Gaussian Random Determinants
On Expected Gaussian Random Determinants Moo K. Chung 1 Department of Statistics University of Wisconsin-Madison 1210 West Dayton St. Madison, WI 53706 Abstract The expectation of random determinants whose
More informationThe initial involution patterns of permutations
The initial involution patterns of permutations Dongsu Kim Department of Mathematics Korea Advanced Institute of Science and Technology Daejeon 305-701, Korea dskim@math.kaist.ac.kr and Jang Soo Kim Department
More informationOptimal XOR based (2,n)-Visual Cryptography Schemes
Optimal XOR based (2,n)-Visual Cryptography Schemes Feng Liu and ChuanKun Wu State Key Laboratory Of Information Security, Institute of Software Chinese Academy of Sciences, Beijing 0090, China Email:
More information