On I-Optimal Designs for Generalized Linear Models: An Efficient Algorithm via General Equivalence Theory


arXiv: v1 [stat.ME] 17 Jan 2018

Yiou Li is Assistant Professor, Department of Mathematical Sciences, DePaul University, Chicago, IL. Xinwei Deng is Associate Professor, Department of Statistics, Virginia Tech, Blacksburg, VA.

Abstract

The generalized linear model plays an important role in statistical analysis, and the related design issues are undoubtedly challenging. Most state-of-the-art work addresses design criteria based on the estimation of the regression coefficients. It is therefore of importance to study optimal designs for generalized linear models with the prediction aspect in mind. In this work, we propose a prediction-oriented design criterion, I-optimality, and develop an efficient sequential algorithm for constructing I-optimal designs for generalized linear models. By establishing the General Equivalence Theorem of I-optimality for generalized linear models, we obtain an insightful understanding of how the proposed algorithm sequentially chooses support points and updates their weights. The proposed algorithm is computationally efficient with a guaranteed convergence property. Numerical examples are conducted to evaluate the feasibility and computational efficiency of the proposed algorithm.

Keywords: Experimental design; Fast convergence; Multiplicative algorithm; Prediction-oriented; Sequential algorithm

1 Introduction

The generalized linear model (GLM) is a flexible generalization of the linear model that allows the response to be related to the predictor variables through a link function. It was first formulated by Nelder and Wedderburn [17] as a generalization of various statistical models, including linear regression, logistic regression and Poisson regression. GLMs are widely used in statistical analysis across applications including business analytics, image analysis, bioinformatics, etc. From a data collection perspective, it is important to understand what an optimal design for the generalized linear model is, especially with respect to prediction.

Suppose that an experiment has d input factors x = [x_1, ..., x_d], and let Ω_j be a measurable set of all possible levels of the jth input factor. Common examples of Ω_j are [-1, 1] and R. The experimental region Ω is some measurable subset of Ω_1 × ... × Ω_d. In a generalized linear model, the response variable Y(x) is assumed to follow a particular distribution in the exponential family, which includes the normal, binomial, Poisson and gamma distributions, etc. A link function provides the relationship between the linear predictor η = β^T g(x) and the mean of the response Y(x),

    μ(x) = E[Y(x)] = h^{-1}(β^T g(x)),    (1)

where g = [g_1, ..., g_p]^T are the basis functions and β = [β_1, β_2, ..., β_p]^T are the corresponding regression coefficients. Here h : R → R is the link function, and h^{-1} is the inverse function of h.

In this work, we consider a continuous design ξ given by

    ξ = { x_1, ..., x_n ; λ_1, ..., λ_n },

with x_i ≠ x_j if i ≠ j. Here λ_i (λ_i ≥ 0) represents the fraction of experiments to be carried out at the design point x_i, and Σ_{i=1}^n λ_i = 1. The Fisher information matrix of the generalized linear model in (1) is

    I(ξ, β) = Σ_{i=1}^n λ_i g(x_i) w(x_i) g^T(x_i),    (2)

where w(x_i) = 1/{var(Y(x_i)) [h'(μ(x_i))]^2}. The notation I(ξ, β) emphasizes the dependence on the design ξ and the regression coefficient β.
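For concreteness, the following minimal Python sketch (our own illustration, not code from the paper) assembles the Fisher information matrix in (2) for a given continuous design. The helper names fisher_information, basis and glm_weight are ours; the logistic weight at the end uses the standard simplification w(x) = μ(x)(1 - μ(x)) under the logit link.

```python
import numpy as np

def fisher_information(X, weights, beta, basis, glm_weight):
    """Fisher information of eq. (2): I(xi, beta) = sum_i lambda_i w(x_i) g(x_i) g(x_i)^T."""
    p = len(beta)
    info = np.zeros((p, p))
    for x, lam in zip(X, weights):
        g = basis(x)                                  # basis vector g(x), length p
        info += lam * glm_weight(x, beta) * np.outer(g, g)
    return info

def logistic_weight(x, beta, basis):
    """GLM weight w(x) = 1/{var(Y(x)) [h'(mu(x))]^2}; for the logit link this
    reduces to mu(x)(1 - mu(x))."""
    mu = 1.0 / (1.0 + np.exp(-basis(x) @ beta))
    return mu * (1.0 - mu)
```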

The design issues of linear models have been well studied owing to their simplicity and practicality [20]. The design issues of generalized linear models tend to be much more challenging due to the complicated Fisher information matrix. In GLMs, the Fisher information matrix I(ξ, β) often depends on the regression coefficient β through the function w(x_i) in (2). As most optimal design criteria can be expressed as functions of the Fisher information matrix, a locally optimal design is often studied under the assumption that some initial estimates of the model coefficients are available. Khuri et al. [12] surveyed design issues for generalized linear models and noted that research on designs for generalized linear models is still very much at a developmental stage, and that not much work has been done, in terms of either theory or computational methods, to evaluate optimal designs when the dimension of the design space is high. Yang and Stufken [33], Yang [31], Dette and Melas [4], and Wu and Stufken [26] obtained a series of theoretical results for several optimality criteria with univariate nonlinear models and univariate generalized linear models. On the computational side, most of the literature focuses on D-optimality. Popular algorithms include modified versions of the Fedorov-Wynn algorithm proposed by Wynn [27] and Fedorov [7], which update the support of the design, and the multiplicative algorithm proposed by Silvey et al. [21], which updates the weights of candidate points from a large candidate pool. Chernoff [3] expressed concerns about the concentration of study on D-optimality, advocating the study of other criteria. Research on algorithms for finding optimal designs under other criteria is sparse. Yang et al. [32] proposed a multistage algorithm to construct φ_p-optimal designs by sequentially updating the support of the design and optimizing the weights using Newton's method in each iteration.

A practical objective of experimental design often focuses on the accuracy of the model prediction. Under such a consideration, the corresponding criterion, I-optimality, aims at minimizing the integrated mean squared error of the prediction. There is some literature on I-optimality for linear regression models.

Haines [11] proposed a simulated annealing algorithm to obtain exact I-optimal designs. Meyer and Nachtsheim [15, 16] used simulated annealing and coordinate-exchange algorithms to construct exact I-optimal designs. Recently, Goos et al. [10] proposed a modified multiplicative algorithm to find I-optimal designs for mixture experiments. However, to the best of our knowledge, there is little literature on defining and obtaining I-optimal designs for generalized linear models (GLMs).

The contribution of this work is to fill the gap in the theory of I-optimal designs for GLMs and to develop an efficient algorithm for constructing I-optimal designs for GLMs. Specifically, we first establish the General Equivalence Theorem of I-optimality for GLMs. The established theoretical results shed insightful light on how to develop efficient algorithms for constructing I-optimal designs for GLMs. The General Equivalence Theorem is well established for linear regression models, but not for GLMs. Kiefer and Wolfowitz [14] derived an equivalence theorem on D-optimality for linear models, and White [24] extended the theorem on D- and G-optimality to nonlinear models. Later, Kiefer [13] generalized the General Equivalence Theorem (GET) to φ_p-optimality for linear regression models. Since I-optimality is not a member of the φ_p-optimality family, the General Equivalence Theorem on I-optimality for GLMs has not been thoroughly investigated.

After establishing the General Equivalence Theorem of I-optimality for generalized linear models, we further develop an efficient sequential algorithm for constructing I-optimal designs for GLMs. The proposed algorithm sequentially adds points into the design and updates their weights simultaneously by using a modified multiplicative algorithm [23]. We also establish the convergence properties of the proposed algorithm. The multiplicative algorithm, first proposed by Titterington [23] and Silvey et al. [21], has been widely used for finding optimal designs for linear regression models. The original multiplicative algorithm often requires a large candidate pool, and updating the weights of all candidate points simultaneously can be computationally expensive. For our proposed sequential algorithm, we modify the multiplicative algorithm and use it on a much smaller set of points. The proposed algorithm can significantly enhance computational efficiency because of its sequential nature. It is worth pointing out that the proposed algorithm can be easily extended to construct optimal designs under other optimality criteria.

The remainder of this work is organized as follows. We introduce the I-optimality criterion for the generalized linear model in Section 2. In Section 3, we derive the General Equivalence Theorem on I-optimality for generalized linear models. We detail the proposed sequential algorithm and establish its convergence properties in Sections 4-5. Numerical examples are conducted in Section 6 to evaluate the performance of the proposed algorithm. We conclude this work with some discussion in Section 7.

2 The I-Optimality Criterion for GLM

For generalized linear models, one practical goal is to predict the response Y(x) at a given input x. Thus it is important to adopt a prediction-oriented criterion when considering the optimal design. The integrated mean squared error (IMSE) is defined as the expected value of the mean squared error between the fitted mean response μ̂(x) and the true mean response μ(x) over the domain of the input x, with the distribution of x given by F_IMSE on the experimental region Ω, i.e.,

    E[ ∫_Ω (μ̂(x) - μ(x))^2 dF_IMSE ].

In the context of generalized linear models, the fitted mean response μ̂(x) = h^{-1}(β̂^T g(x)) can be expanded around the true mean response μ(x) = h^{-1}(β^T g(x)) using a Taylor expansion,

    μ̂(x) - μ(x) = h^{-1}(β̂^T g(x)) - h^{-1}(β^T g(x)) ≈ c(x)^T (β̂ - β),

with c(x) = ( ∂h^{-1}/∂β_1 (x), ..., ∂h^{-1}/∂β_p (x) )^T = g(x) dh^{-1}/dη. Here η = β^T g(x) is the linear predictor. Then, using the above first-order Taylor expansion, the integrated mean squared error can be approximated by

    IMSE(ξ, β) = E[ ∫_Ω c(x)^T (β̂ - β)(β̂ - β)^T c(x) dF_IMSE ].

In numerical analysis, the first-order Taylor expansion is commonly used to approximate the difference of a nonlinear function evaluated at two close points. For the design of generalized linear models, similar approximations are commonly used in the literature, for example by Schein and Ungar [19] for logistic regression models.

Lemma 1. The IMSE(ξ, β) can be expressed as the trace of the product of two matrices,

    IMSE(ξ, β) = tr(A I(ξ, β)^{-1}),

where the matrix A = ∫_Ω c(x) c(x)^T dF_IMSE = ∫_Ω g(x) g^T(x) [dh^{-1}/dη]^2 dF_IMSE depends only on the regression coefficients and the basis functions, but not on the design, and the Fisher information matrix I(ξ, β), defined in equation (2), depends on both the regression coefficients and the design.

Proof.

    IMSE(ξ, β) = E[ ∫_Ω c(x)^T (β̂ - β)(β̂ - β)^T c(x) dF_IMSE ]
               = ∫_Ω c(x)^T E[(β̂ - β)(β̂ - β)^T] c(x) dF_IMSE
               ≈ ∫_Ω tr( c(x) c(x)^T I(ξ, β)^{-1} ) dF_IMSE
               = tr( [∫_Ω c(x) c(x)^T dF_IMSE] I(ξ, β)^{-1} )
               = tr(A I(ξ, β)^{-1}).

The approximation follows from the fact that the estimated regression coefficient β̂ asymptotically follows N(β, I(ξ, β)^{-1}).

We call I-optimality the corresponding optimality criterion that aims at minimizing IMSE(ξ, β) = tr(A I(ξ, β)^{-1}). A design ξ* is called an I-optimal design if it minimizes IMSE(ξ, β). I-optimality for linear regression was first proposed by Box and Draper [2] in 1959 and has been studied extensively for decades; the I-optimality for generalized linear models defined above follows a similar setup. In this article, we focus on locally I-optimal designs with a given regression coefficient β. To simplify notation, in the rest of the article β is omitted from the notation I(ξ, β) of the Fisher information matrix, and I(ξ) is used instead.
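As a brief illustration of Lemma 1 (a sketch of ours, assuming the matrix A and the Fisher information are already available as numpy arrays), the criterion value can be computed with a linear solve instead of an explicit matrix inverse:

```python
import numpy as np

def imse(A, info):
    """I-optimality criterion of Lemma 1: IMSE(xi, beta) = tr(A I(xi, beta)^{-1}).
    Since tr(A I^{-1}) = tr(I^{-1} A), it suffices to solve I X = A and take the trace."""
    return float(np.trace(np.linalg.solve(info, A)))
```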

The following lemma shows the convexity of IMSE(ξ, β), which is a key fact that facilitates the proof of the General Equivalence Theorem in Section 3.

Lemma 2. IMSE(ξ, β) is a convex function of the design ξ.

Proof. Consider two designs ξ_1 and ξ_2, and define ξ = (1 - a)ξ_1 + a ξ_2. With 0 ≤ a ≤ 1, ξ is still a feasible design. The Fisher information matrix of ξ can be expressed as I(ξ) = (1 - a) I(ξ_1) + a I(ξ_2). Since the three Fisher information matrices are positive definite and matrix inversion is convex over positive definite matrices [1], we have

    I(ξ)^{-1} ⪯ (1 - a) I(ξ_1)^{-1} + a I(ξ_2)^{-1}.

Since A is symmetric and positive semi-definite, there exists a unique positive semi-definite matrix B with B^2 = A [1]. As B is symmetric, we have

    B I(ξ)^{-1} B ⪯ (1 - a) B I(ξ_1)^{-1} B + a B I(ξ_2)^{-1} B.

As a result,

    tr(A I(ξ)^{-1}) = tr(B I(ξ)^{-1} B) ≤ tr( (1 - a) B I(ξ_1)^{-1} B + a B I(ξ_2)^{-1} B )
                    = (1 - a) tr(A I(ξ_1)^{-1}) + a tr(A I(ξ_2)^{-1}).

That is, IMSE((1 - a)ξ_1 + a ξ_2, β) ≤ (1 - a) IMSE(ξ_1, β) + a IMSE(ξ_2, β). Thus IMSE(ξ, β) is a convex function of the design ξ.

3 General Equivalence Theorem of I-Optimality for GLM

In this section, we derive the General Equivalence Theorem of I-optimality for generalized linear models. The established theoretical results facilitate the sequential choice of support points, in the same spirit as the Fedorov-Wynn algorithm [27, 7].

We first investigate the directional derivative of IMSE(ξ, β) with respect to ξ. Given two designs ξ and ξ', let the design ξ̃ be constructed as ξ̃ = (1 - α)ξ + αξ'. Then the derivative of IMSE(ξ, β) in the direction of the design ξ' is

    φ(ξ', ξ) = lim_{α→0+} [IMSE(ξ̃, β) - IMSE(ξ, β)] / α.    (3)

Theorem 1. The directional derivative of IMSE(ξ, β) in the direction of any design ξ' is

    φ(ξ', ξ) = lim_{α→0+} [IMSE(ξ̃, β) - IMSE(ξ, β)] / α = tr(I(ξ)^{-1} A) - tr(I(ξ') I(ξ)^{-1} A I(ξ)^{-1}).

Proof. Given ξ̃ = (1 - α)ξ + αξ', we have I(ξ̃) = (1 - α) I(ξ) + α I(ξ'). For any matrix B that is a function of α, the derivative of its inverse B^{-1} can be calculated as ∂B^{-1}/∂α = -B^{-1} (∂B/∂α) B^{-1} [1]. So the derivative of I(ξ̃)^{-1} A with respect to α can be expressed as

    ∂[I(ξ̃)^{-1} A]/∂α = [∂I(ξ̃)^{-1}/∂α] A = -I(ξ̃)^{-1} [I(ξ') - I(ξ)] I(ξ̃)^{-1} A.    (4)

Then the directional derivative of IMSE(ξ, β) is

    φ(ξ', ξ) = ∂ tr[I(ξ̃)^{-1} A]/∂α |_{α=0}
             = tr( ∂[I(ξ̃)^{-1} A]/∂α ) |_{α=0}
             = -tr( I(ξ̃)^{-1} [I(ξ') - I(ξ)] I(ξ̃)^{-1} A ) |_{α=0}
             = tr(I(ξ)^{-1} A) - tr( I(ξ)^{-1} I(ξ') I(ξ)^{-1} A ).

Corollary 1. The directional derivative of IMSE(ξ, β) in the direction of a single point x is given by

    φ(x, ξ) = tr(I(ξ)^{-1} A) - w(x) g(x)^T I(ξ)^{-1} A I(ξ)^{-1} g(x).

Proof. Consider ξ' as the design with the single design point x whose weight is 1.
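The single-point directional derivative of Corollary 1 drives both the optimality check in Theorem 2 below and the point-selection step of the sequential algorithm in Section 5. A minimal sketch (ours, reusing the conventions of the earlier snippet) is:

```python
import numpy as np

def dir_derivative(x, A, info, beta, basis, glm_weight):
    """Corollary 1: phi(x, xi) = tr(I^{-1} A) - w(x) g(x)^T I^{-1} A I^{-1} g(x)."""
    g = basis(x)
    inv_g = np.linalg.solve(info, g)                  # I(xi)^{-1} g(x)
    return float(np.trace(np.linalg.solve(info, A))
                 - glm_weight(x, beta) * inv_g @ A @ inv_g)
```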

Based on Lemma 2 and Corollary 1, we can derive the following General Equivalence Theorem of I-optimality for generalized linear models. This theorem provides an intuitive understanding of how to choose support points sequentially and how to check the optimality of a design.

Theorem 2 (General Equivalence Theorem). The following three conditions on ξ* are equivalent:

1. The design ξ* minimizes IMSE(ξ, β) = tr(A I(ξ)^{-1}).
2. The design ξ* minimizes sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x).
3. w(x) g^T(x) I(ξ*)^{-1} A I(ξ*)^{-1} g(x) ≤ tr(I(ξ*)^{-1} A) holds over the experimental region Ω, and the equality holds only at the support points of the design ξ*.

Proof.

(i) 1 ⇒ 3: As proved in Lemma 2, IMSE(ξ, β) = tr(A I(ξ)^{-1}) is a convex function of the design ξ. Since ξ* minimizes IMSE(ξ, β), the directional derivative satisfies φ(x, ξ*) ≥ 0 for all x ∈ Ω, i.e., tr(I(ξ*)^{-1} A) ≥ w(x) g(x)^T I(ξ*)^{-1} A I(ξ*)^{-1} g(x), and equality holds only at the support points of ξ*.

(ii) 1 ⇒ 2: This can be proved by contradiction. Suppose that ξ* is the minimizer of IMSE(ξ, β) but not the minimizer of sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x). Then there exists a design ξ' different from ξ* that minimizes sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x), i.e.,

    sup_{x∈Ω} w(x) g^T(x) I(ξ')^{-1} A I(ξ')^{-1} g(x) < sup_{x∈Ω} w(x) g^T(x) I(ξ*)^{-1} A I(ξ*)^{-1} g(x).    (5)

As proved in (i), w(x) g^T(x) I(ξ*)^{-1} A I(ξ*)^{-1} g(x) ≤ tr(I(ξ*)^{-1} A) holds over the experimental region Ω, so

    sup_{x∈Ω} w(x) g^T(x) I(ξ*)^{-1} A I(ξ*)^{-1} g(x) ≤ tr(I(ξ*)^{-1} A).    (6)

Combining the results in (5) and (6), it is easy to see that

    sup_{x∈Ω} w(x) g^T(x) I(ξ')^{-1} A I(ξ')^{-1} g(x) < tr(I(ξ*)^{-1} A).    (7)

Recall that the weight of the jth point x_j in the design ξ' is denoted by λ_j. Then

    sup_{x∈Ω} w(x) g^T(x) I(ξ')^{-1} A I(ξ')^{-1} g(x) ≥ sup_{x_i ∈ ξ'} w(x_i) g^T(x_i) I(ξ')^{-1} A I(ξ')^{-1} g(x_i)
        ≥ Σ_j λ_j w(x_j) g^T(x_j) I(ξ')^{-1} A I(ξ')^{-1} g(x_j)
        = tr( Σ_j λ_j w(x_j) g(x_j) g^T(x_j) I(ξ')^{-1} A I(ξ')^{-1} )
        = tr(I(ξ')^{-1} A) = IMSE(ξ', β).

Comparing with the left- and right-hand sides of equation (7), we have IMSE(ξ', β) < IMSE(ξ*, β), which contradicts condition 1. Thus the design ξ* must be the minimizer of sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x).

(iii) 2 ⇒ 1: This statement can also be proved by contradiction. Suppose the design ξ* is the minimizer of sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x), but not the minimizer of IMSE(ξ, β). Then there exists a design ξ' different from ξ* that minimizes IMSE(ξ, β), i.e.,

    IMSE(ξ', β) < IMSE(ξ*, β),

and, as a result of (i), φ(x, ξ') ≥ 0 for all x ∈ Ω. Based on Corollary 1 and the assumption that ξ* is the minimizer of sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x), we thus have

    tr(I(ξ')^{-1} A) ≥ sup_{x∈Ω} w(x) g(x)^T I(ξ')^{-1} A I(ξ')^{-1} g(x) ≥ sup_{x∈Ω} w(x) g(x)^T I(ξ*)^{-1} A I(ξ*)^{-1} g(x).    (8)

Since IMSE(ξ', β) < IMSE(ξ*, β) by assumption, together with equation (8) we have

    sup_{x∈Ω} w(x) g(x)^T I(ξ*)^{-1} A I(ξ*)^{-1} g(x) ≤ tr(I(ξ')^{-1} A) < tr(I(ξ*)^{-1} A),

so φ(x, ξ*) > 0 for all x ∈ Ω. Then, by Corollary 1, for any point x_i in the design ξ* we have

    λ_i tr(I(ξ*)^{-1} A) > λ_i w(x_i) g(x_i)^T I(ξ*)^{-1} A I(ξ*)^{-1} g(x_i), i = 1, 2, ..., n.

Summing over all design points of ξ*, we have

    tr(I(ξ*)^{-1} A) > Σ_i λ_i w(x_i) g(x_i)^T I(ξ*)^{-1} A I(ξ*)^{-1} g(x_i) = tr(I(ξ*)^{-1} A),

which is a contradiction. Thus ξ* must minimize IMSE(ξ, β).

(iv) 3 ⇒ 1: Suppose that w(x) g^T(x) I(ξ*)^{-1} A I(ξ*)^{-1} g(x) ≤ tr(I(ξ*)^{-1} A) holds over the experimental region Ω, with equality only at the support points of the design ξ*. Then the directional derivative φ(x, ξ*) ≥ 0 for all x ∈ Ω, and φ(x_i, ξ*) = 0 for all x_i ∈ ξ*. Thus, by the convexity of IMSE, ξ* minimizes IMSE, and condition 1 holds.

Theorem 2 provides an intuitive way to choose the support points for constructing an I-optimal design in a sequential fashion. According to Theorem 2, for an optimal design ξ*, the directional derivative φ(x, ξ*) is non-negative for any x ∈ Ω. It implies that for any non-optimal design there will be some directions in which the directional derivative φ(x, ξ) < 0. Given a current design ξ, to gain the maximal decrease in the I-optimality criterion, we choose a new support point x* to be added to the design if φ(x*, ξ) = min_{x∈Ω} φ(x, ξ) < 0. One can then optimize the weights of all support points in the updated design, as described in Algorithm 1 in Section 4. By iterating the selection of a support point and the weight update of all support points, this two-step procedure is continued until min_{x∈Ω} φ(x, ξ) ≥ 0, which means the updated design is I-optimal. Such a sequential algorithm for constructing optimal designs, described in Algorithm 2 in Section 5, follows a similar spirit to the widely used Fedorov-Wynn algorithm [27, 28]. Note that I-optimal designs are not necessarily unique; more details are discussed in Section 6.2 after the proposed algorithms are introduced. In the next section, we first discuss how to optimize the weights of the support points for a given design. We then describe in detail, in Section 5, the sequential algorithm (Algorithm 2) for constructing I-optimal designs for generalized linear models.

4 Optimal Weights Given the Design Points

In this section, we investigate how to assign optimal weights to the support points so as to minimize IMSE(ξ, β) when the design points are given. In Wynn's work [27, 28], equal weights are assigned to all design points after a new design point is added. Whittle [25] made an improvement by choosing the weight of the newly added point to minimize the optimality criterion, with the weights of the existing points kept proportional to their current weights. Yang et al. [32] proposed using a Newton-Raphson method to find the optimal weights, which solves for the roots of the first-order partial derivatives of the criterion with respect to the weights. Note that the Newton-Raphson method may produce negative weights, and extra effort is then needed to eliminate them. Also, when the first-order partial derivatives do not exist, such as under the c-optimality criterion, the algorithm of Yang et al. [32] is not applicable.

To address the above challenges, we develop a modified multiplicative algorithm for finding the optimal weights of the support points. The developed algorithm does not require partial derivatives or Hessian matrix inversion, and is therefore computationally efficient.

4.1 Condition of Optimal Weights

Given the design points x_1, x_2, ..., x_n fixed, denote by λ = [λ_1, λ_2, ..., λ_n]^T the weight vector with λ_i as the weight of the corresponding design point x_i. We write the corresponding design as

    ξ^λ = { x_1, ..., x_n ; λ_1, ..., λ_n }.

The superscript λ emphasizes that only the weight vector is changeable in the design in this setting. The optimal weight vector λ* should be the one that minimizes IMSE(ξ, β) with the design points x_1, ..., x_n fixed. We can still obtain the directional derivative of IMSE(ξ, β) with respect to the weight vector λ. Consider another weight vector λ̃ = [λ̃_1, ..., λ̃_n]^T. Then the convex combination λ̄ = (1 - α)λ + αλ̃ with 0 ≤ α ≤ 1 is also a weight vector. Define the directional derivative of IMSE(ξ, β) in the direction of a new weight vector λ̃ = [λ̃_1, ..., λ̃_n]^T as

    ψ(λ̃, λ) = lim_{α→0+} [IMSE(ξ^λ̄, β) - IMSE(ξ^λ, β)] / α.

Lemma 3. IMSE(ξ, β) is a convex function of the weight vector λ.

Proof. Since λ̄ = (1 - a)λ + aλ̃, we have I(ξ^λ̄) = (1 - a) I(ξ^λ) + a I(ξ^λ̃). The rest of the proof is similar to that of Lemma 2.

Lemma 4. The directional derivative of IMSE(ξ, β) in the direction of a new weight vector λ̃ = [λ̃_1, ..., λ̃_n]^T is

    ψ(λ̃, λ) = tr(I(ξ^λ)^{-1} A) - Σ_{i=1}^n λ̃_i w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i).

Proof. This is a special case of the general directional derivative in equation (3), where the design points in ξ = ξ^λ and ξ' = ξ^λ̃ are the same but the weights differ. Thus Theorem 1 still holds, and

    ψ(λ̃, λ) = tr(I(ξ)^{-1} A) - tr(I(ξ') I(ξ)^{-1} A I(ξ)^{-1})
             = tr(I(ξ^λ)^{-1} A) - Σ_{i=1}^n λ̃_i w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i).

The following result provides a necessary and sufficient condition for the optimal weight vector that minimizes IMSE(ξ, β) when the design points are fixed.

Theorem 3. Given a fixed set of design points x_1, ..., x_n, the following two conditions on the weight vector λ* = [λ_1*, ..., λ_n*]^T are equivalent:

1. The weight vector λ* minimizes IMSE(ξ, β).
2. tr(I(ξ^{λ*})^{-1} A) = w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i), for i = 1, ..., n.

Proof.

(i) (1) ⇒ (2): Since λ* minimizes IMSE(ξ, β), and IMSE(ξ, β) is a convex function of the weight vector as proved in Lemma 3,

    ψ(λ̃, λ*) = tr(I(ξ^{λ*})^{-1} A) - Σ_{i=1}^n λ̃_i w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i)
              = Σ_{i=1}^n λ̃_i [ tr(I(ξ^{λ*})^{-1} A) - w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) ] ≥ 0

for every feasible weight vector λ̃. Thus tr(I(ξ^{λ*})^{-1} A) - w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) ≥ 0 for i = 1, ..., n. We now show that, in fact, tr(I(ξ^{λ*})^{-1} A) - w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) = 0 for i = 1, ..., n.

Suppose there exists at least one x_i such that

    tr(I(ξ^{λ*})^{-1} A) - w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) > 0.

Then

    tr(I(ξ^{λ*})^{-1} A) = Σ_{i=1}^n λ_i* tr(I(ξ^{λ*})^{-1} A)
                         > Σ_{i=1}^n λ_i* w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) = tr(I(ξ^{λ*})^{-1} A),

which is a contradiction. So tr(I(ξ^{λ*})^{-1} A) - w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) = 0 for i = 1, ..., n.

(ii) (2) ⇒ (1): This is obvious, since ψ(λ̃, λ*) = 0 for every feasible weight vector λ̃ and IMSE(ξ, β) is a convex function of the weight vector.

Remark 1. The implication (1) ⇒ (2) in Theorem 3 can also be derived by the method of Lagrange multipliers.

The result in Theorem 3 provides a useful gateway for designing an effective algorithm to find the optimal weights given the design points.

4.2 Multiplicative Algorithm for Optimal Weights

Based on Theorem 3, for given design points the optimal weight vector that minimizes IMSE(ξ, β) should be the solution of the system of equations

    tr(I(ξ^λ)^{-1} A) = w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i), i = 1, ..., n.    (9)
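Condition 2 of Theorem 3, i.e., the system (9), can be checked numerically for a candidate weight vector. The sketch below is our own illustration; it reuses the fisher_information helper from the earlier snippet, and the tolerance is an arbitrary choice.

```python
import numpy as np

def weights_satisfy_theorem3(X, weights, A, beta, basis, glm_weight, tol=1e-6):
    """Check eq. (9): tr(I^{-1} A) = w(x_i) g(x_i)^T I^{-1} A I^{-1} g(x_i) for all i."""
    info = fisher_information(X, weights, beta, basis, glm_weight)
    lhs = np.trace(np.linalg.solve(info, A))
    for x in X:
        g = basis(x)
        inv_g = np.linalg.solve(info, g)
        rhs = glm_weight(x, beta) * inv_g @ A @ inv_g
        if abs(lhs - rhs) > tol * max(1.0, abs(lhs)):
            return False
    return True
```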

The weight of a design point x_i should be adjusted according to the two sides of equation (9). Note that for any weight vector λ,

    tr(I(ξ^λ)^{-1} A) = Σ_{i=1}^n λ_i w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i).

Our strategy for finding the optimal weights is therefore: for a design point x_i, if tr(I(ξ^λ)^{-1} A) < w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i), the weight λ_i of x_i should be increased; if tr(I(ξ^λ)^{-1} A) > w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i), the weight λ_i of x_i should be decreased. Thus the ratio

    w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i) / tr(I(ξ^λ)^{-1} A)

indicates a good adjustment of the current weight of the design point x_i. Encouraged by the promising performance of the multiplicative algorithm recently used by Goos et al. [10], we consider a modified multiplicative algorithm to find the optimal weights based on Theorem 3. The detailed algorithm is described in Algorithm 1 as follows.

Algorithm 1

Step 1: Assign a random weight vector λ^0 = [λ^0_1, ..., λ^0_n]^T, and set k = 0.

Step 2: For each design point x_i, update the weight as

    λ^{k+1}_i = λ^k_i [ w(x_i) g^T(x_i) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) / tr(I(ξ^{λ^k})^{-1} A) ]^δ
                / Σ_{j=1}^n λ^k_j [ w(x_j) g^T(x_j) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_j) / tr(I(ξ^{λ^k})^{-1} A) ]^δ
              = λ^k_i [ w(x_i) g^T(x_i) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) ]^δ
                / Σ_{j=1}^n λ^k_j [ w(x_j) g^T(x_j) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_j) ]^δ.    (10)

Step 3: Set k = k + 1 and go to Step 2 until convergence.

In Algorithm 1, the value 0 < δ < 1 is a convergence parameter chosen by the user. Following the suggestions of Fellman [8] and Torsney [9], we use δ = 1/2.
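A vectorized Python sketch of Algorithm 1 (our own rendering of update rule (10), with a uniform rather than random initial weight vector and an ad hoc stopping tolerance) is shown below.

```python
import numpy as np

def multiplicative_weights(X, A, beta, basis, glm_weight,
                           delta=0.5, max_iter=1000, tol=1e-9):
    """Modified multiplicative algorithm (Algorithm 1) for the optimal weights
    of the fixed support points X, using update rule (10)."""
    n = len(X)
    lam = np.full(n, 1.0 / n)                         # Step 1 (uniform start here)
    G = np.array([basis(x) for x in X])               # (n, p) matrix of basis vectors
    w = np.array([glm_weight(x, beta) for x in X])    # GLM weights w(x_i)
    for _ in range(max_iter):
        info = (G * (lam * w)[:, None]).T @ G                 # I(xi^lambda)
        M = np.linalg.solve(info, A) @ np.linalg.inv(info)    # I^{-1} A I^{-1}
        d = w * np.einsum('ij,jk,ik->i', G, M, G)             # w_i g_i^T I^{-1} A I^{-1} g_i
        new_lam = lam * d ** delta                            # Step 2: rule (10) ...
        new_lam /= new_lam.sum()                              # ... after normalization
        if np.max(np.abs(new_lam - lam)) < tol:
            return new_lam
        lam = new_lam                                          # Step 3: iterate
    return lam
```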

The following lemma provides a justification that every iteration of Algorithm 1 is always feasible.

Lemma 5. A positive definite A = ∫_Ω g(x) g^T(x) [dh^{-1}/dη]^2 dF_IMSE ensures that the iteration in equation (10) is always feasible, i.e., the denominator Σ_{i=1}^n λ^k_i [ w(x_i) g^T(x_i) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) ]^δ is always positive.

Proof. Given that A and I(ξ^{λ^k}) are positive definite,

    0 < tr(I(ξ^{λ^k})^{-1} A) = tr( I(ξ^{λ^k}) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} )
      = tr( Σ_{i=1}^n λ^k_i w(x_i) g(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} )
      = Σ_{i=1}^n λ^k_i tr( w(x_i) g(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} )
      = Σ_{i=1}^n λ^k_i w(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i).

Thus there exists some x_i such that λ^k_i w(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) > 0, and naturally λ^k_i ( w(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) )^δ > 0, which leads to Σ_{i=1}^n λ^k_i [ w(x_i) g^T(x_i) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) ]^δ > 0.

Remark 2. Note that the matrix A in Lemma 5 is always positive semi-definite and does not depend on the choice of design. If a singular A is observed in the first step, one can choose the basis functions g_1, ..., g_p carefully so that they are (nearly) orthogonal under the weight function [dh^{-1}/dη]^2 F_IMSE; doing so ensures that A is positive definite. In general, Gram-Schmidt orthogonalization [18] can be used to construct uni- or multivariate orthogonal basis functions. Favard's theorem [6] can be applied to construct univariate orthogonal polynomials, and it is straightforward to prove that when the input variables x_1, ..., x_d are independent, i.e., F_IMSE is separable, multivariate orthogonal polynomials can be constructed as tensor products of univariate orthogonal polynomials.

The following result provides the convergence property of Algorithm 1.

Proposition 1 (Convergence of Algorithm 1). Given fixed design points x_1, ..., x_n, the weight vector λ^k obtained from Algorithm 1 converges, as k → ∞, to the optimal weight vector λ* that minimizes the I-optimality criterion IMSE(ξ^λ, β).

The proof of Proposition 1 is mainly based on the results in Theorems 1 and 2 of Yu [35]. We first require the following lemma before laying out the proof.

Lemma 6. Define ϕ(I) = IMSE(ξ^λ, β) = tr(I^{-1} A) as a function of the Fisher information matrix I. Then (a) ϕ(I) is a strictly convex function of I; (b) ϕ(I) is a decreasing function of I.

Proof. The proof of (a) is very similar to that of Lemma 2 and is omitted here. For (b), it is straightforward to see that ϕ(I) is decreasing, since 0 < I(ξ^{λ_1}) ⪯ I(ξ^{λ_2}) implies I(ξ^{λ_1})^{-1} ⪰ I(ξ^{λ_2})^{-1} and hence ϕ(I_1) ≥ ϕ(I_2).

Proof of Proposition 1. Based on Lemma 6, ϕ(I) = IMSE(ξ^λ, β) = tr(I^{-1} A) is a convex and decreasing function of I. Under Algorithm 1 with 0 < δ < 1, denote by λ^k and λ^{k+1} the solutions at the kth and (k+1)th iterations of the multiplicative algorithm, respectively. Since I(ξ^{λ^k}), I(ξ^{λ^{k+1}}) and A are positive definite, it is easy to see that I(ξ^{λ^k}) > 0, I(ξ^{λ^{k+1}}) > 0, and I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} > 0, where I^{-1} A I^{-1} is a continuous function of I. Based on Theorem 1 of Yu [35], we have tr(I(ξ^{λ^{k+1}})^{-1} A) ≤ tr(I(ξ^{λ^k})^{-1} A) when λ^{k+1} ≠ λ^k. This shows the monotonicity of Algorithm 1.

Moreover, for ϕ(I) = IMSE(ξ^λ, β) = tr(I^{-1} A), we also have

    tr( w(x_i) g(x_i) g^T(x_i) [∂ϕ(I)/∂I]|_{I = I(ξ^{λ^k})} ) = -w(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) ≤ 0.

For the sequence I(ξ^{λ^k}), k = 1, 2, ..., generated by Algorithm 1, any limit point is clearly nonsingular because of positive definiteness. Combining the above statements with the results in Lemma 6, all conditions required in Theorem 2 of Yu [35] are satisfied. Thus, using Theorem 2 of Yu [35], all limit points of λ^k are global minima of IMSE(ξ^λ, β), and IMSE(ξ^{λ^k}, β) decreases monotonically to inf_λ IMSE(ξ^λ, β) as k → ∞.

5 Sequential Algorithm for Constructing I-Optimal Designs

Recall that the General Equivalence Theorem in Section 3 provides insightful guidelines for the sequential selection of design points. In combination with finding the optimal weights for fixed design points in Section 4, we propose an efficient sequential algorithm to construct I-optimal designs for generalized linear models. The details of the proposed sequential algorithm are summarized in Algorithm 2.

Algorithm 2

Step 1: Choose an initial design D_0 containing the vertices of Ω and set r = 0.

Step 2: Generate an N-point Sobol sequence S and form the candidate pool C = S ∪ D_0.

Step 3: Obtain the optimal weights of the current design points using Algorithm 1.

Step 4: Add the point x*_r = argmin_{x∈C} φ(x, ξ^r) to the current design if φ(x*_r, ξ^r) < -ε, where φ(x, ξ^r) is the directional derivative

    φ(x, ξ^r) = tr(I(ξ^r)^{-1} A) - w(x) g(x)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} g(x).

Step 5: Set r = r + 1 and go to Step 2 until convergence.
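Under the same assumptions as the earlier sketches (the helpers fisher_information, dir_derivative and multiplicative_weights defined above; these names are ours, not the authors' implementation), Algorithm 2 can be sketched as the following loop. In practice one may additionally merge duplicated support points or drop points whose weights become negligible.

```python
import numpy as np

def sequential_i_optimal(candidates, init_design, A, beta, basis, glm_weight,
                         eps=1e-6, max_iter=200):
    """Sequential construction of an I-optimal design (Algorithm 2): alternate
    between re-weighting the current support (Algorithm 1) and adding the
    candidate with the most negative directional derivative."""
    support = [np.asarray(x) for x in init_design]    # Step 1: initial support D_0
    for _ in range(max_iter):
        lam = multiplicative_weights(support, A, beta, basis, glm_weight)   # Step 3
        info = fisher_information(support, lam, beta, basis, glm_weight)
        phis = [dir_derivative(x, A, info, beta, basis, glm_weight)
                for x in candidates]                  # Step 4: scan the candidate pool
        j = int(np.argmin(phis))
        if phis[j] >= -eps:                           # GET: no descent direction left
            break
        support.append(np.asarray(candidates[j]))     # add the most promising point
    else:
        lam = multiplicative_weights(support, A, beta, basis, glm_weight)
    return support, lam
```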

In Algorithm 2, the value ε > 0 is chosen to be a small positive number. Note that the proposed sequential algorithm for constructing I-optimal designs for GLMs does not require the computation of a Hessian matrix as in Newton-Raphson methods; thus it can be more computationally efficient than conventional methods. Also, the sequential nature of the proposed algorithm enables an efficient search for the optimal weights without updating the weights of all candidate points. Moreover, we establish the convergence property of the proposed sequential algorithm as follows.

Theorem 4 (Convergence of Algorithm 2). Assume the candidate pool C is large enough, i.e., C = Ω or C includes all the support points of an I-optimal design. Then the design constructed by Algorithm 2 converges, as r → ∞, to an I-optimal design ξ* that minimizes IMSE(ξ, β), i.e., lim_{r→∞} IMSE(ξ^r, β) = IMSE(ξ*, β).

Proof. We establish the argument by contradiction. Assume that ξ^r does not converge to ξ*, i.e., lim_{r→∞} IMSE(ξ^r, β) - IMSE(ξ*, β) > 0. Since the support set at the rth iteration is a subset of the support set at the (r+1)th iteration, it is obvious that IMSE(ξ^{r+1}, β) ≤ IMSE(ξ^r, β) for all r ≥ 0. Thus there exists some a > 0 such that IMSE(ξ^r, β) > IMSE(ξ*, β) + a for all r. Consider a design ξ̃ = (1 - α)ξ^r + αξ* with 0 ≤ α ≤ 1. Then

    φ(ξ*, ξ^r) = lim_{α→0} [IMSE((1 - α)ξ^r + αξ*, β) - IMSE(ξ^r, β)] / α
               ≤ lim_{α→0} [(1 - α) IMSE(ξ^r, β) + α IMSE(ξ*, β) - IMSE(ξ^r, β)] / α
               = IMSE(ξ*, β) - IMSE(ξ^r, β) < -a,    (11)

where the inequality is provided by the convexity of IMSE. Note that at each iteration the new design point x*_r is chosen as x*_r = argmin_{x∈C} φ(x, ξ^r). Thus

    tr[ w(x*_r) g(x*_r) g(x*_r)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ] = sup_{x∈C} tr[ w(x) g(x) g(x)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ].

When C = Ω, or when C contains all the support points of the I-optimal design ξ*,

    sup_{x∈C} tr[ w(x) g(x) g(x)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ] ≥ tr[ w(x_i) g(x_i) g(x_i)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ],

with x_i any support point of ξ*. As a result,

    tr[ w(x*_r) g(x*_r) g(x*_r)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ] ≥ tr[ w(x_i) g(x_i) g(x_i)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ],

with x_i any support point of ξ*. Denote the number of support points in the optimal design ξ* by n*. Then, by Theorem 1,

    φ(ξ*, ξ^r) = tr[ I(ξ^r)^{-1} A ] - tr[ I(ξ*) I(ξ^r)^{-1} A I(ξ^r)^{-1} ]
               = tr[ I(ξ^r)^{-1} A ] - tr[ Σ_{i=1}^{n*} λ_i* w(x_i) g(x_i) g(x_i)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ]
               = tr[ I(ξ^r)^{-1} A ] - Σ_{i=1}^{n*} λ_i* tr[ w(x_i) g(x_i) g(x_i)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ]
               ≥ tr[ I(ξ^r)^{-1} A ] - Σ_{i=1}^{n*} λ_i* tr[ w(x*_r) g(x*_r) g(x*_r)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ]
               = tr[ I(ξ^r)^{-1} A ] - tr[ w(x*_r) g(x*_r) g(x*_r)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ]
               = φ(x*_r, ξ^r).

Together with inequality (11), we have φ(x*_r, ξ^r) < -a. It is clear from the derivation of the directional derivative in (4) that IMSE((1 - γ)ξ + γξ', β) is infinitely differentiable with respect to γ ∈ [0, 1]. Thus the second-order derivative is bounded in γ; denote the upper bound by U, with U > 0 since IMSE is a convex function. Now consider the alternative update ξ̃^{r+1} = (1 - γ)ξ^r + γ x*_r with γ ∈ [0, 1]. Since our algorithm achieves the optimal weights in each iteration, we have IMSE(ξ^{r+1}, β) ≤ IMSE(ξ̃^{r+1}, β).

Using the Taylor expansion of IMSE(ξ̃^{r+1}, β), we have

    IMSE(ξ^{r+1}, β) ≤ IMSE(ξ̃^{r+1}, β)
        = IMSE(ξ^r, β) + γ φ(x*_r, ξ^r) + (1/2) γ^2 [∂^2 IMSE((1 - γ)ξ^r + γ x*_r, β)/∂γ^2]|_{γ = γ̄}
        < IMSE(ξ^r, β) - γ a + (1/2) γ^2 U,

where γ̄ is some value between 0 and 1. Consequently,

    IMSE(ξ^{r+1}, β) - IMSE(ξ^r, β) < (1/2) U (γ - a/U)^2 - a^2/(2U).

If a/U ≤ 1, choose γ = a/U; then

    IMSE(ξ^{r+1}, β) - IMSE(ξ^r, β) < -a^2/(2U).

If a/U > 1, i.e., U < a, choose γ = 1; then

    IMSE(ξ^{r+1}, β) - IMSE(ξ^r, β) < (1/2) U - a < 0.

Both situations lead to lim_{r→∞} IMSE(ξ^r, β) = -∞, which contradicts IMSE(ξ^r, β) > 0. In summary, the assumption that ξ^r does not converge to ξ* is not valid, and thus ξ^r converges to ξ*.

We would like to point out that the choice of the candidate pool C affects the efficiency of Algorithm 2. Based on the convergence property of the algorithm, we choose a Sobol sequence [22] as the candidate pool. The Sobol sequence is a space-filling design that covers the experimental domain Ω well and is efficient when the dimension of the input variables is high. To further improve the efficiency of the algorithm, a search strategy inspired by Stufken and Yang [34] can be employed: one can start with a sparser Sobol sequence and obtain the current best design using Algorithm 2, and then create denser and denser candidate pools in the neighborhood of the support points of the current best design until there is no further improvement under the I-optimality criterion.
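A candidate pool of the kind used in Algorithm 2 can be generated with scipy's quasi-Monte Carlo module; the sketch below (ours, with arbitrary defaults) rescales a Sobol sequence to [-1, 1]^d and appends the 2^d vertices used for the initial design D_0.

```python
import numpy as np
from scipy.stats import qmc

def sobol_candidates(d, m=14, seed=0):
    """Sobol candidate pool of 2**m points on [-1, 1]^d, plus the 2^d vertices."""
    sobol = qmc.Sobol(d=d, scramble=True, seed=seed)
    pts = qmc.scale(sobol.random_base2(m=m), -np.ones(d), np.ones(d))
    vertices = np.array(np.meshgrid(*([[-1.0, 1.0]] * d))).T.reshape(-1, d)
    return np.vstack([pts, vertices])                 # vertices kept at the end
```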

6 Numerical Examples

In this section, we conduct several numerical examples to evaluate the performance of the proposed sequential algorithm (Algorithm 2) for constructing I-optimal designs for GLMs. Specifically, we focus on the feasibility of our algorithm through examples of the logistic regression model with binary response. The proposed algorithm is equally applicable to I-optimality for other GLMs, such as Poisson regression.

Assume that the domain of the d-dimensional input variable x = [x_1, ..., x_d] is standardized to be the unit hypercube [-1, 1]^d. With p basis functions g(x) = [g_1(x), ..., g_p(x)]^T and regression coefficients β = [β_0, ..., β_{p-1}]^T, the logistic regression model with binary response Y ∈ {0, 1} is defined as

    Prob(Y = 1 | x) = e^{β^T g(x)} / (1 + e^{β^T g(x)}).

Given a design ξ = { x_1, ..., x_n ; λ_1, ..., λ_n }, the I-optimality criterion is IMSE(ξ, β) = tr(A I(ξ)^{-1}), with

    A = ∫_{[-1,1]^d} g(x) g(x)^T e^{2 β^T g(x)} / (1 + e^{β^T g(x)})^4 dx,

and

    I(ξ) = Σ_{i=1}^n λ_i [ e^{β^T g(x_i)} / (1 + e^{β^T g(x_i)})^2 ] g(x_i) g(x_i)^T.
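For this logistic model the matrix A has no closed form in general. A quasi-Monte Carlo approximation of the integral over [-1, 1]^d (our own sketch, with the sample size 2^m chosen arbitrarily) is:

```python
import numpy as np
from scipy.stats import qmc

def logistic_A_matrix(beta, basis, d, m=16, seed=1):
    """QMC approximation of A = int_{[-1,1]^d} g(x) g(x)^T e^{2 b'g} / (1 + e^{b'g})^4 dx.
    The density factor equals (mu(x)(1 - mu(x)))^2, i.e. [dh^{-1}/d eta]^2 for the logit link."""
    pts = qmc.scale(qmc.Sobol(d=d, scramble=True, seed=seed).random_base2(m=m),
                    -np.ones(d), np.ones(d))
    G = np.array([basis(x) for x in pts])             # (N, p) basis matrix
    mu = 1.0 / (1.0 + np.exp(-(G @ beta)))
    dens = (mu * (1.0 - mu)) ** 2
    volume = 2.0 ** d                                 # Lebesgue measure of [-1, 1]^d
    return volume * (G * dens[:, None]).T @ G / len(pts)
```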

6.1 Univariate Logistic Model with Two Parameters

Consider a univariate logistic model with two parameters β_0 and β_1,

    Prob(Y = 1 | x) = e^{β_0 + β_1 x} / (1 + e^{β_0 + β_1 x}).

For a univariate logistic model with two parameters, i.e., d = 1 and g = [1, x]^T, we can define an induced variable c = β_0 + β_1 x with range [D_1, D_2]. Yang and Stufken [33] theoretically proved that the optimal design has two support points, and that the locations of the support points, expressed in terms of the induced variable c, are:

- symmetric, if D_1 = -D_2;
- symmetric, or D_1 with another point in (-D_1, D_2], if D_1 < 0 < D_2 and |D_1| < |D_2|;
- symmetric, or D_2 with another point in [D_1, -D_2), if D_1 < 0 < D_2 and |D_1| > |D_2|;
- D_2 with another point, if D_2 ≤ 0;
- D_1 with another point, if D_1 ≥ 0.

In the following numerical examples, we verify Yang and Stufken's results using our proposed algorithm; the proposed algorithm also provides the optimal weights of the support points. The numerical examples of interest are:

(a) β_0 = 0, β_1 = 2, with the range of the induced variable c = 2x being [-2, 2];
(b) β_0 = 0.2, β_1 = 1.6, with the range of the induced variable c = 0.2 + 1.6x being [-1.4, 1.8];
(c) β_0 = 0.27, β_1 = 1.12, with the range of the induced variable c = 0.27 + 1.12x being [-0.85, 1.39];
(d) β_0 = -1, β_1 = -0.9, with the range of the induced variable c = -1 - 0.9x being [-1.9, -0.1];
(e) β_0 = 2, β_1 = 1.9, with the range of the induced variable c = 2 + 1.9x being [0.1, 3.9].

According to Yang and Stufken's results [33], the I-optimal design should have two support points, and in terms of the induced variable c = β_0 + β_1 x the support points of the five scenarios should be:

(a) symmetric;
(b) symmetric, or -1.4 with another point in (1.4, 1.8];
(c) symmetric, or -0.85 with another point in (0.85, 1.39];
(d) -0.1 with another point;
(e) 0.1 with another point.

In the implementation of Algorithm 2 for the above examples, the candidate pool consists of N = 2^14 Sobol points and the two interval ends, -1 and 1. Table 1 shows the exact locations and weights of the support points and the computational time (in seconds) of our algorithm.

Table 1: Locally I-optimal designs for the univariate logistic model with two parameters.

                                   (a) β_0 = 0, β_1 = 2   (b) β_0 = 0.2, β_1 = 1.6   (c) β_0 = 0.27, β_1 = 1.12
    Support points                 (      ,      )        (      ,      )            (-1,      )
    Values of induced variable     (      ,      )        (      ,      )            (-0.85, 1.2)
    Weights                        (0.4960, 0.5040)       (0.4731, 0.5269)           (0.4776, 0.5224)
    Computational time (seconds)

                                   (d) β_0 = -1, β_1 = -0.9   (e) β_0 = 2, β_1 = 1.9
    Support points                 (-1, 1)                    (-1,      )
    Values of induced variable     (-1.9, -0.1)               (0.1,      )
    Weights                        (0.5051, 0.4949)           (0.4364, 0.5636)
    Computational time (seconds)

From the results in Table 1, we can see that the support points computed by our algorithm agree with those derived theoretically by Yang and Stufken [33]. Moreover, the convergence of the proposed algorithm is fairly fast.
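As an illustration only (not the authors' code or exact settings), the sketches from the previous sections can be chained for case (a), β_0 = 0, β_1 = 2; the expected outcome is a two-point design with roughly symmetric support and near-equal weights.

```python
import numpy as np

# Hypothetical end-to-end run for case (a): beta_0 = 0, beta_1 = 2, g(x) = [1, x].
beta = np.array([0.0, 2.0])
basis = lambda x: np.concatenate(([1.0], np.atleast_1d(x)))
glmw = lambda x, b: (lambda mu: mu * (1.0 - mu))(1.0 / (1.0 + np.exp(-basis(x) @ b)))

A = logistic_A_matrix(beta, basis, d=1)
cands = sobol_candidates(d=1, m=14)                   # Sobol points plus the ends -1, 1
support, lam = sequential_i_optimal(cands, cands[-2:], A, beta, basis, glmw)

# Keep only points carrying non-negligible weight; two symmetric points are expected.
print([(x.item(), round(float(w), 4)) for x, w in zip(support, lam) if w > 1e-4])
```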

6.2 Multivariate Logistic Model

We further consider the logistic model with multiple input variables in the linear predictor to explore the computational efficiency of the proposed algorithm with respect to the dimension of the predictors. The multivariate logistic model with binary response Y ∈ {0, 1} is expressed as

    Prob(Y = 1 | x) = e^{β_0 + Σ_{i=1}^d β_i x_i} / (1 + e^{β_0 + Σ_{i=1}^d β_i x_i}).

As mentioned in Section 3, I-optimal designs are not necessarily unique. The following simple example with d = 2, i.e., β = [0, 2, 2]^T, illuminates this point. Using the proposed sequential algorithm, we obtain an I-optimal design ξ_1 with support points

    (-1, 1), (0.2915, -1), (1,      ),

whose optimality can be verified numerically using Theorem 2 (GET). By symmetry, one can easily obtain a different I-optimal design ξ_2 with support points

    (1, -1), (-1,      ), (      , 1).

In fact, there are infinitely many I-optimal designs in this example, as any convex combination of ξ_1 and ξ_2 yields an I-optimal design.

Now we study the computational efficiency of the proposed sequential algorithm in the cases of d = 2, 5 and 10 predictors with coefficients

(a) d = 2, β = [2, 1, -2.5]^T;
(b) d = 5, β = [0.5, 1.6, -2.5, 2, -1.8]^T;
(c) d = 10, β = [0.5, 1.6, -2.5, 2, -1.8, 4, -2.1, -1.6, 2.2, 2.5, -2]^T.

In the implementation of Algorithm 2 for the above examples, the candidate pool sizes N are as reported in Table 2. The computation times of the proposed algorithm for the three examples are also given in Table 2.

Table 2: Computation time (in seconds) for logistic models with different input dimensions.

                                     d = 2    d = 5    d = 10
    Candidate pool size N
    Computational time (in seconds)

From the results in Table 2, the computation time of the proposed algorithm does not increase dramatically as d increases, especially from d = 5 to d = 10. We would like to point out that this observation does not necessarily imply that the dimension of the predictors has no impact on the complexity of our algorithm. Since the proposed algorithm updates the weights only on a small set of design points instead of the whole candidate pool, the size of the candidate pool does not play a significant role in the computation time. That said, the dimension of the predictors does influence the computational time to some extent, but not as much as in the traditional multiplicative algorithm, which requires updating the weights over a huge candidate pool.

7 Discussion

In this work, we have developed an efficient sequential algorithm for constructing I-optimal designs for generalized linear models. By establishing the General Equivalence Theorem of I-optimality for GLMs, we have obtained an insightful understanding of how the proposed algorithm sequentially chooses support points and updates their weights. The proposed algorithm is computationally efficient with a guaranteed convergence property. Moreover, all the computations in the proposed algorithm are explicit and simple to code, thanks to the supporting theory. The proposed method exemplifies the integration of theory and computation to advance the development of new statistical methodology.

It is worth pointing out that the proposed sequential algorithm (Algorithm 2) is not restricted to I-optimality. It can easily be extended to other optimality criteria as long as the directional derivative φ(ξ', ξ) in (3) exists. Take, for example, φ(ξ', ξ) = p - tr(I^{-1}(ξ) I(ξ')) for D-optimality, where p is the dimension of the regression coefficient vector β. Then, at Step 4 of the rth iteration in Algorithm 2, the candidate point that minimizes the single-point directional derivative φ(x, ξ^r) = p - w(x) g(x)^T I(ξ^r)^{-1} g(x) is added to the current design ξ^r if the minimum is negative. The optimal weights for the current design ξ^r can be obtained using Algorithm 1 in Section 4.2 with the update rule at the kth iteration

    λ^{k+1}_i = λ^k_i [ w(x_i) g^T(x_i) I(ξ^{λ^k})^{-1} g(x_i) ]^δ / Σ_{j=1}^n λ^k_j [ w(x_j) g^T(x_j) I(ξ^{λ^k})^{-1} g(x_j) ]^δ, i = 1, ..., n.

A thorough investigation of extending the proposed algorithm to other optimality criteria will be reported elsewhere in the future.
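Following the same pattern as the I-optimality sketches, the D-optimality variant described above only changes the two quantities involved; a minimal illustration (ours, not the authors' implementation) is:

```python
import numpy as np

def dir_derivative_D(x, info, beta, basis, glm_weight, p):
    """Single-point directional derivative under D-optimality:
    phi(x, xi) = p - w(x) g(x)^T I(xi)^{-1} g(x)."""
    g = basis(x)
    return p - glm_weight(x, beta) * g @ np.linalg.solve(info, g)

def multiplicative_weights_D(X, beta, basis, glm_weight, delta=0.5, max_iter=1000):
    """Weight update for D-optimality: rule (10) with w g^T I^{-1} A I^{-1} g
    replaced by w g^T I^{-1} g."""
    n = len(X)
    lam = np.full(n, 1.0 / n)
    G = np.array([basis(x) for x in X])
    w = np.array([glm_weight(x, beta) for x in X])
    for _ in range(max_iter):
        info = (G * (lam * w)[:, None]).T @ G
        d = w * np.einsum('ij,ij->i', G, np.linalg.solve(info, G.T).T)  # w_i g_i^T I^{-1} g_i
        lam = lam * d ** delta
        lam /= lam.sum()
    return lam
```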

Note that this work focuses on locally I-optimal designs for generalized linear models, which depend on given regression coefficients. To alleviate this limitation, we will further investigate robust I-optimal designs for GLMs. One possibility is to establish a tight upper bound for the integrated mean squared error in Lemma 1 to relax the dependence on the regression coefficients; the proposed algorithm can then be modified to search for optimal designs based on the robust optimality criterion constructed from the upper bound. Hickernell and Liu [5] developed some theoretical results in a similar direction for linear regression models under model uncertainty, and Li and Hickernell [30] extended the results to linear regression models with gradient information. Another interesting direction for future work is to construct I-optimal designs for models with both quantitative and qualitative responses [29], which commonly occur in manufacturing and biomedical systems. Since GLMs include both linear regression for continuous responses and logistic regression for binary responses, it would be interesting to study I-optimal designs under the joint modeling of quantitative and qualitative responses.

References

[1] Bernstein, D. S. (2009), Matrix Mathematics (2nd Edition), Princeton University Press.

[2] Box, G. E. P. and Draper, N. R. (1959), A Basis for the Selection of a Response Surface Design, Journal of the American Statistical Association, 54.

[3] Chernoff, H. (1999), Gustav Elfving's Impact on Experimental Design, Statistical Science, 14.

[4] Dette, H. and Melas, V. B. (2011), A Note on the de la Garza Phenomenon for Locally Optimal Designs, The Annals of Statistics, 39.

[5] Hickernell, F. J. and Liu, M. (2002), Uniform Designs Limit Aliasing, Biometrika, 89.

[6] Favard, J. (1935), Sur les polynomes de Tchebicheff, Comptes Rendus de l'Académie des Sciences, 200.

[7] Fedorov, V. V. (1972), Theory of Optimal Experiments, New York: Academic Press.

[8] Fellman, J. (1974), On the Allocation of Linear Observations, Societas Scientiarum Fennica.

[9] Torsney, B. (1983), A Moment Inequality and Monotonicity of an Algorithm, in Fiacco, A. V. and Kortanek, K. O. (eds.).

[10] Goos, P., Jones, B., and Syafitri, U. (2016), I-Optimal Design of Mixture Experiments, Journal of the American Statistical Association, 111.

[11] Haines, L. M. (1987), The Application of the Annealing Algorithm to the Construction of Exact Optimal Designs for Linear-Regression Models, Technometrics, 29.

[12] Khuri, A. I., Mukherjee, B., Sinha, B. K., and Ghosh, M. (2006), Design Issues for Generalized Linear Models, Technometrics, 48.

[13] Kiefer, J. (1974), General Equivalence Theory of Optimum Designs (Approximate Theory), Annals of Statistics, 2.

[14] Kiefer, J. and Wolfowitz, J. (1960), The Equivalence of Two Extremum Problems, Canadian Journal of Mathematics, 12.

[15] Meyer, R. K. and Nachtsheim, C. J. (1988), Simulated Annealing in the Construction of Exact Optimal Design of Experiments, American Journal of Mathematical and Management Sciences, 8.

[16] Meyer, R. K. and Nachtsheim, C. J. (1995), The Coordinate-Exchange Algorithm for Constructing Exact Optimal Experimental Designs, Technometrics, 37.

[17] Nelder, J. and Wedderburn, R. (1972), Generalized Linear Models, Journal of the Royal Statistical Society, Series A, 135.

[18] Pursell, L. and Trimble, S. T. (1991), Gram-Schmidt Orthogonalization by Gauss Elimination, The American Mathematical Monthly, 98.

[19] Schein, A. I. and Ungar, L. H. (2007), Active Learning for Logistic Regression: An Evaluation, Machine Learning, 68.

[20] Silvey, S. D. (2013), Optimal Design: An Introduction to the Theory for Parameter Estimation, vol. 1, Springer Science & Business Media.

[21] Silvey, S. D., Titterington, D. M., and Torsney, B. (1978), An Algorithm for Optimal Designs on a Finite Design Space, Communications in Statistics - Theory and Methods.

[22] Sobol, I. M. (1967), On the Distribution of Points in a Cube and the Approximate Evaluation of Integrals, USSR Computational Mathematics and Mathematical Physics, 7.

[23] Titterington, D. M. (1978), Estimation of Correlation Coefficients by Ellipsoidal Trimming, Applied Statistics, 27.

[24] White, L. V. (1973), An Extension of the General Equivalence Theorem to Nonlinear Models, Biometrika, 60.

[25] Whittle, P. (1973), Some General Points in the Theory of Optimal Experimental Design, Journal of the Royal Statistical Society, Series B, 35.

[26] Wu, H. and Stufken, J. (2014), Locally φ_p-Optimal Designs for Generalized Linear Models with a Single-Variable Quadratic Polynomial Predictor, Biometrika, 101.

[27] Wynn, H. P. (1970), The Sequential Generation of D-Optimum Experimental Designs, Annals of Mathematical Statistics, 41.

[28] Wynn, H. P. (1972), Results in the Theory and Construction of D-Optimum Experimental Designs, Journal of the Royal Statistical Society, Series B, 34.

[29] Deng, X. and Jin, R. (2015), QQ Models: Joint Modeling for Quantitative and Qualitative Quality Responses in Manufacturing Systems, Technometrics, 57.

[30] Li, Y. and Hickernell, F. J. (2014), Design of Experiments for Linear Regression Models When Gradient Information Is Available, Journal of Statistical Planning and Inference, 144.

[31] Yang, M. (2010), On the de la Garza Phenomenon, The Annals of Statistics, 38.

[32] Yang, M., Biedermann, S., and Tang, E. (2013), On Optimal Designs for Nonlinear Models: A General and Efficient Algorithm, Journal of the American Statistical Association, 108.

[33] Yang, M. and Stufken, J. (2009), Support Points of Locally Optimal Designs for Nonlinear Models with Two Parameters, The Annals of Statistics, 37.

[34] Yang, M. and Stufken, J. (2012), Identifying Locally Optimal Designs for Nonlinear Models: A Simple Extension with Profound Consequences, The Annals of Statistics, 40.

[35] Yu, Y. (2010), Monotonic Convergence of a General Algorithm for Computing Optimal Designs, The Annals of Statistics, 38.


Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Working Paper 2013:9 Department of Statistics Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Ronnie Pingel Working Paper 2013:9 June

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Compound Optimal Designs for Percentile Estimation in Dose-Response Models with Restricted Design Intervals

Compound Optimal Designs for Percentile Estimation in Dose-Response Models with Restricted Design Intervals Compound Optimal Designs for Percentile Estimation in Dose-Response Models with Restricted Design Intervals Stefanie Biedermann 1, Holger Dette 1, Wei Zhu 2 Abstract In dose-response studies, the dose

More information

Max Margin-Classifier

Max Margin-Classifier Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization

More information

Convergence Rate of Expectation-Maximization

Convergence Rate of Expectation-Maximization Convergence Rate of Expectation-Maximiation Raunak Kumar University of British Columbia Mark Schmidt University of British Columbia Abstract raunakkumar17@outlookcom schmidtm@csubcca Expectation-maximiation

More information

Designs for Generalized Linear Models

Designs for Generalized Linear Models Designs for Generalized Linear Models Anthony C. Atkinson David C. Woods London School of Economics and Political Science, UK University of Southampton, UK December 9, 2013 Email: a.c.atkinson@lse.ac.uk

More information

An Alternative Method for Estimating and Simulating Maximum Entropy Densities

An Alternative Method for Estimating and Simulating Maximum Entropy Densities An Alternative Method for Estimating and Simulating Maximum Entropy Densities Jae-Young Kim and Joonhwan Lee Seoul National University May, 8 Abstract This paper proposes a method of estimating and simulating

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We

More information

Logistic Regression. Seungjin Choi

Logistic Regression. Seungjin Choi Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

D-optimal Designs for Multinomial Logistic Models

D-optimal Designs for Multinomial Logistic Models D-optimal Designs for Multinomial Logistic Models Jie Yang University of Illinois at Chicago Joint with Xianwei Bu and Dibyen Majumdar October 12, 2017 1 Multinomial Logistic Models Cumulative logit model:

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids

More information

LECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel

LECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel LECTURE NOTES on ELEMENTARY NUMERICAL METHODS Eusebius Doedel TABLE OF CONTENTS Vector and Matrix Norms 1 Banach Lemma 20 The Numerical Solution of Linear Systems 25 Gauss Elimination 25 Operation Count

More information

Study Notes on the Latent Dirichlet Allocation

Study Notes on the Latent Dirichlet Allocation Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection

More information

Optimal discrimination designs

Optimal discrimination designs Optimal discrimination designs Holger Dette Ruhr-Universität Bochum Fakultät für Mathematik 44780 Bochum, Germany e-mail: holger.dette@ruhr-uni-bochum.de Stefanie Titoff Ruhr-Universität Bochum Fakultät

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications. Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable

More information

Convergence Rates on Root Finding

Convergence Rates on Root Finding Convergence Rates on Root Finding Com S 477/577 Oct 5, 004 A sequence x i R converges to ξ if for each ǫ > 0, there exists an integer Nǫ) such that x l ξ > ǫ for all l Nǫ). The Cauchy convergence criterion

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

Machine Learning. Lecture 6: Support Vector Machine. Feng Li.

Machine Learning. Lecture 6: Support Vector Machine. Feng Li. Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)

More information

Bregman Divergences for Data Mining Meta-Algorithms

Bregman Divergences for Data Mining Meta-Algorithms p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,

More information

Efficient algorithms for calculating optimal designs in pharmacokinetics and dose finding studies

Efficient algorithms for calculating optimal designs in pharmacokinetics and dose finding studies Efficient algorithms for calculating optimal designs in pharmacokinetics and dose finding studies Tim Holland-Letz Ruhr-Universität Bochum Medizinische Fakultät 44780 Bochum, Germany email: tim.holland-letz@rub.de

More information

Worst case analysis for a general class of on-line lot-sizing heuristics

Worst case analysis for a general class of on-line lot-sizing heuristics Worst case analysis for a general class of on-line lot-sizing heuristics Wilco van den Heuvel a, Albert P.M. Wagelmans a a Econometric Institute and Erasmus Research Institute of Management, Erasmus University

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

How Good is a Kernel When Used as a Similarity Measure?

How Good is a Kernel When Used as a Similarity Measure? How Good is a Kernel When Used as a Similarity Measure? Nathan Srebro Toyota Technological Institute-Chicago IL, USA IBM Haifa Research Lab, ISRAEL nati@uchicago.edu Abstract. Recently, Balcan and Blum

More information

Optimal Power Control in Decentralized Gaussian Multiple Access Channels

Optimal Power Control in Decentralized Gaussian Multiple Access Channels 1 Optimal Power Control in Decentralized Gaussian Multiple Access Channels Kamal Singh Department of Electrical Engineering Indian Institute of Technology Bombay. arxiv:1711.08272v1 [eess.sp] 21 Nov 2017

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

We describe the generalization of Hazan s algorithm for symmetric programming

We describe the generalization of Hazan s algorithm for symmetric programming ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,

More information

Interior Point Methods for Mathematical Programming

Interior Point Methods for Mathematical Programming Interior Point Methods for Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Florianópolis, Brazil EURO - 2013 Roma Our heroes Cauchy Newton Lagrange Early results Unconstrained

More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

u( x) = g( y) ds y ( 1 ) U solves u = 0 in U; u = 0 on U. ( 3)

u( x) = g( y) ds y ( 1 ) U solves u = 0 in U; u = 0 on U. ( 3) M ath 5 2 7 Fall 2 0 0 9 L ecture 4 ( S ep. 6, 2 0 0 9 ) Properties and Estimates of Laplace s and Poisson s Equations In our last lecture we derived the formulas for the solutions of Poisson s equation

More information

Modulation of symmetric densities

Modulation of symmetric densities 1 Modulation of symmetric densities 1.1 Motivation This book deals with a formulation for the construction of continuous probability distributions and connected statistical aspects. Before we begin, a

More information

Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations

Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations Boris T. Polyak Andrey A. Tremba V.A. Trapeznikov Institute of Control Sciences RAS, Moscow, Russia Profsoyuznaya, 65, 117997

More information

Conjugate-Gradient. Learn about the Conjugate-Gradient Algorithm and its Uses. Descent Algorithms and the Conjugate-Gradient Method. Qx = b.

Conjugate-Gradient. Learn about the Conjugate-Gradient Algorithm and its Uses. Descent Algorithms and the Conjugate-Gradient Method. Qx = b. Lab 1 Conjugate-Gradient Lab Objective: Learn about the Conjugate-Gradient Algorithm and its Uses Descent Algorithms and the Conjugate-Gradient Method There are many possibilities for solving a linear

More information

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses Statistica Sinica 5(1995), 459-473 OPTIMAL DESIGNS FOR POLYNOMIAL REGRESSION WHEN THE DEGREE IS NOT KNOWN Holger Dette and William J Studden Technische Universitat Dresden and Purdue University Abstract:

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

An introduction to Birkhoff normal form

An introduction to Birkhoff normal form An introduction to Birkhoff normal form Dario Bambusi Dipartimento di Matematica, Universitá di Milano via Saldini 50, 0133 Milano (Italy) 19.11.14 1 Introduction The aim of this note is to present an

More information

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support

More information

Some optimal criteria of model-robustness for two-level non-regular fractional factorial designs

Some optimal criteria of model-robustness for two-level non-regular fractional factorial designs Some optimal criteria of model-robustness for two-level non-regular fractional factorial designs arxiv:0907.052v stat.me 3 Jul 2009 Satoshi Aoki July, 2009 Abstract We present some optimal criteria to

More information

Optimal designs for rational regression models

Optimal designs for rational regression models Optimal designs for rational regression models Holger Dette, Christine Kiss Ruhr-Universität Bochum Fakultät für Mathematik 44780 Bochum, Germany holger.dette@ruhr-uni-bochum.de tina.kiss12@googlemail.com

More information

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5.

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5. VISCOSITY SOLUTIONS PETER HINTZ We follow Han and Lin, Elliptic Partial Differential Equations, 5. 1. Motivation Throughout, we will assume that Ω R n is a bounded and connected domain and that a ij C(Ω)

More information

Efficient computation of Bayesian optimal discriminating designs

Efficient computation of Bayesian optimal discriminating designs Efficient computation of Bayesian optimal discriminating designs Holger Dette Ruhr-Universität Bochum Fakultät für Mathematik 44780 Bochum, Germany e-mail: holger.dette@rub.de Roman Guchenko, Viatcheslav

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

1 Problem Formulation

1 Problem Formulation Book Review Self-Learning Control of Finite Markov Chains by A. S. Poznyak, K. Najim, and E. Gómez-Ramírez Review by Benjamin Van Roy This book presents a collection of work on algorithms for learning

More information

Variable Objective Search

Variable Objective Search Variable Objective Search Sergiy Butenko, Oleksandra Yezerska, and Balabhaskar Balasundaram Abstract This paper introduces the variable objective search framework for combinatorial optimization. The method

More information

Numerical Optimization: Basic Concepts and Algorithms

Numerical Optimization: Basic Concepts and Algorithms May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

Numerical Analysis: Interpolation Part 1

Numerical Analysis: Interpolation Part 1 Numerical Analysis: Interpolation Part 1 Computer Science, Ben-Gurion University (slides based mostly on Prof. Ben-Shahar s notes) 2018/2019, Fall Semester BGU CS Interpolation (ver. 1.00) AY 2018/2019,

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

UCLA Department of Statistics Papers

UCLA Department of Statistics Papers UCLA Department of Statistics Papers Title An Algorithm for Constructing Orthogonal and Nearly Orthogonal Arrays with Mixed Levels and Small Runs Permalink https://escholarship.org/uc/item/1tg0s6nq Author

More information

Minimax design criterion for fractional factorial designs

Minimax design criterion for fractional factorial designs Ann Inst Stat Math 205 67:673 685 DOI 0.007/s0463-04-0470-0 Minimax design criterion for fractional factorial designs Yue Yin Julie Zhou Received: 2 November 203 / Revised: 5 March 204 / Published online:

More information

6 Pattern Mixture Models

6 Pattern Mixture Models 6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM J. OPTIM. Vol. 18, No. 1, pp. 106 13 c 007 Society for Industrial and Applied Mathematics APPROXIMATE GAUSS NEWTON METHODS FOR NONLINEAR LEAST SQUARES PROBLEMS S. GRATTON, A. S. LAWLESS, AND N. K.

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

Descent methods. min x. f(x)

Descent methods. min x. f(x) Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained

More information