On I-Optimal Designs for Generalized Linear Models: An Efficient Algorithm via General Equivalence Theory


arXiv: v1 [stat.ME] 17 Jan 2018

Yiou Li is Assistant Professor, Department of Mathematical Sciences, DePaul University, Chicago, IL. Xinwei Deng is Associate Professor, Department of Statistics, Virginia Tech, Blacksburg, VA.

Abstract

The generalized linear model plays an important role in statistical analysis, and the related design issues are undoubtedly challenging. Most state-of-the-art work addresses design criteria based on the estimation of the regression coefficients. It is therefore of importance to study optimal designs for generalized linear models with the prediction aspect in mind. In this work, we propose a prediction-oriented design criterion, I-optimality, and develop an efficient sequential algorithm for constructing I-optimal designs for generalized linear models. By establishing the General Equivalence Theorem of I-optimality for generalized linear models, we obtain an insightful understanding of how the proposed algorithm sequentially chooses support points and updates their weights. The proposed algorithm is computationally efficient with a guaranteed convergence property. Numerical examples are conducted to evaluate the feasibility and computational efficiency of the proposed algorithm.

Keywords: Experimental design; Fast convergence; Multiplicative algorithm; Prediction-oriented; Sequential algorithm

1 Introduction

The generalized linear model (GLM) is a flexible generalization of the linear model that allows the response to be related to the predictor variables through a link function. It was first formulated by Nelder and Wedderburn [17] as a generalization of various statistical models, including linear regression, logistic regression and Poisson regression. GLMs are widely used in statistical analysis across applications including business analytics, image analysis, bioinformatics, etc. From a data collection perspective, it is important to understand what an optimal design for the generalized linear model is, especially with respect to prediction.

Suppose that an experiment has d input factors x = [x_1, ..., x_d], and let Ω_j be a measurable set of all possible levels of the jth input factor. Common examples of Ω_j are [-1, 1] and R. The experimental region Ω is some measurable subset of Ω_1 × ... × Ω_d. In a generalized linear model, the response variable Y(x) is assumed to follow a particular distribution in the exponential family, which includes the normal, binomial, Poisson and gamma distributions, etc. A link function provides the relationship between the linear predictor η = β^T g(x) and the mean of the response Y(x),

    μ(x) = E[Y(x)] = h^{-1}(β^T g(x)),    (1)

where g = [g_1, ..., g_p]^T are the basis functions and β = [β_1, β_2, ..., β_p]^T are the corresponding regression coefficients. Here h : R → R is the link function, and h^{-1} is the inverse function of h.

In this work, we consider a continuous design ξ given by

    ξ = { x_1, ..., x_n ; λ_1, ..., λ_n },

with x_i ≠ x_j if i ≠ j. Here λ_i (λ_i ≥ 0) represents the fraction of experiments to be carried out at the design point x_i, and Σ_{i=1}^n λ_i = 1. The Fisher information matrix of the generalized linear model in (1) is

    I(ξ, β) = Σ_{i=1}^n λ_i g(x_i) w(x_i) g^T(x_i),    (2)

where w(x_i) = 1/{var(Y(x_i)) [h'(μ(x_i))]^2}. The notation I(ξ, β) emphasizes the dependence on the design ξ and the regression coefficient β.
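For concreteness, the following minimal Python sketch (our own illustration, not code from the paper) assembles the Fisher information matrix in (2) for a given continuous design. The helper names fisher_information, basis and glm_weight are ours; the logistic weight at the end uses the standard simplification w(x) = μ(x)(1 - μ(x)) under the logit link.

```python
import numpy as np

def fisher_information(X, weights, beta, basis, glm_weight):
    """Fisher information of eq. (2): I(xi, beta) = sum_i lambda_i w(x_i) g(x_i) g(x_i)^T."""
    p = len(beta)
    info = np.zeros((p, p))
    for x, lam in zip(X, weights):
        g = basis(x)                                  # basis vector g(x), length p
        info += lam * glm_weight(x, beta) * np.outer(g, g)
    return info

def logistic_weight(x, beta, basis):
    """GLM weight w(x) = 1/{var(Y(x)) [h'(mu(x))]^2}; for the logit link this
    reduces to mu(x)(1 - mu(x))."""
    mu = 1.0 / (1.0 + np.exp(-basis(x) @ beta))
    return mu * (1.0 - mu)
```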

The design issues of linear models have been well studied owing to their simplicity and practicality [20]. The design issues of generalized linear models tend to be much more challenging due to the complicated Fisher information matrix. In GLMs, the Fisher information matrix I(ξ, β) often depends on the regression coefficient β through the function w(x_i) in (2). As most optimal design criteria can be expressed as functions of the Fisher information matrix, a locally optimal design is often studied under the assumption that some initial estimates of the model coefficients are available. Khuri et al. [12] surveyed design issues for generalized linear models and noted that research on designs for generalized linear models is still very much at a developmental stage, and that not much work has been done, in terms of either theory or computational methods, to evaluate optimal designs when the dimension of the design space is high. Yang and Stufken [33], Yang [31], Dette and Melas [4], and Wu and Stufken [26] obtained a series of theoretical results for several optimality criteria with univariate nonlinear models and univariate generalized linear models. On the computational side, most of the literature focuses on D-optimality. Popular algorithms include modified versions of the Fedorov-Wynn algorithm proposed by Wynn [27] and Fedorov [7], which update the support of the design, and the multiplicative algorithm proposed by Silvey et al. [21], which updates the weights of candidate points from a large candidate pool. Chernoff [3] expressed concerns about the concentration of study on D-optimality, advocating the study of other criteria. Research on algorithms for finding optimal designs under other criteria is sparse. Yang et al. [32] proposed a multistage algorithm to construct φ_p-optimal designs by sequentially updating the support of the design and optimizing the weights using Newton's method in each iteration.

A practical objective of experimental design often focuses on the accuracy of the model prediction. Under such a consideration, the corresponding criterion, I-optimality, aims at minimizing the integrated mean squared error of the prediction. There is some literature on I-optimality for linear regression models.

Haines [11] proposed a simulated annealing algorithm to obtain exact I-optimal designs. Meyer and Nachtsheim [15, 16] used simulated annealing and coordinate-exchange algorithms to construct exact I-optimal designs. Recently, Goos et al. [10] proposed a modified multiplicative algorithm to find I-optimal designs for mixture experiments. However, to the best of our knowledge, there is little literature on defining and obtaining I-optimal designs for generalized linear models (GLMs).

The contribution of this work is to fill the gap in the theory of I-optimal designs for GLMs and to develop an efficient algorithm for constructing I-optimal designs for GLMs. Specifically, we first establish the General Equivalence Theorem of I-optimality for GLMs. The established theoretical results shed insightful light on how to develop efficient algorithms for constructing I-optimal designs for GLMs. The General Equivalence Theorem is well established for linear regression models, but not for GLMs. Kiefer and Wolfowitz [14] derived an equivalence theorem on D-optimality for linear models, and White [24] extended the theorem on D- and G-optimality to nonlinear models. Later, Kiefer [13] generalized the General Equivalence Theorem (GET) to φ_p-optimality for linear regression models. Since I-optimality is not a member of the φ_p-optimality family, the General Equivalence Theorem on I-optimality for GLMs has not been thoroughly investigated.

After establishing the General Equivalence Theorem of I-optimality for generalized linear models, we further develop an efficient sequential algorithm for constructing I-optimal designs for GLMs. The proposed algorithm sequentially adds points into the design and updates their weights simultaneously by using a modified multiplicative algorithm [23]. We also establish the convergence properties of the proposed algorithm. The multiplicative algorithm, first proposed by Titterington [23] and Silvey et al. [21], has been widely used for finding optimal designs for linear regression models. The original multiplicative algorithm often requires a large candidate pool, and updating the weights of all candidate points simultaneously can be computationally expensive. For our proposed sequential algorithm, we modify the multiplicative algorithm and use it on a much smaller set of points. The proposed algorithm can significantly enhance computational efficiency because of its sequential nature. It is worth pointing out that the proposed algorithm can be easily extended to construct optimal designs under other optimality criteria.

The remainder of this work is organized as follows. We introduce the I-optimality criterion for the generalized linear model in Section 2. In Section 3, we derive the General Equivalence Theorem on I-optimality for generalized linear models. We detail the proposed sequential algorithm and establish its convergence properties in Sections 4-5. Numerical examples are conducted in Section 6 to evaluate the performance of the proposed algorithm. We conclude this work with some discussion in Section 7.

2 The I-Optimality Criterion for GLM

For generalized linear models, one practical goal is to predict the response Y(x) at a given input x. Thus it is important to adopt a prediction-oriented criterion when considering the optimal design. The integrated mean squared error (IMSE) is defined as the expected value of the mean squared error between the fitted mean response μ̂(x) and the true mean response μ(x) over the domain of the input x, with the distribution of x given by F_IMSE on the experimental region Ω, i.e.,

    E[ ∫_Ω (μ̂(x) - μ(x))^2 dF_IMSE ].

In the context of generalized linear models, the fitted mean response μ̂(x) = h^{-1}(β̂^T g(x)) can be expanded around the true mean response μ(x) = h^{-1}(β^T g(x)) using a Taylor expansion,

    μ̂(x) - μ(x) = h^{-1}(β̂^T g(x)) - h^{-1}(β^T g(x)) ≈ c(x)^T (β̂ - β),

with c(x) = ( ∂h^{-1}/∂β_1 (x), ..., ∂h^{-1}/∂β_p (x) )^T = g(x) dh^{-1}/dη. Here η = β^T g(x) is the linear predictor. Then, using the above first-order Taylor expansion, the integrated mean squared error can be approximated by

    IMSE(ξ, β) = E[ ∫_Ω c(x)^T (β̂ - β)(β̂ - β)^T c(x) dF_IMSE ].

In numerical analysis, the first-order Taylor expansion is commonly used to approximate the difference of a nonlinear function evaluated at two close points. For the design of generalized linear models, similar approximations are commonly used in the literature, for example by Schein and Ungar [19] for logistic regression models.

Lemma 1. The IMSE(ξ, β) can be expressed as the trace of the product of two matrices,

    IMSE(ξ, β) = tr(A I(ξ, β)^{-1}),

where the matrix A = ∫_Ω c(x) c(x)^T dF_IMSE = ∫_Ω g(x) g^T(x) [dh^{-1}/dη]^2 dF_IMSE depends only on the regression coefficients and the basis functions, but not on the design, and the Fisher information matrix I(ξ, β), defined in equation (2), depends on both the regression coefficients and the design.

Proof.

    IMSE(ξ, β) = E[ ∫_Ω c(x)^T (β̂ - β)(β̂ - β)^T c(x) dF_IMSE ]
               = ∫_Ω c(x)^T E[(β̂ - β)(β̂ - β)^T] c(x) dF_IMSE
               ≈ ∫_Ω tr( c(x) c(x)^T I(ξ, β)^{-1} ) dF_IMSE
               = tr( [∫_Ω c(x) c(x)^T dF_IMSE] I(ξ, β)^{-1} )
               = tr(A I(ξ, β)^{-1}).

The approximation follows from the fact that the estimated regression coefficient β̂ asymptotically follows N(β, I(ξ, β)^{-1}).

We call I-optimality the corresponding optimality criterion that aims at minimizing IMSE(ξ, β) = tr(A I(ξ, β)^{-1}). A design ξ* is called an I-optimal design if it minimizes IMSE(ξ, β). I-optimality for linear regression was first proposed by Box and Draper [2] in 1959 and has been studied extensively for decades; the I-optimality for generalized linear models defined above follows a similar setup. In this article, we focus on locally I-optimal designs with a given regression coefficient β. To simplify notation, in the rest of the article β is omitted from the notation I(ξ, β) of the Fisher information matrix, and I(ξ) is used instead.
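As a brief illustration of Lemma 1 (a sketch of ours, assuming the matrix A and the Fisher information are already available as numpy arrays), the criterion value can be computed with a linear solve instead of an explicit matrix inverse:

```python
import numpy as np

def imse(A, info):
    """I-optimality criterion of Lemma 1: IMSE(xi, beta) = tr(A I(xi, beta)^{-1}).
    Since tr(A I^{-1}) = tr(I^{-1} A), it suffices to solve I X = A and take the trace."""
    return float(np.trace(np.linalg.solve(info, A)))
```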

The following lemma shows the convexity of IMSE(ξ, β), which is a key fact that facilitates the proof of the General Equivalence Theorem in Section 3.

Lemma 2. IMSE(ξ, β) is a convex function of the design ξ.

Proof. Consider two designs ξ_1 and ξ_2, and define ξ = (1 - a)ξ_1 + a ξ_2. With 0 ≤ a ≤ 1, ξ is still a feasible design. The Fisher information matrix of ξ can be expressed as I(ξ) = (1 - a) I(ξ_1) + a I(ξ_2). Since the three Fisher information matrices are positive definite and matrix inversion is convex over positive definite matrices [1], we have

    I(ξ)^{-1} ⪯ (1 - a) I(ξ_1)^{-1} + a I(ξ_2)^{-1}.

Since A is symmetric and positive semi-definite, there exists a unique positive semi-definite matrix B with B^2 = A [1]. As B is symmetric, we have

    B I(ξ)^{-1} B ⪯ (1 - a) B I(ξ_1)^{-1} B + a B I(ξ_2)^{-1} B.

As a result,

    tr(A I(ξ)^{-1}) = tr(B I(ξ)^{-1} B) ≤ tr( (1 - a) B I(ξ_1)^{-1} B + a B I(ξ_2)^{-1} B )
                    = (1 - a) tr(A I(ξ_1)^{-1}) + a tr(A I(ξ_2)^{-1}).

That is, IMSE((1 - a)ξ_1 + a ξ_2, β) ≤ (1 - a) IMSE(ξ_1, β) + a IMSE(ξ_2, β). Thus IMSE(ξ, β) is a convex function of the design ξ.

3 General Equivalence Theorem of I-Optimality for GLM

In this section, we derive the General Equivalence Theorem of I-optimality for generalized linear models. The established theoretical results facilitate the sequential choice of support points, in the same spirit as the Fedorov-Wynn algorithm [27, 7].

We first investigate the directional derivative of IMSE(ξ, β) with respect to ξ. Given two designs ξ and ξ', let the design ξ̃ be constructed as ξ̃ = (1 - α)ξ + αξ'. Then the derivative of IMSE(ξ, β) in the direction of the design ξ' is

    φ(ξ', ξ) = lim_{α→0+} [IMSE(ξ̃, β) - IMSE(ξ, β)] / α.    (3)

Theorem 1. The directional derivative of IMSE(ξ, β) in the direction of any design ξ' is

    φ(ξ', ξ) = lim_{α→0+} [IMSE(ξ̃, β) - IMSE(ξ, β)] / α = tr(I(ξ)^{-1} A) - tr(I(ξ') I(ξ)^{-1} A I(ξ)^{-1}).

Proof. Given ξ̃ = (1 - α)ξ + αξ', we have I(ξ̃) = (1 - α) I(ξ) + α I(ξ'). For any matrix B that is a function of α, the derivative of its inverse B^{-1} can be calculated as ∂B^{-1}/∂α = -B^{-1} (∂B/∂α) B^{-1} [1]. So the derivative of I(ξ̃)^{-1} A with respect to α can be expressed as

    ∂[I(ξ̃)^{-1} A]/∂α = [∂I(ξ̃)^{-1}/∂α] A = -I(ξ̃)^{-1} [I(ξ') - I(ξ)] I(ξ̃)^{-1} A.    (4)

Then the directional derivative of IMSE(ξ, β) is

    φ(ξ', ξ) = ∂ tr[I(ξ̃)^{-1} A]/∂α |_{α=0}
             = tr( ∂[I(ξ̃)^{-1} A]/∂α ) |_{α=0}
             = -tr( I(ξ̃)^{-1} [I(ξ') - I(ξ)] I(ξ̃)^{-1} A ) |_{α=0}
             = tr(I(ξ)^{-1} A) - tr( I(ξ)^{-1} I(ξ') I(ξ)^{-1} A ).

Corollary 1. The directional derivative of IMSE(ξ, β) in the direction of a single point x is given by

    φ(x, ξ) = tr(I(ξ)^{-1} A) - w(x) g(x)^T I(ξ)^{-1} A I(ξ)^{-1} g(x).

Proof. Consider ξ' as the design with the single design point x whose weight is 1.
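The single-point directional derivative of Corollary 1 drives both the optimality check in Theorem 2 below and the point-selection step of the sequential algorithm in Section 5. A minimal sketch (ours, reusing the conventions of the earlier snippet) is:

```python
import numpy as np

def dir_derivative(x, A, info, beta, basis, glm_weight):
    """Corollary 1: phi(x, xi) = tr(I^{-1} A) - w(x) g(x)^T I^{-1} A I^{-1} g(x)."""
    g = basis(x)
    inv_g = np.linalg.solve(info, g)                  # I(xi)^{-1} g(x)
    return float(np.trace(np.linalg.solve(info, A))
                 - glm_weight(x, beta) * inv_g @ A @ inv_g)
```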

Based on Lemma 2 and Corollary 1, we can derive the following General Equivalence Theorem of I-optimality for generalized linear models. This theorem provides an intuitive understanding of how to choose support points sequentially and how to check the optimality of a design.

Theorem 2 (General Equivalence Theorem). The following three conditions on ξ* are equivalent:

1. The design ξ* minimizes IMSE(ξ, β) = tr(A I(ξ)^{-1}).
2. The design ξ* minimizes sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x).
3. w(x) g^T(x) I(ξ*)^{-1} A I(ξ*)^{-1} g(x) ≤ tr(I(ξ*)^{-1} A) holds over the experimental region Ω, and the equality holds only at the support points of the design ξ*.

Proof.

(i) 1 ⇒ 3: As proved in Lemma 2, IMSE(ξ, β) = tr(A I(ξ)^{-1}) is a convex function of the design ξ. Since ξ* minimizes IMSE(ξ, β), the directional derivative satisfies φ(x, ξ*) ≥ 0 for all x ∈ Ω, i.e., tr(I(ξ*)^{-1} A) ≥ w(x) g(x)^T I(ξ*)^{-1} A I(ξ*)^{-1} g(x), and equality holds only at the support points of ξ*.

(ii) 1 ⇒ 2: This can be proved by contradiction. Suppose that ξ* is the minimizer of IMSE(ξ, β) but not the minimizer of sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x). Then there exists a design ξ' different from ξ* that minimizes sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x), i.e.,

    sup_{x∈Ω} w(x) g^T(x) I(ξ')^{-1} A I(ξ')^{-1} g(x) < sup_{x∈Ω} w(x) g^T(x) I(ξ*)^{-1} A I(ξ*)^{-1} g(x).    (5)

As proved in (i), w(x) g^T(x) I(ξ*)^{-1} A I(ξ*)^{-1} g(x) ≤ tr(I(ξ*)^{-1} A) holds over the experimental region Ω, so

    sup_{x∈Ω} w(x) g^T(x) I(ξ*)^{-1} A I(ξ*)^{-1} g(x) ≤ tr(I(ξ*)^{-1} A).    (6)

Combining the results in (5) and (6), it is easy to see that

    sup_{x∈Ω} w(x) g^T(x) I(ξ')^{-1} A I(ξ')^{-1} g(x) < tr(I(ξ*)^{-1} A).    (7)

Recall that the weight of the jth point x_j in the design ξ' is denoted by λ_j. Then

    sup_{x∈Ω} w(x) g^T(x) I(ξ')^{-1} A I(ξ')^{-1} g(x) ≥ sup_{x_i ∈ ξ'} w(x_i) g^T(x_i) I(ξ')^{-1} A I(ξ')^{-1} g(x_i)
        ≥ Σ_j λ_j w(x_j) g^T(x_j) I(ξ')^{-1} A I(ξ')^{-1} g(x_j)
        = tr( Σ_j λ_j w(x_j) g(x_j) g^T(x_j) I(ξ')^{-1} A I(ξ')^{-1} )
        = tr(I(ξ')^{-1} A) = IMSE(ξ', β).

Comparing with the left- and right-hand sides of equation (7), we have IMSE(ξ', β) < IMSE(ξ*, β), which contradicts condition 1. Thus the design ξ* must be the minimizer of sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x).

(iii) 2 ⇒ 1: This statement can also be proved by contradiction. Suppose the design ξ* is the minimizer of sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x), but not the minimizer of IMSE(ξ, β). Then there exists a design ξ' different from ξ* that minimizes IMSE(ξ, β), i.e.,

    IMSE(ξ', β) < IMSE(ξ*, β),

and, as a result of (i), φ(x, ξ') ≥ 0 for all x ∈ Ω. Based on Corollary 1 and the assumption that ξ* is the minimizer of sup_{x∈Ω} w(x) g^T(x) I(ξ)^{-1} A I(ξ)^{-1} g(x), we thus have

    tr(I(ξ')^{-1} A) ≥ sup_{x∈Ω} w(x) g(x)^T I(ξ')^{-1} A I(ξ')^{-1} g(x) ≥ sup_{x∈Ω} w(x) g(x)^T I(ξ*)^{-1} A I(ξ*)^{-1} g(x).    (8)

Since IMSE(ξ', β) < IMSE(ξ*, β) by assumption, together with equation (8) we have

    sup_{x∈Ω} w(x) g(x)^T I(ξ*)^{-1} A I(ξ*)^{-1} g(x) ≤ tr(I(ξ')^{-1} A) < tr(I(ξ*)^{-1} A),

so φ(x, ξ*) > 0 for all x ∈ Ω. Then, by Corollary 1, for any point x_i in the design ξ* we have

    λ_i tr(I(ξ*)^{-1} A) > λ_i w(x_i) g(x_i)^T I(ξ*)^{-1} A I(ξ*)^{-1} g(x_i), i = 1, 2, ..., n.

Summing over all design points of ξ*, we have

    tr(I(ξ*)^{-1} A) > Σ_i λ_i w(x_i) g(x_i)^T I(ξ*)^{-1} A I(ξ*)^{-1} g(x_i) = tr(I(ξ*)^{-1} A),

which is a contradiction. Thus ξ* must minimize IMSE(ξ, β).

(iv) 3 ⇒ 1: Suppose that w(x) g^T(x) I(ξ*)^{-1} A I(ξ*)^{-1} g(x) ≤ tr(I(ξ*)^{-1} A) holds over the experimental region Ω, with equality only at the support points of the design ξ*. Then the directional derivative φ(x, ξ*) ≥ 0 for all x ∈ Ω, and φ(x_i, ξ*) = 0 for all x_i ∈ ξ*. Thus, by the convexity of IMSE, ξ* minimizes IMSE, and condition 1 holds.

Theorem 2 provides an intuitive way to choose the support points for constructing an I-optimal design in a sequential fashion. According to Theorem 2, for an optimal design ξ*, the directional derivative φ(x, ξ*) is non-negative for any x ∈ Ω. It implies that for any non-optimal design there will be some directions in which the directional derivative φ(x, ξ) < 0. Given a current design ξ, to gain the maximal decrease in the I-optimality criterion, we choose a new support point x* to be added to the design if φ(x*, ξ) = min_{x∈Ω} φ(x, ξ) < 0. One can then optimize the weights of all support points in the updated design, as described in Algorithm 1 in Section 4. By iterating the selection of a support point and the weight update of all support points, this two-step procedure is continued until min_{x∈Ω} φ(x, ξ) ≥ 0, which means the updated design is I-optimal. Such a sequential algorithm for constructing optimal designs, described in Algorithm 2 in Section 5, follows a similar spirit to the widely used Fedorov-Wynn algorithm [27, 28]. Note that I-optimal designs are not necessarily unique; more details are discussed in Section 6.2 after the proposed algorithms are introduced. In the next section, we first discuss how to optimize the weights of the support points for a given design. We then describe in detail, in Section 5, the sequential algorithm (Algorithm 2) for constructing I-optimal designs for generalized linear models.

4 Optimal Weights Given the Design Points

In this section, we investigate how to assign optimal weights to the support points so as to minimize IMSE(ξ, β) when the design points are given. In Wynn's work [27, 28], equal weights are assigned to all design points after a new design point is added. Whittle [25] made an improvement by choosing the weight of the newly added point to minimize the optimality criterion, with the weights of the existing points kept proportional to their current weights. Yang et al. [32] proposed using a Newton-Raphson method to find the optimal weights, which solves for the roots of the first-order partial derivatives of the criterion with respect to the weights. Note that the Newton-Raphson method may produce negative weights, and extra effort is then needed to eliminate them. Also, when the first-order partial derivatives do not exist, such as under the c-optimality criterion, the algorithm of Yang et al. [32] is not applicable.

To address the above challenges, we develop a modified multiplicative algorithm for finding the optimal weights of the support points. The developed algorithm does not require partial derivatives or Hessian matrix inversion, and is therefore computationally efficient.

4.1 Condition of Optimal Weights

Given the design points x_1, x_2, ..., x_n fixed, denote by λ = [λ_1, λ_2, ..., λ_n]^T the weight vector with λ_i as the weight of the corresponding design point x_i. We write the corresponding design as

    ξ^λ = { x_1, ..., x_n ; λ_1, ..., λ_n }.

The superscript λ emphasizes that only the weight vector is changeable in the design in this setting. The optimal weight vector λ* should be the one that minimizes IMSE(ξ, β) with the design points x_1, ..., x_n fixed. We can still obtain the directional derivative of IMSE(ξ, β) with respect to the weight vector λ. Consider another weight vector λ̃ = [λ̃_1, ..., λ̃_n]^T. Then the convex combination λ̄ = (1 - α)λ + αλ̃ with 0 ≤ α ≤ 1 is also a weight vector. Define the directional derivative of IMSE(ξ, β) in the direction of a new weight vector λ̃ = [λ̃_1, ..., λ̃_n]^T as

    ψ(λ̃, λ) = lim_{α→0+} [IMSE(ξ^λ̄, β) - IMSE(ξ^λ, β)] / α.

Lemma 3. IMSE(ξ, β) is a convex function of the weight vector λ.

Proof. Since λ̄ = (1 - a)λ + aλ̃, we have I(ξ^λ̄) = (1 - a) I(ξ^λ) + a I(ξ^λ̃). The rest of the proof is similar to that of Lemma 2.

Lemma 4. The directional derivative of IMSE(ξ, β) in the direction of a new weight vector λ̃ = [λ̃_1, ..., λ̃_n]^T is

    ψ(λ̃, λ) = tr(I(ξ^λ)^{-1} A) - Σ_{i=1}^n λ̃_i w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i).

Proof. This is a special case of the general directional derivative in equation (3), where the design points in ξ = ξ^λ and ξ' = ξ^λ̃ are the same but the weights differ. Thus Theorem 1 still holds, and

    ψ(λ̃, λ) = tr(I(ξ)^{-1} A) - tr(I(ξ') I(ξ)^{-1} A I(ξ)^{-1})
             = tr(I(ξ^λ)^{-1} A) - Σ_{i=1}^n λ̃_i w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i).

The following result provides a necessary and sufficient condition for the optimal weight vector that minimizes IMSE(ξ, β) when the design points are fixed.

Theorem 3. Given a fixed set of design points x_1, ..., x_n, the following two conditions on the weight vector λ* = [λ_1*, ..., λ_n*]^T are equivalent:

1. The weight vector λ* minimizes IMSE(ξ, β).
2. tr(I(ξ^{λ*})^{-1} A) = w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i), for i = 1, ..., n.

Proof.

(i) (1) ⇒ (2): Since λ* minimizes IMSE(ξ, β), and IMSE(ξ, β) is a convex function of the weight vector as proved in Lemma 3,

    ψ(λ̃, λ*) = tr(I(ξ^{λ*})^{-1} A) - Σ_{i=1}^n λ̃_i w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i)
              = Σ_{i=1}^n λ̃_i [ tr(I(ξ^{λ*})^{-1} A) - w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) ] ≥ 0

for every feasible weight vector λ̃. Thus tr(I(ξ^{λ*})^{-1} A) - w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) ≥ 0 for i = 1, ..., n. We now show that, in fact, tr(I(ξ^{λ*})^{-1} A) - w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) = 0 for i = 1, ..., n.

Suppose there exists at least one x_i such that

    tr(I(ξ^{λ*})^{-1} A) - w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) > 0.

Then

    tr(I(ξ^{λ*})^{-1} A) = Σ_{i=1}^n λ_i* tr(I(ξ^{λ*})^{-1} A)
                         > Σ_{i=1}^n λ_i* w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) = tr(I(ξ^{λ*})^{-1} A),

which is a contradiction. So tr(I(ξ^{λ*})^{-1} A) - w(x_i) g(x_i)^T I(ξ^{λ*})^{-1} A I(ξ^{λ*})^{-1} g(x_i) = 0 for i = 1, ..., n.

(ii) (2) ⇒ (1): This is obvious, since ψ(λ̃, λ*) = 0 for every feasible weight vector λ̃ and IMSE(ξ, β) is a convex function of the weight vector.

Remark 1. The implication (1) ⇒ (2) in Theorem 3 can also be derived by the method of Lagrange multipliers.

The result in Theorem 3 provides a useful gateway for designing an effective algorithm to find the optimal weights given the design points.

4.2 Multiplicative Algorithm for Optimal Weights

Based on Theorem 3, for given design points the optimal weight vector that minimizes IMSE(ξ, β) should be the solution of the system of equations

    tr(I(ξ^λ)^{-1} A) = w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i), i = 1, ..., n.    (9)
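Condition 2 of Theorem 3, i.e., the system (9), can be checked numerically for a candidate weight vector. The sketch below is our own illustration; it reuses the fisher_information helper from the earlier snippet, and the tolerance is an arbitrary choice.

```python
import numpy as np

def weights_satisfy_theorem3(X, weights, A, beta, basis, glm_weight, tol=1e-6):
    """Check eq. (9): tr(I^{-1} A) = w(x_i) g(x_i)^T I^{-1} A I^{-1} g(x_i) for all i."""
    info = fisher_information(X, weights, beta, basis, glm_weight)
    lhs = np.trace(np.linalg.solve(info, A))
    for x in X:
        g = basis(x)
        inv_g = np.linalg.solve(info, g)
        rhs = glm_weight(x, beta) * inv_g @ A @ inv_g
        if abs(lhs - rhs) > tol * max(1.0, abs(lhs)):
            return False
    return True
```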

The weight of a design point x_i should be adjusted according to the two sides of equation (9). Note that for any weight vector λ,

    tr(I(ξ^λ)^{-1} A) = Σ_{i=1}^n λ_i w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i).

Our strategy for finding the optimal weights is therefore: for a design point x_i, if tr(I(ξ^λ)^{-1} A) < w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i), the weight λ_i of x_i should be increased; if tr(I(ξ^λ)^{-1} A) > w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i), the weight λ_i of x_i should be decreased. Thus the ratio

    w(x_i) g(x_i)^T I(ξ^λ)^{-1} A I(ξ^λ)^{-1} g(x_i) / tr(I(ξ^λ)^{-1} A)

indicates a good adjustment of the current weight of the design point x_i. Encouraged by the promising performance of the multiplicative algorithm recently used by Goos et al. [10], we consider a modified multiplicative algorithm to find the optimal weights based on Theorem 3. The detailed algorithm is described in Algorithm 1 as follows.

Algorithm 1

Step 1: Assign a random weight vector λ^0 = [λ^0_1, ..., λ^0_n]^T, and set k = 0.

Step 2: For each design point x_i, update the weight as

    λ^{k+1}_i = λ^k_i [ w(x_i) g^T(x_i) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) / tr(I(ξ^{λ^k})^{-1} A) ]^δ
                / Σ_{j=1}^n λ^k_j [ w(x_j) g^T(x_j) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_j) / tr(I(ξ^{λ^k})^{-1} A) ]^δ
              = λ^k_i [ w(x_i) g^T(x_i) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) ]^δ
                / Σ_{j=1}^n λ^k_j [ w(x_j) g^T(x_j) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_j) ]^δ.    (10)

Step 3: Set k = k + 1 and go to Step 2 until convergence.

In Algorithm 1, the value 0 < δ < 1 is a convergence parameter chosen by the user. Following the suggestions of Fellman [8] and Torsney [9], we use δ = 1/2.
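A vectorized Python sketch of Algorithm 1 (our own rendering of update rule (10), with a uniform rather than random initial weight vector and an ad hoc stopping tolerance) is shown below.

```python
import numpy as np

def multiplicative_weights(X, A, beta, basis, glm_weight,
                           delta=0.5, max_iter=1000, tol=1e-9):
    """Modified multiplicative algorithm (Algorithm 1) for the optimal weights
    of the fixed support points X, using update rule (10)."""
    n = len(X)
    lam = np.full(n, 1.0 / n)                         # Step 1 (uniform start here)
    G = np.array([basis(x) for x in X])               # (n, p) matrix of basis vectors
    w = np.array([glm_weight(x, beta) for x in X])    # GLM weights w(x_i)
    for _ in range(max_iter):
        info = (G * (lam * w)[:, None]).T @ G                 # I(xi^lambda)
        M = np.linalg.solve(info, A) @ np.linalg.inv(info)    # I^{-1} A I^{-1}
        d = w * np.einsum('ij,jk,ik->i', G, M, G)             # w_i g_i^T I^{-1} A I^{-1} g_i
        new_lam = lam * d ** delta                            # Step 2: rule (10) ...
        new_lam /= new_lam.sum()                              # ... after normalization
        if np.max(np.abs(new_lam - lam)) < tol:
            return new_lam
        lam = new_lam                                          # Step 3: iterate
    return lam
```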

The following lemma provides a justification that every iteration of Algorithm 1 is always feasible.

Lemma 5. A positive definite A = ∫_Ω g(x) g^T(x) [dh^{-1}/dη]^2 dF_IMSE ensures that the iteration in equation (10) is always feasible, i.e., the denominator Σ_{i=1}^n λ^k_i [ w(x_i) g^T(x_i) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) ]^δ is always positive.

Proof. Given that A and I(ξ^{λ^k}) are positive definite,

    0 < tr(I(ξ^{λ^k})^{-1} A) = tr( I(ξ^{λ^k}) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} )
      = tr( Σ_{i=1}^n λ^k_i w(x_i) g(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} )
      = Σ_{i=1}^n λ^k_i tr( w(x_i) g(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} )
      = Σ_{i=1}^n λ^k_i w(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i).

Thus there exists some x_i such that λ^k_i w(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) > 0, and naturally λ^k_i ( w(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) )^δ > 0, which leads to Σ_{i=1}^n λ^k_i [ w(x_i) g^T(x_i) I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) ]^δ > 0.

Remark 2. Note that the matrix A in Lemma 5 is always positive semi-definite and does not depend on the choice of design. If a singular A is observed in the first step, one can choose the basis functions g_1, ..., g_p carefully so that they are (nearly) orthogonal under the weight function [dh^{-1}/dη]^2 F_IMSE; doing so ensures that A is positive definite. In general, Gram-Schmidt orthogonalization [18] can be used to construct uni- or multivariate orthogonal basis functions. Favard's theorem [6] can be applied to construct univariate orthogonal polynomials, and it is straightforward to prove that when the input variables x_1, ..., x_d are independent, i.e., F_IMSE is separable, multivariate orthogonal polynomials can be constructed as tensor products of univariate orthogonal polynomials.

The following result provides the convergence property of Algorithm 1.

Proposition 1 (Convergence of Algorithm 1). Given fixed design points x_1, ..., x_n, the weight vector λ^k obtained from Algorithm 1 converges, as k → ∞, to the optimal weight vector λ* that minimizes the I-optimality criterion IMSE(ξ^λ, β).

The proof of Proposition 1 is mainly based on the results in Theorems 1 and 2 of Yu [35]. We first require the following lemma before laying out the proof.

Lemma 6. Define ϕ(I) = IMSE(ξ^λ, β) = tr(I^{-1} A) as a function of the Fisher information matrix I. Then (a) ϕ(I) is a strictly convex function of I; (b) ϕ(I) is a decreasing function of I.

Proof. The proof of (a) is very similar to that of Lemma 2 and is omitted here. For (b), it is straightforward to see that ϕ(I) is decreasing, since 0 < I(ξ^{λ_1}) ⪯ I(ξ^{λ_2}) implies I(ξ^{λ_1})^{-1} ⪰ I(ξ^{λ_2})^{-1} and hence ϕ(I_1) ≥ ϕ(I_2).

Proof of Proposition 1. Based on Lemma 6, ϕ(I) = IMSE(ξ^λ, β) = tr(I^{-1} A) is a convex and decreasing function of I. Under Algorithm 1 with 0 < δ < 1, denote by λ^k and λ^{k+1} the solutions at the kth and (k+1)th iterations of the multiplicative algorithm, respectively. Since I(ξ^{λ^k}), I(ξ^{λ^{k+1}}) and A are positive definite, it is easy to see that I(ξ^{λ^k}) > 0, I(ξ^{λ^{k+1}}) > 0, and I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} > 0, where I^{-1} A I^{-1} is a continuous function of I. Based on Theorem 1 of Yu [35], we have tr(I(ξ^{λ^{k+1}})^{-1} A) ≤ tr(I(ξ^{λ^k})^{-1} A) when λ^{k+1} ≠ λ^k. This shows the monotonicity of Algorithm 1.

Moreover, for ϕ(I) = IMSE(ξ^λ, β) = tr(I^{-1} A), we also have

    tr( w(x_i) g(x_i) g^T(x_i) [∂ϕ(I)/∂I]|_{I = I(ξ^{λ^k})} ) = -w(x_i) g(x_i)^T I(ξ^{λ^k})^{-1} A I(ξ^{λ^k})^{-1} g(x_i) ≤ 0.

For the sequence I(ξ^{λ^k}), k = 1, 2, ..., generated by Algorithm 1, any limit point is clearly nonsingular because of positive definiteness. Combining the above statements with the results in Lemma 6, all conditions required in Theorem 2 of Yu [35] are satisfied. Thus, using Theorem 2 of Yu [35], all limit points of λ^k are global minima of IMSE(ξ^λ, β), and IMSE(ξ^{λ^k}, β) decreases monotonically to inf_λ IMSE(ξ^λ, β) as k → ∞.

5 Sequential Algorithm for Constructing I-Optimal Designs

Recall that the General Equivalence Theorem in Section 3 provides insightful guidelines for the sequential selection of design points. In combination with finding the optimal weights for fixed design points in Section 4, we propose an efficient sequential algorithm to construct I-optimal designs for generalized linear models. The details of the proposed sequential algorithm are summarized in Algorithm 2.

Algorithm 2

Step 1: Choose an initial design D_0 containing the vertices of Ω and set r = 0.

Step 2: Generate an N-point Sobol sequence S and form the candidate pool C = S ∪ D_0.

Step 3: Obtain the optimal weights of the current design points using Algorithm 1.

Step 4: Add the point x*_r = argmin_{x∈C} φ(x, ξ^r) to the current design if φ(x*_r, ξ^r) < -ε, where φ(x, ξ^r) is the directional derivative

    φ(x, ξ^r) = tr(I(ξ^r)^{-1} A) - w(x) g(x)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} g(x).

Step 5: Set r = r + 1 and go to Step 2 until convergence.
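Under the same assumptions as the earlier sketches (the helpers fisher_information, dir_derivative and multiplicative_weights defined above; these names are ours, not the authors' implementation), Algorithm 2 can be sketched as the following loop. In practice one may additionally merge duplicated support points or drop points whose weights become negligible.

```python
import numpy as np

def sequential_i_optimal(candidates, init_design, A, beta, basis, glm_weight,
                         eps=1e-6, max_iter=200):
    """Sequential construction of an I-optimal design (Algorithm 2): alternate
    between re-weighting the current support (Algorithm 1) and adding the
    candidate with the most negative directional derivative."""
    support = [np.asarray(x) for x in init_design]    # Step 1: initial support D_0
    for _ in range(max_iter):
        lam = multiplicative_weights(support, A, beta, basis, glm_weight)   # Step 3
        info = fisher_information(support, lam, beta, basis, glm_weight)
        phis = [dir_derivative(x, A, info, beta, basis, glm_weight)
                for x in candidates]                  # Step 4: scan the candidate pool
        j = int(np.argmin(phis))
        if phis[j] >= -eps:                           # GET: no descent direction left
            break
        support.append(np.asarray(candidates[j]))     # add the most promising point
    else:
        lam = multiplicative_weights(support, A, beta, basis, glm_weight)
    return support, lam
```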

In Algorithm 2, the value ε > 0 is chosen to be a small positive number. Note that the proposed sequential algorithm for constructing I-optimal designs for GLMs does not require the computation of a Hessian matrix as in Newton-Raphson methods; thus it can be more computationally efficient than conventional methods. Also, the sequential nature of the proposed algorithm enables an efficient search for the optimal weights without updating the weights of all candidate points. Moreover, we establish the convergence property of the proposed sequential algorithm as follows.

Theorem 4 (Convergence of Algorithm 2). Assume the candidate pool C is large enough, i.e., C = Ω or C includes all the support points of an I-optimal design. Then the design constructed by Algorithm 2 converges, as r → ∞, to an I-optimal design ξ* that minimizes IMSE(ξ, β), i.e., lim_{r→∞} IMSE(ξ^r, β) = IMSE(ξ*, β).

Proof. We establish the argument by contradiction. Assume that ξ^r does not converge to ξ*, i.e., lim_{r→∞} IMSE(ξ^r, β) - IMSE(ξ*, β) > 0. Since the support set at the rth iteration is a subset of the support set at the (r+1)th iteration, it is obvious that IMSE(ξ^{r+1}, β) ≤ IMSE(ξ^r, β) for all r ≥ 0. Thus there exists some a > 0 such that IMSE(ξ^r, β) > IMSE(ξ*, β) + a for all r. Consider a design ξ̃ = (1 - α)ξ^r + αξ* with 0 ≤ α ≤ 1. Then

    φ(ξ*, ξ^r) = lim_{α→0} [IMSE((1 - α)ξ^r + αξ*, β) - IMSE(ξ^r, β)] / α
               ≤ lim_{α→0} [(1 - α) IMSE(ξ^r, β) + α IMSE(ξ*, β) - IMSE(ξ^r, β)] / α
               = IMSE(ξ*, β) - IMSE(ξ^r, β) < -a,    (11)

where the inequality is provided by the convexity of IMSE. Note that at each iteration the new design point x*_r is chosen as x*_r = argmin_{x∈C} φ(x, ξ^r). Thus

    tr[ w(x*_r) g(x*_r) g(x*_r)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ] = sup_{x∈C} tr[ w(x) g(x) g(x)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ].

When C = Ω, or when C contains all the support points of the I-optimal design ξ*,

    sup_{x∈C} tr[ w(x) g(x) g(x)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ] ≥ tr[ w(x_i) g(x_i) g(x_i)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ],

with x_i any support point of ξ*. As a result,

    tr[ w(x*_r) g(x*_r) g(x*_r)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ] ≥ tr[ w(x_i) g(x_i) g(x_i)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ],

with x_i any support point of ξ*. Denote the number of support points in the optimal design ξ* by n*. Then, by Theorem 1,

    φ(ξ*, ξ^r) = tr[ I(ξ^r)^{-1} A ] - tr[ I(ξ*) I(ξ^r)^{-1} A I(ξ^r)^{-1} ]
               = tr[ I(ξ^r)^{-1} A ] - tr[ Σ_{i=1}^{n*} λ_i* w(x_i) g(x_i) g(x_i)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ]
               = tr[ I(ξ^r)^{-1} A ] - Σ_{i=1}^{n*} λ_i* tr[ w(x_i) g(x_i) g(x_i)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ]
               ≥ tr[ I(ξ^r)^{-1} A ] - Σ_{i=1}^{n*} λ_i* tr[ w(x*_r) g(x*_r) g(x*_r)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ]
               = tr[ I(ξ^r)^{-1} A ] - tr[ w(x*_r) g(x*_r) g(x*_r)^T I(ξ^r)^{-1} A I(ξ^r)^{-1} ]
               = φ(x*_r, ξ^r).

Together with inequality (11), we have φ(x*_r, ξ^r) < -a. It is clear from the derivation of the directional derivative in (4) that IMSE((1 - γ)ξ + γξ', β) is infinitely differentiable with respect to γ ∈ [0, 1]. Thus the second-order derivative is bounded in γ; denote the upper bound by U, with U > 0 since IMSE is a convex function. Now consider the alternative update ξ̃^{r+1} = (1 - γ)ξ^r + γ x*_r with γ ∈ [0, 1]. Since our algorithm achieves the optimal weights in each iteration, we have IMSE(ξ^{r+1}, β) ≤ IMSE(ξ̃^{r+1}, β).

Using the Taylor expansion of IMSE(ξ̃^{r+1}, β), we have

    IMSE(ξ^{r+1}, β) ≤ IMSE(ξ̃^{r+1}, β)
        = IMSE(ξ^r, β) + γ φ(x*_r, ξ^r) + (1/2) γ^2 [∂^2 IMSE((1 - γ)ξ^r + γ x*_r, β)/∂γ^2]|_{γ = γ̄}
        < IMSE(ξ^r, β) - γ a + (1/2) γ^2 U,

where γ̄ is some value between 0 and 1. Consequently,

    IMSE(ξ^{r+1}, β) - IMSE(ξ^r, β) < (1/2) U (γ - a/U)^2 - a^2/(2U).

If a/U ≤ 1, choose γ = a/U; then

    IMSE(ξ^{r+1}, β) - IMSE(ξ^r, β) < -a^2/(2U).

If a/U > 1, i.e., U < a, choose γ = 1; then

    IMSE(ξ^{r+1}, β) - IMSE(ξ^r, β) < (1/2) U - a < 0.

Both situations lead to lim_{r→∞} IMSE(ξ^r, β) = -∞, which contradicts IMSE(ξ^r, β) > 0. In summary, the assumption that ξ^r does not converge to ξ* is not valid, and thus ξ^r converges to ξ*.

We would like to point out that the choice of the candidate pool C affects the efficiency of Algorithm 2. Based on the convergence property of the algorithm, we choose a Sobol sequence [22] as the candidate pool. The Sobol sequence is a space-filling design that covers the experimental domain Ω well and is efficient when the dimension of the input variables is high. To further improve the efficiency of the algorithm, a search strategy inspired by Stufken and Yang [34] can be employed: one can start with a sparser Sobol sequence and obtain the current best design using Algorithm 2, and then create denser and denser candidate pools in the neighborhood of the support points of the current best design until there is no further improvement under the I-optimality criterion.
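A candidate pool of the kind used in Algorithm 2 can be generated with scipy's quasi-Monte Carlo module; the sketch below (ours, with arbitrary defaults) rescales a Sobol sequence to [-1, 1]^d and appends the 2^d vertices used for the initial design D_0.

```python
import numpy as np
from scipy.stats import qmc

def sobol_candidates(d, m=14, seed=0):
    """Sobol candidate pool of 2**m points on [-1, 1]^d, plus the 2^d vertices."""
    sobol = qmc.Sobol(d=d, scramble=True, seed=seed)
    pts = qmc.scale(sobol.random_base2(m=m), -np.ones(d), np.ones(d))
    vertices = np.array(np.meshgrid(*([[-1.0, 1.0]] * d))).T.reshape(-1, d)
    return np.vstack([pts, vertices])                 # vertices kept at the end
```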

6 Numerical Examples

In this section, we conduct several numerical examples to evaluate the performance of the proposed sequential algorithm (Algorithm 2) for constructing I-optimal designs for GLMs. Specifically, we focus on the feasibility of our algorithm through examples of the logistic regression model with binary response. The proposed algorithm is equally applicable to I-optimality for other GLMs, such as Poisson regression.

Assume that the domain of the d-dimensional input variable x = [x_1, ..., x_d] is standardized to be the unit hypercube [-1, 1]^d. With p basis functions g(x) = [g_1(x), ..., g_p(x)]^T and regression coefficients β = [β_0, ..., β_{p-1}]^T, the logistic regression model with binary response Y ∈ {0, 1} is defined as

    Prob(Y = 1 | x) = e^{β^T g(x)} / (1 + e^{β^T g(x)}).

Given a design ξ = { x_1, ..., x_n ; λ_1, ..., λ_n }, the I-optimality criterion is IMSE(ξ, β) = tr(A I(ξ)^{-1}), with

    A = ∫_{[-1,1]^d} g(x) g(x)^T e^{2 β^T g(x)} / (1 + e^{β^T g(x)})^4 dx,

and

    I(ξ) = Σ_{i=1}^n λ_i [ e^{β^T g(x_i)} / (1 + e^{β^T g(x_i)})^2 ] g(x_i) g(x_i)^T.
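For this logistic model the matrix A has no closed form in general. A quasi-Monte Carlo approximation of the integral over [-1, 1]^d (our own sketch, with the sample size 2^m chosen arbitrarily) is:

```python
import numpy as np
from scipy.stats import qmc

def logistic_A_matrix(beta, basis, d, m=16, seed=1):
    """QMC approximation of A = int_{[-1,1]^d} g(x) g(x)^T e^{2 b'g} / (1 + e^{b'g})^4 dx.
    The density factor equals (mu(x)(1 - mu(x)))^2, i.e. [dh^{-1}/d eta]^2 for the logit link."""
    pts = qmc.scale(qmc.Sobol(d=d, scramble=True, seed=seed).random_base2(m=m),
                    -np.ones(d), np.ones(d))
    G = np.array([basis(x) for x in pts])             # (N, p) basis matrix
    mu = 1.0 / (1.0 + np.exp(-(G @ beta)))
    dens = (mu * (1.0 - mu)) ** 2
    volume = 2.0 ** d                                 # Lebesgue measure of [-1, 1]^d
    return volume * (G * dens[:, None]).T @ G / len(pts)
```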

6.1 Univariate Logistic Model with Two Parameters

Consider a univariate logistic model with two parameters β_0 and β_1,

    Prob(Y = 1 | x) = e^{β_0 + β_1 x} / (1 + e^{β_0 + β_1 x}).

For a univariate logistic model with two parameters, i.e., d = 1 and g = [1, x]^T, we can define an induced variable c = β_0 + β_1 x with range [D_1, D_2]. Yang and Stufken [33] theoretically proved that the optimal design has two support points, and that the locations of the support points, expressed in terms of the induced variable c, are:

- symmetric, if D_1 = -D_2;
- symmetric, or D_1 with another point in (-D_1, D_2], if D_1 < 0 < D_2 and |D_1| < |D_2|;
- symmetric, or D_2 with another point in [D_1, -D_2), if D_1 < 0 < D_2 and |D_1| > |D_2|;
- D_2 with another point, if D_2 ≤ 0;
- D_1 with another point, if D_1 ≥ 0.

In the following numerical examples, we verify Yang and Stufken's results using our proposed algorithm; the proposed algorithm also provides the optimal weights of the support points. The numerical examples of interest are:

(a) β_0 = 0, β_1 = 2, with the range of the induced variable c = 2x being [-2, 2];
(b) β_0 = 0.2, β_1 = 1.6, with the range of the induced variable c = 0.2 + 1.6x being [-1.4, 1.8];
(c) β_0 = 0.27, β_1 = 1.12, with the range of the induced variable c = 0.27 + 1.12x being [-0.85, 1.39];
(d) β_0 = -1, β_1 = -0.9, with the range of the induced variable c = -1 - 0.9x being [-1.9, -0.1];
(e) β_0 = 2, β_1 = 1.9, with the range of the induced variable c = 2 + 1.9x being [0.1, 3.9].

According to Yang and Stufken's results [33], the I-optimal design should have two support points, and in terms of the induced variable c = β_0 + β_1 x the support points of the five scenarios should be:

(a) symmetric;
(b) symmetric, or -1.4 with another point in (1.4, 1.8];
(c) symmetric, or -0.85 with another point in (0.85, 1.39];
(d) -0.1 with another point;
(e) 0.1 with another point.

In the implementation of Algorithm 2 for the above examples, the candidate pool consists of N = 2^14 Sobol points and the two interval ends, -1 and 1. Table 1 shows the exact locations and weights of the support points and the computational time (in seconds) of our algorithm.

Table 1: Locally I-optimal designs for the univariate logistic model with two parameters.

                                   (a) β_0 = 0, β_1 = 2   (b) β_0 = 0.2, β_1 = 1.6   (c) β_0 = 0.27, β_1 = 1.12
    Support points                 (      ,      )        (      ,      )            (-1,      )
    Values of induced variable     (      ,      )        (      ,      )            (-0.85, 1.2)
    Weights                        (0.4960, 0.5040)       (0.4731, 0.5269)           (0.4776, 0.5224)
    Computational time (seconds)

                                   (d) β_0 = -1, β_1 = -0.9   (e) β_0 = 2, β_1 = 1.9
    Support points                 (-1, 1)                    (-1,      )
    Values of induced variable     (-1.9, -0.1)               (0.1,      )
    Weights                        (0.5051, 0.4949)           (0.4364, 0.5636)
    Computational time (seconds)

From the results in Table 1, we can see that the support points computed by our algorithm agree with those derived theoretically by Yang and Stufken [33]. Moreover, the convergence of the proposed algorithm is fairly fast.
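As an illustration only (not the authors' code or exact settings), the sketches from the previous sections can be chained for case (a), β_0 = 0, β_1 = 2; the expected outcome is a two-point design with roughly symmetric support and near-equal weights.

```python
import numpy as np

# Hypothetical end-to-end run for case (a): beta_0 = 0, beta_1 = 2, g(x) = [1, x].
beta = np.array([0.0, 2.0])
basis = lambda x: np.concatenate(([1.0], np.atleast_1d(x)))
glmw = lambda x, b: (lambda mu: mu * (1.0 - mu))(1.0 / (1.0 + np.exp(-basis(x) @ b)))

A = logistic_A_matrix(beta, basis, d=1)
cands = sobol_candidates(d=1, m=14)                   # Sobol points plus the ends -1, 1
support, lam = sequential_i_optimal(cands, cands[-2:], A, beta, basis, glmw)

# Keep only points carrying non-negligible weight; two symmetric points are expected.
print([(x.item(), round(float(w), 4)) for x, w in zip(support, lam) if w > 1e-4])
```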

6.2 Multivariate Logistic Model

We further consider the logistic model with multiple input variables in the linear predictor to explore the computational efficiency of the proposed algorithm with respect to the dimension of the predictors. The multivariate logistic model with binary response Y ∈ {0, 1} is expressed as

    Prob(Y = 1 | x) = e^{β_0 + Σ_{i=1}^d β_i x_i} / (1 + e^{β_0 + Σ_{i=1}^d β_i x_i}).

As mentioned in Section 3, I-optimal designs are not necessarily unique. The following simple example with d = 2, i.e., β = [0, 2, 2]^T, illuminates this point. Using the proposed sequential algorithm, we obtain an I-optimal design ξ_1 with support points

    (-1, 1), (0.2915, -1), (1,      ),

whose optimality can be verified numerically using Theorem 2 (GET). By symmetry, one can easily obtain a different I-optimal design ξ_2 with support points

    (1, -1), (-1,      ), (      , 1).

In fact, there are infinitely many I-optimal designs in this example, as any convex combination of ξ_1 and ξ_2 yields an I-optimal design.

Now we study the computational efficiency of the proposed sequential algorithm in the cases of d = 2, 5 and 10 predictors with coefficients

(a) d = 2, β = [2, 1, -2.5]^T;
(b) d = 5, β = [0.5, 1.6, -2.5, 2, -1.8]^T;
(c) d = 10, β = [0.5, 1.6, -2.5, 2, -1.8, 4, -2.1, -1.6, 2.2, 2.5, -2]^T.

In the implementation of Algorithm 2 for the above examples, the candidate pool sizes N are as reported in Table 2. The computation times of the proposed algorithm for the three examples are also given in Table 2.

Table 2: Computation time (in seconds) for logistic models with different input dimensions.

                                     d = 2    d = 5    d = 10
    Candidate pool size N
    Computational time (in seconds)

From the results in Table 2, the computation time of the proposed algorithm does not increase dramatically as d increases, especially from d = 5 to d = 10. We would like to point out that this observation does not necessarily imply that the dimension of the predictors has no impact on the complexity of our algorithm. Since the proposed algorithm updates the weights only on a small set of design points instead of the whole candidate pool, the size of the candidate pool does not play a significant role in the computation time. That said, the dimension of the predictors does influence the computational time to some extent, but not as much as in the traditional multiplicative algorithm, which requires updating the weights over a huge candidate pool.

7 Discussion

In this work, we have developed an efficient sequential algorithm for constructing I-optimal designs for generalized linear models. By establishing the General Equivalence Theorem of I-optimality for GLMs, we have obtained an insightful understanding of how the proposed algorithm sequentially chooses support points and updates their weights. The proposed algorithm is computationally efficient with a guaranteed convergence property. Moreover, all the computations in the proposed algorithm are explicit and simple to code, thanks to the supporting theory. The proposed method exemplifies the integration of theory and computation to advance the development of new statistical methodology.

It is worth pointing out that the proposed sequential algorithm (Algorithm 2) is not restricted to I-optimality. It can easily be extended to other optimality criteria as long as the directional derivative φ(ξ', ξ) in (3) exists. Take, for example, φ(ξ', ξ) = p - tr(I^{-1}(ξ) I(ξ')) for D-optimality, where p is the dimension of the regression coefficient vector β. Then, at Step 4 of the rth iteration in Algorithm 2, the candidate point that minimizes the single-point directional derivative φ(x, ξ^r) = p - w(x) g(x)^T I(ξ^r)^{-1} g(x) is added to the current design ξ^r if the minimum is negative. The optimal weights for the current design ξ^r can be obtained using Algorithm 1 in Section 4.2 with the update rule at the kth iteration

    λ^{k+1}_i = λ^k_i [ w(x_i) g^T(x_i) I(ξ^{λ^k})^{-1} g(x_i) ]^δ / Σ_{j=1}^n λ^k_j [ w(x_j) g^T(x_j) I(ξ^{λ^k})^{-1} g(x_j) ]^δ, i = 1, ..., n.

A thorough investigation of extending the proposed algorithm to other optimality criteria will be reported elsewhere in the future.
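Following the same pattern as the I-optimality sketches, the D-optimality variant described above only changes the two quantities involved; a minimal illustration (ours, not the authors' implementation) is:

```python
import numpy as np

def dir_derivative_D(x, info, beta, basis, glm_weight, p):
    """Single-point directional derivative under D-optimality:
    phi(x, xi) = p - w(x) g(x)^T I(xi)^{-1} g(x)."""
    g = basis(x)
    return p - glm_weight(x, beta) * g @ np.linalg.solve(info, g)

def multiplicative_weights_D(X, beta, basis, glm_weight, delta=0.5, max_iter=1000):
    """Weight update for D-optimality: rule (10) with w g^T I^{-1} A I^{-1} g
    replaced by w g^T I^{-1} g."""
    n = len(X)
    lam = np.full(n, 1.0 / n)
    G = np.array([basis(x) for x in X])
    w = np.array([glm_weight(x, beta) for x in X])
    for _ in range(max_iter):
        info = (G * (lam * w)[:, None]).T @ G
        d = w * np.einsum('ij,ij->i', G, np.linalg.solve(info, G.T).T)  # w_i g_i^T I^{-1} g_i
        lam = lam * d ** delta
        lam /= lam.sum()
    return lam
```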

Note that this work focuses on locally I-optimal designs for generalized linear models, which depend on given regression coefficients. To alleviate this limitation, we will further investigate robust I-optimal designs for GLMs. One possibility is to establish a tight upper bound for the integrated mean squared error in Lemma 1 to relax the dependence on the regression coefficients; the proposed algorithm can then be modified to search for optimal designs based on the robust optimality criterion constructed from the upper bound. Hickernell and Liu [5] developed some theoretical results in a similar direction for linear regression models under model uncertainty, and Li and Hickernell [30] extended the results to linear regression models with gradient information. Another interesting direction for future work is to construct I-optimal designs for models with both quantitative and qualitative responses [29], which commonly occur in manufacturing and biomedical systems. Since GLMs include both linear regression for continuous responses and logistic regression for binary responses, it would be interesting to study I-optimal designs under the joint modeling of quantitative and qualitative responses.

References

[1] Bernstein, D. S. (2009), Matrix Mathematics (2nd Edition), Princeton University Press.

[2] Box, G. E. P. and Draper, N. R. (1959), A Basis for the Selection of a Response Surface Design, Journal of the American Statistical Association, 54.

[3] Chernoff, H. (1999), Gustav Elfving's Impact on Experimental Design, Statistical Science, 14.

[4] Dette, H. and Melas, V. B. (2011), A Note on the de la Garza Phenomenon for Locally Optimal Designs, The Annals of Statistics, 39.

[5] Hickernell, F. J. and Liu, M. (2002), Uniform Designs Limit Aliasing, Biometrika, 89.

[6] Favard, J. (1935), Sur les polynomes de Tchebicheff, Comptes Rendus de l'Académie des Sciences, 200.

[7] Fedorov, V. V. (1972), Theory of Optimal Experiments, New York: Academic Press.

[8] Fellman, J. (1974), On the Allocation of Linear Observations, Societas Scientiarum Fennica.

[9] Torsney, B. (1983), A Moment Inequality and Monotonicity of an Algorithm, in Fiacco, A. V. and Kortanek, K. O. (eds.).

[10] Goos, P., Jones, B., and Syafitri, U. (2016), I-Optimal Design of Mixture Experiments, Journal of the American Statistical Association, 111.

[11] Haines, L. M. (1987), The Application of the Annealing Algorithm to the Construction of Exact Optimal Designs for Linear-Regression Models, Technometrics, 29.

[12] Khuri, A. I., Mukherjee, B., Sinha, B. K., and Ghosh, M. (2006), Design Issues for Generalized Linear Models, Technometrics, 48.

[13] Kiefer, J. (1974), General Equivalence Theory of Optimum Designs (Approximate Theory), Annals of Statistics, 2.

[14] Kiefer, J. and Wolfowitz, J. (1960), The Equivalence of Two Extremum Problems, Canadian Journal of Mathematics, 12.

[15] Meyer, R. K. and Nachtsheim, C. J. (1988), Simulated Annealing in the Construction of Exact Optimal Design of Experiments, American Journal of Mathematical and Management Sciences, 8.

[16] Meyer, R. K. and Nachtsheim, C. J. (1995), The Coordinate-Exchange Algorithm for Constructing Exact Optimal Experimental Designs, Technometrics, 37.

[17] Nelder, J. and Wedderburn, R. (1972), Generalized Linear Models, Journal of the Royal Statistical Society, Series A, 135.

[18] Pursell, L. and Trimble, S. T. (1991), Gram-Schmidt Orthogonalization by Gauss Elimination, The American Mathematical Monthly, 98.

[19] Schein, A. I. and Ungar, L. H. (2007), Active Learning for Logistic Regression: An Evaluation, Machine Learning, 68.

[20] Silvey, S. D. (2013), Optimal Design: An Introduction to the Theory for Parameter Estimation, vol. 1, Springer Science & Business Media.

[21] Silvey, S. D., Titterington, D. M., and Torsney, B. (1978), An Algorithm for Optimal Designs on a Finite Design Space, Communications in Statistics - Theory and Methods.

[22] Sobol, I. M. (1967), On the Distribution of Points in a Cube and the Approximate Evaluation of Integrals, USSR Computational Mathematics and Mathematical Physics, 7.

[23] Titterington, D. M. (1978), Estimation of Correlation Coefficients by Ellipsoidal Trimming, Applied Statistics, 27.

[24] White, L. V. (1973), An Extension of the General Equivalence Theorem to Nonlinear Models, Biometrika, 60.

[25] Whittle, P. (1973), Some General Points in the Theory of Optimal Experimental Design, Journal of the Royal Statistical Society, Series B, 35.

[26] Wu, H. and Stufken, J. (2014), Locally φ_p-Optimal Designs for Generalized Linear Models with a Single-Variable Quadratic Polynomial Predictor, Biometrika, 101.

[27] Wynn, H. P. (1970), The Sequential Generation of D-Optimum Experimental Designs, Annals of Mathematical Statistics, 41.

[28] Wynn, H. P. (1972), Results in the Theory and Construction of D-Optimum Experimental Designs, Journal of the Royal Statistical Society, Series B, 34.

[29] Deng, X. and Jin, R. (2015), QQ Models: Joint Modeling for Quantitative and Qualitative Quality Responses in Manufacturing Systems, Technometrics, 57.

[30] Li, Y. and Hickernell, F. J. (2014), Design of Experiments for Linear Regression Models When Gradient Information Is Available, Journal of Statistical Planning and Inference, 144.

[31] Yang, M. (2010), On the de la Garza Phenomenon, The Annals of Statistics, 38.

[32] Yang, M., Biedermann, S., and Tang, E. (2013), On Optimal Designs for Nonlinear Models: A General and Efficient Algorithm, Journal of the American Statistical Association, 108.

[33] Yang, M. and Stufken, J. (2009), Support Points of Locally Optimal Designs for Nonlinear Models with Two Parameters, The Annals of Statistics, 37.

[34] Yang, M. and Stufken, J. (2012), Identifying Locally Optimal Designs for Nonlinear Models: A Simple Extension with Profound Consequences, The Annals of Statistics, 40.

[35] Yu, Y. (2010), Monotonic Convergence of a General Algorithm for Computing Optimal Designs, The Annals of Statistics, 38.


Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Working Paper 2013:9 Department of Statistics Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Ronnie Pingel Working Paper 2013:9 June

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Compound Optimal Designs for Percentile Estimation in Dose-Response Models with Restricted Design Intervals

Compound Optimal Designs for Percentile Estimation in Dose-Response Models with Restricted Design Intervals Compound Optimal Designs for Percentile Estimation in Dose-Response Models with Restricted Design Intervals Stefanie Biedermann 1, Holger Dette 1, Wei Zhu 2 Abstract In dose-response studies, the dose

More information

Max Margin-Classifier

Max Margin-Classifier Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization

More information

Convergence Rate of Expectation-Maximization

Convergence Rate of Expectation-Maximization Convergence Rate of Expectation-Maximiation Raunak Kumar University of British Columbia Mark Schmidt University of British Columbia Abstract raunakkumar17@outlookcom schmidtm@csubcca Expectation-maximiation

More information

Designs for Generalized Linear Models

Designs for Generalized Linear Models Designs for Generalized Linear Models Anthony C. Atkinson David C. Woods London School of Economics and Political Science, UK University of Southampton, UK December 9, 2013 Email: a.c.atkinson@lse.ac.uk

More information

An Alternative Method for Estimating and Simulating Maximum Entropy Densities

An Alternative Method for Estimating and Simulating Maximum Entropy Densities An Alternative Method for Estimating and Simulating Maximum Entropy Densities Jae-Young Kim and Joonhwan Lee Seoul National University May, 8 Abstract This paper proposes a method of estimating and simulating

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We

More information

Logistic Regression. Seungjin Choi

Logistic Regression. Seungjin Choi Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

D-optimal Designs for Multinomial Logistic Models

D-optimal Designs for Multinomial Logistic Models D-optimal Designs for Multinomial Logistic Models Jie Yang University of Illinois at Chicago Joint with Xianwei Bu and Dibyen Majumdar October 12, 2017 1 Multinomial Logistic Models Cumulative logit model:

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids

More information

LECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel

LECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel LECTURE NOTES on ELEMENTARY NUMERICAL METHODS Eusebius Doedel TABLE OF CONTENTS Vector and Matrix Norms 1 Banach Lemma 20 The Numerical Solution of Linear Systems 25 Gauss Elimination 25 Operation Count

More information

Study Notes on the Latent Dirichlet Allocation

Study Notes on the Latent Dirichlet Allocation Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection

More information

Optimal discrimination designs

Optimal discrimination designs Optimal discrimination designs Holger Dette Ruhr-Universität Bochum Fakultät für Mathematik 44780 Bochum, Germany e-mail: holger.dette@ruhr-uni-bochum.de Stefanie Titoff Ruhr-Universität Bochum Fakultät

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications. Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable

More information

Convergence Rates on Root Finding

Convergence Rates on Root Finding Convergence Rates on Root Finding Com S 477/577 Oct 5, 004 A sequence x i R converges to ξ if for each ǫ > 0, there exists an integer Nǫ) such that x l ξ > ǫ for all l Nǫ). The Cauchy convergence criterion

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

Machine Learning. Lecture 6: Support Vector Machine. Feng Li.

Machine Learning. Lecture 6: Support Vector Machine. Feng Li. Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)

More information

Bregman Divergences for Data Mining Meta-Algorithms

Bregman Divergences for Data Mining Meta-Algorithms p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,

More information

Efficient algorithms for calculating optimal designs in pharmacokinetics and dose finding studies

Efficient algorithms for calculating optimal designs in pharmacokinetics and dose finding studies Efficient algorithms for calculating optimal designs in pharmacokinetics and dose finding studies Tim Holland-Letz Ruhr-Universität Bochum Medizinische Fakultät 44780 Bochum, Germany email: tim.holland-letz@rub.de

More information

Worst case analysis for a general class of on-line lot-sizing heuristics

Worst case analysis for a general class of on-line lot-sizing heuristics Worst case analysis for a general class of on-line lot-sizing heuristics Wilco van den Heuvel a, Albert P.M. Wagelmans a a Econometric Institute and Erasmus Research Institute of Management, Erasmus University

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

How Good is a Kernel When Used as a Similarity Measure?

How Good is a Kernel When Used as a Similarity Measure? How Good is a Kernel When Used as a Similarity Measure? Nathan Srebro Toyota Technological Institute-Chicago IL, USA IBM Haifa Research Lab, ISRAEL nati@uchicago.edu Abstract. Recently, Balcan and Blum

More information

Optimal Power Control in Decentralized Gaussian Multiple Access Channels

Optimal Power Control in Decentralized Gaussian Multiple Access Channels 1 Optimal Power Control in Decentralized Gaussian Multiple Access Channels Kamal Singh Department of Electrical Engineering Indian Institute of Technology Bombay. arxiv:1711.08272v1 [eess.sp] 21 Nov 2017

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

We describe the generalization of Hazan s algorithm for symmetric programming

We describe the generalization of Hazan s algorithm for symmetric programming ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,

More information

Interior Point Methods for Mathematical Programming

Interior Point Methods for Mathematical Programming Interior Point Methods for Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Florianópolis, Brazil EURO - 2013 Roma Our heroes Cauchy Newton Lagrange Early results Unconstrained

More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

u( x) = g( y) ds y ( 1 ) U solves u = 0 in U; u = 0 on U. ( 3)

u( x) = g( y) ds y ( 1 ) U solves u = 0 in U; u = 0 on U. ( 3) M ath 5 2 7 Fall 2 0 0 9 L ecture 4 ( S ep. 6, 2 0 0 9 ) Properties and Estimates of Laplace s and Poisson s Equations In our last lecture we derived the formulas for the solutions of Poisson s equation

More information

Modulation of symmetric densities

Modulation of symmetric densities 1 Modulation of symmetric densities 1.1 Motivation This book deals with a formulation for the construction of continuous probability distributions and connected statistical aspects. Before we begin, a

More information

Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations

Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations Boris T. Polyak Andrey A. Tremba V.A. Trapeznikov Institute of Control Sciences RAS, Moscow, Russia Profsoyuznaya, 65, 117997

More information

Conjugate-Gradient. Learn about the Conjugate-Gradient Algorithm and its Uses. Descent Algorithms and the Conjugate-Gradient Method. Qx = b.

Conjugate-Gradient. Learn about the Conjugate-Gradient Algorithm and its Uses. Descent Algorithms and the Conjugate-Gradient Method. Qx = b. Lab 1 Conjugate-Gradient Lab Objective: Learn about the Conjugate-Gradient Algorithm and its Uses Descent Algorithms and the Conjugate-Gradient Method There are many possibilities for solving a linear

More information

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses Statistica Sinica 5(1995), 459-473 OPTIMAL DESIGNS FOR POLYNOMIAL REGRESSION WHEN THE DEGREE IS NOT KNOWN Holger Dette and William J Studden Technische Universitat Dresden and Purdue University Abstract:

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

An introduction to Birkhoff normal form

An introduction to Birkhoff normal form An introduction to Birkhoff normal form Dario Bambusi Dipartimento di Matematica, Universitá di Milano via Saldini 50, 0133 Milano (Italy) 19.11.14 1 Introduction The aim of this note is to present an

More information

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support

More information

Some optimal criteria of model-robustness for two-level non-regular fractional factorial designs

Some optimal criteria of model-robustness for two-level non-regular fractional factorial designs Some optimal criteria of model-robustness for two-level non-regular fractional factorial designs arxiv:0907.052v stat.me 3 Jul 2009 Satoshi Aoki July, 2009 Abstract We present some optimal criteria to

More information

Optimal designs for rational regression models

Optimal designs for rational regression models Optimal designs for rational regression models Holger Dette, Christine Kiss Ruhr-Universität Bochum Fakultät für Mathematik 44780 Bochum, Germany holger.dette@ruhr-uni-bochum.de tina.kiss12@googlemail.com

More information

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5.

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5. VISCOSITY SOLUTIONS PETER HINTZ We follow Han and Lin, Elliptic Partial Differential Equations, 5. 1. Motivation Throughout, we will assume that Ω R n is a bounded and connected domain and that a ij C(Ω)

More information

Efficient computation of Bayesian optimal discriminating designs

Efficient computation of Bayesian optimal discriminating designs Efficient computation of Bayesian optimal discriminating designs Holger Dette Ruhr-Universität Bochum Fakultät für Mathematik 44780 Bochum, Germany e-mail: holger.dette@rub.de Roman Guchenko, Viatcheslav

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

1 Problem Formulation

1 Problem Formulation Book Review Self-Learning Control of Finite Markov Chains by A. S. Poznyak, K. Najim, and E. Gómez-Ramírez Review by Benjamin Van Roy This book presents a collection of work on algorithms for learning

More information

Variable Objective Search

Variable Objective Search Variable Objective Search Sergiy Butenko, Oleksandra Yezerska, and Balabhaskar Balasundaram Abstract This paper introduces the variable objective search framework for combinatorial optimization. The method

More information

Numerical Optimization: Basic Concepts and Algorithms

Numerical Optimization: Basic Concepts and Algorithms May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

Numerical Analysis: Interpolation Part 1

Numerical Analysis: Interpolation Part 1 Numerical Analysis: Interpolation Part 1 Computer Science, Ben-Gurion University (slides based mostly on Prof. Ben-Shahar s notes) 2018/2019, Fall Semester BGU CS Interpolation (ver. 1.00) AY 2018/2019,

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

UCLA Department of Statistics Papers

UCLA Department of Statistics Papers UCLA Department of Statistics Papers Title An Algorithm for Constructing Orthogonal and Nearly Orthogonal Arrays with Mixed Levels and Small Runs Permalink https://escholarship.org/uc/item/1tg0s6nq Author

More information

Minimax design criterion for fractional factorial designs

Minimax design criterion for fractional factorial designs Ann Inst Stat Math 205 67:673 685 DOI 0.007/s0463-04-0470-0 Minimax design criterion for fractional factorial designs Yue Yin Julie Zhou Received: 2 November 203 / Revised: 5 March 204 / Published online:

More information

6 Pattern Mixture Models

6 Pattern Mixture Models 6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM J. OPTIM. Vol. 18, No. 1, pp. 106 13 c 007 Society for Industrial and Applied Mathematics APPROXIMATE GAUSS NEWTON METHODS FOR NONLINEAR LEAST SQUARES PROBLEMS S. GRATTON, A. S. LAWLESS, AND N. K.

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

Descent methods. min x. f(x)

Descent methods. min x. f(x) Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained

More information