Optimal Design for Nonlinear and Spatial Models: Introduction and Historical Overview

Douglas P. Wiens
Dean/Handbook of Design and Analysis of Experiments, K14518_C012 (Chapter 12)

CONTENTS
12.1 Introduction
12.2 Generalized Linear Models
12.3 Selected Nonlinear Models
12.4 Spatial Models
References

12.1 Introduction

The topic of this part of the handbook, optimal design for nonlinear and spatial models, allows for a very broad range of subtopics. We should first distinguish these from those formulated for linear models. A salient feature of design problems for linear models is that the common functions expressing the experimenter's loss, when estimating the mean response, do not depend on the unknown parameters being estimated. In this chapter, a number of design problems are introduced in which this very convenient feature is absent, and ways of dealing with its absence are discussed in general terms. Thus, although we treat classical nonlinear regression models in which a response variable $y$ is measured with additive error and $E[y \mid x]$ is a nonlinear function of parameters $\theta$ to be estimated after the experiment is conducted, there is a multitude of other applications.

In this chapter, these subjects will be introduced in broad generality only, and some historical context provided; precise details and examples are given in the three chapters which follow:

Designs for Generalized Linear Models (Chapter 13)
Designs for Selected Nonlinear Models (Chapter 14)
Optimal Design for Spatial Models (Chapter 15)

Chapters 22, 24 and 25 deal with special applications that use nonlinear models.

12.2 Generalized Linear Models

For a book-length treatment of generalized linear models (GLMs), we refer the reader to the now classic text McCullagh and Nelder (1989). Briefly, the response variable $y$, given a

covariate vector $x$ chosen by the experimenter, follows a distribution from the exponential family, with (canonical) density

$$p(y \mid \theta, \phi, x) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\},$$

for scalar functions $a(\cdot)$, $b(\cdot)$ and $c(\cdot, \cdot)$. The canonical parameter $\theta$ relates the systematic linear component $\eta(x) = f^{\top}(x)\beta$, with regressors $f(x)$ and regression parameters $\beta$, to the mean $\mu = db(\theta)/d\theta$ via an invertible link function $g$, namely, $\eta = g(\mu)$. We write $h = g^{-1}$; $h^{(1)}$ and $h^{(2)}$ are the first and second derivatives with respect to $\eta$.

The parameters are typically estimated by maximum likelihood, computed from observations $\{y_i\}_{i=1}^{n}$ made at points $\{x_i\}_{i=1}^{n}$ chosen from a design space $\chi$. The asymptotic variance of $\hat\beta$ is the inverse $I^{-1}(\beta)$ of the information matrix:

$$I(\beta) = X^{\top} U X,$$

where $X$ is the model matrix, with $i$th row $f^{\top}(x_i)$ $(i = 1, \ldots, n)$, and $U$ is the diagonal matrix of weights, with $i$th diagonal element $\left(h^{(1)}(\eta(x_i))\right)^{2} / \mathrm{Var}[y \mid x_i]$. If the designer is primarily interested in precise estimation of $\beta$, then he or she will aim to maximize, in some sense, $I(\beta)$; this leads to the adoption of classical alphabetic optimality criteria, notably D-optimality, in which the goal is maximization of $\det(I(\beta))$.

The mean is estimated by

$$\hat\mu(x) = h\left(f^{\top}(x)\hat\beta\right),$$

with asymptotic variance and asymptotic bias given by (Robinson and Khuri 2003)

$$\mathrm{Var}[\hat\mu(x)] = \left(h^{(1)}(\eta(x))\right)^{2} f^{\top}(x)\, I^{-1}(\beta)\, f(x),$$

$$\mathrm{Bias}[\hat\mu(x)] = h^{(1)}(\eta(x))\, f^{\top}(x)\, I^{-1}(\beta)\, X^{\top} U \psi + \frac{h^{(2)}(\eta(x))}{2}\, f^{\top}(x)\, I^{-1}(\beta)\, f(x),$$

where $\psi_{n \times 1}$ has elements

$$\psi_i = -\frac{h^{(2)}(\eta(x_i))}{2\, h^{(1)}(\eta(x_i))}\, f^{\top}(x_i)\, I^{-1}(\beta)\, f(x_i).$$

If interest focusses on prediction of mean values, then the designer will aim to minimize some function of the mean squared errors (MSEs)

$$\mathrm{MSE}[\hat\mu(x)] = \mathrm{Var}[\hat\mu(x)] + \mathrm{Bias}^{2}[\hat\mu(x)];$$

an obvious choice is the integral or average of $\mathrm{MSE}[\hat\mu(x)]$ over the design space $\chi$.

The class of GLMs spawns a wealth of particular applications and related design issues.
Prominent among these is logistic regression, in which a binary response $y$ has

$$P(y = 1) \overset{\text{def}}{=} \pi = L(\alpha + \beta x),$$

for $L(\eta) = 1/(1 + e^{-\eta})$, the logistic distribution. Here $\mu = \pi$ and $\eta = g(\mu) = \ln(\pi/(1 - \pi))$, the logit. One might seek a design (a choice of values of $x$ and a specification of the frequencies with which $y$ is to be observed at these values) in order to estimate the linear parameters efficiently, or to study functions of these parameters. For instance, in bioassay and dose-response problems, interest often focusses on the covariate value

$$x_{\pi_0} = \frac{L^{-1}(\pi_0) - \alpha}{\beta},$$

required to attain a response $y = 1$ in a specified proportion $\pi_0$ of the population. The role of the logistic distribution in the aforementioned may of course be played by other distributions; if $L$ is replaced by the Gaussian distribution function, then one is dealing with probit regression, and similar design problems are of interest.

One of the earliest instances of nonlinear regression design is for the exponential regression GLM: Fisher (1922) considered a dilution-series problem, with $P(y = 1) = \exp(-\theta x)$ with $\theta, x > 0$. This problem is also the subject of an example by Fedorov (1972), who notes that the information matrix for $\theta$ is a scalar, maximized by placing all observations at the solution $x$ $(\approx 1.6/\theta)$ of the equation $2e^{-\theta x} + \theta x = 2$.

In the Poisson count model, $y$ follows a Poisson distribution with mean $\mu = f^{\top}(x)\beta$, and the experimenter is typically interested in efficiently estimating functions of $\beta$. The optimal design will of course depend on which such function is of interest. Particular examples are discussed, for response surface exploration in an environmetric setting, by Myers (1999).

In these problems, and indeed in virtually all design problems for GLMs, one begins by determining an optimal design under the assumption that certain parameters, even those to be estimated from the experimental data, are known beforehand.
This clearly untenable assumption might then be dropped in a number of ways, all discussed in detail in the chapters which follow. One can content oneself with a locally optimal design, in which optimality is sought only at, or in a small neighbourhood of, these assumed parameter values. Alternatively one might design so as to minimize the maximum loss, with the maximum evaluated over a set of parameters; this is a mild robustness criterion which is also discussed in Chapter 20. Another approach is to choose the design points sequentially, at each stage using parameter estimates derived from the preceding observations. A further possibility, when the loss function being minimized depends on unknown parameters, is to integrate them out, with respect to a prior, and to then minimize the average loss so obtained. This pseudo-Bayesian criterion is discussed in Chapter 13 and is a topic to which we return later in this chapter.

The field of optimal design for GLMs seems to have blossomed in the 1980s, and many contributors acknowledge a debt to Ford et al. (1989), who surveyed the then current state of research in a more general context of nonlinear design. Burridge and Sebastiani (1992) obtained locally D-optimal designs, that is, designs maximizing $\det(I(\beta))$ for fixed values of $\beta$. For this, they pointed out that if the parameters are known, then the problem can be transformed to the D-optimality problem for a linear model with model matrix $U^{1/2}X$; they applied methods developed for linear design theory to derive optimal designs in this transformed problem and then translated these back to the original problem. In a small simulation study, with a bivariate linear predictor $\eta$ and canonical link $\eta = \mu^{1/k}$ for various values of $k$, the efficiencies turned out to be relatively insensitive to the settings of the parameter values.
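Two of the locally optimal designs mentioned in this section can be checked numerically with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and for the logistic model it assumes (rather than proves) that attention may be restricted to designs placing equal weight at the two points $-\eta$ and $+\eta$ of the canonical design space. Under that assumption the information matrix in canonical form is $\mathrm{diag}(w, w\eta^{2})$ with $w = \pi(1 - \pi)$, and the stationarity condition for its determinant reduces to $\eta \tanh(\eta/2) = 1$. For the dilution-series model the information $x^{2}/(e^{\theta x} - 1)$ is maximized where $t = \theta x$ solves $2e^{-t} + t = 2$.

```python
import math


def bisect(func, lo, hi, tol=1e-10):
    """Root of func on [lo, hi] by bisection; func(lo) and func(hi) must differ in sign."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logistic_weight(eta):
    """GLM weight pi * (1 - pi) for the logit link at linear predictor eta."""
    pi = 1.0 / (1.0 + math.exp(-eta))
    return pi * (1.0 - pi)


def canonical_det(eta):
    """Determinant of the information matrix for equal weights at -eta and +eta."""
    # Rows of U^{1/2} X are sqrt(w) * (1, -eta) and sqrt(w) * (1, +eta);
    # averaging their outer products gives diag(w, w * eta^2).
    w = logistic_weight(eta)
    return w * (w * eta ** 2)


# Logistic model: stationarity of canonical_det is eta * tanh(eta / 2) = 1.
eta_star = bisect(lambda e: e * math.tanh(e / 2.0) - 1.0, 0.5, 3.0)

# Dilution series: t = theta * x solves 2 exp(-t) + t = 2.
# The bracket starts above zero because t = 0 is a trivial root.
t_star = bisect(lambda t: 2.0 * math.exp(-t) + t - 2.0, 0.5, 3.0)

print(round(eta_star, 4), round(t_star, 4))
```

The second root agrees with the value $x \approx 1.6/\theta$ quoted from Fedorov (1972). The first gives support points in the canonical variable only; translating back to the original covariate still requires the assumed values of $\alpha$ and $\beta$, which is exactly the local character of the design.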

Ford et al. (1992) refer to this transformed problem, in terms of $U^{1/2}X$, as the canonical form of the problem. They consider the structure of the induced design space in some depth and use methods of Elfving (1952) to obtain locally D-optimal and c-optimal designs; the latter are designs minimizing $c^{\top} I^{-1}(\beta)\, c$ for fixed $c$. As do Burridge and Sebastiani (1992), they concentrate on examples with two linear parameters $(\beta_0, \beta_1)$; in both of these papers, the optimal designs turn out to be concentrated on one, two or three points. Atkinson and Haines (1996) apply this canonical approach to, among others, examples of multifactor experiments.

A class of attractive alternatives to local optimality is given by sequential designs. The related asymptotic theory is best developed for the case of D-optimality. Here it is supposed that one will obtain $n_1$ observations from an initial, static design. These are used to give initial estimates of the parameters, following which the remaining $n - n_1$ observations are made sequentially, at each stage choosing the next design point so as to maximize the determinant of the information matrix evaluated at the current parameter estimates. Chaudhuri and Mykland (1993) show that, under certain conditions, the sequence of designs so obtained converges to the D-optimal design for the true parameters. These conditions include the requirement that $n_1/n \to 0$ as both $n$ and $n_1$ tend to infinity and an assumption that the parameter estimates be consistent. A consequence is that inferences made from a sequentially constructed design have the same asymptotic properties as if they were made following a static design, an observation previously made by Wu (1985) in a related context. Sinha and Wiens (2002) extend the ideas of Chaudhuri and Mykland, and incorporate some uncertainty as to the nature of the parametric model.
Dror and Steinberg (2008) introduce significant improvements to these methods; in particular their sequential procedure for design construction is easily adapted to multifactor experiments and to a range of possible models.

One likely reason for the popularity of the D-optimality criterion in these problems is its invariance under non-singular transformations of the design space, leading to the possibility of transforming to the aforementioned canonical form of the problem. Failing this, other methods are available. Yang (2008) takes a direct algebraic approach to obtain A-optimal designs (minimizing $\mathrm{trace}\left(I^{-1}(\beta)\right)$) for logistic, probit and Laplace models with two linear parameters. Other criteria (minimizing the integrated $\mathrm{MSE}[\hat\mu(x)]$, for instance) rely more heavily on numerical methods of design construction. One sequential approach of considerable interest involves stochastic approximation; see the discussion in Khuri et al. (2006) and, in a dose-finding framework, Cheung (2010).

Once a design is constructed, by this or another method, it is of obvious interest to compare its performance with other candidate designs; the quantile dispersion graphs of Robinson and Khuri (2003) provide a possible means for doing this.

Here, but a few of the many facets of design for GLMs have been touched upon; these, and the broad spectrum of topics discussed in Chapter 13, illustrate that design theory for GLMs continues to be an active and exciting area of research.

12.3 Selected Nonlinear Models

Part of the richness of the theory of design for nonlinear models stems from the physical settings in which the various models arise, each resulting in unique approaches to the design problems. Some particular nonlinear regression models, of the form

$$y = \eta(x; \theta) + \varepsilon,$$

where $\varepsilon$ is random error and $\eta(x; \theta)$ is an at least partially nonlinear function of a $p$-dimensional parameter vector $\theta$, correspond to the following response functions:

The response

$$\eta(x; \theta) = \frac{\theta_1}{\theta_1 - \theta_2}\left(e^{-\theta_2 x} - e^{-\theta_1 x}\right), \quad \theta_1, \theta_2 > 0,\ x > 0,$$

for which the design problem was studied by Box and Lucas (1959), is used in chemometrics to model reactions in which a substance decomposes from a state A to a state B and finally to a state C. The parameters $\theta_1$ and $\theta_2$ measure the rates of these two decompositions, and $\eta$ is the mean yield in state B. The design variable $x$ represents time; a consequence is that, in contrast to many other design problems, there is no possibility of replication: only one observation can be made at a specific value of $x$. Here and elsewhere, we define $f(x; \theta)$ to be the gradient

$$f(x; \theta) = \left(\frac{\partial \eta(x; \theta)}{\partial \theta_1}, \ldots, \frac{\partial \eta(x; \theta)}{\partial \theta_p}\right)^{\top}, \tag{12.1}$$

and $F(\theta)$ to be the $n \times p$ matrix with $i$th row $f^{\top}(x_i; \theta)$, where $x_i$ denotes the settings of the variables in the $i$th run of the experiment. Box and Lucas make preliminary guesses $\theta^{*} = (\theta_1^{*}, \theta_2^{*})^{\top}$ and adopt the local D-optimality criterion, which aims to maximize the determinant $\left|F^{\top}(\theta^{*}) F(\theta^{*})\right|$. A motivation is that when the asymptotic distribution of the parameter estimates is employed and if the initial guesses are correct, then such a design results in confidence ellipsoids of minimum volume. When the de la Garza phenomenon holds or is assumed (this is expanded upon and exploited in Chapter 14), an optimal design will have only $p$ support points and thus $\left|F^{\top}(\theta^{*}) F(\theta^{*})\right| = \left|F(\theta^{*})\right|^{2}$; this simplifies the search for the optimal points, at least when $p$ is small and when analytic, rather than numerical, methods are being used.
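The two-point search just described can be sketched numerically. The code below is a minimal illustration, not Box and Lucas's own geometric and analytic argument: it assumes prior guesses $\theta_1^{*} = 0.7$ and $\theta_2^{*} = 0.2$ (values chosen here for illustration and not given in the text), restricts attention to $p = 2$ support points, and maximizes $|\det F(\theta^{*})|$ by brute-force search over a grid of times. The grid limits and step size are likewise arbitrary choices.

```python
import math

# Hypothetical prior guesses for the two rate constants (an assumption).
THETA1, THETA2 = 0.7, 0.2


def gradient(x, t1=THETA1, t2=THETA2):
    """Gradient of eta = t1/(t1 - t2) * (exp(-t2 x) - exp(-t1 x)) w.r.t. (t1, t2)."""
    d = t1 - t2
    diff = math.exp(-t2 * x) - math.exp(-t1 * x)
    d_t1 = -t2 / d ** 2 * diff + t1 / d * x * math.exp(-t1 * x)
    d_t2 = t1 / d ** 2 * diff - t1 / d * x * math.exp(-t2 * x)
    return d_t1, d_t2


def best_two_point_design(x_max=12.0, step=0.02):
    """Grid search for times x1 < x2 maximizing |det F|, F being the 2 x 2 gradient matrix."""
    n = int(round(x_max / step))
    grid = [step * i for i in range(1, n + 1)]
    grads = [gradient(x) for x in grid]
    best_det, best_x1, best_x2 = -1.0, None, None
    for i in range(n):
        a1, a2 = grads[i]
        for j in range(i + 1, n):
            b1, b2 = grads[j]
            det_f = abs(a1 * b2 - a2 * b1)
            if det_f > best_det:
                best_det, best_x1, best_x2 = det_f, grid[i], grid[j]
    return best_det, best_x1, best_x2


det_f, x1, x2 = best_two_point_design()
print(x1, x2, det_f ** 2)  # support points and |F'F| = |F|^2 at the assumed theta
```

Rerunning the search with different assumed values of $\theta$ moves the support points, which is a direct way to see how strongly a locally optimal design depends on the preliminary guesses.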
Box and Lucas obtained optimal points $(x_1, x_2)$ through a combination of geometric and analytic arguments, and used this example to illustrate a stepwise journey to the optimum, through fitting a sequence of quadratic models in $x_1$ and $x_2$, very similar to common practice in response surface exploration.

The Michaelis-Menten enzyme kinetic function is

$$\eta(x; \theta) = \frac{\theta_1 x}{\theta_2 + x}, \quad \theta_1, \theta_2 > 0,\ x > 0,$$

where $x$ is the concentration of substrate, $\theta_1$ the maximum reaction velocity (i.e., the horizontal asymptote as $x \to \infty$), and $\theta_2$ is the half-saturation constant, that is, the value of $x$ at which the mean velocity $\eta$ attains one-half of its asymptotic value. An important feature of this model from a design standpoint is that it is nonlinear in $\theta_2$ but not in $\theta_1$, so that the loss function for D-optimality depends (up to a constant of proportionality) only on $\theta_2$. Currie (1982) discusses various designs for

this model. Assuming homoscedastic normal errors, the maximum likelihood estimates are obtained by least squares, leading to the D-optimal design which places half of the observations at $\theta_2$ and the other half at as large a value of $x$ as possible. (Bates and Watts (1988, p. 126) state instead that the two locations are $x_1 = x_{\max}$, the maximum allowable value, and $x_2 = \theta_2 / (1 + 2(\theta_2 / x_{\max}))$, in agreement with Currie if $x_2$ is evaluated at $x_{\max} = \infty$.) An obvious drawback to this design, shared by others in which the number of distinct locations of the explanatory variables is no larger than $p$, is that there is no possibility to check the validity of the model, a point which is also discussed in Chapter 20. Thus Currie discusses as well more ad hoc, but sensible, designs in which the majority of the design points are spread out over the low range of concentration, with the rest distributed throughout the higher range. He finds that the value of $\left|F^{\top}(\theta) F(\theta)\right|$ (evaluated at the assumed value of $\theta_2$) can be substantially smaller than that for the locally D-optimal design, but that the performance of this latter design can itself deteriorate markedly if the experimenter's guess at the value of $\theta_2$ is inaccurate. An obvious remedy, if conditions permit, is to design sequentially, with past observations used to give improved estimates of $\theta_2$. The Michaelis-Menten model is used throughout Chapter 14 for illustration of the concepts there.

The rational function response

$$\eta(x; \theta) = \frac{\theta_1 \theta_3 x_1}{1 + \theta_1 x_1 + \theta_2 x_2}, \quad x_1, x_2 > 0,$$

models chemical reactions of the type $R \to P_1 + P$, with $\eta$ representing the speed of the reaction, $x_1$ the partial pressure of the sought product $P$, $x_2$ the partial pressure of the product $P_1$, $\theta_2$ the absorption equilibrium constant for $P_1$, $\theta_3$ the effective constant of the speed of reaction (appearing linearly) and $\theta_1$ the absorption equilibrium constant for the reagent $R$ (Fedorov 1972).
Box and Hunter (1965) propose a sequential approach with, at each stage, new locations $x = (x_1, x_2)$ chosen to maximize the resulting value of $\left|F^{\top}(\hat\theta) F(\hat\theta)\right|$ evaluated at the current estimates $\hat\theta$. Fedorov (1972) discusses this example in detail. Initial estimates of $\theta$ are obtained from a preliminary experiment, with observations made at the four combinations of $x_1, x_2 \in \{1, 2\}$. Given a design specifying $n$ observations, and resulting in parameter estimates $\hat\theta_{(n)}$, the next location is given by

$$x_{n+1} = \arg\max_{x}\ f^{\top}\left(x; \hat\theta_{(n)}\right) \left[F^{\top}\left(\hat\theta_{(n)}\right) F\left(\hat\theta_{(n)}\right)\right]^{-1} f\left(x; \hat\theta_{(n)}\right),$$

stopping once the changes in the parameter estimates become insignificant. The asymptotic optimality results of Chaudhuri and Mykland (1993) and Wu (1985), mentioned in Section 12.2, apply.

Recall that the volume of a confidence ellipsoid on the parameters is proportional to $\left|F^{\top} F\right|^{-1/2}$. Even under exact normality, the coverage probability of such regions equals the nominal value only for linear response surfaces. Hamilton et al. (1982) obtain corrected, second-order expressions for the volume of such regions, with the

correction term, which is $O_p(n^{-1})$, depending on the degree of nonlinearity of the response. Hamilton and Watts (1985) then reconsider the sequential design procedure for this rational function example as an illustration of their quadratic design criterion, which aims to minimize the corrected value of the volume. They find that each subsequent observation diminishes the effect of the nonlinearity and also that the designs can be quite different from those of Box and Hunter.

As in these examples, a preliminary goal of the experimenter might be to design for efficient estimation of the parameters; in this case, the same alphabetic optimality criteria as in linear regression are available. Or the experimenter might seek a design which aids in the selection of an appropriate model. When this is phrased as a discrimination problem, the mathematical goal could be the maximization of the power of a test of a hypothesis $\eta = \eta_0$ versus $\eta = \eta_1$, each specified up to its parameter values. If the densities $p_0\left(y; \eta_0(x)\right)$ and $p_1\left(y; \eta_1(x)\right)$ of $y$ under the two models are both Gaussian, this leads to the notion of T-optimality (Atkinson and Fedorov 1975a,b). More generally (López-Fidalgo et al. 2007), it leads to KL-optimality, in which the goal is to find a design $\xi$ maximizing

$$\inf_{\theta_0} \int I\left(\eta_0(x \mid \theta_0), \eta_1(x \mid \theta_1)\right) \xi(dx); \tag{12.2}$$

here

$$I\left(\eta_0(x), \eta_1(x)\right) = \int p_1\left(y; \eta_1(x)\right) \log\left\{\frac{p_1\left(y; \eta_1(x)\right)}{p_0\left(y; \eta_0(x)\right)}\right\} dy$$

is the Kullback-Leibler divergence, measuring the information which is lost when $p_0$ is used to approximate $p_1$. In (12.2), $\theta_1$ is assumed known. Both static and sequential approaches are available; robustifications of this approach are discussed in Chapter 20.
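The link between (12.2) and the Gaussian case can be made explicit with a short worked calculation. It is a sketch under one added assumption that the text does not state: both models have normal errors with the same known variance $\sigma^{2}$.

```latex
% Assumption: p_j(y; \eta_j(x)) is the N(\eta_j(x), \sigma^2) density, j = 0, 1,
% with a common, known variance \sigma^2.
\begin{align*}
\log\frac{p_1(y;\eta_1)}{p_0(y;\eta_0)}
  &= \frac{(y-\eta_0)^2 - (y-\eta_1)^2}{2\sigma^2}, \\
I(\eta_0,\eta_1)
  &= \frac{E_1\!\left[(y-\eta_0)^2\right] - E_1\!\left[(y-\eta_1)^2\right]}{2\sigma^2}
   = \frac{\sigma^2 + (\eta_1-\eta_0)^2 - \sigma^2}{2\sigma^2}
   = \frac{(\eta_1-\eta_0)^2}{2\sigma^2}, \\
\inf_{\theta_0}\int I\bigl(\eta_0(x\mid\theta_0),\eta_1(x\mid\theta_1)\bigr)\,\xi(dx)
  &= \frac{1}{2\sigma^2}\,\inf_{\theta_0}\int
     \bigl(\eta_1(x\mid\theta_1)-\eta_0(x\mid\theta_0)\bigr)^2\,\xi(dx).
\end{align*}
```

Here $E_1$ denotes expectation under $p_1$. Up to the constant $1/(2\sigma^{2})$, the criterion is then the smallest attainable integrated squared distance between the two mean functions, which is the quantity maximized in T-optimality.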
Whatever might be the parameter-dependent loss function, a possibility is to seek a design minimizing the average loss; namely,

$$\xi_0 = \arg\min_{\xi} \int_{\Theta} L(\xi; \theta)\, \pi(\theta)\, d\theta, \tag{12.3}$$

where $L(\xi; \theta)$ is the loss corresponding to a design $\xi$ when the true model is parameterized by $\theta$ and $\pi(\cdot)$ is a user-chosen function assigning greater weight to parameter values thought to be most plausible or perhaps values against which one desires greater protection. For instance, the choice $L(\xi; \theta) = -\log\left|M(\xi; \theta)\right|$, where

$$M(\xi; \theta)_{p \times p} = \int_{\chi} f(x; \theta)\, f^{\top}(x; \theta)\, \xi(dx),$$

gives an analogue of classical D-optimality. For this choice, an equivalence theorem (Läuter 1974; see also Section 7.3 of Cox and Reid 2000) applies and states that, under mild conditions, $\xi_0$ satisfies (12.3) if and only if

$$d(x; \xi_0) = \int_{\Theta} f^{\top}(x; \theta)\, M^{-1}(\xi_0; \theta)\, f(x; \theta)\, \pi(\theta)\, d\theta \le p,$$

at all points $x$ in the design space, with equality at the support points of $\xi_0$.

As a simple yet instructive example, suppose that one intends to fit an exponential response

$$\eta(x; \theta) = e^{-\theta x}, \tag{12.4}$$

with additive error, by least squares. Then in (12.1), $p = 1$, $f(x; \theta) = -x e^{-\theta x}$ and the requirement becomes, in an obvious notation,

$$E_{\pi}\left[\frac{x^{2} e^{-2\theta x}}{E_{\xi_0}\left[x^{2} e^{-2\theta x}\right]}\right] \le 1, \tag{12.5}$$

with equality at the support points. With a design region $\chi = (0, 1]$, (12.5) applied to a one-point design with all mass at $x_0 \in \chi$ becomes

$$E_{\pi}\left[e^{-2\theta(x - x_0)}\right] \le (x_0 / x)^{2}.$$

Some calculus yields $x_0 = \min\left\{1, 1/E_{\pi}[\theta]\right\}$, as given in Chaloner (1993) and restated in Dette and Neugebauer (1997), where, as well, conditions on $\pi$ are given under which this one-point design is optimal, that is, satisfies (12.3), within the class of all designs. These conditions fail if, for instance, $\pi$ is uniform on $\Theta = [1, \theta_{\max}]$, for $\theta_{\max}$ sufficiently large. Then numerical methods must be used to obtain the minimizer in (12.3) directly, with (12.5) checked for verification of the optimality.

An overview of this approach to design, in which the weight functions $\pi(\cdot)$ are chosen according to a Bayesian paradigm, is given in Chaloner and Verdinelli (1995). For multiparameter models and priors, the integrations in (12.3) can become a significant part of the problem, requiring methods such as Markov chain Monte Carlo; see, for instance, the discussions in Atkinson and Haines (1996) and Atkinson et al. (1995).

Another possibility is to design so as to test the assumed response function for lack of fit (O'Brien 1995). Designs optimal for discrimination or for lack of fit testing are typically not very efficient for estimating the parameters of the final model; this leads to designs which optimize some mixture of these goals; see Hill et al. (1968) and the discussion in Chapter 14 of the approach of Dette et al. (2005).
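The one-point design for the exponential response (12.4) and the check of condition (12.5) can be illustrated with a small numerical sketch. The two-point prior used below is purely hypothetical (it is not taken from the text), as are the function names and the grid resolution; the code simply compares the closed form $x_0 = \min\{1, 1/E_{\pi}[\theta]\}$ with a direct grid maximization of $E_{\pi}[\log(x^{2}e^{-2\theta x})]$ and then evaluates the left side of the equivalence condition over $\chi = (0, 1]$.

```python
import math

# Hypothetical two-point prior on theta: (value, probability). Illustration only.
PRIOR = [(1.5, 0.5), (2.5, 0.5)]


def prior_mean(prior=PRIOR):
    """E_pi[theta] for a discrete prior."""
    return sum(theta * prob for theta, prob in prior)


def one_point_design(prior=PRIOR):
    """Closed form x0 = min{1, 1 / E_pi[theta]} on the design region (0, 1]."""
    return min(1.0, 1.0 / prior_mean(prior))


def expected_log_information(x, prior=PRIOR):
    """E_pi[log(x^2 exp(-2 theta x))]: the averaged D-criterion for all mass at x."""
    return sum(prob * (2.0 * math.log(x) - 2.0 * theta * x) for theta, prob in prior)


def sensitivity(x, x0, prior=PRIOR):
    """E_pi[(x / x0)^2 exp(-2 theta (x - x0))]; condition (12.5) requires this <= 1."""
    return sum(prob * (x / x0) ** 2 * math.exp(-2.0 * theta * (x - x0))
               for theta, prob in prior)


grid = [i / 1000.0 for i in range(1, 1001)]   # grid over the design region (0, 1]
x0 = one_point_design()
x_best = max(grid, key=expected_log_information)
worst = max(sensitivity(x, x0) for x in grid)

print(x0, x_best, worst)
```

For this particular prior the grid maximizer coincides with the closed form and the sensitivity never exceeds 1 on the grid, so the one-point design passes the check. Replacing `PRIOR` with a fine discretization of a uniform prior on a wide interval is a simple way to look for the failure of the condition that the text describes.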
Similar in nature to calibration problems in linear regression are dose-finding studies, which are also discussed in Chapter 24. Here one seeks the value of $x$ resulting in a specified mean response $\eta(x; \theta)$. If $\eta$ is explicitly invertible (in particular, if it is linear in the parameters), then estimates of $x$ may be obtained from those of $\theta$, and so the design problem is concentrated on efficient estimation of a function of the parameters. Otherwise, a possible approach is to design sequentially, guided by stochastic approximation (Cheung 2010).

A class of design problems, apparently first studied by Chernoff (1962), arises in quality control and concerns accelerated life testing. One assumes a typically nonlinear response relating the lifetime ($y$) of a product to stress levels ($x$) and possibly to other covariates. The experimenter usually cannot wait for a product on test to fail under normal stress levels, and so attempts to obtain inferences upon subjecting the product to abnormally high stresses. The goal is accurate prediction of product lifetime at normal stress levels, so that there is a natural link here to the more general problem of designing experiments for purposes of extrapolation.

The list of design problems and applications goes on; these and others are expanded upon in Chapter 14, where as well the mathematical theory is outlined. Other useful references include Bates and Watts (1988) and Seber and Wild (1989), each of which discusses modelling, inference, computations and to some extent design, in a comprehensive manner.

12.4 Spatial Models

Spatial models pose some unique problems, both in inference and in design. Cressie (1993, p. 313) distinguishes between spatial experimental design, in which locations are fixed and the design consists of an allocation of treatments to these locations, and spatial sampling, in which the designer is faced with a spatial stochastic process (a random field), from which he or she is to choose locations at which to make observations.

Much of the impetus for spatial experimental design derived from agricultural experimentation, and hence a large debt is owed to R. A. Fisher, who introduced in the 1920s and 1930s the now common notions of replication, randomization, blocking, etc.; see Martin (1996). Randomized designs came to be replaced by more systematic layouts, the analysis of which led to particular requirements in accounting for the spatial dependence. One such requirement is neighbour balance: for instance, that each treatment occurs the same number of times next to each other treatment. This might arise because of competition or interference between treatments. The achievement of neighbour balance in a design can lead to interesting combinatorial problems; see, for instance, Druilhet and Walter (2012).

Typically, efficiency of estimation of model parameters is not a particularly important goal in spatial studies; this is however the aim of many designs which take account of spatial information by instead adopting a particular structure of dependence between nearby observations.
Commonly, the ensuing analysis utilizes generalized least squares estimates, tailored for the particular dependence structure assumed. An optimal design then might be one which minimizes a particular loss function associated with these estimates or predictions. In all these cases, there might be dependence on covariates besides location; a possible model of the mean response at location $t$, with treatment covariates $x$, might be

$$E[y \mid x, t] = f^{\top}(x)\theta_1 + g^{\top}(t)\theta_2.$$

In this case the locations are fixed but the covariates $x$ are to be chosen by the designer. That this is a nonlinear model arises from the spatial dependence between observations, hence the dependence of the loss on the unknown parameters of the correlation structure.

In spatial sampling, as in spatial experimental design, efficiency might take a back seat to other goals dictated by the physical setting of the problem; see Thompson (1997) and Müller (2005). Geometry-based designs, often intended for exploratory purposes, might aim to be space filling. If model-free imputation of missing observations is the primary goal, then the designer might use probability sampling (Matérn 1960). When the probabilistic structure is known and prediction is the goal, then an information theoretic approach might be apt; see Caselton and Zidek (1984), who propose the maximization of mutual information based on Shannon's entropy, and the environmental application in Zidek et al. (2000).

On the other hand, when efficient parameter estimation and parametric inference is the aim, we are in the realm of optimal sampling design. The first step is often the choice of a correlation function specifying the nature and degree of the dependencies between observations made at various locations. This function plays a central role in the prediction of the response at unsampled locations, typically through kriging (i.e. best linear unbiased prediction), and hence in the construction of designs. The choice of a particular spatial model is discussed at length in the companion handbook Gelfand et al. (2010), and so we do not discuss this here.

A common aim of the designer is to minimize the integral (or sum, if the set of locations is discrete) of the MSEs of the predictions over all locations in the region of interest. Minimizing the maximum MSE is another possibility. This MSE might arise from the spatial variation and its estimation; another contributing factor might be the estimation of the mean response $E[y \mid x, t]$, modelled parametrically. When a regression response is modelled, the usual alphabetic optimality criteria become germane. In some applications, physical interpretations of covariance function parameters are also important and can become the objective of the design.

To give some idea of the flavour of the techniques, consider the following design problem studied by Müller (2005). A region in the Danube river basin in Austria currently has a network of 36 water quality monitoring stations. The locations are labelled relative to a grid overlying the region. To predict chloride concentration ($y$) at location $x$, the experimenter fits a regression model with spatially correlated errors and a parametric covariance function:

$$y(x) = f^{\top}(x)\beta + \varepsilon(x), \qquad \mathrm{Cov}\left[\varepsilon(x), \varepsilon(x')\right] = c(x, x'; \theta).$$

For illustrative purposes, Müller redesigns this network of 36 stations in several ways.
In all cases, an important feature is that there is no notion of replication: only one monitoring station may be placed at a particular location.

The first design illustrated is D-optimal, maximizing the determinant of the information matrix for $\beta$ (with $f(x) = (1, x^{\top})^{\top}$); this matrix of course depends on the covariance function, taken to be

$$c(x, x'; \theta) = \begin{cases} \theta_1 + \theta_2, & x = x', \\ \theta_2\left\{1 - \dfrac{3}{2}\dfrac{\|x - x'\|}{\theta_3} + \dfrac{1}{2}\left(\dfrac{\|x - x'\|}{\theta_3}\right)^{3}\right\}, & 0 < \|x - x'\| \le \theta_3, \\ 0, & \|x - x'\| > \theta_3. \end{cases}$$

FIGURE 12.1 D-optimal network of chlorine monitoring stations. (From Müller, W.G., Environmetrics, 16, 495, 2005.)

Exchange algorithms are introduced to carry out the optimization. The resulting design is in Figure 12.1; a notable feature is that the design calls for all stations to be concentrated at

the boundary of the region, but to be somewhat evenly distributed on this boundary. Presumably the managers of such a network would be asked if they were perhaps duplicating efforts of others immediately across the geographic boundary of their region.

Another method of D-optimal design construction in Müller (2005) relies on an expansion of the covariance function in terms of eigenfunctions $\{\phi_l(x)\}$, resulting in an approximation of the process as

$$y(x) = f^{\top}(x)\beta + \sum_{l=1}^{p} \gamma_l \phi_l(x) + e(x),$$

with uncorrelated errors $\{e(x)\}$. Here the $\{\gamma_l\}$ are uncorrelated random variables with variances given by the eigenvalues corresponding to the $\phi_l$. This representation allows for an analysis by random coefficient regression, leading to the design in Figure 12.2, exhibiting a greater coverage of the region than that of Figure 12.1.

FIGURE 12.2 Network of chlorine monitoring stations obtained via an expansion of the covariance kernel, followed by D-optimality. (From Müller, W.G., Environmetrics, 16, 495, 2005.)

There is a close relationship between spatial sampling and the design of computer experiments. Although there is no random error, in the usual sense, in such experiments, it is common to model the dependencies between the outputs of experiments, with distinct inputs, via spatial correlation structures. This then engenders a certain similarity in the design problems: the inputs to the computer experiment, to be chosen by the designer, play much the same role as do the locations in spatial sampling. Designs for computer experiments are discussed in Section V.

The computational demands involved in constructing spatial designs can be immense. Some techniques which have been attempted, with varying measures of success, are exchange algorithms, simulated annealing and genetic algorithms. These, and many of the topics touched on previously, are discussed at length in Chapter 15.

References

Atkinson, A. C., Demetrio, C. G.
B., and Zocchi, S. S. (1995), Optimum dose levels when males and females differ in response, Applied Statistics, 44, Atkinson, A. C. and Fedorov, V. V. (1975a), The design of experiments for discriminating between two rival models, Biometrika, 62, Atkinson, A. C. and Fedorov, V. V. (1975b), Optimal design: Experiments for discriminating between several models, Biometrika, 62, Dean/Handbook of Design and Analysis of Experiments K14518_C012 Revises Page

Atkinson, A. C. and Haines, L. M. (1996), Designs for nonlinear and generalized linear models, in: Design and Analysis of Experiments, Handbook of Statistics, Vol. 13; eds. Ghosh, S. and Rao, C. R., Elsevier/North-Holland.
Bates, D. M. and Watts, D. G. (1988), Nonlinear Regression Analysis and Its Applications, Wiley, New York.
Box, G. E. P. and Hunter, W. G. (1959), Design of experiments in non-linear situations, Biometrika, 46.
Box, G. E. P. and Lucas, H. L. (1965), The experimental study of physical mechanisms, Technometrics, 7.
Burridge, J. and Sebastiani, P. (1992), Optimal designs for generalized linear models, Journal of the Italian Statistical Society, 1.
Caselton, W. F. and Zidek, J. V. (1984), Optimal monitoring network designs, Statistics and Probability Letters, 2.
Chaudhuri, P. and Mykland, P. A. (1993), Nonlinear experiments: Optimal design and inference based on likelihood, Journal of the American Statistical Association, 88.
Chaloner, K. (1993), A note on optimal Bayesian design in nonlinear problems, Journal of Statistical Planning and Inference, 37.
Chaloner, K. and Verdinelli, I. (1995), Bayesian experimental design: A review, Statistical Science, 10.
Chernoff, H. (1962), Optimal accelerated life designs for estimation, Technometrics, 4.
Cheung, Y. K. (2010), Stochastic approximation and modern model-based designs for dose-finding clinical trials, Statistical Science, 25.
Cox, D. R. and Reid, N. (2000), The Theory of the Design of Experiments, Chapman & Hall.
Currie, D. J. (1982), Estimating Michaelis-Menten parameters: Bias, variance and experimental design, Biometrics, 38.
Cressie, N. (1993), Statistics for Spatial Data, Wiley, New York.
Dette, H., Melas, V. B., and Wong, W.-K. (2005), Optimal design for goodness-of-fit of the Michaelis-Menten enzyme kinetic function, Journal of the American Statistical Association, 100.
Dette, H. and Neugebauer, H.-M. (1997), Bayesian D-optimal designs for exponential regression models, Journal of Statistical Planning and Inference, 60.
Dror, H. A. and Steinberg, D. M. (2008), Sequential experimental designs for generalized linear models, Journal of the American Statistical Association, 103.
Druilhet, P. and Walter, T. (2012), Efficient circular neighbour designs for spatial interference model, Journal of Statistical Planning and Inference, 142.
Elfving, G. (1952), Optimal allocation in linear regression theory, Annals of Mathematical Statistics, 23.
Fedorov, V. V. (1972), Theory of Optimal Experiments, Academic Press, New York.
Fisher, R. A. (1922), On the mathematical foundations of theoretical statistics, Philosophical Transactions of the Royal Society of London, Series A, 222.
Ford, I., Titterington, D. M., and Kitsos, C. P. (1989), Recent advances in nonlinear experimental design, Technometrics, 31.
Gelfand, A. E., Diggle, P., Fuentes, M., and Guttorp, P. (2010), Handbook of Spatial Statistics, Chapman & Hall, New York.
Hamilton, D. C. and Watts, D. G. (1985), A quadratic design criterion for precise estimation in nonlinear regression models, Technometrics, 27.
Hamilton, D. C., Watts, D. G., and Bates, D. M. (1982), Accounting for intrinsic nonlinearity in nonlinear regression parameter inference regions, Annals of Statistics, 10.
Hill, W. J., Hunter, W. G., and Wichern, D. W. (1968), A joint design criterion for the dual problem of model discrimination and parameter estimation, Technometrics, 10.
Khuri, A. I., Mukherjee, B., Sinha, B. K., and Ghosh, M. (2006), Design issues for generalized linear models: A review, Statistical Science, 21.
Läuter, E. (1974), Experimental design in a class of models, Mathematische Operationsforschung und Statistik, 5.

López-Fidalgo, J., Tommasi, C., and Trandafir, P. C. (2007), An optimal experimental design criterion for discriminating between non-normal models, Journal of the Royal Statistical Society B, 69.
Martin, R. J. (1996), Spatial experimental design, in: Design and Analysis of Experiments, Handbook of Statistics, Vol. 13; eds. Ghosh, S. and Rao, C. R., Elsevier/North-Holland.
Matérn, B. (1960), Spatial Variation, Springer-Verlag, Berlin.
McCullagh, P. and Nelder, J. A. (1989), Generalized Linear Models, Wiley, New York.
Müller, W. G. (2005), A comparison of spatial design methods for correlated observations, Environmetrics, 16.
Myers, R. H. (1999), Response surface methodology: Current status and future directions, Journal of Quality Technology, 31.
O'Brien, T. E. (1995), Optimal design and lack of fit in nonlinear regression models, in: Proceedings of the 10th International Workshop on Statistical Modelling, Lecture Notes in Statistics, Springer-Verlag, New York.
Robinson, K. S. and Khuri, A. I. (2003), Quantile dispersion graphs for evaluating and comparing designs for logistic regression models, Computational Statistics and Data Analysis, 43.
Seber, G. A. F. and Wild, C. J. (1989), Nonlinear Regression, Wiley, New York.
Sinha, S. and Wiens, D. P. (2002), Robust sequential designs for nonlinear regression, The Canadian Journal of Statistics, 30.
Thompson, S. K. (1997), Effective sampling strategies for spatial studies, Metron, 55.
Wu, C. F. J. (1985), Asymptotic inference from sequential design in a nonlinear situation, Biometrika, 72.
Yang, M. (2008), A-optimal designs for generalized linear models with two parameters, Journal of Statistical Planning and Inference, 138.
Zidek, J. V., Sun, W., and Le, N. D. (2000), Designing and integrating composite networks for monitoring multivariate Gaussian pollution fields, Applied Statistics, 49.


AP-Optimum Designs for Minimizing the Average Variance and Probability-Based Optimality

AP-Optimum Designs for Minimizing the Average Variance and Probability-Based Optimality Authors: N. M. Kilany Faculty of Science, Menoufia University Menoufia, Egypt. (neveenkilany@hotmail.com) W. A. Hassanein