By Ming-Chung Chang and Ching-Shui Cheng Academia Sinica and University of California, Berkeley

Submitted to the Annals of Statistics

A BAYESIAN APPROACH TO THE SELECTION OF TWO-LEVEL MULTI-STRATUM FACTORIAL DESIGNS

By Ming-Chung Chang and Ching-Shui Cheng

Academia Sinica and University of California, Berkeley

In a multi-stratum factorial experiment, there are multiple error terms (strata) with different variances that arise from complicated structures of the experimental units. For unstructured experimental units, minimum aberration is a popular criterion for choosing regular fractional factorial designs. One difficulty in extending this criterion to multi-stratum factorial designs is that the formulation of a wordlength pattern based on which minimum aberration is defined requires an order of desirability among the relevant words, but a natural order is often lacking. Furthermore, a criterion based only on wordlength patterns does not account for the different stratum variances. Mitchell, Morris, and Ylvisaker (1995) proposed a framework for Bayesian factorial designs. A Gaussian process is used as the prior for the treatment effects, from which a prior distribution of the factorial effects is induced. This approach is applied to study optimal and efficient multi-stratum factorial designs. Good surrogates for the Bayesian criteria that can be related to wordlength and generalized wordlength patterns for regular and nonregular designs, respectively, are derived. A tool is developed for eliminating inferior designs and reducing the designs that need to be considered without requiring any knowledge of stratum variances. Numerical examples are used to illustrate the theory in several settings.

MSC 2010 subject classifications: 62K05, 62K15.

Keywords and phrases: A-criterion, block structure, D-criterion, Gaussian process, (M.S)-criterion, minimum aberration, wordlength pattern.

1. Introduction. In a multi-stratum factorial experiment, there are multiple error terms (strata) with different variances. For example, in an experiment conducted in several days, suppose the levels of some treatment factors are difficult to change and must be kept the same throughout the day, while the levels of the other factors can be changed from run to run on the same day. Then the precision of the estimates of the main effects of hard-to-change factors depends on the between-day variability, and that of the main effects of easy-to-change factors depends on the between-run variability on the same day. Typically the former is greater than the latter. Such an experiment is said to have two strata. The two strata arise from the structure of the experimental units that some larger units (those associated with the days, called whole-plots) are split into smaller units (those associated with the runs, called subplots). If the experiment is to be blocked, with the whole-plots grouped into more homogeneous blocks, then we will have a third stratum, and the associated between-block variability is expected to be greater than the between-whole-plot variability within the same block. In general, the strata are determined by the structure of experimental units, called a block structure.

When the experimental units are unstructured, the minimum aberration (MA) criterion proposed by Fries and Hunter [11] is a popular criterion for choosing regular fractional factorial designs under the hierarchical assumption that lower-order effects are more important than higher-order effects and effects of the same order are equally important. This criterion is based on the so-called wordlength pattern and has been extended to nonregular designs (Tang and Deng [20]). It has also been extended to several types of multi-stratum fractional factorial designs, such as block designs, split-plot designs, and blocked split-plot designs, by modifying the wordlength pattern. The modifications are often ad hoc and without strong justifications. The formulation of a wordlength pattern requires an order of desirability among the relevant words, but a natural order is often lacking. Furthermore, a criterion based only on wordlength patterns does not account for the different stratum variances.

Cheng, Steinberg, and Sun [5] showed that minimum aberration is a good surrogate for maximum estimation capacity, a model-robust criterion that maximizes the number of estimable models among some potential models. For nonregular designs, since different estimable models may be estimated with different efficiencies, it is no longer enough to compare the number of estimable models. In this case, the information capacity criterion (Sun [18]) of maximizing the average efficiency over a set of potential models is more appropriate.
One can apply the information capacity criterion to compare multi-stratum factorial designs, under which different estimable models may also be estimated with different efficiencies due to different stratum variances. Cheng and Tsai [6, 7] adopted this approach and derived a surrogate for the maximum information capacity criterion. This line of work, however, assumes that the three-factor and higher-order interactions are negligible, and can only be applied to orthogonal regular designs.

Mitchell, Morris, and Ylvisaker [16] proposed a framework for Bayesian fractional factorial designs. A Gaussian process commonly used in the literature of computer experiments (Sacks, Welch, Mitchell, and Wynn [17]) to model unknown deterministic response functions is used as the prior for the treatment effects, from which a prior distribution for the factorial effects is induced. This approach, which provides more flexibility in incorporating the prior knowledge, was further developed by Kerr [15] and Joseph [12] for studying optimal fractional factorial designs. Joseph, Ai, and Wu [13] used the same Bayesian approach to inspire a minimum aberration criterion for mixed two- and four-level designs with unstructured units. The main objective of the present article is to apply this approach to multi-stratum factorial designs. Another work relevant to ours is Ai, Kang, and Joseph [1], in which a Bayesian approach was applied to study blocked fractional factorial designs with fixed block effects.

Some preliminary materials, including block structures, treatment factorial effects, strata, orthogonal designs, Bayesian approach, and statistical models, are presented in Section 2. Explicit forms of Bayesian A- and D-optimality criteria are derived in Section 3. In general, analytical results on Bayesian optimal fractional factorial designs are difficult to prove. In Section 4 we provide one such result in the setting of regular half-fractions with two blocks; earlier, Bayesian optimal half-fractions without blocking were obtained in [15]. In Section 5, we derive good surrogates for the Bayesian criteria that can also be applied to nonregular and nonorthogonal designs. For orthogonal blocking of complete factorial designs, the surrogate criterion reduces to the usual minimum aberration criterion. In the case of fractional factorial designs for unstructured experimental units, the surrogate criterion is shown to be a refinement of minimum and generalized minimum aberration for regular and nonregular designs, respectively. For nonregular multi-stratum designs, our approach provides a stronger justification than naive modifications of the usual wordlength patterns from regular to nonregular designs. A tool is developed for eliminating inferior designs without any knowledge of the stratum variances, thereby reducing the designs that need to be considered. Examples in several settings are provided in Section 6 to illustrate the theory.

2. Preliminaries.

2.1. Unit factors and block structures. Let $\Omega$ be a set of $N$ experimental units. An $n_{\mathcal{F}}$-level unit factor $\mathcal{F}$ can be considered as a partition of $\Omega$ into $n_{\mathcal{F}}$ disjoint nonempty subsets. Each subset, called an $\mathcal{F}$-class, consists of units that have the same level of $\mathcal{F}$. Given two factors $\mathcal{F}_1$ and $\mathcal{F}_2$, we say that $\mathcal{F}_1$ is nested in $\mathcal{F}_2$ ($\mathcal{F}_1$ is finer than $\mathcal{F}_2$, or $\mathcal{F}_2$ is coarser than $\mathcal{F}_1$), denoted by $\mathcal{F}_1 \prec \mathcal{F}_2$, if any two units in the same $\mathcal{F}_1$-class are also in the same $\mathcal{F}_2$-class and $\mathcal{F}_1 \neq \mathcal{F}_2$. We write $\mathcal{F}_1 \preceq \mathcal{F}_2$ if $\mathcal{F}_1 \prec \mathcal{F}_2$ or $\mathcal{F}_1 = \mathcal{F}_2$. The coarsest factor, denoted by $\mathcal{U}$ and called the universal factor, is the single-level factor with all the units in the same class.
On the other hand, the finest factor, denoted by $\mathcal{E}$ and called the equality factor, is the $N$-level factor with each class consisting of one single unit. A block structure $\mathfrak{B}$ is a collection of unit factors on the same $\Omega$, and we include $\mathcal{U}$ and $\mathcal{E}$ in each block structure. For example, the block structure of a usual block design with $N = bk$ units in $b$ blocks of size $k$ can be regarded as a collection $\{\mathcal{U}, \mathcal{B}, \mathcal{E}\}$ of three unit factors, where $\mathcal{E}$ is a $bk$-level factor corresponding to the $bk$ units, and $\mathcal{B}$ is a $b$-level block factor that partitions the $bk$ units into $b$ blocks each of size $k$. We have $\mathcal{E} \prec \mathcal{B} \prec \mathcal{U}$. This is also the block structure of a split-plot experiment with $b$ whole-plots each consisting of $k$ subplots. An experiment with unstructured units is considered to have the block structure $\{\mathcal{U}, \mathcal{E}\}$.

A factor is said to be uniform if all its classes are of the same size. Given two unit factors $\mathcal{F}_1$ and $\mathcal{F}_2$, the supremum of $\mathcal{F}_1$ and $\mathcal{F}_2$, denoted by $\mathcal{F}_1 \vee \mathcal{F}_2$, is the factor such that (i) $\mathcal{F}_1, \mathcal{F}_2 \preceq \mathcal{F}_1 \vee \mathcal{F}_2$, and (ii) $\mathcal{F}_1 \vee \mathcal{F}_2 \preceq \mathcal{G}$ for all $\mathcal{G}$ such that $\mathcal{F}_1, \mathcal{F}_2 \preceq \mathcal{G}$. We say that $\mathcal{F}_1$ and $\mathcal{F}_2$ are orthogonal if $\mathcal{F}_1$ and $\mathcal{F}_2$ have proportional frequencies in each $(\mathcal{F}_1 \vee \mathcal{F}_2)$-class, i.e., for each $(\mathcal{F}_1 \vee \mathcal{F}_2)$-class $\Gamma$, if both the $i$th $\mathcal{F}_1$-class and the $j$th $\mathcal{F}_2$-class are contained in $\Gamma$, then

$$n_{ij} = \frac{n_{i+}\, n_{+j}}{|\Gamma|},$$

where $n_{i+}$, $n_{+j}$, and $n_{ij}$ are the numbers of units in, respectively, the $i$th $\mathcal{F}_1$-class, the $j$th $\mathcal{F}_2$-class, and the intersection of these two classes, and $|\Gamma|$ is the number of units in $\Gamma$.

Throughout this paper, we only consider block structures $\mathfrak{B}$ that satisfy the following conditions:

(2.1) All the factors in $\mathfrak{B}$ are uniform and pairwise orthogonal.

(2.2) $\mathcal{E} \in \mathfrak{B}$, $\mathcal{U} \in \mathfrak{B}$.

(2.3) $\mathcal{F}, \mathcal{G} \in \mathfrak{B} \Rightarrow \mathcal{F} \vee \mathcal{G} \in \mathfrak{B}$.

We note that most of the block structures encountered in practice satisfy (2.1), (2.2), and (2.3).

The relation between the units and the levels of $\mathcal{F}$ can be described by an $N \times n_{\mathcal{F}}$ incidence matrix $\mathbf{X}_{\mathcal{F}}$ with 0 and 1 entries such that the $(i, j)$th entry of $\mathbf{X}_{\mathcal{F}}$ is 1 if and only if the $i$th unit is in the $j$th $\mathcal{F}$-class. In particular,

$$\mathbf{X}_{\mathcal{U}} = \mathbf{1}_N,$$

where $\mathbf{1}_N$ is the $N \times 1$ vector of 1's.

2.2. Treatment factorial effects. Suppose there are $n$ two-level treatment factors. Denote each treatment combination by $x = (x_1, \ldots, x_n)^T$, where $x_i = 0$ or $1$ is the level of the $i$th treatment factor. Let $\boldsymbol{\alpha}$ be a $2^n \times 1$ vector with components $\alpha(x)$, where $\alpha(x)$ is the effect of treatment combination $x$. We can express $\boldsymbol{\alpha}$ as $\boldsymbol{\alpha} = \mathbf{P}\boldsymbol{\beta}$, where $\mathbf{P}$ is a $2^n \times 2^n$ model matrix for a $2^n$ complete factorial experiment and $\boldsymbol{\beta} = (\beta_{S_1}, \ldots, \beta_{S_{2^n}})^T$ is the vector of treatment factorial effects $\beta_{S_l}$, one for each $S_l \subseteq \{1, \ldots, n\}$. Specifically, $\beta_{\emptyset} = \frac{1}{2^n} \sum_x \alpha(x)$ is the mean and the corresponding column of $\mathbf{P}$ consists of 1's. For $S = \{i\}$, $1 \le i \le n$, $\beta_S$ is a main effect contrast of factor $i$, and in the associated column of $\mathbf{P}$, the entry corresponding to $x$ is $1$ if $x_i = 1$, and is $-1$ if $x_i = 0$. Furthermore, for $S = \{i_1, \ldots, i_k\}$, $\beta_S$ is a $k$-factor interaction contrast and the corresponding column of $\mathbf{P}$ is the Hadamard product of the columns corresponding to the main effects of factors $i_1, \ldots, i_k$, where the Hadamard product of $u = (u_1, \ldots, u_h)^T$ and $v = (v_1, \ldots, v_h)^T$ is $u \odot v = (u_1 v_1, \ldots, u_h v_h)^T$. For convenience, $\beta_S$ is often written as $i_1 \cdots i_k$ (or a combination of letters if each factor is labeled by a letter), and is called a word of length $k$. Often we do not distinguish among $S$, $\beta_S$, and the corresponding words.

Under the Bayesian framework, $\boldsymbol{\beta}$ is treated as a random vector. Specifying the distribution of $\boldsymbol{\beta}$ is a crucial step. We take the approach proposed in [16]; see also the discussions in Section 10.11 of Cheng [4]. Here we regard $\boldsymbol{\alpha}$, a function of the factor settings $x = (x_1, \ldots, x_n)$, as a realization of a Gaussian process. Specifically, suppose the distribution of $\boldsymbol{\alpha}$ is multivariate normal with zero mean and covariance matrix $\sigma^2 \mathbf{R}$. It can be seen that the columns of $\mathbf{P}$ are mutually orthogonal; thus $\boldsymbol{\beta} = \frac{1}{2^n} \mathbf{P}^T \boldsymbol{\alpha}$ and hence

$$\boldsymbol{\beta} \sim N\left(\mathbf{0},\; 2^{-2n} \sigma^2\, \mathbf{P}^T \mathbf{R} \mathbf{P}\right). \tag{2.4}$$
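
The construction of the model matrix can be illustrated numerically. The following is a minimal sketch, assuming NumPy and the hypothetical helper name `model_matrix` (neither appears in the paper), with treatment effects drawn at random purely for illustration. It builds the model matrix for $n = 3$ from Hadamard products of the main-effect columns and checks that its columns are mutually orthogonal, so that the factorial effects are recovered as $2^{-n}$ times the transposed model matrix applied to the treatment effects.

```python
import itertools
import numpy as np

def model_matrix(n):
    """Build the 2^n x 2^n model matrix P of a two-level full factorial.

    Rows are treatment combinations x in {0,1}^n; columns are subsets S
    of {0, ..., n-1}, ordered by size. Entry (x, S) is the product over
    i in S of +1 (if x_i = 1) or -1 (if x_i = 0); the empty subset gives
    the column of 1's.
    """
    runs = np.array(list(itertools.product([0, 1], repeat=n)))
    signs = 2 * runs - 1
    subsets = [S for k in range(n + 1)
               for S in itertools.combinations(range(n), k)]
    P = np.ones((2 ** n, len(subsets)))
    for j, S in enumerate(subsets):
        for i in S:
            P[:, j] *= signs[:, i]  # Hadamard product of main-effect columns
    return runs, subsets, P

n = 3
runs, subsets, P = model_matrix(n)

# Columns of P are mutually orthogonal: P^T P = 2^n I.
gram = P.T @ P

# Hypothetical treatment effects alpha(x), drawn only for illustration.
rng = np.random.default_rng(0)
alpha = rng.normal(size=2 ** n)

beta = P.T @ alpha / 2 ** n   # beta = 2^{-n} P^T alpha
alpha_back = P @ beta         # alpha = P beta
```

Here `beta[0]` is the mean of the treatment effects, and `alpha_back` reproduces `alpha` up to rounding error.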

It was shown in [16] that if $\boldsymbol{\alpha}$ is stationary, i.e., the correlation between $\alpha(x_1)$ and $\alpha(x_2)$ depends only on the factors where $x_1$ and $x_2$ differ, then the $\beta_S$'s, $S \subseteq \{1, \ldots, n\}$, are independent. In this case $\operatorname{cov}(\boldsymbol{\beta})$ is diagonal and

$$(2^{-n/2} \mathbf{P}^T)\, \operatorname{cov}(\boldsymbol{\alpha})\, (2^{-n/2} \mathbf{P}) = 2^n \operatorname{cov}(\boldsymbol{\beta}) \tag{2.5}$$

is the spectral decomposition of $\operatorname{cov}(\boldsymbol{\alpha})$. It follows that the columns of $\mathbf{P}$, which define the $\beta_S$'s, $S \subseteq \{1, \ldots, n\}$, are eigenvectors, and the variances of the $\beta_S$'s are the eigenvalues of $\operatorname{cov}(\boldsymbol{\alpha})$ divided by $2^n$.

As an example, assume that the Gaussian process has the covariance function

$$\operatorname{cov}(\alpha(x_1), \alpha(x_2)) = \sigma^2 \prod_{1 \le j \le n:\, x_{1j} \neq x_{2j}} \rho_j,$$

where $\rho_1, \ldots, \rho_n \in (0, 1)$. Then we have

$$\operatorname{var}(\beta_S) = \frac{\sigma^2}{2^n} \prod_{1 \le i \le n,\, i \in S} (1 - \rho_i) \prod_{1 \le i \le n,\, i \notin S} (1 + \rho_i), \qquad \operatorname{cov}(\beta_S, \beta_{S'}) = 0 \ \text{ if } S \neq S'.$$

This implies that if $S \subset S'$, then $\operatorname{var}(\beta_S) > \operatorname{var}(\beta_{S'})$. This property, consistent with the hierarchical assumption, was referred to as the property of nested decreasing interaction variances in [15]. In addition, if $\rho_1 = \cdots = \rho_n = \rho$, then we have

$$\operatorname{var}(\beta_S) = \frac{\sigma^2}{2^n} (1 - \rho)^{|S|} (1 + \rho)^{n - |S|},$$

where $|S|$ is the number of elements in $S$. An alternative parametrization is to let $r = (1 - \rho)/(1 + \rho)$ and $\tau^2 = \sigma^2 (1 + r)^{-n}$, which leads to

$$\operatorname{var}(\beta_S) = \tau^2 r^{|S|}. \tag{2.6}$$

In this case of isotropic priors, factorial effects of the same order can be regarded as equally important; thus MA designs are expected to perform well. In the rest of the paper, we assume that $\boldsymbol{\alpha}$ is stationary.

2.3. Design construction and defining words. The construction of a multi-stratum fractional factorial design involves the selection of a subset of treatment combinations and assigning them to various classes of the unit factors in the block structure. For regular designs, this is done by solving linear equations. With the levels represented by the two elements 0 and 1 of the field $\mathbb{Z}_2$ under modulo 2 addition and multiplication, the treatment combinations can be identified with the $2^n$ vectors in the $n$-dimensional space $\mathbb{Z}_2^n$.
For each nonzero vector $a$ in $\mathbb{Z}_2^n$, the two equations $a^T x = 0$ and $a^T x = 1$ divide the treatment combinations $x$ into two disjoint sets, say $H_0$ and $H_1$, respectively. Let $S = \{1 \le i \le n : a_i \neq 0\}$; then the factorial effect $\beta_S$ defined in Section 2.2 is a difference of $\sum_{x:\, x \in H_0} \alpha(x)$ and $\sum_{x:\, x \in H_1} \alpha(x)$ divided by $2^n$. A system of $p$ linear equations $a_i^T x = b_i$, $i = 1, \ldots, p$, where $a_1, \ldots, a_p$ are linearly independent, has $2^{n-p}$ solutions, which constitute a regular $2^{n-p}$ fractional factorial design.
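
The solution of such a system can be enumerated directly for small $n$. The sketch below is an illustration under stated assumptions: the function names `regular_fraction` and `defining_word_lengths` and the particular $2^{5-2}$ example are hypothetical choices, not taken from the paper, and the rows of the coefficient matrix are assumed to be linearly independent over $\mathbb{Z}_2$. It collects the solutions of $a_i^T x = b_i$ and tabulates the lengths of the nonzero linear combinations of $a_1, \ldots, a_p$.

```python
import itertools
import numpy as np

def regular_fraction(n, A, b):
    """Return the treatment combinations x in {0,1}^n with A x = b (mod 2)."""
    A = np.asarray(A) % 2
    b = np.asarray(b) % 2
    runs = np.array(list(itertools.product([0, 1], repeat=n)))
    keep = np.all((runs @ A.T) % 2 == b, axis=1)
    return runs[keep]

def defining_word_lengths(n, A):
    """Count the nonzero linear combinations of the rows of A by length.

    Returns [c_1, ..., c_n], where c_k is the number of nonzero
    combinations (mod 2) with exactly k nonzero coordinates.
    """
    A = np.asarray(A) % 2
    p = A.shape[0]
    counts = [0] * (n + 1)
    for coef in itertools.product([0, 1], repeat=p):
        if any(coef):
            word = (np.array(coef) @ A) % 2
            counts[int(word.sum())] += 1
    return counts[1:]

# Hypothetical 2^{5-2} example: a_1 = (1,1,0,1,0), a_2 = (1,0,1,0,1), b = 0.
A = [[1, 1, 0, 1, 0],
     [1, 0, 1, 0, 1]]
design = regular_fraction(5, A, [0, 0])
lengths = defining_word_lengths(5, A)
```

For this example the fraction has $2^{5-2} = 8$ runs, and the three nonzero combinations have lengths 3, 3, and 4.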

The $2^p - 1$ nonzero linear combinations of $a_1, \ldots, a_p$ (and the corresponding $\beta_S$) are called treatment defining words (effects). Let $A_k$ be the number of treatment defining words of length $k$. Then $(A_1, \ldots, A_n)$ is called the wordlength pattern of the design. The $2^p$ choices of $b_1, \ldots, b_p$ divide the $2^n$ treatment combinations into $2^p$ disjoint sets, each of which can be used as a $2^{n-p}$ fractional factorial design. Similarly, $h$ linearly independent vectors $b_1, \ldots, b_h$ such that $a_1, \ldots, a_p, b_1, \ldots, b_h$ are linearly independent can be used to divide the treatment combinations in the fractional factorial design into $2^h$ blocks of equal size. The vectors $a_1, \ldots, a_p$ are called independent treatment defining words, and $b_1, \ldots, b_h$ are called independent block defining words. The nonzero linear combinations of $a_1, \ldots, a_p, b_1, \ldots, b_h$ that are not treatment defining words are called block defining words (effects).

2.4. Statistical model. Let $\mathbf{y} = (y_1, \ldots, y_N)^T$ be the responses under a fractional factorial design, and suppose the $N$ experimental units have a block structure $\mathfrak{B} = \{\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_m\}$. Throughout this paper, let $\mathcal{F}_0 = \mathcal{U}$ and $\mathcal{F}_m = \mathcal{E}$. Suppose

$$\mathbf{y} = \mathbf{X}_T \boldsymbol{\alpha} + \sum_{i=0}^{m} \mathbf{X}_{\mathcal{F}_i} \boldsymbol{\gamma}^{\mathcal{F}_i}, \tag{2.7}$$

where $\mathbf{X}_T$ is an $N \times 2^n$ unit-treatment incidence matrix and $\boldsymbol{\gamma}^{\mathcal{F}_i} = (\gamma_1^{\mathcal{F}_i}, \ldots, \gamma_{n_{\mathcal{F}_i}}^{\mathcal{F}_i})^T$, where $\gamma_j^{\mathcal{F}_i}$ is the effect of the $j$th level of unit factor $\mathcal{F}_i$ (for example, block effects, whole-plot effects, and subplot effects). We assume that the $\gamma$'s are independent, with each $\gamma_j^{\mathcal{F}_i}$ following an $N(0, \sigma_{\mathcal{F}_i}^2)$ distribution, and that they are independent of $\boldsymbol{\alpha}$. Then

$$\mathbf{y} \mid \boldsymbol{\beta} \sim N\left(\mathbf{X}_T \mathbf{P} \boldsymbol{\beta},\; \sum_{i=0}^{m} \sigma_{\mathcal{F}_i}^2 \mathbf{X}_{\mathcal{F}_i} \mathbf{X}_{\mathcal{F}_i}^T\right). \tag{2.8}$$

Let $\mathbf{U} = \mathbf{X}_T \mathbf{P}$ and $\mathbf{V} = \sum_{i=0}^{m} \sigma_{\mathcal{F}_i}^2 \mathbf{X}_{\mathcal{F}_i} \mathbf{X}_{\mathcal{F}_i}^T$. Then $\mathbf{U}$ is the full model matrix under the design. Each column of $\mathbf{U}$, except for the column of 1's, corresponds to a factorial effect.
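
As a small illustration of the covariance matrix in (2.8), the following sketch assembles it from incidence matrices for a block design with universal, block, and equality factors. The numbers of blocks and units and the three variance components are hypothetical values chosen only for the example, and units are assumed to be ordered block by block.

```python
import numpy as np

b, k = 2, 4                  # hypothetical: 2 blocks of size 4
N = b * k

# Incidence matrices of the unit factors (units ordered block by block).
X_U = np.ones((N, 1))                          # universal factor: one class
X_B = np.kron(np.eye(b), np.ones((k, 1)))      # block factor: b classes of size k
X_E = np.eye(N)                                # equality factor: N classes

# Hypothetical variance components for the three unit factors.
s2_U, s2_B, s2_E = 0.5, 2.0, 1.0

# Covariance matrix of y given beta, as in (2.8).
V = (s2_U * X_U @ X_U.T
     + s2_B * X_B @ X_B.T
     + s2_E * X_E @ X_E.T)
```

With these values, two observations in the same block have covariance `s2_U + s2_B`, two observations in different blocks have covariance `s2_U`, and each observation has variance `s2_U + s2_B + s2_E`.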
Throughout this paper, we only consider single-replicate complete factorial designs or fractional factorial designs in which no treatment combination is observed more than once. Then $\mathbf{U}$ consists of $N$ rows of $\mathbf{P}$. Since $\mathbf{P}\mathbf{P}^T = 2^n \mathbf{I}_{2^n}$, we have $\mathbf{U}\mathbf{U}^T = 2^n \mathbf{I}_N$. If the design is a regular $2^{n-p}$ fractional factorial design, then the $2^n$ columns of $\mathbf{U}$ can be partitioned into $2^{n-p}$ sets of size $2^p$, with the columns in the same set corresponding to aliased effects. Any two columns of $\mathbf{U}$ in the same alias set are either identical or can be obtained from each other by changing the signs of all the entries. Without loss of generality, we assume that all the aliased columns of $\mathbf{U}$ are identical. For each nonempty $S \subseteq \{1, \ldots, n\}$, let $\mathcal{A}(\beta_S)$ be the set of $\beta_S$ and all its aliases, and let $\mathcal{A}(\beta_{\emptyset})$ be the defining contrast subgroup, i.e., the set consisting of the mean and the treatment defining words. For convenience, we also write the $\mathcal{A}(\beta_S)$'s as $\mathcal{A}_0, \ldots, \mathcal{A}_{2^{n-p}-1}$, with $\mathcal{A}_0 = \mathcal{A}(\beta_{\emptyset})$.

Under a block structure satisfying (2.1), (2.2), and (2.3), the covariance matrix $\mathbf{V}$ has $m + 1$ eigenspaces $W_{\mathcal{F}_0}, \ldots, W_{\mathcal{F}_m}$, with one eigenspace associated with each of the $m + 1$ unit factors, where $W_{\mathcal{F}_0} = W_{\mathcal{U}}$ is the one-dimensional space consisting of all the vectors with constant entries, and each other eigenvector defines a unit contrast; furthermore, the eigenspaces are determined by the block structure and do not depend on the entries of $\mathbf{V}$. The readers are referred to Bailey [2] and [4] for these results as well as how to determine the $W_{\mathcal{F}_i}$'s from $\mathfrak{B}$. Let the corresponding eigenvalues be $\xi_{\mathcal{F}_0}, \ldots, \xi_{\mathcal{F}_m}$. Then for each $\mathbf{c} \in W_{\mathcal{F}_i}$, we have $\operatorname{var}(\mathbf{c}^T \mathbf{y} \mid \boldsymbol{\beta}) = \|\mathbf{c}\|^2 \xi_{\mathcal{F}_i}$. The eigenspaces $W_{\mathcal{F}_0}, \ldots, W_{\mathcal{F}_m}$ are called strata and $\xi_{\mathcal{F}_0}, \ldots, \xi_{\mathcal{F}_m}$ are referred to as stratum variances. Furthermore,

$$\xi_{\mathcal{F}} = \sum_{\mathcal{G} \in \mathfrak{B}:\, \mathcal{G} \preceq \mathcal{F}} \frac{N}{n_{\mathcal{G}}}\, \sigma_{\mathcal{G}}^2. \tag{2.9}$$

Thus if $\mathcal{F}_i \prec \mathcal{F}_j$, then $\xi_{\mathcal{F}_i} \le \xi_{\mathcal{F}_j}$. The case where $\gamma_1^{\mathcal{F}_i}, \ldots, \gamma_{n_{\mathcal{F}_i}}^{\mathcal{F}_i}$ are unknown constants (fixed effects) can be treated by letting $\sigma_{\mathcal{F}_i}^2 = \infty$, which leads to $\xi_{\mathcal{F}_j} = \infty$ if $\mathcal{F}_i \preceq \mathcal{F}_j$.

Example 2.1. A block design has the block structure $\mathfrak{B} = \{\mathcal{U}, \mathcal{B}, \mathcal{E}\}$. Let $y_{ij}$ be the observation on the $j$th unit in the $i$th block, $i = 1, \ldots, b$, $j = 1, \ldots, k$. Then under (2.7),

$$y_{ij} = \alpha_{t(i,j)} + \gamma^{\mathcal{U}} + \gamma_i^{\mathcal{B}} + \gamma_{i,j}^{\mathcal{E}},$$

where $t(i, j)$ is the label of the treatment assigned to the $j$th unit in the $i$th block. Write $\gamma^{\mathcal{U}}$ as $\mu$, $\gamma_i^{\mathcal{B}}$ as $\beta_i$, and $\gamma_{i,j}^{\mathcal{E}}$ as $\epsilon_{i,j}$. Then $\mu$, the $\beta_i$'s, and the $\epsilon_{i,j}$'s are independent random effects with zero means and $\operatorname{var}(\mu) = \sigma_{\mathcal{U}}^2$, $\operatorname{var}(\beta_i) = \sigma_{\mathcal{B}}^2$, $\operatorname{var}(\epsilon_{i,j}) = \sigma_{\mathcal{E}}^2$. Each eigenvector in $W_{\mathcal{B}}$ is orthogonal to the vector of 1's, with all the entries corresponding to the units in the same block being equal; thus it defines a between-block contrast. The eigenspace $W_{\mathcal{E}}$ is orthogonal to both $W_{\mathcal{U}}$ and $W_{\mathcal{B}}$.
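It follows that each of its vectors defines a between-unit contrast in the same block. We call $W_{\mathcal{B}}$ and $W_{\mathcal{E}}$ inter- and intra-block strata, respectively. For each $\mathbf{c} \in W_{\mathcal{B}}$ (respectively, $W_{\mathcal{E}}$), we have $\operatorname{var}(\mathbf{c}^T \mathbf{y} \mid \boldsymbol{\alpha}) = \|\mathbf{c}\|^2 \xi_{\mathcal{B}}$ (respectively, $\|\mathbf{c}\|^2 \xi_{\mathcal{E}}$). The two eigenvalues $\xi_{\mathcal{B}}$ and $\xi_{\mathcal{E}}$ are called inter- and intra-block variances, respectively. By (2.9), we have $\xi_{\mathcal{B}} = k\sigma_{\mathcal{B}}^2 + \sigma_{\mathcal{E}}^2$ and $\xi_{\mathcal{E}} = \sigma_{\mathcal{E}}^2$. Thus $\xi_{\mathcal{B}} \ge \xi_{\mathcal{E}}$.

The readers are referred to Chapter 5 of [4] for detailed derivations of strata for some simple block structures.

For each nonempty $S \subseteq \{1, \ldots, n\}$, let $\mathbf{u}_S$ be the column of $\mathbf{U}$ corresponding to $\beta_S$. Then under a regular fractional factorial design, $\beta_S$ is a treatment defining word if and only if $\mathbf{u}_S = \mathbf{1}_N$; otherwise, $\mathbf{1}_N^T \mathbf{u}_S = 0$. Thus the word count $A_k$ can be expressed as

$$A_k = \frac{1}{N} \sum_{S:\, |S| = k} \left\| \mathbf{P}_{W_{\mathcal{U}}} \mathbf{u}_S \right\|^2, \tag{2.10}$$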

where $\mathbf{P}_{W_{\mathcal{U}}}$ is the orthogonal projection matrix onto $W_{\mathcal{U}}$.

A regular multi-stratum fractional factorial design is said to be orthogonal if, for each column vector $\mathbf{u}$ of $\mathbf{U}$, we have $\mathbf{u} \in W_{\mathcal{F}_i}$ for some $i$. In this case, we say that the factorial effect corresponding to $\mathbf{u}$ (together with its aliases) is estimated in stratum $W_{\mathcal{F}_i}$. For example, the blocked regular designs constructed by the method described in Section 2.3 are orthogonal, with all the block defining effects estimated in the interblock stratum (we say that they are confounded with blocks) and the effects that are neither block nor treatment defining effects estimated in the intrablock stratum. In this case, for each nonempty $S \subseteq \{1, \ldots, n\}$, we have that $\beta_S$ is a block defining word if and only if $\mathbf{P}_{W_{\mathcal{B}}} \mathbf{u}_S = \mathbf{u}_S$, where $\mathbf{P}_{W_{\mathcal{B}}}$ is the orthogonal projection matrix onto $W_{\mathcal{B}}$. Let $B_k$ be the number of block defining words of length $k$; then similar to (2.10), we have

$$B_k = \frac{1}{N} \sum_{S:\, |S| = k} \left\| \mathbf{P}_{W_{\mathcal{B}}} \mathbf{u}_S \right\|^2. \tag{2.11}$$

We refer the readers to Chapters 13 and 14 of [4] for a detailed treatment of the construction of orthogonal regular multi-stratum designs and how to determine the factorial effects estimated in each stratum.

In general, for orthogonal regular multi-stratum designs, we define the word counts in various strata as follows: for $k = 0, \ldots, n$ and $i = 0, \ldots, m$, $B_{k,i}$ is the number of $S \subseteq \{1, \ldots, n\}$ such that $|S| = k$ and $\mathbf{u}_S \in W_{\mathcal{F}_i}$. Then

$$B_{k,i} = \frac{1}{N} \sum_{S:\, |S| = k} \left\| \mathbf{P}_{W_{\mathcal{F}_i}} \mathbf{u}_S \right\|^2. \tag{2.12}$$

Note that for regular fractional factorial designs with unstructured units, $(B_{1,0}, \ldots, B_{n,0})$ is the usual wordlength pattern $(A_1, \ldots, A_n)$. For nonregular and nonorthogonal designs, the word counts $B_{k,i}$ can be defined directly via (2.12). In the case of nonregular designs with unstructured units, this yields the generalized wordlength pattern in the minimum $G_2$-aberration criterion introduced by Tang and Deng [20].

3. Optimality criteria.
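Bayesian optimality criteria, such as the D- and A-criteria, are based on the posterior covariance matrix $\operatorname{cov}(\boldsymbol{\beta} \mid \mathbf{y})$. From (2.4) and (2.8), the posterior distribution of $\boldsymbol{\beta}$ given $\mathbf{y}$ can easily be obtained by standard results on multivariate normal distributions. We have that $\boldsymbol{\beta} \mid \mathbf{y}$ is normal with

$$\operatorname{cov}(\boldsymbol{\beta} \mid \mathbf{y}) = \boldsymbol{\Sigma}_{\beta} - \boldsymbol{\Sigma}_{\beta} \mathbf{U}^T \left(\mathbf{U} \boldsymbol{\Sigma}_{\beta} \mathbf{U}^T + \mathbf{V}\right)^{-1} \mathbf{U} \boldsymbol{\Sigma}_{\beta}, \tag{3.1}$$

where $\boldsymbol{\Sigma}_{\beta} = 2^{-2n} \sigma^2 \mathbf{P}^T \mathbf{R} \mathbf{P}$. We say that a design is Bayesian A-optimal if it minimizes $\operatorname{tr}(\operatorname{cov}(\boldsymbol{\beta} \mid \mathbf{y})) = \sum_{S \subseteq \{1, \ldots, n\}} \operatorname{var}(\beta_S \mid \mathbf{y})$, and Bayesian D-optimal if it minimizes $\det(\operatorname{cov}(\boldsymbol{\beta} \mid \mathbf{y}))$.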
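
The posterior covariance matrix in (3.1) and the two criterion values can be evaluated numerically for a small design. The sketch below is only an illustration under stated assumptions: it takes $n = 3$, the isotropic prior (2.6) with hypothetical values of $\tau^2$ and $r$, the half-fraction $x_1 + x_2 + x_3 = 0 \pmod 2$ with unstructured units, and hypothetical variance components. As a consistency check it compares (3.1) with the equivalent information form obtained from the Woodbury identity, which is a standard matrix result rather than something stated in the paper.

```python
import itertools
import numpy as np

n = 3
runs = np.array(list(itertools.product([0, 1], repeat=n)))
subsets = [S for k in range(n + 1)
           for S in itertools.combinations(range(n), k)]
signs = 2 * runs - 1
P = np.ones((2 ** n, len(subsets)))
for j, S in enumerate(subsets):
    for i in S:
        P[:, j] *= signs[:, i]

# Isotropic prior (2.6): var(beta_S) = tau^2 * r^{|S|} (hypothetical values).
tau2, r = 1.0, 0.3
Sigma_beta = np.diag([tau2 * r ** len(S) for S in subsets])

# Half-fraction x1 + x2 + x3 = 0 (mod 2); U consists of the matching rows of P.
in_design = runs.sum(axis=1) % 2 == 0
U = P[in_design]
N = U.shape[0]

# Unstructured units {U, E}: V = s2_U * J + s2_E * I (hypothetical values).
s2_U, s2_E = 0.2, 1.0
V = s2_U * np.ones((N, N)) + s2_E * np.eye(N)

def posterior_cov(Sigma, U, V):
    """Posterior covariance of beta given y, following (3.1)."""
    M = U @ Sigma @ U.T + V
    return Sigma - Sigma @ U.T @ np.linalg.solve(M, U @ Sigma)

C = posterior_cov(Sigma_beta, U, V)
a_value = np.trace(C)          # Bayesian A-criterion value
d_value = np.linalg.det(C)     # Bayesian D-criterion value

# Equivalent information form: (Sigma^{-1} + U^T V^{-1} U)^{-1}.
C_alt = np.linalg.inv(np.linalg.inv(Sigma_beta) + U.T @ np.linalg.solve(V, U))
```

Comparing `a_value` or `d_value` across candidate designs (different choices of `in_design` or of the assignment of runs to units) is then a direct, if brute-force, way to rank them for given prior and variance parameters.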

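For any treatment combination $x$, $\alpha(x)$ can be expressed as $\mathbf{p}_x^T \boldsymbol{\beta}$, where $\mathbf{p}_x^T$ is the row of $\mathbf{P}$ corresponding to $x$. Thus the best linear unbiased predictor of $\alpha(x)$ given $\mathbf{y}$ is $\mathbf{p}_x^T E(\boldsymbol{\beta} \mid \mathbf{y})$, with conditional prediction error

$$E\{[\mathbf{p}_x^T \boldsymbol{\beta} - \mathbf{p}_x^T E(\boldsymbol{\beta} \mid \mathbf{y})]^2 \mid \mathbf{y}\} = E\{[\boldsymbol{\beta} - E(\boldsymbol{\beta} \mid \mathbf{y})]^T \mathbf{p}_x \mathbf{p}_x^T [\boldsymbol{\beta} - E(\boldsymbol{\beta} \mid \mathbf{y})] \mid \mathbf{y}\}.$$

It follows that the total conditional prediction error over all the treatment combinations is equal to

$$\sum_{x \in \text{full factorial}} \operatorname{var}(\mathbf{p}_x^T \boldsymbol{\beta} \mid \mathbf{y}) = \sum_{x \in \text{full factorial}} E\{[\boldsymbol{\beta} - E(\boldsymbol{\beta} \mid \mathbf{y})]^T \mathbf{p}_x \mathbf{p}_x^T [\boldsymbol{\beta} - E(\boldsymbol{\beta} \mid \mathbf{y})] \mid \mathbf{y}\}$$

$$= E\{[\boldsymbol{\beta} - E(\boldsymbol{\beta} \mid \mathbf{y})]^T (\mathbf{P}^T \mathbf{P}) [\boldsymbol{\beta} - E(\boldsymbol{\beta} \mid \mathbf{y})] \mid \mathbf{y}\} = 2^n \sum_{S \subseteq \{1, \ldots, n\}} \operatorname{var}(\beta_S \mid \mathbf{y}),$$

where the last equality holds since $\mathbf{P}^T \mathbf{P} = 2^n \mathbf{I}_{2^n}$. Thus the Bayesian A-optimality is equivalent to minimizing the overall conditional prediction error of the treatment effects.

The following result gives an explicit form of $\operatorname{cov}(\boldsymbol{\beta} \mid \mathbf{y})$ for orthogonal regular fractional factorial designs.

Theorem 3.1. Under an orthogonal regular $2^{n-p}$ design, where the experimental units have a block structure $\mathfrak{B} = \{\mathcal{F}_i : i = 0, \ldots, m\}$ satisfying (2.1), (2.2), and (2.3), if $\beta_S$ is estimated in $W_{\mathcal{F}_i}$, then

$$\operatorname{var}(\beta_S \mid \mathbf{y}) = \operatorname{var}(\beta_S) - \frac{[\operatorname{var}(\beta_S)]^2}{\sum_{\beta:\, \beta \in \mathcal{A}(\beta_S)} \operatorname{var}(\beta) + 2^{-(n-p)} \xi_{\mathcal{F}_i}}. \tag{3.2}$$

If $\beta_S$ and $\beta_{S'}$ are in different alias sets, then $\operatorname{cov}(\beta_S, \beta_{S'} \mid \mathbf{y}) = 0$; if they are in the same alias set and are estimated in $W_{\mathcal{F}_i}$, then

$$\operatorname{cov}(\beta_S, \beta_{S'} \mid \mathbf{y}) = -\frac{\operatorname{var}(\beta_S) \operatorname{var}(\beta_{S'})}{\sum_{\beta:\, \beta \in \mathcal{A}(\beta_S)} \operatorname{var}(\beta) + 2^{-(n-p)} \xi_{\mathcal{F}_i}}.$$

Proof. We first deal with the term $(\mathbf{U} \boldsymbol{\Sigma}_{\beta} \mathbf{U}^T + \mathbf{V})^{-1}$ in (3.1). Note that $\mathbb{R}^N = \bigoplus_{\mathcal{F}:\, \mathcal{F} \in \mathfrak{B}} W_{\mathcal{F}}$, where $N = 2^{n-p}$. Also $\mathbb{R}^N = \bigoplus_{j=0}^{N-1} T_j$, where $T_0$ is spanned by the vector with all entries equal to 1, and each $T_j$, $1 \le j \le N - 1$, is spanned by a factorial effect contrast of $n - p$ basic factors. Hence we have $\bigoplus_{\mathcal{F}:\, \mathcal{F} \in \mathfrak{B}} W_{\mathcal{F}} = \bigoplus_{j=0}^{N-1} T_j$. Since the design is orthogonal, each column of $\mathbf{U}$ is a basis vector of $W_{\mathcal{F}_i}$ for some $i$. We can extract $N$ distinct columns of $\mathbf{U}$ and standardize them to unit length to form a $2^{n-p} \times 2^{n-p}$ orthogonal matrix $\tilde{\mathbf{P}}$. Then we have the following eigen-decomposition:

$$\mathbf{V} = \tilde{\mathbf{P}} \boldsymbol{\Lambda}_{\mathfrak{B}} \tilde{\mathbf{P}}^T, \tag{3.3}$$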

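where $\boldsymbol{\Lambda}_{\mathfrak{B}}$ is a diagonal matrix with a diagonal entry equal to $\xi_{\mathcal{F}_i}$ if the corresponding column in $\tilde{\mathbf{P}}$ is a basis vector of $W_{\mathcal{F}_i}$. On the other hand, restricting the Gaussian process to the treatment combinations in the regular fractional factorial design, analogous to (2.5), we have

$$\operatorname{cov}(\mathbf{X}_T \boldsymbol{\alpha}) = 2^{n-p}\, \tilde{\mathbf{P}} \boldsymbol{\Lambda}_{\beta} \tilde{\mathbf{P}}^T, \tag{3.4}$$

where $\boldsymbol{\Lambda}_{\beta}$ is a $2^{n-p} \times 2^{n-p}$ diagonal matrix with each diagonal entry equal to $\sum_{S \in \mathcal{A}} \operatorname{var}(\beta_S)$ for some alias set $\mathcal{A}$. Then since $\mathbf{U} \boldsymbol{\Sigma}_{\beta} \mathbf{U}^T = \operatorname{cov}(\mathbf{U} \boldsymbol{\beta}) = \operatorname{cov}(\mathbf{X}_T \boldsymbol{\alpha})$, by (3.4), we have

$$\mathbf{U} \boldsymbol{\Sigma}_{\beta} \mathbf{U}^T = 2^{n-p}\, \tilde{\mathbf{P}} \boldsymbol{\Lambda}_{\beta} \tilde{\mathbf{P}}^T. \tag{3.5}$$

It follows from (3.1), (3.3), and (3.5) that

$$\operatorname{cov}(\boldsymbol{\beta} \mid \mathbf{y}) = \boldsymbol{\Sigma}_{\beta} - \boldsymbol{\Sigma}_{\beta} \mathbf{U}^T \tilde{\mathbf{P}} \left(2^{n-p} \boldsymbol{\Lambda}_{\beta} + \boldsymbol{\Lambda}_{\mathfrak{B}}\right)^{-1} \tilde{\mathbf{P}}^T \mathbf{U} \boldsymbol{\Sigma}_{\beta}.$$

The theorem is proved by noting that the inner product of a column of $\mathbf{U}$ and a column of $\tilde{\mathbf{P}}$ is $2^{(n-p)/2}$ if they are in the same alias set and is zero otherwise.

Let $\boldsymbol{\xi}$ and $\mathbf{v}$ be the vectors of the $\xi_{\mathcal{F}_i}$'s and $\operatorname{var}(\beta_S)$'s, respectively. Since $\sum_{S:\, S \subseteq \{1, \ldots, n\}} \operatorname{var}(\beta_S)$ is a constant, by (3.2), among the orthogonal regular designs, a design $d$ is Bayesian A-optimal for $(\boldsymbol{\xi}, \mathbf{v})$ if and only if it maximizes

$$\Phi_A(d; \boldsymbol{\xi}, \mathbf{v}) = \sum_{S:\, S \subseteq \{1, \ldots, n\}} \frac{[\operatorname{var}(\beta_S)]^2}{\sum_{\beta:\, \beta \in \mathcal{A}(\beta_S)} \operatorname{var}(\beta) + e_S},$$

where $e_S = 2^{-(n-p)} \xi_{\mathcal{F}_i}$ if $\beta_S$ is estimated in $W_{\mathcal{F}_i}$. Under the prior distribution in (2.6), we have the following explicit form of $\Phi_A(d; \boldsymbol{\xi}, \mathbf{v})$.

Theorem 3.2. Under (2.6), for an orthogonal regular fractional factorial design $d$,

$$\Phi_A(d; \boldsymbol{\xi}, \mathbf{v}) = \sum_{j=0}^{2^{n-p}-1} \frac{\tau^4 \sum_{i=1}^{n} r^{2i} N_i^{(j)}}{\tau^2 \sum_{i=1}^{n} r^{i} N_i^{(j)} + e_j},$$

where $e_j = 2^{-(n-p)} \xi_{\mathcal{F}_s}$ if the effects in $\mathcal{A}_j$ are estimated in $W_{\mathcal{F}_s}$ and $N_i^{(j)}$ is the number of words of length $i$ in $\mathcal{A}_j$. In particular, $(N_1^{(0)}, \ldots, N_n^{(0)}) = (A_1, \ldots, A_n)$ is the usual wordlength pattern.

Theorem 3.3. Among the orthogonal regular $2^{n-p}$ fractional factorial designs, a design $d$ is Bayesian D-optimal if and only if it minimizes

$$\Phi_D(d; \boldsymbol{\xi}, \mathbf{v}) = \prod_{j=0}^{2^{n-p}-1} \frac{e_j}{\sum_{\beta:\, \beta \in \mathcal{A}_j} \operatorname{var}(\beta) + e_j}. \tag{3.6}$$

Under (2.6),

$$\Phi_D(d; \boldsymbol{\xi}, \mathbf{v}) = \prod_{j=0}^{2^{n-p}-1} \frac{e_j}{\tau^2 \sum_{i=1}^{n} r^{i} N_i^{(j)} + e_j}.$$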

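Proof. By (3.1), $\operatorname{cov}(\beta \mid y) = \Sigma_\beta W$, where $W = I_{2^n} - U^T(U\Sigma_\beta U^T + V)^{-1}U\Sigma_\beta$. Since $\det(\Sigma_\beta W) = \det(\Sigma_\beta)\det(W)$ and $\det(\Sigma_\beta) = \prod_{S:\, S \subseteq \{1,\dots,n\}} \operatorname{var}(\beta_S)$, which is a constant, it is sufficient to show that $\det(W)$ is equal to the quantity in (3.6). By Theorem 3.1, $\operatorname{cov}(\beta \mid y)$ is a block-diagonal matrix, and so is $W$. Let $W = \operatorname{diag}(W_0, \dots, W_{N-1})$, where each $W_j$ is a $2^p \times 2^p$ matrix corresponding to $A_j$. Then $\det(W) = \prod_{j=0}^{N-1} \det(W_j)$. If the effects in $A_j$ are estimated in $W_{F_i}$, then

$$W_j = \frac{1}{T_j}\begin{pmatrix}
T_j - x_1 & -x_2 & \cdots & -x_{2^p} \\
-x_1 & T_j - x_2 & \cdots & -x_{2^p} \\
\vdots & \vdots & \ddots & \vdots \\
-x_1 & -x_2 & \cdots & T_j - x_{2^p}
\end{pmatrix},$$

where $T_j = \sum_{\beta:\, \beta \in A_j} \operatorname{var}(\beta) + 2^{-(n-p)}\xi_{F_i}$ and the $x_l$'s are the $\operatorname{var}(\beta)$'s for $\beta \in A_j$. Adding all the other columns to the last column, factoring out the common entry, and then clearing the remaining off-diagonal entries, we obtain

$$\begin{aligned}
\det(W_j)
&= \frac{1}{(T_j)^{2^p}} \det\begin{pmatrix}
T_j - x_1 & -x_2 & \cdots & T_j - \sum_{i=1}^{2^p} x_i \\
-x_1 & T_j - x_2 & \cdots & T_j - \sum_{i=1}^{2^p} x_i \\
\vdots & \vdots & \ddots & \vdots \\
-x_1 & -x_2 & \cdots & T_j - \sum_{i=1}^{2^p} x_i
\end{pmatrix} \\
&= \frac{T_j - \sum_{i=1}^{2^p} x_i}{(T_j)^{2^p}} \det\begin{pmatrix}
T_j - x_1 & -x_2 & \cdots & 1 \\
-x_1 & T_j - x_2 & \cdots & 1 \\
\vdots & \vdots & \ddots & \vdots \\
-x_1 & -x_2 & \cdots & 1
\end{pmatrix} \\
&= \frac{T_j - \sum_{i=1}^{2^p} x_i}{(T_j)^{2^p}} \det\begin{pmatrix}
T_j & 0 & \cdots & 0 \\
0 & T_j & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix} \\
&= \frac{e_j}{\sum_{\beta:\, \beta \in A_j} \operatorname{var}(\beta) + e_j}.
\end{aligned}$$

We note that in implementing the criteria in Theorem 3.2 and Theorem 3.3 under (2.6), one needs to specify the parameters $\tau$, $r$ and the $\xi_{F_i}$'s. When some unit factors have fixed effects, we can define the A- and D-criteria by letting the corresponding stratum variances go to infinity and taking the limits of $\Phi_A(d; \xi, v)$ and $\Phi_D(d; \xi, v)$. This amounts to imposing improper priors on the corresponding fixed unit effects.

A useful result used in [7] is that for a Schur concave function $f(x_1, \dots, x_h)$, if $(x_1, \dots, x_h)$ is majorized by $(y_1, \dots, y_h)$, then $f(x_1, \dots, x_h) \ge f(y_1, \dots, y_h)$. In addition, if $f$ is increasing in each $x_i$, then a good surrogate for maximizing $f(x_1, \dots, x_h)$ is the (M.S)-criterion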

due to Eccleston and Hedayat [10]: maximize $\sum_{i=1}^{h} x_i$, and then minimize $\sum_{i=1}^{h} x_i^2$ among those that maximize $\sum_{i=1}^{h} x_i$.

4. An analytical result. General analytical results in the present setting are difficult to prove. For unstructured units, minimum aberration half fractions were shown to be Bayesian A- and D-optimal in [15]. In the case of blocked $2^{n-1}$ fractional factorial designs with two blocks of size $2^{n-2}$, we show below that, under some mild conditions, the minimum aberration designs according to a criterion of Cheng and Wu [9] are Bayesian D- and A-optimal. To construct a design, we need a treatment defining word to define the half fraction and another treatment interaction (block defining word) to divide the $2^{n-1}$ treatment combinations in the half fraction into two blocks. The block defining effect and its alias are then confounded with blocks. The other factorial effects are estimated in $W_E$. The following result can be used to help choose an optimal block defining word when the treatment defining word is fixed.

Lemma 4.1. Among the $2^{n-1}$ designs with two blocks of size $2^{n-2}$ and a given treatment defining word of length $L$, those that confound two aliased treatment factorial effects involving $n - \lceil L/2 \rceil$ and $n - \lfloor L/2 \rfloor$ factors with blocks are Bayesian A- and D-optimal under the prior specified in (2.6).

Proof. Let $d$ be a $2^{n-1}$ design with a treatment defining word of length $L$ and two blocks of size $2^{n-2}$. Under $d$, one pair of aliased words of lengths $u + x$ and $u + L - x$ are confounded with blocks, where $x \in \{0, 1, \dots, \lfloor L/2 \rfloor\}$ and $u \in \{0, 1, \dots, n - L\}$. Let $f(x, u) = \tilde{\Phi}_A(d; \xi, v) - \Phi_A(d; \xi, v)$, where $\tilde{\Phi}_A(d; \xi, v)$ is obtained from $\Phi_A(d; \xi, v)$ by replacing $\xi_B$ with $\xi_E$. Then since $\tilde{\Phi}_A(d; \xi, v)$ is a constant not depending on the block defining word, maximizing $\Phi_A(d; \xi, v)$ is the same as minimizing $f(x, u)$.
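We have

$$\begin{aligned}
f(x, u)
&= \frac{\tau^4 r^{2u}(r^{2x} + r^{2L-2x})}{\tau^2 r^u (r^x + r^{L-x}) + \xi_E/2^{n-1}} - \frac{\tau^4 r^{2u}(r^{2x} + r^{2L-2x})}{\tau^2 r^u (r^x + r^{L-x}) + \xi_B/2^{n-1}} \\
&= \frac{\xi_B - \xi_E}{2^{n-1}} \cdot \frac{(r^x + r^{L-x})^2 - 2r^L}{\left\{r^x + r^{L-x} + \frac{\xi_B/\tau^2}{2^{n-1} r^u}\right\}\left\{r^x + r^{L-x} + \frac{\xi_E/\tau^2}{2^{n-1} r^u}\right\}} \\
&= \frac{\xi_B - \xi_E}{2^{n-1}} \cdot \frac{1}{\left\{1 + \frac{1}{r^x + r^{L-x}} \cdot \frac{\xi_B/\tau^2}{2^{n-1} r^u}\right\}\left\{1 + \frac{1}{r^x + r^{L-x}} \cdot \frac{\xi_E/\tau^2}{2^{n-1} r^u}\right\}} \\
&\quad - \frac{\xi_B - \xi_E}{2^{n-1}} \cdot \frac{2r^L}{\left\{r^x + r^{L-x} + \frac{\xi_B/\tau^2}{2^{n-1} r^u}\right\}\left\{r^x + r^{L-x} + \frac{\xi_E/\tau^2}{2^{n-1} r^u}\right\}}.
\end{aligned} \tag{4.1}$$

It is easy to see from (4.1) that when $x$ is fixed, since $\xi_B > \xi_E$ and $0 < r < 1$, $f(x, u)$ attains the minimum at $u = n - L$, the largest possible value of $u$. Then from the last expression of $f(x, u)$ above, $f(x, n - L)$ is minimized at $x = \lfloor L/2 \rfloor$ since $r^x + r^{L-x}$ attains its minimum at $x = \lfloor L/2 \rfloor$. Thus $f(x, u)$ is minimized at $x = \lfloor L/2 \rfloor$ and $u = n - L$. Then we have $u + x = n - \lceil L/2 \rceil$ and $u + L - x = n - \lfloor L/2 \rfloor$.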
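The location of the minimum of $f(x, u)$ can also be checked numerically. The sketch below evaluates the first expression in (4.1) on the grid of admissible $(x, u)$ and returns the minimizer. The function names `f` and `argmin_f` and the parameter values are illustrative assumptions; the check complements the analytical argument and does not replace it.

```python
def f(x, u, n, L, xi_B, xi_E, tau2=1.0, r=1 / 3):
    """First expression in (4.1): loss in Phi_A from confounding a pair of
    aliased words of lengths u + x and u + L - x with blocks."""
    c = 2 ** (n - 1)
    num = tau2 ** 2 * r ** (2 * u) * (r ** (2 * x) + r ** (2 * L - 2 * x))
    den = tau2 * r ** u * (r ** x + r ** (L - x))
    return num / (den + xi_E / c) - num / (den + xi_B / c)


def argmin_f(n, L, xi_B, xi_E, **kw):
    """Grid search over x in {0,...,floor(L/2)} and u in {0,...,n-L}."""
    grid = [(x, u) for x in range(L // 2 + 1) for u in range(n - L + 1)]
    return min(grid, key=lambda xu: f(xu[0], xu[1], n, L, xi_B, xi_E, **kw))


# Hypothetical values with xi_B > xi_E and 0 < r < 1.
x_star, u_star = argmin_f(6, 3, xi_B=4.0, xi_E=1.0)
```

For $n = 6$ and $L = 3$ the search returns $x = 1 = \lfloor L/2 \rfloor$ and $u = 3 = n - L$, so the confounded pair has lengths 4 and 5.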

For the D-optimality, by Theorem 3.3, minimizing $\Phi_D(d; \xi, v)$ is equivalent to maximizing $\prod_{i=0}^{2^{n-1}-1} (v_i + \xi_i/2^{n-1})$, where each $v_i$ is of the form $\tau^2 r^u (r^x + r^{L-x})$, exactly one of the $\xi_i$'s is equal to $\xi_B$, and all the others are equal to $\xi_E$. Since $\prod_{i=0}^{2^{n-1}-1} (v_i + \xi_i/2^{n-1})$ is a Schur concave function of $(v_0 + \xi_0/2^{n-1}, \dots, v_{2^{n-1}-1} + \xi_{2^{n-1}-1}/2^{n-1})$, a blocking scheme is D-optimal if its $(v_0 + \xi_0/2^{n-1}, \dots, v_{2^{n-1}-1} + \xi_{2^{n-1}-1}/2^{n-1})$ is majorized by that of any other blocking scheme. Without loss of generality, suppose $v_0 \ge \cdots \ge v_{2^{n-1}-1}$. Then since $\xi_B > \xi_E$, it is easy to see that the blocking scheme with $\xi_{2^{n-1}-1} = \xi_B$ produces a $(v_0 + \xi_0/2^{n-1}, \dots, v_{2^{n-1}-1} + \xi_{2^{n-1}-1}/2^{n-1})$ that is majorized by that of any other blocking scheme. Thus it suffices to show that $r^u (r^x + r^{L-x})$ is minimized at $u = n - L$ and $x = \lfloor L/2 \rfloor$. This holds since $r^u (r^x + r^{L-x})$ is a decreasing function of both $u$ and $x$ ($\le L/2$), and $n - L$ and $\lfloor L/2 \rfloor$ are the largest possible values of $u$ and $x$, respectively.

Example 4.1. Among the orthogonal regular $2^{4-1}$ designs with two blocks of size 4 and treatment defining word 1 (respectively, 12, 123, or 1234), the one with the block defining word 234 (= 1234) (respectively, 134 (= 234), 124 (= 34), or 24 (= 13)) is the best under the Bayesian A- and D-criteria.

We now show that under some mild conditions the best design in Lemma 4.1 with $L = n$ (that is, a maximum resolution design) is a Bayesian D-optimal $2^{n-1}$ design with two blocks of size $2^{n-2}$.

Theorem 4.1. Let $d$ be the $2^{n-1}$ resolution $n$ design with two blocks of size $2^{n-2}$ that confounds two aliased treatment factorial effects involving $n - \lceil n/2 \rceil$ and $n - \lfloor n/2 \rfloor$ factors with blocks. For a given $h > 0$, suppose

1. $r \le (\sqrt{h^2 + 4} - h)^2/4$,
2. $\xi_B \le (2^{n-1}\tau^2 r^{n-1.5})h$.

Then $d$ is Bayesian D-optimal among all the orthogonal regular $2^{n-1}$ designs with two blocks of size $2^{n-2}$ under the prior specified in (2.6).

Remark 4.1.
It can be seen that the optimal design in Theorem 4.1 has minimum aberration with respect to the wordlength pattern $W_1$ proposed in [9]. We note that the assumption $\xi_B \le (2^{n-1}\tau^2 r^{n-1.5})h$ is equivalent to $\xi_B \le (2^n\tau^2 r^n)\left(\frac{r^{-1.5}}{2}\right)h$, where $2^n\tau^2 r^n$ is the prior variance of a normalized $n$-factor interaction. It can be seen that $f(h) = (\sqrt{h^2 + 4} - h)^2/4$, the upper bound on $r$ in Theorem 4.1, is decreasing in $h$. Therefore when $h$ is large (implying $\xi_B$ can be large), $d$ is expected to be optimal under smaller $r$'s. Kang and Joseph [14] argued that $r = 1/3$ is a good choice when there is no other prior knowledge. We have $f(1.15) \approx 1/3$, indicating that for $r = 1/3$, the result in Theorem 4.1 holds for $\xi_B \le (2^n\tau^2 r^n)\left(\frac{3^{1.5} \times 1.15}{2}\right) \approx 2.988\,(2^n\tau^2 r^n)$.

Proof of Theorem 4.1. It is sufficient to show that $\Phi_D(d_1; \xi, v) \le \Phi_D(d_2; \xi, v)$ for any two blocked $2^{n-1}$ designs $d_1$ and $d_2$ with treatment defining words $\{1, \dots, L\}$ and

$\{1, \dots, L-1\}$, respectively ($L \ge 3$), and the block defining words as prescribed in Lemma 4.1.

Let $H_1 = \{1, \dots, L\}$, $H_2 = \{1, \dots, L-1\}$, and $U_0 = \{I, H_1, H_2, H_1 \triangle H_2\}$, where $H_1 \triangle H_2 = (H_1 \cup H_2) \setminus (H_1 \cap H_2)$. Then all the $2^n$ words can be partitioned into $2^{n-2}$ classes, each of which other than $U_0$ is a coset of $U_0$ with respect to the group operation $\triangle$. Denote these cosets by $U_j$, $j = 1, \dots, 2^{n-2} - 1$. Then each $U_j$ consists of two pairs of aliased words under both $d_1$ and $d_2$. It follows from Theorem 3.3 that for $d = d_1$ or $d_2$, $\Phi_D(d; \xi, v) = \prod_{j=0}^{2^{n-2}-1} \Phi_{D,j}(d; \xi, v)$, where $\Phi_{D,j}(d; \xi, v)$ is the contribution of $U_j$ to the product in (3.6).

Under $d_1$, $H_1$ is aliased with $I$ (which corresponds to the mean). Thus $H_1$ and $I$ (of lengths $L$ and 0, respectively) are the only effects estimated in $W_U$. The other two words in $U_0$, $H_2$ and $H_1 \triangle H_2$, have lengths $L - 1$ and 1, respectively, and cannot be estimated in $W_B$, since, by Lemma 4.1, the two effects confounded with blocks have lengths $n - \lceil L/2 \rceil$ and $n - \lfloor L/2 \rfloor$, respectively. Thus $H_2$ and $H_1 \triangle H_2$ are estimated in $W_E$. Likewise, under $d_2$, $I$ and $H_2$ are estimated in $W_U$, and the other two effects in $U_0$, $H_1$ and $H_1 \triangle H_2$, are estimated in $W_E$.

Without loss of generality, suppose that under $d_1$, the two effects estimated in $W_B$, say $H$ and $H \triangle H_1$, are in $U_1$. By Lemma 4.1, we may assume that $|H| = n - \lceil L/2 \rceil$ and $|H \triangle H_1| = n - \lfloor L/2 \rfloor$. Then $|H \triangle H_1| = n - \lceil (L-1)/2 \rceil$ and $|H \triangle (H_1 \triangle H_2)| = n - \lfloor (L-1)/2 \rfloor$. It follows that $H \triangle H_1$ and $H \triangle (H_1 \triangle H_2)$ are the two words that are aliased and estimated in $W_B$ under $d_2$. Then $H \triangle (H_1 \triangle H_2)$ and $H \triangle H_2$ are aliased and estimated in $W_E$ under $d_1$, while $H$ and $H \triangle H_2$ are aliased and estimated in $W_E$ under $d_2$.

For all $j \ge 2$, all the effects in $U_j$ are estimated in $W_E$. Based on this and the information from the previous two paragraphs, one can write down $\Phi_{D,0}(d_1; \xi, v)$, $\Phi_{D,0}(d_2; \xi, v)$, $\Phi_{D,1}(d_1; \xi, v)$, and $\Phi_{D,1}(d_2; \xi, v)$.
When the experimental units are unstructured, Kerr [15] proved $\Phi_D(d_1; \xi, v) \le \Phi_D(d_2; \xi, v)$ by showing that $\Phi_{D,j}(d_1; \xi, v) \le \Phi_{D,j}(d_2; \xi, v)$ for all $j = 0, \dots, 2^{n-2} - 1$. Effectively she has shown that $\Phi_{D,j}(d_1; \xi, v) \le \Phi_{D,j}(d_2; \xi, v)$ for all $j \ge 2$ in our setting. Thus it is sufficient to show $\Phi_{D,j}(d_1; \xi, v) \le \Phi_{D,j}(d_2; \xi, v)$ for $j = 0, 1$.

For $\Phi_{D,0}(d_1; \xi, v) \le \Phi_{D,0}(d_2; \xi, v)$, we need to show

$$\left(1 + r^L + \frac{\xi_U}{2^{n-1}\tau^2}\right)\left(r + r^{L-1} + \frac{\xi_E}{2^{n-1}\tau^2}\right) \ge \left(1 + r^{L-1} + \frac{\xi_U}{2^{n-1}\tau^2}\right)\left(r + r^{L} + \frac{\xi_E}{2^{n-1}\tau^2}\right).$$

This inequality is equivalent to $\frac{\xi_U}{2^{n-1}\tau^2} - \frac{\xi_E}{2^{n-1}\tau^2} \ge -1 + r$, which always holds since $\xi_U \ge \xi_E$ and $r < 1$.

For $\Phi_{D,1}(d_1; \xi, v) \le \Phi_{D,1}(d_2; \xi, v)$ when $L$ is odd, we need to show

$$(r^{-1/2} + r^{1/2} + \delta_B)(r^{-1/2} + r^{1/2} + \delta_E) \ge (2r^{1/2} + \delta_B)(2r^{-1/2} + \delta_E),$$

where

$$\delta_B = \frac{2^{-(n-1)}\xi_B}{\tau^2 r^{\,n - L/2}} \quad\text{and}\quad \delta_E = \frac{2^{-(n-1)}\xi_E}{\tau^2 r^{\,n - L/2}}.$$

This is equivalent to

$$1 - r - (\delta_B - \delta_E)\, r^{1/2} \ge 0. \tag{4.2}$$

It follows from the assumption $\xi_B \le \{2^{n-1}\tau^2 r^{n-1.5}\}h$ that $0 < \delta_E \le \delta_B \le h$. Then a sufficient condition for (4.2) is $1 - r - h r^{1/2} \ge 0$. This holds if $r \le (\sqrt{h^2 + 4} - h)^2/4$.

For $\Phi_{D,1}(d_1; \xi, v) \le \Phi_{D,1}(d_2; \xi, v)$ when $L$ is even, we need to show

$$(2 + \delta_B)(r^{-1} + r + \delta_E) \ge (1 + r + \delta_B)(1 + r^{-1} + \delta_E),$$

which can be simplified as

$$1 - r - r(\delta_B - \delta_E) \ge 0. \tag{4.3}$$

Since $r^{1/2} > r$, (4.3) is implied by (4.2).

We can also establish a version of Theorem 4.1 for the Bayesian A-criterion by replacing the conditions $\Phi_{D,i}(d_1; \xi, v) \le \Phi_{D,i}(d_2; \xi, v)$, $i = 0, 1$, in the proof with $\Phi_{A,i}(d_1; \xi, v) \ge \Phi_{A,i}(d_2; \xi, v)$, $i = 0, 1$.

5. More complicated block structures: a surrogate criterion. For a setting as simple as regular half fractions with two blocks, the proof of Theorem 4.1 is already quite tedious. The determination of optimal designs with more complicated block structures is very challenging. In this section, we derive a more manageable two-stage surrogate criterion that can be applied to nonregular designs as well. Notice that the popular minimum aberration criterion is itself a surrogate for another statistically more meaningful criterion; see [5]. Interestingly the first stage of our surrogate criterion can be expressed in terms of wordlength and generalized wordlength patterns for regular and nonregular designs, respectively. The surrogate criterion depends on the stratum variances, but as in [7], a useful tool is developed for eliminating many inferior designs without having to know the stratum variances; then one only needs to select from a much smaller essentially complete class.

We first note that another expression of $\operatorname{cov}(\beta \mid y)$ instead of (3.1) is

$$\operatorname{cov}(\beta \mid y) = \left(U^T V^{-1} U + \Sigma_\beta^{-1}\right)^{-1} = \left(\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} U^T P_{W_{F_i}} U + \Sigma_\beta^{-1}\right)^{-1}.$$

Thus we have

$$\det[\operatorname{cov}(\beta \mid y)^{-1}] = \det(\Sigma_\beta^{-1}) \det\left(\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \Sigma_\beta U^T P_{W_{F_i}} U + I_{2^n}\right).$$

Since $\det(\Sigma_\beta^{-1})$ is a constant, maximizing $\det[\operatorname{cov}(\beta \mid y)^{-1}]$ is equivalent to maximizing $\det\left(\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \Sigma_\beta U^T P_{W_{F_i}} U + I_{2^n}\right)$. A design is said to be Bayesian (M.S)-optimal if it maximizes $\operatorname{tr}\left(\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \Sigma_\beta U^T P_{W_{F_i}} U + I_{2^n}\right)$ and minimizes $\operatorname{tr}\left[\left(\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \Sigma_\beta U^T P_{W_{F_i}} U + I_{2^n}\right)^2\right]$ among the designs that maximize $\operatorname{tr}\left(\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \Sigma_\beta U^T P_{W_{F_i}} U + I_{2^n}\right)$. Since $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ for any two compatible matrices $A$ and $B$, and $P^2 = P = P^T$ for any orthogonal projection matrix $P$, we have

$$\begin{aligned}
\operatorname{tr}\left(\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \Sigma_\beta U^T P_{W_{F_i}} U + I_{2^n}\right)
&= \sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \operatorname{tr}\left[\Sigma_\beta^{1/2} (P_{W_{F_i}} U \Sigma_\beta^{1/2})^T (P_{W_{F_i}} U)\right] + 2^n \\
&= \sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \operatorname{tr}\left[(P_{W_{F_i}} U \Sigma_\beta^{1/2})^T (P_{W_{F_i}} U \Sigma_\beta^{1/2})\right] + 2^n.
\end{aligned}$$

If all the factorial effects involving $k$ factors have identical prior variance $v_k$, $k = 0, 1, \dots, n$ (say, under (2.6)), then

$$\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \operatorname{tr}\left[(P_{W_{F_i}} U \Sigma_\beta^{1/2})^T (P_{W_{F_i}} U \Sigma_\beta^{1/2})\right] = \sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \sum_{k=0}^{n} v_k \sum_{S:\, |S| = k} \|P_{W_{F_i}} u_S\|^2.$$

Since $\sum_{i=0}^{m} P_{W_{F_i}} = I_N$, one can replace $P_{W_{F_m}}$ with $I_N - \sum_{i=0}^{m-1} P_{W_{F_i}}$. Then

$$\begin{aligned}
\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \sum_{k=0}^{n} v_k \sum_{S:\, |S| = k} \|P_{W_{F_i}} u_S\|^2
&= c - \sum_{i=0}^{m-1} \left(\frac{1}{\xi_{F_m}} - \frac{1}{\xi_{F_i}}\right) \sum_{k=0}^{n} v_k \sum_{S:\, |S| = k} \|P_{W_{F_i}} u_S\|^2 \\
&= c - N \sum_{i=0}^{m-1} \left(\frac{1}{\xi_{F_m}} - \frac{1}{\xi_{F_i}}\right) \sum_{k=0}^{n} v_k B_{k,i}(d),
\end{aligned}$$

where $c = \frac{1}{\xi_{F_m}} \sum_{k=0}^{n} v_k \sum_{S:\, |S| = k} \|u_S\|^2$, which is a constant, and $B_{k,i}(d)$ is as defined in (2.12). Furthermore, since $B_{0,0}(d) = 1$ and $B_{0,i}(d) = 0$ for all $i > 0$, maximizing $\operatorname{tr}\left(\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \Sigma_\beta U^T P_{W_{F_i}} U + I_{2^n}\right)$ is equivalent to minimizing

$$\Phi_M(d; \xi, v) = \sum_{i=0}^{m-1} \left(\frac{1}{\xi_{F_m}} - \frac{1}{\xi_{F_i}}\right) \sum_{k=1}^{n} v_k B_{k,i}(d). \tag{5.1}$$

For the designs that minimize $\Phi_M(d; \xi, v)$, minimizing $\operatorname{tr}\left[\left(\sum_{i=0}^{m} \frac{1}{\xi_{F_i}} \Sigma_\beta U^T P_{W_{F_i}} U + I_{2^n}\right)^2\right]$ reduces to minimizing

$$\Phi_S(d; \xi, v) = \sum_{i=0}^{m} \frac{1}{\xi_{F_i}^2} \operatorname{tr}\left[(\Sigma_\beta U^T P_{W_{F_i}} U)^2\right] + 2 \sum_{0 \le l < s \le m} \frac{1}{\xi_{F_l}\xi_{F_s}} \operatorname{tr}\left[(\Sigma_\beta U^T P_{W_{F_l}} U)(\Sigma_\beta U^T P_{W_{F_s}} U)\right].$$

Thus a Bayesian (M.S)-optimal design $d$ minimizes $\Phi_M(d; \xi, v)$ and minimizes $\Phi_S(d; \xi, v)$ among those that minimize $\Phi_M(d; \xi, v)$. Note that $\Phi_M(d; \xi, v)$ can be expressed in terms of the generalized wordlength patterns.
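Once the counts $B_{k,i}(d)$ are available, (5.1) is a short weighted sum. The sketch below is illustrative only: the function name `phi_M`, the list layout of its inputs, and the numerical stratum variances are assumptions, and for a regular design $B_{k,i}(d)$ is taken to be the number of words of length $k$ estimated in $W_{F_i}$. A fixed-effect stratum can be passed as `float("inf")`, since `1 / float("inf")` evaluates to `0.0` in Python.

```python
def phi_M(B, xi, v):
    """Evaluate (5.1).

    B[i][k-1] : B_{k,i}(d) for strata i = 0,...,m-1 and k = 1,...,n
    xi[i]     : stratum variance xi_{F_i}, i = 0,...,m (xi[m] is the finest)
    v[k-1]    : prior variance v_k of an effect involving k factors
    """
    m = len(xi) - 1
    total = 0.0
    for i in range(m):
        weight = 1.0 / xi[m] - 1.0 / xi[i]
        total += weight * sum(vk * b for vk, b in zip(v, B[i]))
    return total


# Two blockings of the 2^(4-1) design with I = 123 (strata U, B, E).
r = 1 / 3
v = [r ** k for k in range(1, 5)]        # assumes v_k = r^k, tau = 1
xi = [10.0, 4.0, 1.0]                    # hypothetical xi_U, xi_B, xi_E
B_a = [[0, 0, 1, 0], [0, 1, 1, 0]]       # blocks confound 34 = 124
B_b = [[0, 0, 1, 0], [1, 0, 0, 1]]       # blocks confound 4 = 1234
value_a = phi_M(B_a, xi, v)
value_b = phi_M(B_b, xi, v)
```

Here `value_a` is smaller than `value_b`, so the first blocking is preferred at the M-step for these inputs.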

Under orthogonal regular designs, $\Phi_S(d; \xi, v)$ can further be simplified. It can be seen that in this case $(\Sigma_\beta U^T P_{W_{F_l}} U)(\Sigma_\beta U^T P_{W_{F_s}} U)$ is a zero matrix when $l \ne s$. Thus we have

$$\Phi_S(d; \xi, v) = \sum_{i=0}^{m} \frac{1}{\xi_{F_i}^2} \operatorname{tr}\left[(\Sigma_\beta U^T P_{W_{F_i}} U)^2\right].$$

For $i = 1, \dots, m$, let $h_i$ be the number of alias sets estimated in $W_{F_i}$, and $N_k^{(i,j)}(d)$ be the number of words of length $k$ in the $j$th alias set estimated in $W_{F_i}$, $j = 1, \dots, h_i$. For $i = 0$, let $h_0 = 1$ and $N_k^{(0,1)}(d) = B_{k,0}(d)$. Then $B_{k,i}(d) = \sum_{j=1}^{h_i} N_k^{(i,j)}(d)$. Some routine calculation yields

$$\Phi_S(d; \xi, v) = \sum_{i=0}^{m} \frac{N^2}{\xi_{F_i}^2} \sum_{j=1}^{h_i} \left[\sum_{k=0}^{n} v_k N_k^{(i,j)}(d)\right]^2. \tag{5.2}$$

Furthermore, if $d$ is a complete factorial design, then $B_{k,0}(d) = 0$ for $k \ge 1$ and $\sum_{k=0}^{n} N_k^{(i,j)}(d) = 1$ for all $i, j$. It follows that $\sum_{j=1}^{h_i} \left[\sum_{k=0}^{n} v_k N_k^{(i,j)}(d)\right]^2 = \sum_{k=0}^{n} v_k^2 B_{k,i}(d)$. Then since $\sum_{i=0}^{m} B_{k,i}(d) = \binom{n}{k}$, by (5.2) we have

$$\Phi_S(d; \xi, v) = c - N^2 \sum_{i=1}^{m-1} \left(\frac{1}{\xi_{F_m}^2} - \frac{1}{\xi_{F_i}^2}\right) \sum_{k=1}^{n} v_k^2 B_{k,i}(d), \tag{5.3}$$

where $c$ is a constant. Thus an orthogonal multi-stratum complete factorial design is Bayesian (M.S)-optimal if it maximizes $\sum_{i=1}^{m-1} \left(\frac{1}{\xi_{F_m}^2} - \frac{1}{\xi_{F_i}^2}\right) \sum_{k=1}^{n} v_k^2 B_{k,i}(d)$ among those that minimize $\sum_{i=1}^{m-1} \left(\frac{1}{\xi_{F_m}} - \frac{1}{\xi_{F_i}}\right) \sum_{k=1}^{n} v_k B_{k,i}(d)$.

Proposition 5.1. In the case of unstructured experimental units, a regular design minimizes $\Phi_M(d; \xi, v)$ if and only if it minimizes $\sum_{k=1}^{n} v_k B_{k,0}(d)$. If $v_k \gg v_{k+1}$ for all $k$, then a good surrogate is to sequentially minimize $B_{1,0}(d), \dots, B_{n,0}(d)$; that is, the usual minimum aberration criterion.

Proof. For unstructured units, $\mathcal{B} = \{F_0, F_1\}$. By (5.1), minimizing $\Phi_M(d; \xi, v)$ is equivalent to minimizing $\sum_{k=1}^{n} v_k B_{k,0}(d)$, where $(B_{1,0}(d), \dots, B_{n,0}(d))$ is the wordlength pattern of $d$. The condition $v_k \gg v_{k+1}$ for all $k$ means that the lower-order effects are much more important than the higher-order ones.
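Sequential minimization of $B_{1,0}(d), \dots, B_{n,0}(d)$ is a lexicographic comparison of wordlength patterns, which Python tuples perform directly. The sketch below computes the wordlength pattern of a regular design from its generators and compares two $2^{6-2}$ designs. The helper names `word`, `wordlength_pattern` and `weighted_sum`, the choice of generators, and the weights $v_k = \tau^2 r^k$ are illustrative assumptions.

```python
def word(s):
    """Encode a word such as '1235' as a bitmask (factor i -> bit i-1)."""
    return sum(1 << (int(ch) - 1) for ch in s)


def wordlength_pattern(n, generators):
    """(A_1, ..., A_n) for the defining contrast subgroup of the generators."""
    group = {0}
    for w in generators:
        group |= {g ^ w for g in group}
    pattern = [0] * n
    for g in group - {0}:
        pattern[bin(g).count("1") - 1] += 1
    return tuple(pattern)


def weighted_sum(pattern, r=1 / 3, tau2=1.0):
    """sum_k v_k A_k with the assumed weights v_k = tau2 * r**k."""
    return sum(tau2 * r ** k * a for k, a in enumerate(pattern, start=1))


d1 = wordlength_pattern(6, [word("1235"), word("1246")])
d2 = wordlength_pattern(6, [word("125"), word("346")])
has_less_aberration = d1 < d2     # tuple comparison is lexicographic
```

In this example the design with generators 1235 and 1246 has no words of length three, so it has less aberration than the second design, and its weighted sum with $r = 1/3$ is also smaller.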
For unstructured units, Proposition 5.1 formally connects our criterion with minimum aberration, a popular criterion under the commonly made effect hierarchy assumption when the prior knowledge about the underlying system is vague. Similarly, for nonregular designs, generalized minimum aberration is a good surrogate for

minimizing $\Phi_M(d; \xi, v)$. Thus the Bayesian (M.S)-criterion developed here can be viewed as a refinement of minimum and generalized minimum aberration. One may also need to carry out the S-step. Under the same assumption as in Proposition 5.1, it was shown in [12] that minimum aberration designs minimize the posterior variance of the mean (intercept), but this result cannot be extended to nonregular designs. As mentioned earlier, minimum aberration regular half-fractions were shown to be Bayesian D- and A-optimal in [15].

Blocking of complete $2^n$ factorial designs is related to the construction of $2^{n-p}$ regular fractional factorial designs in that a defining contrast subgroup that defines a fractional factorial design divides the $2^n$ treatment combinations into $2^p$ sets of size $2^{n-p}$. Each of the $2^p$ sets can be used as a $2^{n-p}$ fractional factorial design. On the other hand, these sets together form $2^p$ blocks of a complete factorial. The treatment defining words of a fractional factorial design are those that are confounded with blocks in the corresponding blocking of the complete factorial. Under the hierarchical assumption, it is desirable not to confound too many lower-order effects with blocks. Sun, Wu, and Chen [19] extended the minimum aberration criterion from fractional factorial designs to blocked complete factorial designs. A blocked complete factorial design is said to have minimum aberration if it sequentially minimizes $B_1, B_2, \dots$, where $B_i$ is the number of block defining words of length $i$. A result similar to Proposition 5.1 holds for optimal blocking of complete factorial designs.

Proposition 5.2. Suppose the $2^n$ treatment combinations in a complete factorial design are to be divided into $2^p$ blocks of size $2^{n-p}$. If $v_k \gg v_{k+1}$ for all $k$, then minimum aberration is a good surrogate for the Bayesian (M.S)-optimality over orthogonal designs.

Proof. In this case, $m = 2$.
By the discussions preceding Proposition 5.1, since $\xi_{F_1} > \xi_{F_2}$, the M-step is to minimize $\sum_{k=1}^{n} v_k B_{k,1}(d)$ and the S-step is to maximize $\sum_{k=1}^{n} v_k^2 B_{k,1}(d)$. When $v_k \gg v_{k+1}$ for all $k$, minimizing $\sum_{k=1}^{n} v_k B_{k,1}(d)$ is achieved by sequentially minimizing $B_{1,1}(d), B_{2,1}(d), \dots$, i.e., minimum aberration. By the same assumption, the designs that minimize $\sum_{k=1}^{n} v_k B_{k,1}(d)$ have the same $B_{k,1}$ values. It follows that they also have the same value of $\sum_{k=1}^{n} v_k^2 B_{k,1}(d)$. Thus the S-step is not needed.

In the following theorem we provide a necessary and sufficient condition for a design to minimize $\Phi_M(d; \xi, v)$ for all feasible stratum variances. Here $\xi$ is said to be feasible if $F_i \prec F_j \Rightarrow \xi_{F_i} \le \xi_{F_j}$.

Theorem 5.1. Suppose $\mathcal{B}$ is a block structure satisfying (2.1), (2.2), and (2.3). Then a necessary and sufficient condition for a design $d$ to minimize $\Phi_M(d; \xi, v)$ for all feasible $\xi$ is that it minimizes $\sum_{i:\, F_i \in G} \sum_{k=1}^{n} v_k B_{k,i}(d)$ for all subsets $G$ of $\mathcal{B} \setminus \{F_m\}$ such that

$$F \in G,\ F' \in \mathcal{B},\ \text{and } F \prec F' \ \Rightarrow\ F' \in G. \tag{5.4}$$

We first show the necessity part since it is straightforward and is useful for understanding the implications and utilities of the theorem. The proof of the sufficiency part will be

deferred to the end of this section. Suppose $d$ minimizes $\Phi_M(d; \xi, v)$ for all feasible $\xi$. For any $G \subseteq \mathcal{B} \setminus \{F_m\}$ satisfying (5.4), let $\xi$ be such that $\xi_F = \xi_{F_0}$ for all $F \in G$ and $\xi_F = \xi_{F_m}$ for all $F \notin G$. Then since $\xi_{F_m} < \xi_{F_0}$, it is easy to see that such a $\xi$ is feasible. In this case,

$$\Phi_M(d; \xi, v) = \left(\frac{1}{\xi_{F_m}} - \frac{1}{\xi_{F_0}}\right) \sum_{i:\, F_i \in G} \sum_{k=1}^{n} v_k B_{k,i}(d). \tag{5.5}$$

It follows that $d$ minimizes $\sum_{i:\, F_i \in G} \sum_{k=1}^{n} v_k B_{k,i}(d)$. This proves the necessity part.

By (5.5), we can interpret the conclusion of Theorem 5.1 as that $d$ minimizes $\Phi_M(d; \xi, v)$ for all feasible $\xi$ if and only if, for all $G$ satisfying (5.4), it minimizes $\Phi_M(d; \xi, v)$ for the cases with $\xi_F = \xi_{F_0}$ for all $F \in G$ and $\xi_F = \xi_{F_m}$ for all $F \notin G$. Since $\sum_{i:\, F_i \in G} \sum_{k=1}^{n} v_k B_{k,i}(d)$ does not depend on $\xi$ and only involves the unit factors in $G$, it is enough to consider only the case $\xi_F = \xi_{F_0} = \infty$ for all $F \in G$ and $\xi_F = \xi_{F_m}$ for all $F \notin G$. In other words, in verifying the necessary and sufficient condition, one only needs to consider the block structures $G \cup \{E\}$ (note that $F_m = E$), where all the unit factors in $G$ have fixed effects. Thus by checking the minimization of $\Phi_M(d; \xi, v)$ for a small number of extreme cases with fixed unit effects (and no knowledge about $\xi$ is needed to do this), one is able to conclude the strong property of minimizing $\Phi_M(d; \xi, v)$ for all feasible $\xi$.

Even if a design with such a strong optimality property does not exist, suppose $\sum_{i:\, F_i \in G} \sum_{k=1}^{n} v_k B_{k,i}(d_1) \le \sum_{i:\, F_i \in G} \sum_{k=1}^{n} v_k B_{k,i}(d_2)$ for all subsets $G$ of $\mathcal{B} \setminus \{F_m\}$ satisfying (5.4), with strict inequality for at least one such $G$; then $d_2$ is worse than $d_1$ and we say that $d_2$ is inadmissible. This provides a simple way of eliminating inferior designs and substantially reducing the number of designs that need to be considered. Since many designs are ruled out from consideration, one can also use the actual Bayesian A- and D-criterion values to compare the remaining designs.
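The admissibility screen described above needs only the counts $B_{k,i}(d)$ and the nesting order of the unit factors, not the stratum variances. The following sketch enumerates the subsets $G$ satisfying (5.4) and tests whether one design dominates another. The names `upper_sets`, `score` and `dominates`, the dictionary encoding of the nesting relation, and the example counts are assumptions made for illustration.

```python
from itertools import combinations


def upper_sets(factors, coarser):
    """Nonempty subsets G of `factors` (that is, B without F_m) such that
    F in G and F' strictly coarser than F imply F' in G, as in (5.4).
    coarser[F] is the set of factors strictly coarser than F."""
    result = []
    for size in range(1, len(factors) + 1):
        for G in combinations(factors, size):
            if all(coarser[F] <= set(G) for F in G):
                result.append(G)
    return result


def score(design, G, v):
    """sum over F in G and k >= 1 of v_k * B_{k,F}(d); design[F][k-1] = B_{k,F}."""
    return sum(vk * b for F in G for vk, b in zip(v, design[F]))


def dominates(d1, d2, factors, coarser, v, tol=1e-12):
    """True if d1 is no worse than d2 for every admissible G and strictly
    better for at least one, so that d2 is inadmissible."""
    diffs = [score(d2, G, v) - score(d1, G, v)
             for G in upper_sets(factors, coarser)]
    return all(x >= -tol for x in diffs) and any(x > tol for x in diffs)


# Blocked experiment {U, B, E}: E is finest, B is nested in U.
factors = ["U", "B"]
coarser = {"U": set(), "B": {"U"}}
v = [(1 / 3) ** k for k in range(1, 5)]            # assumed v_k = r^k
design_1 = {"U": [0, 0, 1, 0], "B": [0, 1, 1, 0]}  # I = 123, blocks 34 = 124
design_2 = {"U": [0, 0, 1, 0], "B": [1, 0, 0, 1]}  # I = 123, blocks 4 = 1234
```

For a blocked experiment the admissible subsets are `("U",)` and `("U", "B")`, matching the fixed-block and unstructured-unit extremes, and `dominates(design_1, design_2, factors, coarser, v)` returns `True` for these counts.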
By (5.3) and the same argument as in the proof of Theorem 5.1, the following result for complete factorial designs can be established.

Theorem 5.2. Let $d$ be an orthogonal $2^n$ complete factorial design with a block structure $\mathcal{B}$ satisfying (2.1), (2.2), and (2.3). Then $d$ is Bayesian (M.S)-optimal for all feasible $\xi$ if and only if, for all subsets $G$ of $\mathcal{B} \setminus \{F_m\}$ satisfying (5.4), $d$ minimizes $\Phi_M(d; \xi, v)$ and minimizes $\Phi_S(d; \xi, v)$ among the designs that minimize $\Phi_M(d; \xi, v)$, where $\xi_{F_i} = \infty$ for all $F_i \in G$ and $\xi_{F_i} = \xi_{F_m}$ for all $F_i \notin G$.

Example 5.1 (A chain of nested unit factors). Suppose $\mathcal{B} = \{F_0, \dots, F_m\}$, where $F_i \prec F_j$ for all $i > j$. The $G$'s that satisfy (5.4) are the $m$ sets $\{F_0\}, \{F_0, F_1\}, \dots, \{F_0, \dots, F_{m-1}\}$. It follows from Theorem 5.1 that a design $d$ minimizes $\Phi_M(d; \xi, v)$ for all $\xi$ such that $i > j \Rightarrow \xi_{F_i} \le \xi_{F_j}$ provided that it minimizes $\sum_{i=0}^{l} \sum_{k=1}^{n} v_k B_{k,i}(d)$ for each $l = 0, 1, \dots, m-1$. By the discussions preceding Theorem 5.2, minimizing $\sum_{i=0}^{l} \sum_{k=1}^{n} v_k B_{k,i}(d)$ is the same as minimizing $\Phi_M(d; \xi, v)$ under the block structure $\mathcal{B}$ with $\xi_{F_0} = \cdots = \xi_{F_l} = \infty$ and $\xi_{F_{l+1}} = \cdots = \xi_{F_m}$. It can be seen that (i) for $l > 0$, this is equivalent to minimizing $\Phi_M(d; \xi, v)$ under the block structure $\mathcal{B}_l = \{F_0, F_l, F_m\}$ with $\xi_{F_l} = \xi_{F_0} = \infty$, i.e., an

experiment with fixed block effects, where each block consists of the units with the same level of $F_l$, and (ii) for $l = 0$, it is equivalent to minimizing $\Phi_M(d; \xi, v)$ under the block structure $\mathcal{B}_0 = \{F_0, F_m\}$, i.e., an experiment with unstructured units. In particular, for $\mathcal{B} = \{E, B, U\}$, a blocked factorial experiment, if a design $d$ minimizes $\Phi_M(d; \xi, v)$ for an experiment with fixed block effects ($\xi_B = \infty$) as well as an experiment with unstructured units ($\xi_B = \xi_E$), then it minimizes $\Phi_M(d; \xi, v)$ for all $\xi_B \ge \xi_E$.

Under $v_1 \gg \cdots \gg v_n$, a good surrogate for minimizing $\Phi_M(d; \xi, v)$ is to sequentially minimize $\sum_{i=0}^{m-1} \left(\frac{1}{\xi_{F_m}} - \frac{1}{\xi_{F_i}}\right) B_{1,i}(d), \dots, \sum_{i=0}^{m-1} \left(\frac{1}{\xi_{F_m}} - \frac{1}{\xi_{F_i}}\right) B_{n,i}(d)$. Thus the minimization of $\Phi_M(d; \xi, v)$ leads to a minimum aberration criterion based on the wordlength pattern

$$\left(\sum_{i=0}^{m-1} \left(\frac{1}{\xi_{F_m}} - \frac{1}{\xi_{F_i}}\right) B_{1,i}(d),\ \dots,\ \sum_{i=0}^{m-1} \left(\frac{1}{\xi_{F_m}} - \frac{1}{\xi_{F_i}}\right) B_{n,i}(d)\right). \tag{5.6}$$

Such a criterion depends on the stratum variances. On the other hand, a good surrogate for minimizing $\sum_{i:\, F_i \in G} \sum_{k=1}^{n} v_k B_{k,i}(d)$ is to sequentially minimize $\sum_{i:\, F_i \in G} B_{1,i}(d), \dots, \sum_{i:\, F_i \in G} B_{n,i}(d)$. For each subset $G$ of $\mathcal{B} \setminus \{F_m\}$ satisfying (5.4), this induces a minimum aberration criterion based on the wordlength pattern

$$\left(\sum_{i:\, F_i \in G} B_{1,i}(d),\ \dots,\ \sum_{i:\, F_i \in G} B_{n,i}(d)\right). \tag{5.7}$$

Similar to Theorem 5.1, the following holds for such surrogate minimum aberration criteria.

Corollary 5.1. A necessary and sufficient condition for a design $d$ to have minimum aberration with respect to the wordlength pattern (5.6) for all feasible $\xi$ is that it has minimum aberration with respect to (5.7) for all subsets $G$ of $\mathcal{B} \setminus \{F_m\}$ satisfying (5.4).

It follows that if $d_1$ is at least as good as $d_2$ with respect to the MA criterion based on (5.7) for all subsets $G$ of $\mathcal{B} \setminus \{F_m\}$ satisfying (5.4), and is better than $d_2$ for at least one such $G$, then $d_2$ is dominated by $d_1$ for feasible $\xi$'s with respect to the MA criterion based on (5.6) and can be ruled out from consideration.
The conclusion drawn for experiments with a chain of nested unit factors (in particular, blocked factorial experiments) studied in Example 5.1 also applies to the surrogate MA criteria formulated above.

Remark 5.1. Theorem 5.1 has a similar flavor to Theorem 4.1 of [7], which provided a necessary and sufficient condition for a design to be optimal with respect to a surrogate for the maximum information capacity criterion considered there for all feasible $\xi$. In applying Theorem 4.1 of [7], one has to check certain conditions for all subsets $G$ of $\mathcal{B} \setminus \{F_0\}$ such that

$$F \in G,\ F' \in \mathcal{B}\ \text{and } F' \prec F \ \Rightarrow\ F' \in G. \tag{5.8}$$