Optimal design of experiments
Session 4: Some theory
Peter Goos

Optimal design theory
- continuous or approximate optimal designs
  - implicitly assume that an infinitely large number of observations is available
  - mathematically convenient
- exact or discrete designs
  - finite number of observations
  - fewer theoretical results
Continuous versus exact designs
- continuous
  $\xi = \begin{Bmatrix} x_1 & x_2 & \dots & x_h \\ w_1 & w_2 & \dots & w_h \end{Bmatrix}$
  - $x_1, x_2, \dots, x_h$: design points or support points
  - $w_1, w_2, \dots, w_h$: weights ($w_i \ge 0$, $\sum_i w_i = 1$)
  - $h$: number of different points
- exact
  $\xi = \begin{Bmatrix} x_1 & x_2 & \dots & x_h \\ n_1 & n_2 & \dots & n_h \end{Bmatrix}$
  - $n_1, n_2, \dots, n_h$: (integer) numbers of observations at $x_1, \dots, x_h$, with $\sum_i n_i = n$
  - $h$: number of different points

Information matrix
- all criteria to select a design are based on the information matrix
- model matrix
  $X = \begin{bmatrix} f^T(x_1) \\ f^T(x_2) \\ f^T(x_3) \\ \vdots \\ f^T(x_n) \end{bmatrix}$
  (illustrated on the slide with a two-level example whose entries are $\pm 1$)
Information matrix
- (total) information matrix
  $M = \frac{1}{\sigma^2} X^T X = \frac{1}{\sigma^2} \sum_{i=1}^{n} f(x_i) f^T(x_i)$
- per observation information matrix
  $\frac{1}{\sigma^2} f(x_i) f^T(x_i)$

Information matrix: industrial example
[Slide shows the matrix $\sigma^2 (X^T X)^{-1}$ for the industrial example; its numerical entries are not recoverable from this transcript]
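As a numerical aside (not on the slides), the total information matrix is easy to build from the design points. A minimal NumPy sketch, assuming the quadratic model $f(x) = (1, x, x^2)$ used later in this session and $\sigma^2 = 1$:

```python
import numpy as np

def model_vector(x):
    # f(x) = (1, x, x^2): quadratic regression in one variable
    return np.array([1.0, x, x * x])

def information_matrix(points, sigma2=1.0):
    # M = (1/sigma^2) * sum_i f(x_i) f^T(x_i), one run per point (exact design)
    return sum(np.outer(model_vector(x), model_vector(x)) for x in points) / sigma2

# exact design with one observation at each of -1, 0 and +1
M = information_matrix([-1.0, 0.0, 1.0])
```

Each run contributes one rank-one term $f(x_i) f^T(x_i)$, which is the "per observation" information matrix from the slide.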
Information matrix
- exact designs
  $M = \sum_{i=1}^{h} n_i f(x_i) f^T(x_i)$
  where $h$ = number of different points and $n_i$ = number of replications of point $i$
- continuous designs
  $M = \sum_{i=1}^{h} w_i f(x_i) f^T(x_i)$

D-optimality criterion
- seeks designs that minimize the variance-covariance matrix of $\hat\beta$
  - ... that minimize $\sigma^2 (X^T X)^{-1}$
  - ... that minimize $|(X^T X)^{-1}|$
  - ... that maximize $|X^T X|$ or $|M|$
- D-optimal designs minimize
  - the generalized variance of $\hat\beta$
  - the volume of the confidence ellipsoid about the unknown $\beta$
- D stands for determinant
$D_s$-optimality criterion
- useful when the interest is only in a subset of the parameters
  - e.g. when the intercept or block effects are of no interest
- seeks designs that minimize the determinant of the part of the variance-covariance matrix corresponding to the subset of $\beta$
- $Y = X\beta + \epsilon = X_1\beta_1 + X_2\beta_2 + \epsilon$, where $\beta_1$ is the set of parameters of interest

$D_s$-optimality criterion
- $M = X^T X = \begin{bmatrix} X_1^T X_1 & X_1^T X_2 \\ X_2^T X_1 & X_2^T X_2 \end{bmatrix}$
- $\mathrm{cov}(\hat\beta) = (X^T X)^{-1} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$
- $A$ is the part of the variance-covariance matrix corresponding to $\beta_1$
- minimizing $|A|$ is the same as minimizing
  $\left| X_1^T X_1 - X_1^T X_2 (X_2^T X_2)^{-1} X_2^T X_1 \right|^{-1}$
  and as maximizing
  $\left| X_1^T X_1 - X_1^T X_2 (X_2^T X_2)^{-1} X_2^T X_1 \right|$
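The block-inverse identity behind this (the $\beta_1$-block of $(X^T X)^{-1}$ equals the inverse of the Schur complement) can be checked numerically. A sketch with an arbitrary random partitioned model matrix, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X1, X2 = X[:, :2], X[:, 2:]          # beta_1 = first two parameters

# beta_1-block of (X^T X)^{-1}
A = np.linalg.inv(X.T @ X)[:2, :2]

# Schur complement X1^T X1 - X1^T X2 (X2^T X2)^{-1} X2^T X1
S = X1.T @ X1 - X1.T @ X2 @ np.linalg.inv(X2.T @ X2) @ X2.T @ X1
```

Since $A = S^{-1}$, minimizing $|A|$ and maximizing $|S|$ are indeed the same problem.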
A-optimality criterion
- seeks designs that minimize the average variance of the parameter estimates
  $\frac{1}{p} \sum_{i=1}^{p} \mathrm{var}(\hat\beta_i)$
  where $p$ = number of model parameters
- seeks designs that minimize
  $\mathrm{trace}\, \sigma^2 (X^T X)^{-1}$ or $\mathrm{trace}\, (X^T X)^{-1}$

V- and G-optimality criteria
- V-optimality (also Q-, IV-, I-optimality)
  - seeks designs that minimize the average prediction variance over the design region $\chi$
  - ... that minimize $\int_\chi \sigma^2 f^T(x) (X^T X)^{-1} f(x)\, dx$
- G-optimality
  - seeks designs that minimize the maximum prediction variance over the design region $\chi$
  - ... that minimize $\max_{x \in \chi} \mathrm{var}\{\hat y(x)\} = \max_{x \in \chi} f^T(x) (X^T X)^{-1} f(x)$
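All four criteria can be evaluated numerically for a given continuous design. A sketch (the grid average is a crude stand-in for the V-criterion integral, and the design shown is just an illustration):

```python
import numpy as np

def f(x):
    return np.array([1.0, x, x * x])     # quadratic model in one variable

def criteria(points, weights, grid=np.linspace(-1, 1, 201)):
    # per-observation information matrix of a continuous design
    M = sum(w * np.outer(f(x), f(x)) for x, w in zip(points, weights))
    Minv = np.linalg.inv(M)
    pred_var = np.array([f(x) @ Minv @ f(x) for x in grid])
    return {"D": np.linalg.det(M),       # maximize
            "A": np.trace(Minv),         # minimize
            "V": pred_var.mean(),        # minimize (grid average)
            "G": pred_var.max()}         # minimize

c = criteria([-1.0, 0.0, 1.0], [1/3, 1/3, 1/3])
```

Note that the grid maximum can only underestimate the true G-criterion value, so a fine grid (or an exact maximization) matters in practice.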
Discussion
- problem: in general, the optimality criteria all lead to different designs
- exception: D-optimal continuous designs = G-optimal continuous designs
  - general equivalence theorem
  - the prediction variance is maximal in the design points
  - maximum = number of model parameters
- which optimality criterion should be used?

D-optimality criterion
- most popular criterion
- D-optimal designs are usually quite good with respect to the other criteria
- D-optimal designs are not affected by linear transformations of the levels of the experimental variables
- computational advantages: update formulas
Linear transformations of factor levels
- $X \to XA = Z$
- maximizing $|X^T X|$ is equivalent to maximizing $|Z^T Z|$, since
  $|Z^T Z| = |(XA)^T XA| = |A^T X^T X A| = |A^T|\,|X^T X|\,|A| = |A|^2\,|X^T X|$

Quadratic regression in one variable
- $\chi = [-1, 1]$
- $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$
- D-optimal continuous design
  Design 1 $= \begin{Bmatrix} -1 & 0 & +1 \\ 1/3 & 1/3 & 1/3 \end{Bmatrix}$
- information matrix
  $M = \begin{bmatrix} 1 & 0 & 2/3 \\ 0 & 2/3 & 0 \\ 2/3 & 0 & 2/3 \end{bmatrix}$
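The determinant identity $|Z^T Z| = |A|^2 |X^T X|$ behind this invariance is easy to verify numerically; a sketch with random matrices, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))   # any model matrix
A = rng.normal(size=(3, 3))    # nonsingular transformation of the factor levels
Z = X @ A                      # model matrix after the transformation

lhs = np.linalg.det(Z.T @ Z)
rhs = np.linalg.det(A) ** 2 * np.linalg.det(X.T @ X)
```

Because $|A|^2$ is a constant that does not depend on the design, the D-ranking of candidate designs is unchanged by the transformation.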
Quadratic regression in one variable
- the general equivalence theorem implies that the D-optimal continuous design is also G-optimal
  - the D-optimal design minimizes the maximum prediction variance
  - the maximum is equal to $p$
- check by plotting the prediction variance for all $x \in [-1, 1]$:
  $\mathrm{var}\{\hat y(x)\} = f^T(x) M^{-1} f(x)$

Prediction variance
$\mathrm{var}\{\hat y(x)\} = f^T(x) M^{-1} f(x)
= \begin{bmatrix} 1 & x & x^2 \end{bmatrix}
\begin{bmatrix} 3 & 0 & -3 \\ 0 & 3/2 & 0 \\ -3 & 0 & 9/2 \end{bmatrix}
\begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}$
$= \begin{bmatrix} 3 - 3x^2 & \tfrac{3}{2}x & -3 + \tfrac{9}{2}x^2 \end{bmatrix}
\begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}
= 3 - 3x^2 + \tfrac{3}{2}x^2 - 3x^2 + \tfrac{9}{2}x^4
= 3 - \tfrac{9}{2}x^2 + \tfrac{9}{2}x^4$
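The closed form $3 - \tfrac{9}{2}x^2 + \tfrac{9}{2}x^4$ can be checked against the quadratic form directly; a small NumPy sketch:

```python
import numpy as np

# information matrix of the D-optimal continuous design on {-1, 0, +1}
M = np.array([[1.0, 0.0, 2/3],
              [0.0, 2/3, 0.0],
              [2/3, 0.0, 2/3]])
Minv = np.linalg.inv(M)

xs = np.linspace(-1, 1, 101)
quad_form = np.array([np.array([1, x, x**2]) @ Minv @ np.array([1, x, x**2])
                      for x in xs])
closed_form = 3 - 4.5 * xs**2 + 4.5 * xs**4
```

The maximum over the grid occurs at the design points $-1$, $0$ and $+1$ and equals $p = 3$, as the general equivalence theorem predicts.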
Prediction variance
[Figure: plot of $\mathrm{var}\{\hat y(x)\}$ over $[-1, 1]$ for Design 1; the maximum equals $p = 3$ and is attained at the design points $-1$, $0$ and $+1$]

Quadratic regression in one variable
- consider another design
  Design 2 $= \begin{Bmatrix} -1 & 0 & +1 \\ 1/4 & 1/2 & 1/4 \end{Bmatrix}$
- information matrix
  $M = \begin{bmatrix} 1 & 0 & 1/2 \\ 0 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \end{bmatrix}$
Prediction variance var{ŷ(x)} = f T (x)m f(x) = [ x x ] 0 0 0 0 4 = [ x x + 4x ] = x + x x + 4x 4 = x + 4x 4 x x x x / 40 Prediction variance var{ŷ(x)} 4 p =3 0.5 0 0.5 Design is not D-optimal / 40
Optimal exact designs
- A-optimal continuous design
  $\xi = \begin{Bmatrix} -1 & 0 & +1 \\ 1/4 & 1/2 & 1/4 \end{Bmatrix}$
- what if $n = 4, 8, 12$?
  - multiply the weights of the continuous design by 4, 8 or 12
  - integer numbers of runs at each point
- what if $n = 5, 6, 7$?
  - multiply the weights by 5, 6 or 7
  - non-integer numbers of runs at some of the points
  - not useful in practice

Polynomial regression in one variable
D-optimal design points for the $d$th-order polynomial regression in one variable:

d = 2: -1, 0, +1 (weight 1/3)
d = 3: -1, -a_3, +a_3, +1 (weight 1/4)
d = 4: -1, -a_4, 0, +a_4, +1 (weight 1/5)
d = 5: -1, -a_5, -b_5, +b_5, +a_5, +1 (weight 1/6)
d = 6: -1, -a_6, -b_6, 0, +b_6, +a_6, +1 (weight 1/7)

where $a_3 = \sqrt{1/5}$, $a_4 = \sqrt{3/7}$, $a_5 = \sqrt{(7 + 2\sqrt{7})/21}$, $b_5 = \sqrt{(7 - 2\sqrt{7})/21}$, $a_6 = \sqrt{(15 + 2\sqrt{15})/33}$, $b_6 = \sqrt{(15 - 2\sqrt{15})/33}$
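The weight-rounding argument for exact designs can be illustrated numerically: a continuous design translates directly into an exact one only when every $n \cdot w_i$ is an integer. A sketch for the A-optimal weights above:

```python
import numpy as np

weights = np.array([0.25, 0.5, 0.25])    # A-optimal continuous design on {-1, 0, +1}

def runs_per_point(n):
    # multiply the continuous weights by the total number of runs n
    runs = n * weights
    is_integer = np.allclose(runs, np.round(runs))
    return runs, is_integer

runs4, ok4 = runs_per_point(4)   # multiples of 4 give integer run counts
runs6, ok6 = runs_per_point(6)   # n = 6 leaves fractional runs at -1 and +1
```

For other run sizes one has to round, and the rounded design need not be the best exact design of that size.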
Quadratic regression in one variable
- Design 1 has covariance matrix
  $(X^T X)^{-1} = \begin{bmatrix} 3 & 0 & -3 \\ 0 & 3/2 & 0 \\ -3 & 0 & 9/2 \end{bmatrix}$
  with $\mathrm{trace}\,(X^T X)^{-1} = 3 + 3/2 + 9/2 = 9$
- Design 2 has covariance matrix
  $(X^T X)^{-1} = \begin{bmatrix} 2 & 0 & -2 \\ 0 & 2 & 0 \\ -2 & 0 & 4 \end{bmatrix}$
  with $\mathrm{trace}\,(X^T X)^{-1} = 2 + 2 + 4 = 8$
- Design 2 is A-optimal

$2^3$ factorial design
- main-effects model
  $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$
- model matrix: $X$ is the $8 \times 4$ matrix with rows $f^T(x_i) = (1, \pm 1, \pm 1, \pm 1)$, one row for each of the eight factor-level combinations
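Both traces, and the fact that each design wins under its own criterion, can be verified in a few lines; a sketch:

```python
import numpy as np

def f(x):
    return np.array([1.0, x, x * x])

def info_matrix(weights, points=(-1.0, 0.0, 1.0)):
    return sum(w * np.outer(f(x), f(x)) for x, w in zip(points, weights))

M1 = info_matrix([1/3, 1/3, 1/3])    # Design 1
M2 = info_matrix([1/4, 1/2, 1/4])    # Design 2

t1, t2 = np.trace(np.linalg.inv(M1)), np.trace(np.linalg.inv(M2))   # A-criterion
d1, d2 = np.linalg.det(M1), np.linalg.det(M2)                       # D-criterion
```

Design 2 has the smaller trace (A-better) while Design 1 has the larger determinant (D-better), a concrete instance of different criteria preferring different designs.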
$2^3$ factorial design
- (total) information matrix
  $M = X^T X = \begin{bmatrix} 8 & 0 & 0 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 0 & 0 & 8 \end{bmatrix}$
- per observation information matrix
  $M = \frac{1}{n} X^T X = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Prediction variance
$\mathrm{var}\{\hat y(x)\} = f^T(x) M^{-1} f(x)
= \begin{bmatrix} 1 & x_1 & x_2 & x_3 \end{bmatrix} M^{-1} \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}
= 1 + x_1^2 + x_2^2 + x_3^2$
- if $\chi = [-1, 1]^3$, then this is maximal when $x_1 = \pm 1$, $x_2 = \pm 1$, $x_3 = \pm 1$
- maximum = 4 = $p$
- the general equivalence theorem is satisfied: the $2^3$ factorial design is D- and G-optimal
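The $2^3$ factorial computations above can be reproduced directly; a sketch:

```python
import numpy as np
from itertools import product

# model matrix of the 2^3 factorial, main-effects model f(x) = (1, x1, x2, x3)
X = np.array([[1, a, b, c] for a, b, c in product([-1, 1], repeat=3)], float)
M = X.T @ X                          # total information matrix = 8 * I_4
Minv = np.linalg.inv(M / len(X))     # inverse of the per-observation matrix

def var_yhat(x1, x2, x3):
    fx = np.array([1.0, x1, x2, x3])
    return fx @ Minv @ fx            # = 1 + x1^2 + x2^2 + x3^2
```

At any vertex of the cube the prediction variance hits $p = 4$, and it is strictly smaller everywhere in the interior.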
More optimal designs
- Plackett-Burman designs: optimal for the main-effects model
- two-level factorial designs: optimal for main-effects-plus-interactions models
- some remarks
  - off-diagonal elements of the information matrix are often zero (impossible when the model contains quadratic terms)
  - optimal designs are often symmetric
  - these properties are more difficult to achieve for exact designs

Quadratic regression in two variables
- $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \epsilon$
- D-optimal continuous design: the nine points of the $3^2$ factorial, with weights approximately 0.1458 at the corner points, 0.0802 at the edge midpoints and 0.0960 at the centre
- D-optimal discrete designs
  - n = 9: one observation at each point of the continuous D-optimal design
  - n = 6: the D-optimal exact design does not resemble the D-optimal continuous one
Information matrix
D-optimal continuous design (model terms ordered as $1, x_1, x_2, x_1 x_2, x_1^2, x_2^2$):
$M = \begin{bmatrix}
1 & 0 & 0 & 0 & 0.74 & 0.74 \\
0 & 0.74 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.74 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.58 & 0 & 0 \\
0.74 & 0 & 0 & 0 & 0.74 & 0.58 \\
0.74 & 0 & 0 & 0 & 0.58 & 0.74
\end{bmatrix}$

Quadratic regression in two variables
[Figure: D-optimal exact designs for n = 6, 7, 8 and 9]
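The entries of this information matrix are moments of the continuous design. A sketch that rebuilds $M$ from the approximate weights 0.1458 (corners), 0.0802 (edge midpoints) and 0.0960 (centre); the weight values are rounded, so the entries only match to that rounding:

```python
import numpy as np
from itertools import product

def f(x1, x2):
    # model terms ordered as 1, x1, x2, x1*x2, x1^2, x2^2
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

# weight of a 3^2 factorial point depends only on how many coordinates are +-1
w_by_type = {2: 0.1458, 1: 0.0802, 0: 0.0960}

points = list(product([-1, 0, 1], repeat=2))
M = sum(w_by_type[abs(a) + abs(b)] * np.outer(f(a, b), f(a, b))
        for a, b in points)
```

For example, the 0.74 entries are $E[x_1^2] = 4(0.1458) + 2(0.0802) = 0.7436$ and the 0.58 entry is $E[x_1^2 x_2^2] = 4(0.1458) = 0.5832$.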
Quadratic design in two variables
[Figure: D-optimal exact three-level designs for n = 6, 7, 8 and 9]

Quadratic regression in two variables
[Figure: D-optimal exact design, n = 8; marker size indicates 1, 2 or 3 runs at a point]
An industrial example
- experiment to investigate the effect of
  - amount of glycerine (%), $x_1$ ($1 \le x_1 \le 3$)
  - speed of temperature reduction (°F/min), $x_2$ ($1 \le x_2 \le 3$)
- response: amount of surviving biological material (%), $y$
- context: freeze-dried coffee
  - retaining volatile compounds in freeze-dried coffee is important for its smell and taste
- Excel file: quadratic3.xls

An industrial example
- $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \epsilon_i$
- 9 observations / tests
- the $3^2$ factorial design is optimal
- what if combinations of high $x_1$ and high $x_2$ are not allowed?
Constrained design region
- constraint: $x_1 + x_2 \le 5$
- $x_1$: glycerine, $x_2$: temperature reduction
[Figure: design region $[1, 3]^2$ with the corner above the line $x_1 + x_2 = 5$ cut off]

Constrained design region: solution I
- scale the $3^2$ factorial design down so that it fits in the constrained design region
[Figure: scaled-down $3^2$ factorial inside the constrained region]
- $\det(X^T X) = 0.09$
Constrained design region: solution II
- move the forbidden point (3, 3) inward
[Figure: $3^2$ factorial with the point (3, 3) moved inside the constrained region]
- $\det(X^T X) = 0.0006$

Constrained design region: solution III
- use the optimal design approach
[Figure: D-optimal design for the constrained region, with a point at (2.9, 2.9)]
- $\det(X^T X) = 0.0005$
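The optimal design approach of solution III can be sketched with a simple greedy search over a candidate grid. This is a toy stand-in for proper algorithms such as point-exchange or coordinate-exchange; the grid, the greedy rule and the small ridge term are my assumptions, not the slides':

```python
import numpy as np
from itertools import product

def f(x1, x2):
    # full quadratic model in two variables
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

# candidate points on [1, 3]^2 satisfying the constraint x1 + x2 <= 5
grid = np.linspace(1, 3, 5)
candidates = [(a, b) for a, b in product(grid, repeat=2) if a + b <= 5]

# greedy construction of an n = 9 run design maximizing det(X^T X)
design = []
for _ in range(9):
    best, best_det = None, -np.inf
    for c in candidates:
        X = np.array([f(*p) for p in design + [c]])
        d = np.linalg.det(X.T @ X + 1e-8 * np.eye(6))   # ridge keeps early dets nonzero
        if d > best_det:
            best, best_det = c, d
    design.append(best)

X = np.array([f(*p) for p in design])
```

Greedy selection is not guaranteed to find the D-optimal exact design, but it respects the constraint by construction and yields a nonsingular design, which is the essence of the "optimal design approach" over ad hoc fixes like solutions I and II.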