Analysis of 2ⁿ factorial experiments with exponentially distributed response variable: a comparative study


H. V. Kulkarni^a and S. C. Patil (Birajdar)^b

^a Department of Statistics, Shivaji University, Kolhapur, India (*Correspondence e-mail: kulkarnih@rediffmail.com)
^b Department of Statistics, Pad. Dr. D. Y. Patil A.C.S. College, Pimpri, Pune, India

ABSTRACT
The use of generalized linear models (GLM) has been suggested for the analysis of designed experiments involving non-normal response variables and small samples. This paper assesses and compares the performance of the GLM approach with the log transformation and ordinary least squares (OLS) approaches on the basis of coverage results and expected lengths of confidence intervals for the expected responses when the response variable is exponentially distributed. We also examine the power functions of the relevant tests of hypotheses for the significance of the underlying factorial effects. Although the comparison is mainly simulation based, theoretical investigation is carried out wherever possible for saturated 2-level factorial experiments. The coverage results of the log transformation approach are uniformly superior to those of the GLM approach for two replications (very small samples), which is the most frequently encountered situation in industrial experiments. We also suggest a modification to the GLM-based confidence interval that shows a decisive improvement over the GLM approach. The t-test under the GLM approach exhibits the best power function for testing the significance of factorial effects, and the t-test under the log transformation approach performs equally well.

Keywords: confidence intervals; ordinary least squares; generalized linear models; log transformation; factorial experiments

1. Introduction and Preliminaries

1.1 Introduction
The optimality properties of OLS depend on the assumptions of normality and homogeneity of variance. There are certainly many situations where a non-normal response and non-homogeneous variance are a fact: the response may be related non-linearly to the input factors, and the variance is not constant but is a function of the mean. Moreover, often only small or moderate sample sizes are available due to practical constraints.

Traditionally in such situations, a suitably transformed response variable (most frequently via a variance-stabilizing transformation) is analyzed using OLS and the results are transformed back to the original scale. This approach has many drawbacks; see, for example, Myers and Montgomery [3]. GLMs, introduced by Nelder and Wedderburn, are used as an alternative to the above-mentioned traditional approaches. Myers and Montgomery [3] gave a tutorial on GLMs, Lewis et al. [5] gave many examples of designed experiments with non-normal responses, and Lewis et al. [4] carried out a simulation study on the analysis of designed experiments involving non-normal response variables using GLMs. The use of GLMs is suggested for the analysis of designed experiments involving non-normal response variables and small samples. All the above discussions concentrate mainly on estimating confidence intervals (CI) for the expected responses. The authors compared the above-mentioned approaches on the basis of response estimation and prediction, which is limited to a sample of parameter combinations; this does not seem adequate for assessing the overall performance of GLM.

In industrial settings, the response variable may sometimes have an exponential distribution. The emphasis of this paper is to assess and compare the performance of the GLM approach with the transformation and OLS approaches on the basis of coverage results and expected lengths of CI (E(LOCI)) for the expected responses for an exponentially distributed response variable. We also focus on the power functions of the relevant tests of hypotheses for the significance of the underlying factorial effects. Although the comparison is mainly simulation based, theoretical investigation is also carried out wherever possible, covering a reasonably representative subset of the parameter space that reflects a variety of situations that could be encountered in practice. We also suggest a modification to the GLM-based CI that shows a decisive improvement over the GLM approach at the prescribed confidence coefficient.

For the transformation approach we consider the log transformation, which is a variance-stabilizing transformation commonly used for lifetime data; we refer to it as the LOG approach henceforth.

1.2 Nature of the underlying design matrix
The set-up discussed here is that of saturated 2-level factorial designs with r (> 1) replications. The number of parameters (m) in this case equals the number of distinct treatment combinations (k), under which the maximum likelihood estimators (m.l.e.) under the GLM set-up take a closed-form expression that enables a theoretical comparison of the three approaches. Note that the underlying model can be viewed as a regression model with design matrix (cf. Myers and Montgomery [3])

    X_{n×k} = (A′, A′, ..., A′)′,                                   (1)

where n = rk and A is a k×k symmetric Hadamard matrix corresponding to a single replicate of the experiment. The matrix A being a Hadamard matrix, we have

    A⁻¹ = (1/k) A.                                                  (2)

From (1) and the properties of the matrix A it follows that X′X = rk I_k, and XX′ is an rk×rk matrix consisting of an r×r array of k×k submatrices, each submatrix equal to k I_k, where I_k is the identity matrix of order k.

Throughout this paper we assume that the response variable Y is exponentially distributed. Y_ij, i = 1, 2, ..., k, j = 1, 2, ..., r, are the observations on Y, and W_ij = log(Y_ij) is the log-transformed response. We write Ȳ_i· = (1/r) Σ_{j=1}^r Y_ij and S_y² = (1/k) Σ_{i=1}^k S_i², where S_i² = (1/(r−1)) Σ_{j=1}^r (Y_ij − Ȳ_i·)², and W̄_i· and S_w² are defined analogously.
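The block structure above is easy to verify numerically. The following sketch is our own illustration, not part of the original paper: it builds A as a Sylvester-type Hadamard matrix, stacks r copies to form X, and checks A⁻¹ = A/k, X′X = rk I_k and the block form of XX′. The function name and the 2² setting are assumptions made for the example.

    import numpy as np

    def sylvester_hadamard(n_factors):
        """Symmetric Hadamard matrix of order k = 2^n, playing the role of the
        single-replicate matrix A of Section 1.2; its columns can be read as
        intercept, main effects and interactions in +/-1 coding (up to order)."""
        H = np.array([[1, 1], [1, -1]])
        A = np.array([[1]])
        for _ in range(n_factors):
            A = np.kron(A, H)
        return A

    n_factors, r = 2, 2                  # a 2^2 experiment with r = 2 replicates
    A = sylvester_hadamard(n_factors)    # k x k with k = 2^n
    k = A.shape[0]
    X = np.vstack([A] * r)               # X = (A', A', ..., A')', rk rows

    print(np.allclose(A, A.T))                          # A is symmetric
    print(np.allclose(A @ A, k * np.eye(k)))            # A A = k I_k, so A^{-1} = A/k
    print(np.allclose(X.T @ X, r * k * np.eye(k)))      # X'X = rk I_k
    print(np.allclose(X @ X.T,                          # XX': r x r array of k I_k blocks
                      np.kron(np.ones((r, r)), k * np.eye(k))))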

The remainder of the paper is organized as follows. Section 2 discusses the theoretical details of the above three approaches: the underlying model, expressions for the estimated responses, confidence limits and the corresponding lengths of CI (LOCI) around the expected responses. We provide exact theoretical expressions for E(LOCI) and the expected coverage probability for the GLM approach, and good theoretical approximations of the same for the OLS and LOG approaches. Section 3 reports the simulation procedure and the findings of Sections 2 and 3 regarding the performance of the methods in terms of E(LOCI) and coverage results. The LOG approach outperforms GLM for two replications (very small samples), which is the most frequently encountered situation in industrial experiments. We propose a modification to the GLM-based CI that shows a decisive improvement over the GLM approach. In Section 4 we compare the three approaches based on the power functions of tests for the significance of factorial effects; the t-tests under the GLM and LOG approaches exhibit the best power functions.

2. Theoretical details of the three approaches

2.1 The underlying model and related formulae for the various approaches
Table 1 summarizes the underlying model, the estimated responses, the confidence limits and the LOCI around the expected responses for the three approaches. Here Ŷ denotes the estimated response, µ_i denotes the expected response corresponding to the i-th regressor vector x_i, which is the i-th row of the coefficient matrix X, and σ_y² and σ_w² are the unknown constant variances of Y and W, respectively. The components of the parameter vector β correspond to the various factorial effects.

t_{(n−k),α/2} is the upper α/2 percentile of the t-distribution with n − k degrees of freedom (d.f.) and Z_{α/2} is the upper α/2 percentile of the standard normal distribution. For illustration, a confidence coefficient of 0.95 (α = 0.05) is used throughout the paper.

Table 1: Underlying model and related formulae for the three approaches.

OLS:  Model: E(Y) = Xβ, µ̂_i = x_i′β̂; estimated response Ŷ_ij = Ȳ_i·.
      Response distribution under the model: Y_ij ~ N(µ_i, σ_y²).
      Confidence limits (a): Ȳ_i· ± t_{(n−k),α/2} √(S_y²/r);  LOCI (b): 2 t_{(n−k),α/2} √(S_y²/r).

LOG:  Model: W_ij = log(Y_ij), E(W) = Xβ, µ̂_i = x_i′β̂; estimated response Ŷ_ij = exp(W̄_i·) = (Π_j Y_ij)^{1/r} = GM.
      Response distribution under the model: W_ij ~ N(µ_i, σ_w²).
      Confidence limits (a): exp(W̄_i· ± t_{(n−k),α/2} √(S_w²/r));  LOCI (b): ((v² − 1)/v) exp(W̄_i·), where v = exp(t_{(n−k),α/2} √(S_w²/r)).

GLM:  Model: E(Y) = exp(Xβ), µ̂_i = exp(x_i′β̂); estimated response Ŷ_ij = Ȳ_i·.
      Response distribution under the model: Y_ij ~ Exponential(µ_i).
      Confidence limits (a): Ȳ_i· exp(±Z_{α/2}/√r);  LOCI (b): Ȳ_i· (exp(Z_{α/2}/√r) − exp(−Z_{α/2}/√r)).

Note that for the OLS and LOG approaches the formulae in Table 1 follow easily from standard results on OLS and the properties of the Hadamard matrix. The estimated response for the LOG approach is nothing but the geometric mean (GM) of the relevant group of observations. Since GM ≤ AM, it follows that the LOG approach underestimates the expected responses. In some situations, for example when the response variable is the time to recurrence of a defective item, this could be a serious problem.
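To make the formulas in Table 1 concrete, here is a small sketch of our own (not from the paper) that computes the three interval estimates for each treatment combination from a k×r array of replicate observations. The helper name and the illustrative data are assumptions, and S_y², S_w² are the pooled estimates of Section 1.2.

    import numpy as np
    from scipy import stats

    def table1_intervals(Y, alpha=0.05):
        """Y: k x r array, one row per treatment combination.
        Returns (lower, upper) limits for each mu_i under OLS, LOG and GLM,
        following Table 1 (S_i^2 uses divisor r-1 so that E(S_i^2) = mu_i^2)."""
        k, r = Y.shape
        t = stats.t.ppf(1 - alpha / 2, k * (r - 1))      # n - k residual d.f.
        z = stats.norm.ppf(1 - alpha / 2)

        ybar = Y.mean(axis=1)                            # cell means
        s2y = Y.var(axis=1, ddof=1).mean()               # pooled S_y^2 = (1/k) sum S_i^2
        W = np.log(Y)
        wbar = W.mean(axis=1)
        s2w = W.var(axis=1, ddof=1).mean()

        ols = (ybar - t * np.sqrt(s2y / r), ybar + t * np.sqrt(s2y / r))
        log_ = (np.exp(wbar - t * np.sqrt(s2w / r)), np.exp(wbar + t * np.sqrt(s2w / r)))
        glm = (ybar * np.exp(-z / np.sqrt(r)), ybar * np.exp(z / np.sqrt(r)))
        return ols, log_, glm

    rng = np.random.default_rng(1)
    mu = np.array([2.0, 5.0, 1.0, 8.0])                  # illustrative expected responses
    Y = rng.exponential(mu[:, None], size=(4, 2))        # a 2^2 experiment with r = 2
    for name, (lo, hi) in zip(("OLS", "LOG", "GLM"), table1_intervals(Y)):
        print(name, np.round(lo, 3), np.round(hi, 3))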

For the GLM approach the model is E(Y) = µ = g⁻¹(η), η = Xβ = g(µ), where g(·) is an appropriate monotonic and differentiable link function; two types of link exist, canonical and noncanonical. The distribution of the response variable Y is assumed to be a member of the exponential family, and we refer to McCullagh and Nelder [2] for further theoretical details of GLMs. It is well known that for an exponentially distributed response Y the canonical link is the reciprocal link η_i = 1/µ_i. Owing to some drawbacks of this link, for example that negative values of the estimated response may be encountered, we choose the noncanonical log link, which overcomes these drawbacks. For this link the underlying model is

    E(Y_ij) = µ_i = exp(x_i′β),   i = 1, ..., k,  j = 1, ..., r,      (3)

where x_i is the i-th row of A in the design matrix (1) and µ = (µ_1, ..., µ_k)′ corresponds to a single replicate; in Table 1, E(Y) = (µ′, µ′, ..., µ′)′ = exp(Xβ). Parameter estimation in a GLM is performed by maximum likelihood, which in general is an iterative process and may not yield closed-form expressions for the m.l.e. However, simple closed-form expressions are available here because of the assumptions on the underlying design matrix in Section 1.2. Since the matrix A is nonsingular, β is just a reparameterization of µ, giving the estimated response in closed form: Ŷ_ij = µ̂_i = exp(x_i′β̂) = Ȳ_i·. Consequently β̂ = A⁻¹ log(Ȳ_1·, ..., Ȳ_k·)′ is also available in closed form.
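As a quick check of this closed form, the following sketch (our own, with illustrative cell means and seed) verifies that β̂ = A⁻¹ log Ȳ reproduces the cell means as fitted values and solves the log-link score equations X′((y − µ̂)/µ̂) = 0.

    import numpy as np

    rng = np.random.default_rng(7)
    H = np.array([[1, 1], [1, -1]])
    A = np.kron(H, H)                            # 4 x 4 symmetric Hadamard (2^2 design)
    k, r = 4, 3
    X = np.vstack([A] * r)

    mu_true = np.array([1.5, 4.0, 0.8, 6.0])     # illustrative cell means
    y = rng.exponential(np.tile(mu_true, r))     # observations ordered like the rows of X

    ybar = y.reshape(r, k).mean(axis=0)          # cell means Ybar_i.
    beta_hat = np.linalg.solve(A, np.log(ybar))  # beta_hat = A^{-1} log(Ybar)

    mu_hat = np.exp(X @ beta_hat)                # fitted means
    score = X.T @ ((y - mu_hat) / mu_hat)        # score equations for exponential, log link
    print(np.allclose(mu_hat.reshape(r, k), ybar))   # True: fitted values are the cell means
    print(np.allclose(score, 0))                     # True: closed form solves the likelihood equations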

The CI for the mean response µ_i at the point x_i is given by exp(x_i′β̂ ± Z_{α/2} σ̂_i). For the noncanonical link, σ̂_i = √[(a(φ))² x_i′(X′ΔVΔX)⁻¹ x_i] (cf. Myers and Montgomery [3]), where V is the diagonal matrix whose i-th diagonal element is the estimate of Var(Y_i) and Δ is the diagonal matrix whose i-th diagonal element Δ_i is the derivative of (−1/µ_i) with respect to x_i′β. For the design matrix (1), noting that for an exponentially distributed response a(φ) = 1, Δ_i = exp(−x_i′β) and Var(Y_i) = exp(2x_i′β), this simplifies to σ̂_i = 1/√r. A further simplification leads to the CI for the GLM approach, Ȳ_i· exp(±Z_{α/2}/√r), and the corresponding LOCI, Ȳ_i· (exp(Z_{α/2}/√r) − exp(−Z_{α/2}/√r)), both reported in the last column of Table 1. The expected length is

    E(LOCI) = µ_i (exp(Z_{α/2}/√r) − exp(−Z_{α/2}/√r)),

which at α = 0.05, for r = 2 and 3 replications, equals

    E(LOCI) = 3.7484 µ_i for r = 2,   2.778 µ_i for r = 3.            (4)

Note that both OLS and GLM estimate the expected response by Ȳ_i·, which is an unbiased estimator of µ_i, whereas, as noted earlier, the LOG approach underestimates µ_i.

2.2 Approximate expected lengths of CI for the OLS and LOG approaches
In this section we obtain approximate theoretical expressions of E(LOCI) for the OLS and LOG approaches.

(i) E(LOCI) for the OLS approach. For the OLS approach, LOCI = 2 t_{(n−k),α/2} √(S_y²/r), where S_y² = (1/k) Σ_{i=1}^k S_i² and S_i² = (1/(r−1)) Σ_{j=1}^r (Y_ij − Ȳ_i·)². Noting that S_i² is an unbiased estimator of µ_i², it follows that E(LOCI) ≈ 2 t_{(n−k),α/2} √(µ̄/r), where µ̄ is the average of the µ_i², i = 1, 2, ..., k. At α = 0.05, for r = 2 and 3 replications, this equals

    E(LOCI) ≈ 3.9266 √µ̄ for r = 2,   3.26 √µ̄ for r = 3.              (5)

(ii) E(LOCI) for the LOG approach. For the LOG approach, as reported in Table 1, LOCI = ((v² − 1)/v) exp(W̄_i·), where v = exp(t_{(n−k),α/2} √(S_w²/r)). Note that E(S_w²) ≈ 1, and the delta method leads to E(exp(W̄_i·)) = (µ_i^{1/r} Γ(1 + 1/r))^r. Although v and W̄_i· need not be independently distributed, we have approximately

    E(LOCI) ≈ E((v² − 1)/v) · E(exp(W̄_i·)) = (exp(t_{(n−k),α/2}/√r) − exp(−t_{(n−k),α/2}/√r)) (µ_i^{1/r} Γ(1 + 1/r))^r.

In particular, at α = 0.05, for r = 2 and 3 replications this gives

    E(LOCI) ≈ 5.4836 µ_i for r = 2,   3.3944 µ_i for r = 3.           (6)

Note that for the GLM and LOG approaches the expected lengths depend only on the particular µ_i under consideration, while for OLS they depend on all components of µ through µ̄ and are the same for all components of µ. In the next section we discuss the expected coverage probability of the above approaches.
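The constant in (4) is easy to reproduce; the following sketch (our own check, with an arbitrary µ_i and seed, assuming α = 0.05) evaluates exp(Z_{α/2}/√r) − exp(−Z_{α/2}/√r) and confirms it by Monte Carlo.

    import numpy as np
    from scipy import stats

    def glm_eloci_constant(r, alpha=0.05):
        """Constant c(r) in E(LOCI) = c(r) * mu_i for the GLM interval of Table 1."""
        z = stats.norm.ppf(1 - alpha / 2)
        return np.exp(z / np.sqrt(r)) - np.exp(-z / np.sqrt(r))

    # exact constants of equation (4) ...
    print(glm_eloci_constant(2), glm_eloci_constant(3))   # ~3.748 and ~2.778

    # ... and a Monte Carlo confirmation for one mu_i
    rng = np.random.default_rng(0)
    mu_i, r = 4.0, 2
    ybar = rng.exponential(mu_i, size=(200_000, r)).mean(axis=1)
    loci = ybar * glm_eloci_constant(r)
    print(loci.mean() / mu_i)                             # close to 3.748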

2.3 Expected coverage probability for the three approaches

(i) Coverage probability for the GLM approach. For the GLM approach the desired coverage probability of the CI is

    Pr(Ȳ_i· exp(−Z_{α/2}/√r) ≤ µ_i ≤ Ȳ_i· exp(Z_{α/2}/√r)) = Pr(r exp(−Z_{α/2}/√r) ≤ Σ_{j=1}^r Y_ij/µ_i ≤ r exp(Z_{α/2}/√r)).

Since Σ_{j=1}^r Y_ij/µ_i follows a Gamma(1, r) distribution, at α = 0.05 this equals

    G(r (7.0993)^{1/√r}, 1, r) − G(r (0.1409)^{1/√r}, 1, r),          (7)

where G(x, 1, r) is the cumulative distribution function at x of the Gamma(1, r) distribution. For r = 2 it is 0.9067 and for r = 3 it equals 0.9207.

(ii) Lower bound (L.B.) for the coverage probability of the OLS approach. The desired coverage probability for the OLS approach is

    Pr(Ȳ_i· − t_{(n−k),α/2} √(S_y²/r) ≤ µ_i ≤ Ȳ_i· + t_{(n−k),α/2} √(S_y²/r)).

Note that for exponentially distributed observations Var(Ȳ_i·) = µ_i²/r. Replacing S_y² by E(S_y²) = (1/k) Σ_{i=1}^k µ_i² = µ̄, Chebyshev's inequality gives

    Pr(|µ_i − Ȳ_i·| ≤ t_{(n−k),α/2} √(S_y²/r)) ≥ 1 − Var(Ȳ_i·)/(t²_{(n−k),α/2} S_y²/r) = 1 − µ_i²/(t²_{(n−k),α/2} µ̄).

This gives an approximate L.B. for the coverage of the OLS approach,

    L(µ) = 1 − µ_i²/(t²_{(n−k),α/2} µ̄).                              (8)

The L.B. in (8) for the vectors µ used in the simulation study is listed in the L.B. columns of Tables 2 and 4.

(iii) Lower bound for the coverage probability of the LOG approach. For the LOG approach, referring to Table 1, the desired coverage probability can be approximated as

    Pr(W̄_i· − t_{(n−k),α/2} √(S_w²/r) ≤ log(µ_i) ≤ W̄_i· + t_{(n−k),α/2} √(S_w²/r)).

Noting that for the LOG approach Var(W̄_i·) ≈ 1/r and E(S_w²) ≈ 1, arguments similar to those for OLS lead to the approximate lower bound for the coverage of the LOG approach,

    L(µ) = 1 − 1/t²_{(n−k),α/2}.                                      (9)

Unfortunately, for fixed α, noting that n = rk, the L.B. in (9) is decreasing in the number of replications r and the number of treatment combinations k through the degrees of freedom associated with the t-quantile. At α = 0.05 it equals 87.03% for r = 2, k = 4; 81.2% for r = 3, k = 4 as well as for r = 2, k = 8; and 77.75% for r = 3, k = 8. The next section attempts a comparison of the three approaches based on a simulation study.
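These quantities can be evaluated directly; a short sketch of ours, using scipy's gamma and t distributions, reproduces (7) and (9).

    import numpy as np
    from scipy import stats

    def glm_coverage(r, alpha=0.05):
        """Exact coverage (7): P(r e^{-z/sqrt(r)} <= G <= r e^{z/sqrt(r)}),
        G ~ Gamma with shape r and scale 1."""
        z = stats.norm.ppf(1 - alpha / 2)
        return stats.gamma.cdf(r * np.exp(z / np.sqrt(r)), a=r) - \
               stats.gamma.cdf(r * np.exp(-z / np.sqrt(r)), a=r)

    def log_lower_bound(r, k, alpha=0.05):
        """Chebyshev-type lower bound (9) for the LOG interval: 1 - 1/t^2 with n - k d.f."""
        t = stats.t.ppf(1 - alpha / 2, k * r - k)
        return 1 - 1 / t ** 2

    print(round(glm_coverage(2), 4), round(glm_coverage(3), 4))     # ~0.9067 and ~0.9207
    for r, k in [(2, 4), (3, 4), (2, 8), (3, 8)]:
        print(r, k, round(100 * log_lower_bound(r, k), 2))          # ~87.03, 81.19, 81.19, 77.75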

3. Simulation study

3.1 Background and details of the simulation study
Noting that E(LOCI) for the three approaches depends on the expected responses µ rather than on the individual parameters β, we compare the above approaches for various values of µ. However, since E(LOCI) for the OLS approach depends on all the components of the mean response vector µ, while those for the GLM and LOG approaches depend only on the particular µ_i under consideration, a direct theoretical comparison of the three approaches is not possible. Since the theoretical arguments alone cannot settle the choice among the three approaches, the comparison is carried out through an extensive simulation study. A small number of replications is common practice in industrial experiments; hence the simulation study is carried out for 2² and 2³ experiments with r = 2 and 3 replications for an exponentially distributed response Y.

A few sets of µ = (µ_1, µ_2, ..., µ_k) were selected for the simulation study; the values of µ are given in the first column of Tables 2 and 4, and the coefficient matrix is as defined in (1). Observing that the coverage results and E(LOCI) for the OLS approach depend heavily on µ_i² and µ̄ (cf. equations (5) and (8)), the selected sets of µ have components µ_i spread over a wide range, values of µ̄ = (1/k) Σ_{i=1}^k µ_i² covering a wide range, and µ_i² spread around µ̄ in various patterns. For each selected µ we generated a large number of vectors Y whose components are independent exponential variates with E(Y) = (µ′, µ′)′ for r = 2 replications and (µ′, µ′, µ′)′ for r = 3 replications. For each of these vectors Y and each component µ_i we computed the lower confidence limit L_i, the upper confidence limit U_i, and LOCI_i = U_i − L_i for the three approaches. The simulated E(LOCI) for a particular µ_i is the average of these LOCI values, and the simulated coverage is the percentage of these intervals that cover the true value of µ_i. (A condensed sketch of this procedure is given below.)
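The following is a minimal sketch of that procedure for the GLM and LOG intervals (OLS is analogous); the number of Monte Carlo runs, the seed and the example µ are our own choices, not the values used in the paper.

    import numpy as np
    from scipy import stats

    def simulate_glm_log(mu, r, n_sim=5000, alpha=0.05, seed=0):
        """Per-cell simulated coverage and E(LOCI) for the GLM and LOG intervals
        of Table 1 (saturated design, exponential responses, r replicates)."""
        mu = np.asarray(mu, dtype=float)
        k = mu.size
        t = stats.t.ppf(1 - alpha / 2, k * (r - 1))     # n - k residual d.f.
        z = stats.norm.ppf(1 - alpha / 2)
        rng = np.random.default_rng(seed)
        cover = {"GLM": np.zeros(k), "LOG": np.zeros(k)}
        loci = {"GLM": np.zeros(k), "LOG": np.zeros(k)}
        for _ in range(n_sim):
            Y = rng.exponential(mu[:, None], size=(k, r))
            ybar = Y.mean(axis=1)
            W = np.log(Y)
            wbar, s2w = W.mean(axis=1), W.var(axis=1, ddof=1).mean()
            limits = {
                "GLM": (ybar * np.exp(-z / np.sqrt(r)), ybar * np.exp(z / np.sqrt(r))),
                "LOG": (np.exp(wbar - t * np.sqrt(s2w / r)),
                        np.exp(wbar + t * np.sqrt(s2w / r))),
            }
            for name, (lo, hi) in limits.items():
                cover[name] += (lo <= mu) & (mu <= hi)
                loci[name] += hi - lo
        return {name: (cover[name] / n_sim, loci[name] / n_sim) for name in cover}

    for name, (cov, eloci) in simulate_glm_log([2.0, 5.0, 1.0, 8.0], r=2).items():
        print(name, np.round(cov, 3), np.round(eloci, 2))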

Simulated coverage results and E(LOCI) for the three approaches with r = 2 and 3 replications for 2² and 2³ experiments were obtained and are reported in Tables 2 and 3 and Tables 4 and 5, respectively. In the next section we report observations based on Sections 2 and 3.

3.2 General observations regarding coverage results
The theoretical coverage results reported in Section 2.3 and the simulated coverage results reported in Table 2 for 2² and in Table 4 for 2³ experiments with r = 2 and 3 replications reveal the following facts.

a) GLM approach: The simulated coverage results for the GLM approach conform exactly with the theoretical ones given in (7). The coverage results are reasonable and well concentrated, but they are well below the desired level. They are an increasing function of the number of replications r and do not depend on the number of factors.

b) OLS approach: The simulated coverage results for the OLS approach follow the L.B. given in (8). The coverage results are poor if µ_i² > µ̄ and good otherwise. Coverage is very good, almost 100%, when µ_i² is much smaller than µ̄ (but in such cases E(LOCI) tends to be unduly large).

c) LOG approach: The simulated coverage results for the LOG approach follow the L.B. in (9), although the bound is not sharp owing to the approximations employed in its computation. The coverage results are reasonable and well concentrated but, like GLM, well below the desired level. They are a decreasing function of the number of replications r and the number of treatment combinations k; the decreasing pattern with respect to r and k is visible from the LOG columns of Tables 2 and 4.

3.3 General observations regarding E(LOCI)
The theoretical E(LOCI) reported in Section 2.2 and the simulated E(LOCI) reported in Table 3 for 2² and in Table 5 for 2³ experiments with r = 2 and 3 replications reveal the following facts.

a) GLM approach: The simulated E(LOCI) for GLM match exactly those given in (4). The E(LOCI) depend only on the value of the particular component µ_i of µ and are an increasing function of that component. Moreover, they are a decreasing function of the number of replications r and do not depend on the number of factors.

b) OLS approach: The simulated E(LOCI) for the OLS approach depend on all components of µ and are the same for all components, but they are smaller than the ones obtained in (5). Large components of µ greatly inflate the E(LOCI) of the other components through µ̄. For example, for a 2² experiment with r = 2, the same value µ_i = 25.79 has E(LOCI) = 38.786 for µ = (25.79, 2.7, 5.7546, .738) with µ̄ = 75.69, while for µ = (5.584, .4724, 25.79, .388) with µ̄ = 356.27 it is 7.4324. In exactly the same manner, for µ = (7.389, 7.389, 7.389, .353) the component .353 pulls down the E(LOCI) for 7.389 but results in a poor coverage probability of 87.44. The problem basically arises because OLS uses a single pooled estimate S_y² of variability for all components. A very good approximation, E(LOCI) ≈ 3.2 √µ̄, obtained through regression on the simulated E(LOCI), is fairly consistent with the expression obtained in (5).

c) LOG approach: The simulated E(LOCI) for the LOG approach are larger than the ones obtained in (6). Like GLM, they depend only on the value of the particular component µ_i of µ and are increasing in µ_i. Regression of the simulated E(LOCI) on µ_i for a 2² experiment with r = 2 gave the relation E(LOCI) = 3 µ_i for µ_i ≤ 5 and E(LOCI) = −.8 + 4.2 µ_i for µ_i > 5. This is not very close to the expression obtained in (6), possibly because of the approximations used in its computation. Similar linear relationships were observed for r = 3 and for 2³ experiments.

3.4 Comparison of the three approaches
1. The coverage results of the OLS approach are the most unstable.
2. For both the GLM and LOG approaches the coverage results are well below the desired level. The coverage results for GLM are uniformly inferior to those of LOG for two replications and uniformly superior to those of LOG for three replications.
3. The E(LOCI) for GLM are uniformly smaller than those for the LOG approach.

Thus the LOG approach outperforms GLM for two replications, which is the most typically encountered situation in industrial experiments, but these observations are indecisive when attempting a choice between the GLM and LOG approaches. In the next section we suggest a modification to the CI for the GLM approach that shows a decisive improvement over the GLM approach at the desired confidence coefficient.

3.5 Proposed modification to the GLM approach
The proposed modification to the GLM-based CI is deliberately designed to achieve a coverage probability exactly equal to the desired confidence coefficient. For a given level α, a number δ can be selected so that

    Pr(r exp(−δ Z_{α/2}/√r) ≤ Σ_{j=1}^r Y_ij/µ_i ≤ r exp(δ Z_{α/2}/√r)) = 1 − α.

It is found that, at α = 0.05,

    δ = 1.248 for r = 2,   1.1566 for r = 3.                          (10)

The values of δ can be well approximated as a function of r by δ ≈ 0.9865 + 0.5252/r. The values of δ in (10) suggest the CI limits Ȳ_i· exp(±1.7296) for r = 2 and Ȳ_i· exp(±1.3088) for r = 3. Consequently the E(LOCI) is given by

    E(LOCI) = 5.463 µ_i for r = 2,   3.436 µ_i for r = 3.             (11)

Although these expected widths are a little larger than those of the GLM approach, they are much smaller than those of the LOG approach and maintain the coverage probability exactly equal to 0.95; hence the proposed modification outperforms the above methods and is recommended for use. This modification will be denoted by M-GLM henceforth.

Panels (a) and (b) of Figures 1 and 2 display box plots of the coverage results for the OLS, GLM, M-GLM and LOG approaches with r = 2 and r = 3, respectively, based on Tables 2 and 4. Box plots of the corresponding E(LOCI), based on Tables 3 and 5, are displayed in panels (c) and (d) of these figures.
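The calibration of δ is straightforward to reproduce numerically; the sketch below (our own, with a bracketing interval we chose) solves the coverage equation using the Gamma representation of Σ_j Y_ij/µ_i.

    import numpy as np
    from scipy import stats, optimize

    def mglm_coverage(delta, r, alpha=0.05):
        """P(r e^{-delta z/sqrt(r)} <= G <= r e^{delta z/sqrt(r)}),
        G ~ Gamma with shape r and scale 1."""
        z = stats.norm.ppf(1 - alpha / 2)
        c = delta * z / np.sqrt(r)
        return stats.gamma.cdf(r * np.exp(c), a=r) - stats.gamma.cdf(r * np.exp(-c), a=r)

    for r in (2, 3):
        # choose delta so that the coverage equals 1 - alpha = 0.95 exactly
        delta = optimize.brentq(lambda d: mglm_coverage(d, r) - 0.95, 0.5, 3.0)
        c = delta * stats.norm.ppf(0.975) / np.sqrt(r)
        eloci = np.exp(c) - np.exp(-c)                 # E(LOCI)/mu_i for the M-GLM interval
        print(r, round(delta, 4), round(eloci, 4))     # roughly 1.248, 5.46 and 1.157, 3.43; cf. (10), (11)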

The values of δ can be well approximated as a function of r as δ =.9865 +.5252 r The values of δ in (8) suggest the CI limits be as Ȳi.exp(±.7296) for r=2 and Ȳi.exp(±.388) for r=3. Consequently the E(LOCI) is given by { 5.463µi for r=2 E(LOCI) = 3.436µ i for r=3 (9) Although, these expected widths are a little larger than those of GLM approach they are much smaller than those of LOG approach and maintain the coverage probability exactly equal to.95 and hence the proposed modification to GLM outperform the above methods and hence recommended for the use. This modification will be denoted by M-GLM henceforth. Panels (a), (b) of Figures and 2, display the box plots of coverage results for OLS, GLM, M-GLM and LOG approaches with r=2 and r=3 respectively based on Tables 2 and 4. The box plots of the corresponding E(LOCI) based on Tables 3 and 5 are displayed in the panels (c) and (d) of these figures. 4.Comparison of three approaches based on the powers of the tests for significance of factorial effects In this section, we compare the three approaches based on the power functions of tests for testing significance of factorial effects. 4. The tests under consideration Note that, for OLS and LOG approach, the regression coefficients in the respective models are scalar multiples of the factorial effects that are commonly used in the analysis of factorial experiments, so that, testing significance of the regression coefficients using t-test is equivalent to testing significance of the underlying factorial effect with usual F-test. For GLM approach two types of tests can be employed: t-test and deviance test. 4

In all the three approaches, t statistic for the hypotesis H : β i = is given by, t = ˆβ i /SE( ˆβ i ), carries n k d.f. and H is rejected if t > t (n k),α/2. Under OLS, ˆβ i are linear functions of observations of the type λ y and SE( ˆβ i ) is given by ˆσ λ 2 i where ˆσ is estimated by square root of residual sum of squares. For LOG a parallel procedure is adopted with Y replaced by the log transformed variable W. For GLM approach ˆβ i is m.l.e. of β i obtained through an iterative procedure (it has closed form expression for the saturated model given in Section 2.) and its standard error is the square root of i th diagonal element of the inverse information matrix, given by ((a(φ)) 2 (X V X) ) which for exponential response under saturated model of Section 2 equals /n. The deviance test for GLM is based on the difference between the deviances for the two nested models; the full model and the reduced model that is obtained under H by eliminating the i th column of the coefficient matrix X. The deviance of a fitted model is defined as D = 2[lnL(f itted model) lnl(saturated model)](a(φ)) For testing H : β i =, the deviance statistic takes the form F = D(β i β (i) )/(df 2 df ) D(β)/df 2 For exponentially distributed response k D(β) = 2r( lnȳi.+k), i= k r ( ) D(β i β (i) ) = 2 (y ij ˆµ i )/ ˆµ i ln(y ij / ˆµ i ) i= j= and β (i) is the vector β with i th component removed and ˆµ i is the m.l.e of µ i for the reduced model. These four test procedures henceforth will be denoted by t-ols, t-log, t-glm and DEV respectively. 5

4.2 The Power functions The power functions of the above four test procedures were simulated for a 2 2 factorial experiment with r=2 and 3 replications for each of the three hypotheses H : β =, H 2 : β 2 =, H 3 : β 2 =, which respectively correspond to the two main effects and the interaction between them. Power functions of each of these three hypotheses were exactly same, hence, here we have presented only those for H : β =. Noting that, the models under OLS and GLM are different and the coefficient β i represents respective effect on a linear scale for OLS and on logarithmic scale for GLM and LOG approaches, the power functions are generated under the following three situations, () Fixing β = (β, β 2, β 2 ) at (,, ) and (,, 2) and varying β under GLM model for a 2 2 experiment with r=2 and 3 replications. For this, 5 simulations each of Y exponential(µ) were generated from µ = exp(x β), with β varying systematically between -4 to 4 with increment of.. The power under consideration is simulated as the proportion of times the respective test rejects H among these 5 simulations. The corresponding Power functions are plotted in Figures 3(a) and 3(b) for two replications and in Figures 3(c) and 3(d) for three replications. (2)Fixing (β, β 2, β 2 ) at (6.,, ) and (7., 2, ) and varying β under OLS model as above. The corresponding power functions are plotted in Figures 4(a) and 4(b) for two replications and in Figures 4(c) and 4(d) for three replications. Here the component β had to be adjusted so as to ensure µ = Xβ non-negative. (3)For a few sets of µ considered in the first column of Table 2, taking β = A log(µ) for GLM, β varying systematically as above. The Power functions of these tests are plotted in Figures 5(a) and 5(b) respectively for 6

µ = [25.793, 2.7, 5.7546,.738] and µ = [.9375,.875,.625,.25] for two replications and similarlly in 5(c) and 5(d) for three replications. 4.3 Observations based on Power functions and recommendations From all the simulated Power functions it is clear that, ()t-log and t-glm are having best power functions among the four tests and have small type-i error rates, GLM being a little more superior to LOG. (2)The power functions of t-ols are very small while the Type-I error rates are large. (3)The deviance test is quite less powerful but has small type-i error rates. These observations suggest that t-test under LOG is equally best as under GLM approach and hence both are recommended for testing significance of factorial effects. The OLS has worst performance with respect to expected coverage, length of CI and power in the entire parameter space and hence should be strictly avoided. Further study is demanded regarding the unsaturated models and is in the progress. REFERENCES [] P. Feigl and M. Zelen, Estimation of exponential survival probabilities with concomitant information, Biometrics. 2, (965), pp. 826-838. [2] P. McCullagh and J. Nelder, Generalized Linear Models (2 nd ed.), Chapman and Hall, London, (989). [3] R. Myers and D. Montgomery, A Tutorial on Generalized Linear Models, Journal of Quality Technology 29(3) (997), pp. 274-29. [4] S. Lewis, D. Montgomery, and R. Myers, CI Coverage for Designed Experiments Analyzed with GLMs, Journal of Quality Technology 33(3) (2), pp. 279-29. [5] S. Lewis, D.Montgomery, and R. Myers, Examples of Designed Experiments with Nonnormal Responses, Journal of Quality Technology 33(3) (2), pp. 265-278. 7

Table 2: Simulated coverage results for the OLS, GLM, M-GLM and LOG approaches for a 2² factorial experiment with r = 2 and 3 replications (for each replication level, the columns give the simulated coverage of the four methods and the lower bound (8) for OLS).
µ OLS GLM M-GLM LOG L.B. Cov_OLS (r=2) | OLS GLM M-GLM LOG L.B. Cov_OLS (r=3)
.3 9 95.2 92.86 92.6 95.2 9.8.83 9.74 95 93.54 99.99 92.24 95.4 9 99.99 76.34 9.36 94.66 93.3 74.6 78.32 92.44 95.34 9.2 62.39 76.8 9. 95.8 93.24 74.6 78.28 9.88 94.8 9.4 62.39.2865 98.7 9.2 95.6 93.38 99.95 99.88 92.6 94.78 9.26 99.93.235 99.98 9.56 95.8 93.48 9.98 95. 9.94.9 9.8 94.42 93.48 92.24 95.8 9.6 9.4877 55.42 9.74 94.72 92.7 48.6 58.48 92.42 95.74 9. 24.85 7.389 87.44 9 95.4 94.48 82.7 87.7 92.68 95.8 9.24 74.93 7.389 87.58 9.2 95.64 93.66 82.7 88.62 92.94 95.58 9.74 74.93 7.389 87.96 9.24 94.84 93.5 82.7 88.22 92.66 95.36 9.88 74.93.353 9.88 95.8 93.36 99.99 92.42 95.22 9.34 99.99 25.79 58.5 9.42 94.74 92.72 5.89 6.96 92. 94.98 9.2 28.8 2.7 99.2 9.36 95. 93.44 99.67 99.8 92.2 95.24 9.82 99.52 5.7546 93.44 9.52 94.64 93.28 97.55 96.9 92.68 95.5 9.74 96.46.738 99.98 9 95.8 93.2 92.54 95.32 9.68 2.83 9.28 9.52 95.72 93.62 93.9 93.84 92.28 95.34 9.5 9.8 4.487 98.68 9.32 95.36 93.6 99.8 99.56 92.3 94.82 9.8 98.8 33.6 64.26 9.72 95.3 93.46 55.2 64.88 92.52 95.2 9.2 34.8.388 9.62 95.6 93.46 9.6 94.64 9.62 5.58 59.7 9.9 95.36 93.52 5.57 6.2 93 95.7 9.2 28.35.4724 9.58 95.66 92.2 92.52 95.9 9.42 25.79 92.24 9.2 95.8 93.66 97.54 96. 9.76 94.58 9.32 96.43.388 89.8 95.4 92.88 92.3 95.28 9.92.9375 93.68 9.8 95.4 92.82 88.7 95.28 92.6 95.52 9.7 83.63.875 94.84 9.96 95.2 93.7 9.6 95.54 92.8 95 9.84 85.74.625 9.6 9.44 95. 93.6 85.5 92.9 92.42 95.34 9.4 78.98.25 9.6 9.48 94.98 93.42 83.74 92.2 92. 94.96 9.7 76.43 5 94.66 9.98 95 93.38 88.7 95.6 9.88 94.86 9.8 83.63 4 95.36 9.8 95.22 93.52 9.6 95.98 92.2 95 9.2 85.74 7 92.68 9.92 95.8 93.34 85.5 92.92 92.4 95.24 9.46 78.98 8 9.72 9.84 95.22 93.58 83.74 9.4 92. 95.6 9.34 76.43 26 98.78 9.68 95.6 93.58 97.2 99.68 92.6 95.8 9.4 95.93 39 96.38 9.46 95.4 93 93.69 97.78 92.54 95.42 9.6 9.85 65 88.34 9.6 95.56 93.78 82.47 89. 9.72 94.8 9.6 74.59 78 83.76 9.9 95.3 93.58 74.76 83.2 9.28 94.42 9.2 63.4

Table 3: Simulated E(LOCI) for the OLS, GLM, M-GLM and LOG approaches for 2² factorial experiments with r = 2 and 3 replications.
µ OLS GLM M-GLM LOG (r=2) | OLS GLM M-GLM LOG (r=3)
.3 2.265..2.4.636....83 2.265.69..279.636.5.63.7 2.265 3.68 5.363 2.944.636 2.762 3.42 3.686 2.265 3.724 5.426 2.586.636 2.766 3.47 3.657.2865 2.98.6.55 3.8.22.8.99..235 2.98.9.3.3.22.7.8.9.9 2.98...2.22... 9.4877 2.98 35.44 5.64 25.78.22 26.49 32.72 36.25 7.389 2.57 27.69 4.35 99.32 5.5 2.8 25.69 27.82 7.389 2.57 27.67 4.3 4.59 5.5 2.23 24.99 27.39 7.389 2.57 27.65 4.28.8 5.5 2.42 25.22 27.53.353 2.57.5.74.92 5.5.38.46.5 25.79 38.8 95.46 39.8 334.66 28.99 7.9 88.83 94.77 2.7 38.8 7.9.52 27.29 28.99 5.89 7.28 8.3 5.7546 38.8 2.42 3.2 8.53 28.99 6.4 9.82 2.27.738 38.8.65.95 2.6 28.99.49.6.64 2.83 54.67 46.93 68.38 5.52 4.6 33.9 4.89 45.5 4.487 54.67 6.66 24.27 56.75 4.6 2.44 5.37 6.62 33.6 54.67 25.4 82.7 423.58 4.6 92.87 4.72 25.9.388 54.67.5.2.48 4.6..3.2 5.58 7.43 438.96 639.53 633.55 29.4 323.7 399.7 42.7.4724 7.43.79 2.6 5.6 29.4.3.6.74 25.79 7.43 99.42 44.85 329.46 29.4 7.36 88.4 97.88.388 7.43.5.2.53 29.4..3.4.9375 3.49 3.5 5. 2.46 2.49 2.66 3.29 3.56.875 3.49 3.29 4.8 2.5 2.49 2.44 3. 3.33.625 3.49 4.2 5.86 4.5 2.49 2.95 3.64 3.97.25 3.49 4.6 6.5 4.55 2.49 3. 3.84 4.22 5 56.64 55.29 8.56 232.46 39.36 4.37 5. 55.86 4 56.64 52.6 76.65 8.52 39.36 38.67 47.77 52.67 7 56.64 64.2 93.4 224.95 39.36 47.6 58.25 62.62 8 56.64 68.9 99.35 255.4 39.36 49.93 6.68 67.26 26 9.66 96.4 4.46 335.27 35.44 72.5 89. 96.7 39 9.66 48.8 25.89 57.7 35.44 7.33 32.57 44. 65 9.66 242.92 353.93 923.4 35.44 79.54 22.77 244.39 78 9.66 29.86 423.77 8.59 35.44 24.86 265.4 29.4

Table 4: Simulated coverage results for the OLS, GLM, M-GLM and LOG approaches for 2³ factorial experiments with r = 2 and 3 replications.
µ L.B. Cov_OLS OLS GLM M-GLM LOG (r=2) | L.B. Cov_OLS OLS GLM M-GLM LOG (r=3)
.5 9.38 95.8 92.8 99.96 9.8 95.2 89.38 99.95.65 98.2 9.24 95.72 9.28 93.36 98.92 92.36 95.8 89.92 92.4.75 97.6 9.4 94.9 9.78 9.5 98.8 92.5 95.46 89.5 89.53.85 96.76 9.46 95.64 92.2 88.64 97.2 9.92 94.9 89.62 86.56.25 9.6 9.4 95.2 9.3 75.43 9.72 9.98 95.34 89.6 7.93.5 92.5 9.78 95.42 9.54 79.2 92.76 9.78 94.74 89.3 75.39.5 84.84 9.94 95.36 9.4 64.62 85. 92.6 95.72 89.2 58.3.65 8.7 9.5 95.26 9.88 57.9 8.68 9.5 94.92 89.7 49.34.9375 95.4 9.64 95.46 9.78 85.4 96.3 92.4 95.6 9.2 82.3.875 96.8 9.56 95.26 9.56 86.97 97.28 92.22 94.94 88.96 84.58.625 92.68 9.84 95.22 9.54 8.78 94.52 92.72 95.22 89.58 77.26.25 92.54 9.44 95.76 92.6 78.46 92.82 92.22 94.88 89.72 74.5.8756 96.8 9.2 95.26 9.5 86.95 97.2 92.6 95.8 89 84.56.678 9.5 95.24 92.6 99.92 92.6 95.8 89.74 99.9.567 83.48 9.44 95.86 9.78 58.2 82.2 92.6 95.42 9.6 5.54.254 89.82 9.68 95.46 92. 73.23 9.78 9.88 94.66 88.7 68.32 2.825 93.76 9.48 95.36 9.76 93.3 95.4 92 94.9 89.44 86.66 5.2345 89.8 9.78 95.3 9.82 93.2 9.36 9.5 94.9 89.26 79.3 5.3274 99.44 9.54 95.32 9.74 93.3 99.72 92.8 95.38 9.6 97.45 4.487 99.44 9.6 95.4 9.42 93.5 99.84 92.3 95.26 89.4 98.9 33.55 63.56 9.2 95.58 9.74 93.3 62.34 9.98 94.7 89.2.39 2.342 8. 9.62 95.42 92.8 93 8.4 92.42 95.6 89.42 59.9.32 9.8 95.52 92.4 92.98 92.8 95.8 9.8.5439 9.72 95.22 9.6 93.8 9.4 94.66 89.8 99.97.3 9.62 95.4 9.64 92.28 95.8 9.6 2.5346 85 9. 95.48 9.54 76.4 87.56 92.62 95.48 89.28 72.7.2354 97.4 89.7 95.6 9.24 94.39 98.8 9.7 94.7 89.4 93.37 4 69.66 9.9 95.78 92.4 4.22 7.8 92.7 95.28 89.82 3.45.83 9.6 95.28 9.32 9.98 94.78 88.48 98.52 9.74 95.48 92.4 96.33 99.8 92.32 95.8 89.58 95.65 4 69.44 9.8 95.54 9.84 4.22 69.84 92.6 95.6 89.36 3.45.23 9.4 95.48 9.28 9.92 94.98 89.4

Table 5: Simulated E(LOCI) for the OLS, GLM, M-GLM and LOG approaches for 2³ factorial experiments with r = 2 and 3 replications.
µ OLS GLM M-GLM LOG (r=2) | OLS GLM M-GLM LOG (r=3)
.5 3.25.9.29.32 2.54.4.8.6.65 3.25 2.43 3.79 4.24 2.54.78 2.35 2..75 3.25 2.82 4.4 4.9 2.54 2.9 2.75 2.43.85 3.25 3.9 4.98 5.79 2.54 2.36 3. 2.76.25 3.25 4.6 7.8 8. 2.54 3.5 4.62 4.7.5 3.25 4.35 6.79 7.59 2.54 3.24 4.28 3.72.5 3.25 5.66 8.84 9.94 2.54 4.2 5.43 4.82.65 3.25 6.24 9.74.68 2.54 4.6 6.7 5.28.9375 3.2 3.53 5.52 6.43 2.44 2.59 3.42 3.3.875 3.2 3.26 5. 6.8 2.44 2.4 3.8 2.8.625 3.2 3.99 6.23 7.7 2.44 2.99 3.94 3.45.25 3.2 4.32 6.74 7.62 2.44 3. 4. 3.68.8756 3.2 3.32 5.8 5.78 2.44 2.43 3.2 2.83.678 3.2.25.4.46 2.44.9.25.22.567 3.2 5.9 9.22.45 2.44 4.35 5.74 5.4.254 3.2 4.69 7.3 8.29 2.44 3.48 4.59 4.2 2.825 44.43 45.45 7.95 8.56 34.85 34.2 44.87 39. 5.2345 44.43 57.69 9.4 3.24 34.85 42.7 56.3 48.78 5.3274 44.43 2.4 3.28 35.9 34.85 4.84 9.57 7.26 4.487 44.43 6.83 26.28 3.24 34.85 2.4 6.35 4.29 33.55 44.43 26.64 97.68 27.79 34.85 9.6 2.22 7.48 2.342 44.43 79.54 24.6 4.35 34.85 59.3 77.98 68.48.32 44.43..8.9 34.85.8...5439 44.43 2.3 3.7 3.5 34.85.52 2..76.3 6.35..2.2 5.8... 2.5346 6.35 9.43 4.72 7.28 5.8 7. 9.24 8.4.2354 6.35 4.6 7.9 8.2 5.8 3.43 4.53 3.96 4 6.35 5.8 23.53 26.33 5.8.2 4.66 3.28.83 6.35.7..2 5.8.5.7.6 6.35 3.7 5.79 6.49 5.8 2.8 3.7 3.26 4 6.35 4.89 23.25 26.85 5.8.2 4.78 3.9.23 6.35... 5.8...

Fig. 1: Box plots of the coverage results and E(LOCI) for 2² factorial experiments under the OLS, GLM, M-GLM and LOG approaches. Panels: (a) coverage for r = 2; (b) coverage for r = 3; (c) E(LOCI) for r = 2; (d) E(LOCI) for r = 3. Each panel plots the quantity against METHOD (GLM, LOG, M-GLM, OLS).

Fig. 2: Box plots of the coverage results and E(LOCI) for 2³ factorial experiments under the OLS, GLM, M-GLM and LOG approaches. Panels: (a) coverage for r = 2; (b) coverage for r = 3; (c) E(LOCI) for r = 2; (d) E(LOCI) for r = 3. Each panel plots the quantity against METHOD (GLM, LOG, M-GLM, OLS).

Figure 3: Power functions of the four tests (t-OLS, t-LOG, t-GLM, DEV) for H_1: β_1 = 0 when β_1 is varied systematically under the GLM model, with (β_0, β_2, β_12) fixed at the two settings of Section 4.2. Panels: (a), (b) the two settings with r = 2; (c), (d) the same settings with r = 3. Each panel shows power versus β_1.

Figure 4: Power functions of the four tests (t-OLS, t-LOG, t-GLM, DEV) for H_1: β_1 = 0 when β_1 is varied systematically under the OLS model, with (β_0, β_2, β_12) fixed at the two settings of Section 4.2. Panels: (a), (b) the two settings with r = 2; (c), (d) the same settings with r = 3. Each panel shows power versus β_1.

Figure 5: Power functions of the four tests for H_1: β_1 = 0 for effects generated from µ selected from Table 2, with β_1 varied systematically. Panels: (a) µ = [25.793, 2.7, 5.7546, .738], r = 2; (b) µ = [.9375, .875, .625, .25], r = 2; (c), (d) the same µ vectors with r = 3. Each panel shows power versus β_1 for t-OLS, t-LOG, t-GLM and DEV.

Table 6: Design matrix and survival times of leukemia patients.
x1 x2 x1x2 | Survival time of leukemia patients (Y)
4 - - 8 - - 65 - - 65 43 - - 5 - - 7 - -

Table 7: LOCI for the real data set (Y, then lower limit LL, upper limit UL and LOCI for each of OLS, GLM, M-GLM and LOG).
Y LL UL LOCI (OLS) LL UL LOCI (GLM) LL UL LOCI (M-GLM) LL UL LOCI (LOG)
4-66.25 3.25 79.5 5.88 93.97 88.9 4.7 32.5 28.33.58 297.69 297. 8-33.25 46.25 79.5 4.3 225.92 2.79.2 38.57 38.55.2 527.46 526.44 65-53.75 25.75 79.5 9. 43.95 34.94 6.38 22.98 96.6.94 484.7 483.24 65-7.25 72.25 79.5 2.63 329.88 39.25 4.63 465.7 45.54 3.55 83. 826.46 43-66.25 3.25 79.5 5.88 93.97 88.9 4.7 32.5 28.33.58 297.69 297. 5-33.25 46.25 79.5 4.3 225.92 2.79.2 38.57 38.55.2 527.46 526.44 7-53.75 25.75 79.5 9. 43.95 34.94 6.38 22.98 96.6.94 484.7 483.24-7.25 72.25 79.5 2.63 329.88 39.25 4.63 465.7 45.54 3.55 83. 826.46