A Note on Adaptive Box-Cox Transformation Parameter in Linear Regression

Size: px

Start display at page:

Download "A Note on Adaptive Box-Cox Transformation Parameter in Linear Regression"

Spencer Montgomery
5 years ago
Views:

1 A Note on Adaptive Box-Cox Transformation Parameter in Linear Regression Mezbahur Rahman, Samah Al-thubaiti, Reid W Breitenfeldt, Jinzhu Jiang, Elliott B Light, Rebekka H Molenaar, Mohammad Shaha A Patwary Joshua A Wuollet Minnesota State University, Mankato, MN 56001, USA Correspondence Author: Mezbahur Rahman Abstract: The Box-Cox transformation is a well known family of power transformations that brings a set of data into agreement with the normality assumption of the residuals and hence the response variable of a postulated model in regression analysis. In this paper we use six different data sets to implement adaptive maximum likelihood Box-Cox transformation parameter estimation in regression analysis. In addition, we perform random permutation and Monte-Carlo simulation to investigate the performances of the adaptive method. Key Words: Normality tests; Shapiro-Wilk W statistic. I I. INTRODUCTION n regression analysis, often a key assumption regarding normality of the error variable and hence the response variable are violated. The commonly used remedy is the Box- Cox family of power transformations (Box and Cox (196)). The process is to select a parameter in the Box-Cox transformation which maximizes the normal likelihood using the data at hand and then apply regression analysis on the transformed response variable. In practice, the regression model parameters are usually estimated separately after the necessary Box-Cox power transformation parameter is selected. In literature, the estimation procedures of the Box-Cox power transformation parameter are considered by many authors. The notable ones are the normal likelihood method of Box and Cox (196), the robustified version of the normal likelihood method of Carroll (1980) and of Bickel and Doksum (1981), the transformation to symmetry method of Hinkley (1975), the quick estimate of Hinkley (1977) and of Taylor (1985). Lin and Vonesh (1989) constructed a nonlinear regression model which is used to estimate the transformation parameter such that the normal probability plot of the data on the transformed scale is as close to linearity as possible. Following the footsteps of Box and Cox (1982) and Lin and Vonesh (1989), Halawa (1996) considered the power transformation parameter estimation procedure using an artificial regression model which gives the estimates with very small variabilities compared to the normal likelihood procedure. Halawa (1996) conducted an exhaustive comparative study with normal likelihood procedure. In that study, he also considered estimation procedures of the location and the scale parameters in the likelihood. Rahman (1999) introduced a method of estimating the Box-Cox power transformation parameter using maximization of the Shapiro-Wilk W (Shapiro and Wilk (1965)) statistic along with a comparison study of the normal likelihood method (Carroll (1980)) of the artificial regression model method (Halawa (1996)). Rahman and Pearson (2008) showed that their adaptive procedure s performance is better than other alternatives in obtaining maximum likelihood estimation procedures. In this paper the estimation procedure for the Box-Cox power transformation parameter is considered using adaptive maximization of the normal likelihood along with the Newton-Raphson root finding method explicitly in multiple regression context. The motivation of the paper is given in the following section. Shapiro-Wilk W statistic p-values are computed before and after implementation of the transformation parameter. Application data sets are used and comparisons are made using random permutations for normal and non-normal samples using Monte-Carlo simulations. II. MOTIVATION Often Box-Cox transformations are implemented for the dependent variable in regression analyses when residuals are found non-normal. Then models are fitted using the transformed data and if normality is checked for residuals again, often those are found non-normal again. To investigate this dilemma, several data sets are considered. Recently, Rahman and Pearson (2008) showed that adaptive maximum likelihood performs better than two other ways of implementing maximization procedure. But in their simulations, mostly univariate data are considered. Here, we consider a multiple regression environment so that practical implementation in regression can be investigated more closely. III. BOX-COX TRANSFORMATION Page 1

2 Let Y,Y,,Y be a random sample of size n from a 1 2 n population whose functional form is unknown. Box and Cox (196) suggested that if the transformation Y Y λ 1 = λ ; λ 0 ln(y); λ= 0 (1) and l σ 2 β= β, σ2 = σ 2 = 0 σ 2 = 1 ( ) n Y -X β ( Y -X β). (5) Then the pseudo log-likelihood l =l(λ;y 1,y 2,,y n ) can be written as is performed on the data then Y will have an approximate normal distribution with mean μ and variance σ 2. In equation (1), λ is unknown and considered as the Box-Cox power transformation parameter and ln represents the natural logarithm. IV. ADAPTIVE NEWTON-RAPHSON METHOD IN MULTIPLE REGRESSION Let us consider Y=Xβ+ε to be a usual multiple regression model, where Y is an n 1 vector of dependent variable measurements, X is an n p matrix of independent variable measurements, β is a p 1 vector of unknown regression parameters ε is an n 1 vector of error variable. The least square estimate of β is β=(x'x) 1 X'Y. Residuals are computed as e=y Ŷ. If e does not follow normal distribution, Box-Cox transformation on Y is implemented as in (1). After applying the transformation, the density function of e in terms of y can be written as The steps for maximizing l using the Newton-Raphson method are as follows: y λ i 1 where P i =y i = λ, Q i = U i =ŷ =x i ' β=x i '(X'X) 1 X'P i y i λ = λ(lny )y λ y λ i i i +1 λ 2, where x' is the corresponding row of the X matrix for a given y. For a fixed λ, the log-likelihood function l A =l(x'β,σ 2 ;y 1,y 2,,y n ) stands for adaptive, can be written as, where the subscript A Equation () is maximized when the partial derivatives of () with respect to β and σ 2 are equated to zero and the solution to the resulting system is found. This leads to solving the following two equations, l β β= β, σ 2 = σ 2 = 0 β=(x X) -1 X Y () where h'( λ)= n λ (t+1) = λ (t) h( λ(t) ) h'( λ (t),(8) ) n ( P U i i ) ( ) n 2 ( P U i i) 2 R W + i i n ( ) P i U i ( ) n ( ) n ( ) Q i V i 2 Q i V i 2 P i U i 2 2, Page 2

3 where R i = V i = ŷ λ = x i ' β λ = x i (X X)-1 X Q i, W = 2 ŷ 2 x ' β i i λ 2 = λ 2 = x i (X X)-1 X R. i 2 y λ 2 (lny ) 2 y λ i i λ 2 = i 2 λ(lny )y λ +2y λ i i i 2 λ, and The computation algorithm starts with an initial value of λ (an obvious choice is 1) iteratively using (8) and then β and σ 2 are obtained using () and (5) by substituting λ for λ, if desired. V. APPLICATION We consider six different data sets to investigate Box-Cox transformation effects in linear regression models. For each data set, we decide on competing models after implementing usual model selection procedures. For each competing model, Shapiro-Wilk W ' statistic and its p-value are computed for the residuals using the exclusive Monte-Carlo procedure given by Rahman and Pearson (2000). The Box-Cox transformation parameter is estimated using the adaptive method described in section. Then the Shapiro-Wilk W ' statistic and its p-value are computed for the residuals using the exclusive Monte- Carlo procedure given by Rahman and Pearson (2000) for the transformed data. We performed simulations to investigate the effect of the estimate of λ using the adaptive method described in section. We considered 1000 random permutations of the dependent variable and computed the Shapiro-Wilk W ' statistic and p- values for transformed and non-transformed data. We recorded the number of increases in p-values after transformation indicating closer to normal compared to prior to transformation as given below in tables for respective data. We also recorded the number of changes in decisions to accept or reject normality after transformation. We also considered simulated residuals to generate dependent variable values using standard uniform, standard normal, standard exponential a mixture of normal distributions which represent a bimodal distribution and then computed similar results to what we did for random permutations. Abreviations used in tables below are as follows: RR RA AR AA NI Reject prior transformation and reject after transformation Reject prior transformation and accept after transformation Accept prior transformation and reject after transformation Accept prior transformation and accept after transformation Number of p-values increased after transformation Note that in all rejections of normality, 10% level of significance is considered Jet Turbine Engine Thrust Data Table 1: Jet Turbine Engine Thrust Data (Table B.1, Montgomery et al., 2012, Page 566) y x 1 x x 5 x 6 y x 1 x x 5 x Page

4 where, y: Thrust (mechanical force generated by the engines) (lbs), x 1 : Primary speed of rotation (rpm) : Secondary speed of rotation (rpm), x : Fuel flow rate (lbs/min), : Pressure (lbs/in 2 ), x 5 : Exhaust temperature ( F) x 6 : After exhaustive analyses two different competing models are determined. includes variables x 1, x,, x 5 x 6. includes variables x 1, x, x 5 x 6. In both the models, intercept is included. Ambient temperature at time of test ( F) Table 2: Jet Turbine Engine Thrust Data Results Hald Cement Data Table : Jet Turbine Engine Thrust Data Simulation Results Random Permutation Random Permutation Uniform (0,1) Uniform (0,1) Normal (0,1) Normal (0,1) Exponential (1) Exponential (1) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) Table : Hald Cement Data (Table B.21, Montgomery et al., 2012, Page 57) y x 1 x y x 1 x y x 1 x where, y: heat evolved in calories/gram of cement, x : 1 percentage of tricalcium aluminate, x : perentage of 2 tetracalcium silicate, x : percentage of tetracalcium aluminoferite, x : percentage of dicalcium phosphate. After exhaustive analyses two different competing models are determined. includes variables x 1. includes variables x 1 and. In both the models, intercept is included. Table 5: Hald Cement Data Results Page

5 Table 6: Hald Cement Data Simulation Results Random Permutation Random Permutation Uniform (0,1) Uniform (0,1) Normal (0,1) Normal (0,1) Exponential (1) Exponential (1) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) Solar Thermal Energy Test Data Table 7: Solar Thermal Energy Test Data (Table B.2, Montgomery et al., 2012, Page 555) y x 1 x x 5 y x 1 x x where, y: total heat flux (kwatts), x : insolation (watts/m 2, x : 1 2 position of focal point in east direction (inches), x : position of focal point in south direction (inches), x : position of focal point in north direction (inches) x : times of day. 5 After exhaustive analyses two different competing models are determined. includes variables x 1, x, x 5. includes variables x 1, x and. Model includes variables, x and. In all the models, intercept is included. Table 8: Solar Thermal Energy Test Data Results Model Page 5

6 Table 9: Solar Thermal Energy Test Data Simulation Results Random Permutation Random Permutation Model Random Permutation Uniform (0,1) Uniform (0,1) Model Uniform (0,1) Normal (0,1) Normal (0,1) Model Normal (0,1) Exponential (1) Exponential (1) Model Exponential (1) Model N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) Heat Treating Data Table 10: Heat Treating Data (TABLE B.12, Montgomery et al., 2012, Page 565) y x 1 x x 5 y x 1 x x where, y: Result of the pitch carbon analysis test, x : Furnace 1 Temperature, x : Duration of the carburizing cycle, x : 2 Carbon Concentration, x : Duration of the diffuse cycle x : Carbon concentration of the diffuse cycle. 5 After exhaustive analyses two different competing models are determined. includes variables x 1, x, x 5. includes variables and. In all the models, intercept is included. Table 11: Heat Treating Data Results Page 6

7 Table 12: Heat Treating Data Simulation Results Random Permutation Random Permutation Uniform (0,1) Uniform (0,1) Normal (0,1) Normal (0,1) Exponential (1) Exponential (1) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) Oil Extraction from Peanuts Data Table 1: Oil Extraction from Peanuts Data (TABLE B.7, Montgomery et al., 2012, Page 560) x 1 x x 5 y x 1 x x 5 y where, y: total oil yield, x : CO2 pressure (bar), x : CO2 1 2 temperature (in degress Celsius), x : Peanut moisture (percent by weight), x : CO2 flow rate (L/min) x : peanut particle 5 size (mm). After exhaustive analyses four different competing models are determined. includes variables x 1, x, x 5. includes variables x 1 x 5. In all the models, intercept is included. Table 1: Oil Extraction from Peanuts Data Results Page 7

8 Table 15: Oil Extraction from Peanuts Data Simulation Results Random Permutation Random Permutation Uniform (0,1) Uniform (0,1) Normal (0,1) Normal (0,1) Exponential (1) Exponential (1) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) Electronic Inverter Data Table 16: Electronic Inverter Data (TABLE B.1, Montgomery et al., 2012, Page 567) x 1 x x 5 y x 1 x x 5 y where, y: transient point of PMOS-NMOS Inverters, x : width 1 of the NMOS Device, x : length of the NMOS Device, x : 2 width of the PMOS Device, x : length of the PMOS Device, and x : a numeric vector. 5 After exhaustive analyses four different competing models are determined. includes variables x 1, x, x 5. includes variables x 1, x. In all the models, intercept is included Table 17: Electronic Inverter Data Results Page 8

9 Table 18: Electronic Inverter Data Simulation Results Random Permutation Random Permutation Uniform (0,1) Uniform (0,1) Normal (0,1) Normal (0,1) Exponential (1) Exponential (1) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) VI. CONCLUDING REMARKS When we are implementing Box-Cox transformation, we are expecting that W ' values should be increasing and hence the p-values should also be increasing. For data set 2, model of data set, data data 5, W ' values increase and hence the p-values also increase as expected. But for data set 1, models 1 and 2 of data set data set 6, the increment of W ' and p-values are reversed. In random permutation computations, in all data sets, numbers of increases in p-values are high (at least 8%) except in data sets 2 and 5 where the increases are low (at most 6%). In data set, numbers of rejections are very high after transformation even though the numbers of increases in p-values are very high, which might be due to the fact that the level of significance is 10% and the changes are not high enough to change the decisions. In data sets and 6, numbers of acceptance of normality are high after transformation as one might expect. In data sets 1, 2 5, mostly transformations were not needed. Note that in random permutations, we are permuting existing response measurements by fixing the design matrix. In simulations, we are simulating random numbers as responses for fixed design matrices. Among all data sets, only in data set 1, results are as expected, that is, higher rates of rejection in cases of exponential errors then higher rate of acceptance after transformation. In other data sets, such comments cannot be made. In data set 6, there are high numbers of rejections which were acceptances prior to transformation. In conclusion, we can summarize that in regression, results are very much dependent on the design matrix. So, in transformation parameter computations, design matrix should be taken into consideration. When Box-Cox transformation is considered, results are not dependable as one might expect. More research should be done for effective implementation of the Box-Cox transformation. BIBLIOGRAPHY [1]. Bickel, P. J. and K. A. Doksum (1981). An Analysis of Transformations Revisited. Journal of the American Statistical Association, 76, [2]. Box, G. E. P. and D. R. Cox (1982). An Analysis of Transformations Revisited (Rebutted). Journal of the Ameran Statistical Association, 77, []. Box, G. E. P. and D. R. Cox (196). An Analysis of Transformations. Journal of the Royal Statistical Society, Series B., 26, []. Carroll, R. J. (1980). A Robust Method for Testing Transformations to Achieve Approximate Normality. Journal of the Royal Statistical Society, Series B., 2, [5]. Halawa, A. M. (1996). Estimating the Box-Cox Transformation via an Artificial Regression Model. Communications in Statistics Simulation and Computation, 25(2), [6]. Harter, H. L. (1961). Expected Values of Normal Order Statistics. Biometrika, 8, 1 and 2, [7]. Hinkley, D. V. (1975). On Power Transformation to Symmetry. Biometrika, 62, [8]. Hinkley, D. V. (1977). On Quick Choice of Power Transformation. Applied Statistics, 26, [9]. Lin, L. I. and E. F. Vonesh (1989). An Empirical Nonlinear Data- Fitting Approach for Transforming Data to Normality. American Statistician,, [10]. Montgomery, D. C., E. A. Peck G. G. Vining (2012). Introduction to Linear Regression Analysis. John Wiley & Sons, Inc., New Jersey, USA. [11]. Rahman, M. and L. M. Pearson (2008). A Note on the Maximum Likelihood Box-Cox Transformation Parameter. Journal of Probability and Statistical Science, 6(2), [12]. Rahman, M. and L. M. Pearson (2000). Shapiro-Francia W Statistic Using Exclusive Simulation, Journal of the Korean Data & Information Science Society, 11(2), [1]. Rahman, M. (1999). Estimating the Box-Cox Transformation via Shapiro-Wilk W Statistic. Communications in Statistics - Simulation and Computation, 28(1), [1]. Shapiro, S. S. and M. B. Wilk (1965). An Analysis of Variance Test for Normality. Biometrika, 52, and, [15]. Shapiro, S. S., M. B. Wilk H. J. Chen (1968). A Comparative Study of Various Tests of Normality. Journal of the American Statistical Association, 6, [16]. Taylor, J. M. G. (1985). Power Transformations to Symmetry. Annals of Mathemaical Statistics,, Page 9

JEREMY TAYLOR S CONTRIBUTIONS TO TRANSFORMATION MODEL

1 / 25 JEREMY TAYLOR S CONTRIBUTIONS TO TRANSFORMATION MODELS DEPT. OF STATISTICS, UNIV. WISCONSIN, MADISON BIOMEDICAL STATISTICAL MODELING. CELEBRATION OF JEREMY TAYLOR S OF 60TH BIRTHDAY. UNIVERSITY