A Note on Adaptive Box-Cox Transformation Parameter in Linear Regression
|
|
- Spencer Montgomery
- 5 years ago
- Views:
Transcription
1 A Note on Adaptive Box-Cox Transformation Parameter in Linear Regression Mezbahur Rahman, Samah Al-thubaiti, Reid W Breitenfeldt, Jinzhu Jiang, Elliott B Light, Rebekka H Molenaar, Mohammad Shaha A Patwary Joshua A Wuollet Minnesota State University, Mankato, MN 56001, USA Correspondence Author: Mezbahur Rahman Abstract: The Box-Cox transformation is a well known family of power transformations that brings a set of data into agreement with the normality assumption of the residuals and hence the response variable of a postulated model in regression analysis. In this paper we use six different data sets to implement adaptive maximum likelihood Box-Cox transformation parameter estimation in regression analysis. In addition, we perform random permutation and Monte-Carlo simulation to investigate the performances of the adaptive method. Key Words: Normality tests; Shapiro-Wilk W statistic. I I. INTRODUCTION n regression analysis, often a key assumption regarding normality of the error variable and hence the response variable are violated. The commonly used remedy is the Box- Cox family of power transformations (Box and Cox (196)). The process is to select a parameter in the Box-Cox transformation which maximizes the normal likelihood using the data at hand and then apply regression analysis on the transformed response variable. In practice, the regression model parameters are usually estimated separately after the necessary Box-Cox power transformation parameter is selected. In literature, the estimation procedures of the Box-Cox power transformation parameter are considered by many authors. The notable ones are the normal likelihood method of Box and Cox (196), the robustified version of the normal likelihood method of Carroll (1980) and of Bickel and Doksum (1981), the transformation to symmetry method of Hinkley (1975), the quick estimate of Hinkley (1977) and of Taylor (1985). Lin and Vonesh (1989) constructed a nonlinear regression model which is used to estimate the transformation parameter such that the normal probability plot of the data on the transformed scale is as close to linearity as possible. Following the footsteps of Box and Cox (1982) and Lin and Vonesh (1989), Halawa (1996) considered the power transformation parameter estimation procedure using an artificial regression model which gives the estimates with very small variabilities compared to the normal likelihood procedure. Halawa (1996) conducted an exhaustive comparative study with normal likelihood procedure. In that study, he also considered estimation procedures of the location and the scale parameters in the likelihood. Rahman (1999) introduced a method of estimating the Box-Cox power transformation parameter using maximization of the Shapiro-Wilk W (Shapiro and Wilk (1965)) statistic along with a comparison study of the normal likelihood method (Carroll (1980)) of the artificial regression model method (Halawa (1996)). Rahman and Pearson (2008) showed that their adaptive procedure s performance is better than other alternatives in obtaining maximum likelihood estimation procedures. In this paper the estimation procedure for the Box-Cox power transformation parameter is considered using adaptive maximization of the normal likelihood along with the Newton-Raphson root finding method explicitly in multiple regression context. The motivation of the paper is given in the following section. Shapiro-Wilk W statistic p-values are computed before and after implementation of the transformation parameter. Application data sets are used and comparisons are made using random permutations for normal and non-normal samples using Monte-Carlo simulations. II. MOTIVATION Often Box-Cox transformations are implemented for the dependent variable in regression analyses when residuals are found non-normal. Then models are fitted using the transformed data and if normality is checked for residuals again, often those are found non-normal again. To investigate this dilemma, several data sets are considered. Recently, Rahman and Pearson (2008) showed that adaptive maximum likelihood performs better than two other ways of implementing maximization procedure. But in their simulations, mostly univariate data are considered. Here, we consider a multiple regression environment so that practical implementation in regression can be investigated more closely. III. BOX-COX TRANSFORMATION Page 1
2 Let Y,Y,,Y be a random sample of size n from a 1 2 n population whose functional form is unknown. Box and Cox (196) suggested that if the transformation Y Y λ 1 = λ ; λ 0 ln(y); λ= 0 (1) and l σ 2 β= β, σ2 = σ 2 = 0 σ 2 = 1 ( ) n Y -X β ( Y -X β). (5) Then the pseudo log-likelihood l =l(λ;y 1,y 2,,y n ) can be written as is performed on the data then Y will have an approximate normal distribution with mean μ and variance σ 2. In equation (1), λ is unknown and considered as the Box-Cox power transformation parameter and ln represents the natural logarithm. IV. ADAPTIVE NEWTON-RAPHSON METHOD IN MULTIPLE REGRESSION Let us consider Y=Xβ+ε to be a usual multiple regression model, where Y is an n 1 vector of dependent variable measurements, X is an n p matrix of independent variable measurements, β is a p 1 vector of unknown regression parameters ε is an n 1 vector of error variable. The least square estimate of β is β=(x'x) 1 X'Y. Residuals are computed as e=y Ŷ. If e does not follow normal distribution, Box-Cox transformation on Y is implemented as in (1). After applying the transformation, the density function of e in terms of y can be written as The steps for maximizing l using the Newton-Raphson method are as follows: y λ i 1 where P i =y i = λ, Q i = U i =ŷ =x i ' β=x i '(X'X) 1 X'P i y i λ = λ(lny )y λ y λ i i i +1 λ 2, where x' is the corresponding row of the X matrix for a given y. For a fixed λ, the log-likelihood function l A =l(x'β,σ 2 ;y 1,y 2,,y n ) stands for adaptive, can be written as, where the subscript A Equation () is maximized when the partial derivatives of () with respect to β and σ 2 are equated to zero and the solution to the resulting system is found. This leads to solving the following two equations, l β β= β, σ 2 = σ 2 = 0 β=(x X) -1 X Y () where h'( λ)= n λ (t+1) = λ (t) h( λ(t) ) h'( λ (t),(8) ) n ( P U i i ) ( ) n 2 ( P U i i) 2 R W + i i n ( ) P i U i ( ) n ( ) n ( ) Q i V i 2 Q i V i 2 P i U i 2 2, Page 2
3 where R i = V i = ŷ λ = x i ' β λ = x i (X X)-1 X Q i, W = 2 ŷ 2 x ' β i i λ 2 = λ 2 = x i (X X)-1 X R. i 2 y λ 2 (lny ) 2 y λ i i λ 2 = i 2 λ(lny )y λ +2y λ i i i 2 λ, and The computation algorithm starts with an initial value of λ (an obvious choice is 1) iteratively using (8) and then β and σ 2 are obtained using () and (5) by substituting λ for λ, if desired. V. APPLICATION We consider six different data sets to investigate Box-Cox transformation effects in linear regression models. For each data set, we decide on competing models after implementing usual model selection procedures. For each competing model, Shapiro-Wilk W ' statistic and its p-value are computed for the residuals using the exclusive Monte-Carlo procedure given by Rahman and Pearson (2000). The Box-Cox transformation parameter is estimated using the adaptive method described in section. Then the Shapiro-Wilk W ' statistic and its p-value are computed for the residuals using the exclusive Monte- Carlo procedure given by Rahman and Pearson (2000) for the transformed data. We performed simulations to investigate the effect of the estimate of λ using the adaptive method described in section. We considered 1000 random permutations of the dependent variable and computed the Shapiro-Wilk W ' statistic and p- values for transformed and non-transformed data. We recorded the number of increases in p-values after transformation indicating closer to normal compared to prior to transformation as given below in tables for respective data. We also recorded the number of changes in decisions to accept or reject normality after transformation. We also considered simulated residuals to generate dependent variable values using standard uniform, standard normal, standard exponential a mixture of normal distributions which represent a bimodal distribution and then computed similar results to what we did for random permutations. Abreviations used in tables below are as follows: RR RA AR AA NI Reject prior transformation and reject after transformation Reject prior transformation and accept after transformation Accept prior transformation and reject after transformation Accept prior transformation and accept after transformation Number of p-values increased after transformation Note that in all rejections of normality, 10% level of significance is considered Jet Turbine Engine Thrust Data Table 1: Jet Turbine Engine Thrust Data (Table B.1, Montgomery et al., 2012, Page 566) y x 1 x x 5 x 6 y x 1 x x 5 x Page
4 where, y: Thrust (mechanical force generated by the engines) (lbs), x 1 : Primary speed of rotation (rpm) : Secondary speed of rotation (rpm), x : Fuel flow rate (lbs/min), : Pressure (lbs/in 2 ), x 5 : Exhaust temperature ( F) x 6 : After exhaustive analyses two different competing models are determined. includes variables x 1, x,, x 5 x 6. includes variables x 1, x, x 5 x 6. In both the models, intercept is included. Ambient temperature at time of test ( F) Table 2: Jet Turbine Engine Thrust Data Results Hald Cement Data Table : Jet Turbine Engine Thrust Data Simulation Results Random Permutation Random Permutation Uniform (0,1) Uniform (0,1) Normal (0,1) Normal (0,1) Exponential (1) Exponential (1) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) Table : Hald Cement Data (Table B.21, Montgomery et al., 2012, Page 57) y x 1 x y x 1 x y x 1 x where, y: heat evolved in calories/gram of cement, x : 1 percentage of tricalcium aluminate, x : perentage of 2 tetracalcium silicate, x : percentage of tetracalcium aluminoferite, x : percentage of dicalcium phosphate. After exhaustive analyses two different competing models are determined. includes variables x 1. includes variables x 1 and. In both the models, intercept is included. Table 5: Hald Cement Data Results Page
5 Table 6: Hald Cement Data Simulation Results Random Permutation Random Permutation Uniform (0,1) Uniform (0,1) Normal (0,1) Normal (0,1) Exponential (1) Exponential (1) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) Solar Thermal Energy Test Data Table 7: Solar Thermal Energy Test Data (Table B.2, Montgomery et al., 2012, Page 555) y x 1 x x 5 y x 1 x x where, y: total heat flux (kwatts), x : insolation (watts/m 2, x : 1 2 position of focal point in east direction (inches), x : position of focal point in south direction (inches), x : position of focal point in north direction (inches) x : times of day. 5 After exhaustive analyses two different competing models are determined. includes variables x 1, x, x 5. includes variables x 1, x and. Model includes variables, x and. In all the models, intercept is included. Table 8: Solar Thermal Energy Test Data Results Model Page 5
6 Table 9: Solar Thermal Energy Test Data Simulation Results Random Permutation Random Permutation Model Random Permutation Uniform (0,1) Uniform (0,1) Model Uniform (0,1) Normal (0,1) Normal (0,1) Model Normal (0,1) Exponential (1) Exponential (1) Model Exponential (1) Model N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) Heat Treating Data Table 10: Heat Treating Data (TABLE B.12, Montgomery et al., 2012, Page 565) y x 1 x x 5 y x 1 x x where, y: Result of the pitch carbon analysis test, x : Furnace 1 Temperature, x : Duration of the carburizing cycle, x : 2 Carbon Concentration, x : Duration of the diffuse cycle x : Carbon concentration of the diffuse cycle. 5 After exhaustive analyses two different competing models are determined. includes variables x 1, x, x 5. includes variables and. In all the models, intercept is included. Table 11: Heat Treating Data Results Page 6
7 Table 12: Heat Treating Data Simulation Results Random Permutation Random Permutation Uniform (0,1) Uniform (0,1) Normal (0,1) Normal (0,1) Exponential (1) Exponential (1) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) Oil Extraction from Peanuts Data Table 1: Oil Extraction from Peanuts Data (TABLE B.7, Montgomery et al., 2012, Page 560) x 1 x x 5 y x 1 x x 5 y where, y: total oil yield, x : CO2 pressure (bar), x : CO2 1 2 temperature (in degress Celsius), x : Peanut moisture (percent by weight), x : CO2 flow rate (L/min) x : peanut particle 5 size (mm). After exhaustive analyses four different competing models are determined. includes variables x 1, x, x 5. includes variables x 1 x 5. In all the models, intercept is included. Table 1: Oil Extraction from Peanuts Data Results Page 7
8 Table 15: Oil Extraction from Peanuts Data Simulation Results Random Permutation Random Permutation Uniform (0,1) Uniform (0,1) Normal (0,1) Normal (0,1) Exponential (1) Exponential (1) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) Electronic Inverter Data Table 16: Electronic Inverter Data (TABLE B.1, Montgomery et al., 2012, Page 567) x 1 x x 5 y x 1 x x 5 y where, y: transient point of PMOS-NMOS Inverters, x : width 1 of the NMOS Device, x : length of the NMOS Device, x : 2 width of the PMOS Device, x : length of the PMOS Device, and x : a numeric vector. 5 After exhaustive analyses four different competing models are determined. includes variables x 1, x, x 5. includes variables x 1, x. In all the models, intercept is included Table 17: Electronic Inverter Data Results Page 8
9 Table 18: Electronic Inverter Data Simulation Results Random Permutation Random Permutation Uniform (0,1) Uniform (0,1) Normal (0,1) Normal (0,1) Exponential (1) Exponential (1) N(0,1)+ 1 N( 2, 1 ) N(0,1)+ 1 N( 2, 1 ) VI. CONCLUDING REMARKS When we are implementing Box-Cox transformation, we are expecting that W ' values should be increasing and hence the p-values should also be increasing. For data set 2, model of data set, data data 5, W ' values increase and hence the p-values also increase as expected. But for data set 1, models 1 and 2 of data set data set 6, the increment of W ' and p-values are reversed. In random permutation computations, in all data sets, numbers of increases in p-values are high (at least 8%) except in data sets 2 and 5 where the increases are low (at most 6%). In data set, numbers of rejections are very high after transformation even though the numbers of increases in p-values are very high, which might be due to the fact that the level of significance is 10% and the changes are not high enough to change the decisions. In data sets and 6, numbers of acceptance of normality are high after transformation as one might expect. In data sets 1, 2 5, mostly transformations were not needed. Note that in random permutations, we are permuting existing response measurements by fixing the design matrix. In simulations, we are simulating random numbers as responses for fixed design matrices. Among all data sets, only in data set 1, results are as expected, that is, higher rates of rejection in cases of exponential errors then higher rate of acceptance after transformation. In other data sets, such comments cannot be made. In data set 6, there are high numbers of rejections which were acceptances prior to transformation. In conclusion, we can summarize that in regression, results are very much dependent on the design matrix. So, in transformation parameter computations, design matrix should be taken into consideration. When Box-Cox transformation is considered, results are not dependable as one might expect. More research should be done for effective implementation of the Box-Cox transformation. BIBLIOGRAPHY [1]. Bickel, P. J. and K. A. Doksum (1981). An Analysis of Transformations Revisited. Journal of the American Statistical Association, 76, [2]. Box, G. E. P. and D. R. Cox (1982). An Analysis of Transformations Revisited (Rebutted). Journal of the Ameran Statistical Association, 77, []. Box, G. E. P. and D. R. Cox (196). An Analysis of Transformations. Journal of the Royal Statistical Society, Series B., 26, []. Carroll, R. J. (1980). A Robust Method for Testing Transformations to Achieve Approximate Normality. Journal of the Royal Statistical Society, Series B., 2, [5]. Halawa, A. M. (1996). Estimating the Box-Cox Transformation via an Artificial Regression Model. Communications in Statistics Simulation and Computation, 25(2), [6]. Harter, H. L. (1961). Expected Values of Normal Order Statistics. Biometrika, 8, 1 and 2, [7]. Hinkley, D. V. (1975). On Power Transformation to Symmetry. Biometrika, 62, [8]. Hinkley, D. V. (1977). On Quick Choice of Power Transformation. Applied Statistics, 26, [9]. Lin, L. I. and E. F. Vonesh (1989). An Empirical Nonlinear Data- Fitting Approach for Transforming Data to Normality. American Statistician,, [10]. Montgomery, D. C., E. A. Peck G. G. Vining (2012). Introduction to Linear Regression Analysis. John Wiley & Sons, Inc., New Jersey, USA. [11]. Rahman, M. and L. M. Pearson (2008). A Note on the Maximum Likelihood Box-Cox Transformation Parameter. Journal of Probability and Statistical Science, 6(2), [12]. Rahman, M. and L. M. Pearson (2000). Shapiro-Francia W Statistic Using Exclusive Simulation, Journal of the Korean Data & Information Science Society, 11(2), [1]. Rahman, M. (1999). Estimating the Box-Cox Transformation via Shapiro-Wilk W Statistic. Communications in Statistics - Simulation and Computation, 28(1), [1]. Shapiro, S. S. and M. B. Wilk (1965). An Analysis of Variance Test for Normality. Biometrika, 52, and, [15]. Shapiro, S. S., M. B. Wilk H. J. Chen (1968). A Comparative Study of Various Tests of Normality. Journal of the American Statistical Association, 6, [16]. Taylor, J. M. G. (1985). Power Transformations to Symmetry. Annals of Mathemaical Statistics,, Page 9
JEREMY TAYLOR S CONTRIBUTIONS TO TRANSFORMATION MODEL
1 / 25 JEREMY TAYLOR S CONTRIBUTIONS TO TRANSFORMATION MODELS DEPT. OF STATISTICS, UNIV. WISCONSIN, MADISON BIOMEDICAL STATISTICAL MODELING. CELEBRATION OF JEREMY TAYLOR S OF 60TH BIRTHDAY. UNIVERSITY
More informationSTA121: Applied Regression Analysis
STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using
More informationRemedial Measures Wrap-Up and Transformations-Box Cox
Remedial Measures Wrap-Up and Transformations-Box Cox Frank Wood October 25, 2011 Last Class Graphical procedures for determining appropriateness of regression fit - Normal probability plot Tests to determine
More informationA NOTE ON THE CHOICE OF THE SMOOTHING PARAMETER IN THE KERNEL DENSITY ESTIMATE
BRAC University Journal, vol. V1, no. 1, 2009, pp. 59-68 A NOTE ON THE CHOICE OF THE SMOOTHING PARAMETER IN THE KERNEL DENSITY ESTIMATE Daniel F. Froelich Minnesota State University, Mankato, USA and Mezbahur
More informationUsing Ridge Least Median Squares to Estimate the Parameter by Solving Multicollinearity and Outliers Problems
Modern Applied Science; Vol. 9, No. ; 05 ISSN 9-844 E-ISSN 9-85 Published by Canadian Center of Science and Education Using Ridge Least Median Squares to Estimate the Parameter by Solving Multicollinearity
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice
The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test
More informationA comparison of inverse transform and composition methods of data simulation from the Lindley distribution
Communications for Statistical Applications and Methods 2016, Vol. 23, No. 6, 517 529 http://dx.doi.org/10.5351/csam.2016.23.6.517 Print ISSN 2287-7843 / Online ISSN 2383-4757 A comparison of inverse transform
More informationBSCB 6520: Statistical Computing Homework 4 Due: Friday, December 5
BSCB 6520: Statistical Computing Homework 4 Due: Friday, December 5 All computer code should be produced in R or Matlab and e-mailed to gjh27@cornell.edu (one file for the entire homework). Responses to
More informationROBUSTNESS OF TWO-PHASE REGRESSION TESTS
REVSTAT Statistical Journal Volume 3, Number 1, June 2005, 1 18 ROBUSTNESS OF TWO-PHASE REGRESSION TESTS Authors: Carlos A.R. Diniz Departamento de Estatística, Universidade Federal de São Carlos, São
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)
The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE
More informationAny of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.
STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed
More informationCHAPTER 5 LINEAR REGRESSION AND CORRELATION
CHAPTER 5 LINEAR REGRESSION AND CORRELATION Expected Outcomes Able to use simple and multiple linear regression analysis, and correlation. Able to conduct hypothesis testing for simple and multiple linear
More informationA comparison study of the nonparametric tests based on the empirical distributions
통계연구 (2015), 제 20 권제 3 호, 1-12 A comparison study of the nonparametric tests based on the empirical distributions Hyo-Il Park 1) Abstract In this study, we propose a nonparametric test based on the empirical
More informationSequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk
Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk Axel Gandy Department of Mathematics Imperial College London a.gandy@imperial.ac.uk user! 2009, Rennes July 8-10, 2009
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationAMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression
AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number
More informationRegression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.
Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More informationEXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY
EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY MODULE 4 : Linear models Time allowed: One and a half hours Candidates should answer THREE questions. Each question carries 20 marks. The number of marks
More informationThe Goodness-of-fit Test for Gumbel Distribution: A Comparative Study
MATEMATIKA, 2012, Volume 28, Number 1, 35 48 c Department of Mathematics, UTM. The Goodness-of-fit Test for Gumbel Distribution: A Comparative Study 1 Nahdiya Zainal Abidin, 2 Mohd Bakri Adam and 3 Habshah
More informationGeneralized Maximum Entropy Estimators: Applications to the Portland Cement Dataset
The Open Statistics & Probability Journal 2011 3 13-20 13 Open Access Generalized Maximum Entropy Estimators: Applications to the Portland Cement Dataset Fikri Akdeniz a* Altan Çabuk b and Hüseyin Güler
More informationApplication of Variance Homogeneity Tests Under Violation of Normality Assumption
Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com
More informationJackknife Empirical Likelihood for the Variance in the Linear Regression Model
Georgia State University ScholarWorks @ Georgia State University Mathematics Theses Department of Mathematics and Statistics Summer 7-25-2013 Jackknife Empirical Likelihood for the Variance in the Linear
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationECON 5350 Class Notes Functional Form and Structural Change
ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this
More informationAN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY
Econometrics Working Paper EWP0401 ISSN 1485-6441 Department of Economics AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Lauren Bin Dong & David E. A. Giles Department of Economics, University of Victoria
More informationBinary choice 3.3 Maximum likelihood estimation
Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood
More informationBusiness Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal
Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220 Dr. Mohammad Zainal Chapter Goals After completing
More informationBootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University babu.
Bootstrap G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu Director of Center for Astrostatistics http://astrostatistics.psu.edu Outline 1 Motivation 2 Simple statistical problem 3 Resampling
More informationMultilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2
Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do
More informationTHE ROYAL STATISTICAL SOCIETY 2008 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE (MODULAR FORMAT) MODULE 4 LINEAR MODELS
THE ROYAL STATISTICAL SOCIETY 008 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE (MODULAR FORMAT) MODULE 4 LINEAR MODELS The Society provides these solutions to assist candidates preparing for the examinations
More informationRemedial Measures Wrap-Up and Transformations Box Cox
Remedial Measures Wrap-Up and Transformations Box Cox Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Last Class Graphical procedures for determining appropriateness
More informationIdentification of dispersion effects in replicated two-level fractional factorial experiments
Identification of dispersion effects in replicated two-level fractional factorial experiments Cheryl Dingus 1, Bruce Ankenman 2, Angela Dean 3,4 and Fangfang Sun 4 1 Battelle Memorial Institute 2 Department
More informationMath 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University
Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions
More informationby Jerald Murdock, Ellen Kamischke, and Eric Kamischke An Investigative Approach
Number and Operations Understand numbers, ways of representing numbers, relationships among numbers, and number systems develop a deeper understanding of very large and very small numbers and of various
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationExamination paper for TMA4255 Applied statistics
Department of Mathematical Sciences Examination paper for TMA4255 Applied statistics Academic contact during examination: Anna Marie Holand Phone: 951 38 038 Examination date: 16 May 2015 Examination time
More informationStatistics for analyzing and modeling precipitation isotope ratios in IsoMAP
Statistics for analyzing and modeling precipitation isotope ratios in IsoMAP The IsoMAP uses the multiple linear regression and geostatistical methods to analyze isotope data Suppose the response variable
More informationRegression Analysis for Data Containing Outliers and High Leverage Points
Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain
More informationAn Empirical Characteristic Function Approach to Selecting a Transformation to Normality
Communications for Statistical Applications and Methods 014, Vol. 1, No. 3, 13 4 DOI: http://dx.doi.org/10.5351/csam.014.1.3.13 ISSN 87-7843 An Empirical Characteristic Function Approach to Selecting a
More informationEcn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:
Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 You have until 10:20am to complete this exam. Please remember to put your name,
More informationChapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a
Chapter 9 Regression with a Binary Dependent Variable Multiple Choice ) The binary dependent variable model is an example of a a. regression model, which has as a regressor, among others, a binary variable.
More informationUNIVERSITY OF CALIFORNIA, SAN DIEGO
UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department
More informationGeneralized, Linear, and Mixed Models
Generalized, Linear, and Mixed Models CHARLES E. McCULLOCH SHAYLER.SEARLE Departments of Statistical Science and Biometrics Cornell University A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS, INC. New
More informationLocal Influence and Residual Analysis in Heteroscedastic Symmetrical Linear Models
Local Influence and Residual Analysis in Heteroscedastic Symmetrical Linear Models Francisco José A. Cysneiros 1 1 Departamento de Estatística - CCEN, Universidade Federal de Pernambuco, Recife - PE 5079-50
More informationRegression diagnostics
Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model
More informationLeast Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions
Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error
More informationExample: Suppose Y has a Poisson distribution with mean
Transformations A variance stabilizing transformation may be useful when the variance of y appears to depend on the value of the regressor variables, or on the mean of y. Table 5.1 lists some commonly
More informationDistribution-Free Monitoring of Univariate Processes. Peihua Qiu 1 and Zhonghua Li 1,2. Abstract
Distribution-Free Monitoring of Univariate Processes Peihua Qiu 1 and Zhonghua Li 1,2 1 School of Statistics, University of Minnesota, USA 2 LPMC and Department of Statistics, Nankai University, China
More informationPsychology 282 Lecture #4 Outline Inferences in SLR
Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations
More informationMidterm 2 - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put
More informationCHAPTER EIGHT Linear Regression
7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following
More informationA Large Sample Normality Test
Faculty Working Paper 93-0171 330 STX B385 1993:171 COPY 2 A Large Sample Normality Test ^' of the JAN /> -, ^'^^srsitv fj nil, Anil K. Bera Pin T. Ng Department of Economics Department of Economics University
More informationPART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable,
Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/2 01 Examination Date Time Pages Final December 2002 3 hours 6 Instructors Course Examiner Marks Y.P.
More informationAdvanced Regression Topics: Violation of Assumptions
Advanced Regression Topics: Violation of Assumptions Lecture 7 February 15, 2005 Applied Regression Analysis Lecture #7-2/15/2005 Slide 1 of 36 Today s Lecture Today s Lecture rapping Up Revisiting residuals.
More informationImproved Ridge Estimator in Linear Regression with Multicollinearity, Heteroscedastic Errors and Outliers
Journal of Modern Applied Statistical Methods Volume 15 Issue 2 Article 23 11-1-2016 Improved Ridge Estimator in Linear Regression with Multicollinearity, Heteroscedastic Errors and Outliers Ashok Vithoba
More informationStatistics for Engineers Lecture 9 Linear Regression
Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April
More informationChaper 5: Matrix Approach to Simple Linear Regression. Matrix: A m by n matrix B is a grid of numbers with m rows and n columns. B = b 11 b m1 ...
Chaper 5: Matrix Approach to Simple Linear Regression Matrix: A m by n matrix B is a grid of numbers with m rows and n columns B = b 11 b 1n b m1 b mn Element b ik is from the ith row and kth column A
More informationStat 401B Exam 2 Fall 2015
Stat 401B Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning
More informationMultivariate Regression (Chapter 10)
Multivariate Regression (Chapter 10) This week we ll cover multivariate regression and maybe a bit of canonical correlation. Today we ll mostly review univariate multivariate regression. With multivariate
More information3rd Quartile. 1st Quartile) Minimum
EXST7034 - Regression Techniques Page 1 Regression diagnostics dependent variable Y3 There are a number of graphic representations which will help with problem detection and which can be used to obtain
More information9 Correlation and Regression
9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the
More informationFULL LIKELIHOOD INFERENCES IN THE COX MODEL
October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach
More informationOn Selecting Tests for Equality of Two Normal Mean Vectors
MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department
More informationLecture 18 Miscellaneous Topics in Multiple Regression
Lecture 18 Miscellaneous Topics in Multiple Regression STAT 512 Spring 2011 Background Reading KNNL: 8.1-8.5,10.1, 11, 12 18-1 Topic Overview Polynomial Models (8.1) Interaction Models (8.2) Qualitative
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationChapter 11. Regression with a Binary Dependent Variable
Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationEconometric Analysis of Cross Section and Panel Data
Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationEstimation for generalized half logistic distribution based on records
Journal of the Korean Data & Information Science Society 202, 236, 249 257 http://dx.doi.org/0.7465/jkdi.202.23.6.249 한국데이터정보과학회지 Estimation for generalized half logistic distribution based on records
More informationCHAPTER 2 SIMPLE LINEAR REGRESSION
CHAPTER 2 SIMPLE LINEAR REGRESSION 1 Examples: 1. Amherst, MA, annual mean temperatures, 1836 1997 2. Summer mean temperatures in Mount Airy (NC) and Charleston (SC), 1948 1996 Scatterplots outliers? influential
More informationMissouri Educator Gateway Assessments
Missouri Educator Gateway Assessments June 2014 Content Domain Range of Competencies Approximate Percentage of Test Score I. Number and Operations 0001 0002 19% II. Algebra and Functions 0003 0006 36%
More informationSymbolic Regression in Multicollinearity Problems
Symbolic Regression in Multicollinearity Problems Flor Castillo, Carlos Villa The Dow Chemical Company 1 Outline Why we need Symbolic Regression in multicollinearity A case study The proposed approach
More informationA bias improved estimator of the concordance correlation coefficient
The 22 nd Annual Meeting in Mathematics (AMM 217) Department of Mathematics, Faculty of Science Chiang Mai University, Chiang Mai, Thailand A bias improved estimator of the concordance correlation coefficient
More informationTesting Equality of Natural Parameters for Generalized Riesz Distributions
Testing Equality of Natural Parameters for Generalized Riesz Distributions Jesse Crawford Department of Mathematics Tarleton State University jcrawford@tarleton.edu faculty.tarleton.edu/crawford April
More informationApproximate Median Regression via the Box-Cox Transformation
Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The
More informationWeighted Least Squares I
Weighted Least Squares I for i = 1, 2,..., n we have, see [1, Bradley], data: Y i x i i.n.i.d f(y i θ i ), where θ i = E(Y i x i ) co-variates: x i = (x i1, x i2,..., x ip ) T let X n p be the matrix of
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationInformation in a Two-Stage Adaptive Optimal Design
Information in a Two-Stage Adaptive Optimal Design Department of Statistics, University of Missouri Designed Experiments: Recent Advances in Methods and Applications DEMA 2011 Isaac Newton Institute for
More informationLinear Regression. Aarti Singh. Machine Learning / Sept 27, 2010
Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X
More informationMONTE CARLO ANALYSIS OF CHANGE POINT ESTIMATORS
MONTE CARLO ANALYSIS OF CHANGE POINT ESTIMATORS Gregory GUREVICH PhD, Industrial Engineering and Management Department, SCE - Shamoon College Engineering, Beer-Sheva, Israel E-mail: gregoryg@sce.ac.il
More informationTesting Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data
Journal of Modern Applied Statistical Methods Volume 4 Issue Article 8 --5 Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Sudhir R. Paul University of
More informationAnalysis of variance, multivariate (MANOVA)
Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables
More information4. Nonlinear regression functions
4. Nonlinear regression functions Up to now: Population regression function was assumed to be linear The slope(s) of the population regression function is (are) constant The effect on Y of a unit-change
More informationUnivariate analysis. Simple and Multiple Regression. Univariate analysis. Simple Regression How best to summarise the data?
Univariate analysis Example - linear regression equation: y = ax + c Least squares criteria ( yobs ycalc ) = yobs ( ax + c) = minimum Simple and + = xa xc xy xa + nc = y Solve for a and c Univariate analysis
More informationRegression Models - Introduction
Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent
More informationMULTIVARIATE TECHNIQUES, ROBUSTNESS
MULTIVARIATE TECHNIQUES, ROBUSTNESS Mia Hubert Associate Professor, Department of Mathematics and L-STAT Katholieke Universiteit Leuven, Belgium mia.hubert@wis.kuleuven.be Peter J. Rousseeuw 1 Senior Researcher,
More informationUsing Estimating Equations for Spatially Correlated A
Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship
More informationSimulating Properties of the Likelihood Ratio Test for a Unit Root in an Explosive Second Order Autoregression
Simulating Properties of the Likelihood Ratio est for a Unit Root in an Explosive Second Order Autoregression Bent Nielsen Nuffield College, University of Oxford J James Reade St Cross College, University
More informationSimple Linear Regression
Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent
More informationApplication of Ghosh, Grizzle and Sen s Nonparametric Methods in. Longitudinal Studies Using SAS PROC GLM
Application of Ghosh, Grizzle and Sen s Nonparametric Methods in Longitudinal Studies Using SAS PROC GLM Chan Zeng and Gary O. Zerbe Department of Preventive Medicine and Biometrics University of Colorado
More informationSteven Cook University of Wales Swansea. Abstract
On the finite sample power of modified Dickey Fuller tests: The role of the initial condition Steven Cook University of Wales Swansea Abstract The relationship between the initial condition of time series
More informationBusiness Statistics. Lecture 10: Correlation and Linear Regression
Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form
More informationCE 3710: Uncertainty Analysis in Engineering
FINAL EXAM Monday, December 14, 10:15 am 12:15 pm, Chem Sci 101 Open book and open notes. Exam will be cumulative, but emphasis will be on material covered since Exam II Learning Expectations for Final
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationPAijpam.eu M ESTIMATION, S ESTIMATION, AND MM ESTIMATION IN ROBUST REGRESSION
International Journal of Pure and Applied Mathematics Volume 91 No. 3 2014, 349-360 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: http://dx.doi.org/10.12732/ijpam.v91i3.7
More informationNon-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
More informationFinite Population Sampling and Inference
Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane
More information