Measuring the Information Content of Regressors in the Linear Model Using PROC REG and SAS/IML

"It is important that there be options in explication, and equally important that the candidates have clear population interpretations.... paintings, diamonds, and data should be examined in several ways." (Kruskal, 1987)

Joseph Retzer, Market Probe, Inc., Milwaukee WI
Kurt Pflughoeft, University of Texas at El Paso, El Paso TX

Abstract

This paper begins by describing an implementation of Kruskal's relative importance analysis using SAS/STAT and SAS/IML. While Kruskal's weights lend insight into the relative importance of each regressor, they are non-additive in nature and therefore limit potential interpretation. To overcome the non-additivity drawback, an information-theoretic measure (as suggested by Theil and Chung in "Information-theoretic measures of fit for univariate and multivariate linear regressions", The American Statistician, 1988) is implemented. In addition, the impact of regressor collinearity is examined using simulated data. This paper is targeted toward experienced SAS users familiar with PROC REG in SAS/STAT. Additional background knowledge of statistical concepts, particularly partial correlations and regression analysis, is also recommended.

1 Measuring Importance

Numerous methods for measuring importance in multiattribute value models have been presented in the literature (see, e.g., Soofi and Retzer, 1992). A review of those which pertain specifically to applications in statistical models may be found in Soofi and Retzer (1995). A common method is to rely on the p-values of t-scores attached to the partial regression coefficients. Kruskal notes that this is not unlike the old confusion of substantive with statistical significance: "In fact no necessary connection between importance (a property of the population) and statistical significance (a property of the sample and population) exists." (Kruskal, 1978) This represents a confusion of statistical with real significance. While methods other than p-values are employed, each in turn has its drawbacks.

Boring also provides insight into the potential problems of relying on p-values alone. He notes that "Science begins with description but it ends in generalization." (Boring, 1919) This highlights the fact that while statisticians strive for insight regarding the population, they can observe only the sample. Insofar as the sample is an accurate representation of the population, our inferences will be justified. We must keep in mind, however, that our statistics are descriptions of the sample and no more than that. For that reason we may strive to examine the data in more than one way in our attempt to gain insight into the population.

Kruskal suggests an alternative method of measuring importance in which we average partial correlations over all model orderings. We implement his suggestion by taking advantage of a number of uniquely powerful components of the SAS system. A limiting aspect of Kruskal's weights is that they are non-additive; that is, the sum of the weights is not intrinsically meaningful. Theil and Chung suggest information-theoretic importance measures which are in fact additive. Computationally, the measurement of information content can be viewed as an extension of Kruskal's analysis: once the relative importance measures have been estimated, construction of the information measures is straightforward.

This paper proceeds by examining the computation of Kruskal's weights and the subsequent creation of information measures for the regressors in a linear model. It also considers the impact of collinearity in the design matrix using simulated data, which lends insight into the appropriateness of competing measures of importance under conditions of model instability.

2 Kruskal's Relative Importance Weights, An Illustration

The idea underlying relative importance weighting is to consider the strength of the relationship between a regressor and the response under varying orderings of inclusion vis-a-vis the remaining variables. If we consider all possible orderings of the variables into the model, and the squared partial correlation of a particular variable with the response under each ordering, the average of these squared partials is that variable's relative importance.

Consider the case of a four-regressor model in which we are attempting to estimate the relative importance of variable x1. The orderings (permutations) and the associated relevant squared partial correlations are given in Table 1.

   Block 1 (x1 enters first):
      x1 x2 x3 x4    x1 x2 x4 x3    x1 x3 x2 x4
      x1 x3 x4 x2    x1 x4 x2 x3    x1 x4 x3 x2
         contribution: $6\,r^2_{y,x1}$

   Block 2 (x1 enters second):
      x2 x1 x3 x4    x2 x1 x4 x3    x3 x1 x2 x4
      x3 x1 x4 x2    x4 x1 x2 x3    x4 x1 x3 x2
         contribution: $2\,(r^2_{y,x1 \cdot x2} + r^2_{y,x1 \cdot x3} + r^2_{y,x1 \cdot x4})$

   Block 3 (x1 enters third):
      x2 x3 x1 x4    x2 x4 x1 x3    x3 x2 x1 x4
      x3 x4 x1 x2    x4 x2 x1 x3    x4 x3 x1 x2
         contribution: $2\,(r^2_{y,x1 \cdot (x2,x3)} + r^2_{y,x1 \cdot (x3,x4)} + r^2_{y,x1 \cdot (x2,x4)})$

   Block 4 (x1 enters last):
      x4 x2 x3 x1    x3 x2 x4 x1    x4 x3 x2 x1
      x2 x3 x4 x1    x3 x4 x2 x1    x2 x4 x3 x1
         contribution: $6\,r^2_{y,x1 \cdot (x2,x3,x4)}$

Table 1: All permutations in the four variable example

To understand why the relevant squared partial correlations are as given, consider each block individually. In Block 1 the relative importance of x1 is, in all cases, equal to the square of its simple correlation with the response y, $r^2_{y,x1}$: x1 enters the model first, so no other variable's effect need be considered. In Block 2 this is no longer the case. For example, if the ordering is x2 x1 x3 x4, the relative importance of x1 must be assessed after accounting for the presence of x2 in the model; this is estimated by the squared partial correlation of x1 with y given x2, $r^2_{y,x1 \cdot x2}$. A similar pattern emerges for all orderings in the remaining blocks. The relative importance of x1, $RI_{x1}$, is then simply the average over the 24 orderings of all squared correlations (simple and partial) between x1 and y:

$$ RI_{x1} = \bigl[\, 6\,r^2_{y,x1} + 2\,(r^2_{y,x1 \cdot x2} + r^2_{y,x1 \cdot x3} + r^2_{y,x1 \cdot x4}) + 2\,(r^2_{y,x1 \cdot (x2,x3)} + r^2_{y,x1 \cdot (x3,x4)} + r^2_{y,x1 \cdot (x2,x4)}) + 6\,r^2_{y,x1 \cdot (x2,x3,x4)} \,\bigr] / 24 \qquad (1) $$
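To make the averaging in (1) concrete, the short SAS/IML sketch below evaluates it for a set of invented squared simple and partial correlations; in practice these values come from the all-subsets regression output described in the next section. The r-squared values here are illustrative assumptions, not output from the paper's data.

   proc iml;
      /* hypothetical squared correlations of x1 with y; real values
         would come from the all-subsets output described below      */
      r2_0 = 0.45;               /* r2(y,x1): x1 enters first        */
      r2_1 = {0.40 0.38 0.35};   /* partials given one of x2,x3,x4   */
      r2_2 = {0.33 0.31 0.30};   /* partials given two of x2,x3,x4   */
      r2_3 = 0.28;               /* partial given x2, x3 and x4      */

      /* equation (1): weights 6, 2, 2, 6 over the 24 orderings      */
      RI_x1 = (6*r2_0 + 2*sum(r2_1) + 2*sum(r2_2) + 6*r2_3) / 24;
      print RI_x1;
   quit;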
3 Partial Correlations Using PROC REG

To begin, note that the squared simple correlation is equivalent to $R^2$ in a simple regression. In addition, each squared partial correlation may be calculated from the SSEs of appropriate simple and/or multiple regressions. For example, the squared partial correlation of x1 with y given x2 and x3 can be calculated as follows (see Neter and Wasserman, 1974):

$$ r^2_{y,x1 \cdot (x2,x3)} = \frac{SSE_{x2,x3} - SSE_{x1,x2,x3}}{SSE_{x2,x3}} \qquad (2) $$

where $SSE_{x2,x3}$ is the sum of squared errors from the regression of y on x2 and x3, and $SSE_{x1,x2,x3}$ is the sum of squared errors from the regression of y on x1, x2 and x3.

A relatively simple way of obtaining all possible subset-regression SSEs, as well as their corresponding $R^2$ values, is PROC REG with the "selection=rsquare" and "sse" model options. It should be noted that if the number of variables in the model exceeds 10, SAS will default to printing only that same number of subset-regression results for any fixed number of regressors. For example, if there are 12 variables in total, there exist $\binom{12}{2} = 66$ unique subset regressions with 2 regressors; SAS, however, will by default report only the first 12. This may be overcome by including an additional model option, "best=n", where n is some number large enough to cover the largest number of subset models arising from any fixed number of regressors. An appropriate PROC REG command for the four-regressor example could be written as:

   proc reg data=regdat outest=stats noprint;
      model y = x1-x4 / selection=rsquare sse best=6;
      /* best=6 covers the C(4,2)=6 two-regressor subsets,
         the largest group of subsets in this example       */
   run;

The data necessary to calculate the relative importance weights are then found in the data set "stats". Using the information in "stats" we may apply equation (2) to calculate all necessary partial correlations. The indexing of all SSEs (i.e., associating each SSE with a particular regression model) is accomplished using the natural positioning indices of matrices in PROC IML. Specifically, the data in "stats" are read into an IML matrix while simultaneously being assigned to rows which indicate the independent-variable set contained in a particular model. All that remains in order to calculate the relative importance measures is to correctly compute the weighted averages of the relevant partial correlations, as sketched below.
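A minimal sketch of this indexing step follows, assuming the OUTEST= layout that PROC REG produces under subset selection: the coefficient columns are missing whenever a variable is absent from a model, so they double as a membership index for each SSE. The final statements apply equation (2) to one pair of nested subsets; repeating the same lookups over every pair supplies all the squared partials needed in (1).

   proc iml;
      use stats;                            /* OUTEST= data from PROC REG  */
      read all var {x1 x2 x3 x4} into b;    /* coefficients; missing (.)
                                               means variable not in model */
      read all var {_SSE_} into sse;
      close stats;

      inmodel = (b ^= .);                   /* 0/1 membership per model    */

      /* rows for the subsets {x2,x3} and {x1,x2,x3} */
      r_red  = loc(inmodel[,1]=0 & inmodel[,2]=1 &
                   inmodel[,3]=1 & inmodel[,4]=0);
      r_full = loc(inmodel[,1]=1 & inmodel[,2]=1 &
                   inmodel[,3]=1 & inmodel[,4]=0);

      /* equation (2): squared partial corr. of x1 with y given x2, x3 */
      r2_part = (sse[r_red] - sse[r_full]) / sse[r_red];
      print r2_part;
   quit;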
Computationally, once Kruskal's weights are estimated, the extension to Theil's weights is straightforward. The components on the right-hand side of (5) below are the estimated squared partial correlations, which are then transformed to yield the information measure; this transformation is all that is required in addition to the previous routine.

4 Measuring Regressor Information Content

Theil suggests in his 1987 paper that an information-theoretic measure quantifying the amount of information contained in a particular set of regressors may be decomposed to allow an assignment of importance to each independent variable. Specifically, Theil suggests that $I(R^2)$ be used to quantify the information in the regressors regarding the response, where $R^2$ is the multiple correlation coefficient between the response and the regressors. The $I(\cdot)$ function is a well-known information-theoretic measure. The implied measure is

$$ I(R^2) = -\log_2(1 - R^2) \qquad (3) $$

In addition, we may note that $1 - R^2$ can be decomposed as

$$ 1 - R^2 = (1 - r^2_{y,x1})(1 - r^2_{y,x2 \cdot x1}) \cdots (1 - r^2_{y,xp \cdot (x1,x2,\ldots,x(p-1))}) \qquad (4) $$

The information measure associated with the decomposition in (4) is arrived at by taking the base-2 log of both sides and negating, giving

$$ I(R^2) = I(r^2_{y,x1}) + I(r^2_{y,x2 \cdot x1}) + \cdots + I(r^2_{y,xp \cdot (x1,x2,\ldots,x(p-1))}) \qquad (5) $$

Theil notes that if a natural ordering were agreed upon, say x1, x2, ..., xp, then (5) would give a unique additive decomposition of the importance assigned to each variable in the model. If no natural ordering is agreed upon, we may follow Kruskal's method of averaging over all orderings.
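As a small numeric illustration of the additivity in (5), the sketch below converts the squared partials for one hypothetical ordering into bits and checks that their sum matches $I(R^2)$ computed directly from (3) and (4). The r-squared values are invented for illustration.

   proc iml;
      /* squared simple/partial correlations for one ordering:
         x1, then x2 given x1, then x3 given (x1,x2); hypothetical */
      r2 = {0.45 0.20 0.10};

      info  = -log(1 - r2) / log(2);   /* I(r2) = -log2(1-r2), in bits */
      total = sum(info);               /* additive decomposition, (5)  */

      R2  = 1 - (1 - r2)[#];           /* [#] multiplies elements, (4) */
      IR2 = -log(1 - R2) / log(2);     /* eq. (3); matches total       */
      print info total IR2;
   quit;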
5 The Impact of Collinearity

In the absence of multicollinearity, the importance/information content of the independent variables is more easily determined. The presence of multicollinearity, however, is more the rule than the exception. Such conditions can frustrate the decision maker's efforts to construct an appropriate model. Although the decision maker may try to circumvent the problem by running a series of full and reduced regression models, choosing among those models can prove problematic. Likewise, if all possible regression models from a set of independent variables are not considered, the decision maker may draw unwarranted interpretations from the regression output.

To examine differences in interpretation between Theil's information measures and standard regression analysis, 1000 observations on 5 variables were randomly generated. The independent variables x1 through x3 were drawn from normal distributions with mean $\mu = 10$ and standard deviation $\sigma = 2$. Multicollinearity was induced by setting the correlation $\rho$ between x3 and x4 to 0.70, using the transformation

$$ x4 = \rho\,x3 + z\sqrt{1 - \rho^2}, \qquad z \sim N(0,1). $$

A DATA step along the lines of the sketch below can generate such a sample.
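This sketch fills in details the paper leaves unstated: the seed, the response equation for y, and the standardization of x3 inside the transformation (so the induced correlation comes out at $\rho$ even though x3 has standard deviation 2). All three are our assumptions.

   data regdat;
      call streaminit(20250101);          /* arbitrary seed (assumption) */
      rho = 0.70;
      do i = 1 to 1000;
         x1 = rand('normal', 10, 2);
         x2 = rand('normal', 10, 2);
         x3 = rand('normal', 10, 2);
         z  = rand('normal');             /* z ~ N(0,1)                  */
         /* x4 = rho*x3 + z*sqrt(1-rho**2), applied to the standardized
            x3 so that corr(x3,x4) = rho                                 */
         x4 = rho*(x3 - 10)/2 + z*sqrt(1 - rho**2);
         /* hypothetical response equation; the paper does not report
            the true model used to generate y                            */
         y  = x1 + x3 + 2*rand('normal');
         output;
      end;
      keep y x1-x4;
   run;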
The correlation matrix for the dependent and independent variables is given in Table 2.

   CORR    x1     x2     x3     x4     y
   x1     1.00   0.03   0.02   0.03   0.67
   x2     0.03   1.00   0.02   0.07   0.04
   x3     0.02   0.02   1.00   0.72   0.68
   x4     0.03   0.07   0.72   1.00   0.51
   y      0.67   0.04   0.68   0.51   1.00

Table 2: Pairwise Correlations

The parameter estimates and p-values from a regression employing all independent variables are given in Table 3.

   Variable    Parameter Estimate    Prob > |T|
   INTERCEP    -0.307257             0.241
   x1           0.880                0.0001
   x2           0.0205               0.178
   x3           1.008582             0.0001
   x4           0.002871             0.87

Table 3: Regression I, Entire Independent Variable Set

The Condition Index for this model was approximately 21.9. A Condition Index in the neighborhood of 15-30 represents a borderline situation in which collinearity may become a problem (Belsley et al., 1980).
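The Condition Index can be requested directly from PROC REG; a minimal call using the standard COLLIN and VIF model options is sketched below before the ranking discussion continues.

   proc reg data=regdat;
      /* collin prints eigenvalues, condition indices and variance-
         decomposition proportions; vif adds variance inflation factors */
      model y = x1-x4 / collin vif;
   run;
   quit;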
A ranking of the variables' importance based on p-values would be x1, x3, x2 and x4, respectively. Note that variables x2 and x4 are not statistically significant. It appears that the multicollinearity between x3 and x4 is affecting the contribution credited to the latter variable; indeed, on this evidence x4 appears less valuable than x2 in explaining y. Results for a reduced model in which x2 and x3 are eliminated are given in Table 4. (The Condition Index for this model was approximately 14.8.)

   Variable    Parameter Estimate    Prob > |T|
   INTERCEP    2.74016               0.0001
   x1          0.3133                0.0001
   x4          0.74168               0.0001

Table 4: Regression II, Reduced Independent Variable Set

In this model the true contribution of x4 is more readily apparent. The information contained in the regressor set, as measured by Theil's technique, is given in Table 5.

   x1         x2          x3        x4
   1.67258    0.002114    1.3368    0.27348

Table 5: Theil's Information Measures (Information in Regressors on Dependent Variable y)

Using Theil's information measures, the ranking of the variables' information content is x1, x3, x4 and x2, respectively. The percentages of total information are x1 (50.83%), x3 (40.72%), x4 (8.31%) and x2 (0%). Note that Theil's measures correctly assign higher information content to x4 than to x2.

6 Conclusion

Relying on p-values alone to assess relative importance may be misleading and/or incomplete. The p-values indicate only the significance of a regression coefficient with respect to the sample; they supply no information concerning interrelationships among the variables. A reasonable method for examining relative importance is offered by Kruskal. Calculation of Kruskal's measure can be tedious and computationally intense; this article presents an approach which employs the SAS programming language to derive an effective algorithm for computing these weights in models with various sized regressor sets. In addition, the non-additivity limitation of Kruskal's weights can be overcome by a straightforward transformation suggested by Theil and Chung. This measure, which has a solid foundation in information theory, provides additional insight into the data.

SAS, SAS/STAT, PROC REG and SAS/IML are registered trademarks of SAS Institute Inc., Cary, NC, USA.

References

Belsley, D., Kuh, E., and Welsch, R. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons.

Boring, E. G. (1919). Mathematical vs. scientific significance. Psychological Bulletin, 16(10):335-338.

Kruskal, W. (1987). Relative importance by averaging over orderings. The American Statistician, 41(1).

Kruskal, W. and Majors, R. (1989). Concepts of relative importance in scientific literature. The American Statistician, 43(1).

Kruskal, W. H., editor (1978). International Encyclopedia of Statistics, chapter Tests of Significance, pages 944-958. New York: Free Press.

Neter, J. and Wasserman, W. (1974). Applied Linear Statistical Models. Richard D. Irwin.

SAS Institute Inc. (1988). SAS/IML User's Guide, Release 6.03. SAS Institute Inc.

SAS Institute Inc. (1989). SAS Language and Procedures, Version 6. SAS Institute Inc., first edition.

SAS Institute Inc. (1989). SAS/STAT User's Guide, Version 6, Volumes 1 and 2. SAS Institute Inc., fourth edition.

Soofi, E. and Retzer, J. (1992). Adjustment of importance weights in multiattribute value models by minimum discrimination information. European Journal of Operational Research, 60.

Soofi, E. and Retzer, J. (1995). A review of relative importance measures in statistics. Proceedings of the American Statistical Association, Bayesian Statistical Sciences Section.

Theil, H. (1987). How many bits of information does an independent variable yield in a multiple regression? Statistics and Probability Letters, 6(2).

Theil, H. and Chung, C.-F. (1988). Information-theoretic measures of fit for univariate and multivariate linear regressions. The American Statistician, 42(4).

Authors

Joseph Retzer, Ph.D.
Market Probe, Inc.
Milwaukee, WI 53226
414-778-6000 (voice)
jjr@execpc.com

Kurt Pflughoeft, Ph.D.
University of Texas at El Paso
El Paso, TX 79968-0544
915-747-7733 (voice)
pug@mail.utep.edu