TALEB AHMAD
CASE - Center for Applied Statistics and Economics
Humboldt-Universität zu Berlin
Motivation

Multivariate regression models can accommodate many explanatory variables that simultaneously affect the dependent variables. The hypothesis being tested is that the set of predictor variables has a joint linear effect on the set of response variables.
The basic assumptions of the multivariate regression model are:
- multivariate normality of the residuals
- homogeneous variances of the residuals conditional on the predictors
- a common covariance structure across observations
- independent observations
Typical violations of these assumptions:
- Heteroscedasticity: the variance of the error term is not constant across observations.
- Multicollinearity: the explanatory variables are highly correlated with one another.
- Autocorrelation: the assumption of zero correlation between the error terms is violated.
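The three violations above can each be screened with a simple statistic. The following is a minimal sketch, not the procedure used in these slides: the function names are hypothetical, the variance ratio is only a crude split-sample check in the spirit of a Goldfeld-Quandt comparison, and the VIF formula shown is valid only for a model with exactly two predictors.

```python
from math import sqrt

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def variance_ratio(ordered_residuals):
    """Mean squared residual of the second half divided by that of the first half.

    The residuals are assumed to be ordered by a predictor; a ratio far from 1
    hints at heteroscedasticity.
    """
    half = len(ordered_residuals) // 2
    first, second = ordered_residuals[:half], ordered_residuals[-half:]
    return (sum(e ** 2 for e in second) / half) / (sum(e ** 2 for e in first) / half)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor 1 / (1 - r^2) for a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Illustrative toy inputs (not Boston housing data).
print(durbin_watson([1, -1, 1, -1]))                     # alternating signs: negative autocorrelation
print(variance_ratio([1, -1, 1, -1, 2, -2, 2, -2]))      # residual spread grows in the second half
print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))  # uncorrelated predictors
```

A Durbin-Watson value well below 2 points to positive autocorrelation and a value well above 2 to negative autocorrelation; a VIF close to 1 means the two predictors carry almost independent information.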
Outline
1. Motivation
2. The Boston housing data
3. MLR models for the Boston housing data
4. References
The Boston housing data

Aim: explain the variation of the price $X_{14}$ by the variation of the other 13 variables in the Boston housing data.
Variables: see the variable description.
Exploratory analysis of relationships
- visual inspection: skewness, kurtosis and outliers
- distribution: Q-Q plots and the Kolmogorov-Smirnov test
- transformation: Z and log transformations to improve normality
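To illustrate how a Z transformation, a log transformation and a Kolmogorov-Smirnov comparison fit together, here is a minimal sketch on simulated right-skewed data (not the Boston housing data). The function names are hypothetical. Because the mean and standard deviation are estimated from the same sample, the distance computed here corresponds to a Lilliefors-type statistic, so standard Kolmogorov-Smirnov critical values would not apply directly.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def z_transform(values):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def ks_distance_to_normal(values):
    """Largest gap between the empirical CDF of the z-scores and the N(0, 1) CDF."""
    z = sorted(z_transform(values))
    n = len(z)
    cdf = NormalDist().cdf
    d = 0.0
    for i, v in enumerate(z, start=1):
        f = cdf(v)
        # The empirical CDF jumps from (i - 1) / n to i / n at v.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Simulated right-skewed sample (log-normal), an assumption for illustration only.
random.seed(0)
raw = [random.lognormvariate(0.0, 1.0) for _ in range(500)]
logged = [math.log(v) for v in raw]

d_raw = ks_distance_to_normal(raw)
d_log = ks_distance_to_normal(logged)
print(f"KS distance, raw: {d_raw:.3f}; after log: {d_log:.3f}")
```

A smaller distance after the log transformation indicates that the transformed sample is closer to normality, which is the motivation for transforming skewed variables before fitting the regression.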
Figure: Boxplots for the variables of the Boston housing data (original variables and transformed variables).
Figure: Q-Q plots (left to right) of the original variables.
Figure: Scatterplot matrices of $X_1$ to $X_6$ and of $X_7$ to $X_{14}$, each with $X_{14}$.
Figure: Scatterplot matrix for all variables.
Descriptive statistics

| Variable | Mean | Median | Std. dev. | Skewness | Kurtosis |
|---|---|---|---|---|---|
| $X_1$ | 3.61 | 0.25 | 8.60 | 5.19 | 39.59 |
| $X_2$ | 11.36 | 0 | 23.32 | 2.21 | 6.95 |
| $X_3$ | 11.13 | 9.69 | 6.86 | 0.29 | 1.75 |
| $X_4$ | 0.06 | 0 | 0.25 | 3.38 | 12.48 |
| $X_5$ | 0.55 | 0.53 | 0.11 | 0.72 | 2.91 |
| $X_6$ | 6.28 | 6.20 | 0.70 | 0.40 | 4.84 |
| $X_7$ | 68.57 | 77.5 | 28.14 | -0.59 | 2.02 |
| $X_8$ | 3.79 | 3.20 | 2.10 | 1.00 | 3.45 |
| $X_9$ | 9.54 | 5 | 8.70 | 0.99 | 2.12 |
| $X_{10}$ | 408.24 | 330 | 168.54 | 0.66 | 1.84 |
| $X_{11}$ | 18.45 | 19.05 | 2.16 | -0.79 | 2.69 |
| $X_{12}$ | 356.67 | 391.44 | 91.29 | -2.87 | 10.10 |
| $X_{13}$ | 12.65 | 11.36 | 7.14 | 0.90 | 3.46 |
| $X_{14}$ | 22.53 | 21.2 | 9.19 | 1.10 | 4.45 |

Table 1: Summary statistics: Boston data
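The columns of Table 1 can be computed from central moments. The sketch below is one possible convention, not necessarily the one used for the table: it assumes population moments (division by n) and reports kurtosis without subtracting 3, so a normal distribution would score about 3. The function name `summary` is hypothetical.

```python
from math import sqrt
from statistics import mean, median

def summary(values):
    """Mean, median, standard deviation, skewness and kurtosis of one variable.

    Assumptions: population moments (divide by n) and non-excess kurtosis.
    """
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    return {
        "mean": m,
        "median": median(values),
        "std": sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

# Toy input for illustration; a symmetric sample has skewness 0.
print(summary([1, 2, 3, 4, 5]))
```

Applied column by column to the data set, a large positive skewness together with a kurtosis far above 3 (as reported for $X_1$) flags a variable as a candidate for transformation.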
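Variable transformation

Transformations (lowercase $x_j$ denotes the original variable, $X_j$ its transformed version):
- $X_1 = \log(x_1)$
- $X_2 = \log(x_2/10)$
- $X_3 = x_3$ (normal)
- $X_4 = x_4$ (binary)
- $X_5 = \log(x_5)$
- $X_6 = \log(x_6)$
- $X_7 = x_7^2 / 10^2$
- $X_8 = \log(x_8)$
- $X_9 = \log(x_9)$
- $X_{10} = \log(x_{10})$
- $X_{11} = \exp(0.4\,x_{11}) / 10^2$
- $X_{12} = x_{12} / 10^2$
- $X_{13} = \sqrt{x_{13}}$
- $X_{14} = \log(x_{14})$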
Figure: Q-Q plots (left to right) of the transformed variables.
MLR models

Multivariate regression: linear models

$$X = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \dots + \alpha_k X_k + \varepsilon$$

Estimate:

$$X_{14} = \alpha_0 + \sum_{j=1}^{13} \alpha_j X_j + \varepsilon$$

where the $X_j$ are the transformed variables $X_1$ to $X_{14}$.
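The coefficients $\alpha_0, \dots, \alpha_k$ of such a linear model are usually estimated by ordinary least squares. The following is a minimal sketch using only the Python standard library; it solves the normal equations directly, which is adequate for a small illustration but numerically less robust than the QR-based solvers in statistical packages. The function names `solve` and `ols` are hypothetical, and the data are a toy example rather than the Boston housing data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(rows, y):
    """Return [alpha_0, alpha_1, ..., alpha_k] from the normal equations (X'X) a = X'y."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Toy data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
alpha = ols(rows, y)
print([round(a, 6) for a in alpha])
```

With noise-free data the estimates recover the generating coefficients up to floating-point error; with real data the residuals $\varepsilon$ absorb whatever the linear combination cannot explain.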
Regression estimation: forward selection method

| Step | Multiple R | $R^2$ | F | Sig. F | Variable entered |
|---|---|---|---|---|---|
| 1 | 0.7851 | 0.6164 | 809.856 | 0.000 | $X_{13}$ |
| 2 | 0.8116 | 0.6587 | 485.399 | 0.000 | $X_6$ |
| 3 | 0.8271 | 0.6841 | 362.315 | 0.000 | $X_{11}$ |
| 4 | 0.8443 | 0.7128 | 310.933 | 0.000 | $X_8$ |
| 5 | 0.8535 | 0.7285 | 268.341 | 0.000 | $X_5$ |
| 6 | 0.8573 | 0.7350 | 230.723 | 0.000 | $X_{12}$ |
| 7 | 0.8607 | 0.7408 | 203.377 | 0.000 | $X_3$ |
| 8 | 0.8649 | 0.7480 | 184.408 | 0.000 | $X_4$ |
| 9 | 0.8663 | 0.7504 | 165.707 | 0.000 | $X_1$ |
| 10 | 0.8689 | 0.7550 | 152.552 | 0.000 | $X_{10}$ |
| 11 | 0.8705 | 0.7578 | 140.542 | 0.000 | $X_9$ |
| 12 | 0.8721 | 0.7605 | 130.451 | 0.000 | $X_2$ |

Table 2: Forward selection
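Forward selection builds the model one variable at a time, which is why $R^2$ grows with each step in Table 2. The sketch below shows the general idea under stated assumptions: at each step it adds the candidate that yields the largest $R^2$, and it stops when the gain drops below a threshold `min_gain`. That stopping rule and all function names are hypothetical; statistical packages typically use an F-to-enter or p-value criterion instead, and the slides do not state which rule was applied. The data are a toy example.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    y_bar = sum(y) / n
    sse = sum((t - f) ** 2 for t, f in zip(y, fitted))
    sst = sum((t - y_bar) ** 2 for t in y)
    return 1.0 - sse / sst

def forward_selection(predictors, y, min_gain=1e-6):
    """Greedy forward selection; returns (selected indices, R^2 after each step)."""
    selected, history, best_r2 = [], [], 0.0
    remaining = list(range(len(predictors)))
    while remaining:
        # Score every remaining candidate when added to the current model.
        scores = [(r_squared([predictors[k] for k in selected + [j]], y), j)
                  for j in remaining]
        r2, j = max(scores)
        if r2 - best_r2 < min_gain:  # assumed stopping rule
            break
        selected.append(j)
        remaining.remove(j)
        history.append(r2)
        best_r2 = r2
    return selected, history

# Toy data: y depends strongly on x0, weakly on x1 and not at all on x2.
x0 = [float(i) for i in range(20)]
x1 = [float((2 * i) % 5) for i in range(20)]
x2 = [float((3 * i) % 4) for i in range(20)]
y = [5 * a + b for a, b in zip(x0, x1)]
selected, history = forward_selection([x0, x1, x2], y)
print(selected, [round(r, 4) for r in history])
```

The order in which variables enter reflects their marginal contribution given the variables already in the model, not their importance in isolation.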
Regression estimation: forward selection method

| ANOVA | SS | df | MSS | F-test | P-value |
|---|---|---|---|---|---|
| Regression | 384.050 | 12 | 32.004 | 130.451 | 0.0000 |
| Residuals | 120.950 | 493 | 0.245 | | |
| Total variation | 505 | 505 | 1.000 | | |

Multiple R = 0.87, $R^2$ = 0.76, adjusted $R^2$ = 0.75, standard error = 0.49

Table 3: Forward selection
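The summary figures of the ANOVA table follow from the sums of squares and degrees of freedom by simple arithmetic. The sketch below recomputes them from the values in Table 3, taking the residual degrees of freedom as 505 - 12 = 493; the variable names are chosen for illustration.

```python
from math import sqrt

# Values taken from Table 3.
ss_reg, df_reg = 384.050, 12    # regression sum of squares and its df
ss_res, df_res = 120.950, 493   # residual sum of squares, df = 505 - 12
ss_tot, df_tot = ss_reg + ss_res, df_reg + df_res

ms_reg = ss_reg / df_reg                    # mean square, regression
ms_res = ss_res / df_res                    # mean square, residuals
f_stat = ms_reg / ms_res                    # overall F-test
r2 = ss_reg / ss_tot                        # coefficient of determination
adj_r2 = 1.0 - ms_res / (ss_tot / df_tot)   # adjusted for the number of predictors
multiple_r = sqrt(r2)
std_error = sqrt(ms_res)                    # standard error of the regression

print(f"F = {f_stat:.3f}, R^2 = {r2:.4f}, adj. R^2 = {adj_r2:.4f}")
print(f"Multiple R = {multiple_r:.4f}, std. error = {std_error:.4f}")
```

The recomputed F, $R^2$, adjusted $R^2$ and multiple R agree with the table to the reported precision; the standard error comes out at roughly 0.495.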
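Regression estimation

| Parameter | Beta | SE | Std. Beta | t-test | P-value | Variable |
|---|---|---|---|---|---|---|
| $\alpha_0$ | 0.00 | 0.02 | 0.00 | 0.00 | 1.00 | Constant |
| $\alpha_1$ | 0.104 | 0.064 | 0.10 | 1.73 | 0.08 | $X_1$ |
| $\alpha_2$ | 0.07 | 0.03 | 0.07 | 2.33 | 0.01 | $X_2$ |
| $\alpha_3$ | -0.10 | 0.04 | -0.10 | -2.51 | 0.01 | $X_3$ |
| $\alpha_4$ | 0.07 | 0.02 | 0.07 | 3.37 | 0.00 | $X_4$ |
| $\alpha_5$ | -0.21 | 0.05 | -0.21 | -4.14 | 0.00 | $X_5$ |
| $\alpha_6$ | 0.21 | 0.02 | 0.21 | 7.06 | 0.00 | $X_6$ |
| $\alpha_7$ | -0.44 | 0.04 | -0.44 | -9.60 | 0.00 | $X_8$ |
| $\alpha_8$ | 0.12 | 0.04 | 0.12 | 2.58 | 0.01 | $X_9$ |
| $\alpha_9$ | -0.20 | 0.04 | -0.20 | -4.42 | 0.00 | $X_{10}$ |
| $\alpha_{10}$ | -0.13 | 0.02 | -0.13 | -5.14 | 0.00 | $X_{11}$ |
| $\alpha_{11}$ | 0.09 | 0.02 | 0.09 | 3.54 | 0.00 | $X_{12}$ |
| $\alpha_{12}$ | -0.57 | 0.03 | -0.57 | -15.73 | 0.00 | $X_{13}$ |

Table 4: Forward selection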
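Each t-test value in Table 4 is the estimated coefficient divided by its standard error, and the p-value is the two-sided tail probability of that statistic. The sketch below converts a t value into a p-value using the standard normal distribution as an approximation to the t distribution, an assumption that is reasonable here only because the residual degrees of freedom are large. The function name is hypothetical.

```python
from statistics import NormalDist

def two_sided_p(t_value):
    """Two-sided p-value of a test statistic under a standard normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_value)))

# t values as reported in Table 4 for X1 and X13.
for name, t in [("X1", 1.73), ("X13", -15.73)]:
    print(f"{name}: t = {t:6.2f}, approx. p = {two_sided_p(t):.4f}")
```

Note that dividing the rounded Beta by the rounded SE from the table (for example 0.104 / 0.064) does not reproduce the reported t values exactly, because the table entries are themselves rounded.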
Regression estimation

$R^2 = 0.76$ indicates that 76% of the variation of $X_{14}$ is explained by the model

$$X_{14} = \alpha_0 + \sum_{j=1}^{13} \alpha_j X_j + \varepsilon$$

The p-values in Table 4 indicate that the variables $X_1$, $X_2$, $X_3$ and $X_9$ have comparatively little influence on changes in $X_{14}$, the log price of the houses.
The Boston housing data comprise 506 observations, one for each census district of the Boston metropolitan area.
References

A. Handl. Multivariate Analysemethoden. Springer, 2002.
J. Schira. Statistische Methoden der BWL und VWL. Pearson, 2002.
H. Joe. Multivariate Models and Dependence Concepts. Chapman & Hall, London, 1997.
W. Härdle and L. Simar. Applied Multivariate Statistical Analysis. Springer, 2003.