Detecting and Assessing Data Outliers and Leverage Points
- Luke Cunningham
1 Chapter 9 Detecting and Assessing Data Outliers and Leverage Points
2 Section 9.1 Background
3 Background Because OLS estimators minimize the sum of squared errors, large residuals (outliers) can exert considerable influence on parameter estimates, estimated standard errors, and model predictions. Observations on the explanatory variables also become more influential as they move farther from their respective means. Such observations do not involve the dependent (endogenous) variable in the econometric analysis, and they are labeled leverage points. Leverage points, too, can exert undue influence on parameter estimates, estimated standard errors, and model predictions. Influence points, by definition, are observations that qualify as both outliers and leverage points. 3
4 Influence diagnostics relate only to the observations or data points indigenous to the econometric model. The concern: particular observations (outliers/leverage points) exert undue influence on regression results. Influential observations may be legitimate, or they may reflect errors in measurement; if they correspond to measurement errors, obtain the corrected data. Influential observations that are legitimate may cast doubt either on the validity of the observation or on the general adequacy of the model. 4
5 Section 9.2 Influence Diagnostics
6 Influence Diagnostics Nature of the problem: outliers and leverage points. Identification tools: studentized residuals, the hat matrix, DFFITS, DFBETAS, Cook's D statistic, COVRATIO, and robust regression. 6
7 Influence Diagnostics (1) Residuals: e_i = y_i − ŷ_i. (2) Elements of the hat matrix H = X(X^T X)^(−1) X^T. The hat matrix comes from Ŷ = Xβ̂ = X(X^T X)^(−1) X^T Y = HY. 7
8 Hat diagonal elements (leverage points): h_ii = x_i^T (X^T X)^(−1) x_i. The hat diagonal provides a measure of the standardized distance of x_i from the mean of the explanatory variables. These diagonal elements highlight observations that are extreme in the X's (data points far from the mean are relatively influential). Leverage is bounded below by 1/n and above by 1; the closer the leverage is to unity, the greater the leverage of the observation. 8
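As a quick illustration (not from the original slides), the hat diagonals can be computed directly from the definition above; the data here are made up, with one deliberately extreme x value:

```python
import numpy as np

# Hypothetical data: simple regression with one extreme x value (x = 10).
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 10.0])])
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix H = X(X^T X)^{-1} X^T
h = np.diag(H)                         # leverages h_ii

n, p = X.shape
print(h.round(3))         # the x = 10 point has leverage 0.92, near the upper bound of 1
print(round(h.sum(), 6))  # → 2.0: the hat diagonals always sum to p
```

Note that the extreme point's leverage (0.92) dwarfs the others even though no response values were involved, which is exactly the point of the slide.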
9 Outlier Detection Outlier detection involves determining whether a residual (actual value − predicted value) is an extreme negative or positive value. But we must take the variance of the residuals into account when detecting outliers, to level the playing field. 9
10 Create Standardized Residuals A standardized residual is a residual divided by its standard deviation: standardized residual = (y_i − ŷ_i)/s, where s = the standard deviation of the residuals. If the standardized residuals associated with observations in the data set have values in excess of +3.5 or below −3.5, then the corresponding data points are labeled outliers. 10
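A minimal sketch of the ±3.5 rule, with made-up residuals (19 small ones and one large one, summing to zero as OLS residuals with an intercept do). Here s is the root mean square of the residuals; software usually divides by n − p instead of n:

```python
import math

# Hypothetical residuals: one clear outlier among many small residuals.
resid = [-0.1] * 19 + [1.9]
s = math.sqrt(sum(e * e for e in resid) / len(resid))  # residual standard deviation
standardized = [e / s for e in resid]
flagged = [i for i, z in enumerate(standardized) if abs(z) > 3.5]
print(flagged)  # → [19]: only the large residual exceeds the ±3.5 rule
```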
11 Studentized Residuals R-student (the studentized residual) is R-student_i = e_i / (s_(i) √(1 − h_ii)), where s_(i) = the standard error estimated with the i-th observation deleted and h_ii = the leverage of the i-th point. This statistic is large if one or more of the following conditions holds: (1) e_i is large; (2) h_ii is large; or (3) s_(i)^2 is small. The R-student statistic is distributed as t with n − p − 1 degrees of freedom, where n is the number of observations and p is the number of parameters in the model. 11
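A sketch of R-student on a toy dataset (values made up; the last response is far off the line). The leave-one-out variance s_(i)^2 comes from the standard updating identity rather than n separate refits:

```python
import numpy as np

def r_student(X, y):
    """Externally studentized residuals (illustrative sketch)."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y
    s2 = e @ e / (n - p)
    # leave-one-out residual variance s_(i)^2 via the updating identity
    s2_loo = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    return e / np.sqrt(s2_loo * (1 - h))

X = np.column_stack([np.ones(6), np.arange(6.0)])
y = np.array([0.0, 1.1, 1.9, 3.2, 4.0, 9.0])  # last response is an outlier
t = r_student(X, y)
print(t.round(2))  # the last observation has |R-student| far above 2
```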
12 To detect potentially high-influence observations, note: (1) data points with large hat diagonals (leverage points); (2) large R-student values (residuals with appropriate standardization): R-student_i = (y_i − ŷ_i) / [s_(i)^2 (1 − h_ii)]^(1/2), where s_(i)^2 is the residual variance after deleting the i-th observation. For the hat diagonals, Σ_{i=1}^n h_ii = p, where p is the number of model parameters and n is the number of observations. 12
13 Guidelines If h_ii > 2p/n, the observation exerts considerable leverage (a conservative rule of thumb is 3p/n). If |R-student| > 2, the observation is an outlier. To be an influential observation, both criteria must be met. 13
15 Influence on the predicted (or fitted) value ŷ_i Diagnostic: DFFITS_i = (ŷ_i − ŷ_(i,−i)) / (s_(i) h_ii^(1/2)). DFFITS is the difference between the result with the i-th observation and without it; the −i subscript indicates that the i-th observation is not involved in the computation. It represents, for the i-th point, the number of estimated standard errors that the fitted value ŷ_i changes if the i-th point is removed from the data set. Cutoff: 2√(p/n). 15
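A sketch of DFFITS using the closed form DFFITS_i = R-student_i · √(h_ii/(1 − h_ii)), which is algebraically equivalent to the deleted-observation definition and avoids refitting the model n times. Data are made up; the last response is an outlier at a high-leverage x value:

```python
import numpy as np

def dffits(X, y):
    """DFFITS via R-student and leverages (illustrative sketch)."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y
    s2 = e @ e / (n - p)
    s2_loo = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_loo * (1 - h))   # R-student
    return t * np.sqrt(h / (1 - h))

X = np.column_stack([np.ones(6), np.arange(6.0)])
y = np.array([0.0, 1.1, 1.9, 3.2, 4.0, 9.0])
d = dffits(X, y)
cutoff = 2 * np.sqrt(X.shape[1] / X.shape[0])   # 2*sqrt(p/n)
print(np.where(np.abs(d) > cutoff)[0])          # only the last observation is flagged
```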
16 Influence on the regression coefficients For each regression coefficient, the influence diagnostics provide a statistic giving the number of standard errors by which that coefficient changes if the i-th observation is set aside: (DFBETAS)_{j,i} = (β̂_j − β̂_{j(−i)}) / (s_(i) √C_jj), where C_jj is the j-th diagonal element of (X^T X)^(−1). A large value of (DFBETAS)_{j,i} indicates that the i-th observation has a sizable impact on the j-th regression coefficient. Cutoff: 2/√n. 16
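A sketch of DFBETAS using the closed-form update β̂ − β̂_(−i) = (X^T X)^(−1) x_i e_i / (1 − h_ii), so no observation-deleted refits are needed. Data are made up:

```python
import numpy as np

def dfbetas(X, y):
    """DFBETAS for all observations and coefficients (illustrative sketch)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # hat diagonals h_ii
    e = y - X @ (XtX_inv @ (X.T @ y))
    s2 = e @ e / (n - p)
    s2_loo = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    C = np.diag(XtX_inv)                          # C_jj
    # row i of delta is beta_hat - beta_hat(-i)
    delta = (XtX_inv @ X.T).T * (e / (1 - h))[:, None]
    return delta / np.sqrt(s2_loo[:, None] * C[None, :])

X = np.column_stack([np.ones(6), np.arange(6.0)])
y = np.array([0.0, 1.1, 1.9, 3.2, 4.0, 9.0])
D = dfbetas(X, y)
print(np.abs(D).max(axis=1).round(2))   # the last observation dominates
print(round(2 / np.sqrt(len(y)), 3))    # cutoff 2/sqrt(n) ≈ 0.816
```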
17 Use (DFBETAS)_{j,i} to ascertain which observations influence specific regression coefficients. The analyst must inspect n × p statistics to assess the influence on the regression coefficients. A composite measure of the influence on the full set of coefficients is Cook's Distance (Cook's D): D_i = (β̂ − β̂_(−i))^T (X^T X) (β̂ − β̂_(−i)) / (p s^2). Cook's distance represents the standardized distance between the vector of least-squares coefficients β̂ and β̂_(−i); D_i ≥ 0. Belsley suggests 4/(n − p) as a cutoff. 17
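Cook's D can be computed from a single fit via the algebraically equivalent form D_i = e_i^2 h_ii / (p s^2 (1 − h_ii)^2); a sketch with made-up data:

```python
import numpy as np

def cooks_d(X, y):
    """Cook's distance via the single-fit closed form (illustrative sketch)."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y
    s2 = e @ e / (n - p)
    return e**2 * h / (p * s2 * (1 - h)**2)

X = np.column_stack([np.ones(6), np.arange(6.0)])
y = np.array([0.0, 1.1, 1.9, 3.2, 4.0, 9.0])
d = cooks_d(X, y)
n, p = X.shape
print(np.where(d > 4 / (n - p))[0])  # → [5], using the 4/(n - p) cutoff
```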
18 Cutoff for Cook's D = 4/(n − p) = 0.24
21 Influence on the Variance of the Regression Coefficients COVRATIO_i = det[s_(i)^2 (X_(i)^T X_(i))^(−1)] / det[s^2 (X^T X)^(−1)], where X_(i) is the X matrix with the i-th observation deleted. The COVRATIO_i statistic measures the change in the determinant of the covariance matrix of the estimates from deleting the i-th observation. If COVRATIO_i > 1, the i-th point provides an improvement — a reduction in the estimated generalized variance of the coefficients relative to what would be produced without the data point. If COVRATIO_i < 1, inclusion of the i-th point increases the generalized variance. By the Belsley, Kuh, and Welsch yardstick (for large samples), the i-th data point exerts an unusual amount of influence on the generalized variance if COVRATIO_i > 1 + (3p/n) or COVRATIO_i < 1 − (3p/n) (if n > 3p). 21
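A sketch of COVRATIO using the identity det(X_(i)^T X_(i)) = det(X^T X)(1 − h_ii), which reduces the ratio of determinants to COVRATIO_i = (s_(i)^2/s^2)^p / (1 − h_ii). Data are made up, and note the ±3p/n yardstick is meant for large samples (n > 3p), which this tiny example does not satisfy:

```python
import numpy as np

def covratio(X, y):
    """COVRATIO via the determinant identity (illustrative sketch)."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y
    s2 = e @ e / (n - p)
    s2_loo = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    return (s2_loo / s2) ** p / (1 - h)

X = np.column_stack([np.ones(6), np.arange(6.0)])
y = np.array([0.0, 1.1, 1.9, 3.2, 4.0, 9.0])
cr = covratio(X, y)
print(cr.round(4))   # the outlying last point drives its COVRATIO far below 1
```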
22 Cutoff for Cook's D = 4/(22 − 6) = 0.25
24 Examples of the use of influence diagnostics (1) Uri, N.D. and R. Boyd, "Estimating the Regional Demand for Softwood Lumber in the United States," North Central Journal of Agricultural Economics, 12(1) (January 1990). This article uses: RSTUDENT, HAT DIAGONAL, COVRATIO, DFFITS. (2) Swinton, S.M. and R.P. King, "Evaluating Robust Regression Techniques for Detrending Crop Yield Data with Nonnormal Errors," American Journal of Agricultural Economics, May (1991). This article uses: RSTUDENT, HAT DIAGONAL, DFBETAS. 24
25 The INFLUENCE option (in the MODEL statement) requests the statistics proposed by Belsley, Kuh, and Welsch (1980) to measure the influence of each observation. Belsley, Kuh, and Welsch influence diagnostics: h_ii (leverage points), R-student (studentized residual), COVRATIO, DFFITS, DFBETAS, and Cook's D. 25
26 Section 9.3 Solutions to the Problem of Influential Observations
27 Solutions to the Problem of Influential Observations (1) Standard practice: omit the influential observation(s) from the analysis; NOT a feasible solution if these observations are legitimate. (2) Robust regression techniques: downweight the influential observation(s) (Huber, 1973). 27 continued...
28 (3) Use of Dummy or Indicator Variables Associated with Influential Points Create dummy variables corresponding to each influential data point. Operationally, augment the model with these dummy variables and re-estimate with either OLS or GLS procedures, using either intercept-shifter or slope-shifter variables. If a dummy variable is associated with an influential observation, then the corresponding coefficient estimate is likely to be statistically different from zero. Identifying how many dummies to use can be tricky. For example, suppose you have 4 outliers in a dataset of 100 observations. Do we use: (1) one dummy that identifies all outliers; (2) two dummies, one for the outliers far above the mean and one for those far below; or (3) four dummies, one for each outlier separately? 28
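A minimal sketch of the intercept-shifter approach with made-up data: one contaminated observation gets its own 0/1 dummy, which absorbs that point's effect and leaves the other coefficients estimated from the clean observations:

```python
import numpy as np

# Hypothetical data: y = 2 + 0.5x exactly, except observation 7 is shifted by +5.
x = np.arange(10.0)
y = 2.0 + 0.5 * x
y[7] += 5.0
X = np.column_stack([np.ones(10), x])
dummy = (np.arange(10) == 7).astype(float)   # intercept shifter for obs 7
beta = np.linalg.lstsq(np.column_stack([X, dummy]), y, rcond=None)[0]
print(beta.round(3))  # intercept ≈ 2, slope ≈ 0.5, dummy coefficient ≈ 5
```

Because the dummy fits observation 7 exactly, its coefficient recovers the full +5 contamination, which is why such a coefficient tends to test as significantly different from zero when the flagged point really is influential.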
29 Example of Robust Regression Techniques from the Literature Swinton, S.M. and R.P. King, Evaluating Robust Regression Techniques for Detrending Crop Yield Data with Nonnormal Errors, American Journal of Agricultural Economics, May (1991): Because OLS minimizes the sum of squared errors, large residuals or outliers can exert considerable influence on parameter estimates. Robust regression methods give less weight than OLS to influential observations. They can be viewed as automated means of reducing outlier influence and are classified by their approach to controlling influential outliers. 29
30 Section 9.4 Robust Regression Techniques
31 Robust Regression Techniques M-estimators employ maximum likelihood (ML) techniques to find estimates of model parameters that minimize some function of the regression residuals. Example: multivariate t (Judge et al., 1988). L-estimators, linear combinations of order statistics, calculate estimates of model parameters based on quantiles of the residuals; the quantiles are then combined with specified weights. Examples: least absolute error (LAE) (most common), trimmed mean (TRIM), five-quantile weighted regression quantile (FIVEQUAN), Gastwirth weighted regression quantile (GASTWIRTH), Tukey tri-mean weighted regression quantile (TUKEY). There are no closed-form solutions for M-estimators and L-estimators; estimation is done by optimization techniques, but convergence is not guaranteed. 31
32 Robust Regression OLS performance suffers in the presence of outliers and certain non-normal (heavy-tailed) error distributions. Robust estimators are not sensitive to outliers; they essentially downweight data points that produce residuals large in magnitude. MAD (mean absolute deviation), LAR (least absolute residual), and LAE (least absolute error) all minimize the sum of the absolute values of the residuals. Robust (LAE) criterion: minimize Σ_{i=1}^n |y_i − ŷ_i|. OLS criterion: minimize Σ_{i=1}^n (y_i − ŷ_i)^2. 32
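The absolute-value criterion has no closed-form solution, but it can be approximated by iteratively reweighted least squares (weights proportional to 1/|residual|). A sketch with made-up data, showing OLS pulled toward a gross outlier while the LAE/LAD fit is not:

```python
import numpy as np

def lad_fit(X, y, iters=50, eps=1e-8):
    """Approximate least-absolute-error fit by IRLS (illustrative sketch)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS starting values
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), eps)  # weight ~ 1/|residual|
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[3] += 20.0                                  # one gross outlier
X = np.column_stack([np.ones(10), x])
ols = np.linalg.lstsq(X, y, rcond=None)[0]
lad = lad_fit(X, y)
print(ols.round(2), lad.round(2))  # OLS is pulled toward the outlier; LAD is not
```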
33 The ROBUSTREG Procedure in SAS Overview The main purpose of robust regression is to detect outliers and provide resistant (stable) results in the presence of outliers. In order to achieve this stability, robust regression limits the influence of outliers. Historically, three classes of problems have been addressed with robust regression techniques: problems with outliers in the y-direction (response direction) problems with multivariate outliers in the x-space (i.e., outliers in the covariate space, which are also referred to as leverage points) problems with outliers in both the y-direction and the x-space 33
34 Many methods have been developed in response to these problems. However, in statistical applications of outlier detection and robust regression, the methods most commonly used today are Huber M estimation, high breakdown value estimation, and combinations of these two methods. The ROBUSTREG procedure provides four such methods: M estimation, LTS estimation, S estimation, and MM estimation. 34 continued...
35 (1) M estimation was introduced by Huber (1973), and it is the simplest approach both computationally and theoretically. Although it is not robust with respect to leverage points, it is still used extensively in analyzing data for which it can be assumed that the contamination is mainly in the response direction. 35 (2) Least Trimmed Squares (LTS) estimation is a high breakdown value method introduced by Rousseeuw (1984). The breakdown value is a measure of the proportion of contamination that an estimation method can withstand and still maintain its robustness. The performance of this method was improved by the FAST-LTS algorithm of Rousseeuw and Van Driessen (2000). (3) S estimation is a high breakdown value method introduced by Rousseeuw and Yohai (1984). With the same breakdown value, it has a higher statistical efficiency than LTS estimation. (4) MM estimation, introduced by Yohai (1987), combines high breakdown value estimation and M estimation. It has both the high breakdown property and a higher statistical efficiency than S estimation.
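To make Huber M estimation concrete, here is an illustrative sketch (not SAS's ROBUSTREG implementation) of the usual IRLS algorithm with a MAD scale estimate; c = 1.345 is the common tuning constant, the data are made up, and the small floor on the scale is a numerical safeguard I added:

```python
import numpy as np

def huber_m(X, y, c=1.345, iters=100):
    """Huber M-estimation by iteratively reweighted least squares (sketch)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS start
    for _ in range(iters):
        r = y - X @ beta
        mad = np.median(np.abs(r - np.median(r)))
        s = max(mad / 0.6745, 1e-6)               # robust scale estimate (MAD)
        u = r / s
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))  # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[3] += 20.0                                  # one outlier in the response direction
X = np.column_stack([np.ones(10), x])
hb = huber_m(X, y)
print(np.linalg.lstsq(X, y, rcond=None)[0].round(2))  # OLS: badly distorted
print(hb.round(2))                                    # Huber: close to (1, 2)
```

Consistent with the slide, this estimator handles contamination in the response direction but would not by itself be robust to leverage points in the x-space.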
36 Example 1: Pine Tree Problem Consider the forestry data in the table below: 36 STAND CHARACTERISTICS FOR PINE TREES (columns: AGE, HD, N, MDBH; data values not reproduced) Source: Burkhart, H.E., R.C. Parker, M.R. Strob, and R. Oberwald. Yields of Old-field Loblolly Pine Plantations, Division of Forestry and Wildlife Resources Publication FWS-3-72, Virginia Tech, Blacksburg, VA, 1972.
37 The model is MDBH_i = β_0 + β_1 HD_i + β_2 (AGE_i · N_i) + β_3 (HD_i / N_i) + ε_i, where MDBH_i = the average diameter at breast height (measured at 4.5 feet above ground) at AGE_i, HD_i = the average height of dominant trees in feet, N_i = the number of pine trees per acre at age AGE_i, and AGE_i = the age of a particular pine stand. Compute HAT diagonals, Cook's D values, DFFITS values, and DFBETAS for each of the 20 observations. Determine whether or not any of the 20 observations exert a disproportionate influence on the results. If so, use robust regression methods to circumvent the problem. 37
38 EXAMPLE: Pine Tree Data, 20 observations; p = 4. Cutoff for h_ii: 2p/n = 8/20 = 0.40. Cutoff for DFFITS: 2√(p/n) = 2√(4/20) ≈ 0.894. Cutoff for Cook's D: 4/(n − p) = 4/16 = 0.25. Cutoff for DFBETAS: 2/√n = 2/√20 ≈ 0.447. Cutoffs for COVRATIO: 1 ± 3p/n = 1 ± 12/20, i.e., 0.4 and 1.6.
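These cutoffs are pure arithmetic and can be re-derived directly from n = 20 and p = 4:

```python
import math

# Re-deriving the pine tree example's diagnostic cutoffs (n = 20, p = 4).
n, p = 20, 4
print(2 * p / n)                       # hat diagonal cutoff: 0.40
print(round(2 * math.sqrt(p / n), 3))  # DFFITS cutoff: ~0.894
print(4 / (n - p))                     # Cook's D cutoff: 0.25
print(round(2 / math.sqrt(n), 3))      # DFBETAS cutoff: ~0.447
print(1 - 3 * p / n, 1 + 3 * p / n)    # COVRATIO cutoffs: 0.4 and 1.6
```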
39 [Table of DFBETAS (Intercept, HD, AGEN, HDN), DFFITS, Cook's D, and COVRATIO by observation; values not reproduced.] Observation 9 is a leverage point but not an outlier. Observation 10 is an outlier but not a leverage point. Observation 20 is an influential point. 39
40 Robust Regression Procedures OLS LAE a M Estimation S Estimation LTS Estimation MM Estimation Intercept (0.3466) (0.2729) (0.3838) (0.4219) NR b (0.3864) HD (0.0254) (0.0199) (0.0281) (0.0314) NR (0.0287) AGEN ( ) ( ) (0.0001) (0.0001) NR (0.0001) HDN (8.3738) (6.5914) (9.2718) ( ) NR (9.6084) R a From Shazam, not SAS b NR Not Reported by SAS 40
41 Use of Intercept Shifter for Influential Observation (OBS 20) OLS Intercept (0.3467) HD (0.0254) OLS with Intercept Shifter (0.3436) (0.0249) AGEN ( ) ( ) HDN (8.3738) (9.3732) OBS (0.3700) R R
42 Dependent Variable: MDBH Number of Observations Read 20 Number of Observations Used 20 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model <.0001 Error Corrected Total Root MSE R-Square Dependent Mean Adj R-Sq Coeff Var continued...
43 OLS parameter estimates Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > t Intercept <.0001 HD agen hdn Durbin-Watson D Pr < DW Pr > DW Number of Observations 20 1st Order Autocorrelation NOTE: Pr<DW is the p-value for testing positive autocorrelation, and Pr>DW is the p-value for testing negative autocorrelation. 43 continued...
44 Output Statistics Dependent Predicted Std Error Std Error Student Cook's Obs Variable Value Mean Predict Residual Residual Residual D ** *** ** * * * **** * * * *** ***
45 Hat Diag Cov DFBETAS Obs RStudent H Ratio DFFITS Intercept HD agen hdn continued...
46 Hat Diag Cov DFBETAS Obs RStudent H Ratio DFFITS Intercept HD agen hdn Influence Diagnostics 46
47 Sum of Residuals 0 Sum of Squared Residuals Predicted Residual SS (PRESS) Model Information Data Set WORK.PINETREE Dependent Variable MDBH Number of Independent Variables 3 Number of Observations 20 Method M Estimation Number of Observations Read 20 Number of Observations Used 20 Summary Statistics Standard Variable Q1 Median Q3 Mean Deviation MAD HD agen hdn MDBH
48 Parameter Estimates Standard 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq M estimation Intercept <.0001 HD agen hdn Scale Diagnostics Robust Standardized Mahalanobis MCD Robust Obs Distance Distance Leverage Residual Outlier * * * * Diagnostics Summary Observation Type Proportion Cutoff 48 Outlier Leverage continued...
49 Goodness-of-Fit Statistic Value R-Square AICR BICR Deviance The ROBUSTREG Procedure Model Information Data Set WORK.PINETREE Dependent Variable MDBH Number of Independent Variables 3 Number of Observations 20 Method S Estimation Number of Observations Read 20 Number of Observations Used 20 S estimation Summary Statistics Standard Variable Q1 Median Q3 Mean Deviation MAD HD agen hdn MDBH continued...
50 S estimation S Profile Total Number of Observations 20 Number of Coefficients 4 Subset Size 4 Chi Function Tukey K Breakdown Value Efficiency Parameter Estimates Standard 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr>ChiSq Intercept <.0001 HD agen hdn Scale continued...
51 Diagnostics Robust Standardized Mahalanobis MCD Robust Obs Distance Distance Leverage Residual Outlier * * * * Diagnostics Summary Observation Type Proportion Cutoff Outlier Leverage Goodness-of-Fit Statistic Value 51 R-Square Deviance continued...
52 Model Information Data Set WORK.PINETREE Dependent Variable MDBH Number of Independent Variables 3 Number of Observations 20 Method LTS Estimation Number of Observations Read 20 Number of Observations Used 20 Summary Statistics Standard Variable Q1 Median Q3 Mean Deviation MAD HD agen hdn MDBH LTS Profile 52 Total Number of Observations 20 Number of Squares Minimized 16 Number of Coefficients 4 Highest Possible Breakdown Value continued...
53 LTS Parameter Estimates Parameter DF Estimate Intercept HD agen hdn Scale (slts) Scale (Wscale) LTS estimation The ROBUSTREG Procedure Diagnostics Robust Standardized Mahalanobis MCD Robust Obs Distance Distance Leverage Residual Outlier * * * * * Diagnostics Summary Observation Type Proportion Cutoff 53 Outlier Leverage continued...
54 R-Square for LTS Estimation R-Square Model Information Data Set WORK.PINETREE Dependent Variable MDBH Number of Independent Variables 3 Number of Observations 20 Method MM Estimation 54 Number of Observations Read 20 Number of Observations Used 20 Summary Statistics Standard Variable Q1 Median Q3 Mean Deviation MAD HD agen hdn MDBH continued...
55 Profile for the Initial LTS Estimate Total Number of Observations 20 Number of Squares Minimized 16 Number of Coefficients 4 Highest Possible Breakdown Value MM estimation MM Profile Chi Function Tukey K Efficiency Parameter Estimates Standard 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr>ChiSq 55 Intercept <.0001 HD agen hdn Scale continued...
56 Diagnostics Robust Standardized Mahalanobis MCD Robust Obs Distance Distance Leverage Residual Outlier * * * * Diagnostics Summary Observation Type Proportion Cutoff Outlier Leverage Goodness-of-Fit Statistic Value 56 R-Square AICR BICR Deviance continued...
57 OLS estimation with dummy variable for the outlier (obs 20) The REG Procedure Dependent Variable: MDBH Number of Observations Read 20 Number of Observations Used 20 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model <.0001 Error Corrected Total Root MSE R-Square Dependent Mean Adj R-Sq Coeff Var continued...
58 Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > t Intercept <.0001 HD agen hdn inf Dependent Variable: MDBH Durbin-Watson D Pr < DW Pr > DW Number of Observations 20 1st Order Autocorrelation NOTE: Pr<DW is the p-value for testing positive autocorrelation, and Pr>DW is the p-value for testing negative autocorrelation.
59 Section 9.5 Examples
60 Example 2: Demand Function for FF20 PUBLIX 165 observations; p = 11. Cutoff for h_ii: 2p/n = 22/165 ≈ 0.13. Cutoff for DFFITS: 2√(p/n) = 2√(11/165) ≈ 0.52. Cutoff for DFBETAS: 2/√n = 2/√165 ≈ 0.16. Cutoff for Cook's D: 4/(n − p) = 4/154 ≈ 0.03. Cutoffs for COVRATIO: 1 ± 3p/n = 1 ± 33/165, i.e., 0.8 and 1.2. Influential observations: 16, 27, 32, 52. [DFFITS, Cook's D, and COVRATIO values for OBS 16, 27, 32, and 52 not reproduced.] 60
61 Robust Regression Procedures OLS M Estimation S Estimation LTS Estimation MM Estimation Intercept (0.7926) (0.7109) (0.7843) NR (0.7351) Week (0.0003) (0.0003) (0.0003) NR (0.0003) LOGDISC (0.1890) (0.1695) (0.2064) NR (0.1915) LOGPRICE (0.5534) (0.4964) (0.5474) NR (0.5129) FSI (0.0893) (0.0801) (0.0885) NR (0.0827) LOGDISP (0.1873) (0.1680) (0.1991) NR (0.1846) LOGAD (0.1020) (0.0915) (0.1183) NR (0.1084) LOGDIST (3.7027) (3.3213) (3.8424) NR (3.5320) 61
62 Robust Regression Procedures OLS M Estimation S Estimation LTS Estimation MM Estimation Q (0.0406) (0.0364) (0.0408) NR (0.0382) Q (0.0419) (0.0376) (0.0417) NR (0.0391) Q (0.0429) (0.0384) (0.0450) NR (0.0415) R R Influential Observations , 27, 32, 52 27, 52 27, 52 27, 52, 149, , 52 NR not reported by SAS 62
63 The REG Procedure Dependent Variable: LOGUNITS Number of Observations Read 165 Number of Observations Used 165 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model <.0001 Error Corrected Total Root MSE R-Square Dependent Mean Adj R-Sq Coeff Var continued...
64 Parameter Estimates OLS parameter estimates Parameter Standard Variable DF Estimate Error t Value Pr > t Intercept <.0001 WEEK LOGDISC <.0001 LOGPRICE <.0001 FSI LOGDISP <.0001 LOGAD <.0001 LOGDIST Q Q Q continued...
65 The REG Procedure Dependent Variable: LOGUNITS Durbin-Watson D Pr < DW <.0001 Pr > DW Number of Observations 165 1st Order Autocorrelation NOTE: Pr<DW is the p-value for testing positive autocorrelation, and Pr>DW is the p-value for testing negative autocorrelation. 65 continued...
66 Output Statistics Dependent Predicted Std Error Std Error Student Cook's Obs Variable Value Mean Predict Residual Residual Residual D ** ** ** ** * ** continued...
67 **** * * *** * *** ****** continued...
68 * ***** ** ** **** * * * * * * continued...
69 * * * * ****** * ** * * * continued...
70 Output Statistics Hat Diag Cov DFBETAS Obs RStudent H Ratio DFFITS Intercept WEEK LOGDISC LOGPRICE FSI continued...
71 continued...
72 continued...
73 continued...
74 Output Statistics DFBETAS Obs LOGDISP LOGAD LOGDIST Q1 Q2 Q continued...
75 continued...
76 continued...
77 Sum of Residuals E-13 Sum of Squared Residuals Predicted Residual SS (PRESS) Model Information Data Set WORK.PUBLIXFF20 Dependent Variable LOGUNITS Number of Independent Variables 10 Number of Observations 165 Method M Estimation Number of Observations Read 165 Number of Observations Used continued...
78 Summary Statistics Standard Variable Q1 Median Q3 Mean Deviation MAD WEEK LOGDISC LOGPRICE FSI LOGDISP LOGAD LOGDIST Q Q Q LOGUNITS continued...
79 Parameter Estimates Standard 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr>ChiSq Intercept <.0001 WEEK LOGDISC <.0001 LOGPRICE <.0001 FSI LOGDISP <.0001 LOGAD <.0001 LOGDIST Q Q Q Scale continued...
80 The ROBUSTREG Procedure Diagnostics Robust Standardized Mahalanobis MCD Robust Obs Distance Distance Leverage Residual Outlier * * Diagnostics Summary Observation Type Proportion Cutoff Outlier Leverage Goodness-of-Fit Statistic Value 80 R-Square AICR BICR Deviance continued...
81 Model Information Data Set WORK.PUBLIXFF20 Dependent Variable LOGUNITS Number of Independent Variables 10 Number of Observations 165 Method S Estimation Number of Observations Read 165 Number of Observations Used 165 Summary Statistics Standard Variable Q1 Median Q3 Mean Deviation MAD 81 WEEK LOGDISC LOGPRICE FSI LOGDISP LOGAD LOGDIST Q Q Q LOGUNITS
82 S Profile Total Number of Observations 165 Number of Coefficients 11 Subset Size 11 Chi Function Tukey K Breakdown Value Efficiency Parameter Estimates Standard 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept <.0001 WEEK LOGDISC <.0001 LOGPRICE < continued...
83 The ROBUSTREG Procedure Parameter Estimates Standard 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq FSI LOGDISP <.0001 LOGAD <.0001 LOGDIST Q Q Q Scale continued...
84 Diagnostics Robust Standardized Mahalanobis MCD Robust Obs Distance Distance Leverage Residual Outlier * * Diagnostics Summary Observation Type Proportion Cutoff Outlier Leverage Goodness-of-Fit Statistic Value R-Square Deviance Model Information Data Set WORK.PUBLIXFF20 Dependent Variable LOGUNITS Number of Independent Variables 10 Number of Observations 165 Method LTS Estimation 84 Number of Observations Read 165 Number of Observations Used 165 continued...
85 Summary Statistics Standard Variable Q1 Median Q3 Mean Deviation MAD WEEK LOGDISC LOGPRICE FSI LOGDISP LOGAD LOGDIST Q Q Q LOGUNITS LTS Profile 85 Total Number of Observations 165 Number of Squares Minimized 126 Number of Coefficients 11 Highest Possible Breakdown Value
86 LTS Parameter Estimates Parameter DF Estimate Intercept WEEK LOGDISC LOGPRICE FSI LOGDISP LOGAD LOGDIST LTS Parameter Estimates Parameter DF Estimate Q Q Q Scale (slts) Scale (Wscale) continued...
87 Diagnostics Robust Standardized Mahalanobis MCD Robust Obs Distance Distance Leverage Residual Outlier * * * * Diagnostics Summary Observation Type Proportion Cutoff Outlier Leverage R-Square for LTS Estimation 87 R-Square continued...
88 Model Information Data Set WORK.PUBLIXFF20 Dependent Variable LOGUNITS Number of Independent Variables 10 Number of Observations 165 Method MM Estimation Number of Observations Read 165 Number of Observations Used Summary Statistics Standard Variable Q1 Median Q3 Mean Deviation MAD WEEK LOGDISC LOGPRICE FSI LOGDISP LOGAD LOGDIST Q Q Q LOGUNITS
89 Profile for the Initial LTS Estimate Total Number of Observations 165 Number of Squares Minimized 126 Number of Coefficients 11 Highest Possible Breakdown Value MM Profile Chi Function Tukey K Efficiency continued...
90 Parameter Estimates Standard 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept <.0001 WEEK LOGDISC <.0001 LOGPRICE <.0001 FSI LOGDISP <.0001 LOGAD <.0001 LOGDIST Q Q Q Scale continued...
91 Diagnostics Robust Standardized Mahalanobis MCD Robust Obs Distance Distance Leverage Residual Outlier * * Diagnostics Summary Observation Type Proportion Cutoff Outlier Leverage Goodness-of-Fit Statistic Value 91 R-Square AICR BICR Deviance
92 Section 9.6 Commentary
93 Commentary (1) Always examine the data to determine whether influential observations exist. (2) Check for both leverage points (h_ii values) and outliers (R-student statistics). (3) Use DFFITS, DFBETAS, Cook's D, and COVRATIO to determine the impact of influential observations on prediction, the estimated coefficients, and the variance of the estimated coefficients. (4) Solutions to this issue amount to the use of robust regression procedures (sophisticated) or of dummy variables (pedestrian). (5) The dummy variable approach is more straightforward and may involve intercept and/or slope shifters. (6) Unless the data are incorrect, never eliminate influential observations from the analysis. 93
More informationREGRESSION DIAGNOSTICS AND REMEDIAL MEASURES
REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES Lalmohan Bhar I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 01 lmbhar@iasri.res.in 1. Introduction Regression analysis is a statistical methodology that utilizes
More informationDr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)
Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are
More informationunadjusted model for baseline cholesterol 22:31 Monday, April 19,
unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol
More informationEXST7015: Estimating tree weights from other morphometric variables Raw data print
Simple Linear Regression SAS example Page 1 1 ********************************************; 2 *** Data from Freund & Wilson (1993) ***; 3 *** TABLE 8.24 : ESTIMATING TREE WEIGHTS ***; 4 ********************************************;
More informationLAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION
LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of
More informationIES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc
IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared
More informationSAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c
Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression
More informationa. YOU MAY USE ONE 8.5 X11 TWO-SIDED CHEAT SHEET AND YOUR TEXTBOOK (OR COPY THEREOF).
STAT3503 Test 2 NOTE: a. YOU MAY USE ONE 8.5 X11 TWO-SIDED CHEAT SHEET AND YOUR TEXTBOOK (OR COPY THEREOF). b. YOU MAY USE ANY ELECTRONIC CALCULATOR. c. FOR FULL MARKS YOU MUST SHOW THE FORMULA YOU USE
More informationSTA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007
STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.
More informationNonlinear Regression. Summary. Sample StatFolio: nonlinear reg.sgp
Nonlinear Regression Summary... 1 Analysis Summary... 4 Plot of Fitted Model... 6 Response Surface Plots... 7 Analysis Options... 10 Reports... 11 Correlation Matrix... 12 Observed versus Predicted...
More informationPrepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti
Prepared by: Prof Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Putra Malaysia Serdang M L Regression is an extension to
More informationGeneral Linear Model (Chapter 4)
General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients
More informationBooklet of Code and Output for STAC32 Final Exam
Booklet of Code and Output for STAC32 Final Exam December 8, 2014 List of Figures in this document by page: List of Figures 1 Popcorn data............................. 2 2 MDs by city, with normal quantile
More informationAny of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.
STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed
More informationBeam Example: Identifying Influential Observations using the Hat Matrix
Math 3080. Treibergs Beam Example: Identifying Influential Observations using the Hat Matrix Name: Example March 22, 204 This R c program explores influential observations and their detection using the
More informationChapter 11: Robust & Quantile regression
Chapter 11: Robust & Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/17 11.3: Robust regression 11.3 Influential cases rem. measure: Robust regression
More informationLecture 9 SLR in Matrix Form
Lecture 9 SLR in Matrix Form STAT 51 Spring 011 Background Reading KNNL: Chapter 5 9-1 Topic Overview Matrix Equations for SLR Don t focus so much on the matrix arithmetic as on the form of the equations.
More informationLecture notes on Regression & SAS example demonstration
Regression & Correlation (p. 215) When two variables are measured on a single experimental unit, the resulting data are called bivariate data. You can describe each variable individually, and you can also
More informationAutocorrelation or Serial Correlation
Chapter 6 Autocorrelation or Serial Correlation Section 6.1 Introduction 2 Evaluating Econometric Work How does an analyst know when the econometric work is completed? 3 4 Evaluating Econometric Work Econometric
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationLecture 1: Linear Models and Applications
Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation
More informationContents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects
Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:
More informationCircle a single answer for each multiple choice question. Your choice should be made clearly.
TEST #1 STA 4853 March 4, 215 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 31 questions. Circle
More informationUnit 10: Simple Linear Regression and Correlation
Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for
More informationIntroduction to Linear regression analysis. Part 2. Model comparisons
Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual
More information((n r) 1) (r 1) ε 1 ε 2. X Z β+
Bringing Order to Outlier Diagnostics in Regression Models D.R.JensenandD.E.Ramirez Virginia Polytechnic Institute and State University and University of Virginia der@virginia.edu http://www.math.virginia.edu/
More informationDepartment of Mathematics The University of Toledo. Master of Science Degree Comprehensive Examination Applied Statistics.
Department of Mathematics The University of Toledo Master of Science Degree Comprehensive Examination Applied Statistics April 8, 205 nstructions Do all problems. Show all of your computations. Prove all
More informationG. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication
G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?
More informationMath 3330: Solution to midterm Exam
Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the
More informationCircle the single best answer for each multiple choice question. Your choice should be made clearly.
TEST #1 STA 4853 March 6, 2017 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 32 multiple choice
More informationssh tap sas913, sas
B. Kedem, STAT 430 SAS Examples SAS8 ===================== ssh xyz@glue.umd.edu, tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm Multiple Regression ====================== 0. Show
More informationVariable Selection and Model Building
LINEAR REGRESSION ANALYSIS MODULE XIII Lecture - 37 Variable Selection and Model Building Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur The complete regression
More informationTwo Stage Robust Ridge Method in a Linear Regression Model
Journal of Modern Applied Statistical Methods Volume 14 Issue Article 8 11-1-015 Two Stage Robust Ridge Method in a Linear Regression Model Adewale Folaranmi Lukman Ladoke Akintola University of Technology,
More information10 Model Checking and Regression Diagnostics
10 Model Checking and Regression Diagnostics The simple linear regression model is usually written as i = β 0 + β 1 i + ɛ i where the ɛ i s are independent normal random variables with mean 0 and variance
More informationLecture 3: Inference in SLR
Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals
More informationLeverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.
Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response
More informationT-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum
T-test: means of Spock's judge versus all other judges 1 The TTEST Procedure Variable: pcwomen judge1 N Mean Std Dev Std Err Minimum Maximum OTHER 37 29.4919 7.4308 1.2216 16.5000 48.9000 SPOCKS 9 14.6222
More information8. Example: Predicting University of New Mexico Enrollment
8. Example: Predicting University of New Mexico Enrollment year (1=1961) 6 7 8 9 10 6000 10000 14000 0 5 10 15 20 25 30 6 7 8 9 10 unem (unemployment rate) hgrad (highschool graduates) 10000 14000 18000
More informationChapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression
BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between
More informationLecture 10 Multiple Linear Regression
Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable
More information9 Correlation and Regression
9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the
More informationBE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club
BE640 Intermediate Biostatistics 2. Regression and Correlation Simple Linear Regression Software: SAS Emergency Calls to the New York Auto Club Source: Chatterjee, S; Handcock MS and Simonoff JS A Casebook
More informationMultiple Linear Regression
Andrew Lonardelli December 20, 2013 Multiple Linear Regression 1 Table Of Contents Introduction: p.3 Multiple Linear Regression Model: p.3 Least Squares Estimation of the Parameters: p.4-5 The matrix approach
More informationRegression Diagnostics Procedures
Regression Diagnostics Procedures ASSUMPTIONS UNDERLYING REGRESSION/CORRELATION NORMALITY OF VARIANCE IN Y FOR EACH VALUE OF X For any fixed value of the independent variable X, the distribution of the
More informationLecture 11: Simple Linear Regression
Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink
More informationUCD CENTRE FOR ECONOMIC RESEARCH WORKING PAPER SERIES
UCD CENTRE FOR ECONOMIC RESEARCH WORKING PAPER SERIES 2005 Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump? David Madden, University College Dublin WP05/20 November 2005 UCD
More informationMATH Notebook 4 Spring 2018
MATH448001 Notebook 4 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH448001 Notebook 4 3 4.1 Simple Linear Model.................................
More informationMulti-Equation Structural Models: Seemingly Unrelated Regression Models
Chapter 15 Multi-Equation Structural Models: Seemingly Unrelated Regression Models Section 15.1 Seemingly Unrelated Regression Models Modeling Approaches Econometric (Structural) Models Time-Series Models
More information13 Simple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity
More informationSimple Linear Regression
Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent
More informationTreatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev
Variable selection: Suppose for the i-th observational unit (case) you record ( failure Y i = 1 success and explanatory variabales Z 1i Z 2i Z ri Variable (or model) selection: subject matter theory and
More informationChapter 8 (More on Assumptions for the Simple Linear Regression)
EXST3201 Chapter 8b Geaghan Fall 2005: Page 1 Chapter 8 (More on Assumptions for the Simple Linear Regression) Your textbook considers the following assumptions: Linearity This is not something I usually
More informationDetecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points
Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points Ettore Marubini (1), Annalisa Orenti (1) Background: Identification and assessment of outliers, have
More informationPolynomial Regression
Polynomial Regression Summary... 1 Analysis Summary... 3 Plot of Fitted Model... 4 Analysis Options... 6 Conditional Sums of Squares... 7 Lack-of-Fit Test... 7 Observed versus Predicted... 8 Residual Plots...
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice
The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test
More informationEconomics 308: Econometrics Professor Moody
Economics 308: Econometrics Professor Moody References on reserve: Text Moody, Basic Econometrics with Stata (BES) Pindyck and Rubinfeld, Econometric Models and Economic Forecasts (PR) Wooldridge, Jeffrey
More informationLecture 8: Instrumental Variables Estimation
Lecture Notes on Advanced Econometrics Lecture 8: Instrumental Variables Estimation Endogenous Variables Consider a population model: y α y + β + β x + β x +... + β x + u i i i i k ik i Takashi Yamano
More informationLecture 4: Regression Analysis
Lecture 4: Regression Analysis 1 Regression Regression is a multivariate analysis, i.e., we are interested in relationship between several variables. For corporate audience, it is sufficient to show correlation.
More informationThe General Linear Model. April 22, 2008
The General Linear Model. April 22, 2008 Multiple regression Data: The Faroese Mercury Study Simple linear regression Confounding The multiple linear regression model Interpretation of parameters Model
More informationAnalysis of Variance. Source DF Squares Square F Value Pr > F. Model <.0001 Error Corrected Total
Math 221: Linear Regression and Prediction Intervals S. K. Hyde Chapter 23 (Moore, 5th Ed.) (Neter, Kutner, Nachsheim, and Wasserman) The Toluca Company manufactures refrigeration equipment as well as
More informationThe General Linear Model. November 20, 2007
The General Linear Model. November 20, 2007 Multiple regression Data: The Faroese Mercury Study Simple linear regression Confounding The multiple linear regression model Interpretation of parameters Model
More informationIntroduction to Regression
Introduction to Regression Using Mult Lin Regression Derived variables Many alternative models Which model to choose? Model Criticism Modelling Objective Model Details Data and Residuals Assumptions 1
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationThe Steps to Follow in a Multiple Regression Analysis
ABSTRACT The Steps to Follow in a Multiple Regression Analysis Theresa Hoang Diem Ngo, Warner Bros. Home Video, Burbank, CA A multiple regression analysis is the most powerful tool that is widely used,
More informationMultiple Linear Regression
Chapter 3 Multiple Linear Regression 3.1 Introduction Multiple linear regression is in some ways a relatively straightforward extension of simple linear regression that allows for more than one independent
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)
The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE
More information5.3 Three-Stage Nested Design Example
5.3 Three-Stage Nested Design Example A researcher designs an experiment to study the of a metal alloy. A three-stage nested design was conducted that included Two alloy chemistry compositions. Three ovens
More informationLecture 12 Robust Estimation
Lecture 12 Robust Estimation Prof. Dr. Svetlozar Rachev Institute for Statistics and Mathematical Economics University of Karlsruhe Financial Econometrics, Summer Semester 2007 Copyright These lecture-notes
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationPrediction of Bike Rental using Model Reuse Strategy
Prediction of Bike Rental using Model Reuse Strategy Arun Bala Subramaniyan and Rong Pan School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, USA. {bsarun, rong.pan}@asu.edu
More informationA Modified M-estimator for the Detection of Outliers
A Modified M-estimator for the Detection of Outliers Asad Ali Department of Statistics, University of Peshawar NWFP, Pakistan Email: asad_yousafzay@yahoo.com Muhammad F. Qadir Department of Statistics,
More informationREGRESSION OUTLIERS AND INFLUENTIAL OBSERVATIONS USING FATHOM
REGRESSION OUTLIERS AND INFLUENTIAL OBSERVATIONS USING FATHOM Lindsey Bell lbell2@coastal.edu Keshav Jagannathan kjaganna@coastal.edu Department of Mathematics and Statistics Coastal Carolina University
More informationChapter 12: Multiple Regression
Chapter 12: Multiple Regression 12.1 a. A scatterplot of the data is given here: Plot of Drug Potency versus Dose Level Potency 0 5 10 15 20 25 30 0 5 10 15 20 25 30 35 Dose Level b. ŷ = 8.667 + 0.575x
More information