Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama

Size: px

Start display at page:

Download "Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama"

Jared Todd
6 years ago
Views:

1 Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama Course Packet The purpose of this packet is to show you one particular dataset and how it is used in practice with the methods taught in lecture. The data used in the packet are available on the course website A brief description of the variables is available on the course website You should be able to replicate all of the results using the econometric package. This should help bridge the material from class with the computer program and at the same time help you with the assigned problem sets. Brief explanations on how to obtain the various tables and figures within the window framework are given within the packet and a more advanced formal set of code is given at the end which will run most of the results in one step. The academic paper from which this subset of data was collected can be found at You are not required to read or understand the material in the paper, but you may be interested in the motivation for choosing this particular data. You must be on campus to access the information in the above link. A related paper of interest is here: This packet is designed to be a tool which will help you understand the difficult material in the course. You are expected to bring this to each lecture as I will often refer to the results given in the packet. Questions, comments and suggestions are always appreciated.

2 I. Descriptive Statistics Sample Average 1 / Sample Standard Deviation 1 1 TESTSCORES HOMEWORK HRSOFCLASS CLASSSIZE PREVTESTSCORES Mean Median Maximum Minimum Std. Dev Skewness Kurtosis Jarque Bera Probability Sum Sum Sq. Dev Observations Quick Group Statistics Descriptive Statistics Common Sample Enter in Series List Dialog Box: testscores homework classsize hrsofclass prevtestscores

3 Conditional Expectation Testscores, if sex=0: Testscores, if sex=1: One may expect, on average, tenth grade boys to score nearly a point higher than tenth grade girls on math tests. smpl if sex=0 scalar meantscond0 smpl if sex=1 scalar meantscond1 Note: All scalars appear at the bottom left hand corner of the window. Sample Covariance homework) Or QuickGroup StatisticsCovariances Enter testscores homework in the dialog box. This shows the following covariance matrix (note that the diagonals represent the variances): TESTSCORES HOMEWORK TESTSCORES HOMEWORK

4 Sample Correlation Coefficient Both covariance and the correlation coefficient measure the linear dependence between the variables. Notes: A correlation coefficient of 0 does not imply independence. Both covariance and correlation coefficient have the same sign. homework) Or QuickGroup StatisticsCorrelations Enter testscores homework in the dialog box. This shows the following correlation matrix (the diagonals are unity because each variable is perfectly correlated with itself): TESTSCORES HOMEWORK TESTSCORES HOMEWORK

5 II. Histograms, PDF s and CDF s Histogram Frequency Test Scores Series: TESTSCORES Sample Observations 3733 Mean Median Maximum Minimum Std. Dev Skewness Kurtosis Jarque-Bera Probability QuickSeries StatisticsHistograms and Stats Enter testscores in the dialog box.

6 Probability Density Function (pdf), 1,2,,.036 Kernel Density (Normal, h = ) Probability TESTSCORES Probability TESTSCORES Probability if sex=0 Probability if sex=1

7 Double click on the variable of interest. View Distribution Kernel Density Graphs Normal (Gaussian) OK For the conditional, restrict the sample size: smpl if sex = 0 freeze(ts_pdf0) testscores.kdensity(k=n) smpl if sex=1 freeze(ts_pdf1)testscores.kdensity(k=n) Cumulative Distribution Function (cdf) 1.0 Empirical CDF 0.8 Probability TESTSCORES Double click on the variable of interest. View Distribution CDF-Survivor-Quantile Graphs OK

8 III. Simple Linear Regression Model (Regression of y on x) Standard output: Where Dependent Variable: TESTSCORES Method: Least Squares Date: 08/10/08 Time: 10:04 Sample: Included observations: 3733 is the total variation in x. Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic)

9 TESTSCORES = *HOMEWORK TEST SCORES HOMEWORK The model here is: This model relates average tenth grade math test scores to hours of daily homework. The estimated equation is: A tenth grade student scores an average of on their math test if they do not do any homework, holding all other factors fixed. Holding all other factors fixed, an additional hour of daily math homework increases a tenth grader s predicted test score by points. Quick Estimate Equation Enter into Equation Specification dialog box: testscores c homework Press OK

10 Standard Error of The standard error for the slope parameter is scalar Standard Error of Regression (SER) SER estimates the standard deviation in y after the effect of x has been accounted for. scalar sersimple=simplehw.@se Sum Squared Residuals (SSR) SSR measures the unexplained variation between the data and the predicted values. scalar ssr=simplehw.@ssr

11 Explained Sum of Squares (SSE) SSE measures the explained variation between the predicted values and the mean. Computed manually: genr y_hat = *homework genr sumofsquares = (y_hat )^2 Quick Group Statistics Descriptive Statistics Common Sample Enter in Series List Dialog Box: sumofsquares. The SSE is in the row labeled sum. or calculate SST first and then subtract SSR from SST to get SSE. Total Sum of Squares (SST) SST measures the total variation in y. It is the sum of the explained and unexplained variation. Take the values for SSR (reported in the regression output) and SSE (above) and add them. Or series scalar tss=@sum(sst)

12 R Squared measures the proportion of sample variation in the dependent variable that is accounted for by the independent variable(s) in the model. Note: never decreases (and usually increases) when adding more regressors. scalar rsqsimple=simplehw.@r2 Adjusted R squared Note: imposes a penalty for adding more independent variables. scalar rbarsqsimple=simplehw.@rbar2

13 Regression Through the Origin Dependent Variable: TESTSCORES Method: Least Squares Date: 08/11/08 Time: 07:10 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. HOMEWORK R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood Durbin-Watson stat TEST SCORES y_hat= *homework HOMEWORK

14 We would use this model if there is evidence to suggest that 0. The interpretation would be that if the student does no homework, the predicted test score would be 0. If the student increased his daily homework time by one hour, his predicted test score would increase by 60.3 points. Both seem unreasonable. Note that R squared is meaningless when the intercept is excluded from the model. Functional Forms with Logs Model Dependent Variable Independent Variable Interpretation of β level level y x level log y log(x) /100% log level log(y) x % 100 log log log(y) log(x) % % Level Log testscores ln Dependent Variable: TESTSCORES Method: Least Squares Date: 08/14/08 Time: 08:04 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C LNCLASSSIZE R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) A one percent increase in class size will increase test scores by points. Generating ln of a variable. In the command window enter: series lnclasssize=@log(classsize)

15 Log Level ln Dependent Variable: LNTESTSCORES Method: Least Squares Date: 08/14/08 Time: 08:05 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C CLASSSIZE R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) The coefficient for classsize multiplied by one hundred can be interpreted as the semi elasticity of test scores with respect to class size. If you add one student to the class, test scores would increase by %. Log Log ln ln Dependent Variable: LNTESTSCORES Method: Least Squares Date: 08/10/08 Time: 10:04 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C LNCLASSSIZE R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic)

16 The coefficient for ln(classsize) can be interpreted as the elasticity of test scores with respect to class size. A one percent increase in class size will increase test scores by 0.06%. Some may have expected a negative relationship between these two variables. This result is common in empirical studies and may be attributed to selection of high quality students into larger classes. IV. Multiple Regression Dependent Variable: TESTSCORES Method: Least Squares Date: 08/10/08 Time: 10:04 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK CLASSSIZE HRSOFCLASS R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) The interpretation for each coefficient remains the same. Standard Error of 1 1 Where for j1,2,,k and is the from the regression of x on the other x s

17 Quadratic Model Estimated Slope 2 Dependent Variable: TESTSCORES Method: Least Squares Date: 08/10/08 Time: 10:04 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK HOMEWORK*HOMEWORK R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic)

18 y_hat= *homework *homework^ Test Scores x* HOMEWORK Turning Point After 2.8 hours of homework per day, the return to homework starts to diminish. When we restrict the sample to students who do more than hours of homework a day, we find that there are only 17 sample points out of a possible 3733 (0.0046%). Standardized/Beta Coefficients Where for all j 1, 2,, k

19 Dependent Variable: ZTESTSCORES Method: Least Squares Date: 08/11/08 Time: 07:10 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. ZHOMEWORK ZHRSOFCLASS ZCLASSSIZE R-squared Mean dependent var -5.04E-09 Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood Durbin-Watson stat The interpretation for this equation is that a one standard deviation change in homework will result in a standard deviation change in test scores. Homework has the largest impact out of the three regressors on test scores. To generate standardized variables, type into the command window: series ztestscores=(testscores-@mean(testscores))/@stdev(testscores) This creates a series called ztestscores that is the standardized form of testscores.

20 V. Lagged Dependent Variable Model Dependent Variable: TESTSCORES Method: Least Squares Date: 08/20/08 Time: 00:37 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK PREVTESTSCORES R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) We include eighth grade test scores because it helps control for unobserved variables that may influence their test scores in the tenth grade. The eighth grade test scores can be a proxy for ability or school quality. According to this model, holding all other factors fixed, increasing homework time by one hour would increase test scores by 0.88 points. The return to homework is lower than it was in the model that did not include previous test scores as a regressor.

21 V. Dummy Variable Models Dependent Variable: TESTSCORES Method: Least Squares Date: 08/11/08 Time: 07:10 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK SEX R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic)

22 75 70 Test Scores HOMEWORK y_hat= *homework y_hat= *homework Here the base group is male tenth graders. Since sex is binary (0, 1), the only difference between the base group and the alternative is the intercept.

23 Dependent Variable: TESTSCORES Method: Least Squares Date: 08/14/08 Time: 11:21 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK SEX DMEDUC DMEDUC DMEDUC DMEDUC DMEDUC DMEDUC DTEACHERSEX R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) This model contains three dummy variables, two of which are binary, and one with 7 groups. The base group for this model is male tenth grader whose mother did not finish high school and currently has a male teacher. Once again, the only difference between the base group and the alternatives is the intercept. The students with the highest predicted test scores are tenth grade boys whose mothers have a master s degree and currently have a male teacher. The students with the lowest predicted test scores are tenth grade girls whose mothers dropped out of high school and currently have a female teacher. The difference between the predicted test scores of these two groups for any given level of homework is points. Generating separate dummy variables for mother s education In the command window enter: =@expand(mothereduc)

24 Dummy Model with Interaction/Cross Terms Dependent Variable: TESTSCORES Method: Least Squares Date: 08/11/08 Time: 07:10 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK SEX HOMEWORK*SEX R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic)

25 Test Scores HOMEWORK y_hat= homework*( ) y_hat= *homework testscores = *homework testscores = homework*( ) There is a clear slope change and a relatively small change in the intercept. The return to homework is less for tenth grade girls than boys at all time periods spent on homework, except when hours spent on homework is very small.

26 VI. Confidence Intervals and Hypothesis Testing Confidence Intervals 1001α% Confidence Interval, The critical values are taken from the t tables with n k 1 degrees of freedom. Using the model, the 95% confidence interval for is or , We have 95% confidence that the true population parameter,, is in this interval. t Test : H :,, ~ Note: automatically gives the t statistic for 0. Model: Hypothesis: : 0 against H : 0 reports a t statistic of for the parameter estimate. This falls inside the rejection region even at the 99% confidence level. With this data set we can use the standard normal table since n is large. We would reject the null hypothesis and conclude that 0. Hypothesis: : 0 against H : 0 We could also determine if the properly specified model excludes the intercept parameter from the model (whether a regression through the origin is appropriate) using the same model. The t statistic of is The critical value here with 3731 ( ) degrees of freedom, at a 99% confidence level, is Once again we could have used the standard normal table. We reject the null hypothesis and conclude that 0.

27 Model: Hypothesis: : 0 against : 0 reports as the t statistic. Once again, even at a 99% confidence level, we reject the null hypothesis. This helps verify that homework has decreasing return. Testing Linear Combination of Parameters Single Restriction : : 2, 2 Where is the estimate of, Model: In order to compute, we could modify the test to state: : 0 against : 0. When we rearrange the model using, the model becomes

28 Dependent Variable: TESTSCORES Method: Least Squares Date: 08/14/08 Time: 19:58 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK HOMEWORK+CLASSSIZE HRSOFCLASS R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) The standard error and the t statistic of homework are important because they are the values of and respectively under the original null and alternative hypotheses. This method bypasses the need to compute the covariance between and. We can now apply these values to the original hypothesis test. The critical value from the t distribution at a 99% confidence level in a one tailed test is With this specific test we reject the null hypothesis when the t statistic is greater than the absolute value of the critical value. Since the t statistic, , is greater than 2.326, we reject the null hypothesis. F Test (Testing the Validity of the Regression) : : ~,

29 Restricted Unrestricted : 0 : In the case where the restricted model does not have a slope parameter, the value is zero for the restricted model. We can also compute the F statistic using the following equation: Where is the from the estimation of the unrestricted model. 99% confidence level The F statistic is greater than the critical value. Therefore, we reject the null hypothesis. At least one of the parameters is different zero. We can conclude by saying that the inclusion of homework, class size and hours of class explain some variation in the test scores. Restricted Unrestricted : 0 : % confidence level In this test, we come to the same conclusion as before. Either or is different from zero or, and are different from zero. Thus, the inclusion of class size and hours of class help explain some variation in test scores.

30 Restricted Unrestricted : 0 : % confidence level Notes: When there is only one restriction, the F statistic is the square of the t statistic ( ). We reject the null hypothesis in this test as well. However, only this test tells us that homework is statistically significant. The inclusion of homework helps explain some variation in the test scores. VII. Heteroskedasticity Heteroskedasticity Robust Standard Errors for White Huber Eicker Standard Errors 1 Where denotes the th residual from regressing on all other independent variables.

31 Dependent Variable: TESTSCORES Method: Least Squares Date: 08/20/08 Time: 00:43 Sample: Included observations: 3733 White Heteroskedasticity-Consistent Standard Errors & Covariance Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) Parameter Regular Standard Errors Robust Standard Errors α β Since we are still using OLS estimates, the t statistic is still constructed and interpreted as before. The only difference is that we use the robust standard errors. The conclusions from the t tests concerning α and β remain the same. Note: Robust standard errors and robust t statistics can only be used if there is a large sample size. Otherwise, the inference from the t test may be wrong. Quick Estimate Equation Enter into the Equation specification dialog box testscores c homework Click on the Options tab. Now select Heteroskedasticity consistent coefficient covariance. White should be the default. Click OK.

32 Heteroskedasticity Robust F statistics computes the heteroskedasticity robust Wald statistic which is a transformation of the heteroskedasticity robust F statistic. The interpretation of the Wald statistic is the same. In this case, the restricted model is and the unrestricted model is. Wald Test: Equation: Untitled : 0 : Test Statistic Value df Probability F-statistic (3, 3729) Chi-square Null Hypothesis Summary: Normalized Restriction (= 0) Value Std. Err. C(2) C(3) C(4) Restrictions are linear in coefficients. The regular F statistic for this test was However, our inference from the output is the same because the F statistic is greater than the critical value. To compute the heteroskedasticity robust Wald statistic in, estimate the equation as before but include classsize and hrsofclass as independent variables. ViewCoefficient TestsWald Coefficient Restrictions Enter into the Wald Test dialog box: c(2)=c(3)=c(4)=0 Hit OK

33 Testing for Heteroskedasticity White Test for Heteroskedasticity :,,, :,,, If the original model contained three independent variables, the White test is based on the estimation of where is the series of residuals from the OLS estimation. The F statistic is computed from the R squared of this regression. However, this approach uses up degrees of freedom quickly, which is important when your sample size is small. The F statistic is computed automatically in. White Heteroskedasticity Test: F-statistic Prob. F(6,3726) Obs*R-squared Prob. Chi-Square(6) Test Equation: Dependent Variable: RESID^2 Method: Least Squares Date: 08/19/08 Time: 15:50 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK HOMEWORK^ CLASSSIZE CLASSSIZE^ HRSOFCLASS HRSOFCLASS^ R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic)

34 We can obtain the critical value from the, distribution. F* =2.8 at a 99% confidence level. Since the F-statistic is greater than the critical value, we must reject the null hypothesis that homoskedasticity holds. Estimate the model. ViewResidual TestsWhite Heteroskedasticity (no cross terms) The White test with cross terms requires the same approach. We estimate the model and then. White Heteroskedasticity Test: F-statistic Prob. F(9,3723) Obs*R-squared Prob. Chi-Square(9) Test Equation: Dependent Variable: RESID^2 Method: Least Squares Date: 08/19/08 Time: 15:51 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK HOMEWORK^ HOMEWORK*CLASSSIZE HOMEWORK*HRSOFCLASS CLASSSIZE CLASSSIZE^ CLASSSIZE*HRSOFCLASS HRSOFCLASS HRSOFCLASS^ R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) Once again we can reject the null hypothesis since the F-statistic is greater than 2.8.

35 Estimate the model. ViewResidual TestsWhite Heteroskedasticity (cross terms) An alternative to this method is to run the regression of y on the s and then take the residuals and square them. Now estimate the following model:. Dependent Variable: RESID_SQUARED Method: Least Squares Date: 08/19/08 Time: 18:12 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C Y_HAT Y_HAT_SQUARED R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) Take the R squared from this regression,, and compute the F statistic using: We can obtain the critical value from the, distribution. This approach clearly uses up fewer degrees of freedom, especially when more independent variables are added to the original model. The F statistic is Since the F statistic is greater than 4.61, we again reject the null hypothesis that homoskedasticity holds.

36 Feasible Generalized Least Squares (FGLS) h(x) is some function of the explanatory variables that determines the heteroskedasticity. Since it is hard to find such a function, we can model the function and then estimate the parameters in this model. Procedure for FGLS 1. Run the regression of y on the explanatory variables. 2. Take the residuals,, and form ln(. 3. Regress ln( on the explanatory variables. 4. Take the fitted values,, and form =exp(). 5. Estimate the original model again by WLS using weights of 1/. Dependent Variable: TESTSCORES Method: Least Squares Date: 08/19/08 Time: 19:09 Sample: Included observations: 3733 Weighting series: H_HAT Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK CLASSSIZE HRSOFCLASS Weighted Statistics R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) Unweighted Statistics R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Sum squared resid Durbin-Watson stat

37 We can also do the following: 1. Run the regression of y on the explanatory variables. 2. Take the residuals,, and form ln(. 3. Regress ln( on and. 4. Take the fitted values,, and form =exp(). 5. Estimate the original model again by WLS using weights of 1/. Dependent Variable: TESTSCORES Method: Least Squares Date: 08/19/08 Time: 19:18 Sample: Included observations: 3733 Weighting series: H_HAT2 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK CLASSSIZE HRSOFCLASS Weighted Statistics R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) Unweighted Statistics R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Sum squared resid Durbin-Watson stat The only part of this process that may be unfamiliar to you is running the regression with WLS. In the Options tab of the Equation Estimation box, there is an option called Weighted LS/TSLS. Select that option and enter h_hat into the box labeled Weight.

38 Ramsey s Regression Specification Error Test (RESET) Model Misspecification (Neglected Nonlinearities) Original Model: Model with fitted values: : 0 : Ramsey RESET Test: F-statistic Prob. F(2,3727) Log likelihood ratio Prob. Chi-Square(2) Test Equation: Dependent Variable: TESTSCORES Method: Least Squares Date: 08/19/08 Time: 23:47 Sample: Included observations: 3733 Variable Coefficient Std. Error t-statistic Prob. C HOMEWORK CLASSSIZE HRSOFCLASS FITTED^ FITTED^ R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic)

39 The F statistic is This is greater than the critical value at a 99% confidence level,. Therefore, we reject the null hypothesis. This is evidence that the functional form is misspecified. Estimate the model as usual. ViewStability TestsRamsey s RESET Test In the RESET Specification box enter: 2 Hit OK. Test Against Nonnested Alternatives Mizon and Richard Test Two nonnested models: 1. ln classsize 2. The comprehensive model is: ln : 0 : We can form the F statistic the same way as usual. The F statistic is We reject the null hypothesis at a 99% confidence level. and are jointly statistically significant. Either one or both of these parameters does not equal zero. : 0 : The F statistic is This means that is statistically significant at a 99% confidence level.

40 Davidson MacKinnon Test First estimate the original models ln classsize Now estimate each model including the predicted value of the other model as a regressor lnclasssize θ The t statistics are and for in the two estimated models, respectively. In both instances, even at a 99% confidence level we reject the null hypothesis that 0. Therefore, neither model could be rejected. However, since the R squared value for the second model is higher than that of the first, we can conclude that the second model is better. However, note that this need not be the best possible model. It is just the best of the two being considered.

CHAPTER 6: SPECIFICATION VARIABLES

CHAPTER 6: SPECIFICATION VARIABLES Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero