Chapter 8 Conclusion

Three questions about test scores (score) and the student-teacher ratio (str):

a) After controlling for differences in the economic characteristics of different districts, does the effect of str on score depend on the fraction of English learners (pctel)?
b) Does this effect depend on str? (Is there a non-linear relationship?)
c) After taking economic factors and nonlinearities into account, what is the estimated effect on score of reducing str?

> teachdata = read.csv("
> attach(teachdata)
> head(teachdata)
  sublunch score str avginc pctel

An economics study should always include a description of the data:

sublunch   percent qualifying for reduced-price lunch
score      average test score
str        student-teacher ratio
avginc     district average income (in $1000s)
pctel      percentage of English learners

It is also common to provide descriptive statistics for the variables.

The variable of interest is str (the policy variable). There are two measures of the economic background of students: sublunch and avginc. pctel is also important because of omitted variable bias (OVB).
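A minimal sketch of such a descriptive-statistics table follows. The path to the course data is not shown in these notes, so the data frame `teach` below is synthetic and every number in it is invented purely for illustration; only the variable names come from the slides.

```r
# Synthetic stand-in for the course data set (all values are made up).
set.seed(1)
n <- 420
teach <- data.frame(
  sublunch = runif(n, 0, 100),        # percent qualifying for reduced-price lunch
  str      = rnorm(n, 19.6, 1.9),     # student-teacher ratio
  avginc   = exp(rnorm(n, 2.6, 0.4)), # district average income, $1000s
  pctel    = runif(n, 0, 80)          # percentage of English learners
)
teach$score <- 650 - 0.7 * teach$str - 0.2 * teach$pctel -
  0.4 * teach$sublunch + 12 * log(teach$avginc) + rnorm(n, sd = 9)

# One row per variable: mean, standard deviation, minimum, maximum.
desc <- t(sapply(teach, function(v) c(mean = mean(v), sd = sd(v),
                                      min = min(v), max = max(v))))
print(round(desc, 2))
```

With the real data, the same `sapply` call could be applied to `teachdata` directly.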

In a previous lecture, it was argued that avginc might have a non-linear relationship with score:

> plot(avginc, score, xlim = c(5,60), ylim = c(600,710))

[Figure: scatter plot of score against avginc]

What are some ways we can deal with this?

(i) Polynomials:

> avginc2 = avginc^2
> avginc3 = avginc^3
> eqcubic = lm(score ~ avginc + avginc2 + avginc3)
> summary(eqcubic)

Call:
lm(formula = score ~ avginc + avginc2 + avginc3)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.001e e < 2e-16 ***
avginc      5.019e e e-08 ***
avginc2     e e *
avginc3     e e
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 416 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 416 DF, p-value: < 2.2e-16

Let's plot the cubic regression function:

> par(new = TRUE)
> curve( *x *x^ *x^3, xlim = c(5,60), ylim = c(600,710), ylab = "", xlab = "", col = 2)

[Figure: scatter plot of score against avginc with the fitted cubic regression function]

(ii) Logarithms:

> eqlog = lm(score ~ log(avginc))
> summary(eqlog)

Call:
lm(formula = score ~ log(avginc))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
log(avginc) <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 418 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 418 DF, p-value: < 2.2e-16

Add this regression to the plot:

> par(new = TRUE)
> curve( *log(x), xlim = c(5,60), ylim = c(600,710), ylab = "", xlab = "", col = 3)
> legend("bottomright", c("cubic", "Lin-Log"), pch =" ", col=c(2,3))

[Figure: scatter plot of score against avginc with the fitted cubic and lin-log regression functions]
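The two fitted curves can also be drawn without copying coefficients by hand. The sketch below is one possible way, assuming synthetic data (the data frame `d` and all its values are invented here); it uses `predict()` on a grid of income values and `lines()` to add each curve to the existing scatter plot.

```r
# Synthetic income and score data (made up; the true relation is lin-log).
set.seed(2)
n <- 420
d <- data.frame(avginc = exp(rnorm(n, 2.6, 0.4)))
d$score <- 560 + 36 * log(d$avginc) + rnorm(n, sd = 12)

# I() lets the powers be written inside the formula.
eqcubic <- lm(score ~ avginc + I(avginc^2) + I(avginc^3), data = d)
eqlog   <- lm(score ~ log(avginc), data = d)

# Evaluate both fitted regression functions on a grid of income values.
grid <- data.frame(avginc = seq(5, 60, length.out = 200))
fit_cubic <- predict(eqcubic, newdata = grid)
fit_log   <- predict(eqlog, newdata = grid)

plot(d$avginc, d$score, xlab = "avginc", ylab = "score")
lines(grid$avginc, fit_cubic, col = 2)
lines(grid$avginc, fit_log, col = 3)
legend("bottomright", c("cubic", "lin-log"), lty = 1, col = c(2, 3))
```

Because `lines()` draws on the current plot, there is no need for `par(new = TRUE)` or for repeating the axis limits, and `lty = 1` makes the legend show a coloured line for each curve.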

Do you like the cubic or lin-log model better? What are the advantages/disadvantages? Does heteroskedasticity appear to be present?

We will proceed by using log(avginc). But first, to revise omitted variable bias, let's see what happens if we leave log(avginc) out of the regression.

> eq1 = lm(score ~ str + pctel + sublunch)
> summary(eq1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) < 2e-16 ***
str         e-05 ***
pctel       ***
sublunch    < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.08 on 416 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 416 DF, p-value: < 2.2e-16

Now add log(avginc):

> eq2 = lm(score ~ str + pctel + sublunch + log(avginc))
> summary(eq2)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) < 2e-16 ***
str         **
pctel       e-08 ***
sublunch    < 2e-16 ***
log(avginc) e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 415 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 4 and 415 DF, p-value: < 2.2e-16

How have the results changed? What is going on here?
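The mechanics of omitted variable bias can be illustrated with a small simulation. This is a sketch on synthetic data, not the course data: it assumes, purely for illustration, that richer districts have smaller classes, and it uses only str and log(avginc) as regressors. In OLS the short-regression coefficient on str equals the long-regression coefficient plus the coefficient on the omitted variable times the slope from regressing the omitted variable on str.

```r
# Synthetic data: income is negatively related to class size (an assumption).
set.seed(3)
n <- 420
avginc <- exp(rnorm(n, 2.6, 0.4))
str    <- 22 - 1.5 * log(avginc) + rnorm(n, sd = 1.5)
score  <- 640 - 0.7 * str + 12 * log(avginc) + rnorm(n, sd = 8)

short <- lm(score ~ str)                # omits log(avginc)
long  <- lm(score ~ str + log(avginc))  # includes it
aux   <- lm(log(avginc) ~ str)          # omitted variable on included one

b_short <- unname(coef(short)["str"])
b_long  <- unname(coef(long)["str"])
# Bias term: (coefficient on omitted variable) x (auxiliary slope).
bias    <- unname(coef(long)["log(avginc)"] * coef(aux)["str"])

print(c(short = b_short, long = b_long, bias = bias))
```

In this setup the short regression overstates the (negative) effect of str, because part of the income effect is attributed to class size.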

Regressor     (1)              (2)              (3)              (4)              (5)              (6)              (7)
str           -1.00** (0.24)   -0.73** (0.23)
str^2
str^3
pctel         ** (0.033)       ** (0.032)
hiel
hiel x str
hiel x str^2
hiel x str^3
sublunch      ** (0.022)       ** (0.030)
log(avginc)                    11.57** (1.74)
Intercept     700.2** (4.7)    658.6** (7.7)
R^2

Let's address (a): After controlling for differences in the economic characteristics of different districts, does the effect of str on score depend on the fraction of English learners (pctel)?

An easier way to examine this might be to create a dummy variable. Let's define a new variable (high percentage of English learners):

hiel = 0 for classes with a small percentage of English learners
hiel = 1 for classes with a large percentage of English learners

How should we determine the threshold?

> summary(pctel)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

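Create hiel:

# start from a vector of zeros with one entry per district
hiel = rep(0, length(pctel))
# switch to 1 where at least 10 percent are English learners
hiel[pctel >= 10] = 1

To address (a), create the interaction term:

hielstr = hiel*str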
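As a hedged aside, the dummy and the interaction can also be built in one step each. The sketch below uses synthetic data (all values invented) and shows that a hand-made product term and R's formula operator `*` give the same fitted coefficients.

```r
# Synthetic data for illustration only.
set.seed(4)
n <- 420
str   <- rnorm(n, 19.6, 1.9)
pctel <- runif(n, 0, 80)
score <- 680 - 1.0 * str - 0.3 * pctel + rnorm(n, sd = 15)

# Vectorised dummy: 1 if at least 10 percent English learners, else 0.
hiel    <- as.numeric(pctel >= 10)
hielstr <- hiel * str

by_hand    <- lm(score ~ str + hiel + hielstr)
# str * hiel expands to str + hiel + str:hiel.
by_formula <- lm(score ~ str * hiel)

print(table(hiel))
print(cbind(by_hand = coef(by_hand), by_formula = coef(by_formula)))
```

The formula version avoids keeping a separate product variable in the workspace; the coefficient on `str:hiel` plays the role of the coefficient on `hielstr`.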

Try a regression without economic controls:

> eq3 = lm(score ~ str + hiel + hielstr)
> summary(eq3)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
str
hiel
hielstr
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 416 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 62.4 on 3 and 416 DF, p-value: < 2.2e-16

Which coefficient should we be testing to see if str has a different effect for classes with many English learners? What do we conclude?

In anticipation of (c), let's test if str matters. Does it appear to matter from the results above?

H_0: the student-teacher ratio has no effect on test scores
H_A: model (3)

The model under the null hypothesis is:

> eqnul1 = lm(score ~ hiel)
> summary(eqnul1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
hiel        <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 418 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 418 DF, p-value: < 2.2e-16

Formula for the F-statistic:

$$F = \frac{(R_U^2 - R_R^2)/q}{(1 - R_U^2)/(n - k_U - 1)}$$

Substituting the R-squared values of the unrestricted model (3) and the restricted model, with q = 2 restrictions, gives F = 7.57. Since this is greater than the 5% critical value of 3.00, we reject the null.

Alternatively, use the following R code to perform the test:

> anova(eq3,eqnul1)
Analysis of Variance Table

Model 1: score ~ str + hiel + hielstr
Model 2: score ~ hiel
  Res.Df RSS Df Sum of Sq F Pr(>F)
                                 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
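The agreement between the R-squared formula and `anova()` can be checked numerically. The following is a sketch on synthetic data (values invented, so the F-statistic will not equal 7.57); `q` and `k_u` mirror the two restrictions and three regressors of model (3).

```r
# Synthetic data for illustration only.
set.seed(5)
n <- 420
str     <- rnorm(n, 19.6, 1.9)
hiel    <- as.numeric(runif(n, 0, 80) >= 10)
score   <- 690 - 1.2 * str - 15 * hiel + rnorm(n, sd = 15)
hielstr <- hiel * str

unres <- lm(score ~ str + hiel + hielstr)  # unrestricted model
res   <- lm(score ~ hiel)                  # restricted: str terms dropped

r2_u <- summary(unres)$r.squared
r2_r <- summary(res)$r.squared
q    <- 2   # number of restrictions (str and hielstr)
k_u  <- 3   # regressors in the unrestricted model

# Homoskedasticity-only F-statistic from the two R-squared values.
f_manual <- ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k_u - 1))
# The same statistic as reported by anova().
f_anova  <- anova(res, unres)$F[2]
# Exact 5% critical value of the F(q, n - k_u - 1) distribution.
crit     <- qf(0.95, df1 = q, df2 = n - k_u - 1)

print(c(manual = f_manual, anova = f_anova, critical = crit))
```

The value 3.00 quoted on the slide is the large-sample critical value; `qf()` gives the slightly larger exact value for 416 denominator degrees of freedom.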

Let's try a model with economic controls.

> eq4 = lm(score ~ str + hiel + hielstr + sublunch + log(avginc))
> summary(eq4)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) < 2e-16 ***
str
hiel
hielstr
sublunch    < 2e-16 ***
log(avginc) e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 414 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 5 and 414 DF, p-value: < 2.2e-16

Has the conclusion (about a different effect for classes with many English learners) changed?

Again, let's test the null that str doesn't matter. Restricted model:

> eqnul2 = lm(score ~ hiel + sublunch + log(avginc))
> anova(eq4,eqnul2)
Analysis of Variance Table

Model 1: score ~ str + hiel + hielstr + sublunch + log(avginc)
Model 2: score ~ hiel + sublunch + log(avginc)
  Res.Df RSS Df Sum of Sq F Pr(>F)
                                 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Regressor     (1)              (2)              (3)              (4)              (5)              (6)              (7)
str           -1.00** (0.24)   -0.73** (0.23)   (0.54)           (0.30)
str^2
str^3
pctel         ** (0.033)       ** (0.032)
hiel                                            5.64 (16.7)      5.50 (9.1)
hiel x str                                      (0.84)           (0.47)
hiel x str^2
hiel x str^3
sublunch      ** (0.022)       ** (0.030)                        ** (0.029)
log(avginc)                    11.57** (1.74)                    12.12** (1.8)
Intercept     700.2** (4.7)    658.6** (7.7)    682.2** (10.5)   653.7** (8.9)
R^2

Now let's address (b): is the relationship between str and score non-linear?

> str2 = str^2
> str3 = str^3
> eq5 = lm(score ~ str + str2 + str3 + hiel + sublunch + log(avginc))
> summary(eq5)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
str         *
str2        **
str3        **
hiel        e-07 ***
sublunch    < 2e-16 ***
log(avginc) e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 413 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 6 and 413 DF, p-value: < 2.2e-16

Regressor     (1)              (2)              (3)              (4)              (5)              (6)              (7)
str           -1.00** (0.24)   -0.73** (0.23)   (0.54)           (0.30)           64.33** (25.5)
str^2                                                                             ** (1.29)
str^3                                                                             ** (0.022)
pctel         ** (0.033)       ** (0.032)
hiel                                            5.64 (16.7)      5.50 (9.1)       -5.47** (1.03)
hiel x str                                      (0.84)           (0.47)
hiel x str^2
hiel x str^3
sublunch      ** (0.022)       ** (0.030)                        ** (0.029)       ** (0.028)
log(avginc)                    11.57** (1.74)                    12.12** (1.8)    11.75** (1.7)
Intercept     700.2** (4.7)    658.6** (7.7)    682.2** (10.5)   653.7** (8.9)    (165.8)
R^2

To test the null hypothesis that the relationship between str and score is linear, estimate a restricted model and compare it to model (5):

> eqnul3 = lm(score ~ hiel + sublunch + log(avginc))
> anova(eq5,eqnul3)
Analysis of Variance Table

Model 1: score ~ str + str2 + str3 + hiel + sublunch + log(avginc)
Model 2: score ~ hiel + sublunch + log(avginc)
  Res.Df RSS Df Sum of Sq F Pr(>F)
                                 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that eqnul3 as written drops str altogether, so this comparison tests the joint null that the coefficients on str, str2 and str3 are all zero. A restricted model for linearity alone would keep str and drop only str2 and str3.

What do you conclude? What other way might you try to capture this non-linear effect? How would you test to see if str matters, using model (5)?
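The two nulls (linearity versus no effect of str at all) correspond to two different restricted models. The sketch below sets both up on synthetic data (all values invented; the names `linear` and `no_str` are hypothetical), so the number of restrictions in each test can be read off the anova tables.

```r
# Synthetic data for illustration only.
set.seed(6)
n <- 420
d <- data.frame(
  str      = rnorm(n, 19.6, 1.9),
  hiel     = as.numeric(runif(n, 0, 80) >= 10),
  sublunch = runif(n, 0, 100),
  avginc   = exp(rnorm(n, 2.6, 0.4))
)
d$score <- 660 - 0.7 * d$str - 5 * d$hiel - 0.4 * d$sublunch +
  12 * log(d$avginc) + rnorm(n, sd = 9)

# Unrestricted cubic specification, as in model (5).
full   <- lm(score ~ str + I(str^2) + I(str^3) + hiel + sublunch +
               log(avginc), data = d)
# Null of linearity: keep str, drop the squared and cubed terms.
linear <- lm(score ~ str + hiel + sublunch + log(avginc), data = d)
# Null that str does not matter: drop all three str terms.
no_str <- lm(score ~ hiel + sublunch + log(avginc), data = d)

lin_test <- anova(linear, full)  # 2 restrictions
any_test <- anova(no_str, full)  # 3 restrictions
print(lin_test)
print(any_test)
```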

Let's reconsider (a) under the cubic specification. We want to know if the effect of str on score is different for classes with a high percentage of English learners. Again, the strategy is:

- have the dummy variable hiel interact with all terms involving str
- this allows the marginal effect to differ between the two groups
- testing whether the coefficients on the interaction terms are jointly equal to zero is equivalent to testing that there is no difference between the two groups

Create the new interaction terms:

hielstr2 = hiel*str2
hielstr3 = hiel*str3

Add the interaction terms to model (5):

eq6 = lm(score ~ str + str2 + str3 + hiel + hielstr + hielstr2 + hielstr3 + sublunch + log(avginc))

Regressor     (1)              (2)              (3)              (4)              (5)              (6)              (7)
str           -1.00** (0.24)   -0.73** (0.23)   (0.54)           (0.30)           64.33** (25.5)   83.70** (29.69)
str^2                                                                             ** (1.29)        -4.38** (1.51)
str^3                                                                             ** (0.022)       0.075** (0.025)
pctel         ** (0.033)       ** (0.032)
hiel                                            5.64 (16.7)      5.50 (9.1)       -5.47** (1.03)   816.1* (434.61)
hiel x str                                      (0.84)           (0.47)                            * (66.35)
hiel x str^2                                                                                       * (3.35)
hiel x str^3                                                                                       * (0.056)
sublunch      ** (0.022)       ** (0.030)                        ** (0.029)       ** (0.028)       ** (0.029)
log(avginc)                    11.57** (1.74)                    12.12** (1.8)    11.75** (1.7)    11.80** (1.75)
Intercept     700.2** (4.7)    658.6** (7.7)    682.2** (10.5)   653.7** (8.9)    (165.8)          (192.2)
R^2

How do we test (a) using model (6)?

> anova(eq6,eq5)
Analysis of Variance Table

Model 1: score ~ str + str2 + str3 + hiel + hielstr + hielstr2 + hielstr3 + sublunch + log(avginc)
Model 2: score ~ str + str2 + str3 + hiel + sublunch + log(avginc)
  Res.Df RSS Df Sum of Sq F Pr(>F)

So, once again, we can't reject the null that the effect of str on score is the same regardless of the number of English learners. This suggests that the interaction terms are not needed, and model (5) is adequate.

For a final model, let's make sure that our results are invariant to the use of hiel or pctel.

eq7 = lm(score ~ str + str2 + str3 + pctel + sublunch + log(avginc))
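A sketch of the joint test on the three interaction terms, again on synthetic data (all values invented, and generated with no true interaction). It adds the interactions with `update()` rather than creating product variables by hand, which is a design choice made here, not the approach in the slides.

```r
# Synthetic data for illustration only.
set.seed(7)
n <- 420
d <- data.frame(
  str      = rnorm(n, 19.6, 1.9),
  hiel     = as.numeric(runif(n, 0, 80) >= 10),
  sublunch = runif(n, 0, 100),
  avginc   = exp(rnorm(n, 2.6, 0.4))
)
d$score <- 660 - 0.7 * d$str - 5 * d$hiel - 0.4 * d$sublunch +
  12 * log(d$avginc) + rnorm(n, sd = 9)

# Cubic specification without interactions, as in model (5).
eq5 <- lm(score ~ str + I(str^2) + I(str^3) + hiel + sublunch +
            log(avginc), data = d)
# Interact hiel with every term involving str, as in model (6).
eq6 <- update(eq5, . ~ . + hiel:str + hiel:I(str^2) + hiel:I(str^3))

# Joint F-test that all three interaction coefficients are zero.
int_test <- anova(eq5, eq6)
print(int_test)
```

A large p-value in the second row of the table means the null of no difference between the two groups cannot be rejected.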

Regressor     (1)              (2)              (3)              (4)              (5)              (6)              (7)
str           -1.00** (0.24)   -0.73** (0.23)   (0.54)           (0.30)           64.33** (25.5)   83.70** (29.69)  65.29** (25.48)
str^2                                                                             ** (1.29)        -4.38** (1.51)   -3.47** (1.30)
str^3                                                                             ** (0.022)       0.075** (0.025)  0.060** (0.022)
pctel         ** (0.033)       ** (0.032)                                                                           ** (0.032)
hiel                                            5.64 (16.7)      5.50 (9.1)       -5.47** (1.03)   816.1* (434.61)
hiel x str                                      (0.84)           (0.47)                            * (66.35)
hiel x str^2                                                                                       * (3.35)
hiel x str^3                                                                                       * (0.056)
sublunch      ** (0.022)       ** (0.030)                        ** (0.029)       ** (0.028)       ** (0.029)       ** (0.030)
log(avginc)                    11.57** (1.74)                    12.12** (1.8)    11.75** (1.7)    11.80** (1.75)   11.51** (1.73)
Intercept     700.2** (4.7)    658.6** (7.7)    682.2** (10.5)   653.7** (8.9)    (165.8)          (192.2)          (165.9)
R^2

Summary

(a) Based on hypothesis tests involving models (3), (4) and (6), there doesn't appear to be a substantial difference in the effect of str on score for classes with many English learners.
(b) A hypothesis test involving model (5) indicates that the relationship between str and score is non-linear.
(c) Using F-tests, the null hypothesis that str has no effect on score is rejected in all models. (Only some of these F-tests were shown.)

Models (5) and (7) should be our preferred models based on the sequence of testing. Let's use them to provide some policy recommendations.

If str = 20, then reducing str to 18 would improve score by 3.00 using model (5), and by 2.93 using model (7).

If str = 22, then reducing str to 20 would improve score by 1.93 (model 5) or 1.90 (model 7).
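Because str enters as a cubic, the effect of a two-student reduction depends on the starting value and is most easily computed as a difference of predictions. The sketch below shows one way to do this on synthetic data (all values invented, so the results will not match the 3.00 and 1.93 above); `str_effect` is a hypothetical helper, not a function from the slides.

```r
# Synthetic data for illustration only.
set.seed(8)
n <- 420
d <- data.frame(
  str      = rnorm(n, 19.6, 1.9),
  hiel     = as.numeric(runif(n, 0, 80) >= 10),
  sublunch = runif(n, 0, 100),
  avginc   = exp(rnorm(n, 2.6, 0.4))
)
d$score <- 660 - 0.7 * d$str - 5 * d$hiel - 0.4 * d$sublunch +
  12 * log(d$avginc) + rnorm(n, sd = 9)

eq5 <- lm(score ~ str + I(str^2) + I(str^3) + hiel + sublunch +
            log(avginc), data = d)

# Predicted change in score when str moves from `from` to `to`,
# holding the other regressors fixed at the values in `others`.
str_effect <- function(model, from, to, others) {
  nd <- rbind(cbind(str = from, others), cbind(str = to, others))
  p  <- predict(model, newdata = nd)
  unname(p[2] - p[1])
}

others <- data.frame(hiel = 0, sublunch = mean(d$sublunch),
                     avginc = mean(d$avginc))
gain_20_18 <- str_effect(eq5, from = 20, to = 18, others = others)
gain_22_20 <- str_effect(eq5, from = 22, to = 20, others = others)
print(c(from20to18 = gain_20_18, from22to20 = gain_22_20))
```

Since the model has no interactions between str and the controls, the values chosen for `others` cancel out of the difference; they are needed only so that `predict()` has a complete row to work with.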


More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 STAC67H3 Regression Analysis Duration: One hour and fifty minutes Last Name: First Name: Student

More information

ECON 4230 Intermediate Econometric Theory Exam

ECON 4230 Intermediate Econometric Theory Exam ECON 4230 Intermediate Econometric Theory Exam Multiple Choice (20 pts). Circle the best answer. 1. The Classical assumption of mean zero errors is satisfied if the regression model a) is linear in the

More information

General Linear Statistical Models - Part III

General Linear Statistical Models - Part III General Linear Statistical Models - Part III Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Interaction Models Lets examine two models involving Weight and Domestic in the cars93 dataset.

More information

Dealing with Heteroskedasticity

Dealing with Heteroskedasticity Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing

More information

Multiple Regression Introduction to Statistics Using R (Psychology 9041B)

Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment

More information

NC Births, ANOVA & F-tests

NC Births, ANOVA & F-tests Math 158, Spring 2018 Jo Hardin Multiple Regression II R code Decomposition of Sums of Squares (and F-tests) NC Births, ANOVA & F-tests A description of the data is given at http://pages.pomona.edu/~jsh04747/courses/math58/

More information

Psychology 405: Psychometric Theory

Psychology 405: Psychometric Theory Psychology 405: Psychometric Theory Homework Problem Set #2 Department of Psychology Northwestern University Evanston, Illinois USA April, 2017 1 / 15 Outline The problem, part 1) The Problem, Part 2)

More information

Model Specification and Data Problems. Part VIII

Model Specification and Data Problems. Part VIII Part VIII Model Specification and Data Problems As of Oct 24, 2017 1 Model Specification and Data Problems RESET test Non-nested alternatives Outliers A functional form misspecification generally means

More information

ECON Introductory Econometrics. Lecture 6: OLS with Multiple Regressors

ECON Introductory Econometrics. Lecture 6: OLS with Multiple Regressors ECON4150 - Introductory Econometrics Lecture 6: OLS with Multiple Regressors Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 6 Lecture outline 2 Violation of first Least Squares assumption

More information

Comparing Nested Models

Comparing Nested Models Comparing Nested Models ST 370 Two regression models are called nested if one contains all the predictors of the other, and some additional predictors. For example, the first-order model in two independent

More information

Chapter 3: Multiple Regression. August 14, 2018

Chapter 3: Multiple Regression. August 14, 2018 Chapter 3: Multiple Regression August 14, 2018 1 The multiple linear regression model The model y = β 0 +β 1 x 1 + +β k x k +ǫ (1) is called a multiple linear regression model with k regressors. The parametersβ

More information

Workshop 7.4a: Single factor ANOVA

Workshop 7.4a: Single factor ANOVA -1- Workshop 7.4a: Single factor ANOVA Murray Logan November 23, 2016 Table of contents 1 Revision 1 2 Anova Parameterization 2 3 Partitioning of variance (ANOVA) 10 4 Worked Examples 13 1. Revision 1.1.

More information

Handout 4: Simple Linear Regression

Handout 4: Simple Linear Regression Handout 4: Simple Linear Regression By: Brandon Berman The following problem comes from Kokoska s Introductory Statistics: A Problem-Solving Approach. The data can be read in to R using the following code:

More information

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Scenario: 31 counts (over a 30-second period) were recorded from a Geiger counter at a nuclear

More information

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Inference ME104: Linear Regression Analysis Kenneth Benoit August 15, 2012 August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Stata output resvisited. reg votes1st spend_total incumb minister

More information

MATH 423/533 - ASSIGNMENT 4 SOLUTIONS

MATH 423/533 - ASSIGNMENT 4 SOLUTIONS MATH 423/533 - ASSIGNMENT 4 SOLUTIONS INTRODUCTION This assignment concerns the use of factor predictors in linear regression modelling, and focusses on models with two factors X 1 and X 2 with M 1 and

More information

22s:152 Applied Linear Regression

22s:152 Applied Linear Regression 22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial

More information

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Today (Re-)Introduction to linear models and the model space What is linear regression Basic properties of linear regression Using

More information

Multiple Regression Part I STAT315, 19-20/3/2014

Multiple Regression Part I STAT315, 19-20/3/2014 Multiple Regression Part I STAT315, 19-20/3/2014 Regression problem Predictors/independent variables/features Or: Error which can never be eliminated. Our task is to estimate the regression function f.

More information

36-707: Regression Analysis Homework Solutions. Homework 3

36-707: Regression Analysis Homework Solutions. Homework 3 36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C =

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C = Economics 130 Lecture 6 Midterm Review Next Steps for the Class Multiple Regression Review & Issues Model Specification Issues Launching the Projects!!!!! Midterm results: AVG = 26.5 (88%) A = 27+ B =

More information

Motor Trend Car Road Analysis

Motor Trend Car Road Analysis Motor Trend Car Road Analysis Zakia Sultana February 28, 2016 Executive Summary You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester

SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: "Statistics Tables" by H.R. Neave PAS 371 SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester 2008 9 Linear

More information

STAT 572 Assignment 5 - Answers Due: March 2, 2007

STAT 572 Assignment 5 - Answers Due: March 2, 2007 1. The file glue.txt contains a data set with the results of an experiment on the dry sheer strength (in pounds per square inch) of birch plywood, bonded with 5 different resin glues A, B, C, D, and E.

More information

Multiple Regression Analysis. Basic Estimation Techniques. Multiple Regression Analysis. Multiple Regression Analysis

Multiple Regression Analysis. Basic Estimation Techniques. Multiple Regression Analysis. Multiple Regression Analysis Multiple Regression Analysis Basic Estimation Techniques Herbert Stocker herbert.stocker@uibk.ac.at University of Innsbruck & IIS, University of Ramkhamhaeng Regression Analysis: Statistical procedure

More information

ECO321: Economic Statistics II

ECO321: Economic Statistics II ECO321: Economic Statistics II Chapter 6: Linear Regression a Hiroshi Morita hmorita@hunter.cuny.edu Department of Economics Hunter College, The City University of New York a c 2010 by Hiroshi Morita.

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

Extensions of One-Way ANOVA.

Extensions of One-Way ANOVA. Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa18.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany PAD 705 Handout: Suggested Review Problems from Pindyck & Rubinfeld Original prepared by Professor Suzanne Cooper John F. Kennedy School of Government, Harvard

More information

Inference with Heteroskedasticity

Inference with Heteroskedasticity Inference with Heteroskedasticity Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables.

More information

MGEC11H3Y L01 Introduction to Regression Analysis Term Test Friday July 5, PM Instructor: Victor Yu

MGEC11H3Y L01 Introduction to Regression Analysis Term Test Friday July 5, PM Instructor: Victor Yu Last Name (Print): Solution First Name (Print): Student Number: MGECHY L Introduction to Regression Analysis Term Test Friday July, PM Instructor: Victor Yu Aids allowed: Time allowed: Calculator and one

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Table 1: Fish Biomass data set on 26 streams

Table 1: Fish Biomass data set on 26 streams Math 221: Multiple Regression S. K. Hyde Chapter 27 (Moore, 5th Ed.) The following data set contains observations on the fish biomass of 26 streams. The potential regressors from which we wish to explain

More information