ECO375 Tutorial 4 Wooldridge: Chapter 6 and 7

Similar documents
ECNS 561 Topics in Multiple Regression Analysis

ECO375 Tutorial 8 Instrumental Variables

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

CHAPTER 7. + ˆ δ. (1 nopc) + ˆ β1. =.157, so the new intercept is = The coefficient on nopc is.157.

Multiple Regression Analysis: Inference MULTIPLE REGRESSION ANALYSIS: INFERENCE. Sampling Distributions of OLS Estimators

ECO375 Tutorial 4 Introduction to Statistical Inference

Answer Key: Problem Set 6

Solutions to Problem Set 5 (Due November 22) Maximum number of points for Problem set 5 is: 220. Problem 7.3

Answer Key: Problem Set 5

Inference in Regression Analysis

Sociology 593 Exam 2 March 28, 2002

CHAPTER 4. > 0, where β

Sociology 593 Exam 2 Answer Key March 28, 2002

Multiple Regression Analysis: Further Issues

Problem Set # 1. Master in Business and Quantitative Methods

Chapter 9: The Regression Model with Qualitative Information: Binary Variables (Dummies)

Introductory Econometrics Exercises for tutorials (Fall 2014)

Inference in Regression Model

Quantitative Methods Final Exam (2017/1)

2. (3.5) (iii) Simply drop one of the independent variables, say leisure: GP A = β 0 + β 1 study + β 2 sleep + β 3 work + u.

Statistical Inference with Regression Analysis

Econometrics Review questions for exam

Solutions to Problem Set 4 (Due November 13) Maximum number of points for Problem set 4 is: 66. Problem C 6.1

Homework Set 2, ECO 311, Fall 2014

Universidad Carlos III de Madrid Econometría Nonlinear Regression Functions Problem Set 8

Unless provided with information to the contrary, assume for each question below that the Classical Linear Model assumptions hold.

Simple Regression Model. January 24, 2011

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

ECON 482 / WH Hong Binary or Dummy Variables 1. Qualitative Information

Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 1 SOLUTIONS Fall 2016 Instructor: Martin Farnham

Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010

11 Correlation and Regression

Ch 7: Dummy (binary, indicator) variables

Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 2 Fall 2016 Instructor: Martin Farnham

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Stat401E Fall Lab 12

where Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc.

Economics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects

ECO375 Tutorial 7 Heteroscedasticity

Psych 230. Psychological Measurement and Statistics

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12)

Exercise Sheet 4 Instrumental Variables and Two Stage Least Squares Estimation

Eco 391, J. Sandford, spring 2013 April 5, Midterm 3 4/5/2013

MGEC11H3Y L01 Introduction to Regression Analysis Term Test Friday July 5, PM Instructor: Victor Yu

Making sense of Econometrics: Basics

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

15.8 MULTIPLE REGRESSION WITH MANY EXPLANATORY VARIABLES

Regression with Qualitative Information. Part VI. Regression with Qualitative Information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

Regression #8: Loose Ends

ECON3150/4150 Spring 2016

Problem 13.5 (10 points)

In Chapter 2, we learned how to use simple regression analysis to explain a dependent

Multiple Regression Analysis

Topics in Applied Econometrics and Development - Spring 2014

In Chapter 2, we learned how to use simple regression analysis to explain a dependent

Mathematics for Economics MA course

ECO375 Tutorial 9 2SLS Applications and Endogeneity Tests

Wooldridge, Introductory Econometrics, 4th ed. Chapter 6: Multiple regression analysis: Further issues

Answers to Problem Set #4

Solutions to Exercises in Chapter 9

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58

1 Linear Regression Analysis The Mincer Wage Equation Data Econometric Model Estimation... 11

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

1. The shoe size of five randomly selected men in the class is 7, 7.5, 6, 6.5 the shoe size of 4 randomly selected women is 6, 5.

Notes 6: Multivariate regression ECO 231W - Undergraduate Econometrics

STA441: Spring Multiple Regression. More than one explanatory variable at the same time

Wednesday, October 10 Handout: One-Tailed Tests, Two-Tailed Tests, and Logarithms

An Introduction to Mplus and Path Analysis

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

Inferences About Two Proportions

Lecture 4: Multivariate Regression, Part 2

Lab 6 - Simple Regression

LECTURE 2: SIMPLE REGRESSION I

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

ECONOMETRICS I. Cheating and the violation of any of the above instructions, lead to the cancellation of the student s paper.

Lecture 4 Scatterplots, Association, and Correlation

Lecture 4 Scatterplots, Association, and Correlation

Multiple Linear Regression CIVL 7012/8012

Practice exam questions

Problem Set 10: Panel Data

ECON3150/4150 Spring 2015

Homework Set 2, ECO 311, Spring 2014

Interpreting and using heterogeneous choice & generalized ordered logit models

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Outline. 2. Logarithmic Functional Form and Units of Measurement. Functional Form. I. Functional Form: log II. Units of Measurement

Lab 10 - Binary Variables

Sociology Exam 2 Answer Key March 30, 2012

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

Inference for the Regression Coefficient

Problem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56

Black White Total Observed Expected χ 2 = (f observed f expected ) 2 f expected (83 126) 2 ( )2 126

Multiple Regression: Chapter 13. July 24, 2015

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0

An Introduction to Path Analysis

STA220H1F Term Test Oct 26, Last Name: First Name: Student #: TA s Name: or Tutorial Room:

Ec1123 Section 7 Instrumental Variables

Week 3: Simple Linear Regression

Linear Regression With Special Variables

Transcription:

ECO375 Tutorial 4 Wooldridge: Chapter 6 and 7 Matt Tudball University of Toronto St. George October 6, 2017 Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 1 / 36

ECO375 Tutorial 4 Welcome back! Today s coverage: Chapter 6, #3 (in slides) Chapter 6, #8 (in slides) Chapter 6, C10 (in slides) Chapter 7, #4 (in slides) Chapter 7, #8 (in slides) Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 2 / 36

Chapter 6, #3 Using the data in RDCHEM.dta, the following equation was obtained by OLS: rdintens = 2.613 +.0003 sales.000000007 sales 2 (.429) (.00014) (.0000000037) n = 32, R 2 =.1484 i) At what point does the marginal effect of sales on rdintens become negative? We can take the derivative of rdintens with respect to sales and set it equal to 0:.000000014 sales =.0003. Then we know that the point at which the marginal effect of sales becomes negative is sales = 21, 428.57. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 3 / 36

Chapter 6, #3 Using the data in RDCHEM.dta, the following equation was obtained by OLS: rdintens = 2.613 +.0003 sales.000000007 sales 2 (.429) (.00014) (.0000000037) n = 32, R 2 =.1484 i) At what point does the marginal effect of sales on rdintens become negative? We can take the derivative of rdintens with respect to sales and set it equal to 0:.000000014 sales =.0003. Then we know that the point at which the marginal effect of sales becomes negative is sales = 21, 428.57. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 4 / 36

Chapter 6, #3 ii) Would you keep the quadratic term in the model? Explain. Probably. The t-statistic on ˆβ sales 2 is -.000000007/.0000000037 = -1.89, which is significant against the one-sided alternative H 1 : β sales 2 < 0. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 5 / 36

Chapter 6, #3 ii) Would you keep the quadratic term in the model? Explain. Probably. The t-statistic on ˆβ sales 2 is -.000000007/.0000000037 = -1.89, which is significant against the one-sided alternative H 1 : β sales 2 < 0. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 6 / 36

Chapter 6, #3 iii) Define salesbil as sales measured in billions of dollars: salesbil = sales/1000. Rewrite the estimated equation with salesbil and salesbil 2 as the independent variables. Be sure to report the standard errors and the R 2. rdintens = 2.613 +.0003 sales.000000007 sales 2 = 2.613 +.0003 (1000 salesbil).000000007(1000 salesbil) 2 = 2.613 +.3 salesbil.007 salesbil 2 (.429) (.14) (.0037) Recall that se( ˆβ j ) = ˆσ/[SST j (1 R 2 )] 1/2 (3.58). Rescaling sales will have no effect on ˆσ or R 2 since it does not change the fit of the regression. It will, however, affect SST sales and SST sales 2. Specifically, SST salesbil = n i=1 (salesbil salesbil)2 = n i=1 (sales sales)2 /1000 2 = SST sales /1000 2. Similarly SST salesbil 2 = SST sales 2/1000 4. Therefore we need to scale the standard errors of ˆβ salesbil and ˆβ salesbil 2 up by 1000 and 1000 2 respectively. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 7 / 36

Chapter 6, #3 iii) Define salesbil as sales measured in billions of dollars: salesbil = sales/1000. Rewrite the estimated equation with salesbil and salesbil 2 as the independent variables. Be sure to report the standard errors and the R 2. rdintens = 2.613 +.0003 sales.000000007 sales 2 = 2.613 +.0003 (1000 salesbil).000000007(1000 salesbil) 2 = 2.613 +.3 salesbil.007 salesbil 2 (.429) (.14) (.0037) Recall that se( ˆβ j ) = ˆσ/[SST j (1 R 2 )] 1/2 (3.58). Rescaling sales will have no effect on ˆσ or R 2 since it does not change the fit of the regression. It will, however, affect SST sales and SST sales 2. Specifically, SST salesbil = n i=1 (salesbil salesbil)2 = n i=1 (sales sales)2 /1000 2 = SST sales /1000 2. Similarly SST salesbil 2 = SST sales 2/1000 4. Therefore we need to scale the standard errors of ˆβ salesbil and ˆβ salesbil 2 up by 1000 and 1000 2 respectively. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 8 / 36

Chapter 6, #3 iv) For the purpose of reporting the results, which equation do you prefer? The equation in part (iii) is easier to read because it contains fewer zeros to the right of the decimal. Of course the interpretation of the two equations is identical once the different scales are accounted for. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 9 / 36

Chapter 6, #3 iv) For the purpose of reporting the results, which equation do you prefer? The equation in part (iii) is easier to read because it contains fewer zeros to the right of the decimal. Of course the interpretation of the two equations is identical once the different scales are accounted for. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 10 / 36

Chapter 6, #8 Suppose we want to estimate the effects of alcohol consumption (alcohol) on college grade point average (colgpa). In addition to collecting information on grade point averages and alcohol usage, we also obtain attendance information (say, percentage of lectures attended, called attend). A standardised test score (say, SAT ) and high school GPA (hsgpa) are also available. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 11 / 36

Chapter 6, #8 i) Should we include attend along with alcohol as explanatory variables in a multiple regression model? (Think about how you would interpret β alcohol ). This is going to be a judgement call. We need to think about the causal pathways going from alcohol consumption to college GPA. It is very possible that alcohol consumption alcohol reduces lecture attendance attend which then reduces college GPA colgpa. Therefore if we control for attend in our regression we need to interpret β alcohol as the effect of alcohol consumption on college GPA other than the effects coming through lecture attendance. Our decision to include attend or not depends on whether we consider this an acceptable interpretation. If not, we may want to omit attend. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 12 / 36

Chapter 6, #8 i) Should we include attend along with alcohol as explanatory variables in a multiple regression model? (Think about how you would interpret β alcohol ). This is going to be a judgement call. We need to think about the causal pathways going from alcohol consumption to college GPA. It is very possible that alcohol consumption alcohol reduces lecture attendance attend which then reduces college GPA colgpa. Therefore if we control for attend in our regression we need to interpret β alcohol as the effect of alcohol consumption on college GPA other than the effects coming through lecture attendance. Our decision to include attend or not depends on whether we consider this an acceptable interpretation. If not, we may want to omit attend. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 13 / 36

Chapter 6, #8 ii) Should SAT and hsgpa be included as explanatory variables? Explain. These are probably okay to include as explanatory variables since they are likely to be determined before alcohol consumption, meaning that we are not controlling for a potential causal pathway. SAT and hsgpa may relate to alcohol consumption (ex. students with lower SAT scores may be less interested in academics and more interested in partying) and they are definitely related to college GPA. Therefore we want to include them both as controls. A potential causal pathway we are shutting down, however, is the long-term effects of alcohol consumption. Students who began drinking heavily in high school may have impaired their academic performance, lowering SAT and hsgpa, which then continues to impair their college GPA colgpa. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 14 / 36

Chapter 6, #8 ii) Should SAT and hsgpa be included as explanatory variables? Explain. These are probably okay to include as explanatory variables since they are likely to be determined before alcohol consumption, meaning that we are not controlling for a potential causal pathway. SAT and hsgpa may relate to alcohol consumption (ex. students with lower SAT scores may be less interested in academics and more interested in partying) and they are definitely related to college GPA. Therefore we want to include them both as controls. A potential causal pathway we are shutting down, however, is the long-term effects of alcohol consumption. Students who began drinking heavily in high school may have impaired their academic performance, lowering SAT and hsgpa, which then continues to impair their college GPA colgpa. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 15 / 36

Chapter 6, C10 Use the data in BWGHT2.dta for this exercise. i) Estimate the equation log(bwght) = β 0 + β 1 npvis + β 2 npvis 2 + u by OLS, and report the results in the usual way. Is the quadratic term significant? The results are displayed here: log(bwght) = 7.958 +.0189 npvis.000429 npvis 2 (.0273) (.00368) (.00012) n = 1764, R 2 =.0213 We can see that the t-statistic on ˆβ 2 is -.000429/.00012 = -3.575, indicating that the quadratic term is very significant. Stata also reports a p-value of 0.000 (meaning it smaller than 0.001). Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 16 / 36

Chapter 6, C10 ii) Show that, based on the equation from part (i), the number of prenatal visits that maximises log(bwght) is estimated to be about 22. How many women had at least 22 prenatal visits in the sample? Just like with Chapter 6 #3, we can take the derivative of the equation in (i) with respect to npvis and set it equal to 0. We know that this will be a maximum since the coefficient on npvis 2 is negative..0189.000858 npvis = 0 which indicates npvis = 22.02 22. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 17 / 36

Chapter 6, C10 iii) Does it make sense that birth weight is actually predicted to decline after 22 prenatal visits? While prenatal visits are a good thing for helping to prevent low birth weight, a woman s having many prenatal visits is a possible indicator of a pregnancy with difficulties. So it does make sense that the quadratic has a hump shape, provided we do not interpret the turnaround as implying that too many visits actually causes low birth weight. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 18 / 36

Chapter 6, C10 iii) Does it make sense that birth weight is actually predicted to decline after 22 prenatal visits? While prenatal visits are a good thing for helping to prevent low birth weight, a woman s having many prenatal visits is a possible indicator of a pregnancy with difficulties. So it does make sense that the quadratic has a hump shape, provided we do not interpret the turnaround as implying that too many visits actually causes low birth weight. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 19 / 36

Chapter 6, C10 iv) Add mother s age into the equation, using a quadratic functional form. Holding npvis fixed, at what mother s age is the birth weight of the child maximised? What fraction of women in the sample are older than the optimal age? mage is the variable indicating mother s age. We estimate that ˆβ mage =.0254 and ˆβ mage 2 =.000412. Similar to (i) we know that mage is maximised when.0254.000824 mage = 0, which indicates mage = 30.83 31. By tabulating our data we can see that 66.98% of women in the sample are younger than 31, so 33.02% are older than 31. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 20 / 36

Chapter 6, C10 v) Would you say that mother s age and number of prenatal visits explain a lot of the variation in log(bwght)? No. R 2 =.0256 so we are explaining only about 2.6% of the variation in log(bwght). Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 21 / 36

Chapter 6, C10 vi) Using quadratics for both npvis and mage, decide whether using the natural log or the level of bwght is better for predicting bwght. When we use bwght as a dependent variable instead of log(bwght) we obtain a R 2 = 0.0192. However, to compare this to the R 2 coming from the regression with log(bwght) as a dependent variable, we need to know how well that regression predicts bwght in levels (see section 6.4). We know that this is bwght = exp(ˆσ 2 /2)exp( log(bwght)) (6.42). From here we want to compute the square correlation between bwght and bwght (this is another way of calculating the R 2 ). I compute the correlation to be.1362 and so the square correlation is.0186. This means that the regression with bwght as the dependent variable explains a tiny bit more of the variation (.0192) than the regression with log(bwght) as a dependent variable (.0186). Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 22 / 36

Chapter 7, #4 An equation explaining chief executive officer salary is log(salary) = 4.59 +.257 log(sales) +.011 roe +.158 finance (.30) (.032) (.004) (.089) +.181 consprod.283 utility (.085) (.099) n = 209, R 2 = 0.357 The data used are in CEOSAL1.dta, where finance, consprod and utility are binary variables indicating the financial, consumer products and utilities industries. The omitted variable is transportation. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 23 / 36

Chapter 7, #4 i) Compute the approximate percentage difference in estimated salary between the utility and transportation industries, holding sales and roe fixed. Is the difference statistically significant at the 1% level? We can see from the estimated regression in the previous slide that the coefficient on utility is -.283. This means that CEO salaries in the utility industry are approximately 28.3% less on average than the CEO salaries in the transportation industry (which was omitted). The t-statistic on this coefficient is.283/.099 = 2.86, which is very statistically significant. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 24 / 36

Chapter 7, #4 ii) Use equation (7.10) to obtain the exact percentage difference in estimated salary between the utility and transportation industries and compare this with the answer obtained in part (i). Recall that the exact percentage difference in salaries is 100 [exp( ˆβ utility ) 1] (7.10). See Example 7.5 for a derivation. The exact percentage difference between the utility and transportation industries is therefore -24.7% and so this estimate is somewhat smaller in magnitude than the one in (i). Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 25 / 36

Chapter 7, #4 iii) What is the approximate percentage difference in estimated salary between the consumer products and finance industries? Write an equation that would allow you to test whether the difference is statistically significant. The proportionate difference is.181 -.158 =.023, or about 2.3%. We could write a slightly different multiple regression equation, log(salary) =β 0 + β 1 log(sales) + β 2 roe + δ 1 consprod + δ 2 utility + δ 3 trans + u Now finance is the omitted industry, so we would interpret δ 1 as the approximate percentage difference in CEO salaries between the consumer products and finance industries. We can tell if this difference is statistically significant by checking whether δ 1 is statistically significant. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 26 / 36

Chapter 7, #8 Suppose you collect data from a survey on wages, education, experience and gender. In addition, you ask for information about marijuana useage. The original is: On how many separate occasions last month did you smoke marijuana? i) Write an equation that would allow you to estimate the effects of marijuana useage on wages, while controlling for other factors. You should be able to make statements such as, Smoking marijuana five more times per month is estimated to change wage by x%. log(wage) =β 0 + β 1 useage + β 2 educ + β 3 exper + β 4 female + u Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 27 / 36

Chapter 7, #8 Suppose you collect data from a survey on wages, education, experience and gender. In addition, you ask for information about marijuana useage. The original is: On how many separate occasions last month did you smoke marijuana? i) Write an equation that would allow you to estimate the effects of marijuana useage on wages, while controlling for other factors. You should be able to make statements such as, Smoking marijuana five more times per month is estimated to change wage by x%. log(wage) =β 0 + β 1 useage + β 2 educ + β 3 exper + β 4 female + u Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 28 / 36

Chapter 7, #8 ii) Write a model that would allow you to test whether drug usage has different effects on wages for men and women. How would you test that there are no differences in the effects of drug useage for men and women? log(wage) =β 0 + β 1 useage + β 2 educ + β 3 exper + β 4 female + β 5 useage female + u Testing that there are no difference in the effects of drug useage for men and women would involve testing H 0 : β 5 = 0 against H 1 : β 5 0. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 29 / 36

Chapter 7, #8 ii) Write a model that would allow you to test whether drug usage has different effects on wages for men and women. How would you test that there are no differences in the effects of drug useage for men and women? log(wage) =β 0 + β 1 useage + β 2 educ + β 3 exper + β 4 female + β 5 useage female + u Testing that there are no difference in the effects of drug useage for men and women would involve testing H 0 : β 5 = 0 against H 1 : β 5 0. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 30 / 36

Chapter 7, #8 iii) Suppose you think it is better to measure marijuana useage by putting people into one of four categories: non-user, light user (1 to 5 times per month), moderate user (6 to 10 times per month) and heavy user (more than 10 times per month). Now, write a model that allows you to estimate the effects of marijuana useage on wage. Assuming no interaction effect between useage and sex, the model would look like, log(wage) =β 0 + β 1 light + β 2 moderate + β 3 heavy + β 4 educ+ + β 5 exper + β 6 female + u In this model, non-user is the omitted category. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 31 / 36

Chapter 7, #8 iii) Suppose you think it is better to measure marijuana useage by putting people into one of four categories: non-user, light user (1 to 5 times per month), moderate user (6 to 10 times per month) and heavy user (more than 10 times per month). Now, write a model that allows you to estimate the effects of marijuana useage on wage. Assuming no interaction effect between useage and sex, the model would look like, log(wage) =β 0 + β 1 light + β 2 moderate + β 3 heavy + β 4 educ+ + β 5 exper + β 6 female + u In this model, non-user is the omitted category. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 32 / 36

Chapter 7, #8 iv) Using the model in part (iii)m explain in detail how to test the null hypothesis that marijuana useage has no effect on wage. Be very specific and include a careful listing of degrees of freedom. The null hypothesis here is H 0 : β 1 = β 2 = β 3 = 0. Naturally this is going to be an F-test on q = 3 restrictions. We are also going to have degrees of freedom df = n 7 1 for a sample of size n, since we have 7 independent variables in the unrestricted model. So we would be obtaining a critical value from the F q,n 8 distribution. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 33 / 36

Chapter 7, #8 iv) Using the model in part (iii)m explain in detail how to test the null hypothesis that marijuana useage has no effect on wage. Be very specific and include a careful listing of degrees of freedom. The null hypothesis here is H 0 : β 1 = β 2 = β 3 = 0. Naturally this is going to be an F-test on q = 3 restrictions. We are also going to have degrees of freedom df = n 6 1 for a sample of size n, since we have 6 independent variables in the unrestricted model. So we would be obtaining a critical value from the F q,n 7 distribution. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 34 / 36

Chapter 7, #8 v) What are some of the potential problems with drawing causal inference using the survey data you collected? We can think of several here. Respondents may not accurately report their marijuana useage, perhaps out of social stigma or fear of legal repercussions. There may be omitted variables which determine both marijuana useage and wages. For example, people living in urban areas may have easier access to marijuana and may earn higher wages on average. In this example, our estimate would be downward biased. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 35 / 36

Chapter 7, #8 v) What are some of the potential problems with drawing causal inference using the survey data you collected? We can think of several here. 1 Respondents may not accurately report their marijuana useage, perhaps out of social stigma or fear of legal repercussions. 2 There may be omitted variables which determine both marijuana useage and wages. For example, people living in urban areas may have easier access to marijuana and may earn higher wages on average. In this example, our estimate would be downward biased. Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 36 / 36