Lecture 8. Using the CLR Model. Relation between patent applications and R&D spending. Variables


Variables:
PATENTS = no. of patents filed (in 000)
RDEP = expenditure on research & development (in billions of 99 $)
The data are time series (34 observations).
First step in the analysis: descriptive statistics and graphs.
- Sample mean, standard deviation, etc. of the variables
- With time-series data, a time-series plot (mind the scale!). Note the units are chosen to make the ranges of the variables comparable.
- Scatterplot of PATENTS against RDEP

[Table: descriptive statistics (mean, median, maximum, minimum, std. dev., skewness, kurtosis, Jarque-Bera and its probability) for PATENTS and RDEP; 34 observations each]

[Figure: time-series plot of PATENTS and RDEP]

[Figure: scatterplot of PATENTS vs. RDEP]

Second step: estimate the relation

PATENTS_t = α + β RDEP_t + u_t

(Note that I use the subscript t to reflect the time-series nature of the data.)

We assume that this inexact linear relation satisfies the assumptions of the Classical Linear Regression (CLR) model:

Assumption 1: u_t, t = 1, …, n, are random variables with E(u_t) = 0.
Assumption 2: X_t (here RDEP_t), t = 1, …, n, are deterministic, i.e. non-random, constants.
Assumption 3 (Homoskedasticity): All u_t's have the same variance, i.e. for t = 1, …, n, Var(u_t) = E(u_t²) = σ².
Assumption 4 (No serial correlation): The random errors u_t and u_s are uncorrelated for all t ≠ s = 1, …, n.

If these assumptions hold, then the best estimator of α, β is the OLS estimator. The output of a computer program that computes the OLS estimates and other quantities is reproduced next. The formulas that the computer uses are

OLS estimates:
β̂ = Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ (Xᵢ − X̄)²
α̂ = Ȳ − β̂ X̄
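The OLS formulas above can be sketched in a few lines of pure Python (my own illustration, not the lecture's code; the toy data values are invented):

```python
# Minimal sketch: OLS slope and intercept for one regressor,
# following beta^ = sum((x-xbar)(y-ybar)) / sum((x-xbar)^2), alpha^ = ybar - beta^*xbar.
def ols(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)                      # sum of squares of x
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) # sum of cross-products
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return alpha, beta

# toy data on an exact line y = 1 + 2x, so OLS recovers alpha = 1, beta = 2
alpha, beta = ols([0, 1, 2, 3], [1, 3, 5, 7])
```

With real data the fitted line would of course not be exact; the residuals pick up the difference.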

Standard errors of the estimates and standard error of the regression:

Std(α̂) = s √( Σᵢ Xᵢ² / ( n Σᵢ (Xᵢ − X̄)² ) )
Std(β̂) = s / √( Σᵢ (Xᵢ − X̄)² )
s² = (1/(n−2)) Σᵢ eᵢ²

t-statistics:
T_α = (α̂ − α) / Std(α̂)
T_β = (β̂ − β) / Std(β̂)

R²:
R² = Σᵢ (Ŷᵢ − Ȳ)² / Σᵢ (Yᵢ − Ȳ)²
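These formulas for s², the standard errors and R² can likewise be sketched in pure Python (my own code; the small data set is invented for illustration):

```python
import math

# Sketch: s^2 with df = n-2, Std(alpha^), Std(beta^) and R^2 for one regressor.
def ols_stats(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha = ybar - beta * xbar
    e = [yi - alpha - beta * xi for xi, yi in zip(x, y)]   # OLS residuals
    s2 = sum(ei ** 2 for ei in e) / (n - 2)                # estimator of sigma^2
    se_beta = math.sqrt(s2 / sxx)
    se_alpha = math.sqrt(s2 * sum(xi ** 2 for xi in x) / (n * sxx))
    yhat = [alpha + beta * xi for xi in x]
    r2 = sum((yh - ybar) ** 2 for yh in yhat) / sum((yi - ybar) ** 2 for yi in y)
    return s2, se_alpha, se_beta, r2

s2, se_alpha, se_beta, r2 = ols_stats([0, 1, 2, 3], [1, 3, 5, 8])
```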

[EViews output: OLS regression of PATENTS on a constant (C) and RDEP, 34 observations; reports coefficients, standard errors, t-statistics, p-values, R², adjusted R², S.E. of regression, sum of squared residuals, log likelihood, F-statistic and Durbin-Watson statistic]

Computer output

Estimates: Interpretation of the coefficient of RDEP: if RDEP changes by 1 unit, i.e. 1 billion of 99 $, then PATENTS changes by .799 (thousand), i.e. by about 799 patents.

Estimates and standard errors: We use the standard error to find a 95% confidence interval for β. The formula for the bounds is β̂ ± c·Std(β̂), with c the value such that Pr(|T| > c) = .05, where T has a t distribution with 34 − 2 = 32 degrees of freedom (df). From the table in the front cover, c ≈ 2.038. Hence the 95% interval is [.6764, .9075]. Note that if we use c = 2 (rule of thumb) or c = 1.96 (standard normal), the result is close.

Test: We test H0: β = 0 against H1: β ≠ 0 (two-sided test). If H0: β = 0 is true, then T_β = β̂/Std(β̂) has a t distribution with 32 df. We reject if |T_β| > c, with c such that Pr(|T| > c) = .05 for the t distribution with 32 df; hence c ≈ 2.038. If H1: β > 0, then c is such that Pr(T > c) = .05 for the t distribution with 32 df, or c ≈ 1.69. For both H1's we reject H0.

An alternative way to report the result of the test is the p-value:
p-value = Pr(|T_β| > |t_β|)
where T_β has the distribution that holds if H0 is true, i.e. the t distribution with 32 df. If the p-value is less than .05, we reject at the 5% level. Note: F-statistic = (t-statistic of β)².

Fit of the linear relation: The R² seems high, but as we shall see, that does not imply that all is well.

Reporting regression results

Always report:
- Estimates of the regression coefficients
- Their standard errors (preferred to t-ratios. Why?)
- An estimate of σ or σ², i.e. s or s²
- R² (least useful, but most users want to know)

If the number of variables is small, there are two options for reporting the results. As an equation:

PATENTS = … + … RDEP
          (6.36)  (.057)
s = .7   R² = .86

Or in a table of OLS estimates (standard errors); regression of no. of patents (in 000) on R&D expenditure (in billion 99 $):

Constant                    …     (6.37)
RDEP                        .79   (.057)
Std. error of regression    .7
R²                          .86

How well does the model fit?

With time series we can plot the observed Y_t and the fitted/predicted Ŷ_t = α̂ + β̂ X_t, and also e_t = Y_t − Ŷ_t (see graph). The fit seems to be good, but there is a clear pattern in the OLS residuals. In the scatterplot I plot e_t against e_{t−1}, i.e. I investigate whether subsequent OLS residuals are correlated. Note the clear relation, which indicates that Assumption 4 (no serial correlation) may not be correct.

To check Assumption 3, I plot e_t against RDEP_t. There is evidence of heteroskedasticity, i.e. the variance of u_t (estimated by e_t²) is related to RDEP_t and not the same for all t.

To check the normal distribution of u_t, I plot the distribution of e_t.
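The scatterplot check of Assumption 4 amounts to computing the correlation between e_t and e_{t−1}. A pure-Python sketch (my own code; the residual series is invented — a smooth drift, as the patents residuals appear to show):

```python
# Sketch: lag-1 correlation of a residual series, i.e. corr(e_t, e_{t-1}).
def lag1_corr(e):
    a, b = e[1:], e[:-1]          # e_t paired with e_{t-1}
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# a slowly drifting residual series shows strong positive serial correlation
r = lag1_corr([1.0, 1.2, 1.1, 0.4, -0.3, -0.9, -1.1, -0.6, 0.2, 0.9])
```

A value of r well away from 0 is informal evidence against Assumption 4, just as the scatterplot is.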

[Figure: residual, actual and fitted values]

[Figure: scatterplot of RESID vs. RESIDL (lagged residual)]

[Figure: scatterplot of RESID vs. RDEP]

[Figure: kernel density estimate (Epanechnikov, h = 9.4) of RESID]

Forecasting

If RDEP_1994 = 70 (not in sample), then we predict PATENTSF_1994 = 69. The variance of the prediction error is

s²_{n+1} = s² ( 1 + 1/n + (X_{n+1} − X̄)² / Σᵢ (Xᵢ − X̄)² )

The square root (the standard deviation of the prediction error) is .89 (compare with s = .7). A 95% prediction interval, using c = 2.038, is

Ŷ_{n+1} − 2.038 s_{n+1} < Y_{n+1} < Ŷ_{n+1} + 2.038 s_{n+1}

i.e. … < PATENTS_1994 < …
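The prediction-error variance formula above can be sketched as follows (my own code; the x series, s and the out-of-sample point are invented, not the lecture's data):

```python
import math

# Sketch: standard deviation of the prediction error,
# s_{n+1} = s * sqrt(1 + 1/n + (x_new - xbar)^2 / sum((x - xbar)^2)).
def pred_std(x, s, x_new):
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    var = s ** 2 * (1 + 1 / n + (x_new - xbar) ** 2 / sxx)
    return math.sqrt(var)

sp = pred_std([1, 2, 3, 4, 5], s=0.7, x_new=8)   # extrapolation far from xbar
```

Note that sp always exceeds s, and grows the further x_new lies from the sample mean — exactly the pattern seen when forecasting outside the sample.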

Now a rather curious relation. Time series:
Births = no. of births (in 0000), West Germany
Storks = no. of stork couples in Baden-Württemberg

[Figure: time-series plot of BIRTHS and STORKS]

[Figure: scatterplot of BIRTHS vs. STORKS]

[EViews output: OLS regression of BIRTHS on a constant (C) and STORKS; coefficients, standard errors, t-statistics, p-values and summary statistics, including a very low Durbin-Watson statistic]

[Figure: residual, actual and fitted values]

[Figure: scatterplot of RESID vs. RESIDL (lagged residual)]

[EViews output: OLS regression of DBIRTHS on a constant (C) and DSTORKS (first differences), sample adjusted after differencing; coefficients, standard errors, t-statistics, p-values and summary statistics]

[Figure: scatterplot of RESID vs. RESIDDIFL (lagged differenced residual)]

Lecture 9. The multiple Classical Linear Regression model

Often we want to estimate the relation between a dependent variable Y and, say, K independent variables X_1, …, X_K. Even with K explanatory variables, the linear relation is not exact:

Y = β_1 X_1 + β_2 X_2 + … + β_K X_K + u

with u the random error term that captures the effect of the omitted variables. Note: if the relation has an intercept, then X_1 = 1 and β_1 is the intercept of the relation. If we have n observations Y_i, X_i1, …, X_iK, i = 1, …, n, they satisfy

Y_i = β_1 X_i1 + β_2 X_i2 + … + β_K X_iK + u_i   for i = 1, …, n.

Why do we want to include more than 1 explanatory variable?
1. Because we are interested in the relation between all K variables and Y.
2. Because we want to estimate the direct effect of one variable and include the other variables for that purpose.

To see 2., consider the relation (K = 3)

Y = β_1 + β_2 X_2 + β_3 X_3 + u

Even if we are only interested in the effect of X_2 on Y, we cannot omit X_3 if X_2 and X_3 are related. For instance, if

X_3 = γ_1 + γ_2 X_2 + v

then after substitution we find

Y = (β_1 + β_3 γ_1) + (β_2 + β_3 γ_2) X_2 + u + β_3 v

In the relation between Y and X_2, the coefficient of X_2 is the sum of the direct effect β_2 and the indirect effect β_3 γ_2 of X_2 on Y. If we include X_3 in the relation, the coefficient of X_2 is β_2, i.e. the direct effect of X_2 on Y.

The assumptions are the same as before:

Assumption 1: u_i, i = 1, …, n, are random variables with E(u_i) = 0.
Assumption 2: X_ik, i = 1, …, n, k = 1, …, K, are deterministic, i.e. non-random, constants.
Assumption 3 (Homoskedasticity): All u_i's have the same variance, i.e. for i = 1, …, n, Var(u_i) = E(u_i²) = σ².
Assumption 4 (No serial correlation): The random errors u_i and u_j are uncorrelated for all i ≠ j = 1, …, n: Cov(u_i, u_j) = E(u_i u_j) = 0.

The inexact linear relation, for i = 1, …, n,

Y_i = β_1 X_i1 + β_2 X_i2 + … + β_K X_iK + u_i

with assumptions 1-4 is the multiple Classical Linear Regression (CLR) model (remember: with an intercept, X_i1 = 1, i = 1, …, n). As in the simple CLR model, the estimators of the K regression parameters are found by minimizing the sum of squared residuals

S(β_1, …, β_K) = Σᵢ (Y_i − β_1 X_i1 − β_2 X_i2 − … − β_K X_iK)²

The β̂_1, …, β̂_K that minimize the sum of squared residuals are the Ordinary Least Squares (OLS) estimators of β_1, …, β_K. The OLS residuals are

e_i = Y_i − β̂_1 X_i1 − … − β̂_K X_iK

The (unbiased) estimator of σ² is

s² = (1/(n − K)) Σᵢ eᵢ²

Note n − K = no. of observations − no. of regression coefficients (including the intercept).

The OLS residuals have the same properties as in the regression model with 1 regressor: for k = 1, …, K,

(1)  Σᵢ X_ik e_i = 0

In words: the sample covariance of the OLS residuals and all regressors is 0. For k = 1 we have X_i1 = 1 and hence Σᵢ e_i = 0. Note: this holds if the regression model has an intercept.

Goodness of fit

Define the fitted value as before: Ŷ_i = β̂_1 X_i1 + … + β̂_K X_iK. By definition Y_i = Ŷ_i + e_i. Because of (1), the sample covariance of the fitted values and the OLS residuals is 0. If the model has an intercept, then

Σᵢ (Y_i − Ȳ)²  =  Σᵢ (Ŷ_i − Ȳ)²  +  Σᵢ eᵢ²
Total             Explained         Unexplained
variation         variation         variation
Total Sum of      Regression Sum    Error Sum of
Squares (TSS)     of Squares (RSS)  Squares (ESS)

The R² or Coefficient of Determination is defined as

R² = RSS/TSS = 1 − ESS/TSS

The R² increases if a regressor is added to the model. Why? Hint: consider the sum of squared residuals.

The adjusted (for degrees of freedom) R² is

R̄² = 1 − (ESS/(n − K)) / (TSS/(n − 1))

It decreases if the added variable does not explain much, i.e. if ESS does not decrease much. If ESS_K and ESS_{K+1} are the ESS for the model with K and K+1 regressors respectively, then ESS_{K+1} < ESS_K, and R̄² increases if X_{K+1} is added, but decreases if

(ESS_K − ESS_{K+1}) / ESS_K  <  1/(n − K)

i.e. if the relative decrease is not big enough. For instance, for n = 100, K = 10, the relative decrease has to be at least 1.1% to lead to an improvement in R̄². This cutoff value looks arbitrary (and it is!). In Section 4.3 there are many more criteria that balance the decrease in ESS and degrees of freedom: forget about them.

Application: Demand for bus travel (Section 4.6 of Ramanathan)
Variables:
BUSTRAVL = demand for bus travel (000 of passenger hours)
FARE = bus fare in $
GASPRICE = price of a gallon of gasoline ($)
INCOME = average per capita income in $
DENSITY = population density (persons/sq. mile)
LANDAREA = area of city (sq. miles)
Data for 40 US cities in 1988.

Question: does demand for bus travel decrease if average income increases? Important for examining the effect of economic growth on the bus system.

Reported are: descriptive statistics; scatterplot of BUSTRAVL and INCOME; OLS estimates in the simple linear regression model; OLS estimates in the multiple linear regression model. Note that in the simple model INCOME has a positive effect (be it that the coefficient is not significantly different from 0).
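The adjusted-R² cutoff above can be checked numerically. A sketch (my own code; the TSS and ESS values are invented, only n = 100, K = 10 follow the example):

```python
# Sketch: adjusted R^2 improves only if the relative decrease in ESS
# exceeds 1/(n - K), per the cutoff discussed above.
def rbar2(ess, tss, n, k):
    return 1 - (ess / (n - k)) / (tss / (n - 1))

n, k, tss, ess_k = 100, 10, 1000.0, 500.0
cutoff = 1 / (n - k)                       # = 1/90, about 1.1%
just_under = ess_k * (1 - 0.9 * cutoff)    # decrease smaller than the cutoff
just_over = ess_k * (1 - 1.1 * cutoff)     # decrease larger than the cutoff
worse = rbar2(just_under, tss, n, k + 1) < rbar2(ess_k, tss, n, k)
better = rbar2(just_over, tss, n, k + 1) > rbar2(ess_k, tss, n, k)
```

Both comparisons come out as the cutoff predicts: a decrease just under 1/(n − K) lowers R̄², a decrease just over it raises R̄².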

If we include the other variables, the effect is negative: $1 more average income per head gives about 95 fewer passenger hours. Note that the fit improves dramatically, from R² = .05 to R² = .9.

[Tables: descriptive statistics (mean, median, maximum, minimum, std. dev., skewness, kurtosis, Jarque-Bera and its probability) for BUSTRAVL, DENSITY, FARE, GASPRICE, INCOME, LANDAREA, and for POP; 40 observations each]

[Figure: scatterplot of BUSTRAVL vs. INCOME]

[EViews output: OLS regression of BUSTRAVL on a constant (C) and INCOME, 40 observations; coefficients, standard errors, t-statistics, p-values and summary statistics]

[EViews output: OLS regression of BUSTRAVL on a constant (C), INCOME, FARE, GASPRICE, POP, DENSITY and LANDAREA, 40 observations; coefficients, standard errors, t-statistics, p-values and summary statistics]

The sampling distribution of β̂_1, …, β̂_K

Under assumptions 1-4, the sampling distribution of β̂_1, …, β̂_K has mean β_1, …, β_K and a sampling variance that is proportional to σ². The square roots of the estimated (estimate σ² by s²) variances of the individual regression coefficients are called the standard errors of the regression coefficients. Regression programs always report both β̂_k and std(β̂_k), the standard error of β̂_k.

The OLS estimator is also the Best Linear Unbiased Estimator (BLUE) in the multiple CLR model. If we make the additional assumption

Assumption 5: The random error terms u_i, i = 1, …, n, are random variables with a normal distribution,

we can derive the sampling distribution of β̂_1, …, β̂_K and s². The sampling distribution of β̂_1, …, β̂_K is multivariate normal with mean β_1, …, β_K. It can be shown that for all k = 1, …, K,

T_k = (β̂_k − β_k) / std(β̂_k)

has a t-distribution with n − K degrees of freedom. As in the simple CLR model, we can use this to find a confidence interval for β_k. This is done in the same way.

Hypothesis testing in the multiple CLR model

We consider two cases: 1. hypotheses involving 1 coefficient; 2. hypotheses involving 2 or more coefficients.

1. Hypotheses involving 1 coefficient

For regression coefficient β_k the hypothesis is

H0: β_k = β_k0

with alternative hypothesis

H1: β_k ≠ β_k0   (two-sided alternative)
or
H1: β_k > β_k0   (one-sided alternative)

In these hypotheses β_k0 is a hypothesized value, e.g. β_k0 = 0 if the null hypothesis is no effect of X_k on Y. The test is based on the statistic

T_k = (β̂_k − β_k0) / std(β̂_k)

If H0 is not true, then T_k is likely either large negative or large positive. For the two-sided alternative, H1: β_k ≠ β_k0, we reject if |T_k| > c, and for the one-sided alternative, H1: β_k > β_k0, if T_k > d. For a test at the 5% significance level, the cutoff values c, d are chosen such that

Pr(|T_k| > c) = .05   (two-sided)
Pr(T_k > d) = .05    (one-sided)

where T_k has a t-distribution with n − K d.f.

In the example, n = 40, K = 7, c ≈ 2.035, d ≈ 1.695. Note that the coefficients of INCOME and POP are significantly different from 0 at the 5% level. The coefficient of DENSITY is significantly different from 0 at the 10% level, or if we test with a one-sided alternative.

2. Hypotheses involving 2 or more coefficients

As an example, consider the question whether the model with only INCOME as regressor is adequate. In the more general model with INCOME, FARE, GASPRICE, POP, DENSITY, LANDAREA this corresponds to the hypothesis that the coefficients of FARE, GASPRICE, POP, DENSITY, LANDAREA, i.e. β_3, β_4, β_5, β_6, β_7, are all 0:

H0: β_3 = 0, β_4 = 0, β_5 = 0, β_6 = 0, β_7 = 0

with alternative hypothesis

H1: at least one of these coefficients is not 0

How do we test this? Idea: if H0 is true, then the model with only INCOME should fit as well as the model with all regressors. The measure of fit is the sum of squared OLS residuals Σᵢ eᵢ². Denote the sum of squared residuals if H0 is true by ESS0; this is the ESS if only INCOME is included. The ESS if H1 is true is denoted by ESS1. Note: ESS0 ≥ ESS1. Why?

We consider

F = ((ESS0 − ESS1)/5) / (ESS1/(n − K))

Numerator: the difference of the restricted ESS (ESS0) and the unrestricted ESS (ESS1), divided by the number of restrictions. Denominator: the estimator of σ² in the model with all variables. We reject if F is large, i.e. F > c. If H0 is true, then F has an F-distribution with degrees of freedom 5 (numerator) and 33 (denominator).

From the table of the F-distribution we find that c ≈ 2.49 if Pr(F > c) = .05. In the example:

ESS0 = …, ESS1 = …, F = …

Conclusion: we reject H0 at the 5% level (and at the 1% level; check this), and the model with only INCOME is not adequate.

In general, we consider

F = ((ESS0 − ESS1)/m) / (ESS1/(n − K))

with m the number of restrictions, and we reject H0 if F is (too) large. This is the F test or Wald test (the Wald test is really the F test if n is large).
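The general F statistic above is easy to sketch (my own code; the ESS values are invented for illustration, only m = 5, n = 40, K = 7 follow the example):

```python
# Sketch: F = ((restricted ESS - unrestricted ESS)/m) / (unrestricted ESS/(n-K)).
def f_stat(ess0, ess1, m, n, k):
    return ((ess0 - ess1) / m) / (ess1 / (n - k))

F = f_stat(ess0=900.0, ess1=600.0, m=5, n=40, k=7)
```

With these invented ESS values, F = (300/5)/(600/33) = 3.3, which would exceed the 5% critical value c ≈ 2.49 and lead to rejection of H0.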

Special case: test of overall significance

H0: β_2 = … = β_K = 0

i.e. all regression coefficients except the intercept are 0. Hence m = K − 1. In the example, the F statistic for this hypothesis is 64.4 with, under H0, m = 6 and n − K = 33 d.f. Hence we reject H0 and conclude that the regressors are needed to explain BUSTRAVL.

Lecture 10. Multiple CLR model: Specification errors and Multicollinearity

Testing linear restrictions

In the multiple CLR model Y = β_1 X_1 + … + β_K X_K + u we may want to test other types of hypotheses involving more than 1 regression coefficient.

Example:
Y = log C with C = aggregate consumption in a year
X_2 = log I with I = aggregate income in a year
X_3 = log Pop with Pop = population size in a year

Consider the relations

(1) log C = β_1 + β_2 log I + β_3 log Pop + u

and

(2) log(C/Pop) = β_1 + β_2 log(I/Pop) + u

In the second equation the relation is between consumption per capita and income per capita. Because e.g. log(I/Pop) = log I − log Pop, we rewrite the second equation as

log C − log Pop = β_1 + β_2 log I − β_2 log Pop + u

or

(2') log C = β_1 + β_2 log I + (1 − β_2) log Pop + u

Equations (1) and (2') are the same if β_3 = 1 − β_2, or equivalently β_2 + β_3 = 1. To test whether the relation should be in scaled (by population) variables, we test

H0: β_2 + β_3 = 1

against H1: β_2 + β_3 ≠ 1. We use the F-test

F = ((ESS0 − ESS1)/m) / (ESS1/(n − K))

We must compute ESS0 and ESS1. ESS1 is the sum of squared residuals for the CLR model

log C = β_1 + β_2 log I + β_3 log Pop + u

ESS0 is the sum of squared residuals for the CLR model under H0, i.e. if β_2 + β_3 = 1. To find this, we estimate the CLR model

log(C/Pop) = β_1 + β_2 log(I/Pop) + u

The ESS of this model is ESS0. What is m?

If there is 1 restriction we can also use the t-test. Consider the CLR model

(2') log C = β_1 + β_2 log I + (1 − β_2) log Pop + u

This can be written as

log C − log Pop = β_1 + β_2 (log I − log Pop) + u

If we estimate the CLR model

42 logc = γ + I Pop + u Pop γ log + γ 3 log we must test H γ against H γ 0 : 3 = and we can use the t-test. : 3 F-test and t-test will give same conclusion because if F = ESS0 ESS ESS n K and then T ˆ γ 3 γ 30 = std( ˆ γ ) 3 T = F Specification errors Consider the CLR model Y = β + β + + β K K + u In the specification of this model we can make two mistakes. Exclude a relevant variable. Include an irrelevant variable An explanatory variable β = 0 ) k is relevant if its regression coefficient β 0 (and irrelevant if k k What happens if you exclude a relevant variable? Consider the correct CLR model (3) Y β + β + + u = β3 3

with β_3 ≠ 0. However, we estimate

(4) Y = γ_1 + γ_2 X_2 + v

Note I use different symbols for the coefficients and the error term. What is the relation between the β's and u and the γ's and v? Express the relation between X_2 and X_3 as

X_3 = δ_1 + δ_2 X_2 + w

Substitution in (3) gives

Y = (β_1 + δ_1 β_3) + (β_2 + δ_2 β_3) X_2 + u + β_3 w

Comparing with (4) we see

γ_1 = β_1 + δ_1 β_3
γ_2 = β_2 + δ_2 β_3
v = u + β_3 w

Hence if δ_1 ≠ 0 the intercept is not that of the correct CLR model, and if δ_2 ≠ 0, i.e. if X_2 and X_3 are related, then the slope is not the slope of the correct CLR model. If X_2 and X_3 are related and we exclude the relevant X_3, then we estimate the sum of the direct (β_2) and indirect (through X_3) effect (δ_2 β_3) of X_2 on Y. If we want the direct effect, we estimate the wrong coefficient! This is called omitted variable bias. See the example in the CLR model for demand for bus travel: the coefficient of income was positive when included alone and negative after other (relevant) variables were included.

Because we do not estimate β_2:
- The standard error of γ̂_2 is not equal to that of β̂_2.
- The confidence interval for γ_2 is not equal to that for β_2.
- The t-test for H0: γ_2 = γ_20 is not that for H0: β_2 = β_20.

Is omitted variable bias a problem?
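The algebra above (the short regression estimates β_2 + δ_2 β_3, not β_2) can be illustrated numerically. A simulation sketch (my own code; all parameter values and the sample size are invented):

```python
import random

# Sketch: omitted-variable bias. True model Y = b1 + b2*X2 + b3*X3 + u,
# with X3 = d1 + d2*X2 + w. Regressing Y on X2 alone should give a slope
# near b2 + d2*b3 (here 2 + 1.5*3 = 6.5), not the direct effect b2 = 2.
random.seed(0)
b1, b2, b3 = 1.0, 2.0, 3.0
d1, d2 = 0.5, 1.5
n = 20000
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [d1 + d2 * x + random.gauss(0, 1) for x in x2]
y = [b1 + b2 * a + b3 * c + random.gauss(0, 1) for a, c in zip(x2, x3)]

xbar = sum(x2) / n
ybar = sum(y) / n
slope = sum((a - xbar) * (c - ybar) for a, c in zip(x2, y)) / \
        sum((a - xbar) ** 2 for a in x2)
# slope estimates the total effect b2 + d2*b3, not the direct effect b2
```

The simulated slope lands near 6.5, far from the direct effect 2.0 — the same sign-and-size reversal seen in the bus-travel income coefficient.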

It depends on the goal. Remember the relation between the selling price of a house and the area of the living room. For prediction we may not be interested in the direct effect, but if we consider an addition to the house we are.

Including an irrelevant variable

Let the correct CLR model be

Y = β_1 + β_2 X_2 + u

We estimate

(5) Y = γ_1 + γ_2 X_2 + γ_3 X_3 + v

Now the relation between the two models is

γ_1 = β_1,  γ_2 = β_2,  γ_3 = 0,  v = u

Hence the correct model is a special case. If we estimate (5), we expect that γ̂_3 is close to 0 (and that the t-test for γ_3 = 0 will not reject). Model (5) is a valid CLR model and the OLS estimator will estimate the direct effects. Standard errors and tests can be used to find confidence intervals and test hypotheses on these direct effects.

It can be shown that if we estimate (5), then the standard error of γ̂_2 is

std(γ̂_2) = std(β̂_2) / √(1 − r²_23)

with std(β̂_2) the standard error of the OLS estimator of β_2 in the correct model and r_23 the sample correlation between X_2 and X_3. Hence the standard error increases if we include an irrelevant variable: the estimates are less precise.
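The inflation factor 1/√(1 − r²_23) above can be tabulated for a few correlations (a small sketch of my own; the r values are illustrative):

```python
# Sketch: how much the standard error of gamma_2^ grows when an irrelevant
# X3 with sample correlation r23 to X2 is added to the regression.
def inflation(r23):
    return 1 / (1 - r23 ** 2) ** 0.5

factors = {r: round(inflation(r), 2) for r in (0.0, 0.5, 0.9, 0.99)}
```

With r23 = 0 the standard error is unchanged; with r23 = 0.9 it roughly doubles; with r23 = 0.99 it is multiplied by about 7 — a preview of the multicollinearity problem in the next lecture.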

46 Lecture. Multicollinearity Data, US Housing = new housing units started (thousands) Intrate = interest rate on mortgages GNP = GNP (billions of 98 $) Pop = US population (millions) Regressions with dependent variable housing and independent variables (and intercept) Intrate, GNP (model ) Intrate, Pop (model ) Intrate, Pop, GNP (model 3) Intrate (model 4) In regression with all variables the coefficients of Pop and GNP not significantly different from 0 F-test of hypothesis that these coefficients are both 0 F = ESS4 ESS ESS = = Hence we reject this hypothesis Also regression coefficients of Pop and GNP change from models, to model 3. What is going on? Consider relations among the explanatory variables Scatterplot GNP and Pop Scatterplot Intrate and GNP Relation between GNP and Pop almost exact Sample correlation coefficients r( GNP, POP) =.99

[EViews output: OLS regression of HOUSING on a constant (C), INTRATE and POP (model 2); coefficients, standard errors, t-statistics, p-values and summary statistics]

[EViews output: OLS regression of HOUSING on a constant (C), INTRATE and GNP (model 1); coefficients, standard errors, t-statistics, p-values and summary statistics]

[EViews output: OLS regression of HOUSING on a constant (C), INTRATE, POP and GNP (model 3); coefficients, standard errors, t-statistics, p-values and summary statistics]

[EViews output: OLS regression of HOUSING on a constant (C) and INTRATE (model 4); coefficients, standard errors, t-statistics, p-values and summary statistics]

[Figure: scatterplot of POP vs. GNP]

[Figure: scatterplot of INTRATE vs. GNP]

r(GNP, Intrate) = .88,  r(Intrate, POP) = .9

Consider the CLR model

Y_i = β_1 + β_2 X_i2 + β_3 X_i3 + u_i

Assume that there is an exact relation between X_2 and X_3:

X_i3 = γ_1 + γ_2 X_i2

Substituting this relation in the CLR model:

Y_i = β_1 + β_2 X_i2 + β_3 (γ_1 + γ_2 X_i2) + u_i = (β_1 + β_3 γ_1) + (β_2 + β_3 γ_2) X_i2 + u_i

We basically have a model with 1 explanatory variable,

Y_i = δ_1 + δ_2 X_i2 + u_i

and all we can estimate is the intercept δ_1 and the slope δ_2. If we have estimates for these coefficients, can we get the original ones β_1, β_2, β_3? The relationship between the coefficients in the two models is

δ_1 = β_1 + β_3 γ_1
δ_2 = β_2 + β_3 γ_2

Now the coefficients γ_1, γ_2 in the exact relation between X_2 and X_3 can be easily computed. However, from the estimates of δ_1, δ_2 we cannot solve for β_1, β_2, β_3: 2 equations for 3 unknowns. The problem is that we cannot distinguish between the effects of two variables that have an exact linear relation. This is an example of an identification problem: with the information that we have, we cannot estimate all regression coefficients.

With an exact linear relation, the computer is not able to compute the OLS estimates of the regression coefficients. What happens if the relation between the explanatory variables is almost exact (as in our example)? Formulas for the sampling variances:

Var(β̂_2) = σ² / ( S_22 (1 − r²_23) )
Var(β̂_3) = σ² / ( S_33 (1 − r²_23) )

with S_22, S_33 the sampling variances of X_2, X_3 and r_23 the sample correlation coefficient of these variables. If the sample correlation is close to 1 or −1 (exact relation), then the sampling variances are large.

If there is an exact relation between some explanatory variables, we have exact multicollinearity. This is rare. In the example we have multicollinearity, i.e. the relation is not exact, but strong enough that
- individual coefficients are not individually significantly different from 0
- jointly they are significantly different from 0

This is also how you discover whether this problem exists. As with exact multicollinearity, we cannot distinguish between the effects of X_2 and X_3.

Important: unless there is exact multicollinearity, the OLS estimators have the usual properties. In particular, they remain the best estimators. Multicollinearity is a problem with the data, not with the model.

Some signs of this problem:
- Low individual t-stats, but a high F-stat for omission of these explanatory variables
- High sample correlations between these variables
- Coefficients change if one of the variables is dropped

The solution depends on the goal:
- If there is multicollinearity among variables that were only included to avoid omitted variable bias in the estimate of the effect of some other variable, then drop one of them.
- If you are not interested in the individual coefficients, as in forecasting, ignore the problem.
- If you are interested in the effects of the variables that are closely related, then get more data, or, if that is not possible, admit defeat.

Severe multicollinearity as in the example is rare. Fear of multicollinearity could lead to the omission of explanatory variables. That is much worse. If inclusion leads to multicollinearity, you can always drop the variable again.

56 Lecture. Functional form Multiple linear regression model Y = β + β + + β K K + u Interpretation of regression coefficient β k : Change in Y if is changed by unit and the other variables are held constant. This change k Does not depend on the value of k Does not depend on the value of other variables Examples Relation between demand for electricity Y and its price. Price effect may 3 depend on temperature. Relation between consumption Y and income. Change in consumption due to extra income may decrease with income. Linear regression model seems not able to allow for this. Linear in coefficients or linear in variables Compare the following relations between Y and (omit error term u ). Y = β + + β β 3 β3. Y = β + β Both relations are nonlinear (see graphs) Relation is linear in the regression coefficients, i.e. it can be expressed as a linear relation between Y and independent variables,, 3 Define =, 3 =. For relation this is not possible (try!). If a nonlinear relation can be expressed as a linear relation by redefining variables we can estimate that relation using OLS. Review of (natural) logarithms and the exponential function (natural) logarithm log

exponential function: e^X

See graphs. Note:
- Both functions are increasing
- The logarithm is only defined if X is positive

Some relations:

log e^X = X,  e^{log X} = X,  log X^a = a log X,  log aX = log a + log X,  e^{a+X} = e^a e^X

Derivatives:

d log X / dX = 1/X
d e^X / dX = e^X

Examples of nonlinear relations that can be expressed as a linear relation (independent variables X, Z; define the X_2, X_3, … that make the relation linear; in some cases the dependent variable changes as well; u is omitted):

Quadratic relation: Y = β_1 + β_2 X + β_3 X²
Relation with interaction term: Y = β_1 + β_2 X + β_3 XZ
Reciprocal relation: Y = β_1 + β_2 (1/X)
Linear-log relation: Y = β_1 + β_2 log X

[Figures: graphs of the relations above]

Exponential relation: Y = e^{β_1 + β_2 X}
Power relation (here the constant must be redefined as well): Y = γ_1 X^{β_2} Z^{β_3}

How do we analyze nonlinear relations?
- Graphs (only for 1 independent variable)
- Derivatives: the change in Y associated with a small change in X

Remember the derivative dY/dX in a point (X, Y) on the curve is the slope of the straight line that touches the nonlinear relation in that point, i.e. the slope of the linear relation that approximates the nonlinear relation in that point.

Example: quadratic relation Y = β_1 + β_2 X + β_3 X². Then

dY/dX = β_2 + 2 β_3 X

Hence the effect of a small change of X depends on X (and may even change sign). Another use: find the value of X that maximizes Y, e.g. if Y is income and X is age, and β_2 = 40, β_3 = −.5, income is maximal at age −β_2/(2β_3) = 40.

In economics we are often interested in elasticities: the relative change in Y associated with a small relative change in X. Note: relative changes do not depend on the unit of measurement. Small relative change in a variable X: dX/X, or the small relative percentage change

100 dX/X

Elasticity (note the 100 cancels if the elasticity is defined with percentage changes):

(dY/Y) / (dX/X) = (X/Y) (dY/dX)

In other words: multiply the derivative by X/Y. In the quadratic relation,

(X/Y) (dY/dX) = (X/Y) (β_2 + 2 β_3 X)

Note it depends on both X and Y. Remember d log X / dX = 1/X; hence for a small change in X, d log X = dX/X. With this we can compute an elasticity as

(dY/Y) / (dX/X) = d log Y / d log X

Convenient if the (in)dependent variables are logs.

Log-log relation: log Y = β_1 + β_2 log X
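The identity elasticity = (X/Y) dY/dX = d log Y / d log X can be checked numerically. A sketch (my own code; the quadratic coefficients and the evaluation point are invented):

```python
import math

# Sketch: for Y = b1 + b2*X + b3*X^2, compare the exact elasticity
# (X/Y)(dY/dX) with a finite-difference estimate of d log Y / d log X.
b1, b2, b3 = 1.0, 2.0, 0.5

def f(x):
    return b1 + b2 * x + b3 * x ** 2

x = 3.0
exact = (x / f(x)) * (b2 + 2 * b3 * x)     # (X/Y) dY/dX at X = 3
h = 1e-6                                   # small relative step in X
numeric = (math.log(f(x * (1 + h))) - math.log(f(x))) / math.log(1 + h)
```

The two numbers agree to several decimals, confirming that the log-derivative form is just the elasticity.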

In this relation β_2 is the elasticity of Y with respect to X.

Also consider the semi-log relation: log Y = β_1 + β_2 X. Here

β_2 = d log Y / dX = (dY/Y) / dX

This is the relative change in Y associated with a unit change in X. Note: the percentage change is 100 β_2.

Relation with interaction: Y = β_1 + β_2 X + β_3 XZ. If the relation has more than 1 independent variable, we look at the effect of X holding Z constant. If Z is constant,

ΔY = (β_2 + β_3 Z) ΔX

(we use Δ instead of d to indicate that some variables are held constant). Note the effect of X depends on Z (compare the electricity demand example above).

Special exponential relation: Y_t = Y_0 e^{βt}, with Y_t the value of Y in period t and Y_0 the value of Y in the initial period. Relative change in Y:

(Y_t − Y_{t−1}) / Y_{t−1} = (Y_0 e^{βt} − Y_0 e^{β(t−1)}) / (Y_0 e^{β(t−1)}) = e^β − 1 ≈ β

Hence the coefficient β in the relation log Y_t = log Y_0 + βt can be used to obtain the relative change/growth rate of Y.

Other relations: lags in Y, X. With time-series data we can define new variables by lags: Y_{t−1}, X_{t−1}, …; note that in period t the value of Y_{t−1} is the value of Y in period t−1. A linear relation with lagged variables,

Y_t = β_1 + β_2 Y_{t−1} + β_3 X_t + β_4 X_{t−1}

expresses: delayed response; adjustment over time; habits/reluctance to change.

Some caveats in estimating nonlinear relations:
- Graphs with one X are misleading
- Do not compare R² if the dependent variables are different
- For the rest, estimation and testing are as before

Consider Y = β_1 + β_2 X + β_3 X² + u, i.e. Y = β_1 + β_2 X_2 + β_3 X_3 + u with X_2 = X, X_3 = X². Then e.g. a test of nonlinearity is a test of H0: β_3 = 0.
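The approximation e^β − 1 ≈ β for the growth rate can be checked for a small β (my own sketch; the value β = .03 is illustrative):

```python
import math

# Sketch: in Y_t = Y_0 * e^(beta*t), the one-period growth rate is
# exactly e^beta - 1, which is close to beta when beta is small
# (the error is of order beta^2 / 2).
beta = 0.03
growth = math.exp(beta) - 1
```

Here the exact growth rate is about 3.05% versus the approximation 3%, so reading β directly as the growth rate is fine for small β.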

Lecture 13. Dummy variables

Types of variables:
- Continuous (income, height, weight, etc.)
- Discrete (gender, season, points scored, etc.)

Continuous variables have
- an origin, i.e. a value 0
- a unit of measurement

Often these are obvious, e.g. price in US$. In a regression, both origin and unit of measurement can be changed.

Discrete variables: three types
- Counts, e.g. number of runs scored
- Ordinal, e.g. agree/neutral/disagree
- Nominal/categorical, e.g. gender

With counts there is an obvious origin and the unit of measurement is also obvious. Continuous variables and counts together are called quantitative variables. With ordinal variables there is no origin and no unit of measurement, but there is an order. With nominal variables there is no unit of measurement, no origin, and not even an order.

Ordinal and nominal variables are called qualitative variables.

Discrete variables can be
    The dependent variable
    An independent variable

If the dependent variable is discrete there are various problems, e.g. in

    Y = α + β X + u

the random error u cannot be a continuous variable and hence cannot have a normal distribution.

In this lecture we consider qualitative variables as independent variables in linear regression models. To use a qualitative variable as an independent variable in a linear regression

    Y = α + β X + u

we must first attach numerical values to the categories. For this, dummy/indicator variables are very useful. A dummy/indicator variable D is a variable that has two values: 0 and 1.
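Attaching 0/1 labels to a nominal variable is a one-line operation in practice. A small sketch with invented observations (the 0/1 values are labels, not measurements):

```python
# Hypothetical sketch: encoding a nominal variable as a dummy/indicator.
# The observations are invented; 0 and 1 are arbitrary labels.
gender = ["female", "male", "male", "female", "male"]

D = [1 if g == "male" else 0 for g in gender]  # 1 = male, 0 = female (reference)
print(D)  # [0, 1, 1, 0, 1]
```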

Consider gender with categories female and male. We could choose

    (1) D_i = 0 if i is female, D_i = 1 if i is male

or

    (2) D*_i = 0 if i is male, D*_i = 1 if i is female

Because the labels are arbitrary this should not make a difference. Note that 0 is not the origin and 1 is not the unit of measurement. They are just labels and we could have used 1 and 99 instead (but that is not a convenient choice). The category with label 0 is called the control or reference category (I prefer reference category).

Now consider the regression model

    Y = α + β D + u

with D as in (1) and with Y monthly salary. What is the interpretation of α, β? If the assumptions of the CLR model hold, then

    E(u | D = 0) = E(u | D = 1) = 0

and hence

    E(Y | D = 0) = α + E(u | D = 0) = α
    E(Y | D = 1) = α + β + E(u | D = 1) = α + β

E(Y | D = 0) is the average monthly salary of female employees (reference category)
E(Y | D = 1) is the average monthly salary of male employees

This suggests for the OLS estimators α̂, β̂

    α̂ = Ȳ_female  and  α̂ + β̂ = Ȳ_male

and hence

    β̂ = Ȳ_male − Ȳ_female

The intercept is the average for the reference category.

Example: Sample of 49 employees, n_male = 26, n_female = 23. Compare the sample means Ȳ_male and Ȳ_female with the regression results: α̂ = 58.70 (= Ȳ_female) and β̂ = Ȳ_male − Ȳ_female.
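A minimal numerical sketch of this result, with fabricated salaries for seven hypothetical employees: OLS on a single dummy reproduces the two group means exactly, and swapping the coding flips the slope's sign while moving the intercept to the other group mean.

```python
# Hypothetical data (invented): monthly salaries, D = 1 male, D = 0 female
Y = [1500, 1600, 1550, 1700, 1800, 1750, 1650]
D = [0, 0, 0, 1, 1, 1, 1]

def simple_ols(x, y):
    # Closed-form OLS for y = a + b*x + u
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / \
        sum((u - xbar) ** 2 for u in x)
    return ybar - b * xbar, b

alpha_hat, beta_hat = simple_ols(D, Y)

# Group means: the regression reproduces them exactly
Y_female = sum(y for y, d in zip(Y, D) if d == 0) / D.count(0)
Y_male = sum(y for y, d in zip(Y, D) if d == 1) / D.count(1)
print(round(alpha_hat, 6), round(beta_hat, 6))    # = Y_female, Y_male - Y_female

# Swapping the coding (D* = 1 - D) gives alpha* = Y_male and beta* = -beta
alpha_star, beta_star = simple_ols([1 - d for d in D], Y)
print(round(alpha_star, 6), round(beta_star, 6))
```

Here α̂ = 1550 (the female mean) and β̂ = 175 (the male-female difference); the recoded regression returns 1725 and −175, the same fit with the reference category reversed.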

Advantage of the regression approach: a direct confidence interval for/test of the salary difference between male and female employees.

If we replace D by D*, i.e. now 0 indicates male and 1 female, we have the regression model

    Y = α* + β* D* + u

and

    E(Y | D* = 0) = α*
    E(Y | D* = 1) = α* + β*

and hence

    α̂* = Ȳ_male  and  β̂* = Ȳ_female − Ȳ_male

For the OLS estimates we find α̂* = α̂ + β̂. Note that β̂* = −β̂ and the standard error is identical: tests/confidence intervals give the same conclusion.

Is the result a proof of gender discrimination? Why (not)?

Now consider two dummy variables

    D1_i = 0 if i is female, D1_i = 1 if i is male

and

    D2_i = 0 if i is nonwhite, D2_i = 1 if i is white

We consider the following models

    (1) Y = β1 + β2 D1 + β3 D2 + u
    (2) Y = β1 + β2 D1 + β3 D2 + β4 D1 D2 + u

We consider the salary difference between men and women by ethnicity. In model (1)

    E(Y | D1 = 1, D2 = 0) − E(Y | D1 = 0, D2 = 0) = β2 = E(Y | D1 = 1, D2 = 1) − E(Y | D1 = 0, D2 = 1)
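Because model (2) has one coefficient per cell (it is saturated), its coefficients are exact functions of the four group means. A sketch with invented cell means makes the two gender gaps explicit:

```python
# Hypothetical cell means (invented numbers): keys are (D1, D2) with
# D1 = 1 for men and D2 = 1 for whites
mean = {(0, 0): 1400.0,   # nonwhite women (reference cell)
        (1, 0): 1450.0,   # nonwhite men
        (0, 1): 1500.0,   # white women
        (1, 1): 1700.0}   # white men

# In the saturated model (2) the coefficients follow from the means:
b1 = mean[(0, 0)]
b2 = mean[(1, 0)] - mean[(0, 0)]        # gender gap among nonwhites
b3 = mean[(0, 1)] - mean[(0, 0)]        # race gap among women
b4 = mean[(1, 1)] - mean[(0, 1)] - b2   # extra gender gap among whites

# Gender gap among whites equals b2 + b4; model (1) would force b4 = 0
print(b2, b2 + b4)  # 50.0 200.0
```

With these invented means the gender gap is 50 among nonwhites but 200 among whites, exactly the kind of difference model (1) rules out by construction.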

Restriction: the salary difference is the same for whites and nonwhites.

In model (2)

    E(Y | D1 = 1, D2 = 0) − E(Y | D1 = 0, D2 = 0) = β2

and

    E(Y | D1 = 1, D2 = 1) − E(Y | D1 = 0, D2 = 1) = β2 + β4

Estimation results: Salary difference only for whites. Also: Race difference only for men. Model (2) has an interaction term D1 D2.

Next, we consider a qualitative variable with more than 2 categories. Examples: state of residence, level of education, income category (grouped continuous variable).

    S = 0 if no high school diploma
    S = 1 if high school diploma, but no college degree
    S = 2 if college degree

Using S in this way is a bad idea (why?). Instead we introduce two dummy variables

    S1 = 1 if high school diploma, but no college degree
    S1 = 0 otherwise

and

    S2 = 1 if college degree
    S2 = 0 otherwise

Note: the reference group has no high school diploma.

Regression model

    Y = β1 + β2 S1 + β3 S2 + u

Now
    β1 is the average of Y for the reference group (no high school diploma)
    β1 + β2 is the average of Y for the group with a high school diploma, but no college degree
    β1 + β3 is the average of Y for the group with a college degree

How do you test

    Education has no impact on income
    The return (in income) to having a college degree is 0

Give H0 and indicate which test you want to use.

Define

    S3 = 1 if no high school diploma
    S3 = 0 otherwise

and consider the regression model

    Y = β1 + β2 S1 + β3 S2 + β4 S3 + u

Why can the coefficients of this model not be estimated? This is called the dummy variable trap.

Example: Monthly salary and type of work
    Maint = maintenance work
    Crafts = works in crafts
    Clerical = clerical work
The reference category is professional. Interpret the constant and the other coefficients.

Combining quantitative and qualitative independent variables

Consider the model

    Y = β1 + β2 D + β3 X + u

with Y the log of monthly salary, D gender and X education (in years of schooling).

In the relation between Y and X the intercept is β1 for women and β1 + β2 for men (see figure).

Estimation results (what is the interpretation of the coefficient of gender?)

Note that the gender difference is not due to a difference in the level of education.

Consider two other models

    (3) Y = β1 + β2 X + β3 DX + u

In this model the intercept is the same but the slope is different for men and women (see figure). For women the slope is β2; for men the slope is β2 + β3.

    (4) Y = β1 + β2 D + β3 X + β4 DX + u

In this model both the slope and the intercept are different. The model for women is

    Y = β1 + β3 X + u

and for men

    Y = (β1 + β2) + (β3 + β4) X + u

This amounts to splitting the sample and estimating two separate regressions.

OLS estimates

Advantage of the dummy approach: tests
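The equivalence between model (4) and two separate regressions can be checked numerically. In this sketch the data are fabricated and exactly linear, so the two fitted lines, and hence the model (4) coefficients recovered from them, are easy to verify by hand:

```python
# Fabricated data: log salary (Y) against years of schooling (X), by gender
X_f, Y_f = [10, 12, 14, 16], [7.0, 7.2, 7.4, 7.6]   # women
X_m, Y_m = [10, 12, 14, 16], [7.1, 7.4, 7.7, 8.0]   # men

def simple_ols(x, y):
    # Closed-form OLS for y = a + b*x + u
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / \
        sum((u - xbar) ** 2 for u in x)
    return ybar - b * xbar, b

a_f, b_f = simple_ols(X_f, Y_f)   # women's line: beta1 and beta3
a_m, b_m = simple_ols(X_m, Y_m)   # men's line: beta1 + beta2 and beta3 + beta4

# Model (4) coefficients recovered from the two separate fits
beta1, beta2 = a_f, a_m - a_f
beta3, beta4 = b_f, b_m - b_f
print(round(beta3, 3), round(beta4, 3))  # 0.1 0.05
```

Fitting model (4) on the pooled data would return exactly these four coefficients; the advantage of the pooled dummy formulation is that β2 = 0 or β4 = 0 can be tested directly with t- or F-tests.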


More information

Chapter 13. Multiple Regression and Model Building

Chapter 13. Multiple Regression and Model Building Chapter 13 Multiple Regression and Model Building Multiple Regression Models The General Multiple Regression Model y x x x 0 1 1 2 2... k k y is the dependent variable x, x,..., x 1 2 k the model are the

More information

Romanian Economic and Business Review Vol. 3, No. 3 THE EVOLUTION OF SNP PETROM STOCK LIST - STUDY THROUGH AUTOREGRESSIVE MODELS

Romanian Economic and Business Review Vol. 3, No. 3 THE EVOLUTION OF SNP PETROM STOCK LIST - STUDY THROUGH AUTOREGRESSIVE MODELS THE EVOLUTION OF SNP PETROM STOCK LIST - STUDY THROUGH AUTOREGRESSIVE MODELS Marian Zaharia, Ioana Zaheu, and Elena Roxana Stan Abstract Stock exchange market is one of the most dynamic and unpredictable

More information

7. Integrated Processes

7. Integrated Processes 7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Economics 308: Econometrics Professor Moody

Economics 308: Econometrics Professor Moody Economics 308: Econometrics Professor Moody References on reserve: Text Moody, Basic Econometrics with Stata (BES) Pindyck and Rubinfeld, Econometric Models and Economic Forecasts (PR) Wooldridge, Jeffrey

More information

Applied Econometrics. Professor Bernard Fingleton

Applied Econometrics. Professor Bernard Fingleton Applied Econometrics Professor Bernard Fingleton Regression A quick summary of some key issues Some key issues Text book JH Stock & MW Watson Introduction to Econometrics 2nd Edition Software Gretl Gretl.sourceforge.net

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

1 A Non-technical Introduction to Regression

1 A Non-technical Introduction to Regression 1 A Non-technical Introduction to Regression Chapters 1 and Chapter 2 of the textbook are reviews of material you should know from your previous study (e.g. in your second year course). They cover, in

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Stat 500 Midterm 2 12 November 2009 page 0 of 11

Stat 500 Midterm 2 12 November 2009 page 0 of 11 Stat 500 Midterm 2 12 November 2009 page 0 of 11 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. Do not start until I tell you to. The exam is closed book, closed

More information

Answer Key: Problem Set 6

Answer Key: Problem Set 6 : Problem Set 6 1. Consider a linear model to explain monthly beer consumption: beer = + inc + price + educ + female + u 0 1 3 4 E ( u inc, price, educ, female ) = 0 ( u inc price educ female) σ inc var,,,

More information

Introduction to Econometrics Chapter 4

Introduction to Econometrics Chapter 4 Introduction to Econometrics Chapter 4 Ezequiel Uriel Jiménez University of Valencia Valencia, September 2013 4 ypothesis testing in the multiple regression 4.1 ypothesis testing: an overview 4.2 Testing

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany PAD 705 Handout: Suggested Review Problems from Pindyck & Rubinfeld Original prepared by Professor Suzanne Cooper John F. Kennedy School of Government, Harvard

More information

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is Practice Final Exam Last Name:, First Name:. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do

More information