Lecture 8. Using the CLR Model. Relation between patent applications and R&D spending. Variables


Variables:
PATENTS = no. of patents filed (in 000)
RDEP = expenditure on research & development (in billions of 99 $)
The data are time series (34 observations).
First step in the analysis: descriptive statistics and graphs.
- Sample mean, standard deviation, etc. of the variables
- With time-series data, a time-series plot (mind the scale!). Note the units are chosen to make the ranges of the variables comparable.
- Scatterplot of PATENTS against RDEP

[Table: descriptive statistics (mean, median, maximum, minimum, std. dev., skewness, kurtosis, Jarque-Bera and its probability) for PATENTS and RDEP; 34 observations each]

[Figure: time-series plot of PATENTS and RDEP]

[Figure: scatterplot of PATENTS vs. RDEP]

Second step: estimate the relation

PATENTS_t = α + β RDEP_t + u_t

(Note that I use the subscript t to reflect the time-series nature of the data.)

We assume that this inexact linear relation satisfies the assumptions of the Classical Linear Regression (CLR) model:

Assumption 1: u_t, t = 1, …, n, are random variables with E(u_t) = 0.
Assumption 2: X_t (here RDEP_t), t = 1, …, n, are deterministic, i.e. non-random, constants.
Assumption 3 (Homoskedasticity): All u_t's have the same variance, i.e. for t = 1, …, n, Var(u_t) = E(u_t²) = σ².
Assumption 4 (No serial correlation): The random errors u_t and u_s are uncorrelated for all t ≠ s = 1, …, n.

If these assumptions hold, then the best estimator of α, β is the OLS estimator. The output of a computer program that computes the OLS estimates and other quantities is reproduced next. The formulas that the computer uses are

OLS estimates:
β̂ = Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ (Xᵢ − X̄)²
α̂ = Ȳ − β̂ X̄
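The OLS formulas above can be sketched in a few lines of pure Python (my own illustration, not the lecture's code; the toy data values are invented):

```python
# Minimal sketch: OLS slope and intercept for one regressor,
# following beta^ = sum((x-xbar)(y-ybar)) / sum((x-xbar)^2), alpha^ = ybar - beta^*xbar.
def ols(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)                      # sum of squares of x
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) # sum of cross-products
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return alpha, beta

# toy data on an exact line y = 1 + 2x, so OLS recovers alpha = 1, beta = 2
alpha, beta = ols([0, 1, 2, 3], [1, 3, 5, 7])
```

With real data the fitted line would of course not be exact; the residuals pick up the difference.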

Standard errors of the estimates and standard error of the regression:

Std(α̂) = s √( Σᵢ Xᵢ² / ( n Σᵢ (Xᵢ − X̄)² ) )
Std(β̂) = s / √( Σᵢ (Xᵢ − X̄)² )
s² = (1/(n−2)) Σᵢ eᵢ²

t-statistics:
T_α = (α̂ − α) / Std(α̂)
T_β = (β̂ − β) / Std(β̂)

R²:
R² = Σᵢ (Ŷᵢ − Ȳ)² / Σᵢ (Yᵢ − Ȳ)²
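These formulas for s², the standard errors and R² can likewise be sketched in pure Python (my own code; the small data set is invented for illustration):

```python
import math

# Sketch: s^2 with df = n-2, Std(alpha^), Std(beta^) and R^2 for one regressor.
def ols_stats(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha = ybar - beta * xbar
    e = [yi - alpha - beta * xi for xi, yi in zip(x, y)]   # OLS residuals
    s2 = sum(ei ** 2 for ei in e) / (n - 2)                # estimator of sigma^2
    se_beta = math.sqrt(s2 / sxx)
    se_alpha = math.sqrt(s2 * sum(xi ** 2 for xi in x) / (n * sxx))
    yhat = [alpha + beta * xi for xi in x]
    r2 = sum((yh - ybar) ** 2 for yh in yhat) / sum((yi - ybar) ** 2 for yi in y)
    return s2, se_alpha, se_beta, r2

s2, se_alpha, se_beta, r2 = ols_stats([0, 1, 2, 3], [1, 3, 5, 8])
```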

[EViews output: OLS regression of PATENTS on a constant (C) and RDEP, 34 observations; reports coefficients, standard errors, t-statistics, p-values, R², adjusted R², S.E. of regression, sum of squared residuals, log likelihood, F-statistic and Durbin-Watson statistic]

Computer output

Estimates: Interpretation of the coefficient of RDEP: if RDEP changes by 1 unit, i.e. 1 billion of 99 $, then PATENTS changes by .799 (thousand), i.e. by about 799 patents.

Estimates and standard errors: We use the standard error to find a 95% confidence interval for β. The formula for the bounds is β̂ ± c·Std(β̂), with c the value such that Pr(|T| > c) = .05, where T has a t distribution with 34 − 2 = 32 degrees of freedom (df). From the table in the front cover, c ≈ 2.038. Hence the 95% interval is [.6764, .9075]. Note that if we use c = 2 (rule of thumb) or c = 1.96 (standard normal), the result is close.

Test: We test H0: β = 0 against H1: β ≠ 0 (two-sided test). If H0: β = 0 is true, then T_β = β̂/Std(β̂) has a t distribution with 32 df. We reject if |T_β| > c, with c such that Pr(|T| > c) = .05 for the t distribution with 32 df; hence c ≈ 2.038. If H1: β > 0, then c is such that Pr(T > c) = .05 for the t distribution with 32 df, or c ≈ 1.69. For both H1's we reject H0.

An alternative way to report the result of the test is the p-value:
p-value = Pr(|T_β| > |t_β|)
where T_β has the distribution that holds if H0 is true, i.e. the t distribution with 32 df. If the p-value is less than .05, we reject at the 5% level. Note: F-statistic = (t-statistic of β)².

Fit of the linear relation: The R² seems high, but as we shall see, that does not imply that all is well.

Reporting regression results

Always report:
- Estimates of the regression coefficients
- Their standard errors (preferred to t-ratios. Why?)
- An estimate of σ or σ², i.e. s or s²
- R² (least useful, but most users want to know)

If the number of variables is small, there are two options for reporting the results. As an equation:

PATENTS = … + … RDEP
          (6.36)  (.057)
s = .7   R² = .86

Or in a table of OLS estimates (standard errors); regression of no. of patents (in 000) on R&D expenditure (in billion 99 $):

Constant                    …     (6.37)
RDEP                        .79   (.057)
Std. error of regression    .7
R²                          .86

How well does the model fit?

With time series we can plot the observed Y_t and the fitted/predicted Ŷ_t = α̂ + β̂ X_t, and also e_t = Y_t − Ŷ_t (see graph). The fit seems to be good, but there is a clear pattern in the OLS residuals. In the scatterplot I plot e_t against e_{t−1}, i.e. I investigate whether subsequent OLS residuals are correlated. Note the clear relation, which indicates that Assumption 4 (no serial correlation) may not be correct.

To check Assumption 3, I plot e_t against RDEP_t. There is evidence of heteroskedasticity, i.e. the variance of u_t (estimated by e_t²) is related to RDEP_t and not the same for all t.

To check the normal distribution of u_t, I plot the distribution of e_t.
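The scatterplot check of Assumption 4 amounts to computing the correlation between e_t and e_{t−1}. A pure-Python sketch (my own code; the residual series is invented — a smooth drift, as the patents residuals appear to show):

```python
# Sketch: lag-1 correlation of a residual series, i.e. corr(e_t, e_{t-1}).
def lag1_corr(e):
    a, b = e[1:], e[:-1]          # e_t paired with e_{t-1}
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# a slowly drifting residual series shows strong positive serial correlation
r = lag1_corr([1.0, 1.2, 1.1, 0.4, -0.3, -0.9, -1.1, -0.6, 0.2, 0.9])
```

A value of r well away from 0 is informal evidence against Assumption 4, just as the scatterplot is.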

[Figure: residual, actual and fitted values]

[Figure: scatterplot of RESID vs. RESIDL (lagged residual)]

[Figure: scatterplot of RESID vs. RDEP]

[Figure: kernel density estimate (Epanechnikov, h = 9.4) of RESID]

Forecasting

If RDEP_1994 = 70 (not in sample), then we predict PATENTSF_1994 = 69. The variance of the prediction error is

s²_{n+1} = s² ( 1 + 1/n + (X_{n+1} − X̄)² / Σᵢ (Xᵢ − X̄)² )

The square root (the standard deviation of the prediction error) is .89 (compare with s = .7). A 95% prediction interval, using c = 2.038, is

Ŷ_{n+1} − 2.038 s_{n+1} < Y_{n+1} < Ŷ_{n+1} + 2.038 s_{n+1}

i.e. … < PATENTS_1994 < …
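The prediction-error variance formula above can be sketched as follows (my own code; the x series, s and the out-of-sample point are invented, not the lecture's data):

```python
import math

# Sketch: standard deviation of the prediction error,
# s_{n+1} = s * sqrt(1 + 1/n + (x_new - xbar)^2 / sum((x - xbar)^2)).
def pred_std(x, s, x_new):
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    var = s ** 2 * (1 + 1 / n + (x_new - xbar) ** 2 / sxx)
    return math.sqrt(var)

sp = pred_std([1, 2, 3, 4, 5], s=0.7, x_new=8)   # extrapolation far from xbar
```

Note that sp always exceeds s, and grows the further x_new lies from the sample mean — exactly the pattern seen when forecasting outside the sample.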

Now a rather curious relation. Time series:
Births = no. of births (in 0000), West Germany
Storks = no. of stork couples in Baden-Württemberg

[Figure: time-series plot of BIRTHS and STORKS]

[Figure: scatterplot of BIRTHS vs. STORKS]

[EViews output: OLS regression of BIRTHS on a constant (C) and STORKS; coefficients, standard errors, t-statistics, p-values and summary statistics, including a very low Durbin-Watson statistic]

[Figure: residual, actual and fitted values]

[Figure: scatterplot of RESID vs. RESIDL (lagged residual)]

[EViews output: OLS regression of DBIRTHS on a constant (C) and DSTORKS (first differences), sample adjusted after differencing; coefficients, standard errors, t-statistics, p-values and summary statistics]

[Figure: scatterplot of RESID vs. RESIDDIFL (lagged differenced residual)]

Lecture 9. The multiple Classical Linear Regression model

Often we want to estimate the relation between a dependent variable Y and, say, K independent variables X_1, …, X_K. Even with K explanatory variables, the linear relation is not exact:

Y = β_1 X_1 + β_2 X_2 + … + β_K X_K + u

with u the random error term that captures the effect of the omitted variables. Note: if the relation has an intercept, then X_1 = 1 and β_1 is the intercept of the relation. If we have n observations Y_i, X_i1, …, X_iK, i = 1, …, n, they satisfy

Y_i = β_1 X_i1 + β_2 X_i2 + … + β_K X_iK + u_i   for i = 1, …, n.

Why do we want to include more than 1 explanatory variable?
1. Because we are interested in the relation between all K variables and Y.
2. Because we want to estimate the direct effect of one variable and include the other variables for that purpose.

To see 2., consider the relation (K = 3)

Y = β_1 + β_2 X_2 + β_3 X_3 + u

Even if we are only interested in the effect of X_2 on Y, we cannot omit X_3 if X_2 and X_3 are related. For instance, if

X_3 = γ_1 + γ_2 X_2 + v

then after substitution we find

Y = (β_1 + β_3 γ_1) + (β_2 + β_3 γ_2) X_2 + u + β_3 v

In the relation between Y and X_2, the coefficient of X_2 is the sum of the direct effect β_2 and the indirect effect β_3 γ_2 of X_2 on Y. If we include X_3 in the relation, the coefficient of X_2 is β_2, i.e. the direct effect of X_2 on Y.

The assumptions are the same as before:

Assumption 1: u_i, i = 1, …, n, are random variables with E(u_i) = 0.
Assumption 2: X_ik, i = 1, …, n, k = 1, …, K, are deterministic, i.e. non-random, constants.
Assumption 3 (Homoskedasticity): All u_i's have the same variance, i.e. for i = 1, …, n, Var(u_i) = E(u_i²) = σ².
Assumption 4 (No serial correlation): The random errors u_i and u_j are uncorrelated for all i ≠ j = 1, …, n: Cov(u_i, u_j) = E(u_i u_j) = 0.

The inexact linear relation, for i = 1, …, n,

Y_i = β_1 X_i1 + β_2 X_i2 + … + β_K X_iK + u_i

with assumptions 1-4 is the multiple Classical Linear Regression (CLR) model (remember: with an intercept, X_i1 = 1, i = 1, …, n). As in the simple CLR model, the estimators of the K regression parameters are found by minimizing the sum of squared residuals

S(β_1, …, β_K) = Σᵢ (Y_i − β_1 X_i1 − β_2 X_i2 − … − β_K X_iK)²

The β̂_1, …, β̂_K that minimize the sum of squared residuals are the Ordinary Least Squares (OLS) estimators of β_1, …, β_K. The OLS residuals are

e_i = Y_i − β̂_1 X_i1 − … − β̂_K X_iK

The (unbiased) estimator of σ² is

s² = (1/(n − K)) Σᵢ eᵢ²

Note n − K = no. of observations − no. of regression coefficients (including the intercept).

The OLS residuals have the same properties as in the regression model with 1 regressor: for k = 1, …, K,

(1)  Σᵢ X_ik e_i = 0

In words: the sample covariance of the OLS residuals and all regressors is 0. For k = 1 we have X_i1 = 1 and hence Σᵢ e_i = 0. Note: this holds if the regression model has an intercept.

Goodness of fit

Define the fitted value as before: Ŷ_i = β̂_1 X_i1 + … + β̂_K X_iK. By definition Y_i = Ŷ_i + e_i. Because of (1), the sample covariance of the fitted values and the OLS residuals is 0. If the model has an intercept, then

Σᵢ (Y_i − Ȳ)²  =  Σᵢ (Ŷ_i − Ȳ)²  +  Σᵢ eᵢ²
Total             Explained         Unexplained
variation         variation         variation
Total Sum of      Regression Sum    Error Sum of
Squares (TSS)     of Squares (RSS)  Squares (ESS)

The R² or Coefficient of Determination is defined as

R² = RSS/TSS = 1 − ESS/TSS

The R² increases if a regressor is added to the model. Why? Hint: consider the sum of squared residuals.

The adjusted (for degrees of freedom) R² is

R̄² = 1 − (ESS/(n − K)) / (TSS/(n − 1))

It decreases if the added variable does not explain much, i.e. if ESS does not decrease much. If ESS_K and ESS_{K+1} are the ESS for the model with K and K+1 regressors respectively, then ESS_{K+1} < ESS_K, and R̄² increases if X_{K+1} is added, but decreases if

(ESS_K − ESS_{K+1}) / ESS_K  <  1/(n − K)

i.e. if the relative decrease is not big enough. For instance, for n = 100, K = 10, the relative decrease has to be at least 1.1% to lead to an improvement in R̄². This cutoff value looks arbitrary (and it is!). In Section 4.3 there are many more criteria that balance the decrease in ESS and degrees of freedom: forget about them.

Application: Demand for bus travel (Section 4.6 of Ramanathan)
Variables:
BUSTRAVL = demand for bus travel (000 of passenger hours)
FARE = bus fare in $
GASPRICE = price of a gallon of gasoline ($)
INCOME = average per capita income in $
DENSITY = population density (persons/sq. mile)
LANDAREA = area of city (sq. miles)
Data for 40 US cities in 1988.

Question: does demand for bus travel decrease if average income increases? Important for examining the effect of economic growth on the bus system.

Reported are: descriptive statistics; scatterplot of BUSTRAVL and INCOME; OLS estimates in the simple linear regression model; OLS estimates in the multiple linear regression model. Note that in the simple model INCOME has a positive effect (be it that the coefficient is not significantly different from 0).
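The adjusted-R² cutoff above can be checked numerically. A sketch (my own code; the TSS and ESS values are invented, only n = 100, K = 10 follow the example):

```python
# Sketch: adjusted R^2 improves only if the relative decrease in ESS
# exceeds 1/(n - K), per the cutoff discussed above.
def rbar2(ess, tss, n, k):
    return 1 - (ess / (n - k)) / (tss / (n - 1))

n, k, tss, ess_k = 100, 10, 1000.0, 500.0
cutoff = 1 / (n - k)                       # = 1/90, about 1.1%
just_under = ess_k * (1 - 0.9 * cutoff)    # decrease smaller than the cutoff
just_over = ess_k * (1 - 1.1 * cutoff)     # decrease larger than the cutoff
worse = rbar2(just_under, tss, n, k + 1) < rbar2(ess_k, tss, n, k)
better = rbar2(just_over, tss, n, k + 1) > rbar2(ess_k, tss, n, k)
```

Both comparisons come out as the cutoff predicts: a decrease just under 1/(n − K) lowers R̄², a decrease just over it raises R̄².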

If we include the other variables, the effect is negative: $1 more average income per head gives about 95 fewer passenger hours. Note that the fit improves dramatically, from R² = .05 to R² = .9.

[Tables: descriptive statistics (mean, median, maximum, minimum, std. dev., skewness, kurtosis, Jarque-Bera and its probability) for BUSTRAVL, DENSITY, FARE, GASPRICE, INCOME, LANDAREA, and for POP; 40 observations each]

[Figure: scatterplot of BUSTRAVL vs. INCOME]

[EViews output: OLS regression of BUSTRAVL on a constant (C) and INCOME, 40 observations; coefficients, standard errors, t-statistics, p-values and summary statistics]

[EViews output: OLS regression of BUSTRAVL on a constant (C), INCOME, FARE, GASPRICE, POP, DENSITY and LANDAREA, 40 observations; coefficients, standard errors, t-statistics, p-values and summary statistics]

The sampling distribution of β̂_1, …, β̂_K

Under assumptions 1-4, the sampling distribution of β̂_1, …, β̂_K has mean β_1, …, β_K and a sampling variance that is proportional to σ². The square roots of the estimated (estimate σ² by s²) variances of the individual regression coefficients are called the standard errors of the regression coefficients. Regression programs always report both β̂_k and std(β̂_k), the standard error of β̂_k.

The OLS estimator is also the Best Linear Unbiased Estimator (BLUE) in the multiple CLR model. If we make the additional assumption

Assumption 5: The random error terms u_i, i = 1, …, n, are random variables with a normal distribution,

we can derive the sampling distribution of β̂_1, …, β̂_K and s². The sampling distribution of β̂_1, …, β̂_K is multivariate normal with mean β_1, …, β_K. It can be shown that for all k = 1, …, K,

T_k = (β̂_k − β_k) / std(β̂_k)

has a t-distribution with n − K degrees of freedom. As in the simple CLR model, we can use this to find a confidence interval for β_k. This is done in the same way.

Hypothesis testing in the multiple CLR model

We consider two cases: 1. hypotheses involving 1 coefficient; 2. hypotheses involving 2 or more coefficients.

1. Hypotheses involving 1 coefficient

For regression coefficient β_k the hypothesis is

H0: β_k = β_k0

with alternative hypothesis

H1: β_k ≠ β_k0   (two-sided alternative)
or
H1: β_k > β_k0   (one-sided alternative)

In these hypotheses β_k0 is a hypothesized value, e.g. β_k0 = 0 if the null hypothesis is no effect of X_k on Y. The test is based on the statistic

T_k = (β̂_k − β_k0) / std(β̂_k)

If H0 is not true, then T_k is likely either large negative or large positive. For the two-sided alternative, H1: β_k ≠ β_k0, we reject if |T_k| > c, and for the one-sided alternative, H1: β_k > β_k0, if T_k > d. For a test at the 5% significance level, the cutoff values c, d are chosen such that

Pr(|T_k| > c) = .05   (two-sided)
Pr(T_k > d) = .05    (one-sided)

where T_k has a t-distribution with n − K d.f.

In the example, n = 40, K = 7, c ≈ 2.035, d ≈ 1.695. Note that the coefficients of INCOME and POP are significantly different from 0 at the 5% level. The coefficient of DENSITY is significantly different from 0 at the 10% level, or if we test with a one-sided alternative.

2. Hypotheses involving 2 or more coefficients

As an example, consider the question whether the model with only INCOME as regressor is adequate. In the more general model with INCOME, FARE, GASPRICE, POP, DENSITY, LANDAREA this corresponds to the hypothesis that the coefficients of FARE, GASPRICE, POP, DENSITY, LANDAREA, i.e. β_3, β_4, β_5, β_6, β_7, are all 0:

H0: β_3 = 0, β_4 = 0, β_5 = 0, β_6 = 0, β_7 = 0

with alternative hypothesis

H1: at least one of these coefficients is not 0

How do we test this? Idea: if H0 is true, then the model with only INCOME should fit as well as the model with all regressors. The measure of fit is the sum of squared OLS residuals Σᵢ eᵢ². Denote the sum of squared residuals if H0 is true by ESS0; this is the ESS if only INCOME is included. The ESS if H1 is true is denoted by ESS1. Note: ESS0 ≥ ESS1. Why?

We consider

F = ((ESS0 − ESS1)/5) / (ESS1/(n − K))

Numerator: the difference of the restricted ESS (ESS0) and the unrestricted ESS (ESS1), divided by the number of restrictions. Denominator: the estimator of σ² in the model with all variables. We reject if F is large, i.e. F > c. If H0 is true, then F has an F-distribution with degrees of freedom 5 (numerator) and 33 (denominator).

From the table of the F-distribution we find that c ≈ 2.49 if Pr(F > c) = .05. In the example:

ESS0 = …, ESS1 = …, F = …

Conclusion: we reject H0 at the 5% level (and at the 1% level; check this), and the model with only INCOME is not adequate.

In general, we consider

F = ((ESS0 − ESS1)/m) / (ESS1/(n − K))

with m the number of restrictions, and we reject H0 if F is (too) large. This is the F test or Wald test (the Wald test is really the F test if n is large).
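The general F statistic above is easy to sketch (my own code; the ESS values are invented for illustration, only m = 5, n = 40, K = 7 follow the example):

```python
# Sketch: F = ((restricted ESS - unrestricted ESS)/m) / (unrestricted ESS/(n-K)).
def f_stat(ess0, ess1, m, n, k):
    return ((ess0 - ess1) / m) / (ess1 / (n - k))

F = f_stat(ess0=900.0, ess1=600.0, m=5, n=40, k=7)
```

With these invented ESS values, F = (300/5)/(600/33) = 3.3, which would exceed the 5% critical value c ≈ 2.49 and lead to rejection of H0.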

Special case: test of overall significance

H0: β_2 = … = β_K = 0

i.e. all regression coefficients except the intercept are 0. Hence m = K − 1. In the example, the F statistic for this hypothesis is 64.4 with, under H0, m = 6 and n − K = 33 d.f. Hence we reject H0 and conclude that the regressors are needed to explain BUSTRAVL.

Lecture 10. Multiple CLR model: Specification errors and Multicollinearity

Testing linear restrictions

In the multiple CLR model Y = β_1 X_1 + … + β_K X_K + u we may want to test other types of hypotheses involving more than 1 regression coefficient.

Example:
Y = log C with C = aggregate consumption in a year
X_2 = log I with I = aggregate income in a year
X_3 = log Pop with Pop = population size in a year

Consider the relations

(1) log C = β_1 + β_2 log I + β_3 log Pop + u

and

(2) log(C/Pop) = β_1 + β_2 log(I/Pop) + u

In the second equation the relation is between consumption per capita and income per capita. Because e.g. log(I/Pop) = log I − log Pop, we rewrite the second equation as

log C − log Pop = β_1 + β_2 log I − β_2 log Pop + u

or

(2') log C = β_1 + β_2 log I + (1 − β_2) log Pop + u

Equations (1) and (2') are the same if β_3 = 1 − β_2, or equivalently β_2 + β_3 = 1. To test whether the relation should be in scaled (by population) variables, we test

H0: β_2 + β_3 = 1

against H1: β_2 + β_3 ≠ 1. We use the F-test

F = ((ESS0 − ESS1)/m) / (ESS1/(n − K))

We must compute ESS0 and ESS1. ESS1 is the sum of squared residuals for the CLR model

log C = β_1 + β_2 log I + β_3 log Pop + u

ESS0 is the sum of squared residuals for the CLR model under H0, i.e. if β_2 + β_3 = 1. To find this, we estimate the CLR model

log(C/Pop) = β_1 + β_2 log(I/Pop) + u

The ESS of this model is ESS0. What is m?

If there is 1 restriction we can also use the t-test. Consider the CLR model

(2') log C = β_1 + β_2 log I + (1 − β_2) log Pop + u

This can be written as

log C − log Pop = β_1 + β_2 (log I − log Pop) + u

If we estimate the CLR model

42 logc = γ + I Pop + u Pop γ log + γ 3 log we must test H γ against H γ 0 : 3 = and we can use the t-test. : 3 F-test and t-test will give same conclusion because if F = ESS0 ESS ESS n K and then T ˆ γ 3 γ 30 = std( ˆ γ ) 3 T = F Specification errors Consider the CLR model Y = β + β + + β K K + u In the specification of this model we can make two mistakes. Exclude a relevant variable. Include an irrelevant variable An explanatory variable β = 0 ) k is relevant if its regression coefficient β 0 (and irrelevant if k k What happens if you exclude a relevant variable? Consider the correct CLR model (3) Y β + β + + u = β3 3

with β_3 ≠ 0. However, we estimate

(4) Y = γ_1 + γ_2 X_2 + v

Note I use different symbols for the coefficients and the error term. What is the relation between the β's and u and the γ's and v? Express the relation between X_2 and X_3 as

X_3 = δ_1 + δ_2 X_2 + w

Substitution in (3) gives

Y = (β_1 + δ_1 β_3) + (β_2 + δ_2 β_3) X_2 + u + β_3 w

Comparing with (4) we see

γ_1 = β_1 + δ_1 β_3
γ_2 = β_2 + δ_2 β_3
v = u + β_3 w

Hence if δ_1 ≠ 0 the intercept is not that of the correct CLR model, and if δ_2 ≠ 0, i.e. if X_2 and X_3 are related, then the slope is not the slope of the correct CLR model. If X_2 and X_3 are related and we exclude the relevant X_3, then we estimate the sum of the direct (β_2) and indirect (through X_3) effect (δ_2 β_3) of X_2 on Y. If we want the direct effect, we estimate the wrong coefficient! This is called omitted variable bias. See the example in the CLR model for demand for bus travel: the coefficient of income was positive when included alone and negative after other (relevant) variables were included.

Because we do not estimate β_2:
- The standard error of γ̂_2 is not equal to that of β̂_2.
- The confidence interval for γ_2 is not equal to that for β_2.
- The t-test for H0: γ_2 = γ_20 is not that for H0: β_2 = β_20.

Is omitted variable bias a problem?
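The algebra above (the short regression estimates β_2 + δ_2 β_3, not β_2) can be illustrated numerically. A simulation sketch (my own code; all parameter values and the sample size are invented):

```python
import random

# Sketch: omitted-variable bias. True model Y = b1 + b2*X2 + b3*X3 + u,
# with X3 = d1 + d2*X2 + w. Regressing Y on X2 alone should give a slope
# near b2 + d2*b3 (here 2 + 1.5*3 = 6.5), not the direct effect b2 = 2.
random.seed(0)
b1, b2, b3 = 1.0, 2.0, 3.0
d1, d2 = 0.5, 1.5
n = 20000
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [d1 + d2 * x + random.gauss(0, 1) for x in x2]
y = [b1 + b2 * a + b3 * c + random.gauss(0, 1) for a, c in zip(x2, x3)]

xbar = sum(x2) / n
ybar = sum(y) / n
slope = sum((a - xbar) * (c - ybar) for a, c in zip(x2, y)) / \
        sum((a - xbar) ** 2 for a in x2)
# slope estimates the total effect b2 + d2*b3, not the direct effect b2
```

The simulated slope lands near 6.5, far from the direct effect 2.0 — the same sign-and-size reversal seen in the bus-travel income coefficient.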

It depends on the goal. Remember the relation between the selling price of a house and the area of the living room. For prediction we may not be interested in the direct effect, but if we consider an addition to the house we are.

Including an irrelevant variable

Let the correct CLR model be

Y = β_1 + β_2 X_2 + u

We estimate

(5) Y = γ_1 + γ_2 X_2 + γ_3 X_3 + v

Now the relation between the two models is

γ_1 = β_1,  γ_2 = β_2,  γ_3 = 0,  v = u

Hence the correct model is a special case. If we estimate (5), we expect that γ̂_3 is close to 0 (and that the t-test for γ_3 = 0 will not reject). Model (5) is a valid CLR model and the OLS estimator will estimate the direct effects. Standard errors and tests can be used to find confidence intervals and test hypotheses on these direct effects.

It can be shown that if we estimate (5), then the standard error of γ̂_2 is

std(γ̂_2) = std(β̂_2) / √(1 − r²_23)

with std(β̂_2) the standard error of the OLS estimator of β_2 in the correct model and r_23 the sample correlation between X_2 and X_3. Hence the standard error increases if we include an irrelevant variable: the estimates are less precise.
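The inflation factor 1/√(1 − r²_23) above can be tabulated for a few correlations (a small sketch of my own; the r values are illustrative):

```python
# Sketch: how much the standard error of gamma_2^ grows when an irrelevant
# X3 with sample correlation r23 to X2 is added to the regression.
def inflation(r23):
    return 1 / (1 - r23 ** 2) ** 0.5

factors = {r: round(inflation(r), 2) for r in (0.0, 0.5, 0.9, 0.99)}
```

With r23 = 0 the standard error is unchanged; with r23 = 0.9 it roughly doubles; with r23 = 0.99 it is multiplied by about 7 — a preview of the multicollinearity problem in the next lecture.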

46 Lecture. Multicollinearity Data, US Housing = new housing units started (thousands) Intrate = interest rate on mortgages GNP = GNP (billions of 98 $) Pop = US population (millions) Regressions with dependent variable housing and independent variables (and intercept) Intrate, GNP (model ) Intrate, Pop (model ) Intrate, Pop, GNP (model 3) Intrate (model 4) In regression with all variables the coefficients of Pop and GNP not significantly different from 0 F-test of hypothesis that these coefficients are both 0 F = ESS4 ESS ESS = = Hence we reject this hypothesis Also regression coefficients of Pop and GNP change from models, to model 3. What is going on? Consider relations among the explanatory variables Scatterplot GNP and Pop Scatterplot Intrate and GNP Relation between GNP and Pop almost exact Sample correlation coefficients r( GNP, POP) =.99

[EViews output: OLS regression of HOUSING on a constant (C), INTRATE and POP (model 2); coefficients, standard errors, t-statistics, p-values and summary statistics]

[EViews output: OLS regression of HOUSING on a constant (C), INTRATE and GNP (model 1); coefficients, standard errors, t-statistics, p-values and summary statistics]

[EViews output: OLS regression of HOUSING on a constant (C), INTRATE, POP and GNP (model 3); coefficients, standard errors, t-statistics, p-values and summary statistics]

[EViews output: OLS regression of HOUSING on a constant (C) and INTRATE (model 4); coefficients, standard errors, t-statistics, p-values and summary statistics]

[Figure: scatterplot of POP vs. GNP]

[Figure: scatterplot of INTRATE vs. GNP]

r(GNP, Intrate) = .88,  r(Intrate, POP) = .9

Consider the CLR model

Y_i = β_1 + β_2 X_i2 + β_3 X_i3 + u_i

Assume that there is an exact relation between X_2 and X_3:

X_i3 = γ_1 + γ_2 X_i2

Substituting this relation in the CLR model:

Y_i = β_1 + β_2 X_i2 + β_3 (γ_1 + γ_2 X_i2) + u_i = (β_1 + β_3 γ_1) + (β_2 + β_3 γ_2) X_i2 + u_i

We basically have a model with 1 explanatory variable,

Y_i = δ_1 + δ_2 X_i2 + u_i

and all we can estimate is the intercept δ_1 and the slope δ_2. If we have estimates for these coefficients, can we get the original ones β_1, β_2, β_3? The relationship between the coefficients in the two models is

δ_1 = β_1 + β_3 γ_1
δ_2 = β_2 + β_3 γ_2

Now the coefficients γ_1, γ_2 in the exact relation between X_2 and X_3 can be easily computed. However, from the estimates of δ_1, δ_2 we cannot solve for β_1, β_2, β_3: 2 equations for 3 unknowns. The problem is that we cannot distinguish between the effects of two variables that have an exact linear relation. This is an example of an identification problem: with the information that we have, we cannot estimate all regression coefficients.

With an exact linear relation, the computer is not able to compute the OLS estimates of the regression coefficients. What happens if the relation between the explanatory variables is almost exact (as in our example)? Formulas for the sampling variances:

Var(β̂_2) = σ² / ( S_22 (1 − r²_23) )
Var(β̂_3) = σ² / ( S_33 (1 − r²_23) )

with S_22, S_33 the sampling variances of X_2, X_3 and r_23 the sample correlation coefficient of these variables. If the sample correlation is close to 1 or −1 (exact relation), then the sampling variances are large.

If there is an exact relation between some explanatory variables, we have exact multicollinearity. This is rare. In the example we have multicollinearity, i.e. the relation is not exact, but strong enough that
- individual coefficients are not individually significantly different from 0
- jointly they are significantly different from 0

This is also how you discover whether this problem exists. As with exact multicollinearity, we cannot distinguish between the effects of X_2 and X_3.

Important: unless there is exact multicollinearity, the OLS estimators have the usual properties. In particular, they remain the best estimators. Multicollinearity is a problem with the data, not with the model.

Some signs of this problem:
- Low individual t-stats, but a high F-stat for omission of these explanatory variables
- High sample correlations between these variables
- Coefficients change if one of the variables is dropped

The solution depends on the goal:
- If there is multicollinearity among variables that were only included to avoid omitted variable bias in the estimate of the effect of some other variable, then drop one of them.
- If you are not interested in the individual coefficients, as in forecasting, ignore the problem.
- If you are interested in the effects of the variables that are closely related, then get more data, or, if that is not possible, admit defeat.

Severe multicollinearity as in the example is rare. Fear of multicollinearity could lead to the omission of explanatory variables. That is much worse. If inclusion leads to multicollinearity, you can always drop the variable again.

56 Lecture. Functional form Multiple linear regression model Y = β + β + + β K K + u Interpretation of regression coefficient β k : Change in Y if is changed by unit and the other variables are held constant. This change k Does not depend on the value of k Does not depend on the value of other variables Examples Relation between demand for electricity Y and its price. Price effect may 3 depend on temperature. Relation between consumption Y and income. Change in consumption due to extra income may decrease with income. Linear regression model seems not able to allow for this. Linear in coefficients or linear in variables Compare the following relations between Y and (omit error term u ). Y = β + + β β 3 β3. Y = β + β Both relations are nonlinear (see graphs) Relation is linear in the regression coefficients, i.e. it can be expressed as a linear relation between Y and independent variables,, 3 Define =, 3 =. For relation this is not possible (try!). If a nonlinear relation can be expressed as a linear relation by redefining variables we can estimate that relation using OLS. Review of (natural) logarithms and the exponential function (natural) logarithm log

exponential function: e^X

See graphs. Note:
- Both functions are increasing
- The logarithm is only defined if X is positive

Some relations:

log e^X = X,  e^{log X} = X,  log X^a = a log X,  log aX = log a + log X,  e^{a+X} = e^a e^X

Derivatives:

d log X / dX = 1/X
d e^X / dX = e^X

Examples of nonlinear relations that can be expressed as a linear relation (independent variables X, Z; define the X_2, X_3, … that make the relation linear; in some cases the dependent variable changes as well; u is omitted):

Quadratic relation: Y = β_1 + β_2 X + β_3 X²
Relation with interaction term: Y = β_1 + β_2 X + β_3 XZ
Reciprocal relation: Y = β_1 + β_2 (1/X)
Linear-log relation: Y = β_1 + β_2 log X

[Figures: graphs of the relations above]

Exponential relation: Y = e^{β_1 + β_2 X}
Power relation (here the constant must be redefined as well): Y = γ_1 X^{β_2} Z^{β_3}

How do we analyze nonlinear relations?
- Graphs (only for 1 independent variable)
- Derivatives: the change in Y associated with a small change in X

Remember the derivative dY/dX in a point (X, Y) on the curve is the slope of the straight line that touches the nonlinear relation in that point, i.e. the slope of the linear relation that approximates the nonlinear relation in that point.

Example: quadratic relation Y = β_1 + β_2 X + β_3 X². Then

dY/dX = β_2 + 2 β_3 X

Hence the effect of a small change of X depends on X (and may even change sign). Another use: find the value of X that maximizes Y, e.g. if Y is income and X is age, and β_2 = 40, β_3 = −.5, income is maximal at age −β_2/(2β_3) = 40.

In economics we are often interested in elasticities: the relative change in Y associated with a small relative change in X. Note: relative changes do not depend on the unit of measurement. Small relative change in a variable X: dX/X, or the small relative percentage change

100 dX/X

Elasticity (note the 100 cancels if the elasticity is defined with percentage changes):

(dY/Y) / (dX/X) = (X/Y) (dY/dX)

In other words: multiply the derivative by X/Y. In the quadratic relation,

(X/Y) (dY/dX) = (X/Y) (β_2 + 2 β_3 X)

Note it depends on both X and Y. Remember d log X / dX = 1/X; hence for a small change in X, d log X = dX/X. With this we can compute an elasticity as

(dY/Y) / (dX/X) = d log Y / d log X

Convenient if the (in)dependent variables are logs.

Log-log relation: log Y = β_1 + β_2 log X
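The identity elasticity = (X/Y) dY/dX = d log Y / d log X can be checked numerically. A sketch (my own code; the quadratic coefficients and the evaluation point are invented):

```python
import math

# Sketch: for Y = b1 + b2*X + b3*X^2, compare the exact elasticity
# (X/Y)(dY/dX) with a finite-difference estimate of d log Y / d log X.
b1, b2, b3 = 1.0, 2.0, 0.5

def f(x):
    return b1 + b2 * x + b3 * x ** 2

x = 3.0
exact = (x / f(x)) * (b2 + 2 * b3 * x)     # (X/Y) dY/dX at X = 3
h = 1e-6                                   # small relative step in X
numeric = (math.log(f(x * (1 + h))) - math.log(f(x))) / math.log(1 + h)
```

The two numbers agree to several decimals, confirming that the log-derivative form is just the elasticity.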

In this relation β_2 is the elasticity of Y with respect to X.

Also consider the semi-log relation: log Y = β_1 + β_2 X. Here

β_2 = d log Y / dX = (dY/Y) / dX

This is the relative change in Y associated with a unit change in X. Note: the percentage change is 100 β_2.

Relation with interaction: Y = β_1 + β_2 X + β_3 XZ. If the relation has more than 1 independent variable, we look at the effect of X holding Z constant. If Z is constant,

ΔY = (β_2 + β_3 Z) ΔX

(we use Δ instead of d to indicate that some variables are held constant). Note the effect of X depends on Z (compare the electricity demand example above).

Special exponential relation: Y_t = Y_0 e^{βt}, with Y_t the value of Y in period t and Y_0 the value of Y in the initial period. Relative change in Y:

(Y_t − Y_{t−1}) / Y_{t−1} = (Y_0 e^{βt} − Y_0 e^{β(t−1)}) / (Y_0 e^{β(t−1)}) = e^β − 1 ≈ β

Hence the coefficient β in the relation log Y_t = log Y_0 + βt can be used to obtain the relative change/growth rate of Y.

Other relations: lags in Y, X. With time-series data we can define new variables by lags: Y_{t−1}, X_{t−1}, …; note that in period t the value of Y_{t−1} is the value of Y in period t−1. A linear relation with lagged variables,

Y_t = β_1 + β_2 Y_{t−1} + β_3 X_t + β_4 X_{t−1}

expresses: delayed response; adjustment over time; habits/reluctance to change.

Some caveats in estimating nonlinear relations:
- Graphs with one X are misleading
- Do not compare R² if the dependent variables are different
- For the rest, estimation and testing are as before

Consider Y = β_1 + β_2 X + β_3 X² + u, i.e. Y = β_1 + β_2 X_2 + β_3 X_3 + u with X_2 = X, X_3 = X². Then e.g. a test of nonlinearity is a test of H0: β_3 = 0.
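The approximation e^β − 1 ≈ β for the growth rate can be checked for a small β (my own sketch; the value β = .03 is illustrative):

```python
import math

# Sketch: in Y_t = Y_0 * e^(beta*t), the one-period growth rate is
# exactly e^beta - 1, which is close to beta when beta is small
# (the error is of order beta^2 / 2).
beta = 0.03
growth = math.exp(beta) - 1
```

Here the exact growth rate is about 3.05% versus the approximation 3%, so reading β directly as the growth rate is fine for small β.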

Lecture 13. Dummy variables

Types of variables:
- Continuous (income, height, weight, etc.)
- Discrete (gender, season, points scored, etc.)

Continuous variables have
- an origin, i.e. a value 0
- a unit of measurement

Often these are obvious, e.g. price in US$. In a regression, both origin and unit of measurement can be changed.

Discrete variables: three types
- Counts, e.g. number of runs scored
- Ordinal, e.g. agree/neutral/disagree
- Nominal/categorical, e.g. gender

With counts there is an obvious origin and the unit of measurement is also obvious. Continuous variables and counts together are called quantitative variables. With ordinal variables there is no origin and no unit of measurement, but there is an order. With nominal variables there is no unit of measurement, no origin, and not even an order.

Ordinal and nominal variables are called qualitative variables.

Discrete variables can be
    The dependent variable
    An independent variable

If the dependent variable is discrete there are various problems, e.g. in

    Y = α + β X + u

the random error u cannot be a continuous variable and hence cannot have a normal distribution.

In this lecture we consider qualitative variables as independent variables in linear regression models. To use a qualitative variable as an independent variable in a linear regression

    Y = α + β X + u

we must first attach numerical values to the categories. For this, dummy/indicator variables are very useful. A dummy/indicator variable D is a variable that has two values: 0 and 1.
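Attaching 0/1 labels to a nominal variable is a one-line operation in practice. A small sketch with invented observations (the 0/1 values are labels, not measurements):

```python
# Hypothetical sketch: encoding a nominal variable as a dummy/indicator.
# The observations are invented; 0 and 1 are arbitrary labels.
gender = ["female", "male", "male", "female", "male"]

D = [1 if g == "male" else 0 for g in gender]  # 1 = male, 0 = female (reference)
print(D)  # [0, 1, 1, 0, 1]
```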

Consider gender with categories female and male. We could choose

    (1) D_i = 0 if i is female, D_i = 1 if i is male

or

    (2) D*_i = 0 if i is male, D*_i = 1 if i is female

Because the labels are arbitrary this should not make a difference. Note that 0 is not the origin and 1 is not the unit of measurement. They are just labels and we could have used 1 and 99 instead (but that is not a convenient choice). The category with label 0 is called the control or reference category (I prefer reference category).

Now consider the regression model

    Y = α + β D + u

with D as in (1) and with Y monthly salary. What is the interpretation of α, β? If the assumptions of the CLR model hold, then

    E(u | D = 0) = E(u | D = 1) = 0

and hence

    E(Y | D = 0) = α + E(u | D = 0) = α
    E(Y | D = 1) = α + β + E(u | D = 1) = α + β

E(Y | D = 0) is the average monthly salary of female employees (reference category)
E(Y | D = 1) is the average monthly salary of male employees

This suggests for the OLS estimators α̂, β̂

    α̂ = Ȳ_female  and  α̂ + β̂ = Ȳ_male

and hence

    β̂ = Ȳ_male − Ȳ_female

The intercept is the average for the reference category.

Example: Sample of 49 employees, n_male = 26, n_female = 23. Compare the sample means Ȳ_male and Ȳ_female with the regression results: α̂ = 58.70 (= Ȳ_female) and β̂ = Ȳ_male − Ȳ_female.
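A minimal numerical sketch of this result, with fabricated salaries for seven hypothetical employees: OLS on a single dummy reproduces the two group means exactly, and swapping the coding flips the slope's sign while moving the intercept to the other group mean.

```python
# Hypothetical data (invented): monthly salaries, D = 1 male, D = 0 female
Y = [1500, 1600, 1550, 1700, 1800, 1750, 1650]
D = [0, 0, 0, 1, 1, 1, 1]

def simple_ols(x, y):
    # Closed-form OLS for y = a + b*x + u
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / \
        sum((u - xbar) ** 2 for u in x)
    return ybar - b * xbar, b

alpha_hat, beta_hat = simple_ols(D, Y)

# Group means: the regression reproduces them exactly
Y_female = sum(y for y, d in zip(Y, D) if d == 0) / D.count(0)
Y_male = sum(y for y, d in zip(Y, D) if d == 1) / D.count(1)
print(round(alpha_hat, 6), round(beta_hat, 6))    # = Y_female, Y_male - Y_female

# Swapping the coding (D* = 1 - D) gives alpha* = Y_male and beta* = -beta
alpha_star, beta_star = simple_ols([1 - d for d in D], Y)
print(round(alpha_star, 6), round(beta_star, 6))
```

Here α̂ = 1550 (the female mean) and β̂ = 175 (the male-female difference); the recoded regression returns 1725 and −175, the same fit with the reference category reversed.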

Advantage of the regression approach: a direct confidence interval for/test of the salary difference between male and female employees.

If we replace D by D*, i.e. now 0 indicates male and 1 female, we have the regression model

    Y = α* + β* D* + u

and

    E(Y | D* = 0) = α*
    E(Y | D* = 1) = α* + β*

and hence

    α̂* = Ȳ_male  and  β̂* = Ȳ_female − Ȳ_male

For the OLS estimates we find α̂* = α̂ + β̂. Note that β̂* = −β̂ and the standard error is identical: tests/confidence intervals give the same conclusion.

Is the result a proof of gender discrimination? Why (not)?

Now consider two dummy variables

    D1_i = 0 if i is female, D1_i = 1 if i is male

and

    D2_i = 0 if i is nonwhite, D2_i = 1 if i is white

We consider the following models

    (1) Y = β1 + β2 D1 + β3 D2 + u
    (2) Y = β1 + β2 D1 + β3 D2 + β4 D1 D2 + u

We consider the salary difference between men and women by ethnicity. In model (1)

    E(Y | D1 = 1, D2 = 0) − E(Y | D1 = 0, D2 = 0) = β2 = E(Y | D1 = 1, D2 = 1) − E(Y | D1 = 0, D2 = 1)
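Because model (2) has one coefficient per cell (it is saturated), its coefficients are exact functions of the four group means. A sketch with invented cell means makes the two gender gaps explicit:

```python
# Hypothetical cell means (invented numbers): keys are (D1, D2) with
# D1 = 1 for men and D2 = 1 for whites
mean = {(0, 0): 1400.0,   # nonwhite women (reference cell)
        (1, 0): 1450.0,   # nonwhite men
        (0, 1): 1500.0,   # white women
        (1, 1): 1700.0}   # white men

# In the saturated model (2) the coefficients follow from the means:
b1 = mean[(0, 0)]
b2 = mean[(1, 0)] - mean[(0, 0)]        # gender gap among nonwhites
b3 = mean[(0, 1)] - mean[(0, 0)]        # race gap among women
b4 = mean[(1, 1)] - mean[(0, 1)] - b2   # extra gender gap among whites

# Gender gap among whites equals b2 + b4; model (1) would force b4 = 0
print(b2, b2 + b4)  # 50.0 200.0
```

With these invented means the gender gap is 50 among nonwhites but 200 among whites, exactly the kind of difference model (1) rules out by construction.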

Restriction: the salary difference is the same for whites and nonwhites.

In model (2)

    E(Y | D1 = 1, D2 = 0) − E(Y | D1 = 0, D2 = 0) = β2

and

    E(Y | D1 = 1, D2 = 1) − E(Y | D1 = 0, D2 = 1) = β2 + β4

Estimation results: Salary difference only for whites. Also: Race difference only for men. Model (2) has an interaction term D1 D2.

Next, we consider a qualitative variable with more than 2 categories. Examples: state of residence, level of education, income category (grouped continuous variable).

    S = 0 if no high school diploma
    S = 1 if high school diploma, but no college degree
    S = 2 if college degree

Using S in this way is a bad idea (why?). Instead we introduce two dummy variables

    S1 = 1 if high school diploma, but no college degree
    S1 = 0 otherwise

and

    S2 = 1 if college degree
    S2 = 0 otherwise

Note: the reference group has no high school diploma.

Regression model

    Y = β1 + β2 S1 + β3 S2 + u

Now
    β1 is the average of Y for the reference group (no high school diploma)
    β1 + β2 is the average of Y for the group with a high school diploma, but no college degree
    β1 + β3 is the average of Y for the group with a college degree

How do you test

    Education has no impact on income
    The return (in income) to having a college degree is 0

Give H0 and indicate which test you want to use.

Define

    S3 = 1 if no high school diploma
    S3 = 0 otherwise

and consider the regression model

    Y = β1 + β2 S1 + β3 S2 + β4 S3 + u

Why can the coefficients of this model not be estimated? This is called the dummy variable trap.

Example: Monthly salary and type of work
    Maint = maintenance work
    Crafts = works in crafts
    Clerical = clerical work
The reference category is professional. Interpret the constant and the other coefficients.

Combining quantitative and qualitative independent variables

Consider the model

    Y = β1 + β2 D + β3 X + u

with Y the log of monthly salary, D gender and X education (in years of schooling).

In the relation between Y and X the intercept is β1 for women and β1 + β2 for men (see figure).

Estimation results (what is the interpretation of the coefficient of gender?)

Note that the gender difference is not due to a difference in the level of education.

Consider two other models

    (3) Y = β1 + β2 X + β3 DX + u

In this model the intercept is the same but the slope is different for men and women (see figure). For women the slope is β2; for men the slope is β2 + β3.

    (4) Y = β1 + β2 D + β3 X + β4 DX + u

In this model both the slope and the intercept are different. The model for women is

    Y = β1 + β3 X + u

and for men

    Y = (β1 + β2) + (β3 + β4) X + u

This amounts to splitting the sample and estimating two separate regressions.

OLS estimates

Advantage of the dummy approach: tests
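The equivalence between model (4) and two separate regressions can be checked numerically. In this sketch the data are fabricated and exactly linear, so the two fitted lines, and hence the model (4) coefficients recovered from them, are easy to verify by hand:

```python
# Fabricated data: log salary (Y) against years of schooling (X), by gender
X_f, Y_f = [10, 12, 14, 16], [7.0, 7.2, 7.4, 7.6]   # women
X_m, Y_m = [10, 12, 14, 16], [7.1, 7.4, 7.7, 8.0]   # men

def simple_ols(x, y):
    # Closed-form OLS for y = a + b*x + u
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / \
        sum((u - xbar) ** 2 for u in x)
    return ybar - b * xbar, b

a_f, b_f = simple_ols(X_f, Y_f)   # women's line: beta1 and beta3
a_m, b_m = simple_ols(X_m, Y_m)   # men's line: beta1 + beta2 and beta3 + beta4

# Model (4) coefficients recovered from the two separate fits
beta1, beta2 = a_f, a_m - a_f
beta3, beta4 = b_f, b_m - b_f
print(round(beta3, 3), round(beta4, 3))  # 0.1 0.05
```

Fitting model (4) on the pooled data would return exactly these four coefficients; the advantage of the pooled dummy formulation is that β2 = 0 or β4 = 0 can be tested directly with t- or F-tests.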


More information

Chapter 13. Multiple Regression and Model Building

Chapter 13. Multiple Regression and Model Building Chapter 13 Multiple Regression and Model Building Multiple Regression Models The General Multiple Regression Model y x x x 0 1 1 2 2... k k y is the dependent variable x, x,..., x 1 2 k the model are the

More information

Romanian Economic and Business Review Vol. 3, No. 3 THE EVOLUTION OF SNP PETROM STOCK LIST - STUDY THROUGH AUTOREGRESSIVE MODELS

Romanian Economic and Business Review Vol. 3, No. 3 THE EVOLUTION OF SNP PETROM STOCK LIST - STUDY THROUGH AUTOREGRESSIVE MODELS THE EVOLUTION OF SNP PETROM STOCK LIST - STUDY THROUGH AUTOREGRESSIVE MODELS Marian Zaharia, Ioana Zaheu, and Elena Roxana Stan Abstract Stock exchange market is one of the most dynamic and unpredictable

More information

7. Integrated Processes

7. Integrated Processes 7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Economics 308: Econometrics Professor Moody

Economics 308: Econometrics Professor Moody Economics 308: Econometrics Professor Moody References on reserve: Text Moody, Basic Econometrics with Stata (BES) Pindyck and Rubinfeld, Econometric Models and Economic Forecasts (PR) Wooldridge, Jeffrey

More information

Applied Econometrics. Professor Bernard Fingleton

Applied Econometrics. Professor Bernard Fingleton Applied Econometrics Professor Bernard Fingleton Regression A quick summary of some key issues Some key issues Text book JH Stock & MW Watson Introduction to Econometrics 2nd Edition Software Gretl Gretl.sourceforge.net

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

1 A Non-technical Introduction to Regression

1 A Non-technical Introduction to Regression 1 A Non-technical Introduction to Regression Chapters 1 and Chapter 2 of the textbook are reviews of material you should know from your previous study (e.g. in your second year course). They cover, in

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Stat 500 Midterm 2 12 November 2009 page 0 of 11

Stat 500 Midterm 2 12 November 2009 page 0 of 11 Stat 500 Midterm 2 12 November 2009 page 0 of 11 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. Do not start until I tell you to. The exam is closed book, closed

More information

Answer Key: Problem Set 6

Answer Key: Problem Set 6 : Problem Set 6 1. Consider a linear model to explain monthly beer consumption: beer = + inc + price + educ + female + u 0 1 3 4 E ( u inc, price, educ, female ) = 0 ( u inc price educ female) σ inc var,,,

More information

Introduction to Econometrics Chapter 4

Introduction to Econometrics Chapter 4 Introduction to Econometrics Chapter 4 Ezequiel Uriel Jiménez University of Valencia Valencia, September 2013 4 ypothesis testing in the multiple regression 4.1 ypothesis testing: an overview 4.2 Testing

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany PAD 705 Handout: Suggested Review Problems from Pindyck & Rubinfeld Original prepared by Professor Suzanne Cooper John F. Kennedy School of Government, Harvard

More information

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is Practice Final Exam Last Name:, First Name:. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do

More information