Lecture 1: Intro. to regressions


Lecture 1: intro. to regressions
Class basics: Coach, Bro., Prof., not Dr.; grading (exams, HW, papers), quizzes
Simple regression model (one X): Salary = b0 + b1*experience + error
Multiple regression model (two or more X), adding a dummy (0,1) variable: Salary = b0 + b1*experience + b2*male + error
Application: difference-in-difference estimation: Wg = b0 + b1*LR + b2*after + b3*(LR*after) + error
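A minimal sketch of the difference-in-difference idea, using made-up wage numbers (the data and the name `did_estimate` are illustrative assumptions, not from the lecture). When the two dummies and their interaction are the only regressors, the OLS estimate of b3 equals the difference between the two groups' before/after changes in mean wages, which is what the code computes.

```python
from statistics import mean

def did_estimate(rows):
    """Difference-in-difference from (wage, LR, after) rows of 0/1 dummies.

    Returns (change in the LR group) - (change in the comparison group),
    which is what b3 estimates in
    Wg = b0 + b1*LR + b2*after + b3*(LR*after) + error.
    """
    def cell_mean(lr, after):
        return mean(w for w, l, a in rows if l == lr and a == after)

    change_treated = cell_mean(1, 1) - cell_mean(1, 0)
    change_control = cell_mean(0, 1) - cell_mean(0, 0)
    return change_treated - change_control

# Hypothetical wages: (wage, LR, after)
rows = [
    (10.0, 0, 0), (12.0, 0, 0),   # comparison group, before: mean 11
    (12.0, 0, 1), (14.0, 0, 1),   # comparison group, after:  mean 13
    (9.0, 1, 0), (11.0, 1, 0),    # LR group, before:         mean 10
    (14.0, 1, 1), (16.0, 1, 1),   # LR group, after:          mean 15
]
b3 = did_estimate(rows)  # (15 - 10) - (13 - 11) = 3
```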

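Lecture 1: some things to remember
$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$; $\sum_{i=1}^{N} \mathrm{Dev}X_i = \sum_{i=1}^{N} (X_i - \bar{X}) = 0$
$Var(X) = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}$; $Cov(Y, X) = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N}$
$Y_i = \beta_0 + \beta_1 X_i + \mu_i$
$\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$; $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$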
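The slope and intercept formulas can be sketched directly in code. The data below (experience in years, salary in $1000s) and the name `simple_ols` are made up for illustration.

```python
def simple_ols(x, y):
    """Return (b0, b1) for y = b0 + b1*x + error using the deviation formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares of deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data
experience = [1, 2, 3, 4, 5]
salary = [32, 34, 39, 41, 44]
b0, b1 = simple_ols(experience, salary)  # b1 = 31/10 = 3.1, b0 = 38 - 3.1*3 = 28.7
```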

Lecture 1: some things to remember, continued
$\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$ (1 var.)
generalizes for the coeff. of the k-th variable to
$\hat{\beta}_k = \frac{\sum_{i=1}^{N} \tilde{X}_{k,i} \tilde{Y}_i}{\sum_{i=1}^{N} \tilde{X}_{k,i}^2}$
where $\tilde{X}_k$ ($\tilde{Y}$) is the part of the variable $X_k$ ($Y$) that is uncorrelated with the other variables in the model

Lecture 2: vectors and orthogonality
Vector definition
Vector sums and differences
Dot product of vectors defines orthogonality
Regressions seen in terms of variable vectors
Steps for proofs in 388: definitions, substitutions, simplifications, and expected value when appropriate


Lecture 2: some things to remember
OLS coefficients come from minimizing the error, so that the error vector is orthogonal to the regression plane.
For simple regression: $\hat{\mu}'X = 0$; $\hat{\mu}'\mathbf{1} = 0$
For multivariate regression: $\hat{\mu}'\mathbf{1} = 0$; $\hat{\mu}'X_2 = 0$; $\hat{\mu}'X_3 = 0$; ...; $\hat{\mu}'X_k = 0$
or in matrix terms, $X'\hat{\mu} = 0$
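The orthogonality conditions can be checked numerically. This is a small sketch on invented data; the helper names `residuals` and `dot` are illustrative only.

```python
def residuals(x, y):
    """OLS residuals from the simple regression of y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical data
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 8.0]
mu_hat = residuals(x, y)
ones = [1.0] * len(x)

dot_const = dot(mu_hat, ones)  # residuals are orthogonal to the constant
dot_x = dot(mu_hat, x)         # residuals are orthogonal to X
```

Both dot products are zero up to floating-point rounding.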


Lecture 3: Multivariate Regression, Frisch Theorem
Minimizing the (length of the) error vector yields the normal equations for OLS
Solving the normal equations yields the OLS estimators
Outcomes (Y) are decomposed into a part explained by X (the regression plane or model) and a part unexplained by X, the error
In OLS, the coefficient for the kth regressor measures the partial effect of Xk on Y

Lecture 3: some things to remember
$Y = \hat{Y} + \hat{\mu}$, where $\hat{Y} = X\hat{\beta}$ and $\hat{\mu} = Y - X\hat{\beta}$
$X'\hat{\mu} = 0$ yields $\hat{\beta} = (X'X)^{-1}X'Y$
Substituting, we get $\hat{Y} = X(X'X)^{-1}X'Y = P_X Y$ and $\hat{\mu} = Y - X(X'X)^{-1}X'Y = M_X Y$, where $M_X = I - P_X$
Let $M_j = I - P_{x_1,\ldots,x_{j-1},x_{j+1},\ldots}$, so that $M_j X_j = X_j - P_{x_1,\ldots,x_{j-1},x_{j+1},\ldots} X_j$; then
$\hat{\beta}_j = \frac{(M_j X_j)'(M_j Y)}{(M_j X_j)'(M_j X_j)}$
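A sketch of the Frisch result on invented data: the coefficient on x2 from the full regression matches the ratio built from the residualized x2 and Y. The helpers `solve`, `ols`, and `resid` are illustrative names written with the standard library only, not anything from the course materials.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def resid(X, y):
    """Residuals M_X y from regressing y on the columns of X."""
    beta = ols(X, y)
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]

# Hypothetical data
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 8.0]
y = [3.0, 4.0, 8.0, 9.0, 13.0, 18.0]

X_full = [[1.0, a, b] for a, b in zip(x1, x2)]
X_other = [[1.0, a] for a in x1]          # everything except x2

beta_full = ols(X_full, y)[2]             # coefficient on x2, full regression
x2_tilde = resid(X_other, x2)             # M_j X_j
y_tilde = resid(X_other, y)               # M_j Y
beta_fwl = (sum(a * b for a, b in zip(x2_tilde, y_tilde))
            / sum(a * a for a in x2_tilde))
```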


Lecture 4: Variances and testing
Use variances to apply the EMPIRICAL rule (which works for normals, and for means in large samples):
about 68% within 1 standard deviation of the mean
about 95% within 2 standard deviations of the mean
about 99.7% within 3 standard deviations of the mean
(Centered) R-square: the share of the variation in Y explained by the variation in the Xs

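Lecture 4: some things to remember
Standard deviation = sq. root of est. variance
Ex. 1: s.d. of X $= \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}}$
Ex. 2: s.d. of $\hat{\mu}$: $s = \sqrt{\frac{\sum_{i=1}^{N} (\hat{\mu}_i - 0)^2}{N-k}}$
Ex. 3: s. error of $\hat{\beta}_1$ (simple OLS) $= \sqrt{\frac{\sum_{i=1}^{N} \hat{\mu}_i^2 / (N-2)}{\sum_{i=1}^{N} (X_i - \bar{X})^2}} = \sqrt{\frac{s^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2}}$
"z scores": (statistic - hypo. value) / standard deviation ~ N(0,1), within ±2 about 95% of the time
Uncentered R-square $= \frac{\|P_X Y\|^2}{\|Y\|^2}$
Centered R-square (usual) $= \frac{\|P_X M_1 Y\|^2}{\|M_1 Y\|^2}$, where $M_1 Y$ is Y in deviations from its mean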


Lecture 5: Data Transf.; Model Assump.
Non-singular transformations of X (i.e., postmultiplying by a k×k matrix A) don't affect the projection of Y
Pre-multiplying by an n×n matrix C is used to fix the regression model when its assumptions fail
Testing linear restrictions (the third transf.)
Model assumptions:
1. y = Xb + u
2. X'X full rank
3. E(u|X) = 0

Lecture 5: some things to remember
Only need the first three assumptions to prove betas are unbiased (use definition, substitution, simplification, expected value):
$\hat{\beta} = \beta + (X'X)^{-1}X'\mu$
Omitted variable bias: $E(\hat{\beta} \mid X) = \beta + (X'X)^{-1}X'z\alpha$
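The four proof steps can be written out as a worked derivation. This is a sketch using the slide's symbols; the omitted-variable case assumes the true model is $Y = X\beta + z\alpha + \mu$ with $z$ left out of the regression, which is the standard setup but is not spelled out on the slide.

```latex
\begin{align*}
\hat{\beta} &= (X'X)^{-1}X'Y && \text{(definition)} \\
            &= (X'X)^{-1}X'(X\beta + \mu) && \text{(substitution, assumption 1)} \\
            &= \beta + (X'X)^{-1}X'\mu && \text{(simplification, assumption 2)} \\
E(\hat{\beta} \mid X) &= \beta + (X'X)^{-1}X'E(\mu \mid X) = \beta && \text{(expected value, assumption 3)} \\[6pt]
\text{With } Y &= X\beta + z\alpha + \mu: \\
\hat{\beta} &= \beta + (X'X)^{-1}X'z\alpha + (X'X)^{-1}X'\mu \\
E(\hat{\beta} \mid X) &= \beta + (X'X)^{-1}X'z\alpha
\end{align*}
```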

Lecture 6: More Assump.; Theorems
Assum. 4: the error has constant variance (no heteroskedasticity), and covariances between errors are zero (no serial correlation):
The variance estimator of u, s-squared, is unbiased
OLS is BLUE (Gauss-Markov theorem)
Assum. 5: normality of the error term, which implies normality of Y, which implies normality of the beta vector

Lecture 6: some things to remember
Definitions of expected value and variance for matrices
Useful theorem A: for $Y \sim N(\theta, \Sigma)$, $AY \sim N(A\theta, A\Sigma A')$
Trace of a matrix = sum of its diagonal elements
A matrix C is positive definite if z'Cz > 0 for any vector z with at least one non-zero element


Lecture 7: MR Hypothesis Testing
Normality assumption used for hypothesis testing
Classical frequentist approach: minimize Type I error (rejection of the null hypothesis when it is true), while guarding against Type II error (accepting the null hypothesis when the alternative hypothesis is true)
One-sided vs. two-sided alternative hypo.

Lecture 7: some things to remember
Steps in figuring out the size of the Type II error for the jth beta coefficient:
1. specify null and alternative hypotheses (one-sided) for that coefficient
2. use z-scores to figure out the critical value for the jth coefficient (say at the 5% level)
3. using that critical value, and assuming the alternative hypo. is true, calculate the associated z-score and look up its probability in a N(0,1) table (or have STATA calculate it)
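The three steps above can be sketched with Python's standard-library normal distribution. The numbers (null value 0, alternative 0.5, standard error 0.2) and the function name `type2_error` are assumptions for illustration, and the test is taken to be one-sided with the alternative above the null.

```python
from statistics import NormalDist

def type2_error(beta_null, beta_alt, se, alpha=0.05):
    """Type II error for a one-sided test of H0: beta = beta_null
    against H1: beta = beta_alt > beta_null, with known standard error."""
    z = NormalDist()  # standard normal, N(0,1)
    # Step 2: critical value for the coefficient at the alpha level.
    crit = beta_null + z.inv_cdf(1 - alpha) * se
    # Step 3: z-score of the critical value if the alternative is true,
    # and the probability of falling below it (failing to reject).
    z_alt = (crit - beta_alt) / se
    return crit, z.cdf(z_alt)

crit, beta_error = type2_error(beta_null=0.0, beta_alt=0.5, se=0.2)
```

With these inputs the critical value is about 0.33 and the Type II error is a little under 20%.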


Lecture 8: MR testing linear combinations
Likelihood = given the sample outcome, how probable was the outcome under the null hypothesis (i.e., as a fn. of the parameters)
Three classical likelihood tests:
LM: uses the likelihood at the restricted values
Wald: uses the likelihood at the unrestricted values
Likelihood ratio test: compares restricted and unrestricted values
All three of these tests have the same asymptotic distribution as the F test

Lecture 8: some things to remember
$F = \frac{(SSR_R - SSR_U)/r}{SSR_U/(n-k)}$
STATA: test (x1=0) (x3=0) (x4=0);
SAS: test x1, x3, x4;
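As a quick arithmetic check of the F formula, here is a small sketch with invented sums of squared residuals; `f_statistic` is an illustrative name, with r the number of restrictions, n the sample size, and k the number of coefficients in the unrestricted model.

```python
def f_statistic(ssr_r, ssr_u, r, n, k):
    """F statistic for r linear restrictions.

    ssr_r: sum of squared residuals, restricted model
    ssr_u: sum of squared residuals, unrestricted model
    """
    return ((ssr_r - ssr_u) / r) / (ssr_u / (n - k))

# Hypothetical values: (120 - 100)/2 divided by 100/(54 - 4) = 10 / 2 = 5
f = f_statistic(ssr_r=120.0, ssr_u=100.0, r=2, n=54, k=4)
```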

Lecture 9: Asymptotics: Consistency
LLN: for large samples, a sample mean converges in probability to the mean of the distribution from which it was drawn
CLT: for large samples, means tend to be normal, even if the underlying distribution is not normal
Since betas are means, both rules apply

Lecture 9: some things to remember
Betas are means: $\hat{\beta} = \left(\frac{1}{N}\sum_{i=1}^{N} X_i'X_i\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^{N} X_i'Y_i\right)$
Consistency proofs usually are of 2 types:
If the estimator is always unbiased, show its variance goes to zero as N gets large, OR
Use the LLN directly where applicable


Lecture 10: Large Sample MR inference
Central Limit Theorem (CLT): the distribution of averages tends to be normal
So betas in regression models with large samples tend to be normally distributed, even if the underlying errors are not
Instrumental Variable (IV) estimators are consistent if you have good instruments, and are asymptotically normal

Lecture 10: some things to remember
Instrumental variables (Z) need to be:
Uncorrelated with the errors, but
Highly correlated with the r.h.s. variables
$\hat{\beta}_{IV} = (Z'X)^{-1}Z'Y$
When you have surplus instruments, two stage least squares is the efficient way to use the additional information (see lectures 20/21)
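A small sketch of why the instrument helps, on constructed data where the error u is correlated with x but exactly uncorrelated with z. With one regressor, one instrument, and a constant, the slope in $(Z'X)^{-1}Z'Y$ reduces to cov(z, y)/cov(z, x), which is what the code uses. The data and the name `cov_sum` are illustrative assumptions.

```python
def cov_sum(a, b):
    """Sum of cross-products of deviations from means.

    The common 1/N factor is omitted because it cancels in the ratios below.
    """
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

# Constructed data: true model y = 1 + 2*x + u, with x = z + u
z = [-1.0, -1.0, 1.0, 1.0]    # instrument, uncorrelated with u by construction
u = [1.0, -1.0, 1.0, -1.0]    # error, correlated with x
x = [zi + ui for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

beta_ols = cov_sum(x, y) / cov_sum(x, x)  # 20/8 = 2.5, biased away from 2
beta_iv = cov_sum(z, y) / cov_sum(z, x)   # 8/4 = 2.0, recovers the true slope
```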


Lecture 11: MR specification Issues
Always have constants in your regressions, but seldom worry about interpreting them
Slope coefficients can be normalized so that they are unit-free:
1. elasticities: coefficients from log-log specifications
2. beta coefficients: std. dev. change in Y for a one-s.d. change in X
When you have interactions and quadratic terms (squares), take partial derivatives to help you interpret the results
Predictions are stochastic; use the correct prediction interval in interpreting the outcomes

Lecture 11: some things to remember
Calculus approximation of the extremal point for the model $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \mu$ is
$X_{max,min} = \frac{-\hat{\beta}_1}{2\hat{\beta}_2}$
Prediction variance for the model $\hat{Y}_P = x_P\hat{\beta} + \hat{\mu}_P$ is calculated as
estimator of $Var(\hat{Y}_P) = s^2\left(x_P (X'X)^{-1} x_P' + 1\right)$
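The extremal-point formula is easy to apply in code. This sketch uses hypothetical coefficient estimates (0.3 on X, -0.006 on X squared); `turning_point` is an illustrative name. The sign of the squared term decides whether the point is a maximum or a minimum.

```python
def turning_point(b1, b2):
    """Extremal point of Y = b0 + b1*X + b2*X^2, from dY/dX = b1 + 2*b2*X = 0."""
    if b2 == 0:
        raise ValueError("no turning point: the model is linear in X")
    x_star = -b1 / (2.0 * b2)
    kind = "max" if b2 < 0 else "min"  # sign of the second derivative, 2*b2
    return x_star, kind

# Hypothetical estimates: the fitted curve peaks at X = 0.3 / 0.012 = 25
x_star, kind = turning_point(0.3, -0.006)
```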

Lecture 12: Dummy Variables (0,1)
Dummy variables are the way we quantify qualitative things: status (gender, race, health, etc.) or choice (married, employed, RM, tai chi, etc.)
Rules about dummy variables:
If there are m groups, use m-1 dummy variables
Dummy var. coeff. represent the shift (in intercept) of that group relative to the omitted group
Useful for the fixed effects model (panels with repeated observations on the same unit)

Lecture 12: some things to remember
Dummy variables: if there are m groups, use m-1 dummy variables
Dummy coeff. represent the shift (in intercept) of that group relative to the omitted group
Dummies can also be built from a continuous var. when you want to allow for more complexity
The OLS estimator in panel data sets is a (matrix) weighted average of the between-group estimator (using just group means) and the within-group estimator (using just deviations from means within groups)


Lecture 13: Dummy Variable Interactions
Tests for structural (coefficient) shifts in the model, across groups or over time, can be done with dummy variable interactions:
Dummies by themselves (interacted with the constant, a vector of ones) allow intercept shifts
Dummies interacted with X allow slope shifts in the relevant X variable coefficient
F-tests for such shifts are readily made in STATA or SAS

Lecture 13: some things to remember
reg sales age covey dilbert age_cov age_dilbert;
test (covey=0) (dilbert=0); *tests intercept shifts;
test (age_cov=0) (age_dilbert=0); *tests slope shifts;
test (covey=0) (dilbert=0) (age_cov=0) (age_dilbert=0); *intercept and slope shifts: tests for a full structural shift in the equation;


Lecture 14: Dummy Dependent Variables
Dummy dependent variable models are choice models, where the likelihood of having made a choice is modeled as a function of Xs: marriage, college, exercising, memorizing the proclamation on the family, etc.
Linear version: LPM (linear probability model); heteroskedasticity and boundedness problems
Nonlinear versions: logits, probits, max likelihood estimation


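Lecture 14: some things to remember
General notation: $Prob(Y_i = 1 \mid X_i) = G(X_i\beta)$
LPM version: $Prob(Y_i = 1 \mid X_i) = X_i\beta$
Logit version: $Prob(Y_i = 1 \mid X_i) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}$
Probit version: $Prob(Y_i = 1 \mid X_i) = \int_{-\infty}^{X_i\beta} \phi(v)\,dv$, where $\phi(v)$ is the N(0,1) density
Log-likelihood ratio test: $2(\log L_U - \log L_R)$, with U the unrestricted and R the restricted model
Marginal effect: $\frac{\partial P(Y_i = 1 \mid X)}{\partial X_j} = \frac{\partial G(z)}{\partial z}\beta_j$, where $\frac{\partial G(z)}{\partial z} = 1$ for LPM; $= G(z)(1 - G(z))$ for logit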
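A brief sketch of the logit marginal effect, $G(z)(1 - G(z))\beta_j$, evaluated at a chosen point. The coefficient values and the function names are made up for illustration.

```python
import math

def logistic(z):
    """G(z) = exp(z) / (1 + exp(z)), written in the equivalent 1/(1+exp(-z)) form."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_marginal_effect(x, beta, j):
    """Marginal effect of regressor j on Prob(Y=1|X) at the point x."""
    z = sum(xi * bi for xi, bi in zip(x, beta))
    g = logistic(z)
    return g * (1.0 - g) * beta[j]

# Hypothetical coefficients (constant, slope); evaluate where X*beta = 0
beta = [0.0, 2.0]
x_i = [1.0, 0.0]
me = logit_marginal_effect(x_i, beta, 1)  # 0.5 * 0.5 * 2 = 0.5
```

Unlike the LPM, the effect depends on where it is evaluated: it is largest where the predicted probability is 0.5 and shrinks toward zero in the tails.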


Lecture 15: Heteroskedasticity
Heteroskedasticity = variance changes between observations
Coeff. are still unbiased, but standard errors (t-tests) may be wrong
Modern approach: use robust standard errors: regress y x1 x2 x3, robust;
Old approach: model the heterosked.
Modern detection of het.: LM tests
Old detection of het.: Goldfeld-Quandt tests


Lecture 15: some things to remember
Heteroskedasticity affects the coeff. covariance matrix, but does not bias the coefficients
Tests:
LM tests: regress the squared residuals on the Xs, then $N \cdot R^2 \sim$ chi-square (d.f. = number of slope coeff.)
Goldfeld-Q.: divide the data into 3 sections, then $\frac{SSR_{III}/df_{III}}{SSR_{I}/df_{I}} \sim F(df_{III}, df_{I})$


Lecture 16: Weighted Least Squares
Weighting to correct heteroskedasticity: give more weight to the observations with the better information (lower variance) by dividing each observation by its standard deviation (so that squared errors are weighted by the inverse of the variance)
When we do weighted least squares we no longer do orthogonal projections (because our error distributions are no longer spherical), but oblique projections

Lecture 16: some things to remember
1. Transform all the data approach:
$Y = X\beta + \mu$, with $\mu \sim N(0, \Sigma)$ where $\Sigma \neq \sigma^2 I$
$TY = TX\beta + T\mu$, with $T\mu \sim N(0, \sigma^2 I)$, so $T\Sigma T' = \sigma^2 I$, or $T'T = \sigma^2\Sigma^{-1}$
Hence, $\hat{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$
2. Transform the covariance matrix approach:
estimator of $V(\hat{\beta}) = (X'X)^{-1}X'\,\mathrm{diag}(\hat{\mu}_i^2)\,X(X'X)^{-1}$


Lecture 17: Time Series Data Asymptotics
Time series: random var. ordered in time:
Can't replicate
Non-independent
Additional assumptions on the process:
Covariance stationarity
Weak dependence (asymptotically indep.)
Types of stochastic (time series) processes:
White noise
Random walk
Moving Average (MA(.))
Autoregressive (AR(.))

Lecture 17: some things to remember
White noise: $e_t$. Stationary; weakly dependent
Random walk: $y_t = y_{t-1} + e_t$. Not stationary; not weakly dependent
Moving average: $y_t = \alpha_0 e_t + \alpha_1 e_{t-1}$. Stationary; weakly dependent
Autoregressive: $y_t = \rho y_{t-1} + e_t$. Stationary; weakly dependent (if $|\rho| < 1$)


Lecture 18: Persistent Time Series
I(0), integrated of order zero: series is weakly dependent; example: AR(1) with |rho| < 1
Get consistency with large samples
I(1), integrated of order one: series is not weakly dependent; example: AR(1) with rho = 1 (i.e., random walks)
Consistency problems for estimators, even if N is large
Dynamically complete models: add lagged r.h.s. variables until there is no serial correlation in the error

Lecture 18: some things to remember
For AR(1): $y_t = \rho y_{t-1} + e_t$
If $|\rho| < 1$, then you can get consistent estimates in large samples by OLS
If $\rho = 1$, then you need to first difference the data to get consistent estimates in large samples by OLS
Rule of thumb: if $\rho > .9$, then first difference (because of the unit root problem)


Lecture 19: Serially Correlated Errors
Violates assumption IV: serial correlation occurs when the residuals are correlated (a positive error at t-1 increases the prob. of a positive error at t)
Coeff. unbiased, but cov(coeff.) incorrect, so that t-stats and tests of coeff. are biased
Common time series problem: $\mu_t = \rho\mu_{t-1} + e_t$
Detecting serial correlation: DW, regression
Correcting serial correlation: two-stage differencing to increase efficiency, correct t-stats

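Lecture 19: some things to remember
Detecting AR(1):
Regression: $\hat{\mu}_t = \rho\hat{\mu}_{t-1} + e_t$
DW statistic: $DW = \frac{\sum_{t=2}^{T} (\hat{\mu}_t - \hat{\mu}_{t-1})^2}{\sum_{t=1}^{T} \hat{\mu}_t^2} \approx 2(1 - \hat{\rho})$
Correcting AR(1) by differencing:
$y_t = x_t\beta + \mu_t$; $\mu_t = \rho\mu_{t-1} + e_t$
$y_t - \hat{\rho}y_{t-1} = (x_t - \hat{\rho}x_{t-1})\beta + e_t$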
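The DW statistic is simple to compute from a residual series. The sketch below uses two made-up residual vectors to show the extremes; `durbin_watson` is an illustrative name.

```python
def durbin_watson(resid):
    """DW = sum_{t=2..T} (u_t - u_{t-1})^2 / sum_{t=1..T} u_t^2."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(u * u for u in resid)
    return num / den

# Hypothetical residuals
dw_negative = durbin_watson([1.0, -1.0, 1.0, -1.0])  # alternating signs: 12/4 = 3.0
dw_positive = durbin_watson([1.0, 1.0, 1.0, 1.0])    # perfectly persistent: 0/4 = 0.0
```

Values near 2 suggest no first-order serial correlation; values toward 0 point to positive correlation and values toward 4 to negative correlation, in line with DW ≈ 2(1 - rho-hat).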

Lecture 20 & 21: Endo.; I.V., 2SLS
Endogeneity: omitted var., measurement error, or simultaneous causation (S&D)
Endogeneity violates assumption III, and everything is biased
Fix: find instrumental variables (IV), Z, s.t. they are:
Correlated with the Xs (r.h.s. regressors)
Uncorrelated with the error

Lecture 20 & 21: things to remember
IV in general:
$\hat{\beta}_{IV} = (Z'X)^{-1}Z'Y$
$V(\hat{\beta}_{IV}) = s^2 (Z'X)^{-1} Z'Z (X'Z)^{-1}$
2SLS as a type of IV:
$\hat{\beta}_{2SLS} = (\hat{X}'X)^{-1}\hat{X}'Y$, where $\hat{X} = V(V'V)^{-1}V'X$
$\hat{\beta}_{2SLS} = (X'V(V'V)^{-1}V'X)^{-1} X'V(V'V)^{-1}V'Y$
$V(\hat{\beta}_{2SLS}) = s^2 (X'V(V'V)^{-1}V'X)^{-1}$
