Specification of Conditional Expectation Functions
Econometrics
Douglas G. Steigerwald, UC Santa Barbara
Overview
Reference: B. Hansen, Econometrics, Chapters 2.9-2.17 and 2.31-2.32
Why focus on E(y|x)?
- the conditional mean is the "best predictor"
- it yields marginal effects; are the measured marginal effects causal?
- the specification of the conditional mean is exact for discrete covariates, as E(y|x) is linear in x by definition
- the conditional variance is also informative
Why Focus on the CEF?
- let g(x) be an arbitrary predictor of y
- suppose our goal is to minimize E(y − g(x))²
- if Ey² < ∞, then E(y − g(x))² ≥ E(y − E(y|x))²
- the conditional mean is the "best predictor"
Proof
- let m(x) = E(y|x) and e = y − m(x)
- E(y − g(x))² = E(e + m(x) − g(x))²
- because E(e|x) = 0, e is uncorrelated with any function of x:
  E(e + m(x) − g(x))² = Ee² + E(m(x) − g(x))²
- g(x) = m(x) is the minimizing value
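A minimal simulation sketch of the best-predictor property (not from the slides; the quadratic CEF and the competing predictors are my own illustrative choices): among candidate predictors g(x), the conditional mean attains the smallest mean squared error.

```python
import numpy as np

# Sketch: y = m(x) + e with m(x) = E(y|x) = x**2 (an assumed CEF),
# comparing mean squared prediction error across predictors g(x).
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x**2 + rng.normal(size=x.size)          # so E(y|x) = x**2

predictors = {
    "m(x) = x^2 (the CEF)": x**2,
    "g(x) = x": x,
    "g(x) = 2x^2": 2 * x**2,
    "g(x) = E(y) (constant)": np.full_like(x, y.mean()),
}
for name, g in predictors.items():
    print(f"{name:25s} MSE = {np.mean((y - g)**2):.3f}")
# The CEF attains the smallest MSE (about Var(e) = 1), illustrating
# E(y - g(x))^2 >= E(y - E(y|x))^2 for any other g.
```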
Marginal Effects
- E(y|x) can be interpreted in terms of marginal effects
  - the effect of a change in x₁ holding the other covariates constant
  - we cannot hold "all else" constant
- causal effect: the marginal effect of (continuous) x₁ on y
  ∂y/∂x₁ = ∂E(y|x)/∂x₁ + ∂e/∂x₁
- for marginal effects to be causal, we must establish (assume) ∂e/∂x₁ = 0
CEF Derivative
- we measure the marginal effect on the CEF
- the formula for this derivative differs for continuous and discrete covariates:
  ∇₁m(x) = ∂E(y|x₁, x₂, ..., x_K)/∂x₁ if x₁ is continuous
  ∇₁m(x) = E(y|1, x₂, ..., x_K) − E(y|0, x₂, ..., x_K) if x₁ is discrete
- the effect of a change in x₁ on E(y|x), holding the other covariates constant, potentially varies as we change the set of covariates
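A small numeric sketch of the two derivative formulas, using a hypothetical CEF m(x₁, x₂) of my own choosing: a partial derivative for continuous x₁ and a discrete difference for binary x₁, with x₂ held fixed in both cases.

```python
import numpy as np

# Hypothetical CEF (not from the slides): m(x1, x2) = 1 + 2*x1 + 0.5*x2 + x1*x2
def m(x1, x2):
    return 1 + 2 * x1 + 0.5 * x2 + x1 * x2

x2 = 3.0   # hold the other covariate fixed

# Continuous x1: numerical partial derivative of m with respect to x1 at x1 = 1.
h = 1e-6
cont_effect = (m(1 + h, x2) - m(1 - h, x2)) / (2 * h)   # = 2 + x2 = 5

# Binary x1: the discrete difference E(y|1, x2) - E(y|0, x2).
disc_effect = m(1, x2) - m(0, x2)                       # = 2 + x2 = 5

print(cont_effect, disc_effect)
# Because of the interaction term, both effects depend on x2, illustrating
# that the marginal effect can vary with the set (and values) of covariates.
```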
Effect of Covariates
- changing the covariates changes E(y|x) and e:
  y = E(y|x₁) + e₁
  y = E(y|x₁, x₂) + e₂
- E(y|x₁, x₂) reveals greater detail about the behavior of y
- e is the unexplained portion of y
- Adding Covariates Theorem: if Ey² < ∞, then Var(y) ≥ Var(e₁) ≥ Var(e₂)
- how restrictive is the finite moment assumption? (see Finite Moment Assumption below)
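A simulation sketch of the Adding Covariates Theorem, with assumed binary covariates of my own design: when covariates are discrete, the CEF is just the within-cell mean, so the variance ordering can be checked directly.

```python
import numpy as np

# Sketch: Var(y) >= Var(e1) >= Var(e2) as covariates are added.
rng = np.random.default_rng(1)
n = 200_000
x1 = rng.integers(0, 2, size=n)          # first binary covariate
x2 = rng.integers(0, 2, size=n)          # second binary covariate
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

# With discrete covariates the CEF is the within-cell sample mean.
def cef(y, cells):
    m = np.zeros_like(y)
    for c in np.unique(cells):
        mask = cells == c
        m[mask] = y[mask].mean()
    return m

e1 = y - cef(y, x1)                      # error from E(y|x1)
e2 = y - cef(y, x1 + 2 * x2)             # error from E(y|x1, x2)
print(np.var(y), np.var(e1), np.var(e2)) # a decreasing sequence
```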
CEF Specification: No Covariates
- m(x) = µ
- y = m(x) + e becomes y = µ + e
- the intercept-only model
Binary Covariate
- binary covariate: x₁ = 1 if gender = man, x₁ = 0 if gender = woman
  - more commonly termed an indicator variable
- conditional expectations: µ₁ for men and µ₀ for women
- the conditional expectation function is linear in x₁:
  E(y|x₁) = β₁x₁ + β₂
- 2 covariates (including the intercept), with β₂ = µ₀ and β₁ = µ₁ − µ₀
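A quick check on hypothetical wage numbers (my own): regressing y on the indicator and an intercept recovers β₂ = µ₀ and β₁ = µ₁ − µ₀ exactly as the slide states.

```python
import numpy as np

# Sketch: coefficients of the linear CEF E(y|x1) = b1*x1 + b2 equal the
# group mean (intercept) and the group mean difference (slope).
rng = np.random.default_rng(2)
x1 = rng.integers(0, 2, size=50_000)                 # 1 = man, 0 = woman
y = np.where(x1 == 1, 20.0, 17.5) + rng.normal(size=x1.size)

X = np.column_stack([x1, np.ones_like(x1)])          # covariates: x1, intercept
(b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)

mu1, mu0 = y[x1 == 1].mean(), y[x1 == 0].mean()
print(b2, mu0)          # intercept equals the mean for women
print(b1, mu1 - mu0)    # slope equals the male-female mean difference
```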
Multiple Indicator Variables
- notation: x₂ = 1(union)
- there are 4 conditional means: µ₁₁ for union men and µ₁₀ for nonunion men (and similarly for women)
- the conditional expectation is linear in (x₁, x₂, x₁x₂):
  E(y|x₁, x₂) = β₁x₁ + β₂x₂ + β₃x₁x₂ + β₄
- β₄: mean for nonunion women
- β₁: male wage premium for nonunion workers
- β₂: union wage premium for women
- β₃: difference in the union wage premium between men and women
- x₁x₂ is the interaction term (4 covariates)
- with p indicator variables there are 2ᵖ conditional means
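A sketch with four hypothetical cell means (my own numbers) showing how the interaction-model coefficients encode them, and that the linear CEF reproduces every cell exactly.

```python
import numpy as np

# Hypothetical conditional means mu[(x1, x2)] with x1 = 1(man), x2 = 1(union).
mu = {(1, 1): 25.0, (1, 0): 21.0, (0, 1): 22.0, (0, 0): 19.0}

b4 = mu[(0, 0)]                        # mean for nonunion women
b1 = mu[(1, 0)] - mu[(0, 0)]           # male premium among nonunion workers
b2 = mu[(0, 1)] - mu[(0, 0)]           # union premium for women
b3 = (mu[(1, 1)] - mu[(1, 0)]) - (mu[(0, 1)] - mu[(0, 0)])  # male-female gap in union premia

# The CEF is exact: 2 indicators give 2^2 = 4 means, matched by 4 coefficients.
for (x1, x2), m in mu.items():
    assert np.isclose(b1 * x1 + b2 * x2 + b3 * x1 * x2 + b4, m)
print(b1, b2, b3, b4)
```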
Categorical Covariates
- x₃ = 1 if race = white, 2 if race = black, 3 if race = other
- the values have no meaning in terms of magnitude; they only indicate the category
- E(y|x₃) is not linear in x₃
- represent x₃ with two indicator variables: x₄ = 1(black) and x₅ = 1(other)
- the conditional expectation is linear in (x₄, x₅):
  E(y|x₄, x₅) = β₁x₄ + β₂x₅ + β₃
- β₃: mean for white workers
- β₁: black-white mean difference
- β₂: other-white mean difference
- no individual is both black and other, so no interaction term is needed
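A companion sketch (hypothetical group means of my own) of the two-indicator encoding: the regression recovers the white mean and the black-white and other-white differences.

```python
import numpy as np

# Sketch: encode a 3-category covariate with two indicators, then regress.
rng = np.random.default_rng(3)
x3 = rng.integers(1, 4, size=60_000)        # 1 = white, 2 = black, 3 = other
means = {1: 20.0, 2: 18.0, 3: 19.0}         # assumed group means
y = np.vectorize(means.get)(x3) + rng.normal(size=x3.size)

x4 = (x3 == 2).astype(float)                # 1(black)
x5 = (x3 == 3).astype(float)                # 1(other)
X = np.column_stack([x4, x5, np.ones_like(x4)])
(b1, b2, b3), *_ = np.linalg.lstsq(X, y, rcond=None)

print(b3)        # about 20.0, the mean for white workers
print(b1, b2)    # about -2.0 and -1.0, the black-white and other-white differences
```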
Continuous Covariates: Linear CEF
- E(y|x) is linear in x: E(y|x) = xᵀβ, so y = xᵀβ + e
  - x = (x₁, ..., x_{k−1}, 1)ᵀ and β = (β₁, ..., β_k)ᵀ
- derivative of E(y|x): ∇ₓE(y|x) = β
- the coefficients are marginal effects
  - a marginal effect is the impact of a change in one covariate holding all other covariates fixed
  - here the marginal effect is a constant
- for general existence of the CEF, see Existence of the Conditional Mean below
Measures of Conditional Distributions
- common measure of location: the conditional mean
- common measure of dispersion: the conditional variance
- conditional variance: σ²(x) := Var(y|x) = E((y − E(y|x))²|x) = E(e²|x)
- we often report σ(x), which has the same unit of measure as y
- alternative representation: y = E(y|x) + σ(x)ɛ, with E(ɛ|x) = 0 and E(ɛ²|x) = 1
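A minimal simulation sketch of the scaled-error representation, under an assumed conditional standard deviation σ(x) = 0.5x of my own choosing: within a narrow band of x-values, the residual standard deviation matches σ(x).

```python
import numpy as np

# Sketch: y = E(y|x) + sigma(x) * eps with E(eps|x) = 0 and E(eps^2|x) = 1.
rng = np.random.default_rng(4)
x = rng.uniform(1, 3, size=200_000)
sigma = 0.5 * x                        # heteroskedastic: dispersion grows with x
eps = rng.normal(size=x.size)          # mean 0, variance 1, independent of x
y = 2 * x + sigma * eps                # CEF is m(x) = 2x

# Check the conditional dispersion within a narrow band around x = 2.
band = (x > 1.9) & (x < 2.1)
e = y[band] - 2 * x[band]
print(e.std(), 0.5 * 2.0)              # conditional sd near x = 2 is about 1
```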
Conditional Standard Deviation
- σ_men = 3.05 and σ_women = 2.81
- men have a higher average wage and more dispersion
Heteroskedasticity
- homoskedasticity: E(e²|x) = σ² (does not depend on x)
- heteroskedasticity: E(e²|x) = σ²(x) (does depend on x)
- the unconditional variance E(E(e²|x)) = E(e²) is a constant by construction, so formally this is conditional heteroskedasticity
- heteroskedasticity is the leading case for empirical analysis
Proofs
1. Proof of Adding Covariates Theorem
Review
- Why focus on E(y|x)? It is the "best" predictor.
- Suppose E(y|x) = xᵀβ. How do you interpret β? As the derivative ∇ₓE(y|x).
- What is required for causality? ∇ₓe = 0.
- When is the form of E(y|x) known exactly? When x is discrete.
- What is the leading empirical approach to dispersion? (Conditional) heteroskedasticity: E(e²|x) = σ²(x).
Finite Moment Assumption
- consider the family of Pareto densities: f(y) = a·y^(−a−1) for y > 1
- a indexes the decay rate of the tail
  - a larger a implies the tail declines to zero more quickly
Tail Behavior and Finite Moments
- for the Pareto densities:
  E|y|^r = a ∫₁^∞ y^(r−a−1) dy = a/(a−r) if r < a, and ∞ if r ≥ a
- the r-th moment is finite iff r < a
- extend beyond the Pareto distribution using tail bounds:
  f(y) ≤ A·|y|^(−a−1) for some A < ∞ and a > 0
  - f(y) is bounded below a scale of a Pareto density
- for r < a:
  E|y|^r = ∫ |y|^r f(y) dy ≤ ∫₋₁¹ f(y) dy + 2A ∫₁^∞ y^(r−a−1) dy ≤ 1 + 2A/(a−r) < ∞
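A numeric sanity check (my own, not from the slides) of the Pareto moment formula E|y|^r = a/(a−r) for r < a, via a simple Riemann sum; the grid and truncation point are arbitrary choices.

```python
import numpy as np

# Sketch: integrate y^r * f(y) over y > 1 for the Pareto density
# f(y) = a * y**(-a-1) and compare with the closed form a / (a - r).
a = 3.0
y = np.linspace(1.0, 1e4, 4_000_000)   # truncate the upper tail
dy = y[1] - y[0]
f = a * y ** (-a - 1)                  # Pareto density on y > 1

for r in [1.0, 2.0]:
    moment = np.sum(y**r * f) * dy
    print(r, moment, a / (a - r))      # numeric vs exact: about 1.5 and 3.0
# For r >= a the integrand decays like 1/y or slower, so the integral diverges:
# the r-th moment is finite only for r < a.
```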
Tail Behavior
- if the tail of a density declines at rate |y|^(−a−1) or faster, then y has finite moments up to (but not including) a
- intuitively, the restriction that y has a finite r-th moment means the tail of the density declines to zero faster than y^(−r−1)
  - finite mean (but not variance): the density declines faster than 1/y²
  - finite variance (but not third moment): the density declines faster than 1/y³
  - finite fourth moment: the density declines faster than 1/y⁵
(Return to Covariates)
Conditional Mean Existence
- if E|y| < ∞, then there exists a function m(x) such that for all measurable sets X,
  E(1(x ∈ X)·y) = E(1(x ∈ X)·m(x))   (1)
  - from probability theory, e.g. Ash (1972), Theorem 6.3.3
- m(x) is almost everywhere unique:
  - if h(x) satisfies (1), then there is a set S such that P(S) = 1 and m(x) = h(x) for x ∈ S
- m(x) is called the conditional mean and is written E(y|x)
- (1) establishes E(y) = E(E(y|x)) (take X to be the whole support of x)
General Nature of Conditional Mean
- E(y|x) exists for all finite-mean distributions
  - y can be discrete or continuous
  - x can be scalar or vector valued
  - the components of x can be discrete or continuous
- if (y, x) have a joint continuous distribution with density f(y, x), then the conditional density f_{y|x}(y|x) is well defined and
  E(y|x) = ∫ y·f_{y|x}(y|x) dy
(Return to Conditional Mean)
Proof of Adding Covariates Theorem
- Theorem: if Ey² < ∞, then Var(y) ≥ Var(y − E(y|x₁)) ≥ Var(y − E(y|x₁, x₂))
- Ey² < ∞ implies the existence of all conditional moments used in the proof
- first establish Var(y) ≥ Var(y − E(y|x₁))
  - let z₁ = E(y|x₁), so y − µ = (y − z₁) + (z₁ − µ)
    - note E((z₁ − µ)(y − z₁)|x₁) = 0, so the cross term vanishes
  - E(y − µ)² = E(y − z₁)² + E(z₁ − µ)²
    - y and z₁ both have mean µ
  - Var(y) = Var(y − z₁) + Var(z₁) ≥ Var(y − E(y|x₁))
Completion of Proof
- second, establish Var(y − E(y|x₁)) ≥ Var(y − E(y|x₁, x₂))
- let z₂ = E(y|x₁, x₂), also with mean µ
  - note E((z₂ − µ)(y − z₂)|x₁, x₂) = 0
- Var(y) = Var(y − z₂) + Var(z₂)
- it remains to show Var(z₂) ≥ Var(z₁):
  - (E(z₂|x₁))² ≤ E(z₂²|x₁)   (conditional Jensen's inequality)
  - E(E(z₂|x₁))² ≤ E(E(z₂²|x₁))   (taking unconditional expectations)
    - E(z₂|x₁) = z₁ and E(E(z₂²|x₁)) = E(z₂²)   (law of iterated expectations)
  - hence E(z₁²) ≤ E(z₂²), so Var(z₁) ≤ Var(z₂)
- therefore Var(y − E(y|x₁)) ≥ Var(y − E(y|x₁, x₂)).
(Return to Proofs)
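A simulation sketch (my own design, with assumed discrete covariates) verifying the two ingredients of the proof numerically: the decomposition Var(y) = Var(y − z) + Var(z) for z = E(y|x), and the ordering Var(z₁) ≤ Var(z₂).

```python
import numpy as np

# Sketch: check the variance decomposition and the ordering of Var(z1), Var(z2).
rng = np.random.default_rng(6)
n = 500_000
x1 = rng.integers(0, 2, size=n)
x2 = rng.integers(0, 3, size=n)
y = 1 + 2 * x1 + x2 + 0.5 * x1 * x2 + rng.normal(size=n)

# With discrete covariates the CEF is the within-cell sample mean.
def cef(y, cells):
    m = np.zeros_like(y)
    for c in np.unique(cells):
        mask = cells == c
        m[mask] = y[mask].mean()
    return m

z1 = cef(y, x1)                                  # E(y|x1)
z2 = cef(y, x1 + 2 * x2)                         # E(y|x1, x2)
print(np.var(y), np.var(y - z1) + np.var(z1))    # decomposition holds
print(np.var(z1), np.var(z2))                    # Var(z1) <= Var(z2)
```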