Quantitative Methods in Economics
Conditional Expectations

Maximilian Kasy
Harvard University, fall 2016
Roadmap, Part I
1. Linear predictors and least squares regression
2. Conditional expectations
3. Some functional forms for linear regression
4. Regression with controls and residual regression
5. Panel data and generalized least squares
Takeaways for these slides
- Probability review
- Conditional probability: share within a subpopulation
- Conditional expectations minimize average squared prediction error (a population concept)
- Like the best linear predictor, but dropping the linearity restriction
Probability review
Assume for simplicity a finite state space.
Set of states of nature: $S = \{s_1, \ldots, s_M\}$.
Probability distribution (measure): $P(s_j) \geq 0$, $\sum_{j=1}^M P(s_j) = 1$.
An event $A$ is a subset of $S$ and has probability
$$P(A) = \sum_{s \in A} P(s).$$
A random variable $Y$ is a mapping from $S$ to $\mathbb{R}$. Its expectation is
$$E(Y) = \sum_{j=1}^M Y(s_j) P(s_j).$$
Distribution of $Y$:
$$F_Y(y) = \sum_{s : Y(s) = y} P(s).$$
Questions for you
Express $E(Y)$ in terms of $F_Y$.
Solution:
$$E(Y) = \sum_{j=1}^M Y(s_j) P(s_j) = \sum_{l=1}^L y_l \Big( \sum_{j : Y(s_j) = y_l} P(s_j) \Big) = \sum_{l=1}^L y_l F_Y(y_l),$$
where $y_1, \ldots, y_L$ are the distinct values taken by $Y$.
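To make this concrete, here is a minimal numerical sketch (not from the slides; the state space and values are invented for illustration). It computes $E(Y)$ both state by state and via the distribution $F_Y$, confirming the two expressions agree.

```python
import numpy as np

# Hypothetical finite state space: M = 4 states with probabilities summing to 1.
P = np.array([0.1, 0.2, 0.3, 0.4])   # P(s_j)
Y = np.array([1.0, 2.0, 1.0, 3.0])   # Y(s_j); Y takes L = 3 distinct values

# Expectation state by state: E(Y) = sum_j Y(s_j) P(s_j)
EY_states = np.sum(Y * P)

# Expectation via the distribution: E(Y) = sum_l y_l F_Y(y_l),
# where F_Y(y) sums P(s) over the states with Y(s) = y
values = np.unique(Y)
F_Y = np.array([P[Y == y].sum() for y in values])
EY_dist = np.sum(values * F_Y)

print(EY_states, EY_dist)  # both equal 2.0
```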
Conditional Probability
If $P(B) > 0$ and $s_j \in B$,
$$P(s_j \mid B) = \frac{P(s_j)}{\sum_{s \in B} P(s)} = P(s_j)/P(B); \qquad P(s_j \mid B) = 0 \text{ if } s_j \notin B.$$
Same properties as unconditional probability: $P(s_j \mid B) \geq 0$, $\sum_{j=1}^M P(s_j \mid B) = 1$.
Conditional probability for events:
$$P(A \mid B) = \sum_{s \in A} P(s \mid B) = \frac{\sum_{s \in A \cap B} P(s)}{P(B)} = P(A \cap B)/P(B).$$
Properties of conditional probability
Partition formula: if the collection of disjoint subsets $\{B_j\}$ forms a partition of $S$, then
$$P(A) = \sum_j P(A \mid B_j) P(B_j).$$
Bayes rule:
$$P(B_i \mid A) = \frac{P(A \mid B_i) P(B_i)}{P(A)} = \frac{P(A \mid B_i) P(B_i)}{\sum_j P(A \mid B_j) P(B_j)}.$$
Questions for you
Prove these formulas.
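As a sanity check (my addition, with an invented event and partition), the following sketch verifies both formulas numerically on a small finite state space: the states are split into blocks $B_1, B_2$, and $P(A)$ and $P(B_1 \mid A)$ are computed directly and via the formulas.

```python
import numpy as np

P = np.array([0.1, 0.2, 0.3, 0.4])          # P(s_j) on S = {s_1, ..., s_4}
A  = np.array([True, True, False, True])    # event A as an indicator over states
B1 = np.array([True, True, False, False])   # partition block B_1
B2 = ~B1                                    # partition block B_2

def prob(event):
    return P[event].sum()

def cond(event, given):
    return P[event & given].sum() / P[given].sum()

# Partition formula: P(A) = sum_j P(A | B_j) P(B_j)
lhs = prob(A)
rhs = cond(A, B1) * prob(B1) + cond(A, B2) * prob(B2)

# Bayes rule: P(B_1 | A) = P(A | B_1) P(B_1) / P(A)
bayes = cond(A, B1) * prob(B1) / prob(A)

print(lhs, rhs)            # both 0.7
print(cond(B1, A), bayes)  # both equal
```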
Conditional Expectation
Conditional expectation given event $B$:
$$E(Y \mid B) = \sum_{j=1}^M Y(s_j) P(s_j \mid B).$$
Conditional expectation given a random variable $X$:
Let $X : S \to \{x_1, \ldots, x_K\}$ be another random variable. The regression function $r : \{x_1, \ldots, x_K\} \to \mathbb{R}$ has
$$r(x) = E(Y \mid X = x) = E(Y \mid \{s \in S : X(s) = x\}).$$
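A small illustrative sketch (states and values are made up): the regression function $r(x) = E(Y \mid X = x)$ is just the probability-weighted average of $Y$ within the subpopulation of states where $X(s) = x$.

```python
import numpy as np

P = np.array([0.1, 0.2, 0.3, 0.4])   # P(s_j)
X = np.array([0, 0, 1, 1])           # X(s_j)
Y = np.array([1.0, 2.0, 1.0, 3.0])   # Y(s_j)

def r(x):
    """Regression function: E(Y | X = x), a weighted average over {s : X(s) = x}."""
    sub = (X == x)
    return np.sum(Y[sub] * P[sub]) / P[sub].sum()

print(r(0), r(1))  # 0.5/0.3 ≈ 1.667 and 1.5/0.7 ≈ 2.143
```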
Joint Distribution. The random vector $(X, Y)$ maps $S$ into $\mathcal{X} \times \mathcal{Y} = \{(x, y) : x \in \{x_1, \ldots, x_K\},\ y \in \{y_1, \ldots, y_L\}\}$, and induces
$$F_{XY}(x, y) = P(X = x, Y = y) = \sum_{s \in S : X(s) = x,\, Y(s) = y} P(s)$$
for $(x, y) \in \mathcal{X} \times \mathcal{Y}$.
Marginal distributions $F_X : \mathcal{X} \to [0, 1]$ and $F_Y : \mathcal{Y} \to [0, 1]$:
$$F_X(x) = \sum_{y \in \mathcal{Y}} F_{XY}(x, y), \qquad F_Y(y) = \sum_{x \in \mathcal{X}} F_{XY}(x, y).$$
Conditional distribution. If $P(X = x) \neq 0$,
$$F_{Y|X}(y \mid x) = F_{XY}(x, y) / F_X(x).$$
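The following sketch (with an invented $2 \times 3$ joint distribution) tabulates $F_{XY}$ as a matrix and recovers the marginals by summing over rows or columns, and the conditional $F_{Y|X}$ by dividing by the marginal of $X$.

```python
import numpy as np

# Hypothetical joint distribution F_XY(x, y): rows indexed by x, columns by y.
F_XY = np.array([[0.10, 0.20, 0.10],
                 [0.05, 0.25, 0.30]])
assert np.isclose(F_XY.sum(), 1.0)

F_X = F_XY.sum(axis=1)             # marginal of X: sum over y
F_Y = F_XY.sum(axis=0)             # marginal of Y: sum over x
F_Y_given_X = F_XY / F_X[:, None]  # conditional: F_XY(x, y) / F_X(x)

print(F_X)                         # [0.4, 0.6]
print(F_Y)                         # [0.15, 0.45, 0.4]
print(F_Y_given_X.sum(axis=1))     # each row sums to 1
```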
Optimal Prediction
Recall the definition of the best linear predictor:
$$\hat{Y} = \beta_0 + \beta_1 X, \qquad \min_{\beta_0, \beta_1} E(Y - \hat{Y})^2.$$
Now consider the same problem, without linearity:
$$\min_g E[Y - g(X)]^2.$$
Questions for you
Rewrite $E[Y - g(X)]^2$ in terms of $F_{XY}(x, y)$.
Now rewrite it using $F_{Y|X}(y \mid x)$ and $F_X(x)$.
Solve for the optimal $g$.
Solution:
Rewriting:
$$E[Y - g(X)]^2 = \sum_{x, y} [y - g(x)]^2 F_{XY}(x, y) = \sum_x \Big( \sum_y [y - g(x)]^2 F_{Y|X}(y \mid x) \Big) F_X(x).$$
Thus:
$$\arg\min_c \sum_y (y - c)^2 F_{Y|X}(y \mid x) = E(Y \mid X = x).$$
So the optimal choice of the function $g$ is the regression function $r$.
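To illustrate numerically (a sketch with simulated data, not from the slides): on a Monte Carlo sample with a nonlinear signal, the conditional mean attains a lower mean squared prediction error than the best linear predictor, as the solution above implies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.integers(0, 3, size=n)    # X uniform on {0, 1, 2}
Y = X**2 + rng.normal(size=n)     # nonlinear signal plus noise

# Conditional mean: r(x) = E(Y | X = x), estimated by within-group averages
r = np.array([Y[X == x].mean() for x in range(3)])
mse_cond = np.mean((Y - r[X])**2)

# Best linear predictor: minimize E(Y - b0 - b1 X)^2 over b0, b1
b1, b0 = np.polyfit(X, Y, deg=1)
mse_lin = np.mean((Y - (b0 + b1 * X))**2)

print(mse_cond, mse_lin)  # mse_cond ≈ 1 (the noise variance); mse_lin is larger
```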
General state spaces and continuous distributions
So far, we considered discrete state spaces. For continuous distributions, we can derive most of the same results.
Let's start with the uniform distribution on $[a, b]$:
$$P([c, d]) = \frac{1}{b - a} \int_c^d dx = \frac{d - c}{b - a} \qquad (a \leq c < d \leq b).$$
If $A \subseteq [a, b]$,
$$P(A) = \frac{1}{b - a} \int_A dx.$$
Expectation
General notation:
$$E(Y) = \int_S Y(s)\, dP(s) = \int y\, dF_Y(y),$$
with $F_Y(B) = P\{s : Y(s) \in B\}$.
If the induced distribution for $Y$ is uniform on $[a, b]$, then
$$E(Y) = \frac{1}{b - a} \int_a^b y\, dy.$$
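A quick numerical sketch (my addition; the interval and events are hypothetical): draw from the uniform distribution on $[a, b]$ and check both $P([c, d]) = (d - c)/(b - a)$ and the expectation formula, whose integral evaluates to $(a + b)/2$.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 6.0                        # hypothetical interval
y = rng.uniform(a, b, size=1_000_000)  # draws from the uniform distribution on [a, b]

# P([c, d]) = (d - c)/(b - a), checked against the empirical frequency
c, d = 3.0, 5.0
print(np.mean((c <= y) & (y <= d)), (d - c) / (b - a))  # both ≈ 0.5

# E(Y) = (1/(b - a)) * integral_a^b y dy = (a + b)/2
print(y.mean(), (a + b) / 2)                             # both ≈ 4.0
```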
Generalizing our previous definitions
Joint distribution:
$$F_{XY}(B \times A) = P\{s : X(s) \in B, Y(s) \in A\}.$$
Marginal distributions:
$$F_X(B) = F_{XY}(B \times \mathbb{R}), \qquad F_Y(A) = F_{XY}(\mathbb{R} \times A).$$
Conditional distribution:
$$F_{Y|X}(A \mid x) = P(Y \in A \mid X = x).$$
Partition formula:
$$F_{XY}(B \times A) = \int_B F_{Y|X}(A \mid x)\, dF_X(x).$$
Bayes rule:
$$P(X \in B \mid Y \in A) = \frac{\int_B P(Y \in A \mid X = x)\, dF_X(x)}{\int P(Y \in A \mid X = x)\, dF_X(x)}.$$
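Here is a simulation-based sketch (the normal model is my choice of illustration): with $X$ standard normal and $Y = X + $ noise, it checks the partition formula by comparing the joint probability $P(X \in B, Y \in A)$ with the integral of $P(Y \in A \mid X = x)$ against $dF_X(x)$, approximated as a Monte Carlo average over draws of $X$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.normal(size=n)
Y = X + rng.normal(size=n)   # so Y | X = x is N(x, 1)

B = (-1.0, 0.5)              # event {X in B}
A = (0.0, 2.0)               # event {Y in A}

# Left-hand side: joint probability P(X in B, Y in A), by simulation
lhs = np.mean((B[0] <= X) & (X <= B[1]) & (A[0] <= Y) & (Y <= A[1]))

# Right-hand side: integral over B of F_{Y|X}(A | x) dF_X(x),
# i.e. E[ 1{X in B} * P(Y in A | X) ], using the closed-form conditional probability
cond_prob = norm.cdf(A[1] - X) - norm.cdf(A[0] - X)
rhs = np.mean(((B[0] <= X) & (X <= B[1])) * cond_prob)

print(lhs, rhs)  # approximately equal
```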
Densities
Joint density $f_{XY}$ (with respect to Lebesgue measure on $\mathbb{R}^2$):
$$F_{XY}(B \times A) = \int_{B \times A} f_{XY}(x, y)\, dx\, dy = \int_A \Big( \int_B f_{XY}(x, y)\, dx \Big) dy = \int_B \Big( \int_A f_{XY}(x, y)\, dy \Big) dx.$$
Marginal densities:
$$f_X(x) = \int f_{XY}(x, y)\, dy, \qquad f_Y(y) = \int f_{XY}(x, y)\, dx.$$
Conditional density:
$$f_{Y|X}(y \mid x) = f_{XY}(x, y) / f_X(x), \qquad F_{Y|X}(A \mid x) = \int_A f_{Y|X}(y \mid x)\, dy.$$
The joint density factors into the product of the conditional density and the marginal density:
$$f_{XY}(x, y) = f_{Y|X}(y \mid x) f_X(x).$$
Partition formula:
$$f_Y(y) = \int f_{XY}(x, y)\, dx = \int f_{Y|X}(y \mid x) f_X(x)\, dx.$$
Bayes rule:
$$f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x) f_X(x)}{\int f_{Y|X}(y \mid x) f_X(x)\, dx}.$$
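A grid-based sketch of Bayes rule for densities (the normal-normal model here is my own illustration, not from the slides): with $X \sim N(0, 1)$ and $Y \mid X = x \sim N(x, 1)$, it computes $f_{X|Y}(x \mid y)$ on a grid by multiplying $f_{Y|X}$ by $f_X$ and normalizing by the partition-formula integral. For this model the posterior mean is known to be $y/2$, which the sketch recovers.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-6, 6, 2001)           # grid over the support of X
dx = x[1] - x[0]
y_obs = 1.5                            # a hypothetical observed value of Y

f_X = norm.pdf(x)                      # marginal density: X ~ N(0, 1)
f_Y_given_X = norm.pdf(y_obs, loc=x)   # conditional density: Y | X = x ~ N(x, 1)

# Partition formula: f_Y(y) = integral of f_{Y|X}(y | x) f_X(x) dx
f_Y = np.sum(f_Y_given_X * f_X) * dx

# Bayes rule: f_{X|Y}(x | y) = f_{Y|X}(y | x) f_X(x) / f_Y(y)
f_X_given_Y = f_Y_given_X * f_X / f_Y

posterior_mean = np.sum(x * f_X_given_Y) * dx
print(posterior_mean)  # ≈ y_obs / 2 = 0.75 for this normal-normal model
```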
Law of iterated expectations
For random variables $X$ and $Y$,
$$E[E[Y \mid X]] = E[Y].$$
Proof for the continuous case:
$$E[E[Y \mid X]] = \int \Big( \int y f_{Y|X}(y \mid x)\, dy \Big) f_X(x)\, dx = \int\!\!\int y f_{Y|X}(y \mid x) f_X(x)\, dy\, dx = \int\!\!\int y f_{XY}(x, y)\, dy\, dx = \int y \Big( \int f_{XY}(x, y)\, dx \Big) dy = \int y f_Y(y)\, dy = E[Y].$$
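To close, a simulation sketch of the law of iterated expectations (the data-generating process is invented for illustration): averaging the estimated conditional means $E[Y \mid X = x]$ over the distribution of $X$ reproduces the unconditional mean $E[Y]$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.integers(0, 3, size=n)       # X uniform on {0, 1, 2}
Y = np.exp(X) + rng.normal(size=n)   # some dependence of Y on X

# Inner step: estimate E[Y | X = x] by within-group averages
cond_means = np.array([Y[X == x].mean() for x in range(3)])

# Outer step: average the conditional means over the distribution of X
lie = cond_means[X].mean()           # E[E[Y | X]]

print(lie, Y.mean())                 # both ≈ E[Y]
```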