L6: Regression II. JJ Chen. July 2, 2015

1 L6: Regression II JJ Chen July 2, 2015

2 Today's Plan
Review basic inference based on the sample average and the difference in sample averages.
Extrapolate that knowledge to sample regression coefficients: standard error and robust standard error, hypothesis testing, and confidence intervals.

3 Population Parameters

4 Toy Population Data
Some population data on people's college type and their later earnings:
$Y_i$: earnings
$P_i$: dummy variable for attending a private college
Suppose there are 1,000,000 people in the population.
With this many observations, measures such as the expected value, variance, and standard deviation are useful for summarizing the data.

5 Data
Source: local data frame [1,000,000 x 2], with columns P and Y

6 Distribution: Count

7 Distribution: Density

8 E, Var, SD
E(Y)    Var(Y)      SD(Y)     n(Y)
78,     ,266,203    19,500    1,000,000
$E(Y)$, $Var(Y)$, and $SD(Y)$ are population parameters. Their Greek names are $\mu_Y$, $\sigma^2_Y$, and $\sigma_Y$.

9 E, Var, SD
E(Y)    Var(Y)      SD(Y)     n(Y)
78,     ,759,816    19,462    1,000,000
The expected value measures central tendency.
The variance measures the dispersion of the data.
The standard deviation is the square root of the variance, so SD has the same unit as $Y_i$.
Expectation and standard deviation are also useful for normalization.
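A minimal R sketch of how these three population parameters could be computed. The data frame `college` below is simulated, not the lecture's data: the share of private-college attendees, the normal earnings distributions, and the reduced population size of 100,000 are all assumptions chosen only to resemble the magnitudes on the slides.

```r
set.seed(1)
N <- 100000                       # assumed (smaller) population size
P <- rbinom(N, size = 1, prob = 0.43)
Y <- ifelse(P == 1,
            rnorm(N, mean = 90000, sd = 15000),
            rnorm(N, mean = 70000, sd = 18000))
college <- data.frame(P = P, Y = Y)

mu_Y  <- mean(college$Y)                 # E(Y)
var_Y <- mean((college$Y - mu_Y)^2)      # Var(Y): a population variance divides by N
sd_Y  <- sqrt(var_Y)                     # SD(Y), same unit as Y

c(E = mu_Y, Var = var_Y, SD = sd_Y)
```

Note that `var()` in R divides by N - 1, which is why the population variance is computed by hand here.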

10 Normalization I
To review the idea of normalization (something you already know from stat class), suppose we also have data on another population, maybe a country that has severe inflation.
E(Y2)        Var(Y2)                SD(Y2)    N(Y2)
7,870,027    14,751,343,877,384     3,840,    ,000

11 Normalization II
Suppose we want to compare two persons' earnings, but they are from two different populations:
person A from population 1: $48,527
person Z from population 2: #4,558,819
A natural thing to do is to normalize their earnings so that they are comparable.
There are many ways to normalize, usually by finding a fixed unit, for example:
the amount of dollars person Z can buy
the number of coffee beans both persons can buy

12 Normalization III
Expectation and standard deviation can be used to do normalization.
Step 1: Find the deviation from the mean.
Step 2: Rescale the deviation using the standard deviation.
For example:
Person A: $\frac{48{,}527 - E[Y_i]}{SD(Y_i)} = \frac{48{,}527 - 78{,}541}{19{,}456} = -1.5$
Person Z: $\frac{4{,}558{,}819 - E[Y2_i]}{SD(Y2_i)}$
These scores are normalized deviations from the means of the two populations.

13 Normalization IV
Suppose we randomly pick two persons from the first population:
Person A from population 1: $48,527
Person B from population 1: $95,301
We can also use normalization to say something about the relative position of their earnings:
Person A: $\frac{48{,}527 - E[Y_i]}{SD(Y_i)} = \frac{48{,}527 - 78{,}541}{19{,}456} = -1.5$
Person B: $\frac{95{,}301 - E[Y_i]}{SD(Y_i)} = \frac{95{,}301 - 78{,}541}{19{,}456} = 0.86$
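The two normalization steps can be sketched in a few lines of R. The function name `normalize` is made up for illustration. The population 1 mean and standard deviation (78,541 and 19,456) are the values used in the slide's arithmetic, and the population 2 mean and variance are taken from the Normalization I slide.

```r
# Step 1: deviation from the mean; Step 2: rescale by the standard deviation
normalize <- function(y, mu, sigma) (y - mu) / sigma

# Population 1
z_A <- normalize(48527, mu = 78541, sigma = 19456)
z_B <- normalize(95301, mu = 78541, sigma = 19456)

# Population 2: the SD is the square root of the reported variance
z_Z <- normalize(4558819, mu = 7870027, sigma = sqrt(14751343877384))

round(c(A = z_A, B = z_B, Z = z_Z), 2)
```

Under these inputs person A sits about 1.5 standard deviations below the mean of population 1, while person B and person Z sit about 0.86 standard deviations above and below their respective means.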

14 Conditional Parameters
Back to the first population, we can focus on more interesting population expectations:
CE: $E[Y^1_i] = E[Y_i \mid P_i = 1]$
CE: $E[Y^0_i] = E[Y_i \mid P_i = 0]$
Together they make a CEF: $E[Y_i \mid P_i]$
For each group, we can also find its population variance (and standard deviation):
CV: $Var(Y^1_i)$, $Var(Y^0_i)$
Sometimes the CVs are the same, but most of the time they are not.

15 Conditional Distribution

16 Pop Scatter Plot

17 Pop Scatter Plot: Jitter Version

18 Pop Conditional Expectation

19 Pop Conditional Expectation Func

20 Population Regression
Call:
lm(formula = Y ~ P, data = college)
Coefficients:
(Intercept)  P

21 Population Regression: Plot

22 Pop Expectation
P    Cond. E(Y)    Std. Dev.    Total Observations
0 69,978 17, , ,015 14, ,967

23 Population Parameters
To summarize, we have the following fixed population parameters:
$E[Y_i] = 78{,}614$
$SD(Y_i) = 19{,}466$
$E[Y_i \mid P_i = 0] = \mu^0 = 69{,}949$
$E[Y_i \mid P_i = 1] = \mu^1 = 89{,}926$
$E[Y_i \mid P_i = 1] - E[Y_i \mid P_i = 0] = \Delta\mu = 19{,}977$
$SD(Y^0) = \sigma_{Y^0} = 17{,}981$
$SD(Y^1) = \sigma_{Y^1} = 14{,}976$
Reg. intercept: $\alpha = 69{,}949$
Reg. slope: $\beta = 19{,}977$

24 Unknown Population
Of course, we don't have population data for many problems, so we really don't know all the population parameters.
Statisticians use a sample to make inferences about a population.

25 Inference: Sample Average

26 Sample 1
Suppose we get a 1% sample from the population.
Sample 1: Summary Statistics
Sample Average    Sample Variance    Sample Standard Deviation    Sample Size
78,               ,454,177           19,505                       10,000
Sample 1: Summary Statistics for Groups
P    Sample Average    Sample Variance    Sample Standard Deviation    Sample Size
0    69,               ,719,278           17,909                       5,
1    ,                 ,719,421           15,057                       4,303

27 Sample 2
Suppose we are lucky and have another 1% sample from the population.
Sample 2: Summary Statistics
Sample Average    Sample Variance    Sample Standard Deviation    Sample Size
78,               ,841,427           19,541                       10,000
Sample 2: Summary Statistics for Groups
P    Sample Average    Sample Variance    Sample Standard Deviation    Sample Size
0    69,               ,349,745           17,842                       5,
1    ,                 ,887,797           15,063                       4,330

28 Sample 3
Another 1% sample.
Sample 3: Summary Statistics
Sample Average    Sample Variance    Sample Standard Deviation    Sample Size
78,               ,556,306           19,585                       10,000
Sample 3: Summary Statistics for Groups
P    Sample Average    Sample Variance    Sample Standard Deviation    Sample Size
0    70,               ,989,121           18,083                       5,
1    ,                 ,058,638           15,201                       4,336

29 Sample Average is Random
All 3 samples give different sample statistics.
Since the sample average is random, it also has a mean and a standard error:
$E(\bar{Y}^1) = E(Y^1_i)$
$SE(\bar{Y}^1) = \frac{\sigma_{Y^1}}{\sqrt{n}}$
The standard error comes from the square root of the sampling variance, $Var(\bar{Y}^1)$.

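30 Review Sampling Variance
$$\begin{aligned}
V(\bar{Y}) &= V\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) && \text{def. of } \bar{Y} \\
&= \frac{1}{n^2}\, V\left(\sum_{i=1}^{n} Y_i\right) && \text{since } V(aY) = a^2 V(Y) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} V(Y_i) && \text{each } Y_i \text{ is independently drawn} \\
&= \frac{1}{n^2} \sum_{i=1}^{n} V(Y) && \text{each } Y_i \text{ is from the same distribution}
\end{aligned}$$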

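31 Review of Sampling Variance
Simplifying further:
$$\begin{aligned}
V(\bar{Y}) &= \frac{1}{n^2} \sum_{i=1}^{n} V(Y) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2_Y && \text{Greek name} \\
&= \frac{n \sigma^2_Y}{n^2} && \text{sum of } n \text{ identical quantities} \\
&= \frac{\sigma^2_Y}{n} && \text{cancel } n
\end{aligned}$$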
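The result $V(\bar{Y}) = \sigma^2_Y / n$ can be checked by simulation. This is only a sketch: the normal population, its mean of 50, its standard deviation of 10, and the sample size of 100 are assumptions made for illustration, not values from the lecture.

```r
set.seed(42)
sigma <- 10     # assumed population standard deviation
n     <- 100    # size of each sample
reps  <- 2000   # number of repeated samples

# Draw `reps` independent samples of size n and record each sample average
ybar <- replicate(reps, mean(rnorm(n, mean = 50, sd = sigma)))

simulated_var   <- var(ybar)     # variance of the sample averages across samples
theoretical_var <- sigma^2 / n   # sigma_Y^2 / n from the derivation

c(simulated = simulated_var, theoretical = theoretical_var)
```

The two numbers should be close, and the gap shrinks as `reps` grows.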

32 The Problem
All 3 samples give different sample statistics, but they are all very close to the population parameter:
Sample 1: $\bar{Y}^1 = 90{,}132$
Sample 2: $\bar{Y}^1 = 89{,}888$
Sample 3: $\bar{Y}^1 = 90{,}303$
Population: $E[Y^1_i] = 89{,}926$
The difficulties, of course, are that
1. We don't know the population parameters
2. We don't have that many samples
The problem is: how do we draw inference based on just one sample?

33 Hypothesis Testing
Hypothesis testing is one way of drawing inference.
We don't know the population parameters? Make a guess, say $\mu^1 = 90{,}000$.
We only have one sample? Normalize based on (1) the assumed expectation and (2) the standard error of the sample average, and find the relative position. For sample 1:
$$\frac{\bar{Y}^1 - \mu^1}{SE(\bar{Y}^1)} = \frac{90{,}132 - 90{,}000}{SE(\bar{Y}^1)}$$

34 Scaled by Est. SE
Very bad: the standard error of $\bar{Y}^1$ also contains a population parameter we don't know, $\sigma_Y$:
$$SE(\bar{Y}^1) = \frac{\sigma_Y}{\sqrt{n}}$$
We use the sample standard deviation $\hat{\sigma}_Y$ instead and get the estimated SE:
$$\text{Est. } SE(\bar{Y}^1) = \widehat{SE}(\bar{Y}^1) = \frac{\hat{\sigma}_Y}{\sqrt{n}}$$

35 Rescaling Scores

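36 The rescaling score based on sample 1 is
$$\frac{\bar{Y}^1 - \mu^1}{\widehat{SE}(\bar{Y}^1)} = \frac{90{,}132 - 90{,}000}{\widehat{SE}(\bar{Y}^1)} = \frac{90{,}132 - 90{,}000}{\hat{\sigma}_Y / \sqrt{n}} = \frac{90{,}132 - 90{,}000}{\ldots{,}964 / \sqrt{10{,}000}} = 0.73$$
That is, under the hypothesis that $\mu^1 = 90{,}000$, the sample average lies 0.73 standard errors above the hypothesized mean.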

37 Is it good? t value and t stat
The score is a particular value calculated from the rescaling formula:
$$t = \frac{\bar{Y}^1 - \mu^1}{\widehat{SE}(\bar{Y}^1)}$$
We call the score a t value, and the formula is used to construct a t statistic.
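A small R sketch of the rescaling formula. The helper name `t_value` and the four-number toy sample are made up for illustration; the comparison against `t.test()` is there only to show that the hand formula is the same quantity base R reports.

```r
# t value for the guess mu0, using the estimated standard error sd(y) / sqrt(n)
t_value <- function(y, mu0) {
  se_hat <- sd(y) / sqrt(length(y))
  (mean(y) - mu0) / se_hat
}

y <- c(2, 4, 6, 8)                      # hypothetical toy sample
t_toy     <- t_value(y, mu0 = 4)
t_builtin <- unname(t.test(y, mu = 4)$statistic)

c(by_hand = t_toy, t.test = t_builtin)
```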

38 t and the CLT
The t stat has a very good property: if $E[Y_i]$ is indeed equal to $\mu$, then as long as the sample is large enough, the quantity $t(\mu)$ has a sampling distribution that is very close to a bell-shaped standard normal distribution, regardless of how $Y_i$ is distributed.

39 Hypothesis Testing Reasoning
Say sample 1 gives a sample average of 90,132.
Make a guess about the population expectation, say 90,000.
Normalize the sample average and get the t value.

40 Hypothesis Testing Reasoning
If our guess is indeed true and the sample is large, the t stat should follow a standard normal distribution, so a t value of 0.73 is not bad (it is within the range $[-2, +2]$).
That is, under the null hypothesis that our guess is true, the likelihood of getting a t value at least this far from zero is acceptable (two-sided p value ≈ 0.47, not that small).
Conclusion/Decision: The sample doesn't provide strong evidence to reject the null; we choose to live with our guess unless we find new evidence later.
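The reasoning can be reproduced with the standard normal distribution in base R. One point worth separating: `pnorm(0.73)` is the lower-tail probability (about 0.77), whereas the two-sided p value is the probability of a t value at least as far from zero as 0.73 (about 0.47). The variable names below are illustrative.

```r
t_val <- 0.73

lower_tail  <- pnorm(t_val)              # P(Z <= 0.73)
p_two_sided <- 2 * pnorm(-abs(t_val))    # P(|Z| >= 0.73)

within_band <- abs(t_val) <= 2           # the [-2, +2] rule of thumb

round(c(lower_tail = lower_tail, p_two_sided = p_two_sided), 2)
```

Either way the conclusion is the same: the t value falls well inside the band, so the sample does not give strong evidence against the guess.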

41 Confidence Interval
Making a reasonable guess asks us to think; to save that effort, why not just give a range of plausible population expectations?
A confidence interval is another way of drawing inference.
When calculated in repeated samples, the 95% confidence interval contains the population parameter approximately 95% of the time.
The first sample gives a CI:
$$[\bar{Y} - 2\,\widehat{SE}(\bar{Y}),\ \bar{Y} + 2\,\widehat{SE}(\bar{Y})] = [89{,}773,\ \ldots]$$
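The repeated-sampling interpretation can be illustrated by simulation. This is a sketch under assumed values (a normal population with mean 90,000 and standard deviation 15,000, samples of size 400), none of which come from the lecture's data.

```r
set.seed(7)
mu <- 90000; sigma <- 15000   # assumed population parameters
n <- 400; reps <- 2000        # sample size and number of repeated samples

# For each sample, build the interval ybar +/- 2 * est. SE and check if it covers mu
covers <- replicate(reps, {
  y      <- rnorm(n, mean = mu, sd = sigma)
  se_hat <- sd(y) / sqrt(n)
  lower  <- mean(y) - 2 * se_hat
  upper  <- mean(y) + 2 * se_hat
  lower <= mu && mu <= upper
})

coverage <- mean(covers)   # share of intervals that contain mu
coverage
```

The share should come out near 0.95, which is what the "95%" in the name refers to.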

42 Inference: Difference in Sample Average

43 HT and CI
We're also interested in the difference in population expectations, $\Delta\mu$ (a naive comparison).
Making inference based on the difference in sample averages is almost the same.
Hypothesis testing: a normalization score based on the difference in sample averages, $\Delta\bar{Y}$, and its standard error, $SE(\Delta\bar{Y})$.
Confidence interval: $[\Delta\bar{Y} - 2\,\widehat{SE}(\Delta\bar{Y}),\ \Delta\bar{Y} + 2\,\widehat{SE}(\Delta\bar{Y})]$
The only complication is the estimated standard error formula, $\widehat{SE}(\Delta\bar{Y})$.

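44 Complication
To see the estimated standard error formula, recall that it's based on a standard error formula, which is based on the sampling variance of $\Delta\bar{Y} = \bar{Y}^1 - \bar{Y}^0$:
$$\begin{aligned}
V(\Delta\bar{Y}) &= V(\bar{Y}^1 - \bar{Y}^0) && \text{def. of } \Delta\bar{Y} \\
&= V(\bar{Y}^1) + V(\bar{Y}^0) && \text{each } Y_i \text{ is independently drawn} \\
&= \frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0} && \text{def. of sampling var.}
\end{aligned}$$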

45 Complication
$$V(\Delta\bar{Y}) = \frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0}$$
The complication is whether you want to assume that $V^1(Y_i) = V^0(Y_i)$, that is, whether earnings in the two groups have the same population variance (and hence the same standard deviation) of the underlying variable $Y_i$.

46 Pop Box Plot

47 Pop Conditional Distribution

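48 Unequal Var.
In our particular example, the population group variances are different, so we use the sampling variance, standard error, and estimated standard error for unequal group variances:
$$V(\Delta\bar{Y}) = \frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0}$$
$$SE(\Delta\bar{Y}) = \sqrt{\frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0}}$$
$$\widehat{SE}(\Delta\bar{Y}) = \sqrt{\frac{\hat{V}^1(Y_i)}{n_1} + \frac{\hat{V}^0(Y_i)}{n_0}}$$

49 Equal Var.
If you're willing to assume equal group variances, the three formulas can be simplified as
$$V(\Delta\bar{Y}) = \frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0} = V(Y_i)\left[\frac{1}{n_1} + \frac{1}{n_0}\right]$$
$$SE(\Delta\bar{Y}) = SD(Y_i)\sqrt{\frac{1}{n_1} + \frac{1}{n_0}}$$
$$\widehat{SE}(\Delta\bar{Y}) = \widehat{SD}(Y_i)\sqrt{\frac{1}{n_1} + \frac{1}{n_0}}$$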
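Both estimated standard errors can be sketched in base R. The function names `se_unequal` and `se_equal` and the simulated groups are hypothetical. One assumption to flag: under equal variances the common standard deviation is estimated here with the usual pooled estimator, which the slides do not spell out.

```r
# Est. SE of the difference in sample averages, unequal group variances
se_unequal <- function(y1, y0) {
  sqrt(var(y1) / length(y1) + var(y0) / length(y0))
}

# Est. SE under equal group variances (pooled SD is an assumption of this sketch)
se_equal <- function(y1, y0) {
  n1 <- length(y1); n0 <- length(y0)
  pooled_var <- ((n1 - 1) * var(y1) + (n0 - 1) * var(y0)) / (n1 + n0 - 2)
  sqrt(pooled_var) * sqrt(1 / n1 + 1 / n0)
}

set.seed(3)
y1 <- rnorm(430, mean = 90000, sd = 15000)  # hypothetical private-college group
y0 <- rnorm(570, mean = 70000, sd = 18000)  # hypothetical comparison group

c(unequal = se_unequal(y1, y0), equal = se_equal(y1, y0))
```

When the group variances and group sizes both differ, as in this toy draw, the two numbers do not coincide, which is why the choice of assumption matters.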

50 What to Choose?
In most cases, especially when the data are not from an experimental setting, the assumption of equal variances is too strong.

51 Hypothesis Testing Reasoning
Assuming the group variances are not equal, we can now perform hypothesis testing, just like what we did for the sample average.
We have a difference in sample averages, $\Delta\bar{Y}$, but we want to know the difference in population expectations, $\Delta\mu = \Delta E[Y_i] = E[Y^1_i \mid P_i = 1] - E[Y^0_i \mid P_i = 0]$.
We don't know the population parameters, so we make a guess, say $\Delta\mu$ = a number.
Normalize the difference in sample averages and get a t value.
Given the guess, see if the t value is so dramatic that it's unacceptable.

52 Confidence Interval
Again, the confidence interval is just the similar bound: $[\Delta\bar{Y} - 2\,\widehat{SE}(\Delta\bar{Y}),\ \Delta\bar{Y} + 2\,\widehat{SE}(\Delta\bar{Y})]$

53 Test
Sample Means by Group
P    mean(y)    var(y)      sd(y)     n(y)
0    69,        ,719,278    17,909    5,
1    ,          ,719,421    15,057    4,303
Sample Mean Comparison
Difference in Sample Mean, with Est. Std. Err. in parentheses: (323.5)

54 Inference: Sample Regression Coefficient

55 Bivariate Sample Regression
Consider again a bivariate regression with a dummy variable, but this time the regression is estimated on a sample.
Call:
lm(formula = Y ~ P, data = sam1)
Coefficients:
(Intercept)  P

56 Estimated Regression
The estimated regression is
$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} P_i = \ldots + \ldots\, P_i$$
How would you interpret the estimated slope?

57 The Logic Carries
The estimated slope gives the difference in sample averages.
So if we're worried about the sampling uncertainty associated with the estimated slope, we can also use an estimated standard error for hypothesis testing and confidence intervals.
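The claim that the estimated slope on a dummy equals the difference in sample averages is easy to verify in R. The data frame `sam` below is simulated for illustration (it is not the lecture's `sam1`), and the coefficients used to generate it are assumptions.

```r
set.seed(11)
P <- rep(c(0, 1), times = c(57, 43))              # hypothetical dummy variable
Y <- 70000 + 20000 * P + rnorm(100, sd = 15000)   # hypothetical earnings
sam <- data.frame(P = P, Y = Y)

fit <- lm(Y ~ P, data = sam)
intercept <- unname(coef(fit)["(Intercept)"])
slope     <- unname(coef(fit)["P"])

mean_0 <- mean(sam$Y[sam$P == 0])      # sample average of the P = 0 group
mean_1 <- mean(sam$Y[sam$P == 1])      # sample average of the P = 1 group
diff_means <- mean_1 - mean_0

c(slope = slope, diff_means = diff_means, intercept = intercept, mean_0 = mean_0)
```

The intercept reproduces the P = 0 group average and the slope reproduces the gap between the two group averages.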

58 Displaying Reg. Results
Software always gives estimated coefficients and estimated standard errors:
  term         estimate  std.error  statistic  p.value
1 (Intercept)
2 P
It's customary to display both the estimated regression coefficients and their estimated standard errors.

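59 Displaying An Equation
  term         estimate  std.error  statistic  p.value
1 (Intercept)
2 P
$\hat{Y}_i = \hat{\alpha} + \hat{\beta} P_i$, with the estimated standard errors written in parentheses beneath the coefficients: (221.8) for the intercept and (338.13) for the slope.
Again, the estimated standard error tells us how precisely a coefficient is estimated.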

60 Displaying a Table
  term         estimate  std.error  statistic  p.value
1 (Intercept)
2 P
A more common way to display regression results is a regression table:
A Regression Table
             Y
P            (332.5)
Intercept    (217.9)
Note: Estimated standard errors are in parentheses.

61 Complication
The estimated slope gives the difference in sample averages in a bivariate regression with a dummy variable.
Just like the standard errors of the difference in sample averages, there are two basic ways of computing standard errors for estimated reg. coef.:
Assume equal group variances: homoskedasticity
Assume unequal group variances: heteroskedasticity

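62 Homoskedasticity
Homoskedasticity is an old-fashioned assumption. Throughout the world, perhaps only undergrads taking introductory stat/metrics still use it.
Homoskedasticity gives a simple formula for the std. err. and the est. std. err.:
$$SE(\hat{\beta}) = \frac{\sigma_e}{\sqrt{n}} \times \frac{1}{\sigma_X} \quad\text{and}\quad \widehat{SE}(\hat{\beta}) = \frac{\hat{\sigma}_e}{\sqrt{n}} \times \frac{1}{\hat{\sigma}_X}$$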
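A sketch of the estimated homoskedastic formula in base R, on simulated data (the regressor, coefficients, and noise level are assumptions). To match the default `lm()` standard error exactly, the residual standard deviation is computed with n - 2 degrees of freedom and the standard deviation of X divides by n; other conventions give slightly different numbers.

```r
set.seed(5)
n <- 200
X <- rbinom(n, 1, 0.4)                           # hypothetical dummy regressor
Y <- 70000 + 20000 * X + rnorm(n, sd = 15000)    # hypothetical outcome
fit <- lm(Y ~ X)

sigma_e <- sqrt(sum(resid(fit)^2) / (n - 2))     # residual SD, n - 2 degrees of freedom
sigma_X <- sqrt(mean((X - mean(X))^2))           # SD of X, dividing by n

se_formula <- sigma_e / sqrt(n) / sigma_X        # sigma_e / sqrt(n) * 1 / sigma_X
se_lm <- summary(fit)$coefficients["X", "Std. Error"]

c(formula = se_formula, lm = se_lm)
```

The formula shows the two levers for precision: a larger sample and more spread in the regressor both shrink the standard error.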

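63 Heteroskedasticity
A more realistic assumption is heteroskedasticity, and it gives different formulas, usually called (est.) robust standard error formulas:
$$RSE(\hat{\beta}) = \sqrt{\frac{1}{n}\,\frac{V(X_i e_i)}{(\sigma^2_{X_i})^2}} \quad\text{and}\quad \widehat{RSE}(\hat{\beta}) = \sqrt{\frac{1}{n}\,\frac{\hat{V}(X_i e_i)}{(\hat{\sigma}^2_{X_i})^2}}$$
Here the $X_i$ inside $V(\cdot)$ is measured as a deviation from its mean.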
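The estimated robust formula can be computed by hand in base R. The sketch below uses simulated data with different noise levels in the two groups (all values are assumptions) and plain sample analogues with no small-sample adjustment, which corresponds to the variant often labeled HC0. Packages typically apply adjustments by default, so their numbers can differ slightly.

```r
set.seed(9)
n <- 500
P <- rbinom(n, 1, 0.4)                                        # hypothetical dummy
Y <- 70000 + 20000 * P + rnorm(n, sd = ifelse(P == 1, 15000, 18000))
fit <- lm(Y ~ P)
e <- resid(fit)
x_dev <- P - mean(P)                # regressor in deviation from its mean

# Sample analogue of sqrt( (1/n) * V(X e) / (sigma_X^2)^2 )
v_xe  <- mean((x_dev * e)^2)        # mean of x_dev * e is zero by construction
var_x <- mean(x_dev^2)
rse   <- sqrt(v_xe / (n * var_x^2))

# With a dummy regressor this matches the unequal-variance SE of the
# difference in averages (group variances computed with divisor n_g)
e1 <- e[P == 1]; e0 <- e[P == 0]
rse_groups <- sqrt(mean(e1^2) / length(e1) + mean(e0^2) / length(e0))

c(robust = rse, by_groups = rse_groups)
```

The agreement between the two numbers is the regression counterpart of the unequal-variance case for the difference in sample averages.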

64 HT and CI
Hypothesis testing and confidence intervals follow a similar recipe:
$$t(\beta) = \frac{\hat{\beta} - \beta}{\widehat{RSE}(\hat{\beta})}$$
$$[\hat{\beta} - 2\,\widehat{RSE}(\hat{\beta}),\ \hat{\beta} + 2\,\widehat{RSE}(\hat{\beta})]$$

65 Demo
Many software packages still use homoskedastic standard errors by default, so we usually need a little extra effort to get the est. std. err. we want.
In Stata, this can be done by specifying the robust option:
reg Y P, robust
In R, it's still a bit complicated.

66 Demo: Homoskedasticity Std. Err.
lm(Y ~ P, data = sam1)
Call:
lm(formula = Y ~ P, data = sam1)
Coefficients:
(Intercept)  P
est.reg = lm(Y ~ P, data = sam1)

67 Demo: Homoskedasticity Std. Err.
library(broom)
tidy(est.reg)
  term         estimate  std.error  statistic  p.value
1 (Intercept)
2 P

68 Demo: Heteroskedasticity Std. Err.
library(sandwich)
vcovHC(est.reg)
            (Intercept)  P
(Intercept)
P
sqrt(55143)
[1] 235
sqrt(104667)
[1] 324
