Inference about the Slope and Intercept

Recall, we have established that the least squares estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are linear combinations of the $Y_i$'s. Further, we have shown that they are unbiased and have the following variances:

$$\mathrm{Var}(\hat{\beta}_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{XX}}\right) \quad\text{and}\quad \mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{XX}}.$$

In order to make inference we assume that the $\varepsilon_i$'s have a Normal distribution, that is, $\varepsilon_i \sim N(0, \sigma^2)$. This in turn means that the $Y_i$'s are normally distributed.

Since both $\hat{\beta}_0$ and $\hat{\beta}_1$ are linear combinations of the $Y_i$'s, they also have a Normal distribution.

STA30/00 - week 3

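The "linear combination" claim for the slope can be checked numerically. The sketch below assumes the usual weights $k_i = (x_i - \bar{x})/S_{XX}$, so that $\hat{\beta}_1 = \sum k_i Y_i$; the data are made-up illustrative numbers, not from the lecture. Because $\sum k_i^2 = 1/S_{XX}$, independent $Y_i$'s with common variance $\sigma^2$ give $\mathrm{Var}(\hat{\beta}_1) = \sigma^2 \sum k_i^2 = \sigma^2/S_{XX}$.

```python
# Illustrative sketch: the data are made-up numbers, not from the lecture.

def slope_weights(x):
    """Weights k_i = (x_i - xbar) / S_XX, so that slope = sum(k_i * y_i)."""
    xbar = sum(x) / len(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [(xi - xbar) / sxx for xi in x], sxx


def slope_direct(x, y):
    """Usual least squares formula: slope = S_XY / S_XX."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

k, sxx = slope_weights(x)
slope_as_combination = sum(ki * yi for ki, yi in zip(k, y))
sum_k_squared = sum(ki ** 2 for ki in k)

# The two slope computations agree, and sum(k_i^2) equals 1 / S_XX.
print(slope_as_combination, slope_direct(x, y))
print(sum_k_squared, 1.0 / sxx)
```
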
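Inference for $\beta_1$ in Normal Error Regression Model

The least squares estimate of $\beta_1$ is $\hat{\beta}_1$. Because it is a linear combination of normally distributed random variables (the $Y_i$'s), we have the following result:

$$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{XX}}\right).$$

We estimate the variance of $\hat{\beta}_1$ by $S^2/S_{XX}$, where $S^2$ is the MSE, which has $n-2$ df.

Claim: The distribution of $\dfrac{\hat{\beta}_1 - \beta_1}{S/\sqrt{S_{XX}}}$ is $t$ with $n-2$ df.

Proof: Write

$$\frac{\hat{\beta}_1 - \beta_1}{S/\sqrt{S_{XX}}} = \frac{(\hat{\beta}_1 - \beta_1)\big/\left(\sigma/\sqrt{S_{XX}}\right)}{\sqrt{S^2/\sigma^2}}.$$

The numerator is $N(0, 1)$. In the denominator, $(n-2)S^2/\sigma^2 \sim \chi^2_{n-2}$ and is independent of $\hat{\beta}_1$, so the ratio has the form $Z\big/\sqrt{\chi^2_{n-2}/(n-2)}$, which is $t$ with $n-2$ df by definition.
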
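Tests and CIs for $\beta_1$

The hypothesis of interest about the slope in a Normal linear regression model is $H_0: \beta_1 = 0$. Writing $b_1 = \hat{\beta}_1$ for the least squares estimate of the slope, the test statistic for this hypothesis is

$$t_{stat} = \frac{b_1 - 0}{S.E.(b_1)} = \frac{b_1}{S/\sqrt{S_{XX}}}.$$

We compare the above test statistic to a $t$ distribution with $n-2$ df to obtain the P-value.

Further, a $100(1-\alpha)\%$ CI for $\beta_1$ is:

$$b_1 \pm t_{n-2;\,\alpha/2}\, S.E.(b_1) = b_1 \pm t_{n-2;\,\alpha/2}\, \frac{S}{\sqrt{S_{XX}}}.$$
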
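The test statistic and confidence interval for the slope can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `slope_inference` and the data are hypothetical, and the critical value `t_crit` is supplied by the caller (here 3.182, the tabulated upper 2.5% point of a $t$ distribution with 3 df) rather than computed.

```python
import math


def slope_inference(x, y, t_crit):
    """Slope estimate, its standard error, t statistic for H0: slope = 0,
    and the confidence interval b1 +/- t_crit * SE(b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

    b1 = sxy / sxx               # least squares slope
    b0 = ybar - b1 * xbar        # least squares intercept

    # S^2 is the MSE: residual sum of squares divided by n - 2 df.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    se_b1 = s / math.sqrt(sxx)   # S / sqrt(S_XX)
    t_stat = b1 / se_b1
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return {"b1": b1, "se_b1": se_b1, "t_stat": t_stat, "ci": ci}


# Made-up illustrative data with n = 5, so n - 2 = 3 df.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

result = slope_inference(x, y, t_crit=3.182)  # 95% CI, value from a t table
print(result)
```

The P-value is obtained by comparing `result["t_stat"]` with the $t$ distribution on $n-2$ df, using a table or statistical software.
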
Important Comment

Similar results can be obtained about the intercept in a Normal linear regression model. See the book for more details.

However, in many cases the intercept does not have any practical meaning, and therefore it is not necessary to make inference about it.

Example

We have data on violent and property crimes in 3 US metropolitan areas. The data contain the following three variables:
violcrim = number of violent crimes
propcrim = number of property crimes
popn = population in 1000's

We are interested in the relationship between the size of the city and the number of violent crimes.

Comments Regarding the Crime Example

A regression model fit to data from all 3 cities finds a statistically significant linear relationship between numbers of violent crimes in American cities and their populations.

For each increase of 1,000 in population, the number of violent crimes increases by 0.093 on average.

48.6% of the variation in number of violent crimes can be explained by its relationship with population.

Because these data are observational, i.e. collected without experimental intervention, it cannot be said that larger populations cause larger numbers of crimes, but only that such an association appears to exist.

However, this linear relationship is mostly determined by New York City, whose population and number of violent crimes are much larger than those of any other city, and which thus accounts for a large fraction of the variation in the data.

When New York is removed from the analysis there is no longer a statistically significant linear relationship, and the linear relationship with population explains less than 9% of the variation in number of violent crimes.

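The effect of a single dominant observation can be illustrated with a small sketch. The numbers below are synthetic and are not the crime data; they are chosen only so that one point is far larger than the rest in both variables. The helper `r_squared` is a hypothetical name for the proportion of variation explained by a simple linear regression.

```python
def r_squared(x, y):
    """Proportion of variation in y explained by a straight-line fit on x:
    S_XY^2 / (S_XX * S_YY)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Synthetic data: six small observations with no clear trend,
# plus one observation that is much larger in both x and y.
x_all = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 50.0]
y_all = [5.0, 3.0, 6.0, 4.0, 5.0, 3.0, 60.0]

r2_all = r_squared(x_all, y_all)
r2_without = r_squared(x_all[:-1], y_all[:-1])  # drop the large point

print(r2_all, r2_without)
```

With the large point included the fit explains almost all of the variation; without it, very little. This is the same pattern described for New York above, shown here on invented numbers.
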
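Bivariate Normal Distribution

X and Y are jointly normally distributed if their joint density is

$$f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\}$$

where $-\infty < x < \infty$ and $-\infty < y < \infty$.

Can show that the marginal distributions are:

$$X \sim N(\mu_X, \sigma_X^2), \qquad Y \sim N(\mu_Y, \sigma_Y^2)$$

and $\rho$ is the correlation between X and Y, i.e.,

$$\rho = \frac{E\left[(X - E(X))(Y - E(Y))\right]}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}.$$
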
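The marginal claim can be checked numerically. The sketch below codes the joint density and integrates out $y$ with a simple midpoint sum, then compares the result to the $N(\mu_X, \sigma_X^2)$ density. The parameter values and function names are arbitrary choices for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Joint density of the bivariate Normal distribution at (x, y)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    const = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / const


def marginal_x_numeric(x, mu_x, mu_y, sigma_x, sigma_y, rho,
                       step=0.01, width=10.0):
    """Integrate the joint density over y (midpoint rule) on
    mu_y +/- width * sigma_y."""
    lower = mu_y - width * sigma_y
    n_steps = int(round(2.0 * width * sigma_y / step))
    total = 0.0
    for i in range(n_steps):
        y = lower + (i + 0.5) * step
        total += bivariate_normal_pdf(x, y, mu_x, mu_y,
                                      sigma_x, sigma_y, rho) * step
    return total


# Arbitrary illustrative parameter values.
params = dict(mu_x=1.0, mu_y=-2.0, sigma_x=1.5, sigma_y=2.0, rho=0.6)

numeric = marginal_x_numeric(2.0, **params)
exact = normal_pdf(2.0, params["mu_x"], params["sigma_x"])
print(numeric, exact)
```
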
Properties of Bivariate Normal Distribution

It can be shown that the conditional distribution of Y given X = x is:

$$Y \mid X = x \sim N\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; \sigma_Y^2(1 - \rho^2)\right).$$

Linear combinations of X and Y are normally distributed.

A zero covariance between X and Y implies that they are statistically independent. Note that this is not true in general for any two random variables.

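The conditional distribution can be verified at a point by checking the factorization $f(x, y) = f_X(x)\, f_{Y \mid X}(y \mid x)$. The sketch below uses arbitrary parameter values; `conditional_params` is a hypothetical helper name. Note that the conditional mean is linear in $x$, which is what connects the bivariate Normal model to simple linear regression.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Joint density of the bivariate Normal distribution at (x, y)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    const = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / const


def conditional_params(x, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and variance of Y given X = x under the bivariate Normal."""
    mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
    variance = sigma_y ** 2 * (1.0 - rho ** 2)
    return mean, variance


# Arbitrary illustrative values.
mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 1.5, 2.0, 0.6
x0, y0 = 2.0, -0.5

cond_mean, cond_var = conditional_params(x0, mu_x, mu_y, sigma_x, sigma_y, rho)

joint = bivariate_normal_pdf(x0, y0, mu_x, mu_y, sigma_x, sigma_y, rho)
product = (normal_pdf(x0, mu_x, sigma_x)
           * normal_pdf(y0, cond_mean, math.sqrt(cond_var)))

# joint density = marginal of X times conditional of Y given X = x0
print(joint, product)
```
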
Sample Correlation

If X and Y are random variables, and we would like a symmetric measure of the direction and strength of the linear relationship between them, we can use correlation.

Based on n observed pairs $(x_i, y_i)$, $i = 1, \ldots, n$, the estimate of the population correlation $\rho$ is Pearson's Product-Moment Correlation, given by

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}.$$

It is the MLE of $\rho$ under the bivariate Normal model.

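A direct translation of the formula for $r$ into Python might look as follows. The function name `pearson_r` and the data are illustrative assumptions; the checks show that $r$ is symmetric in its arguments and equals $\pm 1$ when the points lie exactly on a line.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                                 # same value: r is symmetric
r_up = pearson_r(x, [2.0 * xi + 1.0 for xi in x])      # exact line, positive slope
r_down = pearson_r(x, [-3.0 * xi + 4.0 for xi in x])   # exact line, negative slope

print(r_xy, r_yx, r_up, r_down)
```
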
Facts about r

It measures the strength of the linear relationship between X and Y.
It is distribution free.
r is always a number between -1 and 1.
r = 0 indicates no linear association.
r = -1 indicates that the points fall perfectly on a straight line with negative slope.
r = 1 indicates that the points fall perfectly on a straight line with positive slope.
The strength of the linear relationship increases as r moves away from 0.

Relationship between Regression and Correlation
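
Two standard identities link the fitted regression line to the sample correlation; they are stated here as well-known facts, not as the lecture's own derivation: the slope satisfies $b_1 = r\sqrt{S_{YY}/S_{XX}}$, and $r^2 = 1 - SSE/S_{YY}$ is the proportion of variation explained. The sketch below checks both on made-up data; all names are illustrative.

```python
import math

# Made-up illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                   # least squares slope
b0 = ybar - b1 * xbar            # least squares intercept
r = sxy / math.sqrt(sxx * syy)   # sample correlation

# Residual sum of squares of the fitted line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

slope_from_r = r * math.sqrt(syy / sxx)   # should equal b1
explained = 1.0 - sse / syy               # should equal r ** 2

print(b1, slope_from_r)
print(r ** 2, explained)
```

In particular, $b_1$ and $r$ always share the same sign, and $b_1 = 0$ exactly when $r = 0$.
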