Regression, Inference, and Model Building

Scatter Plots and Correlation

Correlation coefficient, r:  -1 ≤ r ≤ 1
If r is positive, then the scatter plot has a positive slope and the variables are said to have a positive relationship.
If r is negative, then the scatter plot has a negative slope and the variables have a negative relationship.
If r = 0, then no linear relationship exists between the two variables.
If r = 1, then the data fall on a perfect line with a positive slope.
If r = -1, the data fall on a perfect line with a negative slope.
The strength of the correlation is |r|: the larger |r| is, the stronger the correlation.

r = (nΣxy - (Σx)(Σy)) / √( (nΣx² - (Σx)²)(nΣy² - (Σy)²) )

Significant Linear Relationship (two-tailed test):
H0: ρ = 0 (implies there is no significant linear relationship)
Ha: ρ ≠ 0 (implies there is a significant linear relationship)

Negative Linear Relationship (left-tailed test):
H0: ρ ≥ 0
Ha: ρ < 0

Spring 2017
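The raw-sum formula for r can be checked with a short Python sketch (the function name and the sample data are illustrative, not from the notes):

```python
import math

def correlation_coefficient(x, y):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)),
    using the raw sums exactly as in the formula above."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# A perfect positive line gives r = 1; reversing y gives r = -1.
print(correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # → 1.0
```

A result of exactly ±1 occurs only when every point lies on the fitted line.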

Positive Linear Relationship (right-tailed test):
H0: ρ ≤ 0
Ha: ρ > 0

Test statistic: t = r / √( (1 - r²) / (n - 2) )
Degrees of freedom = n - 2

Significant Linear Relationship (two-tailed test): Reject H0 if |t| ≥ t(α/2)
Negative Linear Relationship (left-tailed test): Reject H0 if t ≤ -t(α)
Positive Linear Relationship (right-tailed test): Reject H0 if t ≥ t(α)

Using critical values to determine statistical significance: the correlation coefficient, r, is statistically significant if the absolute value of the correlation is greater than the critical value in the table: |r| > r(α).

The coefficient of determination, r², is the measure of the amount of variation in y explained by the variation in x.
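A minimal sketch of the t test for r, assuming the test statistic above; the function name and the inputs (r = 0.8, n = 12) are hypothetical examples:

```python
import math

def correlation_t_stat(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    t = r / math.sqrt((1 - r ** 2) / (n - 2))
    return t, n - 2

t, df = correlation_t_stat(0.8, 12)   # t ≈ 4.22, df = 10
# Two-tailed at alpha = 0.05: t(0.025) with df = 10 is about 2.228 from a
# t table, so |t| ≥ 2.228 and H0: rho = 0 would be rejected here.
```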

Fitting a Linear Model

A linear relationship is graphically described as a line: y = b0 + b1x

Error = observed y - predicted y
Error = y - ŷ
y = b0 + b1x + error
ŷ = b0 + b1x

b0 = y-intercept
b1 = slope of the line
ŷ = predicted y
n = number of data values

Sum of Squared Errors (SSE):
SSE = Σ(error_i)² = Σ(y_i - ŷ_i)² = Σ(y_i - (b0 + b1x_i))²

Defining the least squares line:
Slope = b1 = (nΣxy - (Σx)(Σy)) / (nΣx² - (Σx)²)
y-intercept = b0 = (1/n)(Σy - b1Σx)

r² = (nΣxy - (Σx)(Σy))² / ( (nΣx² - (Σx)²)(nΣy² - (Σy)²) )

The parameter r² is defined as the fraction of the total variation explained by the least squares line. In other words, r² measures how well the least squares line fits the sample data. If the total variation is explained completely, then we have r² = 1 and we say that there is a perfect linear correlation. On the other hand, if the total variation is entirely unexplained, then r² = 0.

Regression Analysis

Variance of Errors: (S_e)² = SSE / (n - 2)
Variance of Slope: (S_b1)² = n(S_e)² / (nΣx² - (Σx)²)
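The least squares formulas above can be combined into one sketch (the function name and the four-point data set are my own, chosen only to exercise the formulas):

```python
def least_squares(x, y):
    """Slope, intercept, SSE, and the error/slope variances,
    computed from the raw-sum formulas in the notes."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    b0 = (sy - b1 * sx) / n                          # y-intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se2 = sse / (n - 2)                              # variance of errors
    sb1_2 = n * se2 / (n * sxx - sx ** 2)            # variance of slope
    return b1, b0, sse, se2, sb1_2

b1, b0, sse, se2, sb1_2 = least_squares([1, 2, 3, 4], [2, 3, 5, 6])
# b1 = 1.4, b0 = 0.5, SSE = 0.2, S_e^2 = 0.1, S_b1^2 = 0.02 (up to float rounding)
```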

Bounds of confidence interval: b1 ± t(α/2) · S_b1
where b1 = slope, S_b1 = standard deviation of the slope, and t(α/2) is looked up in the table for α/2 and df = n - 2.

Multiple Regression

Multiple regression model: ŷ = b0 + b1x1 + b2x2 + … + bkxk

where x1, x2, …, xk are the k independent variables in the model and b1, b2, …, bk are the corresponding coefficients of the independent variables. These values, b1, b2, …, bk, are the sample estimates of the corresponding population parameters, β1, β2, …, βk. The y-intercept of the multiple regression equation is b0, which is the sample estimate of the population parameter β0.

The multiple coefficient of determination is R².

Test the claim that at least one of the independent variables' coefficients is not equal to 0:
H0: β1 = β2 = … = βk = 0
Ha: At least one coefficient does not equal 0
p-value = Significance F (from running the ANOVA test in Excel)
p-value < α: reject the null
p-value ≥ α: fail to reject the null

Test if specific variables are significant:
H0: β1 = 0
Ha: β1 ≠ 0
*If we reject the null, then the variable is significant.

ANOVA

Regression Analysis of Variance provides information about how well an estimated regression model fits the data. The ANOVA table divides the total variation in Y into variation that can be explained by the model and variation that cannot be explained by the model. The division of variation takes place in the column labeled SS.

Source      SS       DF         MS                       F
Regression  SSR      k          MSR = SSR / k            F = MSR / MSE
Error       SSE      n - (k+1)  MSE = SSE / (n - (k+1))
Total       TotalSS  n - 1
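A sketch of the slope confidence interval, assuming the slope, its standard deviation, and the critical value are already in hand (the numeric inputs below are hypothetical; t(0.025) with df = 2 is about 4.303 from a t table):

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Bounds b1 ± t_crit * S_b1, per the formula above."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin

# e.g. b1 = 1.4, S_b1 = sqrt(0.02), and t(0.025) with df = 2 ≈ 4.303
lo, hi = slope_confidence_interval(1.4, 0.02 ** 0.5, 4.303)
```

An interval that excludes 0 corresponds to rejecting H0: β1 = 0 at that α.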

Total variation in Y = variation in Y explained by the model + variation in Y not explained by the model
TotalSS = SSR + SSE

SSR = Sum of Squares due to Regression = amount of variation in Y explained by the model
SSR = TotalSS - SSE

SSE = Sum of Squared Errors = variation in Y that the model could not explain
SSE = Σ_{i=1}^{n} (y_i - ŷ_i)²

TotalSS = Total Sum of Squared Deviations about the mean of the dependent variable Y
TotalSS = Σ_{i=1}^{n} (y_i - ȳ)²
*Note: if this expression were divided by (n - 1), it would be the sample variance of y.

k = number of independent variables in the model, called DFR, Degrees of Freedom for Regression
n - (k+1) = degrees of freedom associated with unexplained variation in the model, called DFE, Degrees of Freedom for Error
n - 1 = total degrees of freedom, where n is the number of observations, called DFT, Total Degrees of Freedom

MSR = SSR / k, the average amount of variation explained per independent variable
MSE = SSE / (n - (k+1)), the variance of the error terms
F statistic = MSR / MSE; large F values are desirable since they indicate that the explained variation is relatively larger than the unexplained.
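The decomposition TotalSS = SSR + SSE and the F statistic can be sketched as follows (the function name and the fitted values, taken from a hypothetical k = 1 fit, are my own):

```python
def anova_table(y, y_hat, k):
    """TotalSS, its split into SSR and SSE, and F = MSR / MSE,
    following the formulas above; k = number of independent variables."""
    n = len(y)
    ybar = sum(y) / n
    total_ss = sum((yi - ybar) ** 2 for yi in y)            # Σ(y_i - ȳ)²
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) # Σ(y_i - ŷ_i)²
    ssr = total_ss - sse                                    # SSR = TotalSS - SSE
    msr = ssr / k
    mse = sse / (n - (k + 1))
    return ssr, sse, total_ss, msr / mse

# Hypothetical fitted values from a simple (k = 1) regression:
ssr, sse, total_ss, f = anova_table([2, 3, 5, 6], [1.9, 3.3, 4.7, 6.1], k=1)
# TotalSS = 10, SSE ≈ 0.2, so F ≈ 98 here: most variation is explained.
```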