Regression, Part I

I. Difference from correlation.
   A) Correlation describes the relationship between two variables, where neither is independent or a predictor.
      - In correlation, it would be irrelevant if we changed the axes on our graph.
      - This is NOT true for regression.
   B) In regression, we often have a variable that is independent, and another that depends on this variable.
      - For example, temperature (independent) and activity level (dependent) in cold-blooded animals. Activity level obviously depends on temperature (and NOT vice versa).
      - Often we also use this independent variable to predict the dependent variable.
      - We're also interested in testing for significance, though in this case we look at a line, not just the relationship between the variables.

II. Basic idea:
   A) There is a possible relationship between the variables under consideration. This relationship is modeled using an equation; in the simplest case, an equation for a line.

III. Fitting the line.
   1) This raises the question: is the line significant? This is one place where statistics comes in.
   2) There are actually many different ways of estimating the line, but we'll only learn the most common method.
      A) We'll use what is called the least-squares line.
      B) Illustrate. See fig. 12.7, p. 509 [12.7, p. 534] {12.3.7, p. 501}.
      C) The basic idea is that we square each residual, add the squares up, and then shift and rotate the line until this sum of squares is minimized.
         1) Residual: the vertical distance from a given point to the line going through the points (at least in regression).
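To make the least-squares idea concrete, here is a minimal Python sketch (Python is our choice; the notes don't use any software). It computes the residuals and their sum of squares for any candidate line, then does a crude brute-force search over a grid of intercepts and slopes, using the leucine data from section IV. The grid ranges and step sizes are arbitrary assumptions, and the function names are hypothetical; in practice the line comes from the closed-form formulas that the calculus produces, not from a search.

```python
# Leucine data from exercise 12.3 (section IV): time and leucine level.
xs = [0, 10, 20, 30, 40, 50, 60]
ys = [0.02, 0.25, 0.54, 0.69, 1.07, 1.50, 1.74]


def residuals(b0, b1, xs, ys):
    """Vertical distance from each point to the line y = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_of_squares(b0, b1, xs, ys):
    """Sum of the squared residuals for the line y = b0 + b1 * x."""
    return sum(e ** 2 for e in residuals(b0, b1, xs, ys))


# Crude search: try every (intercept, slope) pair on an arbitrary grid and
# keep the pair whose sum of squared residuals is smallest.
intercepts = [i * 0.0025 for i in range(-40, 41)]  # -0.1 to 0.1
slopes = [j * 0.00025 for j in range(0, 241)]      # 0 to 0.06
best_b0, best_b1 = min(
    ((b0, b1) for b0 in intercepts for b1 in slopes),
    key=lambda line: sum_of_squares(line[0], line[1], xs, ys),
)

print(f"best line on the grid: Y-hat = {best_b0:.4f} + {best_b1:.5f} X")
print(f"sum of squared residuals: {sum_of_squares(best_b0, best_b1, xs, ys):.6f}")
```

The search should land on the same line that the hand calculation in section IV produces; the formulas are simply a shortcut for this minimization.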
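      D) To do this, you need calculus. (If you've had calculus: you differentiate, set the result to 0, solve, and then make sure it's actually a minimum.)
         1) If you do all the calculus, you wind up with our estimates for the equation of a line:

            $\hat{\beta}_1 = b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

            $\hat{\beta}_0 = b_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

            where $\hat{\beta}_0$ is the intercept and $\hat{\beta}_1$ is the slope. So basically you'll wind up with the equation of a line:

            $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$

            where $\hat{Y}$ is the value of Y predicted for the particular X.

            *********
            4th edition silliness: In a strange fit (no one else really does it this way), the 4th edition does this differently, and presents the following equations:

            $\hat{\beta}_1 = r \left( \dfrac{s_y}{s_x} \right), \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

            It turns out they're the same if you note the following:

            $r \left( \dfrac{s_y}{s_x} \right) = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \cdot \dfrac{\sqrt{\sum (y_i - \bar{y})^2 / (n-1)}}{\sqrt{\sum (x_i - \bar{x})^2 / (n-1)}} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

            (The n - 1 terms cancel, then the SS_y terms cancel, and you're left with $\sqrt{SS_x} \cdot \sqrt{SS_x} = SS_x$ in the denominator, so the expressions are the same.) We will use the expressions for $\hat{\beta}_0$ and $\hat{\beta}_1$ as in the 2nd and 3rd editions, since that's the way it's almost always done.
            ********* End 4th edition silliness.
         2) Recap: You calculate $\hat{\beta}_1$ and $\hat{\beta}_0$, then put everything into the form for the equation of a line.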
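The recipe in D) translates directly into code. Below is a minimal Python sketch (our own, not from the text) with two hypothetically named functions: one uses the SS_cp / SS_x formula these notes use, the other the 4th edition's r(s_y / s_x) version. Both are applied to the leucine data from section IV, and they should agree up to floating-point rounding. Only Python's standard `math` and `statistics` modules are used.

```python
import math
import statistics


def least_squares(xs, ys):
    """Intercept and slope via b1 = SS_cp / SS_x and b0 = ybar - b1 * xbar."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_cp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_cp / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


def least_squares_4th_edition(xs, ys):
    """Same estimates via b1 = r * (s_y / s_x), as in the 4th edition."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    ss_cp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = ss_cp / math.sqrt(ss_x * ss_y)  # correlation coefficient
    s_x = statistics.stdev(xs)          # sample SDs (divide by n - 1)
    s_y = statistics.stdev(ys)
    b1 = r * (s_y / s_x)
    b0 = y_bar - b1 * x_bar
    return b0, b1


time = [0, 10, 20, 30, 40, 50, 60]
leucine = [0.02, 0.25, 0.54, 0.69, 1.07, 1.50, 1.74]
print("SS_cp / SS_x:", least_squares(time, leucine))
print("r (s_y / s_x):", least_squares_4th_edition(time, leucine))
```

Both routes go through the same sums, which is exactly why the n - 1 and SS_y terms cancel in the algebra above.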
         3) Note: $\hat{\beta}_0$ estimates $\beta_0$, and $\hat{\beta}_1$ estimates $\beta_1$. As usual, $\beta_0$ and $\beta_1$ are unknown. Since we don't know what they are, we estimate them with $\hat{\beta}_0$ and $\hat{\beta}_1$.
            Side comment: another way of looking at things:

            $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i$

            We often use this because it tells us what each $y_i$ is equal to. The $e_i$'s are the deviations, or residuals, between our equation of a line and the actual values of y. The earlier equation gives us a line; this one gives us an exact relationship between x and y.
            - Incidentally, minimizing the sum of the squares of the $e_i$'s gives us our least-squares line.
            - Also note that this estimates the following:

              $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

              where $\varepsilon_i$ is an unknown error term.
         4) The next step is to figure out if $\beta_0$ and $\beta_1$ are significant, that is, whether they mean anything. As usual, we use our estimates.
            a) This is similar to what we've done previously: we hypothesize some value for $\beta_1$. Although we can do the same thing for $\beta_0$, this isn't done very often (though we do occasionally get confidence intervals for $\beta_0$).
               - $\beta_1$ is almost always tested against the value 0 ($H_0: \beta_1 = 0$), since a slope of 0 means no effect of x on y.
               - (Incidentally, if we do test $\beta_0$, we're hardly ever interested in $\beta_0 = 0$. But, as mentioned, we won't cover this.)

IV. Let's illustrate things before we get lost. We'll use exercise 12.3, p. 511 [12.3, p. 536] {12.3.1, p. 503}.
   A) Set-up: a biologist injects leucine into frog egg cells. After a given amount of time he measures the amount of leucine that has been absorbed by the cell proteins. Details? Read the description in the text.
   B) Results:

      Time             0     10    20    30    40    50    60
      Leucine level    0.02  0.25  0.54  0.69  1.07  1.50  1.74
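The hand calculations that follow in C) can be checked with a few lines of Python. This sketch (our own; the variable names are our choice) computes the means, SS_x, SS_y, each cross product, SS_cp, and then the slope and intercept for the leucine data above. Floating-point arithmetic introduces tiny errors in the last digits, which the rounding in the printout hides.

```python
# Leucine data from B) above.
time = [0, 10, 20, 30, 40, 50, 60]
leucine = [0.02, 0.25, 0.54, 0.69, 1.07, 1.50, 1.74]

n = len(time)
x_bar = sum(time) / n
y_bar = sum(leucine) / n

# Sums of squares; SS_y is not needed for the slope or intercept.
ss_x = sum((x - x_bar) ** 2 for x in time)
ss_y = sum((y - y_bar) ** 2 for y in leucine)

# One cross product (x_i - x_bar)(y_i - y_bar) per observation.
cross_products = [(x - x_bar) * (y - y_bar) for x, y in zip(time, leucine)]
for i, (x, y, cp) in enumerate(zip(time, leucine, cross_products), start=1):
    # Adding 0.0 turns a negative zero (-0.0) into 0.0 for display.
    print(f"i = {i}: ({x} - {x_bar:g})({y:.2f} - {y_bar:.2f}) = {cp + 0.0:.1f}")
ss_cp = sum(cross_products)

# Least-squares slope and intercept.
b1 = ss_cp / ss_x
b0 = y_bar - b1 * x_bar

print(f"x_bar = {x_bar:g}, y_bar = {y_bar:.2f}")
print(f"SS_x = {ss_x:,.0f}, SS_y = {ss_y:.4f}, SS_cp = {ss_cp:.2f}")
print(f"Y-hat = {b0:.4f} + {b1:.5f} X")
```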
   C) You know how to calculate most of this:
      1) Sum of squares for X: $SS_x = \sum (x_i - \bar{x})^2 = 2{,}800$ (with $\bar{x} = 30$)
      2) Sum of squares for Y: $SS_y = \sum (y_i - \bar{y})^2 = 2.4308$ (with $\bar{y} = 0.83$)
         (Do we need both of these? No; we only need the sum of squares for the x's.)
         (Note that the 4th edition doesn't give you these, since it's using a different way to calculate $\hat{\beta}_1$ and $\hat{\beta}_0$.)
      3) Sum of cross products, $SS_{cp} = \sum (x_i - \bar{x})(y_i - \bar{y})$:
         i = 1: (0 - 30)(0.02 - 0.83) = 24.3
         i = 2: (10 - 30)(0.25 - 0.83) = 11.6
         i = 3: (20 - 30)(0.54 - 0.83) = 2.9
         i = 4: (30 - 30)(0.69 - 0.83) = 0
         i = 5: (40 - 30)(1.07 - 0.83) = 2.4
         i = 6: (50 - 30)(1.50 - 0.83) = 13.4
         i = 7: (60 - 30)(1.74 - 0.83) = 27.3
         Add up everything to get 81.90. (Again, note that the 4th edition does things differently.)
      4) Then we estimate the slope of our line:

         $\hat{\beta}_1 = \dfrac{SS_{cp}}{SS_x} = \dfrac{81.90}{2{,}800} = 0.02925$

      5) So now we have an estimate for the slope. Once we have this, we get the intercept:

         $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 0.83 - 0.02925 \times 30 = -0.0475$

      6) So our equation is:

         $\hat{Y} = -0.0475 + 0.02925 X$

         (or we could say: $Y_i = -0.0475 + 0.02925 X_i + e_i$)
         This gives us a best-fit line through the points, using our least-squares criterion.
      7) Let's plot all this to see what it looks like:
      8) Some things to note:
         a) We don't yet know if this means anything, i.e., is this significant???
         b) Even though the line rises only gradually (the slope is small), this doesn't imply non-significance (note the difference in scales between the two axes).
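Point b) is easy to see numerically: the size of a slope depends on the units of X and Y. In this minimal sketch (our own; `fit_line` is a hypothetical helper name), the time values are divided by 10 as a purely hypothetical change of units. The slope becomes 10 times larger, but the fitted values, and hence the residuals, stay the same. So a slope that looks small is not, by itself, evidence against significance.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_cp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_cp / ss_x
    return y_bar - b1 * x_bar, b1


time = [0, 10, 20, 30, 40, 50, 60]
leucine = [0.02, 0.25, 0.54, 0.69, 1.07, 1.50, 1.74]

# Hypothetical change of units: each time value divided by 10.
time_rescaled = [t / 10 for t in time]

b0, b1 = fit_line(time, leucine)
b0_r, b1_r = fit_line(time_rescaled, leucine)

# Predicted leucine levels at the same seven time points, in each unit system.
fitted = [b0 + b1 * x for x in time]
fitted_r = [b0_r + b1_r * x for x in time_rescaled]

print(f"original units: slope = {b1:.5f}")
print(f"rescaled units: slope = {b1_r:.4f}")
print("same fitted values:", all(abs(f - g) < 1e-12 for f, g in zip(fitted, fitted_r)))
```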