Chapter 13: The Correlation Coefficient and the Regression Line. We begin with some useful facts about straight lines.


Recall the x, y coordinate system, as pictured below.

[Figure: the x, y coordinate plane, axes ticked from -3 to 3, showing three lines: y = 2.5, y = 0.5x, and y = 1 - x.] [slide 337]

We say that y is a linear function of x if y = a + bx, for some numbers a and b. If y = a + bx, then the graph of the function is a straight line with y-intercept equal to a and slope equal to b. The line is horizontal if, and only if, b = 0; otherwise, it slopes up if b > 0 and slopes down if b < 0. The only lines not covered by the above are the vertical lines, e.g. x = 6. Vertical lines are not interesting in Statistics. In math class we learn that lines extend forever. In statistical applications, as we will see, they never extend forever. This distinction is very important. In fact, it would be more accurate to say that statisticians study line segments, not lines, but everybody says lines. It will be very important for you to understand lines in two ways, what I call visually and analytically. [slide 338]

Here is what I mean. Consider the line y = 5 + 2x. We will want to substitute (plug in) values for x to learn what we get for y. For example, x = 3. We do this analytically by substituting in the equation: y = 5 + 2(3) = 11. But we can also do this visually, by graphing the function. Walk along the x axis until we get to x = 3 and then climb up a rope (slide down a pole) until we hit the line. Our height when we hit the line is y = 11. (Draw picture on board.)

The Scatterplot

We are interested in situations in which we obtain two numbers per subject. For example, if the subjects are college students, the numbers could be: X = height and Y = weight. X = score on the ACT and Y = first-year GPA. [slide 339]
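The plug-in arithmetic above is easy to mirror in code. A minimal sketch (the function name `line` is mine, for illustration only):

```python
def line(x, a=5.0, b=2.0):
    """Evaluate the linear function y = a + b*x (defaults: y = 5 + 2x)."""
    return a + b * x

# Substituting x = 3 analytically, as in the text:
print(line(3))  # 11.0
```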

X = number of AP credits and Y = first-year GPA. Law schools are interested in: X = LSAT score and Y = first-year law school GPA. And so on. In each of these examples, the Y is considered more important by the researcher and is called the response. The X is important because its value might help us understand Y better, and it is called the predictor. For some studies, reasonable people can disagree on which variable to call Y. Here are two examples: The subjects are married couples and the variables are: wife's IQ and husband's IQ. The subjects are identical twins and the variables are: first-born's IQ and second-born's IQ. We study two big topics in Chapter 13. For the first of these, the correlation coefficient, it does not matter which variable is called Y. [slide 340]

For the second of these, the regression line, changing the assignment of Y and X will change the answer. Thus, if you are uncertain on the assignment, you might choose to do the regression line analysis twice, once for each assignment. The material in Chapter 13 differs substantially from what we have done in this class. In Chapter 13, we impose fairly strict structure on how we view the data. This structure allows researchers to obtain very elaborate answers from a small amount of data. Perhaps surprisingly, these answers have a history of working very well in science. But it will be important to have a healthy skepticism about the answers we get and to examine the data carefully to decide whether the imposed structure seems reasonable. We begin with an example with n = 124 subjects, a very large number of subjects for these problems. As we will see, often n is 10 or smaller. [slide 341]

The subjects are 124 men who played major league baseball in both 1985 and 1986. This set contains every man who had at least 200 official at-bats in the American League in both years. The variables are: Y = 1986 Batting Average (BA) and X = 1985 BA. The idea is that, as a baseball executive, you might be interested in learning how effectively offensive performance one year (1985) can predict offensive performance the next year (1986). In case you are not a baseball fan, here is all you need to know about this example. BA is a measure of offensive performance, with larger values better. BA is not really an average; it is a proportion: BA equals number of hits divided by number of official at-bats. BA is always reported to three digits of precision, and a BA of, say, 0.300 is referred to as hitting 300. BTW, 300 is the threshold for good hitting and 200 is the threshold for really bad hitting. [slide 342]

The names and data for the 124 men are on pp. 442-443. Behaving like the MITK, we first study the variables individually, following the ideas of Chapter 12.

[Figure: histograms of X (1985 BA) and Y (1986 BA), each with axis ticks at 0.180, 0.240, 0.300 and 0.360.]

These histograms suggest small and large outliers both years. In addition, both histograms are close to symmetry and bell-shape. Also, the means and sd's changed little from X to Y.

Year   Mean     St.Dev.
1985   0.2664   0.0280
1986   0.2636   0.0320

[slide 343]

Below is the scatterplot of these BA data. The first thing we look for are isolated cases (IC). I see two, possibly three, ICs, identified by initials below: WB, DM and FR.

[Figure: scatterplot of 1986 Batting Ave. versus 1985 Batting Ave., both axes ticked from 0.170 to 0.370; the points labeled WB, DM and FR are the isolated cases.]

[slide 344]

Now, ignore the outliers and look for a pattern in the remaining data. For the BA data, the data describe an ellipse that is tilted upwards (lower to the left, higher to the right). This is an example of a linear relationship between X and Y; i.e., as X grows larger (sweep your eyes from left to right in the picture), the Y values tend to increase (become higher). In Chapter 13, we limit attention to data sets that reveal a linear relationship between X and Y. If your data do not follow a linear relationship, you should not use the methods of Chapter 13. Thus, your analysis should always begin with a scatterplot to investigate whether a linear relationship is reasonable. Page 447 of the text presents five hypothetical scatterplots: one reveals an increasing linear pattern; one reveals a decreasing linear pattern; and the remaining three show various curved relationships between X and Y. Thus, to reiterate: if your scatterplot is curved, do not use the methods of Chapter 13. [slide 345]

Page 448 of the text presents four scatterplots for data sets for small values of n (the n's are 9, 6, 12 and 13, typical sizes in practice). The subjects are spiders, and the four scatterplots correspond to four categories of spiders. For each spider, Y is heart rate and X is weight. Above each scatterplot is the numerical value of r, the correlation coefficient of the data. At this time, it suffices to note that r > 0 indicates (reflects?) an increasing linear relationship and r < 0 indicates a decreasing linear relationship between Y and X. There are two important ideas revealed by these scatterplots. First, for small n it can be difficult to decide whether a case is isolated; whenever possible, use your scientific knowledge to help with this decision. Second, especially for a small n, the presence of one or two isolated cases can drastically change our view of the data. For example, consider the n = 9 small hunters. [slide 346]

The two spiders in the lower left of the scatterplot might be labeled isolated. Including these cases, the text states that r > 0, but if they are deleted from the data set (which could be a deliberate action by the researcher, or perhaps these guys were stepped on during their commute to the lab) then r < 0. Scientists typically get very excited about whether r is positive or negative, so it is noteworthy that its sign can change so easily. Thus far, we have been quite casual about looking at scatterplots. We say, "The pattern is linear and looks increasing (decreasing, flat)." It will remain (in this course) the job of our eyes and brain to decide on linearity, but the matter of increasing or decreasing will be usurped by the statisticians. Furthermore, using my eyes and brain, I can say that the pattern is decreasing for tarantulas and for web weavers (r agrees with me), and I can say that the linear pattern is stronger for the tarantulas. [slide 347]

The correlation coefficient agrees with me on the issue of strength and has the further benefit of quantifying the notion of stronger in a manner that is useful to scientists. I am not very good at motivating the formula for the correlation coefficient. In addition, the end of the semester is near, so time is limited. The interested student is referred to pp. 450-453 of the text for a (partial) explanation of the formula. Here is the briefest of presentations of the formula. Each subject has an x and a y. We standardize these values into x' and y':

x' = (x - x̄)/sX;  y' = (y - ȳ)/sY.

We then form the product z = x'y'. The idea is that z > 0 provides evidence of an increasing relationship and z < 0 provides evidence of a decreasing relationship. [slide 348]

(The product is positive if both terms are positive or both are negative. Both positive means a large x is matched with a large y; both negative means a small x is matched with a small y.) The correlation coefficient combines the z's by almost computing their mean:

r = Σz / (n - 1).

The next slide presents 12 prototypes of the correspondence between a scatterplot and its correlation coefficient. These 12 scatterplots illustrate six important facts about correlation coefficients. These six facts appear on pages 454 and 456 of the text and will not be reprinted here. [slide 349]
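The standardize-multiply-almost-average recipe can be carried out directly. This sketch uses the cricket data tabulated in Section 13.3 below; the helper names `mean` and `sd` are mine, and `sd` uses the sample (n - 1) divisor, matching the text:

```python
def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation (divisor n - 1)."""
    m = mean(v)
    return (sum((vi - m) ** 2 for vi in v) / (len(v) - 1)) ** 0.5

# Cricket data from Section 13.3: x = chirps per minute, y = air temperature.
x = [145.0, 172.0, 155.0, 137.0, 179.5, 192.0, 207.0, 165.5, 193.0, 100.0, 189.0]
y = [62.6, 81.5, 77.9, 84.2, 92.8, 86.9, 87.8, 69.8, 71.6, 71.6, 80.4]

xbar, ybar, sx, sy = mean(x), mean(y), sd(x), sd(y)
# z = x'y' for each subject; r is "almost" the mean of the z's.
z = [((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y)]
r = sum(z) / (len(z) - 1)
print(round(r, 3))  # 0.461
```

This matches the r = 0.461 quoted for the cricket study later in the chapter.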

[Figure: 12 prototype scatterplots, each labeled with its correlation coefficient, ranging over r = ±1.00, ±0.80, ±0.60, ±0.40, ±0.20 and 0.00.] [slide 350]

13.3: The regression line.

[Figure: two scatterplots of Air Temp. (60 to 90) versus Chirps per Minute (100 to 200); the first shows the line ẏ = 37.5 + 0.25x, the second the line ŷ = 56.2 + 0.136x.]

x       y      ẏ       y - ẏ    (y - ẏ)²
145.0   62.6   73.75   -11.15   124.32
172.0   81.5   80.50     1.00     1.00
155.0   77.9   76.25     1.65     2.72
137.0   84.2   71.75    12.45   155.00
179.5   92.8   82.37    10.43   108.68
192.0   86.9   85.50     1.40     1.96
207.0   87.8   89.25    -1.45     2.10
165.5   69.8   78.87    -9.07    82.36
193.0   71.6   85.75   -14.15   200.22
100.0   71.6   62.50     9.10    82.81
189.0   80.4   84.75    -4.35    18.92

SSE(ẏ) = 780.10 [slide 351]

x       y      ŷ       y - ŷ    (y - ŷ)²
145.0   62.6   75.92   -13.32   177.42
172.0   81.5   79.59     1.91     3.64
155.0   77.9   77.28     0.62     0.38
137.0   84.2   74.83     9.37    87.76
179.5   92.8   80.61    12.19   148.55
192.0   86.9   82.31     4.59    21.05
207.0   87.8   84.35     3.45    11.89
165.5   69.8   78.71    -8.91    79.35
193.0   71.6   82.45   -10.85   117.68
100.0   71.6   69.80     1.80     3.24
189.0   80.4   81.90    -1.50     2.26

SSE(ŷ) = 653.23

On n = 11 occasions, Susan Robords determined two values for different crickets: Y is the air temperature and X is the cricket's chirp rate in chirps per minute. In her campcraft class, she was told that one can calculate the air temperature with the following equation: ẏ = 37.5 + 0.25x. Above, we have a scatterplot of Susan's data with this line. The most obvious fact is that "calculate" was way too optimistic! [slide 352]
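The bookkeeping in the two SSE tables can be reproduced in a few lines. A sketch; the helper name `sse` is mine:

```python
# Cricket data: x = chirps per minute, y = air temperature.
x = [145.0, 172.0, 155.0, 137.0, 179.5, 192.0, 207.0, 165.5, 193.0, 100.0, 189.0]
y = [62.6, 81.5, 77.9, 84.2, 92.8, 86.9, 87.8, 69.8, 71.6, 71.6, 80.4]

def sse(b0, b1):
    """Sum of squared errors, SSE = sum of (y - (b0 + b1*x))**2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

print(f"{sse(37.5, 0.25):.2f}")   # 780.10  (the campcraft line)
print(f"{sse(56.2, 0.136):.2f}")  # 653.23  (the improved line)
```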

As Yogi Berra once said, "You can observe a lot by just watching." Let's follow his advice and examine the scatterplot and table above. We see that on some occasions, ẏ provides an accurate prediction of y. Visually, this is represented by circles that are on, touching, or nearly touching the line. But on many other occasions, the predictions are poor: the line is either far lower than the circle (the prediction is too small) or the line is far higher than the circle (the prediction is too large). Next, we do something very strange. We change perspective and instead of saying that the prediction is too small (large), we say that the observation is too large (small). Egocentric? Yes, but there are two reasons. First, look at the scatterplot and line again. It is easier to focus on the line and see how the points deviate from it, than it is to focus on all the points (n could be large) and see how the line deviates. [slide 353]

Second, we plan to compare ẏ and y by subtraction. We could use ẏ - y or y - ẏ. The former takes y as the standard and the latter reverses the roles. For circles below the line, I want this error to be a negative number; to get that I must subtract in the order y - ẏ; that is, I take the prediction as the standard and the observation errs by not agreeing. Look at the table again. The ideal for the error y - ẏ is 0. As the error moves away from 0, in either direction, the inadequacy of the prediction becomes more and more serious. For math reasons (and often it makes sense scientifically, at least approximately) we consider an error of, say, -5 to be exactly as serious as an error of +5. As in Chapter 12, we might be tempted to achieve this by taking the absolute value of each error, but, again, we get much better math results by squaring the errors. Finally, we sum all of the squared errors to obtain: SSE(ẏ) = 780.10. [slide 354]

Ideally, SSE = 0, and the larger it is, the worse the prediction. You are probably thinking that we need to adjust SSE to account for sample size, but we won't bother with that. Instead, we pose the following question: Can we improve on Susan's line? Or: Can we find another prediction line which has an SSE that is smaller than Susan's 780.10? I suggest the line ŷ = 56.2 + 0.136x. From the table, we see that SSE(ŷ) = 653.23. Thus, according to The Principle of Least Squares, ŷ is superior to ẏ. Can we do better than my ŷ? No. Major Result: There is always a unique line that minimizes SSE over all possible lines. The equation of the line is given as ŷ = b0 + b1x, where b1 = r(sY/sX) and b0 = ȳ - b1x̄. [slide 355]

For the cricket data, for example, it can be shown that x̄ = 166.8, sX = 31.0, ȳ = 78.83, sY = 9.11, and r = 0.461. Substituting these values into the above yields b1 = 0.461(9.11/31.0) = 0.1355 and b0 = 78.83 - 0.1355(166.8) = 56.23. Thus, the equation of the best prediction line is ŷ = 56.23 + 0.1355x, which I rounded in my earlier presentation of it. The means and sd's of the BA data were given on slide 343, and it has r = 0.554. Thus, b1 = 0.554(0.032/0.028) = 0.633 and b0 = 0.2636 - 0.633(0.2664) = 0.095. Thus, the equation of the regression line is ŷ = 0.095 + 0.633x. This line appears on page 471 of the text. [slide 356]
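The Major Result needs only five summary numbers. A sketch that recovers both fitted lines from the text's summary statistics (the function name `regression_line` is mine):

```python
def regression_line(xbar, ybar, sx, sy, r):
    """Least-squares slope and intercept: b1 = r*(sy/sx), b0 = ybar - b1*xbar."""
    b1 = r * (sy / sx)
    b0 = ybar - b1 * xbar
    return b0, b1

# Cricket study summary numbers:
b0, b1 = regression_line(166.8, 78.83, 31.0, 9.11, 0.461)
print(round(b0, 2), round(b1, 4))  # 56.23 0.1355

# BA study summary numbers (slide 343, r = 0.554):
b0, b1 = regression_line(0.2664, 0.2636, 0.028, 0.032, 0.554)
print(round(b0, 3), round(b1, 3))  # 0.095 0.633
```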

We have seen that it is easy to calculate ŷ and it is the best line possible (based on the principle of least squares), but is it any good? (Is Sylvester Stallone's best performance any good? Is there a reason he has never done Shakespeare?) First, note that we can see why r is so important. We need five numbers to calculate ŷ: two numbers that tell us about x only; two numbers that tell us about y only; and one number (r) that tells us how x and y relate to each other. In other words, r tells us all we need to know about the association between x and y. We obtain the regression line by calculating two numbers: b0 and b1. Thus, obviously, this pair of numbers is important. Also, b1, the slope, is important by itself; it tells us how a change in x affects our prediction ŷ. [slide 357]

Unlike mathematics, however, the intercept, b0, alone usually is not of interest. Now, in math, the intercept is interpreted as the value of y when x = 0. Consider our examples. For the cricket study, x = 0 gives us ŷ = 56.2. But we have no data at or near x = 0; thus, we really don't know what it means for x to equal 0. (Discuss.) Similarly, for the BA study, x = 0 predicts a 1986 BA of 0.095. But nobody batted at or near 0.000 in 1985. In fact, I conjecture that in the history of baseball there has never been a position player with at least 200 at-bats who batted 0.000. Consider the following scatterplot of fish activity versus water temperature for fish in an aquarium. (Should we use these data to predict fish activity for x = 32? For x = 212?) [slide 358]

[Figure: scatterplot of Fish Activity (300 to 500) versus Water Temp. (70 to 80 °F).]

The above considerations have resulted in some statisticians advocating a second way to write the equation for ŷ:

ŷ = ȳ + b1(x - x̄).

For the cricket study:

ŷ = 78.83 + 0.461(9.11/31.0)(x - 166.8) = 78.83 + 0.1355(x - 166.8). [slide 359]

This second formula contains three numbers, and they all have meaning: the mean of the predictor, the mean of the response, and the slope. For better or worse, this formulation has not become popular, and you are not responsible for it on the final. It does, however, give us an easy proof of one of the most important features of the regression line, something I like to call: The law of preservation of mediocrity! Suppose that a subject is mediocre on x; that is, the subject's x = x̄. What is the predicted response for this subject? Plugging x = x̄ into

ŷ = ȳ + b1(x - x̄)

we get

ŷ = ȳ + b1(x̄ - x̄) = ȳ + b1(0) = ȳ. [slide 360]
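The two ways of writing the cricket line agree, and the law of preservation of mediocrity is easy to confirm numerically (a spot check, not a proof; the function names are mine):

```python
xbar, ybar, b1 = 166.8, 78.83, 0.1355  # cricket study values from the text

def yhat_mean_form(x):
    """The line in the form yhat = ybar + b1*(x - xbar)."""
    return ybar + b1 * (x - xbar)

def yhat_intercept_form(x):
    """The same line written as yhat = b0 + b1*x, with b0 = 56.23."""
    return 56.23 + b1 * x

# A mediocre subject (x = xbar) is predicted to stay mediocre:
print(round(yhat_mean_form(xbar), 2))       # 78.83
# The two forms agree, up to the rounding of b0:
print(round(yhat_intercept_form(xbar), 2))  # 78.83
```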

Visually, the law of preservation of mediocrity means that the regression line passes through the point (x̄, ȳ). Let us return to the BA study. Recall that the regression line is ŷ = 0.095 + 0.633x. Recall that x̄ = 0.266 and ȳ = 0.264 are close, and the two sd's are similar. Thus, for the entire sample there was not much change in center or spread from 1985 to 1986. If you want, you can verify the numbers in the following table.

x                  ŷ
x̄ - 2sX = 0.210   0.229
x̄ - sX  = 0.238   0.246
x̄       = 0.266   0.264
x̄ + sX  = 0.294   0.282
x̄ + 2sX = 0.322   0.299

In my experience, many people find this table surprising, if not wrong. [slide 361]

In particular, for small values of x the predictions seem (intuitively) to be too large, while for large values of x the predictions seem to be too small. So, where is the error, in the regression line or in these people's intuition? Well, the quick answer is that the intuition is wrong. If you look at the data on page 443 of the text, you can verify the following facts: Of the 19 players with x ≥ 0.294, 15 had their BA decline in 1986. Of the 18 players with x ≤ 0.238, 11 had their BA increase in 1986. This is not a quirk of the BA study. It always happens when we do regression. It is easiest to see when, as above, X and Y have similar means and similar sd's, as in the BA study. This phenomenon is called the regression effect. (Discuss history.) [slide 362]

We can see the regression effect by rewriting the second version of the regression line as follows:

(ŷ - ȳ)/sY = r((x - x̄)/sX).

Ignore the r for a moment. Then the RHS of this equation is the standardized value of x; call it x'. The LHS is the value of ŷ standardized using the mean and sd of the y's; call it ŷ'. Thus, this equation becomes: ŷ' = rx'. For ease of exposition, suppose that larger values of x and y are preferred to smaller values and that r > 0. Consider a subject who has x' = 1. This is a talented subject; she is one sd better than the mean. Thus, we should predict that she will also be talented on y. The intuitive prediction is to ignore r and predict that ŷ' = 1. [slide 363]

But as we saw in the BA study, this intuitive prediction is too large. We need to include r. Now, if r = 1, then there is a perfect linear relationship between x and y, and x' = 1 will yield ŷ' = 1. (I conjecture that the reason people make intuitive predictions is that they tacitly assume r = 1; or, more likely, they don't have any feel for relationships that are not perfect.) So, what does the regression line tell us? It tells us that we must pay attention to the r. Look again at the equation: ŷ' = rx'. To make this argument precise, let's consider the BA study for which, recall, r = 0.554, which I will read as 55.4%. We see that for a talented player, say x' = 1, the predicted value is ŷ' = 0.554(1) = 0.554. [slide 364]
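In standardized units the whole prediction rule is ŷ' = rx'. A small sketch with the BA study's r = 0.554 (the function name is mine):

```python
r = 0.554  # BA study correlation

def predict_standardized(z_x):
    """Regression effect: a subject z_x sd's above the mean on x is
    predicted to be only r * z_x sd's above the mean on y."""
    return r * z_x

print(predict_standardized(1.0))   # 0.554
print(predict_standardized(-2.0))  # -1.108
```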

In words, for a player who is one sd above the mean on x, we predict that he will be 0.554 sd's above the mean on y. Thus, part of what we took for talent in x is transitory; dare I call it luck? And only part of the talent (r, to be exact) is passed on to the ŷ. I really like this interpretation of r; it is the proportion of the advantage in x that is transmitted to ŷ. A similar argument applies for x' < 0, the poor players. Only part (r again) of the poor performance in 1985 is predicted to carry over to 1986. Example: Brett Favre isn't so great! My subjects are the 32 NFL teams. X = the number of victories in 2005 and Y = the number of victories in 2006. Below are summary statistics: x̄ = ȳ = 8; sX = 3.389; sY = 2.896; r = 0.286. The regression line is: ŷ = 8 + 0.244(x - 8). [slide 365]

Thus, for x = 13, ŷ = 9.22, and for x = 4, ŷ = 7.02. Thus, using this equation, we would predict that the 2008 Jets would win 7.02 games and the 2008 Packers would win 9.22 games. And the regression line does not take into account the incredibly easy schedule the Jets play (8 games against the pathetic western conferences) and the difficult schedule for the Packers. Moreover, if Crosby makes the last-second field goal against the Vikings, the Packers are tied for first and would be the team in the playoffs. And this is not even allowing for the fact that the Packers lost to Tennessee because of a coin toss! Finally, for x = 16, ŷ = 9.95; so, even if Brady is not injured, the Patriots would be predicted to win about 10 games. For the remainder of the course, we ponder one last question: The regression line is the best line, but is it any good? [slide 366]
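The NFL predictions are just plug-ins to ŷ = 8 + 0.244(x - 8). A sketch verifying the three quoted values (`predict_wins` is my name for the rule):

```python
xbar = ybar = 8.0
sx, sy, r = 3.389, 2.896, 0.286

b1 = round(r * (sy / sx), 3)  # 0.244, the slope quoted in the text

def predict_wins(x):
    """yhat = 8 + 0.244*(x - 8), the NFL victories regression line."""
    return ybar + b1 * (x - xbar)

print(round(predict_wins(13), 2))  # 9.22
print(round(predict_wins(4), 2))   # 7.02
print(round(predict_wins(16), 2))  # 9.95
```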

I will provide two answers to this question: one objective and one subjective. Not surprisingly, I greatly prefer the subjective answer.

The coefficient of determination

Recall that SSE measures how badly the regression line fits the data, with SSE = 0 a perfect fit and larger values of SSE representing worse fits:

SSE = Σ(y - ŷ)².

Think of SSE as the best we can do using X to predict Y. (Implicit in this sentence is that we are predicting with a straight line.) It is natural to wonder: How well can we predict Y without using X? Let's think about this. All we know for each subject is its x and y values. It seems unfair to predict y by y (always have the carnival guy guess your weight before you get on the scale), and if we cannot use X, then all the subjects look the same to us; [slide 367]

that is, if we cannot use X, then we must make the same prediction, call it c, for each subject. Using c for each subject, the squared error for a subject is (y - c)², and the total of these is Σ(y - c)². By using algebra or calculus, we can prove that this total is minimized by taking c = ȳ. Thus, the best prediction of Y without using X is ȳ for every subject, and the total of the squared errors is:

SST = Σ(y - ȳ)².

Next, we note that for any set of data, SSE ≤ SST. Discuss. On page 476 of the text I have the values of SSE and SST for several studies. With real data, these numbers tend to be messy; thus, I will introduce my ideas with nice fake numbers. [slide 368]

Suppose that for a set of data SST = 100 and SSE = 30. This means that without using X our total squared error is 100, but by using X we can reduce it to 30. It seems obvious that we want to compare these numbers: SST - SSE = 100 - 30 = 70. The difficulty is: 70 what? So, we do one more comparison:

(SST - SSE)/SST = (100 - 30)/100 = 70/100 = 0.70,

or 70%. We call this ratio the coefficient of determination and denote it by R². An R² of 70% tells us that by using X to predict Y, 70% of the squared error in Y disappears or is accounted for (choose your favorite expression). Note that R² is a perfectly objective measure: it tells us how much better we do using X than we do not using X. For example, using X is 70% better than not using X, and if we get real happy about this we might overlook the fact that our predictions might still be bad. [slide 369]

(I have no doubt that Sylvester Stallone is a 99% or more better actor than I.) Finally, it can be shown algebraically that R² = r². This equation has a good and a bad consequence. First, the good. This reinforces the fact that a relationship with, say, r = -0.60 is exactly as strong as a relationship with r = 0.60, because they both have R² = 0.36. Second, the bad. This equation has inspired many people to wonder which is better, R² or r, which I find annoying. Worse yet, most seem to reach the wrong conclusion: that R² is better. (Discuss.)

Section 13.4: The last section (time for massive rejoicing)

Each subject has an x and a y. Once we determine the regression line, each subject has a ŷ and thus an error e = y - ŷ. Now things get funny. Statisticians got tired of explaining to clients that we are not being judgmental when we talk about each of their cases having an error. [slide 370]
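The identity R² = r² can be spot-checked on the cricket data by fitting the least-squares line at full precision (the slope is computed here as sxy/sxx, which is algebraically the same as r·sY/sX):

```python
# Cricket data again: x = chirps per minute, y = air temperature.
x = [145.0, 172.0, 155.0, 137.0, 179.5, 192.0, 207.0, 165.5, 193.0, 100.0, 189.0]
y = [62.6, 81.5, 77.9, 84.2, 92.8, 86.9, 87.8, 69.8, 71.6, 71.6, 80.4]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r = sxy / (sxx * syy) ** 0.5   # correlation coefficient
b1 = sxy / sxx                 # least-squares slope
b0 = ybar - b1 * xbar          # least-squares intercept

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = syy
r_squared = (sst - sse) / sst

print(round(r_squared, 4), round(r * r, 4))  # the two agree
```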

So, we decided to start calling e the residual, not the error. But we cannot call these r, because the letter r is ingrained as representing the correlation coefficient. Thus, just remember e stands for residual! If you study regression in the future, you will learn that the collection of residuals for a data set can be very useful. Here we will consider a few minor properties of them. We have n numbers: e1, e2, ..., en; a residual for each subject. The MITK thinks, "This reminds me of Chapter 12." So, we draw a picture (dot plot or histogram) of the residuals. Below is a frequency histogram of the residuals for the BA study.

[Figure: frequency histogram of the residuals for the BA study, vertical axis ticked at 10 and 20, horizontal axis ticked at -0.120, -0.060, 0.000 and 0.060.]

[slide 371]

What do we see? Well, there is one small outlier (guess who?). Note that outlier, as in Chapter 12, is a term for one set of numbers, while isolated case is a term for two-dimensional data. Also, we can see that every outlier is isolated, but a case can be isolated without having its residual be an outlier. Next, we calculate the mean and sd of the residuals. Fact: For any set of data, Σe = 0. (Discuss.) Thus, ē = 0. The sd of the residuals is denoted by s. Note that there is no subscript on this s. In Chapter 13 we have three sd's of interest: sX, sY and s. Statisticians think that the most important of these is the sd of the residuals; thus, it is unencumbered by a subscript. For the BA data, s = 0.027, 27 points. Because the distribution of the residuals is bell-shaped (curiously, it usually is for regression data), we can use the Empirical Rule of Chapter 12 to interpret s = 0.027. [slide 372]
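The facts about residuals can be checked numerically; since the BA raw data are not reproduced here, this sketch uses the cricket data with its exact least-squares fit:

```python
x = [145.0, 172.0, 155.0, 137.0, 179.5, 192.0, 207.0, 165.5, 193.0, 100.0, 189.0]
y = [62.6, 81.5, 77.9, 84.2, 92.8, 86.9, 87.8, 69.8, 71.6, 71.6, 80.4]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# Residuals e = y - yhat from the exact least-squares fit.
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(abs(sum(e)) < 1e-9)  # True: the residuals sum to 0, so their mean is 0
# Sum of x*e is also 0 for the least-squares fit:
print(abs(sum(xi * ei for xi, ei in zip(x, e))) < 1e-6)  # True
# The sd of the residuals (sample divisor, mean known to be 0):
s = (sum(ei ** 2 for ei in e) / (n - 1)) ** 0.5
print(round(s, 1))  # about 8.1 degrees for the cricket fit
```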

In particular, approximately 68% of the residuals are between -0.027 and +0.027. In words, for approximately 68% of the players, ŷ is within 27 points of y. (And for the other 32% of the data, ŷ and y differ by more than 27 points.) Here is the subjective part. As a baseball fan, I opine that these are not very good predictions. To err by 27 points is a lot in baseball. BTW, if FR is dropped from the data set, s is reduced to 0.025, which is better, but I still believe that the predictions are not very good. [slide 373]

As another example, for the Favre data, s = 2.821, which means that for approximately 32% of the teams ŷ misses y by 3 or more games. (By actual count, the number is 11 of 32, or 34%.) In my opinion this is a very bad prediction rule. I will note that there is another restriction on the residuals (other than that they sum to 0). It is: Σ(xe) = 0. The interested reader can refer to the book to see one reason this is useful. (This fact is used a great deal in advanced work on regression.) Finally, page 487 shows the (sometimes huge) effect a single isolated case can have on the value of r and, hence, the regression line. Page 487 and its consequences will not be on the final. [slide 374]