"It is the mark of a truly intelligent person to be moved by statistics." George Bernard Shaw

Chapter 0
Linear Regression and Correlation

Many things in real life are related. For instance, the price of a house is directly related to its floor area. The price of a car depends on its engine and model. Moreover, many measurements of body parts are related to one another. Furthermore, class rankings may be associated with rankings in standardized tests.

In Chapter 3, you learned how to measure the degree of association between two quantitative variables. In this chapter, you will recall this procedure and you will also learn to measure the degree of association between ranked variables. You will also learn how to use linear equations in modelling relationships among variables. Specifically, you will learn to:

- Illustrate the nature of bivariate data
- Construct a scatter plot
- Describe shape (form), trend (direction), and variation (strength) based on a scatter plot
- Calculate the Pearson product-moment correlation coefficient and interpret it
- Draw the best-fit line on a scatter plot
- Calculate the slope and y-intercept of the regression line and interpret them
- Predict the value of the dependent variable given the value of the independent variable
- Solve problems involving correlation and regression analysis
- Use regression analysis in modelling real-life data
- Calculate the Spearman rank correlation coefficient and interpret it

Concept Map

Big Ideas
Relationships in real-life situations can be measured and modelled.

Essential Questions
How do you measure the degree of relationship between two variables?
How can you model relationships among variables?

LINEAR CORRELATION

Consider the following final grades in algebra and statistics obtained by a sample of 12 students.

Student      A   B   C   D   E   F   G   H   I   J   K   L
Algebra      82  87  78  93  95  87  80  85  85  86  90  83
Statistics   84  85  75  92  96  90  80  86  83  84  92  85

A comparison of the grades of the students in these two subjects would lead you to ask the question: "Is there a relationship between these algebra and statistics grades?" Specifically, can you say that students who have high grades in algebra also have high grades in statistics?

It was Sir Francis Galton, a cousin of Charles Darwin, who introduced the idea of correlation analysis, a statistical method to determine if there is an association between two variables. Galton undertook detailed studies on human characteristics and found that there is a very strong relationship between the heights of fathers and the heights of their sons.

A very useful visual tool in the process of determining if there is any relationship between two variables, say X and Y, is the scatter plot. For the data set above, we obtain the n = 12 data points (x_i, y_i) on the Cartesian plane by using x_i = algebra grade and y_i = statistics grade of the ith student. Hence, each student is represented by a point, as shown in the following scatter plot.

Figure 1. Scatter plot of statistics versus algebra grades

It can be seen from the scatter plot that the data points may not fall exactly on a straight line, but they tend to follow a straight line with a positive slope very closely. This is an indication that there is a strong direct linear relationship between algebra and statistics grades, such that students with high grades in algebra are expected to have high grades in statistics as well. Hence, for this data set, you could say that there is a strong positive correlation between algebra grades and statistics grades.

After you have drawn the scatter plot and observed that there is a linear relationship between the two variables X and Y, you can then determine the appropriate correlation coefficient, which measures the strength of the linear relationship between two variables.

Pop-Up!
Linear correlation is a statistical method of determining the nature and strength of the linear relationship between two variables X and Y using a single numerical value known as the correlation coefficient.

Pearson's r

Karl Pearson developed a coefficient of linear correlation that can be used to determine the nature and strength of the linear relationship between two quantitative variables X and Y. This correlation coefficient is called Pearson's sample product-moment correlation coefficient, popularly known as Pearson's r, and is given by the following formula.

Pop-Up!

$$ r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}} $$

where:
x_i = ith value of the variable X
y_i = ith value of the variable Y
n = number of observations or data points

Note that this formula for Pearson's r is a simplified form of the sample correlation coefficient r given in Chapter 3.

The resulting value of this correlation coefficient ranges from −1 to +1. Specifically, there are two pieces of information that can be obtained from it, namely:

1. The positive (+) or negative (−) sign indicates the nature of the linear relationship between X and Y, where
   - r > 0 (positive correlation) indicates a direct linear relationship between X and Y (i.e., Y is expected to increase as X increases); and
   - r < 0 (negative correlation) indicates an indirect linear relationship between X and Y (i.e., Y is expected to decrease as X increases).

2. The magnitude of r, disregarding the + or − sign, indicates the strength of the linear relationship between X and Y, so that
   - |r| close to 1 indicates a strong correlation between X and Y;
   - |r| close to ½ indicates a moderate correlation between X and Y;
   - |r| close to 0 indicates a weak correlation between X and Y;
   - r = +1 indicates a perfect positive correlation between X and Y;
   - r = −1 indicates a perfect negative correlation between X and Y; and
   - r = 0 indicates that there is no linear relationship (zero correlation) between X and Y.
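The raw-sums formula for Pearson's r can be checked with a few lines of code. The sketch below is only an illustration: the function name `pearson_r` is hypothetical, and the two lists are assumed to be the algebra and statistics grades of the 12 sampled students.

```python
from math import sqrt

# Assumed data: final grades of the 12 sampled students (A through L).
algebra_grades = [82, 87, 78, 93, 95, 87, 80, 85, 85, 86, 90, 83]
statistics_grades = [84, 85, 75, 92, 96, 90, 80, 86, 83, 84, 92, 85]

def pearson_r(x, y):
    """Pearson's r computed from the raw sums of x, y, x^2, y^2 and xy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(algebra_grades, statistics_grades)
print(round(r, 4))  # 0.9389, which rounds to the 0.94 used in the text
```

The positive sign and the magnitude close to 1 correspond to a strong direct linear relationship under the interpretation rules above.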

The following are some prototype scatter plots.

Figure 2. Prototype scatter plots (panels: perfect positive correlation, low positive correlation, strong positive correlation, strong negative correlation, zero correlation, zero correlation)

It can be seen from these illustrations that a perfect correlation between X and Y means that all the data points in the scatter plot lie exactly on a straight line. In this case, it would be possible to give an accurate prediction of Y based on the known value of X. The closer the data points are to a linear pattern, the stronger the correlation between X and Y. And the farther the data points are from a linear pattern, the weaker the correlation between X and Y.

It should be noted that the correlation coefficient r is a measure of linear relationship between X and Y, so a zero correlation simply means that there is no linear relationship between X and Y. It does not eliminate the possibility that there may be some other kind of association between them. An example of this situation is given by the last diagram in the prototype scatter plots, where there is zero correlation between X and Y even though there is a strong quadratic (parabolic) relationship between them.

Example 1
For the given data on the algebra grades and statistics grades of the sample of n = 12 students, compute Pearson's r and interpret it.

Solution
Let X denote the algebra grade and let Y denote the statistics grade. The required computations to determine Pearson's r are shown in the table below.

Student   x       y       x²       y²       xy
A         82      84      6,724    7,056    6,888
B         87      85      7,569    7,225    7,395
C         78      75      6,084    5,625    5,850
D         93      92      8,649    8,464    8,556
E         95      96      9,025    9,216    9,120
F         87      90      7,569    8,100    7,830
G         80      80      6,400    6,400    6,400
H         85      86      7,225    7,396    7,310
I         85      83      7,225    6,889    7,055
J         86      84      7,396    7,056    7,224
K         90      92      8,100    8,464    8,280
L         83      85      6,889    7,225    7,055
Total     1,031   1,032   88,855   89,116   88,963

Using the formula for Pearson's r, you get

$$ r = \frac{12(88{,}963) - (1{,}031)(1{,}032)}{\sqrt{\left[12(88{,}855) - (1{,}031)^2\right]\left[12(89{,}116) - (1{,}032)^2\right]}} = 0.94, $$

which indicates that there is a strong positive correlation between the algebra grades and statistics grades of the students. This means that, to a high extent, students with high grades in algebra also tend to have high grades in statistics.

A statistic that is closely associated with the correlation coefficient is the sample coefficient of determination, r² × 100 (%), which gives the proportion of total variability in Y that can be explained or accounted for by the linear relationship with X. This coefficient can be used to compare the strengths of the linear relationships between two pairs of variables: X1 and Y1 versus X2 and Y2. Suppose the correlation coefficient between X1 and Y1 is r1 = 0.8 and the correlation coefficient between X2 and Y2 is r2 = 0.4, which correspond to r1² = 64% and r2² = 16%, respectively. Then we could say that the linear relationship between X1 and Y1 is four times as strong as the linear relationship between X2 and Y2.

Example 2
Using the given data on algebra grades and statistics grades, find the sample coefficient of determination and interpret it.

Solution
From Example 1, you obtained r = 0.94, which yields r² = 88.4%. This means that 88.4% of the total variability in the statistics grades can be accounted for by the linear relationship with the algebra grades. The remaining 11.6% of the variability in the statistics grades can be explained by other factors besides the algebra grades.

Testing the significance of Pearson's population correlation coefficient ρ

In addition to estimating the linear relationship between two numerical variables X and Y using Pearson's r, you can also draw an inference about the true linear relationship between X and Y. To test for the significance of the linear relationship between two numerical variables X and Y, you test the null hypothesis H0: ρ = 0 against an appropriate alternative hypothesis Ha using the test statistic

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, $$

which has the t distribution with ν = n − 2 degrees of freedom. The null hypothesis H0: ρ = 0 means that there is no significant linear relationship between X and Y, where the parameter ρ denotes the unknown true value of the correlation coefficient. At a level of significance α, you reject H0 according to the decision rules given below.

Ha       Decision rule: reject H0 if                     Interpretation
ρ ≠ 0    t < −t(α/2, n − 2) or t > t(α/2, n − 2)         There is a significant linear relationship between X and Y.
ρ > 0    t > t(α, n − 2)                                 There is a significant positive correlation between X and Y.
ρ < 0    t < −t(α, n − 2)                                There is a significant negative correlation between X and Y.

Otherwise, fail to reject H0.

Recall from Example 1 that the Pearson's r between X = algebra grade and Y = statistics grade from a sample of 12 students is r = 0.94. Recall also that this sample correlation coefficient indicates a strong positive correlation between the two variables. It corresponds to a sample coefficient of determination r² = 0.8836, which indicates that approximately 88.4% of the variation in statistics grades (Y) can be accounted for by a linear relationship with algebra grades (X).

To test the significance of the linear relationship between the algebra grades and statistics grades using a significance level of 5%:

Step 1. H0: There is no significant linear relationship between algebra grades and statistics grades, that is, ρ = 0.
        Ha: There is a significant linear relationship between algebra grades and statistics grades, that is, ρ ≠ 0.
Step 2. α = 0.05.
Step 3. The test statistic to use is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with ν = n − 2 degrees of freedom.
Step 4. Reject H0 if t < −t(0.025, 10) = −2.2281 or t > t(0.025, 10) = 2.2281. Otherwise, fail to reject H0.
Step 5. Substituting the available information in the test statistic, you get

$$ t = \frac{0.94\sqrt{12-2}}{\sqrt{1-(0.94)^2}} = 8.7127. $$

Step 6. Since the computed value of the test statistic t is greater than 2.2281 and hence falls in the critical region, reject H0.
Step 7. At α = 5%, you have sufficient evidence to indicate a significant linear relationship between algebra grades and statistics grades.

Note that the p-value associated with the computed test statistic, 8.7127, is 0.000003. Since the p-value is less than α, that is, 0.000003 < 0.05, you reject H0 and arrive at the same conclusion.
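The seven-step test reduces to one formula and one comparison. The sketch below is a minimal illustration, not a general-purpose routine: the function names are hypothetical, and the critical value 2.2281 is assumed to be the tabulated two-tailed 5% value of the t distribution with 10 degrees of freedom rather than something the code derives.

```python
from math import sqrt

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def two_tailed_decision(t, critical_value):
    """Reject H0 when t falls in either tail beyond the critical value."""
    return "reject H0" if abs(t) > critical_value else "fail to reject H0"

# r = 0.94 (rounded) and n = 12 come from the algebra/statistics example.
t = t_statistic(0.94, 12)
# Assumed table value: t(0.025, 10) = 2.2281.
decision = two_tailed_decision(t, 2.2281)
print(round(t, 2), decision)  # 8.71 reject H0
```

For a one-tailed alternative, the comparison would use the signed value of t against t(α, n − 2) instead of the absolute value.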

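Example 4
The data from a study of the effectiveness of a one-month physical exercise program in weight reduction, using a sample of eight persons, are shown in the table below.

Weight (in pounds)
Person   1     2     3     4     5     6     7     8
Before   209   178   169   212   180   192   158   180
After    196   171   170   207   177   190   159   180

Compute and test the significance of Pearson's r to determine if there is a significant linear relationship between the weight before and the weight after the one-month physical exercise program.

Solution
Let X represent the weight in pounds before the physical exercise program and let Y represent the weight after the physical exercise program.

Pearson's r:

$$ r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}} = \frac{8(269{,}878) - (1{,}478)(1{,}450)}{\sqrt{\left[8(275{,}498) - (1{,}478)^2\right]\left[8(264{,}516) - (1{,}450)^2\right]}} = 0.9768 $$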

Test of significance of Pearson's population correlation coefficient ρ:

Step 1. H0: There is no significant linear relationship between the weights before and after the one-month exercise program, that is, ρ = 0.
        Ha: There is a significant linear relationship between the weights before and after the one-month exercise program, that is, ρ ≠ 0.
Step 2. α = 0.05.
Step 3. The test statistic to use is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with ν = n − 2 degrees of freedom.
Step 4. Reject H0 if t < −t(0.025, 6) = −2.4469 or t > t(0.025, 6) = 2.4469. Otherwise, fail to reject H0.
Step 5. Substituting the available information in the test statistic, you get

$$ t = \frac{0.9768\sqrt{8-2}}{\sqrt{1-(0.9768)^2}} = 11.1800. $$

Step 6. Since the computed value of the test statistic t is greater than 2.4469 and hence falls in the critical region, reject H0.
Step 7. At α = 5%, you have sufficient evidence to indicate a significant linear relationship between the weights before and after the one-month exercise program.

Note that the p-value associated with the computed test statistic, 11.1800, is 0.00003. Since the p-value is less than α, that is, 0.00003 < 0.05, you reject H0 and arrive at the same conclusion.
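Example 4 can be reproduced end to end as a check on the arithmetic. This is a sketch under stated assumptions: the two lists are assumed to be the eight before and after weights of the example, the helper names are hypothetical, and 2.4469 is taken as the tabulated value t(0.025, 6).

```python
from math import sqrt

# Assumed data: weights in pounds of the eight persons in Example 4.
before = [209, 178, 169, 212, 180, 192, 158, 180]
after = [196, 171, 170, 207, 177, 190, 159, 180]

def pearson_r(x, y):
    """Pearson's r from the raw-sums formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

n = len(before)
r = pearson_r(before, after)
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)  # unrounded r is used here
# Assumed table value: t(0.025, 6) = 2.4469.
decision = "reject H0" if abs(t) > 2.4469 else "fail to reject H0"
print(round(r, 4), round(t, 2), decision)  # 0.9768 11.18 reject H0
```

Carrying the unrounded r into the test statistic gives about 11.18; substituting the rounded 0.9768 gives a slightly smaller value, which does not change the decision.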

Example 5
Nutritionists claim that individuals tend to report decreasing dietary intake the more they are interviewed. Data from a sample of eight female university students are shown below:

Student   1 2 3 4 5 6 7 8
Day 1     905 37 863 9 48 06 705
Day 2     658 479 00 6 999 097 83 44

Compute and test the significance of Pearson's r to determine if there is a significant linear relationship between the recorded dietary intake on day 1 and the recorded dietary intake on day 2.

Solution
Let X represent the recorded dietary intake on day 1 and let Y represent the recorded dietary intake on day 2.

Pearson's r:

$$ r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}} = 0.4489 $$

Test of significance of Pearson's ρ:

Step 1. H0: There is no significant linear relationship between the recorded dietary intake on day 1 and the recorded dietary intake on day 2, that is, ρ = 0.
        Ha: There is a significant linear relationship between the recorded dietary intake on day 1 and the recorded dietary intake on day 2, that is, ρ ≠ 0.
Step 2. α = 0.05.
Step 3. The test statistic to use is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with ν = n − 2 degrees of freedom.
Step 4. Reject H0 if t < −t(0.025, 6) = −2.4469 or t > t(0.025, 6) = 2.4469. Otherwise, fail to reject H0.
Step 5. Substituting the available information in the test statistic, you get

$$ t = \frac{0.4489\sqrt{8-2}}{\sqrt{1-(0.4489)^2}} = 1.2304. $$

Step 6. Since the computed value of the test statistic t does not fall in the critical region, that is, 1.2304 is neither less than −2.4469 nor greater than 2.4469, you fail to reject H0.
Step 7. At α = 5%, you do not have sufficient evidence to indicate a significant linear relationship between the recorded dietary intake on day 1 and the recorded dietary intake on day 2.

Note that the p-value associated with the computed test statistic, 1.2304, is 0.2646. Since the p-value is greater than α, that is, 0.2646 > 0.05, you fail to reject H0 and arrive at the same conclusion.

Spearman's rho

A corresponding correlation coefficient that can be used to measure the strength of the association between two variables on the ordinal scale, especially when there are only a few data points, is Spearman's rank-order correlation coefficient, r_s, or simply Spearman's rho. Under Spearman's rho, the data consist of two sets of rankings corresponding to the values of the variables X and Y. Just like Pearson's r, the resulting values of Spearman's rho range from −1 to +1, and you can interpret Spearman's rho in a manner similar to Pearson's r.

The procedure for calculating Spearman's rho is to compare the rankings on the variables X and Y for the subjects under study. Average the ranks of tied observations, if any. The difference between each pair of ranks is denoted by d. These differences are squared, added, and then used to calculate the following correlation coefficient:

Pop-Up!

$$ r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}, $$

where:
d_i = difference between the ranks assigned to the ith data point (x_i, y_i)
n = number of pairs of data.
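The ranking procedure, including the averaging of tied ranks, can be sketched as follows. The helper names `average_ranks` and `spearman_rho` are hypothetical, and the two lists are assumed to be the nine preliminary scores and final rankings of the campus beauty search in Example 6, with rank 1 given to the lowest score.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho from the squared differences of the two sets of ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Assumed data: candidates A through I.
preliminary_scores = [82, 87, 78, 95, 85, 81, 84, 85, 90]
final_rankings = [2, 6, 1, 8, 5, 4, 3, 7, 9]

rho = spearman_rho(preliminary_scores, final_rankings)
print(average_ranks(preliminary_scores)[4])  # 5.5, the tied rank of candidate E
print(round(rho, 4))  # 0.9042
```

Sorting once and walking runs of equal values keeps the tie handling explicit, which mirrors the "mean of ranks 5 and 6" step done by hand.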

Example 6
The following table gives the preliminary scores and the final rankings obtained by a group of nine female students in a campus beauty search. Find the degree of association between preliminary score and final ranking.

Candidate   Preliminary Score    Final Ranking
A           82                   2
B           87                   6
C           78                   1 (worst)
D           95                   8
E           85 (tied with H)     5
F           81                   4
G           84                   3
H           85 (tied with E)     7
I           90                   9 (best)

Solution
Using the variables X = rank based on the preliminary score and Y = final ranking, the following table gives the rankings on X and Y and the differences in ranks for the n = 9 pairs of observations. The computation for Σd² is shown in the last column.

Candidate   x      y    d      d²
A           3      2    1      1
B           7      6    1      1
C           1      1    0      0
D           9      8    1      1
E           5.5*   5    0.5    0.25
F           2      4    −2     4
G           4      3    1      1
H           5.5*   7    −1.5   2.25
I           8      9    −1     1
                        Σd² = 11.5
*mean of ranks 5 and 6

Substituting into the formula for r_s, we find that

$$ r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6(11.5)}{9(81-1)} = 0.9042, $$

which suggests a strong positive correlation between the preliminary scores and the final rankings of the beauty contestants. This means that the rankings obtained by the contestants based on their preliminary scores generally agree with their final rankings.

Correlation versus causation

We end this section by taking note of a possible misuse in the interpretation of the correlation coefficient. It should be emphasized that the correlation between two variables X and Y, no matter how strong it is, does not necessarily imply causation between the two variables. A high correlation simply indicates that there is a strong linear association between the two variables, even when no cause-and-effect relationship exists between them. It could be that there is a third factor that is correlated with, as well as the cause of, these two variables. For example, it is a known fact that, compared to other months of the year, sales increase and at the same time the temperature gets colder during the Christmas season in December. Hence, in this case, there is an indirect correlation between sales and temperature. But it would be illogical to say that higher sales cause the temperature to go down or that the lower temperature causes sales to go up. Instead, the real reason for the higher sales and lower temperature during this month is the fact that it is the Christmas season.

SIMPLE LINEAR REGRESSION ANALYSIS

In the previous section, we learned that correlation analysis is used to determine if there is a relationship between two quantitative variables X and Y. In this section, you will learn another technique for establishing such a relationship between X and Y, namely regression analysis. Although the assignment of X and Y is done arbitrarily between the two quantitative variables in correlation analysis, this is not the case in regression analysis. Here, the independent or predictor variable is denoted as X, while the dependent or response variable is denoted as Y. Hence, in regression analysis, you want to see the effect of X on Y.

Example 7
For the following studies, identify the dependent and independent variables of interest.
(a) The president of a homeowners association wants to predict the monthly association dues based on the number of cars owned by the homeowners.
(b) An educator wants to investigate the effect of the number of hours of sleep the day before the exam on the exam score.
(c) A nutritionist wants to determine if the weight (in kg) of adolescents depends on their usual food intake (in calories).

Solution
(a) The independent variable is the number of cars owned, while the dependent variable is the monthly association dues.
(b) The independent variable is the number of hours of sleep, while the dependent variable is the exam score.
(c) The independent variable is the usual food intake, while the dependent variable is weight.

The scatter plot was very helpful in visualizing such relationships between X and Y. If a linear pattern is evident from the scatter plot, we may be interested in obtaining an estimate of such a line, and this is done using regression analysis.

Regression Line

From Figure 1, we observed that the data points tend to follow very closely a straight line with a positive slope. Such a regression line is drawn in the scatter plot presented below.

Figure. Scatter plot of statistics versus algebra grades, with the fitted line Statistics = −6.8181 + 1.0803x

The regression line is used to predict or estimate the expected value of Y (called the dependent or response variable) corresponding to given values of X (called the independent or explanatory variable). The functional form of the regression line is given by the simple linear regression model

$$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, $$

where Y_i is the ith observed value of the dependent variable, X_i is the ith observed value of the independent variable, β0 is the y-intercept or regression constant, β1 is the slope or regression coefficient, and ε_i is the ith random error associated with Y_i, for all i = 1, 2, …, n.

The regression parameters or coefficients β0 and β1 are unknown, but we can estimate them using the method of least squares. In this method, the regression line that best fits the data is obtained. With the aid of calculus, this is done by obtaining the sum of the squared deviations between the actual Y and its expected value, given by

$$ \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2, $$

and minimizing it. The least squares estimators of the parameters β1 and β0 are, respectively, given by the following formulas.

Pop-Up!

$$ b_1 = \hat{\beta}_1 = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}} $$

$$ b_0 = \hat{\beta}_0 = \bar{y} - b_1 \bar{x} $$

Thus, the best-fit or regression line is expressed as $\hat{y} = b_0 + b_1 x$.

Example 8

For the data on the algebra and statistics grades of a random sample of 12 students given in Example 1, find the regression line.

Solution
Using the estimator of the slope of the regression line, you have

$$ b_1 = \frac{88{,}963 - \dfrac{(1{,}031)(1{,}032)}{12}}{88{,}855 - \dfrac{(1{,}031)^2}{12}} = 1.0803, $$

while the estimator of the y-intercept is given by

$$ b_0 = \frac{1{,}032}{12} - 1.0803\left(\frac{1{,}031}{12}\right) = -6.8181. $$

Thus, the regression line given in Figure _?_ is $\hat{y} = -6.8181 + 1.0803x$. With this equation, we can predict the expected statistics grade of a student who obtained an algebra grade of 90. It is given by $\hat{y} = -6.8181 + 1.0803(90) = 90.4089$.

The slope and y-intercept of the best-fit line are interpreted in a manner similar to the interpretation of any linear equation. That is, the slope represents the expected amount of change in Y for every one-unit change in X. On the other hand, the y-intercept is the expected value of Y when X = 0, provided the scope of the model includes X = 0.
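The least squares estimates can be reproduced directly from the raw sums. The following is a minimal sketch in which the function name `least_squares_line` is hypothetical and the lists are assumed to be the 12 algebra and statistics grades of Example 1.

```python
# Assumed data: final grades of the 12 sampled students (A through L).
algebra_grades = [82, 87, 78, 93, 95, 87, 80, 85, 85, 86, 90, 83]
statistics_grades = [84, 85, 75, 92, 96, 90, 80, 86, 83, 84, 92, 85]

def least_squares_line(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

b0, b1 = least_squares_line(algebra_grades, statistics_grades)
predicted = b0 + b1 * 90  # expected statistics grade for an algebra grade of 90
print(round(b1, 4), round(b0, 4), round(predicted, 2))  # 1.0803 -6.8181 90.41
```

The prediction agrees with the hand computation to two decimal places; the small difference in later digits comes from the hand computation using the rounded coefficients.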

Example 9
Interpret the regression coefficients obtained in Example 8.

Solution
Since the obtained slope of the regression line is 1.0803, there is an estimated average increase of 1.0803 units in the statistics grade for every one-unit increase in the algebra grade. This estimate applies to algebra grades ranging from 78 to 95, the lowest and highest algebra grades reported by the students, respectively. Since 0 is not within this range, it is meaningless to interpret the y-intercept of −6.8181.

Testing the Significance of β1

In addition to the best-fit line that describes the linear relationship between X and Y, you can also make inferences regarding the regression parameters. Inference concerning β1 is particularly important since it can determine if there indeed exists a linear relationship between X and Y. For testing H0: β1 = 0 (i.e., there is no linear relationship between X and Y) against H1: β1 ≠ 0 (i.e., there is a significant linear relationship between X and Y), you can follow the steps in testing for the significance of Pearson's ρ discussed earlier. This is because the two hypotheses H0: ρ = 0 and H0: β1 = 0 are equivalent.
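The equivalence of the two hypotheses can be seen numerically: the t statistic for the slope coincides with the t statistic for the correlation. The sketch below assumes the standard result that the standard error of b1 is the square root of MSE / SSx, which is not stated in this chapter; the function name and the use of the algebra and statistics grades are likewise illustrative assumptions.

```python
from math import sqrt

# Assumed data: final grades of the 12 sampled students (A through L).
algebra_grades = [82, 87, 78, 93, 95, 87, 80, 85, 85, 86, 90, 83]
statistics_grades = [84, 85, 75, 92, 96, 90, 80, 86, 83, 84, 92, 85]

def slope_t_and_correlation_t(x, y):
    """Return (t for H0: beta1 = 0, t for H0: rho = 0) for the same data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    sp_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sp_xy / ss_x
    sse = ss_y - b1 * sp_xy             # residual (error) sum of squares
    se_b1 = sqrt(sse / (n - 2) / ss_x)  # assumed standard-error formula
    t_slope = b1 / se_b1
    r = sp_xy / sqrt(ss_x * ss_y)
    t_corr = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    return t_slope, t_corr

t_slope, t_corr = slope_t_and_correlation_t(algebra_grades, statistics_grades)
print(round(t_slope, 2), round(t_corr, 2))  # 8.62 8.62
```

Both values are about 8.62 because the unrounded r is used; the 8.7127 obtained in the earlier test comes from substituting the rounded r = 0.94. Either way the statistic exceeds 2.2281, so the conclusion is the same.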

Diagnostic Checking

Such inference is valid provided that the assumptions underlying the simple linear regression model are satisfied. These assumptions include the following:

(i) ε_i must be normally distributed.
(ii) ε_i must have constant variance for all levels of the independent variable.
(iii) ε_i must be uncorrelated.
(iv) The relationship between X and Y is linear.

Residual analyses are then performed to determine if these assumptions are satisfied. These are done using graphical tools and statistical tests. However, these are beyond the scope of this book.

Measure of Model Adequacy

The coefficient of simple determination R², also known as the measure of goodness-of-fit, discussed under correlation analysis, is likewise computed to further assess the usefulness of the simple linear regression model for prediction purposes. The formula for R² is given by

$$ R^2 = \frac{b_1\, SP_{xy}}{SS_y}, $$

where $SP_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ is the sum of products of X and Y and $SS_y = \sum (y_i - \bar{y})^2$ is the total sum of squares of Y. R² measures the proportion of the total variation in Y that is explained by the simple linear regression model that utilizes X. It has values between 0 and 1. The larger the R², the more of the total variation of Y is explained by X.
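As a quick numerical illustration of the R² formula, the sketch below evaluates b1 · SPxy / SSy for the algebra and statistics grades. The function name `r_squared` is hypothetical, and SPxy and SSy are assumed to carry their usual meanings (sum of products and total sum of squares of Y).

```python
# Assumed data: final grades of the 12 sampled students (A through L).
algebra_grades = [82, 87, 78, 93, 95, 87, 80, 85, 85, 86, 90, 83]
statistics_grades = [84, 85, 75, 92, 96, 90, 80, 86, 83, 84, 92, 85]

def r_squared(x, y):
    """Coefficient of determination computed as b1 * SPxy / SSy."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    b1 = sp_xy / ss_x
    return b1 * sp_xy / ss_y

r2 = r_squared(algebra_grades, statistics_grades)
print(round(r2, 4))  # 0.8815
```

The result, about 88.1%, is slightly below the 88.4% quoted in the text because that figure squares the rounded r = 0.94.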

Example 10
Using the given data on algebra grades and statistics grades, interpret the R² of 88.4% computed in Example 2.

Solution
Since 88.4% is near 100%, the regression line is a good fit for the data on algebra and statistics grades.

Chapter Performance Tasks

Modelling Real-Life Relationships

Collect real-life data involving at least two quantitative variables and two ranked variables. Examples of real-life data that you may gather are as follows:

1. measurements of body parts such as height, arm span, weight, and kneeling height
2. first grading period and second grading period grades, English and math grades, science and math grades, etc.
3. daily allowance and math grade
4. daily allowance and daily expenses

Draw a scatter plot, compute Pearson's r, set up the simple linear regression model (SLRM) from the data, and draw inferences.

Possible data for Spearman's rho:

1. the students' rankings in the first grading period and the second grading period
2. a common list of things, such as hobbies, made by the students and ranked by, say, boys versus girls, based on their preference
3. a common list of collegiate degree programs made by the students and ranked by two groups based on their preference
4. ranks of the top 10 students in the last grading period and their daily allowance

Rank the values of the variables, compute Spearman's rho, and interpret it.

Statistics Links

Pearson's r was named after Karl Pearson (27 March 1857 – 27 April 1936), an English mathematician and biostatistician. His other contributions to classical statistical methods include the method of moments, Pearson's chi-squared test, and principal components analysis.

Key Concepts / Terms

Linear correlation
Perfect correlation
Positive correlation
Negative correlation
Zero correlation
Pearson's r
Spearman's rho
Coefficient of determination

Chapter Assessment

1. The value of the correlation coefficient r, as well as r_s, is always between
   A. and   B. and   C. 0 and 1   D. 0 and 100

2. The coefficient of determination r² could assume values ranging from
   A. and   B. and   C. 0 and 1   D. 0 and 100

3. Which of the following statements is true?
   A. A perfect correlation between the variables X and Y implies a cause-and-effect relationship between these two variables.
   B. A positive correlation indicates a strong linear relationship.
   C. A negative correlation indicates that there is no linear relationship between the two variables.
   D. A near-zero correlation between X and Y suggests a very weak linear relationship.

4. Which of the following indicates a strong, but not perfect, linear relationship between X and Y?
   A.   B. 0.9   C. 0.03   D. 0.57

5. Which of the following indicates a perfect linear relationship between X and Y?
   A.   B. 0.9   C. 0.03   D. 0.57

6. Which of the following indicates a moderate and direct linear relationship between X and Y?
   A.   B. 0.9   C. 0.03   D. 0.57

7. Which of the following indicates that there is no linear relationship between X and Y?
   A.   B. 0.9   C. 0.03   D. 0.57

8. A negative correlation between two variables X and Y suggests that
   A. there is no correlation between X and Y.
   B. small values of X are associated with small values of Y.
   C. large values of X are associated with small values of Y.
   D. the predicted value of the dependent variable Y is always negative.

9. Given the regression equation y = 3( + x), the corresponding Pearson's r is
A. 3
B.
C. 3
D. cannot be determined
10. If the Pearson's r has a value of , then the slope of the corresponding linear regression equation is
A.
B.
C. negative
D. positive
11. In the estimated regression model ŷ = b0 + b1x, which quantity gives the slope of the regression line?
a) b0
b) b1
c) ŷ
d) x
12. What does b1 represent in the regression model ŷ = b0 + b1x?
a) Value of y when x = 0.
b) Value of x when y = 0.
c) Increase in the value of x for a unit increase in y.
d) Increase in the value of y for a unit increase in x.

13. What does b0 represent in the regression model ŷ = b0 + b1x?
a) Value of ŷ when x = 0.
b) Value of x when y = 0.
c) Increase in the value of x for a unit increase in y.
d) Increase in the value of y for a unit increase in x.
14. Which of the following quantities always has the same sign as r?
a) b0
b) b1
c) ŷ
d) x

For numbers 15-17. Given the following data:
X: 1 2 3 4 5
Y: 10 13 14 15 17
15. What is the estimated regression line?
A. ŷ = 9 + 1.6x
B. ŷ = 1.6 + 9x
C. ŷ 5. 5x
D. ŷ .5 5x

16. What is the predicted value of y when x = 2.5?
A. 13
B. 5
C. 0
D.
17. Which of the following statements about x and y is true?
A. As x increases, y increases.
B. As x decreases, y increases.
C. As x increases, y decreases.
D. The relationship of x and y cannot be determined.

For nos. 18-20. Given the following data:
X: 1 2 3 4 5
Y: 30 28 25 22 17
18. What is the estimated regression line?
A. ŷ = 34 + 3.2x
B. ŷ = 34 - 3.2x

C. ŷ = 3.2 + 34x
D. ŷ = 3.2 - 34x
19. Which of the following statements is true?
A. For every unit increase in x, there is a 3.2 increase in y.
B. For every unit increase in x, there is a 3.2 decrease in y.
C. For every unit increase in x, there is a 34 increase in y.
D. For every unit increase in x, there is a 34 decrease in y.
20. What is the predicted value of y when x = 5?
A. 18
B. 0
C. 38
D. 50
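Items that ask for an estimated regression line and a predicted value can be checked with a short least-squares computation. The sketch below is one way to do it, under stated assumptions: the data are taken to be X = 1, 2, 3, 4, 5 and Y = 30, 28, 25, 22, 17 (a reading of the table for nos. 18-20, not a confirmed value), and `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Return (b0, b1) of the least-squares line y-hat = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1

# Assumed data (see the note above)
x = [1, 2, 3, 4, 5]
y = [30, 28, 25, 22, 17]

b0, b1 = fit_line(x, y)
y_hat_at_5 = b0 + b1 * 5        # predicted y when x = 5

print("intercept b0:", round(b0, 2))
print("slope b1:", round(b1, 2))
print("predicted y at x = 5:", round(y_hat_at_5, 2))
```

Under these assumed values the fit should come out close to ŷ = 34 - 3.2x. The negative slope means y decreases as x increases, and the slope has the same sign as r.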

Chapter Workout

1. Interpret each of the following correlation coefficients between two variables X and Y.
(a) 0.4
(b) 0.9
(c)
(d) 0.85
(e) 0.5
2. Identify the type of correlation that exists between each of the following pairs of variables.
(a) quiz scores and final grade
(b) market value and age of a piece of equipment
(c) incidence of lung cancer and smoking level
(d) IQ and height
(e) height and weight
3. The following table shows the number of hours spent studying and the grade obtained by a student in each of his 6 subjects during the last grading period.
No. of hours: .5 .0 4.0 .0 3.0 .5
Grade: 85 85 95 87 90 9
a) Compute and interpret Pearson's r.
b) Find the estimated regression line.

4. The following table shows the rankings given by two judges to the eight entries in a poster-making contest.
Entry #: 1 2 3 4 5 6 7 8
Judge A: 6 8 3 5 7 4
Judge B: 3 4 6 7 5 8
Use Spearman's rho, rs, to determine if the two judges agree on the rankings that they gave to the entries.
5. Identify the nature of the correlation between the two numerical variables X and Y in each of the following scatter plots.
(a) [scatter plot]

(b) [scatter plot]
(c) [scatter plot]
6. Given the following data:
x: 4 5 6 3 9 8
y: .5 .7 .0 .9 .3 .8 . .0
A. Compute and interpret Pearson's r.
B. What proportion of the total variability in Y can be explained by the linear relationship with X?

7. The following data give the IQ and shoe size of a sample of 10 students.
Student: A B C D E F G H I J
IQ: 05 00 00 90 95 95 0 95 00 05
Shoe size: 7 9 7.5 8 8 8.5 0 8.5 9
Compute Pearson's r and interpret.
8. To determine if there is an association between the price and the quality rating of a certain household appliance, the following data on the prices (in pesos) and the quality ratings (1-worst to 7-best) of seven brands of the household appliance were recorded.
Brand: A B C D E F G
Price: ,780 ,500 ,500 ,000 ,00 ,800 ,00
Quality: 7 4 3 6 5
Compute Spearman's rho, rs, and interpret.

9. The following are the data for a random sample of 8 households on the number of members (X) and the daily expenditure on food (Y).
Household: 1 2 3 4 5 6 7 8
X: 5 3 0 4 4 5 7 6
Y: 50 0 300 80 00 00 40 30
A. Find the estimated regression line.
B. Find the estimated value of daily expenditure on food when the number of members is 5.
10. Suppose a construction company keeps a record of the number of workers (X) and the number of working days to finish a 00 sq m two-storey house (Y).
X: 5 3 0 4 0
Y: 94 08 8 00 0 0
A. Find the estimated regression line.
B. Interpret b1.
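Several workout items ask for an estimated regression line, an interpretation of b1, and the proportion of variability in Y explained by X. The sketch below shows one way these pieces fit together. All data values are hypothetical (they are not the values from any item above), and `fit_line` and `r_squared` are illustrative names.

```python
def fit_line(x, y):
    """Return (b0, b1) of the least-squares line y-hat = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def r_squared(x, y):
    """Proportion of the variability in y explained by the fitted line."""
    b0, b1 = fit_line(x, y)
    mean_y = sum(y) / len(y)
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained
    sst = sum((b - mean_y) ** 2 for b in y)                    # total
    return 1 - sse / sst

# Hypothetical data: household members and daily food expenditure
members = [2, 3, 4, 5, 6]
food_cost = [100, 130, 150, 190, 210]

b0, b1 = fit_line(members, food_cost)
r2 = r_squared(members, food_cost)

print("estimated line: y-hat =", round(b0, 2), "+", round(b1, 2), "x")
print("estimated cost for 5 members:", round(b0 + b1 * 5, 2))
print("proportion of variability explained:", round(r2, 4))
```

Here b1 is read as the estimated change in daily food expenditure for each additional household member, and r² multiplied by 100 gives the percentage of the variability in Y explained by the linear relationship with X.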