Section 9-1 Correlation

A correlation is a ____ between two ____. The data can be represented by ordered pairs (x, y), where x is the ____ (or ____) variable and y is the ____ (or ____) variable.

There are several types of correlations that can be ascertained by graphing a scatter-plot of the ordered pairs and looking at the pattern.
If the dots tend to run from left to right in a more or less ____ fashion, the correlation is ____.
If the dots tend to run from left to right in a more or less ____ fashion, the correlation is ____.
If the dots tend to be all over the graph with ____ pattern, the correlation is ____.
If the dots form a pattern other than a ____ (____, for example), the correlation is ____.

The correlation coefficient is a measure of the ____ and ____ of a relationship between two ____.
The ____ correlation coefficient is denoted by the letter ____.
The ____ correlation coefficient is denoted by ____, the Greek letter ____ (pronounced "row").
The correlation coefficient runs from ____ to ____; the closer the value is to either end, the ____ the ____ is.
A correlation coefficient of ____ would signify a ____ linear relationship.
A correlation coefficient of ____ would signify a ____ linear relationship.
A correlation coefficient of ____ would signify ____ linear relationship.

While there is a formula for finding the value of r, we are going to use the calculator to find this for us.

Steps to graphing a scatter-plot and finding the correlation coefficient on the calculator:
1) Turn StatPlot on: 2nd Y=, select Plot 1, turn it on, and make sure that it is looking at L1 and L2.
2) STAT-EDIT, enter data points. Use L1 for the x-values and L2 for the y-values.
3) Set your window (WINDOW):
Set x-min to a number less than the smallest x-value in your list.
Set x-max to a number greater than the largest x-value in your list.
Set y-min to a number less than the smallest y-value in your list.
Set y-max to a number greater than the largest y-value in your list.
4) Hit the GRAPH key to look at your scatter-plot.
5) To find the correlation coefficient, run STAT-Test-F. The calculator will give you the values for r² and r.
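The notes leave the formula for r to the calculator. As a rough cross-check, the standard computational formula for the Pearson correlation coefficient can be sketched in a few lines of Python. This is not part of the original handout: the function name pearson_r is made up for illustration, and the data are the advertising and sales figures from Example 7 of this section.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the usual sums (illustrative helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Example 7 data: advertising expenses and company sales (both in thousands)
advertising = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
sales = [225, 184, 220, 240, 180, 184, 186, 215]

r = pearson_r(advertising, sales)
print(round(r, 3))  # about 0.913, the value quoted for this data in Section 9-3
```

The calculator's STAT-Test-F output should report the same r for the same lists, up to rounding.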
We'll talk more about r² later, but for now we are looking at r to quantify the strength of the relationship. This test also gives you the equation of the line of regression, as well as an abundance of other information that we will use later, including p.

To graph the regression line, simply enter the equation into the Y= screen and press GRAPH. The line will appear and go through the scatter-plot that you already have.

Once we have a number that represents the ____ of the relationship, we need to determine whether or not this relationship is ____. This is necessary to determine whether the line can be used for ____ y-values. There are ____ ways to determine if the relationship is significant. Since we have been doing hypothesis testing for two chapters now, we will use the hypothesis test method of determining whether a relationship is significant as our first choice.

The hypotheses are written in the following way. To test whether there is any correlation at all, the hypotheses are
H0: ρ = 0 and Ha: ρ ≠ 0
Notice that this means that the null hypothesis states that there is ____ relationship. In this class, we will ONLY conduct the two-tailed test for any significance.

Once we have the hypotheses written, we will conduct a t-test to test them. STAT-Test-F (LinRegTTest) will give us the t-score, as well as the r value, the p value, and the equation of best fit. Enter data into L1 and L2, then run STAT-Test-F, making sure to indicate that you are running a two-tailed test. Set Freq: to 1 and leave RegEQ: blank.
If p ≤ α, reject H0. If p > α, fail to reject H0.

The Pearson Correlation Coefficient chart (Table 11), found on page A28 in the back of your book, can only be used if the desired α level is ____ or ____. To use the chart, simply use the number of ____ (n) for the ____ and the ____ for the ____ to find the critical value.
If the absolute value of ____ is ____ than the ____ value, the relationship is ____.
If the absolute value of ____ is ____ than or ____ the ____ value, the relationship is ____.

Correlation and Causation

It is important to remember that just because two variables are related does not necessarily mean that one causes the other. There are 4 possibilities:
1) A cause-and-effect relationship between the variables: x causes y. For example, spending more money on advertising results in more sales.
2) A cause-and-effect relationship between the variables: y causes x. For example, maybe more time between Old Faithful eruptions causes the next one to last longer, instead of the other way around.
3) A ____, as yet unknown, variable may be ____ both x and y. The Chapter Opener on page 495 shows a positive correlation between a movie's budget and its ticket sales. Which one causes the other? Maybe they are both caused by the actors who star in the movies. Big stars demand more money to appear in films (budget goes up).
Big stars draw more people to the theaters to see their movies (ticket sales go up). Maybe they are both caused by the hype generated by the movie studio prior to the release of the movie. Advertising causes the budget to go up. Advertising may lure more people into the theater to see the movie (ticket sales go up).
4) The variables only ____ to be related; it's a ____. For example, there may be a strong positive correlation between the number of coyotes living in an area and the number of families owning more than two cars in that same area, but it is highly unlikely that one causes the other. The relation would probably be due to coincidence.

Example 3 (Page 498)
Old Faithful, located in Yellowstone National Park, is the world's most famous geyser. The duration (in minutes) of several of Old Faithful's eruptions and the times (in minutes) until the next eruption are shown in the table below. Display the data in a scatterplot and determine whether there appears to be a positive or negative linear correlation or no linear correlation at all.

Duration, x   1.8  1.82  1.90  1.93  1.98  2.05  2.13  2.30  2.37  2.82  3.13  3.27  3.65
Time, y       56   58    62    56    57    57    60    57    61    73    76    77    77

STAT - ____ (____)
Enter Duration (x) values into ____.
Enter Time (y) values into ____.
2nd Y=, turn ____ On.
Select ____ (____ option).
Make sure that the correct lists are being looked at.
Window:
Set x-min to something ____ than the ____ x-value in your data set.
Set x-max to something ____ than the ____ x-value in your data set.
Repeat for y.
Graph
This plot appears to show a ____ correlation.

Example 5 (Page 501)

Duration, x   3.78  3.83  3.88  4.10  4.27  4.30  4.43  4.47  4.53  4.55  4.60  4.63
Time, y       79    85    80    89    90    89    89    86    89    86    92    91

Use a technology tool (TI-84 Plus) to calculate the correlation coefficient for the Old Faithful data given in Example 3. What can you conclude?
STAT - TEST - F.
r ≈ ____; since ____ is pretty close to 1, it suggests a ____ linear correlation.

Example 6 (Page 503)
Using the data from Example 5 and the Pearson Correlation Coefficient chart on page A28, determine whether the correlation coefficient is significant.
To use the table, simply look at the row for n and the column for α. This is your critical value. If the r value of the correlation is ____ than the critical value, the correlation is significant.
Looking at the Pearson Correlation Coefficient chart, the critical value for n = 25 and α = .05 is ____. Since the r value that we got when we ran the test was r ≈ ____, and ____, we conclude that the correlation is ____.
At the 5% level of significance, there is evidence to conclude that there is a linear correlation between the duration of Old Faithful's eruptions and the time between eruptions.

Example 6 (Page 503)
Using the data from Example 5 and the hypothesis testing method, determine whether the correlation coefficient is significant.
Write the hypotheses: H0: ρ = 0, Ha: ρ ≠ 0 (claim)
STAT - TEST - F. Set for L1 and L2 and a two-tailed test.
We get a t-value (standardized test statistic) of ____. Our p-value is ____.

Since p ____ α, we ____ H0. Remember that the null says that there is ____ significant correlation. Since we ____ that, we are saying that there ____ a significant correlation.
At the 5% level of significance, there is evidence to conclude that there is a linear correlation between the duration of Old Faithful's eruptions and the time between eruptions.

Example 7 (Page 505)
Using the data from Example 4 (provided below), test the significance of this correlation coefficient. Use α = 0.05.

Advertising $ (in thousands)   2.4  1.6  2.0  2.6  1.4  1.6  2.0  2.2
Company Sales (in thousands)   225  184  220  240  180  184  186  215

H0: ____ (____ correlation); Ha: ____ (____ correlation)
Enter Advertising values into ____ and Sales values into ____.
Run STAT-Test-____. Designate L1 and L2 and specify a two-tailed test. Don't change Freq or RegEQ.
The results are: t ≈ ____; p ≈ ____; r ≈ ____
Since p ____ α, we ____ H0. This means that there ____ a correlation between advertising expenses and company sales. Also, since r is ____, it is a ____ correlation.
At the 5% significance level, there is evidence to conclude that there is a linear correlation between advertising expenses and company sales.

Section 9-2 Equation of Best Fit for Linear Regression

The only thing in Section 9-2 that is new is using the equation of the line of best fit to make predictions about y-values. You can only use the equation to make predictions if the correlation is ____!! That's why we ran the tests in 9-1 to determine whether the correlation is significant or not.
When you run STAT - TEST - F, you get the equation of the line of best fit, too. Using the data from Example 7 in 9-1, we got the following:
y = a + bx; a = 104.061 and b = 50.729, so the equation is y = ____ + ____

Example 3 (Page 516)
Use the equation of best fit from Example 7 in 9-1 to predict the expected company sales for the following advertising expenses.
a) 1.5 thousand
b) 1.8 thousand
c) 2.5 thousand
Remember, we have already determined that the correlation is significant, so this equation can be used for making predictions.
1) Plug each value of x into the equation to find the y-value prediction.
y = 50.729(1.5) + 104.061 ≈ 180.155, or $180,155
y = 50.729(1.8) + 104.061 ≈ 195.373, or $195,373
y = 50.729(2.5) + 104.061 ≈ 230.884, or $230,884
2) Enter the equation into Y= (y = 50.729x + 104.061). Use 2nd WINDOW to set the beginning of the table, then use 2nd GRAPH to see the y-value for each x.
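For readers who want to check the calculator output by hand, here is a minimal Python sketch of the least-squares fit and the three predictions. It is an illustration rather than part of the handout: the helper name fit_line is hypothetical, and it uses the standard least-squares formulas for the slope b and intercept a with the Example 7 data.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + bx (illustrative helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

# Example 7 data: advertising expenses and company sales (both in thousands)
advertising = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
sales = [225, 184, 220, 240, 180, 184, 186, 215]

a, b = fit_line(advertising, sales)
print(round(a, 3), round(b, 3))  # close to the notes' a = 104.061 and b = 50.729

# Predicted sales (in thousands) for each advertising level in Example 3
predictions = {x0: a + b * x0 for x0 in (1.5, 1.8, 2.5)}
for x0, y_hat in predictions.items():
    print(x0, round(y_hat, 2))
```

Because the sketch keeps full precision in a and b, its predictions can differ from the hand calculations above in the third decimal place.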

Section 9-3

We already know how to calculate the correlation coefficient, r. The square of this coefficient is called the coefficient of ____. The coefficient of determination is equal to the ____ of the ____ variation to the total variation. In other words, if r² = .81, then ____ of the variation between x and y can be ____ by the ____ between x and y. The other 19% of the variation is ____ and is due to other factors or to sampling error.

How to find the Standard Error of Estimate:
1) Go to STAT - Edit. Put ____ values into ____ and ____ values into ____.
2) STAT - ____
The Standard Error of Estimate is the ____ of the residuals (____). Scroll down the list of values given as the results of the test, and find ____.

Construct a Prediction Interval for a Specific x-value (x0):
1) Determine degrees of freedom (____).
2) Use ____ and the given x (____) to find ____.
3) Find the critical t value (tc) that corresponds to the level of confidence (c) by using the calculator (invT((1 − c)/2), with the degrees of freedom being found at ____).
4) Use the tc value and the se value to calculate the margin of error (E).

$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$

n = sample size
x0 is the x value that you used to find ŷ.
x̄ is the sample mean.
Σx² is the total of all the squared x's. Square first, then add them up.
(Σx)² is the total of the x's, squared. Add first, then square the answer.
The values for x̄, Σx², and Σx can all be found by going to STAT - ____ (____).
5) Find the left and right endpoints by ____ E from ŷ and then ____ E to ŷ. These answers are your interval.

Example 1 (Page 526)
The correlation coefficient for the advertising expenses and company sales data as calculated in Example 4 of Section 9-1 is r ≈ 0.913. Find the coefficient of determination. What does this tell you about the explained variation of the data about the regression line? About the unexplained variation?
r² = ____. About ____ of the variation in the company sales can be ____ by the variation in the advertising expenditures. About ____ (the rest) of the variation is ____ and is due to chance or other variables.
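The statement that the coefficient of determination compares one kind of variation to the total variation can be checked numerically. The sketch below is an illustration only, with variable names chosen for this example. It fits the least-squares line to the advertising data, splits the total variation in y into the part accounted for by the line and the leftover part, and divides the first by the total.

```python
# Example 7 data: advertising expenses and company sales (both in thousands)
xs = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
ys = [225, 184, 220, 240, 180, 184, 186, 215]
n = len(xs)

# Least-squares line y = a + b*x from the usual sums
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

y_mean = sum_y / n
y_hat = [a + b * x for x in xs]  # fitted value for each observed x

total_variation = sum((y - y_mean) ** 2 for y in ys)
explained_variation = sum((yh - y_mean) ** 2 for yh in y_hat)
unexplained_variation = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))

r_squared = explained_variation / total_variation
print(round(r_squared, 3))
print(round(1 - r_squared, 3))  # share of the variation left over
```

For a least-squares line the two pieces add up to the total variation, which is why the leftover share is simply 1 − r².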
Example 2 (Page 528)
The regression equation for the advertising expenses and company sales data as calculated in Example 1 of Section 9-2 is ____. Find the standard error of estimate.

x   2.4  1.6  2.0  2.6  1.4  1.6  2.0  2.2
y   225  184  220  240  180  184  186  215

1) Go to STAT - ____. Put ____ values into L1 and ____ values into L2.
2) STAT - Test - F. Find ____ on the list of values given from this test.
3) The standard error of the estimate is ____.

Example 3 (Page 530)
Using the results of Example 2, construct a 95% prediction interval for the company sales when the advertising expenses are $2100. What can you conclude?
We were told that ____, so we plug in ____ for x to find ŷ. ŷ = ____.
From here, we need to be able to use the formula for the margin of error. It's not pretty, but it works.

$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$

tc = ____ (2nd VARS 4, (1 − .95)/2, with 6 degrees of freedom).
se = ____, from the last example.
n = ____
x0 = ____ (this is the x value we used to find ŷ).
x̄ = ____, Σx² = ____, Σx = ____ (These values are from STAT-Calc 1 (1-Var Stats)).

$E = (2.447)(10.29)\sqrt{1 + \frac{1}{8} + \frac{8(2.1 - 1.975)^2}{8(32.44) - (15.8)^2}} \approx 26.857$

We ____ 26.857 from ____ to get the ____ end of the estimate.
We ____ 26.857 from ____ to get the ____ end of the estimate.
____ < y < ____
We can be 95% confident that when advertising expenses are $2100, the company sales will be between ____ and ____.
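The arithmetic in Examples 2 and 3 can be reproduced with a short Python sketch. This is a hedged illustration, not the handout's method: the critical value 2.447 is simply copied from the worked margin-of-error line above (the standard library has no inverse t function), the standard error is computed as the square root of the sum of squared residuals divided by n − 2, and all variable names are chosen for this example.

```python
import math

# Example 7 data: advertising expenses and company sales (both in thousands)
xs = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
ys = [225, 184, 220, 240, 180, 184, 186, 215]
n = len(xs)

T_CRITICAL = 2.447  # taken from the notes: 95% confidence, 6 degrees of freedom

# Least-squares line y = a + b*x
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Standard error of estimate: spread of the residuals with n - 2 degrees of freedom
squared_residuals = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_e = math.sqrt(squared_residuals / (n - 2))

# Prediction interval at x0 = 2.1 (advertising of $2100, expressed in thousands)
x0 = 2.1
x_mean = sum_x / n
y_hat0 = a + b * x0
margin = T_CRITICAL * s_e * math.sqrt(
    1 + 1 / n + n * (x0 - x_mean) ** 2 / (n * sum_x2 - sum_x ** 2)
)
lower, upper = y_hat0 - margin, y_hat0 + margin

print(round(s_e, 2), round(margin, 3))
print(round(lower, 3), round(upper, 3))
```

The printed standard error and margin of error should agree with the 10.29 and 26.857 used in the worked formula, up to rounding.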