Examining Relationships. Chapter 3

Scatterplots
A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The explanatory variable, if there is one, is graphed on the x-axis and the response variable on the y-axis. A scatterplot reveals the direction, form, and strength of the relationship.

Patterns
Direction: the variables are either positively associated or negatively associated.
Form: a linear form is preferred, but curves and clusters are also significant.
Strength: determined by how closely the points in the scatterplot follow a straight line.

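Least-Squares Regression Line
If the data in a scatterplot appear to be linear, we often model the data with a line. Least-squares regression is a method for finding the equation of that line: it chooses the line that makes the sum of the squared vertical distances from the data points to the line as small as possible. The least-squares regression line (LSRL) always passes through the centroid \((\bar{x}, \bar{y})\). It is a straight line that describes how a response variable, y, changes as an explanatory variable, x, changes, and it is used to predict y for a given x. Its equation is \(\hat{y} = a + bx\), with slope \(b = r\frac{s_y}{s_x}\) and intercept \(a = \bar{y} - b\bar{x}\).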
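A minimal Python sketch of the least-squares calculation, using the equivalent sum form of the slope, \(b = \sum(x - \bar{x})(y - \bar{y}) / \sum(x - \bar{x})^2\), and \(a = \bar{y} - b\bar{x}\). The function name `lsrl` is hypothetical, chosen only for illustration. Because the intercept is computed as \(\bar{y} - b\bar{x}\), the fitted line always passes through \((\bar{x}, \bar{y})\).

```python
from statistics import mean

def lsrl(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products of deviations and sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: forces the line through (x_bar, y_bar)
    return a, b

a, b = lsrl([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # exactly linear data y = 1 + 2x, so this prints 1.0 2.0
```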

1. The following are the mean heights of Kalama children:

Age (months):  18    19    20    21    22    23    24    25    26    27    28    29
Height (cm):   76.1  77.0  78.1  78.2  78.8  79.7  79.9  81.1  81.2  81.8  82.8  83.5

a) Sketch a scatterplot, with age (months) on the x-axis and height (cm) on the y-axis.

b) What is the correlation coefficient? Interpret it in terms of the problem.
c) Calculate and interpret the slope.
d) Calculate and interpret the y-intercept.
e) Write the equation of the regression line.
f) Predict the height of a 32-month-old child.

b) r ≈ 0.9944. There is a strong, positive linear relationship between height and age.
c) b ≈ 0.635. For every additional month of age, the predicted height increases by about 0.635 cm.
d) a ≈ 64.93 cm. At zero months, the estimated mean height of Kalama children is 64.93 cm (an extrapolation, since the data cover only 18 to 29 months).
e) \(\hat{y} = 64.93 + 0.6350x\), where x = age (months) and \(\hat{y}\) = predicted height (cm); that is, predicted height = 64.93 + 0.6350(age).
f) \(\hat{y} = 64.93 + 0.6350(32) \approx 85.25\) cm (also beyond the range of the data).
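The answers to parts b) through f) can be checked with a short Python sketch that applies the standard formulas \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\), \(b = S_{xy}/S_{xx}\), and \(a = \bar{y} - b\bar{x}\) to the Kalama data. The variable names are illustrative only.

```python
from math import sqrt

# Kalama data from problem 1
age = list(range(18, 30))  # 18, 19, ..., 29 months
height = [76.1, 77.0, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(height) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, height))
sxx = sum((x - x_bar) ** 2 for x in age)
syy = sum((y - y_bar) ** 2 for y in height)

r = sxy / sqrt(sxx * syy)   # correlation coefficient
b = sxy / sxx               # slope (cm per month)
a = y_bar - b * x_bar       # y-intercept (cm)
pred_32 = a + b * 32        # prediction for a 32-month-old (an extrapolation)

print(f"r = {r:.4f}, b = {b:.4f}, a = {a:.2f}, y-hat(32) = {pred_32:.2f}")
# prints: r = 0.9944, b = 0.6350, a = 64.93, y-hat(32) = 85.25
```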

2. Good runners take more steps per second as they speed up. Here are the average numbers of steps per second for a group of top female runners at different speeds. The speeds are in feet per second.

Speed (ft/s):      15.86  16.88  17.50  18.62  19.97  21.06  22.11
Steps per second:  3.05   3.12   3.17   3.25   3.36   3.46   3.55

b) r ≈ 0.9990. There is a strong, positive linear relationship between speed and steps per second.
c) b ≈ 0.0803. For every 1 foot per second increase in speed, the predicted number of steps per second increases by about 0.0803.
d) a ≈ 1.766. At a speed of 0, the predicted rate is about 1.77 steps per second; this is an extrapolation, since the slowest speed in the data is 15.86 ft/s.
e) \(\hat{y} = 1.766 + 0.0803x\), where x = speed (ft/s) and \(\hat{y}\) = predicted steps per second.
f) \(\hat{y} = 1.766 + 0.0803(20) \approx 3.37\) steps per second.

3. According to the article "First-Year Academic Success..." (1999), there is a mild correlation (r = 0.55) between high school GPA and college GPA. The high school GPAs have a mean of 3.7 and a standard deviation of 0.47. The college GPAs have a mean of 2.86 and a standard deviation of 0.85.
a) What is the explanatory variable?
b) What are the slope and intercept of the LSRL of college GPA on high school GPA? Interpret these in the context of the problem.
c) Billy Bob's high school GPA is 3.2. What could we expect of him in college?

a) High school GPA.
b) \(b = r\frac{s_y}{s_x} = 0.55 \cdot \frac{0.85}{0.47} \approx 0.9947\). For every additional point of high school GPA, the predicted college GPA increases by approximately 0.9947.
\(a = \bar{y} - b\bar{x} = 2.86 - 0.9947(3.7) \approx -0.8204\). The intercept has no practical interpretation here, since a high school GPA of 0 is not a realistic value.
c) \(\hat{y} = -0.8204 + 0.9947(3.2) \approx 2.36\), so we would expect a college GPA of about 2.36.
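When only summary statistics are given, the slope and intercept come straight from \(b = r\,s_y/s_x\) and \(a = \bar{y} - b\bar{x}\). A minimal Python sketch using the numbers in problem 3 (variable names are illustrative):

```python
# Summary statistics from problem 3 (x = high school GPA, y = college GPA)
r = 0.55
x_bar, s_x = 3.7, 0.47
y_bar, s_y = 2.86, 0.85

b = r * s_y / s_x        # slope: b = r * (s_y / s_x)
a = y_bar - b * x_bar    # intercept: a = y_bar - b * x_bar
pred = a + b * 3.2       # predicted college GPA for a high school GPA of 3.2

print(f"b = {b:.4f}, a = {a:.4f}, y-hat(3.2) = {pred:.2f}")
# prints: b = 0.9947, a = -0.8203, y-hat(3.2) = 2.36
```

The code keeps the unrounded slope, so its intercept (-0.8203) differs in the last digit from the hand calculation (-0.8204), which uses the rounded slope 0.9947.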

Car dealers across North America use the Red Book to help them determine the value of used cars that their customers trade in when purchasing new cars. The book lists, on a monthly basis, the amounts paid at recent used-car auctions and indicates values according to condition and optional features, but it does not tell dealers how odometer readings affect trade-in value. In an experiment to determine whether the odometer reading should be included, ten 3-year-old cars of the same make, condition, and options are randomly selected, and the trade-in value and mileage of each car are recorded.
a) Predicted trade-in = 56.2 - 0.2668(odometer), with trade-in value in hundreds of dollars and odometer reading in thousands of miles.
b) For every additional 1,000 miles on the odometer, the predicted trade-in value decreases by about $26.68.
c) r ≈ -0.8934. There is a strong, negative linear relationship between a car's odometer reading and its trade-in value.

Coefficient of determination, \(r^2\)
The coefficient of determination, \(r^2\), is the proportion (often reported as a percentage) of the variation in the response variable that is explained by the regression line on the explanatory variable. In other words, in a bivariate data set the y-values vary by some amount; \(r^2\) tells how much of that variation is accounted for when a line is used to model the data.
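One way to see what \(r^2\) measures is to compute it as \(1 - \text{SSE}/\text{SST}\): SST is the total squared variation of the y-values about \(\bar{y}\), and SSE is the squared variation about the least-squares line that remains unexplained. A minimal Python sketch, with a hypothetical helper name `r_squared`, applied to Jimmy's hourly-pay data from \(r^2\) example 1:

```python
def r_squared(x, y):
    """Coefficient of determination for the least-squares line of y on x:
    r^2 = 1 - SSE/SST, the fraction of the variation in y explained by the line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total variation in y
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variation left unexplained
    return 1 - sse / sst

hours = [1, 2, 3, 4, 5, 6, 7]
money = [8 * h for h in hours]   # Jimmy's pay: exactly $8 per hour
print(r_squared(hours, money))   # prints 1.0: the line explains all of the variation
```

For the least-squares line, this quantity equals the square of the correlation coefficient r.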

\(r^2\) example 1
Jimmy works at a restaurant and gets paid $8 an hour. He tracks how much total money he has earned after each hour of his first shift.

Hours:       1   2   3   4   5   6   7
Money ($):   8  16  24  32  40  48  56

What is the correlation coefficient? r = 1. What is the coefficient of determination? \(r^2 = 1\).
Look at the variation of the money values about their mean. What explains why the money varies as much as it does? If we draw the regression line, what percent of the changes in money can be explained by the line based on hours?
We know 100% of the variation in money can be explained by the linear relationship based on hours.

\(r^2\) example 2
Will we be able to explain the relationship between hours and total money for a member of the wait staff with the same precision?

Car trade-in example, continued:
d) \(r^2 \approx 0.7982\). We know 79.82% of the variation in trade-in values can be explained by the linear relationship between odometer reading and trade-in value. In other words, about 80% of the variation is explained by the odometer reading, while the remaining 20% is due to other variables.
e) Predicted trade-in = 56.2 - 0.2668(60) ≈ 40.19, or about $4,019 for a car with 60 thousand miles.
f) Solving 42 = 56.2 - 0.2668x gives x = (56.2 - 42)/0.2668 ≈ 53.2, or about 53,200 miles.
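Parts e) and f) use the fitted line in two directions: forward to predict a trade-in value, and backward to find the odometer reading that gives a target value. A minimal Python sketch using the coefficients from part a); the units (trade-in in hundreds of dollars, odometer in thousands of miles) are inferred from the worked answers, and the names `predict` and `odometer_for` are hypothetical.

```python
# Trade-in example: predicted trade-in = 56.2 - 0.2668 * odometer
# Assumed units: trade-in in $100s, odometer reading in 1000s of miles.
A, B = 56.2, -0.2668

def predict(odometer):
    """Predicted trade-in value for a given odometer reading."""
    return A + B * odometer

def odometer_for(trade_in):
    """Solve trade_in = A + B * odometer for the odometer reading."""
    return (trade_in - A) / B

print(round(predict(60), 3))       # prints 40.192, i.e. about $4,019
print(round(odometer_for(42), 2))  # prints 53.22, i.e. about 53,200 miles
```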

The scatterplot shows the advertised prices (in thousands of dollars) plotted against ages (in years) for a random sample of Plymouth Voyagers on several dealers' lots. The regression output is Price = 12.37 - 1.13 Age, with R-sq = 75.5%.
(Figure: scatterplot of Plymouth Voyagers, Price (thousands of dollars) vs. Age (years).)
a) \(r = -\sqrt{0.755} \approx -0.8689\) (negative, because the slope is negative).
b) For every additional year of age, the predicted price decreases by about $1,130.
c) The 10-year-old Plymouth appears to break from the pattern. If it were removed, we would expect the correlation to be closer to -1 (a stronger linear relationship).
d) If it were removed, we would expect the slope to become steeper.
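Regression output usually reports R-sq rather than r. Since \(|r| = \sqrt{R^2}\) and r always has the same sign as the slope, r can be recovered from the output alone. A minimal Python sketch (the helper name `r_from_r_squared` is hypothetical), applied to the Plymouth Voyager output above and to the two MINITAB outputs later in this chapter:

```python
from math import copysign, sqrt

def r_from_r_squared(r_sq, slope):
    """Recover r from R-sq: |r| = sqrt(R-sq), and r takes the sign of the slope."""
    return copysign(sqrt(r_sq), slope)

print(round(r_from_r_squared(0.755, -1.13), 4))      # Plymouth Voyagers: -0.8689
print(round(r_from_r_squared(0.515, -5.792), 4))     # crown dieback vs. soil pH: -0.7176
print(round(r_from_r_squared(0.815, 0.053401), 3))   # teachers vs. enrollment: 0.903
```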

In one of the Boston city parks, there has been a problem with muggings in the summer months. A police cadet took a random sample of 10 days (out of the 90-day summer) and compiled data in which, for each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day.
a) Predicted muggings = 9.77 - 0.4932(police officers).
b) r ≈ -0.9691. There is a strong, negative linear relationship between the number of police officers and the number of muggings.
c) For every additional police officer on duty, the predicted number of muggings in the park decreases by approximately 0.4932.
d) We know 93.91% of the variation in muggings can be explained by the linear relationship between the number of police officers on duty and the number of muggings.
e) \(\hat{y} = 9.77 - 0.4932(9) \approx 5.33\) muggings.

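Residual plot
A residual is the difference between an observed value of the response variable and the value predicted by the regression line:
residual = observed y - predicted y = \(y - \hat{y}\)
A residual plot graphs the residuals against the explanatory variable. It is the key tool for judging whether a line is a good model for the data: a random scatter around zero supports a linear model, while a curved or fanning pattern does not.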
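Computing residuals only requires the fitted line. A minimal Python sketch (the helper name `residuals` is hypothetical) that fits the least-squares line and returns observed y minus predicted y for the Kalama data; a residual plot would graph these values against age. The residuals from a least-squares line always sum to zero, apart from floating-point rounding.

```python
def residuals(x, y):
    """Fit the least-squares line and return residual = observed y - predicted y for each point."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

age = list(range(18, 30))
height = [76.1, 77.0, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5]
res = residuals(age, height)
print([round(e, 2) for e in res])  # one residual per child, in order of age
print(sum(res))                    # very close to 0 for any least-squares fit
```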

Example 1: the Kalama children's ages (months) and heights (cm) from problem 1.
(Figures: scatterplot of the data, display of the LSRL, and residual plot.)
The residual plot is randomly scattered above and below zero (the regression line), indicating that a line is an appropriate model for the data.

Example 2
x:  1   3     4     5     6      8      10     12     15
y:  1   4.66  6.96  9.52  12.29  18.38  25.11  32.42  44.31
(Figures: scatterplot of the data, display of the LSRL, and residual plot.)
The residual plot shows a clear curved pattern, indicating that a line is not a good representation of the data.
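The curved pattern in Example 2 can be seen even without a graph by listing the sign of each residual in order of x. A minimal Python sketch: the residuals are positive at both ends and negative in the middle, the signature of a curve that a straight line cannot follow.

```python
x = [1, 3, 4, 5, 6, 8, 10, 12, 15]
y = [1, 4.66, 6.96, 9.52, 12.29, 18.38, 25.11, 32.42, 44.31]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
res = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # observed y - predicted y

# A crude text "residual plot": the sign of each residual in order of x
signs = "".join("+" if e > 0 else "-" for e in res)
print(signs)  # prints ++------+
```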

Example 3
x:  2  4   7   9   12  15  20  21  25  27  29  30
y:  9  13  25  30  35  49  65  75  70  73  99  79
(Figures: scatterplot of the data, display of the LSRL, and residual plot.)
The residuals are randomly scattered above and below zero, but their distance from zero increases steadily as x increases, indicating that the line may be a reliable model only for the lower x-values.

An article on the growth and decline of forests included a scatterplot of y = mean crown dieback (%), which is one indicator of growth retardation, against x = soil pH. The statistical computer package MINITAB gives the following analysis:

The regression equation is
dieback = 31.0 - 5.79 soil pH

Predictor   Coef     Stdev   t-ratio   p
Constant    31.040   5.445    5.70     0.000
soil pH     -5.792   1.363   -4.25     0.001

s = 2.981   R-sq = 51.5%

a) What is the equation of the least-squares line?
b) Where else in the printout do you find the information for the slope and y-intercept?
c) Roughly, what change in crown dieback would be associated with an increase of 1 in soil pH?
d) What value of crown dieback would you predict when soil pH = 4.0?
e) Would it be sensible to use the least-squares line to predict crown dieback when soil pH = 5.67?
f) What is the correlation coefficient?

a) \(\hat{y} = 31.0 - 5.79x\), where x = soil pH and \(\hat{y}\) = predicted dieback (%).
b) In the Coef column: the Constant row (31.040) gives the y-intercept, and the soil pH row (-5.792) gives the slope.
c) A decrease of about 5.79 percentage points in crown dieback.
d) \(\hat{y} = 31.0 - 5.79(4.0) = 7.84\)% dieback.
e) No. \(\hat{y} = 31.0 - 5.79(5.67) \approx -1.83\)% dieback, which is impossible for a percentage, so the line should not be used at that pH.
f) \(r = -\sqrt{0.515} \approx -0.7176\). There is a moderate, negative correlation between soil pH and percent crown dieback.

The following MINITAB output is for the number of teachers (in thousands) in each of the states plus the District of Columbia, regressed on the number of students (in thousands) enrolled in grades K-12.

Predictor   Coef       Stdev      t-ratio   p
Constant    4.486      2.025       2.22     0.031
Enroll      0.053401   0.001692   31.57     0.000

s = 2.589   R-sq = 81.5%

a) \(\hat{y} = 4.486 + 0.053401x\), where x = student enrollment (thousands) and \(\hat{y}\) = predicted number of teachers (thousands). For every increase of 1,000 in student enrollment, the predicted number of teachers increases by about 53.4. There is a strong, positive linear relationship between students and teachers.
b) \(r = \sqrt{0.815} \approx 0.903\) and \(r^2 = 0.815\). We know about 81.5% of the variation in the number of teachers can be attributed to the linear relationship based on student enrollment.
c) 40 = 4.486 + 0.053401x, so x = (40 - 4.486)/0.053401 ≈ 665 thousand students.
d) \(\hat{y} = 4.486 + 0.053401(\ldots) \approx 35.7\) thousand teachers.

Transforming Data

Exponential model
Explanatory variable: x
Response variable: log y
Transformation equation: \(\log\hat{y} = a + bx\)
Final model equation: \(\hat{y} = 10^{a+bx} = 10^{a}\cdot 10^{bx}\)

Power model
Explanatory variable: log x
Response variable: log y
Transformation equation: \(\log\hat{y} = a + b\log x\)
Final model equation: \(\hat{y} = 10^{a+b\log x} = 10^{a}\cdot 10^{b\log x} = 10^{a}x^{b}\)
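Both transformations reduce to an ordinary least-squares line fitted to transformed data: regress log y on x for an exponential model, or log y on log x for a power model, then undo the logs. A minimal Python sketch (the helpers `fit_line`, `fit_exponential`, and `fit_power` are hypothetical names), applied to the curved data from residual-plot Example 2. The fitted exponent comes out very close to 1.4 with an intercept near 0, suggesting those data follow roughly \(\hat{y} = x^{1.4}\).

```python
from math import log10

def fit_line(x, y):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
    return y_bar - b * x_bar, b

def fit_exponential(x, y):
    """Regress log y on x: log y-hat = a + b*x, so y-hat = 10**a * 10**(b*x)."""
    return fit_line(x, [log10(v) for v in y])

def fit_power(x, y):
    """Regress log y on log x: log y-hat = a + b*log x, so y-hat = 10**a * x**b."""
    return fit_line([log10(v) for v in x], [log10(v) for v in y])

# Data from residual-plot Example 2 (all values must be positive to take logs)
x = [1, 3, 4, 5, 6, 8, 10, 12, 15]
y = [1, 4.66, 6.96, 9.52, 12.29, 18.38, 25.11, 32.42, 44.31]
a, b = fit_power(x, y)
print(f"y-hat = 10^{a:.3f} * x^{b:.3f}")
```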

(Figures: several data sets, each fit with two candidate models, exponential vs. power or linear vs. power, to compare which transformation best straightens the data.)

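\(\log\hat{y} = 0.9 + 0.00637x\), so \(\hat{y} = 10^{0.9}\cdot 10^{0.00637x}\), where x = population density and \(\hat{y}\) = predicted agricultural intensity. For every increase of 1 in population density, the predicted log(agricultural intensity) increases by about 0.0064.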

We know 86% of the variation in the log(intensity) can be explained by the linear relationship based on population density.

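Cautions about Regression
Correlation and regression describe only linear relationships, and neither is resistant: a single outlier can strongly influence r and the least-squares line.
Extrapolation, using the regression line to predict outside the range of x-values in the data, often produces unreliable predictions.
A lurking variable is a variable that is not among the explanatory or response variables in a study, yet it may influence the interpretation of the relationship between them.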
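To see how little resistance r has, compare the correlation for a small, perfectly linear data set with the correlation after one outlier is added. A minimal Python sketch; the data are made up purely for illustration and do not come from any example in this chapter.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical illustration data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]                             # perfectly linear: r = 1
print(round(correlation(x, y), 4))               # prints 1.0
print(round(correlation(x + [6], y + [0]), 4))   # one outlier at (6, 0): prints 0.1429
```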

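The question of causation
An observed association between x and y can arise in different ways (z denotes a lurking variable):
Scenario 1, causation: changes in x directly cause changes in y.
Scenario 2, common response: a lurking variable z causes changes in both x and y, so x and y are associated even though neither causes the other.
Scenario 3, confounding: x may cause changes in y, but a lurking variable z also affects y, so the effects of x and z on y cannot be separated.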

Examples of relationships
Mother's body mass index; daughter's body mass index
A high school senior's SAT score; the student's first-year college GPA
The number of years of education a worker has; the worker's income

More examples
The amount of time spent attending religious services; how long the person lives
The amount of artificial sweetener saccharin in a rat's diet; the count of tumors in the rat's bladder
Monthly flow of money into savings; monthly flow of money into investments

Final cautions
Even when direct causation is present, it is rarely a complete explanation of an association between two variables. Even well-established causal relations may not generalize to other settings. No strength of association or correlation, by itself, establishes a cause-and-effect link between two variables.

Regression Practice
An economist is studying the job market in Denver-area neighborhoods. Let x represent the total number of jobs in a given neighborhood, and let y represent the number of entry-level jobs in the same neighborhood. A sample of six Denver neighborhoods gave the following information (units in 100s of jobs):

x:  16  33  50  28  50  25
y:   2   3   6   5   9   3

Regression Practice
You are the foreman of the Bar-S cattle ranch in Colorado. A neighboring ranch has calves for sale, and you are going to buy some calves to add to the Bar-S herd. How much should a healthy calf weigh? Let x be the age of the calf (in weeks), and let y be the weight of the calf (in kilograms).

x:   1   3  10   16   26   36
y:  42  50  75  100  150  200

Regression Practice
Do heavier cars really use more gasoline? Suppose that a car is chosen at random. Let x be the weight of the car (in hundreds of pounds), and let y be the miles per gallon (mpg).

x:  27  44  32  47  23  40  34  52
y:  30  19  24  13  29  17  21  14