Name Class Date. Residuals and Linear Regression Going Deeper

Name Class Date
Chapter 4, Lesson 8

4-8 Residuals and Linear Regression
Going Deeper

Essential question: How can you use residuals and linear regression to fit a line to data?

You can evaluate a linear model's goodness of fit using residuals. A residual is the difference between an actual value of the dependent variable and the value predicted by the linear model. After calculating residuals, you can draw a residual plot, which is a scatter plot of points whose x-coordinates are the values of the independent variable and whose y-coordinates are the corresponding residuals.

Whether the fit of a line to data is suitable and good depends on the distribution of the residuals, as illustrated below.

[Three residual plots, captioned as follows.]
Distribution of residuals about the x-axis is random and tight. A linear fit to the data is suitable and strong.
Distribution of residuals about the x-axis is random but loose. A linear fit to the data is suitable but weak.
Distribution of residuals about the x-axis is not random. A linear fit to the data may not be suitable.

EXAMPLE 1 (S-ID.2.6b): Creating a Residual Plot and Evaluating Fit

Using t as the years since 1970 and the median age of females as the dependent variable, a student fit the line median age = 0.25t + 29 to the data shown in the table. Make a residual plot and evaluate the goodness of fit.

Year    Median Age of Females
1970    29.2
1980    31.3
1990    34.0
2000    36.5
2010    38.2

A. Calculate the residuals. Substitute each value of t into the equation to find the value predicted by the linear model. Then subtract the predicted value from the actual value to find the residual.

t     Actual    Predicted    Residual
0     29.2      29.0         0.2
10    31.3
20    34.0
30    36.5
40    38.2
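The residual calculation in Part A can also be sketched in Python. This is only an illustration and not part of the lesson; the names t_values, actual_ages, and predict are assumptions chosen for the sketch.

```python
# Data from the table: t = years since 1970, actual median age of females.
t_values = [0, 10, 20, 30, 40]
actual_ages = [29.2, 31.3, 34.0, 36.5, 38.2]

def predict(t, slope=0.25, intercept=29.0):
    """Value predicted by the student's line, 0.25t + 29."""
    return slope * t + intercept

# Residual = actual value minus predicted value.
# Rounding to 4 places hides floating-point noise in the subtraction.
residuals = [round(a - predict(t), 4) for t, a in zip(t_values, actual_ages)]

# One row per data point: t, actual, predicted, residual.
for t, a, r in zip(t_values, actual_ages, residuals):
    print(t, a, predict(t), r)
```

The first printed row reproduces the row already filled in above (t = 0, actual 29.2, predicted 29.0, residual 0.2).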

B. Plot the residuals.

[Blank grid for the residual plot: horizontal axis shows t values from 0 to 50; vertical axis shows residuals from -0.8 to 0.8.]

C. Evaluate the suitability of a linear fit and the goodness of the fit.

Is there a balance between positive and negative residuals?

Is there a pattern to the residuals? If so, describe it.

Is the absolute value of each residual small relative to the actual value? For instance, when t = 0, the residual is 0.2 and the actual value is 29.2, so the relative size of the residual is 0.2 / 29.2 ≈ 0.7%, which is quite small.

What is your overall evaluation of the suitability and goodness of the linear fit?

REFLECT

1a. Suppose the line of fit with equation median age = 0.25t + 29 is changed to median age = 0.25t + 28.8. What effect does this change have on the residuals? On the residual plot? Is the new line a better fit to the data? Explain.
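The relative-size check in Part C is a one-line calculation. A minimal Python sketch follows; the function name relative_size is an assumption made for illustration.

```python
def relative_size(residual, actual):
    """Absolute residual as a percentage of the actual value."""
    return abs(residual) / actual * 100

# The worked case from Part C: t = 0, residual 0.2, actual value 29.2.
print(f"{relative_size(0.2, 29.2):.1f}%")  # prints 0.7%
```

The same function can be applied to each of the other residuals once they have been calculated.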

You can use a graphing calculator to fit a line to a set of paired numerical data that have a strong positive or negative correlation. The calculator uses a method called linear regression, which involves minimizing the sum of the squares of the residuals.

EXPLORE 2 (S-ID.2.6c): Comparing Sums of Squared Residuals

Suppose in the first Example one person came up with the equation median age = 0.25t + 29.0 while another came up with median age = 0.25t + 28.8 where, in each case, t is the time in years since 1970 and the dependent variable is the median age of females.

A. Complete each table below in order to calculate the squares of the residuals for each line of fit.

Table for the line 0.25t + 29.0

t     Actual    Predicted (0.25t + 29.0)    Residual    Square of Residual
0     29.2      29.0                        0.2         0.04
10    31.3
20    34.0
30    36.5
40    38.2

Table for the line 0.25t + 28.8

t     Actual    Predicted (0.25t + 28.8)    Residual    Square of Residual
0     29.2      28.8                        0.4         0.16
10    31.3
20    34.0
30    36.5
40    38.2
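Once the tables are complete, the sums can be checked with a short Python sketch. The function name sum_squared_residuals and the variable names are assumptions made for illustration, not part of the lesson.

```python
# Data from the first Example: t = years since 1970, actual median age.
t_values = [0, 10, 20, 30, 40]
actual_ages = [29.2, 31.3, 34.0, 36.5, 38.2]

def sum_squared_residuals(slope, intercept, xs, ys):
    """Add up (actual - predicted)^2 over all data points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

ssr_first = sum_squared_residuals(0.25, 29.0, t_values, actual_ages)
ssr_second = sum_squared_residuals(0.25, 28.8, t_values, actual_ages)

# Round for display; the raw sums carry small floating-point error.
print(round(ssr_first, 2), round(ssr_second, 2))
```

Squaring keeps positive and negative residuals from cancelling, which is why the sum of squares is compared rather than the plain sum of residuals.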

B. Find the sum of the squared residuals for each line of fit.

Sum of squared residuals for the line 0.25t + 29.0:
Sum of squared residuals for the line 0.25t + 28.8:

C. Identify the line that has the smaller sum of the squared residuals.

REFLECT

2a. If you use a graphing calculator to perform linear regression on the data, you obtain the equation median age = 0.232t + 29.2. Complete the table to calculate the squares of the residuals and then the sum of the squares for this line of fit.

t     Actual    Predicted (0.232t + 29.2)    Residual    Square of Residual
0     29.2      29.2                         0           0
10    31.3
20    34.0
30    36.5
40    38.2

Sum of squared residuals:

2b. Explain why the model median age = 0.232t + 29.2 is a better fit to the data than median age = 0.25t + 29.0 or median age = 0.25t + 28.8.
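The calculator's equation in 2a can be reproduced by hand. The sketch below uses the standard closed-form formulas for a least-squares line, which the lesson itself does not state: the slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x)^2, and the intercept is mean y minus slope times mean x. The function name least_squares_line is an assumption made for illustration.

```python
# Data from the first Example: t = years since 1970, actual median age.
t_values = [0, 10, 20, 30, 40]
actual_ages = [29.2, 31.3, 34.0, 36.5, 38.2]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(t_values, actual_ages)
print(round(slope, 3), round(intercept, 1))  # 0.232 29.2
```

No other choice of slope and intercept gives a smaller sum of squared residuals for these five points, which is the idea behind question 2b.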

Because linear regression produces an equation for which the sum of the squared residuals is as small as possible, the line obtained from linear regression is sometimes called the least-squares regression line. It is also called the line of best fit. Not only will a graphing calculator automatically find the equation of the line of best fit, but it will also give you the correlation coefficient and display the residual plot.

EXAMPLE 3 (S-ID.2.6c): Performing Linear Regression on a Graphing Calculator

The table gives the distances (in meters) that a discus was thrown by men to win the gold medal at the Olympic Games from 1920 to 1964. (No Olympic Games were held during World War II.) Use a graphing calculator to find the line of best fit, to find the correlation coefficient, and to evaluate the goodness of fit.

Year of Olympic Games    Men's Gold Medal Discus Throw (meters)
1920                     44.685
1924                     46.155
1928                     47.32
1932                     49.49
1936                     50.48
1940                     No Olympics
1944                     No Olympics
1948                     52.78
1952                     55.03
1956                     56.36
1960                     59.18
1964                     61.00

A. Identify the independent and dependent variables, and specify how you will represent them.

The independent variable is time. Since the graphing calculator uses the variables x and y, let x represent time. To simplify the values of x, define x as years since 1920 so that, for instance, x = 0 represents 1920 and x = 44 represents 1964. Then x = ____ represents 1924, x = ____ represents 1928, x = ____ represents 1932, and so on.

The dependent variable is the distance that won the gold medal for the men's discus throw. Let y represent that distance.

B. Enter the paired data into two lists, L1 and L2, on your graphing calculator after pressing STAT.

Do the distances increase or decrease over time? What does this mean for the correlation?

C. Create a scatter plot of the paired data using STAT PLOT. The calculator will choose a good viewing window and plot the points automatically if you press ZOOM and select ZoomStat.

Describe the correlation.

D. Perform linear regression by pressing STAT and selecting LinReg(ax + b) from the CALC menu. The calculator reports the slope a and y-intercept b of the line of best fit. It also reports the correlation coefficient r.

Does the correlation coefficient agree with your description of the correlation in Part C? Explain.

E. Graph the line of best fit by pressing Y=, entering the equation of the line of best fit, and then pressing GRAPH. You should round the values of a and b when entering them so that each has at most 4 significant digits.

What is the equation of the line of best fit?

F. Create a residual plot by replacing L2 with RESID in STAT PLOT as the choice for Ylist. (You can select RESID from the NAMES menu after pressing 2nd STAT.)

Evaluate the suitability and goodness of the fit.
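Readers without a graphing calculator can compute the same three quantities that LinReg(ax + b) reports with a short Python sketch. This is an illustration under stated assumptions, not a description of how the calculator works internally: it uses the standard least-squares and correlation formulas, and the names lin_reg, years_since_1920, and distances are hypothetical.

```python
import math

# x = years since 1920, y = winning distance in meters.
# 1940 and 1944 are omitted because no Olympic Games were held.
years_since_1920 = [0, 4, 8, 12, 16, 28, 32, 36, 40, 44]
distances = [44.685, 46.155, 47.32, 49.49, 50.48,
             52.78, 55.03, 56.36, 59.18, 61.00]

def lin_reg(xs, ys):
    """Return slope a, y-intercept b, and correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    r = s_xy / math.sqrt(s_xx * s_yy)
    return a, b, r

a, b, r = lin_reg(years_since_1920, distances)

# Residuals (actual minus predicted), the values plotted in Part F.
residuals = [y - (a * x + b) for x, y in zip(years_since_1920, distances)]

# Show a and b to 4 significant digits, as Part E suggests.
print(f"a = {a:.4g}, b = {b:.4g}, r = {r:.4f}")
```

For a least-squares line the residuals always sum to zero (up to rounding), so a residual plot is judged by its pattern and spread rather than by its total.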

REFLECT

3a. Interpret the slope and y-intercept of the line of best fit in the context of the data.

3b. Use the line of best fit to make predictions about the distances that would have won gold medals if the Olympic Games had been held in 1940 and 1944. Are the predictions interpolations or extrapolations?

3c. Several Olympic Games were held prior to 1920. Use the line of best fit to make a prediction about the distance that would have won a gold medal in the 1908 Olympics. What value of x must you use? Is the prediction an interpolation or an extrapolation? How does the prediction compare with the actual value of 40.89 meters?

PRACTICE

Throughout these exercises, use a graphing calculator.

1. The table gives the distances (in meters) that a discus was thrown by men to win the gold medal at the Olympic Games from 1968 to 2008.

Year of Olympic Games    Men's Gold Medal Discus Throw (meters)
1968                     64.78
1972                     64.40
1976                     67.50
1980                     66.64
1984                     66.60
1988                     68.82
1992                     65.12
1996                     69.40
2000                     69.30
2004                     69.89
2008                     68.82

a. Find the equation of the line of best fit.
b. Find the correlation coefficient.
c. Evaluate the suitability and goodness of the fit.
d. Does the slope of the line of best fit for the 1968 to 2008 data equal the slope of the line of best fit for the 1920 to 1964 data? If not, speculate about why this is so.
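For Reflect questions 3b and 3c, the distinction between interpolation and extrapolation depends only on whether the x-value falls inside the range of the data used for the fit. A minimal Python sketch follows; the function names to_x and kind_of_prediction are assumptions made for illustration.

```python
# x-values used for the 1920 to 1964 fit (years since 1920).
years_since_1920 = [0, 4, 8, 12, 16, 28, 32, 36, 40, 44]

def to_x(year, base_year=1920):
    """Convert a calendar year to years since the base year."""
    return year - base_year

def kind_of_prediction(x, xs):
    """Interpolation if x lies within the data's range, else extrapolation."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"

for year in (1940, 1944, 1908):
    x = to_x(year)
    print(year, x, kind_of_prediction(x, years_since_1920))
```

A year before 1920 gives a negative value of x, which is still a valid input to the line of best fit.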

2. Women began competing in the discus throw in the 1928 Olympic Games. The table gives the distances (in meters) that a discus was thrown by women to win the gold medal at the Olympic Games from 1928 to 1964.

Year of Olympic Games    Women's Gold Medal Discus Throw (meters)
1928                     39.62
1932                     40.58
1936                     47.63
1940                     No Olympics
1944                     No Olympics
1948                     41.92
1952                     51.42
1956                     53.69
1960                     55.10
1964                     57.27

a. Find the equation of the line of best fit.
b. Find the correlation coefficient.
c. Evaluate the suitability and goodness of the fit.

3. Research the distances that a discus was thrown by women to win the gold medal at the Olympic Games from 1968 to 2008. Explain why a linear model is not appropriate for the data.

4. The table lists the median heights (in centimeters) of girls and boys from age 2 to age 10. Choose either the data for girls or the data for boys.

Age (years)    Median Height (cm) of Girls    Median Height (cm) of Boys
2              84.98                          86.45
3              93.92                          94.96
4              100.75                         102.22
5              107.66                         108.90
6              114.71                         115.39
7              121.49                         121.77
8              127.59                         128.88
9              132.92                         133.51
10             137.99                         138.62

a. Identify the real-world variables that x and y will represent.
b. Find the equation of the line of best fit.
c. Find the correlation coefficient.
d. Evaluate the suitability and goodness of the fit.

Name Class Date

Additional Practice 4-8

1. The data in the table are graphed at right along with two lines of fit.

0 2 4 6 7 3 4 6

a. Find the sum of the squares of the residuals for 3 9.
b. Find the sum of the squares of the residuals for 1 5. 2
c. Which line is a better fit for the data?

2. Use the data in the table to answer the questions that follow.

x    5    6    6.5    7.5    9
y    0    1    3      2      4

a. Find an equation for a line of best fit.
b. What is the correlation coefficient?
c. How well does the line represent the data?
d. Describe the correlation.

3. Use the data in the table to answer the questions that follow.

x    10    8      6      4      2
y    1     1.1    1.2    1.3    1.5

a. Find an equation for a line of best fit.
b. What is the correlation coefficient?
c. How well does the line represent the data?
d. Describe the correlation.

4. The table shows the number of pickles four students ate during the week versus their grades on a test. The equation of the least-squares line is y ≈ 2.11x + 79.28, and r ≈ 0.97. Discuss correlation and causation for the data set.

Pickles Eaten    0     2     5     10
Test Score       77    85    92    99

Problem Solving

1. The table shows the number of hours different players practice basketball each week and the number of baskets each player scored during a game.

Player             Alan    Brenda    Caleb    Shawn    Fernando    Gabriela
Hours practiced    5       10        7        2        0           21
Baskets scored     6       11        8        4        2           19

a. Find an equation for a line of best fit. Round decimals to the nearest tenth.
b. Interpret the meaning of the slope and y-intercept.
c. Find the correlation coefficient.

2. Use your equation above to predict the number of baskets scored by a player who practices 40 hours a week. Round to the nearest whole number.
A 32 baskets
B 33 baskets
C 34 baskets
D 35 baskets

3. Which is the best description of the correlation?
F strong positive
G weak positive
H weak negative
J strong negative

4. Given the data, what advice can you give to a player who wants to increase the number of baskets he or she scores during a game?
A Practice more hours per week.
B Practice fewer hours per week.
C Practice the same hours per week.
D There is no way to increase baskets.

5. Do the data support causation, correlation, or chance?
F correlation
G causation
H chance
J chance and correlation