
Student Notes
Prep Session Topic: Exploring

Content
The AP Statistics topic outline contains a long list of items in the category titled Exploring Data. Section D topics will be reviewed in this session.

D. Exploring bivariate data
1. Analyzing patterns in scatterplots
2. Correlation and linearity
3. Least squares regression line
4. Residual plots, outliers, and influential points
5. Transformations to achieve linearity: logarithmic and power transformations

Formulas Provided
The following formulas related to this topic are provided on the formula sheet:

\[ \hat{y} = b_0 + b_1 x \]
(note: your calculator uses a and b rather than \(b_0\) and \(b_1\))

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
(note: it is unlikely you will need this formula. The slope will be found on the calculator or given in computer output)

\[ b_0 = \bar{y} - b_1 \bar{x} \]
(note: this formula simply tells us that the ordered pair \((\bar{x}, \bar{y})\) lies on the line \(\hat{y} = b_0 + b_1 x\))

\[ r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]

\[ b_1 = r \, \frac{s_y}{s_x} \]

Calculator Use
You may need to use your calculator to create a scatter plot, compute the equation of a least squares regression line (and the values of \(r\) and \(r^2\)), graph the regression line with the data, and create a residual plot. Computer output and graphs are generally provided with bivariate data analysis questions, but you cannot be sure that they will be.
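The formula-sheet definitions can be checked by hand on a small data set. The following is a minimal sketch in Python, assuming a made-up five-point data set (the values of `xs` and `ys` are illustrative and do not come from any exam question); the function name `least_squares` is likewise hypothetical. It computes the slope, intercept, and correlation directly from the sums and confirms that the two slope formulas agree.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (b0, b1, r, s_x, s_y) from the formula-sheet definitions."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross products and squared deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx                  # slope
    b0 = y_bar - b1 * x_bar         # intercept: line passes through (x_bar, y_bar)
    s_x = sqrt(sxx / (n - 1))       # sample standard deviations
    s_y = sqrt(syy / (n - 1))
    r = sxy / ((n - 1) * s_x * s_y) # correlation coefficient
    return b0, b1, r, s_x, s_y

# Illustrative (made-up) data, not taken from the notes
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, r, s_x, s_y = least_squares(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f}x, r = {r:.4f}")
print("slope from r*s_y/s_x:", round(r * s_y / s_x, 4))
```

On the exam itself the calculator or computer output supplies these values; the sketch is only meant to show how the formulas fit together.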

Communication, skills, and understanding
1. If you are asked to make a scatter plot or residual plot, be sure to include a title, labels on the horizontal and vertical axes, and scales on both axes. In a scatter plot, the explanatory or predictor variable is on the horizontal axis and the response variable is on the vertical axis.
2. When you describe the information provided by a scatter plot, be sure to comment on the direction, shape (form), and strength of the relationship in context. Also, comment on unusual features (such as outliers).
3. The least squares regression line passes through the point \((\bar{x}, \bar{y})\) and has slope \(r \frac{s_y}{s_x}\). So given the values of \(\bar{x}\), \(\bar{y}\), \(s_x\), \(s_y\), and \(r\), you can find the equation of the least squares regression line using the point-slope form of a linear equation: \(\hat{y} - \bar{y} = \left( r \frac{s_y}{s_x} \right)(x - \bar{x})\).
4. When you write the equation of a least squares regression line, be sure to include the hat on the y variable and be sure to identify both variables in your equation.
5. If asked to interpret the intercept or slope, be sure that your interpretation is in the context of the problem. Also make sure that you do not make a deterministic statement about the slope or intercept. The intercept provides an estimate for the value of y when x is zero. The slope provides information about the estimated amount that the y variable changes (or the amount that the y variable changes on average) for each unit change in the x variable.
6. Residual = observed y value − predicted y value = \(y - \hat{y}\).
7. To determine whether the model is a good fit for the data, examine a residual plot and make sure that the residuals are randomly scattered about the horizontal axis.
8. Be careful in using the least squares line to predict outside the domain of the observed values of the explanatory variable. Extrapolation is risky!
9. The least squares line is the line that minimizes the sum of the squared residuals.
10. An influential point is a point that noticeably affects the slope of the regression line when removed from (or added to) the data set. An outlier is a point that noticeably stands apart from the other points.
11. The magnitude of the correlation coefficient gives information about the strength of the linear relationship between two quantitative variables over the observed domain. That is, correlation provides information about how tightly points are clustered about a line.
12. If asked to interpret the correlation coefficient, be sure to comment on the strength and direction of the relationship in context.
13. The correlation coefficient is sensitive to the effect of outliers.
14. The magnitude of the correlation coefficient does not provide information about whether a linear model is appropriate. You must also consider the residual plot.
15. When there is a strong linear association between two variables, the value of r is close to 1 or −1; when there is a very weak linear association between two variables, the value of r is close to 0. A value of r close to 0 could be associated with a strong curved relationship.
16. A strong association does not imply causation.
17. The coefficient of determination, \(r^2\), gives the proportion of variation in the observed y values that can be attributed to the linear relationship with the x variable. You must be able to interpret \(r^2\) in context.
18. The transformation (x, ln y) or (x, log y) will straighten data that can be modeled by an exponential function. An exponential function \(y = ab^x\) does not pass through the origin.
19. The transformation (ln x, ln y) or (log x, log y) will straighten data that can be modeled by a power function. A power function \(y = ax^n\) does pass through the origin.
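The claim that the transformation (x, ln y) straightens exponential data can be seen numerically. The sketch below assumes made-up data generated exactly from \(y = 3 \cdot 2^x\) (the constants 3 and 2 and the helper name `correlation` are illustrative choices, not from the notes) and compares the correlation before and after taking the natural log of y.

```python
from math import log, sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Illustrative exponential data: y = a * b**x with a = 3, b = 2 (assumed values)
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

r_raw = correlation(xs, ys)                    # curved pattern, r below 1
r_log = correlation(xs, [log(y) for y in ys])  # ln y = ln 3 + x ln 2 is linear

print(f"r for (x, y):    {r_raw:.3f}")
print(f"r for (x, ln y): {r_log:.3f}")
```

Because ln y = ln a + x ln b is exactly linear in x, the transformed data fall on a line. With real data the fit would not be perfect, and a residual plot of the transformed data would still be needed to judge whether the model is appropriate.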

Multiple Choice Questions from 1997 Exam

8. There is a linear relationship between the number of chirps made by the striped ground cricket and the air temperature. A least squares fit of some data collected by a biologist gives the model \(\hat{y} = 25.2 + 3.3x\), \(9 < x < 25\), where x is the number of chirps per minute and \(\hat{y}\) is the estimated temperature in degrees Fahrenheit. What is the estimated increase in temperature that corresponds to an increase of 5 chirps per minute?
A. 3.3°F
B. 16.5°F
C. 25.2°F
D. 28.5°F
E. 41.7°F

31. The equation of the least squares regression line for the points on the scatterplot (not pictured) is \(\hat{y} = 1.3 + 0.73x\). What is the residual for the point (4, 7)?
A. 2.78
B. 3.00
C. 4.00
D. 4.22
E. 7.00

Multiple Choice Questions from 2002 Exam

6. The correlation between two scores X and Y equals 0.8. If both the X scores and the Y scores are converted to z-scores, then the correlation between the z-scores for X and the z-scores for Y would be
A. −0.8
B. −0.2
C. 0.0
D. 0.2
E. 0.8

17. A least squares regression line was fitted to the weights (in pounds) versus age (in months) of a group of many young children. The equation of the line is \(\hat{y} = 16.6 + 0.65x\), where \(\hat{y}\) is the predicted weight and x is the age of the child. A 20-month-old child in this group has an actual weight of 25 pounds. Which of the following is the residual weight, in pounds, for this child?
A. −7.85
B. −4.60
C. 4.60
D. 5.00
E. 7.85

31. A wildlife biologist is interested in the relationship between the number of chirps per minute for crickets and temperature. Based on the collected data, the least squares regression line is \(\hat{y} = 10.53 + 3.41x\), where x is the number of degrees Fahrenheit by which the temperature exceeds 50°F and \(\hat{y}\) is the number of chirps per minute. Which of the following best describes the meaning of the slope of the least squares regression line?
A. For each increase in temperature of 1°F, the estimated number of chirps per minute increases by 10.53.
B. For each increase in temperature of 1°F, the estimated number of chirps per minute increases by 3.41.
C. For each increase of one chirp per minute, there is an estimated increase in temperature of 10.53°F.
D. For each increase of one chirp per minute, there is an estimated increase in temperature of 3.41°F.
E. The slope has no meaning because the units of measure for x and y are not the same.

34. Each of 100 laboratory rats has available both plain water and a mixture of water and caffeine in their cages. After 24 hours, two measures were recorded for each rat: the amount of caffeine the rat consumed, X, and the rat's blood pressure, Y. The correlation between X and Y was 0.428. Which of the following conclusions is justified on the basis of this study?
A. The correlation between X and Y in the population of rats is also 0.428.
B. If the rats stop drinking the water/caffeine mixture, this would cause a reduction in their blood pressure.
C. About 18 percent of the variation in blood pressure can be explained by a linear relationship between blood pressure and caffeine consumed.
D. Rats with lower blood pressure do not like the water/caffeine mixture as much as do rats with higher blood pressure.
E. Since the correlation is not very high, the relationship between the amount of caffeine consumed and blood pressure is not linear.

MC Answers: 8-B 31-A 6-E 17-B 31-B 34-C

1999 #1
1. Lydia and Bob were searching the Internet to find information on air travel in the United States. They found data on the number of commercial aircraft flying in the United States during the years 1990-1998. The dates were recorded as years since 1990. Thus, the year 1990 was recorded as year 0. They fit a least squares regression line to the data. The graph of the residuals and part of the computer output for their regression are given below.
A. Is a line an appropriate model to use for these data? What information tells you this?
B. What is the value of the slope of the least squares regression line? Interpret the slope in the context of this situation.
C. What is the value of the intercept of the least squares regression line? Interpret the intercept in the context of this situation.
D. What is the predicted number of commercial aircraft flying in 1992?
E. What was the actual number of commercial aircraft flying in 1992?

AP STATISTICS 1999 SCORING GUIDELINES
Question 1

Solution:
a. Yes. The test for slope indicates that the linear model is useful (\(H_0: \beta = 0\), \(H_a: \beta \neq 0\), t = 54.11, p-value = 0.000) and the residual plot shows no pattern, indicating a linear model is appropriate.
b. Slope = 233.517 aircraft/year. On average, the number of commercial aircraft flying in the U.S. increased by approximately 233.517 each year. (OK if rounded to 234 in interpretation)
c. Intercept = 2939.93 aircraft. The predicted number of commercial aircraft that were flying in 1990 (since x = 0 corresponds to year 1990) was 2939.93. (OK if rounded to 2940 in interpretation)
d. For 1992, x = 2, so the predicted number of commercial aircraft flying is 2939.93 + 233.517(2) = 3406.964 aircraft.
e. From the residual plot, the residual for 1992 is +40, so actual − predicted = 40 and actual = 3406.964 + 40 = 3446.964 aircraft. Since the actual number flying must be an integer, the actual number must have been 3447.

Notes:
Part (a) can be considered essentially correct even if it fails to mention the t test, as long as it discusses the residual plot.
Parts (b) and (c) should draw the distinction between the model and the data. They can be considered essentially correct if the student incorporates the idea of estimation using words such as on average, predicted, approximately, about, etc.
Parts (b) and (c) can be considered partially correct if the student (1) incorrectly identifies the values for the slope and intercept but gives an essentially correct interpretation OR (2) correctly identifies the values for the slope and intercept but gives an incomplete interpretation or an interpretation not in context for one or both.
Parts (d) and (e) can be considered essentially correct if incorrect numbers from previous parts are correctly substituted.
Part (e) can be considered essentially correct even if it fails to round to an integer.

Points:
4 Complete Response: Gives an essentially correct response to all 5 parts.
3 Substantial Response: Essentially correct on 4 of the 5 parts. OR Essentially correct responses on a, d, and e AND partially correct responses on both b and c.
2 Developing Response: Essentially correct on 3 of the 5 parts. OR Partially correct responses on both b and c AND essentially correct responses on 2 of the remaining parts.
1 Minimal Response: Essentially correct on 1 or 2 of the 5 parts. OR Partially correct responses on both b and c.
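The arithmetic in parts (d) and (e) of the 1999 solution can be retraced in a few lines. This is a small sketch, taking the intercept as 2939.93 and the slope as 233.517 (the values that reproduce the stated prediction of 3406.964) and the residual of +40 read from the residual plot; the variable names are illustrative.

```python
# Values from the 1999 Question 1 regression output (x = years since 1990)
intercept = 2939.93   # predicted aircraft in 1990
slope = 233.517       # estimated increase in aircraft per year

x = 1992 - 1990                   # 1992 is recorded as year 2
predicted = intercept + slope * x # part (d): predicted number of aircraft

residual = 40                     # read from the residual plot for 1992
actual = predicted + residual     # part (e): residual = actual - predicted

print(f"predicted: {predicted:.3f}")
print(f"actual:    {actual:.3f} -> {round(actual)} aircraft")
```

The key step is rearranging residual = actual − predicted to solve for the actual value, then rounding because a count of aircraft must be a whole number.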

2005 #3
3. The Great Plains Railroad is interested in studying how fuel consumption is related to the number of railcars for its trains on a certain route between Oklahoma City and Omaha. A random sample of 10 trains on this route has yielded the data in the table below. A scatterplot, a residual plot, and the output from the regression analysis for these data are shown below.

A. Is a linear model appropriate for modeling these data? Clearly explain your reasoning.
B. Suppose the fuel consumption cost is $25 per unit. Give a point estimate (single value) for the change in the average cost of fuel per mile for each additional railcar attached to a train. Show your work.
C. Interpret the value of \(r^2\) in the context of this problem.
D. Would it be reasonable to use the fitted regression equation to predict the fuel consumption for a train on this route if the train had 65 railcars? Explain.

AP STATISTICS 2005 SCORING GUIDELINES
Question 3

Solution
Part (a): Yes, the linear model is appropriate for these data. The scatterplot shows a strong, positive, linear association between the number of railcars and fuel consumption, and the residual plot shows a reasonably random scatter of points above and below zero.
Part (b): According to the regression output, fuel consumption will increase by 2.15 units for each additional railcar. Since the fuel consumption cost is $25 per unit, the average cost of fuel per mile will increase by approximately ($25)(2.15) = $53.75 for each railcar that is added to the train.
Part (c): The regression output indicates that \(r^2\) = 96.7% or 0.967. Thus, 96.7% of the variation in the fuel consumption values is explained by using the linear regression model with number of railcars as the explanatory variable.
Part (d): No, the data set does not contain any information about fuel consumption for any trains with more than 50 cars. Using the regression model to predict the fuel consumption for a train with 65 railcars, known as extrapolation, is not reasonable.

Scoring
Each part is scored as essentially correct (E), partially correct (P), or incorrect (I).

Part (a) is essentially correct (E) if the model is deemed appropriate AND the explanation clearly indicates:
There is a linear pattern in the scatterplot; OR
There is no pattern in the residual plot.
Part (a) is partially correct (P) if the:
Model is deemed appropriate AND the student refers to the scatterplot or residual plot but fails to state the relevant characteristic of the plot; OR
Student refers to the relevant characteristic of the scatterplot or residual plot without deeming the model appropriate.
Part (a) is incorrect (I) if the student:
States that the model is appropriate without an explanation; OR
States that the model is inappropriate; OR
Makes a decision based only on numeric values from the computer output.

Part (b) is essentially correct (E) if the point estimate for the slope (2.15 or 2.1495) and the fuel consumption cost per unit ($25) are used to calculate the correct point estimate ($53.75 or $53.7375 ≈ $53.74).
Part (b) is partially correct (P) if only the point estimate for the slope (2.15 or 2.1495) is stated with a supporting calculation or interpretation.

Part (c) is essentially correct (E) if the student states:
96.7% of the variation in fuel consumption is explained by the linear regression model; OR
96.7% of the variation in fuel consumption is explained by the number of railcars.
Part (c) is partially correct (P) if the student makes one of the above statements using R-Sq(adj) = 96.3%.

Part (d) is essentially correct (E) if the student states that this is unreasonable due to extrapolation.
Part (d) is partially correct (P) if the student states this is:
Unreasonable but provides a weak explanation; OR
Reasonable even though it is considered a slight extrapolation.

Note: Any answer appearing without supporting work is scored as incorrect (I).

Each essentially correct (E) response counts as 1 point, each partially correct (P) response counts as 1/2 point.
4 Complete Response
3 Substantial Response
2 Developing Response
1 Minimal Response

Note: If a response is in between two scores (for example, 2 1/2 points), use a holistic approach to determine whether to score up or down depending on the strength of the response and communication.
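The point estimate in part (b) and the extrapolation check in part (d) can be retraced as follows. This is a minimal sketch, taking the slope as 2.15 (or 2.1495 unrounded) and the cost as $25 per unit, the values that reproduce the stated $53.75 and $53.7375; the upper bound of 50 railcars comes from the solution to part (d), and the variable names are illustrative.

```python
# Part (b): change in average fuel cost per mile for each additional railcar
cost_per_unit = 25.0      # dollars per unit of fuel
slope_rounded = 2.15      # fuel units per additional railcar (rounded)
slope_precise = 2.1495    # same slope as reported to more digits

cost_rounded = cost_per_unit * slope_rounded   # dollars per additional railcar
cost_precise = cost_per_unit * slope_precise

# Part (d): 65 railcars lies beyond the largest train in the sample
max_observed_railcars = 50
is_extrapolation = 65 > max_observed_railcars

print(f"cost per extra railcar: ${cost_rounded:.2f} (or ${cost_precise:.4f})")
print("predicting at 65 railcars is extrapolation:", is_extrapolation)
```

Multiplying the slope by the unit cost converts the estimate from fuel units to dollars without changing its interpretation as an average change per railcar.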

Sample: 3A
Score: 4
In part (a) the student's comment that the original data appears linear is too vague on its own to earn credit. However, the subsequent statement about the residual plot being randomly distributed is sufficient. The student gives a clear explanation for a correct calculation in part (b). Although the response makes no mention of the linear model in part (c), it does convey a generally correct understanding of what \(r^2\) measures. In part (d) the response shows a clear understanding of why extrapolation is not appropriate.