Math 11R Regression - Notetaking Guide
Name: ________  Period: ________

Day 1 - Scatter Plots and Linear Correlation Coefficients

Statisticians and scientists gather data to determine correlations or ________ between events.

Regression Analysis: use existing data to obtain an equation from which we can predict future data, where one variable is based upon another variable.

Before attempting a regression analysis of data, it is often helpful to examine a scatter plot of the data to see which regression model is most likely to be a good fit. Keep in mind that when working with real-world data, it is unlikely that any regression model is going to be a "perfect" fit. The goal is to find the model that fits as many of the data points as possible and will be the best indicator of trends in the data.

Scatter plot: graphically displays two related sets of data. Such a visual representation can indicate patterns, trends and relationships.
****Remember when making a scatter plot, do NOT connect the dots.****

The types of regression we will study are: linear, logarithmic, exponential, and power regression.

Basic things to look for when determining a model:
If the "shape" of more than one model appears to fit the data, test all of your choices to see which model is actually the best fit and the best predictor. Remember that other representations of these shapes may also exist due to the different natures of the data.

Linear
   The slope may be either ________. The slope is constant.
   Linear data is the most popular because it is easy to read and interpret.

Logarithmic
   The slope rises rapidly and then ________. Remember that log graphs exist in Quadrants I & IV unless they have a shift.
   We will only use natural log (ln) regression.

Exponential
   The slope ________ (or declines) slowly and then rises (falls) steeply. Remember that these graphs had asymptotes and generally existed in Quadrants I & II unless shifted.
   Often deals with growth of populations, bacteria, radioactive decay.

Power
   This curve possesses characteristics NOT seen in the first three models. Not a straight line, but a more gradual change than exponential.
   Could be quadratic (2nd order), cubic (3rd order), quartic (4th order)...
Examples: Which type of function (linear, exponential, logarithmic, or power) would best model the data in each of the scatter plots shown below? Explain your reasoning.

Linear Regression

Line of best fit (trend line) - a line on a scatter plot which can be drawn near the points to more clearly show the trend between two sets of data.

Positive correlation: ________
Negative correlation: ________
No correlation: ________

Strong positive and negative correlations have data points very close to the line of best fit.
Weak positive and negative correlations have data points which are not clustered near the line of best fit.

Coefficient of linear correlation (r): ________
The closer the absolute value of r is to 1, the more closely the regression line fits the data (stronger linear correlation).

   r = -1      -1 < r < 0      r = 0      0 < r < 1      r = 1

Correlation Coefficients Examples

1. For a certain data set relating a and b, the line of best fit has a positive slope. This indicates that in general
   (1) Larger values of a are paired with larger values of b.
   (2) Larger values of a are paired with smaller values of b.
   (3) Larger values of b are paired with smaller values of a.
   (4) The slope gives no information about the relationship between the sizes of a and b.

2. The correlation between the ages and prices of similar makes and models of cars on a used car lot would probably be
   (1) positive  (2) negative  (3) zero  (4) impossible to determine

3. The correlation between the heights and shoe sizes of the Junior Varsity Basketball team would probably be
   (1) positive  (2) negative  (3) zero  (4) impossible to determine

4. Which correlation coefficient shows the best association between two variables?
   (1) 0.90  (2) -0.30  (3) 0.80  (4) 1.20

5. The scatter plot shows the heights (h) and the weights (w) of babies born in a northeastern hospital over the course of a week. Does the scatter plot show a positive correlation, a negative correlation, or relatively no correlation? Explain your answer.

   [Scatter plot "Births": Height (inches), 18 to 23, on the horizontal axis; Weight (ounces), 0 to 200, on the vertical axis]
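To make the meaning of r concrete, here is a minimal sketch of how the linear correlation coefficient can be computed from paired data. The function name `correlation_coefficient` and the three small data sets are made up for illustration; they are not part of the guide.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data sets, invented for illustration only.
rising = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])     # points on a rising line
falling = correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2])    # points on a falling line
scattered = correlation_coefficient([1, 2, 3, 4], [3, 1, 4, 2])  # no linear trend
print(rising, falling, scattered)
```

Points that sit exactly on a rising line give r = 1, points exactly on a falling line give r = -1, and the third set was chosen so that its deviations cancel and r comes out to 0.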
Homework R1

In 1-6, match each scatterplot with the appropriate correlation coefficient.
   a) +1   b) +0.8   c) +0.3   d) 0   e) -0.6   f) -0.9

   1.   2.   3.   4.   5.   6.

Match each graph with a description of its correlation coefficient: positive, negative, or almost zero.

   7.   8.   9.
Homework R1 (cont'd.)

10. Which correlation coefficient could match the graph?
   1) -0.9  2) -0.3  3) 0.6  4) 1

11. Which of the following is a true statement?
   1) The line of best fit always passes through each of the given data points.
   2) The line of best fit must have a correlation coefficient of +1 or -1.
   3) A correlation coefficient of -1 means that there is no correlation.
   4) If the correlation coefficient is negative, the line of best fit has a negative slope.

12. Which of the following correlation coefficients most clearly represents a linear relationship for a given set of data points?
   1) -0.89  2) -0.52  3) 0.58  4) 0.76

13. Which of the following correlation coefficients would indicate no significant linear relationship for the independent and dependent variables in a data set?
   1) -1  2) -0.52  3) 0.15  4) 0.90

Consider the paired variables and decide whether they have a positive correlation, a negative correlation, or almost zero correlation.

14. The number of bedrooms in a family home and the number of years the family has lived in the home.
15. The age of a child and the number of years of school that the child has attended.
16. The speed at which a car is driven and the distance traveled in a 6-hour period.
17. The age of a nonclassic car and its value for trade-in.
18. The waiting time at a restaurant and the number of entrees offered.
Day 2 - Regression Analysis

Warm-Up

1. If θ is a positive acute angle and sin θ = a, which expression represents cos θ in terms of a?
   (1) sqrt(a)          (3) 1/a
   (2) sqrt(1 - a^2)    (4) 1/sqrt(1 - a^2)

2. Solve algebraically for x: 27^(2x+1) = 9^(4x)

Remember, regression analysis uses existing data to obtain an equation from which we can predict future data.

The Correlation Coefficient (r) is an indication of how well a model fits a particular set of data. The correlation coefficient is designated by r and falls into the range -1 ≤ r ≤ 1. If r is close to 1 (or -1), the model is considered a "good fit". If r is close to 0, the model is "not a good fit".

Comparing correlation coefficients of different regression models for the same set of data cannot be used to determine which is the best regression model. Examine the scatterplots!!!

Both linear and non-linear regressions can be found using the graphing calculator. All types of regressions on the calculator are prepared in a similar manner. The calculator is capable of determining various types of regressions. Your options can be found under STAT CALC (arrow down for more choices).

Examples:

1. Create a scatterplot on your calculator.

   x    3    8    4    17    11
   y    6    12   2    24    10

   a. Find the line of best fit, rounded to the nearest hundredth.
   b. Enter this linear equation into your y= and plot it.
   c. What is the correlation coefficient rounded to the nearest ten-thousandth? Is this a good fit?
   d. Find the y-value when x = 5, rounded to the nearest hundredth.
   e. Find the x-value when y = 26, rounded to the nearest tenth.
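As a cross-check on the calculator work in Example 1, the line of best fit and r can also be computed with the standard least-squares formulas. This is a sketch only: the variable names are chosen here for illustration, and the code keeps a and b unrounded, whereas the example asks you to round them before using the equation.

```python
import math

# Data from Example 1.
xs = [3, 8, 4, 17, 11]
ys = [6, 12, 2, 24, 10]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squared deviations and of cross products.
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

a = sxy / sxx                    # slope of y = ax + b
b = mean_y - a * mean_x          # y-intercept
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient

y_at_5 = a * 5 + b               # part d: predict y from x
x_at_26 = (26 - b) / a           # part e: solve 26 = ax + b for x

print(round(a, 2), round(b, 2), round(r, 4))
print(round(y_at_5, 2), round(x_at_26, 1))
```

Part e is done algebraically here, by solving the line's equation for x, rather than with the calculator's intersect feature.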
TI 83+ GRAPHING CALCULATOR

1. TO DRAW A SCATTER PLOT:
   a) Enter the data in a list: [key]
      To clear an existing list, go to the top of the column, highlight L#, and press: [key]
   b) Turn Plot 1 on by pressing [key]
      OR press [key], clear any equations in the y= menu, then turn Plot 1 on by going up, highlighting it, and pressing [key]
   c) Press [key] (ZoomStat)

TO TURN DIAGNOSTICS (r) ON
   **On TI-84+ with new operating system - mode
   To turn r on: Press [key]; this will bring you to the catalog in the D section. Arrow down until DiagnosticOn is highlighted, then press [key]. The screen will say Done when it is turned on.
   Turn r on screens look like: [calculator screens]

2. TO DETERMINE THE REGRESSION EQUATION OF BEST FIT: (Linear, Logarithmic, Power, Exponential)
   a) Press [key] (to highlight Calc), then **a number from the list below**
      There are 4 required and 1 optional types of regressions for Test B.

      LinReg(ax + b)   Linear Regression: y = ax + b (a is the slope, b is the y-intercept)
      LnReg            Natural Logarithmic Regression: y = a + b ln x
      ExpReg           Exponential Regression: y = a(b)^x
      (or A) PwrReg    Power Regression: y = a(x)^b
   b) Remember, the r value, or correlation coefficient, will tell you how good the fit is. If r does not appear when doing the regressions, it must be turned on. The default is OFF when the calculator is reset.

3. TO DRAW THE EQUATION:
   a) Using the a and b found above, rounded appropriately, type the equation into [key], then press [key]
   b) A curve will appear on the scatter plot. (Plot 1 must still be on (see 1 above) to see the scatter plot.)
   If no rounding directions are provided, the curve can be pasted into the y= directly using the following keystrokes. MAKE SURE YOU ARE IN Y= TO START WITH.
   [keys]

4. TO FIND A Y VALUE ON THE LINE:
   a) Press [key]
   b) X= appears on the screen. Type in the known x-value, press [key], and the y-value will appear.
   c) If ERR: INVALID appears, press [key] (Goto). The domain must include the x-value: press [key] and change the xmin and/or xmax.
   d) Repeat steps a and b.

5. TO FIND AN X VALUE ON THE LINE:
   a) Enter the given y-value in the [key] menu as Y2.
   b) Press [key] and two lines should appear.
   c) Adjust the window for proper domain and range if the intersection is difficult to view.
   d) Press [key] for intersect.
   e) Move the cursor left or right by pressing the arrows to get close to the point of intersection. Press [key] for the first curve, [key] for the second curve, [key] (no need to guess).
   f) The point of intersection will appear at the bottom of the screen. The X value is what to look for. The Y value should be the given value.
   (To find the x-value without the calculator, plug the given y into the equation and solve for x algebraically.)
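For the algebraic route mentioned in step 5, each of the four regression models can be solved for x once a, b, and the given y are known. The inverses below are a worked sketch using the model forms from the regression list; they assume a and b are nonzero where they divide, that b is positive and not 1 in the exponential model, and that y/a is positive wherever a logarithm or fractional power is taken.

```latex
\begin{align*}
\text{Linear:}      \quad y = ax + b      &\;\Longrightarrow\; x = \frac{y - b}{a} \\
\text{Logarithmic:} \quad y = a + b\ln x  &\;\Longrightarrow\; x = e^{(y - a)/b} \\
\text{Exponential:} \quad y = a\,b^{x}    &\;\Longrightarrow\; x = \frac{\ln(y/a)}{\ln b} \\
\text{Power:}       \quad y = a\,x^{b}    &\;\Longrightarrow\; x = \left(\frac{y}{a}\right)^{1/b}
\end{align*}
```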
For steps d-e your screen will look like: [calculator screens]

2. The table below, created in 1996, shows a history of transit fares from 1955 to 1995. On your calculator, find the exponential regression equation with the coefficient and base rounded to the nearest thousandth. Using this equation, determine the prediction that should have been made for the year 1998, to the nearest cent.

   Year       55     60     65     70     75     80     85     90     95
   Fare ($)   0.10   0.15   0.20   0.30   0.40   0.60   0.80   1.15   1.50

3. Create a scatterplot on your calculator.

   X    3    7    15    23    31
   Y    15   18   26    32    37

   a. Find the line of best fit.
   b. Enter this linear equation into your y= and plot it.
   c. What is the correlation coefficient? Is this a good fit?
   d. Find the y-value when x = 25.
   e. Find the x-value when y = 20.
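One common way to compute the LnReg, ExpReg, and PwrReg models by hand is to take logarithms so that each model becomes a straight line, then fit that line by least squares. The sketch below follows that approach; that the calculator does the same internally is an assumption here, and the helper names (`fit_line`, `ln_reg`, `exp_reg`, `pwr_reg`) and the test data are invented for illustration.

```python
import math

def fit_line(us, vs):
    """Least-squares slope and intercept of v = slope*u + intercept."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    suu = sum((u - mean_u) ** 2 for u in us)
    slope = suv / suu
    return slope, mean_v - slope * mean_u

def ln_reg(xs, ys):
    """y = a + b ln x is a straight line in (ln x, y). Returns (a, b)."""
    b, a = fit_line([math.log(x) for x in xs], ys)
    return a, b

def exp_reg(xs, ys):
    """y = a(b)^x becomes ln y = ln a + x ln b. Returns (a, b)."""
    slope, intercept = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), math.exp(slope)

def pwr_reg(xs, ys):
    """y = a(x)^b becomes ln y = ln a + b ln x. Returns (a, b)."""
    slope, intercept = fit_line([math.log(x) for x in xs],
                                [math.log(y) for y in ys])
    return math.exp(intercept), slope

# Hypothetical data generated from known curves, so the fits should
# recover the constants used to build them.
xs = [1, 2, 3, 4]
exp_a, exp_b = exp_reg(xs, [3 * 2 ** x for x in xs])        # built from y = 3(2)^x
pwr_a, pwr_b = pwr_reg(xs, [2 * x ** 3 for x in xs])        # built from y = 2(x)^3
ln_a, ln_b = ln_reg(xs, [1 + 2 * math.log(x) for x in xs])  # built from y = 1 + 2 ln x
print(exp_a, exp_b, pwr_a, pwr_b, ln_a, ln_b)
```

Because logarithms are involved, this approach needs positive y-values for the exponential model, positive x-values for the logarithmic model, and both positive for the power model.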
4. The data below show the average growth rates of 12 Weeping Higan cherry trees planted in Washington, DC. At the time of planting, the trees were one year old and were all 6 feet in height.

   Age of Tree (in years)   Height (in feet)
   1                        6
   2                        9.5
   3                        13
   4                        15
   5                        16.5
   6                        17.5
   7                        18.5
   8                        19
   9                        19.5
   10                       19.7
   11                       19.8

   a. Determine a logarithmic regression model that approximates the data, rounded to the nearest thousandth.
   b. Graph the new equation.
   c. Determine if the model is a good fit.
   d. What was the average height of the trees at 1.5 years of age, to the nearest tenth of a foot?
   e. What is the predicted average height of the trees at 20 years of age, rounded to the nearest tenth? Is this realistic?
   f. If the average height of the cherry trees is 10 feet, what is the age of the trees, to the nearest tenth of a year?

5. Write a power function y = ax^b whose graph passes through the points (2, 4) and (6, 10). What is the correlation coefficient?
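Problem 5 can also be approached algebraically, since two points are enough to pin down both constants of y = ax^b. A minimal sketch, assuming both points have positive coordinates so the logarithms exist (the variable names are chosen here for illustration):

```python
import math

# The two given points.
x1, y1 = 2, 4
x2, y2 = 6, 10

# Dividing y2 = a*x2^b by y1 = a*x1^b eliminates a:
#   y2/y1 = (x2/x1)^b,  so  b = ln(y2/y1) / ln(x2/x1)
b = math.log(y2 / y1) / math.log(x2 / x1)

# Substitute b back into y1 = a*x1^b to get a.
a = y1 / x1 ** b

print(a, b)
```

Because the curve passes exactly through both points, a power regression run on just these two points should be a perfect fit, which is worth keeping in mind when interpreting its correlation coefficient.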
Homework R2

1. Which equation models the data in the accompanying table?

   Time in hours, x    0    1    2    3    4    5     6
   Population, y       5    10   20   40   80   160   320

   (1) y = 2x + 5   (2) y = 2^x   (3) y = 2x   (4) y = 5(2)^x

2. The table shows the amount A in a savings account t years after the account was opened.

   t    0     1     2     3     4     5     6     7
   A    210   255   310   377   459   557   677   822

   a. Write an exponential model that relates the amount of savings A to the years after the account was opened, t, rounding to the nearest hundredth.
   b. What is the amount of savings accrued after 10 years, rounded to the nearest dollar?
   c. When does the account reach $400? Round your answer to the nearest tenth of a year.

3. Create a scatterplot on your calculator.

   X    2    3.5   5    7     8
   Y    6    32    93   244   358

   a. Find the line of best fit, rounded to the nearest hundredth.
   b. Enter this linear equation into your y= and plot it.
   c. What is the correlation coefficient rounded to the nearest ten-thousandth? Is this a good fit?
   d. Find the y-value when x = 6, rounded to the nearest tenth.
   e. Find the x-value when y = 300, rounded to the nearest tenth.
Homework R2 (cont'd.)

4. Write a power function whose graph passes through the points (3, 8) and (6, 17).

5. Find an exponential model for the data: (0, 17.56), (1, 16.03), (2, 14.64), (3, 13.36), (4, 12.20), (5, 11.14), (6, 10.17). Round to the nearest hundredth.

6. For which pair of data would you expect a negative correlation?
   a) the number of hours studied for a test and grades on that test
   b) ages of husbands and wives
   c) sale price of an item and number of units of that item sold
   d) income and shoe sizes of adults

7. For which pair of measurements would you expect no significant correlation?
   a) hand size and shoe size
   b) income and education
   c) car weight and the number of miles it can travel on 1 gallon of gasoline
   d) bowling scores and number of traffic tickets

8. a) Find the equation of the regression line based on the data below.
   b) What is the correlation coefficient?

   X    2    4    5    6    8
   Y    3    1    0    -1   -3
Day 3 - Scatterplots

Warm-Up

1. The accompanying diagram shows unit circle O, with radius OB = 1. Which line segment has a length equivalent to cos θ?
   (1) AB   (3) OC
   (2) CD   (4) OA

2. The accompanying diagram shows a triangular plot of land that is part of Fran's garden. She needs to change the dimensions of this part of the garden, but she wants the area to stay the same. She increases the length of side AC to 22.5 feet. If angle A remains the same, by how many feet should side AB be decreased to make the area of the new triangular plot of land the same as the current one?

Scatter plots graphically display two related sets of data. Such a visual representation can indicate patterns, trends and relationships. Remember when making a scatter plot, do NOT connect the dots.

To Create a Scatter Plot:
   1. Determine the range of values: high and low for both x and y.
   2. If the data does not start at 0, put a lightning bolt on the scale.
   3. Equally space the values on each axis to include all points.
   4. Label each axis and title the graph.
   5. Plot the points.

1) The table below gives the average number of children per family in a foreign country for the years 1980-2005. Let x = 0 represent the year 1980, and let y represent the average number of children per family.

   Year (x)       1980   1985   1990   1995   2000   2005
   Children (y)   4.8    4.0    3.5    2.7    2.4    2.0

   a. Graph the points and find the equation of best fit using a linear regression. Round all values to the nearest thousandth.
   b. Using this equation, find the estimated average number of children per family of this country for the year 2010.
   c. In what year does this model predict the average number of children in this country will be 0?
2) The data in the accompanying table compares stopping distance with speed.

   Speed, x (mph)                10     20     30     40      50      60      70
   Stopping Distance, y (ft.)    12.5   36.0   69.5   114.0   169.5   249.0   325.5

   a. Make a scatterplot of the data.
   b. Find the power model to represent this data, rounded to the nearest hundredth.
   c. What distance is required to stop if you are traveling at a speed of 65 mph, rounded to the nearest hundredth?
   d. What is the speed you are traveling if the distance to stop is 50 feet, rounded to the nearest hundredth?

3) When determining an equation that describes the relationship between the weight, w, in pounds added to a spring and the length, L, in inches of the spring, as shown in the accompanying figure, Cynthia recorded the measurements shown in the accompanying table.

   Length, L (in)   Weight, w (lbs)
   10.75            10
   11.8             20
   13               30
   13.9             40
   15.7             50

   a. Estimate an equation of the line, L = aw + b, that best fits the data Cynthia recorded, rounded to the nearest hundredth.
   b. On the calculator, graph the regression line with a scatterplot of the data.
   c. Determine the load to the nearest tenth of a pound when L = 12.5 inches.
   d. Determine the length of the spring to the nearest tenth of an inch when w = 68 pounds.
4) The following data represents the wind speed (mph) and the corresponding wind chill factor at a temperature of 10°F.

   Wind Speed (mph)   5    10   15   20   25    30
   Wind Chill (°F)    1    -4   -7   -9   -11   -12

   a. On the accompanying grid, make a scatter plot of this data.
   b. Write a logarithmic regression equation, expressing the regression coefficients to the nearest hundredth.
   c. What is the wind chill factor when the wind speed is 35 mph, to the nearest hundredth?
   d. What is the speed of the wind if the wind chill is -2°F, to the nearest hundredth?

5) The breaking strength, y, in tons, of steel cable with diameter d, in inches, is given in the table below.

   d (in)   y (tons)
   0.50     9.85
   0.75     21.80
   1.00     38.30
   1.25     59.20
   1.50     84.40
   1.75     114.00

   a. On the accompanying grid, make a scatter plot of these data.
   b. Write the exponential regression equation, expressing the regression constants to the nearest tenth.
Homework R3

1) The relationship of a man's hat size and shoe size is given in the accompanying table.

   Hat size, x     7.5   8     8.5   9
   Shoe size, y    8     8.5   9     9.5

   a. Create a scatterplot.
   b. Is the correlation positive, negative, or zero?
   c. What is the line of best fit?
   d. What is the correlation coefficient?

2) The table shows the population y (in millions) and the population rank x for 9 cities in Argentina in 1991.

   City                     Rank, x   Population (millions), y
   Cordoba                  2         1.21
   Rosario                  3         1.12
   La Matanza               4         1.11
   Mendoza                  5         0.77
   La Plata                 6         0.64
   Moron                    7         0.64
   San Miguel de Tucuman    8         0.62
   Lomas de Zamora          9         0.57
   Mar del Plata            10        0.51

   a. Draw a scatter plot of x vs. y.
   b. Find a power model for the original data. Estimate the population of the city Vicente Lopez, which has a population rank of 20, rounded to the nearest thousandth.
Homework R3 (cont'd.)

3) Data: The data below shows the cooling temperatures of a freshly brewed cup of coffee after it is poured from the brewing pot into a serving cup. The brewing pot temperature is approximately 180°F.

   Time (min)   Temp (°F)
   0            179.5
   5            168.7
   8            158.1
   11           149.2
   15           141.7
   18           134.6
   22           125.4
   25           123.5
   30           116.3
   34           113.2
   38           109.1
   42           105.7
   45           102.2
   50           100.5

   a) Determine an exponential regression model equation to represent the data. Round answers to the nearest thousandth.
   b) Graph the equation on the calculator. What is the correlation coefficient, rounded to the nearest ten-thousandth? Is it a good fit?
   c) Based on this new equation, what is the initial temperature of the coffee, rounded to the nearest tenth?
   d) When is the coffee at a temperature of 106°F, rounded to the nearest hundredth of a minute?
   e) What is the predicted temperature of the coffee after 1.5 hours, rounded to the nearest tenth of a degree?
   f) In 1992, a woman sued McDonald's for serving coffee at a temperature of 180°F that caused her to be severely burned when the coffee spilled. An expert witness at the trial testified that liquids at 180°F will cause a full thickness burn to human skin in two to seven seconds. It was stated that had the coffee been served at 155°F, the liquid would have cooled and avoided the serious burns. The woman was awarded over 2.7 million dollars. As a result of this famous case, many restaurants now serve coffee at a temperature around 155°F. How long should restaurants wait (after pouring the coffee from the pot) before serving coffee, to ensure that the coffee is not hotter than 155°F, rounded to the nearest hundredth of a minute?
Day 4 - Review

1. For each pair of data values below, tell whether you would expect positive, negative, or zero correlation.
   a. Number of hours studied for a test versus the grade on the test.
   b. Income versus shoe sizes of adults.
   c. Oven temperature versus amount of cooking time.
   d. Education versus income.
   e. The length of one's hair versus time spent in the shower.
   f. Movie ratings (R, PG, G) versus movie popularity.
   g. Height versus arm span.

2. Which scatter diagram shows the strongest positive correlation?

3. The relationship of a woman's shoe size and length of a woman's foot, in inches, is given in the accompanying table.

   Woman's Shoe Size   5      6      7     8
   Foot Length (in)    9.00   9.25   9.5   9.75

   The linear correlation coefficient for this relationship is
   (1) 1   (2) -1   (3) 0.5   (4) 0

4. A linear regression equation of best fit between a student's attendance and the degree of success in school is h = 0.5x + 68.5. The correlation coefficient, r, for these data would be
   (1) 0 < r < 1   (2) -1 < r < 0   (3) r = 0   (4) r = -1
5. Which graph represents data used in a linear regression that produces a correlation coefficient closest to 1?
   (1)   (2)   (3)

6. A box containing 1,000 coins is shaken, and the coins are emptied onto a table. Only the coins that land heads up are returned to the box, and then the process is repeated. The accompanying table shows the number of trials and the number of coins returned to the box after each trial.

   Trial            0       1     3     4     6
   Coins Returned   1,000   610   220   132   45

   a. Write an exponential regression equation, rounding the calculated values to the nearest ten-thousandth.
   b. Use the equation to predict how many coins would be returned to the box after the eighth trial.

7. The accompanying table shows the boiling points of water at different altitudes.

   Location                  Altitude, h (km)   Boiling Point, t (°C)
   Wellington, New Zealand   0                  100
   Banff, Alberta, Canada    1.38               95
   Quito, Ecuador            2.85               90
   Mt. Logan, Canada         5.95               80

   a. Make a scatter plot of the data.
   b. Estimate a and b correct to the nearest hundredth. Then write an equation of the line h = at + b that best fits the data.
   c. Write the correlation coefficient to the nearest thousandth.
8. Kathy swims laps at the local fitness club. As she times her laps, she finds that each succeeding lap takes a little longer as she gets tired. If the first lap takes her 33 seconds, the second lap takes 38 seconds, the third takes 42 seconds, the fifth takes 50 seconds, and the seventh lap takes 54 seconds, state the power regression equation for this set of data, rounding all coefficients to the nearest hundredth. Using your written regression equation, estimate the number of seconds that it would take Kathy to complete her tenth lap, to the nearest tenth of a second.

9. The populations of the United States in the years 1900, 1950, 1980, and 1999 are shown in the accompanying table, where t = 0 represents the year 1900.

   Year                    1900   1950    1980    1999
   Population (millions)   76.2   151.3   226.5   272.7

   a. Determine an exponential function y = ab^t that best fits the data in the table. Estimate a and b to the nearest thousandth.
   b. If this growth relationship continues, predict the U.S. population in 2020, to the nearest tenth of a million.
   c. If this growth relationship continues, in what year will the U.S. population exceed 400 million for the first time?

10. The data in the accompanying table show the growth of cellular phone subscriptions in the United States from 1993 to 1999.

   Year   Subscriptions (millions)
   1993   16.0
   1995   33.79
   1996   44.04
   1997   55.31
   1999   86.05

   a. Fit an exponential curve that best fits the data, where x = 0 represents the year 1990 and y is the number of cell phone subscribers. Approximate a and b to the nearest thousandth.
   b. Use the model to estimate the number of cellular phone subscribers in 1998.
11. According to recent surveys, the percentage of new plant and equipment expenditures by US manufacturing companies on pollution control is as shown:

   Year      1975   1980   1981   1984   1987
   Percent   9.3    4.8    4.3    3.3    4.3

   a. Use a linear regression model to find the line of best fit.
   b. Estimate the figure for 1985. (Round your answer to one decimal place.)

12. A real estate agent plans to compare the price of a cottage, y, in a town on the seashore to the number of blocks, x, the cottage is from the beach. The accompanying table shows a random sample of sales and location data.

   Number of Blocks from the Beach (x)   Price of Cottage (in thousands) (y)
   5                                     $132
   0                                     $310
   4                                     $204
   2                                     $238
   1                                     $275
   7                                     $60.8

   a. Write a linear regression equation that relates the price of a cottage to its distance from the beach, rounded to the nearest ten-thousandth.
   b. Use the equation to predict the price of a cottage, to the nearest dollar, located three blocks from the beach.

Day 4 Review Answers

1a) pos  b) zero  c) neg  d) pos  e) pos - debatable  f) zero  g) pos
2) 1
3) 1
4) 1
5) 3
6a) y = 1018.2839(0.5969)^x  b) 16 coins
7b) y = -0.30x + 29.79  c) -1.0000
8) y = 32.35x^0.26, 58.9 sec
9a) y = 77.322(1.013)^x  b) 364.3 million people  c) 2027
10a) y = 7.746(1.319)^x  b) 70.964 million people
11a) y = -0.434729064x + 866.5721675  b) 3.6 (plant and equip. expend.)
12a) y = -34.7397x + 313.3091  b) $209,090
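An answer from the key can be double-checked without the calculator. The sketch below recomputes problem 12 using the sums form of the least-squares formulas; the variable names are chosen here for illustration, and the price is kept in thousands of dollars until the final step.

```python
# Data from problem 12: blocks from the beach (x), price in thousands of dollars (y).
blocks = [5, 0, 4, 2, 1, 7]
price = [132, 310, 204, 238, 275, 60.8]

n = len(blocks)
sum_x = sum(blocks)
sum_y = sum(price)
sum_xy = sum(x * y for x, y in zip(blocks, price))
sum_xx = sum(x * x for x in blocks)

# Least-squares slope and intercept written in terms of the raw sums.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

# Part b: predicted price three blocks from the beach, in thousands of dollars.
price_at_3 = slope * 3 + intercept

print(round(slope, 4), round(intercept, 4), round(price_at_3 * 1000))
```

Rounded as the problem directs, the slope, intercept, and predicted price agree with answers 12a and 12b in the key above.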