SAMPLE. Investigating the relationship between two numerical variables. Objectives

Size: px
Start display at page:

Download "SAMPLE. Investigating the relationship between two numerical variables. Objectives"

Transcription

1 C H A P T E R 23 Investigating the relationship between two numerical variables Objectives To use scatterplots to display bivariate (numerical) data To identify patterns and features of sets of data from scatterplots To identify positive, negative or no association between variables from a scatterplot To introduce the q-correlation coefficient to measure the strength of the relationship between two variables To introduce Pearson s product-moment correlation coefficient r to measure the strength of the linear relationship between two variables To fit a straight line to data by eye, and using the method of least squares To interpret the slope of a regression line and its intercept, if appropriate To predict the value of the dependent (response) variable from an independent (explanatory) variable, using a linear equation In Chapter 22 statistics of one variable were discussed. Sometimes values of a variable for more than one group have been examined, such as age of mothers and age of fathers, but only one variable was considered for each individual at a time. When two variables are observed for each subject, bivariate data are obtained. For example, it might be interesting to record the number of hours spent studying for an exam by each student in a class and the mark they achieved in the exam. If each of these variables were considered separately the methods discussed earlier would be used. It may be of more interest to examine the relationship between the two variables, in which case new bivariate techniques are required. When exploring bivariate data, questions arise such as, Is there a relationship between two variables? or Does knowing the value of one of the variables tell us anything about the value of the other variable? 554

2 Chapter 23 Investigating the relationship between two numerical variables 555 Consider the relationship between the number of cigarettes smoked per day and blood pressure. Since one opinion might be that varying the number of cigarettes smoked may affect blood pressure, it is necessary to distinguish between blood pressure, which is called the dependent or response variable, and the number of cigarettes, which is called the independent or explanatory variable. In this chapter some techniques are introduced which enable questions concerning the nature of the relationship between such variables to be answered Displaying bivariate data As with data concerning one variable, the most important first step in analysing bivariate data is the construction of a visual display. When both of the variables of interest are numerical then a scatterplot (or bivariate plot) may be constructed. This is the single most important tool in the analysis of such bivariate data, and should always be examined before further analysis is undertaken. The pairs of data points are plotted on the cartesian plane, with each pair contributing one point to the plot. Using the normal convention, the variable plotted horizontally is denoted as x, and the variable plotted vertically as y. The following example examines the features of the scatterplot in more detail. Example 1 The number of hours spent studying for an examination by each member of a class, and the marks they were awarded, are given in the table. Student Hours Mark Student Hours Mark Construct a scatterplot of these data. Solution The first decision to be made when preparing this scatterplot is whether to show Mark or Hours on the horizontal (x) axis. Since a student s mark is likely to depend on the hours that they spend studying, in this case Hours is the independent variable and Mark is the dependent variable. By convention, the independent variable is plotted on the horizontal (x) axis, and the dependent variable on the vertical (y) axis, giving the scatterplot shown.

3 556 Essential Advanced General Mathematics Mark y Hours x From this scatterplot, a general trend can be seen of increasing marks with increasing hours of study. There is said to be a positive association between the variables. Twovariables are positively associated when larger values of y are associated with larger values of x,asshown in the previous scatterplot. Examples of variables which exhibit positive association are height and weight, foot size and hand size, and number of people in the family and household expenditure on food. Example 2 The age, in years, of several cars and their advertised price in a newspaper are given in the following table. Age (years) Price ($) Age (years) Price ($) Construct a scatterplot to display these data. Solution In this case the independent variable is the age of the car, which is plotted on the horizontal axis. The dependent variable, price, is plotted on the vertical axis. Price ($) y Age (years) x

4 Chapter 23 Investigating the relationship between two numerical variables 557 From the scatterplot a general trend of decreasing price with increasing age of car can be seen. There is said to be a negative association between the variables. Twovariables are negatively associated when larger values of y are associated with smaller values of x,asshown in the scatterplot above. Examples of other variables which exhibit negative association are weight and number of weeks spent on a diet program, hearing ability and age, and number of cold rainy days per week and sales of ice creams. The third alternative is that a scatterplot shows no particular pattern, indicating no association between the variables. y x There is no association between two variables when the values of y are not related to the values of x,asshown in the preceding scatterplot. Examples of variables which show no association are height and IQ for adults, price of cars and fuel consumption, and size of family and number of pets. When one point, or a few points, do not seem to fit with the rest of the data they are called outliers. Sometimes a point is an outlier, not because its x value or its y value is in itself unusual, but rather because this particular combination of values is atypical. Consequently such an outlier cannot always be detected from single variable displays, such as stem-and-leaf plots. Forexample, consider this scatterplot. While the y variable plotted on the horizontal axis takes values 8 from 1 to 8 and the variable plotted on the vertical 6 axis takes values from 2 to 8, the combination (2, 8) 4 is clearly an outlier x

5 558 Essential Advanced General Mathematics Using the TI-Nspire The calculator can be used to construct a scatterplot of statistical data. The procedure is illustrated using the age and price of car data from Example 2. The data is easiest entered in a Lists & Spreadsheet application ( 3). Firstly, use the up/down arrows ( )to name the first column age and the second column price. Then enter the data as shown. Open a Data & Statistics application ( 5) tograph the data. At first the data displays as shown. Specify the x variable by selecting Add X Variable from the Plot Properties (b 2 4) and selecting age. Specify the y variable by selecting Add Y Variable from the Plot Properties (b 2 6) and selecting price. The data now displays as shown.

6 Chapter 23 Investigating the relationship between two numerical variables 559 Using the Casio ClassPad The table represents the results of 12 students in two tests. Test 1 score Test 1 score Enter the data into list1 (x) and list2 (y) inthe Setting...and select the tab for Graph3. (Note: Following on from the types of graphs in univariate statistics, this allows the Scatterplot settings to be remembered and called upon when required.) Ensure that all other graphs are de-selected and tap to produce the graph shown in the full screen. module. Tap SetGraph, Select the graph window (bold border) and tap Analysis, Trace to scroll from point to point and display the coordinates at the bottom of the graph. Exercise 23A Note: Save your data for 1 4 in named lists as they will be needed for later exercises. 1 The amount of a particular pain relief drug given to each patient and the time taken for the patient to experience pain relief are shown. a b c Patient Drug dose (mg) Response time (min) Plot the response time against drug dose. From the scatterplot, describe any association between the two variables. Identify outliers, if any, and interpret. 2 The proprietor of a hairdressing salon recorded the amount spent advertising in the local paper and the business income for each month of a year, with the following results.

7 560 Essential Advanced General Mathematics Month Advertising ($) Business ($) a b c Month Advertising ($) Business ($) Plot the business income against the advertising expenditure. From the scatterplot, describe any association between the two variables. Identify outliers, if any, and interpret. 3 The number of passenger seats on the most commonly used commercial aircraft, and the airspeeds of these aircraft, in km/h, are shown in the following table. a b c Number of seats Airspeed (km/h) Number of seats Airspeed (km/h) Plot the airspeed against the number of seats. From the scatterplot, describe any association between the two variables. Identify outliers, if any, and interpret. 4 The price and age of several secondhand caravans are listed in the table. a b c Age (years) Price ($) Age (years) Price ($) Plot the price of the caravans against their age. From the scatterplot, describe any association between the two variables. Identify outliers, if any, and interpret The q-correlation coefficient If the plot of a bivariate data set shows a basic trend, apart from some randomness, then it is useful to provide a numerical measure of the strength of the relationship between the two variables. Correlation is a measure of strength of a relationship which applies only to

8 Chapter 23 Investigating the relationship between two numerical variables 561 numerical variables. Thus it is sensible, for example, to calculate the correlation between the heights and weights for a group of students, but not between height and gender, as gender is not a numerical variable. There are many different numerical measures of correlation, and each has different properties. In this section the q-correlation coefficient will be introduced. Consider the scatterplot of the number of hours spent by each member of a class when studying for an examination, and the mark they were awarded, from Example 1. This shows a positive association. To calculate the q-correlation coefficient, first find the median value for each of the variables separately. This can be done from the data, but it is usually simpler to calculate directly from the plot. There are 20 data points, and the median values are halfway between the 10th and 11th points, both vertically and horizontally. A vertical line is then drawn through the median x value, and a horizontal line through the median y value. The effect of this is to divide the plot into four regions, as shown. Marks y Hours x Each of the four regions which have been created in this way is called a quadrant, and it can be noticed immediately that most of the points in this plot are in the upper right and lower left quadrants. In fact, wherever there is a positive association between variables this will be the case. Consider the scatterplot of the age of cars and the advertised price from Example 2, which shows negative association. Again the median value for each of the variables is found separately. There are 15 data points, giving the median values at the 8th points, both vertically and horizontally. A vertical line is then drawn through the median x value, and a horizontal line through the median y value. In this particular case they are coordinates of the same point, but this need not be so. Price ($) y Age (years) x

9 562 Essential Advanced General Mathematics It can be seen in this example that most points are in the upper left and the lower right quadrants, and this is true whenever there is a negative association between variables. These observations lead to a definition of the q-correlation coefficient. The q-correlation coefficient can be determined from the scatterplot as follows. Find the median of all the x values in the data set, and draw a vertical line through this value. Find the median of all the y values in the data set, and draw a horizontal line through this value. y 0 The plane is now divided into four quadrants. Label the quadrants A, B, C and D as shown in the diagram. Count the number of points in each of the quadrants A, B, C and D. Any points which lie on the median lines are omitted. Let a, b, c, d represent the number of points in each of the quadrants A, B, C and D respectively. Then the q-correlation coefficient is given by (a + c) (b + d) q = a + b + c + d Example 3 Use the scatterplot from Example 1 to determine the q-correlation coefficient for the number of hours each member of a class spent studying for an examination and the mark they were awarded. Solution There are nine points in quadrant A, one in quadrant B, nine in quadrant C and one in quadrant D. (a + c) (b + d) Thus q = a + b + c + d (9 + 9) (1 + 1) = = = = 0.8 B C A D x

10 Chapter 23 Investigating the relationship between two numerical variables 563 Example 4 Use the scatterplot from Example 2 to determine the q-correlation coefficient for the age of cars and their advertised price. Solution There is one point in quadrant A, six in quadrant B, one in quadrant C and six in quadrant D. (a + c) (b + d) q = a + b + c + d (1 + 1) (6 + 6) = = = = 0.71 From Examples 3 and 4 it can be seen that q-correlation coefficients may take both positive and negative values. Consider the situation when all the points are in the quadrants A and C. (a + c) (b + d) q = a + b + c + d = a + c a + c = 1 (since b and d are both equal to zero) Thus the maximum value the q-correlation coefficient may take is 1, and this indicates a measure of strong positive association. Suppose all the points are in the quadrants B and D. (a + c) (b + d) q = a + b + c + d (b + d) = b + d = 1 (since a and c are both equal to zero) Thus the minimum value the q-correlation coefficient may take is 1, and this indicates a measure of strong negative association.

11 564 Essential Advanced General Mathematics When the same number of points are in each of the quadrants A, B, C and D then: (a + c) (b + d) q = a + b + c + d 0 = a + b + c + d (since a = b = c = d) = 0 This value of the q-correlation coefficient clearly indicates that no association exists. q-correlation coefficients can be classified as follows: 1 q 0.75 strong negative relationship 0.75 < q 0.50 moderate negative relationship 0.50 < q 0.25 weak negative relationship 0.25 < q < 0.25 no relationship 0.25 q < 0.50 weak positive relationship 0.50 q < 0.75 moderate positive relationship 0.75 q 1 strong positive relationship Exercise 23B 1 Use the table of q-correlation coefficients to classify the following. a q = 0.20 b q = 0.30 c q = 0.85 d q = 0.33 e q = 0.95 f q = 0.75 g q = 0.75 h q = 0.24 i q = 1 j q = 0.25 k q = 1 l q = Calculate the q-correlation coefficient for each pair of variables shown in the following scatterplots. a y x

12 Chapter 23 Investigating the relationship between two numerical variables 565 b y Example 4 c d y y The amount of a particular pain relief drug given to each patient and the time taken for the patient to experience pain relief are shown. Patient Drug dose (mg) Response time (min) x x x a Use your scatterplot from 1,Exercise 23A to find the q-correlation coefficient for response time and drug dosage.

13 566 Essential Advanced General Mathematics Example 3 b Classify the strength and direction of the relationship between response time and drug dosage according to the table given. 4 The proprietor of a hairdressing salon recorded the amount spent advertising in the local paper and the business income for each month of a year, with the following results. Month Advertising ($) Business ($) a b Month Advertising ($) Business ($) Use your scatterplot from 2,Exercise 23A to find the q-correlation coefficient for advertising expenditure and total business conducted. Classify the strength and direction of the relationship between advertising expenditure and business income according to the table given. 5 The number of passenger seats on the most commonly used commercial aircraft, and the airspeeds of these aircraft, in km/h, are shown in the following table. a b Number of seats Airspeed (km/h) Number of seats Airspeed (km/h) Use your scatterplot from 3,Exercise 23A to find the q-correlation coefficient for the number of seats on an airline and the air speed. Classify the strength and direction of the relationship between the number of seats on an airline and the air speed according to the table given. 6 The price and age of several secondhand caravans are listed in the table. Age (years) Price ($) Age (years) Price ($)

14 a b Chapter 23 Investigating the relationship between two numerical variables 567 Use your scatterplot from 4,Exercise 23A to find the q-correlation coefficient for price and age of secondhand caravans. Classify the strength and direction of the relationship between price and age of secondhand caravans according to the table given The correlation coefficient When a relationship is linear the most commonly used measure of strength of the relationship is Pearson s product-moment correlation coefficient, r. Itgives a numerical measure of the degree to which the points in the scatterplot tend to cluster around a straight line. Pearson s product-moment correlation is defined to be degree which the variables vary together r = degree which the two variables vary separately Formally, if we call the two variables x and y and we have n observations then Pearson s product-moment correlation for this set of observations is r = 1 n ( )( ) xi x yi ȳ n 1 i=1 s x s y where x and s x are the mean and standard deviation of the x scores and ȳ and s y are the mean and standard deviation of the y scores. There are two key assumptions made in calculating Pearson s correlation coefficient r. They are the data is numerical the relationship being described is linear. Pearson s correlation coefficient r has the following properties: If there is no linear relationship, r = 0. y 0 r = 0 For a perfect positive linear relationship, r =+1. y x For a perfect negative linear relationship, r = 1. y x x 0 r = +1 0 r = 1

15 Verbal Age Calf 568 Essential Advanced General Mathematics Otherwise, 1 r +1 Pearson s correlation coefficient r can be classified as follows: 1 r 0.75 strong negative linear relationship 0.75 r 0.50 moderate negative linear relationship 0.50 r 0.25 weak negative linear relationship 0.25 < r < 0.25 no linear relationship 0.25 r < 0.50 weak positive linear relationship 0.50 r < 0.75 moderate positive linear relationship 0.75 r 1 strong positive linear relationship The following scatterplots show linear relationships of various strengths together with the corresponding value of Pearson s product-moment correlation coefficient. CO level Traffic volume Carbon monoxide level in the atmosphere and traffic volume: r = Mortality Smoking ratio Mortality rate due to lung cancer and smoking ratio (100 average): r = Mathematics Testosterone level Age first convicted and testosterone (a male hormone) level of a group of prisoners: r = Score Age 1st word Score on aptitude test (taken later in life) and age (in months) when first word spoken: r = Scores on standardised tests of verbal and mathematical ability: r = Age Calf measurement and age of adult males: r = 0.005

16 Chapter 23 Investigating the relationship between two numerical variables 569 Using the TI-Nspire The value Pearson s product-moment correlation coefficient, r, can be calculated using the calculator. This will be illustrated using the age and price of car data from Example 2. With the data entered as the two lists age and price respectively, we now open a Calculator application ( 1)tocalculate Pearson s product-moment correlation coefficient. Use Linear Regression (mx + b) from the Stat Calculations submenu of the Statistics menu (b 613) and complete the dialog box as shown. Press enter to obtain the regression information including the value for r as shown. Using the Casio ClassPad Consider the following set of data x y Enter the data into list1 (x) and list2 (y). Tap Calc, Linear Reg and select the settings shown. Tap OK to produce the results shown in the second screen. The value of r is shown in the answer box. TapOKtoproduce a scatterplot.

17 570 Essential Advanced General Mathematics After the mean and standard deviation, Pearson s product-moment correlation is one of the most frequently computed descriptive statistics. It is a powerful tool but it is also easily misused. The presence of a linear relationship should always be confirmed with a scatterplot before Pearson s product-moment correlation is calculated. And, like the mean and the standard deviation, Pearson s correlation coefficient r is very sensitive to the presence of outliers in the sample. Exercise 23C 1 Use the table of Pearson s correlation coefficients r to classify the following. a r = 0.20 b r = 0.30 c r = 0.85 d r = 0.33 e r = 0.95 f r = 0.75 g r = 0.75 h r = 0.24 i r = 0.50 j r = 0.25 k r = 1 l r = 1 2 By comparing the plots given to those on page 538 estimate the value of Pearson s correlation coefficient r. a y b y c e y x x d y y f y x x x x

18 Chapter 23 Investigating the relationship between two numerical variables The amount of a particular pain relief drug given to each patient and the time taken for the patient to experience relief are shown. a b Patient Drug dose (mg) Response time (min) Determine the value of Pearson s correlation coefficient. Classify the relationship between drug dose and response time according to the table given. 4 The proprietor of a hairdressing salon recorded the amount spent on advertising in the local paper and the business income for each month for a year, with the following results. Month Advertising ($) Business ($) a b Month Advertising ($) Business ($) Determine the value of Pearson s correlation coefficient. Classify the relationship between the amount spent on advertising and business income according to the table given. 5 The number of passenger seats on the most commonly used commercial aircraft, and the airspeeds of these aircraft, in km/h, are shown in the following table. a b Number of seats Airspeed (km/h) Number of seats Airspeed (km/h) Determine the value of Pearson s correlation coefficient. Classify the relationship between the number of passenger seats and airspeed according to the table given.

19 572 Essential Advanced General Mathematics 6 The price and age of several secondhand caravans are listed in the table. a b Age (years) Price ($) Age (years) Price ($) Determine the value of Pearson s correlation coefficient. Classify the relationship between price and age according to the table given. 7 The following are the scores for a group of ten students who each had two attempts at a test (out of 70). a b Attempt Attempt Construct a scatterplot of these data, and describe the relationship between scores on attempt 1 and attempt 2. Is it appropriate to calculate the value of Pearson s correlation coefficient for these data? Give reasons for your answer. 8 This table represents the results of two Student Test 1 Test 2 different tests for a group of students a Construct a scatterplot of these data, and describe the relationship between scores on Test 1 and Test b Is it appropriate to calculate the value of Pearson s correlation coefficient for these data? Give reasons for your answer c Determine the values of the q-correlation coefficient and Pearson s correlation coefficient r d Classify the relationship between Test 1 and Test 2 using both the q-correlation coefficient and Pearson s correlation coefficient r, and compare. e It turns out that when the data was entered into the student records, the result for Test 2, Student 9 was entered as 32 instead of 320. i Recalculate the values of the q-correlation coefficient and Pearson s correlation coefficient r with this new data value. ii Compare these values with the ones calculated in c.

20 Chapter 23 Investigating the relationship between two numerical variables Lines on scatterplots If a linear relationship exists between two variables it is possible to predict the value of the dependent variable from the value of the independent variable. The stronger the relationship between the two variables the better the prediction that is made. To make the prediction it is necessary to determine an equation which relates the variables and this is achieved by fitting a line to the data. Fitting a line to data is often referred to as regression, which comes from a Latin word regressum which means moved back. The simplest equation relating two variables x and y is a linear equation of the form y = a + bx where a and b are constants. This is similar to the general equation of a straight line, where a represents the coordinate of the point where the line crosses the y axis (the y axis intercept), and b represents the slope of the line. In order to summarise any particular (x, y) data set, numerical values for a and b are needed that will make the line pass close to the data. There are several ways in which the values of a and b can be found, of which the simplest is to find the straight line by placing a ruler on the scatter diagram, and drawing a line by eye, which appears to follow the general trend of the data. Example 6 The following table gives the gold medal winning distance, in metres, for the men s long jump for the Olympic games for the years 1896 to (Some years were missing owing to the two world wars.) Find a straight line which fits the general trend of the data, and use it to predict the winning distance in the year Year Distance (m) Year Distance (m) Solution Distance Year

21 574 Essential Advanced General Mathematics Note that this scatterplot does not start at the origin. Since the values of the coordinates that are of interest on both axes are a long way from zero, it is sensible to plot the graph for that range of values only. In fact, any values less than 1896 on the horizontal axis are meaningless in this context. The line shown on the scatterplot is only one of many which could be drawn. To enable the line to be used for prediction it is necessary to find its equation. To do this, first determine the coordinates of any two points through which it passes on the scatterplot. Appropriate points are (1932, 7.65) and (1976, 8.36). The equation of the straight line is then found by substituting in the formula which gives the equation for a straight line between two points. y y 1 = y 2 y 1 x x 1 x 2 x 1 y = x = = y 7.65 = 0.016(x 1932) y = 0.016x or distance = year The intercept for this equation is m. In theory, this is the winning distance for the year 0! In practice, there is no meaningful interpretation for the y axis intercept in this situation. But the same cannot be said about the slope. A slope of means that on average the gold medal winning distance increases by about 1.6 centimetres at each successive games. Using this equation the gold medal winning distance for the long jump in 2008 would be predicted as y = = 8.87 m Obviously, attempting to project too far into the future may give us answers which are not sensible. When using an equation for prediction, derived from data, it is sensible to use values of the explanatory variable which are within a reasonable range of the data. The relationship between the variables may not be linear if we move too far from the known values.

22 Chapter 23 Investigating the relationship between two numerical variables 575 Example 7 The following table gives the alcohol consumption per head (in litres) and the hospital admission rate to each of the regions of Victoria in Per capita Hospital consumption admissions per Region (litres of alcohol) 1000 residents Loddon Mallee Grampians Barwon Gippsland Hume Western Metropolitan Northern Metropolitan Eastern Metropolitan Southern Metropolitan Find a straight line which fits the general trend of the data, and interpret the intercept and slope. Solution Admissions Alcohol consumption 10.0 One possible line passes through the points (7, 36) and (9, 42). y y 1 Thus = y 2 y 1 x x 1 x 2 x 1 y = x = 6 2 = 3 y 36 = 3(x 7) y = 3x + 15 or admission rate = alcohol consumption The intercept for this equation is 15, implying that we predict a hospital admission rate would be 15 per 1000 residents for a region with 0 alcohol consumption. While this is interpretable, it would be a brave prediction as it is well out of the range of the

23 576 Essential Advanced General Mathematics Example 6 Example 7 data. A slope of 3 means that on average the admission rate rises by 3 per 1000 residents for each additional litre of alcohol consumed per capita. Exercise 23D 1 Plot the following set of data points on graph paper. x y Draw a straight line which fits the data by eye, and find an equation for this line. 2 Plot the following set of points on graph paper. x y Draw a straight line which fits the data by eye, and find an equation for this line. 3 The numbers of burglaries during two successive years for various districts in one state are given in the following table. a Make a scatterplot of the data. District Year 1 (x) Year 2 (y) b Find the equation of a straight line which A relates the two variables. B c Describe the trend in burglaries in this state. C D E F G H I J K The following data give a girl s height (in cm) between the ages of 36 months and 60 months. Age (x) Height (y) a Make a scatterplot of the data. b Find the equation of a straight line which relates the two variables. c Interpret the intercept and slope, if appropriate. d Use your equation to estimate the girl s height at age i 42 months ii 18 years e How reliable are your answers to d?

24 Chapter 23 Investigating the relationship between two numerical variables The following table gives the adult heights (in cm) of ten pairs of mothers and daughters. a b c Mother (x) Daughter (y) Make a scatterplot of the data. Find the equation of a straight line which relates the two variables. Estimate the adult height of a girl whose mother is 170 cm tall. 6 The manager of a company which manufactures MP3 players keeps a weekly record of the cost of running the business and the number of units produced. The figures for a period of eight weeks are shown in the table. a b c d Number of MP3 players produced (x) Cost in 000 s $ (y) Make a scatterplot of the data. Find the equation of a straight line which relates the two variables. What is the manufacturer s fixed cost for operating the business each week? What is the cost of production of each unit, over and above this fixed operating cost? 7 The amount of a particular pain relief drug given to each patient and the time taken for the patient to experience pain relief are shown. Patient Drug dose (mg) Response time (min) a Find the equation of a straight line which relates the two variables. b Interpret the intercept and slope if appropriate. c Use your equation to predict the time taken for the patient to experience pain relief if 6 mg of the drug is given. Is this answer realistic? 8 The proprietor of a hairdressing salon recorded the amount spent on advertising in the local paper and the business income for each month for a year, with the results shown. a Find the equation of a straight line which relates the two variables. b Interpret the intercept and slope if appropriate. c Use your equation to predict the business income which would be attracted if the proprietor of the salon spent the following amounts on advertising: i $1000 ii $0 Month Advertising ($) Business ($)

25 578 Essential Advanced General Mathematics 23.5 The least squares regression line Fitting a line to a scatterplot by eye is not generally the best way of modelling a relationship. What is required is a method which uses a more objective criterion. A simple method is two divide the data set into two halves on the basis of the median x value, and to fit a line which passes through the mean x and y values of the lower half, and the mean x and y values of the upper half. This is called the two-mean line, and while easy to determine, it is not very widely used. The most common procedure is the method of least squares. The least squares regression line is the line for which the sum of squares of the vertical deviations from the data to the line (as indicated in the diagram) is a minimum. These deviations are called the residuals. y (x i, y i ) y = a + bx Consider the line y = a + bx We would like to find a and b such that the sum of the residuals is zero. That is, n (y i a bx i ) = 0 1 i=1 and the sum of residuals square is as small as possible. That is, n (y i a bx i ) 2 is a minimum 2 i=1 We will use the symbol S to denote From 1, n (y i a bx i ) 2 i=1 n (y i a bx i ) = 0 i=1 n y i na b i=1 n x i = 0 i=1 ȳ a b x = 0 a = ȳ b x 3 10 x

26 Chapter 23 Investigating the relationship between two numerical variables 579 Substituting this relationship in 2 n S = [y i (ȳ b x) bx i ] 2 = = i=1 n [(y i ȳ) b(x i x)] 2 i=1 n [(y i ȳ) 2 2b(x i x)(y i ȳ) + b 2 (x i x) 2 ] i=1 This can be thought of as a quadratic expression in b. In order to find the value of b which minimises S, wewill differentiate with respect to b and set the derivative equal to zero. Simplifying gives b = ds n db = 2 (x i x)(y i ȳ) + 2b i=1 = 0 n (x i x)(y i ȳ) i=1 4 n (x i x) 2 i=1 n (x i x) 2 Equations 3 and 4 can then be used to calculate the least squares estimates of the y axis intercept and the slope. Using the TI-Nspire The calculator can be used to construct the least squares regression line. The procedure is illustrated using the age and price of car data from Example 2. It has previously been illustrated how to enter the data as two lists, age and price respectively, and from these construct the scatterplot in a Data & Statistics application resulting in the data displayed as shown. i=1

27 580 Essential Advanced General Mathematics Now use Show Linear (mx + b) from the Regression submenu of the Analyze menu (b 461)toplace the regression line on the scatterplot as shown. Using the Casio ClassPad The following data gives the heights (in cm) and weights (in kg) of 11 people. Height (x) Weight (y) Enter the data into list1 (x) and list2 (y). Tap Calc, Linear Reg and select the settings shown. Tap OK to produce the results shown in the second screen. The format of the formula, y = ax + b is shown at the top and the values of a, b are shown in the answer box. TapOKtoproduce a scatterplot showing the regression line. Note: The formula can be automatically copied into a selected entry line if desired by selecting a graph number, e.g. y1 inthe Copy Formula line. After the equation of the least squares line has been determined, we can interpret the intercept and slope in terms of the problem at hand, and use the equation to make predictions. The method of least squares is also sensitive to any outliers in the data.

28 Chapter 23 Investigating the relationship between two numerical variables 581 Example 8 Consider again the gold medal winning distance, in metres, for the men s long jump for the Olympic games for the years 1896 to Find the equation of the least squares regression line for these data, and use it to predict the winning distance for the year Year Distance (m) Year Distance (m) Solution Using a calculator or computer the equation is found to be distance = year which is quite similar to the equation to the line fitted by eye. The predicted distance for the year 2008 is Example 9 distance = = 8.86 m Consider again the data from Example 7 which related alcohol consumption per head (in litres) and the hospital admission rate to each of the regions of Victoria in Per capita consumption Hospital admissions Region (litres of alcohol) per 1000 residents Loddon Mallee Grampians Barwon Gippsland Hume Western Metropolitan Northern Metropolitan Eastern Metropolitan Southern Metropolitan Find the equation of the least squares regression line which fits these data. Solution Using a calculator or computer the equation is found to be admissions = alcohol which is slightly different from the line fitted by eye.

29 582 Essential Advanced General Mathematics Example 8 Correlation and causation The existence of even a strong linear relationship between two variables is not, in itself, sufficient to imply that altering one variable causes a change in the other. It only implies that this might be the explanation. It may be that both the measured variables are affected by a third and different variable. For example, if data about crime rates and unemployment in a range of cities were gathered, a high correlation would be found. But could it be inferred that high unemployment causes a high crime rate? The explanation could be that both of these variables are dependent on other factors, such as home circumstances, peer group pressure, level of education or economic conditions, all of which may be related to both unemployment and crime rates. These two variables may vary together, without one being the direct cause of the other. Exercise 23E 1 The following data give a girl s height (in cm) between the ages of 36 months and 60 months. Age (x) Height (y) a Using the method of least squares find the equation of a straight line which relates the two variables. b Interpret the intercept and slope, if appropriate. c Use your equation to estimate the girl s height at age i 42 months ii 18 years d How reliable are your answers to part c? 2 The number of burglaries during two successive years for various districts in one state are given in the following table. Using the method of least squares find the equation of a straight line which relates the two variables. District Year 1 (x) Year 2 (y) A B C D E F G H I J K

30 Example 9 Chapter 23 Investigating the relationship between two numerical variables The following table gives the adult heights (in cm) of ten pairs of mothers and daughters. a b c Mother (x) Daughter (y) Using the method of least squares find the equation of a straight line which relates the two variables. Interpret the slope in this context. Estimate the adult height of a girl whose mother is 170 cm tall. 4 The manager of a company which manufactures MP3 players keeps a weekly record of the cost of running the business and the number of units produced. The figures for a period of eight weeks are: Number of MP3 players produced (x) Cost in 000 s $ (y) a Using the method of least squares find the equation of a straight line which relates the two variables. b What is the manufacture s fixed cost for operating the business each week? c What is the cost of production of each unit, over and above this fixed operating cost? 5 The amount of a particular pain relief drug given to each patient and the time taken for the patient to experience pain relief are shown. a b c Patient Drug dose (mg) Response time (min) Using the method of least squares find the equation of a straight line which relates the two variables. Interpret the intercept and slope if appropriate. Use your equation to predict the time taken for the patient to experience pain relief if 6 mg of the drug is given. Is this answer realistic? 6 The proprietor of a hairdressing salon recorded the amount spent on advertising in the local paper and the business income for each month for a year, with the results shown. a Using the method of least squares find the equation of a straight line which relates the two variables. b Interpret the intercept and slope if appropriate. Month Advertising ($) Business ($)

31 584 Essential Advanced General Mathematics c Use your equation to predict the volume which would be attracted if the proprietor of the salon spent the following amounts on advertising. i $1000 ii $0 Using a CAS calculator with statistics II How to construct a scatterplot Construct a scatterplot for the following set of test scores. Treat Test 1 as the independent (x-) variable. Test 1 score Test 2 score Enter the data into your calculator using the Stats/List Editor. Type the data into list1 and list2. Setup the calculator to plot a statistical graph. a Press F2 to access the Plots menu. b Press ENTER to select 1:Plot Setup. c Press F1 to Define Plot1. d Complete the dialogue box as follows: For Plot Type: select 1:Scatter Leave Mark:asBox. For x, type in list1. For y, type in list2. Pressing ENTER confirms your selection and returns you tothe Plot Setup menu. Pressing F5 (Zoom Data) in Plot Setup automatically plots the scatter plot in a properly scaled window.

32 Chapter 23 Investigating the relationship between two numerical variables 585 How to calculate the correlation coefficient r Use a calculator to calculate the correlation coefficient r for the following data. x y Give the answer correct to two decimal places. Enter the data into your calculator using the Stats/List Editor. Type the data into list1 and list2. Calculate the correlation coefficient. a Press F4 to access the Calculate menu. b Move the cursor down ( )toregressions and across ( )tothe Regressions menu. c Press ENTER to select 1: LinReg(a+bx). This will take you to the LinReg(a+bx) dialogue box. d Complete the dialogue box. For Xlist: type in list1. For Ylist: type in list2. e Press ENTER to obtain the results. How to determine the equation of a least squares regression The following data gives the heights (in cm) and weights (in kg) of 11 people. Height (x) Weight (y) Use a graphics calculator to determine the equation of the least squares regression line that will enable weight to be predicted from height. Enter the data into your calculator using the Stats/List Editor.

33 586 Essential Advanced General Mathematics Type the data into list1 and list2. a Press F4 to access the Calculate menu. b Move the cursor down (D)toRegressions and across (B)tothe Regressions menu. c Press ENTER to select 1: LinReg(a+bx). This will take you to the LinReg(a+bx) dialogue box. d Complete the dialogue box. For Xlist: type in list1. For Ylist: type in list2. e Press ENTER to obtain the results.

34 Chapter 23 Investigating the relationship between two numerical variables 587 Chapter summary Bivariate data arises when measurements on two variables are collected for each subject. A scatterplot is an appropriate visual display of bivariate data if both of the variables are numerical. A scatterplot of the data should always be constructed to assist in the identification of outliers and illustrate the association (positive, negative or none). Twovariables are positively associated when larger values of y are associated with larger values of x.two variables are negatively associated when larger values of y are associated with smaller values of x. There is no association between two variables when the values of y are not related to the values of x. When constructing the scatterplot, the independent or explanatory variable is plotted on the horizontal (x) axis, and the dependent or response variable is plotted on the vertical (y) axis. If a linear relationship is indicated by the scatterplot a measure of its strength can be found by calculating the q-correlation coefficient, orpearson s product-moment correlation coefficient, r. If the values on a scatterplot are divided by lines representing the median of x and the median of y into four quadrants A, B, C and D, with a, b, c, d representing the number of points in each quadrant respectively, then the q-correlation coefficient is given by (a + c) (b + d) q = a + b + c + d Pearson s product-moment correlation, r, isameasure of strength of linear relationship between two variables, x and y.ifwehaven observations then for this set of observations r = 1 n ( )( ) xi x yi ȳ n 1 i=1 s x s y where x and s x are the mean and standard deviation of the x scores and ȳ and s y are the mean and standard deviation of the y scores. For these correlation coefficients 1 q 1 1 r 1 with values close to ±1 indicating strong correlation, and those close to 0 indicating little correlation. If a linear relationship is indicated from the scatterplot a straight line may be fitted to the data, either by eye or using the least squares regression method. The least squares regression line is the line for which the sum of squares of the vertical deviations from the data to the line is a minimum. The value of the slope (b)gives the extent of the change in the dependent variable associated with a unit change in the independent variable. Once found, the equation to the straight line may be used to predict values of the response variable (y) from the explanatory variable (x). The accuracy of the prediction depends on how closely the straight line fits the data, and an indication of this can be obtained from the correlation coefficient. Review

35 588 Essential Advanced General Mathematics Review Multiple-choice questions 1 Forwhich one of the following pairs of variables would it be appropriate to construct a scatterplot? A eye colour (blue, green, brown, other) and hair colour (black, brown, blonde, red, other) B score out of 100 on a test for a group of Year 9 students and a group of Year 11 students C political party preference (Labor, Liberal, Other) and age in years D age in years and blood pressure in mm Hg E height in cm and gender (male, female) 2 Forwhich one of the following plots would it be appropriate to calculate the value of the q-correlation coefficient? A B C D E 3 A q-correlation coefficient of 0.32 would describe a relationship classified as A weak positive B moderate positive C strong positive D close to zero E moderately strong 4 The scatterplot shows the relationship between age and the number of alcoholic drinks consumed on the weekend by a group of people. The value of the q-correlation coefficient is closest to A 1 B 7 9 No. of drinks C 5 6 D 7 9 E Age

36 Chapter 23 Investigating the relationship between two numerical variables The following scatterplot shows the relationship between height and weight for a group of people. The value of the Pearson s product-moment correlation coefficient r is closest to A 1 B 0.8 C 0.5 D 0.3 E 0 Weight Height (cm) Questions 6 and 7 relate to the following information. The weekly income and weekly expenditure on food for a group of 10 university students is given in the following table. 200 Weekly income ($) Weekly food expenditure ($) The value of the Pearson product-moment correlation coefficient r for these data is closest to A 0.2 B 0.4 C 0.6 D 0.7 E The least squares regression line which would enable expenditure on food to be predicted from weekly income is closest to A weekly income B weekly income C weekly income D weekly income E weekly income Review

37 590 Essential Advanced General Mathematics Review Questions 8 and 9 relate to the following information. Suppose that the least squares regression line which would enable expenditure on entertainment (in dollars) to be predicted from weekly income is given by Weekly expenditure on entertainment = Weekly income 8 Using this rule the expenditure on entertainment by an individual with an income of $600 per week is predicted to be A $40 B $ C $100 D $46 E $240 9 From this rule which of the following statements is correct? A On average for each extra dollar of income an extra 10 cents is spent on entertainment B On average for each extra 10 cents in income an extra $1 is spent on entertainment C On average for each extra dollar of income an extra 40 cents is spent on entertainment D On average people spend $40 per week on entertainment E On average people spend $50 per week on entertainment 10 For the scatterplot shown the line of best fit would have a slope closest to: 250 A 0.1 B 0.1 C 10 D 10 E 200 Short-answer questions Technology is required to answer some of the following questions. 1 The following table gives the number of times Inside 50 Score (points) the ball was inside the 50 metre line in an AFL football game, and the team s score in that game a Plot the score against the number of Inside 50s b From the scatterplot, describe any association between the two variables Use the scatterplot constructed in 1 to determine q-correlation between the score and the number of Inside 50s

38 Chapter 23 Investigating the relationship between two numerical variables The distance traveled to work and the time taken for a group of company employees are given in the following table. Determine the value of the Pearson product-moment correlation r for these data. Distance (kms) Time (mins) The following scatterplot shows the relationship between height and weight for a group of people. Draw a straight line which fits the data by eye, and find an equation for this line. weight height (cm) 5 The time taken to complete a task, and the number of errors on the task, were recorded for a sample of 10 primary school children. Determine the equation of the least squares regression line which fits these data. 6 For the data in 5: a Interpret the intercept and slope of the least squares regression line. b Use the least squares regression line to predict the number of errors which would be observed for a child who took 10 seconds to complete the task. Time (seconds) Errors Review

39 592 Essential Advanced General Mathematics Review Extended-response questions 1 A marketing company wishes to predict the likely number of new clients each of its graduates will attract to the business in their first year of employment, by using their scores on a marketing exam in the final year of their course. a Which is the independent variable Number of new and which is the dependent variable? Exam score clients b Construct a scatterplot of these data c Describe the association between the 72 9 Number of new clients and Exam 68 8 score d Determine the value of the q-correlation coefficient for these data, and classify 61 8 the strength of the relationship e Determine the value of the Pearson product-moment correlation coefficient 70 5 for these data and classify the strength of the relationship f Determine the equation for the least squares regression line and write it down in terms of the variables Number of new clients and Exam score. g Interpret the intercept and slope of the least squares regression line in terms of the variables in the study. h Use your regression equation to predict to the nearest whole number the Number of new clients for a person who scored 100 on the exam. i How reliable is the prediction made in h? 2 To investigate the relationship between marks on an assignment and the final examination mark a sample of 10 students was taken. The table indicates the marks for the assignment and the final exam mark for each individual student. a Which is the independent variable Assignment mark Final exam mark and which is the dependent variable? (max = 80) (max = 90) b Construct a scatterplot of these data c Describe the association between the assignment mark and exam mark d Determine the value of the q-correlation coefficient for these data, and classify the strength of the relationship e Determine the value of the Pearson product-moment correlation coefficient for these data and classify the strength of the relationship

40 Chapter 23 Investigating the relationship between two numerical variables 593 f Use your answer to d to comment on the statement: Good final exam marks are the result of good assignment marks. g Determine the equation for the least squares regression line and write it down in terms of the variables Final exam mark and Assignment mark. h Interpret the intercept and slope of the least squares regression line in terms of the variables in the study. i Use your regression equation to predict the Final exam mark for a student who scored 50 on the assignment. j How reliable is the prediction made in i? 3 A marketing firm wanted to investigate the relationship between airplay and CD sales (in the following week) of newly released songs. Data was collected on a random sample of 10 songs. a Which is the independent variable and which No. of times the Weekly sales is the dependent variable? song was played of the CD b Construct a scatterplot of these data c Describe the association between the number of times the song was played and weekly sales d Determine the value of the q-correlation coefficient for these data, and classify the strength of the relationship e Determine the value of the Pearson product-moment correlation coefficient for these data and classify the strength of the relationship f Determine the equation for the least squares regression line and write it down in terms of the variables Number of times the song was played and Weekly sales. g Interpret the intercept and slope of the least squares regression line in terms of the variables in the study. h Use your regression equation to predict the weekly sales for a song which was played 60 times. i How reliable is the prediction made in h? Review

SAMPLE 4CORE. Displaying and describing relationships between two variables. 4.1 Investigating the relationship between two categorical variables

SAMPLE 4CORE. Displaying and describing relationships between two variables. 4.1 Investigating the relationship between two categorical variables C H A P T E R 4CORE Displaying and describing relationships between two variables What are the statistical tools for displaying and describing relationships between two categorical variables? a numerical

More information

Summarising numerical data

Summarising numerical data 2 Core: Data analysis Chapter 2 Summarising numerical data 42 Core Chapter 2 Summarising numerical data 2A Dot plots and stem plots Even when we have constructed a frequency table, or a histogram to display

More information

Name Class Date. Residuals and Linear Regression Going Deeper

Name Class Date. Residuals and Linear Regression Going Deeper Name Class Date 4-8 and Linear Regression Going Deeper Essential question: How can you use residuals and linear regression to fit a line to data? You can evaluate a linear model s goodness of fit using

More information

Year 10 Mathematics Semester 2 Bivariate Data Chapter 13

Year 10 Mathematics Semester 2 Bivariate Data Chapter 13 Year 10 Mathematics Semester 2 Bivariate Data Chapter 13 Why learn this? Observations of two or more variables are often recorded, for example, the heights and weights of individuals. Studying the data

More information

Chapter 6: Exploring Data: Relationships Lesson Plan

Chapter 6: Exploring Data: Relationships Lesson Plan Chapter 6: Exploring Data: Relationships Lesson Plan For All Practical Purposes Displaying Relationships: Scatterplots Mathematical Literacy in Today s World, 9th ed. Making Predictions: Regression Line

More information

Linear Regression Communication, skills, and understanding Calculator Use

Linear Regression Communication, skills, and understanding Calculator Use Linear Regression Communication, skills, and understanding Title, scale and label the horizontal and vertical axes Comment on the direction, shape (form), and strength of the relationship and unusual features

More information

Scatterplots and Correlation

Scatterplots and Correlation Bivariate Data Page 1 Scatterplots and Correlation Essential Question: What is the correlation coefficient and what does it tell you? Most statistical studies examine data on more than one variable. Fortunately,

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Chapter 3: Examining Relationships Most statistical studies involve more than one variable. Often in the AP Statistics exam, you will be asked to compare two data sets by using side by side boxplots or

More information

Chapter 12 Summarizing Bivariate Data Linear Regression and Correlation

Chapter 12 Summarizing Bivariate Data Linear Regression and Correlation Chapter 1 Summarizing Bivariate Data Linear Regression and Correlation This chapter introduces an important method for making inferences about a linear correlation (or relationship) between two variables,

More information

Chapter 12: Linear Regression and Correlation

Chapter 12: Linear Regression and Correlation Chapter 12: Linear Regression and Correlation Linear Equations Linear regression for two variables is based on a linear equation with one independent variable. It has the form: y = a + bx where a and b

More information

5.1 Bivariate Relationships

5.1 Bivariate Relationships Chapter 5 Summarizing Bivariate Data Source: TPS 5.1 Bivariate Relationships What is Bivariate data? When exploring/describing a bivariate (x,y) relationship: Determine the Explanatory and Response variables

More information

Example: Can an increase in non-exercise activity (e.g. fidgeting) help people gain less weight?

Example: Can an increase in non-exercise activity (e.g. fidgeting) help people gain less weight? Example: Can an increase in non-exercise activity (e.g. fidgeting) help people gain less weight? 16 subjects overfed for 8 weeks Explanatory: change in energy use from non-exercise activity (calories)

More information

AP Statistics Two-Variable Data Analysis

AP Statistics Two-Variable Data Analysis AP Statistics Two-Variable Data Analysis Key Ideas Scatterplots Lines of Best Fit The Correlation Coefficient Least Squares Regression Line Coefficient of Determination Residuals Outliers and Influential

More information

YEAR 10 GENERAL MATHEMATICS 2017 STRAND: BIVARIATE DATA PART II CHAPTER 12 RESIDUAL ANALYSIS, LINEARITY AND TIME SERIES

YEAR 10 GENERAL MATHEMATICS 2017 STRAND: BIVARIATE DATA PART II CHAPTER 12 RESIDUAL ANALYSIS, LINEARITY AND TIME SERIES YEAR 10 GENERAL MATHEMATICS 2017 STRAND: BIVARIATE DATA PART II CHAPTER 12 RESIDUAL ANALYSIS, LINEARITY AND TIME SERIES This topic includes: Transformation of data to linearity to establish relationships

More information

Describing Bivariate Relationships

Describing Bivariate Relationships Describing Bivariate Relationships Bivariate Relationships What is Bivariate data? When exploring/describing a bivariate (x,y) relationship: Determine the Explanatory and Response variables Plot the data

More information

CHAPTER 5 LINEAR REGRESSION AND CORRELATION

CHAPTER 5 LINEAR REGRESSION AND CORRELATION CHAPTER 5 LINEAR REGRESSION AND CORRELATION Expected Outcomes Able to use simple and multiple linear regression analysis, and correlation. Able to conduct hypothesis testing for simple and multiple linear

More information

Chapter 14. Statistical versus Deterministic Relationships. Distance versus Speed. Describing Relationships: Scatterplots and Correlation

Chapter 14. Statistical versus Deterministic Relationships. Distance versus Speed. Describing Relationships: Scatterplots and Correlation Chapter 14 Describing Relationships: Scatterplots and Correlation Chapter 14 1 Statistical versus Deterministic Relationships Distance versus Speed (when travel time is constant). Income (in millions of

More information

Data transformation. Core: Data analysis. Chapter 5

Data transformation. Core: Data analysis. Chapter 5 Chapter 5 5 Core: Data analsis Data transformation ISBN 978--7-56757-3 Jones et al. 6 66 Core Chapter 5 Data transformation 5A Introduction You first encountered data transformation in Chapter where ou

More information

Important note: Transcripts are not substitutes for textbook assignments. 1

Important note: Transcripts are not substitutes for textbook assignments. 1 In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance

More information

Chapter 12 : Linear Correlation and Linear Regression

Chapter 12 : Linear Correlation and Linear Regression Chapter 1 : Linear Correlation and Linear Regression Determining whether a linear relationship exists between two quantitative variables, and modeling the relationship with a line, if the linear relationship

More information

MATH 1070 Introductory Statistics Lecture notes Relationships: Correlation and Simple Regression

MATH 1070 Introductory Statistics Lecture notes Relationships: Correlation and Simple Regression MATH 1070 Introductory Statistics Lecture notes Relationships: Correlation and Simple Regression Objectives: 1. Learn the concepts of independent and dependent variables 2. Learn the concept of a scatterplot

More information

SAMPLE. Describing the distribution of a single variable

SAMPLE. Describing the distribution of a single variable Objectives C H A P T E R 22 Describing the distribution of a single variable To introduce the two main types of data categorical and numerical To use bar charts to display frequency distributions of categorical

More information

3.2: Least Squares Regressions

3.2: Least Squares Regressions 3.2: Least Squares Regressions Section 3.2 Least-Squares Regression After this section, you should be able to INTERPRET a regression line CALCULATE the equation of the least-squares regression line CALCULATE

More information

Lecture 48 Sections Mon, Nov 16, 2009

Lecture 48 Sections Mon, Nov 16, 2009 and and Lecture 48 Sections 13.4-13.5 Hampden-Sydney College Mon, Nov 16, 2009 Outline and 1 2 3 4 5 6 Outline and 1 2 3 4 5 6 and Exercise 13.4, page 821. The following data represent trends in cigarette

More information

BIOSTATISTICS NURS 3324

BIOSTATISTICS NURS 3324 Simple Linear Regression and Correlation Introduction Previously, our attention has been focused on one variable which we designated by x. Frequently, it is desirable to learn something about the relationship

More information

Overview. 4.1 Tables and Graphs for the Relationship Between Two Variables. 4.2 Introduction to Correlation. 4.3 Introduction to Regression 3.

Overview. 4.1 Tables and Graphs for the Relationship Between Two Variables. 4.2 Introduction to Correlation. 4.3 Introduction to Regression 3. 3.1-1 Overview 4.1 Tables and Graphs for the Relationship Between Two Variables 4.2 Introduction to Correlation 4.3 Introduction to Regression 3.1-2 4.1 Tables and Graphs for the Relationship Between Two

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Chapter 6. Exploring Data: Relationships

Chapter 6. Exploring Data: Relationships Chapter 6 Exploring Data: Relationships For All Practical Purposes: Effective Teaching A characteristic of an effective instructor is fairness and consistenc in grading and evaluating student performance.

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

The following formulas related to this topic are provided on the formula sheet:

The following formulas related to this topic are provided on the formula sheet: Student Notes Prep Session Topic: Exploring Content The AP Statistics topic outline contains a long list of items in the category titled Exploring Data. Section D topics will be reviewed in this session.

More information

Quantitative Bivariate Data

Quantitative Bivariate Data Statistics 211 (L02) - Linear Regression Quantitative Bivariate Data Consider two quantitative variables, defined in the following way: X i - the observed value of Variable X from subject i, i = 1, 2,,

More information

Module 1 Linear Regression

Module 1 Linear Regression Regression Analysis Although many phenomena can be modeled with well-defined and simply stated mathematical functions, as illustrated by our study of linear, exponential and quadratic functions, the world

More information

AP Statistics Bivariate Data Analysis Test Review. Multiple-Choice

AP Statistics Bivariate Data Analysis Test Review. Multiple-Choice Name Period AP Statistics Bivariate Data Analysis Test Review Multiple-Choice 1. The correlation coefficient measures: (a) Whether there is a relationship between two variables (b) The strength of the

More information

BIVARIATE DATA data for two variables

BIVARIATE DATA data for two variables (Chapter 3) BIVARIATE DATA data for two variables INVESTIGATING RELATIONSHIPS We have compared the distributions of the same variable for several groups, using double boxplots and back-to-back stemplots.

More information

y n 1 ( x i x )( y y i n 1 i y 2

y n 1 ( x i x )( y y i n 1 i y 2 STP3 Brief Class Notes Instructor: Ela Jackiewicz Chapter Regression and Correlation In this chapter we will explore the relationship between two quantitative variables, X an Y. We will consider n ordered

More information

Copyright, Nick E. Nolfi MPM1D9 Unit 6 Statistics (Data Analysis) STA-1

Copyright, Nick E. Nolfi MPM1D9 Unit 6 Statistics (Data Analysis) STA-1 UNIT 6 STATISTICS (DATA ANALYSIS) UNIT 6 STATISTICS (DATA ANALYSIS)... 1 INTRODUCTION TO STATISTICS... 2 UNDERSTANDING STATISTICS REQUIRES A CHANGE IN MINDSET... 2 UNDERSTANDING SCATTER PLOTS #1... 3 UNDERSTANDING

More information

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships Objectives 2.3 Least-squares regression Regression lines Prediction and Extrapolation Correlation and r 2 Transforming relationships Adapted from authors slides 2012 W.H. Freeman and Company Straight Line

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Chapter 3: Examining Relationships 3.1 Scatterplots 3.2 Correlation 3.3 Least-Squares Regression Fabric Tenacity, lb/oz/yd^2 26 25 24 23 22 21 20 19 18 y = 3.9951x + 4.5711 R 2 = 0.9454 3.5 4.0 4.5 5.0

More information

Prob/Stats Questions? /32

Prob/Stats Questions? /32 Prob/Stats 10.4 Questions? 1 /32 Prob/Stats 10.4 Homework Apply p551 Ex 10-4 p 551 7, 8, 9, 10, 12, 13, 28 2 /32 Prob/Stats 10.4 Objective Compute the equation of the least squares 3 /32 Regression A scatter

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

AP Statistics. Chapter 6 Scatterplots, Association, and Correlation

AP Statistics. Chapter 6 Scatterplots, Association, and Correlation AP Statistics Chapter 6 Scatterplots, Association, and Correlation Objectives: Scatterplots Association Outliers Response Variable Explanatory Variable Correlation Correlation Coefficient Lurking Variables

More information

Bivariate data data from two variables e.g. Maths test results and English test results. Interpolate estimate a value between two known values.

Bivariate data data from two variables e.g. Maths test results and English test results. Interpolate estimate a value between two known values. Key words: Bivariate data data from two variables e.g. Maths test results and English test results Interpolate estimate a value between two known values. Extrapolate find a value by following a pattern

More information

Chapter 8. Linear Regression /71

Chapter 8. Linear Regression /71 Chapter 8 Linear Regression 1 /71 Homework p192 1, 2, 3, 5, 7, 13, 15, 21, 27, 28, 29, 32, 35, 37 2 /71 3 /71 Objectives Determine Least Squares Regression Line (LSRL) describing the association of two

More information

Topic 10 - Linear Regression

Topic 10 - Linear Regression Topic 10 - Linear Regression Least squares principle Hypothesis tests/confidence intervals/prediction intervals for regression 1 Linear Regression How much should you pay for a house? Would you consider

More information

Chapter 7. Scatterplots, Association, and Correlation

Chapter 7. Scatterplots, Association, and Correlation Chapter 7 Scatterplots, Association, and Correlation Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 29 Objective In this chapter, we study relationships! Instead, we investigate

More information

a. Yes, it is consistent. a. Positive c. Near Zero

a. Yes, it is consistent. a. Positive c. Near Zero Chapter 4 Test B Multiple Choice Section 4.1 (Visualizing Variability with a Scatterplot) 1. [Objective: Analyze a scatter plot and recognize trends] Doctors believe that smoking cigarettes lowers lung

More information

Least Squares Regression

Least Squares Regression Least Squares Regression Sections 5.3 & 5.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 14-2311 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Scatterplots. 3.1: Scatterplots & Correlation. Scatterplots. Explanatory & Response Variables. Section 3.1 Scatterplots and Correlation

Scatterplots. 3.1: Scatterplots & Correlation. Scatterplots. Explanatory & Response Variables. Section 3.1 Scatterplots and Correlation 3.1: Scatterplots & Correlation Scatterplots A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal

More information

1) A residual plot: A)

1) A residual plot: A) 1) A residual plot: A) B) C) D) E) displays residuals of the response variable versus the independent variable. displays residuals of the independent variable versus the response variable. displays residuals

More information

AP Statistics Unit 2 (Chapters 7-10) Warm-Ups: Part 1

AP Statistics Unit 2 (Chapters 7-10) Warm-Ups: Part 1 AP Statistics Unit 2 (Chapters 7-10) Warm-Ups: Part 1 2. A researcher is interested in determining if one could predict the score on a statistics exam from the amount of time spent studying for the exam.

More information

Chapter 9. Correlation and Regression

Chapter 9. Correlation and Regression Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Chapter 3 Review Chapter 3: Examining Relationships 1. A study is conducted to determine if one can predict the yield of a crop based on the amount of yearly rainfall. The response variable in this study

More information

6.1.1 How can I make predictions?

6.1.1 How can I make predictions? CCA Ch 6: Modeling Two-Variable Data Name: Team: 6.1.1 How can I make predictions? Line of Best Fit 6-1. a. Length of tube: Diameter of tube: Distance from the wall (in) Width of field of view (in) b.

More information

(c) Plot the point ( x, y ) on your scatter diagram and label this point M. (d) Write down the product-moment correlation coefficient, r.

(c) Plot the point ( x, y ) on your scatter diagram and label this point M. (d) Write down the product-moment correlation coefficient, r. 1. The heat output in thermal units from burning 1 kg of wood changes according to the wood s percentage moisture content. The moisture content and heat output of 10 blocks of the same type of wood each

More information

Bivariate Data Summary

Bivariate Data Summary Bivariate Data Summary Bivariate data data that examines the relationship between two variables What individuals to the data describe? What are the variables and how are they measured Are the variables

More information

SECTION I Number of Questions 42 Percent of Total Grade 50

SECTION I Number of Questions 42 Percent of Total Grade 50 AP Stats Chap 7-9 Practice Test Name Pd SECTION I Number of Questions 42 Percent of Total Grade 50 Directions: Solve each of the following problems, using the available space (or extra paper) for scratchwork.

More information

Mrs. Poyner/Mr. Page Chapter 3 page 1

Mrs. Poyner/Mr. Page Chapter 3 page 1 Name: Date: Period: Chapter 2: Take Home TEST Bivariate Data Part 1: Multiple Choice. (2.5 points each) Hand write the letter corresponding to the best answer in space provided on page 6. 1. In a statistics

More information

CRP 272 Introduction To Regression Analysis

CRP 272 Introduction To Regression Analysis CRP 272 Introduction To Regression Analysis 30 Relationships Among Two Variables: Interpretations One variable is used to explain another variable X Variable Independent Variable Explaining Variable Exogenous

More information

4.1 Introduction. 4.2 The Scatter Diagram. Chapter 4 Linear Correlation and Regression Analysis

4.1 Introduction. 4.2 The Scatter Diagram. Chapter 4 Linear Correlation and Regression Analysis 4.1 Introduction Correlation is a technique that measures the strength (or the degree) of the relationship between two variables. For example, we could measure how strong the relationship is between people

More information

11 Correlation and Regression

11 Correlation and Regression Chapter 11 Correlation and Regression August 21, 2017 1 11 Correlation and Regression When comparing two variables, sometimes one variable (the explanatory variable) can be used to help predict the value

More information

Correlation & Regression. Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University of Alexandria

Correlation & Regression. Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University of Alexandria بسم الرحمن الرحيم Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University of Alexandria Correlation Finding the relationship between

More information

Examining Relationships. Chapter 3

Examining Relationships. Chapter 3 Examining Relationships Chapter 3 Scatterplots A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The explanatory variable, if there is one, is graphed

More information

Introductory Statistics

Introductory Statistics Introductory Statistics This document is attributed to Barbara Illowsky and Susan Dean Chapter 12 Open Assembly Edition Open Assembly editions of open textbooks are disaggregated versions designed to facilitate

More information

NAME: DATE: SECTION: MRS. KEINATH

NAME: DATE: SECTION: MRS. KEINATH 1 Vocabulary and Formulas: Correlation coefficient The correlation coefficient, r, measures the direction and strength of a linear relationship between two variables. Formula: = 1 x i x y i y r. n 1 s

More information

MFM1P Foundations of Mathematics Unit 3 Lesson 11

MFM1P Foundations of Mathematics Unit 3 Lesson 11 The Line Lesson MFMP Foundations of Mathematics Unit Lesson Lesson Eleven Concepts Overall Expectations Apply data-management techniques to investigate relationships between two variables; Determine the

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Steps to take to do the descriptive part of regression analysis:

Steps to take to do the descriptive part of regression analysis: STA 2023 Simple Linear Regression: Least Squares Model Steps to take to do the descriptive part of regression analysis: A. Plot the data on a scatter plot. Describe patterns: 1. Is there a strong, moderate,

More information

Related Example on Page(s) R , 148 R , 148 R , 156, 157 R3.1, R3.2. Activity on 152, , 190.

Related Example on Page(s) R , 148 R , 148 R , 156, 157 R3.1, R3.2. Activity on 152, , 190. Name Chapter 3 Learning Objectives Identify explanatory and response variables in situations where one variable helps to explain or influences the other. Make a scatterplot to display the relationship

More information

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories. Chapter Goals To understand the methods for displaying and describing relationship among variables. Formulate Theories Interpret Results/Make Decisions Collect Data Summarize Results Chapter 7: Is There

More information

Math 52 Linear Regression Instructions TI-83

Math 52 Linear Regression Instructions TI-83 Math 5 Linear Regression Instructions TI-83 Use the following data to study the relationship between average hours spent per week studying and overall QPA. The idea behind linear regression is to determine

More information

AP STATISTICS Name: Period: Review Unit IV Scatterplots & Regressions

AP STATISTICS Name: Period: Review Unit IV Scatterplots & Regressions AP STATISTICS Name: Period: Review Unit IV Scatterplots & Regressions Know the definitions of the following words: bivariate data, regression analysis, scatter diagram, correlation coefficient, independent

More information

CORRELATION ANALYSIS. Dr. Anulawathie Menike Dept. of Economics

CORRELATION ANALYSIS. Dr. Anulawathie Menike Dept. of Economics CORRELATION ANALYSIS Dr. Anulawathie Menike Dept. of Economics 1 What is Correlation The correlation is one of the most common and most useful statistics. It is a term used to describe the relationship

More information

Lecture 4 Scatterplots, Association, and Correlation

Lecture 4 Scatterplots, Association, and Correlation Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variable In this lecture: We shall look at two quantitative variables.

More information

Lecture 4 Scatterplots, Association, and Correlation

Lecture 4 Scatterplots, Association, and Correlation Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variables In this lecture: We shall look at two quantitative variables.

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Sampling, Frequency Distributions, and Graphs (12.1)

Sampling, Frequency Distributions, and Graphs (12.1) 1 Sampling, Frequency Distributions, and Graphs (1.1) Design: Plan how to obtain the data. What are typical Statistical Methods? Collect the data, which is then subjected to statistical analysis, which

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

UCLA STAT 10 Statistical Reasoning - Midterm Review Solutions Observational Studies, Designed Experiments & Surveys

UCLA STAT 10 Statistical Reasoning - Midterm Review Solutions Observational Studies, Designed Experiments & Surveys UCLA STAT 10 Statistical Reasoning - Midterm Review Solutions Observational Studies, Designed Experiments & Surveys.. 1. (i) The treatment being compared is: (ii). (5) 3. (3) 4. (4) Study 1: the number

More information

Looking at Data Relationships. 2.1 Scatterplots W. H. Freeman and Company

Looking at Data Relationships. 2.1 Scatterplots W. H. Freeman and Company Looking at Data Relationships 2.1 Scatterplots 2012 W. H. Freeman and Company Here, we have two quantitative variables for each of 16 students. 1) How many beers they drank, and 2) Their blood alcohol

More information

date: math analysis 2 chapter 18: curve fitting and models

date: math analysis 2 chapter 18: curve fitting and models name: period: date: math analysis 2 mr. mellina chapter 18: curve fitting and models Sections: 18.1 Introduction to Curve Fitting; the Least-Squares Line 18.2 Fitting Exponential Curves 18.3 Fitting Power

More information

MAL SHIELD. Student Textbook

MAL SHIELD. Student Textbook MAL SHIELD Student Textbook Queensland GO Maths Student Textbook Level /6 (Year 9) Copyright 008 ORIGO Education Author: Mal Shield Consultant: Kathy Blum Shield, M. J. (Malcolm John). Go maths : level

More information

Objectives. 2.1 Scatterplots. Scatterplots Explanatory and response variables Interpreting scatterplots Outliers

Objectives. 2.1 Scatterplots. Scatterplots Explanatory and response variables Interpreting scatterplots Outliers Objectives 2.1 Scatterplots Scatterplots Explanatory and response variables Interpreting scatterplots Outliers Adapted from authors slides 2012 W.H. Freeman and Company Relationship of two numerical variables

More information

Hit the brakes, Charles!

Hit the brakes, Charles! Hit the brakes, Charles! Teacher Notes/Answer Sheet 7 8 9 10 11 12 TI-Nspire Investigation Teacher 60 min Introduction Hit the brakes, Charles! bivariate data including x 2 data transformation. As soon

More information

Outline. Lesson 3: Linear Functions. Objectives:

Outline. Lesson 3: Linear Functions. Objectives: Lesson 3: Linear Functions Objectives: Outline I can determine the dependent and independent variables in a linear function. I can read and interpret characteristics of linear functions including x- and

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

Chapter 2.1 Relations and Functions

Chapter 2.1 Relations and Functions Analyze and graph relations. Find functional values. Chapter 2.1 Relations and Functions We are familiar with a number line. A number line enables us to locate points, denoted by numbers, and find distances

More information

Intermediate Algebra Summary - Part I

Intermediate Algebra Summary - Part I Intermediate Algebra Summary - Part I This is an overview of the key ideas we have discussed during the first part of this course. You may find this summary useful as a study aid, but remember that the

More information

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation Scatterplots and Correlation Name Hr A scatterplot shows the relationship between two quantitative variables measured on the same individuals. variable (y) measures an outcome of a study variable (x) may

More information

Linear Regression 3.2

Linear Regression 3.2 3.2 Linear Regression Regression is an analytic technique for determining the relationship between a dependent variable and an independent variable. When the two variables have a linear correlation, you

More information

Regression Analysis. BUS 735: Business Decision Making and Research

Regression Analysis. BUS 735: Business Decision Making and Research Regression Analysis BUS 735: Business Decision Making and Research 1 Goals and Agenda Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn

More information

Key Concepts. Correlation (Pearson & Spearman) & Linear Regression. Assumptions. Correlation parametric & non-para. Correlation

Key Concepts. Correlation (Pearson & Spearman) & Linear Regression. Assumptions. Correlation parametric & non-para. Correlation Correlation (Pearson & Spearman) & Linear Regression Azmi Mohd Tamil Key Concepts Correlation as a statistic Positive and Negative Bivariate Correlation Range Effects Outliers Regression & Prediction Directionality

More information

Topic 2 Part 1 [195 marks]

Topic 2 Part 1 [195 marks] Topic 2 Part 1 [195 marks] The distribution of rainfall in a town over 80 days is displayed on the following box-and-whisker diagram. 1a. Write down the median rainfall. 1b. Write down the minimum rainfall.

More information

appstats8.notebook October 11, 2016

appstats8.notebook October 11, 2016 Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus

More information

Using a graphic display calculator

Using a graphic display calculator 12 Using a graphic display calculator CHAPTER OBJECTIVES: This chapter shows you how to use your graphic display calculator (GDC) to solve the different types of problems that you will meet in your course.

More information

Relationships Regression

Relationships Regression Relationships Regression BPS chapter 5 2006 W.H. Freeman and Company Objectives (BPS chapter 5) Regression Regression lines The least-squares regression line Using technology Facts about least-squares

More information

Solutionbank S1 Edexcel AS and A Level Modular Mathematics

Solutionbank S1 Edexcel AS and A Level Modular Mathematics Page 1 of 2 Exercise A, Question 1 As part of a statistics project, Gill collected data relating to the length of time, to the nearest minute, spent by shoppers in a supermarket and the amount of money

More information

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Chapter 2: Summarising numerical data Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Extract from Study Design Key knowledge Types of data: categorical (nominal and ordinal)

More information

OCR Maths S1. Topic Questions from Papers. Bivariate Data

OCR Maths S1. Topic Questions from Papers. Bivariate Data OCR Maths S1 Topic Questions from Papers Bivariate Data PhysicsAndMathsTutor.com 1 The scatter diagrams below illustrate three sets of bivariate data, A, B and C. State, with an explanation in each case,

More information

CORRELATION AND REGRESSION

CORRELATION AND REGRESSION CORRELATION AND REGRESSION CORRELATION The correlation coefficient is a number, between -1 and +1, which measures the strength of the relationship between two sets of data. The closer the correlation coefficient

More information

8.1 Frequency Distribution, Frequency Polygon, Histogram page 326

8.1 Frequency Distribution, Frequency Polygon, Histogram page 326 page 35 8 Statistics are around us both seen and in ways that affect our lives without us knowing it. We have seen data organized into charts in magazines, books and newspapers. That s descriptive statistics!

More information