Year 10 Mathematics Semester 2 Bivariate Data Chapter 13


Why learn this?

Observations of two or more variables are often recorded, for example, the heights and weights of individuals. Studying the data allows us to investigate whether there is any relationship between the variables, how strong the relationship is, and whether one variable can be effectively predicted from information about another variable. Statistics can be applied to medical research, sport, agriculture, sustainability, weather forecasting and fashion trends, to name but a few fields. The capacity to analyse data and draw conclusions is an essential skill in a world where information is readily available and often manipulated.

Example: An ice cream shop keeps track of how much ice cream they sell versus the temperature on that day. The two variables are Ice Cream Sales and Temperature. Here are their figures for the last 12 days:

Temperature (°C)     24.2  26.4  21.9  25.2  28.5  32.1  29.4  35.1  33.4  28.1  32.6  27.2
Ice Cream Sales ($)  215   325   185   332   406   522   412   614   544   421   445   408

And here is the same data as a Scatter Plot:

Now we can easily see that warmer weather and more ice cream sales are linked, but the relationship is not perfect.

Questions

Section  Title              Questions
13.2     Bivariate Data     1d-g, 2, 3, 4, 5, 7, 9, 12
13.3     Lines of best fit  1, 2, 3, 4, 5, 6, 7, 8, 10, 11
13.4     Time Series        1, 2, 3, 4, 5, 6, 7, 9, 10

More resources
http://drweiser.weebly.com

Page 1 of 14

TABLE OF CONTENTS

Why learn this?
Questions
More resources
13.2 BIVARIATE DATA
   Scatterplots
   Dependent and independent variables
   Example 1
   Correlation
   Worked Example 2
   Correlation and causation
   Worked Example 3
13.3 LINES OF BEST FIT
   Lines of best fit by eye
   Worked Example 4
   Making predictions
   Worked Example 5
   Worked Example 6
   Interpolation and extrapolation
   Reliability of predictions
   Correlation coefficient
13.4 TIME SERIES
   Types of trends
   Worked Example 9
   Trend lines
   Worked Example 10
   Worked Example 11
SUMMARY TOPIC 13: BIVARIATE DATA
   Bivariate data
   Lines of best fit
   Time series

13.2 Bivariate Data

Bivariate data are data with two variables. The list of bivariate data can be considered as numerical pairs of the type: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Bivariate data are usually represented graphically on scatterplots.

Scatterplots

A scatterplot is a graph that shows whether there is a relationship between two variables. Each data value on a scatterplot is shown by a point on a Cartesian plane.

Dependent and independent variables

One variable is generally the dependent variable, and the other the independent variable. The dependent variable, as the name suggests, is the one whose value depends on the other variable. The independent variable takes on values that do not depend on the value of the other variable. When data are expressed in the form of a table, generally the independent variable is written in the first row or the first column. The independent variable is placed on the x-axis and the dependent variable on the y-axis.

Examples:
o A child's height might depend on their age.
o A person's salary might depend on their age.
o A car's value might depend on its age or distance travelled.

Example 1

The operators of a casino keep records of the number of people playing a Jackpot type game. The table below shows the number of players for different prize amounts. Draw a scatter plot of the data (no calculator).

Number of Players  260   280   285   340   390   428   490
Prize ($)          1000  1500  2000  2500  3000  3500  4000

Think
1. Determine which is the dependent variable and which is the independent variable.
2. Rule up a set of axes. Label the title of the graph. Label the horizontal axis Prize ($) and the vertical axis Number of players.
3. Use an appropriate scale on the horizontal and vertical axes.
4. Plot the points on the scatterplot.

Correlation

Correlation is a descriptive measure of the relationship between two variables. Correlation describes the strength, the direction and the form of the relationship between the two variables. The closer the points are to a straight line form, the stronger the correlation between the two variables. The strength is described as weak, moderate or strong. If the points on a scatterplot have a generally positive slope, the relationship has a positive direction. If the slope is negative, the direction is negative.

Worked Example 2

State the type of correlation between the variables x and y, shown on the scatterplot.

Correlation and causation

Even a strong correlation does not necessarily mean that the increase or decrease in the level of one variable causes an increase or decrease in the level of the other. It is best to avoid statements such as: "An increase in rainfall causes an increase in the wheat growth." The following guidelines should be closely followed in order to draw a conclusion about the relationship between the two variables based on the scatterplot.

o If the correlation between x and y is weak, we can conclude that there is little evidence to show that the larger x is, the larger (positive correlation) or smaller (negative correlation) y is.
o If the correlation between x and y is moderate, we can conclude that there is evidence to show that the larger x is, the larger (positive correlation) or smaller (negative correlation) y is.
o If the correlation between x and y is strong, we can conclude that the larger x is, the larger (positive correlation) or smaller (negative correlation) y is.

Worked Example 3

Mary sells business shirts in a department store. She always records the number of different styles of shirt sold during the day. The table below shows her sales over one week.

Price ($)              14  18  20  21  24  25  28  30  32  35
Number of shirts sold  21  22  18  19  17  17  15  16  14  11

a) Construct a scatterplot of the data.

In a new document, on a Lists & Spreadsheet page, label column A as price and label column B as sold. Enter the data as shown in the table. Open a Data & Statistics page. Press TAB to locate the label of the horizontal axis and select the variable price. Press TAB again to locate the label of the vertical axis and select the variable sold. To change the colour of the scatterplot, place the pointer over one of the data points, then press DOC and select 2: Edit, then 7: Colour, then 2: Fill Colour. Select a colour from the palette for the scatterplot. Press ENTER.

b) State the type of correlation between the two variables and, hence, draw a corresponding conclusion.

The points on the plot form a path that resembles a straight, narrow band, directed from the top left corner to the bottom right corner. The points are close to forming a straight line. There is a strong, negative, linear correlation between the two variables. The price of the shirt appears to affect the number sold; that is, the more expensive the shirt, the fewer sold.

13.3 Lines of Best Fit

If the points on a scatterplot appear to lie fairly closely distributed in a linear pattern, a straight line can be drawn through the data. The line can then be used to make predictions about the data. A line of best fit is a line on a scatterplot that is positioned so that it is as close as possible to all the data points. A line of best fit is used to generalise the relationship between two variables.

Lines of best fit by eye

A line of best fit can be drawn on a scatterplot by eye. This means that a line is positioned so that there is an equal number of points above and below the line. Once a line of best fit has been placed on the scatterplot, an equation for this line can be established, using the coordinates of any two points on the line. The equation of the line through the two points $(x_1, y_1)$ and $(x_2, y_2)$ is given by:

$$y = mx + c, \quad \text{where } m = \frac{y_2 - y_1}{x_2 - x_1}$$

The process of fitting a line to a set of points is often referred to as regression. The regression line or trend line (also known as the line of best fit) may be placed on a scatterplot by eye or by using the CAS.

Worked Example 4

The data in the table show the cost of using the internet at a number of different internet cafes based on hours used per month.

Hours used per month    10  12  20  18  10  13  15  17  14  11
Total monthly cost ($)  15  18  30  32  18  20  22  23  22  18

a) Construct a scatterplot of the data.

In a new document, on a Lists & Spreadsheet page, label column A as hours and label column B as cost. Enter the data as shown in the table. Open a Data & Statistics page. Press TAB to locate the label of the horizontal axis and select the variable hours. Press TAB again to locate the label of the vertical axis and select the variable cost.

b) Draw in the line of best fit by eye.

c) Find the equation of the line of best fit in terms of the variables n (number of hours) and C (monthly cost).

Alternatively on the CAS

Press MENU, then select 4: Analyze, then 6: Regression, then 1: Show linear (mx+b). The equation in terms of y = mx + b will be shown on the graph.

Making predictions

The line of best fit can be used to predict the value of one variable from that of another. Because of the subjective nature of the line, it should be noted that predictions are not accurate values, but rough estimates. Although this is the case, predictions using this method are considered valuable when no other methods are available. If the equation of the line of best fit is known, or can be derived, predictions can be made by substituting known values into the equation of the line of best fit.
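The two-point method and the substitution step can be sketched in Python. This is only an illustration: the two points below are hypothetical values read off a line drawn by eye, not data from the table, and the function names are invented for this sketch.

```python
def line_through(p1, p2):
    """Return the gradient m and y-intercept c of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the gradient is undefined")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    c = y1 - m * x1            # rearrange y1 = m*x1 + c
    return m, c


# Hypothetical points on a by-eye line of best fit (assumed, not from the table)
m, c = line_through((10, 16), (20, 31))


def predict_cost(n):
    """Estimate the monthly cost C for n hours by substituting into C = m*n + c."""
    return m * n + c


print(f"C = {m}n + {c}")
print(predict_cost(16))
```

A different by-eye line would give different points and therefore a slightly different equation, which is why such predictions are only rough estimates.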

Worked Example 5

Use the given scatterplot and line of best fit to predict:
a) The value of y when x = 10
b) The value of x when y = 10.

Worked Example 6

The table below shows the number of boxes of tissues purchased by hayfever sufferers and the number of days affected by hayfever during the blooming season in spring.

Number of days affected by hayfever (d)  3  12  14  7  9  5  6  4  10  8
Total number of boxes of tissues (T)     1  4   5   2  3  2  2  2  3   3

a) Construct a scatterplot of the data and draw a line of best fit.
b) Determine the equation of the line of best fit.
c) Interpret the meaning of the gradient.
d) Use the equation of the line of best fit to predict the number of boxes of tissues purchased by people suffering from hayfever over a period of:
   i. 11 days
   ii. 15 days.

Alternatively on the CAS

In a new document, on a Lists & Spreadsheet page, label column A as days and label column B as tissues. Enter the data as shown in the table. Press MENU, then select 4: Statistics, then 1: Stat Calculations, then 3: Linear Regression (mx+b). Enter X: days and Y: tissues, then select OK. The equation and values of m and b (c) are shown. Add a new calculator page, type f1(11) and press ENTER, then type f1(15) and press ENTER, to get the value of tissues for 11 and 15 days.

Interpolation and extrapolation

Interpolation is the term used for predicting a value of a variable from within the range of the given data. Extrapolation occurs when the value of the variable being predicted is outside the range of the given data. In Worked Example 6, the number of days ranged from 3 days to 14 days. Making a prediction for 11 days is an example of interpolation, whereas making a prediction for 15 days is an example of extrapolation. Predictions involving interpolation are considered to be quite reliable. Those involving extrapolation should be treated with caution, as they rely on the trend of the line remaining unchanged beyond the range of the data.
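The calculator steps for Worked Example 6 can be mirrored in a short Python sketch. It assumes that the CAS Linear Regression (mx+b) command performs an ordinary least-squares fit; the function names are invented for this illustration, with f1 chosen to echo the calculator's notation.

```python
# Data from Worked Example 6
days = [3, 12, 14, 7, 9, 5, 6, 4, 10, 8]
tissues = [1, 4, 5, 2, 3, 2, 2, 2, 3, 3]


def least_squares(xs, ys):
    """Return gradient m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar  # the fitted line passes through (x_bar, y_bar)
    return m, b


m, b = least_squares(days, tissues)


def f1(d):
    """Predicted number of boxes of tissues for d days of hayfever."""
    return m * d + b


def kind_of_prediction(d, xs):
    """Label a prediction by whether d lies inside the range of the data."""
    return "interpolation" if min(xs) <= d <= max(xs) else "extrapolation"


for d in (11, 15):
    print(d, round(f1(d), 1), kind_of_prediction(d, days))
```

The fitted gradient will generally differ a little from one obtained by eye, since the by-eye line depends on the judgement of whoever draws it.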

Reliability of predictions

When predictions of any type are made, it is useful to know whether they are reliable or not. If the line of best fit is used to make predictions, they can be considered to be reliable if each of the following is observed.
o The number of data values is large.
o The scatterplot indicates reasonably strong correlation between the variables.
o The predictions are made using interpolation.

Correlation coefficient

Once a relationship between two variables has been established, it is helpful to develop a quantitative value to measure the strength of the relationship. One way is to calculate a correlation coefficient, r. This is easily done using a CAS calculator, but a manual method is shown below. The formula for the correlation coefficient r is:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

where $x$ and $y$ are the two sets of scores, $\bar{x}$ and $\bar{y}$ are the means of those scores, and the symbol $\sum$ represents the sum of the expressions indicated.

The correlation coefficient is a value in the range −1 to +1. The value of −1 indicates a perfect negative relationship between the two variables, while the value of +1 indicates a perfect positive relationship. For values within this range, a variety of descriptors are used.
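As a rough check of the manual method, the formula can be written out in Python and applied to the shirt data from Worked Example 3. The function name is chosen for this sketch only.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Correlation coefficient r, computed term by term from the formula above."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    return numerator / denominator


# Data from Worked Example 3
price = [14, 18, 20, 21, 24, 25, 28, 30, 32, 35]
sold = [21, 22, 18, 19, 17, 17, 15, 16, 14, 11]

r = correlation_coefficient(price, sold)
print(round(r, 3))
```

The result is close to −1, which agrees with the strong, negative, linear correlation described in that example.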

13.4 Time series

Time series data are any data that have time as the independent variable. The data are graphed and the graphs are used to determine if a trend is present in the data. Identifying a trend can help when making predictions about the future.

Types of trends

A general upward or downward trend is a graph that overall goes up or down as illustrated in the graph shown below.

A seasonal pattern displays fluctuations that repeat at the same time each week, month or quarter and usually last less than one year. The graph below illustrates that the peak selling time for houses is in the spring.

A cyclical pattern displays fluctuations that repeat but will usually take longer than a year to repeat. An example of this is shown in the graph below, which depicts software products sold.

Random patterns do not show any regular fluctuation. They are usually caused by unpredictable events such as the economic recession illustrated in the graph below.

Trends can work in combinations; for example, you can have a seasonal pattern with an upward trend.

Worked Example 9

The data below show the average daily mass of a person (to the nearest 100 g), recorded over a period of 4 weeks.

63.6, 63.8, 63.5, 63.7, 63.2, 63.0, 62.8, 63.3, 63.1, 62.7, 62.6, 62.5, 62.9, 63.0, 63.1, 62.9, 62.6, 62.8, 63.0, 62.6, 62.5, 62.1, 61.8, 62.2, 62.0, 61.7, 61.5, 61.2

a) Plot these masses as a time series graph.
b) Comment on the trend.

Trend lines

A trend line is a type of line of best fit. Trend lines indicate the general trend of the data. Trend lines are useful in forecasting, or making predictions about the future, by extrapolation. Extrapolation can have limited reliability, as predictions are based on the assumption that the current trend will continue into the future.

Worked Example 10

The graph shows the average cost of renting a one-bedroom flat, as recorded over a 10-year period.

a) If appropriate, draw in a line of best fit and comment on the type of the trend.
b) Assuming that the current trend will continue, use the line of best fit to predict the cost of rent in 5 years' time.
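One simple way to see the trend in the Worked Example 9 data without a graph is to compare the mean mass for each week. This is a swapped-in summary for illustration, not the graphical method the text asks for, and it assumes the 28 readings are consecutive daily values.

```python
# Data from Worked Example 9, assumed to be 28 consecutive daily readings
masses = [
    63.6, 63.8, 63.5, 63.7, 63.2, 63.0, 62.8,
    63.3, 63.1, 62.7, 62.6, 62.5, 62.9, 63.0,
    63.1, 62.9, 62.6, 62.8, 63.0, 62.6, 62.5,
    62.1, 61.8, 62.2, 62.0, 61.7, 61.5, 61.2,
]

# Split the readings into blocks of 7 days and average each block
weekly_means = [
    sum(masses[i:i + 7]) / 7 for i in range(0, len(masses), 7)
]

# The trend is downward overall if every weekly mean is below the previous one
downward = all(later < earlier
               for earlier, later in zip(weekly_means, weekly_means[1:]))

for week, mean in enumerate(weekly_means, start=1):
    print(f"Week {week}: {mean:.2f}")
print("Downward trend" if downward else "No consistent downward trend")
```

Averaging over a week smooths out the day-to-day fluctuations, so the overall direction is easier to judge than from the individual readings.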

Worked Example 11

Data were recorded about the number of families who moved from Sydney to Newcastle over the past 10 years.

a) Use technology to construct a time series graph, with a line of best fit, that represents the data.

In a new problem, on a Lists & Spreadsheet page, label column A as year and label column B as numbermoved. Enter the data as in the table above. Open a Data & Statistics page. Press TAB to locate the label of the horizontal axis and select the variable year. Press TAB again to locate the label of the vertical axis and select the variable numbermoved. Press MENU, then select 4: Analyze, then 2: Add Moveable Line. A line and its equation appear automatically on the graph as shown.

Repositioning the line is done in two steps by moving the position of the y-intercept and altering the gradient. To change the y-intercept, move the pointer until it rests somewhere around the middle of the line. Press the Click key, then use the Touchpad to move the line parallel to itself until the y-intercept is in an appropriate position. To change the gradient, move the pointer until it rests somewhere near one end of the line. Press the Click key, then use the Touchpad to rotate the line and change the gradient. Continue to use these tools until you are satisfied with the line of best fit by eye.

b) Describe the trend.

There appears to be an upward trend over the 10 years.

c) Measure the correlation.

Back on the Spreadsheet page, press MENU, then select 4: Statistics, then 1: Stat Calculations, then 3: Linear Regression (mx + b). Select year as the X List and numbermoved as the Y List, leave the next fields blank, TAB to OK, then press ENTER.

d) Comment on the results.

Over the last 10 years, an increasing number of families have decided to make the move from Sydney to Newcastle. The correlation is strong and positive (0.8761), making it possible to predict that this trend is likely to continue.

Summary Topic 13: Bivariate data

Bivariate data

o Bivariate data involve two sets of related variables for each piece of data.
o Bivariate data are best represented on a scatterplot. On a scatterplot each piece of data is shown by a single point whose x-coordinate is the value of the independent variable, and whose y-coordinate is the value of the dependent variable.
o The relationship between two variables is called correlation. Correlation can be classified as linear or non-linear; positive or negative; and weak, moderate or strong.
o If the points appear to be scattered about the scatterplot in no particular order, then no correlation between the two variables exists.
o If the points form a straight line, then the relationship between the variables is perfectly linear.
o When drawing conclusions based on the scatterplot, it is important to distinguish between the correlation and the cause. Strong correlation between the variables does not necessarily mean that an increase in one variable causes an increase or decrease in the other.

Lines of best fit

If the scatterplot indicates a linear relationship between two variables, the linear model of the relationship can be established as follows:
o position a line of best fit on the scatterplot
o select any two points on the line and determine the equation of the line.

The equation of the line passing through two points $(x_1, y_1)$ and $(x_2, y_2)$ is given by:

$$y = mx + b, \quad \text{where } m = \frac{y_2 - y_1}{x_2 - x_1}$$

The line of best fit can be used for predicting the value of one variable when given the value of the other. This can be done graphically or, if the equation of the line is known, algebraically (by substituting known values into the equation of the line of best fit).

When the value that is being predicted using the line of best fit is within the given range, the process is called interpolation. When the value that is being predicted using the line of best fit is outside the given range, the process is called extrapolation. Only predictions made using interpolation can be considered reliable.

Least squares regression involves a mathematical approach to fitting a line of best fit to bivariate data that show a strong linear correlation. It takes error lines, forms squares, and minimises the sum of the squares. A calculator is best used for the calculations.

The correlation coefficient r is a quantitative measure of the correlation between two variables. The value of r lies in the range −1 to +1. The closer the value of r lies to zero, the weaker the correlation between the two variables.

Time series

Time series graphs are line graphs with the time plotted on the horizontal axis. Time series are used for analysing general trends and for making predictions for the future. Predictions involving time series graphs are always based on the assumption that the current trend will continue in the future.