Data transformation. Core: Data analysis. Chapter 5


Chapter 5
Core: Data analysis
Data transformation
ISBN 978--7-56757-3 Jones et al. 6

5A Introduction

You first encountered data transformation in an earlier chapter, where you used a log scale to transform a skewed histogram into a more easily interpreted symmetric histogram. In this chapter, you will learn to use the squared, log and reciprocal transformations to linearise scatterplots, the first step towards solving problems involving non-linear associations.

The circle of transformations

The types of scatterplots that can be transformed by the squared, log or reciprocal transformations can be fitted together into what we call the circle of transformations.

[Figure: The circle of transformations, with the possible transformations listed beside each scatterplot shape]

The purpose of the circle of transformations is to guide us in our choice of transformation to linearise a given scatterplot. There are two things to note when using the circle of transformations:
- In each case, there is more than one type of transformation that might work.
- These transformations only apply to scatterplots with a consistently increasing or decreasing trend.

For example, the scatterplot opposite has a consistently increasing trend, so the circle of transformations applies. Comparing the scatterplot to those in the circle of transformations, we see that three transformations (a squared, a reciprocal or a log transformation) have the potential to linearise this scatterplot.

At this stage you might find it helpful to use the interactive Data transformation (accessible through the Interactive Textbook) to see how these different transformations can be used to linearise scatterplots.
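The circle of transformations is a visual guide. As a rough numerical companion, the sketch below applies each candidate transformation to a small made-up data set and ranks the candidates by the squared correlation coefficient of the transformed data. This ranking is not the textbook's method, and the data, the function names and the dictionary of candidates are all illustrative assumptions. The log and reciprocal candidates assume every value is positive.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Each candidate is a pair (transform for x, transform for y).
CANDIDATES = {
    "none": (lambda x: x, lambda y: y),
    "x squared": (lambda x: x ** 2, lambda y: y),
    "log x": (lambda x: math.log10(x), lambda y: y),
    "1/x": (lambda x: 1 / x, lambda y: y),
    "y squared": (lambda x: x, lambda y: y ** 2),
    "log y": (lambda x: x, lambda y: math.log10(y)),
    "1/y": (lambda x: x, lambda y: 1 / y),
}

def rank_transformations(xs, ys):
    """Return (name, r squared) pairs, most linear first."""
    results = []
    for name, (fx, fy) in CANDIDATES.items():
        tx = [fx(x) for x in xs]
        ty = [fy(y) for y in ys]
        results.append((name, pearson_r(tx, ty) ** 2))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Hypothetical data that follow y = 3 + 2x^2 exactly (not from the textbook).
xs = [1, 2, 3, 4, 5, 6]
ys = [3 + 2 * x ** 2 for x in xs]
ranking = rank_transformations(xs, ys)
for name, r_squared in ranking:
    print(f"{name:10s} r^2 = {r_squared:.4f}")
```

A high value of r squared only suggests a candidate; the transformed scatterplot should still be inspected, as the chapter does.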

Exercise 5A

The scatterplots below are non-linear. For each, identify the transformations x², log x, 1/x, y², log y, 1/y or none that might be used to linearise the plot.

[Figure: four scatterplots, labelled a to d]

5B Using data transformation to linearise a scatterplot

The squared transformation

The squared transformation is a stretching transformation. It works by stretching out the upper end of the scale on either the x- or y-axis. The effect of applying an x² transformation to a scatterplot is illustrated graphically below.

Transformation: x²
Outcome: Spreads out the high x-values relative to the lower x-values, leaving the y-values unchanged. This has the effect of straightening out curves like the one shown opposite.
[Graph]

The y-squared transformation works in a similar manner but stretches out the scale on the y-axis.

The following example shows how the x-squared transformation works in practice.

Example 1 Applying the squared transformation

A base jumper leaps from the top of a cliff, 56 metres above the valley floor. The scatterplot below shows the height (in metres) of the base jumper above the valley floor every second, for the first seconds of the jump. After this time she opened her parachute to bring her safely to the ground.

A scatterplot shows that there is a strong negative association between the height of the base jumper above the ground and time. However, the association is clearly non-linear, as can be seen from the red dotted line on the scatterplot. Because the association is clearly non-linear, it makes no sense to try to model the association with a straight line. Before we can fit a least squares line to the data, we need to linearise the scatterplot.

[Figure: scatterplot of Height (metres) against Time (seconds)]

The circle of transformations suggests that we could use either an x² or a y² transformation to linearise this scatterplot. We will use the x² transformation. That involves changing the scale on the time axis to time². When we make this change, we see that the association between height and time² is linear. See the plot opposite.

[Figure: scatterplot of Height (metres) against Time²]

Now that we have a linearised scatterplot, we can use a least squares line to model the association between height and time². The equation of this line is:

height = 56.9 time

Like any regression line, we can use its equation to make predictions. For example, after 3. seconds, we predict that the height of the base jumper is:

height = 56.9 3. = 53 m (to nearest m)

Performing a data transformation is quite computationally intensive, but your CAS calculator is well suited to the task.
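The same workflow (square the explanatory variable, fit a least squares line to the transformed data, then predict) can be sketched in a few lines of Python. The data below are hypothetical, generated from height = 500 - 5 × time², and are not the values in the base jumper table; the function names are also illustrative.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data (assumption): height = 500 - 5 * time^2, one reading per second.
times = list(range(0, 11))
heights = [500 - 5 * t ** 2 for t in times]

# Squared transformation: replace time with time squared, then fit a straight line.
time_sq = [t ** 2 for t in times]
intercept, slope = least_squares(time_sq, heights)

def predict_height(t):
    """Predict height at time t; the line is linear in t squared, not in t."""
    return intercept + slope * t ** 2

print(f"height = {intercept:.1f} + ({slope:.2f}) x time^2")
print(f"predicted height after 3 s: {predict_height(3):.0f} m")
```

Note that the prediction step squares the time before substituting it, which mirrors substituting into the equation written in terms of time².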

Using the TI-Nspire CAS to perform a squared transformation

The table shows the height (in m) of a base jumper for the first seconds of her jump.

Time: 3 5 6 7 8 9
Height: 56 555 5 56 8 38 383 3 6 63 7

a Construct a scatterplot displaying height (the RV) against time (the EV).
b Linearise the scatterplot and fit a least squares line to the transformed data.
c Use the regression line to predict the height of the base jumper after 3. seconds.

Steps
1 Start a new document by pressing ctrl + N.
2 Select Add Lists & Spreadsheet. Enter the data into lists named time and height, as shown.
3 Name column C as timesq (short for time squared).
4 Move the cursor to the grey cell below timesq. Enter the expression = time^2 by pressing =, then typing time^2. Pressing enter calculates and displays the values of timesq.
5 Press ctrl + I and select Add Data & Statistics. Construct a scatterplot of height against time. Let time be the explanatory variable and height the response variable. The plot is clearly non-linear.
6 Press ctrl + I and select Add Data & Statistics. Construct a scatterplot of height against time². The plot is now linear.

7 Press menu > Analyze > Regression > Show Linear (a + bx) to plot the line on the scatterplot with its equation.
Note: The x in the equation on the screen corresponds to the transformed variable time².
8 Write down the regression equation in terms of the variables height and time².
height = 56.9 time
9 Substitute 3. for time in the equation to find the height after 3. seconds.
height = 56.9 3. = 53 m

Using the CASIO Classpad to perform a squared transformation

The table shows the height (in m) of a base jumper for the first seconds of her jump.

Time: 3 5 6 7 8 9
Height: 56 555 5 56 8 38 383 3 6 63 7

a Construct a scatterplot displaying height (the RV) against time (the EV).
b Linearise the scatterplot and fit a least squares line to the transformed data.
c Use the regression line to predict the height of the base jumper after 3. seconds.

Steps
1 In the Statistics application, enter the data into lists named time and height.
2 Name the third list timesq (short for time squared).
3 Place the cursor in the calculation cell at the bottom of the third column and type time^2. This will calculate the values of time².
4 Let time be the explanatory variable (x) and height the response variable (y). Construct a scatterplot of height against time. Complete the Set StatGraphs dialog box as shown, then view the scatterplot. The plot is clearly non-linear.
5 Construct a scatterplot of height against time². Complete the Set StatGraphs dialog box as shown, then view the scatterplot. The plot is now clearly linear.
6 Fit a regression line to the transformed data. Go to Calc, Regression, Linear Reg. Complete the Set Calculation dialog box as shown and tap OK.
Note: The x in the linear equation corresponds to the transformed variable time².
Tap OK a second time to plot and display the regression line on the scatterplot.
7 Write down the equation in terms of height and time².
height = 56.9 time
8 Substitute 3. for time in the equation.
height = 56.9 3. = 53 m

Exercise 5B

The x-squared transformation: some prerequisite skills

1 Evaluate y in the following expressions, correct to one decimal place.
a = 7 + 8 when =.5
b = 7 + 3 when =.5
c =.56.7 when =.3
d =.75 + 5.95 when =.7

The x-squared transformation: calculator exercises

2 The scatterplot opposite was constructed from the data in the table below.
3 6 5 7
From the scatterplot, it is clear that the association between x and y is non-linear.
[Scatterplot]
a Linearise the scatterplot by applying an x-squared transformation and fit a least squares line to the transformed data.
b Give its equation.
c Use the equation to predict the value of y when x =.

3 The scatterplot opposite was constructed from the data in the table below.
3 5 3 9 9 33 5
From the scatterplot, the association between x and y is non-linear.
[Scatterplot]
a Linearise the scatterplot by applying an x-squared transformation and fit a least squares line to the transformed data.
b Give its equation.
c Use the equation to predict the value of y when x = 6.

The y-squared transformation: some prerequisite skills

4 Evaluate y in the following expressions. Give the answers correct to one decimal place.
a = 6 + when =.57
b =.7 3. when =.3
c =6 + when = ( > )
d = 58 + when = 3( < )

The y-squared transformation: calculator exercises

5 The scatterplot opposite was constructed from the data in the table below.
6 8..8 3.7.5 5. 5.7
From the scatterplot, the association between x and y is non-linear.
[Scatterplot]
a Linearise the scatterplot by applying a y-squared transformation and fit a least squares line to the transformed data.
b Give its equation. Write the coefficient, correct to two significant figures.
c Use the equation to predict the value of y when x = 9. Give the answer correct to one decimal place.

Applications of the squared transformation

6 The table gives the diameter (in m) of five different umbrellas and the number of people each umbrella is designed to keep dry. A scatterplot is also shown.
Diameter Number.5.7.85 3.. 5
[Figure: scatterplot of Number of people against Diameter (metres)]
a Apply the squared transformation to the variable diameter and determine the least squares regression line for the transformed data. Number is the RV. Write the slope and intercept of this line, correct to one significant figure, in the spaces provided.
number = ___ + ___ × diameter²
b Use the equation to predict the number of people who can be sheltered by an umbrella of.3 m. Give your answer correct to the nearest person.

7 The time (in minutes) taken for a local anaesthetic to take effect is associated with the amount administered (in units). To investigate this association a researcher collected the data.
Amount.5.6.7.8.9....3..5
Time 3.7 3.6 3. 3.3 3. 3..9.7.5.3.
The association between the variables amount and time is non-linear, as can be seen from the scatterplot below. A squared transformation applied to the variable time will linearise the scatterplot.
[Figure: scatterplot of Time (minutes) against Amount (units)]
a Apply the squared transformation to the variable time and fit a least squares regression line to the transformed data. Amount is the EV. Write the equation of this line with the slope and intercept, correct to two significant figures.
b Use the equation to predict the time for the anaesthetic to take effect when the dose is. units. Give the answer correct to one decimal place.

5C The log transformation

The logarithmic transformation is a compressing transformation. It works by compressing the upper end of the scale on either the x- or the y-axis. The effect of applying a log transformation to a scatterplot is illustrated graphically below.

Transformation: log x
Outcome: Compresses the higher x-values relative to the lower x-values, leaving the y-values unchanged. This has the effect of straightening out curves like the one shown.
[Graph]

The log y transformation works in a similar manner but compresses the scale on the y-axis.

Example 2 Applying the log transformation

The general wealth of a country, often measured by its Gross Domestic Product (GDP), is one of several variables associated with lifespan in different countries. However, the association is not linear, as can be seen in the scatterplot below, which plots lifespan (in years) against GDP (in dollars) for 3 different countries.

Because the association is non-linear, it makes no sense to try to model the association with a straight line. Before we can fit a least squares regression line to the data, we need to transform the data.

[Figure: scatterplot of Lifespan (years) against GDP]

The circle of transformations suggests that we could use the y², log x or 1/x transformation to linearise the scatterplot. We will use the log x transformation. That is, we change the scale on the GDP axis to log (GDP).

When we make this change, we see that the association between the variables lifespan and log (GDP) is linear. See the plot opposite.

[Figure: scatterplot of Lifespan (years) against log (GDP)]

Note: On the plot, when log (GDP) =, the actual GDP is or $.

We can now fit a least squares line to model the association between the variables lifespan and log (GDP). The equation of this line is:

lifespan = 5.3 + 5.59 log (GDP)

Like any other regression line, we can use its equation to make predictions. For example, for a country with a GDP of $, the lifespan is predicted to be:

lifespan = 5.3 + 5.59 log = 78.3 years (correct to one decimal place)

Following the normal convention, log means log₁₀.
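A minimal Python sketch of the log x workflow follows. The four data points are hypothetical (chosen to satisfy y = 50 + 6 log₁₀ x exactly) and are not the lifespan and GDP values used in this example; the variable and function names are illustrative assumptions.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data (assumption): y = 50 + 6 * log10(x).
x_values = [100, 1000, 10000, 100000]
y_values = [62.0, 68.0, 74.0, 80.0]

# Log transformation of the explanatory variable (base 10, as in the chapter).
log_x = [math.log10(x) for x in x_values]
intercept, slope = least_squares(log_x, y_values)

def predict_y(x):
    """Predict y from an untransformed x by taking log10 first."""
    return intercept + slope * math.log10(x)

print(f"y = {intercept:.1f} + {slope:.2f} x log(x)")
print(f"predicted y when x = 20000: {predict_y(20000):.1f}")
```

The key design point is that the logarithm is applied to the raw value inside the prediction function, so the caller supplies the original variable rather than its logarithm.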

Using the TI-Nspire CAS to perform a log transformation

The table shows the lifespan (in years) and GDP (in dollars) of people in countries. The association is non-linear. Using the log x transformation:
- linearise the data, and fit a regression line to the transformed data (GDP is the EV)
- write its equation in terms of the variables lifespan and GDP, correct to three significant figures
- use the equation of the regression line to predict the lifespan in a country with a GDP of $, correct to one decimal place.

Lifespan GDP
8. 36 3 79.8 3 8 79. 6 66 77. 89 78.8 6 893 8.5 5 59 7.9 7 5 7. 73 77.9 7 73 7.3 9 73. 63 68.6 3

Steps
1 Start a new document by pressing ctrl + N.
2 Select Add Lists & Spreadsheet. Enter the data into lists named lifespan and gdp.
3 Name column C as lgdp (short for log (GDP)).
4 Now calculate the values of log (GDP) and store them in the list named lgdp. Move the cursor to the grey cell below the lgdp heading. We need to enter the expression = log(gdp). To do this, press = then type in log(gdp). Pressing enter calculates and displays the values of lgdp.
5 Press ctrl + I and select Add Data & Statistics. Construct a scatterplot of lifespan against GDP. Let GDP be the explanatory variable and lifespan the response variable. The plot is clearly non-linear.

6 Press ctrl + I and select Add Data & Statistics. Construct a scatterplot of lifespan against log (GDP). The plot is now clearly linear.
7 Press menu > Analyze > Regression > Show Linear (a + bx) to plot the line on the scatterplot with its equation.
Note: The x in the equation on the screen corresponds to the transformed variable log (GDP).
8 Write the regression equation in terms of the variables lifespan and log (GDP).
lifespan = 5.3 5.59 log (GDP)
9 Substitute for GDP in the equation to find the lifespan of people in a country with GDP of $.
lifespan = 5.3 5.59 log = 78.3 years

Using the CASIO Classpad to perform a log transformation

The table shows the lifespan (in years) and GDP (in dollars) of people in countries. The association is non-linear. Using the log x transformation:
- linearise the data, and fit a regression line to the transformed data (GDP is the EV)
- write its equation in terms of the variables lifespan and GDP, correct to three significant figures
- use the equation to predict the lifespan in a country with a GDP of $, correct to one decimal place.

Lifespan GDP
8. 36 3 79.8 3 8 79. 6 66 77. 89 78.8 6 893 8.5 5 59 7.9 7 5 7. 73 77.9 7 73 7.3 9 73. 63 68.6 3

Steps
1 In the Statistics application, enter the data into lists named Lifespan and GDP.
2 Name the third list loggdp.
3 Place the cursor in the calculation cell at the bottom of the third column and type log (GDP).
4 Let GDP be the explanatory variable (x) and lifespan the response variable (y). Construct a scatterplot of lifespan against log (GDP). Complete the Set StatGraphs dialog box as shown, then view the scatterplot. The plot is linear.
5 To find the least squares regression equation and fit a regression line to the transformed data, go to Calc, Regression, Linear Reg. Complete the Set Calculation dialog box as shown and tap OK. This generates the regression results.
Note: The x in the linear equation corresponds to the transformed variable log (GDP).
Tap OK a second time to plot and display the regression line on the scatterplot.
6 Write the equation in terms of lifespan and log (GDP).
lifespan = 5.3 5.59 log (GDP)
7 Substitute for GDP in the equation.
lifespan = 5.3 5.59 log = 78.3 years

Exercise 5C

The log x transformation: some prerequisite skills

1 Evaluate the following expressions correct to one decimal place.
a = 5.5 + 3. log.3
b =.3 + 5. log.
c = 8.5 +. log
d = 96. 3. log 33

The log x transformation: calculator exercise

2 The scatterplot opposite was constructed from the data in the table below.
5 5 5 3.. 7.5 9..
From the scatterplot, it is clear that the association between x and y is non-linear.
[Scatterplot]
a Linearise the scatterplot by applying a log x transformation and fit a least squares line to the transformed data.
b Write down its equation and the coefficient, correct to one significant figure.
c Use the equation to predict the value of y when x =.

3 The scatterplot opposite was constructed from the data in the table below.
3 36 98 5..8 9. 6.8 5.
From the scatterplot, it is clear that the relationship between x and y is non-linear.
[Scatterplot]
a Linearise the scatterplot by applying a log x transformation and fit a least squares line to the transformed data.
b Write down its equation and coefficient, correct to one significant figure.
c Use the equation to predict the value of y when x =.

The log y transformation: some prerequisite skills

4 Find the value of y in the following, correct to one decimal place if not exact.
a log = b log =.3
c log = 3.5 + where =.5
d log =.5 +. where = 7.3

The log y transformation: calculator exercise

5 The scatterplot opposite was constructed from the data in the table below.
...3..5 5.8 5. 39.8 63..
From the scatterplot, it is clear that the relationship between x and y is non-linear.
[Scatterplot]
a Linearise the scatterplot by applying a log y transformation and fit a least squares line to the transformed data.
b Write down its equation.
c Use the equation to predict the value of y when x =.6, correct to one decimal place.

Applications of the log transformation

6 The table below shows the level of performance achieved by people on completion of a task. Also shown is the time spent (in minutes) practising the task. In this situation, time is the EV. The association between level and time is non-linear, as seen in the scatterplot.
Time Level.5.5.5 3 3 3 3.5 5 6 3.5 7 3.9 7 3.6
[Figure: scatterplot of Level against Time (minutes)]
A log transformation can be applied to the variable time to linearise the scatterplot.
a Apply the log transformation to the variable time and fit a least squares line to the transformed data. log (time) is the EV. Write the slope and intercept of this line, correct to two significant figures, in the spaces provided.
level = ___ + ___ × log (time)
b Use the equation to predict the level of performance (correct to one decimal place) for a person who spends.5 minutes practising the task.

7 The table below shows the number of internet users signing up with a new internet service provider for each of the first nine months of their first year of operation. A scatterplot of the data is also shown.
Month Number 3 3 35 5 6 6 6 7 78 8 9 9 8
[Figure: scatterplot of Number against Month]
The association between number and month is non-linear.
a Apply the log transformation to the variable number and fit a least squares line to the transformed data. Month is the EV. Write the slope and intercept of this line, correct to four significant figures, in the spaces provided.
log (number) = ___ + ___ × month
b Use the equation to predict the number of internet users after months. Give the answer to the nearest whole number.

5D The reciprocal transformation

The reciprocal transformation is a stretching transformation that compresses the upper end of the scale on either the x- or y-axis. The effect of applying a reciprocal transformation to a scatterplot is illustrated below.

Transformation: 1/x
Outcome: The reciprocal 1/x transformation works by compressing larger values of x relative to lower values of x. This has the effect of straightening out curves like the one shown opposite.
[Graph]

The reciprocal 1/y transformation works the same way but in the y-direction.

The following example shows how the 1/y transformation works in practice.

Example 3 Applying the reciprocal transformation

A homeware company makes rectangular sticky labels with a variety of lengths and widths. The scatterplot opposite displays the width (in cm) and length (in cm) of eight of their sticky labels. There is a strong negative association between the width of the sticky labels and their lengths, but it is clearly non-linear. Before we can fit a least squares regression line to the data, we need to linearise the scatterplot.

[Figure: scatterplot of Width (cm) against Length (cm)]

The circle of transformations suggests that we could use the log x, 1/x, 1/y or log y transformation to linearise the scatterplot. We will use the 1/y transformation. That is, we will change the scale on the width axis to 1/width. This type of transformation is known as a reciprocal transformation.

When we make this change, we see that the association between 1/width and length is linear. See the plot opposite. We can now fit a least squares line to model the association between 1/width and length.

[Figure: scatterplot of 1/width against Length (cm)]

Note: On the plot opposite, when 1/width =., the actual width is 1/. =.5 cm.

The equation of this line is:

1/width =.5 +.86 length

Like any other regression line, we can use its equation to make predictions. For example, for a sticky label of length 5 cm, we would predict that:

1/width =.5 +.86 5 =.5 or width = =.5 cm (to d.p.)
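With a 1/y transformation the fitted line predicts the reciprocal of the response, so one extra step is needed to get back to the original variable. The sketch below shows that step in Python. The data are hypothetical (generated from 1/width = 0.1 + 0.05 × length) and are not the label measurements in this example; the names are illustrative assumptions.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data (assumption): 1/width = 0.1 + 0.05 * length.
lengths = [1, 2, 3, 4, 5, 6]
widths = [1 / (0.1 + 0.05 * length) for length in lengths]

# Reciprocal transformation of the response variable, then a straight-line fit.
recip_width = [1 / w for w in widths]
intercept, slope = least_squares(lengths, recip_width)

def predict_width(length):
    """Predict 1/width from the line, then invert to recover width."""
    predicted_reciprocal = intercept + slope * length
    return 1 / predicted_reciprocal

print(f"1/width = {intercept:.2f} + {slope:.3f} x length")
print(f"predicted width at length 5: {predict_width(5):.2f}")
```

Forgetting the final inversion is the usual slip with this transformation: the line gives 1/width, not width.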

Using the TI-Nspire CAS to perform a reciprocal transformation

The table shows the length (in cm) and width (in cm) of eight sizes of sticky labels.

Length.8..5 3. 3.5.6..9
Width 6.8 5.6.6. 3.5. 5. 5.5

Using the 1/y transformation:
- linearise the data, and fit a regression line to the transformed data (length is the EV)
- write its equation in terms of the variables length and width
- use the equation to predict the width of a sticky label with a length of 5 cm.

Steps
1 Start a new document by pressing ctrl + N.
2 Select Add Lists & Spreadsheet. Enter the data into lists named length and width.
3 Name column C as recipwidth (short for 1/width). Calculate the values of recipwidth. Move the cursor to the grey cell below the recipwidth heading. Type in =1/width. Press enter to calculate the values of recipwidth.
4 Press ctrl + I and select Add Data & Statistics. Construct a scatterplot of width against length. Let length be the explanatory variable and width the response variable. The plot is clearly non-linear.
5 Press ctrl + I and select Add Data & Statistics. Construct a scatterplot of recipwidth (1/width) against length. The plot is now clearly linear.
6 Press menu > Analyze > Regression > Show Linear (a + bx) to plot the line on the scatterplot with its equation.
Note: The y in the equation on the screen corresponds to the transformed variable 1/width.

7 Write down the regression equation in terms of the variables width and length.
1/width =.5 +.66 length
8 Substitute 5 cm for length in the equation.
1/width =.5 +.66 5 =.39 or width = 1/.39 =.56 cm

Using the CASIO Classpad to perform a reciprocal transformation

The table shows the length (in cm) and width (in cm) of eight sizes of sticky labels.

Length.8..5 3. 3.5.6..9
Width 6.8 5.6.6. 3.5. 5. 5.5

Using the 1/y transformation:
- linearise the data, and fit a regression line to the transformed data (length is the EV)
- write its equation in terms of the variables length and width
- use the equation to predict the width of a sticky label with a length of 5 cm.

Steps
1 Open the Statistics application and enter the data into lists named length and width.
2 Name the third list recwidth (short for reciprocal width).
3 Place the cursor in the calculation cell at the bottom of the third column and type 1/width. This will calculate all the reciprocal values of the width.
4 Let length be the explanatory variable (x) and width the response variable (y). Construct a scatterplot of 1/width against length. Complete the Set StatGraphs dialog box as shown, then view the scatterplot. The plot is now clearly linear.

5 Fit a regression line to the transformed data. Go to Calc, Regression, Linear Reg. Complete the Set Calculation dialog box as shown and tap OK. This generates the regression results.
Note: The y in the linear equation corresponds to the transformed variable 1/width; that is, 1/y.
Tap OK a second time to plot and display the line on the scatterplot.
6 Write down the equation in terms of the variables width and length.
1/width =.5 +.69 length
7 Substitute 5 cm for length in the equation.
1/width =.5 +.69 5 =.39 or width = 1/.39 =.56 cm

Exercise 5D

The reciprocal (1/x) transformation: some prerequisite skills

1 Evaluate the following expressions correct to one decimal place.
a = 6 + when = 3
b.3 =.9 when =.
c = 8.97 7.95 when =.97
d =.6 + 3.5 when =.8

The reciprocal (1/x) transformation: calculator exercise

2 The scatterplot opposite was constructed from the data in the table below.
6 8 6 3 5
From the scatterplot, it is clear that the association between x and y is non-linear.
[Scatterplot]

a Linearise the scatterplot by applying a 1/x transformation and fit a least squares line to the transformed data.
b Write down its equation.
c Use the equation to predict the value of y when x = 5.

The reciprocal (1/y) transformation: some prerequisite skills

3 Evaluate the following expressions correct to two decimal places.
a = 3 when =
b = 6 + when =
c =.5 +. when =.5
d =.7 +.3 when =.5

The reciprocal (1/y) transformation: calculator exercise

4 The scatterplot opposite was constructed from the data in the table below.

3 5.5.33.5.

From the scatterplot, it is clear that the association between x and y is non-linear.

a Linearise the scatterplot by applying a 1/y transformation and fit a least squares line to the transformed data.
b Write down its equation.
c Use the equation to predict the value of y when x = .5.

Applications of the reciprocal transformation

5 The table shows the horsepower of cars and their fuel consumption. The association between horsepower and fuel consumption is non-linear.

Consumption   Horsepower
5. 55 7.3 5.6 75 7. 6.3 38. 88.5 8.6 7.9 7.7 3

[Scatterplot: horsepower against fuel consumption (km/litre)]

a Apply the reciprocal transformation to the variable consumption and fit a least squares line to the transformed data. Horsepower is the RV. Write the intercept and slope of this line in the boxes provided, correct to three significant figures.

horsepower = ___ + ___ × 1/consumption

b Use the equation to predict the horsepower of a car with a fuel consumption of 9 km/litre.

6 Ten students were given an opportunity to practise a complex matching task as often as they liked before they were assessed. The number of times they practised the task and the number of errors they made when assessed are given in the table.

Times Errors 9 5 5 6 Errors 8 6 6 8 Times 7 3 7 3 9

a Apply the reciprocal transformation to the variable errors and determine the least squares regression line with the number of times the task was practised as the EV. Write the intercept and slope of this line in the boxes provided, correct to two significant figures.

1/errors = ___ + ___ × times

b Use the equation to predict the number of errors made when the task is practised six times.
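The workflow used throughout Exercise 5D (transform, fit a least squares line, then undo the transformation to make a prediction) can be sketched in Python. This is only an illustration, not the calculator procedure described in the text: the helper names `least_squares` and `predict_y` are made up for this sketch, and the data values are invented so that 1/y is exactly linear in x. They are not taken from any table in the exercise.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line ys = a + b * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented data (an assumption for illustration): y = 1 / (0.5 + 0.25 x).
x = [2, 4, 6, 8, 10]
y = [1 / (0.5 + 0.25 * value) for value in x]

# Step 1: apply the reciprocal transformation to the response variable.
recip_y = [1 / value for value in y]

# Step 2: fit a least squares line to the transformed data, 1/y = a + b x.
a, b = least_squares(x, recip_y)


def predict_y(x_new):
    """Step 3: predict 1/y from the line, then take the reciprocal to get y."""
    return 1 / (a + b * x_new)


print(f"1/y = {a:.3f} + {b:.3f} x")
print(f"predicted y when x = 12: {predict_y(12):.3f}")
```

The important design point is the last step: the fitted line predicts 1/y, so the prediction has to be inverted before it is reported in the original units.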

Key ideas and chapter summary

Data transformation: In regression analysis, data transformation is used to linearise a scatterplot prior to modelling the association with a least squares line.

Squared transformation: The squared transformation stretches out the upper end of the scale on an axis.

Logarithmic transformation: The logarithmic transformation compresses the upper end of the scale on an axis.

Reciprocal transformation: The reciprocal transformation compresses the upper end of the scale on an axis, but to a greater extent than the log transformation.

The circle of transformations: The circle of transformations provides guidance in choosing the transformations that can be used to linearise various types of scatterplots. See Section 5A.

Skills check

Having completed this chapter you should be able to:
use the circle of transformations to identify an appropriate transformation to linearise a scatterplot
apply a square, log or reciprocal transformation to linearise a scatterplot (to one axis only)
fit a least squares regression line to a linearised scatterplot, and use its equation to make predictions.

Multiple-choice questions

1 Select the statement that correctly completes the sentence: The effect of a squared transformation is to...
A stretch the high values in the data
B maintain the distance between values
C stretch the low values in the data
D compress the high values in the data
E reverse the order of the values in the data

2 Select the statement that correctly completes the sentence: The effect of a log transformation is to...
A stretch the high values in the data
B maintain the distance between values
C stretch the low values in the data
D compress the high values in the data
E reverse the order of the values in the data

3 The association between two variables x and y, as shown in the scatterplot, is non-linear. In an attempt to transform the relationship to linearity, a student would be advised to:
A leave out the first four points
B use a transformation
C use a log transformation
D use a 1/ transformation
E use a least squares regression line

[Scatterplot]

4 The association between two variables x and y, as shown in the scatterplot, is non-linear. Which of the following sets of transformations could possibly linearise this relationship?
A log, 1/, log, 1/
B ,
C , log, 1/
D log, 1/,
E a + b

[Scatterplot]

5 The association between two variables x and y, as shown in the scatterplot, is non-linear. Which of the following transformations is most likely to linearise the relationship?
A a 1/ transformation
B a transformation
C a log transformation
D a 1/ transformation
E a log transformation

[Scatterplot]

6 The following data were collected for two related variables x and y.

3 5 6 7 8 9
7 8.6 8.9 8.8 9.9 9.7..5.7..

A scatterplot indicates a non-linear association. The data is linearised using a log transformation and a least squares line is then fitted. The equation of this line is closest to:
A y = 7.5 + .37 log x
B y = .37 + 7.5 log x
C y = .7 + .5 log x
D y = 3.86 + 7. log x
E y = 7. + 3.86 log x

7 The data in the scatterplot opposite shows the width (cm) and the surface area (cm²) of leaves sampled from different trees. The scatterplot is non-linear. To linearise the scatterplot, (Width)² is plotted against area and a least squares regression line is then fitted to the linearised plot.

[Scatterplot: width (cm) against area (cm²)]

The equation of this least squares regression line is:
(Width)² = .8 + .8 × Area
Using this equation, a leaf with a surface area of cm² is predicted to have a width, in cm, closest to:
A 9.
B 9.9
C .6
D 8.6
E 97.8
VCAA (3)

8 The association between the total weight of produce picked from a vegetable garden and its width is non-linear. An transformation is used to linearise the data. When a least squares line is fitted to the data, its y-intercept is and its slope is 5. Assuming that weight is the response variable, the equation of this line is:
A (weight) = + 5 width
B width = + 5 (weight)
C width = 5 + (weight)
D (weight) = + 5 (weight)
E (weight) = 5 + weight

9 A model that describes the association between the hours spent studying for an exam and the mark achieved is:
mark = + log (hours)
From this model, we would predict that a student who studies for hours would score a mark (to the nearest whole number) of:
A 8
B 78
C 8
D 7
E

10 A 1/y transformation is used to linearise a scatterplot. The equation of a least squares line fitted to this data is:
1/y = . + .5x
This regression line predicts that, when x = 6, y is closest to:
A .7
B .7
C .
D .
E 3.7
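Several of the questions above turn on the same idea: when the response variable was transformed, the line predicts the transformed value and the transformation must be undone; when the explanatory variable was transformed, the value of x is transformed before substituting and the result needs no further work. A minimal Python sketch of the three cases follows. The function names and all coefficients are hypothetical, chosen for easy arithmetic rather than taken from any question, and "log" is assumed to mean log base 10.

```python
import math


def predict_squared_response(a, b, x):
    """Model y^2 = a + b*x: take the positive square root to recover y.

    Assumes y is a positive quantity such as a length.
    """
    return math.sqrt(a + b * x)


def predict_reciprocal_response(a, b, x):
    """Model 1/y = a + b*x: take the reciprocal to recover y."""
    return 1 / (a + b * x)


def predict_log_explanatory(a, b, x):
    """Model y = a + b*log10(x): transform x first; y needs no back-transformation."""
    return a + b * math.log10(x)


# Hypothetical coefficients, not those of the review questions.
y_from_square = predict_squared_response(4, 3, 7)        # y^2 = 4 + 3*7 = 25
y_from_recip = predict_reciprocal_response(0.2, 0.1, 3)  # 1/y = 0.2 + 0.1*3
y_from_log_x = predict_log_explanatory(20, 50, 10)       # log10(10) = 1

print(y_from_square, y_from_recip, y_from_log_x)
```

A common slip is to report the transformed value itself, for example giving y² when the question asks for y. Keeping the back-transformation in its own function makes that step hard to forget.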

Extended-response questions

1 The average age at first marriage (average age) and average yearly income in dollars per person (income) were recorded for a group of 7 countries. The results are displayed below. A scatterplot of the data is also shown.

Average age (years)   Income ($)
75 3 6 8 6 6 6 8 7 6 3 5 3 3 3 38 5 9 33 7 5 5 9 36 9 3 6 3 5 6 9

[Scatterplot: average age (years) against income ($)]

The association between average age and income is non-linear. A log transformation can be applied to the variable income and used to linearise the data.

a Apply this log transformation to the data and determine the equation of the least squares regression line that allows average age to be predicted from log (income). Write the coefficients for this equation correct to three significant figures in the spaces provided.

average age = ___ + ___ × log (income)

b Use the equation to predict the average age of women at first marriage in a country with an average income of $ per person. Write your answer correct to one decimal place.

based on VCAA ()

2 The table below shows the percentage of people who can read (literacy rate) and the gross domestic product (GDP), in dollars, for a selection of countries. A scatterplot of the data is also shown. The scatterplot can be linearised by using a log transformation.

GDP   Literacy rate
677 7 6 35 9 9 97 8 9 99 5 99 7 539 99 3 73 9 86 99 9 6 35 665 6 38 99 36 6

[Scatterplot: literacy rate (%) against GDP ($)]

a Apply the log transformation to the variable GDP and verify that it linearises the data by constructing a scatterplot from the transformed data.
b Fit a least squares line to the transformed data and write down its equation in terms of the variables literacy rate and log (GDP). Literacy rate is the RV.
c Give the slope and intercept correct to three significant figures.
d Use the regression line to predict the literacy rate of a country with a GDP of $ to the nearest per cent.

3 Measurements of the distance travelled (metres) and time taken (seconds) were made on a falling body. The data are given in the table below.

Time   3 5 6
Distance   5. 8.. 79. 8. 68.
Time²

a Construct a scatterplot of the data and comment on its form.
b Determine the values of time² and complete the table.
c Construct a scatterplot of distance against time².
d Fit a least squares line to the transformed data. Distance is the RV.
e Use the regression equation to predict the distance travelled in 7 seconds.
f Obtain a residual plot and comment on the assumption of linearity.
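The steps in Question 3 (square the explanatory variable, fit a line, predict, then inspect the residuals) can be sketched in Python as follows. The distances below are invented for illustration and are not the measurements in the table above; they were chosen to lie close to the free-fall pattern distance ≈ 4.9 × time². The helper name `least_squares` is likewise an assumption of this sketch.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line ys = a + b * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented measurements (illustrative only, not the textbook's data).
time = [1, 2, 3, 4, 5, 6]                          # seconds
distance = [5.1, 19.4, 44.5, 78.0, 122.9, 176.2]   # metres

# Apply the squared transformation to the explanatory variable.
time_sq = [t ** 2 for t in time]

# Fit distance = a + b * time^2.
a, b = least_squares(time_sq, distance)

# Residual = actual value - predicted value, one per data point.
residuals = [d - (a + b * t2) for d, t2 in zip(distance, time_sq)]

# Predict the distance travelled in 7 seconds: square the time first.
predicted_7 = a + b * 7 ** 2

print(f"distance = {a:.3f} + {b:.3f} x time^2")
print("residuals:", [round(r, 2) for r in residuals])
print(f"predicted distance at 7 s: {predicted_7:.1f} m")
```

If the transformation has linearised the data, the residuals should scatter around zero with no clear curve when plotted against time². A systematic pattern would suggest that a different transformation is needed.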