PS5: Two Variable Statistics LT3: Linear regression LT4: The test of independence.

Example by eye.

On a hot day, nine cars were left in the sun in a car parking lot. The length of time each car was left in the sun was recorded, as well as the temperature inside the car at the end of the period.

Car          A    B    C    D    E    F    G    H    I
Time (min)  50    5   25   40   15   45   55   10   15
Temp       100   70   88   96   77  110  121   80   73

A. Calculate the mean of both variables.
B. Draw a scatter diagram of the data.
C. Plot the mean point $(\bar{x}, \bar{y})$ on the scatter diagram and then draw the line of best fit.
D. Predict the temperature at 35 minutes and at 75 minutes. Comment on the reliability of your predictions.
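
Part A can be checked with a few lines of Python. This is only a sketch: the variable names `time_min`, `temp` and `mean_point` are made up for illustration, and the units of the temperature are not given on the slide.

```python
# Car data from the example: minutes in the sun and temperature inside the car.
time_min = [50, 5, 25, 40, 15, 45, 55, 10, 15]
temp = [100, 70, 88, 96, 77, 110, 121, 80, 73]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# The mean point (x-bar, y-bar) is the point the line of best fit is drawn through.
mean_point = (mean(time_min), mean(temp))
print(f"Mean point: ({mean_point[0]:.2f}, {mean_point[1]:.2f})")
```

A line of best fit drawn by eye should pass through this point.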

More on reliability of a prediction

The accuracy of an interpolation depends on how linear the original data was. This can be gauged by the correlation coefficient and by checking that the data is randomly scattered around the line of best fit.

The accuracy of an extrapolation depends not only on how linear the original data was, but also on the assumption that the linear trend will continue past the poles (the smallest and largest data values). The validity of this assumption depends greatly on the situation we are looking at.

Issues

The problem with drawing a line of best fit by eye is that the line drawn will vary from one person to another. Instead, we use a method known as linear regression to find the equation of the line which best fits the data. The most common method is the method of least squares.

The Least Squares Regression Line

For any line we draw to model the points, we can find the vertical distances $d_1, d_2, d_3, \ldots$ between each point and the line. We then square each distance and add the squares: $d_1^2 + d_2^2 + d_3^2 + \cdots$. The least squares regression line is the line which makes this sum as small as possible.

We write the equation of the line as $y = ax + b$ ($a$ is the slope).

Take a look at how regressions are used with stocks, specifically Strangles.
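
To make the "sum of squared vertical distances" concrete, here is a minimal Python sketch using the car data. The two candidate lines are hypothetical values chosen for illustration (they are not from the slides), and `sum_of_squares` is a made-up helper name.

```python
# Car data from the example: minutes in the sun and temperature inside the car.
time_min = [50, 5, 25, 40, 15, 45, 55, 10, 15]
temp = [100, 70, 88, 96, 77, 110, 121, 80, 73]


def sum_of_squares(xs, ys, a, b):
    """Return d1^2 + d2^2 + ... for the line y = a*x + b.

    Each d is the vertical distance between a data point and the line.
    """
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Two hypothetical candidate lines (illustrative values only).
by_eye = sum_of_squares(time_min, temp, a=1.0, b=62.0)
closer = sum_of_squares(time_min, temp, a=0.88, b=65.1)

print(f"y = 1.00x + 62.0 -> sum of squares = {by_eye:.1f}")
print(f"y = 0.88x + 65.1 -> sum of squares = {closer:.1f}")
```

The line with the smaller sum fits better in the least squares sense. The regression line is the one line for which no other choice of $a$ and $b$ gives a smaller sum.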

The Least Squares Regression Line

Let's take the last data set and calculate this on our GDC!

Car          A    B    C    D    E    F    G    H    I
Time (x)    50    5   25   40   15   45   55   10   15
Temp (y)   100   70   88   96   77  110  121   80   73

A) Use technology to find the least squares regression line.

It is very simple to calculate the least squares regression line, and furthermore, it's easy to graph the line. With the data in lists L1 and L2 (no mean point needed):
1) Press STAT, CALC, LinReg(ax+b).
2) Check that the entry uses the lists holding your data.
3) Calculate.
4) Put the result into $y = ax + b$ form.
5) Plot it with Y=.

On your own. You try.

The annual income and average weekly grocery bill for a selection of families are shown below:

Family                        A    B    C    D    E    F    G    H
Income (x thousand pounds)   55   36   25   47   60   64   42   50
Grocery bill (y pounds)     120   90   60  160  190  250  110  150

a Construct a scatter diagram to illustrate the data.
b Use technology to find the least squares regression line. Graph it.
c Estimate the weekly grocery bill for a family with an annual income of 95,000 pounds. Comment on whether this estimate is likely to be reliable.

Answer: Weekly grocery bill: ____ pounds. Because this is extrapolation, it may be unreliable.
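
As a cross-check on the GDC output for the car data, the same least squares calculation can be sketched in Python. The function names `linreg`, `predict` and `is_interpolation` are hypothetical helpers, not calculator commands, and the formula used is the standard least squares one rather than anything specific to the TI-84.

```python
def linreg(xs, ys):
    """Least squares line y = a*x + b; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    b = y_bar - a * x_bar  # the line passes through the mean point
    return a, b


time_min = [50, 5, 25, 40, 15, 45, 55, 10, 15]
temp = [100, 70, 88, 96, 77, 110, 121, 80, 73]

a, b = linreg(time_min, temp)
print(f"y = {a:.3f}x + {b:.2f}")


def predict(x):
    """Temperature predicted by the fitted line after x minutes."""
    return a * x + b


def is_interpolation(x):
    """True when x lies between the smallest and largest observed times."""
    return min(time_min) <= x <= max(time_min)


for minutes in (35, 75):
    kind = "interpolation" if is_interpolation(minutes) else "extrapolation"
    print(f"{minutes} min -> {predict(minutes):.1f} ({kind})")
```

The same `linreg` function can be reused on the grocery data; an income of 95 (thousand pounds) lies outside the observed incomes, so that estimate is an extrapolation.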

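The Least Squares Formulae (by hand)

Some questions may be asked about this. Let's do a small problem for background, but note that this will not be on the IB paper.

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}\,(x - \bar{x})$$

Remember: what is $s_x^2$? We use a different formula:

$$s_x^2 = \frac{\sum x^2}{n} - \bar{x}^2$$

What is $s_{xy}$? This is new:

$$s_{xy} = \frac{\sum xy}{n} - \bar{x}\,\bar{y}$$

Problem:

x     y     xy    x²
1     3      3     1
3     5     15     9
5     6     30    25
Sum         48    35

We need to find $\bar{x}$, $\bar{y}$ and the sums of $xy$ and $x^2$: $\bar{x} = 3$, $\bar{y} = 4.667$.

$$s_{xy} = \frac{48}{3} - 3 \times 4.667 \approx 2$$

$$s_x^2 = \frac{35}{3} - 3^2 \approx 2.667$$

$$y - 4.667 = \frac{2}{2.667}\,(x - 3) \approx 0.7499\,(x - 3)$$

$$y = 0.7499x + 2.417$$

(Without rounding the intermediate values, the slope is exactly 0.75.)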
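
The by-hand working can be mirrored step by step in Python. This is a sketch that follows the formulas $s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y}$ and $s_x^2 = \frac{\sum x^2}{n} - \bar{x}^2$; the variable names are chosen for illustration.

```python
# The small problem from the slide.
xs = [1, 3, 5]
ys = [3, 5, 6]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Column totals from the table.
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

s_xy = sum_xy / n - x_bar * y_bar  # covariance of x and y
s_x2 = sum_x2 / n - x_bar ** 2     # variance of x

slope = s_xy / s_x2
intercept = y_bar - slope * x_bar  # from y - y_bar = slope * (x - x_bar)

print(f"sum xy = {sum_xy}, sum x^2 = {sum_x2}")
print(f"y = {slope:.4f}x + {intercept:.3f}")
```

Because the code keeps full precision for $\bar{y}$, its slope differs slightly in the fourth decimal place from the hand calculation, which rounds $\bar{y}$ to 4.667.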

New equation that works with r.

Here is another equation that works without finding $s_{xy}$ and $s_x^2$:

$$y - \bar{y} = r\,\frac{s_y}{s_x}\,(x - \bar{x})$$

This is nice because you are able to use formulas or data that can be found easily with a calculator. Cool. Let's move on.

LT4, 11E: the $\chi^2$ test of independence. Pronounced Chi = "Ki". This is the formal test of independence and its counterparts.
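
A short Python sketch can confirm that the slope written with $r$ agrees with the by-hand slope for the small problem $x = 1, 3, 5$ and $y = 3, 5, 6$. It assumes that $s_x$ and $s_y$ are the population standard deviations (dividing by $n$) and that $r = \frac{s_{xy}}{s_x s_y}$.

```python
import math

xs = [1, 3, 5]
ys = [3, 5, 6]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Population standard deviations (divide by n).
s_x = math.sqrt(sum(x * x for x in xs) / n - x_bar ** 2)
s_y = math.sqrt(sum(y * y for y in ys) / n - y_bar ** 2)

# Covariance and correlation coefficient.
s_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
r = s_xy / (s_x * s_y)

# Slope from the r form: y - y_bar = r * (s_y / s_x) * (x - x_bar).
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

print(f"r = {r:.4f}")
print(f"y = {slope:.4f}x + {intercept:.3f}")
```

Since only the ratio $s_y / s_x$ appears in the slope, using sample standard deviations (dividing by $n - 1$) for both would give the same line.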

11E: Categorical or grouped tests.

Currently, we are only able to look at numerical data through least squares regression. What if we want to look at categorical or grouped data? For example, suppose the variables gender and regular exercise are examined. They may be dependent: females may be more likely to exercise regularly than males. Alternatively, the variables may be independent, which means the gender of a person has no effect on whether they exercise regularly.

We need to use the $\chi^2$ or chi-squared test of independence (pronounced "ky squared test"). The $\chi^2$ test of independence looks to see whether two variables, usually grouped or categorical, are independent of or dependent on one another.

What are some other examples that can be used? EX: Is GPA dependent on height or vice versa? We can test this dependency, or lack thereof, using the $\chi^2$ test of independence.

Contingency tables

To understand grouped or categorical data, we use a contingency table to show the number of values that fall under each particular category.

Example: Suppose we surveyed 400 people about their gender and number of text messages per week. Since the number of texts is a numerical value, we want to group it into categories, such as ≥ 300 and < 300. We want to test whether the two variables are independent, so we put the variables in a contingency table and count the number of people within each category. After we fill in the contingency table, it's time to run the test.

          ≥ 300   < 300   sum
Male         60     135   195
Female      152      53   205
sum         212     188   400

Calculating $\chi^2$ by hand.

After we have gathered the data and placed it into a contingency table, we need to calculate the expected values if the variables were independent.

          ≥ 300   < 300   sum
Male         60     135   195
Female      152      53   205
sum         212     188   400

For example, if they were independent, then

$$P(\text{male} \cap \geq 300) = P(\text{male}) \times P(\geq 300) = \frac{195}{400} \times \frac{212}{400}$$

This would be the probability for that cell of the survey, so the expected value would be

$$400 \times \frac{195}{400} \times \frac{212}{400} = \frac{212 \times 195}{400} = 103.35$$

males texting over 300 texts.

To streamline this, we make an expected frequency table and calculate the expected value of each cell.

          ≥ 300                      < 300                     sum
Male      212 × 195 / 400 = 103.35   188 × 195 / 400 = 91.65   195
Female    212 × 205 / 400 = 108.65   188 × 205 / 400 = 96.35   205
sum       212                        188                       400

Contingency tables examined.

The observed frequency table is the one where you collected the data. The expected frequency table is the one we calculate to find the expected frequencies. Here is a generic look at an expected frequency contingency table:

              (column 1)   (column 2)   sum
Category V    a × c / n    b × c / n    c
Category W    a × d / n    b × d / n    d
sum           a            b            n
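
The expected frequency table can be generated from the observed table with the rule "row total × column total ÷ grand total". The sketch below assumes the observed counts are stored as a list of rows; `expected_table` is a hypothetical helper name.

```python
def expected_table(observed):
    """Expected frequencies under independence for a table given as a list of rows.

    Each cell is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed texting data: rows are Male, Female; columns are >= 300, < 300.
observed = [[60, 135],
            [152, 53]]

expected = expected_table(observed)
for label, row in zip(["Male", "Female"], expected):
    print(label, [round(value, 2) for value in row])
```

Each row and column of the expected table adds up to the same totals as the observed table, which is a quick way to check the arithmetic.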

Before we continue

The chi-squared test examines the difference between the observed values obtained from our sample and the expected values we have just calculated. To do this, we look at

$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ is the observed frequency and $f_e$ is the expected frequency.

If the variables are independent, the observed and expected values will be very similar. This means that the values of $(f_o - f_e)$ will be small, and hence $\chi^2_{calc}$ will be small.

If the variables are not independent, the observed values will differ significantly from the expected values. The values of $(f_o - f_e)$ will be large, and hence $\chi^2_{calc}$ will be large.

Now, let's continue using a table to help us calculate $\chi^2_{calc}$.

Finished calculations

What we want to find is the sum of $(f_o - f_e)^2$ divided by $f_e$. We fill in the table starting with the top row, first column of the contingency table, then work across each row in turn.

f_o    f_e       f_o − f_e   (f_o − f_e)²   (f_o − f_e)² / f_e
 60    103.35    −43.35      1879.22        18.18
135     91.65     43.35      1879.22        20.50
152    108.65     43.35      1879.22        17.30
 53     96.35    −43.35      1879.22        19.50
                                    total   75.48

So $\chi^2_{calc} = 75.48$. This is fairly large, so we can predict that gender and the amount of texting may be dependent.

What are some things that hinder our reliability?
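
The table calculation can be reproduced with a short Python sketch of $\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$. It recomputes the expected frequencies from the observed table; the function names are hypothetical, and the code only computes the statistic, not the formal test.

```python
def expected_table(observed):
    """Expected frequencies: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (f_o - f_e)^2 / f_e over every cell of the table."""
    expected = expected_table(observed)
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for f_o, f_e in zip(obs_row, exp_row):
            total += (f_o - f_e) ** 2 / f_e
    return total


# Observed texting data: rows are Male, Female; columns are >= 300, < 300.
observed = [[60, 135],
            [152, 53]]

chi2_calc = chi_squared(observed)
print(f"chi-squared = {chi2_calc:.2f}")
```

Small differences in the last decimal place compared with the hand table come from rounding each row to two decimal places before adding.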

Do it on your own!

There was a survey conducted in my Concepts of Algebra class that asked students what gender they were, and their hat type preference. The results were as follows:

Columns: Fedora, Cowboy, sum (shown twice on the slide)
Male:    13  23  17  21
Female:   6  17  36   9
sum:

1) Calculate the expected frequency contingency table.
2) Find $\chi^2_{calc}$ using a table.

Calculating chi-squared using a GDC

What we need to know is that calculating by hand and using technology are both required for your IB papers.

How to use a TI-84 Plus to calculate $\chi^2_{calc}$

Let's enter the data into a matrix to calculate 1) the expected values, and 2) $\chi^2_{calc}$.

Step 1) Enter the raw data into a 2 × 3 matrix.
Step 2) Run the χ²-Test from the STAT TESTS menu.
Step 3) Calculate! That gets $\chi^2_{calc}$.

To find the filled-in expected values, go back to the matrix menu and edit matrix B.

The formal test. Next time.

Homework: 11D p.18 #1,2,5,6; 11E.1 p.22 #2,3