PS2.1 & 2.2: Linear Correlations
PS2: Bivariate Statistics

LT1: Basics of Correlation
LT2: Measuring Correlation and Line of Best Fit by Eye

Review: univariate (one-variable) data
Displays: frequency tables, bar graphs, histograms, cumulative frequency graphs, box-and-whisker plots, stem-and-leaf plots, pie charts
Statistics:
Measures of center: mean, median, mode
Measures of spread: range, interquartile range, variance, standard deviation
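As a quick refresher, the univariate statistics listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on a hypothetical data set. It uses the population variance and standard deviation (dividing by n, like the σx value a TI-84 reports), and it computes quartiles as the medians of the lower and upper halves. That is only one of several quartile conventions, so a textbook or other software may give a slightly different IQR.

```python
import statistics

def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves.

    When n is odd, the middle value is left out of both halves
    (one common convention; others exist).
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[(len(s) + 1) // 2:]
    return statistics.median(lower), statistics.median(upper)

def summarize(values):
    """Measures of center and spread for one quantitative variable."""
    q1, q3 = quartiles_by_halves(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "IQR": q3 - q1,
        "variance": statistics.pvariance(values),  # population variance: divide by n
        "std_dev": statistics.pstdev(values),      # population standard deviation
    }

data = [2, 4, 4, 5, 7, 9]  # hypothetical data set
for name, value in summarize(data).items():
    print(f"{name}: {value:.4f}")
```

For the sample versions (dividing by n - 1), swap in `statistics.variance` and `statistics.stdev`.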

Bivariate statistics
Bivariate statistics find how two variables are related. Examples:
Height and weight
SAT scores and GPA
Sleep and GPA
Passing a driving test and gender
Which of these are quantitative discrete, quantitative continuous, or qualitative?

The statistics and displays differ from the univariate ones because we want to measure the strength of the association, and the techniques vary depending on the type of data.

Bivariate statistics: what to determine
1) Determine the type of each variable (categorical, quantitative discrete, or quantitative continuous). Examples: age and length of hair; eye color and shirt brand choice.
Be careful not to mix quantitative and qualitative information. For mixed pairs you would have to run an ANOVA test, which we will not be covering and which is not part of the IB program. Sometimes students think they have two quantitative variables when really only one of them is quantitative, for example height and hair color, or race and SAT scores.
2) Determine which variable is the independent variable and which is the dependent variable.
Independent variable: the predictor variable (x-axis). Example: age.
Dependent variable: the result of the predictor (y-axis). Example: weight.

IA Talk
Picking a topic
1) When you pick a topic, focus on a question that you want answered.
2) Students tend to come up with questions that merely name the two variables being tested. Instead, come up with a topic you want to explore, and then use bivariate statistics to help you answer the question. Examples: Tennis and its effects on achievement. Are blondes happier than brunettes?

Picking a topic and a question to ask
Each of you should pick a topic and a question that relate to something you're interested in. Example: I'm interested in records. Maybe I can see whether people who listen to vinyl records are more creative than people who listen to digital music.
What would my title be? Analog Music and Creativity.
What would my big question be? Are people who listen to vinyl more creative?
How would I test this? What do you think?
Let's move to the IA handouts: Writing Guide and Good Ideas vs. Bad Ideas.

Correlation: an overview
Data: both variables must be quantitative and continuous.*
Correlation: refers to the strength and direction of the relationship between two variables.
Displays: scatterplots display the data on a graph.
Statistics: the correlation coefficient, the coefficient of determination, and the line of best fit help us understand the data.
*A correlation can be done with quantitative discrete data, but gaps appear in the calculations or assumptions are made from the data.

Characteristics of correlation
There are several characteristics we consider when describing the correlation between two variables: direction, linearity, strength, outliers, and causation.
Direction:
Upward = positive correlation: as the independent variable increases, the dependent variable also increases.
Downward = negative correlation: as the independent variable increases, the dependent variable decreases.
Random = no correlation.
Strength: strong, moderate, or weak (how close is the data to a perfect correlation?).

Linearity
Relationships are not always linear. Look at the scatterplots: which one is linear, and which one is non-linear?

Outliers
We observe and investigate any outliers, that is, isolated points that do not follow the trend formed by the main body of the data. If an outlier is the result of a recording or graphing error, it should be discarded. However, if the outlier proves to be a genuine piece of data, it should be kept. Sometimes an outlier reveals important information about a demographic group or a section of the population that is otherwise not represented in the data.
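To see why outliers deserve investigation, the sketch below measures how much a single point can change the strength of a linear pattern, using Pearson's correlation coefficient r (defined later in these notes). The data and the mis-recorded point (6, 1) are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data that follow a clear upward trend
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 8, 9]
r_clean = pearson_r(xs, ys)

# The same data plus one hypothetical point that breaks the trend,
# e.g. a value typed in wrongly when the data were recorded
r_with_outlier = pearson_r(xs + [6], ys + [1])

print(f"without the outlier: r = {r_clean:.3f}")
print(f"with the outlier:    r = {r_with_outlier:.3f}")
```

With these numbers, r drops from about 0.99 to about 0.22, which is why an outlier should be checked before it is kept or discarded.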

Causation?
Look at this example: there is a strong positive correlation between children's arm length and their running speed. Does this mean children with longer arms can run faster? Correlation does not mean causation! The obvious confounding variable is age: older children tend to have both longer arms and faster running speeds.

Another example: in Springfield the number of stray cats was recorded, and in Eugene the number of meat items at Taco Bell was recorded, over several years, and a strong negative correlation was found between the two variables. The implication is that as the number of stray cats decreases, the number of meat dishes increases. This could simply mean that, over time, one variable happened to increase while the other decreased.

Examples of causal assumptions
Here is one: Detroit has one of the highest arson rates in the nation. It also has one of the fire districts with the most firefighters employed. If we graph the number of arsons against the number of firefighters employed in Detroit and its surrounding districts, we find a strong positive correlation.
Detroit: total arsons 957, total firefighters 830
Surrounding districts: total arsons 207, total firefighters 210
What are we saying about arsons and firefighters? Do firefighters cause more arsons? What is the confounding variable?

10/19/2017

Pearson's product-moment correlation coefficient
We need a more precise measure of the strength of linear correlation between two variables. We get this from Pearson's product-moment correlation coefficient, r. Calculating r by hand can be tricky, but our calculators will help us out quite a bit.

For a set of n data given as ordered pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, Pearson's correlation coefficient is

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the means of the x-values and y-values.

Every value of r satisfies $-1 \le r \le 1$. Positive values indicate positive correlation, and negative values indicate negative correlation. The size of $|r|$ indicates the strength.

Pearson's correlation coefficient in examinations
In examinations you are expected to calculate r using technology: enter the data in your calculator and run a LinReg(ax+b) test to find r. This method is simple and shouldn't be much of a worry on your paper. However, calculating r with the formula is recommended for the Internal Assessment to earn full marks on the mathematical process portion. For practice, we will use only a few points rather than many, due to time.

You may recognize the bottom of the formula. What does it look similar to? (Each square root resembles the standard deviation formula, without the division by n.)
The top is the sum of the products of the deviations from the means; dividing it by n gives the covariance. Each product is positive when a point lies on the same side of both means (above both or below both) and negative otherwise. Combining how every point sits relative to the means, and dividing by the spread of each variable, gives r, a measure of correlation from -1 to 1.
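The formula above translates directly into code. The following is a minimal Python sketch (the function name `pearson_r` is our own, not a calculator or library command) that computes each piece of the formula separately, so you can compare it with a hand calculation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient, computed from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of the products of the deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator pieces: sums of squared deviations for x and for y
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal")
    return s_xy / sqrt(s_xx * s_yy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # points on a rising line: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # points on a falling line: -1.0
```

Points that lie exactly on a straight line give r = 1 or r = -1; real data fall somewhere in between.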

Pearson's product-moment correlation coefficient: the other formula
Pearson's correlation coefficient can also be calculated as

$$r = \frac{s_{xy}}{s_x s_y}$$

where $s_{xy}$ is the covariance of x and y, $s_x$ is the standard deviation of x, and $s_y$ is the standard deviation of y. (Use the same divisor, n or n - 1, for the covariance and both standard deviations.) Sometimes all we need is to find the standard deviations of x and y; with a given covariance, we can then calculate r.

What is the precise scale for r? The scale applies to both positive and negative values of r.
|r| = 1.0: perfect correlation
0.9 to 0.99: very strong
0.7 to 0.89: strong
0.5 to 0.69: moderately strong
0.5: moderate
0.3 to 0.49: moderately weak
0.1 to 0.29: weak
0.0 to 0.09: very weak
0: no correlation at all
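The sketch below checks the second formula numerically and turns the scale into a small lookup. It assumes population versions throughout (the covariance and both standard deviations divide by n). Because the ranges on the scale meet at their ends, it reads them as half-open intervals and labels |r| = 0.5 exactly as "moderate"; that boundary handling is our interpretation, not part of the slide. The names `covariance`, `r_from_covariance`, and `strength` are our own, and the data are hypothetical.

```python
import math
from statistics import pstdev

def covariance(xs, ys):
    """Population covariance: the mean of the products of deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

def r_from_covariance(xs, ys):
    # r = s_xy / (s_x * s_y), with population SDs to match the population covariance
    return covariance(xs, ys) / (pstdev(xs) * pstdev(ys))

def strength(r):
    """Describe |r| with the scale above (boundary handling is an assumption)."""
    a = abs(r)
    if math.isclose(a, 1.0):
        return "perfect"
    if a >= 0.9:
        return "very strong"
    if a >= 0.7:
        return "strong"
    if math.isclose(a, 0.5):
        return "moderate"
    if a > 0.5:
        return "moderately strong"
    if a >= 0.3:
        return "moderately weak"
    if a >= 0.1:
        return "weak"
    if math.isclose(a, 0.0, abs_tol=1e-12):
        return "no correlation"
    return "very weak"

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 8, 9]
r = r_from_covariance(xs, ys)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.4f}: {strength(r)}, {direction}")
```

Swapping in the sample covariance together with `statistics.stdev` gives the same r, because the n - 1 divisors cancel.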

Scatterplot patterns: name the correlation
[Figure: four scatterplots, labelled strong positive, moderately strong positive, moderately weak positive, and no correlation.]

Scatterplot patterns: continue to name the correlation
[Figure: four scatterplots, labelled strong negative, moderately strong negative, weak negative, and no correlation.]

You try
Give some examples of variables that would have a positive correlation or a negative correlation. Try to identify the independent variable and the dependent variable.

Calculating r by hand
Let's look at an example using the long formula and find the values needed to calculate r.
Example: Daisy investigates how the volume of water in a pot affects the time it takes to boil on the stove. The results are given in the table. Find and interpret Pearson's correlation coefficient between the two variables.

Pot | Volume ($x$) | Time to boil ($y$, min)
A | 1 | 2
B | 2 | 4
C | 4 | 7
D | 6 | 9
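Here is a minimal sketch that fills in every column the hand calculation needs for Daisy's data. The volume units are not given in the example, so none are assumed. Printing the intermediate columns lets you check each step of your own table, not just the final r.

```python
from math import sqrt

pots = ["A", "B", "C", "D"]
volume = [1, 2, 4, 6]         # x (units not stated in the example)
time_to_boil = [2, 4, 7, 9]   # y, minutes

n = len(volume)
x_bar = sum(volume) / n
y_bar = sum(time_to_boil) / n

# One row per pot: deviations, their product, and their squares
rows = []
for pot, x, y in zip(pots, volume, time_to_boil):
    dx, dy = x - x_bar, y - y_bar
    rows.append((pot, x, y, dx, dy, dx * dy, dx ** 2, dy ** 2))

print("pot  x  y  x-xbar  y-ybar  product  dx^2     dy^2")
for pot, x, y, dx, dy, prod, dx2, dy2 in rows:
    print(f"{pot:>3} {x:2} {y:2} {dx:7.2f} {dy:7.2f} {prod:8.3f} {dx2:7.4f} {dy2:7.2f}")

# Column totals feed straight into the formula for r
sum_prod = sum(row[5] for row in rows)
sum_dx2 = sum(row[6] for row in rows)
sum_dy2 = sum(row[7] for row in rows)
r = sum_prod / sqrt(sum_dx2 * sum_dy2)
print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"totals: {sum_prod}, {sum_dx2}, {sum_dy2}")
print(f"r = {r:.4f}")
```

With these data, $\bar{x} = 3.25$, $\bar{y} = 5.5$, and the column totals are 20.5, 14.75, and 29, giving r ≈ 0.991. That is a very strong positive correlation: pots with more water tended to take longer to boil.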

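Calculating by hand (use STAT EDIT)
To use the long formula, each column in the table needs to be solved.

Pot | $x$ | $y$ | $x-\bar{x}$ | $y-\bar{y}$ | $(x-\bar{x})(y-\bar{y})$ | $(x-\bar{x})^2$ | $(y-\bar{y})^2$
A | 1 | 2 | | | | |
B | 2 | 4 | | | | |
C | 4 | 7 | | | | |
D | 6 | 9 | | | | |
Total | | | | | | |

Calculating the means $\bar{x}$ and $\bar{y}$ should be a cinch!
$\bar{x} = \frac{\sum x}{n} = $ ____, $\bar{y} = \frac{\sum y}{n} = $ ____, $r = $ ____

Try one on your own
The test scores and IQs of five people in Period 7 are given below. By hand, find r and interpret it to describe the type of correlation.

Person | Score ($x$) | IQ ($y$) | $x-\bar{x}$ | $y-\bar{y}$ | $(x-\bar{x})(y-\bar{y})$ | $(x-\bar{x})^2$ | $(y-\bar{y})^2$
1 | 66 | 124 | | | | |
2 | 49 | 126 | | | | |
3 | 55 | 130 | | | | |
4 | 68 | 168 | | | | |
5 | 58 | 101 | | | | |
Total | | | | | | |

Calculating the means $\bar{x}$ and $\bar{y}$ should be a cinch!
$\bar{x} = \frac{\sum x}{n} = $ ____, $\bar{y} = \frac{\sum y}{n} = $ ____, $r = $ ____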

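Try one on your own: solution

Person | Score ($x$) | IQ ($y$) | $x-\bar{x}$ | $y-\bar{y}$ | $(x-\bar{x})(y-\bar{y})$ | $(x-\bar{x})^2$ | $(y-\bar{y})^2$
1 | 66 | 124 | 6.8 | -5.8 | -39.44 | 46.24 | 33.64
2 | 49 | 126 | -10.2 | -3.8 | 38.76 | 104.04 | 14.44
3 | 55 | 130 | -4.2 | 0.2 | -0.84 | 17.64 | 0.04
4 | 68 | 168 | 8.8 | 38.2 | 336.16 | 77.44 | 1459.24
5 | 58 | 101 | -1.2 | -28.8 | 34.56 | 1.44 | 829.44
Total | 296 | 649 | | | 369.2 | 246.8 | 2336.8

$\bar{x} = \frac{296}{5} = 59.2$, $\bar{y} = \frac{649}{5} = 129.8$

$$r = \frac{369.2}{\sqrt{246.8 \times 2336.8}} \approx 0.4862$$

so there is a moderately weak positive correlation between test score and IQ.

Now that you've done this once or twice, try it with seven students:

Score ($x$) | IQ ($y$) | $x-\bar{x}$ | $y-\bar{y}$ | $(x-\bar{x})(y-\bar{y})$ | $(x-\bar{x})^2$ | $(y-\bar{y})^2$
66 | 124 | | | | |
49 | 126 | | | | |
55 | 130 | | | | |
68 | 168 | | | | |
58 | 101 | | | | |
31 | 111 | | | | |
60 | 199 | | | | |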
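A short script can double-check a hand-filled table for the five Period 7 students. This sketch recomputes each column; note that the fourth $(y-\bar{y})^2$ entry is $38.2^2 = 1459.24$, which is what makes that column total come to 2336.8.

```python
from math import sqrt

scores = [66, 49, 55, 68, 58]    # x
iqs = [124, 126, 130, 168, 101]  # y

n = len(scores)
x_bar = sum(scores) / n          # 59.2
y_bar = sum(iqs) / n             # 129.8

products = [(x - x_bar) * (y - y_bar) for x, y in zip(scores, iqs)]
dx_squared = [(x - x_bar) ** 2 for x in scores]
dy_squared = [(y - y_bar) ** 2 for y in iqs]

# One printed row per person: x, y, product, (x - x_bar)^2, (y - y_bar)^2
for row in zip(scores, iqs, products, dx_squared, dy_squared):
    print("  ".join(f"{v:8.2f}" for v in row))

r = sum(products) / sqrt(sum(dx_squared) * sum(dy_squared))
print(f"totals: {sum(products):.1f}, {sum(dx_squared):.1f}, {sum(dy_squared):.1f}")
print(f"r = {r:.4f}")  # r = 0.4862
```

The same script works for the seven-student table by extending the two lists.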

Bug data

Bug | Weight (g) | Length (mm)
1 | 3.84 | 14
2 | 2.11 | 12
3 | 4.8 | 14
4 | 1.95 | 9
5 | 3.44 | 11

Graphing a scatterplot
1) Enter the data in STAT EDIT.
2) Press 2nd STAT PLOT. Make sure the scatterplot is on and that L1 and L2 are the two lists. (Also make sure nothing is entered in Y= and no other stat plots are on.)
3) Press ZOOM 9 (ZoomStat).
4) There you have a scatterplot!
[Figure: scatterplot titled "Bugs", with weight on the x-axis (0 to 6) and length on the y-axis (0 to 16).]

Turning on the r value
1) Press 2nd CATALOG.
2) Press D, scroll down to DiagnosticOn, and press ENTER twice.

Finding the r value (Pearson's correlation coefficient)
1) Enter the data in STAT EDIT.
2) Press STAT, go to CALC, and choose LinReg(ax+b).
3) Select Calculate (on older calculators, press ENTER).
4) There you have r!

r²: the coefficient of determination
To help describe the correlation between two variables, we can also calculate the coefficient of determination, r². This is simply the square of Pearson's product-moment correlation coefficient, so the direction of the correlation is eliminated (r² is never negative).
r describes the direction of the correlation and how strongly the variables are correlated.
r² describes the proportion of the variation in the dependent variable that is explained by its linear relationship with the independent variable; it is often stated as a percentage. In other words: how much of the variation in one variable can be explained by the other variable?
Do not get these confused. The IA specifically states that points should be docked if students mix them up, as evidence that the student doesn't know what each one stands for.
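For checking calculator output, here is a minimal sketch of the least-squares line $y = ax + b$ together with r and r² for the bug data. It uses the standard least-squares formulas; the function name `linreg_ax_b` is our own and only mirrors the name of the calculator command.

```python
from math import sqrt

weight = [3.84, 2.11, 4.8, 1.95, 3.44]   # x, grams
length = [14, 12, 14, 9, 11]             # y, millimetres

def linreg_ax_b(xs, ys):
    """Least-squares line y = ax + b, plus r and r^2."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    a = s_xy / s_xx              # slope
    b = y_bar - a * x_bar        # intercept: the line passes through (x_bar, y_bar)
    r = s_xy / sqrt(s_xx * s_yy)
    return a, b, r, r ** 2

a, b, r, r2 = linreg_ax_b(weight, length)
print(f"a = {a:.4f}, b = {b:.4f}, r = {r:.4f}, r^2 = {r2:.4f}")
```

For these data r is roughly 0.78 and r² roughly 0.61, so about 61% of the variation in length is explained by the linear relationship with weight. The calculator's LinReg(ax+b) output should agree up to rounding.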

Homework
11A: #1, 3, 5
11B.1: #2-6
11B.2: #1-3
11B.3, p. 327: #1-4