PS2.1 & 2.2: Linear Correlations
PS2: Bivariate Statistics
LT1: Basics of Correlation
LT2: Measuring Correlation and Line of Best Fit by Eye

Univariate (one variable)
Displays: frequency tables, bar graphs, histograms, cumulative frequency graphs, box and whisker plots, stem and leaf plots, pie charts.
Statistics:
Measures of center: mean, median, mode.
Measures of spread: range, interquartile range, variance, standard deviation.

Bivariate Statistics
Find how two variables are related:
Height and weight
SAT scores and GPA
Sleep and GPA
Passing a driving test and gender
Which of these are quantitative discrete, quantitative continuous, or qualitative?
The statistics and displays differ from the univariate ones because we want to measure the strength of the association; the techniques vary depending on the type of data.

Bivariate Statistics: What to determine
Determine the type of each variable (categorical, quantitative discrete, or quantitative continuous). Examples: age and length of hair; eye color and shirt brand choice.
Be careful not to mix quantitative and qualitative information. For the mixed forms, you would have to run an ANOVA test, which we will not be covering and which isn't a part of the IB program.
Sometimes students think they have 2 variables when there is really only 1. Examples: height and hair color; race and SAT scores.
Determine which variable is the independent variable and which is the dependent variable.
Independent variable: the predictor variable (x-axis). Ex. age.
Dependent variable: the result of the predictor (y-axis). Ex. weight.

IA Talk: Picking a topic
1) When you pick a topic, you need to focus on a question that you want answered.
2) Students tend to come up with questions that really just name the 2 variables being tested. Instead, come up with a topic you want to explore, and then use the bivariate stats to help you answer the questions.
Examples: Tennis and its effects on achievement. Are blondes happier than brunettes?

Picking a topic and a question to ask
All of you should pick a topic and a question relevant to something you're interested in.
Example: I'm interested in records. Maybe I can see if people who listen to vinyl records are more creative than people who listen to digital music.
What would my title be? Analog Music and Creativity.
What would my big question be? Are people who listen to vinyl more creative?
How would I test this? What do you think?
Let's move to the IA handouts: Writing Guide and Good Ideas vs. Bad Ideas.

Correlation: an overview
Data: Both variables must be quantitative and continuous.*
Correlation: Refers to the strength and direction of the relationship between two variables.
Displays: Scatterplots display the data on a graph.
Statistics: The correlation coefficient, coefficient of determination, and line of best fit help us understand the data.
*A correlation can be done with quantitative discrete data, but gaps appear in the calculations or assumptions are made from the data.

Characteristics of Correlation
There are several characteristics we consider when describing the correlation between two variables: direction, linearity, strength, outliers, and causation.
Direction:
Upward = positive correlation. As the independent variable increases, the dependent variable also increases.
Downward = negative correlation. As the independent variable increases, the dependent variable decreases.
Random = no correlation.
Strength: strong, moderate, or weak (how close are the points to a perfect correlation?).

Linearity
Sometimes the relationship is non-linear. Look: which one is non-linear? Which one is linear?

Outliers
We observe and investigate any outliers, or isolated points which do not follow the trend formed by the main body of data. If an outlier is the result of a recording or graphing error, it should be discarded. However, if the outlier proves to be a genuine piece of data, it should be kept. Sometimes an outlier carries important information about a demographic or a section of the data that isn't otherwise represented.

Causation??
Look at this example: There is a strong positive correlation between children's arm length and their running speed. Does this mean children with longer arms can run faster?
Correlation does not mean causation! The blatant confounding variable? Age.
Another example: In Springfield the number of stray cats was collected, and in Eugene the number of meat items at Taco Bell was collected, over several years, and a strong negative correlation was found between the variables. The implication is that as the number of stray cats decreases, the number of meat dishes increases. This could simply mean that over time, one variable increased and the other decreased.

Examples of Causal Assumptions
Here is one: Detroit has one of the highest arson rates in the nation. It also has one of the most heavily staffed fire districts. If we graph the number of arsons in surrounding districts against the number of firefighters employed, we find that there is a strong positive correlation.
Detroit: total arsons 957, total firefighters 830.
Surrounding districts: total arsons 207, total firefighters 210.
What are we saying about arsons and firefighters? Do firefighters cause more arsons? What is the confounding variable?

10/19/2017

Pearson's Product-Moment Correlation Coefficient
It is important to get a more precise measure of the strength of linear correlation between two variables. We achieve this using Pearson's product-moment correlation coefficient $r$. Calculating $r$ can be tricky, but our calculators will help us out quite a bit.
For a set of $n$ data given as ordered pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, Pearson's correlation coefficient is

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

All values of the PPMCC satisfy $-1 \le r \le 1$. Positive values mean the variables are positively correlated, while negative values mean they are negatively correlated. The size of $r$ indicates the strength.

Pearson's Correlation Coefficient
In examinations you are expected to calculate $r$ using technology. Therefore, you will put the data in your calculator and run LinReg(ax+b) to find $r$. This method is simple and shouldn't be too much of a worry on your paper. However, calculating $r$ using the formula is recommended for the Internal Assessment to get full marks on the mathematical process portion. For practice we will only look at a few points rather than a multitude, due to time. The formula is:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

You may recognize the bottom. What does this look similar to? The top is the covariance term (the covariance multiplied by $n$), which tells us how each point sits compared to the means. So if we test how each point is associated with the means, we come up with $r$, a representation of correlation from -1 to 1.

Pearson's Product-Moment Correlation Coefficient
The other calculation for Pearson's correlation coefficient:

$$r = \frac{s_{xy}}{s_x s_y}$$

where $s_x$ is the standard deviation of $x$, $s_y$ is the standard deviation of $y$, and $s_{xy}$ is the covariance of $x$ and $y$.
Sometimes all we need is to find the standard deviations of $x$ and $y$; with a given covariance, we can then calculate $r$.

What is the precise scale for r?
The scale applies to both positive and negative $r$ values.

Size of r     Description
1.0           Perfect correlation
.9 - .99      Very strong
.7 - .89      Strong
.5 - .69      Moderately strong
.5            Moderate
.3 - .49      Moderately weak
.1 - .29      Weak
.0 - .09      Very weak
0             Perfectly no correlation
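
As a rough sketch of how the scale above could be applied, the Python function below turns a value of r into a verbal description. The function name `describe_strength` is hypothetical, and the treatment of the boundaries is an assumption: the slide lists rounded ranges, so values that fall between two listed ranges (for example 0.895) are assigned to the lower band here, and only a value of exactly 0.5 is labelled "moderate".

```python
def describe_strength(r: float) -> str:
    """Describe r in words using the scale above (boundary handling is an assumption)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the scale applies to both positive and negative r
    if size == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        strength = "perfect"
    elif size >= 0.9:
        strength = "very strong"
    elif size >= 0.7:
        strength = "strong"
    elif size > 0.5:
        strength = "moderately strong"
    elif size == 0.5:
        strength = "moderate"
    elif size >= 0.3:
        strength = "moderately weak"
    elif size >= 0.1:
        strength = "weak"
    else:
        strength = "very weak"
    return f"{strength}, {direction}"


print(describe_strength(0.4862))  # moderately weak, positive
print(describe_strength(-0.95))   # very strong, negative
```

The description always pairs strength with direction, which matches how the scatterplot patterns in this lesson are named.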

Scatterplot Patterns: Name the correlation.
[Four scatterplots] Strong, positive. Moderately strong, positive. Moderately weak, positive. No correlation.

Scatterplot Patterns: Continue to name the correlation.
[Four scatterplots] Strong, negative. Moderately strong, negative. Weak, negative. No correlation.

You try.
Give some examples of variables that would have a positive correlation or a negative correlation. Try to identify the independent variable and the dependent variable.

Calculating by hand
Using the long equation

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

let's look at an example and find the values needed to find $r$.
Example: Daisy investigates how the volume of water in a pot affects the time it takes to boil on the stove. The results are given in the table. Find and interpret Pearson's correlation coefficient between the two variables.

Pot   Volume (x)   Time to boil (y, min)
A     1            2
B     2            4
C     4            7
D     6            9
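
As a way to check the by-hand working for Daisy's data, here is a minimal Python sketch of the sum formula for r. The function name `pearson_r` and the variable names are hypothetical; the data values are the ones in the table above.

```python
from math import sqrt

# Daisy's data from the table: pot volume (x) and time to boil in minutes (y)
volume = [1, 2, 4, 6]
time_to_boil = [2, 4, 7, 9]


def pearson_r(xs, ys):
    """Pearson's r from the sum formula: sum of products over root of the two sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of x
    y_bar = sum(ys) / n  # mean of y
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


r = pearson_r(volume, time_to_boil)
print(round(r, 3))  # 0.991
```

Each intermediate quantity in the function corresponds to one column total in a by-hand table, so the same sums can be compared step by step.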
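
Calculating by hand (use Stat Edit)

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

Each portion of the formula needs to be solved.

Pot     Volume (x)   Time to boil (y, min)   $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$   $(y-\bar{y})^2$
A       1            2
B       2            4
C       4            7
D       6            9
Total

Calculating the means $\bar{x}$ and $\bar{y}$ should be a cinch!
$\bar{x} = \frac{\sum x}{n} =$ ___, $\bar{y} = \frac{\sum y}{n} =$ ___, $r =$ ___

Try One On Your Own
Period 7's test scores and IQs for 5 people are as follows. By hand, find what type of correlation there is by interpreting $r$.

Person   Score (x)   IQ (y)   $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$   $(y-\bar{y})^2$
1        66          124
2        49          126
3        55          130
4        68          168
5        58          101
Total

Calculating the means $\bar{x}$ and $\bar{y}$ should be a cinch!
$\bar{x} = \frac{\sum x}{n} =$ ___, $\bar{y} = \frac{\sum y}{n} =$ ___, $r =$ ___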

Try One On Your Own (worked)
Period 7's test scores and IQs for 5 people are as follows. By hand, find what type of correlation there is by interpreting $r$.

Person   Score (x)   IQ (y)   $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$   $(y-\bar{y})^2$
1        66          124      6.8           -5.8          -39.44                     46.24             33.64
2        49          126      -10.2         -3.8          38.76                      104.04            14.44
3        55          130      -4.2          0.2           -0.84                      17.64             0.04
4        68          168      8.8           38.2          336.16                     77.44             1459.24
5        58          101      -1.2          -28.8         34.56                      1.44              829.44
Total    296         649                                  369.2                      246.8             2336.8

$\bar{x} = \frac{296}{5} = 59.2$, $\bar{y} = \frac{649}{5} = 129.8$

$$r = \frac{369.2}{\sqrt{246.8 \times 2336.8}} \approx 0.4862$$

Now that you've done this once or twice: Click Here for something amazing by an amazing person. Start at 1:21.

Score (x)   IQ (y)   $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$   $(y-\bar{y})^2$
66          124
49          126
55          130
68          168
58          101
31          111
60          199
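
The five-person result can also be cross-checked with the other form of the formula, r = s_xy / (s_x s_y), where s_xy is the covariance and s_x, s_y are the standard deviations. The Python sketch below is one way to do that; the helper names are hypothetical, and it assumes the population versions (dividing by n) of both the covariance and the standard deviations. Using the sample versions (dividing by n - 1) throughout would give the same r, since the factor cancels.

```python
from math import sqrt

# Test scores (x) and IQs (y) for the five people in the worked table
scores = [66, 49, 55, 68, 58]
iqs = [124, 126, 130, 168, 101]


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


def std_dev(values):
    """Population standard deviation: square root of a variable's covariance with itself."""
    return sqrt(covariance(values, values))


s_xy = covariance(scores, iqs)
s_x = std_dev(scores)
s_y = std_dev(iqs)
r = s_xy / (s_x * s_y)
print(round(s_xy, 2), round(r, 4))  # 73.84 0.4862
```

Both forms of the formula agree because dividing the top and bottom of the sum formula by n turns the sums into the covariance and the two variances.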

Bug   Weight (g)   Length (mm)
1     3.84         14
2     2.11         12
3     4.8          14
4     1.95         9
5     3.44         11

Graphing a Scatterplot
1) Enter the data in Stat Edit.
2) Press 2nd, then STAT PLOT. Make sure the scatterplot is on and L1 and L2 are the two lists (also make sure nothing is in Y= and no other stat plots are on).
3) Press Zoom 9, or choose ZoomStat from the Zoom menu.
4) There you have a scatterplot!

[Scatterplot: Bugs, length (mm) against weight (g)]

Turn on the r value
1) Press 2nd, then Catalog.
2) Press D, scroll down to DiagnosticOn, press Enter, then Enter again.

Finding the r value (Pearson's Correlation Coefficient)
1) Enter the data in Stat Edit.
2) Press Stat, go to CALC, and choose LinReg(ax+b).
3) Press Calculate (on older calculators, press Enter).
4) There you have r!

$r^2$: The Coefficient of Determination
To help describe the correlation between two variables, we can also calculate the coefficient of determination $r^2$. This is simply the square of Pearson's product-moment correlation coefficient, and as such the direction of the correlation is eliminated.
$r$ describes the direction of the correlation and how strongly the variables are correlated.
$r^2$ describes how much of the variation in one variable is accounted for by its linear relationship with the other, usually stated as a percent. In other words: how much does a given variable depend on the other variable?
Do not get these confused. The IA specifically states to dock points if students mix these up, as evidence that the student doesn't know what $r^2$ stands for.
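
To see how r and r² relate on the bug data, here is a short Python sketch that could be used to check the calculator output. It assumes weight is the independent variable (x) and length the dependent variable (y); the function and variable names are hypothetical.

```python
from math import sqrt

# Bug data from the table: weight in grams (x) and length in millimetres (y)
weight_g = [3.84, 2.11, 4.8, 1.95, 3.44]
length_mm = [14, 12, 14, 9, 11]


def pearson_r(xs, ys):
    """Pearson's r from the sum formula used earlier in the lesson."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


r = pearson_r(weight_g, length_mm)
r_squared = r ** 2  # squaring removes the sign, so direction is lost
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")  # r = 0.784, r^2 = 0.614
```

Read together: r says the correlation is positive and strong on the scale used in this lesson, while r² says roughly 61% of the variation in length is accounted for by the linear relationship with weight.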

Homework
11A #1, 3, 5
11B.1 #2-6
11B.2 #1-3
11B.3 P. 327 #1-4