Biostatistics: Correlations


One of the most common errors we find in the press is the confusion between correlation and causation in scientific and health-related studies. In theory, these are easy to distinguish: an action or occurrence can cause another (such as smoking causes lung cancer), or it can correlate with another (such as smoking is correlated with alcoholism). If one action causes another, then they are most certainly correlated. But just because two things occur together does not mean that one caused the other, even if it seems to make sense.

One way to get a general idea about whether or not two variables are related is to plot them on a scatterplot. If the dots on the scatterplot tend to go from the lower left to the upper right, it means that as one variable goes up, the other variable tends to go up also. This is called a direct (or positive) relationship. On the other hand, if the dots on the scatterplot tend to go from the upper left corner to the lower right corner, it means that as values on one variable go up, values on the other variable go down. This is called an indirect (or negative) relationship.

[Scatterplot panels: correlations between X and Y, showing weak positive, strong positive, no correlation, weak negative, strong negative, and nonlinear relationships.]

Spurious correlations: http://www.huffingtonpost.com/2014/05/15/spurious-correlations-graphs-make-no-sense-video_n_5325835.html

Example 1: Is there a direct or indirect correlation? Is it fair to conclude there is a causal relationship?
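As a rough illustration (this sketch is not part of the original handout and assumes Python with numpy and matplotlib installed), the following code simulates and plots one direct and one indirect relationship:

```python
# Minimal sketch: two scatterplots, one direct (positive) and one indirect (negative).
# The data are simulated purely for illustration.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=50)
y_direct = 2 * x + rng.normal(0, 3, size=50)        # dots run lower left to upper right
y_indirect = 20 - 2 * x + rng.normal(0, 3, size=50)  # dots run upper left to lower right

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.scatter(x, y_direct)
ax1.set_title("Direct (positive) relationship")
ax2.scatter(x, y_indirect)
ax2.set_title("Indirect (negative) relationship")
for ax in (ax1, ax2):
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
plt.tight_layout()
plt.show()
```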

Example 2: Is there a direct or indirect correlation? Is it fair to conclude there is a causal relationship?

Example 3: Is there a direct or indirect correlation? Is it fair to conclude there is a causal relationship?

How can one best determine if there is a causal relationship between two variables? The most effective way of doing this is through a controlled study. In a controlled study, two groups of people who are comparable in almost every way are given two different sets of experiences (such as one group watching soap operas and the other game shows), and the outcomes are compared. If the two groups have substantially different outcomes, then the different experiences may have caused the different outcomes.

Biostatistics: Correlation Coefficient (r)

A really smart guy named Karl Pearson figured out how to calculate a summary number that allows you to answer the question, "How strong is the relationship?" In honor of his genius, the statistic was named after him. It is called Pearson's Correlation Coefficient (r).
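The formula and worked table in the original handout appear to have been images that did not survive transcription. The computational form of Pearson's r that Steps 1 through 10 below walk through is presumably the standard one:

r = (ΣXY − (ΣX)(ΣY)/n) / √[ (ΣX² − (ΣX)²/n) (ΣY² − (ΣY)²/n) ]

where n is the number of pairs of scores, ΣXY is the sum of the cross-products, and ΣX² and ΣY² are the sums of the squared X and Y scores.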

Step 1: Calculate and fill in the X² and Y² values.

Step 2: Multiply each X score by its paired Y score, which will give you the cross-products of X and Y.

Step 3: Fill in the last row of the table, which contains all of your "Sum of" statements. In other words, just add up all of the X scores to get ΣX, all of the X² scores to get ΣX², and so on.

Step 4: Enter the numbers you have calculated in the spaces where they belong in the formula.

Step 5: Multiply (ΣX)(ΣY) in the numerator (the top part of the formula) and do the squaring for (ΣX)² and (ΣY)² in the denominator (the bottom part of the formula).

Step 6: Do the "division by n" parts of the formula.

Step 7: Do the subtraction parts of the formula.

Step 8: Multiply the numbers in the denominator.

Step 9: Take the square root of the denominator.

Step 10: Finally, divide the numerator by the denominator and you will get the Correlation Coefficient!

What Good Is A Correlation Coefficient?

As you can see above, we just did a whole lot of calculating just to end up with a single number. How ridiculous is that? Seems kind of like a waste of time, huh? Well, guess again! It is actually very cool! ("Yeah, right!" you say, but let me explain.)

Important Things Correlation Coefficients Tell You

1. The direction of the relationship: If your correlation coefficient is a negative number, you can tell, just by looking at it, that there is an indirect, negative relationship between the two variables. As you may recall, a negative relationship means that as values on one variable increase (go up), the values on the other variable tend to decrease (go down) in a predictable manner. If your correlation coefficient is a positive number, then you know that you have a direct, positive relationship. This means that as one variable increases (or decreases), the values of the other variable tend to go in the same direction: if one increases, so does the other; if one decreases, so does the other, in a predictable manner.

2. Perfect and zero relationships:

a. A correlation coefficient of -1.00 tells you that there is a perfect negative relationship between the two variables. This means that as values on one variable increase, there is a perfectly predictable decrease in values on the other variable. In other words, as one variable goes up, the other goes in the opposite direction (it goes down).

b. A correlation coefficient of +1.00 tells you that there is a perfect positive relationship between the two variables. This means that as values on one variable increase, there is a perfectly predictable increase in values on the other variable. In other words, as one variable goes up, so does the other.

c. A correlation coefficient of 0.00 tells you that there is a zero correlation, or no relationship, between the two variables. In other words, as one variable changes (goes up or down), you can't really say anything about what happens to the other variable.

3. The strength of the relationship:

a. Most correlation coefficients (assuming there really is a relationship between the two variables you are examining) tend to be somewhat lower than plus or minus 1.00 (meaning that they are not perfect relationships) but somewhat above 0.00. Remember that a correlation coefficient of 0.00 means that there is no relationship between your two variables based on the data you are looking at.

b. The closer a correlation coefficient is to 0.00, the weaker the relationship is and the less able you are to tell exactly what happens to one variable based on knowledge of the other variable. The closer a correlation coefficient approaches plus or minus 1.00, the stronger the relationship is and the more accurately you are able to predict what happens to one variable based on the knowledge you have of the other variable.

Making Statistical Inferences from Pearson's r

How do you determine whether your correlation is simply a chance occurrence or whether it really holds in the population? You will need three things in order to determine whether you can infer that the relationship you found in your sample is also true of (in other words, is generalizable to) the larger population:

1. The Correlation Coefficient that you calculated.

2. Something called the degrees of freedom, which is simply the number of pairs of data in your sample minus 2: df = (number of pairs) − 2. For example, with 10 pairs of scores, df = 10 − 2 = 8.

3. A table of critical values of the correlation coefficient.

Pearson's r Critical Values

We'll always use the 0.05 level of significance. The first thing you need to do is look down the degrees of freedom column until you see the row that matches your sample's degrees of freedom. Then look across to the number listed under .05. This number is called the critical value of r.

Critical r = ______

SO WHAT? If your calculated r value (ignoring its sign) is equal to or greater than the number in the table, you conclude that the correlation is statistically significant. If your calculated r value is smaller than the number in the table, you conclude that the correlation is not statistically significant.
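To tie the pieces together, here is a minimal Python sketch (not part of the original handout) that carries out the ten calculation steps and the critical-value check on a small set of hypothetical paired scores. The scores are made up purely for illustration, and the hardcoded 0.632 is the commonly tabled critical r for df = 8 at the .05 level (two-tailed); check it against your own table.

```python
# Sketch of the ten-step Pearson's r calculation and the significance check.
from math import sqrt

x = [2, 4, 5, 6, 6, 7, 8, 9, 10, 12]   # hypothetical X scores
y = [1, 3, 4, 4, 6, 6, 7, 8, 9, 11]    # hypothetical paired Y scores
n = len(x)                              # number of pairs

# Steps 1-3: the squared scores, cross-products, and "Sum of" quantities
sum_x  = sum(x)
sum_y  = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Steps 4-10: plug the sums into the computational formula
numerator   = sum_xy - (sum_x * sum_y) / n
denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = numerator / denominator

# Significance at the .05 level: compare |r| to the tabled critical value
df = n - 2                  # degrees of freedom = number of pairs minus 2
critical_r = 0.632          # tabled critical r for df = 8, alpha = .05 (two-tailed)
significant = abs(r) >= critical_r

print(f"r = {r:.3f}, df = {df}, statistically significant at .05: {significant}")
```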