Correlation. Two variables: Which test? Relationship Between Two Numerical Variables.


Correlation

Two variables: Which test? (X = explanatory variable, Y = response variable)

Categorical X, categorical Y: contingency table, grouped bar graph, mosaic plot; contingency analysis
Numerical X, categorical Y: logistic regression
Categorical X, numerical Y: multiple histograms, cumulative frequency distributions; t-test
Numerical X, numerical Y: scatter plot; correlation, regression

Relationship Between Two Numerical Variables

Correlation
What is the tendency of two numerical variables to co-vary (change together)?
The correlation coefficient r measures the strength and direction of the linear association between two numerical variables.
Population parameter: ρ (rho)
Sample estimate: r

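\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \]

The numerator is the sum of products; the denominator contains the sums of squares for X and Y.

Shortcuts

Sum of products:
\[ \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n} \]

Sum of squares: X and Y
\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n} \]
\[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} \]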
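The shortcut formulas can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names and the five data points are invented purely for illustration.

```python
import math

def sums_of_products_and_squares(x, y):
    """Return (sum of products, sum of squares X, sum of squares Y)
    using the shortcut formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sp = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    return sp, ss_x, ss_y

def correlation(x, y):
    """Sample correlation coefficient r = SP / sqrt(SS_X * SS_Y)."""
    sp, ss_x, ss_y = sums_of_products_and_squares(x, y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

sp, ss_x, ss_y = sums_of_products_and_squares(x, y)
r = correlation(x, y)
print(sp, ss_x, ss_y, r)
```

For these made-up values the sum of products is 8 and both sums of squares are 10, so r = 8 / 10 = 0.8.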

Correlation assumes...
Random sample
X is normally distributed with equal variance for all values of Y
Y is normally distributed with equal variance for all values of X
That is, X and Y follow a bivariate normal distribution.

Correlation coefficient facts
-1 ≤ ρ ≤ 1; -1 ≤ r ≤ 1
Positive r: variables increase together
Negative r: when one variable increases, the other decreases, and vice versa
[Figure: three scatter plots labelled negative (r = -1), uncorrelated (r = 0), and positive (r = 1)]
Coefficient of determination = r^2: describes the proportion of variation in one variable that can be predicted from the other
Standard error of r:
\[ SE_r = \sqrt{\frac{1 - r^2}{n - 2}} \]
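As a quick illustration of the last two facts, here is a small Python sketch. It is an assumption-laden example rather than anything from the slides: the function name is hypothetical, and the inputs r = 0.5435 and n = 31 are borrowed from the Drosophila example in this transcript.

```python
import math

def standard_error_r(r, n):
    """Standard error of r: sqrt((1 - r^2) / (n - 2))."""
    return math.sqrt((1.0 - r * r) / (n - 2))

# Values taken from the Drosophila example in this transcript.
r, n = 0.5435, 31

r_squared = r * r            # coefficient of determination
se_r = standard_error_r(r, n)
print(round(r_squared, 3), round(se_r, 3))
```

The result may differ from the slide's 0.1558 in the last digit, depending on how intermediate values are rounded.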

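Confidence Limits for r

Transform r to the z scale (Fisher's z-transformation):
\[ z = 0.5 \ln\left(\frac{1 + r}{1 - r}\right), \qquad \sigma_z = \sqrt{\frac{1}{n - 3}} \]

Confidence limits on the z scale, where ζ is the population value of z:
\[ z - Z_{0.05(2)}\,\sigma_z < \zeta < z + Z_{0.05(2)}\,\sigma_z \]
\[ z - 1.96\,\sigma_z < \zeta < z + 1.96\,\sigma_z \]

Convert each limit back to the correlation scale:
\[ r = \frac{e^{2z} - 1}{e^{2z} + 1} \]

Example
Are the effects of new mutations on mating success and productivity correlated?
Data from Drosophila melanogaster, n = 31 individuals
X is productivity, Y is the mating success

Sum of products: \( \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = 2.796 \)
Sum of squares for X: \( \sum_{i=1}^{n} (X_i - \bar{X})^2 = 16.245 \)
Sum of squares for Y: \( \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = 1.6289 \)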

\[ r = \frac{2.796}{\sqrt{(16.245)(1.6289)}} = 0.5435 \]

\[ SE_r = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{0.7045}{29}} = 0.1558 \]

Confidence Limits for r
\[ z = 0.5 \ln\left(\frac{1 + r}{1 - r}\right) = 0.5 \ln\left(\frac{1 + 0.5435}{1 - 0.5435}\right) = 0.609 \]
\[ \sigma_z = \sqrt{\frac{1}{n - 3}} = \sqrt{\frac{1}{31 - 3}} = 0.189 \]

With z = 0.609 and σ_z = 0.189, the limits on the z scale are
\[ z - 1.96\,\sigma_z < \zeta < z + 1.96\,\sigma_z \]
\[ 0.609 - 1.96 \times 0.189 < \zeta < 0.609 + 1.96 \times 0.189 \]
\[ 0.239 < \zeta < 0.979 \]

Back-transforming each limit with \( r = \frac{e^{2z} - 1}{e^{2z} + 1} \):
\[ \frac{e^{2 \times 0.239} - 1}{e^{2 \times 0.239} + 1} < \rho < \frac{e^{2 \times 0.979} - 1}{e^{2 \times 0.979} + 1} \]
\[ 0.235 < \rho < 0.753 \]
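The confidence-limit calculation above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `fisher_confidence_limits` is hypothetical, and it relies on two standard identities, namely that 0.5 ln((1 + r)/(1 - r)) equals `math.atanh(r)` and that the back-transformation (e^{2z} - 1)/(e^{2z} + 1) equals `math.tanh(z)`.

```python
import math

def fisher_confidence_limits(r, n, z_crit=1.96):
    """Approximate confidence limits for rho via Fisher's z-transformation.

    z_crit is the two-tailed standard normal critical value
    (1.96 for a 95% interval, as used in the slides).
    """
    z = math.atanh(r)                  # 0.5 * ln((1 + r) / (1 - r))
    sigma_z = math.sqrt(1.0 / (n - 3))
    z_low = z - z_crit * sigma_z
    z_high = z + z_crit * sigma_z
    # Back-transform: tanh(z) = (e^(2z) - 1) / (e^(2z) + 1)
    return math.tanh(z_low), math.tanh(z_high)

# Drosophila example from the slides: r = 0.5435, n = 31.
lower, upper = fisher_confidence_limits(0.5435, 31)
print(round(lower, 3), round(upper, 3))
```

Because the code does not round z and σ_z to three decimals before back-transforming, the limits can differ from the slide's 0.235 and 0.753 in the third decimal place.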

Example: Why Sleep?
10 experimental subjects
Measured increase in slow-wave activity during sleep
Measured improvement in task after sleep (hand-eye coordination activity)

Why sleep?
Sum of products: 1127.4
Sum of squares X: 2052.4
Sum of squares Y: 830.9
Calculate a 95% C.I. for ρ.

Hypothesis Testing for Correlations
Can test hypotheses relating to correlations among variables
Closely related to regression, the topic for next Tuesday's lecture

H_0: ρ = 0
H_A: ρ ≠ 0

If ρ = 0, r is approximately normally distributed with mean 0, and the test statistic is
\[ t = \frac{r}{SE_r} \quad \text{with } df = n - 2 \]

Example
Are the effects of new mutations on mating success and productivity correlated?
Data from Drosophila melanogaster

Hypotheses
H_0: Mating success and productivity are not related (ρ = 0)
H_A: Mating success and productivity are correlated (ρ ≠ 0)

X is productivity, Y is the mating success
Sum of products = 2.796
Sum of squares for X = 16.245
Sum of squares for Y = 1.6289

\[ r = \frac{2.796}{\sqrt{(16.245)(1.6289)}} = 0.5435 \]
\[ SE_r = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{0.7045}{29}} = 0.1558 \]
\[ t = \frac{0.5435}{0.1558} = 3.49 \]

df = n - 2 = 31 - 2 = 29
t = 3.49 is greater than t_{0.05(2), 29} = 2.045, so we can reject the null hypothesis and say that productivity and male mating success are correlated across genotypes.

Why sleep?
Sum of products: 1127.4
Sum of squares X: 2052.4
Sum of squares Y: 830.9
Test for a correlation different from zero in these data.

Checking Assumptions for Correlation
Bivariate normal distribution
Relationship is linear (straight line)
Cloud of points in scatter plot is circular or elliptical
Frequency distributions of X and Y are normal

Linear Relationship?
[Figure: scatter plots labelled "Maximum correlation possible" and "Correlation of zero"]

Cloud of points elliptical?
[Figure: scatter plots labelled "Maximum correlation possible" and "Correlation of zero"]

X and Y normal?
Use usual techniques for both X and Y separately
Be wary of outliers

Quick Reference Guide - Correlation Coefficient
What is it for? Measuring the strength of a linear association between two numerical variables
What does it assume? Bivariate normality and random sampling
Parameter: ρ
Estimate: r
Formulae:
\[ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \; \sum (Y_i - \bar{Y})^2}}, \qquad SE_r = \sqrt{\frac{1 - r^2}{n - 2}} \]

Quick Reference Guide - t-test for zero linear correlation
What is it for? To test the null hypothesis that the population parameter, ρ, is zero
What does it assume? Bivariate normality and random sampling
Test statistic: t
Null distribution: t with n - 2 degrees of freedom
Formulae:
\[ t = \frac{r}{SE_r} \]

T-test for correlation
Sample: compute the test statistic t = r / SE_r
Null hypothesis: ρ = 0
Null distribution: t with n - 2 d.f.
Compare: how unusual is this test statistic?
P < 0.05: reject H_0
P > 0.05: fail to reject H_0
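The test summarised in this guide can be sketched in Python as follows. This is a hedged illustration, not the lecturer's code: the function name `correlation_t_test` is hypothetical, and because the Python standard library has no t distribution, the critical value is passed in by the caller (here 2.045, the value of t_{0.05(2), 29} quoted in the slides).

```python
import math

def correlation_t_test(sum_products, ss_x, ss_y, n, t_critical):
    """t-test of H0: rho = 0 from the sum of products and sums of squares.

    t_critical must be looked up for df = n - 2 (a statistical table or
    a statistics library); it is not computed here.
    Returns (r, t, df, reject_h0).
    """
    r = sum_products / math.sqrt(ss_x * ss_y)
    se_r = math.sqrt((1.0 - r * r) / (n - 2))
    t = r / se_r
    df = n - 2
    reject_h0 = abs(t) > t_critical   # two-tailed comparison
    return r, t, df, reject_h0

# Drosophila example from the slides.
r, t, df, reject_h0 = correlation_t_test(
    sum_products=2.796, ss_x=16.245, ss_y=1.6289, n=31, t_critical=2.045
)
print(round(r, 4), round(t, 2), df, reject_h0)
```

Passing the critical value in keeps the sketch dependency-free; with a statistics package available, a P-value could be computed from the t distribution instead.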