Paired Data and Linear Correlation


Example 1. A group of calculus students has taken two quizzes. These are their scores:

Student   1st Quiz Score (x data)   2nd Quiz Score (y data)
1         17                        15
2         15                        20
3         0                         3
4         10                        15
5         15                        15
6         10                        8
7         20                        20
8         15                        17
9         8                         10
10        5                         10

Graphing the scores in a rectangular coordinate system yields the so-called scatter diagram.

[Figure: scatter diagram of 2nd quiz score (y) against 1st quiz score (x)]

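There is an obvious relation between the scores on the quizzes. Better scores on the first quiz seem to imply better scores on the second quiz, and worse scores on the first quiz imply worse scores on the second quiz. We measure this correlation with the Pearson (Linear Product Moment) Coefficient of Correlation.

Pearson Coefficient of Correlation Calculation Formulas

For a population of size $n$, with means $\bar{x}$, $\bar{y}$ and population standard deviations $\sigma_x$, $\sigma_y$:

$$ r = \frac{\sum z_x z_y}{n} = \frac{\dfrac{\sum xy}{n} - \bar{x}\,\bar{y}}{\sigma_x \sigma_y} \tag{1.1} $$

x      y      xy
17     15     255
15     20     300
0      3      0
10     15     150
15     15     225
10     8      80
20     20     400
15     17     255
8      10     80
5      10     50

Σx = 115    Σy = 133    Σxy = 1795

$$ r = \frac{\dfrac{1795}{10} - 11.5 \cdot 13.3}{5.7489 \cdot 5.1778} \approx 0.892 $$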
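The calculation with formula (1.1) can be reproduced with a short Python sketch. The function name `pearson_population` is chosen here for illustration and is not part of the lecture. The score lists are the Example 1 data as read from the table.

```python
from math import sqrt

# Example 1 quiz scores (x = 1st quiz, y = 2nd quiz), as read from the table.
x = [17, 15, 0, 10, 15, 10, 20, 15, 8, 5]
y = [15, 20, 3, 15, 15, 8, 20, 17, 10, 10]


def pearson_population(xs, ys):
    """Formula (1.1): r = (sum(xy)/n - mean_x * mean_y) / (sigma_x * sigma_y).

    sigma_x and sigma_y are population standard deviations (divide by n).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sigma_x = sqrt(sum(v * v for v in xs) / n - mean_x ** 2)
    sigma_y = sqrt(sum(v * v for v in ys) / n - mean_y ** 2)
    mean_xy = sum(a * b for a, b in zip(xs, ys)) / n
    return (mean_xy - mean_x * mean_y) / (sigma_x * sigma_y)


r = pearson_population(x, y)
print(round(r, 3))  # 0.892
```

The intermediate values match the hand calculation: the sums are 115, 133 and 1795, and the two standard deviations come out near 5.7489 and 5.1778.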

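In case we are dealing with a sample (of size $n$) and using sample parameters, then using the sample standard deviations $s_x$, $s_y$ yields the following:

$$ r = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{(n-1)\, s_x s_y} \tag{1.2} $$

Note that we are getting the same number, only using different formulas. Here is one more formula often used as well, given by our textbook. It follows from (1.2) by writing the standard deviations in terms of sums:

$$ r = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{(n-1)\, s_x s_y} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{n}}\;\sqrt{\sum y^2 - \dfrac{(\sum y)^2}{n}}} = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\;\sqrt{n\sum y^2 - (\sum y)^2}} \tag{1.3} $$

The easiest one to use is probably (1.2), but there you need to calculate standard deviations for both x and y before you use the formula. Formula (1.3) is useful if we do not want to calculate standard deviations, but you need to calculate squares of the data points. This is the formula we are going to use most often.

Let's calculate the Pearson Correlation Coefficient for our example using formulas (1.2) and (1.3). We need a slightly expanded table:

x      y      xy     x²     y²
17     15     255    289    225
15     20     300    225    400
0      3      0      0      9
10     15     150    100    225
15     15     225    225    225
10     8      80     100    64
20     20     400    400    400
15     17     255    225    289
8      10     80     64     100
5      10     50     25     100

Σx = 115    Σy = 133    Σxy = 1795    Σx² = 1653    Σy² = 2037

Using formula (1.2):

$$ r = \frac{1795 - 10 \cdot 11.5 \cdot 13.3}{9 \cdot 6.059886 \cdot 5.457920} \approx 0.892 $$

And one more time using formula (1.3):

$$ r = \frac{10 \cdot 1795 - 115 \cdot 133}{\sqrt{10 \cdot 1653 - 115^2}\;\sqrt{10 \cdot 2037 - 133^2}} \approx 0.892 $$

Example 2. Instead of quiz #1 we use the shoe size of the student. Here is the data, summarized in the table: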

Student   Student's Shoe Size (x data)   2nd Quiz Score (y data)
1         10.5                           15
2         10                             20
3         11                             3
4         7                              15
5         10                             15
6         8.5                            8
7         8.5                            20
8         12                             17
9         9                              10
10        8                              10

Here is the scatter diagram.

[Figure: scatter diagram of 2nd quiz score (y) against shoe size (x)]

The diagram does not indicate any connection between shoe size and quiz #2 score. We should see this from the Pearson correlation coefficient as well.
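As a quick numerical check of that expectation, the following Python sketch applies formulas (1.2) and (1.3) to the shoe-size data. The function names `r_from_sample_sd` and `r_from_sums` are illustrative choices, not names from the lecture. Exact sums are used, so no intermediate value is rounded.

```python
from math import sqrt
from statistics import mean, stdev

# Example 2 data: shoe size (x) and 2nd quiz score (y), as read from the table.
shoe = [10.5, 10, 11, 7, 10, 8.5, 8.5, 12, 9, 8]
quiz2 = [15, 20, 3, 15, 15, 8, 20, 17, 10, 10]


def r_from_sample_sd(xs, ys):
    """Formula (1.2): (sum(xy) - n*mean_x*mean_y) / ((n-1) * s_x * s_y)."""
    n = len(xs)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    return (sum_xy - n * mean(xs) * mean(ys)) / ((n - 1) * stdev(xs) * stdev(ys))


def r_from_sums(xs, ys):
    """Formula (1.3): uses only n and the five sums, no standard deviations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))


r12 = r_from_sample_sd(shoe, quiz2)
r13 = r_from_sums(shoe, quiz2)
print(round(r12, 4), round(r13, 4))  # 0.0087 0.0087
```

Both functions agree to floating-point precision, which is the point of the remark that the formulas give the same number by different routes.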

Calculation shows

x      y      xy      x²       y²
10.5   15     157.5   110.25   225
10     20     200     100      400
11     3      33      121      9
7      15     105     49       225
10     15     150     100      225
8.5    8      68      72.25    64
8.5    20     170     72.25    400
12     17     204     144      289
9      10     90      81       100
8      10     80      64       100

Σx = 94.5    Σy = 133    Σxy = 1257.5    Σx² ≈ 914 (913.75 before rounding)    Σy² = 2037

Using formula (1.2):

$$ r = \frac{1257.5 - 10 \cdot 9.45 \cdot 13.3}{9 \cdot 1.5175 \cdot 5.4579} \approx 0.0087 $$

Using formula (1.3):

$$ r = \frac{10 \cdot 1257.5 - 94.5 \cdot 133}{\sqrt{10 \cdot 914 - 94.5^2}\;\sqrt{10 \cdot 2037 - 133^2}} \approx 0.00867 $$

Both approximations rounded off to the ten-thousandth give 0.0087. This is a very small number and indicates that the linear correlation between x and y is insignificant.

Example 3. This is a random sample for a Boston Park mugging problem. The sample of paired data has been taken over 10 randomly chosen days in the summer of 2000. The first entry in the pair is the number of police officers on duty in the park, the second entry is the number of muggings reported on that day.

Day   Police officers on duty in the park (x)   Number of reported muggings (y)
1     10                                        5
2     15                                        2
3     16                                        1
4     1                                         9
5     4                                         7
6     6                                         8
7     18                                        1
8     12                                        5
9     14                                        3
10    7                                         6

[Figure: scatter diagram of reported muggings (y) against police officers on duty (x)]

The diagram indicates significant linear correlation. We would say the correlation is negative, meaning the fitting line is decreasing (more police officers associated with fewer muggings, fewer police officers associated with more muggings).

Correlation coefficient calculation is easy, here using formula (1.3), which does not require calculation of standard deviations.

x      y      xy     x²     y²
10     5      50     100    25
15     2      30     225    4
16     1      16     256    1
1      9      9      1      81
4      7      28     16     49
6      8      48     36     64
18     1      18     324    1
12     5      60     144    25
14     3      42     196    9
7      6      42     49     36

Σx = 103    Σy = 47    Σxy = 343    Σx² = 1347    Σy² = 295

$$ r = \frac{10 \cdot 343 - 103 \cdot 47}{\sqrt{10 \cdot 1347 - 103^2}\;\sqrt{10 \cdot 295 - 47^2}} = \frac{-1411}{\sqrt{2861}\,\sqrt{741}} \approx -0.9691 $$

Important notes about the Pearson Correlation Coefficient (discussed in lecture!) ***

1. The Correlation Coefficient is used to determine linear correlation; it cannot indicate any other type of correlation. Non-linear correlation is not measured or expressed by the correlation coefficient.
2. The Correlation Coefficient is a number between -1 and 1. When it is close to -1 the correlation is negative linear, when it is close to 1 the correlation is positive linear. When the CC is close to 0 there is no significant linear correlation between x and y.
3. Positive or negative linear correlation does not mean causation.

Homework: Check online.
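Note 1 can be illustrated with a small Python sketch. The data below are made up for this illustration and do not come from the lecture: the first set lies exactly on a parabola, the second exactly on a line. The helper `pearson_r` is a hypothetical name and implements formula (1.3).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Formula (1.3), computed from n and the five sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))


# Hypothetical data: x is symmetric about 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
parabola = [v * v for v in xs]     # y = x^2, a perfect but non-linear relation
line = [2 * v + 1 for v in xs]     # y = 2x + 1, a perfect linear relation

r_parabola = pearson_r(xs, parabola)
r_line = pearson_r(xs, line)
print(r_parabola, r_line)
```

For the parabola the numerator of (1.3) is zero, so r is 0 even though y is completely determined by x. For the line r equals 1. A value of r near 0 therefore rules out only a linear relation, not every relation.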

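Appendix: Note for Formula (1.1) and More!

In the derivation of formula (1.1) we used

$$ r = \frac{\sum z_x z_y}{n} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y} = \frac{\dfrac{\sum xy}{n} - \bar{x}\,\bar{y}}{\sigma_x \sigma_y}. $$

The last equation follows this way:

$$ \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \bar{y}\sum x - \bar{x}\sum y + n\,\bar{x}\,\bar{y} = \sum xy - n\,\bar{x}\,\bar{y} - n\,\bar{x}\,\bar{y} + n\,\bar{x}\,\bar{y} = \sum xy - n\,\bar{x}\,\bar{y}. $$

There is another popular formula for the Pearson Correlation Coefficient that follows from the second formula above, because $n\,\sigma_x \sigma_y = \sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}$. That is this one:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\;\sqrt{\sum (y - \bar{y})^2}} \tag{1.4} $$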
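As a check that the deviation form (1.4) agrees with the sum form (1.3), here is a minimal Python sketch applied to the Example 3 data. The function names `r_deviation_form` and `r_sum_form` are illustrative only.

```python
from math import sqrt

# Example 3 data: police officers on duty (x) and reported muggings (y).
officers = [10, 15, 16, 1, 4, 6, 18, 12, 14, 7]
muggings = [5, 2, 1, 9, 7, 8, 1, 5, 3, 6]


def r_deviation_form(xs, ys):
    """Formula (1.4): sums of products and squares of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    dev_xx = sum((a - mean_x) ** 2 for a in xs)
    dev_yy = sum((b - mean_y) ** 2 for b in ys)
    return dev_xy / (sqrt(dev_xx) * sqrt(dev_yy))


def r_sum_form(xs, ys):
    """Formula (1.3): uses raw sums only."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))


r14 = r_deviation_form(officers, muggings)
r13 = r_sum_form(officers, muggings)
print(round(r14, 4), round(r13, 4))  # -0.9691 -0.9691
```

The deviation form needs the means first and then a second pass over the data, while the sum form works from running totals. That is one reason (1.3) is convenient for hand calculation with a table.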