Biostatistics. Chapter 11 Simple Linear Correlation and Regression. Jing Li


Biostatistics
Chapter 11 Simple Linear Correlation and Regression
Jing Li, jing.li@sjtu.edu.cn
http://cbb.sjtu.edu.cn/~jingli/courses/2018fall/bi372/
Dept of Bioinformatics & Biostatistics, SJTU

Recall eat chocolate

Cell 175, 347-359, October 4, 2018

Comparison of the frequency of the nonreference allele between the NIPT estimations (CMDB) and the Han Chinese estimations in the 1000 Genomes Project (CHN)

Covariance
Variance: $\sigma^2 = \mathrm{Var}(x) = E(x - \mu)^2$
Covariance: $\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$
Covariance is a measure of how much two random variables change together.

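Interpreting Covariance
cov(x,y) > 0: X and Y are positively correlated
cov(x,y) < 0: X and Y are inversely correlated
cov(x,y) = 0: X and Y are uncorrelated (independent variables have zero covariance, but zero covariance alone does not prove independence)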
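A zero covariance only rules out a linear relationship. The sketch below illustrates this with made-up data (not from the slides) in which Y is completely determined by X, yet the sample covariance is zero. The helper name `sample_cov` is hypothetical.

```python
def sample_cov(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Illustrative data: y depends on x exactly (y = x^2), but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

cov_xy = sample_cov(xs, ys)
print(cov_xy)  # zero covariance despite perfect dependence
```

The positive and negative products cancel because the parabola is symmetric about the mean of X.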

Correlation coefficient
Pearson's correlation coefficient is standardized covariance (unitless):
$r = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}}$
Karl Pearson, 1857-1936

Correlation
Measures the relative strength of the linear relationship between two variables
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship

Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +1, r = +.3, r = 0]
Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall

Linear Correlation
[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships]
Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall

Linear Correlation
[Figure: scatter plots of Y against X contrasting strong relationships with weak relationships]

Linear Correlation
[Figure: scatter plots of Y against X showing no relationship]
Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall

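Calculating by hand
$\hat{r} = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}} = \frac{\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}}{\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \cdot \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}}$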

Simpler calculation formula
The factors of n - 1 cancel, leaving:
$\hat{r} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{SS_{xy}}{\sqrt{SS_x\, SS_y}}$
$SS_{xy}$ is the numerator of the covariance; $SS_x$ and $SS_y$ are the numerators of the variances.

Correlation Analysis
$-1 \le r \le 1$
If the correlation coefficient is close to +1, you have a strong positive linear relationship.
If the correlation coefficient is close to -1, you have a strong negative linear relationship.
If the correlation coefficient is close to 0, you have little or no linear correlation.
We can test the hypothesis $H_0: \rho = 0$ (no linear correlation in the population).

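Distribution of the correlation coefficient
$SE(\hat{r}) = \sqrt{\frac{1 - r^2}{n - 2}}$
Under the null hypothesis of no correlation, the sample correlation coefficient divided by its standard error follows a t distribution with n - 2 degrees of freedom (since you have to estimate the standard error):
$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}$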
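The sums-of-squares formula for r and the t statistic can be sketched in a few lines of Python. This is a minimal illustration using the six (x, y) points from the least squares example in these slides. The function names are hypothetical, and no p-value is computed because that would need a t-distribution table or a statistics library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via r = SS_xy / sqrt(SS_x * SS_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_x * ss_y)

def t_statistic(r, n):
    """t = r / SE(r), with SE(r) = sqrt((1 - r^2) / (n - 2))."""
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return r / se

xs = [1, 2, 3, 4, 5, 6]
ys = [6, 1, 9, 5, 17, 12]

r = pearson_r(xs, ys)
t = t_statistic(r, len(xs))
print(f"r = {r:.3f}, t = {t:.3f} on {len(xs) - 2} degrees of freedom")
```

The resulting t would then be compared against the critical value of a t distribution with n - 2 degrees of freedom.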

History
Galton's Sweet Pea Data: In Natural Inheritance, Galton (1894) provided a table, which contained a list of frequencies of daughter seeds of various sizes organized in rows according to the size of their parent seeds.
In 1896, Pearson published his first rigorous treatment of correlation and regression.
A simpler proof than Pearson's for the product-moment method was proposed by Ghiselli (1981).

Linear Regression
Can we predict Nobel Laureates per 10 million population using chocolate consumption?
Chocolate ~ Nobel laureates
Simple Linear Regression

Linear Regression
Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables).
Dependent variable: denoted Y
Independent variables: denoted $X_1, X_2, \ldots, X_k$
If we only have ONE independent variable, the model is
$Y = \beta_0 + \beta_1 X + \varepsilon$
which is referred to as simple linear regression. We would be interested in estimating $\beta_0$ and $\beta_1$ from the data we collect.

Linear Regression
Variables:
X = Independent Variable (we provide this)
Y = Dependent Variable (we observe this)
Parameters:
$\beta_0$ = Y-Intercept
$\beta_1$ = Slope
$\varepsilon$ ~ Normal Random Variable ($\mu_\varepsilon = 0$, $\sigma_\varepsilon$ = ???) [Noise]

The Intercept, β0

The Slope, β1


Building the Model
Collect Data
Test 2 Grade = β0 + β1*(Test 1 Grade)
From Data: Estimate β0, Estimate β1, Estimate σε

Student  Test 1  Test 2
1        50      32
2        51      33
3        52      34
4        53      35
5        54      36
6        55      37
7        56      39
8        57      40
9        58      41
10       59      42
11       60      43
12       61      44
13       62      46
14       63      47
15       64      48
16       65      49
17       66      50
18       67      51
19       68      53
20       69      54
21       70      55
22       71      56
23       72      57
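The three estimates named on this slide can be sketched for the Test 1 / Test 2 table as follows. This is an illustration only: the function names are hypothetical, and estimating the noise standard deviation as sqrt(SSE / (n - 2)) is the conventional choice, which the slide itself does not spell out.

```python
import math

# Data from the table: Test 1 scores run from 50 to 72 for students 1..23.
test1 = list(range(50, 73))
test2 = [32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44,
         46, 47, 48, 49, 50, 51, 53, 54, 55, 56, 57]

def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residual_std_error(xs, ys, b0, b1):
    """Estimate of the noise s.d.: sqrt(SSE / (n - 2)) (assumed convention)."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

b0, b1 = fit_line(test1, test2)
sigma_hat = residual_std_error(test1, test2, b0, b1)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, sigma_hat = {sigma_hat:.3f}")
```

Two degrees of freedom are subtracted because two parameters, the intercept and the slope, were estimated from the same data.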

Linear Regression Analysis
[Figure: three panels, each titled "Plot of Fitted Model": Test 2 against Test 1, and two plots of Test B2 against Test B1]

Which line has the best fit to the data?

Estimating the Coefficients
In much the same way we base estimates of $\mu$ on $\bar{x}$, we estimate $\beta_0$ with $b_0$ and $\beta_1$ with $b_1$, the y-intercept and slope (respectively) of the least squares or regression line given by:
$\hat{y} = b_0 + b_1 x$
(This is an application of the least squares method, and it produces a straight line that minimizes the sum of the squared differences between the points and the line.)

Least Squares Line
The differences between the points and the line are called residuals or errors.
This line minimizes the sum of the squared differences between the points and the line, but where did the line equation come from? How did we get 0.933 for the y-intercept and 2.114 for the slope?

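Least Squares Line [sure glad we have computers now!]
The coefficients $b_1$ and $b_0$ for the least squares line are the values that minimize
$SSE = \sum (Y - \hat{Y})^2 = \sum (Y - b_0 - b_1 X)^2$
They are calculated as:
$b_1 = \frac{s_{xy}}{s_x^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$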
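For readers who want the missing calculus step, the following is a sketch of the standard derivation (not taken from the slides): set both partial derivatives of SSE to zero and solve the two resulting equations.

```latex
\begin{aligned}
SSE(b_0, b_1) &= \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \\
\frac{\partial\, SSE}{\partial b_0} &= -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0
  \;\Rightarrow\; b_0 = \bar{Y} - b_1 \bar{X} \\
\frac{\partial\, SSE}{\partial b_1} &= -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0 \\
\text{substituting } b_0: \quad
b_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
     = \frac{s_{xy}}{s_x^2}
\end{aligned}
```

The last equality holds because the n - 1 denominators of the sample covariance and the sample variance cancel.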

Statistics: Least Squares Line
See if you can estimate the Y-intercept and slope from this data.
Recall Data Information
Data Points:

x  y
1  6
2  1
3  9
4  5
5  17
6  12

$\hat{y} = 0.933 + 2.114x$

Least Squares Line
See if you can estimate the Y-intercept and slope from this data.

X     Y     X-Xbar   Y-Ybar   (X-Xbar)*(Y-Ybar)   (X-Xbar)^2
1     6     -2.500   -2.333    5.833              6.250
2     1     -1.500   -7.333   11.000              2.250
3     9     -0.500    0.667   -0.333              0.250
4     5      0.500   -3.333   -1.667              0.250
5     17     1.500    8.667   13.000              2.250
6     12     2.500    3.667    9.167              6.250
Sum = 21    50       0.000    0.000   37.000             17.500

Xbar = 3.500, Ybar = 8.333
s_xy = 37.00/(6-1) = 7.400
s_x^2 = 17.5/(6-1) = 3.500
b1 = 7.4/3.5 = 2.114
b0 = 8.333 - 2.114*3.50 = 0.933
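The hand calculation in this table can be checked with a short script. This is a minimal sketch that follows the same steps (sample covariance, sample variance, then slope and intercept). The function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) using b1 = s_xy / s_x^2 and b0 = Ybar - b1 * Xbar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [1, 2, 3, 4, 5, 6]
ys = [6, 1, 9, 5, 17, 12]

b0, b1 = least_squares_line(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # b0 = 0.933, b1 = 2.114
```

Carrying full precision through the calculation gives an intercept of 0.933. Rounding the slope to 2.114 before computing the intercept is what produces 0.934.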

Example: Arm Circumference and Height

Arm Circumference and Height
t-test, ANOVA

Visualizing Arm Circumference and Height Relationship

Scatterplot with regression line

Example: Arm Circumference and Height
Estimated mean arm circumference for children 60 cm in height

Notice that most points don't fall directly on the line: we are estimating the mean arm circumference of children 60 cm tall, and the observed points vary about the estimated mean.

Linear regression assumes that:
The relationship between X and Y is linear
Y is distributed normally at each value of X
The variance of Y at every value of X is the same (homogeneity of variances)
The observations are independent