Chapter 10. Simple Linear Regression and Correlation


In the two-sample problems discussed in Ch. 9, we were interested in comparing values of parameters for two distributions. Regression analysis is the part of statistics that deals with investigating the relationships between two or more variables. In this chapter, we generalize the deterministic linear relation y = α + βx to a linear probabilistic relationship, develop procedures for making inferences about the parameters of the model, and obtain a quantitative measure (the correlation coefficient) of the extent to which the two variables are related.

1. Simple Linear Regression Model

y = α + βx + e, where e ~ N(0, σ²).

There exist parameters α, β, and σ² such that for any fixed value of the independent variable x, the dependent variable y is related to x through the model equation. The quantity e in the model equation is a random variable, assumed to be normally distributed with E(e) = 0 and Var(e) = σ².
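The probabilistic model above can be illustrated with a short simulation. This is a minimal sketch, assuming numpy is available; the parameter values (α = 2, β = 0.5, σ = 1) and the grid of x values are made up for illustration and are not from the notes.

```python
import numpy as np

# Simulate n = 50 observations from y = alpha + beta*x + e, with e ~ N(0, sigma^2).
# alpha, beta, sigma are illustrative values, not values from the notes.
rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, 0.5, 1.0

x = np.linspace(0, 10, 50)                 # fixed values of the independent variable
e = rng.normal(0.0, sigma, size=x.size)    # random error term, E(e) = 0, Var(e) = sigma^2
y = alpha + beta * x + e                   # dependent variable generated by the model

print(y[:3])
```

For any fixed x, the simulated y values scatter around the true mean line α + βx with spread governed by σ, which is exactly what the model equation asserts.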

True slope, β

β measures the change in the mean of the response variable for every unit change in the explanatory variable x. Three cases:
1. β > 0: the mean response increases as x increases.
2. β < 0: the mean response decreases as x increases.
3. β = 0: the mean response does not depend on x.

True intercept, α

α is the mean of the response variable when the explanatory variable x = 0.

2. Estimating the Model Parameters (α, β, and σ²)

The parameters α (intercept) and β (slope) will almost never be known to an investigator. Instead, sample data consisting of n observed pairs (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) will be available, from which the model parameters can be estimated.

Notation: a = α̂ = estimate of α, and b = β̂ = estimate of β. The estimated linear regression equation is ŷ = a + bx.

Principle of Least Squares: find a and b that minimize

Σ_{i=1}^n e_i² = Σ_{i=1}^n [y_i − ŷ_i]² = Σ_{i=1}^n [y_i − (a + bx_i)]².

The solutions are

b = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)² = r (s_y / s_x)
a = ȳ − b x̄.

Here, r is the correlation between y and x, s_y is the standard deviation of y, and s_x is the standard deviation of x.
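The least-squares formulas above can be sketched directly in Python. This is a minimal illustration assuming numpy is available; the small data set is made up for the example, not taken from the notes.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares estimates: b = Sxy / Sxx, a = ybar - b * xbar."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()
    b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)  # slope estimate
    a = ybar - b * xbar                                            # intercept estimate
    return a, b

# Illustrative data (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(a, b)  # 2.2 0.6, so the fitted line is yhat = 2.2 + 0.6x
```

Note that the fitted line always passes through the point (x̄, ȳ), since a = ȳ − b x̄.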

Measuring the variability

The estimate of σ² is

s_e² = Σ_{i=1}^n (y_i − ŷ_i)² / (n − 2),

and the estimated standard deviation is s_e = √(s_e²). SS_resid and SS_total represent the sum of squares of the residuals and the sum of squares of the y_i about ȳ, respectively. The sum of squares due to regression is SS_reg = SS_total − SS_resid.

3. Inferences about the slope β

1. E(b) = β.
2. Var(b) = σ_b² = σ² / Σ_{i=1}^n (x_i − x̄)². Replacing σ² by its estimate s_e² gives the estimator s_b² = s_e² / Σ_{i=1}^n (x_i − x̄)².
3. The estimator b has a normal distribution.

Confidence Interval for β

The standardized variable

t = (b − β) / s_b

has a t distribution with d.f. = n − 2. A 100(1 − α)% confidence interval for β, the slope of the regression line, is

(b − t_{α/2, n−2} s_b, b + t_{α/2, n−2} s_b).
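The quantities s_e, s_b, and the confidence interval for β can be sketched as follows. This assumes numpy and scipy are available; the data set is the same illustrative one as above, not from the notes.

```python
import numpy as np
from scipy import stats

def slope_ci(x, y, conf=0.95):
    """s_e, s_b, and a 100*conf% confidence interval for the slope beta."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b = np.sum((x - xbar) * (y - ybar)) / Sxx
    a = ybar - b * xbar
    resid = y - (a + b * x)
    s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))   # estimate of sigma, d.f. = n - 2
    s_b = s_e / np.sqrt(Sxx)                      # estimated standard deviation of b
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
    return b, s_b, (b - tcrit * s_b, b + tcrit * s_b)

b, s_b, ci = slope_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b, s_b, ci)  # b = 0.6, s_b ≈ 0.283, 95% CI roughly (-0.30, 1.50)
```

Here the interval contains 0, so with these few points the data are consistent with β = 0 at the 95% level.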

Hypothesis testing for β

Null hypothesis (H_0): β = b_0
Test statistic: t = (b − b_0) / s_b

Alternative Hypothesis     Rejection Region
H_a: β > b_0               t > t_{α, n−2}
H_a: β < b_0               t < −t_{α, n−2}
H_a: β ≠ b_0               t < −t_{α/2, n−2} or t > t_{α/2, n−2}

H_0 should be rejected if the p-value is less than α and not rejected if the p-value is greater than α.

< Example > handout (in class)
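A sketch of the slope test for the common case b_0 = 0 (i.e., testing whether x is useful for predicting y), again assuming numpy and scipy and reusing the illustrative data from above:

```python
import numpy as np
from scipy import stats

def slope_test(x, y, b0=0.0):
    """t statistic and two-sided p-value for H0: beta = b0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    a = y.mean() - b * x.mean()
    s_e = np.sqrt(np.sum((y - a - b * x) ** 2) / (n - 2))
    t = (b - b0) / (s_e / np.sqrt(Sxx))           # test statistic on n - 2 d.f.
    p = 2 * stats.t.sf(abs(t), df=n - 2)          # two-sided p-value
    return t, p

t, p = slope_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(t, p)  # t ≈ 2.12, p ≈ 0.12, so H0: beta = 0 is not rejected at alpha = 0.05
```

For a one-sided alternative, the p-value would instead be `stats.t.sf(t, df=n - 2)` (for H_a: β > b_0).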

4. Inferences based on the estimated Regression Line

Let x_0 denote a particular value of x. For a + bx_0:

1. E(a + bx_0) = α + βx_0.
2. σ_{a+bx_0} = σ √(1/n + (x_0 − x̄)² / Σ_{i=1}^n (x_i − x̄)²). Replacing σ by its estimate s_e gives the estimated standard deviation s_{a+bx_0}.
3. a + bx_0 ~ N(α + βx_0, σ²_{a+bx_0}).

Confidence Interval for α + βx_0

The standardized variable

t = (a + bx_0 − (α + βx_0)) / s_{a+bx_0}

has a t distribution with d.f. = n − 2. Hence, a 100(1 − α)% confidence interval for α + βx_0, the mean value of y when x = x_0, has the form

((a + bx_0) − t_{α/2, n−2} s_{a+bx_0}, (a + bx_0) + t_{α/2, n−2} s_{a+bx_0}).
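The confidence interval for the mean response α + βx_0 can be sketched as follows, assuming numpy and scipy; the data and the choice x_0 = 4 are illustrative only.

```python
import numpy as np
from scipy import stats

def mean_response_ci(x, y, x0, conf=0.95):
    """Point estimate and CI for alpha + beta*x0, the mean of y at x = x0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar = x.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b = np.sum((x - xbar) * (y - y.mean())) / Sxx
    a = y.mean() - b * xbar
    s_e = np.sqrt(np.sum((y - a - b * x) ** 2) / (n - 2))
    yhat0 = a + b * x0                                     # estimate of alpha + beta*x0
    s_fit = s_e * np.sqrt(1 / n + (x0 - xbar) ** 2 / Sxx)  # its estimated SD
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
    return yhat0, (yhat0 - tcrit * s_fit, yhat0 + tcrit * s_fit)

yhat0, ci = mean_response_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=4)
print(yhat0, ci)  # 4.6, 95% CI roughly (3.04, 6.16)
```

Note that s_{a+bx_0} grows as x_0 moves away from x̄, so the interval is narrowest at x_0 = x̄.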

5. Inferences about the population Correlation Coefficient

The correlation coefficient r is a measure of how strongly related x and y are in the observed sample.

Population correlation coefficient: ρ = ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)
Sample correlation coefficient: r = ρ̂ = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / √(Σ_{i=1}^n (x_i − x̄)² Σ_{i=1}^n (y_i − ȳ)²)

Properties of r
1. The value of r is independent of the units in which x and y are measured.
2. −1 ≤ r ≤ 1.
3. r = 1 (r = −1) if and only if all (x_i, y_i) pairs lie on a straight line with positive (negative) slope, respectively.

Hypothesis testing for ρ

Null hypothesis (H_0): ρ = 0 (x and y are uncorrelated)
Test statistic: t = r √(n − 2) / √(1 − r²)

Alternative Hypothesis     Rejection Region
H_a: ρ > 0                 t > t_{α, n−2}
H_a: ρ < 0                 t < −t_{α, n−2}
H_a: ρ ≠ 0                 t < −t_{α/2, n−2} or t > t_{α/2, n−2}

The t critical value is based on n − 2 d.f. H_0 should be rejected if the p-value is less than α and not rejected if the p-value is greater than α.

< Example > Here are the golf scores of 12 members of a college women's golf team in two rounds of tournament play. Plot the data, find the correlation between the two scores, and test the null hypothesis that ρ = 0.

player    1   2   3   4   5   6   7   8   9  10  11  12
Round 1  89  90  87  95  86  81 102 105  83  88  91  79
Round 2  94  85  89  89  81  76 107  89  87  91  88  80
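The golf example can be worked through numerically with the formulas above. This sketch assumes numpy and scipy are available; the scores are exactly those in the table.

```python
import numpy as np
from scipy import stats

# Golf scores of the 12 players in the two rounds (from the example table)
round1 = [89, 90, 87, 95, 86, 81, 102, 105, 83, 88, 91, 79]
round2 = [94, 85, 89, 89, 81, 76, 107, 89, 87, 91, 88, 80]

x, y = np.asarray(round1, float), np.asarray(round2, float)
n = len(x)

# Sample correlation coefficient r
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

# Test of H0: rho = 0 against the two-sided alternative
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)   # t statistic on n - 2 = 10 d.f.
p = 2 * stats.t.sf(abs(t), df=n - 2)           # two-sided p-value

print(r, t, p)  # r ≈ 0.687, t ≈ 2.99 on 10 d.f., p below 0.05: reject H0
```

Since the p-value is well below α = 0.05, the data give strong evidence of a positive correlation between the two rounds of scores.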