Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6


Department of Quantitative Methods & Information Systems. Time Series and Their Components, QMIS 320, Chapter 6. Fall 00. Dr. Mohammad Zainal. These slides were modified from their original source for educational purposes only. The copyrighted materials belong to Business Statistics: A Decision Making Approach, 7e, 2008 Prentice Hall, Inc. Chapter Goals: After completing this chapter, you should be able to: explain the simple linear regression model; obtain and interpret the simple linear regression equation for a set of data; describe R² as a measure of the explanatory power of the regression model; understand the assumptions behind regression analysis; and explain measures of variation and determine whether the independent variable is significant.

Chapter Goals (continued): After completing this chapter, you should also be able to: calculate and interpret confidence intervals for the regression coefficients; use a regression equation for prediction; form forecast intervals around an estimated Y value for a given X; use graphical analysis to recognize potential problems in regression analysis; and explain the correlation coefficient and perform a hypothesis test for zero population correlation. Overview of Linear Models: An equation can be fit to show the best linear relationship between two variables: Y = β0 + β1X, where Y is the dependent variable and X is the independent variable, β0 is the Y-intercept, and β1 is the slope.

Least Squares Regression: Estimates for the coefficients β0 and β1 are found using a least squares regression technique. The least squares regression line, based on sample data, is ŷ = b0 + b1x, where b1 is the slope of the line and b0 is the y-intercept: b1 = Cov(x, y) / sx² and b0 = ȳ − b1x̄. Introduction to Regression Analysis: Regression analysis is used to predict the value of a dependent variable based on the value of at least one independent variable, and to explain the impact of changes in an independent variable on the dependent variable. Dependent variable: the variable we wish to explain (also called the endogenous variable). Independent variable: the variable used to explain the dependent variable (also called the exogenous variable).
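
To make the covariance-based formulas above concrete, here is a minimal sketch (not part of the original slides), assuming Python with NumPy is available; the x and y values are purely hypothetical illustration data:

```python
import numpy as np

# Hypothetical sample data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 6.1])

# b1 = Cov(x, y) / s_x^2  (sample covariance and sample variance, both with ddof=1)
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
# b0 = y-bar - b1 * x-bar
b0 = y.mean() - b1 * x.mean()

print(f"slope b1 = {b1:.4f}, intercept b0 = {b0:.4f}")
```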

Linear Regression Model: The relationship between X and Y is described by a linear function, and changes in Y are assumed to be caused by changes in X. The linear regression population equation model is Y = β0 + β1x + ε, where β0 and β1 are the population model coefficients and ε is a random error term. Simple Linear Regression Model: The population regression model is Yi = β0 + β1Xi + εi, where Yi is the dependent variable, β0 is the population Y-intercept, β1 is the population slope coefficient, Xi is the independent variable, and εi is the random error term; β0 + β1Xi is the linear component and εi is the random error component.

Simple Linear Regression Model (continued): Graphically, for an observed value of Y at Xi, Yi = β0 + β1Xi + εi; the predicted value of Y for Xi lies on the population line with slope β1 and intercept β0, and εi is the random error for that Xi value. Simple Linear Regression Equation: The simple linear regression equation provides an estimate of the population regression line: ŷi = b0 + b1xi, where ŷi is the estimated (or predicted) y value for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and xi is the value of x for observation i. The individual random error terms ei have a mean of zero: ei = yi − ŷi = yi − (b0 + b1xi).

Least Squares Estimators: b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared differences between yi and ŷi: min SSE = min Σei² = min Σ(yi − ŷi)² = min Σ[yi − (b0 + b1xi)]². Differential calculus is used to obtain the coefficient estimators b0 and b1 that minimize SSE. Least Squares Estimators (continued): The slope coefficient estimator is b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = Cov(x, y) / sx² = rxy (sy / sx), and the constant or y-intercept is b0 = ȳ − b1x̄. The regression line always goes through the mean point (x̄, ȳ).

Finding the Least Squares Equation: The coefficients b0 and b1, and the other regression results in this chapter, will be found using a computer; hand calculations are tedious. Statistical routines are built into Minitab, and other statistical analysis software can be used. Linear Regression Model Assumptions: The true relationship form is linear (Y is a linear function of X, plus random error). The error terms εi are independent of the x values. The error terms are random variables with mean 0 and constant variance σ² (the constant variance property is called homoscedasticity): E[εi] = 0 and E[εi²] = σ² for i = 1, ..., n. The random error terms εi are not correlated with one another, so that E[εi εj] = 0 for all i ≠ j.

Interpretation of the Slope and the Intercept: b0 is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values); b1 is the estimated change in the average value of y as a result of a one-unit change in x. Simple Linear Regression Example: A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected. Dependent variable (Y) = house price in $1000s; independent variable (X) = square feet.

Sample Data for House Price Model: house price in $1000s (Y) paired with square feet (X) for the ten sampled houses: (245, 1400), (312, 1600), (279, 1700), (308, 1875), (199, 1100), (219, 1550), (405, 2350), (324, 2450), (319, 1425), (255, 1700). Graphical Presentation: house price model scatter plot of house price ($1000s) against square feet.
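
As a check on the Minitab results shown on the next slides, here is a minimal sketch (assuming Python with NumPy, not part of the original slides) that re-enters the ten observations above and computes the least squares coefficients; they should come out close to the intercept 98.248 and slope 0.10977 reported in the Minitab output:

```python
import numpy as np

# House price example from the slides: price in $1000s vs. size in square feet
price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)

# Least squares estimates: b1 = Cov(x, y) / s_x^2, b0 = y-bar - b1 * x-bar
b1 = np.cov(sqft, price, ddof=1)[0, 1] / np.var(sqft, ddof=1)
b0 = price.mean() - b1 * sqft.mean()

print(f"house price = {b0:.4f} + {b1:.5f} * (square feet)")
# Expected to match the Minitab output: intercept ~ 98.248, slope ~ 0.10977
```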

Regression Using Minitab: Minitab will be used to generate the coefficients and measures of goodness of fit for regression. Minitab Output (reproduced and annotated on the following slides).

Minitab Output (continued): Regression Statistics: R Square 0.58082, Adjusted R Square 0.528, Standard Error 41.33032, Observations 10. The regression equation is: house price = 98.24833 + 0.10977 (square feet). ANOVA: Regression df 1, SS 18934.9348, MS 18934.9348, F 11.0848, Significance F 0.01039; Residual df 8, SS 13665.5652, MS 1708.1957; Total df 9, SS 32600.5000. Coefficients: Intercept 98.24833, Standard Error 58.03348, t Stat 1.69, P-value 0.129, Lower 95% -35.57720, Upper 95% 232.07386; Square Feet 0.10977, Standard Error 0.03297, t Stat 3.33, P-value 0.010, Lower 95% 0.03374, Upper 95% 0.18580. Graphical Presentation: house price model scatter plot with the fitted regression line (intercept = 98.248, slope = 0.10977): house price = 98.24833 + 0.10977 (square feet).

Interpretation of the Intercept, b0: house price = 98.24833 + 0.10977 (square feet). b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values). Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet. Interpretation of the Slope Coefficient, b1: house price = 98.24833 + 0.10977 (square feet). b1 measures the estimated change in the average value of Y as a result of a one-unit change in X. Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size.

Measures of Variation: Total variation is made up of two parts: SST = SSR + SSE, where SST is the total sum of squares, SSR is the regression sum of squares, and SSE is the error sum of squares: SST = Σ(yi − ȳ)², SSR = Σ(ŷi − ȳ)², SSE = Σ(yi − ŷi)², where ȳ is the average value of the dependent variable, yi are the observed values of the dependent variable, and ŷi is the predicted value of y for the given xi value. Measures of Variation (continued): SST = total sum of squares, which measures the variation of the yi values around their mean ȳ; SSR = regression sum of squares, the explained variation attributable to the linear relationship between x and y; SSE = error sum of squares, the variation attributable to factors other than the linear relationship between x and y.

Measures of Variation (continued): diagram showing, for a single observation (xi, yi), how the total deviation (yi − ȳ) splits into the explained deviation (ŷi − ȳ) and the unexplained deviation (yi − ŷi), corresponding to SST = Σ(yi − ȳ)², SSR = Σ(ŷi − ȳ)², and SSE = Σ(yi − ŷi)². Coefficient of Determination, R²: The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called R-squared and is denoted R²: R² = SSR / SST = regression sum of squares / total sum of squares. Note: 0 ≤ R² ≤ 1.
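
The sums of squares and R² for the house price data can be reproduced with a short sketch (assuming Python with NumPy; not part of the original slides):

```python
import numpy as np

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)

# Fit the least squares line and compute fitted values
b1 = np.cov(sqft, price, ddof=1)[0, 1] / np.var(sqft, ddof=1)
b0 = price.mean() - b1 * sqft.mean()
fitted = b0 + b1 * sqft

# Decompose the total variation: SST = SSR + SSE
sst = np.sum((price - price.mean()) ** 2)   # total sum of squares
ssr = np.sum((fitted - price.mean()) ** 2)  # regression (explained) sum of squares
sse = np.sum((price - fitted) ** 2)         # error (unexplained) sum of squares
r_squared = ssr / sst

print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}  R^2={r_squared:.5f}")
# Expected to be close to the Minitab output: SST ~ 32600.5, SSR ~ 18934.93, R^2 ~ 0.58082
```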

Examples of Approximate r² Values: scatter plots with r² = 1 show a perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X. Examples of Approximate r² Values (continued): scatter plots with 0 < r² < 1 show weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X.

Examples of Approximate r² Values (continued): a scatter plot with r² = 0 shows no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X). Minitab Output (same output as shown earlier): R² = SSR / SST = 18934.9348 / 32600.5000 = 0.58082, so 58.08% of the variation in house prices is explained by variation in square feet.

Correlation and R²: The coefficient of determination, R², for a simple regression is equal to the simple correlation squared: R² = r²xy. Estimation of Model Error Variance: An estimator for the variance of the population model error is σ̂² = se² = Σei² / (n − 2) = SSE / (n − 2). Division by n − 2 instead of n − 1 is because the simple regression model uses two estimated parameters, b0 and b1, instead of one. se = √(se²) is called the standard error of the estimate.
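
A minimal sketch of the standard error of the estimate for the house price data (assuming Python with NumPy; not part of the original slides):

```python
import numpy as np

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)

b1 = np.cov(sqft, price, ddof=1)[0, 1] / np.var(sqft, ddof=1)
b0 = price.mean() - b1 * sqft.mean()
residuals = price - (b0 + b1 * sqft)

n = len(price)
sse = np.sum(residuals ** 2)
s_e = np.sqrt(sse / (n - 2))  # standard error of the estimate, dividing by n - 2

print(f"s_e = {s_e:.5f}")  # expected to be close to the Minitab value ~ 41.33032
```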

Minitab Output (same output as shown earlier): se = 41.33032, reported as the Standard Error in the Regression Statistics block. Comparing Standard Errors: se is a measure of the variation of observed y values from the regression line; a small se means the points lie close to the fitted line, while a large se means they are widely scattered around it. The magnitude of se should always be judged relative to the size of the y values in the sample data; e.g., se = $41.33K is moderately small relative to house prices in the $200K to $300K range.

Inferences About the Regression Model: The variance of the regression slope coefficient b1 is estimated by sb1² = se² / Σ(xi − x̄)² = se² / ((n − 1)sx²), where sb1 is the estimate of the standard error of the least squares slope and se = √(SSE / (n − 2)) is the standard error of the estimate. Minitab Output (same output as shown earlier): sb1 = 0.03297, reported as the standard error of the Square Feet coefficient.

Comparing Standard Errors of the Slope: sb1 is a measure of the variation in the slope of regression lines from different possible samples; a small sb1 means the estimated slope would vary little from sample to sample, a large sb1 means it would vary widely. Inference about the Slope: t Test: A t test for a population slope asks whether there is a linear relationship between X and Y. Null and alternative hypotheses: H0: β1 = 0 (no linear relationship); H1: β1 ≠ 0 (a linear relationship does exist). Test statistic: t = (b1 − β1) / sb1 with d.f. = n − 2, where b1 is the regression slope coefficient, β1 is the hypothesized slope, and sb1 is the standard error of the slope.
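
The slope t test for the house price example can be reproduced as follows (a sketch assuming Python with NumPy and SciPy; the expected values are taken from the Minitab output above):

```python
import numpy as np
from scipy import stats

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)
n = len(price)

b1 = np.cov(sqft, price, ddof=1)[0, 1] / np.var(sqft, ddof=1)
b0 = price.mean() - b1 * sqft.mean()
sse = np.sum((price - (b0 + b1 * sqft)) ** 2)
s_e = np.sqrt(sse / (n - 2))
s_b1 = s_e / np.sqrt(np.sum((sqft - sqft.mean()) ** 2))  # standard error of the slope

t_stat = (b1 - 0) / s_b1                          # H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-tail p-value

print(f"t = {t_stat:.5f}, p-value = {p_value:.5f}")
# Expected to be close to the Minitab output: t ~ 3.329, p ~ 0.0104
```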

Inference about the Slope: t Test (continued): Using the house price data shown earlier, the estimated regression equation is house price = 98.25 + 0.1098 (sq. ft.), so the slope of this model is 0.1098. Does square footage of the house affect its sales price? Inferences about the Slope: t Test Example: H0: β1 = 0, H1: β1 ≠ 0. From the Minitab output, b1 = 0.10977 and sb1 = 0.03297, so t = (b1 − β1) / sb1 = (0.10977 − 0) / 0.03297 = 3.32938.

Inferences about the Slope: t Test Example (continued): H0: β1 = 0, H1: β1 ≠ 0; d.f. = 10 − 2 = 8, and with α/2 = 0.025 the critical value is t8,0.025 = 2.3060. Reject H0 if t < −tn−2,α/2 or t > tn−2,α/2. Test statistic: t = 3.329 > 2.3060, so the decision is to reject H0. Conclusion: there is sufficient evidence that square footage affects house price. Inferences about the Slope: t Test Example (continued): This is a two-tail test, so the p-value is P(t > 3.329) + P(t < −3.329) = 0.01039 (for 8 d.f.), which matches the Square Feet P-value in the Minitab output. Decision: P-value < α, so reject H0. Conclusion: there is sufficient evidence that square footage affects house price.

Confidence Interval Estimate for the Slope: b1 − tn−2,α/2 sb1 < β1 < b1 + tn−2,α/2 sb1, with d.f. = n − 2. From the Minitab printout for house prices, the Square Feet coefficient is 0.10977 with Lower 95% = 0.03374 and Upper 95% = 0.18580, so at the 95% level of confidence the confidence interval for the slope is (0.0337, 0.1858). Confidence Interval Estimate for the Slope (continued): Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.70 and $185.80 per square foot of house size. This 95% confidence interval does not include 0. Conclusion: there is a significant relationship between house price and square feet at the 0.05 level of significance.
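
The 95% confidence interval for the slope can be computed directly (a sketch assuming Python with NumPy and SciPy; not part of the original slides):

```python
import numpy as np
from scipy import stats

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)
n = len(price)

b1 = np.cov(sqft, price, ddof=1)[0, 1] / np.var(sqft, ddof=1)
b0 = price.mean() - b1 * sqft.mean()
s_e = np.sqrt(np.sum((price - (b0 + b1 * sqft)) ** 2) / (n - 2))
s_b1 = s_e / np.sqrt(np.sum((sqft - sqft.mean()) ** 2))

t_crit = stats.t.ppf(0.975, df=n - 2)          # t_{n-2, alpha/2} for a 95% interval
lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1

print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")  # expected ~ (0.0337, 0.1858)
```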

F-Test for Significance: The F test statistic is F = MSR / MSE, where MSR = SSR / k and MSE = SSE / (n − k − 1); F follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom (k = the number of independent variables in the regression model). Minitab Output (same output as shown earlier): F = MSR / MSE = 18934.9348 / 1708.1957 = 11.0848, with 1 and 8 degrees of freedom; the P-value for the F test is Significance F = 0.01039.
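
The F statistic and its p-value can be reproduced with a short sketch (assuming Python with NumPy and SciPy; not part of the original slides):

```python
import numpy as np
from scipy import stats

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)
n, k = len(price), 1  # one independent variable

b1 = np.cov(sqft, price, ddof=1)[0, 1] / np.var(sqft, ddof=1)
b0 = price.mean() - b1 * sqft.mean()
fitted = b0 + b1 * sqft

msr = np.sum((fitted - price.mean()) ** 2) / k      # MSR = SSR / k
mse = np.sum((price - fitted) ** 2) / (n - k - 1)   # MSE = SSE / (n - k - 1)
f_stat = msr / mse
p_value = stats.f.sf(f_stat, k, n - k - 1)

print(f"F = {f_stat:.4f}, p-value = {p_value:.5f}")  # expected ~ 11.08 and ~ 0.0104
```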

F-Test for Significance (continued): H0: β1 = 0, H1: β1 ≠ 0, α = 0.05, df1 = 1, df2 = 8. Critical value: F0.05 = 5.32. Test statistic: F = MSR / MSE = 11.08. Decision: since 11.08 > 5.32, reject H0 at α = 0.05. Conclusion: there is sufficient evidence that house size affects selling price. Prediction: The regression equation can be used to predict a value for y, given a particular x. For a specified value xn+1, the predicted value is ŷn+1 = b0 + b1xn+1.

Predictions Using Regression Analysis: Predict the price for a house with 2000 square feet: house price = 98.25 + 0.1098 (sq. ft.) = 98.25 + 0.1098(2000) = 317.85, so the predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850. Relevant Data Range: When using a regression model for prediction, only predict within the relevant range of data, i.e., the range of X values observed in the sample (shown as a band on the scatter plot of house price against square feet). It is risky to try to extrapolate far beyond the range of observed X values.
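
A minimal sketch of the point prediction for a 2000-square-foot house (assuming Python with NumPy; not part of the original slides):

```python
import numpy as np

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)

b1 = np.cov(sqft, price, ddof=1)[0, 1] / np.var(sqft, ddof=1)
b0 = price.mean() - b1 * sqft.mean()

x_new = 2000  # square feet, inside the observed range of the sample
predicted = b0 + b1 * x_new
print(f"predicted price for {x_new} sq ft: {predicted:.2f} ($1000s)")
# ~ 317.8; the slides use rounded coefficients and report 317.85
```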

Estimating Mean Values and Predicting Individual Values: Goal: form intervals around ŷ to express uncertainty about the value of y for a given xi: a confidence interval for the expected value of y given xi, and a prediction interval for a single observed y given xi, both centered on the fitted line ŷ = b0 + b1x. Confidence Interval for the Average Y, Given X: The confidence interval estimate for the expected value of y given a particular xn+1, i.e., for E(Yn+1 | Xn+1), is ŷn+1 ± tn−2,α/2 se √(1/n + (xn+1 − x̄)² / Σ(xi − x̄)²). Notice that the formula involves the term (xn+1 − x̄)², so the size of the interval varies according to the distance of xn+1 from the mean x̄.

Prediction Interval for an Individual Y, Given X: The confidence interval estimate for an actual observed value of y given a particular xn+1 is ŷn+1 ± tn−2,α/2 se √(1 + 1/n + (xn+1 − x̄)² / Σ(xi − x̄)²). The extra 1 under the square root adds to the interval width to reflect the added uncertainty for an individual case. Estimation of Mean Values: Example: Confidence interval estimate for E(Yn+1 | Xn+1): find the 95% confidence interval for the mean price of 2,000 square-foot houses. Predicted price ŷ = 317.85 ($1000s); ŷn+1 ± tn−2,α/2 se √(1/n + (xn+1 − x̄)² / Σ(xi − x̄)²) = 317.85 ± 37.12. The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900.
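
Both interval formulas can be evaluated for the 2000-square-foot example with a short sketch (assuming Python with NumPy and SciPy; the expected endpoints are those quoted on the slides):

```python
import numpy as np
from scipy import stats

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)
n = len(price)

b1 = np.cov(sqft, price, ddof=1)[0, 1] / np.var(sqft, ddof=1)
b0 = price.mean() - b1 * sqft.mean()
s_e = np.sqrt(np.sum((price - (b0 + b1 * sqft)) ** 2) / (n - 2))
t_crit = stats.t.ppf(0.975, df=n - 2)

x_new = 2000
y_hat = b0 + b1 * x_new
dist = (x_new - sqft.mean()) ** 2 / np.sum((sqft - sqft.mean()) ** 2)

half_ci = t_crit * s_e * np.sqrt(1 / n + dist)      # confidence interval for the mean of Y
half_pi = t_crit * s_e * np.sqrt(1 + 1 / n + dist)  # prediction interval for an individual Y

print(f"95% CI for mean price: ({y_hat - half_ci:.2f}, {y_hat + half_ci:.2f})")  # ~ (280.66, 354.90)
print(f"95% PI for one house:  ({y_hat - half_pi:.2f}, {y_hat + half_pi:.2f})")  # ~ (215.50, 420.07)
```

The only difference between the two half-widths is the extra 1 under the square root, which is what makes the prediction interval for a single house so much wider than the confidence interval for the mean price.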

Estimation of Individual Values: Example: Confidence interval estimate for Yn+1: find the 95% interval for an individual house with 2,000 square feet. Predicted price ŷ = 317.85 ($1000s); ŷn+1 ± tn−2,α/2 se √(1 + 1/n + (xn+1 − x̄)² / Σ(xi − x̄)²) = 317.85 ± 102.28. The interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070. Correlation Analysis: Correlation analysis is used to measure the strength of the association (linear relationship) between two variables. Correlation is only concerned with the strength of the relationship; no causal effect is implied with correlation. Correlation was first presented in Chapter 3.

Correlation Analysis (continued): The population correlation coefficient is denoted ρ (the Greek letter rho). The sample correlation coefficient is r = sxy / (sx sy), where sxy = Σ(xi − x̄)(yi − ȳ) / (n − 1). Hypothesis Test for Correlation: To test the null hypothesis of no linear association, H0: ρ = 0, the test statistic follows the Student's t distribution with (n − 2) degrees of freedom: t = r √(n − 2) / √(1 − r²).

Decision Rules for the Hypothesis Test for Correlation: Lower-tail test: H0: ρ ≥ 0, H1: ρ < 0; reject H0 if t < −tn−2,α. Upper-tail test: H0: ρ ≤ 0, H1: ρ > 0; reject H0 if t > tn−2,α. Two-tail test: H0: ρ = 0, H1: ρ ≠ 0; reject H0 if t < −tn−2,α/2 or t > tn−2,α/2. In each case t = r √(n − 2) / √(1 − r²) with n − 2 d.f. Graphical Analysis: The linear regression model is based on minimizing the sum of squared errors. If outliers exist, their potentially large squared errors may have a strong influence on the fitted regression line. Be sure to examine your data graphically for outliers and extreme points, and decide, based on your model and logic, whether the extreme points should remain or be removed.
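
A short sketch of the two-tail correlation test for the house price data (assuming Python with NumPy and SciPy; not part of the original slides):

```python
import numpy as np
from scipy import stats

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], dtype=float)
sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)
n = len(price)

r = np.corrcoef(sqft, price)[0, 1]                 # sample correlation coefficient
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)  # t with n - 2 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)    # two-tail test of H0: rho = 0

print(f"r = {r:.4f}, t = {t_stat:.4f}, two-tail p-value = {p_value:.5f}")
# For a simple regression r^2 equals R^2, and this t equals the slope t statistic (~ 3.329)
```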

Chapter Summary: Introduced the linear regression model; reviewed correlation and the assumptions of linear regression; discussed estimating the simple linear regression coefficients; described measures of variation; described inference about the slope; and addressed estimation of mean values and prediction of individual values.