
Inference for Regression: Inference about the Regression Model and Using the Regression Line, with Details (Sections 10.1, 10.2, and 10.3)

Basic components of the regression setup
- Target of inference: the linear dependence of a response variable on one or more explanatory variables
  - One explanatory variable: simple linear regression (SLR)
  - More than one: multiple regression
- The least-squares regression line describes this dependence in the data
- A population regression line describes its underlying idealization; it is involved in describing the probabilities of observing certain values in the sample measurements

Brief review of least-squares regression
- The least-squares regression line ŷ = b0 + b1x makes the sum of squared prediction errors as small as possible
- The slope is b1 = r(sy/sx) and the intercept is b0 = ȳ − b1x̄
- Predictions are made by plugging values of x into ŷ = b0 + b1x
- The residuals describe the leftover variation in y after fitting the least-squares regression line
- The coefficient of determination, r², measures the proportion of variability in y that is explained by x
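
A minimal Python sketch of these formulas (the x and y arrays are made-up illustrative values, not data from the text):

    import numpy as np

    # Made-up illustrative numbers (not data from the text)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    r = np.corrcoef(x, y)[0, 1]               # sample correlation
    b1 = r * y.std(ddof=1) / x.std(ddof=1)    # slope: b1 = r * (s_y / s_x)
    b0 = y.mean() - b1 * x.mean()             # intercept: b0 = ybar - b1 * xbar

    y_hat = b0 + b1 * x                       # predictions at the observed x
    e = y - y_hat                             # residuals: leftover variation in y
    r2 = 1.0 - np.sum(e**2) / np.sum((y - y.mean())**2)

    print(b0, b1, r2)                         # r2 agrees with r**2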

Formal setup for inference in regression
- The data arise as n pairs of measurements, (x1, y1), …, (xn, yn)
  - (xi, yi) are measurements on the i-th individual
- The statistical model is yi = β0 + β1xi + εi
  - µy = β0 + β1xi is the mean response when x = xi
  - The εi are independent, and each εi is N(0, σ)
- The least-squares regression line ŷ = b0 + b1x is the sample estimate of µy = β0 + β1x
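
One way to make the model concrete is to simulate from it. A minimal sketch with assumed parameter values (β0 = 350, β1 = 0.6, σ = 80 are not from the text; they are chosen to echo the scale of the wage example below):

    import numpy as np

    rng = np.random.default_rng(0)
    beta0, beta1, sigma = 350.0, 0.6, 80.0  # assumed "true" parameters
    n = 59

    x = rng.uniform(5, 230, size=n)         # explanatory values, treated as fixed
    eps = rng.normal(0.0, sigma, size=n)    # independent N(0, sigma) deviations
    y = beta0 + beta1 * x + eps             # y_i = beta0 + beta1 * x_i + eps_i

    b1, b0 = np.polyfit(x, y, deg=1)        # least-squares estimates
    print(b0, b1)                           # near 350 and 0.6, up to sampling error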

Example: Wages and experience
Do wages rise with experience? In a study of employment trends, wage (y, in $/week) and length of service (LOS = x, in months) measurements were obtained from n = 59 workers in similar customer-service positions.

Wages  LOS   Wages  LOS   Wages  LOS   Wages  LOS   Wages  LOS   Wages  LOS
 389    94    403    76    443   222    486    60    547   228    443   104
 395    48    378    48    353    58    393     7    347    27    566    34
 329   102    348    61    349    41    311    22    328    48    461   184
 295    20    488    30    499   153    316    57    327     7    436   156
 377    60    391   108    322    16    384    78    320    74    321    25
 479    78    541    61    408    43    360    36    404   204    221    43
 315    45    312    10    393    96    369    83    443    24    547    36
 316    39    418    68    277    98    529    66    261    13    362    60
 324    20    417    54    649   150    270    47    417    30    415   102
 307    65    516    24    272   124    332    97    450    95

Example: Wages and experience (continued)
Summary statistics, least-squares regression line, and scatterplot:
[scatterplot of wages vs. LOS with the fitted line ŷ = 349.4 + 0.59x]
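
A sketch that refits the line from the data table above; with these 59 pairs it should reproduce the slide's intercept and slope (roughly 349.4 and 0.59):

    import numpy as np

    # (wage, LOS) pairs transcribed row by row from the data table above
    wages = np.array([389, 403, 443, 486, 547, 443, 395, 378, 353, 393, 347, 566,
                      329, 348, 349, 311, 328, 461, 295, 488, 499, 316, 327, 436,
                      377, 391, 322, 384, 320, 321, 479, 541, 408, 360, 404, 221,
                      315, 312, 393, 369, 443, 547, 316, 418, 277, 529, 261, 362,
                      324, 417, 649, 270, 417, 415, 307, 516, 272, 332, 450],
                     dtype=float)
    los = np.array([94, 76, 222, 60, 228, 104, 48, 48, 58, 7, 27, 34,
                    102, 61, 41, 22, 48, 184, 20, 30, 153, 57, 7, 156,
                    60, 108, 16, 78, 74, 25, 78, 61, 43, 36, 204, 43,
                    45, 10, 96, 83, 24, 36, 39, 68, 98, 66, 13, 60,
                    20, 54, 150, 47, 30, 102, 65, 24, 124, 97, 95],
                   dtype=float)

    b1, b0 = np.polyfit(los, wages, deg=1)   # least-squares fit of wage on LOS
    print(f"y-hat = {b0:.1f} + {b1:.2f} x")  # expect roughly 349.4 + 0.59 x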

Sampling framework
- Idea: each value of x defines a subpopulation
- Multiple, independent SRSs: each SRS is drawn from a distinct subpopulation
  - y = response variable = measurement of interest
  - x = explanatory variable = subpopulation and sample labels
- One SRS, with multiple measurements: measure (xi, yi) on the i-th individual, but treat the xi as fixed quantities
- The model describes the conditional distribution of y given its associated subpopulation

Comments on the statistical model
- yi = β0 + β1xi + εi with independent εi, each N(0, σ)
- Data = Fit + Residual
- Linearity: µy = β0 + β1x connects the subpopulation means
- Constant spread: σ does not depend on x
- Normality: response measurements are bell-shaped within each subpopulation

Residuals and the residual standard deviation
- Unknown population quantities:
  - The random variables εi are the residual deviations
  - The parameter σ is the residual standard deviation
- Analogous quantities calculated from the sample:
  - The i-th (sample) residual is ei = yi − ŷi
  - The regression standard error is s = √( Σei² / (n − 2) )

Properties of the slope estimate
Suppose (x1, y1), …, (xn, yn) satisfy the assumptions of the statistical model for SLR
- Mean: the sampling distribution of b1 is centered at β1 (b1 is unbiased)
- Standard deviation: σ_b1 = σ / √Σ(xi − x̄)²
- Standard error (σ estimated by s): SE_b1 = s / √Σ(xi − x̄)²

Some computational formulas
- Regression standard error: s = √( Σ(yi − ŷi)² / (n − 2) ), where Σ(yi − ŷi)² = Σyi² − b0Σyi − b1Σxiyi
- Standard error for the slope: SE_b1 = s / √( Σxi² − (Σxi)²/n ), using Σ(xi − x̄)² = Σxi² − (Σxi)²/n
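
A sketch of these calculations as a reusable helper (the demo arrays are made-up values; applied to the wage data above it should return roughly s = 82.2 and SE_b1 = 0.21, matching the next slide):

    import numpy as np

    def slr_se(x, y):
        """Regression standard error s and slope standard error SE_b1 for SLR."""
        n = len(x)
        b1, b0 = np.polyfit(x, y, deg=1)
        e = y - (b0 + b1 * x)                           # residuals
        s = np.sqrt(np.sum(e**2) / (n - 2))             # regression standard error
        se_b1 = s / np.sqrt(np.sum((x - x.mean())**2))  # standard error for slope
        return s, se_b1

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up demo values
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
    print(slr_se(x, y))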

Example: Wages and experience (continued)
- Regression standard error: s = 82.2
- Standard error for the slope: SE_b1 = 0.21

The t test and CI for the slope in SLR
- Assumptions: the statistical model for SLR
- Hypotheses: H0: β1 = 0 versus a one- or two-sided Ha
- Test statistic: t = b1 / SE_b1
- P-value, where T is t(n − 2):
  - P(T ≤ t) for Ha: β1 < 0
  - P(T ≥ t) for Ha: β1 > 0
  - 2P(T ≥ |t|) for Ha: β1 ≠ 0
- CI: for confidence level C, the interval is b1 ± t*·SE_b1, where t* is such that P(T ≥ t*) = (1 − C)/2

Example: Wages and experience (continued)
- Hypotheses: H0: β1 = 0 versus Ha: β1 > 0
- Summary statistics: b1 = 0.59, s = 82.2, and SE_b1 = 0.21
- Test statistic: t = b1 / SE_b1 = 0.59 / 0.21 ≈ 2.85 (from unrounded values)
- P-value: P(T ≥ 2.85) = 0.003, with k = n − 2 = 57 d.f.
- Decision: reject H0 at significance level α = 0.05, and conclude that wages rise with experience

Example: Wages and experience (continued)
How much do wages rise with experience?
- 95% CI: P(T ≥ 2.00) = 0.025, using k = n − 2 = 57 d.f.
- t* = 2.00, and the interval is b1 ± t*·SE_b1 = 0.59 ± (2.00)(0.21) = 0.59 ± 0.41 = (0.18, 1.00)
- Conclude an increase in weekly salary between $0.18 and $1.00 per month of service, on average
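
A sketch of the test and the CI with scipy.stats, starting from the rounded slide summaries (so the printed t is about 2.81 rather than the slides' 2.85, which uses unrounded values):

    import numpy as np
    from scipy import stats

    n = 59
    b1, se_b1 = 0.59, 0.21                 # rounded summaries from the slides

    t = b1 / se_b1                         # test statistic, ~2.81 here
    p = stats.t.sf(t, df=n - 2)            # P(T >= t) for Ha: beta1 > 0, ~0.003

    t_star = stats.t.ppf(0.975, df=n - 2)  # t* for 95% confidence, ~2.00
    ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
    print(t, p, ci)                        # CI ~ (0.17, 1.01); slides: (0.18, 1.00)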

Robustness
- A moderate lack of Normality may be tolerated (better for large n)
- Outliers or influential observations may be problematic
- Basic tool: residual plots
Example: Wages and experience (continued)
[residual plot for the wage data]

Connections to correlation
- One SRS, with multiple measurements: (xi, yi) are paired measurements from one SRS
- Idea: treat x as random and work with correlation
  - r is the sample correlation
  - ρ is the population correlation
- A test of H0: ρ = 0 may be carried out with calculations identical to those for a test of H0: β1 = 0, but the CI formulas for ρ and β1 are very different
- Different interpretations: correlation is for two-way relationships; regression is for one-way relationships
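
A sketch (with made-up demo values) showing that the usual t statistic for H0: ρ = 0, t = r·√(n − 2)/√(1 − r²), is the same number as the slope test statistic b1/SE_b1:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up demo values
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
    n = len(x)

    r = np.corrcoef(x, y)[0, 1]
    t_rho = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)  # t statistic for H0: rho = 0

    b1, b0 = np.polyfit(x, y, deg=1)                # slope test gives the same t
    e = y - (b0 + b1 * x)
    s = np.sqrt(np.sum(e**2) / (n - 2))
    se_b1 = s / np.sqrt(np.sum((x - x.mean())**2))
    print(t_rho, b1 / se_b1)                        # identical up to round-off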

Uncertainty in predicted values
Plugging x into ŷ = b0 + b1x provides a prediction of the response. Two possible interpretations:
- ŷ is an estimate of the subpopulation mean µy = β0 + β1x
- ŷ is a prediction of an unobserved response, y, from a subpopulation with mean µy = β0 + β1x
Note: there is more uncertainty in the second interpretation, since the target of inference is itself random

Confidence interval for µy
Suppose ŷ is to be an estimate of µy = β0 + β1x
- Standard error: SE_µ̂ = s·√( 1/n + (x − x̄)² / Σ(xi − x̄)² )
- CI: for confidence level C, the interval is ŷ ± t*·SE_µ̂, where t* is such that P(T ≥ t*) = (1 − C)/2, and T is t(n − 2)

Prediction interval for y
Suppose ŷ is to be a prediction of y from a subpopulation with mean µy = β0 + β1x
- Standard error: SE_ŷ = s·√( 1 + 1/n + (x − x̄)² / Σ(xi − x̄)² )
- PI: for confidence level C, the interval is ŷ ± t*·SE_ŷ, where t* is such that P(T ≥ t*) = (1 − C)/2, and T is t(n − 2)

Example: Wages and experience (continued)
What is the mean wage of the subpopulation of workers whose length of service is x = 125?
- Estimate of µy: ŷ = b0 + b1x = 349.4 + (0.59)(125) = 423.2
- Standard error: SE_µ̂ = s·√( 1/n + (125 − x̄)² / Σ(xi − x̄)² ) ≈ 15.7
- 95% CI: 423.2 ± (2.00)(15.7) ≈ (392, 455)

Example: Wages and experience (continued)
Suppose the length of service of some interesting worker is x = 125. What is his or her weekly wage?
- Prediction of y: ŷ = b0 + b1x = 423.2
- Standard error: SE_ŷ = s·√( 1 + 1/n + (125 − x̄)² / Σ(xi − x̄)² ) ≈ 83.7
- 95% PI: 423.2 ± (2.00)(83.7) ≈ (256, 591)
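
A sketch reproducing both intervals from the rounded slide summaries; here x̄ = 70.5 is computed from the data table, and Σ(xi − x̄)² is backed out from SE_b1 = s/√Σ(xi − x̄)², so the results are approximate:

    import numpy as np
    from scipy import stats

    n, b0, b1 = 59, 349.4, 0.59  # rounded summaries from the slides
    s, se_b1 = 82.2, 0.21
    xbar = 70.5                  # mean LOS, computed from the data table
    sxx = (s / se_b1) ** 2       # sum of (x_i - xbar)^2, backed out of SE_b1

    x_new = 125.0
    y_hat = b0 + b1 * x_new      # 423.2

    se_mu = s * np.sqrt(1/n + (x_new - xbar)**2 / sxx)     # SE for mean response
    se_y = s * np.sqrt(1 + 1/n + (x_new - xbar)**2 / sxx)  # SE for prediction
    t_star = stats.t.ppf(0.975, df=n - 2)                  # ~2.00

    print((y_hat - t_star * se_mu, y_hat + t_star * se_mu))  # 95% CI, ~ (392, 455)
    print((y_hat - t_star * se_y, y_hat + t_star * se_y))    # 95% PI, ~ (256, 591)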

Confidence and prediction bands
- Observe: PIs are less precise (wider) than CIs
- This reflects the greater uncertainty of the prediction problem

Decomposition of variation
Analysis of variance (ANOVA) equation: Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
- Total variation in y: Σ(yi − ȳ)² (= 0 if all yi are equal)
- Variation about the line: Σ(yi − ŷi)² (= 0 if all yi = ŷi)
- Variation along the line: Σ(ŷi − ȳ)² (= 0 if b1 = 0)

ANOVA setup
- Total variation in y: SST = Σ(yi − ȳ)², with n − 1 d.f.
- Variation along the line: SSM = Σ(ŷi − ȳ)², with 1 d.f. (regression)
- Variation about the line: SSE = Σ(yi − ŷi)², with n − 2 d.f. (residual)
Note: Total d.f. = Regression d.f. + Residual d.f., i.e., n − 1 = 1 + (n − 2)

Related calculations
- Coefficient of determination: the proportion of total variation accounted for by the regression line, r² = SSM/SST
  - This is an alternative formula for r² (and it generalizes to multiple regression)
- Mean squares: MSM = SSM/1 and MSE = SSE/(n − 2); note that MSE = s²

Example: Wages and experience (continued)
- Relevant summary statistics: n = 59, b1 = 0.59, s = 82.2, t = 2.85
- Mean square statistics: MSE = s² ≈ 6760; MSM = F × MSE ≈ 54,900 (using F = t² ≈ 8.12)
- Sums of squares statistics (SS = MS × d.f.): SSE ≈ 385,000, SSM ≈ 54,900, SST = SSM + SSE ≈ 440,000

Testing in ANOVA
- The ANOVA F statistic is F = MSM/MSE
- It may be used to test H0: β1 = 0 versus Ha: β1 ≠ 0, and its generalization to multiple regression
- Large values of F provide evidence against H0: β1 = 0
- In the SLR case, t² = F, and the P-value is 2P(T ≥ √F), where T is t(n − 2)
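
A sketch of the full ANOVA table for SLR (made-up demo values again), verifying the decomposition and the t² = F identity:

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up demo values
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
    n = len(x)

    b1, b0 = np.polyfit(x, y, deg=1)
    y_hat = b0 + b1 * x

    sst = np.sum((y - y.mean()) ** 2)      # total variation, n - 1 d.f.
    ssm = np.sum((y_hat - y.mean()) ** 2)  # along the line, 1 d.f.
    sse = np.sum((y - y_hat) ** 2)         # about the line, n - 2 d.f.
    assert np.isclose(sst, ssm + sse)      # ANOVA decomposition

    msm, mse = ssm / 1, sse / (n - 2)
    f_stat = msm / mse
    p = stats.f.sf(f_stat, 1, n - 2)       # P-value for H0: beta1 = 0

    se_b1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))
    print(f_stat, (b1 / se_b1) ** 2, p)    # t**2 equals F in SLR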