Lecture 3: Inference in SLR

STAT 512, Spring 2011
Background Reading: KNNL 2.1-2.6

Topic Overview
This topic will cover:
- Review of hypothesis testing
- Inference about $\beta_1$
- Inference about $\beta_0$
- Confidence intervals
- Prediction intervals

Review: Significance Tests
One-sample t-test: take a sample of size $n$ from some (normal) population and test
$H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$ using $t = \dfrac{\bar{Y} - \mu_0}{s\{\bar{Y}\}}$
Compare $t$ to a critical value from the Student's t distribution (Table B.2) with (typically) $\alpha = 0.05$.

Review: Significance Tests (2)
One-sample t-test: the test statistic can be turned into a confidence interval for $\mu$:
$\bar{Y} \pm t(1 - \alpha/2;\, n-1)\, s\{\bar{Y}\}$
Generally a confidence interval takes the form Point Est. ± Crit. Value × SE.
Two-sample t-test: compares the means of two samples.

Significance Levels
The significance level $\alpha$ is the probability of making a Type I error, that is, rejecting the null hypothesis when it is in fact true (false positive).
The most common significance level that we will use is $\alpha = 0.05$.
The corresponding confidence level is $1 - \alpha$. So for $\alpha = 0.05$ our confidence level will be 95%.

P-Values
The p-value for a test is the probability (under the null hypothesis) of observing a test statistic that is at least as extreme as the one that is actually observed.
We reject the null if P-value $< \alpha$.
Mathematically, the p-value is $\Pr_{H_0}(|T| > |t|)$, where $T \sim t_{n-1}$.
Graphically, the p-value is twice the area in the upper tail of the $t_{n-1}$ distribution (above the observed $|t|$).
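The tail-area definition above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: it integrates the Student's t density with Simpson's rule using only the standard library. The function names `t_pdf` and `two_sided_p_value` and the example inputs are illustrative assumptions; in practice SAS reports the p-value directly.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_obs, df, steps=2000):
    """Pr(|T| > |t_obs|) for T ~ t(df); steps must be even (Simpson's rule)."""
    a = abs(t_obs)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # Pr(-|t_obs| < T < |t_obs|), by symmetry
    return 1.0 - central_area

# Hypothetical observed statistic with n - 1 = 3 degrees of freedom.
p = two_sided_p_value(2.12, df=3)
print(f"two-sided p-value: {p:.4f}")
```

Because the t density is symmetric, subtracting the central area from 1 gives the same value as doubling the upper-tail area.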

Conclusions
"Conclude $H_a$" means there is sufficient evidence in the data to conclude that $H_0$ is false, and hence we can assume $H_a$ is true.
"Fail to reject $H_0$" means there is insufficient evidence in the data to decide between $H_0$ and $H_a$, so we default to assuming that $H_0$ is true.
Unless we are prepared to make a further justification (power), it is not appropriate to conclude $H_0$.

Power of a Test
The probability of a Type II error (failing to reject $H_0$ when $H_a$ is in fact true, a false negative) is often denoted $\beta$ (not to be confused with the regression coefficients).
The power of a test is $1 - \beta$. This is the probability that $H_0$ will be rejected given that $H_a$ is true.
Power calculations involve the non-central t-distribution (generally use a computer).

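$\beta_1$ Inference
Recall that
$b_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \dfrac{SS_{XY}}{SS_X}$
The $X_i$'s are constant and the $Y_i$'s are normally distributed. Using probability theory it can thus be shown (pages 42-43) that
$b_1 \sim \text{Normal}(\beta_1, \sigma^2\{b_1\})$, where $\sigma^2\{b_1\} = \dfrac{\sigma^2}{SS_X}$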

Test for $H_0: \beta_1 = 0$
As in the case of the one-sample t-test, we can develop the test statistic for testing $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$:
$t = \dfrac{b_1 - 0}{s\{b_1\}}$, where $s\{b_1\} = \sqrt{\dfrac{MSE}{SS_X}}$
This statistic has a t-distribution with $n-2$ degrees of freedom (not $n-1$, because we are also estimating $\beta_0$).

Test for $H_0: \beta_1 = 0$
Reject $H_0$ if $|t| \geq t_{crit}$, where $t_{crit} = t(1 - \alpha/2;\, n-2)$.
SAS will give us both the value of the t-statistic and the P-value. If the P-value is smaller than $\alpha$, reject $H_0$ in favor of $H_a: \beta_1 \neq 0$.
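The slope test can be traced by hand on a small data set. The sketch below is illustrative only: the five (X, Y) pairs are made up, the function name `slope_t_test` is hypothetical, and the critical value 3.182 is assumed to have been read from a t table for $t(0.975;\, 3)$. It is written in Python rather than SAS so that every step of the formula is visible.

```python
import math

def slope_t_test(x, y, t_crit):
    """Return (b1, s{b1}, t, reject) for H0: beta1 = 0 vs. Ha: beta1 != 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)               # n - 2 df: both b0 and b1 are estimated
    se_b1 = math.sqrt(mse / ss_x)
    t_stat = (b1 - 0) / se_b1
    return b1, se_b1, t_stat, abs(t_stat) >= t_crit

# Hypothetical toy data, n = 5, so df = 3 and t(0.975; 3) is about 3.182.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, t_stat, reject = slope_t_test(x, y, t_crit=3.182)
print(f"b1={b1:.3f}  s(b1)={se_b1:.4f}  t={t_stat:.3f}  reject H0: {reject}")
```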

Confidence Interval for $\beta_1$
The $100(1-\alpha)\%$ CI for $\beta_1$ is
$b_1 \pm t_{crit}\, s\{b_1\}$, where $t_{crit} = t(1 - \alpha/2;\, n-2)$.
In terms of hypothesis testing, if the CI does not contain 0, then we reject $H_0: \beta_1 = 0$ and conclude that $H_a: \beta_1 \neq 0$ is true.
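The interval and its link to the two-sided test can be sketched in a few lines of Python. The estimate, standard error, and critical value below are hypothetical numbers chosen for illustration (a slope of 0.6 with standard error 0.2828 and 3 degrees of freedom), and `t_interval` is an assumed helper name.

```python
def t_interval(estimate, std_error, t_crit):
    """Point Est. +/- Crit. Value * SE, returned as (lower, upper)."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

# Hypothetical values: b1 = 0.6, s{b1} = 0.2828, t(0.975; 3) about 3.182.
lower, upper = t_interval(estimate=0.6, std_error=0.2828, t_crit=3.182)
contains_zero = lower <= 0.0 <= upper

print(f"95% CI for beta1: ({lower:.4f}, {upper:.4f})")
# If 0 lies inside the interval, H0: beta1 = 0 is not rejected at level alpha.
print("reject H0" if not contains_zero else "fail to reject H0")
```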

Power
In cases where we fail to reject, it is important to know the power of the test for $H_0: \beta_1 = 0$.
There are two important questions we must answer before we can determine power:
1. What size difference is important?
2. Guess for the variance?
Note that power calculations should be done prior to collection of data if possible.

Power (2)
The power to detect a difference of size $d$ is calculated using the non-central t-distribution.
In addition to $\alpha$ and the degrees of freedom, we need the noncentrality parameter:
$\delta = \dfrac{|\beta_1|}{\sigma\{b_1\}}$, where $\sigma^2\{b_1\} = \dfrac{\sigma^2}{SS_X}$
Power for some values of $\alpha$ and $\delta$ can be looked up in Table B5. SAS also has a procedure for computing power (for any values).

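$\beta_0$ Inference
Similar to inference for $\beta_1$:
$b_0 \sim \text{Normal}(\beta_0, \sigma^2\{b_0\})$, where $\sigma^2\{b_0\} = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{\bar{X}^2}{SS_X} \right]$
To test $H_0: \beta_0 = k$:
$t = \dfrac{b_0 - k}{s\{b_0\}}$, where $s\{b_0\} = \sqrt{MSE \left[ \dfrac{1}{n} + \dfrac{\bar{X}^2}{SS_X} \right]}$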

Test for $H_0: \beta_0 = k$
The statistic has a t-distribution with $n-2$ degrees of freedom; compare it with the appropriate t-critical value.
SAS gives both the statistic and the p-value for testing $\beta_0 = 0$; to test $\beta_0 = k$, obtain and use a confidence interval.
The $100(1-\alpha)\%$ CI for $\beta_0$ is $b_0 \pm t_{crit}\, s\{b_0\}$.
Remember: if $X = 0$ is not within the scope of the model, inference may be meaningless!
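A short Python sketch of the intercept calculations follows. All inputs are hypothetical summary statistics (n = 5, $\bar{X} = 3$, $SS_X = 10$, MSE = 0.8, $b_0 = 2.2$), the critical value 3.182 is assumed from a t table for 3 degrees of freedom, and `intercept_inference` is an illustrative name rather than anything from the slides or from SAS.

```python
import math

def intercept_inference(n, x_bar, ss_x, mse, b0, k, t_crit):
    """Return (s{b0}, t statistic for H0: beta0 = k, CI for beta0)."""
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / ss_x))
    t_stat = (b0 - k) / se_b0
    ci = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    return se_b0, t_stat, ci

# Hypothetical summary statistics from a fit with n = 5 observations.
se_b0, t_stat, ci = intercept_inference(
    n=5, x_bar=3.0, ss_x=10.0, mse=0.8, b0=2.2, k=0.0, t_crit=3.182
)
print(f"s(b0)={se_b0:.4f}  t={t_stat:.3f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})")

# Testing beta0 = k for another k: reject when k falls outside the interval.
k = 6.0
print(f"reject H0: beta0 = {k}?", not (ci[0] <= k <= ci[1]))
```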

Robustness
In cases where the errors are not quite normal, the CIs and significance tests for $\beta_1$ and $\beta_0$ are still generally reasonable approximations.
We say that these tests are robust with respect to minor violations of the normality assumption.

SAS Coding
PROC REG data=diamonds;
  model price=weight / clb;
RUN;
The clb option in PROC REG requests the confidence limits for the regression coefficients.
You can also specify alpha=0.xxx to change the significance level (default = 0.05).

SAS Output
Variable     DF   Parameter Estimate   Std Error   t Value   Pr > |t|
Intercept     1             -259.625      17.318    -14.99     <.0001
weight        1             3721.024      81.785     45.50     <.0001

Variable     DF   95% Confidence Limits
Intercept     1   -294.48696   -224.76486
weight        1   3556.39841   3885.65129

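Summary of Inference
SLR model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\varepsilon_i \sim \text{Normal}(0, \sigma^2)$ are independent, random errors
$Y_i \sim \text{Normal}(\beta_0 + \beta_1 X_i, \sigma^2)$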

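Summary of Inference
Parameter estimates
For $\beta_1$: $b_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \dfrac{SS_{XY}}{SS_X}$
For $\beta_0$: $b_0 = \bar{Y} - b_1 \bar{X}$
For $\sigma^2$: $s^2 = \dfrac{SSE}{df_E} = \dfrac{\sum e_i^2}{n-2} = MSE$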

Summary of Inference
$100(1-\alpha)\%$ confidence intervals
$b_1 \pm t_{crit}\, s\{b_1\}$
$b_0 \pm t_{crit}\, s\{b_0\}$
where $t_{crit} = t(1 - \alpha/2;\, n-2)$.

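Summary of Inference
Significance tests
$H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$: $t = \dfrac{b_1 - 0}{s\{b_1\}} \sim t(n-2)$ under $H_0$
$H_0: \beta_0 = 0$ vs. $H_a: \beta_0 \neq 0$: $t = \dfrac{b_0 - 0}{s\{b_0\}} \sim t(n-2)$ under $H_0$
Reject $H_0$ if the P-value is small ($< \alpha$).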

CI for the Mean Response
The estimated mean response when $X = X_h$ is $\hat{Y}_h = b_0 + b_1 X_h$.
$\hat{Y}_h$ is a normal random variable (since the parameter estimates are linear combinations of the $Y_i$ and these are normal).
To develop a confidence interval we can obtain a formula for the standard error from the sampling distributions of $b_0$ and $b_1$.

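Standard Error
The variance associated to $\hat{Y}_h$ is
$\sigma^2\{\hat{Y}_h\} = \text{Var}(b_0 + b_1 X_h) = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{SS_X} \right]$
Substitute MSE for $\sigma^2$ to get the estimated variance. Take the square root to get $s\{\hat{Y}_h\}$.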

Confidence Interval for $E\{Y_h\}$
Recall: Point Est. ± Crit. Value × SE
Confidence limits are $\hat{Y}_h \pm t_{crit}\, s\{\hat{Y}_h\}$, where $t_{crit} = t(1 - \alpha/2;\, n-2)$.
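The steps for the mean-response interval can be followed on a toy data set. This Python sketch is an illustration under stated assumptions: the data are made up, `mean_response_ci` is a hypothetical helper name, and the critical value 3.182 is assumed from a t table for $t(0.975;\, 3)$. In SAS the same limits come from the clm option.

```python
import math

def mean_response_ci(x, y, x_h, t_crit):
    """Return (Y_hat_h, s{Y_hat_h}, (lower, upper)) for the mean response at x_h."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = b0 + b1 * x_h
    # Estimated variance: MSE substituted for sigma^2 in the formula above.
    se_mean = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / ss_x))
    return y_hat, se_mean, (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)

# Hypothetical toy data; df = n - 2 = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, se_mean, ci = mean_response_ci(x, y, x_h=4, t_crit=3.182)
print(f"Y_hat={y_hat:.2f}  s={se_mean:.4f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```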

Prediction Intervals
Predicting a new observation for $X = X_h$ is different from estimating the mean response in that there is additional variation associated to the normal curve that is centered at $E\{Y_h\}$.
Hence there are two components to $s\{\hat{Y}_{h(new)}\}$:
- Variance associated to the estimated mean response.
- Variance associated to the new observation.

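Prediction Intervals (2)
The variance associated to $\hat{Y}_{h(new)}$ is
$\sigma^2\{\hat{Y}_{h(new)}\} = \sigma^2\{\hat{Y}_h\} + \sigma^2 = \sigma^2 \left[ 1 + \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{SS_X} \right]$
As before, substitute MSE for $\sigma^2$ and take the square root to get $s\{\hat{Y}_{h(new)}\}$, or equivalently, $s\{pred\}$.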

Prediction Intervals (3)
The $100(1-\alpha)\%$ prediction interval for a new observation at $X = X_h$ is given by
$\hat{Y}_h \pm t_{crit}\, s\{pred\}$, where $t_{crit} = t(1 - \alpha/2;\, n-2)$.
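To see how much wider a prediction interval is than the interval for the mean response, the sketch below computes both standard errors from the same summary statistics. The inputs are hypothetical (n = 5, $\bar{X} = 3$, $SS_X = 10$, MSE = 0.8, $\hat{Y}_h = 4.6$ at $X_h = 4$), the critical value 3.182 is assumed from a t table, and `prediction_interval` is an illustrative name. In SAS the same limits come from the cli option.

```python
import math

def prediction_interval(n, x_bar, ss_x, mse, y_hat, x_h, t_crit):
    """Return (s{Y_hat_h}, s{pred}, (lower, upper)) for a new observation at x_h."""
    leverage = 1 / n + (x_h - x_bar) ** 2 / ss_x
    se_mean = math.sqrt(mse * leverage)        # mean-response component only
    se_pred = math.sqrt(mse * (1 + leverage))  # adds the new observation's variance
    return se_mean, se_pred, (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

# Hypothetical summary statistics; df = n - 2 = 3.
se_mean, se_pred, pi = prediction_interval(
    n=5, x_bar=3.0, ss_x=10.0, mse=0.8, y_hat=4.6, x_h=4.0, t_crit=3.182
)
print(f"s(mean)={se_mean:.4f}  s(pred)={se_pred:.4f}")
print(f"95% PI: ({pi[0]:.3f}, {pi[1]:.3f})")
```

Since $s^2\{pred\} = MSE + s^2\{\hat{Y}_h\}$, the prediction standard error can never fall below $\sqrt{MSE}$, however large the sample.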

CIs and PIs in SAS
PROC REG data=diamonds;
  model price=weight / clm cli;
RUN;
clm produces CIs for the mean response.
cli produces prediction intervals.
Intervals are produced for each data point, including those whose response value is missing.

SAS Output
Obs   Wt     Price    Predicted Value   Std Error Mean Predict   95% CL Mean
1     0.12   223.00   186.897           8.2768                   170.237   203.558
2     0.15   322.00   298.528           6.3833                   285.679   311.377
49    0.43   .        1340              19.033                   1302      1379

Obs   Wt     95% CL Predict          Residual
1     0.12   120.6754   253.1187     36.1029
2     0.15   233.1609   363.8947     23.4722
49    0.43   1266       1415         .

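Comparing Standard Errors
$s\{b_1\} = \sqrt{\dfrac{MSE}{SS_X}}$
$s\{b_0\} = \sqrt{MSE \left[ \dfrac{1}{n} + \dfrac{\bar{X}^2}{SS_X} \right]}$
$s\{\hat{Y}_h\} = \sqrt{MSE \left[ \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{SS_X} \right]}$
$s\{pred\} = \sqrt{MSE \left[ 1 + \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{SS_X} \right]}$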

Minimizing Standard Errors
Can sometimes design experiments to minimize standard errors:
- Increase the sample size.
- Increase $SS_X$ by spreading out the values of the predictor variable.
- Arrange for the predictor value of interest to be $X_h = \bar{X}$.
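The effect of spreading out the predictor values can be illustrated with two hypothetical designs of the same size. In this Python sketch the designs, the assumed error variance $\sigma^2 = 1$, and the function name `slope_sd` are all made up for illustration; the calculation is just $\sigma\{b_1\} = \sqrt{\sigma^2 / SS_X}$.

```python
import math

def slope_sd(x, sigma2):
    """Standard deviation of b1 for design points x: sqrt(sigma^2 / SS_X)."""
    x_bar = sum(x) / len(x)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    return math.sqrt(sigma2 / ss_x)

# Two hypothetical designs with n = 4 and the same mean of X (5).
clustered = [4, 5, 5, 6]   # SS_X = 2
spread = [1, 4, 6, 9]      # SS_X = 34

sd_clustered = slope_sd(clustered, sigma2=1.0)
sd_spread = slope_sd(spread, sigma2=1.0)
print(f"clustered: {sd_clustered:.4f}   spread: {sd_spread:.4f}")
```

With the same number of observations, the spread-out design gives a much smaller standard deviation for the slope estimate because $SS_X$ is larger.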

Upcoming in Lecture 4...
We will look at one more example illustrating the use of SAS. We'll discuss the Working-Hotelling Confidence Band (2.6), details of the ANOVA table (2.7-2.9), and clean up a few details in 2.10.