SAS Procedures: Inference about the Line


Simple Linear Regression
Applied Regression and Other Multivariable Methods
Sections 5-7 - 5-11, 12-1 - 12-4
5

Inference About the Slope
• As with all estimates, $\hat\beta_1$ subject to sampling var
• Because $Y|X \sim$ Normal, the estimate $\hat\beta_1 \sim$ Normal
  - A linear combination of indep Normals is Normal
  - Thm: If $Y_i \sim N(\mu_i, \sigma_i)$, then $L = \sum c_i Y_i \sim N\!\left(\sum c_i \mu_i,\ \sqrt{\sum c_i^2 \sigma_i^2}\right)$
  - Can write $\hat\beta_1$ as linear combination of E's
• Standard error of $\hat\beta_1$: $S_{\hat\beta_1} = \frac{S_{Y|X}}{S_X \sqrt{n-1}}$
• Use std error to form CI or test hypothesis
• Degrees of freedom $n-2$
  - Confidence Int: $\hat\beta_1 \pm t_{n-2,\alpha/2}\, S_{\hat\beta_1}$
  - Hypothesis Test: $T = (\hat\beta_1 - \beta_1^0)/S_{\hat\beta_1}$
5-1

Inference About the Slope
• Want small $S_{\hat\beta_1}$ for inference
• In situation where X is under experimental control
  - If $S_X$ made large → small $S_{\hat\beta_1}$
  - Can increase $S_X$ by increasing dispersion of X
  - Can also increase n to decrease $S_{\hat\beta_1}$, increase df
• Test $H_0$: $\beta_1 = 0$ to see if linear association
  - Does X help explain Y through a linear model?
  - Rejecting does not mean linear model is "best"
  - Not rejecting doesn't mean X unimportant
  - See page 56 for examples
5-2

Inference About the Intercept
• Sometimes interested in intercept $\beta_0$
• Standard error of $\hat\beta_0$: $S_{\hat\beta_0} = S_{Y|X}\sqrt{\frac{1}{n} + \frac{\bar{X}^2}{(n-1)S_X^2}}$
• In situation where X is under experimental control
  - If $S_X$ made large → small $S_{\hat\beta_0}$
  - Increase $S_X$ by increasing dispersion
  - If $\bar{X}$ close to zero → small $S_{\hat\beta_0}$
• Can also increase n to decrease $S_{\hat\beta_0}$, increase df
• Test $H_0$: $\beta_0 = 0$ to see if line goes through origin
  - Does not test linear model fit
  - Really only meaningful if X around zero
5-3
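The slope inference above maps directly onto proc reg output. A minimal sketch, assuming the blood pressure data used on the following pages are in a data set named bp with variables sbp and age (the data set name is an assumption; the slides never show the data step):

    /* Sketch: CI and tests for the slope, assuming a data set named bp (hypothetical) */
    proc reg data=bp;
       model sbp=age / clb alpha=0.05;   /* clb adds 95% confidence limits for beta0 and beta1 */
       test age = 1;                     /* F test of H0: beta1 = 1 (default output tests beta1 = 0) */
    run;
    quit;

The t test in the Parameter Estimates table is the default test of $H_0$: $\beta_1 = 0$; the TEST statement handles other null values, and in simple linear regression its F statistic is the square of the corresponding t statistic.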

SAS Procedures
• model statement in proc reg has many options
• To construct confidence intervals use alpha=, clm, cli, clb

model sbp=age / clb alpha=.01;   /* Form 99% CI for parameters */

Dependent Variable: sbp

                        Analysis of Variance
                                Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F
Model                1     6394.02269   6394.02269      21.33    <.0001
Error               28     8393.44398    299.76586
Corrected Total     29          14787

Root MSE           17.31375    R-Square    0.4324
Dependent Mean    142.53333    Adj R-Sq    0.4121
Coeff Var          12.14716

                        Parameter Estimates
                      Parameter      Standard
Variable     DF        Estimate         Error    t Value    Pr > |t|
Intercept     1        98.71472      10.00047       9.87      <.0001
age           1         0.97087       0.21022       4.62      <.0001

                        Parameter Estimates
Variable     DF      99% Confidence Limits
Intercept     1      71.08080    126.34863
age           1       0.38999      1.55175
5-4

Inference about the Line
• Line describes the mean population response for X
• Predicted mean at $X = X_0$ is $\hat\mu_{Y|X_0} = \hat\beta_0 + \hat\beta_1 X_0$
• Standard error of $\hat\mu_{Y|X_0}$ is $S_{Y|X}\sqrt{\frac{1}{n} + \frac{(X_0-\bar{X})^2}{(n-1)S_X^2}}$
• New predicted observation at $X = X_0$ is $\hat{Y}_{X_0} = \hat\beta_0 + \hat\beta_1 X_0$
• Standard error of $\hat{Y}_{X_0}$ is $S_{Y|X}\sqrt{1 + \frac{1}{n} + \frac{(X_0-\bar{X})^2}{(n-1)S_X^2}}$
• New obs doesn't have to fall on line → bigger var
• Recall $Y_{X_0} = \mu_{Y|X_0} + E$ and $\mathrm{Var}(E) = S_{Y|X}^2$
5-5

Interpolation vs Extrapolation
• Must use caution in interpretation of $\hat{Y}_{X_0}$, $\hat\mu_{Y|X_0}$
• If $X_0$ within range of observed X's → interpolation
• If $X_0$ outside range of observed X's → extrapolation
• Extrapolation should be avoided
  - No assurances still linear outside range of data
  - Example: Fish activity and Water Temp
• Can also construct confidence/prediction bands
• Prediction bands wider than confidence bands
• Most narrow at $\bar{X}$
5-6

SAS Procedures

model sbp=age / cli clm;    /* Confidence int for i=indiv m=mean */
plot sbp*age / conf pred;   /* Create plot with conf and pred bands */

Dependent Variable: sbp

                        Output Statistics
         Dep Var    Predicted     Std Error
Obs          sbp        Value  Mean Predict         95% CL Mean
  1     114.0000     115.2195        6.7058    101.4832   128.9558
  2     124.0000     117.1613        6.3382    104.1781   130.1444
  3     116.0000     118.1321        6.1568    105.5204   130.7439
  4     120.0000     119.1030        5.9774    106.8588   131.3472
  5     125.0000     122.9865        5.2825    112.1657   133.8072
  6     130.0000     126.8700        4.6362    117.3731   136.3668
  7     110.0000     131.7243        3.9332    123.6676   139.7810
  8     136.0000     133.6661        3.6984    126.0901   141.2420
  9     144.0000     136.5787        3.4139    129.5857   143.5717
 10     120.0000     136.5787        3.4139    129.5857   143.5717
 11     124.0000     139.4913        3.2289    132.8771   146.1055
 12     128.0000     139.4913        3.2289    132.8771   146.1055
 13     160.0000     141.4330        3.1700    134.9395   147.9265
 14     138.0000     142.4039        3.1612    135.9285   148.8792
 15     135.0000     142.4039        3.1612    135.9285   148.8792
 16     142.0000     143.3748        3.1663    136.8889   149.8606
 17     220.0000     144.3456        3.1853    137.8208   150.8704
 18     145.0000     144.3456        3.1853    137.8208   150.8704
 19     130.0000     145.3165        3.2180    138.7248   151.9082
 20     142.0000     147.2582        3.3225    140.4525   154.0640
 21     158.0000     150.1708        3.5675    142.8632   157.4785
 22     154.0000     153.0835        3.9001    145.0946   161.0724
 23     150.0000     153.0835        3.9001    145.0946   161.0724
 24     140.0000     155.9961        4.2999    147.1881   164.8041
5-7
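One way to obtain $\hat\mu_{Y|X_0}$ and $\hat{Y}_{X_0}$ with their clm/cli limits at a particular $X_0$ is to append a row whose response is missing; proc reg computes its predicted value and intervals but does not use it in estimation. A minimal sketch, again assuming the hypothetical data set bp with variables sbp and age, and an arbitrary $X_0$ of age 55:

    /* Sketch: prediction at a new X0 by adding a row with a missing response */
    data pred_at_55;
       age = 55;
       sbp = .;    /* missing response: row is excluded from the fit but still gets a prediction */
    run;

    data bp_plus;
       set bp pred_at_55;
    run;

    proc reg data=bp_plus;
       model sbp=age / clm cli;   /* clm: CI for the mean response; cli: prediction interval */
    run;
    quit;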

                Output Statistics
Obs        95% CL Predict          Residual
  1      77.1867   153.2523         -1.2195
  2      79.3939   154.9286          6.8387
  3      80.4909   155.7734         -2.1321
  4      81.5833   156.6227          0.8970
  5      85.9069   160.0661          2.0135
  6      90.1549   163.5851          3.1300
  7      95.3551   168.0935        -21.7243
  8      97.4003   169.9318          2.3339
  9     100.4302   172.7271          7.4213
 10     100.4302   172.7271        -16.5787
 11     103.4142   175.5684        -15.4913
 12     103.4142   175.5684        -11.4913
 13     105.3779   177.4882         18.5670
 14     106.3520   178.4558         -4.4039
 15     106.3520   178.4558         -7.4039
 16     107.3210   179.4285         -1.3748
 17     108.2848   180.4064         75.6544
 18     108.2848   180.4064          0.6544
 19     109.2435   181.3895        -15.3165
 20     111.1455   183.3709         -5.2582
 21     113.9602   186.3815          7.8292
 22     116.7292   189.4377          0.9165
 23     116.7292   189.4377         -3.0835
 24     119.4531   192.5391        -15.9961
 25     123.0159   196.7432        -15.8796
 26     123.8945   197.8063          1.1496
 27     124.7684   198.8742          0.1787
 28     126.5018   201.0242          6.2370
 29     126.5018   201.0242         -5.7630
 30     128.2167   203.1929          9.2952

Sum of Residuals                          0
Sum of Squared Residuals         8393.44398
Predicted Residual SS (PRESS)    9108.39568
5-8

[Page 5-9: plot of sbp vs age with confidence and prediction bands (graphics output not captured in the transcription)]

Diagnostics
• Will study more procedures throughout semester
• These focus on simple linear regression
• Assumptions
  1. Model is correct (linearity)
  2. Independent observations
  3. Errors normally distributed
  4. Constant variance
• $Y_i = \hat\mu_{Y_i|X_i} + (Y_i - \hat\mu_{Y_i|X_i})$
  $Y_i = \hat{Y}_i + \hat{E}_i$
  observed = predicted + residual
• Diagnostics will use predicted and residual values
5-10

Regression Diagnostics
• Normality
  - Histogram/Boxplot of residuals
  - Normal probability plot / QQ plot
  - Shapiro-Wilk / Kolmogorov-Smirnov test
• Variance
  - Plot $\hat{E}_i$ vs $\hat{Y}_i$ (residual plot)
  - Bartlett's or Levene's test (provided repeat $X_i$ obs)
• Independence
  - Plot $\hat{E}_i$ vs time/space
  - Runs test / Durbin-Watson test
• Outliers
  - Is it influential? With and without analysis
  - Formal tests (e.g. standardized residuals)
  - Investigate why result may occur, don't try to eliminate
5-11
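The predicted and residual values these diagnostics rely on can be saved with an output statement, and recent SAS releases will draw the standard diagnostic panel automatically through ODS Graphics. A minimal sketch, assuming the same hypothetical data set bp:

    /* Sketch: save fit diagnostics and request the ODS Graphics diagnostics panel */
    ods graphics on;
    proc reg data=bp plots=diagnostics;
       model sbp=age;
       output out=diag p=pred r=res student=stdres;   /* predicted values, raw and studentized residuals */
    run;
    quit;
    ods graphics off;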

Normality Assumption
• Histogram/Boxplot
  - Is histogram of residuals bell-shaped?
  - Is boxplot/histogram symmetric?
• Normal Probability/QQ Plot
  - Ordered residuals vs cumulative normal probs
  - Is it approximately linear?

Constant Variance
• Often experiments with non-constant variance
• Size of residual associated with predicted value
• Residual plot
  - Plot $\hat{E}_i$ vs $\hat{Y}_i$
  - Is the range constant for different levels of $\hat{Y}_i$?
• Bartlett's and Levene's Test
  - More formal test
  - Compares pooled var with sample variances
  - Bartlett sensitive to Normality assumption
5-12

Independence
• Plot of the residuals over time
  - Is there a drift or pattern as trials proceed?
• Plot residuals versus relevant variables
  - Often variables omitted from analysis
  - Experimental conditions (e.g., temp)
  - May result in inclusion of factor in next exp
• Durbin-Watson or Runs Test
  - DW model statement option
  - Assumes observations presented in time order
  - Runs tests look at number of pos/neg residuals in a row
5-13

SAS Procedures

model sbp=age;
plot r.*nqq.;                        /* Generate QQ Plot */
plot r.*p.;                          /* Generate residual Plot */
output out=fit r=res p=pred;
proc gplot;                          /* Generate Residual Plot */
   plot res*pred / vref=0 frame;
proc univariate normal pctdef=4;     /* Check Normality of Residuals */
   var res;
   histogram res / normal kernel (L=2);
   qqplot res / normal (L=1 mu=est sigma=est);
5-14

The UNIVARIATE Procedure
Variable: res (Residual)

                        Moments
N                          30   Sum Weights                 30
Mean                        0   Sum Observations             0
Std Deviation       17.012616   Variance            289.429103
Skewness           3.06973017   Kurtosis            13.6058151
Uncorrected SS     8393.44398   Corrected SS        8393.44398
Coeff Variation             .   Std Error Mean      3.10606451

            Basic Statistical Measures
    Location                  Variability
Mean      0.00000    Std Deviation          17.01262
Median   -0.52040    Variance              289.42910
Mode            .    Range                  97.37869
                     Interquartile Range    12.33250

                  Tests for Normality
Test                  --Statistic---    -----p Value------
Shapiro-Wilk          W      0.699983   Pr < W      <0.0001
Kolmogorov-Smirnov    D      0.225738   Pr > D      <0.0100
Cramer-von Mises      W-Sq   0.338205   Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq   2.140256   Pr > A-Sq   <0.0050

      Quantiles (Definition 4)
Quantile            Estimate
100% Max           75.654375
99%                75.654375
95%                44.256311
90%                 9.148620
75% Q3              3.906773
50% Median         -0.520403
25% Q1             -8.425731
10%               -15.984417
5%                -18.894204
1%                -21.724310
0% Min            -21.724310
5-15
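The Durbin-Watson statistic mentioned on page 5-13 is requested with the dw option on the model statement. The sketch below assumes the hypothetical data set bp and that its rows are already in time order, as the test requires:

    /* Sketch: Durbin-Watson check for serial correlation (rows assumed to be in time order) */
    proc reg data=bp;
       model sbp=age / dw;   /* prints the Durbin-Watson D statistic and first-order autocorrelation */
    run;
    quit;

proc reg reports the D statistic and the estimated first-order autocorrelation but not a p-value; significance is usually judged from Durbin-Watson bounds tables or, for example, from proc autoreg.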

                 Extreme Observations
------Lowest------          ------Highest-----
   Value      Obs              Value      Obs
-21.7243       14            7.42134        1
-16.5787       23            7.82915       26
-15.9961       13            9.29523       30
-15.8796       27           18.56699       25
-15.4913        8           75.65438        2

        Goodness-of-Fit Tests for Normal Distribution
Test                  ---Statistic----    -----p Value-----
Kolmogorov-Smirnov    D     0.22573825    Pr > D      <0.010
Cramer-von Mises      W-Sq  0.33820476    Pr > W-Sq   <0.005
Anderson-Darling      A-Sq  2.14025586    Pr > A-Sq   <0.005

       Quantiles for Normal Distribution
               -------Quantile-------
Percent          Observed    Estimated
   1.0          -21.72431   -39.577263
   5.0          -18.89420   -27.983263
  10.0          -15.98442   -21.802545
  25.0           -8.42573   -11.474835
  50.0           -0.52040    -0.000000
  75.0            3.90677    11.474835
  90.0            9.14862    21.802545
  95.0           44.25631    27.983263
  99.0           75.65438    39.577263
5-16

[Pages 5-17 through 5-19: graphics output (histogram of residuals, QQ plot, residual plot) not captured in the transcription]
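The normality tests above are driven largely by the single huge residual (75.65, the subject with sbp = 220). A formal way to flag such points, in the spirit of the standardized-residual checks on page 5-11, is to save studentized residuals and list the large ones. A minimal sketch with the hypothetical data set bp and a rough cutoff of 2:

    /* Sketch: flag potential outliers using externally studentized residuals */
    proc reg data=bp;
       model sbp=age;
       output out=diag p=pred r=res rstudent=rstud;   /* rstudent = externally studentized residual */
    run;
    quit;

    proc print data=diag;
       where abs(rstud) > 2;      /* rough cutoff; the sbp = 220 subject should appear here */
       var sbp age pred res rstud;
    run;

A with-and-without refit, as suggested on page 5-11, then shows how much the flagged observation actually changes the estimated line.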