Stat 302 Statistical Software and Its Applications SAS: Simple Linear Regression


Fritz Scholz, Department of Statistics, University of Washington
Winter Quarter 2015, February 16, 2015

The Spirit of St. Louis

data spirit;
  infile "U:\data\SpiritStLouis.csv" dsd firstobs=2;
  input gas weight headwind TO_distance;
run;

title "Spirit of St. Louis Takeoff Distance";
proc print data = spirit;
run;

title "Scatter Plot with Regression Line";
proc sgplot data = spirit;
  reg y = weight x = to_distance;
run;

title "Correlation";
proc corr data = spirit;
  var weight TO_distance;
run;

title "Simple Linear regression";
proc reg data = spirit;
  model weight = TO_distance;
run;

The Output: Page 1

Spirit of St. Louis Takeoff Distance

Obs  gas  weight  headwind  TO_distance
  1   36    2600         7          229
  2   71    2800         9          287
  3  111    3050         9          389
  4  151    3300         6          483
  5  201    3600         4          615
  6  251    3900         2          800
  7  301    4200         0         1023

The Output: Page 2 of 7 (scatter plot with regression line; figure not reproduced)

The Output: Page 3 of 7

Correlation

The CORR Procedure

2 Variables: weight TO_distance

Simple Statistics

Variable     N       Mean    Std Dev    Sum    Minimum  Maximum
weight       7       3350  583.80933  23450       2600     4200
TO_distance  7  546.57143  286.64488   3826  229.00000     1023

Pearson Correlation Coefficients, N = 7
Prob > |r| under H0: Rho=0

              weight  TO_distance
weight       1.00000      0.98882
                           <.0001
TO_distance  0.98882      1.00000
              <.0001

The Output: Page 4

Simple Linear regression

The REG Procedure
Model: MODEL1
Dependent Variable: weight

Number of Observations Read  7
Number of Observations Used  7

Analysis of Variance

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             1         1999529      1999529   219.87  <.0001
Error             5           45471   9094.24340
Corrected Total   6         2045000

Root MSE          95.36374   R-Square  0.9778
Dependent Mean  3350.00000   Adj R-Sq  0.9733
Coeff Var          2.84668

Parameter Estimates

Variable     DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept     1          2249.24429        82.52306    27.26    <.0001
TO_distance   1             2.01393         0.13582    14.83    <.0001

The Output: Page 5 of 7. Simple Linear regression, The REG Procedure, Model: MODEL1, Dependent Variable: weight (graphical output; figure not reproduced)

The Output: Page 6 of 7 (graphical output; figure not reproduced)
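The fitted line can also be used for interval estimates. The following is a minimal sketch, not part of the original slides: it re-enters the seven observations inline so that it runs without the CSV file, and it adds the MODEL statement options clb, clm and cli, which request confidence limits for the coefficients, for the mean response, and for individual predictions. The 95% level set by alpha=0.05 is an assumption made for illustration.

```sas
/* Sketch only: the data are copied from the proc print listing above. */
data spirit;
  input gas weight headwind TO_distance;
datalines;
36 2600 7 229
71 2800 9 287
111 3050 9 389
151 3300 6 483
201 3600 4 615
251 3900 2 800
301 4200 0 1023
;
run;

title "Simple Linear regression with interval estimates";
proc reg data = spirit;
  /* clb: limits for the coefficients; clm: limits for the mean response; */
  /* cli: limits for an individual prediction; alpha=0.05 gives 95% limits */
  model weight = TO_distance / clb clm cli alpha=0.05;
run;
quit;
```

The intervals are reported for each observation in the data set, so with only seven points the extra output stays short.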

The Spirit of St. Louis (Using Log Transforms)

data spirit;
  infile "U:\data\SpiritStLouis.csv" dsd firstobs=2;
  input gas weight headwind TO_distance;
  TO_DistL10 = log10(to_distance);
  weightl10 = log10(weight);
run;

title "Spirit of St. Louis Takeoff Distance L10";
proc print data = spirit;
run;

title "Scatter Plot with Regression Line L10";
proc sgplot data = spirit;
  reg y = weightl10 x = TO_distL10;
run;

title "Correlation L10";
proc corr data = spirit;
  var weightl10 TO_distL10;
run;

title "Simple Linear regression L10";
proc reg data = spirit;
  model weightl10 = TO_distL10;
run;

The Output: Page 1

Spirit of St. Louis Takeoff Distance L10

Obs  gas  weight  headwind  TO_distance  TO_DistL10  weightl10
  1   36    2600         7          229     2.35984    3.41497
  2   71    2800         9          287     2.45788    3.44716
  3  111    3050         9          389     2.58995    3.48430
  4  151    3300         6          483     2.68395    3.51851
  5  201    3600         4          615     2.78888    3.55630
  6  251    3900         2          800     2.90309    3.59106
  7  301    4200         0         1023     3.00988    3.62325

The Output: Page 2 of 7 (scatter plot with regression line on log10 values; figure not reproduced)

The Output: Page 3 of 7

Correlation L10

The CORR Procedure

2 Variables: weightl10 TO_DistL10

Simple Statistics

Variable    N     Mean  Std Dev       Sum  Minimum  Maximum
weightl10   7  3.51937  0.07598  24.63556  3.41497  3.62325
TO_DistL10  7  2.68478  0.23461  18.79345  2.35984  3.00988

Pearson Correlation Coefficients, N = 7
Prob > |r| under H0: Rho=0

            weightl10  TO_DistL10
weightl10     1.00000     0.99949
                           <.0001
TO_DistL10    0.99949     1.00000
               <.0001

The Output: Page 4

Simple Linear regression L10

The REG Procedure
Model: MODEL1
Dependent Variable: weightl10

Number of Observations Read  7
Number of Observations Used  7

Analysis of Variance

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             1         0.03460      0.03460  4945.51  <.0001
Error             5      0.00003499   0.00000700
Corrected Total   6         0.03464

Root MSE        0.00265   R-Square  0.9990
Dependent Mean  3.51937   Adj R-Sq  0.9988
Coeff Var       0.07516

Parameter Estimates

Variable    DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept    1             2.65030         0.01240   213.76    <.0001
TO_DistL10   1             0.32370         0.00460    70.32    <.0001

The Output: Page 5 of 7. Simple Linear regression L10, The REG Procedure, Model: MODEL1, Dependent Variable: weightl10 (graphical output; figure not reproduced)

The Output: Page 6 of 7 (graphical output; figure not reproduced)
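The log10-log10 fit corresponds to a power law in the original units: weight is approximately 10**2.65030 times TO_distance**0.32370. The following sketch, which is an illustration rather than part of the original slides, back-transforms the fitted values so they can be compared with the observed weights. The coefficients are copied from the Parameter Estimates table above; the data set name spirit_fit and the variables fit_l10, fit_weight and resid are names chosen for this example.

```sas
/* Sketch only: coefficients are typed in from the regression output. */
data spirit_fit;
  input weight TO_distance;
  fit_l10    = 2.65030 + 0.32370*log10(TO_distance); /* fitted log10(weight)    */
  fit_weight = 10**fit_l10;                          /* back to original units  */
  resid      = weight - fit_weight;                  /* residual in weight units */
datalines;
2600 229
2800 287
3050 389
3300 483
3600 615
3900 800
4200 1023
;
run;

title "Back-transformed fit from the log10-log10 regression";
proc print data = spirit_fit noobs;
  var TO_distance weight fit_weight resid;
run;
```

Because the coefficients are rounded to five decimals here, the fitted values may differ slightly from those SAS would compute from the full-precision estimates.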

Comments

The unfortunate aspect of this second analysis, based on log transforms, is that the scatter plots show the log-value scales and not the properly scaled original units.

I was able to make a plot in original units, indicated on a distorted (log-transformed) scale, but I was not able to easily add a fitted line to it. In R this is easy via abline.

A linked article, "adding a line to a scatter plot", showed how to add a sloped line to a scatter plot, but that approach does not work for the log-scale version produced by the code on the next slide. That it took some effort to add such a sloped line in SAS 9.2 is telling in itself, and the link seems to have since disappeared.

My solution seems klutzy, but it did the trick.

The Spirit of St. Louis (Using Log Transforms)

data spirit;
  infile "U:\data\SpiritStLouis.csv" dsd firstobs=2;
  input gas weight headwind TO_distance;
  TO_DistL10 = log10(to_distance);
  weightl10 = log10(weight);
run;

title "Scatter Plot with Log Scale";
proc sgplot data = spirit;
  scatter y = weight x = TO_distance;
  yaxis type = log logstyle = logexpand logbase = 10 min = 2000 max = 6000;
  xaxis type = log logstyle = logexpand logbase = 10 min = 100 max = 3000;
  lineparm x = 500 y = 3 slope = 1.5; * does not work;
run;

The Output from Code: Page 1 of 1 (scatter plot with log scale; figure not reproduced)

The Spirit of St. Louis (Using Log Transforms)

data spirit;
  input gas weight headwind TO_distance x1;
  TO_DistL10 = log10(to_distance);
  weightl10 = log10(weight);
  LS_line = 10**2.6503023 * x1**0.3237002;
datalines;
36 2600 7 229 .
71 2800 9 287 .
111 3050 9 389 .
151 3300 6 483 .
201 3600 4 615 .
251 3900 2 800 .
301 4200 0 1023 .
. . . . 100
. . . . 3000
;
run;

The Spirit of St. Louis (Using Log Transforms) (continued)

title "Spirit of St. Louis Takeoff Distance L10";
proc print data = spirit;
run;

title "Log10-Log10 Scatter Plot with Regression Line";
proc sgplot data = spirit;
  scatter y = weight x = TO_distance;
  yaxis type = log logstyle = logexpand logbase = 10 min = 2000 max = 6000;
  xaxis type = log logstyle = logexpand logbase = 10 min = 100 max = 3000;
  series x = x1 y = LS_line; * this connects points;
run;

The Output from Code: Page 1 of 2

Spirit of St. Louis Takeoff Distance L10

Obs  gas  weight  headwind  TO_distance    x1  TO_DistL10  weightl10  LS_line
  1   36    2600         7          229     .     2.35984    3.41497        .
  2   71    2800         9          287     .     2.45788    3.44716        .
  3  111    3050         9          389     .     2.58995    3.48430        .
  4  151    3300         6          483     .     2.68395    3.51851        .
  5  201    3600         4          615     .     2.78888    3.55630        .
  6  251    3900         2          800     .     2.90309    3.59106        .
  7  301    4200         0         1023     .     3.00988    3.62325        .
  8    .       .         .            .   100           .          .  1984.74
  9    .       .         .            .  3000           .          .  5968.25

The Output from Code: Page 2 of 2 (log10-log10 scatter plot with regression line; figure not reproduced)
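The approach above types the fitted coefficients into the data step by hand. As a hedged alternative, the sketch below reads them from the OUTEST= data set that proc reg can write, builds the two end points of the line from them, and appends those points to the data before plotting. This is an illustration only, not the author's method: the data set names est, line and plotdata and the variables b0 and b1 are assumptions made for this example, and the data are entered inline so the code runs without the CSV file.

```sas
/* Sketch only: est, line, plotdata, b0 and b1 are illustrative names. */
data spirit;
  input gas weight headwind TO_distance;
  TO_DistL10 = log10(TO_distance);
  weightl10  = log10(weight);
datalines;
36 2600 7 229
71 2800 9 287
111 3050 9 389
151 3300 6 483
201 3600 4 615
251 3900 2 800
301 4200 0 1023
;
run;

/* Store the parameter estimates in a data set instead of printing them. */
proc reg data = spirit outest = est noprint;
  model weightl10 = TO_DistL10;
run;
quit;

/* Two points suffice: a power law is a straight line on log-log axes. */
data line;
  set est(keep = Intercept TO_DistL10
          rename = (Intercept = b0 TO_DistL10 = b1));
  do x1 = 100, 3000;
    LS_line = 10**b0 * x1**b1;
    output;
  end;
  keep x1 LS_line;
run;

/* Stack observations and line end points; unmatched variables are missing. */
data plotdata;
  set spirit line;
run;

title "Log10-Log10 Scatter Plot with Regression Line";
proc sgplot data = plotdata;
  scatter y = weight x = TO_distance;
  series x = x1 y = LS_line;
  yaxis type = log logstyle = logexpand logbase = 10 min = 2000 max = 6000;
  xaxis type = log logstyle = logexpand logbase = 10 min = 100 max = 3000;
run;
```

The design choice here is only to avoid retyping the estimates; the plotting statements are the same as in the hand-coded version.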

Multiple Regression with Timing Data

data multi;
  do N = 10000 to 50000 by 10000;
    N2 = N*N;
    fit = .068 - .00000811*N + 1.228571E-9*N2;
    input time @;
    output;
  end;
datalines;
0.11 0.39 0.95 1.69 2.74
;
run;

title "Concatenation Data: x <- c(x,i)";
proc print data = multi noobs;
  var time N fit;
run;

title "Quadratic Model Fit";
proc reg data = multi;
  model time = N N2;
run;

title "Quadratic Fit";
proc gplot data = multi;
  plot fit*N;
  symbol value = dot interpol = sms line = 1 width = 2;
  plot2 time*N; * this adds to a plot;
run;

The Output from Code: Page 1 of 5

Concatenation Data: x <- c(x,i)

time      N      fit
0.11  10000  0.10976
0.39  20000  0.39723
0.95  30000  0.93041
1.69  40000  1.70931
2.74  50000  2.73393

The Output from Code: Page 2 of 5

Quadratic Model Fit

The REG Procedure
Model: MODEL1
Dependent Variable: time

Number of Observations Read  5
Number of Observations Used  5

Analysis of Variance

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             2         4.51467      2.25734  5338.30  0.0002
Error             2      0.00084571   0.00042286
Corrected Total   4         4.51552

Root MSE        0.02056   R-Square  0.9998
Dependent Mean  1.17600   Adj R-Sq  0.9996
Coeff Var       1.74860

Parameter Estimates

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1             0.06800         0.04410     1.54    0.2631
N           1         -0.00000811      0.00000336    -2.41    0.1371
N2          1         1.228571E-9     5.49582E-11    22.35    0.0020

The Output from Code: Page 3 of 5. Quadratic Model Fit, The REG Procedure, Model: MODEL1, Dependent Variable: time (graphical output; figure not reproduced)

The Output from Code: Pages 4 and 5 of 5 (graphical output; figures not reproduced)
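In the program above, the fit variable is computed from coefficients that were copied out of an earlier proc reg run. A possible alternative, sketched here as an illustration and not taken from the original slides, is to let proc reg write the fitted values itself through its OUTPUT statement. The data set name multi_fit and the variable resid are names chosen for this example.

```sas
/* Sketch only: multi_fit and resid are illustrative names. */
data multi;
  do N = 10000 to 50000 by 10000;
    N2 = N*N;          /* quadratic term */
    input time @;      /* trailing @ holds the line for the next value */
    output;
  end;
datalines;
0.11 0.39 0.95 1.69 2.74
;
run;

title "Quadratic Model Fit with stored predictions";
proc reg data = multi;
  model time = N N2;
  /* write fitted values and residuals to a new data set */
  output out = multi_fit predicted = fit residual = resid;
run;
quit;

proc print data = multi_fit noobs;
  var N time fit resid;
run;
```

The data set multi_fit can then be passed to the plotting step in place of multi, so the plotted curve always matches the model that was actually fitted.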