Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam December 8, 2014

List of Figures

1 Popcorn data
2 MDs by city, with normal quantile plot
3 Reading in the MDs data
4 Output from proc univariate for MDs data
5 Apnea data and differences (before minus after)
6 Table of binomial distribution with n = 13, p = 0.5
7 Data for selecting in SAS
8 SAS code for data and means for the writers data
9 Boxplots for writers data
10 SAS ANOVA for writers data
11 GPA data
12 GPA data: first regression
13 GPA data: regression without SATM
14 GPA data: regression with only HSGPA and SATV
15 Mystery R code
16 Scatterplot of social worker salaries by experience
17 Regression and residual plot for predicting salary from experience
18 Code and output for Box-Cox transformation of salary
19 Residual plot from regression of transformed salary
20 SAS code for reading and summarizing perch data
21 Obtaining leverages for perch data

Brand    Trial  Unpopped
Orville      1        26
Orville      2        35
Orville      3        18
Orville      4        14
Orville      5         8
Orville      6         6
Seaway       1        47
Seaway       2        47
Seaway       3        14
Seaway       4        34
Seaway       5        21
Seaway       6        37

Figure 1: Popcorn data

R> health=read.table("metrohealth.txt",header=T)
R> attach(health)
R> head(NumMDs)
[1]  349 4042  256 2679  502 2352
R> qqnorm(NumMDs)
R> qqline(NumMDs)

[Plot: Normal Q-Q Plot; horizontal axis: Theoretical Quantiles; vertical axis: Sample Quantiles]

Figure 2: MDs by city, with normal quantile plot

Some of the health care data. Values are separated by tabs. Actual lines are very long and begin with a city name (the lines wrap here). SAS code is below.

City NumMDs RateMDs NumHospitals NumBeds RateBeds NumMedicare PctChangeMedicare MedicareRate SSBNum SSBRate SSBChange NumRetired SSINum SSIRate SqrtMDs
"Holland-Grand Haven, MI" 349 140 3 316 127 29533 8.3 11835 34135 13679 8.1 23165 2070 820 18.6815
"Louisville, KY-IN" 4042 340 18 3909 328 173845 3 14606 202485 17013 3 118920 29017 2416 63.5767
"Battle Creek, MI" 256 184 3 517 372 22972 2.4 16539 27245 19615 3.3 16645 4095 2945 16

data health;
  infile '/home/ken/metrohealth.txt' firstobs=2 dlm='09'x;
  input city $ nummds;

Figure 3: Reading in the MDs data

proc univariate;
  var nummds;

The UNIVARIATE Procedure
Variable: nummds

Moments
N                         83    Sum Weights                83
Mean               1643.3253    Sum Observations       136396
Std Deviation     1981.43175    Variance           3926071.78
Skewness          2.02744075    Kurtosis           3.92289647
Uncorrected SS     546080884    Corrected SS        321937886
Coeff Variation    120.57453    Std Error Mean      217.49039

Basic Statistical Measures
    Location                 Variability
Mean      1643.325    Std Deviation              1981
Median     844.000    Variance                3926072
Mode       200.000    Range                      9267
                      Interquartile Range        1685

Note: The mode displayed is the smallest of 7 modes with a count of 2.

Tests for Location: Mu0=0
Test           -Statistic-     -----p Value------
Student's t    t  7.555852     Pr > |t|     <.0001
Sign           M      41.5     Pr >= |M|    <.0001
Signed Rank    S      1743     Pr >= |S|    <.0001

Quantiles (Definition 5)
Quantile      Estimate
100% Max          9410
99%               9410
95%               6050
90%               4612
75% Q3            2018
50% Median         844
25% Q1             333
10%                226
5%                 200
1%                 143
0% Min             143

Extreme Observations
----Lowest----    ----Highest---
Value    Obs      Value    Obs
  143     23       6050     61
  144     43       6180     39
  180     51       7575     81
  185     55       8107     67
  200     37       9410     35

Figure 4: Output from proc univariate for MDs data
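As a quick check on the Tests for Location table, the Student's t statistic for Mu0=0 is the sample mean divided by its standard error, and the standard error is the standard deviation divided by the square root of N. The following is a minimal sketch in R that uses only the summary figures printed in Figure 4; the variable names are illustrative and are not part of the booklet.

```r
# Summary figures copied from the Moments table in Figure 4
n    <- 83
xbar <- 1643.3253
s    <- 1981.43175

# Standard error of the mean, then the t statistic for H0: mu = 0
se     <- s / sqrt(n)
t_stat <- (xbar - 0) / se

# Compare with "Std Error Mean" and "Student's t" in the output
round(c(se = se, t = t_stat), 4)
```

The two values should agree, up to rounding, with the Std Error Mean and Student's t entries in the output.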

R> apnea=read.table("apnea.txt",header=T)
R> attach(apnea)
R> diff=before-after
R> cbind(apnea,diff)
   before after  diff
1    1.71  0.13  1.58
2    1.25  0.88  0.37
3    2.13  1.38  0.75
4    1.29  0.13  1.16
5    1.58  0.25  1.33
6    4.00  2.63  1.37
7    1.42  1.38  0.04
8    1.08  0.50  0.58
9    1.83  1.25  0.58
10   0.67  0.75 -0.08
11   1.13  0.00  1.13
12   2.71  2.38  0.33
13   1.96  1.13  0.83

Figure 5: Apnea data and differences (before minus after)

The table below shows the probability of obtaining k or fewer successes in a binomial distribution with n = 13, p = 0.5.

R> k=0:13
R> p=pbinom(k,13,0.5)
R> cbind(k,p)
       k            p
 [1,]  0 0.0001220703
 [2,]  1 0.0017089844
 [3,]  2 0.0112304688
 [4,]  3 0.0461425781
 [5,]  4 0.1334228516
 [6,]  5 0.2905273438
 [7,]  6 0.5000000000
 [8,]  7 0.7094726562
 [9,]  8 0.8665771484
[10,]  9 0.9538574219
[11,] 10 0.9887695312
[12,] 11 0.9982910156
[13,] 12 0.9998779297
[14,] 13 1.0000000000

Figure 6: Table of binomial distribution with n = 13, p = 0.5
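To make the meaning of "k or fewer successes" concrete, the sketch below rebuilds the cumulative column as a running sum of the individual binomial probabilities and checks one entry by direct counting. It is an illustration only; the names point, cumulative and p_le_1 are not from the booklet.

```r
k <- 0:13

# Probability of exactly k successes, then the running total P(X <= k)
point      <- dbinom(k, size = 13, prob = 0.5)
cumulative <- cumsum(point)

# The running total should match what pbinom() returns in Figure 6
all.equal(cumulative, pbinom(k, 13, 0.5))

# P(X <= 1) by counting: (1 + 13) outcomes out of 2^13 equally likely ones
p_le_1 <- (choose(13, 0) + choose(13, 1)) / 2^13
p_le_1
```

The value of p_le_1 is 14/8192, which corresponds to the k = 1 row of the table.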

data mydata;
  infile '/home/ken/mydata.txt';
  input x y g $;
proc print;

Obs        x    y    g
  1  32.3020    2    a
  2  30.8283    6    a
  3  29.0993    6    a
  4  24.4495    2    b
  5  30.3253    3    b
  6  24.1334    1    a
  7  23.6774    6    c
  8  32.6610    9    b
  9  27.5017    5    b
 10  17.2036    6    b

Figure 7: Data for selecting in SAS

data writers;
  infile 'writers.txt';
  input genre $ age;
proc means;
  var age;
  class genre;

The MEANS Procedure
Analysis Variable : age

genre      N Obs    N          Mean       Std Dev       Minimum       Maximum
------------------------------------------------------------------------------
nonficti      24   24    76.8750000    14.0969084    40.0000000    97.0000000
novelist      67   67    71.4477612    13.0515105    35.0000000    91.0000000
poet          32   32    63.1875000    17.2970956    30.0000000    90.0000000
------------------------------------------------------------------------------

Figure 8: SAS code for data and means for the writers data

proc boxplot;
  plot age*genre / boxstyle=schematic;

[Plot: schematic boxplots of age (vertical axis, 20 to 100) by genre (novelist, poet, nonficti)]

Figure 9: Boxplots for writers data

proc anova;
  class genre;
  model age=genre;
  means genre / tukey;

The ANOVA Procedure
Class Level Information
Class    Levels    Values
genre         3    nonficti novelist poet

Number of Observations Read    123
Number of Observations Used    123

The ANOVA Procedure
Dependent Variable: age

                              Sum of
Source              DF       Squares    Mean Square    F Value    Pr > F
Model                2    2744.19300     1372.09650       6.56    0.0020
Error              120   25088.06716      209.06723
Corrected Total    122   27832.26016

R-Square    Coeff Var    Root MSE    age Mean
0.098598     20.55092    14.45916    70.35772

Source    DF       Anova SS    Mean Square    F Value    Pr > F
genre      2    2744.192998    1372.096499       6.56    0.0020

The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for age
NOTE: This test controls the Type I experimentwise error rate.

Alpha                                     0.05
Error Degrees of Freedom                   120
Error Mean Square                     209.0672
Critical Value of Studentized Range    3.35614

Comparisons significant at the 0.05 level are indicated by ***.

                       Difference
genre                     Between     Simultaneous 95%
Comparison                  Means     Confidence Limits
nonficti - novelist         5.427     -2.736     13.590
nonficti - poet            13.688      4.422     22.953  ***
novelist - nonficti        -5.427    -13.590      2.736
novelist - poet             8.260      0.887     15.634  ***
poet - nonficti           -13.688    -22.953     -4.422  ***
poet - novelist            -8.260    -15.634     -0.887  ***

Figure 10: SAS ANOVA for writers data
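The simultaneous confidence limits in the Tukey table can be rebuilt from numbers already printed in Figures 8 and 10. The sketch below assumes the Tukey-Kramer form for unequal group sizes, with half-width equal to (critical value / sqrt(2)) times sqrt(MSE (1/n1 + 1/n2)); that formula is an assumption on my part rather than something the booklet states, though it appears to reproduce the printed limits for the nonficti - novelist comparison. All variable names are illustrative.

```r
# Values copied from the SAS output in Figures 8 and 10
mse    <- 209.0672   # Error Mean Square
q_crit <- 3.35614    # Critical Value of Studentized Range

n_nonficti    <- 24
n_novelist    <- 67
mean_nonficti <- 76.8750000
mean_novelist <- 71.4477612

# Difference in means and Tukey-Kramer half-width (assumed formula)
diff_means <- mean_nonficti - mean_novelist
half_width <- q_crit / sqrt(2) *
  sqrt(mse * (1 / n_nonficti + 1 / n_novelist))

limits <- c(lower = diff_means - half_width,
            upper = diff_means + half_width)
round(c(difference = diff_means, limits), 3)
```

Because the interval includes zero, this comparison is not flagged with *** in the output.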

R> gpa=read.table("gpa.txt",header=T)
R> head(gpa)
   GPA HSGPA SATV SATM Male   HU   SS FirstGen White CollegeBound
1 3.06  3.83  680  770    1  3.0  9.0        1     1            1
2 4.15  4.00  740  720    0  9.0  3.0        0     1            1
3 3.41  3.70  640  570    0 16.0 13.0        0     0            1
4 3.21  3.51  740  700    0 22.0  0.0        0     1            1
5 3.48  3.83  610  610    0 30.5  1.5        0     1            1
6 2.95  3.25  600  570    0 18.0  3.0        0     1            1

Figure 11: GPA data

R> gpa.1=lm(GPA~HSGPA+SATV+SATM+Male,data=gpa)
R> summary(gpa.1)

Call:
lm(formula = GPA ~ HSGPA + SATV + SATM + Male, data = gpa)

Residuals:
     Min       1Q   Median       3Q      Max
-0.95975 -0.27713  0.05058  0.28319  0.89525

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.135e-01  3.283e-01   1.869  0.06301 .
HSGPA        5.069e-01  7.623e-02   6.650  2.4e-10 ***
SATV         1.174e-03  3.940e-04   2.979  0.00322 **
SATM        -5.580e-06  4.626e-04  -0.012  0.99039
Male         5.534e-02  6.020e-02   0.919  0.35901
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.407 on 214 degrees of freedom
Multiple R-squared: 0.2494, Adjusted R-squared: 0.2353
F-statistic: 17.77 on 4 and 214 DF, p-value: 1.298e-12

Figure 12: GPA data: first regression

R> gpa.2=update(gpa.1,.~.-SATM)
R> summary(gpa.2)

Call:
lm(formula = GPA ~ HSGPA + SATV + Male, data = gpa)

Residuals:
     Min       1Q   Median       3Q      Max
-0.95990 -0.27695  0.05086  0.28309  0.89534

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6117703  0.2964803   2.063 0.040272 *
HSGPA       0.5068283  0.0756565   6.699 1.81e-10 ***
SATV        0.0011714  0.0003423   3.422 0.000743 ***
Male        0.0550773  0.0560430   0.983 0.326827
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.4061 on 215 degrees of freedom
Multiple R-squared: 0.2494, Adjusted R-squared: 0.2389
F-statistic: 23.81 on 3 and 215 DF, p-value: 2.414e-13

Figure 13: GPA data: regression without SATM

R> gpa.3=update(gpa.2,.~.-Male)
R> summary(gpa.3)

Call:
lm(formula = GPA ~ HSGPA + SATV, data = gpa)

Residuals:
     Min       1Q   Median       3Q      Max
-0.97894 -0.27639  0.02867  0.30133  0.87956

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6351217  0.2955033   2.149  0.03272 *
HSGPA       0.4975320  0.0750569   6.629 2.66e-10 ***
SATV        0.0012283  0.0003373   3.641  0.00034 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.4061 on 216 degrees of freedom
Multiple R-squared: 0.246, Adjusted R-squared: 0.239
F-statistic: 35.23 on 2 and 216 DF, p-value: 5.711e-14

Figure 14: GPA data: regression with only HSGPA and SATV

R> my.df=data.frame(HSGPA=3.6,SATV=640,SATM=670,Male=0)
R> pp=predict(gpa.3,my.df)
R> cbind(my.df,pp)
  HSGPA SATV SATM Male       pp
1   3.6  640  670    0 3.212337

Figure 15: Mystery R code
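The value pp in Figure 15 can be checked by hand from the coefficients printed in Figure 14. The sketch below plugs the HSGPA and SATV values from my.df into the fitted equation; SATM and Male are in the data frame but are not terms in gpa.3, so they play no part in the prediction. The coefficient values are the rounded ones from the printed summary, so the result should agree with pp only to about four decimal places. Names such as b0 and pred are illustrative.

```r
# Rounded coefficients copied from summary(gpa.3) in Figure 14
b0      <- 0.6351217
b_hsgpa <- 0.4975320
b_satv  <- 0.0012283

# Predictor values from my.df (SATM and Male are not used by gpa.3)
hsgpa <- 3.6
satv  <- 640

# Fitted equation: intercept + slope * HSGPA + slope * SATV
pred <- b0 + b_hsgpa * hsgpa + b_satv * satv
round(pred, 4)
```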

R> socwork=read.table("socwork.txt",header=T)
R> attach(socwork)
R> plot(salary~experience)
R> lines(lowess(salary~experience))

[Plot: salary (vertical axis, 2e+04 to 1e+05) against experience (horizontal axis, 0 to 25), with lowess curve]

Figure 16: Scatterplot of social worker salaries by experience

R> esq=experience*experience
R> socwork.1=lm(salary~experience+esq)
R> r=resid(socwork.1)
R> f=fitted(socwork.1)
R> plot(r~f)

[Plot: residuals r (vertical axis) against fitted values f (horizontal axis, 20000 to 70000)]

Figure 17: Regression and residual plot for predicting salary from experience

R> library(MASS)
R> boxcox(salary~experience)

[Plot: Box-Cox log-likelihood (vertical axis) against λ (horizontal axis, -2 to 2), with the 95% interval marked]

Figure 18: Code and output for Box-Cox transformation of salary

R> socwork.2=lm(sal.trans~experience)
R> r=resid(socwork.2)
R> f=fitted(socwork.2)
R> plot(r~f)
R> lines(lowess(r~f))

[Plot: residuals r (vertical axis) against fitted values f (horizontal axis, 10.0 to 11.2), with lowess curve]

Figure 19: Residual plot from regression of transformed salary

data perch;
  infile '/home/ken/perch.txt' firstobs=2 expandtabs;
  input obs weight length width;
  z=1;
proc print;
proc means;
  var weight length width;

Obs  obs  weight  length  width  z
  1  104     5.9     8.8    1.4  1
  2  105    32.0    14.7    2.0  1
  3  106    40.0    16.0    2.4  1
  4  107    51.5    17.2    2.6  1
  5  108    70.0    18.5    2.9  1
  6  109   100.0    19.2    3.3  1
  7  110    78.0    19.4    3.1  1
  8  111    80.0    20.2    3.1  1
  9  112    85.0    20.8    3.0  1
 10  113    85.0    21.0    2.8  1
 11  114   110.0    22.5    3.6  1
 12  115   115.0    22.5    3.3  1
 13  116   125.0    22.5    3.7  1
 14  117   130.0    22.8    3.5  1
 15  118   120.0    23.5    3.4  1
 16  119   120.0    23.5    3.5  1
 17  120   130.0    23.5    3.5  1
 18  121   135.0    23.5    3.5  1
 19  122   110.0    23.5    4.0  1
 20  123   130.0    24.0    3.6  1
 21  124   150.0    24.0    3.6  1
 22  125   145.0    24.2    3.6  1
 23  126   150.0    24.5    3.6  1
 24  127   170.0    25.0    3.7  1
 25  128   225.0    25.5    3.7  1
 26  129   145.0    25.5    3.8  1
 27  130   188.0    26.2    4.2  1
 28  131   180.0    26.5    3.7  1
 29  132   197.0    27.0    4.2  1
 30  133   218.0    28.0    4.1  1

The MEANS Procedure

Variable     N           Mean       Std Dev      Minimum        Maximum
------------------------------------------------------------------------
weight      30    120.6800000    53.0654079    5.9000000    225.0000000
length      30     22.1333333     4.0797510    8.8000000     28.0000000
width       30      3.3466667     0.6262826    1.4000000      4.2000000
------------------------------------------------------------------------

Figure 20: SAS code for reading and summarizing perch data

proc reg;
  model z=weight length width / influence;

The REG Procedure
Model: MODEL1
Dependent Variable: z

Number of Observations Read    30
Number of Observations Used    30

Analysis of Variance
                         Sum of      Mean
Source             DF   Squares    Square    F Value    Pr > F
Model               3         0         0          .         .
Error              26         0         0
Corrected Total    29         0

Root MSE                 0    R-Square    .
Dependent Mean     1.00000    Adj R-Sq    .
Coeff Var                0

Parameter Estimates
                  Parameter    Standard
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept     1     1.00000           0      Infty      <.0001
weight        1           0           0          .           .
length        1           0           0          .           .
width         1           0           0          .           .

The REG Procedure
Model: MODEL1
Dependent Variable: z

Output Statistics
                               Hat Diag    Cov             ------------DFBETAS------------
Obs    Residual    RStudent           H    Ratio   DFFITS  Intercept  weight  length  width
  1           0           .      0.5818        .        .          .       .       .      .
  2           0           .      0.2134        .        .          .       .       .      .
  3           0           .      0.1177        .        .          .       .       .      .
  4           0           .      0.0926        .        .          .       .       .      .
  5           0           .      0.0719        .        .          .       .       .      .
  6           0           .      0.2173        .        .          .       .       .      .
  7           0           .      0.0799        .        .          .       .       .      .
  8           0           .      0.0681        .        .          .       .       .      .
  9           0           .      0.0939        .        .          .       .       .      .
 10           0           .      0.2246        .        .          .       .       .      .
 11           0           .      0.0915        .        .          .       .       .      .
 12           0           .      0.0528        .        .          .       .       .      .
 13           0           .      0.1226        .        .          .       .       .      .
 14           0           .      0.0375        .        .          .       .       .      .
 15           0           .      0.0848        .        .          .       .       .      .
 16           0           .      0.0649        .        .          .       .       .      .
 17           0           .      0.0439        .        .          .       .       .      .
 18           0           .      0.0398        .        .          .       .       .      .
 19           0           .      0.2990        .        .          .       .       .      .
 20           0           .      0.0559        .        .          .       .       .      .
 21           0           .      0.0449        .        .          .       .       .      .
 22           0           .      0.0447        .        .          .       .       .      .
 23           0           .      0.0536        .        .          .       .       .      .
 24           0           .      0.0732        .        .          .       .       .      .
 25           0           .      0.4217        .        .          .       .       .      .
 26           0           .      0.0812        .        .          .       .       .      .
 27           0           .      0.1640        .        .          .       .       .      .
 28           0           .      0.1574        .        .          .       .       .      .
 29           0           .      0.1298        .        .          .       .       .      .
 30           0           .      0.1758        .        .          .       .       .      .

Sum of Residuals                   0
Sum of Squared Residuals           0
Predicted Residual SS (PRESS)      0

Figure 21: Obtaining leverages for perch data
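The Hat Diag H column depends only on the explanatory variables, which is why a constant response z still yields usable leverages. As a rough R counterpart, the sketch below computes the diagonal of the hat matrix X (X'X)^(-1) X' directly from the perch values listed in Figure 20. This is an illustration, not the booklet's method; the vector names (perch_weight and so on) are made up, and the data are retyped from the printed listing, so any transcription slip would change the results.

```r
# Perch data retyped from the proc print listing in Figure 20
perch_weight <- c(5.9, 32.0, 40.0, 51.5, 70.0, 100.0, 78.0, 80.0, 85.0, 85.0,
                  110.0, 115.0, 125.0, 130.0, 120.0, 120.0, 130.0, 135.0,
                  110.0, 130.0, 150.0, 145.0, 150.0, 170.0, 225.0, 145.0,
                  188.0, 180.0, 197.0, 218.0)
perch_length <- c(8.8, 14.7, 16.0, 17.2, 18.5, 19.2, 19.4, 20.2, 20.8, 21.0,
                  22.5, 22.5, 22.5, 22.8, 23.5, 23.5, 23.5, 23.5, 23.5, 24.0,
                  24.0, 24.2, 24.5, 25.0, 25.5, 25.5, 26.2, 26.5, 27.0, 28.0)
perch_width  <- c(1.4, 2.0, 2.4, 2.6, 2.9, 3.3, 3.1, 3.1, 3.0, 2.8,
                  3.6, 3.3, 3.7, 3.5, 3.4, 3.5, 3.5, 3.5, 4.0, 3.6,
                  3.6, 3.6, 3.6, 3.7, 3.7, 3.8, 4.2, 3.7, 4.2, 4.1)

# Design matrix with an intercept column
X <- cbind(1, perch_weight, perch_length, perch_width)

# Leverages are the diagonal of the hat matrix X (X'X)^{-1} X'
H <- X %*% solve(t(X) %*% X) %*% t(X)
h <- diag(H)

# Leverages sum to the number of fitted parameters (4 here)
sum(h)

# Observations with the largest leverages, for comparison with Figure 21
round(sort(h, decreasing = TRUE)[1:3], 4)
```

If the data are transcribed correctly, the values in h should line up with the Hat Diag H column, with the first (smallest) fish having the highest leverage.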