Cherry.R
#######################################################################

library(MASS)
library(car)

cherry <- read.table(file="n:\\courses\\stat8620\\fall 08\\trees.dat", header=T)
cherry
      d  h    v
1   8.3 70 10.3
2  10.7 81 18.8
<portion omitted>
31 17.9 80 58.3

# Step 1.
plot(cherry$d, cherry$v, xlab="diameter", ylab="volume")
plot(cherry$h, cherry$v, xlab="height", ylab="volume")

lm1 <- lm(v ~ h + d, data=cherry)
summary(lm1)

Call:
lm(formula = v ~ h + d, data = cherry)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.4065 -2.6493 -0.2876  2.2003  8.4847 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
h             0.3393     0.1302   2.607   0.0145 *  
d             4.7082     0.2643  17.816  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-squared: 0.948,     Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF,  p-value: < 2.2e-16

plot(lm1, which=1)

qq.plot(lm1,sim=t) infl < influence.measures(lm1) summary(infl) Potentially influential observations of

lm(formula = v ~ h + d, data = cherry) : dfb.1_ dfb.h dfb.d dffit cov.r cook.d hat 24 0.74 0.34 0.97 1.50_* 0.68 0.61 0.23 shapiro.test(residuals(lm1)) Shapiro Wilk normality test data: residuals(lm1) W = 0.9743, p value = 0.644 # Step 2. bc1 < boxcox(v ~ h + d, data = cherry, + lambda = seq( 2, 2, length = 100),plotit=T) bc2 < boxcox(v ~ h + d, data = cherry, + lambda = seq(0,.6, length = 100),plotit=T)

bc2$x[bc2$y == max(bc2$y)]
[1] 0.3090909
max(bc2$y)
[1] -76.08043

profl <- function(lambda, y, X) {
  n <- length(y)
  z <- y^lambda
  # pl <- -(n/2)*log( sum( (z - X%*%solve(t(X)%*%X)%*%t(X)%*%z )^2 ) ) +
  #       n*log(abs(lambda)) + (lambda-1)*sum(log(y)) - (n/2)*(1+log(2*pi/n))
  pl <- -(n/2)*log( sum( (z - X%*%solve(t(X)%*%X)%*%t(X)%*%z )^2 ) ) +
        n*log(abs(lambda)) + (lambda-1)*sum(log(y))
  pl
}
X <- model.matrix(~ h + d, data=cherry)
profl(.30909, cherry$v, X)
[1] -76.08043

lim <- max(bc2$y) - 0.5*qchisq(0.95, 1)
cbind(bc2$x, bc2$y, lim)
                               lim
  [1,] 0.000000000 -80.70206 -78.00116
<portion omitted>
 [19,] 0.109090909 -78.16673 -78.00116
 [20,] 0.115151515 -78.04850 -78.00116
 [21,] 0.121212121 -77.93310 -78.00116
<portion omitted>
 [50,] 0.296969697 -76.08541 -78.00116
 [51,] 0.303030303 -76.08080 -78.00116
 [52,] 0.309090909 -76.08043 -78.00116
 [53,] 0.315151515 -76.08432 -78.00116
<portion omitted>
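The rows where bc2$y crosses lim bracket the 95% confidence interval for lambda. A one-line way to pull it out (a sketch, not in the original script):

range(bc2$x[bc2$y > lim])
# roughly (0.12, 0.49) from the table above: the interval covers 1/3
# (a cube-root transformation) but excludes 0 (log) and 1 (no transformation)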

 [82,] 0.490909091 -77.97579 -78.00116
 [83,] 0.496969697 -78.09657 -78.00116
<portion omitted>
[100,] 0.600000000 -80.57769 -78.00116

lm2 <- lm(I(v^(1/3)) ~ h + d, data=cherry)
summary(lm2)

Call:
lm(formula = I(v^(1/3)) ~ h + d, data = cherry)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.159602 -0.050200  0.006827  0.069649  0.133981 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.085388   0.184315  -0.463    0.647    
h            0.014472   0.002777   5.211 1.56e-05 ***
d            0.151516   0.005639  26.871  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08283 on 28 degrees of freedom
Multiple R-squared: 0.9777,     Adjusted R-squared: 0.9761
F-statistic: 612.5 on 2 and 28 DF,  p-value: < 2.2e-16

plot(lm2, which=1)
qq.plot(lm2, sim=T)
integer(0)

Warning message:
In object$coefficients : $ operator is invalid for atomic vectors, returning NULL

names(summary(lm2))
 [1] "call"          "terms"         "residuals"     "coefficients" 
 [5] "aliased"       "sigma"         "df"            "r.squared"    
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled" 
summary(lm2)$sigma
[1] 0.08282697

medvhat <- fitted(lm2)^3
meanvhat <- medvhat*(1 + (3*summary(lm2)$sigma^2)/(fitted(lm2)^2))

# Step 3.
cherry$logv <- log(cherry$v)
cherry$logd <- log(cherry$d/12)
cherry$logh <- log(cherry$h)
plot(cherry$logd, cherry$logv, xlab="log diameter", ylab="log volume")
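The meanvhat correction above follows from the normal moment identity: if W = v^(1/3) ~ N(mu, sigma^2), then mu^3 estimates the median of v, while E(v) = E(W^3) = mu^3 + 3*mu*sigma^2 = mu^3*(1 + 3*sigma^2/mu^2). A quick Monte Carlo check of the identity (a sketch, not part of the original script; mu and sigma are illustrative values near the scale of lm2):

set.seed(1)
mu <- 3; sigma <- 0.083
w <- rnorm(1e6, mu, sigma)
mean(w^3)                       # simulated E(W^3)
mu^3*(1 + 3*sigma^2/mu^2)       # closed form used for meanvhat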

plot(cherry$logh, cherry$logv, xlab="log height", ylab="log volume")

lm3 <- lm(logv ~ logh + logd, data=cherry)
summary(lm3)

Call:
lm(formula = logv ~ logh + logd, data = cherry)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.168561 -0.048488  0.002431  0.063637  0.129223 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.70492    0.88190  -1.933   0.0634 .  
logh         1.11712    0.20444   5.464  7.8e-06 ***
logd         1.98265    0.07501  26.432  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08139 on 28 degrees of freedom
Multiple R-squared: 0.9777,     Adjusted R-squared: 0.9761
F-statistic: 613.2 on 2 and 28 DF,  p-value: < 2.2e-16

plot(lm3, which=1)
qq.plot(lm3, sim=T)
integer(0)
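The log-log fit already points toward the constrained model fit below: the logd coefficient (1.98, SE 0.08) is consistent with 2 and the logh coefficient (1.12, SE 0.20) with 1, i.e. v roughly proportional to h*d^2. A quick check, not in the original script:

confint(lm3)   # do the 95% intervals cover 1 for logh and 2 for logd?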

shapiro.test(residuals(lm3))

	Shapiro-Wilk normality test

data:  residuals(lm3)
W = 0.9592, p-value = 0.2782

bc3 <- boxcox(v ~ logh + logd, data = cherry,
              lambda = seq(-.3, .3, length = 100), plotit=T)
bc3$x[bc3$y == max(bc3$y)]

[1] -0.06969697
max(bc3$y)
[1] -75.04532
lim <- max(bc3$y) - 0.5*qchisq(0.95, 1)
cbind(bc3$x, bc3$y, lim)
                                lim
  [1,] -0.300000000 -78.31252 -76.96605
<portion omitted>
 [10,] -0.245454545 -77.02905 -76.96605
 [11,] -0.239393939 -76.90270 -76.96605
<portion omitted>
 [38,] -0.075757576 -75.04963 -76.96605
 [39,] -0.069696970 -75.04532 -76.96605
 [40,] -0.063636364 -75.04584 -76.96605
<portion omitted>
 [68,]  0.106060606 -76.89529 -76.96605
 [69,]  0.112121212 -77.01898 -76.96605
<portion omitted>
[100,]  0.300000000 -82.07841 -76.96605

cherry$off <- cherry$logh + 2*cherry$logd
glm4 <- glm(logv ~ 1, data=cherry,
            offset=off, family=gaussian(link="identity"))
summary(glm4)

Call:
glm(formula = logv ~ 1, family = gaussian(link = "identity"),
    data = cherry, offset = off)

Deviance Residuals: 
      Min         1Q     Median         3Q        Max  
-0.168446  -0.047355   0.003518   0.066308   0.136467  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.19935    0.01421  -84.42   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.006256193)

    Null deviance: 0.18769  on 30  degrees of freedom
Residual deviance: 0.18769  on 30  degrees of freedom
AIC: 66.342

Number of Fisher Scoring iterations: 2
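The offset fixes the logh coefficient at 1 and the logd coefficient at 2 (volume proportional to h*d^2, as for a cone or cylinder), so glm4 estimates only the intercept. One way to test that constraint against the unconstrained lm3 is an F test on the two nested least-squares fits (a sketch, not in the original script; lm4 is a hypothetical name):

lm4 <- lm(logv ~ 1, offset = off, data = cherry)
anova(lm4, lm3)   # F test of H0: (beta_logh, beta_logd) = (1, 2)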

# Step 4.
glm5 <- glm(v ~ h + d, data=cherry,
            family=gaussian(link=power(1/3)))
summary(glm5)

Call:
glm(formula = v ~ h + d, family = gaussian(link = power(1/3)),
    data = cherry)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.6590  -1.5744   0.2949   1.6653   4.4737  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.051322   0.224095  -0.229 0.820518    
h            0.014287   0.003342   4.274 0.000201 ***
d            0.150331   0.005838  25.749  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 6.577063)

    Null deviance: 8106.08  on 30  degrees of freedom
Residual deviance:  184.16  on 28  degrees of freedom
AIC: 151.21

Number of Fisher Scoring iterations: 4

shapiro.test(residuals(glm5))

	Shapiro-Wilk normality test

data:  residuals(glm5)
W = 0.9592, p-value = 0.2778

library(qpcR)
Rsq(glm5)
[1] 0.9771597
plot(glm5, which=1)
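Note the distinction from Step 2: lm2 transforms the response (normal errors on the v^(1/3) scale), while glm5 keeps v on the original scale and applies the cube-root link to its mean, so E(v)^(1/3) is linear in h and d. A side-by-side look (a sketch, not in the original script):

cbind(coef(lm2), coef(glm5))          # similar, but not identical, estimates
head(cbind(meanvhat, fitted(glm5)))   # lm2 back-transformed means vs glm5 fitted means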

glm6 <- glm(v ~ logh + logd, data=cherry,
            family=gaussian(link="log"))
summary(glm6)

Call:
glm(formula = v ~ logh + logd, family = gaussian(link = "log"),
    data = cherry)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.9080  -1.1817   0.2101   1.7014   4.2551  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.57484    1.04613  -1.505 0.143422    
logh         1.08765    0.24216   4.491 0.000111 ***
logd         1.99692    0.08208  24.330  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 6.41642)

    Null deviance: 8106.08  on 30  degrees of freedom
Residual deviance:  179.66  on 28  degrees of freedom
AIC: 150.44

Number of Fisher Scoring iterations: 4

shapiro.test(residuals(glm6))

	Shapiro-Wilk normality test

data:  residuals(glm6)
W = 0.9653, p-value = 0.3996

Rsq(glm6)
[1] 0.9777677
plot(glm6, which=1)
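Since glm5 and glm6 both model v on the original scale, their AICs are directly comparable; the log-link model glm6 (AIC 150.44) fits slightly better than the cube-root-link model glm5 (AIC 151.21). A convenience call, not in the original script:

AIC(glm5, glm6)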