

Stat April 03 Week Fitting Individual Trajectories

# Straight-line, constant rate of change fit
> library(lme4)    # provides the sleepstudy data
> sdat = subset(sleepstudy, Subject == "372")
> sdat
    Reaction Days Subject
171 269.4117    0     372
172 273.4740    1     372
173 297.5968    2     372
174 310.6316    3     372
175 287.1726    4     372
176 329.6076    5     372
177 334.4818    6     372
178 343.2199    7     372
179 369.1417    8     372
180 364.1236    9     372
> attach(sdat)    # so Reaction and Days can be used by name
> lm.sdat = lm(Reaction ~ Days)    # OLS rate of 'decline'
> summary(lm.sdat)

Call:
lm(formula = Reaction ~ Days)

Residuals:
    Min      1Q  Median      3Q     Max
-25.064  -4.181   1.008   7.485  11.712

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  267.045      6.632  40.265 1.59e-10 ***
Days          11.298      1.242   9.094 1.72e-05 ***
---
Residual standard error: 11.28 on 8 degrees of freedom
Multiple R-squared: 0.9118, Adjusted R-squared: 0.9008
F-statistic: 82.7 on 1 and 8 DF, p-value: 1.716e-05

> plot(Days, Reaction)
> abline(coef(lm.sdat)[1], coef(lm.sdat)[2])    # see plot

# Autocorrelation worries
> # AR(1) with standard Durbin-Watson test from package lmtest
> install.packages("lmtest")
> library(lmtest)
> dwtest(lm.sdat, alternative = "two.sided")

        Durbin-Watson test

data:  lm.sdat
DW = 2.8718, p-value = 0.599    # approx DW = 2(1 - r)
alternative hypothesis: true autocorrelation is not 0

> acf(resid(lm.sdat))    # autocorrelation function of the residuals
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Polynomial (Quadratic, Cubic) Trajectories; Berkeley Growth Data
# Data from the Berkeley Growth Study (Nancy Bayley). Data are for Child #8
# in the BGS study, with age in months (ranging up to 60) and intellectual
# performance "cog".
> bgsdat = read.table(file="d:\\drr3\\stat\\week\\bgsdata", header = T)
> attach(bgsdat)
> plot(age, cog)    # see plot
> lm.bgsq = lm(cog ~ age + I(age^2))
> lm.bgsc = lm(cog ~ age + I(age^2) + I(age^3))
> anova(lm.bgsq, lm.bgsc)
Analysis of Variance Table

Model 1: cog ~ age + I(age^2)
Model 2: cog ~ age + I(age^2) + I(age^3)
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     18 1303.17
2     17  524.88  1    778.28 25.207 0.0001049 ***
---
> summary(lm.bgsc)

Call:
lm(formula = cog ~ age + I(age^2) + I(age^3))

Residuals:
    Min      1Q  Median      3Q     Max
   -.60   -3.78 -0.5045   4.083   9.668

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   -7.7669   3.675075  -2.113  0.04967 *
age         10.911380   0.583345  18.705  8.9e-13 ***
I(age^2)    -0.198944   0.024207  -8.219  2.5e-07 ***
I(age^3)     0.001386   0.000276   5.021 0.000105 ***
---
Residual standard error: 5.557 on 17 degrees of freedom
Multiple R-squared: 0.9946, Adjusted R-squared: 0.9936
F-statistic: 1039 on 3 and 17 DF, p-value: < 2.2e-16

[Plot: Reaction vs. Days]

[Plot: cog vs. age]
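The Durbin-Watson statistic reported by dwtest() can be reproduced by hand from the OLS residuals, which also shows where the rule of thumb DW ≈ 2(1 - r) comes from. The following is a minimal sketch, not part of the original handout: the Reaction values are typed in from lme4::sleepstudy (Subject 372) so that it runs without lme4, and the names e, dw and r1 are illustrative.

```r
# Reaction times for Days 0..9, typed in by hand (assumption: these are the
# sleepstudy values for the subject fitted in the straight-line example).
Reaction <- c(269.4117, 273.4740, 297.5968, 310.6316, 287.1726,
              329.6076, 334.4818, 343.2199, 369.1417, 364.1236)
Days <- 0:9

fit <- lm(Reaction ~ Days)
e   <- resid(fit)

# Durbin-Watson statistic: squared successive differences over squared residuals
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals (residuals have mean zero)
r1 <- sum(e[-1] * e[-length(e)]) / sum(e^2)

# The two quantities differ only by an end-effect term (e[1]^2 + e[n]^2) / sum(e^2)
print(c(DW = dw, approx = 2 * (1 - r1)))
```

A DW value near 2 suggests little lag-1 autocorrelation; values well above 2 point to negative and values well below 2 to positive autocorrelation. With only ten observations the test has little power either way.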

> dwtest(lm.bgsc, alternative = "two.sided")

        Durbin-Watson test

data:  lm.bgsc
DW = ., p-value = 0.006566
alternative hypothesis: true autocorrelation is not 0
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#######Count Data, Generalized Linear Models
# slide for Poisson link function (log)
> belg.aids
   cases year
1     12    1
2     14    2
3     33    3
4     50    4
5     67    5
6     74    6
7    123    7
8    141    8
9    165    9
10   204   10
11   253   11
12   246   12
13   240   13
> am1 = glm(cases ~ year, data=belg.aids, family=poisson(link=log))
> summary(am1)

Call:
glm(formula = cases ~ year, family = poisson(link = log), data = belg.aids)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.6784  -1.5013  -0.2636   2.1760   2.7306

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.140590   0.078247   40.14   <2e-16 ***
year        0.202121   0.007771   26.01   <2e-16 ***
---
(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 872.206  on 12  degrees of freedom
Residual deviance:  80.686  on 11  degrees of freedom
AIC: 166.37

Number of Fisher Scoring iterations: 4

> plot(am1)    # gives you the set of diagnostic plots--resids vs fitted etc

# Quadratic in year
> am2 = glm(cases ~ year+I(year^2), data=belg.aids, family=poisson(link=log))
> summary(am2)

Call:
glm(formula = cases ~ year + I(year^2), family = poisson(link = log), data = belg.aids)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.45903  -0.64491   0.08927   0.67117   1.54596

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.901459   0.186877  10.175  < 2e-16 ***
year         0.556003   0.045780  12.145  < 2e-16 ***
I(year^2)   -0.021346   0.002659  -8.029 9.82e-16 ***
---
(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 872.2058  on 12  degrees of freedom
Residual deviance:   9.2402  on 10  degrees of freedom
AIC: 96.924

Number of Fisher Scoring iterations: 4

> anova(am1, am2)    # compare nested models
Analysis of Deviance Table

Model 1: cases ~ year
Model 2: cases ~ year + I(year^2)
  Resid. Df Resid. Dev Df Deviance
1        11     80.686
2        10      9.240  1   71.446

> anova(am1, am2, test = "Chisq")
Analysis of Deviance Table

Model 1: cases ~ year
Model 2: cases ~ year + I(year^2)
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
1        11     80.686
2        10      9.240  1   71.446 < 2.2e-16 ***
> AIC(am1, am2)
    df       AIC
am1  2 166.36982
am2  3  96.92358
> # cubic doesn't help, see link
> year = seq(1, 13, length=100)
> fv = predict(am2, newdata=data.frame(year=year), se=TRUE)
> plot(belg.aids$year+1980, belg.aids$cases)    # data
> lines(year+1980, exp(fv$fit), col=2)    # fit
> lines(year+1980, exp(fv$fit+2*fv$se), col=3)    # upper c.l.
> lines(year+1980, exp(fv$fit-2*fv$se), col=3)    # lower c.l.
> # produces nice final plot, note the overlay of fit and CI bands (2*se)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Non-linear Models: Logistic Growth Trajectory
### http://svn.r-project.org/r/trunk/src/library/datasets/data/chickweight.r
### Data on the growth of chicks on different diets.
### Hand and Crowder (1996), Table A.2, p. 172
> Time = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21)
> weight = c(42, 51, 59, 64, 76, 93, 106, 125, 149, 171, 199, 205)
> plot(Time, weight)
> Chick.1 = as.data.frame(cbind(Time, weight))
> Chick.1
   Time weight
1     0     42
2     2     51
3     4     59
4     6     64
5     8     76
6    10     93
7    12    106
8    14    125
9    16    149
10   18    171
11   20    199
12   21    205
> Asym = 668; xmid = 9; scal = 6    # fit not sensitive to choices of initial vals
> fm3 <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = Chick.1)
> summary(fm3)

Formula: weight ~ SSlogis(Time, Asym, xmid, scal)

Parameters:
     Estimate Std. Error t value Pr(>|t|)
Asym 937.0214   465.8579   2.011  0.07516 .
xmid  35.2228     8.3119   4.238  0.00218 **
scal  11.4052     0.9052  12.599 5.08e-07 ***
---
Residual standard error: 2.919 on 9 degrees of freedom

Number of iterations to convergence: 0
Achieved convergence tolerance: 6.6e-07

> round(predict(fm3, data.frame(Time = Time)), 2)
 [1]  40.85  48.27  56.96  67.10  78.88  92.50 108.19 126.13 146.53 169.53
[11] 195.26 209.15
> # at 0 weight = 3, four parameter logistic SSfpl: Self-Starting Nls Four-Parameter Logistic Model

AIDS model example

belg.aids <- data.frame(cases=c(12,14,33,50,67,74,123,141,165,204,253,246,240), year=1:13)
am1 <- glm(cases ~ year, data=belg.aids, family=poisson(link=log))
plot(am1)

[Diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage (with Cook's distance)]

...clear trend in the residual mean + some overly influential points.

AIDS model example II

Try a quadratic time dependence?

am2 <- glm(cases ~ year+I(year^2), data=belg.aids, family=poisson(link=log))
plot(am2)

[Diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage (with Cook's distance)]

...much better.
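The visual improvement can be backed by the analysis-of-deviance test that anova(..., test = "Chisq") performs for nested Poisson models. The following is a minimal sketch under the assumption that the data are the 13 yearly counts shown on the slide; dev_drop, df_drop and p_value are illustrative names, not part of the handout.

```r
# Belgian AIDS counts, years coded 1..13 (as on the slide)
belg.aids <- data.frame(
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240),
  year  = 1:13
)

am1 <- glm(cases ~ year,             data = belg.aids, family = poisson(link = log))
am2 <- glm(cases ~ year + I(year^2), data = belg.aids, family = poisson(link = log))

# Drop in residual deviance when the quadratic term is added
dev_drop <- deviance(am1) - deviance(am2)
df_drop  <- df.residual(am1) - df.residual(am2)

# Under the smaller model the drop is approximately chi-squared on df_drop d.f.
p_value <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

print(c(deviance_drop = dev_drop, df = df_drop, p = p_value))
```

A drop of about 71.4 on 1 degree of freedom is far out in the chi-squared tail, which is why the quadratic model is preferred. The chi-squared reference distribution is a large-sample approximation and assumes the Poisson variance is adequate (no marked overdispersion).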

Fitted AIDS model

[Plot: cases vs. year, with fitted curve and confidence bands]
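The bands in a plot like this are usually built on the link (log) scale and only then exponentiated, so that they stay positive. The following is a minimal sketch of that construction, assuming the quadratic Poisson model from the slides; the data frame band and its column names are illustrative, and the factor 2 is the rough 95% multiplier used in the handout.

```r
belg.aids <- data.frame(
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240),
  year  = 1:13
)
am2 <- glm(cases ~ year + I(year^2), data = belg.aids, family = poisson(link = log))

# Fine grid of years for a smooth curve
grid <- data.frame(year = seq(1, 13, length.out = 100))

# predict() works on the link scale by default: fit and se.fit are for log(mean)
fv <- predict(am2, newdata = grid, se.fit = TRUE)

# Back-transform the fit and the +/- 2 s.e. limits to the count scale
band <- data.frame(
  year  = grid$year + 1980,
  fit   = exp(fv$fit),
  lower = exp(fv$fit - 2 * fv$se.fit),
  upper = exp(fv$fit + 2 * fv$se.fit)
)

plot(belg.aids$year + 1980, belg.aids$cases, xlab = "year", ylab = "cases")
lines(band$year, band$fit,   col = 2)
lines(band$year, band$lower, col = 3)
lines(band$year, band$upper, col = 3)
```

Because the limits are exponentiated, the band is asymmetric around the fitted curve on the count scale. It describes uncertainty in the mean, not a prediction interval for new counts.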

[Plot: weight vs. Time]
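The logistic growth curve fitted by SSlogis has the closed form Asym / (1 + exp((xmid - Time) / scal)), so the fitted values can be reproduced directly from the three estimates. The following is a minimal sketch, assuming the twelve Time/weight pairs for the first chick as entered in the handout; fm, p and by_hand are illustrative names.

```r
# Weights of one chick at the listed days
Chick.1 <- data.frame(
  Time   = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21),
  weight = c(42, 51, 59, 64, 76, 93, 106, 125, 149, 171, 199, 205)
)

# SSlogis is self-starting, so no start values need to be supplied
fm <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = Chick.1)
p  <- coef(fm)

# Asym: upper asymptote; xmid: time at which weight = Asym / 2;
# scal: time scale controlling how quickly the curve rises
by_hand <- p[["Asym"]] / (1 + exp((p[["xmid"]] - Chick.1$Time) / p[["scal"]]))

# The hand-computed curve should agree with predict()
stopifnot(isTRUE(all.equal(by_hand, as.numeric(predict(fm)))))

print(round(p, 3))
```

The estimated midpoint lies well beyond the last observation (day 21), so the data cover only the early, roughly exponential part of the curve. That is why the asymptote has such a large standard error in the summary output.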