Stat April 03 Week
Fitting Individual Trajectories

# Straight-line, constant rate of change fit
> library(lme4)   # provides the sleepstudy data
> sdat = subset(sleepstudy, Subject == "372")
> sdat
    Reaction Days Subject
171 269.4117    0     372
172 273.4740    1     372
173 297.5968    2     372
174 310.6316    3     372
175 287.1726    4     372
176 329.6076    5     372
177 334.4818    6     372
178 343.2199    7     372
179 369.1417    8     372
180 364.1236    9     372
> lm.sdat = lm(Reaction ~ Days, data = sdat)   #OLS rate of 'decline'
> summary(lm.sdat)

Call:
lm(formula = Reaction ~ Days, data = sdat)

Residuals:
    Min      1Q  Median      3Q     Max
-25.064  -4.181   1.008   7.485  11.712

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  267.045      6.632  40.265 1.59e-10 ***
Days          11.298      1.242   9.094 1.72e-05 ***
---
Residual standard error: 11.28 on 8 degrees of freedom
Multiple R-squared: 0.9118, Adjusted R-squared: 0.9008
F-statistic: 82.7 on 1 and 8 DF, p-value: 1.716e-05

> plot(Reaction ~ Days, data = sdat)
> abline(coef(lm.sdat)[1], coef(lm.sdat)[2])   #see plot

# Autocorrelation worries
> # AR(1) with standard Durbin-Watson test from package lmtest
> install.packages("lmtest")
> library(lmtest)
> dwtest(lm.sdat, alternative = "two.sided")

        Durbin-Watson test

data:  lm.sdat
DW = 2.8718, p-value = 0.599     # approx DW = 2(1 - r)
alternative hypothesis: true autocorrelation is not 0

> acf(resid(lm.sdat))   # autocorrelation function of the residuals
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Polynomial (Quadratic, Cubic) Trajectories; Berkeley Growth Data
# Data from the Berkeley Growth Study (Nancy Bayley). Data are for Child #8
# in the BGS study, with age in months (ranging from 1 to 60) and
# intellectual performance "cog".
> bgsdat = read.table(file="d:\\drr3\\stat\\week\\bgsdata", header = T)
> attach(bgsdat)
> plot(age, cog)   #see plot
> lm.bgsq = lm(cog ~ age + I(age^2))
> lm.bgsc = lm(cog ~ age + I(age^2) + I(age^3))
> anova(lm.bgsq, lm.bgsc)
Analysis of Variance Table

Model 1: cog ~ age + I(age^2)
Model 2: cog ~ age + I(age^2) + I(age^3)
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     18 1303.17
2     17  524.88  1    778.28 25.207 0.0001049 ***
---
> summary(lm.bgsc)

Call:
lm(formula = cog ~ age + I(age^2) + I(age^3))

Residuals:
    Min      1Q  Median      3Q     Max
   -.60   -3.78 -0.5045   4.083   9.668

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   -7.7669   3.675075  -2.113  0.04967 *
age         10.911380   0.583345  18.705  8.9e-13 ***
I(age^2)     -0.98944     0.0407    -8.9   .5e-07 ***
I(age^3)     0.001386   0.000276   5.021 0.000105 ***
---
Residual standard error: 5.557 on 17 degrees of freedom
Multiple R-squared: 0.9946, Adjusted R-squared: 0.9936
F-statistic: 1039 on 3 and 17 DF, p-value: < 2.2e-16

[Data listing: bgsdat, columns cog and age, 21 rows]
[Plot: Reaction vs. Days for sdat, with the fitted OLS line]
[Plot: cog vs. age for bgsdat]
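The "approx DW = 2(1 - r)" note attached to the Durbin-Watson output can be checked by hand. The following is a minimal sketch, not part of the original session: it assumes the ten Reaction values are those of Subject 372 in lme4's sleepstudy (typed in here so the block runs without any package), and the names `e`, `dw`, `r1` and `end_term` are illustrative.

```r
# Reaction times for one sleepstudy subject, Days 0..9 (entered by hand; an assumption)
Reaction <- c(269.4117, 273.4740, 297.5968, 310.6316, 287.1726,
              329.6076, 334.4818, 343.2199, 369.1417, 364.1236)
Days <- 0:9

fit <- lm(Reaction ~ Days)
e <- unname(resid(fit))   # OLS residuals (mean zero because of the intercept)
n <- length(e)

# Durbin-Watson statistic: squared successive differences over squared residuals
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals
r1 <- sum(e[-1] * e[-n]) / sum(e^2)

# Exact identity: DW = 2(1 - r1) - (e_1^2 + e_n^2) / sum(e^2)
end_term <- (e[1]^2 + e[n]^2) / sum(e^2)

print(c(DW = dw, approx = 2 * (1 - r1), exact = 2 * (1 - r1) - end_term))
```

The approximation drops only the end-point term, so it is close for long series and rougher for a ten-point trajectory like this one. A DW value above 2 corresponds to a negative lag-1 residual correlation.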
> dwtest(lm.bgsc, alternative = "two.sided")

        Durbin-Watson test

data:  lm.bgsc
DW =., p-value = 0.006566
alternative hypothesis: true autocorrelation is not 0

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
####### Count Data, Generalized Linear Models
# slide for Poisson link function (log)
> belg.aids
   cases year
1     12    1
2     14    2
3     33    3
4     50    4
5     67    5
6     74    6
7    123    7
8    141    8
9    165    9
10   204   10
11   253   11
12   246   12
13   240   13
> am1 = glm(cases ~ year, data=belg.aids, family=poisson(link=log))
> summary(am1)

Call:
glm(formula = cases ~ year, family = poisson(link = log), data = belg.aids)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.6784  -1.5013  -0.2636   2.1760   2.7306

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.140590   0.078247   40.14   <2e-16 ***
year        0.202121   0.007771   26.01   <2e-16 ***
---
(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 872.206 on 12 degrees of freedom
Residual deviance:  80.686 on 11 degrees of freedom
AIC: 166.37

Number of Fisher Scoring iterations: 4

> plot(am1)   # gives you the set of diagnostic plots--resids vs fitted etc

# Quadratic in year
> am2 = glm(cases ~ year + I(year^2), data=belg.aids, family=poisson(link=log))
> summary(am2)

Call:
glm(formula = cases ~ year + I(year^2), family = poisson(link = log), data = belg.aids)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.45903  -0.64491   0.08927   0.67117   1.54596

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.901459   0.186877  10.175  < 2e-16 ***
year         0.556003   0.045780  12.145  < 2e-16 ***
I(year^2)   -0.021346   0.002659  -8.029 9.82e-16 ***
---
(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 872.2058 on 12 degrees of freedom
Residual deviance:   9.2402 on 10 degrees of freedom
AIC: 96.924

Number of Fisher Scoring iterations: 4

> anova(am1, am2)   # compare nested models
Analysis of Deviance Table

Model 1: cases ~ year
Model 2: cases ~ year + I(year^2)
  Resid. Df Resid. Dev Df Deviance
1        11     80.686
2        10      9.240  1   71.446
> anova(am1, am2, test = "Chisq")
Analysis of Deviance Table

Model 1: cases ~ year
Model 2: cases ~ year + I(year^2)
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
1        11     80.686
2        10      9.240  1   71.446 < 2.2e-16 ***
> AIC(am1, am2)
    df       AIC
am1  2 166.36982
am2  3  96.92358
> # cubic doesn't help, see link
> year = seq(1, 13, length=100)
> fv = predict(am2, newdata=data.frame(year=year), se=TRUE)
> plot(belg.aids$year+1980, belg.aids$cases)       # data
> lines(year+1980, exp(fv$fit), col=2)             # fit
> lines(year+1980, exp(fv$fit+2*fv$se), col=3)     # upper c.l.
> lines(year+1980, exp(fv$fit-2*fv$se), col=3)     # lower c.l.
> # produces nice final plot, note the overlay of fit and CI bands (2*se)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Non-linear Models: Logistic Growth Trajectory
### http://svn.r-project.org/r/trunk/src/library/datasets/data/chickweight.r
### Data on the growth of chicks on different diets.
### Hand and Crowder (1996), Table A.2, p. 172
> Time = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21)
> weight = c(42, 51, 59, 64, 76, 93, 106, 125, 149, 171, 199, 205)
> plot(Time, weight)
> Chick.1 = as.data.frame(cbind(Time, weight))
> Chick.1
   Time weight
1     0     42
2     2     51
3     4     59
4     6     64
5     8     76
6    10     93
7    12    106
8    14    125
9    16    149
10   18    171
11   20    199
12   21    205
> Asym = 668; xmid = 9; scal = 6   #fit not sensitive to choices of initial vals
> fm3 <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = Chick.1)
> summary(fm3)

Formula: weight ~ SSlogis(Time, Asym, xmid, scal)

Parameters:
     Estimate Std. Error t value Pr(>|t|)
Asym 937.0214   465.8579   2.011  0.07516 .
xmid  35.2228     8.3119   4.238  0.00218 **
scal  11.4052     0.9052  12.599 5.08e-07 ***
---
Residual standard error: 2.919 on 9 degrees of freedom

Number of iterations to convergence: 0
Achieved convergence tolerance: 6.6e-07

> predict(fm3, Time)
 [1] 40.84655 48.74 56.9649 67.09886 78.8773 9.50348
     08.8683 6.305 46.5766 69.5374 95.559 09.54
> #at 0 weight = 3, four parameter logistic: SSfpl, Self-Starting Nls Four-Parameter Logistic Model
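For reference, the curves behind the two self-starting models named above can be written out. This is a sketch taken from the standard R documentation for `SSlogis` and `SSfpl`, with `t` standing for `Time`; `A` and `B` are the lower and upper asymptotes of the four-parameter form.

```latex
% SSlogis: three-parameter logistic growth curve
\[
  \mathrm{weight}(t) = \frac{\mathrm{Asym}}{1 + \exp\!\left(\dfrac{\mathrm{xmid} - t}{\mathrm{scal}}\right)}
\]
% SSfpl: four-parameter logistic, lower asymptote A and upper asymptote B
\[
  \mathrm{weight}(t) = A + \frac{B - A}{1 + \exp\!\left(\dfrac{\mathrm{xmid} - t}{\mathrm{scal}}\right)}
\]
```

At t = xmid the three-parameter curve reaches Asym/2. Here the fitted xmid (about 35) lies beyond the last observed Time, so the upper asymptote is extrapolated, which is consistent with the large standard error on Asym.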
AIDS model example

belg.aids <- data.frame(cases=c(12,14,33,50,67,74,123,141,165,204,253,246,240), year=1:13)
am1 <- glm(cases ~ year, data=belg.aids, family=poisson(link=log))
plot(am1)

[Diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage]

...clear trend in the residual mean + some overly influential points.
AIDS model example II

Try a quadratic time dependence?

am2 <- glm(cases ~ year + I(year^2), data=belg.aids, family=poisson(link=log))
plot(am2)

[Diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage]

...much better.
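The nested-model comparison for the linear and quadratic Poisson fits can be reproduced without `anova()`. This is a minimal sketch under the assumption that the data are the 13 yearly counts given in the slide's `data.frame` call; `dev_drop`, `df_drop`, `p_value` and `aic_by_hand` are illustrative names.

```r
# Belgian AIDS counts, year coded 1..13 (typed in from the slide; an assumption)
belg.aids <- data.frame(
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240),
  year  = 1:13
)

am1 <- glm(cases ~ year, data = belg.aids, family = poisson(link = log))
am2 <- glm(cases ~ year + I(year^2), data = belg.aids, family = poisson(link = log))

# Drop in residual deviance and the degrees of freedom it costs
dev_drop <- deviance(am1) - deviance(am2)
df_drop  <- df.residual(am1) - df.residual(am2)

# Under the smaller model the drop is approximately chi-squared on df_drop df
p_value <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

# AIC by hand: -2 * log-likelihood + 2 * number of coefficients
aic_by_hand <- -2 * as.numeric(logLik(am2)) + 2 * length(coef(am2))

print(c(deviance_drop = dev_drop, df = df_drop, p = p_value, AIC_am2 = aic_by_hand))
```

This is the same test that `anova(am1, am2, test = "Chisq")` reports. The chi-squared reference is appropriate here because the Poisson dispersion is fixed at 1.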
Fitted AIDS model

[Plot: cases vs. year, 1982 to 1992, with fitted curve and confidence bands]
[Plot: weight vs. Time]
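The logistic fit can be rerun and checked against the curve formula directly. This is a minimal sketch, assuming the Time and weight values are those of chick 1 in R's ChickWeight data (typed in by hand); `fm`, `p` and `by_hand` are illustrative names, and no starting values are supplied because `SSlogis` computes its own.

```r
# Growth record for one chick (entered by hand; assumed to be ChickWeight chick 1)
Time   <- c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21)
weight <- c(42, 51, 59, 64, 76, 93, 106, 125, 149, 171, 199, 205)
Chick.1 <- data.frame(Time, weight)

# Self-starting three-parameter logistic fit
fm <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = Chick.1)
p  <- coef(fm)

# Fitted values from the formula Asym / (1 + exp((xmid - Time) / scal))
by_hand <- unname(p["Asym"] / (1 + exp((p["xmid"] - Time) / p["scal"])))

# Compare with predict(); passing newdata as a data frame is the documented form
from_predict <- as.numeric(predict(fm, newdata = data.frame(Time = Time)))
print(cbind(Time, weight, by_hand, from_predict))
```

Because the observed weights never approach a plateau, Asym is estimated with little precision even though the fitted values track the data closely.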