Solutions - Homework #2


Figure 1: Scatterplot of Abundance vs. Relative Density (Parasite Abundance against Relative Host Population Density).

Figure 2: Scatterplot of Log Abundance vs. Log RD (Log Parasite Abundance against Log Relative Host Population Density).

1. Problem 1: Parasite Abundance

(a) A scatterplot of the parasite abundances versus relative host population densities is given in Figure 1. In viewing this plot, there is an L-shaped relationship between the two variables, with the majority of values small for both abundance and density. Truthfully, it is hard to discern much of any relationship because so many of the species are clustered at these smaller values, owing to the few values that are orders of magnitude larger than the others in both variables.

(b) A scatterplot of the log abundances versus the log relative densities is given in Figure 2. In viewing this plot, the relationship is positive, fairly linear, and moderate in strength. The log transformation has effectively reduced the magnitude of the larger values in each variable, allowing us to see the relationship between the two variables much more clearly.

(c) The simple linear regression model y_i = β_0 + β_1 x_i + ε_i (with y = log parasite abundance and x = log relative density) was fit to these data, producing the following parameter estimates for the intercept and slope: β̂_0 = .4767 and β̂_1 = .763. The R²-statistic as reported in the MATLAB output was R² = .30, meaning that roughly 30% of the variation in log parasite abundances is explained by the regression on log relative density. The relevant output from the tstat regression output structure is shown below, and all code used in this problem is given at the end of these solutions.
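One way to read the fitted log-log model: exponentiating both sides of the fitted equation gives, approximately on the original scale,

abundance ≈ exp(β̂_0) (relative density)^β̂_1,

so the estimated slope β̂_1 acts as the exponent in a rough power-law relationship between parasite abundance and relative host density.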

Coefficients:      Value     Std. Error    t value    p-value
  Intercept        .4767       .67
  Log Density      .763        .8

Multiple R-Squared: .30
F-statistic: 7.34 on 1 and 17 degrees of freedom, p-value = .015

(d) An estimate of σ² is given by the mean square error (MSE), which is reported in MATLAB as out.mse in the out regression structure. Hence, MSE = .39. The standard errors of β̂_0 and β̂_1 are given in the Coefficients table above as SE(β̂_0) = .67 and SE(β̂_1) = .8. Using these standard errors and computing the t-critical value with n − p = 19 − 2 = 17 degrees of freedom at the 99% level (t_.995(17) = 2.898), individual 99% confidence intervals for β_0 and β_1 were computed via MATLAB as:

For β_0: β̂_0 ± t_17 SE(β̂_0) = .4767 ± 2.898(.67) = .4767 ± .7583 → (-.8, .4).
For β_1: β̂_1 ± t_17 SE(β̂_1) = .763 ± 2.898(.8) = .763 ± .873 → (-.5, .58).

Hence, we are 99% confident that the slope of the regression of log parasite abundance on log relative density is between -.5 and .58. We are also (individually) 99% confident that the mean log abundance at a log relative density of 0 (relative density = 1) is between -.8 and .4. Since both confidence intervals contain the value 0, both β_0 and β_1 appear to be insignificant at the .01 level in this model. This is confirmed by the p-values, as both are greater than .01.

(e) The confidence bands were computed using the confregplot function from the course webpage and are shown, along with the fitted regression line and the prediction bands, in the scatterplot in Figure 3. To get at the gains in precision in estimating the mean of y for different values of x, margins of error of the confidence interval for E(y) were computed at several values of x. As expected, the further we get from the mean x-value, the greater the variability in estimating E(y); the margin of error at either extreme of the observed log densities (.74) is noticeably larger than that at the middle log density (.65).

2. Problem 2: Logistic Reparameterization

Suppose we begin with the logistic growth model parameterized as

u(t) = M u_0 / [u_0 + (M − u_0) exp(−kt)],

where (M, u_0, k) are the model parameters. First, divide all terms in this model (numerator and denominator) by u_0. Doing so gives

u(t) = M / [1 + ((M − u_0)/u_0) exp(−kt)]
     = M / {1 + exp[log((M − u_0)/u_0) − kt]},

since (M − u_0)/u_0 = exp[log((M − u_0)/u_0)]. Defining a = log((M − u_0)/u_0) and b = −k, this can be written

u(t) = M / [1 + exp(a + bt)],

as desired.
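A quick numerical check of this reparameterization can be done in MATLAB; the parameter values below are arbitrary and chosen only for illustration.

M = 5; u0 = .5; k = .3;                        % arbitrary illustrative parameter values
a = log((M - u0)/u0); b = -k;                  % reparameterized values
t = linspace(0,20,50);                         % grid of times
u1 = M*u0 ./ (u0 + (M - u0).*exp(-k*t));       % original parameterization
u2 = M ./ (1 + exp(a + b*t));                  % reparameterized form
max(abs(u1 - u2))                              % the two forms agree up to rounding error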

Figure 3: Scatterplot of log parasite abundance vs. log relative density with the fitted line, confidence bands, and prediction bands.

Figure 4: Calcium (nmoles/mg) vs. Time (in minutes), with the fitted Weibull growth curve.

3. Problem 3: Weibull fit problem

(a) A scatterplot of the calcium amounts vs. time is given in Figure 4. In viewing this plot, the calcium amounts initially increase very rapidly but seem to level off around 5 nmoles/mg. It is also worth noting that there seems to be more variability in these calcium amounts as their values increase. The resulting pattern may be well described by an exponential growth curve, although the pattern seems to change somewhat abruptly around a time of 4 minutes.

(b) Using the Weibull growth model parameterized as y_i = α{1 − exp[−(t_i/σ)^γ]} + ε_i, we first recognize that α is the upper asymptote of the curve, since as time increases the exponential piece goes to zero. Eyeballing where this limit occurs from the scatterplot, we choose α = 4 as the starting value for α in a nonlinear least squares fit. Picking two points from the scatterplot, we can see that (x, y) = (1, 1) and (6, 3) roughly fit the curved pattern seen. Substituting these values into the Weibull model gives the following pair of equations:

1 = 4{1 − exp[−(1/σ)^γ]}   and   3 = 4{1 − exp[−(6/σ)^γ]}.

Solving this system of two equations in the two unknowns (σ, γ): the first equation gives exp[−(1/σ)^γ] = 3/4, so (1/σ)^γ = ln(4/3); the second gives exp[−(6/σ)^γ] = 1/4, so (6/σ)^γ = ln(4). Dividing the second of these by the first yields 6^γ = ln(4)/ln(4/3), so that

γ = ln[ln(4)/ln(4/3)] / ln(6) = .88.

Substituting back into (6/σ)^γ = ln(4) gives γ ln(6/σ) = ln(ln(4)), so that

σ = 6 exp[−ln(ln(4))/γ] = 6 exp[−ln(ln(4))/.88] = 4.14.

Hence, the starting values I used to fit the Weibull model were (α, σ, γ) = (4, 4.14, .88); a short MATLAB sketch of this starting-value calculation is given below.
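The calculation above can be reproduced in a few lines of MATLAB. This is only a sketch of the hand calculation; the eyeballed points and the asymptote guess are read off the scatterplot and hard-coded.

alpha0 = 4;                            % eyeballed upper asymptote
t1 = 1; y1 = 1;                        % first eyeballed point from the plot
t2 = 6; y2 = 3;                        % second eyeballed point from the plot
c1 = -log(1 - y1/alpha0);              % so that (t1/sigma)^gamma = c1
c2 = -log(1 - y2/alpha0);              % so that (t2/sigma)^gamma = c2
gamma0 = log(c2/c1)/log(t2/t1);        % divide the two equations and solve for gamma
sigma0 = t2*exp(-log(c2)/gamma0);      % back-solve the second equation for sigma
beta0 = [alpha0 sigma0 gamma0]         % starting vector for nlinfit: (4, 4.14, .88)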

Figure 5: Residual plot (residuals vs. predicted values) and normal quantile plot of the residuals from the Weibull fit.

(c) With the starting values found above, nlinfit was used to fit a Weibull growth curve to these data, resulting in the fitted curve plotted in Figure 4. The resulting parameter estimates are: α̂ = 4.835, σ̂ = 4.736, γ̂ = .56.

(d) A residual plot and a normal quantile plot are shown in Figure 5. In viewing these plots, the residual plot appears to exhibit random scatter about the 0-line, indicating variance homogeneity among the model residuals. There are a couple of large residuals at larger predicted values, but nothing systematic enough to worry about heterogeneity issues. The normal quantile plot shows a reasonably linear relationship between the residuals and the standard normal quantiles, indicating no serious departures from normality for the residuals. These two assumptions justify our use of t-based inferences in later parts of this problem.

(e) The nlparci function in MATLAB was used to find individual 95% confidence intervals for the three model parameters. The resulting intervals are reported below, along with the by-hand calculations using the standard errors reported in the next part. For each confidence interval to be at the 95% level individually, since there are n = 27 data pairs and p = 3 parameters being estimated, there are n − p = 24 degrees of freedom available. Under the assumption that the sampling distributions of the parameter estimates are normal, we use a t-based confidence interval for each of α, σ, and γ. The critical t-value is t* = t_.975(24) = 2.064, as found via MATLAB. This results in the following set of 3 individual 95% confidence intervals for the 3 parameters:

For α: α̂ ± t* SE(α̂) = 4.83 ± 2.064(.4745) = 4.83 ± .979 = (3.34, 5.6),
For σ: σ̂ ± t* SE(σ̂) = 4.73 ± 2.064(.76) = 4.73 ± .6 = (.9, 7.354),
For γ: γ̂ ± t* SE(γ̂) = .56 ± 2.064(.7) = .6 ± .469 = (.547, .485).
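For reference, the standard errors underlying these intervals can be approximated directly from the nlinfit output (the residual vector resid and Jacobian J in the code at the end of these solutions), using the usual large-sample approximation cov(β̂) ≈ MSE*(J'J)^(-1); this is a sketch of that calculation rather than the exact internals of nlparci.

n = length(resid); p = 3;              % sample size and number of parameters
mse = sum(resid.^2)/(n - p);           % estimate of the error variance
covb = mse*inv(J'*J);                  % approximate covariance matrix of the estimates
se = sqrt(diag(covb));                 % standard errors for (alpha, sigma, gamma)
ci = [betahat(:) - tinv(.975,n-p)*se, ...   % individual 95% t-based intervals,
      betahat(:) + tinv(.975,n-p)*se]       % comparable to the nlparci output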

The confidence interval for α can be interpreted as: we are 95% confident that the true value of α in the Weibull model relating calcium amount to suspension time is between 3.34 and 5.6. The others can be interpreted similarly.

To have all 3 confidence intervals simultaneously at the 95% level, we need to use the Bonferroni correction, as discussed briefly in class. To do this, instead of finding the critical t-value at the .975 percentile of the t-distribution with 24 degrees of freedom, we divide the lower tail (.025) by 3 (the number of CIs desired) to get a tail probability of .025/3 = .0083 and then compute t_.9917(24) to get the critical value. Doing so gives t_.9917(24) = 2.574 and a wider set of CIs:

For α: (3.6, 5.54)    For σ: (.46, 8.)    For γ: (.43, .6)

in which we are simultaneously 95% confident that the three parameters fall inside their respective intervals.

(f) As indicated in the previous part, the operative t-critical value is t* = t_.975(24) = 2.064. The lengths of the three confidence intervals divided by 2 give the margins of error for the three intervals, and dividing these margins of error by t* gives the standard errors of the three parameter estimates. These calculations are summarized in the table below.

Parameter   Estimate   CI               Margin of Error   Standard Error (= MoE/t*)
α           4.83       (3.34, 5.6)      .98               .475
σ           4.73       (.9, 7.354)      .64               .7
γ           .6         (.547, .485)                       .7

(g) Since the confidence interval for γ clearly includes the value γ = 1 (which makes sense, since γ̂ was so close to 1), we have no evidence that γ differs from 1, and it does seem that this parameter is unnecessary. If we omit this parameter, which we are clearly justified in doing, we are left with the 2-parameter exponential growth model, which was fully discussed in class. As it stands, the model is overspecified.

MATLAB Code Used

clear all;

% Problem 1: Plots of parasite abundances vs. relative densities %
load ../data/arneberg;                              % Load the data
figure(1)
reldens = arneberg.reldens;
abund = arneberg.abund;
plot(reldens,abund,'ko')                            % Plots abundance vs. rel. density
xlabel('Relative Host Population Density','fontsize',14)
ylabel('Parasite Abundance','fontsize',14)
title('Scatterplot of Abundance vs. Relative Density','fontsize',14,'fontweight','b')

figure(2)
logreldens = log(reldens);                          % Log of relative densities
logabund = log(abund);                              % Log of abundances
plot(logreldens,logabund,'ko')                      % Plots log abundance vs. log RD
xlabel('Log Relative Host Population Density','fontsize',14)
ylabel('Log Parasite Abundance','fontsize',14)

title('Scatterplot of Log Abundance vs. Log RD','fontsize',14,'fontweight','b')

% Problem 1: Regression of log abundance on log relative density %
out = regstats(logabund,logreldens);                % Regresses abundance (y) on density (x)
out.tstat                                           % Requests relevant parameter estimate info
out.rsquare                                         % Multiple R-Squared value
out.mse                                             % Mean Squared Error (MSE)
out.fstat                                           % Model F-statistic, df, pval
n = length(abund);                                  % Sample size
p = 2;                                              % Defines # parameters
bhat = out.beta;                                    % Vector of parameter estimates
seb = sqrt(diag(out.covb));                         % Vector of standard errors
tbonf = tinv(1-.005,n-p);                           % 99% uncorrected t*-value
ci_b = [bhat-tbonf*seb,...                          % Confidence intervals for betas
        bhat+tbonf*seb];                            % in columns (lower,upper)
ci_b                                                % Prints confidence intervals

% ================================================= %
% Problem 1: Confidence bands for E(log abundances) %
% ================================================= %
xlab = 'Log Relative Density';                      % X-axis label
ylab = 'Log Parasite Abundance';                    % Y-axis label
% The confregplot function plots the confidence and prediction
% bands for E(y). This function requires as inputs the x & y
% variables, labels for these variables, and the confidence level.
confregplot(logreldens,logabund,xlab,ylab,99)

% ================================================ %
% Problem 3: Weibull fit to Calcium Data with plot %
% ================================================ %
load ../data/calcium.mat;                           % Loads the calcium data
calc = calcium.calcium;
time = calcium.time;                                % Defines calcium & time vectors
plot(time,calc,'ko')                                % Plots calcium vs. time
xlim([0 16])                                        % x-axis plotting limits
xlabel('Time (in minutes)','fontsize',14,'fontweight','b');
ylabel('Calcium (nmoles/mg)','fontsize',14,'fontweight','b');
title('Calcium vs. Time','fontsize',14,'fontweight','b');
hold on;                                            % Hold the current plot

% ======================================================== %
% Problem 3c - Nonlinear Weibull model fit to calcium data %
% ======================================================== %
beta0 = [4 4.14 .88];                               % Parameter starting values
[betahat resid J] = nlinfit(time,...                % Performs nonlinear Weibull fit
    calc,@weibull,beta0);                           % returning betahats, resids, Jacobian
time1 = 0:.1:15;                                    % Vector of times from 0 to 15

yhat1 = betahat(1)*(1-exp(-(time1./...              % Computes Weibull predicted
    betahat(2)).^betahat(3)));                      % values (yhat1's)
plot(time1,yhat1);                                  % Plots the fitted line
hold off;                                           % End hold on current plot
%nlintool(calcium.time,calcium.calcium,@weibull,beta0);

% =========================================================== %
% Problem 3d - Residual and normal quantile plot of residuals %
% =========================================================== %
figure(1)                                           % 1st figure
yhat.weib = calc - resid;                           % Computes Weibull predicted values
plot(yhat.weib,resid,'ko');                         % Plots residuals vs. predicted y-values
xlabel('Predicted Values','fontsize',14,'fontweight','b');
ylabel('Residuals','fontsize',14,'fontweight','b');
title('Residual Plot','fontsize',14,'fontweight','b');
figure(2);                                          % 2nd figure
qqplot(resid);                                      % Normal quantile plot of residuals
xlabel('Standard Normal Quantiles','fontsize',14,'fontweight','b');
ylabel('Residuals','fontsize',14,'fontweight','b');
title('Normal Quantile Plot','fontsize',14,'fontweight','b');

% ======================================================================= %
% Problem 3e,f - Individual 95% Confidence intervals for the 3 parameters %
% ======================================================================= %
ci = nlparci(betahat,resid,J);                      % Computes CIs for (alpha,sigma,gamma)
moe = (ci(:,2)-ci(:,1))/2;                          % Margin of error from CIs
se = moe/tinv(.975,24);                             % Standard errors from CIs
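The Bonferroni-corrected simultaneous intervals discussed in part (e) can be obtained from the same quantities; a minimal sketch, assuming the betahat and se computed above:

k = 3;                                              % number of simultaneous intervals
tb = tinv(1 - .025/k, 24);                          % Bonferroni critical value, t_.9917(24) = 2.574
ci_bonf = [betahat(:) - tb*se, ...                  % wider simultaneous 95% intervals,
           betahat(:) + tb*se]                      % one row per parameter (alpha, sigma, gamma)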
