Solutions - Homework #1


Arnold Stewart
5 years ago

1. Problem 1: Below appears a summary of the paper "The pattern of a host-parasite distribution" by Schmid & Robinson (197).

Using the gnat Culicoides crepuscularis as a host specimen and the filarial nematode Chandlerella quiscali as a parasite, the variation in the pattern of distribution of parasites in the host was studied to assess whether the pattern was random (Poisson) or not. Specifically, 143 gnats were examined from the same infected purple grackle and the number of nematodes counted on each gnat. Three analyses of these count data were performed using chi-square goodness-of-fit techniques to assess whether the data might have arisen from a Poisson distribution. Whether observing only the infected gnats or all gnats, the observed frequencies of nematodes did not appear to fit a Poisson model (p <.1), but fit a negative binomial model quite closely. This finding is supported by the high variance/mean ratio of 6.68 and the clumped distribution of nematodes per gnat. Schmid & Robinson list four plausible explanations for the clumped parasite pattern within this host and provide examples of other studies attributing this clumpiness to variation in nematode density. Their basic message is to emphasize the importance of pattern analysis in the study of parasitism.

2. Problem 2: We are given two sets of criteria for evaluating whether the HAART treatment was a success or a failure (virological, which is the gold standard, and clinical/immunological). The simplest way to visualize the number of cases classified as successes or failures by these two types of evaluation is to make a 2 x 2 table of counts, as given to the right. With 327 total patients, the counts of 14, 48, and 3 were first placed in this table, and the remaining values were filled in according to the required row and column totals.
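The fill-in step described above can be sketched in MatLab. This is only an illustration: the variable names are hypothetical, and the totals used (327 patients, 14 virological failures, 48 clinical/immunological failures, 3 failures under both criteria) are assumed values chosen to agree with the sensitivity, specificity, and prevalence figures reported in this solution.

```matlab
% Assumed marginal totals (assumptions, see the note above)
N        = 327;   % total number of patients
virFail  = 14;    % virological failures (row total, A + C)
clinFail = 48;    % clinical/immunological failures (column total, A + B)
A        = 3;     % failures under both sets of criteria

% Remaining cells follow from the row and column totals
C = virFail - A;          % virological failure, clinical/immunological success
B = clinFail - A;         % virological success, clinical/immunological failure
D = N - (A + B + C);      % success under both sets of criteria

% Rows: virological failure/success; columns: clin./imm. failure/success
tbl = [A C; B D];
disp(tbl)
```

Only one cell is free once the margins are fixed, which is why placing a single interior count (here A) determines the other three.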
With this table, we recognize A, B, C, and D, as defined in Box.4 of the Ecology of Wildlife Diseases book, in the cells labeled below.

                                     Clinical/Immunological
                             HAART Failure   HAART Success   Total
Virological  HAART Failure      A = 3           C = 11          14
             HAART Success      B = 45          D = 268        313
             Total                48             279           327

Computing then, we have:

\[
\begin{aligned}
\text{sensitivity} &= \frac{\text{true failures}}{\text{all virological failures}} = \frac{A}{A+C} = \frac{3}{14} = .2143, \\
\text{specificity} &= \frac{\text{true successes}}{\text{all virological successes}} = \frac{D}{B+D} = \frac{268}{313} = .8562, \\
\text{observed prevalence} &= \frac{\text{all clin./imm. failures}}{\text{number examined}} = \frac{A+B}{N} = \frac{48}{327} = .1468, \\
\text{true prevalence} &= \frac{\text{obs. prev.} + \text{specif.} - 1}{\text{sens.} + \text{spec.} - 1} = \frac{(A+B)/N + D/(B+D) - 1}{A/(A+C) - B/(B+D)} = \frac{.0030}{.0705} = .0428.
\end{aligned}
\]

The small sensitivity value indicates that the clinical/immunological tests do a poor job of indicating failure of the HAART therapy when it has in fact failed. On the other hand, the high specificity value indicates that when the treatment is a success, the clinical/immunological evaluations correctly identify it as a success about 86% of the time. The observed prevalence is just the proportion of HAART therapy cases deemed to have failed according to the clinical/immunological evaluation; in this case, this evaluation found that the therapy failed 14.68% of the time. However, when accounting for the misclassifications measured by the sensitivity and specificity, the true prevalence of HAART failures was just 4.28%. This smaller value results from the large number of false positive results: 45 patients were deemed to have HAART therapy failure when in fact they did not.

3. Problem 3: From a randomized experiment involving 36 mice, data were collected on the percentages of Fe 3+ and Fe 4+ retained after a fixed time interval. The resulting percentages for the two groups of 18 mice each are displayed in the boxplot to the right.

[Figure: boxplot of Percentage Iron Retained by Iron Group (Fe 3+, Fe 4+)]

Iron retention percentages tend to be higher for the Fe 4+ group. The distribution of Fe 4+ retention percentages is somewhat skewed to the right with percentages ranging from.% to 11.65%, centered at a median of 5.75% with one mild outlier at 1.45%. The Fe 3+ distribution of percentages is fairly symmetric with values ranging from.71% to 5.6%, centered at a median of 3.48% with two outliers at 8.15% & 8.4%.

4. Problem 4

(a) The MatLab code to conduct this simulation is given at the end of this solutions handout.

(b) The five histograms of the variance-to-mean ratios for randomly generated samples of sizes n = 1,5,5,, and 5 from a Poisson (θ = ) distribution are shown below.

[Figure: five histograms of Variance-to-Mean Ratios, one "Poisson Ratios" panel per sample size]

In viewing these histograms, the distributions of variance-to-mean ratios for all sample sizes are centered at a ratio of 1 and are roughly symmetric, with some evidence of slight right-skewness, especially for the smaller sample sizes. The primary difference between these distributions is the variability in the ratios. For a sample size of 1, ratios could be as small as. and as large as.5, whereas all ratios are within about. of 1 for n = 5. It is clear that deciding whether or not a variance-to-mean ratio differs from 1 (indicating a departure from randomness) depends critically on the sample size.

(c) The table to the right summarizes the ratios for the five cases.

[Table columns: Sample Size n, Mean of Ratios, SD of Ratios, Confidence Interval. Confidence intervals in order of increasing sample size: (.8,.6), (.5, 1.7), (.63, 1.46), (.76, 1.3), (.88, 1.13)]

As with the histograms, the confidence intervals in this table clearly indicate the decreasing variability in the variance-to-mean ratio as a function of sample size. This can also be easily observed in the ratio standard deviations. Although there was some skewness evident in the ratio distributions, there does not appear to be any systematic bias, as the ratio means are all very close to the Poisson ratio of 1.

(d) If someone were to come to me with count data having a mean of and a variance-to-mean ratio of 1.5, wondering whether or not the data were statistically aggregated, I would tell them that it depends on the sample size of their data. Based on this small-scale simulation study, a variance-to-mean ratio of 1.5 would not be unusual for sample sizes of 1 or 5 (since the 95% confidence intervals for the ratio contain 1.5); however, for samples of size 5 or greater, a ratio of 1.5 is unusual given the random distribution of ratios from our simulation, and we might be more inclined to conclude that the data are aggregated. Given the difficulty of finding a confidence interval for a variance-to-mean ratio analytically, a simulation such as this is useful for studying the properties of this ratio.

5. Problem 5

(a) Taking the hint, the log-likelihood function for θ is given by:

\[
\begin{aligned}
\log L(\theta \mid x) &= \log \prod_{i=1}^{n} \left[ \frac{\theta^{x_i} e^{-\theta}}{x_i!} \right]
 = \log \prod_{i=1}^{n} \theta^{x_i} + \log \prod_{i=1}^{n} e^{-\theta} + \log \prod_{i=1}^{n} \frac{1}{x_i!} \\
&= \log \theta^{\sum_{i=1}^{n} x_i} + \log e^{-n\theta} - K \quad \text{(where $K$ is a constant with respect to $\theta$)} \\
&= \sum_{i=1}^{n} x_i \log\theta - n\theta - K.
\end{aligned}
\]
Differentiating the log-likelihood with respect to θ and setting it equal to 0:

\[
\left. \frac{\partial \log L(\theta \mid x)}{\partial \theta} \right|_{\theta = \hat\theta} = \frac{\sum_{i=1}^{n} x_i}{\hat\theta} - n = 0
\quad \Longrightarrow \quad \hat\theta = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}.
\]

Taking the 2nd derivative of \(\log L(\theta \mid x)\) with respect to θ:

\[
\left. \frac{\partial^2 \log L(\theta \mid x)}{\partial \theta^2} \right|_{\theta = \hat\theta} = -\frac{\sum_{i=1}^{n} x_i}{\hat\theta^2} < 0,
\]

so that \(\hat\theta\) is a local maximum. Since \(\hat\theta\) was uniquely determined, \(\hat\theta\) is the absolute maximum and hence the MLE of θ.

(b) A histogram of the number of tapeworms per perch is shown in the figure "Histogram of Tapeworm Counts" (MatLab code given at the end of the solutions). In viewing these data, the distribution of the number of tapeworms per perch is highly right-skewed, with more than 75% of the perch having 1 tapeworm or fewer. The median number of tapeworms is 1 and the number of tapeworms ranged from 0 to 6 per perch. These data have a variance-to-mean ratio of \(s^2/m = 1.0609/.8808 = 1.20\), so based on the simulation in Problem 4, these data appear more aggregated than would be expected from a Poisson distribution.
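As a rough check on the summary statistics in part (b), the sample mean, variance, and variance-to-mean ratio can be computed directly from a frequency table instead of from the expanded data vector. The sketch below assumes frequencies of 235, 168, 75, 32, 7, 2, and 1 for 0 through 6 tapeworms (520 perch in all); these counts are assumptions chosen to agree with the reported mean, and the variable names are illustrative only.

```matlab
x    = 0:6;                        % number of tapeworms per perch
freq = [235 168 75 32 7 2 1];      % assumed number of perch at each count

n   = sum(freq);                            % sample size
m   = sum(x .* freq) / n;                   % sample mean (also the Poisson MLE)
s2  = sum(freq .* (x - m).^2) / (n - 1);    % sample variance
vmr = s2 / m;                               % variance-to-mean ratio
p01 = sum(freq(x <= 1)) / n;                % proportion with 1 tapeworm or fewer

fprintf('mean = %.4f, variance = %.4f, ratio = %.2f, P(x <= 1) = %.3f\n', ...
        m, s2, vmr, p01);
```

Working from the frequency table avoids building the 520-element vector, and the weighted sums make it easy to see how each count contributes to the mean and variance.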

(c) The MLE of θ for these data is computed as:

\[
\hat\theta = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{168 + 2(75) + 3(32) + 4(7) + 5(2) + 6(1)}{520} = \frac{458}{520} = .8808.
\]

To find the expected frequency for X = x tapeworms per perch for x = 0, 1, 2, ..., we multiply the total number of perch (n = 520) by:

\[
P(X = x) = \frac{\hat\theta^{x} e^{-\hat\theta}}{x!} = \frac{.8808^{x} e^{-.8808}}{x!}.
\]

Doing so gives the frequency table to the right. The calculations in this table were made using MatLab as shown at the end of the solutions.

[Figure: Histogram of Tapeworm Counts; x-axis: # Tapeworms per perch]

[Table columns: # tapeworms per perch, Observed, Expected, P(X = x)]

(d) Computing the chi-square test statistic using the cells in the frequency table of part (c):

\[
D = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = \cdots + \frac{(10 - 6.5)^2}{6.5} = 9.28.
\]

With g = 5 groups, there are g − 2 = 3 degrees of freedom for this test. The p-value for this test (the probability of getting a value of 9.28 or greater from a chi-square distribution with 3 d.f.) is computed in MatLab as: 1-chi2cdf(D,3) = .0258. This p-value indicates moderate evidence of lack of fit to a Poisson model.

MatLab Code Used for Homework #1

% ======================== %
% Problem 3: Iron Data EDA %
% ======================== %
load irondiet.mat
boxplot(irondiet.fe, irondiet.type, 'labels', {'Fe 3+', 'Fe 4+'})
xlabel('Iron Group', 'fontsize', 14); ylabel('Percentage Iron Retained', 'fontsize', 14)
median(irondiet.fe(irondiet.type==3))
median(irondiet.fe(irondiet.type==4))

% ============================= %
% Problem 4: Poisson simulation %
% ============================= %
theta = ;                                % Assigns the Poisson mean
n = [ ];                                 % Vector of 5 sample sizes
for i = 1:5                              % Begins loop through n-values
  nsim = ;                               % Sets the number of simulations
  for j = 1:nsim                         % Loops through simulations
    dat = poissrnd(theta,n(i),1);        % Generates n Poisson(theta) values
    ratio(j) = var(dat)/mean(dat);       % Computes var-to-mean ratio
  end                                    % End simulation loop
  mrat(i) = mean(ratio);                 % Computes mean of ratios
  sdrat(i) = std(ratio);                 % Computes SD of ratios
  ratio = sort(ratio);                   % Sorts the ratios
  ci.low(i) = ratio(round(nsim*.025));   % Ratio CI lower limit
  ci.upp(i) = ratio(round(nsim*.975));   % Ratio CI upper limit
  subplot(3,2,i);                        % ith plot in 3x2 window
  hist(ratio)                            % Histogram of the ratios
  xlabel('Variance-to-Mean Ratios')      % x-axis label on plot
  ylabel(' ')                            % y-axis label on plot
  title(['Poisson Ratios (n=', num2str(n(i)), ')'])
  xlim([.5]);                            % Sets x-axis limits
end                                      % End of sample size loop

% ======================================= %
% Problem 5: Histogram of Tapeworm Counts %
% ======================================= %
tapeworm = [zeros(1,235) ones(1,168) ...   % Vector of tapeworm counts
  2*ones(1,75) 3*ones(1,32) ...            % created efficiently using
  4*ones(1,7) 5 5 6];                      % the "ones" function
breaks = 0:6;                              % Centers for histogram bars
hist(tapeworm,breaks)                      % Histogram of tapeworm counts
xlabel('# Tapeworms per perch')
ylabel(' ')
title('Histogram of Tapeworm Counts')

% ======================================== %
% Problem 5: MLE & Chi-Square Computations %
% ======================================== %
mle = mean(tapeworm);                      % Mean of tapeworm counts
px = poisspdf(0:3,mle);                    % Poisson probs for 0-3
px = [px,(1-sum(px))];                     % Adds prob. for >= 4
expfreq = px*length(tapeworm);             % Computes expected frequencies
breaks = 0:4;                              % Centers for histogram counts
[obsfreq,mid] = hist(tapeworm,breaks);     % Histogram counts in "obsfreq"
D = sum((obsfreq-expfreq).^2./expfreq);    % Computes chi-squared statistic
pval = 1 - chi2cdf(D,3);                   % Computes chi-squared p-value
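No code is listed above for Problem 2. The block below is a hedged sketch of how its four measures could be computed in the same style. The cell counts A = 3, B = 45, C = 11, and D = 268 are assumptions consistent with the rates reported in the Problem 2 solution, and the variable names are illustrative rather than taken from the original handout.

```matlab
% ================================================= %
% Problem 2: Sensitivity, specificity, prevalence   %
% (sketch; cell counts below are assumed values)    %
% ================================================= %
A = 3;      % virological failure, clin./imm. failure
B = 45;     % virological success, clin./imm. failure (false positives)
C = 11;     % virological failure, clin./imm. success
D = 268;    % virological success, clin./imm. success
N = A + B + C + D;                    % number examined

sens     = A/(A + C);                 % true failures / all virological failures
spec     = D/(B + D);                 % true successes / all virological successes
obsprev  = (A + B)/N;                 % proportion deemed failures clinically
trueprev = (obsprev + spec - 1)/(sens + spec - 1);   % corrected prevalence

fprintf('sens = %.4f, spec = %.4f, obs = %.4f, true = %.4f\n', ...
        sens, spec, obsprev, trueprev);
```

When sensitivity and specificity are estimated from the same table, the corrected prevalence reduces algebraically to (A + C)/N, the proportion of virological failures, which gives a quick consistency check on the arithmetic.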

Solutions - Homework #2

[Figure: Scatterplot of Abundance vs. Relative Density; x-axis: Relative Host Population Density, y-axis: Parasite Abundance]

[Figure: Scatterplot of Log Abundance vs. Log RD; y-axis: Log Parasite Abundance]