You can compute the maximum likelihood estimate for the correlation

Tyler Joseph
5 years ago
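Stat 501 Solutions and Comments on Assignment, Spring 2005

1. (a) $\bar{X}$ = [vector missing; one surviving entry is 37.6], $\hat{\Sigma}$ = [matrix missing]

(b) $S$ = [matrix missing], $\hat{\Sigma}$ = [matrix missing]

(c) You can compute the maximum likelihood estimate for the correlation coefficient from either $\hat{\Sigma}$ or $S$. Then $r = s_{jk}/\sqrt{s_{jj}s_{kk}} = 0.565$ and $t = r\sqrt{n-2}/\sqrt{1-r^2} = 1.9373$ with df = 10 - 2 = 8.

(d) Using Fisher's transformation $z = \frac{1}{2}\log\left(\frac{1+r}{1-r}\right)$, the limits on the z scale are $z_L = z - 1.96/\sqrt{n-3} = -0.1005$ and $z_U = z + 1.96/\sqrt{n-3} = 1.3811$, and an approximate 95% confidence interval for the correlation coefficient is

$\left(\frac{e^{2z_L}-1}{e^{2z_L}+1},\ \frac{e^{2z_U}-1}{e^{2z_U}+1}\right) = (-0.100, 0.881)$

(e) An estimate of the generalized variance is $|S|$ = [value missing]. Answers may vary due to rounding off numbers in the computation of $|S|$ or using the maximum likelihood estimator instead of $S$.

(f) An estimate of total variance is trace(S) = [value missing].

(g) An approximate 95% confidence interval for the correlation coefficient is (0.083, 0.96).

(h) The partial correlation is $r_{13\cdot2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{(1-r_{12}^2)(1-r_{23}^2)}} = 0.4638$. With $z = \frac{1}{2}\log\left(\frac{1+r_{13\cdot2}}{1-r_{13\cdot2}}\right)$, an approximate 95% confidence interval for the partial correlation coefficient is (-0.2894, 0.8623).

(i) The partial correlation for $X_1$ and $X_3$ given $X_2$ could be zero since the 95% CI includes 0. This partial correlation is the correlation between levels of aspartate aminotransferase and glutamate dehydrogenase for any subpopulation of patients defined by a particular level of alanine aminotransferase.

(j) $t = \frac{(0.4638)\sqrt{10-3}}{\sqrt{1-(0.4638)^2}}$ with df = 10 - 3 = 7. The p-value exceeds 0.05, so there is insufficient evidence to reject the null hypothesis $\rho_{13\cdot2} = 0$.

2. (a) Write the joint likelihood function in the form

$L(\mu, \Sigma) = (2\pi)^{-np/2}\,|\Sigma|^{-n/2}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}A)}\, e^{-\frac{n}{2}(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)}$

where $A = \sum_{j=1}^{n}(X_j-\bar{X})(X_j-\bar{X})'$. To get the log-likelihood corresponding to the null hypothesis, substitute $\Sigma = \sigma^2 I$, $|\Sigma| = |\sigma^2 I| = \sigma^{2p}$, and $\mathrm{tr}(\Sigma^{-1}A) = \mathrm{tr}(A)/\sigma^2$. Then the log-likelihood for the null hypothesis is

$\ell(\mu, \sigma^2) = -\frac{np}{2}\log(2\pi) - \frac{np}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\mathrm{tr}(A) - \frac{n}{2\sigma^2}(\bar{X}-\mu)'(\bar{X}-\mu)$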

(b) Regardless of the value of $\sigma^2$, under the null hypothesis the log-likelihood is maximized when $\mu = \bar{X}$. Then the last term in the log-likelihood is zero, and the mle for $\sigma^2$ can be derived by maximizing

$\ell(\hat{\mu}, \sigma^2) = -\frac{np}{2}\log(2\pi) - \frac{np}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\mathrm{tr}(A)$

Setting the first partial derivative equal to zero yields the equation

$\frac{\partial \ell(\hat{\mu}, \sigma^2)}{\partial \sigma^2} = -\frac{np}{2\sigma^2} + \frac{1}{2\sigma^4}\mathrm{tr}(A) = 0$

The solution to this equation yields the maximum likelihood estimate

$\hat{\sigma}^2 = \frac{1}{np}\mathrm{tr}(A) = \frac{1}{p}\mathrm{tr}(\hat{\Sigma}) = \frac{n-1}{np}\mathrm{tr}(S) = \frac{1}{np}\sum_{j=1}^{n}(X_j-\bar{X})'(X_j-\bar{X})$

which is simply the average of the estimated variances for the p responses obtained from the n units in the sample.

(c) There are various ways to write the same formula, but the simplest form is

$-2\ln(\Lambda) = np\,\log\left(\frac{\mathrm{tr}(\hat{\Sigma})}{p}\right) - n\,\log(|\hat{\Sigma}|)$

(d) d.f. $= \frac{p(p+1)}{2} + p - (p+1) = \frac{p(p+1)}{2} - 1 = \frac{(p+2)(p-1)}{2}$

(e) Answers may vary if you do the calculations with a calculator instead of SAS or S-plus, depending on how you rounded off the numbers in the estimated covariance matrix. $-2\ln(\Lambda)$ = [value missing] with 5 d.f., p-value ≈ 0. The null hypothesis is rejected, which implies that the correlations cannot all be zero. Bartlett's correction to $-2\ln(\Lambda)$ = 36.36, d.f. = 5, p-value < 0.0001. The correction had no effect on the conclusion that the correlation is not zero.
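The statistic in part (c) and the degrees of freedom in part (d) can be sketched in a few lines of Python. This is a minimal illustration, not the SAS or S-plus code behind the reported values: the function names `determinant` and `sphericity_lrt` are hypothetical, and the input is assumed to be the maximum likelihood estimate $\hat{\Sigma}$ (divisor n) given as a list of lists.

```python
import math


def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    size = len(a)
    det = 1.0
    for i in range(size):
        pivot = max(range(i, size), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-12:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            det = -det  # a row swap flips the sign
        det *= a[i][i]
        for r in range(i + 1, size):
            factor = a[r][i] / a[i][i]
            for c in range(i, size):
                a[r][c] -= factor * a[i][c]
    return det


def sphericity_lrt(sigma_hat, n):
    """Return (-2 ln Lambda, d.f.) for H0: Sigma = sigma^2 I.

    sigma_hat is assumed to be the MLE of the covariance matrix (divisor n).
    """
    p = len(sigma_hat)
    trace = sum(sigma_hat[i][i] for i in range(p))
    stat = n * p * math.log(trace / p) - n * math.log(determinant(sigma_hat))
    df = (p + 2) * (p - 1) // 2
    return stat, df
```

For p = 3 responses the function returns 5 degrees of freedom, in agreement with part (e). The statistic is zero when $\hat{\Sigma}$ is exactly proportional to the identity matrix and grows as the eigenvalues of $\hat{\Sigma}$ become more unequal.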

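3. [Table of correlation, generalized variance, and total variance for each case; values missing]

The first three cases exhibit different levels of correlation, but the same generalized variance. The total variance for these cases becomes smaller as the correlation gets closer to zero. When there is a perfect correlation, as in the last case, the generalized variance is zero because there is only variation along a one-dimensional line, not in two-dimensional space. The total variance is positive because it only accounts for the sum of the variances of the univariate marginal distributions.

4. (a) The pooled covariance matrix is $S$ = [matrix missing], with d.f. = 98.

(b) $M = \sum_{i=1}^{2}(n_i-1)\log(|S|) - \sum_{i=1}^{2}(n_i-1)\log(|S_i|)$ = [value missing]

$C^{-1} = 1 - \frac{2p^2+3p-1}{6(p+1)(g-1)}\left(\sum_{i=1}^{g}\frac{1}{n_i-1} - \frac{1}{\sum_{i=1}^{g}(n_i-1)}\right)$ = [value missing], with p = 2 and g = 2

$MC^{-1} = 18.95 > \chi^2_{3,\,.005} = 12.84$, p-value = .0003 < .005

The covariance matrices are not the same. Homes with air conditioning exhibit larger variances for both on-peak and off-peak use of electricity.

(c) $H_0: \rho_1 = \rho_2$ vs. $H_a: \rho_1 \neq \rho_2$

$Z_1 = \frac{1}{2}\log\left(\frac{1+r_1}{1-r_1}\right) \sim N\left(\frac{1}{2}\log\left(\frac{1+\rho_1}{1-\rho_1}\right),\ \frac{1}{n_1-3}\right)$

$Z_2 = \frac{1}{2}\log\left(\frac{1+r_2}{1-r_2}\right) \sim N\left(\frac{1}{2}\log\left(\frac{1+\rho_2}{1-\rho_2}\right),\ \frac{1}{n_2-3}\right)$

Since the samples are independent, $Z_1$ is independent of $Z_2$. This implies that

$Var(Z_1 - Z_2) = Var(Z_1) + Var(Z_2) = \frac{1}{n_1-3} + \frac{1}{n_2-3}$

$Z_1 - Z_2 \sim N\left(\frac{1}{2}\log\left(\frac{1+\rho_1}{1-\rho_1}\right) - \frac{1}{2}\log\left(\frac{1+\rho_2}{1-\rho_2}\right),\ \frac{1}{n_1-3} + \frac{1}{n_2-3}\right)$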

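Under $H_0$, $Z_1 - Z_2 \sim N\left(0,\ \frac{1}{n_1-3} + \frac{1}{n_2-3}\right)$, so

$Z = \frac{\frac{1}{2}\ln\left(\frac{1+r_1}{1-r_1}\right) - \frac{1}{2}\ln\left(\frac{1+r_2}{1-r_2}\right)}{\sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}} \sim N(0, 1)$

For these data, Z = -2.14 with p-value = 0.03, so the null hypothesis of equal correlations can be rejected at the 0.05 level of significance. The correlations between total on-peak and off-peak usage are not the same for homes with or without air conditioning.

(d) $T^2$ = 16.066

(e) F = 7.95, d.f. = (2, 97), p-value < 0.001

(f) Do not use a pooled estimate of the covariance matrix. Estimate $Var(\bar{X}_1 - \bar{X}_2)$ as $\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2$ and compute

$T^2 = (\bar{X}_1 - \bar{X}_2)'\left(\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right)^{-1}(\bar{X}_1 - \bar{X}_2)$

A test statistic with an approximate F-distribution when the null hypothesis is true is

$F = \frac{n_1+n_2-p-1}{p(n_1+n_2-2)}\,T^2$ = [value missing], with $(p,\ n_1+n_2-p-1)$ degrees of freedom.

The p-value (< 0.001) is still small in this case.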
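The Fisher z calculations used in Problem 1(d) and Problem 4(c) can be sketched as follows. This is an illustrative Python version under stated assumptions: the function names are hypothetical, the critical value 1.96 is the usual two-sided 95% normal quantile, and the two-sided p-value is taken from the standard normal distribution.

```python
import math


def fisher_z(r):
    """Fisher's transformation z = 0.5 * log((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation from a sample of size n."""
    z = fisher_z(r)
    half_width = z_crit / math.sqrt(n - 3)
    # tanh(z) equals (exp(2z) - 1) / (exp(2z) + 1), the back-transformation
    return math.tanh(z - half_width), math.tanh(z + half_width)


def compare_correlations(r1, n1, r2, n2):
    """Z statistic and two-sided normal p-value for H0: rho1 = rho2,
    assuming the two samples are independent."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

Calling `correlation_ci(0.565, 10)` gives limits of about -0.100 and 0.881, matching the interval in Problem 1(d) up to rounding of r.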

5. (a) There are some extreme values in the plot of the results for observer 1 versus the monitor and the plot of the results for observer 2 versus the monitor that indicate some substantial disagreements between the expert observers and the monitor. The agreement between the two expert observers is quite strong.

(b) [Table: Shapiro-Wilk W and p-value for X1, X2, and X3; values missing except for the fragment 4.388e]

The p-values of the Shapiro-Wilk statistics are very small for all three blood pressure measurements, leading to the conclusion that none of the three sets of measurements is a random sample from a normal distribution. The QQ plots also indicate substantial right skewness for each distribution.

The chi-square probability plot showed that a three-dimensional normal model is inappropriate, since the plot doesn't resemble a straight line through the origin with a slope of 1. The extreme observations seen in the scatter plots in part (a) are clearly evident in the upper right part of this plot.

(c) Since the QQ plots give evidence of right skewness, it may be useful to consider a power transformation with lambda < 1. Results from Shapiro-Wilk tests and QQ plots suggest the following transformations for the three responses.

[Table: suggested transformation, Shapiro-Wilk W, and p-value for X1, X2, and X3. The suggested transformations are negative powers of X1 and X2 and a fractional power of X3; the exponents and test results are missing]

To compare the mean responses for the two expert observers and the monitor, we must use the same transformation for all three responses. It appears that the inverse transformation provides the best compromise for these three responses.

[Table: Shapiro-Wilk statistic and p-value for X1, X2, and X3 under each transformation considered, including lambda = 0.5 and the log transformation; the remaining lambda values and all test results are missing]

(d) Using the natural logarithm of each response, we obtain S = [matrix missing] and R = [matrix missing]. These results suggest that the correlation (or agreement) between the log-responses for the two expert observers is stronger than the correlation between the log-responses for either expert and the monitor. Also, variability may be lower for the monitor. Perhaps the monitor has difficulty recording high blood pressure values.

(e) $-2\ln(\Lambda) = -n\ln|R| = 60.8$ with $\frac{p(p-1)}{2} = 3$ d.f., p-value ≈ 0.

With Bartlett's correction, $-\left(n - 1 - \frac{2p+5}{6}\right)\ln|R|$ = [value missing] with 3 d.f., p-value ≈ 0.

Since the estimated correlations are positive and the logarithm is a monotone transformation, there is evidence that at least one correlation is not zero, which implies that there is more than random agreement either between the two experts or between at least one of the experts and the monitor.

(f) Bartlett correction to -2*log(lambda) = [value missing] with d.f. = 4, p-value < .0001.

Reject the null hypothesis that all of the correlations are equal and all of the variances are equal for the natural logarithms of systolic blood pressure measurements for the two expert observers and the semi-automatic monitor.

[S-plus output: rows rlowerb and rupperb, columns [,1] through [,6]; values missing]

(g) Since the two experts and the monitor provide repeated measurements on the same subjects, you cannot assume that the estimated correlations are independent. You can use the bootstrap method to obtain approximate 95% confidence intervals for differences between (or ratios of) correlations. You must take bootstrap samples from the original sample, sampling with replacement, and compute the correlation matrix for each bootstrap sample. Then, for each bootstrap sample, compute the differences (or ratios) for the three pairs of correlations. Results below are based on 5000 bootstrap samples.

Ratio                        95% Bootstrap C.I.
$\rho_{12}/\rho_{13}$        (1.087, 1.39)
$\rho_{12}/\rho_{23}$        (1.08, 1.393)
$\rho_{13}/\rho_{23}$        (0.990, 1.004)

The correlation between the blood pressure measurements for the two expert observers is stronger than the correlation between the blood pressure measurements for the monitor and either expert. The correlation between the first expert and the monitor is not significantly different from the correlation between the second expert and the monitor. If you wanted to make simultaneous confidence intervals, you could have invoked the Bonferroni procedure and constructed individual 98.33% confidence intervals for the three differences (or ratios).

(h) Use the bootstrap approach described in part (g) to construct approximate confidence intervals for ratios of variances for log-responses. Note that you cannot use F-tests to compare variances in this case, because the variance estimates are not independent.

Variance Ratio                   Bootstrap C.I.
var(ln(X1)) / var(ln(X2))        (1.004, 1.043)
var(ln(X1)) / var(ln(X3))        (0.890, 1.387)
var(ln(X2)) / var(ln(X3))        (0.875, 1.350)

The first expert displayed more variability than the second expert, but this does not necessarily imply that the first expert is less reliable. Why? There were no significant differences between the variance for the monitor and the variance for either expert, although the confidence intervals for the variance ratios were quite wide.
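The bootstrap procedure described in parts (g) and (h) can be sketched as follows. This is a minimal Python illustration rather than the S-plus code used for the results above. It assumes percentile intervals (the source does not say which bootstrap interval was used), and the names `bootstrap_ci`, `corr_ratio_12_13`, and `var_ratio_1_2` are hypothetical. Each row is assumed to hold the three log-responses for one subject.

```python
import random
import statistics


def pearson(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corr_ratio_12_13(rows):
    """Ratio r12 / r13 computed from rows of (x1, x2, x3)."""
    x1, x2, x3 = zip(*rows)
    return pearson(x1, x2) / pearson(x1, x3)


def var_ratio_1_2(rows):
    """Ratio var(x1) / var(x2) computed from rows of (x1, x2, x3)."""
    x1, x2, _ = zip(*rows)
    return statistics.variance(x1) / statistics.variance(x2)


def bootstrap_ci(rows, stat, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(rows).

    Whole subjects (rows) are resampled with replacement, so the dependence
    among the three measurements on the same subject is preserved.
    """
    rng = random.Random(seed)
    n = len(rows)
    values = sorted(
        stat([rows[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lower = values[int(alpha * n_boot)]
    upper = values[int((1 - alpha) * n_boot) - 1]
    return lower, upper
```

Passing `level=0.9833` gives the individual intervals needed for the Bonferroni adjustment mentioned in part (g). Resampling rows rather than individual measurements is the design choice that makes the intervals valid for correlated estimates.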

(i) Let $u = (u_1, u_2, u_3)'$. The null hypothesis $H_0: u_1 = u_2 = u_3$ is equivalent to $H_0: Cu = 0$, where the rows of the 2 x 3 matrix C are contrasts such as $u_1 - u_2$ and $u_2 - u_3$.

Let $Y_j = CX_j$, where $X_j \sim NID(u, \Sigma)$. Then $\bar{Y} = C\bar{X}$ and

$S_y = \frac{1}{n-1}\sum_{j=1}^{n}(Y_j - \bar{Y})(Y_j - \bar{Y})'$

$T^2 = n(\bar{Y} - 0)'\,S_y^{-1}\,(\bar{Y} - 0)$ = [value missing]

$F = \frac{n-p+1}{(p-1)(n-1)}\,T^2$ = [value missing], d.f. = (2, 83), p-value < 0.0001

There is evidence that the mean blood pressure readings are not the same for the two experts and the monitor.

(j) You can do three paired t-tests, or square the paired t-tests to obtain F-tests.

$H_0: u_1 - u_2 = 0$, $T^2$ = [value missing], d.f. = (1, 84), p-value = 0.948. There is insufficient evidence to conclude that the average blood pressure measurements differ for the two experts.

$H_0: u_1 - u_3 = 0$, $T^2$ = [value missing], d.f. = (1, 84), p-value < 0.0001

$H_0: u_2 - u_3 = 0$, $T^2$ = [value missing], d.f. = (1, 84), p-value < 0.0001

There is significant evidence that the mean systolic blood pressure measurement is lower for the semi-automatic monitor than for either of the two experts.
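The contrast test in part (i) can be sketched as follows. This is an illustrative Python version, not the original computation: the function name `contrast_t2` is hypothetical, the contrast matrix in the usage note is one possible choice, and the code handles only the case of two contrasts so that the 2 x 2 matrix $S_y$ can be inverted in closed form.

```python
def contrast_t2(rows, c):
    """Hotelling T^2 and F statistic for H0: C u = 0 with a 2 x p contrast matrix C.

    rows holds one observation vector per subject; c holds the two contrast rows.
    Returns (T^2, F, (df1, df2)).
    """
    n, p = len(rows), len(rows[0])
    # Transform each observation: Y_j = C X_j (two contrasts per subject)
    y = [[sum(ci[k] * x[k] for k in range(p)) for ci in c] for x in rows]
    ybar = [sum(col) / n for col in zip(*y)]
    dev = [(yj[0] - ybar[0], yj[1] - ybar[1]) for yj in y]
    # Sample covariance matrix S_y of the contrasts (divisor n - 1)
    s11 = sum(a * a for a, _ in dev) / (n - 1)
    s22 = sum(b * b for _, b in dev) / (n - 1)
    s12 = sum(a * b for a, b in dev) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # Quadratic form ybar' S_y^{-1} ybar via the closed-form 2 x 2 inverse
    quad = (s22 * ybar[0] ** 2 - 2 * s12 * ybar[0] * ybar[1] + s11 * ybar[1] ** 2) / det
    t2 = n * quad
    q = 2  # number of contrasts, equal to p - 1 when p = 3
    f = (n - q) / ((n - 1) * q) * t2
    return t2, f, (q, n - q)
```

With three responses and the contrast rows `[[1, -1, 0], [0, 1, -1]]`, a sample of n = 85 subjects gives degrees of freedom (2, 83), the same as reported in part (i).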

STAT 501 Assignment 2 NAME Spring Chapter 5, and Sections in Johnson & Wichern.
