Ismor Fischer, //04

7.4 Problems

1. In Problem 4.4/9, it was shown that important relations exist between population means, variances, and covariance. Specifically, we have the formulas that appear below left.

    I.  (A) $\mu_{X+Y} = \mu_X + \mu_Y$                                    I.  (A) $\overline{x+y} = \bar{x} + \bar{y}$
        (B) $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}$          (B) $s_{x+y}^2 = s_x^2 + s_y^2 + 2 s_{xy}$

    II. (A) $\mu_{X-Y} = \mu_X - \mu_Y$                                    II. (A) $\overline{x-y} = \bar{x} - \bar{y}$
        (B) $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}$          (B) $s_{x-y}^2 = s_x^2 + s_y^2 - 2 s_{xy}$

In this problem, we verify that these properties are also true for sample means, variances, and covariance in two examples. For data values $\{x_1, x_2, \ldots, x_n\}$ and $\{y_1, y_2, \ldots, y_n\}$, recall that:

    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$,    $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
    $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$,    $s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$

Now suppose that each value $x_i$ from the first sample is paired with exactly one corresponding value $y_i$ from the second sample. That is, we have the set of ordered pairs of data $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, with sample covariance given by

    $s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$.

Furthermore, we can label the pairwise sum "x + y" as the dataset $(x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$, and likewise for the pairwise difference "x − y". It can be shown (via basic algebra, or Appendix A) that for any such dataset of ordered pairs, the formulas that appear above right hold. (Note that these formulas generalize the properties found in Problem .5/4.)

For the following ordered data pairs, verify that the formulas in I and II hold. (In R, use mean, var, and cov.) Also, sketch the scatterplot.

    x    0    6    8
    y    3    3    5    9

Repeat for the following dataset. Notice that the values of $x_i$ and $y_i$ are the same as before, but the correspondence between them is different!

    x    0    6    8
    y    3    9    3    5
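The sample identities in I and II can be checked numerically before working the two datasets by hand. The following is a minimal R sketch on made-up values; `x_demo`, `y_demo`, `y_shuffled`, and the helper `check_identities` are illustrative names and are not part of the problem. It compares the left and right sides of each formula using `mean`, `var`, and `cov`.

```r
# Hypothetical paired data, chosen only for illustration (not the problem's data).
x_demo <- c(1, 4, 7, 10)
y_demo <- c(2, 3, 8, 5)

# Returns TRUE/FALSE for each identity, comparing the two sides
# up to floating-point tolerance.
check_identities <- function(x, y) {
  c(
    mean_sum  = isTRUE(all.equal(mean(x + y), mean(x) + mean(y))),
    var_sum   = isTRUE(all.equal(var(x + y), var(x) + var(y) + 2 * cov(x, y))),
    mean_diff = isTRUE(all.equal(mean(x - y), mean(x) - mean(y))),
    var_diff  = isTRUE(all.equal(var(x - y), var(x) + var(y) - 2 * cov(x, y)))
  )
}

result <- check_identities(x_demo, y_demo)
print(result)

# Re-pairing the same values changes the covariance, and hence var(x + y),
# but the identities themselves still hold.
y_shuffled <- y_demo[c(1, 3, 4, 2)]
result_shuffled <- check_identities(x_demo, y_shuffled)
print(result_shuffled)
print(c(cov_original = cov(x_demo, y_demo),
        cov_shuffled = cov(x_demo, y_shuffled)))
```

The second call mirrors the "Repeat" step of the problem: the marginal values are unchanged, so the means and variances of x and y stay the same, while the covariance term depends on how the values are paired.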
2. Expiration dates that establish the shelf lives of pharmaceutical products are determined from stability data in drug formulation studies. In order to measure the rate of decomposition of a particular drug, it is stored under various conditions of temperature, humidity, light intensity, etc., and assayed for intact drug potency at FDA-recommended time intervals of every three months during the first year. In this example, the assay Y (mg) of a certain 500 mg tablet formulation is determined at time X (months) under ambient storage conditions.

    X (months)    0      3      6      9      12
    Y (mg)        500    490    470    430    350

(a) Graph these data points $(x_i, y_i)$ in a scatterplot, and calculate the sample correlation coefficient $r = s_{xy} / (s_x s_y)$. Classify the correlation as positive or negative, and as weak, moderate, or strong.

(b) Determine the equation of the least squares regression line for these data points, and include a 95% confidence interval for the slope $\beta_1$.

(c) Sketch a graph of this line on the same set of axes as part (a); also calculate and plot the fitted response values $\hat{y}_i$ and the residuals $e_i = y_i - \hat{y}_i$ on this graph.

(d) Complete an ANOVA table for this linear regression, including the F-ratio and corresponding p-value.

(e) Calculate the value of the coefficient of determination $r^2$, using the two following equivalent ways (and showing agreement of your answers), and interpret this quantity as a measure of fit of the regression line to the data, in a brief, clear explanation.
    via squaring the correlation coefficient $r = s_{xy} / (s_x s_y)$ found in (a),
    via the ratio $r^2 = SS_{Regression} / SS_{Total}$ of sums of squares found in (d).

(f) Test the null hypothesis of no linear association between X and Y, either by using your answer in (a) on $H_0: \rho = 0$, or equivalently, by using your answers in (b) and/or (d) on $H_0: \beta_1 = 0$.

(g) Calculate a point estimate of the mean potency when X = 6 months. Judging from the data, is this realistic? Determine a 95% confidence interval for this value.
(h) The FDA recommends that the expiration date should be defined as that time when a drug contains 90% of the labeled potency. Using this definition, calculate the expiration date for this tablet formulation. Judging from the data, is this realistic?

(i) The residual plot of this model shows evidence of a nonlinear trend. (Check this!) In order to obtain a better regression model, first apply the linear transformations $\tilde{X} = X / 3$ and $\tilde{Y} = 510 - Y$, then try fitting an exponential curve $\tilde{Y} = \alpha e^{\beta \tilde{X}}$. Use this model to determine the expiration date. Judging from the data, is this realistic?
(j) Redo this problem using the following R code:

    # See help(lm) or help(lsfit), and help(plot.lm) for details.

    # Compute Correlation Coefficient and Scatterplot
    X <- c(0, 3, 6, 9, 12)
    Y <- c(500, 490, 470, 430, 350)
    cor(X, Y)
    plot(X, Y, xlab = "X = Months", ylab = "Y = Assay (mg)", pch = 19)

    # Least Squares Fit, Regression Line Plot, ANOVA F-test
    regline <- lm(Y ~ X)
    summary(regline)
    abline(regline, col = "blue")
    # Exercise (answer this): Why does the p-value of 0.02049 appear twice?

    # Estimate Mean Potency at 6 Months
    new <- data.frame(X = 6)
    predict(regline, new, interval = "confidence")

    # Residual Plot (residuals vs. fitted values, each point labeled by its residual)
    resids <- round(resid(regline), 2)
    plot(regline, which = 1, id.n = 5, labels.id = resids, pch = 19)

    # Log-Transformed Linear Regression: ln(Ytilde) is linear in Xtilde
    Xtilde <- X / 3
    Ytilde <- 510 - Y
    V <- log(Ytilde)
    plot(Xtilde, V, xlab = "Xtilde", ylab = "ln(Ytilde)", pch = 19)
    regline.transf <- lm(V ~ Xtilde)
    summary(regline.transf)
    abline(regline.transf, col = "red")

    # Plot Transformed Model, back on the original (X, Y) scale
    coeffs <- coefficients(regline.transf)
    scale <- exp(coeffs[1])
    shape <- coeffs[2]
    Yhat <- function(X) (510 - scale * exp(shape * X / 3))
    plot(X, Y, xlab = "X = Months", ylab = "Y = Assay (mg)", pch = 19)
    curve(Yhat, col = "red", add = TRUE)
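Parts (e) and (h) can be cross-checked with a few extra lines. The following is a minimal sketch, assuming the assay data X = 0, 3, 6, 9, 12 months and Y = 500, 490, 470, 430, 350 mg and the straight-line fit of part (b); the names `fit`, `r2_from_cor`, `r2_from_ss`, and `expiry_linear` are illustrative only. It computes $r^2$ both ways and then solves the fitted line for the time at which the predicted potency reaches 90% of the 500 mg label, i.e. 450 mg.

```r
X <- c(0, 3, 6, 9, 12)
Y <- c(500, 490, 470, 430, 350)
fit <- lm(Y ~ X)

# (e) Coefficient of determination, computed two equivalent ways.
r2_from_cor <- cor(X, Y)^2
ss_total    <- sum((Y - mean(Y))^2)
ss_error    <- sum(resid(fit)^2)
r2_from_ss  <- (ss_total - ss_error) / ss_total   # SS_Regression / SS_Total

# (h) Inverse prediction from the linear model: solve b0 + b1 * x = target.
b <- unname(coef(fit))          # b[1] = intercept, b[2] = slope
target <- 0.90 * 500            # 90% of labeled potency, in mg
expiry_linear <- (target - b[1]) / b[2]

print(c(r2_from_cor = r2_from_cor, r2_from_ss = r2_from_ss))
print(expiry_linear)            # months, under the straight-line model only
```

Because the residuals of the straight-line fit show curvature, the value of `expiry_linear` is best compared with the expiration date obtained from the exponential model in (i) before drawing any conclusion.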
3. A Third Transformation. Suppose that two continuous variables X and Y are negatively correlated via the nonlinear relation $Y = \frac{1}{\alpha X + \beta}$, for some parameters α and β. This is algebraically equivalent to the relation $\frac{1}{Y} = \alpha X + \beta$, which can then be solved via simple linear regression. Use this reciprocal transformation on the data and corresponding scatterplot below, to sketch a new scatterplot, and solve for sample-based estimates of the parameters α and β. (Hint: Finding the parameter values in this example should be straightforward, and not require any least squares regression formulas.) Express the original response Y in terms of X.

    X      0     1     2     3     4     5
    Y      60    30    20    15    12    10

    X      0     1     2     3     4     5
    1/Y

4. For this problem, recall that in simple linear regression, we have the following definitions:

    $b_1 = \frac{s_{xy}}{s_x^2}$,  $MS_{Err} = \frac{SS_{Err}}{n-2}$,  $r^2 = \frac{SS_{Reg}}{SS_{Tot}} = 1 - \frac{SS_{Err}}{SS_{Tot}}$,  $SS_{Tot} = (n-1)\, s_y^2$,  and  $S_{xx} = (n-1)\, s_x^2$.

(a) Formally prove that the T-score $= \frac{r \sqrt{n-2}}{\sqrt{1-r^2}}$ for testing the null hypothesis $H_0: \rho = 0$ is equal to the T-score $= \frac{b_1 - \beta_1}{\sqrt{MS_{Err} / S_{xx}}}$ for testing the null hypothesis $H_0: \beta_1 = 0$.

(b) Formally prove that, in simple linear regression (where $df_{Reg} = 1$), the square of the T-score $= \frac{b_1 - \beta_1}{\sqrt{MS_{Err} / S_{xx}}}$ is equal to the F-ratio $= \frac{MS_{Reg}}{MS_{Err}}$ for testing the null hypothesis $H_0: \beta_1 = 0$.
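A numerical check is not a proof, but it is a useful sanity test of the two identities in Problem 4 before attempting the algebra. The sketch below uses hypothetical data (the vectors `x` and `y` are made up for illustration) and builds each quantity directly from the definitions recalled above, with the null value $\beta_1 = 0$ substituted into the slope T-score.

```r
# Hypothetical data, for illustration only.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.5, 6.9)
n <- length(x)

r      <- cor(x, y)
b1     <- cov(x, y) / var(x)                 # slope = s_xy / s_x^2
b0     <- mean(y) - b1 * mean(x)             # intercept
ss_err <- sum((y - (b0 + b1 * x))^2)         # residual sum of squares
ss_tot <- (n - 1) * var(y)
ss_reg <- ss_tot - ss_err
ms_err <- ss_err / (n - 2)
s_xx   <- (n - 1) * var(x)

t_from_r     <- r * sqrt(n - 2) / sqrt(1 - r^2)   # test of H0: rho = 0
t_from_slope <- (b1 - 0) / sqrt(ms_err / s_xx)    # test of H0: beta1 = 0
f_ratio      <- (ss_reg / 1) / ms_err             # MS_Reg / MS_Err, df_Reg = 1

print(c(t_from_r = t_from_r, t_from_slope = t_from_slope))
print(c(t_squared = t_from_slope^2, f_ratio = f_ratio))
```

The two T-scores agree, and the squared T-score matches the F-ratio, up to floating-point rounding; the formal proofs in (a) and (b) explain why this happens for every dataset.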
5. In a study of binge eating disorders among dieters, the average weights (Y) of a group of overweight women of similar ages and lifestyles are measured at the end of every two months (X) over an eight month period. The resulting data values, some accompanying summary statistics, and the corresponding scatterplot, are shown below.

    X    0      2      4      6      8        $\bar{x} = 4$      $s_x^2 = 10$
    Y    200    190    210    180    220      $\bar{y} = 200$    $s_y^2 = 250$

(a) Compute the sample covariance $s_{xy}$ between the variables X and Y.

(b) Compute the sample correlation coefficient r between the variables X and Y. Use it to classify the linear correlation as positive or negative, and as strong, moderate, or weak.

(c) Determine the equation of the least squares regression line for these data. Sketch a graph of this line on the scatterplot provided above. Please label clearly!

(d) Also calculate the fitted response values $\hat{y}_i$, and plot the residuals $e_i = y_i - \hat{y}_i$, on this same graph. Please label clearly!

(e) Calculate the coefficient of determination $r^2$, and interpret its value in the context of evaluating the fit of this linear model to the sample data. Be as clear as possible.

(f) Interpretation: Evaluate the overall adequacy of the linear model to these data, using as much evidence as possible. In particular, refer to at least two formal linear regression assumptions which may or may not be satisfied here, and why.
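Once the covariance is known, parts (b), (c), and (e) of a problem like this one need only the summary statistics. The following is a minimal R sketch of that chain of formulas; the helper `line_from_summary` and all numerical values in it are hypothetical and chosen only for illustration. The second half cross-checks the helper against `lm()` on made-up raw data.

```r
# Regression line and correlation from summary statistics alone.
# Arguments: sample means, sample variances (not standard deviations),
# and the sample covariance.
line_from_summary <- function(xbar, ybar, var_x, var_y, cov_xy) {
  slope     <- cov_xy / var_x
  intercept <- ybar - slope * xbar
  r         <- cov_xy / sqrt(var_x * var_y)
  list(intercept = intercept, slope = slope, r = r, r_squared = r^2)
}

# Hypothetical summary values, chosen only for illustration.
demo <- line_from_summary(xbar = 10, ybar = 30, var_x = 4, var_y = 25, cov_xy = 6)
print(unlist(demo))

# Cross-check against lm() on hypothetical raw data.
x <- c(2, 5, 6, 9, 13)
y <- c(7, 6, 11, 10, 16)
from_stats <- line_from_summary(mean(x), mean(y), var(x), var(y), cov(x, y))
from_lm    <- unname(coef(lm(y ~ x)))
print(c(from_stats$intercept, from_stats$slope))
print(from_lm)
```

Keeping the variances rather than the standard deviations as inputs matches the way the summary statistics are tabulated in the problem, so no square roots are needed until the correlation coefficient is computed.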
6. A pharmaceutical company wishes to evaluate the results of a new drug assay procedure, performed on n = 5 drug samples of different, but known potency X. In a perfect error-free assay, the two sets of values would be identical, thus resulting in the ideal calibration line Y = X, i.e., Y = 0 + 1X. However, experimental variability generates the results shown below, along with some accompanying summary statistics: the sample means, variances, and covariance, respectively.

    X (mg)    30    40    50    60    70      $\bar{x} = 50$    $s_x^2 = 250$
    Y (mg)    32    39    53    65    71      $\bar{y} = 52$    $s_y^2 = 275$
                                              $s_{xy} = 260$

(a) Graph these data points $(x_i, y_i)$ in a scatterplot.

(b) Compute the sample correlation coefficient r. Use it to determine whether or not X and Y are linearly correlated; if so, classify as positive or negative, and as weak, moderate, or strong.

(c) Determine the equation of the least squares regression line for these data. Sketch a graph of this line on the same set of axes as part (a). Also calculate and plot the fitted response values $\hat{y}_i$ and the residuals $e_i = y_i - \hat{y}_i$, on this same graph.

(d) Using all of this information, complete the following ANOVA table for this simple linear regression model. (Hints: $SS_{Total}$ and $df_{Total}$ can be obtained from $s_y^2$ given above; $SS_{Error}$ = residual sum of squares, and $df_{Error} = n - 2$.) Show all work.

    Source        df    SS    MS    F-ratio    p-value
    Regression
    Error
    Total

(e) Construct a 95% confidence interval for the slope $\beta_1$.

(f) Use the p-value in (d) and the 95% confidence interval in (e) to test whether the null hypothesis $H_0: \beta_1 = 0$ can be rejected in favor of the alternative $H_A: \beta_1 \neq 0$, at the α = .05 significance level. Interpret your answer: What exactly has been demonstrated about any association that might exist between X and Y? Be precise.

(g) Use the 95% confidence interval in (e) to test whether the null hypothesis $H_0: \beta_1 = 1$ can be rejected in favor of the alternative $H_A: \beta_1 \neq 1$, at the α = .05 significance level. Interpret your answer in context: What exactly has been demonstrated about the new drug assay procedure? Be precise.

7. Refer to the posted Rcode folder for this problem. Please answer all questions.