Statistics GIDP Ph.D. Qualifying Exam Methodology


January 9, 2018, 9:00am-1:00pm

Instructions: Put your ID (not your name) on each sheet. Complete exactly 5 of the 6 problems; turn in only those sheets you wish to have graded. Each question, but not necessarily each part, is equally weighted. Provide answers on the supplied pads of paper and/or use a Microsoft Word document or equivalent to report your software code and output. Number each problem. You may turn in only one electronic document; embed relevant code and output/graphics into your Word document. Write on only one side of each sheet if you use paper. You may use the computer and/or a calculator. Stay calm and do your best. Good luck!

1. A chemist wishes to test the effect of five chemical agents on the strength of a particular type of cloth. She selects five bolts of cloth and applies the chemicals to them. However, because of resource limitations, she can only run the design below, in which each cloth receives four of the five chemicals:

[cloth-by-chemical layout table not reproduced]

(a) What design is this?

(b) State the statistical model and assumptions.

(c) Are the Type I and Type III sums of squares equal in the SAS output for the model y = chemical + cloth + ε? Why?

(d) Besides the number of levels of the two factors, what other parameter(s) would you use to describe this design? Also give their value(s).

(e) If you are given Σ_i τ̂_i² = 12.52, what is SS_chemical(adjusted)?

(f) Fill in the blanks in the ANOVA table below and draw conclusions at α = 0.05.

Source     DF   Seq SS   Adj SS   Adj MS   F   P
chemical
cloth
Error
Total

[numeric entries not reproduced]

(g) If the chemist had enough materials to run 5 × 5 = 25 runs, what design would you suggest to her? And what is the statistical model?

(h) If the chemist had enough materials to run 50 runs and suspected that there might be some interaction between cloth and chemical, what design would you suggest to her? And what is the statistical model?

2. An experimenter is studying the absorbing rate of three medicines. Four batches of pills are randomly selected from each medicine, and three determinations of the absorbing rate are made on each batch. Examine the data in the file medicine.csv.

(a) From examining the data, what design was used?

(b) Give the appropriate statistical model, with assumptions.

(c) What are the hypotheses for testing the batch effect?

(d) What are the hypotheses for testing the medicine effect?

(e) Conduct an analysis of variance on these data. Do any of the factors affect the absorbing rate? Use α = 0.05. Include your SAS code.

3. The yield of a food product process is being studied. The two factors of interest are temperature and pressure. Three levels of each factor are selected; however, only 9 runs can be made in one day. The experimenter runs a complete replicate of the design on each day, with the pressure-temperature combinations run in random order. The data are shown in the following table.

[yield data for Days 1 and 2, by Temperature (rows) and Pressure (Low/Medium/High); numeric entries not reproduced]

Here is a portion of the associated SAS output:

Source      Sum of Squares   DF   Mean Square   F-value   Prob > F
Day
temp
pres
temp*pres
Residual
Cor Total

[numeric entries not reproduced]

(a) What design is this?

(b) State the statistical model and the corresponding assumptions.

(c) Fill in the blanks in the ANOVA table below:

Source      Sum of Squares   DF   Mean Square   F-value   Prob > F
Day
temp
pres
temp*pres
Residual
Cor Total

(d) Is it reasonable to use the ANOVA F-ratio and p-value for the term Day to evaluate the significance of this factor? If yes, calculate it. If not, explain why not.

(e) Draw conclusions at α = 0.05.

(f) If the experimenter did not include the factor Day in the statistical model, what would the new ANOVA table look like? (Use the information given in the SAS output above.)

Source      Sum of Squares   DF   Mean Square   F-Value   Prob > F

4. A study recorded laboratory animal responses (Y) to a drug as related to three quantitative predictor variables: X1 = body weight (grams), X2 = age (months), and X3 = administered drug dose (mg). The data are found in the file animal3.csv. Assume a homogeneous-variance, multiple linear regression model containing only first-order terms is appropriate for these data. In any computer calculations you perform below, supply both your supporting code and pertinent output for your answers.

(a) Check the predictor variables for any concerns with multicollinearity. What do you find?

(b) Perform a ridge regression on these data: construct and display a trace plot and suggest a reasonable value for the tuning parameter c. State why you chose this value.

(c) Consider the following values of c: −2.7, 1.5, 8.9. Choose the most reasonable from among these values and employ it as your biasing constant in a ridge regression of Y on X1, X2, and X3. Give the resulting ridge estimators for all the regression coefficients.

5. In a biomonitoring study of workplace chemical exposure, retired factory workers were assayed for blood concentrations (Y) of an industrial chemical. Three predictor variables were also recorded: X1 = Years worked, X2 = Years retired, and X3 = Age. The data are available in the file chemical.csv. In any computer calculations you perform below, supply both your supporting code and pertinent output for your answers.

(a) One might argue that when Age = 0, we would expect the response to be zero. Explain why we would then also expect the response to be zero when X1 = X2 = 0. Operating under the assumption that the response is zero when all three predictor variables are zero, fit an appropriate multiple regression model (use only first-order terms) to these data using all three predictors. Identify which, if any, of the three predictors significantly affects E[Y]. Conduct your tests at a family-wise error rate (FWER) of 0.5%.

(b) Plot the residuals from the model fit in part (a) against the predicted values from the fit. Do any untoward patterns appear?

(c) Assess whether any observations possess high leverage in the model fit from part (a).

(d) Since X1 and X2 represent time at or after possible workplace exposure, consider the joint null hypothesis H0: β1 = β2 = 0 vs. Ha: any departure. Perform a single test of H0 against Ha at a false positive rate of 0.5%.

6. Consider the multiple linear regression model Y ~ N_n(Xβ, σ²I), where Y is an n × 1 vector, X is an n × p matrix, and β is a p × 1 vector. Show that the maximum likelihood estimator for β is derived from the same estimating equations as the least squares estimator for β, making the two estimators identical. For simplicity, you may assume that σ² is known. [Hint: from matrix calculus, recall that ∂(Va)/∂a = V^T and ∂(a^T U a)/∂a = (U + U^T)a, for conformable matrices U and V and vector a.]

Statistics GIDP Ph.D. Qualifying Exam Methodology
January 9, 2018, 9:00am-1:00pm — Solutions

1. A chemist wishes to test the effect of five chemical agents on the strength of a particular type of cloth. She selects five bolts of cloth and applies the chemicals to them. However, because of resource limitations, she can only run the design below, in which each cloth receives four of the five chemicals:

[cloth-by-chemical layout table not reproduced]

(a) What design is this?
A BIBD (balanced incomplete block design).

(b) State the statistical model and assumptions.
y_ij = μ + τ_i + β_j + ε_ij, for i = 1,...,5 and j = 1,...,5 (observed cells only), with Σ τ_i = 0, Σ β_j = 0, and ε_ij ~ iid N(0, σ²).

(c) Are the Type I and Type III sums of squares equal in the SAS output for the model y = chemical + cloth + ε? Why?
No: the incomplete layout is not orthogonal, so the sequential (Type I) and partial (Type III) sums of squares differ.
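The non-orthogonality answer in part (c) can be checked numerically. The sketch below (Python with NumPy, on a small made-up unbalanced layout, not the exam's data) computes a sequential (Type I-style) and an adjusted (Type III-style) sum of squares for the treatment factor by comparing residual sums of squares of nested fits; in a non-orthogonal layout the two differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def rss(X, y):
    # residual sum of squares from a least-squares fit
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

# Small unbalanced (incomplete) two-factor layout: not every
# chemical appears on every cloth, so the factors are non-orthogonal.
chem  = np.array([0, 0, 1, 1, 2, 2, 0, 1])
cloth = np.array([0, 1, 1, 2, 0, 2, 2, 0])
y = rng.normal(size=chem.size) + chem.astype(float)

def dummies(f):
    # full-rank dummy coding, dropping the first level
    levels = np.unique(f)
    return (f[:, None] == levels[1:][None, :]).astype(float)

ones = np.ones((chem.size, 1))
C, B = dummies(chem), dummies(cloth)

# Type I (sequential) SS for chemical, entered first:
ss_seq = rss(ones, y) - rss(np.hstack([ones, C]), y)
# Type III (adjusted) SS for chemical, adjusted for cloth:
ss_adj = rss(np.hstack([ones, B]), y) - rss(np.hstack([ones, B, C]), y)

print(ss_seq, ss_adj)   # differ because the layout is non-orthogonal
```

In a balanced complete layout the two quantities would coincide, which is exactly why SAS reports equal Type I and Type III sums of squares only under orthogonality.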

(d) Besides the number of levels of the two factors, what other parameter(s) would you use to describe this design? Also give their value(s).
The other parameters are k, the number of treatments per block; r, the number of times each treatment appears; and λ, the number of blocks in which each pair of treatments appears together. In this case, k = 4, r = 4, and λ = 3.

(e) If you are given Σ_i τ̂_i² = 12.52, what is SS_chemical(adjusted)?
SS_chemical(adjusted) = (λa/k) Σ_i τ̂_i² = (3)(5)(12.52)/4 = 46.95.

(f) Fill in the blanks in the ANOVA table below and draw conclusions at α = 0.05.

Source     DF   Seq SS   Adj SS   Adj MS   F   P
chemical
cloth
Error
Total

[numeric entries not reproduced]

We see both chemical and cloth are significant at α = 0.05.

(g) If the chemist had enough materials to run 5 × 5 = 25 runs, what design would you suggest to her? And what is the statistical model?
An RCBD (randomized complete block design):
y_ij = μ + τ_i + β_j + ε_ij, i = 1,...,5; j = 1,...,5, with Σ τ_i = 0, Σ β_j = 0, and ε_ij ~ iid N(0, σ²).

(h) If the chemist had enough materials to run 50 runs and suspected that there might be some interaction between cloth and chemical, what design would you suggest to her? And what is the statistical model?
A 5 × 5 factorial design with two replicates of each combination:
y_ijk = μ + τ_i + β_j + (τβ)_ij + ε_ijk, i = 1,...,5; j = 1,...,5; k = 1, 2, with Σ τ_i = 0, Σ β_j = 0, Σ_i (τβ)_ij = Σ_j (τβ)_ij = 0, and ε_ijk ~ iid N(0, σ²).

2. An experimenter is studying the absorbing rate of three medicines. Four batches of pills are randomly selected from each medicine, and three determinations of the absorbing rate are made on each batch. Examine the data in the file medicine.csv.

(a) From examining the data, what design was used?
A nested design (batches nested within medicines).

(b) Give the appropriate statistical model, with assumptions.

Y_ijk = μ + τ_i + β_j(i) + ε_k(ij), where τ_i represents the medicine effect [a fixed effect with Σ τ_i = 0], β_j(i) represents the batch effect [a random effect with β_j(i) ~ iid N(0, σ_β²)], and ε_k(ij) ~ iid N(0, σ²).

(c) What are the hypotheses for testing the batch effect?
H0: σ_β² = 0 vs. H1: σ_β² > 0.

(d) What are the hypotheses for testing the medicine effect?
H0: τ_1 = τ_2 = τ_3 = 0 vs. H1: at least one τ_i ≠ 0.

(e) Conduct an analysis of variance on these data. Do any of the factors affect the absorbing rate? Use α = 0.05. Include your SAS code.

Analysis of Variance for Absorbing Rate
Source             DF   SS   MS   F   P
Medicine
Batch(Medicine)
Error
Total

[numeric entries not reproduced]

There is a significant batch effect at α = 0.05, as the associated p-value is well below 0.05.

Sample SAS code:

data Q2;
  input Medicine Batch Rate;
  datalines;
  ...
;

proc print data=Q2; run;

proc mixed data=Q2 method=type1;
  class Medicine Batch;
  model Rate = Medicine;
  random Batch(Medicine);
run;

3. The yield of a food product process is being studied. The two factors of interest are temperature and pressure. Three levels of each factor are selected; however, only 9 runs can be made in one day. The experimenter runs a complete replicate of the design on each day, with the pressure-temperature combinations run in random order. The data are shown in the following table.

[yield data for Days 1 and 2, by Temperature (rows) and Pressure (Low/Medium/High); numeric entries not reproduced]

Here is a portion of the associated SAS output:

Source      Sum of Squares   DF   Mean Square   F-value   Prob > F
Day
temp
pres
temp*pres
Residual
Cor Total

[numeric entries not reproduced]

(a) What design is this?
A factorial design run in blocks (days), i.e., a blocked 3 × 3 factorial.

(b) State the statistical model and the corresponding assumptions.
y_ijk = μ + τ_i + β_j + (τβ)_ij + δ_k + ε_ijk, i = 1, 2, 3; j = 1, 2, 3; k = 1, 2, where δ_k is the (Day) block effect, with Σ τ_i = 0, Σ β_j = 0, Σ_i (τβ)_ij = Σ_j (τβ)_ij = 0, Σ δ_k = 0, and ε_ijk ~ iid N(0, σ²).

(c) Fill in the blanks in the ANOVA table below:

Source      Sum of Squares   DF   Mean Square   F-value   Prob > F
Day
temp
pres                                            93.98     <
temp*pres
Residual
Cor Total

[remaining numeric entries not reproduced]

(d) Is it reasonable to use the ANOVA F-ratio and p-value for the term Day to evaluate the significance of this factor? If yes, calculate it. If not, explain why not.
Yes. For Day, F_Day = 13.01/0.53 = 24.55, which on (1, 8) d.f. is clearly significant.

(e) Draw conclusions at α = 0.05.
Both main effects, temperature and pressure, and the blocking factor Day are significant, as all corresponding p-values are less than α = 0.05.

(f) If the experimenter did not include the factor Day in the statistical model, what would the new ANOVA table look like? (Use the information given in the SAS output above.) Dropping Day pools its sum of squares and degree of freedom into the residual line:

Source      Sum of Squares   DF   Mean Square   F-Value   Prob > F
temp
pres
temp*pres
Residual
Cor Total

[numeric entries not reproduced]
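The bookkeeping behind part (f) is just pooling: when Day is dropped, its sum of squares and degree of freedom fold into the residual line, and every F-ratio is recomputed against the new residual mean square. A quick sketch in Python; the Day and residual values are taken from part (d)'s F_Day = 13.01/0.53 on (1, 8) d.f., while the temp line is a hypothetical stand-in since the scanned table lost its entries:

```python
# Day line from part (d): MS_Day = 13.01 on 1 d.f., MS_resid = 0.53 on 8 d.f.
ss_day, df_day = 13.01, 1            # SS = MS here since df = 1
ss_resid, df_resid = 0.53 * 8, 8     # residual SS from the full model

# Hypothetical treatment line (stand-in values, not from the exam):
ss_temp, df_temp = 5.51, 2

# Dropping the blocking factor pools its SS and DF into residual:
ss_resid_new = ss_resid + ss_day
df_resid_new = df_resid + df_day
ms_resid_new = ss_resid_new / df_resid_new

# Each effect's F-ratio is then recomputed against the new MS_resid:
f_temp_new = (ss_temp / df_temp) / ms_resid_new
print(ss_resid_new, df_resid_new, round(f_temp_new, 3))
```

Note the pooled residual d.f. rises from 8 to 9, but because Day truly matters here, the pooled residual mean square is badly inflated (about 1.92 versus 0.53), so every recomputed F-ratio shrinks.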

4. A study recorded laboratory animal responses (Y) to a drug as related to three quantitative predictor variables: X1 = body weight (grams), X2 = age (months), and X3 = administered drug dose (mg). The data are found in the file animal3.csv. Assume a homogeneous-variance, multiple linear regression model containing only first-order terms is appropriate for these data. In any computer calculations you perform below, supply both your supporting code and pertinent output for your answers.

(a) Check the predictor variables for any concerns with multicollinearity. What do you find?
To start, always plot the data! Sample R code:

animal.df = read.csv( file.choose() )
attach( animal.df )
pairs( cbind(Y, X1, X2, X3) )

[scatterplot matrix not reproduced]

The plot shows an especially strong linear relationship between X1 and X3. To assess multicollinearity more closely, examine the correlations and the VIFs. Sample R code:

cor( cbind(X1, X2, X3) )
library( car )
vif( lm(Y ~ X1 + X2 + X3) )
mean( vif(lm(Y ~ X1 + X2 + X3)) )

[numeric output not reproduced]

Since max{VIF_k} clearly exceeds 10, and the mean VIF is far larger than 6.0, there is a clear problem with multicollinearity here. The scatterplot matrix and correlation matrix both suggest that X1 and X3 are the greater culprits. (If one removes X3, the VIFs drop markedly. But one does not remove predictor variables just because they are highly collinear!)

(b) Perform a ridge regression on these data: construct and display a trace plot and suggest a reasonable value for the tuning parameter c. State why you chose this value.
Given the high multicollinearity, ridge regression is a valid alternative, but remember first to center the response variable and standardize the predictor variables. Sample R code:

U = Y - mean(Y)
Z1 = scale( X1 ); Z2 = scale( X2 ); Z3 = scale( X3 )
library( genridge )
const = seq( .001, 5, .001 )
animal.ridge = ridge( U ~ Z1 + Z2 + Z3, lambda = const )
traceplot( animal.ridge, pch='.', cex=1.5 )
detach( animal.df )

This produces the trace plot below. [trace plot not reproduced] The trace for Z2 is uninformative, but the traces for Z1 and Z3 suggest that their coefficient curves flatten by about c = 1.5, or possibly c = 2, so anything in this range would be a reasonable choice for c.
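If genridge is unavailable, the trace-plot logic is easy to reproduce from the ridge formula itself, β_R(c) = (Z^T Z + cI)^(-1) Z^T U. The sketch below (Python with NumPy, on synthetic collinear data since animal3.csv is not reproduced here) computes the coefficient path over a grid of c; note genridge's lambda may be scaled differently, so numeric values need not match its output.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the animal data: x3 nearly collinear with x1.
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.98 * x1 + 0.05 * rng.normal(size=n)
y = 2.0 * x1 - x2 + x3 + rng.normal(scale=0.5, size=n)

# Center the response and standardize the predictors, as in the solution.
U = y - y.mean()
Z = np.column_stack([(v - v.mean()) / v.std(ddof=1) for v in (x1, x2, x3)])

def ridge_coef(Z, U, c):
    # beta_R(c) = (Z'Z + c I)^(-1) Z'U
    return np.linalg.solve(Z.T @ Z + c * np.eye(Z.shape[1]), Z.T @ U)

# A trace plot is just this solved over a grid of c values:
for c in (0.0, 0.5, 1.5, 5.0):
    print(c, np.round(ridge_coef(Z, U, c), 3))
```

At c = 0 this is ordinary least squares on the standardized scale; as c grows the coefficient vector shrinks, and the value of c where the curves flatten is the visual choice made in part (b).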

By the way: it is interesting to notice the two dashed vertical lines in the trace plot near c = 0. Digging into this R function, we find that these are recommended values for c from two standard sources: the HKB value suggested by Hoerl et al. (1975, Communications in Statistics 4) and the LW value suggested by Lawless and Wang (1976, Communications in Statistics 5). Find these in the ridge() object as

c( animal.ridge$khkb, animal.ridge$klw )

[numeric output not reproduced]

As can be seen, these are much smaller than the visual indication that c is near 1.5 or 2.0. Selection of the biasing parameter c in a ridge regression remains a developing area of study.

(c) Consider the following values of c: −2.7, 1.5, 8.9. Choose the most reasonable from among these values and employ it as your biasing constant in a ridge regression of Y on X1, X2, and X3. Give the resulting ridge estimators for all the regression coefficients.
From among the choices c = −2.7, 1.5, 8.9, the choice c = −2.7 is nonsensical, as c cannot be negative, while c = 8.9 gives far too large a biasing effect, as the trace plot in part (b) stabilizes well before this point. Thus c = 1.5 is the most reasonable choice, in concordance with the analysis presented in part (b). To perform the ridge analysis, we continue to use the centered response variable U and the standardized predictors Z1, Z2, Z3. Sample R code is:

library( genridge )
const = 1.5
animal1.ridge = ridge( U ~ Z1 + Z2 + Z3, lambda = const )
coef( animal1.ridge )

with consequent output giving the three ridge estimates b_R1, b_R2, and b_R3 for Z1, Z2, and Z3 [numeric values not reproduced]. (We could also have extracted the estimates directly as animal1.ridge$coef.)

5. In a biomonitoring study of workplace chemical exposure, retired factory workers were assayed for blood concentrations (Y) of an industrial chemical. Three predictor variables were also recorded: X1 = Years worked, X2 = Years retired, and X3 = Age. The data are available in the file chemical.csv. In any computer calculations you perform below, supply both your supporting code and pertinent output for your answers.

(a) One might argue that when Age = 0, we would expect the response to be zero. Explain why we would then also expect the response to be zero when X1 = X2 = 0. Operating under the assumption that the response is zero when all three predictor variables are zero, fit an appropriate multiple regression model (use only first-order terms) to these data using all three predictors. Identify which, if any, of the three predictors significantly affects E[Y]. Conduct your tests at a family-wise error rate (FWER) of 0.5%.
Obviously, when Age = X3 = 0 the subject has just been born, so s/he could not have worked any years (so X1 = 0), nor could s/he have spent any years in retirement (so X2 = 0).

Next, always plot the data! Sample R code and consequent scatterplot:

chemical.df = read.csv( file.choose() )
attach( chemical.df )
X1 = Years.worked
X2 = Years.retired
X3 = Age
pairs( cbind(Y, X1, X2, X3) )

[scatterplot matrix not reproduced]

The various patterns between Y and the predictor variables are mixed; a formal analysis will prove interesting. Also, there appears to be some slight correlation between X2 and X3, but less for the other pairings. This is verified by checking the pairwise correlations:

cor( cbind(X1, X2, X3) )

[numeric output not reproduced]

So, for the record, some slight problems with multicollinearity may be present. (VIFs are not useful in linear regression through the origin, so they are not worth calculating here.)

Next, perform the LS fit. We operate without an intercept, as per the instructions in the problem. The model sets E[Y_i] = β1 X_i1 + β2 X_i2 + β3 X_i3. We test the three separate hypotheses H0j: βj = 0 vs. Haj: βj ≠ 0 (no indication was given for any one-sided alternatives) for j = 1, 2, 3. Sample R code and (edited) output:

chemical.lm = lm( Y ~ X1 + X2 + X3 - 1 )
summary( chemical.lm )

Call:
lm(formula = Y ~ X1 + X2 + X3 - 1)

Coefficients:
     Estimate   Std. Error   t value   Pr(>|t|)
X1
X2
X3                                     e-05 ***

Residual standard error: on 23 degrees of freedom
[most numeric entries not reproduced]

To test the g = 3 hypotheses, extract their corresponding p-values from the partial t-tests and then apply a Bonferroni adjustment:

pvals = summary( chemical.lm )$coefficients[,4]
p.adjust( pvals, method = "bonferroni" )

The Bonferroni-adjusted p-values for H01: β1 = 0 and H02: β2 = 0 are reported as 1.0, so they clearly exceed α = 0.005. Hence, we fail to reject H01 and H02. For H03: β3 = 0, the adjusted p-value is on the order of 10^-5, which is below α = 0.005, so we reject H03. We conclude that
- there is no significant effect of Years worked on E[Y];
- there is no significant effect of Years retired on E[Y];
- there is a significant effect of Age on E[Y].

(b) Plot the residuals from the model fit in part (a) against the predicted values from the fit. Do any untoward patterns appear?
Residual plot: sample R code and resulting residual plot are

plot( resid(chemical.lm) ~ predict(chemical.lm), pch=19 )
abline( h=0 )

[residual plot not reproduced]

We see no troublesome patterns in the residual plot. (The one residual far to the right of the plot is eye-catching, but it does not indicate a problem with the model fit. We might think to check for possible leverage points, however; see part (c) below.)

(c) Assess whether any observations possess high leverage in the model fit from part (a).
Leverage analysis: sample R code and (edited) output are

hii = hatvalues( chemical.lm )
p = length( coef(chemical.lm) ); n = length( Y )
print( 2*p/n )
[1] 0.2307692

The rule-of-thumb cut-off is h_ii > 2p/n = (2)(3)/26 = 0.2308. A fast check via R employs

which( hii > 2*p/n )

producing the observations at i = 5, 7, 18; i.e., these three observations exert high leverage on the model fit. Referring back to the residual plot, we can query what the predicted values were for these three leverage points:

predict( chemical.lm )[ which(hii > 2*p/n) ]

The extreme predicted value Ŷ5, far to the right of the residual plot, is indeed seen to be one of the high-leverage points!

(d) Since X1 and X2 represent time at or after possible workplace exposure, consider the joint null hypothesis H0: β1 = β2 = 0 vs. Ha: any departure. Perform a single test of H0 against Ha at a false positive rate of 0.5%.
Reduced-model hypothesis test: sample R code and (edited) output are

anova( lm(Y ~ X3 - 1), chemical.lm )

Model 1: Y ~ X3 - 1
Model 2: Y ~ X1 + X2 + X3 - 1
  Res.Df   RSS   Df   Sum of Sq   F   Pr(>F)
[numeric entries not reproduced]

We see the test statistic F* has (2, 23) d.f. The corresponding p-value, P[F(2,23) ≥ F*], is clearly larger than α = 0.005. Thus we fail to reject

H0 and conclude that Years worked and Years retired do not contribute significantly to the model fit. Alternatively, using a rejection-region approach: find the critical point F(0.995, 2, 23); as F* falls below this critical point, we again fail to reject H0.

6. Consider the multiple linear regression model Y ~ N_n(Xβ, σ²I), where Y is an n × 1 vector, X is an n × p matrix, and β is a p × 1 vector. Show that the maximum likelihood estimator for β is derived from the same estimating equations as the least squares estimator for β, making the two estimators identical. For simplicity, you may assume that σ² is known. [Hint: from matrix calculus, recall that ∂(Va)/∂a = V^T and ∂(a^T U a)/∂a = (U + U^T)a, for conformable matrices U and V and vector a.]

The likelihood function is

L(β) = (σ(2π)^(1/2))^(-n) exp{ -(Y - Xβ)^T (Y - Xβ) / (2σ²) },

so the log-likelihood becomes

ℓ(β) = -n log(σ(2π)^(1/2)) - (Y - Xβ)^T (Y - Xβ) / (2σ²)
     = -n log(σ(2π)^(1/2)) - {Y^T Y - β^T X^T Y - Y^T Xβ + β^T X^T Xβ} / (2σ²).

Notice that β^T X^T Y is a scalar, so it must equal its transpose: β^T X^T Y = (β^T X^T Y)^T = Y^T Xβ. Then

ℓ(β) = -n log(σ(2π)^(1/2)) - {Y^T Y - 2Y^T Xβ + β^T X^T Xβ} / (2σ²).

Now take the first derivative with respect to β:

∂ℓ(β)/∂β = -(1/(2σ²)) { -∂(2Y^T Xβ)/∂β + ∂(β^T X^T Xβ)/∂β }.

From the Hint, ∂(Y^T Xβ)/∂β = (Y^T X)^T = X^T Y, while ∂(β^T X^T Xβ)/∂β = (X^T X + {X^T X}^T)β = 2X^T Xβ, the latter equality holding since X^T X is clearly symmetric. Then

∂ℓ(β)/∂β = (1/(2σ²)) (2X^T Y - 2X^T Xβ),

so setting ∂ℓ(β)/∂β = 0 (a p × 1 vector of zeroes) yields 2X^T Y - 2X^T Xβ = 0, or simply X^T Xβ = X^T Y. This is equivalent to the least squares normal equations for β given in Equation (6.24) of Kutner et al.'s textbook, which shows that the two estimation methods lead to the same estimating equations.
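As a numerical footnote to the derivation (not part of the exam solution): any least-squares routine solves the same normal equations X^T Xβ = X^T Y that maximize the Gaussian likelihood, so the two estimators agree to machine precision. A minimal check in Python with NumPy on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a Gaussian linear model Y = X beta + eps.
n, p = 40, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
Y = X @ beta + rng.normal(size=n)

# MLE / least squares via the normal equations X'X b = X'Y:
b_ne = np.linalg.solve(X.T @ X, X.T @ Y)

# Least squares via a generic solver (what lm() or lstsq does):
b_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.max(np.abs(b_ne - b_ls)))  # agreement to machine precision
```

The two routes differ only in numerical method, not in the estimating equations being solved, which is exactly the content of the derivation above.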


More information

Ph.D. Preliminary Examination Statistics June 2, 2014

Ph.D. Preliminary Examination Statistics June 2, 2014 Ph.D. Preliminary Examination Statistics June, 04 NOTES:. The exam is worth 00 points.. Partial credit may be given for partial answers if possible.. There are 5 pages in this exam paper. I have neither

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

Lecture 1 Linear Regression with One Predictor Variable.p2

Lecture 1 Linear Regression with One Predictor Variable.p2 Lecture Linear Regression with One Predictor Variablep - Basics - Meaning of regression parameters p - β - the slope of the regression line -it indicates the change in mean of the probability distn of

More information

MODELS WITHOUT AN INTERCEPT

MODELS WITHOUT AN INTERCEPT Consider the balanced two factor design MODELS WITHOUT AN INTERCEPT Factor A 3 levels, indexed j 0, 1, 2; Factor B 5 levels, indexed l 0, 1, 2, 3, 4; n jl 4 replicate observations for each factor level

More information

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam June 8 th, 2016: 9am to 1pm Instructions: 1. This is exam is to be completed independently. Do not discuss your work with

More information

(4) 1. Create dummy variables for Town. Name these dummy variables A and B. These 0,1 variables now indicate the location of the house.

(4) 1. Create dummy variables for Town. Name these dummy variables A and B. These 0,1 variables now indicate the location of the house. Exam 3 Resource Economics 312 Introductory Econometrics Please complete all questions on this exam. The data in the spreadsheet: Exam 3- Home Prices.xls are to be used for all analyses. These data are

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

Lec 1: An Introduction to ANOVA

Lec 1: An Introduction to ANOVA Ying Li Stockholm University October 31, 2011 Three end-aisle displays Which is the best? Design of the Experiment Identify the stores of the similar size and type. The displays are randomly assigned to

More information

MULTICOLLINEARITY AND VARIANCE INFLATION FACTORS. F. Chiaromonte 1

MULTICOLLINEARITY AND VARIANCE INFLATION FACTORS. F. Chiaromonte 1 MULTICOLLINEARITY AND VARIANCE INFLATION FACTORS F. Chiaromonte 1 Pool of available predictors/terms from them in the data set. Related to model selection, are the questions: What is the relative importance

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

22s:152 Applied Linear Regression. 1-way ANOVA visual:

22s:152 Applied Linear Regression. 1-way ANOVA visual: 22s:152 Applied Linear Regression 1-way ANOVA visual: Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 Y We now consider an analysis

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

Comparing Nested Models

Comparing Nested Models Comparing Nested Models ST 370 Two regression models are called nested if one contains all the predictors of the other, and some additional predictors. For example, the first-order model in two independent

More information

Analysis of Variance. Read Chapter 14 and Sections to review one-way ANOVA.

Analysis of Variance. Read Chapter 14 and Sections to review one-way ANOVA. Analysis of Variance Read Chapter 14 and Sections 15.1-15.2 to review one-way ANOVA. Design of an experiment the process of planning an experiment to insure that an appropriate analysis is possible. Some

More information

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph. Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Masters Comprehensive Examination Department of Statistics, University of Florida

Masters Comprehensive Examination Department of Statistics, University of Florida Masters Comprehensive Examination Department of Statistics, University of Florida May 6, 003, 8:00 am - :00 noon Instructions: You have four hours to answer questions in this examination You must show

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Scenario: 31 counts (over a 30-second period) were recorded from a Geiger counter at a nuclear

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

STAT22200 Spring 2014 Chapter 14

STAT22200 Spring 2014 Chapter 14 STAT22200 Spring 2014 Chapter 14 Yibi Huang May 27, 2014 Chapter 14 Incomplete Block Designs 14.1 Balanced Incomplete Block Designs (BIBD) Chapter 14-1 Incomplete Block Designs A Brief Introduction to

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

Lecture 10. Factorial experiments (2-way ANOVA etc)

Lecture 10. Factorial experiments (2-way ANOVA etc) Lecture 10. Factorial experiments (2-way ANOVA etc) Jesper Rydén Matematiska institutionen, Uppsala universitet jesper@math.uu.se Regression and Analysis of Variance autumn 2014 A factorial experiment

More information

More about Single Factor Experiments

More about Single Factor Experiments More about Single Factor Experiments 1 2 3 0 / 23 1 2 3 1 / 23 Parameter estimation Effect Model (1): Y ij = µ + A i + ɛ ij, Ji A i = 0 Estimation: µ + A i = y i. ˆµ = y..  i = y i. y.. Effect Modell

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Solution to Final Exam

Solution to Final Exam Stat 660 Solution to Final Exam. (5 points) A large pharmaceutical company is interested in testing the uniformity (a continuous measurement that can be taken by a measurement instrument) of their film-coated

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Statistics - Lecture Three. Linear Models. Charlotte Wickham 1.

Statistics - Lecture Three. Linear Models. Charlotte Wickham   1. Statistics - Lecture Three Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Linear Models 1. The Theory 2. Practical Use 3. How to do it in R 4. An example 5. Extensions

More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 STAC67H3 Regression Analysis Duration: One hour and fifty minutes Last Name: First Name: Student

More information

Analysis of variance. Gilles Guillot. September 30, Gilles Guillot September 30, / 29

Analysis of variance. Gilles Guillot. September 30, Gilles Guillot September 30, / 29 Analysis of variance Gilles Guillot gigu@dtu.dk September 30, 2013 Gilles Guillot (gigu@dtu.dk) September 30, 2013 1 / 29 1 Introductory example 2 One-way ANOVA 3 Two-way ANOVA 4 Two-way ANOVA with interactions

More information

The Random Effects Model Introduction

The Random Effects Model Introduction The Random Effects Model Introduction Sometimes, treatments included in experiment are randomly chosen from set of all possible treatments. Conclusions from such experiment can then be generalized to other

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

COMPARING SEVERAL MEANS: ANOVA

COMPARING SEVERAL MEANS: ANOVA LAST UPDATED: November 15, 2012 COMPARING SEVERAL MEANS: ANOVA Objectives 2 Basic principles of ANOVA Equations underlying one-way ANOVA Doing a one-way ANOVA in R Following up an ANOVA: Planned contrasts/comparisons

More information

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

EXST Regression Techniques Page 1. We can also test the hypothesis H : œ 0 versus H : EXST704 - Regression Techniques Page 1 Using F tests instead of t-tests We can also test the hypothesis H :" œ 0 versus H :" Á 0 with an F test.! " " " F œ MSRegression MSError This test is mathematically

More information

Practice Final Examination

Practice Final Examination Practice Final Examination Mth 136 = Sta 114 Wednesday, 2000 April 26, 2:20 3:00 pm This is a closed-book examination so please do not refer to your notes, the text, or to any other books You may use a

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor

More information

Chapter 5 Introduction to Factorial Designs Solutions

Chapter 5 Introduction to Factorial Designs Solutions Solutions from Montgomery, D. C. (1) Design and Analysis of Experiments, Wiley, NY Chapter 5 Introduction to Factorial Designs Solutions 5.1. The following output was obtained from a computer program that

More information

STA 101 Final Review

STA 101 Final Review STA 101 Final Review Statistics 101 Thomas Leininger June 24, 2013 Announcements All work (besides projects) should be returned to you and should be entered on Sakai. Office Hour: 2 3pm today (Old Chem

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

36-707: Regression Analysis Homework Solutions. Homework 3

36-707: Regression Analysis Homework Solutions. Homework 3 36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx

More information

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Biostatistics for physicists fall Correlation Linear regression Analysis of variance Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody

More information

Exam: high-dimensional data analysis February 28, 2014

Exam: high-dimensional data analysis February 28, 2014 Exam: high-dimensional data analysis February 28, 2014 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question (not the subquestions) on a separate piece of paper.

More information

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES BIOL 458 - Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES PART 1: INTRODUCTION TO ANOVA Purpose of ANOVA Analysis of Variance (ANOVA) is an extremely useful statistical method

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

Sociology 593 Exam 1 Answer Key February 17, 1995

Sociology 593 Exam 1 Answer Key February 17, 1995 Sociology 593 Exam 1 Answer Key February 17, 1995 I. True-False. (5 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher regressed Y on. When

More information

Ch 13 & 14 - Regression Analysis

Ch 13 & 14 - Regression Analysis Ch 3 & 4 - Regression Analysis Simple Regression Model I. Multiple Choice:. A simple regression is a regression model that contains a. only one independent variable b. only one dependent variable c. more

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance

More information

Unbalanced Data in Factorials Types I, II, III SS Part 1

Unbalanced Data in Factorials Types I, II, III SS Part 1 Unbalanced Data in Factorials Types I, II, III SS Part 1 Chapter 10 in Oehlert STAT:5201 Week 9 - Lecture 2 1 / 14 When we perform an ANOVA, we try to quantify the amount of variability in the data accounted

More information

STAT 510 Final Exam Spring 2015

STAT 510 Final Exam Spring 2015 STAT 510 Final Exam Spring 2015 Instructions: The is a closed-notes, closed-book exam No calculator or electronic device of any kind may be used Use nothing but a pen or pencil Please write your name and

More information

Stats fest Analysis of variance. Single factor ANOVA. Aims. Single factor ANOVA. Data

Stats fest Analysis of variance. Single factor ANOVA. Aims. Single factor ANOVA. Data 1 Stats fest 2007 Analysis of variance murray.logan@sci.monash.edu.au Single factor ANOVA 2 Aims Description Investigate differences between population means Explanation How much of the variation in response

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida First Year Examination Department of Statistics, University of Florida May 6, 2011, 8:00 am - 12:00 noon Instructions: 1. You have four hours to answer questions in this examination. 2. You must show your

More information

Multiple Predictor Variables: ANOVA

Multiple Predictor Variables: ANOVA Multiple Predictor Variables: ANOVA 1/32 Linear Models with Many Predictors Multiple regression has many predictors BUT - so did 1-way ANOVA if treatments had 2 levels What if there are multiple treatment

More information

This exam contains 5 questions. Each question is worth 10 points. Therefore, this exam is worth 50 points.

This exam contains 5 questions. Each question is worth 10 points. Therefore, this exam is worth 50 points. GROUND RULES: This exam contains 5 questions. Each question is worth 10 points. Therefore, this exam is worth 50 points. Print your name at the top of this page in the upper right hand corner. This is

More information

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). Statistics 512: Solution to Homework#11 Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). 1. Perform the two-way ANOVA without interaction for this model. Use the results

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information