Statistics GIDP Ph.D. Qualifying Exam Methodology


May 26, 2017, 9:00am-1:00pm

Instructions: Put your ID (not your name) on each sheet. Complete exactly 5 of 6 problems; turn in only those sheets you wish to have graded. Each question, but not necessarily each part, is equally weighted. Provide answers on the supplied pads of paper and/or use a Microsoft Word document or equivalent to report your software code and outputs. Number each problem. You may turn in only one electronic document. Embed relevant code and output/graphics into your Word document. Write on only one side of each sheet if you use paper. You may use the computer and/or a calculator. Stay calm and do your best. Good luck!

1. A process engineer is testing the yield of a product manufactured on five machines. Each machine has two operators, one for the day shift and one for the night shift. Assume the operator factor is random. We take five samples from each machine for each operator and obtain the data in machine.csv (columns: Machine, Day Operator, Night Operator).
(a) What design is this?
(b) State the statistical model and assumptions.
(c) Analyze the data and draw a conclusion.
(d) If these five machines were randomly selected from many machines in the factory, would the conclusion be the same as the one obtained in (c)? Explain (no calculation needed).
(e) Attach your SAS code here.

2. A nickel-titanium alloy is used to make components for jet turbine aircraft engines. Cracking is a potentially serious problem, as it can lead to non-recoverable failure. A test is run at the parts producer to determine the effects of four factors on cracks. The four factors are pouring temperature (A), titanium content (B), heat treatment method (C), and the amount of grain

refiner used (D). Each factor contains two levels and 16 runs are performed. Two operators need to take care of these 16 runs. There might be some variation between the two operators.

A B C D Operator
[table of the 16 runs; entries omitted]

(a) Help them divide the workload equally by filling in the table above.
(b) Assume the response measurements in the above table (from top to bottom) are 25, 71, 48, 45, 68, 40, 60, 65, 43, 80, 25, 104, 55, 86, 70, 76 and the dataset is given in aircraft.csv. Use SAS code to estimate the factor effects. Which factor effects appear to be large? Is there a large variation between the two operators?
(c) Conduct an analysis of variance to verify the conclusion of (b).
(d) Attach your SAS code here.

3. A study was carried out to compare the writing lifetime of four premium brands of pens. It was thought that the writing surface would affect lifetime, so three different surfaces were used; the data, with surface and brand averages, are given in pen.csv.
(a) What design is this?
(b) State the statistical model with assumptions.
(c) How would you check whether there exists any significant interaction between the surfaces and brands of pens? State your hypothesis in mathematical notation.
(d) Analyze the data using the given dataset pen.csv.
(e) Attach your SAS code.
(f) Assume that in this study 3 observations were collected for each combination and each value in the above table was the average of 3 replicates. A two-way ANOVA model with interaction is fitted and the MSE is given. Complete the following ANOVA table and draw conclusions.

Source       DF   SS   MS   F-value   P-value
Brand
Surface
Interaction
Error

4. In a study of carbohydrate uptake (Y) as a function of other factors in male diabetics, observations were taken on Y, Age (x1), Weight (x2), and Dietary Protein (x3). Analyze these data to determine which (if any) of the predictor variables (including any appropriate interactions) appear to significantly affect carbohydrate uptake. Throughout, set α = 0.05, but for simplicity do not employ any adjustments for multiplicity/multiple inferences when assessing the effects of the predictor variables. Remember to assess the quality of the fit via standard diagnostics. Attach supporting components of your computer code. Report your findings. The data are found in the file diet.csv.

5. A large dataset of n = 1030 samples of concrete involving a total of p − 1 = 8 predictor variables was collected:
x1 = Age
x2 = Cement
x3 = Furnace Slag
x4 = Superplasticizer
x5 = Water
x6 = Fly ash
x7 = Coarse Aggregate
x8 = Fine Aggregate
along with Y = Compressive Strength. The data appear in the file concrete.csv. Consider a multiple linear regression (MLR) model for these data, with E[Y] = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8. Conduct a variable selection search to identify a possible reduced model among the eight predictor variables with this data set. Employ backward elimination and take minimum BIC

as your selection criterion. Attach supporting components of your computer code. What is the recommended set of variables for further study?

6. Consider the simple linear regression model: Yi ~ indep. N(β0 + β1Xi, σ²), i = 1, ..., n, where in particular it is known that β0 = 1, so that E[Y] = 1 + β1X. Suppose interest exists in estimating the X value at which E[Y] = 0. Let this target parameter be ξ.
a) Find ξ as a function of β1. Also find the maximum likelihood estimator for ξ. Call this ξ̂.
b) Recall from Casella & Berger that the Delta Method can be used to determine the asymptotic features of a function of random variables. In particular, for a random variable U and a differentiable function g(u), where E[U] = θ, a first-order approximation to E[g(U)] is
E[g(U)] ≈ g(θ) + {∂g(θ)/∂θ}E(U − θ).
Use this to find a first-order approximation for E[ξ̂].
c) In part (b), a second-order approximation to E[g(U)] is also available from Casella & Berger's book:
E[g(U)] ≈ g(θ) + {∂g(θ)/∂θ}E(U − θ) + ½{∂²g(θ)/∂θ²}E[(U − θ)²].
Use this to find a second-order approximation for E[ξ̂].

2017 May - method

1. A process engineer is testing the yield of a product manufactured on five machines. Each machine has two operators, one for the day shift and one for the night shift. Assume the operator factor is random. We take five samples from each machine for each operator and obtain the data in machine.csv.

(a) What design is this?
Nested design (operator is nested within machine).

(b) State the statistical model and assumptions.
y_ijk = μ + τ_i + β_j(i) + ε_ijk, with Σ_i τ_i = 0 (machines fixed), β_j(i) ~ N(0, σ_β²) independent (operators random), and ε_ijk ~ N(0, σ²) independent.

(c) Analyze the data and draw a conclusion.
From the SAS output of the above model, the machine effect is significant while the operator effect (nested within machine) is not. The Type 1 Analysis of Variance gives the expected mean squares
machine: Var(Residual) + 5 Var(operator(machine)) + Q(machine), tested against MS(operator(machine));
operator(machine): Var(Residual) + 5 Var(operator(machine)), tested against MS(Residual);
Residual: Var(Residual).
[numeric SS, F, and p-values omitted]
Or use the default setting method=reml, which gives:

Type 3 Tests of Fixed Effects: Effect = machine, with Num DF, Den DF, F Value, and Pr > F as reported by PROC MIXED [numeric output omitted].

(d) If these five machines were randomly selected from many machines in the factory, would the conclusion be the same as the one obtained in (c)? Explain (no calculation needed).
Yes, as the F-statistic for testing the machine effect is the same as the F-statistic in (c): in both cases machine is tested against MS(operator(machine)).

(e) Attach your SAS code here.
data Q1;
  input operator machine y;
  datalines;
  [data lines omitted; see machine.csv]
;
run;
proc mixed method=type1 data=Q1;
  class operator machine;
  model y = machine;
  random operator(machine);

run;

2. A nickel-titanium alloy is used to make components for jet turbine aircraft engines. Cracking is a potentially serious problem, as it can lead to non-recoverable failure. A test is run at the parts producer to determine the effects of four factors on cracks. The four factors are pouring temperature (A), titanium content (B), heat treatment method (C), and the amount of grain refiner used (D). Each factor contains two levels and 16 runs are performed. Two operators need to take care of these 16 runs. There might be some variation between the two operators.

(a) Help them divide the workload equally by filling in the Operator column of the table.
Assign the operators by confounding the operator (block) effect with the four-factor interaction ABCD: one operator takes the 8 runs with ABCD = +1 and the other takes the 8 runs with ABCD = −1. [The filled table of ± levels did not survive transcription; the assignment corresponds to block = A*B*C*D in the SAS code below.]
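The confounding assignment can be sketched in a few lines of Python. This is a sketch rather than part of the exam solution: the run order and the labels "operator 1/2" are assumptions; only the ABCD-based split is taken from the SAS code's block = A*B*C*D.

```python
from itertools import product

# Build the 16-run 2^4 design in standard (Yates) order, A varying fastest,
# and assign operators by the sign of the ABCD interaction, so the operator
# "block" is confounded only with the highest-order interaction ABCD.
runs = [(a, b, c, d) for d, c, b, a in product((-1, 1), repeat=4)]
assignment = [1 if a * b * c * d == 1 else 2 for (a, b, c, d) in runs]

# Each operator handles exactly 8 runs, so the workload is split equally.
print(assignment.count(1), assignment.count(2))  # → 8 8
```

Because ABCD = +1 on exactly half of the 16 sign combinations, each operator receives 8 runs, and only the (usually negligible) four-factor interaction is sacrificed to the blocking.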

(b) Assume the response measurements in the above table (from top to bottom) are 25, 71, 48, 45, 68, 40, 60, 65, 43, 80, 25, 104, 55, 86, 70, 76 and the dataset is given in aircraft.csv. Use SAS code to estimate the factor effects. Which factor effects appear to be large? Is there a large variation between the two operators?
It seems that A, C, and D and the interactions AC and AD have large effects, as does the operator factor. Sorted effect estimates (_NAME_, COL1, effect; numeric values omitted), from smallest to largest:
operator, AC, BCD, ACD, CD, BD, AB, ABC, BC, B, ABD, C, D, AD, A

(c) Conduct an analysis of variance to verify the conclusion of (b).
The ANOVA result below shows that the factors A and D and the interactions AC and AD are significant, as is the operator.

Type III ANOVA (Source, DF, Type III SS, Mean Square, F Value, Pr > F) for A, C, D, AC, AD, and operator; several p-values print as < .0001 [remaining numeric output omitted].

(d) Attach your SAS code here.
data Q2;
  input A B C D operator y;
  datalines;
  [data lines omitted; see aircraft.csv]
;
run;
data inter;
  set Q2;
  AB=A*B; AC=A*C; AD=A*D; BC=B*C; BD=B*D; CD=C*D;
  ABC=AB*C; ABD=AB*D; ACD=AC*D; BCD=BC*D;
  block=ABC*D;  /* block = ABCD, the operator assignment */
proc reg outest=effects data=inter;
  model y=A B C D AB AC AD BC BD CD ABC ABD ACD BCD block;

run;
data effect2;
  set effects;
  drop y Intercept _RMSE_;
run;
proc transpose data=effect2 out=effect3;
data effect4;
  set effect3;
  effect=col1*2;
proc sort data=effect4;
  by effect;
proc print data=effect4;
run;
data effect5;
  set effect4;
  where _NAME_^='block';
proc print data=effect5;
run;
proc rank data=effect5 normal=blom;
  var effect;
  ranks neff;
run;
proc gplot;
  plot effect*neff=_NAME_;
run;
proc glm data=inter;
  class A C D AC AD;
  model y=A C D AC AD;
run;
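For readers without SAS, the contrast arithmetic behind these effect estimates can be sketched in Python. The run order below is an assumption (standard Yates order with A varying fastest), since the design table did not survive transcription; under that assumption the estimates reproduce the solution's list of large effects.

```python
import numpy as np

# Responses as listed in part (b), assumed to be in standard (Yates) order.
y = np.array([25, 71, 48, 45, 68, 40, 60, 65,
              43, 80, 25, 104, 55, 86, 70, 76], float)

idx = np.arange(16)
A = np.where(idx % 2 == 1, 1, -1)         # A: - + - + ...
B = np.where((idx // 2) % 2 == 1, 1, -1)  # B: - - + + ...
C = np.where((idx // 4) % 2 == 1, 1, -1)
D = np.where((idx // 8) % 2 == 1, 1, -1)

def effect(contrast):
    # In a 2^4 design, effect = contrast'y / (N/2) = contrast'y / 8.
    return float(contrast @ y) / 8

print({k: effect(v) for k, v in
       {"A": A, "C": C, "D": D, "AC": A * C, "AD": A * D}.items()})
```

Under this assumed ordering the estimates are A = 21.625, D = 14.625, AD = 16.625, AC = −18.125, and C = 9.875, consistent with A, AD, and AC standing out in the sorted effect list above.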

3. A study was carried out to compare the writing lifetime of four premium brands of pens. It was thought that the writing surface would affect lifetime, so three different surfaces were used; the data, with surface and brand averages, are given in pen.csv.

(a) What design is this?
Randomized complete block design (RCBD): the surfaces represent a known source of variation, so they serve as the block factor.

(b) State the statistical model with assumptions.
y_ij = μ + τ_i + β_j + ε_ij, with Σ_i τ_i = 0, Σ_j β_j = 0, and ε_ij ~ N(0, σ²) independent.

(c) How would you check whether there exists any significant interaction between the surfaces and brands of pens? State your hypothesis in mathematical notation.
Use Tukey's one-degree-of-freedom test for nonadditivity, fitting
y_ij = μ + τ_i + β_j + γ τ_i β_j + ε_ij
and testing H0: γ = 0 vs. H1: γ ≠ 0.

(d) Analyze the data using the given dataset pen.csv.
Tukey's one-degree-of-freedom test shows that the interaction is not significant: in the Type III SS table for surface, brand, and the added regressor q, the p-value for q is nonsignificant [numeric output omitted]. So use the additive model y_ij = μ + τ_i + β_j + ε_ij. The Type III SS ANOVA table shows that both surface and brand have significant effects.

Type III ANOVA for the additive model (Source, DF, Type III SS, Mean Square, F Value, Pr > F) for surface and brand [numeric output omitted].

Check model adequacy: the residual plot and QQ-plot show no unusual pattern. Tests for normality (Shapiro-Wilk W, Kolmogorov-Smirnov D, Cramér-von Mises W-Sq, Anderson-Darling A-Sq) likewise raise no concerns [statistics and p-values omitted].

(e) Attach your SAS code.
data Q3;
  input surface brand lifetime;
  datalines;

  [data lines omitted; see pen.csv]
;
proc glm data=Q3;
  class surface brand;
  model lifetime=surface brand;
  output out=diag r=res p=pred;
run;
data two;
  set diag;
  q=pred*pred;
proc glm data=two;
  class surface brand;
  model lifetime=surface brand q/ss3;
run;
proc sgplot data=diag;
  scatter x=pred y=res;
  refline 0;
run;
proc univariate data=diag normal;
  var res;
  qqplot res/normal (L=1 mu=est sigma=est);
run;

(f) Assume that in this study 3 observations were collected for each combination and each value in the above table was the average of 3 replicates. A two-way ANOVA model with interaction is fitted and the MSE is given. Complete the following ANOVA table and draw conclusions.

Source       DF   SS        MS        F-value    P-value
Brand         3   25938     8646      [omitted]  <0.001
Surface       2   5269.5    2634.75   [omitted]  <0.01
Interaction   6   [omitted] [omitted] [omitted]  >0.1
Error        24

Both brand and surface have a significant effect on the lifetime, but their interaction does not.

Effect estimates (signs recovered from the sum-to-zero constraints where possible; several interaction values and signs were lost in transcription):
μ = 692
τ1 = 46, τ2 = −10, τ3 = −21, τ4 = −15
β1 = 13.25, β2 = 2.75, β3 = −16
(τβ)11 = 9.25, (τβ)12 = [lost], (τβ)13 = 2, (τβ)21 = 9.75, (τβ)22 = 8.75, (τβ)23 = 1, (τβ)31 = [lost], (τβ)32 = 1.75, (τβ)33 = 10, (τβ)41 = [lost], (τβ)42 = 0.75, (τβ)43 = 13
SS_brand = 3·3·(46² + 10² + 21² + 15²) = 25,938
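The SS_brand arithmetic above, and the SS_surface value computed next from the surface effects, can be checked in a few lines. This is a sketch: the signs of the effect estimates are taken from the sum-to-zero constraints, an assumption since the transcription lost them (the sums of squares depend only on the magnitudes).

```python
# a = 4 brands, b = 3 surfaces, n = 3 replicates per cell.
a, b, n = 4, 3, 3
brand_effects = [46, -10, -21, -15]     # signs chosen so they sum to zero
surface_effects = [13.25, 2.75, -16.0]  # likewise sum to zero

SS_brand = b * n * sum(t**2 for t in brand_effects)
SS_surface = a * n * sum(s**2 for s in surface_effects)
print(SS_brand, SS_surface)  # → 25938 5269.5
```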

SS_surface = 4·3·(13.25² + 2.75² + 16²) = 5,269.5
SS_brand×surface = 3·(9.25² + [lost]² + 2² + 9.75² + 8.75² + [lost]² + 1.75² + 10² + [lost]² + 0.75² + 13²) = [lost]

Or use the totals instead: if the table entries are average lifetimes, the cell totals are recovered by multiplying by 3, i.e., y_ij· = 3·ȳ_ij·, and similarly the listed averages recover the marginal totals. Then with a = 4, b = 3, n = 3:
SS_brand = Σ_i y_i··²/(bn) − y···²/(abn) = 25,938
SS_surface = Σ_j y_·j·²/(an) − y···²/(abn)
SS_brand×surface = Σ_ij y_ij·²/n − y···²/(abn) − SS_brand − SS_surface
SS_E = MSE·ab(n − 1)
[most intermediate totals were lost in transcription]

4. In a study of carbohydrate uptake (Y) as a function of other factors in male diabetics, observations were taken on Y, Age (x1), Weight (x2), and Dietary Protein (x3).

Analyze these data to determine which (if any) of the predictor variables (including any appropriate interactions) appear to significantly affect carbohydrate uptake. Throughout, set α = 0.05, but for simplicity do not employ any adjustments for multiplicity/multiple inferences when assessing the effects of the predictor variables. Remember to assess the quality of the fit via standard diagnostics. Attach supporting components of your computer code. Report your findings. The data are found in the file diet.csv.

To start, always plot the data! Sample R code:
diet.df = read.csv( file.choose() )
attach( diet.df )
Y = Y; X1 = Age; X2 = Weight; X3 = Dietary.Protein
pairs( cbind(Y,X1,X2,X3), pch=19 )

No disturbing patterns are seen in the scatterplot matrix. Now fit the model; sample R code follows (notice the use of centered predictors to properly accommodate the higher-order interaction terms):
x1 = Age - mean(Age); x2 = Weight - mean(Weight)
x3 = Dietary.Protein - mean(Dietary.Protein)
diet.lm = lm( Y ~ x1*x2*x3 )
anova( diet.lm )
This yields the following ANOVA table (output edited):

Analysis of Variance Table
Response: Y
Terms: x1, x2, x3, x1:x2, x1:x3, x2:x3, x1:x2:x3, Residuals, each with Df, Sum Sq, Mean Sq, F value, Pr(>F) [numeric output omitted]

From the ANOVA table, the sequential sums of squares, read from the bottom up (where they match the partial SS), show no significant interaction of any type (pointwise, at the 5% level). Formally, we test this via:
anova( lm(Y~x1+x2+x3), diet.lm )
producing
Model 1: Y ~ x1 + x2 + x3
Model 2: Y ~ x1 * x2 * x3
[Res.Df, RSS, Df, Sum of Sq, F, Pr(>F) output omitted]
The P-value for testing all four interactions satisfies P > 0.05 = α. Again, no interactions are significant. Move now to a reduced model with only main-effect terms (so return to the original, uncentered predictor variables):
dietrm.lm = lm( Y~Age+Weight+Dietary.Protein ); anova( dietrm.lm )
Analysis of Variance Table
Response: Y
Terms: Age, Weight, Dietary.Protein, Residuals [numeric output omitted]
Examining the partial SS shows that Protein is significant at the (pointwise) 5% level. To study the other terms we can either (i) rearrange the sequential order of the reduced-model ANOVA to isolate Weight and then Age, and test their partial SS contributions, or (ii) since each is a 1 d.f. test, just examine the t-tests assessing each β-coefficient pointwise. The latter approach is faster:
summary( dietrm.lm )

producing (output edited)
Call: lm(formula = Y ~ Age + Weight + Dietary.Protein)
Coefficients: (Intercept), Age, Weight, Dietary.Protein, each with Estimate, Std. Error, t value, Pr(>|t|) [numeric output omitted]
Residual standard error on 16 degrees of freedom; Multiple R-squared: 0.515 [remaining summary values omitted]

We see Age is insignificant with P > 0.05, but Weight is significant with P < 0.05, each at the (pointwise) 5% level. Thus a final reduced model retains only Weight and Protein:
dietfinal.lm = lm( Y~Weight+Dietary.Protein ); summary( dietfinal.lm )
Call: lm(formula = Y ~ Weight + Dietary.Protein)
Coefficients: (Intercept), Weight, Dietary.Protein [numeric output omitted]; residual standard error on 17 degrees of freedom.

For diagnostic quality assessment:
(i) Check VIFs for multicollinearity between Weight and Dietary.Protein:
library( car )
vif( dietfinal.lm ); mean(vif( dietfinal.lm ))
Since both VIFs are below 10 and their mean is well below 6, no concerns with multicollinearity are evident.
(ii) Check the normal Q-Q plot; sample R code
qqnorm( resid(dietfinal.lm), main=NULL, pch=19 )
qqline( resid(dietfinal.lm) )
produces the following graphic (no substantive concerns are evidenced).

(iii) Studentized residual plot (with outlier screen): sample R code is
n = length(Y); p = length( coef(dietfinal.lm) )
tcrit = qt( 1-.5*(.05/n), n-p-1 )
plot( rstudent(dietfinal.lm) ~ fitted(dietfinal.lm), pch=19,
      ylim=c(-ceiling(tcrit),ceiling(tcrit)) )
abline( h=0 )
abline( h=tcrit, lty=2 ); abline( h=-tcrit, lty=2 )
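The quantities behind this screen (leverages and internally studentized residuals) can also be sketched in Python. This is a sketch on synthetic data: the diet.csv values are not reproduced here, and the design matrix below is hypothetical with the same n = 20 and p = 3 as the final model.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
h = np.diag(H)                         # leverages h_ii
e = y - H @ y                          # residuals
s2 = e @ e / (n - p)                   # MSE
r = e / np.sqrt(s2 * (1 - h))          # internally studentized residuals

# The leverages sum to p (trace of the hat matrix), which is what makes
# 2p/n a natural "twice the average leverage" screening threshold.
print(round(h.sum(), 6), 2 * p / n)  # → 3.0 0.3
```

The check that the leverages sum to p underlies the 2p/n high-leverage rule applied to the diet data below.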

From the residual plot, no troublesome patterns are seen, and no outliers are observed to extend past the screening limits ±t(1 − 0.05/(2n); n − p − 1).
(iv) Influence measures: sample R code is
influence.measures( dietfinal.lm )
which produces output (edited) with columns dfb.1_, dfb.Wght, dfb.Dt.P, dffit, cov.r, cook.d, hat, and an "inf" flag marking rows with * [numeric output omitted].

We see observations at i = 4 and i = 12 are marked for further study: at i = 4 and i = 12 the hat matrix diagonals h_ii exceed 2p/n = 0.3, indicating high leverage at these points. At i = 4 the value of DFFITS exceeds 1 in absolute value, so this point also exhibits high influence. The Cook's distance D_i values are available as the sixth column of the influence.measures object, so we can check their associated F-probability values via
Di = influence.measures(dietfinal.lm)$infmat[,6]
which( pf(Di, df1=p, df2=n-p) > 0.5 )
the result of which is null. Thus no influence is seen on the Cook's distance metric. Lastly, no DFBETAS values for the Weight or Dietary.Protein coefficients exceed 1 in absolute value, so no influence is seen on that measure.

5. A large dataset of n = 1030 samples of concrete involving a total of p − 1 = 8 predictor variables was collected:
x1 = Age, x2 = Cement, x3 = Furnace Slag, x4 = Superplasticizer, x5 = Water, x6 = Fly ash, x7 = Coarse Aggregate, x8 = Fine Aggregate, along with Y = Compressive Strength.
The data appear in the file concrete.csv. Consider a multiple linear regression (MLR) model for these data, with E[Y] = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8. Conduct a variable selection search to identify a possible reduced model among the eight predictor variables with this data set. Employ backward elimination and take minimum BIC as your selection criterion. Attach supporting components of your computer code. What is the recommended set of variables for further study?

Begin by loading the data and creating the X variables (the response variable Y is Compressive Strength):

concrete.df = read.csv(file.choose())
attach( concrete.df )
Y = Y
x1 = age
x2 = cement
x3 = slag
x4 = superplasticizer
x5 = water
x6 = fly.ash
x7 = coarse.aggregate
x8 = fine.aggregate

Always plot the data! The command
pairs( concrete.df )
produces a scatterplot matrix, in which a number of interesting patterns appear. None, however, is grossly disturbing at face value.

Next, build the regression fit and apply backward elimination with BIC control:
library( leaps )
cement.lm = lm( Y ~ x1+x2+x3+x4+x5+x6+x7+x8 )
n = length(Y)
step( cement.lm, direction="backward", k=log(n) )  #BIC
This produces [output edited -- note that R labels the criterion "AIC" in the step() output, but the choice k=log(n) makes it the BIC]:
Start: AIC=

Y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8
(candidate single-term deletions with Df, Sum of Sq, RSS, AIC; the best deletion is x8 [numeric output omitted])

Step:
Y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7
(the best deletion is now x7 [numeric output omitted])

Step:
Y ~ x1 + x2 + x3 + x4 + x5 + x6
(<none> is now the minimum-BIC choice, so elimination stops)

Call: lm(formula = Y ~ x1 + x2 + x3 + x4 + x5 + x6)
Coefficients: (Intercept), x1, x2, x3, x4, x5, x6 [numeric values omitted]

We see that after two backward steps, a reduced model with only the first six predictors
x1 = age, x2 = cement, x3 = slag, x4 = superplasticizer, x5 = water, x6 = fly.ash
is recommended for further study.
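The backward-elimination-by-BIC logic that step() performs can be sketched without R. This is a sketch on synthetic data: the data, the number of candidate predictors, and the true coefficients below are hypothetical, and the BIC is the usual Gaussian-likelihood version up to additive constants.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
# Only columns 0 and 1 truly matter in this toy example.
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

def bic(cols):
    # BIC up to constants: n*log(RSS/n) + k*log(n), k = #fitted coefficients.
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    return n * np.log(rss / n) + Z.shape[1] * np.log(n)

active = [0, 1, 2, 3]
while active:
    # Try every single-term deletion and keep the best one, if it helps.
    drops = [(bic([c for c in active if c != j]), j) for j in active]
    best_bic, best_j = min(drops)
    if best_bic < bic(active):
        active.remove(best_j)
    else:
        break
print(sorted(active))
```

At each pass the single deletion that lowers BIC the most is taken; elimination stops when no deletion improves BIC, mirroring the step() output where <none> rises to the top of the table.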

6. Consider the simple linear regression model: Yi ~ indep. N(β0 + β1Xi, σ²), i = 1, ..., n, where in particular it is known that β0 = 1, so that E[Y] = 1 + β1X. Suppose interest exists in estimating the X value at which E[Y] = 0. Let this target parameter be ξ.

a) Find ξ as a function of β1. Also find the maximum likelihood estimator for ξ. Call this ξ̂.
This is essentially a one-parameter inverse regression problem. We have E[Y] = 1 + β1X. Clearly, at E[Y] = 0 we have 0 = 1 + β1X, so solving for X produces ξ = −1/β1. To find the MLE ξ̂, appeal to ML invariance and first find β̂1. The fastest way to do so is to recognize that if E[Y] = 1 + β1X, then E[Y − 1] = β1X. That is, we essentially regress the new response variable (Yi − 1) against Xi through the origin! Referring to the various equations in Sec. 4.4 of Kutner et al., we find that the least squares estimator for β1 is β̂1 = Σi Xi(Yi − 1) / Σi Xi². Under the homogeneous-variance, normal-parent assumption here, this estimator is identical to the MLE, so take ξ̂ = −1/β̂1 = −Σi Xi² / Σi Xi(Yi − 1).

b) Recall from Casella & Berger that the Delta Method can be used to determine the asymptotic features of a function of random variables. In particular, for a random variable U and a differentiable function g(u), where E[U] = θ, a first-order approximation to E[g(U)] is
E[g(U)] ≈ g(θ) + {∂g(θ)/∂θ}E(U − θ).
Use this to find a first-order approximation for E[ξ̂].
Let g(β1) = ξ = −1/β1. We know that the MLE for β1 is unbiased, so that E[β̂1] = β1. Then from the Delta Method we see
E[ξ̂] = E[−1/β̂1] ≈ g(β1) + {∂g(β1)/∂β1}E(β̂1 − β1) = −1/β1 + {∂g(β1)/∂β1}(0) = −1/β1 = ξ.

c) In part (b), a second-order approximation to E[g(U)] is also available from Casella & Berger's book:
E[g(U)] ≈ g(θ) + {∂g(θ)/∂θ}E(U − θ) + ½{∂²g(θ)/∂θ²}E[(U − θ)²].
Use this to find a second-order approximation for E[ξ̂].
Again, let g(β1) = ξ = −1/β1. We know that the MLE for β1 is unbiased, so that E[β̂1] = β1. Thus for the second-order Delta Method approximation we have E(β̂1 − β1) = 0 and E[(β̂1 − β1)²] = Var[β̂1]. This latter quantity is
Var[β̂1] = Var[Σi Xi(Yi − 1)/Σj Xj²] = Var[Σi Xi(Yi − 1)]/(Σj Xj²)²

= Σi Xi² Var[Yi − 1]/(Σj Xj²)² = Σi Xi² Var[Yi]/(Σj Xj²)² = σ² Σi Xi²/(Σj Xj²)² = σ²/Σj Xj².

Collecting all this together, and using ∂²g(β1)/∂β1² = −2/β1³, yields
E[ξ̂] = E[−1/β̂1] ≈ g(β1) + {∂g(β1)/∂β1}(0) + ½{∂²g(β1)/∂β1²}Var[β̂1]
= −1/β1 − σ²/(β1³ Σj Xj²)
= ξ + ξ³σ²/Σj Xj².
(We see that to second order, a bias exists in the point estimator. However, it can in fact be shown that E[ξ̂] does not exist, as E[|ξ̂|] diverges. Thus, one must always be careful with these sorts of approximate expansions.)
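The closed form for ξ̂ in part (a) can be sanity-checked numerically. This is a sketch with synthetic data: β1 = 2 is an assumed true value (so ξ = −1/2), and the design points are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(1, 5, 25)
y = 1 + 2 * x + 0.1 * rng.normal(size=25)  # beta0 = 1 known, beta1 = 2

# Regress (y - 1) on x through the origin: beta1_hat = sum(x*(y-1))/sum(x^2),
# and by ML invariance xi_hat = -1/beta1_hat = -sum(x^2)/sum(x*(y-1)).
beta1_hat = np.sum(x * (y - 1)) / np.sum(x ** 2)
xi_hat = -1 / beta1_hat

# The two algebraic forms of xi_hat agree.
assert np.isclose(xi_hat, -np.sum(x ** 2) / np.sum(x * (y - 1)))
print(beta1_hat, xi_hat)
```

With this low noise level the estimate lands very close to ξ = −0.5; the divergence of E[ξ̂] noted above only bites when β̂1 has appreciable probability mass near zero.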


More information

holding all other predictors constant

holding all other predictors constant Multiple Regression Numeric Response variable (y) p Numeric predictor variables (p < n) Model: Y = b 0 + b 1 x 1 + + b p x p + e Partial Regression Coefficients: b i effect (on the mean response) of increasing

More information

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is

More information

STAT 571A Advanced Statistical Regression Analysis. Chapter 8 NOTES Quantitative and Qualitative Predictors for MLR

STAT 571A Advanced Statistical Regression Analysis. Chapter 8 NOTES Quantitative and Qualitative Predictors for MLR STAT 571A Advanced Statistical Regression Analysis Chapter 8 NOTES Quantitative and Qualitative Predictors for MLR 2015 University of Arizona Statistics GIDP. All rights reserved, except where previous

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 7, 2017 Figure captions are below the Figures they refer to. LowCalorie LowFat LowCarbo Control 8 2 3 2 9 4 5 2 6 3 4-1 7 5 2 0 3 1 3 3 Figure

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

No other aids are allowed. For example you are not allowed to have any other textbook or past exams.

No other aids are allowed. For example you are not allowed to have any other textbook or past exams. UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In

More information

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression

More information

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis STAT 3900/4950 MIDTERM TWO Name: Spring, 205 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis Instructions: You may use your books, notes, and SPSS/SAS. NO

More information

Regression Diagnostics

Regression Diagnostics Diag 1 / 78 Regression Diagnostics Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2015 Diag 2 / 78 Outline 1 Introduction 2

More information

STATISTICS 174: APPLIED STATISTICS TAKE-HOME FINAL EXAM POSTED ON WEBPAGE: 6:00 pm, DECEMBER 6, 2004 HAND IN BY: 6:00 pm, DECEMBER 7, 2004 This is a

STATISTICS 174: APPLIED STATISTICS TAKE-HOME FINAL EXAM POSTED ON WEBPAGE: 6:00 pm, DECEMBER 6, 2004 HAND IN BY: 6:00 pm, DECEMBER 7, 2004 This is a STATISTICS 174: APPLIED STATISTICS TAKE-HOME FINAL EXAM POSTED ON WEBPAGE: 6:00 pm, DECEMBER 6, 2004 HAND IN BY: 6:00 pm, DECEMBER 7, 2004 This is a take-home exam. You are expected to work on it by yourself

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3 Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3 Fall, 2013 Page 1 Tensile Strength Experiment Investigate the tensile strength of a new synthetic fiber. The factor is the

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model Topic 23 - Unequal Replication Data Model Outline - Fall 2013 Parameter Estimates Inference Topic 23 2 Example Page 954 Data for Two Factor ANOVA Y is the response variable Factor A has levels i = 1, 2,...,

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 8, 2014 List of Figures in this document by page: List of Figures 1 Popcorn data............................. 2 2 MDs by city, with normal quantile

More information

5.3 Three-Stage Nested Design Example

5.3 Three-Stage Nested Design Example 5.3 Three-Stage Nested Design Example A researcher designs an experiment to study the of a metal alloy. A three-stage nested design was conducted that included Two alloy chemistry compositions. Three ovens

More information

1 Introduction 1. 2 The Multiple Regression Model 1

1 Introduction 1. 2 The Multiple Regression Model 1 Multiple Linear Regression Contents 1 Introduction 1 2 The Multiple Regression Model 1 3 Setting Up a Multiple Regression Model 2 3.1 Introduction.............................. 2 3.2 Significance Tests

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

Chapter 1 Linear Regression with One Predictor

Chapter 1 Linear Regression with One Predictor STAT 525 FALL 2018 Chapter 1 Linear Regression with One Predictor Professor Min Zhang Goals of Regression Analysis Serve three purposes Describes an association between X and Y In some applications, the

More information

Chapter 5 Introduction to Factorial Designs Solutions

Chapter 5 Introduction to Factorial Designs Solutions Solutions from Montgomery, D. C. (1) Design and Analysis of Experiments, Wiley, NY Chapter 5 Introduction to Factorial Designs Solutions 5.1. The following output was obtained from a computer program that

More information

Lecture 10: Experiments with Random Effects

Lecture 10: Experiments with Random Effects Lecture 10: Experiments with Random Effects Montgomery, Chapter 13 1 Lecture 10 Page 1 Example 1 A textile company weaves a fabric on a large number of looms. It would like the looms to be homogeneous

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Multiple Linear Regression. Chapter 12

Multiple Linear Regression. Chapter 12 13 Multiple Linear Regression Chapter 12 Multiple Regression Analysis Definition The multiple regression model equation is Y = b 0 + b 1 x 1 + b 2 x 2 +... + b p x p + ε where E(ε) = 0 and Var(ε) = s 2.

More information

Fractional Factorial Designs

Fractional Factorial Designs Fractional Factorial Designs ST 516 Each replicate of a 2 k design requires 2 k runs. E.g. 64 runs for k = 6, or 1024 runs for k = 10. When this is infeasible, we use a fraction of the runs. As a result,

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

Suppose we needed four batches of formaldehyde, and coulddoonly4runsperbatch. Thisisthena2 4 factorial in 2 2 blocks.

Suppose we needed four batches of formaldehyde, and coulddoonly4runsperbatch. Thisisthena2 4 factorial in 2 2 blocks. 58 2. 2 factorials in 2 blocks Suppose we needed four batches of formaldehyde, and coulddoonly4runsperbatch. Thisisthena2 4 factorial in 2 2 blocks. Some more algebra: If two effects are confounded with

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Topic 20: Single Factor Analysis of Variance

Topic 20: Single Factor Analysis of Variance Topic 20: Single Factor Analysis of Variance Outline Single factor Analysis of Variance One set of treatments Cell means model Factor effects model Link to linear regression using indicator explanatory

More information

Stat 500 Midterm 2 12 November 2009 page 0 of 11

Stat 500 Midterm 2 12 November 2009 page 0 of 11 Stat 500 Midterm 2 12 November 2009 page 0 of 11 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. Do not start until I tell you to. The exam is closed book, closed

More information

1) Answer the following questions as true (T) or false (F) by circling the appropriate letter.

1) Answer the following questions as true (T) or false (F) by circling the appropriate letter. 1) Answer the following questions as true (T) or false (F) by circling the appropriate letter. T F T F T F a) Variance estimates should always be positive, but covariance estimates can be either positive

More information

STAT22200 Spring 2014 Chapter 14

STAT22200 Spring 2014 Chapter 14 STAT22200 Spring 2014 Chapter 14 Yibi Huang May 27, 2014 Chapter 14 Incomplete Block Designs 14.1 Balanced Incomplete Block Designs (BIBD) Chapter 14-1 Incomplete Block Designs A Brief Introduction to

More information

Overview Scatter Plot Example

Overview Scatter Plot Example Overview Topic 22 - Linear Regression and Correlation STAT 5 Professor Bruce Craig Consider one population but two variables For each sampling unit observe X and Y Assume linear relationship between variables

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Statistics for exp. medical researchers Regression and Correlation

Statistics for exp. medical researchers Regression and Correlation Faculty of Health Sciences Regression analysis Statistics for exp. medical researchers Regression and Correlation Lene Theil Skovgaard Sept. 28, 2015 Linear regression, Estimation and Testing Confidence

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester

More information

Lecture 4. Random Effects in Completely Randomized Design

Lecture 4. Random Effects in Completely Randomized Design Lecture 4. Random Effects in Completely Randomized Design Montgomery: 3.9, 13.1 and 13.7 1 Lecture 4 Page 1 Random Effects vs Fixed Effects Consider factor with numerous possible levels Want to draw inference

More information

Chapter 6 The 2 k Factorial Design Solutions

Chapter 6 The 2 k Factorial Design Solutions Solutions from Montgomery, D. C. () Design and Analysis of Experiments, Wiley, NY Chapter 6 The k Factorial Design Solutions 6.. An engineer is interested in the effects of cutting speed (A), tool geometry

More information

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA s:5 Applied Linear Regression Chapter 8: ANOVA Two-way ANOVA Used to compare populations means when the populations are classified by two factors (or categorical variables) For example sex and occupation

More information

Comparison of a Population Means

Comparison of a Population Means Analysis of Variance Interested in comparing Several treatments Several levels of one treatment Comparison of a Population Means Could do numerous two-sample t-tests but... ANOVA provides method of joint

More information

Institutionen för matematik och matematisk statistik Umeå universitet November 7, Inlämningsuppgift 3. Mariam Shirdel

Institutionen för matematik och matematisk statistik Umeå universitet November 7, Inlämningsuppgift 3. Mariam Shirdel Institutionen för matematik och matematisk statistik Umeå universitet November 7, 2011 Inlämningsuppgift 3 Mariam Shirdel (mash0007@student.umu.se) Kvalitetsteknik och försöksplanering, 7.5 hp 1 Uppgift

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Chapter 10 Building the Regression Model II: Diagnostics

Chapter 10 Building the Regression Model II: Diagnostics Chapter 10 Building the Regression Model II: Diagnostics 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 41 10.1 Model Adequacy for a Predictor Variable-Added

More information

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3 Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3 Page 1 Tensile Strength Experiment Investigate the tensile strength of a new synthetic fiber. The factor is the weight percent

More information

Regression Model Building

Regression Model Building Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated

More information

Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes)

Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes) Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes) Asheber Abebe Discrete and Statistical Sciences Auburn University Contents 1 Completely Randomized Design

More information

Lecture 4. Checking Model Adequacy

Lecture 4. Checking Model Adequacy Lecture 4. Checking Model Adequacy Montgomery: 3-4, 15-1.1 Page 1 Model Checking and Diagnostics Model Assumptions 1 Model is correct 2 Independent observations 3 Errors normally distributed 4 Constant

More information

Week 7 Multiple factors. Ch , Some miscellaneous parts

Week 7 Multiple factors. Ch , Some miscellaneous parts Week 7 Multiple factors Ch. 18-19, Some miscellaneous parts Multiple Factors Most experiments will involve multiple factors, some of which will be nuisance variables Dealing with these factors requires

More information

Lecture 1 Linear Regression with One Predictor Variable.p2

Lecture 1 Linear Regression with One Predictor Variable.p2 Lecture Linear Regression with One Predictor Variablep - Basics - Meaning of regression parameters p - β - the slope of the regression line -it indicates the change in mean of the probability distn of

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

Diagnostics and Transformations Part 2

Diagnostics and Transformations Part 2 Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics

More information

EXST 7015 Fall 2014 Lab 08: Polynomial Regression

EXST 7015 Fall 2014 Lab 08: Polynomial Regression EXST 7015 Fall 2014 Lab 08: Polynomial Regression OBJECTIVES Polynomial regression is a statistical modeling technique to fit the curvilinear data that either shows a maximum or a minimum in the curve,

More information

SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester

SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: "Statistics Tables" by H.R. Neave PAS 371 SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester 2008 9 Linear

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club BE640 Intermediate Biostatistics 2. Regression and Correlation Simple Linear Regression Software: SAS Emergency Calls to the New York Auto Club Source: Chatterjee, S; Handcock MS and Simonoff JS A Casebook

More information

One-way ANOVA Model Assumptions

One-way ANOVA Model Assumptions One-way ANOVA Model Assumptions STAT:5201 Week 4: Lecture 1 1 / 31 One-way ANOVA: Model Assumptions Consider the single factor model: Y ij = µ + α }{{} i ij iid with ɛ ij N(0, σ 2 ) mean structure random

More information