Statistics GIDP Ph.D. Qualifying Exam Methodology


Statistics GIDP Ph.D. Qualifying Exam, Methodology
May 28, 2015, 9:00am-1:00pm

Instructions: Provide answers on the supplied pads of paper; write on only one side of each sheet. Complete exactly 2 of the first 3 problems, and 2 of the last 3 problems. Turn in only those sheets you wish to have graded. You may use the computer and/or a calculator; any statistical tables that you may need are also provided. Stay calm and do your best; good luck.

1. A dataset is observed from an experiment comparing three different brands of pen and three different wash treatments with respect to their ability to remove marks from a particular type of fabric. There are four replications at each combination of brand and treatment. The observation averages within each combination (the cell means) are given in the following table.

1) Consider the means model y_ijk = μ_ij + ε_ijk, where ε_ijk ~ N(0, σ²); the MSE is 0.2438. Compute the estimates of the μ_ij along with their standard errors.

Each estimate is the corresponding cell mean:

  μ_11 = 5.11, μ_12 = 6.65, μ_13 = 6.51
  μ_21 = 6.44, μ_22 = 7.92, μ_23 = 7.24
  μ_31 = 7.01, μ_32 = 8.15, μ_33 = 7.77

with common standard error se = sqrt(MSE/n) = sqrt(0.2438/4) ≈ 0.247.

2) Propose an effects model with the interaction effect included; compute the estimates of its parameters.

The effects model is

  y_ijk = μ + α_i + β_j + (αβ)_ij + ε_ijk,  i = 1,2,3, j = 1,2,3, k = 1,...,4,

with assumptions Σ_i α_i = 0, Σ_j β_j = 0, Σ_i (αβ)_ij = Σ_j (αβ)_ij = 0, and ε_ijk ~ N(0, σ²); the error variance is estimated by σ² = MSE = 0.2438.

The parameter estimates are:

  μ = 6.98
  α_1 = 6.09 − 6.98 = −0.89, α_2 = 7.20 − 6.98 = 0.22, α_3 = 7.64 − 6.98 = 0.66
  β_1 = 6.19 − 6.98 = −0.79, β_2 = 7.57 − 6.98 = 0.59, β_3 = 7.17 − 6.98 = 0.19

and, from (αβ)_ij = μ_ij − μ − α_i − β_j,

  (αβ)_11 = −0.19, (αβ)_12 = −0.03, (αβ)_13 = 0.23
  (αβ)_21 = 0.03,  (αβ)_22 = 0.13,  (αβ)_23 = −0.15
  (αβ)_31 = 0.16,  (αβ)_32 = −0.08, (αβ)_33 = −0.06

3) Compute standard errors of the parameter estimates for the main effects in part 2).

Since α_1 = ȳ_1.. − ȳ... = ȳ_1.. − (ȳ_1.. + ȳ_2.. + ȳ_3..)/3 = (2/3)ȳ_1.. − (1/3)ȳ_2.. − (1/3)ȳ_3..,

  Var(α_1) = (4/9 + 1/9 + 1/9) · Var(ȳ_1..) = (2/3) · σ²/(3·4) = σ²/18.

Since a = b = 3, the standard error of each main-effect estimate is sqrt(MSE/18) = sqrt(0.2438/18) ≈ 0.116.

4) Complete the following ANOVA table.

  Source        DF    SS      MS       F
  Pen            2    15.37    7.68    31.5
  Treatment      2    12.23    6.11    25.1
  Interaction    4     0.65    0.16     0.664
  Error         27     6.58    0.2438
  Total         35    34.83

where SS_Pen = bn Σ_i (ȳ_i.. − ȳ...)², SS_Treatment = an Σ_j (ȳ_.j. − ȳ...)², SS_Interaction = n Σ_i Σ_j (ȳ_ij. − ȳ_i.. − ȳ_.j. + ȳ...)², and SS_Error = 27 · MSE = 27 × 0.2438 ≈ 6.58. (The SS entries here are reconstructed from the rounded cell means together with the reported MSE and interaction F-statistic.)

5) (3pts) Compute R-square and test significance of the interaction effect.

R-square = SS_Model/SS_Total = (15.37 + 12.23 + 0.65)/34.83 ≈ 0.81. The p-value for the interaction is P{F(4, 27) > 0.664} > 0.05, so the interaction effect is not significant.

2. An engineer suspects that the surface finish of a metal part is influenced by the feed rate and the depth of cut. She selects three feed rates and three depths of cut. However, only 9 runs can be made in one day, so she runs a complete replicate of the design on each day; the order of the feed-rate/depth-of-cut combinations within a day is chosen randomly. The data are shown in the following table (dataset Surface.csv is provided). Assume that the days are blocks.

[Data table: surface finish for each Feed rate × Depth combination on Days 1 and 2; see Surface.csv.]

(a) (3 pts) What design is this?

A factorial design run in blocks: a randomized complete block design with a 3 × 3 factorial treatment structure.
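As a numerical cross-check on the hand computations in Problem 1, the effect estimates, standard errors, and sums of squares can be recomputed from the reported cell means and MSE. A sketch in Python (rather than the exam's R/SAS); because the cell means are rounded to two decimals, the computed SS values can differ slightly from answers based on the exact data:

```python
import math

# Cell means ybar_ij. for the 3 pens x 3 treatments (n = 4 reps per cell)
cells = [[5.11, 6.65, 6.51],
         [6.44, 7.92, 7.24],
         [7.01, 8.15, 7.77]]
a = b = 3
n = 4
MSE = 0.2438

grand = sum(sum(row) for row in cells) / (a * b)
row_means = [sum(row) / b for row in cells]
col_means = [sum(cells[i][j] for i in range(a)) / a for j in range(b)]

alpha = [m - grand for m in row_means]            # pen (row) main effects
beta = [m - grand for m in col_means]             # treatment (column) main effects
ab = [[cells[i][j] - grand - alpha[i] - beta[j]   # interaction effects
       for j in range(b)] for i in range(a)]

se_cell = math.sqrt(MSE / n)    # s.e. of each cell-mean estimate
se_main = math.sqrt(MSE / 18)   # s.e. of each main-effect estimate

SS_pen = b * n * sum(ai ** 2 for ai in alpha)
SS_trt = a * n * sum(bj ** 2 for bj in beta)
SS_int = n * sum(ab[i][j] ** 2 for i in range(a) for j in range(b))
SS_err = MSE * a * b * (n - 1)                    # 27 error df

print(round(grand, 2), [round(v, 2) for v in alpha], [round(v, 2) for v in beta])
print(round(se_cell, 3), round(se_main, 3))
print(round(SS_pen, 2), round(SS_trt, 2), round(SS_int, 2), round(SS_err, 2))
```

The zero-sum constraints Σ_i α_i = Σ_j β_j = 0 give a quick sanity check on the arithmetic.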

(b) (6 pts) State the statistical model and the corresponding assumptions.

  y_ijk = μ + τ_i + β_j + (τβ)_ij + δ_k + ε_ijk,  i = 1,2,3, j = 1,2,3, k = 1,2,

with Σ_i τ_i = 0, Σ_j β_j = 0, Σ_i (τβ)_ij = Σ_j (τβ)_ij = 0, Σ_k δ_k = 0, and ε_ijk ~ N(0, σ²); day (δ_k) is the block factor.

(c) Make conclusions at false positive rate α = 0.05 and check model adequacy.

The ANOVA table is shown below. Both the Depth and Feed factors are significant at α = 0.05. There is no unusual pattern detected in the QQ normality plot or the residual plot.

  Source       DF   Type III SS   Mean Square   F Value   Pr > F
  Day           1
  Depth         2
  Feed          2                                         <.0001
  Depth*Feed    4

(d) What is the difference in randomization for experimental runs between this design and the one in Question 3?

In this design: within a day, the nine combinations of the levels of feed rate and depth of cut are run in a completely random order.

In the design of Q3: within a day, a particular mix is randomly selected and prepared, and then that mix is applied to a panel by the three application methods (mixes act as whole plots, methods as subplots).

(e) Attach SAS/R code

data surf;
  input Day Depth Feed Surface @@;
  datalines;
;
proc glm data=surf;
  class Day Depth Feed;
  model Surface = Day Depth Feed Depth*Feed;
  output out=myresult r=res p=pred;
run;

proc univariate data=myresult normal;
  var res;
  qqplot res / normal(mu=0 sigma=est color=red L=1);
run;

proc sgplot data=myresult;
  scatter x=pred y=res;
  refline 0;
run;

3. An experiment is designed to study pigment dispersion in paint. Four different mixes of a particular pigment are studied. The procedure consists of preparing a particular mix and then applying that mix to a panel by three application methods (brushing, spraying, and rolling). The response measured is the percentage reflectance of the pigment. Three days are required to run the experiment. The data follow (dataset pigment.csv is provided). Assume that mixes and application methods are fixed.

(a) (3 pts) What design is this?

Split-plot design: days are blocks (replicates), mixes are the whole-plot factor, and application methods are the subplot factor.

(b) (6 pts) State the statistical model and the corresponding assumptions.

  y_ijk = μ + γ_i + α_j + (γα)_ij + β_k + (αβ)_jk + ε_ijk,  i = 1,...,3, j = 1,...,4, k = 1,...,3,

with γ_i ~ N(0, σ_γ²), (γα)_ij ~ N(0, σ_γα²), Σ_j α_j = 0, Σ_k β_k = 0, Σ_j (αβ)_jk = Σ_k (αβ)_jk = 0, and ε_ijk ~ N(0, σ²).

(c) Make conclusions and check model adequacy.

The ANOVA results are shown below. Both the Mix and Method factors are significant at α = 0.05, and their interaction is marginally significant (p-value ≈ 0.06). There is no unusual pattern noticed in the QQ normality plot or the residual plot.

  Type 3 Tests of Fixed Effects
  Effect       Num DF   Den DF   F Value   Pr > F
  Mix               3        6             <.0001
  Method            2       16             <.0001
  Mix*Method        6       16              0.06
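The randomization contrast between this split-plot design and the blocked factorial of Question 2 can be made concrete. A hypothetical sketch in Python (the level codes and method names are illustrative, not from the data files):

```python
import random

random.seed(1)
depths = [1, 2, 3]            # illustrative level codes
feeds = [1, 2, 3]
mixes = [1, 2, 3, 4]
methods = ["brush", "spray", "roll"]

# Q2 (factorial in blocks): within each day, ALL nine depth-by-feed
# combinations are run in one completely random order.
day_runs = [(d, f) for d in depths for f in feeds]
random.shuffle(day_runs)

# Q3 (split-plot): within each day, the mixes (whole plots) are run in
# random order, and the three methods (subplots) are randomized
# separately within each mix.
whole_plots = mixes[:]
random.shuffle(whole_plots)
split_runs = []
for m in whole_plots:
    subplots = methods[:]
    random.shuffle(subplots)
    split_runs.extend((m, meth) for meth in subplots)

print(day_runs)
print(split_runs)
```

The key difference visible in the output: in the split-plot schedule, the three runs for a given mix always appear consecutively, whereas the blocked factorial imposes no such grouping.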

(d) Now assume the application methods are random while the other terms are kept the same as before. State the statistical model and the corresponding assumptions using the unrestricted method; reanalyze the data.

  y_ijk = μ + γ_i + α_j + (γα)_ij + β_k + (αβ)_jk + ε_ijk,  i = 1,...,3, j = 1,...,4, k = 1,...,3,

with γ_i ~ N(0, σ_γ²), (γα)_ij ~ N(0, σ_γα²), Σ_j α_j = 0, β_k ~ N(0, σ_β²), (αβ)_jk ~ N(0, σ_αβ²), and ε_ijk ~ N(0, σ²).

The ANOVA analysis shows that only the fixed effect Mix is significant:

  Type 3 Tests of Fixed Effects
  Effect   Num DF   Den DF   F Value   Pr > F
  Mix           3        6             <.0001

The covariance-component estimates for all random effects are shown below; no effect is significant at α = 0.05. From the QQ plot and residual plot, all assumptions are met.

  Covariance Parameter Estimates
  Cov Parm      Estimate   Standard Error   Z Value   Pr > Z   Alpha   Lower   Upper
  Day
  Mix*Day
  Method
  Mix*Method

  Covariance Parameter Estimates (continued)
  Cov Parm   Estimate   Standard Error   Z Value   Pr > Z   Alpha   Lower   Upper
  Residual

And the residual plots: [residual and QQ plots omitted]

(e) Attach SAS/R code

data pigment;
  input Mix Method Day Resp @@;
  datalines;

;
/* proc mixed => Stat model 2: only 5 terms included (the remaining terms are pooled into the random error term) */
proc mixed data=pigment method=type1;
  class Mix Method Day;
  model Resp = Mix Method Mix*Method / outp=predicted;
  random Day Day*Mix;
run;

proc univariate data=predicted normal;
  var Resid;
  qqplot Resid / normal(mu=0 sigma=est color=red L=1);
run;

proc sgplot data=predicted;
  scatter x=Pred y=Resid;
  refline 0;
run;

/* part d) */
proc mixed data=pigment CL covtest;
  class Mix Method Day;
  model Resp = Mix / outp=predicted;
  random Day Day*Mix Method Mix*Method;
run;

proc univariate data=predicted normal;
  var Resid;
  qqplot Resid / normal(mu=0 sigma=est color=red L=1);
run;

proc sgplot data=predicted;
  scatter x=Pred y=Resid;
  refline 0;
run;

4. An intriguing use of loess smoothing for enhancing residual diagnostics employs the method to verify, or perhaps call into question, indications of variance heterogeneity in a residual plot. From a regression fit (of any sort: SLR, MLR, loess, etc.) find the absolute residuals |e_i|, i = 1,...,n. To these, apply a loess smooth against the fitted values Ŷ_i. If the loess curve for the |e_i| exhibits departure from a horizontal line, variance heterogeneity is

indicated/validated. If the smooth appears relatively flat, however, the loess diagnostic suggests that variation is not necessarily heterogeneous.

Apply this strategy to the following data: Y = {career batting average} (a number between 0 and 1, reported to three-digit accuracy) recorded as a function of X = {number of years played} for n = 322 professional baseball players. (The data are found in the file baseball.csv.) Plot the absolute residuals from a regression fit and overlay the loess smooth to determine whether or not the loess smooth suggests possible heterogeneous variation. Use a second-order, robust smooth. Explore the loess fit by varying the smoothing parameter over selected values in the range 0.25 ≤ q ≤ 0.75.

A1. Always plot the data! Sample R code:

baseball.df = read.csv( file.choose() )
attach( baseball.df )
Y = batting.average
X = years
plot( Y ~ X, pch=19 )

The plot indicates an increase in Y = batting average as X = years increases, so consider a simple linear regression (SLR) fit. For the loess smooth, first find the absolute residuals and fitted values from the SLR fit:

absresid = abs( resid(lm( Y~X )) )
Yhat = fitted( lm( Y~X ) )

then apply loess (use a second-order, robust smooth, to allow for full flexibility). Try the default smoothing parameter of q = 0.75:

baseball75.lo = loess( absresid ~ Yhat, span=0.75, degree=2, family='symmetric' )
Ysmooth75 = predict( baseball75.lo, data.frame(Yhat = seq(min(Yhat), max(Yhat), .001)) )

Plot the |e_i| against Ŷ_i and overlay the smooth:

plot( absresid ~ Yhat, xlim=c(.25,.29), ylim=c(0,.11) )
par( new=TRUE )
plot( Ysmooth75 ~ seq(min(Yhat), max(Yhat), .001), type='l', lwd=2,
      xaxt='n', yaxt='n', xlab='', ylab='', xlim=c(.25,.29), ylim=c(0,.11) )

For comparison, the second-order, robust smooth at q = 0.33 gives:

baseball33.lo = loess( absresid ~ Yhat, span=0.33, degree=2, family='symmetric' )
Ysmooth33 = predict( baseball33.lo, data.frame(Yhat = seq(min(Yhat), max(Yhat), .001)) )
plot( absresid ~ Yhat, xlim=c(.25,.29), ylim=c(0,.11) )
par( new=TRUE )
plot( Ysmooth33 ~ seq(min(Yhat), max(Yhat), .001), type='l', lwd=2,
      xaxt='n', yaxt='n', xlab='', ylab='', xlim=c(.25,.29), ylim=c(0,.11) )

which gives a more jagged smoothed curve (as would be expected). Also, the second-order, robust smooth at q = 0.50 yields:

baseball50.lo = loess( absresid ~ Yhat, span=0.50, degree=2, family='symmetric' )
Ysmooth50 = predict( baseball50.lo, data.frame(Yhat = seq(min(Yhat), max(Yhat), .001)) )
plot( absresid ~ Yhat, xlim=c(.25,.29), ylim=c(0,.11) )
par( new=TRUE )
plot( Ysmooth50 ~ seq(min(Yhat), max(Yhat), .001), type='l', lwd=2,
      xaxt='n', yaxt='n', xlab='', ylab='', xlim=c(.25,.29), ylim=c(0,.11) )

which appears less jagged (again, as would be expected) and more similar to the loess curve at q = 0.75. From a broader perspective, all the smoothed loess curves suggest a fairly flat relationship, so the issue of variance heterogeneity may not be critical. (Further investigation would be warranted.)

The use of loess in this fashion is from Cleveland, W. S. (1979), "Robust locally weighted regression and smoothing scatterplots," Journal of the American Statistical Association 74(368), 829-836. The data are from Sec. 3.8 of Friendly, M. (2000), Visualizing Categorical Data. Cary, NC: SAS Institute, Inc.
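The same |residual|-smoothing diagnostic can be sketched outside of R's loess(). Below is a minimal local-linear smoother with tricube weights and span q, applied to synthetic heteroscedastic data; it is a simplified sketch (no robustness iterations, degree 1 rather than 2), not a substitute for the robust loess fits above:

```python
import math
import random

def local_linear_smooth(x, y, q=0.5):
    """Degree-1 local regression with tricube weights over the nearest
    fraction q of the points (no robustness iterations)."""
    n = len(x)
    k = max(2, int(round(q * n)))
    fitted = []
    for x0 in x:
        dist = sorted(abs(xi - x0) for xi in x)
        h = dist[k - 1] or 1e-12            # bandwidth: k-th nearest distance
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        # weighted LS for a + b*(xi - x0); the fitted value at x0 is a
        sw = sum(w)
        sx = sum(wi * (xi - x0) for wi, xi in zip(w, x))
        sxx = sum(wi * (xi - x0) ** 2 for wi, xi in zip(w, x))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sxy = sum(wi * (xi - x0) * yi for wi, xi, yi in zip(w, x, y))
        det = sw * sxx - sx * sx
        a = (sxx * sy - sx * sxy) / det if det else sy / sw
        fitted.append(a)
    return fitted

random.seed(42)
n = 200
x = [i / (n - 1) for i in range(n)]
# variance grows with x: sd = 0.1 + 0.9x (heteroscedastic by construction)
y = [2 + 3 * xi + random.gauss(0, 0.1 + 0.9 * xi) for xi in x]

# SLR fit, then smooth |residuals| against the fitted values
xbar = sum(x) / n
ybar = sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]
absres = [abs(yi - yh) for yi, yh in zip(y, yhat)]
smooth = local_linear_smooth(yhat, absres, q=0.5)

# a rising smooth of |e_i| flags variance heterogeneity
print(round(smooth[10], 3), round(smooth[-10], 3))
```

On these synthetic data the smooth rises with the fitted values, the signature of heterogeneous variance; a flat smooth, as in the baseball data, would suggest the opposite.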

5. Suppose you fit a simple linear regression model to data Y_i ~ indep. N(α + βx_i, σ²), i = 1,...,n.

(a) Let β̂ be the usual least squares (LS) estimator of β. State the distribution of β̂.

(b) Let S² be the MSE = Σ_{i=1}^n (Y_i − Ŷ_i)²/(n − 2), where Ŷ_i = α̂ + β̂x_i and α̂ is the LS estimator for α. Recall that S² is known to be an unbiased estimator for σ². State a result involving the χ² distribution that involves S² and σ². What one important statistical relation (in terms of probability features) exists between this and the result you state in part (a)?

(c) Find an unbiased estimator for the ratio β/σ.

A2: For simplicity, let d = n − 2 and v = 1/Σ_{i=1}^n (x_i − x̄)².

(a) β̂ ~ N(β, σ²v). Notice that Z = (β̂ − β)/(σ√v) ~ N(0, 1).

(b) W = S²d/σ² ~ χ²(d), where d = n − 2. This is statistically independent of β̂ (and Z) in part (a).

(c) Given Z = (β̂ − β)/(σ√v) ~ N(0, 1), independent of W = S²d/σ² ~ χ²(d): since

  ∫₀^∞ w^{(d/2)−1} e^{−w/2} dw / (Γ(d/2) 2^{d/2}) = 1,

we know (for a = d/2) that ∫₀^∞ w^{a−1} e^{−w/2} dw = 2^a Γ(a). Thus, e.g.,

  E[W^{−b}] = ∫₀^∞ w^{(d/2)−b−1} e^{−w/2} dw / (Γ(d/2) 2^{d/2})
            = Γ((d/2) − b) 2^{(d/2)−b} / (Γ(d/2) 2^{d/2})
            = 2^{−b} Γ((d/2) − b)/Γ(d/2)

for 0 < b < d/2. In particular,

  E[W^{−1/2}] = E[σ/(S√d)] = Γ((d−1)/2) / (Γ(d/2) √2).

Now, since Z and W are independent,

  T = Z/√(W/d) = [(β̂ − β)/(σ√v)] / (S/σ) = (β̂ − β)/(S√v) ~ t(d),

so for d > 1, E[T] = 0. This can be written as E[(β̂ − β)/(S√v)] = 0. That is,

  E[β̂/(S√v)] = E[β/(S√v)] = (β/√v) E[1/S] = (β/√v) · (√d/σ) · Γ((d−1)/2)/(Γ(d/2) √2).

Now multiply both sides by √v to find

  E[β̂/S] = (β/σ) · √(d/2) · Γ((d−1)/2)/Γ(d/2).

Therefore, an unbiased estimator for β/σ is

  (β̂/S) · [Γ(d/2)/Γ((d−1)/2)] · √(2/d),  where d = n − 2.
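The de-biasing constant just derived can be checked by simulation. A sketch in Python with arbitrary true parameter values (not from any exam dataset): repeatedly simulate the SLR model, compute c·β̂/S with c = Γ(d/2)/Γ((d−1)/2)·√(2/d), and compare the Monte Carlo mean with β/σ:

```python
import math
import random

random.seed(0)
n = 12
d = n - 2
x = list(range(n))
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
alpha_true, beta_true, sigma = 1.0, 1.5, 2.0   # illustrative true values

# unbiasing constant c = Gamma(d/2)/Gamma((d-1)/2) * sqrt(2/d)
c = math.gamma(d / 2) / math.gamma((d - 1) / 2) * math.sqrt(2 / d)

est = []
for _ in range(20000):
    y = [alpha_true + beta_true * xi + random.gauss(0, sigma) for xi in x]
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / d)        # S = sqrt(MSE)
    est.append(c * b1 / s)        # the proposed unbiased estimator of beta/sigma

mc_mean = sum(est) / len(est)
print(round(mc_mean, 3), beta_true / sigma)   # the two should be close
```

Note that the naive estimator β̂/S is biased upward by the factor 1/c > 1, which the simulation also reveals if c is dropped.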

6. In the study of weathering in mountainous ecosystems, it is expected that silicon weathers away in soil as ambient temperature increases. In an experiment to study this, data were recorded on loss of silicon in soil at four independent sites over differing temperature conditions. These were:

[Data table: Temp. (°C + 5) vs. silicon conc. (mg/kg).]

Assume that the observations satisfy Y_ij ~ indep. N(μ{t_i}, σ²), i = 1,...,3, j = 1,...,4, where the t_i are the 3 temperatures under study and μ{t_i} is some function of t_i. Using linear regression methods, find a model that fits these data both as reasonably and as parsimoniously as possible. (This question is purposefully open-ended.) From your fit, perform a test to assess the hypotheses of no effect, vs. some effect, due to temperature change in these sites. Set your false positive error rate to 0.05.

A3. Always plot the data! Sample R code:

silicon.df = read.csv( file.choose() )
attach( silicon.df )
Y = conc
t = temp
plot( Y ~ t, pch=19 )

The plot indicates a decrease in Y = silicon concentration as X = temperature increases, so consider a simple linear regression (SLR) fit and (first) check the residual plot:

siliconslr.lm = lm( Y ~ t )
plot( resid(siliconslr.lm) ~ t, pch=19 ); abline( h=0 )

The residual plot indicates a clear pattern (also evident from a close look at the scatterplot), so an SLR model gives a poor fit. With the observed pattern in the residuals, the obvious thing to try next is a quadratic model:

siliconqr.lm = lm( Y ~ t + I(t^2) )
plot( resid(siliconqr.lm) ~ t, pch=19 ); abline( h=0 )

The residual plot here indicates a better fit, with possibly a slight decrease in variation at higher temperatures (i.e., slightly heterogeneous variance). But first, overlay the fitted model on the original data:

bqr = coef( siliconqr.lm )
plot( Y ~ t, pch=19, xlim=c(3,10), ylim=c(0,150) ); par( new=TRUE )
curve( bqr[1] + x*(bqr[2] + x*bqr[3]), xlim=c(3,10), ylim=c(0,150), ylab='', xlab='' )

The ever-present danger with a quadratic fit is evident here: the very good fit also comes with the unlikely implication that the mean response turns back up before we reach the highest observed temperature. (Quick reflection suggests that this is hard to explain: it is reasonable for the soil to lose silicon as temperature rises, but then how could it regain the silicon as the temperature rises even higher?)

So, start again: since the simple linear model fails to account for the curvilinearity in the data, try a transformation. The logarithm is a natural choice:

U = log(Y); plot( U ~ t, pch=19 )
siliconlog.lm = lm( U ~ t )
plot( resid(siliconlog.lm) ~ t, pch=19 ); abline( h=0 )

Some improvement is shown in the residuals, but the curvilinearity may still be present, and there is now clear variance heterogeneity. So, try a quadratic linear predictor again, but now against U = log(Y), and also apply weighted least squares (WLS) to account for the heterogeneous variances. For the WLS fit, the per-temperature replication makes the choice of weights easy: use reciprocals of the sample variances at each temperature.

s2 = by( data=silicon.df$conc, INDICES=factor(silicon.df$temp), FUN=var )
w = rep( 1/s2, each=4 )
siliconqlog.lm = lm( U ~ t + I(t^2), weights=w )
plot( resid(siliconqlog.lm) ~ t, pch=19 ); abline( h=0 )

We don't see much change in the residual plot (of course, we don't expect to: the theory tells us that the inverse-variance weighting will nonetheless adjust for any variance heterogeneity). An overlay of the (back-transformed) fitted model on the data shows a much more sensible picture:

bqlog = coef( siliconqlog.lm )
plot( Y ~ t, pch=19, xlim=c(3,10), ylim=c(0,150) ); par( new=TRUE )
curve( exp(bqlog[1] + x*(bqlog[2] + x*bqlog[3])), xlim=c(3,10), ylim=c(0,150), ylab='', xlab='' )
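The WLS mechanics that lm(..., weights=w) performs can be sketched directly: scale each design row and response by √w_i and solve the normal equations by ordinary least squares. A minimal Python version, with synthetic grouped data standing in for the silicon measurements (the temperature values 4, 6, 9 are illustrative):

```python
import math
import random

def wls_fit(X, y, w):
    """Weighted LS: minimize sum_i w_i (y_i - x_i.b)^2 by rescaling each
    row and response by sqrt(w_i), then solving the normal equations."""
    p = len(X[0])
    Xs = [[math.sqrt(wi) * v for v in row] for row, wi in zip(X, w)]
    ys = [math.sqrt(wi) * yi for yi, wi in zip(y, w)]
    # normal equations (X'X) b = X'y, solved by Gaussian elimination
    m = len(Xs)
    A = [[sum(Xs[i][r] * Xs[i][c] for i in range(m)) for c in range(p)]
         for r in range(p)]
    bvec = [sum(Xs[i][r] * ys[i] for i in range(m)) for r in range(p)]
    for col in range(p):                      # forward elimination w/ pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        bvec[col], bvec[piv] = bvec[piv], bvec[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            A[r] = [ar - f * ac for ar, ac in zip(A[r], A[col])]
            bvec[r] -= f * bvec[col]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):            # back substitution
        beta[r] = (bvec[r] - sum(A[r][c] * beta[c]
                                 for c in range(r + 1, p))) / A[r][r]
    return beta

def var(v):                                   # sample variance
    mv = sum(v) / len(v)
    return sum((vi - mv) ** 2 for vi in v) / (len(v) - 1)

# three temperatures, four replicates each, noise sd differing by group
random.seed(3)
temps = [4, 6, 9]
t = [ti for ti in temps for _ in range(4)]
U = [5.0 - 0.9 * ti + 0.05 * ti ** 2 + random.gauss(0, 0.05 * (1 + 0.3 * ti))
     for ti in t]

# weights = reciprocal sample variance within each temperature group
s2 = {ti: var([U[i] for i in range(len(t)) if t[i] == ti]) for ti in temps}
w = [1 / s2[ti] for ti in t]
X = [[1.0, ti, ti ** 2] for ti in t]
beta = wls_fit(X, U, w)
print([round(b, 3) for b in beta])
```

With all weights equal, wls_fit reduces to ordinary least squares, which provides a convenient correctness check.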

So, proceed with this model, where E[log(Y_i)] = β₀ + β₁t_i + β₂t_i². The hypothesis of no effect due to temperature is H₀: β₁ = β₂ = 0. (The alternative Hₐ is any difference.) Test this via (output edited):

summary( siliconqlog.lm )

Call:
lm(formula = U ~ t + I(t^2), weights = w)

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)
t
I(t^2)

Residual standard error: on 9 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 2 and 9 DF, p-value:

The pertinent test statistic here is the full F-statistic F_calc, with (2, 9) d.f., given at the bottom of the output. The corresponding P-value is well below 0.05. Thus we reject H₀ and conclude that, under this model, there is a significant effect due to temperature on (log) silicon concentration.

Notice, by the way, that besides β₀, neither individual regression parameter is significant based on its 1-d.f. partial t-test. This is due (not surprisingly) to the heavy multicollinearity underlying this quadratic regression; the VIFs are both far above 10.0:

require( car )
vif( siliconqlog.lm )

     t  I(t^2)

Indeed, we should formally center the temperature variable before conducting the quadratic fit. Notice, however, that the full F-statistic (and hence its P-value) does not change (output edited):

tminustbar = scale( t, scale=FALSE )
summary( lm( U ~ tminustbar + I(tminustbar^2), weights=w ) )

Residual standard error: on 9 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 2 and 9 DF, p-value:
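The collinearity point can be illustrated numerically: with only one other regressor in the model, the VIF of a term is 1/(1 − r²), where r is its correlation with the other term, and centering t collapses the correlation between t and t². An illustrative sketch (the temperature values 4, 6, 9 are hypothetical, not the silicon data):

```python
def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    ca = [x - ma for x in a]
    cb = [x - mb for x in b]
    num = sum(x * y for x, y in zip(ca, cb))
    den = (sum(x * x for x in ca) * sum(y * y for y in cb)) ** 0.5
    return num / den

def vif_pair(a, b):
    # with a single other regressor, R^2 = corr^2, so VIF = 1/(1 - corr^2)
    r = corr(a, b)
    return 1 / (1 - r * r)

t = [ti for ti in (4, 6, 9) for _ in range(4)]   # illustrative temps, 4 reps
t2 = [ti ** 2 for ti in t]
tc = [ti - sum(t) / len(t) for ti in t]          # centered temperature
tc2 = [ti ** 2 for ti in tc]

print(round(vif_pair(t, t2), 1))    # raw quadratic: severe collinearity
print(round(vif_pair(tc, tc2), 1))  # centered: greatly reduced
```

The raw pair produces a VIF far above the usual trouble threshold of 10, while the centered pair falls near 1, mirroring the behavior noted for the silicon fit.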


More information

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS STAT 512 MidTerm I (2/21/2013) Spring 2013 Name: Key INSTRUCTIONS 1. This exam is open book/open notes. All papers (but no electronic devices except for calculators) are allowed. 2. There are 5 pages in

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested

More information

STAT 8200 Design of Experiments for Research Workers Lab 11 Due: Friday, Nov. 22, 2013

STAT 8200 Design of Experiments for Research Workers Lab 11 Due: Friday, Nov. 22, 2013 Example: STAT 8200 Design of Experiments for Research Workers Lab 11 Due: Friday, Nov. 22, 2013 An experiment is designed to study pigment dispersion in paint. Four different methods of mixing a particular

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

Week 3: Simple Linear Regression

Week 3: Simple Linear Regression Week 3: Simple Linear Regression Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED 1 Outline

More information

cor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson )

cor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson ) Tutorial 7: Correlation and Regression Correlation Used to test whether two variables are linearly associated. A correlation coefficient (r) indicates the strength and direction of the association. A correlation

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Lecture 12: 2 k Factorial Design Montgomery: Chapter 6

Lecture 12: 2 k Factorial Design Montgomery: Chapter 6 Lecture 12: 2 k Factorial Design Montgomery: Chapter 6 1 Lecture 12 Page 1 2 k Factorial Design Involvingk factors: each has two levels (often labeled+and ) Very useful design for preliminary study Can

More information

Lecture 11 Multiple Linear Regression

Lecture 11 Multiple Linear Regression Lecture 11 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 11-1 Topic Overview Review: Multiple Linear Regression (MLR) Computer Science Case Study 11-2 Multiple Regression

More information

STAT22200 Spring 2014 Chapter 14

STAT22200 Spring 2014 Chapter 14 STAT22200 Spring 2014 Chapter 14 Yibi Huang May 27, 2014 Chapter 14 Incomplete Block Designs 14.1 Balanced Incomplete Block Designs (BIBD) Chapter 14-1 Incomplete Block Designs A Brief Introduction to

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Analysis of Variance

Analysis of Variance Statistical Techniques II EXST7015 Analysis of Variance 15a_ANOVA_Introduction 1 Design The simplest model for Analysis of Variance (ANOVA) is the CRD, the Completely Randomized Design This model is also

More information

Chapter 5 Exercises 1

Chapter 5 Exercises 1 Chapter 5 Exercises 1 Data Analysis & Graphics Using R, 2 nd edn Solutions to Exercises (December 13, 2006) Preliminaries > library(daag) Exercise 2 For each of the data sets elastic1 and elastic2, determine

More information

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph. Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would

More information

Stat 500 Midterm 2 12 November 2009 page 0 of 11

Stat 500 Midterm 2 12 November 2009 page 0 of 11 Stat 500 Midterm 2 12 November 2009 page 0 of 11 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. Do not start until I tell you to. The exam is closed book, closed

More information

Topic 23: Diagnostics and Remedies

Topic 23: Diagnostics and Remedies Topic 23: Diagnostics and Remedies Outline Diagnostics residual checks ANOVA remedial measures Diagnostics Overview We will take the diagnostics and remedial measures that we learned for regression and

More information

Lecture 7: Latin Square and Related Design

Lecture 7: Latin Square and Related Design Lecture 7: Latin Square and Related Design Montgomery: Section 4.2-4.3 Page 1 Automobile Emission Experiment Four cars and four drivers are employed in a study for possible differences between four gasoline

More information

Lecture 1 Linear Regression with One Predictor Variable.p2

Lecture 1 Linear Regression with One Predictor Variable.p2 Lecture Linear Regression with One Predictor Variablep - Basics - Meaning of regression parameters p - β - the slope of the regression line -it indicates the change in mean of the probability distn of

More information

What If There Are More Than. Two Factor Levels?

What If There Are More Than. Two Factor Levels? What If There Are More Than Chapter 3 Two Factor Levels? Comparing more that two factor levels the analysis of variance ANOVA decomposition of total variability Statistical testing & analysis Checking

More information

MATH11400 Statistics Homepage

MATH11400 Statistics Homepage MATH11400 Statistics 1 2010 11 Homepage http://www.stats.bris.ac.uk/%7emapjg/teach/stats1/ 4. Linear Regression 4.1 Introduction So far our data have consisted of observations on a single variable of interest.

More information

PLS205 Lab 2 January 15, Laboratory Topic 3

PLS205 Lab 2 January 15, Laboratory Topic 3 PLS205 Lab 2 January 15, 2015 Laboratory Topic 3 General format of ANOVA in SAS Testing the assumption of homogeneity of variances by "/hovtest" by ANOVA of squared residuals Proc Power for ANOVA One-way

More information

STAT 3A03 Applied Regression With SAS Fall 2017

STAT 3A03 Applied Regression With SAS Fall 2017 STAT 3A03 Applied Regression With SAS Fall 2017 Assignment 2 Solution Set Q. 1 I will add subscripts relating to the question part to the parameters and their estimates as well as the errors and residuals.

More information

Analysis of Covariance

Analysis of Covariance Analysis of Covariance (ANCOVA) Bruce A Craig Department of Statistics Purdue University STAT 514 Topic 10 1 When to Use ANCOVA In experiment, there is a nuisance factor x that is 1 Correlated with y 2

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Lecture 4. Random Effects in Completely Randomized Design

Lecture 4. Random Effects in Completely Randomized Design Lecture 4. Random Effects in Completely Randomized Design Montgomery: 3.9, 13.1 and 13.7 1 Lecture 4 Page 1 Random Effects vs Fixed Effects Consider factor with numerous possible levels Want to draw inference

More information

Residual Analysis for two-way ANOVA The twoway model with K replicates, including interaction,

Residual Analysis for two-way ANOVA The twoway model with K replicates, including interaction, Residual Analysis for two-way ANOVA The twoway model with K replicates, including interaction, is Y ijk = µ ij + ɛ ijk = µ + α i + β j + γ ij + ɛ ijk with i = 1,..., I, j = 1,..., J, k = 1,..., K. In carrying

More information

STAT 350. Assignment 4

STAT 350. Assignment 4 STAT 350 Assignment 4 1. For the Mileage data in assignment 3 conduct a residual analysis and report your findings. I used the full model for this since my answers to assignment 3 suggested we needed the

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3 Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3 Page 1 Tensile Strength Experiment Investigate the tensile strength of a new synthetic fiber. The factor is the weight percent

More information

Topic 32: Two-Way Mixed Effects Model

Topic 32: Two-Way Mixed Effects Model Topic 3: Two-Way Mixed Effects Model Outline Two-way mixed models Three-way mixed models Data for two-way design Y is the response variable Factor A with levels i = 1 to a Factor B with levels j = 1 to

More information

Scenarios Where Utilizing a Spline Model in Developing a Regression Model Is Appropriate

Scenarios Where Utilizing a Spline Model in Developing a Regression Model Is Appropriate Paper 1760-2014 Scenarios Where Utilizing a Spline Model in Developing a Regression Model Is Appropriate Ning Huang, University of Southern California ABSTRACT Linear regression has been a widely used

More information

STAT 705 Chapter 16: One-way ANOVA

STAT 705 Chapter 16: One-way ANOVA STAT 705 Chapter 16: One-way ANOVA Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 21 What is ANOVA? Analysis of variance (ANOVA) models are regression

More information

STAT 571A Advanced Statistical Regression Analysis. Chapter 3 NOTES Diagnostics and Remedial Measures

STAT 571A Advanced Statistical Regression Analysis. Chapter 3 NOTES Diagnostics and Remedial Measures STAT 571A Advanced Statistical Regression Analysis Chapter 3 NOTES Diagnostics and Remedial Measures 2015 University of Arizona Statistics GIDP. All rights reserved, except where previous rights exist.

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University

More information

CHAPTER EIGHT Linear Regression

CHAPTER EIGHT Linear Regression 7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following

More information

Chapter 5 Introduction to Factorial Designs Solutions

Chapter 5 Introduction to Factorial Designs Solutions Solutions from Montgomery, D. C. (1) Design and Analysis of Experiments, Wiley, NY Chapter 5 Introduction to Factorial Designs Solutions 5.1. The following output was obtained from a computer program that

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Answer to exercise: Blood pressure lowering drugs

Answer to exercise: Blood pressure lowering drugs Answer to exercise: Blood pressure lowering drugs The data set bloodpressure.txt contains data from a cross-over trial, involving three different formulations of a drug for lowering of blood pressure:

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

This exam contains 5 questions. Each question is worth 10 points. Therefore, this exam is worth 50 points.

This exam contains 5 questions. Each question is worth 10 points. Therefore, this exam is worth 50 points. GROUND RULES: This exam contains 5 questions. Each question is worth 10 points. Therefore, this exam is worth 50 points. Print your name at the top of this page in the upper right hand corner. This is

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information