Diagnostics and Remedial Measures: An Overview

1. Diagnostics and Remedial Measures: An Overview

- Residuals
- Model diagnostics
  - Graphical techniques
  - Hypothesis testing
- Remedial measures
  - Transformation
- Later: more about all this for multiple regression

W. Zhou (Colorado State University), STAT 540, July 6th

2. Model Assumptions

Recall the simple linear regression model:
Y_i = β_0 + β_1 X_i + ɛ_i,  ɛ_i iid N(0, σ²),  for i = 1, ..., n.

- A straight-line relationship between E(Y) and X: E(Y_i) = β_0 + β_1 X_i
- Homogeneous variance: Var(ɛ_i) = σ² for all i
- Independence: the ɛ_i are mutually independent
- Normal distribution: each ɛ_i is normally distributed

3. Ramifications If Assumptions Are Violated

- Nonlinearity
  - The linear model will fit poorly
  - Parameter estimates may be meaningless
- Non-independence
  - Parameter estimates are still unbiased
  - Standard errors are a problem, and thus so is inference
- Nonconstant variance
  - Parameter estimates are still unbiased
  - Standard errors are a problem
- Non-normality
  - Least important; why? Inference is fairly robust to non-normality
  - Important effects on prediction intervals

4. Model Diagnostics

- Reliable inference hinges on reasonable adherence to the model assumptions
- Hence it is important to evaluate the FOUR model assumptions, that is, to perform model diagnostics
- The main approach to model diagnostics is to examine the residuals (thanks to the additive error assumption)
- Consider two approaches:
  - Graphical techniques: more subjective, but quick and very informative for an expert
  - Hypothesis tests: more objective and comfortable for amateurs, but outcomes depend on assumptions and sensitivity; there is a tendency to use them as a crutch

5. Graphical Techniques

At this point in the analysis, you have already done EDA:
- 1D exploration of X and Y
- 2D exploration of X and Y
- Not very effective for model diagnostics except in drastic cases

Recall the definition of the residual: e_i = Y_i − Ŷ_i, for i = 1, ..., n.
- e_i can be treated as an estimate of the true error ɛ_i = Y_i − E(Y_i), where ɛ_i iid N(0, σ²)
- e_i can be used to check normality, homoscedasticity, linearity, and independence

6. Properties of Residuals

- Mean: ē = (1/n) Σ e_i = 0
- Variance: MSE = SSE/(n − 2) = Σ e_i² / (n − 2) = Σ (e_i − ē)² / (n − 2) = s²
- Nonindependence: the residuals are not independent (they satisfy constraints such as Σ e_i = 0); when the sample size n is large, however, the residuals can be treated as independent

7. Standardized Residuals

- For diagnostics there are superior choices to the ordinary residuals
- Standardized (KNNL: "semi-studentized") residuals: Var(ɛ_i) = σ², therefore it is natural to apply the standardization e_i* = e_i / √MSE
- But each e_i has a different variance; use this fact to derive a superior type of residual below

8. Hat Values

The fitted values are linear combinations of the observations:
Ŷ_i = Σ_j h_ij Y_j,  where (for simple linear regression)
h_ij = 1/n + (X_i − X̄)(X_j − X̄) / Σ_k (X_k − X̄)².
The h_ij are called hat values.
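
A minimal sketch of the diagonal hat values (leverages) for simple linear regression, on made-up x values:

```python
# Sketch (standard library only): leverages for simple linear regression,
# h_ii = 1/n + (x_i - xbar)^2 / Sxx, on hypothetical x values.
from statistics import mean

x = [1.0, 2.0, 3.0, 4.0, 10.0]       # hypothetical predictor values
n = len(x)
xbar = mean(x)
sxx = sum((xi - xbar) ** 2 for xi in x)

leverages = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # h_ii

# The leverages sum to the number of regression parameters (here 2),
# and the unusual x = 10 point has by far the largest leverage.
print([round(h, 3) for h in leverages])
```

Note the design choice: h_ii depends only on the x configuration, not on Y, which is why leverage is a measure of potential (not actual) influence.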

9. Deriving the Variance of Residuals

Using Ŷ_i = Σ_j h_ij Y_j, we obtain
e_i = Y_i − Ŷ_i = (1 − h_ii) Y_i − Σ_{j≠i} h_ij Y_j.
Therefore (since the Y's are independent),
Var{e_i} = σ² [ (1 − h_ii)² + Σ_{j≠i} h_ij² ].

10. Continuing to Derive the Variance of Residuals

Using Var{e_i} = σ² [ (1 − h_ii)² + Σ_{j≠i} h_ij² ] and the fact that Σ_j h_ij² = h_ii (show it in HW), we have

Var{e_i} = σ² [ (1 − h_ii)² + Σ_{j≠i} h_ij² ]
         = σ² [ 1 − 2h_ii + h_ii² + Σ_{j≠i} h_ij² ]
         = σ² [ 1 − 2h_ii + Σ_j h_ij² ]
         = σ² (1 − 2h_ii + h_ii)
         = σ² (1 − h_ii)

11. Studentized Residuals

- Now we may scale each residual separately by its own standard deviation
- The (internally) studentized residual is r_i = e_i / √(MSE (1 − h_ii))
- There is still a problem: imagine that Y_i is a severe outlier
  - Y_i will strongly pull the regression line toward it
  - e_i will understate the distance between Y_i and the true regression line
- The solution is to use externally studentized residuals...

13. Studentized Deleted Residuals

- To eliminate the influence of Y_i on the misfit at the ith point, fit the regression line based on all points except the ith
- Define the prediction at X_i using this deleted regression as Ŷ_{i(i)}
- The deleted residual is d_i = Y_i − Ŷ_{i(i)}
- The studentized deleted residual is
  t_i = d_i / ŝ{d_i} = (Y_i − Ŷ_{i(i)}) / √( MSE_{(i)} / (1 − h_ii) )
- There is no need to fit n deleted regressions; we can show that
  d_i = e_i / (1 − h_ii)  and  (n − 2) MSE = (n − 3) MSE_{(i)} + e_i² / (1 − h_ii)
- Also, t_i has a t-distribution: t_i ∼ t_{n−3}

14. Residual Plots

- The residual plot is the primary graphical diagnostic method
- Departures from model assumptions can be difficult to detect directly from X and Y
- Use the externally studentized residuals
- Some key residual plots:
  - Plot t_i against the predicted values Ŷ_i (not Y_i) to detect nonconstant variance, nonlinearity, and outliers
  - Plot t_i against X_i; in simple linear regression this is the same as above (why?), and in multiple regression it will be useful to detect partial correlation
  - Plot t_i versus other possible predictors (e.g., time) to detect important lurking variables
  - Plot t_i versus lagged residuals to detect correlated errors
  - QQ-plot or normal probability (PP-) plot of t_i to detect non-normality

15. Nonlinearity of the Regression Function

- Plot t_i against Ŷ_i (and X_i for multiple linear regression)
- Random scatter indicates no serious departure from linearity; a "banana" shape indicates a departure from linearity
- Could fit a nonparametric smoother to the residual plot to aid detection
- Example: curved relationship (KNNL Figure 3.4(a))
- Plotting Y vs. X is not nearly as effective for detecting nonlinearity because the trend has not been removed
- Logically, you are investigating model assumptions, not the marginal effect

17. Nonconstant Error Variance

- Plot t_i against Ŷ_i (and X_i for multiple linear regression)
- Random scatter indicates no serious departure from constant variance
- Could fit a nonparametric smoother to this plot to aid detection
- A "funnel" shape indicates nonconstant variance
- Example: KNNL Figure 3.4(c)
- Often both nonconstant variance and nonlinearity exist

18. Nonindependence of Error Terms

- Possible causes of nonindependence:
  - Observations collected over time and/or across space
  - Study done on sets of siblings
- Departures from independence, for example:
  - Trend effect (KNNL Figures 3.4(d), 3.8(a))
  - Cyclical nonindependence (KNNL Figure 3.8(b))
- Plot t_i against another covariate, such as time
- Autocorrelation function plot (acf() in R)
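
A minimal sketch of the quantity acf() plots at lag 1, on a hypothetical residual sequence that drifts over time:

```python
# Sketch (hypothetical residuals): a quick numeric check for correlated
# errors is the lag-1 sample autocorrelation of the residual sequence,
# the first bar of an acf() plot.
from statistics import mean

def lag1_autocorr(res):
    """Sample autocorrelation of a sequence at lag 1."""
    m = mean(res)
    num = sum((res[t] - m) * (res[t - 1] - m) for t in range(1, len(res)))
    den = sum((r - m) ** 2 for r in res)
    return num / den

trended = [-3, -2, -2, -1, 0, 0, 1, 2, 2, 3]   # residuals drifting over time
print(round(lag1_autocorr(trended), 3))         # strongly positive
```

A value near zero is consistent with independent errors; a large positive value suggests a trend effect like KNNL Figure 3.8(a).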

19. Nonnormality of Error Terms

- Box plot, histogram, stem-and-leaf plot of t_i
- QQ (quantile-quantile) plot:
  1. Order the residuals: t_(1) ≤ t_(2) ≤ ... ≤ t_(n)
  2. Find the corresponding "rankits" z_(1) ≤ z_(2) ≤ ... ≤ z_(n), where for k = 1, ..., n,
     z_(k) = √MSE · z( (k − 0.375) / (n + 0.25) ),
     with z(·) the standard normal quantile function; z_(k) is an approximation of the expected value of the kth smallest observation in a normal random sample
  3. Plot t_(k) against z_(k)
- The QQ plot should be approximately linear if normality holds
  - An "S" shape means the distribution of the residuals has light ("short") tails
  - A backwards "S" means heavy tails
  - A "C" or backwards "C" means skew
- It is a good idea to examine other possible problems first
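
A sketch of step 2 above (without the √MSE scale factor, i.e., as used with studentized residuals), using the standard library's normal quantile function:

```python
# Sketch: the "rankits" z_(k) = inverse-normal((k - 0.375)/(n + 0.25)) that
# form the x-axis of a normal QQ plot (scale factor of 1 assumed here).
from statistics import NormalDist

def rankits(n):
    nd = NormalDist()                     # standard normal
    return [nd.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]

z = rankits(5)
# Rankits are increasing and symmetric about 0.
print([round(v, 3) for v in z])
```

Plotting the ordered residuals against these values gives the QQ plot; departures from a straight line are read off as described above.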

20. Presence of Outliers

- An outlier refers to an extreme observation
- Some diagnostic methods:
  - Box plot of t_i
  - Plot t_i against Ŷ_i (and X_i)
  - t_i that are very unlikely compared to the reference t-distribution could be called outliers
  - Modern cluster analysis methods
- Outliers may convey important information: an error, a different mechanism at work, or a significant discovery
- There is a temptation to throw away outliers because they may strongly influence parameter estimates
  - That doesn't mean the model is right and the data point is wrong
  - Perhaps the data point is right and the model is wrong

21. Graphical Techniques: Remarks

- We generally do not plot the residuals (t_i) against the response (Y_i). Why?
- Residual plots may provide evidence against model assumptions, but do not generally validate assumptions
- For data analysis in practice:
  - Fit the model and check the model assumptions (an iterative process)
  - Generally do not include residual plots in a report, but include a sentence or two such as "Standard diagnostics did not indicate any violations of the assumptions for this model."
  - For this class, always include residual plots in homework assignments so you can learn the methods
- No magic formulas: the decision may be difficult for small sample sizes; as much art as science

22. Diagnostic Methods Based on Hypothesis Testing

- Tests for linearity: F test for lack of fit (Section 3.7)
- Tests for constancy of variance (Section 3.6):
  - Brown-Forsythe test
  - Breusch-Pagan test
  - Levene's test
  - Bartlett's test
- Tests for independence (Chapter 12):
  - Runs test
  - Durbin-Watson test
- Tests for normality (Section 3.5):
  - χ² test
  - Kolmogorov-Smirnov test
- Tests for outliers (Chapter 10)

23. F Test for Lack of Fit

- Residual plots can be used to assess the adequacy of a simple linear regression model; a more formal procedure is a test for lack of fit using "pure error"
- This requires repeat groups: multiple observations on Y at the same X
- For a given data set, suppose we have fitted a simple linear regression model and computed the error sum of squares
  SSE = Σ_{i=1}^n (Y_i − Ŷ_i)²
- These deviations Y_i − Ŷ_i could be due either to random fluctuation around the line or to an inadequate model

24. Pure Error and Lack of Fit

- The main idea is to take several independent observations on Y at the same X, to distinguish the error due to random fluctuation around the line from the error due to lack of fit of the simple linear regression model
- The variation among the repeated measurements is called pure error; the remaining error variation is called lack of fit
- Thus we can partition the error sum of squares into two parts:
  SSE = SSPE + SSLF,
  where SSPE = SS(pure error) and SSLF = SS(lack of fit)
- In effect, we are comparing the linear model with a more flexible separate-means model

25. Pure Error and Lack of Fit

- One possibility is that pure error is a comparatively large part of the SSE; then the linear model seems adequate
- The other possibility is that pure error is a comparatively small part of the SSE, so that error due to lack of fit is a large part; then the linear model seems inadequate
- In the latter case, there may be significant evidence of lack of fit

26. Notation

Models (R notation):
- Null (N): Y ~ 1, the common-mean model
- Reduced (R): Y ~ X, the linear regression model
- Full (F): Y ~ factor(X), the separate-means (ANOVA) model

Notation:
- Y_ij are the data, where j indexes groups and i indexes individuals (sums are taken over all available indices)
- Ȳ is the grand mean
- Ȳ_j is the jth group mean
- Ŷ_ij are the fitted values using the regression line
- Note that the Ȳ_j are the fitted values under the ANOVA model that fits group means, Y ~ factor(X)

27. Sums of Squares

Recall (all sums are over both i and j except as noted):

SSTO = Σ (Y_ij − Ȳ)²
SSR_R = Σ (Ŷ_ij − Ȳ)²
SSE_R = Σ (Y_ij − Ŷ_ij)²
SSTO = SSR_R + SSE_R

SSPE = SSE_F = Σ (Y_ij − Ȳ_j)²
SSLF = Σ (Ȳ_j − Ŷ_ij)² = Σ_j n_j (Ȳ_j − Ŷ_j)²
SSE_R = SSPE + SSLF

28. LOF ANOVA Table

One way to summarize the LOF test is by an ANOVA table (r = number of distinct X values):

Source       | df    | SS   | MS
-------------|-------|------|---------------------
Regression   | 1     | SSR  | MSR = SSR/1
Lack of fit  | r − 2 | SSLF | MSLF = SSLF/(r − 2)
Pure error   | n − r | SSPE | MSPE = SSPE/(n − r)
Total        | n − 1 | SSTO |

E(MSPE) = σ² and
E(MSLF) = σ² + Σ_{i=1}^{r} n_i (µ_i − (β_0 + β_1 X_i))² / (r − 2).
The F-test for lack of fit is therefore F = MSLF/MSPE, compared with F(r − 2, n − r).

29. LOF as Model Comparison

In fact, the above lack-of-fit test is a model comparison of our desired model Y ~ X to the potentially better model Y ~ factor(X), which would be required if the linear model fit poorly. Apply the general linear test (GLT) to compare these two models (are they nested?):

F_LOF = [ (SSE_R − SSE_F) / (df_R − df_F) ] / [ SSE_F / df_F ]

Notice SSE_R − SSE_F = SSE_R − SSPE = SSLF and SSE_F = SSPE, so
F_LOF = MSLF / MSPE, and the LOF ANOVA F-test is the same as model comparison by F_LOF.

30. Lack of Fit in R

anova(reduced.lm)
            Df  Sum Sq Mean Sq F value  Pr(>F)
x            1     ...     ...     ...  ...e-09   #<-- SSR_R
Residuals   14  4.4196     ...                    #<-- SSE_R

full.lm = lm(y ~ factor(x), purerr)
anova(full.lm)
            Df  Sum Sq Mean Sq F value  Pr(>F)
factor(x)    7     ...     ...     ...  ...e-10   #<-- SSR_F
Residuals    8  0.0983     ...                    #<-- SSE_F = SSPE

anova(reduced.lm, full.lm)
Model 1: y ~ x
Model 2: y ~ factor(x)
  Res.Df    RSS Df Sum of Sq   F  Pr(>F)
1     14 4.4196
2      8 0.0983  6    4.3213 ... ...e-06   #<-- F_LOF (= F_GLT here)

Therefore
F_LOF = [ (SSE_R − SSE_F) / (df_SSE,R − df_SSE,F) ] / [ SSE_F / df_SSE,F ]
      = [ 4.3213 / (14 − 8) ] / [ 0.0983 / 8 ] = (4.3213/6) / (0.0983/8) ≈ 58.6

31. LOF p-value

Compare F ≈ 58.6 with F(6, 8): p-value = P(F(6, 8) ≥ 58.6) < 0.0001. There is very strong evidence of a lack of fit.

ANOVA table for LOF:

Source       | df | SS            | MS
-------------|----|---------------|---------------
Regression   | 1  | SSR = ...     | MSR = ...
Lack of fit  | 6  | SSLF = 4.3213 | MSLF = 0.7202
Pure error   | 8  | SSPE = 0.0983 | MSPE = 0.0123
Total        | 15 | SSTO = ...    |

32. LOF by Hand

The data consist of 16 observations with X repeated at several values. [Table of (X, Y) values and scatterplot of Y versus X not recoverable from the transcription.]

33. Computing SS Pure Error by Hand

There are 5 repeat groups out of the 8 distinct X values. [Table of group index i, X, Y, and group size n_i not recoverable from the transcription.]

Compute SSPE as
SSPE = Σ_{i=1}^{3} (Y_i1 − Ȳ_1)² + Σ_{i=1}^{2} (Y_i2 − Ȳ_2)² + Σ_{i=1}^{3} (Y_i3 − Ȳ_3)² + Σ_{i=1}^{3} (Y_i4 − Ȳ_4)² + Σ_{i=1}^{2} (Y_i5 − Ȳ_5)²

34. Computing SS Pure Error by Hand

- Y_ij: the ith observation in the jth group
- n_j: the number of observations in the jth group
- c: the number of repeat groups

In general,
SSPE = Σ_{j=1}^{c} Σ_{i=1}^{n_j} (Y_ij − Ȳ_j)²,
df_PE = Σ_{j=1}^{c} (n_j − 1) = n − r.

In the example, n_1 = 3, n_2 = 2, n_3 = 3, n_4 = 3, n_5 = 2, c = 5, and hence
df_PE = 2 + 1 + 2 + 2 + 1 = 8 and SSPE = 0.0983.

35. Computing SS Lack of Fit by Hand

By subtraction. In the example, SSE = 4.42 from the regression ANOVA table on df = 14, and SSPE = 0.0983 on df = 8. Thus
SSLF = SSE − SSPE = 4.42 − 0.0983 ≈ 4.32,  df_LF = 14 − 8 = 6.

Note
SSLF = Σ_{j=1}^{r} n_j (Ȳ_j − Ŷ_j)²,
where r denotes the number of distinct X values (here r = 8).

36. Lack of Fit Test by Hand

For testing H_0: no lack of fit (here, simple linear regression is adequate) versus H_a: lack of fit (simple linear regression is inadequate), use the fact that, under H_0,
F = (SSLF/df_LF) / (SSPE/df_PE) ∼ F(df_LF, df_PE).

In the example, the observed F test statistic is
F = (4.32/6) / (0.0983/8) ≈ 58.6.

37. Lack of Fit Test: Remarks

- Note that R² = 93.2% is high, but according to the LOF test the model is still inadequate; a possible remedy is polynomial regression
- The repeats need to be independent measurements
- If there are no repeats at all, some practitioners form approximate repeat groups by binning X values that are close to one another; in that case the LOF test is an approximate test

38. Remedial Measures

For simple linear regression, consider two basic approaches:
- Abandon the current model and look for a better one
- Transform the data so that the simple linear regression model is appropriate

Nonlinearity of the regression function:
- Transformation (of X, Y, or both)
- Polynomial regression
- Nonlinear regression

Nonconstancy of error variance:
- Transformation (of Y)
- Weighted least squares

39. Remedial Measures

Simultaneous nonlinearity and nonconstant variance; sometimes it works to:
- Transform Y to fix the variance, then
- Transform X to fix linearity

Nonindependence of error terms:
- First-order differencing
- Models with correlated error terms

Nonnormality of error terms:
- Transformation
- Generalized linear models

Presence of outliers:
- Removal of outliers (with extreme caution)
- Analyze both with and without the outliers
- Robust estimation
- A new model

40. Example: Bacteria

Data consist of the number of surviving bacteria after exposure to X-rays for different periods of time. Let t denote time (in number of 6-minute intervals) and let n denote the number of surviving bacteria (in 100s) after exposure to X-rays for time t. [Table of (t, n) values not recoverable from the transcription.]

41. Example: Bacteria

We fit a simple linear regression model to the data. [Scatterplot of n versus t and residuals-vs-fitted plot omitted.] It appears that the straight-line model is not adequate: the assumption of a correct model seems to be violated. What to do?

42. Example: Bacteria

Consider the nonlinear model (why nonlinear?)
n_t = n_0 e^{βt},
where t is time, n_t is the number of bacteria at time t, n_0 is the number of bacteria at t = 0, and β < 0 is a decay rate. Taking natural logs of both sides of the model and setting α = ln(n_0), we have
ln(n_t) = ln(n_0) + ln(e^{βt}) = ln(n_0) + βt = α + βt.
That is, we log-transformed n_t and the result is the usual straight-line model!

43. Example: Bacteria

The transformed data are as follows. [Table of (t, ln(n)) values, scatterplot of log(n) versus t, and residuals-vs-fitted plot omitted.]

44. Example: Bacteria

Based on the log-transformed counts, we can fit the model to get the LS estimates
α̂ = 6.029,  β̂ = −0.222,  s = ... ,  R² = ... .

- Inference for β is straightforward. [Units? Interpretation?]
- Inference for α is straightforward. [Units? Interpretation?]
- Inference for n_0 is not straightforward:
  - Since α = ln(n_0), n_0 = e^α
  - Given α̂ = 6.029, we obtain the estimate n̂_0 = e^{α̂} ≈ 415.3
  - But the estimate n̂_0 is biased (i.e., E(n̂_0) ≠ n_0)

45. Transformation: Remarks

The purpose of transformation is to meet the assumptions of the linear regression analysis.

Linear models:
- (L1) Y = β_0 + β_1 X
- (L2) Y = β_0 + β_1 X²
- (L3) Y = β_0 + β_1 e^X

Nonlinear models:
- (N1) Y = α e^{βX}  →  ln(Y) = ln(α) + βX
- (N2) Y = α X^β  →  ln(Y) = ln(α) + β ln(X)
- (N3) Y = α + e^{βX}

In (L2) and (L3), the relationship between X and Y is not linear, but the model is linear in the parameters and hence the model is linear. In (N1) and (N2), the model is nonlinear but can be log-transformed to a linear model (i.e., linearized). In (N3), the model is nonlinear and cannot be linearized.

46. Transformation: Remarks

- Transformation can be applied to X, or Y, or both
- Common transformations are log10(Z), ln(Z), √Z, and Z²; less common transformations include 1/Z, 1/Z², arcsin(√Z), and log2(Z)
- Another use of transformation is to control unequal variance:
  - Example: if the Y are counts, then larger variances are often associated with larger counts; in this case the √Y transformation can help stabilize the variance
  - Example: if the Y are proportions (of successes among trials), then Var(Y) = π(1 − π)/n, which depends on the true success rate π; residual plots would reveal the unequal-variance problem, and the arcsin(√Y) transformation can help stabilize the variance
- Rules of thumb: positive data, use log(Y); proportions, use arcsin(√Y); counts, use √Y
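
The proportions example can be checked by simulation. A sketch with made-up parameters (n = 500 trials, success rates 0.1 and 0.5) showing that arcsin(√p̂) roughly equalizes the spread:

```python
# Sketch (made-up simulation parameters): the arcsin(sqrt(p)) transform
# roughly equalizes the variability of sample proportions across different
# true success rates, whereas the raw proportions do not.
import random
from math import asin, sqrt
from statistics import pstdev

random.seed(1)
n_trials, reps = 500, 2000

def sample_props(p):
    return [sum(random.random() < p for _ in range(n_trials)) / n_trials
            for _ in range(reps)]

props_lo, props_hi = sample_props(0.1), sample_props(0.5)
sd_raw = (pstdev(props_lo), pstdev(props_hi))
sd_trans = (pstdev([asin(sqrt(p)) for p in props_lo]),
            pstdev([asin(sqrt(p)) for p in props_hi]))

# Raw proportions: sd is noticeably larger at p = 0.5 than at p = 0.1.
# After arcsin(sqrt(.)): both sds are close to 1/(2*sqrt(n_trials)).
print(sd_raw[1] / sd_raw[0] > sd_trans[1] / sd_trans[0])
```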

47. Transformation: Remarks

- Ideally, theory should dictate what transformation to use, as in the bacteria count example; in practice, the transformation is usually chosen empirically
- Transforming Y can affect both linearity and variance homogeneity, but transforming X can affect only linearity
- Sometimes solving one problem creates another: for example, transforming Y to stabilize the variance may cause a curved relationship
- Usually it is best to start with a simple transformation and experiment; it often happens that a simple transformation allows the use of the linear regression model
- When needed, use more complicated methods such as nonlinear regression
- Transformations are useful not only for simple linear regression, but also for multiple linear regression and designed experiments

48. Variance Stabilizing Transformations

If Var(Y) = g(E(Y)), then a variance stabilizing transform is
h(y) ∝ ∫^y (g(z))^{−1/2} dz.

Examples:
- If var ∝ mean, then g(z) = z and h(y) = √y
- If var ∝ mean², then g(z) = z² and h(y) = ln(y)
- If var ∝ mean(1 − mean), then g(z) = z(1 − z) and h(y) = sin^{−1}(√y)

49. Box-Cox Transformation

Consider a transformation ladder for Z = X or Y:

λ   | −2   | −1  | −1/2 | 0      | 1/2 | 1 | 2
Z^λ | 1/Z² | 1/Z | 1/√Z | log(Z) | √Z  | Z | Z²

- Moving up or down the ladder (starting at λ = 1) changes the residual plots in a consistent manner; use only these choices for a manual search, too
- The Box-Cox method is a formal approach to selecting λ to transform Y. The idea is to consider Y_i^λ = β_0 + β_1 X_i + ɛ_i and estimate λ (along with β_0, β_1, σ²) using maximum likelihood
- The Box-Cox method may give a non-round λ̂; round to the nearest interpretable value (e.g., λ̂ = 0.5). If λ̂ ≈ 1, do not transform
- In R, boxcox gives a 95% CI for λ; choose an interpretable λ̂ within the CI

50. Box-Cox Transformation

Box-Cox family of transformations:
Z = Y^λ if λ ≠ 0;  Z = ln(Y) if λ = 0.

Estimate λ using maximum likelihood, or use the variance-mean relationship. Using the variance-mean relationship to estimate g(Y) works for 2 groups or many groups (ANOVA, presented in the near future):
- Compute Ȳ and S_Y for each group
- Regress log(S_Y) on log(Ȳ) and estimate the slope β
- Use the transformation Y^λ with λ = 1 − β

51. Box-Cox Transformation: When Does This Work?

Model for the variability: σ = √Var(Y) = k µ^β, or Var(Y) = σ² = (k µ^β)² =: g(µ).

Use the delta method to obtain the transformation Z = h(Y) = Y^λ. Consider the Taylor expansion
Z = h(Y) ≈ h(µ) + (Y − µ) h′(µ);
then an approximation for Var(h(Y)) is
Var(h(Y)) ≈ [h′(µ)]² Var(Y),
which is the famous delta method.

52. Box-Cox Transformation

For Z = h(Y) = Y^λ we have dZ/dY = h′(Y) = λ Y^{λ−1}. From the delta method,
Var(Z) ≈ (λ µ^{λ−1})² (k µ^β)² = k² λ² µ^{2(λ−1+β)}.
When λ = 1 − β, Var(Z) ≈ k² λ² is approximately constant.

- Analyze the transformed data: e.g., Z_11 = ln(Y_11), Z_12 = ln(Y_12), ..., Z_{2,n_2} = ln(Y_{2,n_2})
- Usually round λ to a reasonable value if it is not an integer, as discussed above
- Caution: some researchers estimate the slope β from the regression of ln(Var(Y)) on ln(Ȳ) and then use the transform Z = Y^λ with λ = 1 − β/2.

W. Zhou (Colorado State University), STAT 540

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Diagnostics and Remedial Measures

Diagnostics and Remedial Measures Diagnostics and Remedial Measures Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Diagnostics and Remedial Measures 1 / 72 Remedial Measures How do we know that the regression

More information

Chapter 3. Diagnostics and Remedial Measures

Chapter 3. Diagnostics and Remedial Measures Chapter 3. Diagnostics and Remedial Measures So far, we took data (X i, Y i ) and we assumed Y i = β 0 + β 1 X i + ǫ i i = 1, 2,..., n, where ǫ i iid N(0, σ 2 ), β 0, β 1 and σ 2 are unknown parameters,

More information

3. Diagnostics and Remedial Measures

3. Diagnostics and Remedial Measures 3. Diagnostics and Remedial Measures So far, we took data (X i, Y i ) and we assumed where ɛ i iid N(0, σ 2 ), Y i = β 0 + β 1 X i + ɛ i i = 1, 2,..., n, β 0, β 1 and σ 2 are unknown parameters, X i s

More information

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:

More information

Remedial Measures, Brown-Forsythe test, F test

Remedial Measures, Brown-Forsythe test, F test Remedial Measures, Brown-Forsythe test, F test Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 7, Slide 1 Remedial Measures How do we know that the regression function

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 49 Outline 1 How to check assumptions 2 / 49 Assumption Linearity: scatter plot, residual plot Randomness: Run test, Durbin-Watson test when the data can

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

STAT 571A Advanced Statistical Regression Analysis. Chapter 3 NOTES Diagnostics and Remedial Measures

STAT 571A Advanced Statistical Regression Analysis. Chapter 3 NOTES Diagnostics and Remedial Measures STAT 571A Advanced Statistical Regression Analysis Chapter 3 NOTES Diagnostics and Remedial Measures 2015 University of Arizona Statistics GIDP. All rights reserved, except where previous rights exist.

More information

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model Outline 1 Multiple Linear Regression (Estimation, Inference, Diagnostics and Remedial Measures) 2 Special Topics for Multiple Regression Extra Sums of Squares Standardized Version of the Multiple Regression

More information

Concordia University (5+5)Q 1.

Concordia University (5+5)Q 1. (5+5)Q 1. Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/1 40 Examination Date Time Pages Mid Term Test May 26, 2004 Two Hours 3 Instructor Course Examiner

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice

The Model Building Process Part I: Checking Model Assumptions Best Practice The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test

More information

One-way ANOVA Model Assumptions

One-way ANOVA Model Assumptions One-way ANOVA Model Assumptions STAT:5201 Week 4: Lecture 1 1 / 31 One-way ANOVA: Model Assumptions Consider the single factor model: Y ij = µ + α }{{} i ij iid with ɛ ij N(0, σ 2 ) mean structure random

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO.

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO. Analysis of variance approach to regression If x is useless, i.e. β 1 = 0, then E(Y i ) = β 0. In this case β 0 is estimated by Ȳ. The ith deviation about this grand mean can be written: deviation about

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

STAT Checking Model Assumptions

STAT Checking Model Assumptions STAT 704 --- Checking Model Assumptions Recall we assumed the following in our model: (1) The regression relationship between the response and the predictor(s) specified in the model is appropriate (2)

More information

1 One-way Analysis of Variance

1 One-way Analysis of Variance 1 One-way Analysis of Variance Suppose that a random sample of q individuals receives treatment T i, i = 1,,... p. Let Y ij be the response from the jth individual to be treated with the ith treatment

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models.

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. 1 Transformations 1.1 Introduction Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. (i) lack of Normality (ii) heterogeneity of variances It is important

More information

Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University

Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Chapter 2. Continued. Proofs For ANOVA Proof of ANOVA Identity. the product term in the above equation can be simplified as n

Chapter 2. Continued. Proofs For ANOVA Proof of ANOVA Identity. the product term in the above equation can be simplified as n Chapter 2. Continued Proofs For ANOVA Proof of ANOVA Identity We are going to prove that Writing SST SSR + SSE. Y i Ȳ (Y i Ŷ i ) + (Ŷ i Ȳ ) Squaring both sides summing over all i 1,...n, we get (Y i Ȳ

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Diagnostics of Linear Regression

Diagnostics of Linear Regression Diagnostics of Linear Regression Junhui Qian October 7, 14 The Objectives After estimating a model, we should always perform diagnostics on the model. In particular, we should check whether the assumptions

More information

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES Lalmohan Bhar I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 01 lmbhar@iasri.res.in 1. Introduction Regression analysis is a statistical methodology that utilizes

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Chapter 11 - Lecture 1 Single Factor ANOVA

Chapter 11 - Lecture 1 Single Factor ANOVA Chapter 11 - Lecture 1 Single Factor ANOVA April 7th, 2010 Means Variance Sum of Squares Review In Chapter 9 we have seen how to make hypothesis testing for one population mean. In Chapter 10 we have seen

More information

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing STAT763: Applied Regression Analysis Multiple linear regression 4.4 Hypothesis testing Chunsheng Ma E-mail: cma@math.wichita.edu 4.4.1 Significance of regression Null hypothesis (Test whether all β j =

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Topic 23: Diagnostics and Remedies

Topic 23: Diagnostics and Remedies Topic 23: Diagnostics and Remedies Outline Diagnostics residual checks ANOVA remedial measures Diagnostics Overview We will take the diagnostics and remedial measures that we learned for regression and

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

F-tests and Nested Models

F-tests and Nested Models F-tests and Nested Models Nested Models: A core concept in statistics is comparing nested s. Consider the Y = β 0 + β 1 x 1 + β 2 x 2 + ǫ. (1) The following reduced s are special cases (nested within)

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Advanced Regression Topics: Violation of Assumptions

Advanced Regression Topics: Violation of Assumptions Advanced Regression Topics: Violation of Assumptions Lecture 7 February 15, 2005 Applied Regression Analysis Lecture #7-2/15/2005 Slide 1 of 36 Today s Lecture Today s Lecture rapping Up Revisiting residuals.

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Ch. 1: Data and Distributions

Ch. 1: Data and Distributions Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and

More information

We like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.

We like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model. Statistical Methods in Business Lecture 5. Linear Regression We like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

holding all other predictors constant

holding all other predictors constant Multiple Regression Numeric Response variable (y) p Numeric predictor variables (p < n) Model: Y = b 0 + b 1 x 1 + + b p x p + e Partial Regression Coefficients: b i effect (on the mean response) of increasing

More information

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES Normal Error RegressionModel : Y = β 0 + β ε N(0,σ 2 1 x ) + ε The Model has several parts: Normal Distribution, Linear Mean, Constant Variance,

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Regression Models for Quantitative and Qualitative Predictors: An Overview

Regression Models for Quantitative and Qualitative Predictors: An Overview Regression Models for Quantitative and Qualitative Predictors: An Overview Polynomial regression models Interaction regression models Qualitative predictors Indicator variables Modeling interactions between

More information

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013 Topic 20 - Diagnostics and Remedies - Fall 2013 Diagnostics Plots Residual checks Formal Tests Remedial Measures Outline Topic 20 2 General assumptions Overview Normally distributed error terms Independent

More information

Assessing Model Adequacy

Assessing Model Adequacy Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for inferences. In cases where some assumptions are violated, there are

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Stat 427/527: Advanced Data Analysis I

Stat 427/527: Advanced Data Analysis I Stat 427/527: Advanced Data Analysis I Review of Chapters 1-4 Sep, 2017 1 / 18 Concepts you need to know/interpret Numerical summaries: measures of center (mean, median, mode) measures of spread (sample

More information

Matrix Approach to Simple Linear Regression: An Overview

Matrix Approach to Simple Linear Regression: An Overview Matrix Approach to Simple Linear Regression: An Overview Aspects of matrices that you should know: Definition of a matrix Addition/subtraction/multiplication of matrices Symmetric/diagonal/identity matrix

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Two-Way Factorial Designs

Two-Way Factorial Designs 81-86 Two-Way Factorial Designs Yibi Huang 81-86 Two-Way Factorial Designs Chapter 8A - 1 Problem 81 Sprouting Barley (p166 in Oehlert) Brewer s malt is produced from germinating barley, so brewers like

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

What to do if Assumptions are Violated?

What to do if Assumptions are Violated? What to do if Assumptions are Violated? Abandon simple linear regression for something else (usually more complicated). Some examples of alternative models: weighted least square appropriate model if the

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

STAT 4385 Topic 06: Model Diagnostics

STAT 4385 Topic 06: Model Diagnostics STAT 4385 Topic 06: Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 1/ 40 Outline Several Types of Residuals Raw, Standardized, Studentized

More information

STAT2012 Statistical Tests 23 Regression analysis: method of least squares

STAT2012 Statistical Tests 23 Regression analysis: method of least squares 23 Regression analysis: method of least squares L23 Regression analysis The main purpose of regression is to explore the dependence of one variable (Y ) on another variable (X). 23.1 Introduction (P.532-555)

More information

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines) Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are

More information

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent

More information

Lec 3: Model Adequacy Checking

Lec 3: Model Adequacy Checking November 16, 2011 Model validation Model validation is a very important step in the model building procedure. (one of the most overlooked) A high R 2 value does not guarantee that the model fits the data

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

STAT22200 Spring 2014 Chapter 8A

STAT22200 Spring 2014 Chapter 8A STAT22200 Spring 2014 Chapter 8A Yibi Huang May 13, 2014 81-86 Two-Way Factorial Designs Chapter 8A - 1 Problem 81 Sprouting Barley (p166 in Oehlert) Brewer s malt is produced from germinating barley,

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Q = (Y i β 0 β 1 X i1 β 2 X i2 β p 1 X i.p 1 ) 2, which in matrix notation is Q = (Y Xβ) (Y

More information

Design & Analysis of Experiments 7E 2009 Montgomery

Design & Analysis of Experiments 7E 2009 Montgomery 1 What If There Are More Than Two Factor Levels? The t-test does not directly apply ppy There are lots of practical situations where there are either more than two levels of interest, or there are several

More information

Simple Linear Regression for the Advertising Data

Simple Linear Regression for the Advertising Data Revenue 0 10 20 30 40 50 5 10 15 20 25 Pages of Advertising Simple Linear Regression for the Advertising Data What do we do with the data? y i = Revenue of i th Issue x i = Pages of Advertisement in i

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

Formulary Applied Econometrics

Formulary Applied Econometrics Department of Economics Formulary Applied Econometrics c c Seminar of Statistics University of Fribourg Formulary Applied Econometrics 1 Rescaling With y = cy we have: ˆβ = cˆβ With x = Cx we have: ˆβ

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

Remedial Measures Wrap-Up and Transformations-Box Cox

Remedial Measures Wrap-Up and Transformations-Box Cox Remedial Measures Wrap-Up and Transformations-Box Cox Frank Wood October 25, 2011 Last Class Graphical procedures for determining appropriateness of regression fit - Normal probability plot Tests to determine

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

Confidence Interval for the mean response

Confidence Interval for the mean response Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.

More information

Applied linear statistical models: An overview

Applied linear statistical models: An overview Applied linear statistical models: An overview Gunnar Stefansson 1 Dept. of Mathematics Univ. Iceland August 27, 2010 Outline Some basics Course: Applied linear statistical models This lecture: A description

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6 STA 8 Applied Linear Models: Regression Analysis Spring 011 Solution for Homework #6 6. a) = 11 1 31 41 51 1 3 4 5 11 1 31 41 51 β = β1 β β 3 b) = 1 1 1 1 1 11 1 31 41 51 1 3 4 5 β = β 0 β1 β 6.15 a) Stem-and-leaf

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information