One-way ANOVA Model Assumptions

Size: px
Start display at page:

Download "One-way ANOVA Model Assumptions"

Transcription

1 One-way ANOVA Model Assumptions STAT:5201 Week 4: Lecture 1 1 / 31

2 One-way ANOVA: Model Assumptions Consider the single factor model: Y ij = µ + α }{{} i ij iid with ɛ ij N(0, σ 2 ) mean structure random Here, the assumptions are coming from the errors: 1 Normally distributed 2 Constant variance 3 Independent We will use the estimated errors ˆɛ ij = Y ij Ŷij or residuals from the model to check the assumptions. (Though internally studentized residuals or externally studentized residuals can also be used.) 4 Adequacy of the model is about the mean structure, and here, we are fitting the most complex mean structure for this single factor study, which is a separate mean for each group, so in the full model, this shouldn t be a concern. 2 / 31

3 One-way ANOVA: Model Assumptions Consider the single factor model: Y ij = µ + α }{{} i ij iid with ɛ ij N(0, σ 2 ) mean structure random If the model is correct, our inferences are good. If the assumptions are not true, our inferences may not be valid. Confidence intervals might not cover at the stated level. p-values are not necessarily valid. Type I error rates could be larger (or smaller) than stated. Assumptions need to be checked for the validity of the tests. 3 / 31

4 One-way ANOVA: Model Assumptions Consider the single factor model: Y ij = µ + α }{{} i ij iid with ɛ ij N(0, σ 2 ) mean structure random Some procedures work reasonably well even if some of the assumptions are violated (we ll explore this for the two-sample t-test in homework). This is called robustness of validity. More mild violations are of course better than more extreme violations, with respect to validity. 4 / 31

5 One-way C(.dl~ ANOVA: Checking Constant Variance c~ VfVt/ov~ Checking constant variance Plot residuals vs. fitted values PLo-/ 01- rm/~ V5, /I/J V~. 7f f'i1 ozid! OK I s.~ 5 /'a.d~ :;ea IIPi I/;:; If the model is OK for constant variance, then this plot should show a random 1jJN'ds scattering ~~ of points ~ above ~Iow andfk- belowvp[-/-'w the reference r~ line at a horizontal 0, as on the left below. The right one shows nonconstant variance. ;;~ a/~ a. hoy'" 2-crJd 0, r. e.. ~ d.,... " 0,, " #.. ' r:,. r ',, +;.\\--J v~ '" y LlVheJ ~ /J ~ pet 4~VY':'\ (=? v"" itl VIee Jl/2 f.v.d.s 6Yl ~ <'am ) Megaphone pattern violation (variance depends on mean) 5 / 31

6 One-way ANOVA: Checking Constant Variance There are some statistical tests that will perform a hypothesis test for the equality of variances across groups. H 0 : σ 2 i is equal for all i. Levene s Test Brown-Forsythe Test (a.k.a. modified Levene s test) Example (SAS: levene s test) proc glm data=upsit plots=diagnostics; class agegroup; model smell = agegroup; means agegroup / hovtest=levene; run; For the other homogeneity of variance (HOV) tests in SAS, just google SAS HOVTEST. You can use HOVTEST=BF for brown-forsythe test. 6 / 31

7 R-Square Coeff Var Root MSE smell Mean One-way ANOVA: Checking Constant Variance Source DF Type I SS Mean Square F Value Pr > F agegroup <.0001 Example Source (SAS: DF Type III SSlevene s Mean Square F Value Pr > test) F agegroup <.0001 Distribution of smell 1.4 F Prob > F < The SAS System 12:31 Thursda smell 1.0 The GLM Procedure 0.8 Levene's Test for Homogeneity of smell Variance ANOVAof Squared Deviations from Group Means 0.6 Source DF Sum of Squares Mean Square F Value Pr > F agegroup < agegroup Error The plot suggests we have nonconstant variance, and the null hypothesis of H 0 : σ 1 = σ 2 = σ 5 is strongly rejected by Levene s test. Oehlert has some issues with using statistical tests for nonconstant variance due to sensitivity to non-normality, but you may still be asked about these tests. 7 / 31

8 One-way ANOVA: Model Adequacy Plot residuals vs. fitted values The residuals vs. fitted plot can also give you some information about the adequacy of the model in a multi-factor ANOVA (we ll see this later in multi-factor factorials, plot shown below). For instance, if you are missing an important interaction term in the mean structure, then this plot will often display a curved trend. 8 / 31

9 One-way ANOVA: Dealing with Nonconstant Variance Dealing with nonconstant variance When the variance depends on the mean (like in the megaphone pattern), the usual approach is to apply a transformation to the response variable. These are called variance-stabilizing transformations. Suppose Var(y) mean or Var(y) µ I want a transformation of y, or function h(y), such that Var[h(y)]=constant. Consider a Taylor Series expansion of h around µ 9 / 31

10 One-way ANOVA: Dealing with Nonconstant Variance We now have the first-order approximation: Var[h(y)] Var[h(µ) + h (µ)(y µ)] = [h (µ)] 2 Var(y) = c 0 [h (µ)] 2 µ {as Var(y) µ} And we want Var[h(y)] to be a constant. So, set [h (µ)] 2 µ equal to a constant and solve for h. 10 / 31

11 One-way ANOVA: Dealing with Nonconstant Variance Setting equal to a constant and solving for the unknown h: [h (µ)] 2 µ =constant constant h (µ) = µ 1 h(µ) =c 1 dµ µ 1/2 h(µ) =c 2 µ So, if Var(y) µ, then use a square root transformation to achieve constant variance. 11 / 31

12 One-way ANOVA: Dealing with Nonconstant Variance This is built on the Delta Method. If Var(y) mean, use h(y) = y. If Var(y) mean 2, use h(y) = ln(y) Many times it s hard to tell from the data what the specific relationship between the variance and the mean is, so a trial-and-error process is applied. Other possibilities if the spread increases with µ: h(y) = 1 y. h(y) = log 10 (y) Other possibilities if the spread decreases with µ: h(y) = y 2. h(y) = y / 31

13 One-way ANOVA: Dealing with Nonconstant Variance The Box-Cox procedure chooses a transformation based on the observed data. The λ parameter dictated the transformation. The following form for the transformation is suggested to create the new outcome variable y (λ) : y (λ) = { y λ 1 λ when λ 0 log(y) when λ = 0 Though λ is continuous, in practice we usually use a convenient λ that is near to the optimal, like 0.5 or 0, etc. 13 / 31

14 One-way ANOVA: Dealing with Nonconstant Variance Example (SAS: Box-Cox for 2-factor factorial, perceived as single factor or a superfactor. Weeks has 5 levels, Water has 2 levels.) proc transreg data=germ; model boxcox(germination)=class(superfactor); run; A transformation using λ = 0.25 is suggested, but the convenient λ = 0.5 looks to be in the confidence interval (we will check constant variance after the transformation). 14 / 31

15 One-way ANOVA: Dealing with Nonconstant Variance Example (R: Box-Cox for 2-factor factorial, perceived as single factor or a superfactor. Weeks has 5 levels, Water has 2 levels.) > library(mass) > bc.fit <- boxcox(germination as.factor(superfactor)) > bc.fit 15 / 31

16 One-way ANOVA: Dealing with Nonconstant Variance So, if we know the proportional relationship between the variance and the mean, then we can analytically find an appropriate transformation to achieve constant variance (or near constant variance). When we do not know the relationship, then we can apply the Box-Cox transformation to give us a suggestion of an appropriate transformation. In practice, if I observe a nonconstant variance that can be corrected through transformation (not all of them are), I mostly see variance increasing with the mean, and I just try a square-root or log transformation right away (or log(y + 1) if there are zeros.). REMINDER: In a two-sample t-test with nonconstant variance, we do have an availalbe method called Welch s Approximate t or or Welch-Satterthwaite t. 16 / 31

17 One-way ANOVA: Checking Normality Checking Normality This is usually done by plotting a normal probability plot or normal QQ-plot. If the data were generated from a normal distribution, then the normal probability plot will show the data points falling approximately along the diagonal reference line (this is not a best-fit line, it simply connects the 25th and 75th percentile points). 17 / 31

18 One-way ANOVA: Checking Normality Chi-squared, right-skewed Chi-squared (df=2) Normal Q-Q Plot Frequency Sample Quantiles rchisq(1000, 2) Theoretical Quantiles Uniform, thin tails Uniform(0,1) Normal Q-Q Plot Frequency Sample Quantiles y Theoretical Quantiles 18 / 31

19 One-way ANOVA: Checking Normality There are a number of statistical tests that test for non-normality: Anderson-Darling test Shapiro-Wilk test Many others One issue with normality tests is that as your N gets larger, you start to get a lot of power for detecting very small deviations from normality. In small samples, you ll probably never reject. In practice, I feel like the visual normal probability plot is most useful. But clients will commonly ask how to perform certain tests, such as these, in software. 19 / 31

20 One-way ANOVA: Dealing with Non-normality Dealing with Non-normality Try a transformation. If there s an outlier that a transformation does not fix, do a sensitivity analysis where you perform the analysis with and without the outlier. These can both be reported. If the important items do not change (e.g. significance) the outlier is perhaps not a big issue. DO NOT simply remove an outlier because it is unusual!! Not OK. (See link on webpage). And in fact, that data point could be telling you something very interesting. Try a non-parametric test: Randomization test (pretty general) Wicoxon rank-sum test/mann-whitney test (for 2-sample t-test) Wilcoxon signed-rank test (for paired t-test) Kruskal-Wallis test (for 1-way ANOVA, extends Mann-Whitney test) Freidman test (for RCBD) 20 / 31

21 One-way ANOVA: Checking Independence Checking Independence Many times, a client brings the data to you and you have to rely on their description of the data collection, and that independence holds. Ideally, it should be part of the design in terms of randomly assigning treatments to EUs and randomly assigning the order of the runs. If you happen to know the order in which data were collected, or the time the observations were collected, it s a good idea to check for correlation in the residuals with respect to order or time (e.g. plot resids vs. time and/or resids vs. order). 21 / 31

22 One-way ANOVA: Checking Independence If a pattern appears in the plots, then these items are sources of variation that can be added to the model (though it makes the model a bit more complex). Above we see sequential observations over time, and an observable trend. 22 / 31

23 One-way ANOVA: Checking Independence The Durbin-Watson test statistic can be used to check for time dependence or serial dependence. The residuals e i are used to calculate DW: DW = n 1 i (e i e i 1 ) 2 n i e 2 i Independent data tend to have DW around 2. A positive correlation makes DW smaller and negative correlation makes it bigger. If DW gets as low as 1.5 or as high as 2.5, you should start worrying about time correlation and it s affect on the inference. 23 / 31

24 One-way ANOVA: Checking Independence Example (Checking residuals for time correlation) The SAS data set diags contains a vector of residuals under the column name resid. Below, nothing is listed as a predictor in the model statement because only an intercept is used. proc reg data=diags; model resid= /DW; The SAS System 14:15 Thursday, February 1, run; The REG Procedure Model: MODEL1 Dependent Variable: Resid Residual Durbin-Watson D Number of Observations 48 1st Order Autocorrelation An approximate 95% CI for ρ is ± 2 1 n or ± 0.289, so there appears to be correlation in the residuals over time. 24 / 31

25 One-way ANOVA: Dealing with Dependence Another type of correlation is a spatial correlation, which could be checked using a variogram. If there is some kind of non-independence, we should incorporate this into our model. Perhaps there is time-correlation or spatial-correlation, and we have models that will incorporate this correlation structure. If there are repeated measures on a single subject, then this also represents correlation (within the observations on a person), and we can incorporate that into our model. If the residuals are non-independent because you were missing an important factor, then we can include that factor in the model. 25 / 31

26 Importance of Assumption Violations Biggest Issue: Non-independence Standard errors for treatments can be biased, and this can greatly affect the type I error rate of our test. Next Biggest Issue: Nonconstant variance If you have balanced data, then the affect on p-values is potentially small. For unbalanced data, the error rates can be greatly affected. Smallest Issue: Non-normality If you have moderate non-normality, the p-values are only slightly affected. If it s very non-normal, inference can be affected. Similarly, one very strong outlier can greatly affect the results. Again, this will have the least impact on error rates in balanced data. 26 / 31

27 Checking Assumptions: Example Example (Response time for circuit types) Returning to our previous 1-way ANOVA example to check assumptions... Three different types of circuit are investigated for response time in milliseconds. Fifteen are completed in a balanced CRD with the single factor of Type (1,2,3). Circuit Type Response Time From D.C Montgomery (2005). Design and Analysis of Experiments. Wiley:USA 27 / 31

28 Checking Assumptions: Example Example (Response time for circuit types) Normality looks violated. 28 / 31

29 Checking Assumptions: Example Example (Response time for circuit types) We ll apply the natural log-transformation and perform the 1-way ANVOA on the transformed response. 29 / 31

30 Checking Assumptions: Example Example (Response time for circuit types) Constant variance seems to be worse here. We will go back and try a nonparametric test on the original data. 30 / 31

31 Checking Assumptions: Example Example (Response time for circuit types) Perform a 1-way ANOVA using a nonparametric test: Kruskal-Wallis. Kruskal-Wallis Test Chi-Square DF 2 Asymptotic Pr > Chi-Square Exact Pr >= Chi-Square Reject H 0 : α i = 0 for all i, where H A : at least one group different 31 / 31

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013 Topic 20 - Diagnostics and Remedies - Fall 2013 Diagnostics Plots Residual checks Formal Tests Remedial Measures Outline Topic 20 2 General assumptions Overview Normally distributed error terms Independent

More information

Week 7.1--IES 612-STA STA doc

Week 7.1--IES 612-STA STA doc Week 7.1--IES 612-STA 4-573-STA 4-576.doc IES 612/STA 4-576 Winter 2009 ANOVA MODELS model adequacy aka RESIDUAL ANALYSIS Numeric data samples from t populations obtained Assume Y ij ~ independent N(μ

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

One-way ANOVA (Single-Factor CRD)

One-way ANOVA (Single-Factor CRD) One-way ANOVA (Single-Factor CRD) STAT:5201 Week 3: Lecture 3 1 / 23 One-way ANOVA We have already described a completed randomized design (CRD) where treatments are randomly assigned to EUs. There is

More information

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS Ravinder Malhotra and Vipul Sharma National Dairy Research Institute, Karnal-132001 The most common use of statistics in dairy science is testing

More information

Nonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown

Nonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown Nonparametric Statistics Leah Wright, Tyler Ross, Taylor Brown Before we get to nonparametric statistics, what are parametric statistics? These statistics estimate and test population means, while holding

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice

The Model Building Process Part I: Checking Model Assumptions Best Practice The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Lecture 4. Checking Model Adequacy

Lecture 4. Checking Model Adequacy Lecture 4. Checking Model Adequacy Montgomery: 3-4, 15-1.1 Page 1 Model Checking and Diagnostics Model Assumptions 1 Model is correct 2 Independent observations 3 Errors normally distributed 4 Constant

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Topic 23: Diagnostics and Remedies

Topic 23: Diagnostics and Remedies Topic 23: Diagnostics and Remedies Outline Diagnostics residual checks ANOVA remedial measures Diagnostics Overview We will take the diagnostics and remedial measures that we learned for regression and

More information

df=degrees of freedom = n - 1

df=degrees of freedom = n - 1 One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 49 Outline 1 How to check assumptions 2 / 49 Assumption Linearity: scatter plot, residual plot Randomness: Run test, Durbin-Watson test when the data can

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

Non-parametric (Distribution-free) approaches p188 CN

Non-parametric (Distribution-free) approaches p188 CN Week 1: Introduction to some nonparametric and computer intensive (re-sampling) approaches: the sign test, Wilcoxon tests and multi-sample extensions, Spearman s rank correlation; the Bootstrap. (ch14

More information

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression

More information

IX. Complete Block Designs (CBD s)

IX. Complete Block Designs (CBD s) IX. Complete Block Designs (CBD s) A.Background Noise Factors nuisance factors whose values can be controlled within the context of the experiment but not outside the context of the experiment Covariates

More information

SEVERAL μs AND MEDIANS: MORE ISSUES. Business Statistics

SEVERAL μs AND MEDIANS: MORE ISSUES. Business Statistics SEVERAL μs AND MEDIANS: MORE ISSUES Business Statistics CONTENTS Post-hoc analysis ANOVA for 2 groups The equal variances assumption The Kruskal-Wallis test Old exam question Further study POST-HOC ANALYSIS

More information

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal Hypothesis testing, part 2 With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal 1 CATEGORICAL IV, NUMERIC DV 2 Independent samples, one IV # Conditions Normal/Parametric Non-parametric

More information

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I 1 / 16 Nonparametric tests Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I Nonparametric one and two-sample tests 2 / 16 If data do not come from a normal

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Analysis of variance and regression. April 17, Contents Comparison of several groups One-way ANOVA. Two-way ANOVA Interaction Model checking

Analysis of variance and regression. April 17, Contents Comparison of several groups One-way ANOVA. Two-way ANOVA Interaction Model checking Analysis of variance and regression Contents Comparison of several groups One-way ANOVA April 7, 008 Two-way ANOVA Interaction Model checking ANOVA, April 008 Comparison of or more groups Julie Lyng Forman,

More information

Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes)

Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes) Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes) Asheber Abebe Discrete and Statistical Sciences Auburn University Contents 1 Completely Randomized Design

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Introduction to Crossover Trials

Introduction to Crossover Trials Introduction to Crossover Trials Stat 6500 Tutorial Project Isaac Blackhurst A crossover trial is a type of randomized control trial. It has advantages over other designed experiments because, under certain

More information

Answer Keys to Homework#10

Answer Keys to Homework#10 Answer Keys to Homework#10 Problem 1 Use either restricted or unrestricted mixed models. Problem 2 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Diagnostics and Remedial Measures: An Overview

Diagnostics and Remedial Measures: An Overview Diagnostics and Remedial Measures: An Overview Residuals Model diagnostics Graphical techniques Hypothesis testing Remedial measures Transformation Later: more about all this for multiple regression W.

More information

Analysis of variance. April 16, Contents Comparison of several groups

Analysis of variance. April 16, Contents Comparison of several groups Contents Comparison of several groups Analysis of variance April 16, 2009 One-way ANOVA Two-way ANOVA Interaction Model checking Acknowledgement for use of presentation Julie Lyng Forman, Dept. of Biostatistics

More information

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Analysis of variance. April 16, 2009

Analysis of variance. April 16, 2009 Analysis of variance April 16, 2009 Contents Comparison of several groups One-way ANOVA Two-way ANOVA Interaction Model checking Acknowledgement for use of presentation Julie Lyng Forman, Dept. of Biostatistics

More information

Non-parametric tests, part A:

Non-parametric tests, part A: Two types of statistical test: Non-parametric tests, part A: Parametric tests: Based on assumption that the data have certain characteristics or "parameters": Results are only valid if (a) the data are

More information

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

unadjusted model for baseline cholesterol 22:31 Monday, April 19, unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

CHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC

CHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC CHI SQUARE ANALYSIS I N T R O D U C T I O N T O N O N - P A R A M E T R I C A N A L Y S E S HYPOTHESIS TESTS SO FAR We ve discussed One-sample t-test Dependent Sample t-tests Independent Samples t-tests

More information

Stat 427/527: Advanced Data Analysis I

Stat 427/527: Advanced Data Analysis I Stat 427/527: Advanced Data Analysis I Review of Chapters 1-4 Sep, 2017 1 / 18 Concepts you need to know/interpret Numerical summaries: measures of center (mean, median, mode) measures of spread (sample

More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

Selection should be based on the desired biological interpretation!

Selection should be based on the desired biological interpretation! Statistical tools to compare levels of parasitism Jen_ Reiczigel,, Lajos Rózsa Hungary What to compare? The prevalence? The mean intensity? The median intensity? Or something else? And which statistical

More information

4.1. Introduction: Comparing Means

4.1. Introduction: Comparing Means 4. Analysis of Variance (ANOVA) 4.1. Introduction: Comparing Means Consider the problem of testing H 0 : µ 1 = µ 2 against H 1 : µ 1 µ 2 in two independent samples of two different populations of possibly

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6 STA 8 Applied Linear Models: Regression Analysis Spring 011 Solution for Homework #6 6. a) = 11 1 31 41 51 1 3 4 5 11 1 31 41 51 β = β1 β β 3 b) = 1 1 1 1 1 11 1 31 41 51 1 3 4 5 β = β 0 β1 β 6.15 a) Stem-and-leaf

More information

Lecture 11: Simple Linear Regression

Lecture 11: Simple Linear Regression Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink

More information

3. Nonparametric methods

3. Nonparametric methods 3. Nonparametric methods If the probability distributions of the statistical variables are unknown or are not as required (e.g. normality assumption violated), then we may still apply nonparametric tests

More information

Advanced Regression Topics: Violation of Assumptions

Advanced Regression Topics: Violation of Assumptions Advanced Regression Topics: Violation of Assumptions Lecture 7 February 15, 2005 Applied Regression Analysis Lecture #7-2/15/2005 Slide 1 of 36 Today s Lecture Today s Lecture rapping Up Revisiting residuals.

More information

Tentative solutions TMA4255 Applied Statistics 16 May, 2015

Tentative solutions TMA4255 Applied Statistics 16 May, 2015 Norwegian University of Science and Technology Department of Mathematical Sciences Page of 9 Tentative solutions TMA455 Applied Statistics 6 May, 05 Problem Manufacturer of fertilizers a) Are these independent

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Topic 2. Chapter 3: Diagnostics and Remedial Measures

Topic 2. Chapter 3: Diagnostics and Remedial Measures Topic Overview This topic will cover Regression Diagnostics Remedial Measures Statistics 512: Applied Linear Models Some other Miscellaneous Topics Topic 2 Chapter 3: Diagnostics and Remedial Measures

More information

Assignment 9 Answer Keys

Assignment 9 Answer Keys Assignment 9 Answer Keys Problem 1 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean 26.00 + 34.67 + 39.67 + + 49.33 + 42.33 + + 37.67 + + 54.67

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

Statistics for exp. medical researchers Comparison of groups, T-tests and ANOVA

Statistics for exp. medical researchers Comparison of groups, T-tests and ANOVA Faculty of Health Sciences Outline Statistics for exp. medical researchers Comparison of groups, T-tests and ANOVA Lene Theil Skovgaard Sept. 14, 2015 Paired comparisons: tests and confidence intervals

More information

appstats27.notebook April 06, 2017

appstats27.notebook April 06, 2017 Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves

More information

Assessing Model Adequacy

Assessing Model Adequacy Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for inferences. In cases where some assumptions are violated, there are

More information

Inferences About the Difference Between Two Means

Inferences About the Difference Between Two Means 7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent

More information

Model Checking and Improvement

Model Checking and Improvement Model Checking and Improvement Statistics 220 Spring 2005 Copyright c 2005 by Mark E. Irwin Model Checking All models are wrong but some models are useful George E. P. Box So far we have looked at a number

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Dr. Bob Gee Dean Scott Bonney Professor William G. Journigan American Meridian University 1 Learning Objectives Upon successful completion of this module, the student should

More information

One-Way ANOVA Cohen Chapter 12 EDUC/PSY 6600

One-Way ANOVA Cohen Chapter 12 EDUC/PSY 6600 One-Way ANOVA Cohen Chapter 1 EDUC/PSY 6600 1 It is easy to lie with statistics. It is hard to tell the truth without statistics. -Andrejs Dunkels Motivating examples Dr. Vito randomly assigns 30 individuals

More information

Diagnostics and Remedial Measures

Diagnostics and Remedial Measures Diagnostics and Remedial Measures Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Diagnostics and Remedial Measures 1 / 72 Remedial Measures How do we know that the regression

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous

More information

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as ST 51, Summer, Dr. Jason A. Osborne Homework assignment # - Solutions 1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available

More information

PLS205 Lab 2 January 15, Laboratory Topic 3

PLS205 Lab 2 January 15, Laboratory Topic 3 PLS205 Lab 2 January 15, 2015 Laboratory Topic 3 General format of ANOVA in SAS Testing the assumption of homogeneity of variances by "/hovtest" by ANOVA of squared residuals Proc Power for ANOVA One-way

More information

Unbalanced Data in Factorials Types I, II, III SS Part 1

Unbalanced Data in Factorials Types I, II, III SS Part 1 Unbalanced Data in Factorials Types I, II, III SS Part 1 Chapter 10 in Oehlert STAT:5201 Week 9 - Lecture 2 1 / 14 When we perform an ANOVA, we try to quantify the amount of variability in the data accounted

More information

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests Chapters 3.5.1 3.5.2, 3.3.2 Prof. Tesler Math 283 Fall 2018 Prof. Tesler z and t tests for mean Math

More information

Handling Categorical Predictors: ANOVA

Handling Categorical Predictors: ANOVA Handling Categorical Predictors: ANOVA 1/33 I Hate Lines! When we think of experiments, we think of manipulating categories Control, Treatment 1, Treatment 2 Models with Categorical Predictors still reflect

More information

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19 additive tree structure, 10-28 ADDTREE, 10-51, 10-53 EXTREE, 10-31 four point condition, 10-29 ADDTREE, 10-28, 10-51, 10-53 adjusted R 2, 8-7 ALSCAL, 10-49 ANCOVA, 9-1 assumptions, 9-5 example, 9-7 MANOVA

More information

Types of Statistical Tests DR. MIKE MARRAPODI

Types of Statistical Tests DR. MIKE MARRAPODI Types of Statistical Tests DR. MIKE MARRAPODI Tests t tests ANOVA Correlation Regression Multivariate Techniques Non-parametric t tests One sample t test Independent t test Paired sample t test One sample

More information

Outline. Analysis of Variance. Acknowledgements. Comparison of 2 or more groups. Comparison of serveral groups

Outline. Analysis of Variance. Acknowledgements. Comparison of 2 or more groups. Comparison of serveral groups Outline Analysis of Variance Analysis of variance and regression course http://staff.pubhealth.ku.dk/~lts/regression10_2/index.html Comparison of serveral groups Model checking Marc Andersen, mja@statgroup.dk

More information

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3 Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3 Page 1 Tensile Strength Experiment Investigate the tensile strength of a new synthetic fiber. The factor is the weight percent

More information

Degrees of freedom df=1. Limitations OR in SPSS LIM: Knowing σ and µ is unlikely in large

Degrees of freedom df=1. Limitations OR in SPSS LIM: Knowing σ and µ is unlikely in large Z Test Comparing a group mean to a hypothesis T test (about 1 mean) T test (about 2 means) Comparing mean to sample mean. Similar means = will have same response to treatment Two unknown means are different

More information

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis PLS205!! Lab 9!! March 6, 2014 Topic 13: Covariance Analysis Covariable as a tool for increasing precision Carrying out a full ANCOVA Testing ANOVA assumptions Happiness! Covariable as a Tool for Increasing

More information

Chapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics

Chapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics Understand Difference between Parametric and Nonparametric Statistical Procedures Parametric statistical procedures inferential procedures that rely

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

What is a Hypothesis?

What is a Hypothesis? What is a Hypothesis? A hypothesis is a claim (assumption) about a population parameter: population mean Example: The mean monthly cell phone bill in this city is μ = $42 population proportion Example:

More information

Rank-Based Methods. Lukas Meier

Rank-Based Methods. Lukas Meier Rank-Based Methods Lukas Meier 20.01.2014 Introduction Up to now we basically always used a parametric family, like the normal distribution N (µ, σ 2 ) for modeling random data. Based on observed data

More information

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01 An Analysis of College Algebra Exam s December, 000 James D Jones Math - Section 0 An Analysis of College Algebra Exam s Introduction Students often complain about a test being too difficult. Are there

More information

Lec 3: Model Adequacy Checking

Lec 3: Model Adequacy Checking November 16, 2011 Model validation Model validation is a very important step in the model building procedure. (one of the most overlooked) A high R 2 value does not guarantee that the model fits the data

More information

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. Preface p. xi Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. 6 The Scientific Method and the Design of

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

My data doesn t look like that..

My data doesn t look like that.. Testing assumptions My data doesn t look like that.. We have made a big deal about testing model assumptions each week. Bill Pine Testing assumptions Testing assumptions We have made a big deal about testing

More information

STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis

STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis Rebecca Barter April 6, 2015 Multiple Testing Multiple Testing Recall that when we were doing two sample t-tests, we were testing the equality

More information

Outline. Analysis of Variance. Comparison of 2 or more groups. Acknowledgements. Comparison of serveral groups

Outline. Analysis of Variance. Comparison of 2 or more groups. Acknowledgements. Comparison of serveral groups Outline Analysis of Variance Analysis of variance and regression course http://staff.pubhealth.ku.dk/~jufo/varianceregressionf2011.html Comparison of serveral groups Model checking Marc Andersen, mja@statgroup.dk

More information

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10) Name Economics 170 Spring 2004 Honor pledge: I have neither given nor received aid on this exam including the preparation of my one page formula list and the preparation of the Stata assignment for the

More information

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

More information

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y

More information

Stat 500 Midterm 2 12 November 2009 page 0 of 11

Stat 500 Midterm 2 12 November 2009 page 0 of 11 Stat 500 Midterm 2 12 November 2009 page 0 of 11 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. Do not start until I tell you to. The exam is closed book, closed

More information

Analysis of 2x2 Cross-Over Designs using T-Tests

Analysis of 2x2 Cross-Over Designs using T-Tests Chapter 234 Analysis of 2x2 Cross-Over Designs using T-Tests Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Analysis of Variance (ANOVA) Cancer Research UK 10 th of May 2018 D.-L. Couturier / R. Nicholls / M. Fernandes

Analysis of Variance (ANOVA) Cancer Research UK 10 th of May 2018 D.-L. Couturier / R. Nicholls / M. Fernandes Analysis of Variance (ANOVA) Cancer Research UK 10 th of May 2018 D.-L. Couturier / R. Nicholls / M. Fernandes 2 Quick review: Normal distribution Y N(µ, σ 2 ), f Y (y) = 1 2πσ 2 (y µ)2 e 2σ 2 E[Y ] =

More information

Analysis of Variance

Analysis of Variance 1 / 70 Analysis of Variance Analysis of variance and regression course http://staff.pubhealth.ku.dk/~lts/regression11_2 Marc Andersen, mja@statgroup.dk Analysis of variance and regression for health researchers,

More information

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă HYPOTHESIS TESTING II TESTS ON MEANS Sorana D. Bolboacă OBJECTIVES Significance value vs p value Parametric vs non parametric tests Tests on means: 1 Dec 14 2 SIGNIFICANCE LEVEL VS. p VALUE Materials and

More information

The ε ij (i.e. the errors or residuals) are normally distributed. This assumption has the least influence on the F test.

The ε ij (i.e. the errors or residuals) are normally distributed. This assumption has the least influence on the F test. Lecture 11 Topic 8: Data Transformations Assumptions of the Analysis of Variance 1. Independence of errors The ε ij (i.e. the errors or residuals) are statistically independent from one another. Failure

More information

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants.

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants. The idea of ANOVA Reminders: A factor is a variable that can take one of several levels used to differentiate one group from another. An experiment has a one-way, or completely randomized, design if several

More information

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests PSY 307 Statistics for the Behavioral Sciences Chapter 20 Tests for Ranked Data, Choosing Statistical Tests What To Do with Non-normal Distributions Tranformations (pg 382): The shape of the distribution

More information