Week 13: Comparing Two Populations, Part II
Outline: The Rank-Sum Test Procedure; Paired Data; Comparing Two Variances; Lab 8: Hypothesis Testing with R
- Florence Cameron
1 Week 13 Comparing Two Populations, Part II
2 Week 13 Objectives Coverage of the topic of comparing two populations continues with new procedures and a new sampling design. The week concludes with a lab session. In particular:
1 The rank-sum test is presented.
2 The concept of paired data is introduced, and the paired-data T-test, the signed-rank test, and McNemar's test are described.
3 Levene's test and the F-test for comparing two variances are presented.
4 The lab session demonstrates the R implementation of test procedures for one-sample, two-sample, and regression problems, including checking the validity of assumptions.
4 Motivation If the sample sizes are small and the populations are non-normal, the T test is not valid. The Mann-Whitney-Wilcoxon rank-sum test (or rank-sum test for short), described next, can be used with both small and large sample sizes. If the two populations are continuous, the null distribution of the test statistic (TS) is known even for very small sample sizes. For discrete populations, the null distribution of the TS is well approximated with much smaller sample sizes than those required by contrast-based procedures. The rank-sum test has high power, especially if the two population distributions are heavy-tailed or skewed.
5 The Null Hypothesis and the TS The rank-sum procedure tests $H_0^F: F_1 = F_2$. Let $R_{ij}$ denote the (mid-)rank of observation $X_{ij}$ in the combined set of $N = n_1 + n_2$ observations, and set
$$W_1 = \sum_{j=1}^{n_1} R_{1j}, \quad \bar{R}_1 = \frac{W_1}{n_1}, \quad W_2 = \sum_{j=1}^{n_2} R_{2j}, \quad \bar{R}_2 = \frac{W_2}{n_2}.$$
Then the Mann-Whitney-Wilcoxon TS is
$$\bar{R}_1 - \bar{R}_2 = \frac{N}{n_1 n_2}\left(W_1 - n_1\,\frac{N+1}{2}\right),$$
or simply $W_1$.
6 The Standardized Rank-Sum TS and RR If there are no ties,
$$Z_{H_0} = \frac{W_1 - n_1 (N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}.$$
If $H_0^F$ holds, $Z_{H_0}$ is approximately $N(0, 1)$ for $n_1, n_2 > 8$. The rejection regions (RR) are:
H_a                  Rejection region at level α
µ1 − µ2 > 0          Z_{H0} > z_α
µ1 − µ2 < 0          Z_{H0} < −z_α
µ1 − µ2 ≠ 0          |Z_{H0}| > z_{α/2}
7 Example Data on sputum histamine levels from 9 allergic and 13 non-allergic individuals are given in edu/acq/401/data/histamindata.txt. Is there a difference between the two populations? Test at α = 0.01.
Solution. Here $R_{11} = 18$, $R_{12} = 11$, $R_{13} = 22$, $R_{14} = 19$, $R_{15} = 17$, $R_{16} = 21$, $R_{17} = 7$, $R_{18} = 20$, $R_{19} = 16$. Thus $W_1 = \sum_j R_{1j} = 151$ and
$$Z_{H_0} = \frac{151 - 9(23)/2}{\sqrt{9(13)(23)/12}} = 3.17.$$
Since $n_1, n_2 > 8$, the p-value is $2[1 - \Phi(3.17)] \approx 0.0015$. Thus the difference is significant at α = 0.01.
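The arithmetic in this example can be checked in R directly from the nine ranks listed in the solution. This is a minimal sketch: the variable names are illustrative, and only the ranks and the two sample sizes are taken from the example.

```r
# Ranks of the 9 allergic observations in the combined sample (from the example)
r1 <- c(18, 11, 22, 19, 17, 21, 7, 20, 16)
n1 <- length(r1)   # 9 allergic individuals
n2 <- 13           # non-allergic individuals
N  <- n1 + n2

W1 <- sum(r1)                                            # rank-sum statistic
z  <- (W1 - n1 * (N + 1) / 2) / sqrt(n1 * n2 * (N + 1) / 12)
p  <- 2 * (1 - pnorm(abs(z)))                            # two-sided p-value, normal approximation
```

With the raw data in two vectors, wilcox.test(x, y) would carry out the same test without computing the ranks by hand.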
8 Effect of Outliers In the above example, the t test does not reject at level [...]. With the data in the data frame hi,
    t.test(hi$level ~ hi$sample)
gives a p-value of 0.13, and
    t.test(hi$level ~ hi$sample, var.equal=TRUE)
gives a p-value of [...]. In general, using a procedure when its underlying assumptions are violated will give misleading results.
9 Introduction, Motivation Paired data arise when each experimental unit receives each of the two treatments that are being compared. Examples:
1 Compare the durability of two types of tires.
2 Compare two labs for the analysis of mercury content.
3 Two acne treatments, two cataract treatments, etc.
Paired data are of the form $(X_{11}, X_{21}), \ldots, (X_{1n}, X_{2n})$. CIs and the TS are again based on $\bar{X}_1 - \bar{X}_2$, but now $\bar{X}_1$ and $\bar{X}_2$ are not independent, so the previous formulas do not apply. For example,
$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2} - 2\,\mathrm{Cov}(\bar{X}_1, \bar{X}_2).$$
Similarly, the rank-sum test is not valid for paired data.
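10 The paired data T-test While $\mathrm{Cov}(\bar{X}_1, \bar{X}_2)$ can be estimated, it is easier to use the differences
$$D_1 = X_{11} - X_{21}, \ \ldots, \ D_n = X_{1n} - X_{2n}.$$
$D_1, \ldots, D_n$ are independent, and $\bar{D} = \bar{X}_1 - \bar{X}_2$. Thus $\sigma^2_{\bar{D}} = \sigma^2_{\bar{X}_1 - \bar{X}_2}$ can be estimated by $\hat{\sigma}^2_{\bar{D}} = S_D^2 / n$, where
$$S_D^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} D_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} D_i\right)^2\right].$$
CIs and testing are based on the fact that
$$\frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} \sim T_{n-1}$$
if normality holds, or approximately if $n \ge 30$.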
11 Example A total of 12 water samples are analyzed for mercury content by labs A and B. The paired data yield $\bar{D} = \bar{X}_1 - \bar{X}_2 = [...]$ and $S_D = [...]$. Does lab B give, on average, higher concentration results than lab A? Test at α = 0.05.
Solution. Here $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 < 0$. Because $n < 30$, we must assume normality. Doing so we have
$$T_{H_0} = \frac{\bar{D}}{S_D / \sqrt{n}} = \frac{[...]}{[...] / \sqrt{12}} = [...].$$
Since $T_{H_0} < -t_{.05,11} = -1.796$, $H_0$ is rejected.
12 It is important to be able to recognize paired data. For example, a study was conducted to see whether two cars, A and B, having very different wheel bases and turning radii, took the same time to parallel park. Seven drivers were randomly obtained, and the time required for each of them to parallel park each of the two cars was measured. Because every driver parks both cars, the two samples are paired. The results are as follows: Driver | Car A | Car B [...]
13 The Signed-Rank test
1 Rank the absolute differences $|D_1|, \ldots, |D_n|$ from smallest to largest. Let $R_i$ denote the rank of $|D_i|$.
2 Assign to $R_i$ the sign of $D_i$, thus forming signed ranks.
3 Let $S_+$ be the sum of the ranks $R_i$ with positive sign, i.e. the sum of the positive signed ranks.
If $H_0$ holds, $\mu_{S_+} = \frac{n(n+1)}{4}$ and $\sigma^2_{S_+} = \frac{n(n+1)(2n+1)}{24}$. If $H_0$ holds and $n > 10$, $S_+$ is approximately $N(\mu_{S_+}, \sigma^2_{S_+})$. The TS for testing $H_0: \mu_D = 0$ is
$$Z_{H_0} = \left(S_+ - \frac{n(n+1)}{4}\right) \Big/ \sqrt{\frac{n(n+1)(2n+1)}{24}}.$$
The RRs are the usual RRs of a Z-test.
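The three steps and the standardized statistic can be sketched in R as follows. The differences in d are hypothetical values made up for illustration (they are not the mercury data), and the variable names are likewise illustrative.

```r
# Hypothetical paired differences (illustration only, not the lab data)
d <- c(-1.2, 0.4, -2.5, -0.9, 1.1, -3.0, -0.3, -1.8, 0.7, -2.2, -1.5, -0.6)
n <- length(d)

r      <- rank(abs(d))      # step 1: ranks of |D_i| (mid-ranks if there are ties)
s_plus <- sum(r[d > 0])     # steps 2-3: sum of the ranks carrying a positive sign

mu_s  <- n * (n + 1) / 4                      # null mean of S+
var_s <- n * (n + 1) * (2 * n + 1) / 24       # null variance of S+
z     <- (s_plus - mu_s) / sqrt(var_s)        # standardized TS
p_less <- pnorm(z)                            # p-value for H_a: mu_D < 0

# Built-in signed-rank test; its statistic V is the same S+
wt <- wilcox.test(d, alternative = "less")
```

For a small sample without ties, wilcox.test uses the exact null distribution of S+ rather than the normal approximation, so its p-value need not equal p_less.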
14 Example (Mercury concentrations from Labs A and B) The 12 differences $D_i$ and the ranks $R_i$ of their absolute values are given in the table below. Test $H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 < 0$ at α = [...].
D_i | R_i [...]
Solution: Here $S_+ = [...] = 13$. With $n = 12$, $n(n+1)/4 = 39$ and $n(n+1)(2n+1)/24 = 162.5$, so
$$Z_{H_0} = \frac{13 - 39}{\sqrt{162.5}} = -2.04,$$
with p-value $\Phi(-2.04) = 0.0207$. Setting the differences in the object d, e.g., d=c(...), the command
    wilcox.test(d, alternative="less")
returns a p-value of [...].
15 Two Proportions with paired data Here each pair $(X_{1j}, X_{2j})$ can be (1, 1), (1, 0), (0, 1), or (0, 0). As an example, if n voters are asked, both before and after a presidential speech, whether or not they support a certain policy, then $X_{1j} = 1$ or 0 according to whether the jth voter does or does not support it before the speech, and $X_{2j} = 1$ or 0 according to whether the same voter does or does not support it after the speech. Typically, however, the pairs $(X_{1j}, X_{2j})$ are not given. Instead, the data are presented in the following table format.
16
                After
                1      0
Before    1     Y1     Y2
          0     Y3     Y4
Y1 is the number of (1, 1) pairs, Y2 is the number of (1, 0) pairs, Y3 is the number of (0, 1) pairs, Y4 is the number of (0, 0) pairs, and Y1 + Y2 + Y3 + Y4 = n.
17 A variation of the T statistic, used only for testing $H_0: p_1 - p_2 = 0$, is the McNemar test statistic:
$$MN = \frac{Y_2 - Y_3}{\sqrt{Y_2 + Y_3}}.$$
This is referred to the N(0, 1) distribution, so the RR for $H_a: p_1 > p_2$ is $MN > z_\alpha$, and similarly for the other alternatives. R uses the square of MN and refers it to a $\chi^2_1$ distribution. In this form only $H_a: p_1 \ne p_2$ can be tested, with p-value 1-pchisq(MN^2, 1).
18 Example (McNemar's test) Data on approval of the President's performance in office in two surveys, one month apart, of 1600 voting-age Americans, give Y1 = 794, Y2 = 150, Y3 = 86, Y4 = 570. Is there evidence, at α = 0.05, of a shift in public opinion? Report the p-value.
Solution. Here
$$MN = \frac{150 - 86}{\sqrt{150 + 86}} = 4.166.$$
Since $|MN| > z_{0.025} = 1.96$, we conclude that there is evidence of a shift in public opinion. The R command 2*(1-pnorm(4.166)) returns a p-value of 3.10e-05.
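The example can be reproduced in R, both through the Z form and through the chi-square form that R's mcnemar.test uses. In this sketch the table layout (first survey in rows, second in columns) and the variable names are assumptions made for illustration; the four counts are those of the example.

```r
# Counts from the example: rows = first survey (1, 0), columns = second survey (1, 0)
tab <- matrix(c(794, 150,
                 86, 570), nrow = 2, byrow = TRUE,
              dimnames = list(Before = c("1", "0"), After = c("1", "0")))

y2 <- tab[1, 2]                          # number of (1, 0) pairs
y3 <- tab[2, 1]                          # number of (0, 1) pairs
mn <- (y2 - y3) / sqrt(y2 + y3)          # McNemar statistic, Z form
p_two_sided <- 2 * (1 - pnorm(abs(mn)))

# Chi-square form; correct = FALSE switches off the continuity correction,
# so the reported statistic equals mn^2
mt <- mcnemar.test(tab, correct = FALSE)
```

By default mcnemar.test applies a continuity correction, in which case its statistic and p-value differ slightly from the ones computed by hand here.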
19 Levene's Test It is based on the idea that if the variances are equal, then
$$V_{1j} = |X_{1j} - \tilde{X}_1|, \ j = 1, \ldots, n_1, \quad \text{and} \quad V_{2j} = |X_{2j} - \tilde{X}_2|, \ j = 1, \ldots, n_2,$$
where $\tilde{X}_i$, $i = 1, 2$, is the median of the ith sample, correspond to populations with equal means and variances. Thus, equality of variances can be tested by testing the hypothesis $H_0: \mu_{V_1} = \mu_{V_2}$ vs $H_a: \mu_{V_1} \ne \mu_{V_2}$ using the two-sample t-test with pooled variance.
20 Example The plasma vitamin C concentrations (µmol/l) of five randomly selected smokers and nonsmokers are:
Nonsmokers [...] s_1 = [...]
Smokers [...] s_2 = [...]
Test $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \ne \sigma_2^2$ at α = [...].
Solution. Here $\tilde{X}_1 = 41.68$, $\tilde{X}_2 = [...]$. Thus,
V_1 values for Nonsmokers [...]
V_2 values for Smokers [...]
The R commands
    x=c(0.20,0.03,0.30,0.00,0.50); y=c(0.26,0.00,0.17,0.05,0.23); t.test(x, y, var.equal=TRUE)
give a p-value of [...]. Thus, $H_0$ is not rejected.
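Starting from raw data rather than from pre-computed deviations, the procedure can be sketched as below. The two samples s1 and s2 are hypothetical numbers chosen for illustration only; they are not the vitamin C measurements of the example.

```r
# Hypothetical raw samples (illustration only, not the vitamin C data)
s1 <- c(41.5, 41.7, 42.0, 41.2, 41.9)
s2 <- c(40.4, 40.9, 40.6, 40.7, 40.2)

# Absolute deviations of each observation from its own sample median
v1 <- abs(s1 - median(s1))
v2 <- abs(s2 - median(s2))

# Levene's test as described: pooled-variance two-sample t-test on the deviations
lev <- t.test(v1, v2, var.equal = TRUE)
lev$p.value
```

A large p-value here means the data give no evidence against equal variances; it does not establish that the variances are equal.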
21 The F Test Under Normality When the two samples have been drawn from normal populations, the exact distribution of $S_1^2 / S_2^2$ is a multiple of an F distribution.
Theorem Let $X_{11}, \ldots, X_{1n_1}$ be a random sample from a normal distribution with variance $\sigma_1^2$, let $X_{21}, \ldots, X_{2n_2}$ be another sample from a normal distribution with variance $\sigma_2^2$, and let $S_1^2$ and $S_2^2$ denote the two sample variances. Then the rv
$$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$$
has an F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
22 The test statistic for $H_0: \sigma_1^2 = \sigma_2^2$ is
$$F_{H_0} = \frac{S_1^2}{S_2^2}.$$
If the ratio differs sufficiently from 1, the null hypothesis is rejected. In particular, the RRs for testing $H_0: \sigma_1^2 = \sigma_2^2$ are:
H_a                RR at level α
σ1² > σ2²          F_{H0} > F_{n1−1, n2−1; α}
σ1² < σ2²          1/F_{H0} > F_{n2−1, n1−1; α}
σ1² ≠ σ2²          either F_{H0} > F_{n1−1, n2−1; α/2} or 1/F_{H0} > F_{n2−1, n1−1; α/2}
23 Example Consider the data in the previous example, and assume the underlying populations are normal. The test statistic is $F_{H_0} = [...] = 2.4$. By the formula for the p-value in p. 333, the p-value, found with 2*(1-pf(2.4, 3, 3)), is 0.49.
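The F statistic and its two-sided p-value can be computed by hand and compared with R's built-in var.test. The samples below are hypothetical values used only to make the sketch runnable; the variable names are illustrative.

```r
# Hypothetical samples (illustration only)
x <- c(12.1, 9.8, 11.4, 10.9)
y <- c(10.2, 10.6, 9.9, 10.4)

f   <- var(x) / var(y)           # F_H0 = S1^2 / S2^2
df1 <- length(x) - 1
df2 <- length(y) - 1

# Two-sided p-value: twice the smaller of the two tail probabilities
p <- 2 * min(pf(f, df1, df2), 1 - pf(f, df1, df2))

# Built-in F test for the ratio of two variances (assumes normal populations)
vt <- var.test(x, y)
```

When the observed ratio exceeds 1, the two-sided p-value reduces to 2*(1-pf(f, df1, df2)), which is the form used in the example above.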
25 The R commands If y and x contain the values of the response and the predictor, the basic commands for testing in regression are:
    out=lm(y~x); summary(out); summary(aov(out))
summary(out) gives the estimated regression coefficients and their standard errors, the p-values for testing that each coefficient is zero, R², and the F-test statistic and p-value for the model utility test. summary(aov(out)) gives the ANOVA table.
26 Illustration with Simulated Data
    e=rnorm(50,0,5); x=runif(50,0,10); y=25-3.4*x+e; out=lm(y~x)
For the data generated, the summary(out) output includes
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  [...]     [...]       [...]    <2e-16
x            [...]     [...]       [...]    <2e-16
Residual standard error: [...] on 48 degrees of freedom
Multiple R-squared: [...], Adjusted R-squared: [...]
F-statistic: [...] on 1 and 48 DF, p-value: < 2.2e-16
27 Moreover, the summary(aov(out)) output includes
            Df  Sum Sq  Mean Sq  F value  Pr(>F)
x           [...]                         <2e-16
Residuals   [...]
The standard errors of the coefficients in the summary(out) output can be used for computing T statistics for other hypotheses regarding them. For example, the T statistic for testing $H_0: \beta_1 = -3.4$ vs $H_a: \beta_1 \ne -3.4$ is $T_{H_0} = [...] = [...]$, with corresponding p-value $2(1 - G_{48}(1.253)) = [...]$, where $G_{48}$ is the CDF of the T distribution with 48 degrees of freedom. The commands
    qqnorm(resid(out)); qqline(resid(out), col=2)
can be used to check the normality assumption.
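The T statistic for a non-zero null value of the slope can be computed from the coefficient table returned by summary(out). The sketch below regenerates data with the same simulation recipe; the seed is arbitrary and added only for reproducibility, so the numbers will not match the output shown on the slides.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
e <- rnorm(50, 0, 5)
x <- runif(50, 0, 10)
y <- 25 - 3.4 * x + e
out <- lm(y ~ x)

cf  <- summary(out)$coefficients              # estimates, std. errors, t values, p-values
b1  <- cf["x", "Estimate"]
se1 <- cf["x", "Std. Error"]

t_stat <- (b1 - (-3.4)) / se1                 # T statistic for H0: beta1 = -3.4
p_val  <- 2 * (1 - pt(abs(t_stat), df = out$df.residual))   # two-sided p-value
```

Because the data are simulated with true slope -3.4, this null hypothesis is true, and p_val will be below 0.05 in only about 5% of repeated simulations.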
29 T-tests and T-intervals for one mean Let x contain the data set. By default, the command t.test(x), which is equivalent to
    t.test(x, mu=0, alternative="two.sided", conf.level=0.95)
gives the t-statistic, the df, the p-value for testing $H_0: \mu = 0$ against the two-sided alternative, the 95% CI for µ, and $\bar{X}$. To test $H_0: \mu = 8.5$, replace mu=0 by mu=8.5. For one-sided alternatives, use alternative="less" or alternative="greater". Note, however, that the CIs are then one-sided.
30 Example Is there evidence that the average level of radiation is higher than the federal health standard of 10 W/cm²? Use the data in ExRadiationTestData.txt to test at α = [...]. Also, report the p-value, and construct a 95% CI.
Solution. Reading the data set into the R object x, the command
    t.test(x, mu=10, alternative="greater")
returns a p-value of [...]. Thus, $H_0: \mu = 10$ cannot be rejected in favor of $H_a: \mu > 10$ at α = [...]. Next use
    t.test(x, mu=10, alternative="two.sided")
to get a 95% CI of (9.773, [...]).
31 Power and sample size calculations for $H_0: \mu = \mu_0$ First install the package pwr using the command install.packages("pwr"). Then issue the command library(pwr) to load the package in the current R session. The command for computing the power at a given $\mu_a$ with given n, α, and S values, for $H_a: \mu > \mu_0$, is
    pwr.t.test(n=n, d=(µa − µ0)/S, sig.level=α, power=NULL, type="one.sample", alternative="greater")
For $H_a: \mu < \mu_0$ and $H_a: \mu \ne \mu_0$, replace "greater" by "less" and "two.sided", respectively.
32 Example For the testing problem $H_0: \mu = 10$ vs $H_a: \mu > 10$ with the ExRadiationTestData.txt data set, find the power at $\mu_a = 11$.
Solution. The commands length(x); sd(x) return n = 25 and S = 2.00 for this data set. The R command
    pwr.t.test(n=25, d=(11-10)/2.00, sig.level=0.05, power=NULL, type="one.sample", alternative="greater")
returns a power of [...]. NOTE: Treating S as the true σ, the command
    1-pnorm((10-11)/(2.00/sqrt(25)) + qnorm(0.95))
returns a power of 0.80, according to the formula in the teaching slides.
33 The command for computing the sample size needed to achieve a certain level of power at $\mu_a$ with given α and S values, for $H_a: \mu > \mu_0$, is
    pwr.t.test(n=NULL, d=(µa − µ0)/S, sig.level=α, power=power(µa), type="one.sample", alternative="greater")
where power(µa) stands for the desired power at µa. For $H_a: \mu < \mu_0$ and $H_a: \mu \ne \mu_0$, replace "greater" by "less" and "two.sided", respectively.
34 Example For the testing problem $H_0: \mu = 10$ vs $H_a: \mu > 10$ with the ExRadiationTestData.txt data set, find the sample size needed to achieve power of 0.9 at $\mu_a = 11$.
Solution. The R command
    pwr.t.test(n=NULL, d=(11-10)/2.00, sig.level=0.05, power=0.9, type="one.sample", alternative="greater")
returns a sample size of 35.65, which is rounded up to 36. NOTE: Treating S as the true σ, the command
    (2.00*(qnorm(.95)+qnorm(.9))/(10-11))^2
returns a sample size of 34.26, which is rounded up to 35, according to the formula in the teaching slides.
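The same power and sample size calculations can also be sketched without installing pwr. The block below swaps in power.t.test from base R's stats package in place of pwr.t.test, and sets it next to the normal-approximation formulas quoted in the NOTEs. The variable names are illustrative; n, S, µ0, µa, and α are the values of the two examples.

```r
n <- 25; s <- 2.00; mu0 <- 10; mua <- 11; alpha <- 0.05

# Power at mu_a from the noncentral t distribution (one-sample, one-sided test)
pw_t <- power.t.test(n = n, delta = mua - mu0, sd = s, sig.level = alpha,
                     type = "one.sample", alternative = "one.sided")$power

# Normal approximation, treating S as the true sigma
pw_z <- 1 - pnorm((mu0 - mua) / (s / sqrt(n)) + qnorm(1 - alpha))

# Sample size for power 0.9 at mu_a: t-based, then the normal approximation
n_t <- power.t.test(delta = mua - mu0, sd = s, sig.level = alpha, power = 0.9,
                    type = "one.sample", alternative = "one.sided")$n
n_z <- (s * (qnorm(1 - alpha) + qnorm(0.9)) / (mua - mu0))^2

ceiling(c(t_based = n_t, normal_approx = n_z))   # round up to whole observations
```

The t-based answers are slightly more conservative than the normal approximation (lower power, larger n) because the T critical value accounts for σ being estimated.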
35 Two independent samples The t.test command can also be used for comparing two means, both with independent and with paired data. The two samples can be in two separate columns (i.e., x and y), or combined in one column, say y, with a separate column, say x, indicating the sample membership of each observation. The default is to treat the two samples as independent, construct a 95% CI, and give the p-value for $H_a: \mu_1 - \mu_2 \ne 0$, without assuming $\sigma_1 = \sigma_2$. The commands with these default options are:
    t.test(x, y)    # One sample in x, the other in y
    t.test(y~x)     # For values in y and sample index in x
36 For the pooled variance T test and a 99% CI, do:
    t.test(y~x, var.equal = TRUE, conf.level = 0.99)
and similarly if the two samples are in separate columns. To test a different null hypothesis, e.g., $H_0: \mu_1 - \mu_2 = 1.8$ vs $H_a: \mu_1 - \mu_2 < 1.8$, do:
    t.test(y~x, mu=1.8, alternative = "less")
and similarly if the two samples are in separate columns. Other options are alternative = "greater", or the default "two.sided".
37 Example Use the R data set airquality to compare the ozone levels in May and August. Report the p-value, test at 0.05, and construct a 95% CI for $\mu_1 - \mu_2$, with and without the assumption of equal variances. [NOTE: Normality is violated; check with boxplot(Ozone ~ Month, data = airquality).]
Solution: Use:
    y1=airquality$Ozone; x1=airquality$Month
    x=y1[which(x1==5)]; y=y1[which(x1==8)]
    t.test(x, y); t.test(x, y, var.equal = TRUE)
More advanced application:
    t.test(Ozone ~ Month, data = airquality, subset = Month %in% c(5, 8))
Note that R is case-sensitive: the columns of airquality are named Ozone and Month.
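A runnable version of this solution is sketched below, using logical indexing in place of which(). The object names welch, pooled, and welch_f are illustrative. Missing ozone readings (NA) are dropped by t.test itself.

```r
# May and August ozone readings from the built-in airquality data set
x <- airquality$Ozone[airquality$Month == 5]
y <- airquality$Ozone[airquality$Month == 8]

welch  <- t.test(x, y)                      # default: variances not assumed equal
pooled <- t.test(x, y, var.equal = TRUE)    # pooled-variance T test

# Formula interface restricted to the same two months
welch_f <- t.test(Ozone ~ Month, data = airquality,
                  subset = Month %in% c(5, 8))

welch$p.value;  welch$conf.int              # p-value and 95% CI for mu1 - mu2
pooled$p.value; pooled$conf.int
```

The two-vector call and the formula call run the same Welch test, so they report the same p-value and CI.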
38 The basic command for testing and CI construction with paired data is
    t.test(y~x, paired = TRUE)
and similarly if the two samples are in different columns. Other options can be added as before. For example,
    t.test(y~x, alternative = "less", mu = 1.8, paired = TRUE, conf.level = 0.9)
where alternative is one of "two.sided", "less", or "greater". With paired data, equality of the two marginal variances is a non-issue, so you never need to use var.equal=TRUE.
39 Example Two brands of motorcycle tires are to be compared for durability. Eight motorcycles are selected at random and one tire from each brand is randomly assigned (front or back) on each motorcycle. The motorcycles are then run until the tires wear out. The data in motorcycleTiresLifetimes.txt are in km. Use the paired T-test procedure to test the hypothesis of equal average durability at level α = 0.05, and to construct a 90% CI for $\mu_1 - \mu_2$.
Solution: Read the data in tl and use:
    x=tl$brand1; y=tl$brand2    # set x and y
    t.test(x, y, paired=TRUE, conf.level=0.9)    # construct the test and CI
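The paired T-test is the one-sample T-test applied to the differences, which the following sketch checks numerically. The lifetimes are hypothetical values invented for illustration (the contents of the data file are not reproduced here). The two-vector form is used because some recent R releases reportedly reject paired = TRUE in the formula interface, so this form is the more portable one.

```r
# Hypothetical lifetimes in km for 8 motorcycles (illustration only)
brand1 <- c(36925, 45300, 36240, 32100, 37210, 48360, 38200, 33500)
brand2 <- c(34318, 42280, 35500, 31950, 38015, 47800, 37810, 33215)

paired_fit <- t.test(brand1, brand2, paired = TRUE, conf.level = 0.9)

# Same analysis as a one-sample t-test on the differences D_i
d       <- brand1 - brand2
one_fit <- t.test(d, mu = 0, conf.level = 0.9)

paired_fit$p.value     # compare with alpha = 0.05
paired_fit$conf.int    # 90% CI for mu1 - mu2
```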
41 The Rank Sum Test The wilcox.test command can be used to conduct both the rank-sum test and the signed-rank test. Again, the two samples can be in two separate columns, or combined in one column with a separate column indicating the sample membership of each observation. The default is to treat the two samples as independent, and give the p-value for testing equality of the two populations against the two-sided alternative, without constructing a CI:
    wilcox.test(x, y)    # One sample in x, the other in y
    wilcox.test(y~x)     # For values in y and sample index in x
42 To get a CI for the location difference use:
    wilcox.test(y~x, conf.int = TRUE, conf.level = 0.9)
[The description of this CI is not in the book.] To test different null and alternative hypotheses use:
    wilcox.test(y~x, mu=1.8, alternative = "less")
where alternative is either "less" or "greater". Similarly if the two samples are in different columns.
43 Example Use the R data set airquality to compare the ozone levels in May and August. [Check the data set with boxplot(Ozone ~ Month, data = airquality).]
Solution: Use
    y1=airquality$Ozone; x1=airquality$Month
    x=y1[which(x1==5)]; y=y1[which(x1==8)]
    wilcox.test(x, y, conf.int = TRUE)
More advanced application:
    wilcox.test(Ozone ~ Month, data = airquality, subset = Month %in% c(5, 8))
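A runnable sketch of this solution follows. The argument exact = FALSE is an addition not present in the slide commands: the ozone readings contain ties, for which R cannot compute the exact null distribution, so the normal approximation is requested explicitly. The object name rs is illustrative.

```r
# May and August ozone readings from the built-in airquality data set
x <- airquality$Ozone[airquality$Month == 5]
y <- airquality$Ozone[airquality$Month == 8]

# Rank-sum test with a CI for the location difference (NAs are dropped by wilcox.test)
rs <- wilcox.test(x, y, conf.int = TRUE, conf.level = 0.95, exact = FALSE)

rs$p.value     # two-sided p-value
rs$conf.int    # 95% CI for the location difference
rs$estimate    # point estimate of the location difference
```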
44 Rank sum for paired data (Signed-Rank Test) The basic command for the signed-rank test with paired data (without constructing a CI) is:
    wilcox.test(x, y, paired = TRUE)    # One sample in x, the other in y
    wilcox.test(y~x, paired = TRUE)     # For values in y and sample index in x
Other options can be added as before. For example,
    wilcox.test(x, y, alternative = "less", mu = 1.8, paired = TRUE, conf.int = TRUE, conf.level = 0.9)
where alternative is either "less" or "greater".
45 Example Two brands of motorcycle tires are to be compared for durability. Eight motorcycles are selected at random and one tire from each brand is randomly assigned (front or back) on each motorcycle. The motorcycles are then run until the tires wear out. The data in 401/Data/motorcycleTiresLifetimes.txt are in km. Use the signed-rank test procedure to test the hypothesis of equal durability at level α = 0.05, and to construct a 90% CI for the location difference.
Solution: Read the data in tl and use:
    x=tl$brand1; y=tl$brand2    # set x and y
    wilcox.test(x, y, paired=TRUE, conf.int = TRUE, conf.level=0.9)    # construct the test and CI
47 Set the number of successes and the number of trials in x and n. For example, use x=c(16,14); n=c(200,400) if $X_1 = 16$, $X_2 = 14$, $n_1 = 200$, $n_2 = 400$. To test $H_0: p_1 - p_2 = 0$ vs the two-sided alternative, and construct a 95% CI for $p_1 - p_2$, use prop.test(x, n), or, equivalently:
    prop.test(x, n, alternative = "two.sided", conf.level = 0.95)
Other alternative options are "less" and "greater". There is no option for testing other null hypotheses, e.g., $H_0: p_1 - p_2 = 0.1$.
48 Example An article in Knee Surgery, Sports Traumatology, Arthroscopy (2005), Vol. 13, reported results of arthroscopic meniscal repair with an absorbable screw. For tears greater than 25 millimeters, 10 of 18 repairs were successful, while for tears less than 25 millimeters, 22 of 30 were successful. Is there evidence that the success rates for the two types of tears are different? Test at α = 0.1, report the p-value, and construct a 90% confidence interval for $p_1 - p_2$.
Solution: Use
    x=c(10,22); n=c(18,30); prop.test(x, n, conf.level = 0.9)
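The solution can be run as is; the sketch below only stores the result so that its components can be inspected. The object name fit is illustrative. Note that prop.test applies a continuity correction by default, so its p-value is somewhat larger than the one from the uncorrected Z statistic.

```r
x <- c(10, 22)   # successful repairs: tears > 25 mm, tears < 25 mm
n <- c(18, 30)   # repairs attempted in each group

fit <- prop.test(x, n, conf.level = 0.9)   # two-sided test of H0: p1 - p2 = 0

fit$estimate    # the two sample proportions
fit$p.value     # compare with alpha = 0.1
fit$conf.int    # 90% CI for p1 - p2
```

Passing correct = FALSE to prop.test turns the continuity correction off.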
Non-parametric Test Stephen Opiyo Overview Distinguish Parametric and Nonparametric Test Procedures Explain commonly used Nonparametric Test Procedures Perform Hypothesis Tests Using Nonparametric Procedures
More informationNonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown
Nonparametric Statistics Leah Wright, Tyler Ross, Taylor Brown Before we get to nonparametric statistics, what are parametric statistics? These statistics estimate and test population means, while holding
More informationStatistics for Managers Using Microsoft Excel Chapter 10 ANOVA and Other C-Sample Tests With Numerical Data
Statistics for Managers Using Microsoft Excel Chapter 10 ANOVA and Other C-Sample Tests With Numerical Data 1999 Prentice-Hall, Inc. Chap. 10-1 Chapter Topics The Completely Randomized Model: One-Factor
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationBasic Business Statistics, 10/e
Chapter 1 1-1 Basic Business Statistics 11 th Edition Chapter 1 Chi-Square Tests and Nonparametric Tests Basic Business Statistics, 11e 009 Prentice-Hall, Inc. Chap 1-1 Learning Objectives In this chapter,
More informationTwo sample Hypothesis tests in R.
Example. (Dependent samples) Two sample Hypothesis tests in R. A Calculus professor gives their students a 10 question algebra pretest on the first day of class, and a similar test towards the end of the
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationT.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS
ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only
More informationNonparametric Statistics
Nonparametric Statistics Nonparametric or Distribution-free statistics: used when data are ordinal (i.e., rankings) used when ratio/interval data are not normally distributed (data are converted to ranks)
More informationNature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.
Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences
More informationChapter 10: Inferences based on two samples
November 16 th, 2017 Overview Week 1 Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 1: Descriptive statistics Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter 8: Confidence
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationSummary of Chapters 7-9
Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationAnalysis of 2x2 Cross-Over Designs using T-Tests
Chapter 234 Analysis of 2x2 Cross-Over Designs using T-Tests Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationNonparametric Location Tests: k-sample
Nonparametric Location Tests: k-sample Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)
More informationi=1 X i/n i=1 (X i X) 2 /(n 1). Find the constant c so that the statistic c(x X n+1 )/S has a t-distribution. If n = 8, determine k such that
Math 47 Homework Assignment 4 Problem 411 Let X 1, X,, X n, X n+1 be a random sample of size n + 1, n > 1, from a distribution that is N(µ, σ ) Let X = n i=1 X i/n and S = n i=1 (X i X) /(n 1) Find the
More informationStatistics Handbook. All statistical tables were computed by the author.
Statistics Handbook Contents Page Wilcoxon rank-sum test (Mann-Whitney equivalent) Wilcoxon matched-pairs test 3 Normal Distribution 4 Z-test Related samples t-test 5 Unrelated samples t-test 6 Variance
More information3. Nonparametric methods
3. Nonparametric methods If the probability distributions of the statistical variables are unknown or are not as required (e.g. normality assumption violated), then we may still apply nonparametric tests
More informationNonparametric Tests. Mathematics 47: Lecture 25. Dan Sloughter. Furman University. April 20, 2006
Nonparametric Tests Mathematics 47: Lecture 25 Dan Sloughter Furman University April 20, 2006 Dan Sloughter (Furman University) Nonparametric Tests April 20, 2006 1 / 14 The sign test Suppose X 1, X 2,...,
More informationSampling distribution of t. 2. Sampling distribution of t. 3. Example: Gas mileage investigation. II. Inferential Statistics (8) t =
2. The distribution of t values that would be obtained if a value of t were calculated for each sample mean for all possible random of a given size from a population _ t ratio: (X - µ hyp ) t s x The result
More information" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2
Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the
More informationStat 427/527: Advanced Data Analysis I
Stat 427/527: Advanced Data Analysis I Review of Chapters 1-4 Sep, 2017 1 / 18 Concepts you need to know/interpret Numerical summaries: measures of center (mean, median, mode) measures of spread (sample
More informationInference with Heteroskedasticity
Inference with Heteroskedasticity Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables.
More informationDr. Junchao Xia Center of Biophysics and Computational Biology. Fall /8/2016 1/38
BIO5312 Biostatistics Lecture 11: Multisample Hypothesis Testing II Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/8/2016 1/38 Outline In this lecture, we will continue to
More informationQuantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing
Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October
More informationNon-Parametric Two-Sample Analysis: The Mann-Whitney U Test
Non-Parametric Two-Sample Analysis: The Mann-Whitney U Test When samples do not meet the assumption of normality parametric tests should not be used. To overcome this problem, non-parametric tests can
More informationCentral Limit Theorem ( 5.3)
Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately
More informationStatistics for EES Factorial analysis of variance
Statistics for EES Factorial analysis of variance Dirk Metzler June 12, 2015 Contents 1 ANOVA and F -Test 1 2 Pairwise comparisons and multiple testing 6 3 Non-parametric: The Kruskal-Wallis Test 9 1 ANOVA
More informationStatistics for IT Managers
Statistics for IT Managers 95-796, Fall 2012 Module 2: Hypothesis Testing and Statistical Inference (5 lectures) Reading: Statistics for Business and Economics, Ch. 5-7 Confidence intervals Given the sample
More informationSCHOOL OF MATHEMATICS AND STATISTICS
RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester
More information13 Simple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity
More informationChapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing
Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 15-1 Internet Usage Data Table 15.1 Respondent Sex Familiarity
More informationTMA4255 Applied Statistics V2016 (23)
TMA4255 Applied Statistics V2016 (23) Part 7: Nonparametric tests Signed-Rank test [16.2] Wilcoxon Rank-sum test [16.3] Anna Marie Holand April 19, 2016, wiki.math.ntnu.no/tma4255/2016v/start 2 Outline
More informationChapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides
Chapter 7 Inference for Distributions Introduction to the Practice of STATISTICS SEVENTH EDITION Moore / McCabe / Craig Lecture Presentation Slides Chapter 7 Inference for Distributions 7.1 Inference for
More informationSTATISTICS 141 Final Review
STATISTICS 141 Final Review Bin Zou bzou@ualberta.ca Department of Mathematical & Statistical Sciences University of Alberta Winter 2015 Bin Zou (bzou@ualberta.ca) STAT 141 Final Review Winter 2015 1 /
More informationMultiple Regression Introduction to Statistics Using R (Psychology 9041B)
Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment
More informationPsychology 282 Lecture #4 Outline Inferences in SLR
Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations
More information10.4 Hypothesis Testing: Two Independent Samples Proportion
10.4 Hypothesis Testing: Two Independent Samples Proportion Example 3: Smoking cigarettes has been known to cause cancer and other ailments. One politician believes that a higher tax should be imposed
More informationWeek 12 Hypothesis Testing, Part II Comparing Two Populations
Week 12 Hypothesis Testing, Part II Week 12 Hypothesis Testing, Part II Week 12 Objectives 1 The principle of Analysis of Variance is introduced and used to derive the F-test for testing the model utility
More informationIntroductory Statistics with R: Simple Inferences for continuous data
Introductory Statistics with R: Simple Inferences for continuous data Statistical Packages STAT 1301 / 2300, Fall 2014 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail: sungkyu@pitt.edu
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationStat 311: HW 9, due Th 5/27/10 in your Quiz Section
Stat 311: HW 9, due Th 5/27/10 in your Quiz Section Fritz Scholz Your returned assignment should show your name and student ID number. It should be printed or written clearly. 1. The data set ReactionTime
More information