Data Analysis and Statistical Methods Statistics 651

Suhasini Subba Rao, http://www.stat.tamu.edu/~suhasini/teaching.html

Review

In the previous lecture we considered the following tests:

The independent sample t-test. This is used when we have two completely independent samples drawn from two populations and we want to compare their population means. If the sample sizes are relatively large and there are few outliers, we can test equality of the means (or the one-sided versions) using the independent sample t-test.

The Wilcoxon rank sum test. This is also used when we have two completely independent samples drawn from two populations and we want to compare their population means (or distributions). If the sample sizes are small and there appear to be outliers, we can test equality of their distributions (means) using the Wilcoxon rank sum test (there are no standard errors in this test).

The paired t-test. This is used when we have paired observations, the two members of each pair coming from different populations, e.g. the running time of a runner at high and low altitudes. In this case the observations within a pair are dependent, so we cannot use either of the above tests. We can check for dependence by plotting the pairs against each other (e.g. high altitude against low altitude). If the sample size is relatively large and there are not many outliers, use a paired t-test.

The Wilcoxon signed-rank test: today's class.

A quick review of the idea of testing

Suppose a bomb is exploded at the center 0. The bomb spreads debris everywhere; the closer you are to the bomb, the more likely the debris will hit you. The standard deviation of the bomb's spread distance (remember it is a measure of the amount of spread) is 3 miles (so the variance is 9). Suppose the spread of the debris has a normal distribution. Basically this means that about 95% of the debris will land within a radius of $1.96 \times 3$ miles of 0 (on either side); the other 5% will land outside this radius. Suppose I am standing 4 miles from the center (0).
Do you think the bomb will hit me? Suppose I am standing 9 miles to the right of the center: do you think I will get hit? What is the probability that I will get hit? It is $P(Z > 9/3) = P(Z > 3) = 0.0013$. This is very small.
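The tail probabilities in the bomb stories can be checked numerically. A minimal sketch using only Python's standard library (`statistics.NormalDist`); the helper name `prob_hit` is hypothetical, and the same function covers the 6-mile-standard-deviation story that follows.

```python
from statistics import NormalDist


def prob_hit(distance_miles: float, sd_miles: float) -> float:
    """P(Z > distance/sd): chance that debris from a bomb centred at 0
    lands beyond `distance_miles` on one side, assuming a normal spread."""
    z = distance_miles / sd_miles        # how many standard deviations away
    return 1.0 - NormalDist().cdf(z)     # upper-tail probability of N(0, 1)


# 95% of the debris lands within 1.96 standard deviations of the centre
radius_95 = NormalDist().inv_cdf(0.975) * 3
print(f"95% radius for sd = 3 miles: {radius_95:.2f} miles")

# Story 1: sd = 3 miles, standing 9 miles away (z = 3)
print(f"sd = 3: p = {prob_hit(9, 3):.4f}")
# Story 2: sd = 6 miles, standing 9 miles away (z = 1.5)
print(f"sd = 6: p = {prob_hit(9, 6):.4f}")
```

Only the ratio distance/(standard deviation) enters the calculation, which is the point the stories make.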

Suppose I am standing 9 miles to the right of the center and I did get hit. Everyone is telling me the bomb was exploded at 0, but I know that the probability of my being hit when the bomb is located at center 0 is about 0.1%. I am suspicious: this number is really small. I don't know the exact location of the bomb, but I am pretty sure that it is not at 0 or at any location less than 0. This value 0.0013 is a p-value: it is the probability of being hit when the center is zero. If it is small, it is unlikely that the center is zero, which means rejecting the null hypothesis that the bomb is located at zero.

Another story

A more scattering bomb is exploded at center 0. The standard deviation (measure of spread) of the debris is 6 miles (so the variance is 36). This means that 95% of the debris will land within $1.96 \times 6$ miles of the center 0. I am standing 9 miles away from the bomb. The chance of my being hit is $P(Z > 9/6) = P(Z > 1.5) \approx 0.067$, i.e. roughly a 7% chance of being hit when the bomb is at zero. Suppose I get hit: then I cannot say that the bomb is at zero, but I also cannot say that it is not at zero. 0.067 is the p-value for my being hit when the bomb was exploded at zero, so we cannot reject the null hypothesis that the bomb is located at zero.

What have you learnt from my stories?

- I am an idiot for standing so close to the bombs.
- The distance that is safe to stay away from the bomb depends on the standard deviation (the variance).
- The smaller the ratio distance/(standard deviation), the more likely I am to be hit when the center is zero. The larger the ratio distance/(standard deviation), the less likely I am to be hit when the center is zero.

A quick review of JMP output

Typically the output in JMP has the columns: Mean (difference), Std. Error, Upper 95%, Lower 95%, N, t Ratio, DF, Prob > |t|. Mean is the sample mean, or the difference in the sample means. Std. Error refers to the estimated standard deviation of the estimator (the sample mean). N refers to the sample size.

t Ratio refers to the standardised (z-transformed) estimator; usually $t = \text{Mean}/\text{Std.Error}$. DF refers to the degrees of freedom of the t-distribution used. Prob > |t| refers to the two-sided p-value when testing the hypothesis $H_0$: true mean $= 0$ (or difference in means equal to zero) against the alternative $H_A$: true mean $\neq 0$ (or difference in means not equal to zero).

JMP output and what you should be able to do with it

Using this output you should be able to:
- Construct 90% CIs, 99% CIs, etc.
- Test, for example, the hypothesis $H_0$: true mean $= 5$ (or difference in means equal to five) against the alternative $H_A$: true mean $\neq 5$ (or difference in means not equal to five).

Tests we have done so far

(Columns as in a typical "Unstandardised Coefficients" output: B, Std. Error, t, Sig.)
- One sample mean: B = $\bar{X}$; Std. Error = $\sqrt{s^2/n}$; t = $(\bar{X}-\mu_0)/\sqrt{s^2/n}$; Sig = $2P\left(t_{n-1} > \lvert\bar{X}-\mu_0\rvert/\sqrt{s^2/n}\right)$.
- Independent samples t-test: B = $\bar{D}$ (difference in sample means); Std. Error = $\sqrt{s_p^2(1/n + 1/m)}$; t = $\bar{D}/\sqrt{s_p^2(1/n + 1/m)}$; Sig = $2P\left(t_{n+m-2} > \lvert\bar{D}\rvert/\sqrt{s_p^2(1/n + 1/m)}\right)$.
- Paired t-test: B = $\bar{D}$ (mean of the differences); Std. Error = $\sqrt{s_d^2/n}$; t = $\bar{D}/\sqrt{s_d^2/n}$; Sig = $2P\left(t_{n-1} > \lvert\bar{D}\rvert/\sqrt{s_d^2/n}\right)$.

Remember $\lvert\cdot\rvert$ means taking the positive (absolute) value of a number, e.g. $\lvert -3\rvert = 3$. Remember the standard deviation is a measure of spread. If we have normality, then a 95% CI for the true parameter is $[B - 1.96 \times \text{Std.Error},\ B + 1.96 \times \text{Std.Error}]$. Roughly, this means it is highly likely that the true parameter lies in this interval.

In the above list the p-value is evaluated when testing whether the true parameter is zero (or $\mu_0$). That is, with the p-value evaluated under the null:
- Sample mean, B = $\bar{X}$: $H_0: \mu = \mu_0$ against $H_A: \mu \neq \mu_0$.
- Independent samples t-test, B = $\bar{D}$: $H_0: \mu_1 - \mu_2 = 0$ against $H_A: \mu_1 - \mu_2 \neq 0$.
- Paired t-test, B = $\bar{D}$: $H_0: \mu_1 - \mu_2 = 0$ against $H_A: \mu_1 - \mu_2 \neq 0$.
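The standard errors in the list above, and the "read the output" tasks, can be sketched as small Python functions. This is an illustrative sketch, not JMP's own computation: the function names are hypothetical, and `ci` uses a normal quantile (1.96 for 95%), which is close to the t quantile only when the degrees of freedom are large.

```python
from math import sqrt
from statistics import NormalDist


def se_one_sample(s2: float, n: int) -> float:
    """Std. Error of X-bar: sqrt(s^2 / n)."""
    return sqrt(s2 / n)


def se_independent(sp2: float, n: int, m: int) -> float:
    """Std. Error of the difference in means; sp2 is the pooled variance s_p^2."""
    return sqrt(sp2 * (1 / n + 1 / m))


def se_paired(sd2: float, n: int) -> float:
    """Std. Error of D-bar; sd2 is the sample variance of the n differences."""
    return sqrt(sd2 / n)


def ci(b: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Large-sample CI B +/- z * SE (swap in a t quantile when DF is small)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # 1.96 for level = 0.95
    return (b - z * se, b + z * se)


def t_ratio(b: float, se: float, mu0: float = 0.0) -> float:
    """Statistic for H0: parameter = mu0 (JMP's t Ratio uses mu0 = 0)."""
    return (b - mu0) / se
```

For example, with a made-up estimate B = 6.2 and Std. Error 0.5, `ci(6.2, 0.5)` gives roughly (5.22, 7.18), `ci(6.2, 0.5, 0.90)` gives a narrower 90% interval, and `t_ratio(6.2, 0.5, mu0=5)` gives 2.4, which exceeds the large-sample two-sided 5% cutoff of 1.96.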

Example I: Runners at altitude

Runners were compared at a high and a low altitude. For each runner, the running time was measured at a high altitude and then again at a low altitude. 12 runners were used.

Runner    1     2     3     4     5     6     7     8     9     10    11    12
High      9.4   9.8   9.9   10.3  8.9   8.8   9.8   8.2   9.4   9.9   12.2  9.3
Low       8.7   7.9   8.3   8.4   9.2   9.1   8.2   8.1   8.9   8.2   8.9   7.5

Do you think altitude has an effect on running time? Let $\mu_Y$ denote the mean time at a low altitude and $\mu_X$ the mean time at a high altitude. Use $\alpha = 0.05$. We want to test $H_0: \mu_Y - \mu_X = \mu_d \geq 0$ against the alternative $H_A: \mu_Y - \mu_X = \mu_d < 0$.

Solution I: Using the paired t-test

Runner    1     2     3     4     5     6     7     8     9     10    11    12
Low-High  -0.7  -1.9  -1.6  -1.9  0.3   0.3   -1.6  -0.1  -0.5  -1.7  -3.3  -1.8

Using the sample differences, calculate the sample mean and sample variance: $\bar{D} = -1.21$ and $s_d^2 = \frac{1}{11}\{(-0.7 + 1.21)^2 + \dots + (-1.8 + 1.21)^2\} = 1.16$. Since $\bar{D} = -1.21$ is on the side of the alternative (noting it is a one-sided test), we can do the test. Under the null hypothesis (at the boundary $\mu_d = 0$)

$\frac{\bar{D}}{\sqrt{s_d^2/12}} \sim t(11).$

Now we construct a rejection region. The rejection region is on the left hand side (we see this from the alternative). Hence we reject the null if $\bar{D} = \bar{Y} - \bar{X} = -1.21$ is less than $0 - t_{0.05}(11)\sqrt{s_d^2/12} = -1.796 \times \sqrt{1.16/12} \approx -0.56$. Since $\bar{D} = -1.21$ is less than $-0.56$, we can reject the null. Equivalently, the t-ratio is $-1.21/\sqrt{1.16/12} \approx -3.89$; JMP reports the two-sided Prob > |t| = 0.0026, so the one-sided p-value is $P(t_{11} \leq -3.89) \approx 0.0013$. The p-value is so small that there is enough evidence to reject the null.

Example II: The use of cell phones

There has been a lot of speculation that the use of cell phones while driving has increased the number of accidents. Scientists wanted to test whether talking on a cell phone increased a driver's reaction time. To test the hypothesis they randomly sampled 30 people and placed each of them in a car simulator. For each driver, the reaction time to the sudden appearance of a colour stimulus was recorded both when the driver was not on the phone and when the driver was on the phone. Based on this sample, the average reaction time when not on the phone was $\bar{X} = 0.5$ seconds and the average reaction time when on the phone was $\bar{Y} = 0.7$ seconds. The pooled sample variance is $s_p^2 = 0.4^2 = 0.16$ and the sample variance of the differences is $s_d^2 = 0.1^2 = 0.01$. Is there evidence to suggest that using a cell phone increases the reaction time while driving (state the test you would use and the hypothesis, and do the test at the 5% level)?

Solution II

Because the same person is used in both experiments (and it is highly likely that reaction time depends on the individual), it would be wise to do a paired test, since there is likely to be dependence between the pairs. Let $\mu_X$ be the mean reaction time when not on the phone and $\mu_Y$ the mean reaction time when on the phone. Let $\mu_d = \mu_Y - \mu_X$; this is the mean reaction time on the phone minus the mean reaction time off the phone. We do not observe $\mu_X$, $\mu_Y$ or $\mu_d$. However, it is conjectured that reaction time increases with cell phone use; if this is true then $\mu_d = \mu_Y - \mu_X > 0$ (i.e. the average reaction time on the phone minus the average reaction time without the phone is greater than zero). Hence we want to test $H_0: \mu_d \leq 0$ against $H_A: \mu_d > 0$.

Since the pairs in the data are dependent, we use the paired t-test. This means that to do the test we use the sample variance of the differences, $s_d^2 = 0.01$, and as an estimate of $\mu_d$ we use $\hat{D} = \bar{Y} - \bar{X} = 0.7 - 0.5 = 0.2$. Using $\bar{D} = 0.2$, $s_d^2 = 0.01$ and $n = 30$ (there are $n$ pairs), we can do the test just as if it were a one-sample test on the differences (hence we use a t-test, since we estimate the variance). The standard error is $\sqrt{s_d^2/n} = \sqrt{0.01/30} \approx 0.018$.

We construct a 5% rejection region (RR); it is on the right hand side (since $H_A$ points right). Remember that to construct the RR we need to center it about the mean in the null, which is 0; hence the RR is any value of $\bar{D}$ greater than $0 + t_{29}(0.05)\sqrt{s_d^2/n} = 1.699 \times 0.018 \approx 0.031$. Since $\bar{D} = 0.2 > 0.031$, there is evidence to reject the null.

Question: Suppose we want to test that the average reaction time increases by more than 0.1 second. What would the conclusion of the test be (using $\alpha = 5\%$)?
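Both paired tests above reduce to a one-sample t-test on the differences. A rough sketch with Python's standard library, assuming the High/Low values as tabulated above; because the standard library has no t-distribution, the critical values $t_{0.05}(11) = 1.796$ and $t_{0.05}(29) = 1.699$ are typed in from a t table rather than computed, and the helper name `paired_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def paired_t(diffs: list[float]) -> tuple[float, float, float]:
    """Return (D-bar, s_d^2, t ratio) for a paired t-test with H0: mu_d = 0."""
    n = len(diffs)
    d_bar = mean(diffs)
    sd2 = variance(diffs)                 # sample variance, divides by n - 1
    return d_bar, sd2, d_bar / sqrt(sd2 / n)


# Example I: differences Low - High for the 12 runners
high = [9.4, 9.8, 9.9, 10.3, 8.9, 8.8, 9.8, 8.2, 9.4, 9.9, 12.2, 9.3]
low = [8.7, 7.9, 8.3, 8.4, 9.2, 9.1, 8.2, 8.1, 8.9, 8.2, 8.9, 7.5]
diffs = [lo - hi for lo, hi in zip(low, high)]
d_bar, sd2, t = paired_t(diffs)

T_05_11 = 1.796   # t_{0.05}(11), typed in from a t table
cutoff = 0 - T_05_11 * sqrt(sd2 / len(diffs))   # left-tailed rejection region
print(f"D-bar = {d_bar:.2f}, s_d^2 = {sd2:.2f}, t = {t:.2f}, cutoff = {cutoff:.2f}")
print("Example I:", "reject H0" if d_bar < cutoff else "do not reject H0")

# Example II: only summary statistics are given (n = 30 pairs)
se_phone = sqrt(0.01 / 30)            # sqrt(s_d^2 / n)
T_05_29 = 1.699   # t_{0.05}(29), typed in from a t table
d_hat = 0.7 - 0.5                     # Y-bar - X-bar
print("Example II:", "reject H0" if d_hat > 0 + T_05_29 * se_phone else "do not reject H0")
```

Working from the raw differences (Example I) or from summary statistics (Example II) gives the same kind of decision; only $\bar{D}$, $s_d^2$ and $n$ are needed.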

What to do when the number of pairs is small?

In the Friday 13th example the sample size $n = 6$ is small, and we have used a t-distribution with 5 degrees of freedom (which means the test is not very sensitive at detecting effects, since $t_{0.025}(5)$ is quite large). [But despite this we have still rejected the null hypothesis.]

Important: for obvious reasons, a paired t-test is only possible when the two samples are of the same size. To do the paired t-test we also require the usual assumptions. If $n$ is small, then the differences $D_i$ must come from a normal distribution (difficult to check in practice). If $n$ is large it does not matter, because $\bar{D}$ will be approximately normal. The small sample size raises the question: is there a nonparametric version of this test which does not require normality of the $D_i$?

A nonparametric alternative: the Wilcoxon signed-rank test (uses Table 6)

This is used to test $H_0: \mu_d = 0$ against $H_A: \mu_d \neq 0$. We do not require normality of the $D_i$, but the distribution of the differences $D_i$ must be symmetric about the median (one of the main assumptions of this test). Hence the test is equivalent to checking whether the median of the differences is zero, against the alternative that it is not zero.

Recipe:
1. Calculate the differences between the pairs of samples, $D_i = X_i - Y_i$.
2. Delete all zero values and let $n$ be the number of non-zero values.
3. List the absolute values of the differences and rank them from smallest to largest (tied values receive the average of their ranks).
4. Give each rank a sign (negative or positive), depending on whether the difference is negative or positive.
5. Add all the negative ranks together (ignoring the sign) and call this $T_-$. Add all the positive ranks together and call this $T_+$.
6. Find the smaller of $T_-$ and $T_+$ and label this $T$, i.e. $T = \min(T_-, T_+)$.

Wilcoxon signed-rank test for the Friday 13th data

Sam. 1  Sam. 2  Difference  Sign  Abs. rank  Signed rank
9       13      -4          -     4          -4
6       12      -6          -     5          -5
11      14      -3          -     3          -3
11      10      +1          +     1.5        +1.5
3       4       -1          -     1.5        -1.5
5       12      -7          -     6          -6
Total: $T_- = 19.5$, $T_+ = 1.5$

Look up Table 6: use the column for the number of pairs and, depending on whether you are doing a two-sided or one-sided test and on $\alpha$, select the appropriate value.
If $T$ is less than this value, there is enough evidence to reject the null. The test is $H_0$: the mean numbers of accidents which happen on Friday 13th and Friday 6th are the same ($\mu_d = 0$), against the alternative $H_A$: the mean numbers of accidents which happen on Friday 13th and Friday 6th are different.
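The recipe can be sketched in Python as follows. This is an illustrative implementation with a hypothetical helper name; ties receive the average of the ranks they occupy, as in the Friday 13th table, and exact equality is used to detect ties, which is fine for count data like this. The critical value still has to come from Table 6.

```python
def signed_rank_stats(sample1: list[float], sample2: list[float]) -> tuple[float, float, float]:
    """Return (T_plus, T_minus, T) for the Wilcoxon signed-rank test."""
    # Steps 1-2: differences D_i = X_i - Y_i, with zero differences deleted
    diffs = [x - y for x, y in zip(sample1, sample2) if x != y]
    # Step 3: rank |D_i| from smallest to largest, averaging ranks over ties
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1                                # extend the block of tied values
        avg_rank = (i + j) / 2 + 1                # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Steps 4-6: sum the ranks by sign and take the smaller sum
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, min(t_plus, t_minus)


# Friday 13th data: the "Sam. 1" and "Sam. 2" columns of the table
sam1 = [9, 6, 11, 11, 3, 5]
sam2 = [13, 12, 14, 10, 4, 12]
t_plus, t_minus, T = signed_rank_stats(sam1, sam2)
print(f"T+ = {t_plus}, T- = {t_minus}, T = {T}")
```

As a check, $T_+ + T_- = n(n+1)/2 = 21$ for $n = 6$ non-zero differences.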

The Wilcoxon signed-rank test and Table 6

- $T_+$ = sum of positive ranks ($T_+ = 1.5$).
- $T_-$ = sum of negative ranks ($T_- = 19.5$).
- $T$ = smaller of $T_+$ and $T_-$ (in our example $T = 1.5$).
- Look up Table 6 for a particular $p = \alpha$ and $n$ (in our case we use a two-sided test with $p = 0.05$ and $n = 6$). Reject $H_0$ if $T$ is less than the value in the table.

Looking up the table with $p = 0.05$ and $n = 6$ we see that Value = 0. Since $1.5 > 0$, using the Wilcoxon signed-rank test we do not have enough evidence to reject the null hypothesis. Though I think this is just because the sample size is too small and the power of the test is zero! If we increased the level to $\alpha = 0.2$, then Value = 2 and we would be able to reject the null. Based on this test, what are your conclusions about Friday 13th???

Reminder: What test to do when...

If we want to test whether two samples come from the same population (or whether one distribution is a shift of another), and both samples are of the same size, then:
- if pairs of observations are independent and
  - the sample size is large (or small with normal data), use the t-test;
  - the sample size is small and the data are not normal, use the Wilcoxon rank sum (Mann-Whitney U) test;
- if pairs of observations are dependent and
  - the sample size is large (or small with normal data), use the paired t-test;
  - the sample size is small and the data are not normal, use the Wilcoxon signed-rank test.

We should use the test which suits the data, because loss of power can occur if we use the wrong test.

Aside: What happens when we use the wrong test

For the t-test comparing two samples of the same size $n$, the non-rejection region is $\left[-t_{\alpha/2}(2n-2)\, s_p\sqrt{2/n},\ t_{\alpha/2}(2n-2)\, s_p\sqrt{2/n}\right]$, where $s_p$ is the pooled standard deviation. For the paired t-test we use a t-distribution with $n-1$ degrees of freedom, and the non-rejection region is $\left[-t_{\alpha/2}(n-1)\, \frac{s_d}{\sqrt{n}},\ t_{\alpha/2}(n-1)\, \frac{s_d}{\sqrt{n}}\right]$, where $s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2}$.

Remember $s_d$ and $s_p$ are different. In the case of independent pairs $s_d \approx \sqrt{2}\, s_p$, so the two regions have almost the same width and differ mainly in the degrees of freedom. By comparing the non-rejection regions we see:
- If there is large dependence between pairs and we mistakenly use the t-test, we lose power because $s_p\sqrt{2/n}$ is large compared with $s_d/\sqrt{n}$.
- If there is only a small dependence between pairs and we mistakenly use the paired t-test, we lose power because we are using a $t(n-1)$ rather than a $t(2n-2)$ distribution.

Example: Runners at altitude

Runners were compared at a high and a low altitude. For each runner, the running time was measured at a high altitude and then again at a low altitude. 12 runners were used.

Runner    1     2     3     4     5     6     7     8     9     10    11    12
High      9.4   9.8   9.9   10.3  8.9   8.8   9.8   8.2   9.4   9.9   12.2  9.3
Low       8.7   7.9   8.3   8.4   9.2   9.1   8.2   8.1   8.9   8.2   8.9   7.5

Use the Wilcoxon signed-rank test to test the hypothesis that the median difference is zero against the alternative that it is not zero.

Aside: The normal approximation of the signed-rank test

When $n > 50$ we use a normal approximation. Let

$\mu_T = \frac{n(n+1)}{4}, \qquad \sigma_T^2 = \frac{n(n+1)(2n+1)}{24}.$

Under the null, $Z = \frac{T - \mu_T}{\sigma_T} \sim N(0,1)$. Calculate $P(Z < z)$, where $z$ is the observed value of $Z$. If this is smaller than $\alpha/2$, reject $H_0$.
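A sketch of the normal approximation in Python, assuming $T = \min(T_+, T_-)$ as in the recipe. Because $T_+ + T_- = n(n+1)/2$, the smaller sum can never exceed $\mu_T = n(n+1)/4$, so $z \leq 0$ and the lower tail $P(Z < z)$ is the one compared with $\alpha/2$. The helper name and the inputs $n = 60$, $T = 600$ are made up for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_normal_approx(T: float, n: int) -> tuple[float, float]:
    """Normal approximation to the signed-rank statistic T (intended for n > 50).
    Returns (z, P(Z < z)) under H0."""
    mu_T = n * (n + 1) / 4
    var_T = n * (n + 1) * (2 * n + 1) / 24
    z = (T - mu_T) / sqrt(var_T)
    return z, NormalDist().cdf(z)


# Hypothetical numbers: n = 60 non-zero differences, observed T = 600
z, p_lower = signed_rank_normal_approx(600, 60)
alpha = 0.05
print(f"z = {z:.3f}, P(Z < z) = {p_lower:.4f}")
print("reject H0" if p_lower < alpha / 2 else "do not reject H0")
```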